Open Access
Fire Hawk Optimizer with Deep Learning Enabled Human Activity Recognition
1 Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj, 16273, Saudi Arabia
2 Department of Information Technology, College of Computers and Information Technology, Taif University, Taif P.O. Box 11099, Taif, 21944, Saudi Arabia
* Corresponding Author: Mrim M. Alnfiai. Email:
Computer Systems Science and Engineering 2023, 45(3), 3135-3150. https://doi.org/10.32604/csse.2023.034124
Received 06 July 2022; Accepted 27 August 2022; Issue published 21 December 2022
Abstract
Human-Computer Interaction (HCI) is a subfield of computer science concerned with the communication between people (users) and computers and with the design, implementation, and evaluation of user interfaces for computer systems. Drawing on the methods and concepts of cognitive science, HCI has successfully incorporated human factors into the software engineering of computing systems. Usability is the aspect of HCI dedicated to ensuring that human–computer communication is, among other things, efficient, effective, and satisfying for the user. Meanwhile, the aim of human activity recognition (HAR) is to identify actions from a sequence of observations of the subjects' activities and the environmental conditions. Vision-based HAR underpins several applications, including health care, HCI, and video surveillance. This article develops a Fire Hawk Optimizer with Deep Learning Enabled Activity Recognition (FHODL-AR) technique for HCI-driven usability. In the presented FHODL-AR technique, input images are analyzed to identify different human activities. For feature extraction, a modified SqueezeNet model is introduced by adding a few bypass connections between the Fire modules of SqueezeNet. In addition, the FHO algorithm is used for hyperparameter optimization, which in turn improves classification performance. To detect and categorize different kinds of activities, a probabilistic neural network (PNN) classifier is applied. The FHODL-AR technique is validated experimentally on benchmark datasets, and the results show its improvements over other recent approaches.
Usability and Human-Computer Interaction (HCI) are key aspects of the system development process, serving to enhance system facilities and to satisfy the requirements and needs of users [1]. HCI supports users, designers, and analysts in identifying system requirements regarding graphics, text style, color, fonts, and layout, whereas usability verifies whether the system is easy to use, efficient, easy to learn, useful, easy to evaluate, easy to remember, safe, practical, and visible, and whether it offers job satisfaction to users [2]. HCI has become a subfield of computer science that studies the communication between people (users) and computers and the implementation, design, and evaluation of user interfaces that are responsive to users' habits and needs. It is a multidisciplinary domain involving design, computer science, and the behavioral sciences. The main goal of HCI is to make computer systems user-friendly and highly usable [3]. Users communicate with a computer system through its user interface, whose software and hardware provide means of input, allowing users to manipulate the system, and means of output, allowing the system to provide data to the end user [4]. The evaluation, design, and application of interfaces have been the main aims of HCI. A common position in HCI is that a good interface model presupposes a sound theory or method of HCI, which is why such a theory must rest largely on a theory of human cognition that models the cognitive processes of users interacting with computer systems [5]. Different fields of HCI are displayed in Fig. 1.
Human activity recognition (HAR) plays an important role in human-to-human communication and interpersonal relationships. Human activity conveys information about a person's identity, psychological state, and personality, yet this information is difficult to extract [6]. The human capacity to recognize another individual's actions has therefore become one of the chief research subjects in machine learning (ML) and computer vision (CV). Consequently, many applications, including robotics, video surveillance systems, and human-computer interaction for characterizing human behavior, require mechanisms that recognize multiple activities [7].
This article develops a Fire Hawk Optimizer with Deep Learning Enabled Activity Recognition (FHODL-AR) technique for HCI-driven usability. In the presented FHODL-AR technique, input images are analyzed to identify different human activities. For feature extraction, a modified SqueezeNet model is introduced by adding a few bypass connections between the Fire modules of SqueezeNet. In addition, the FHO algorithm is used for hyperparameter optimization, which in turn improves classification performance. To detect and categorize different kinds of activities, a probabilistic neural network (PNN) classifier is applied. The FHODL-AR technique is validated experimentally on benchmark datasets.
Much existing research concentrates on feature extraction, because discriminative features are essential to the generalization ability of a HAR mechanism. There are two principal ways to extract features from sensor data: one uses hand-crafted features based on statistical knowledge, and the other derives features automatically using neural networks (NN) [8]. Deriving meaningful hand-crafted features in the time and frequency domains depends heavily on domain knowledge and human experience. Moreover, hand-crafted features are generally devised for a particular task and are not suited to more general tasks and environments [9]. Deep learning (DL) has therefore been broadly adopted, because DL methods can automatically extract high-dimensional features without relying on domain knowledge [10].
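To make the contrast with learned features concrete, the sketch below computes a handful of typical hand-crafted features from one window of a single sensor channel. The particular feature set (mean, standard deviation, RMS, range, dominant DFT bin, spectral energy) and the function name `handcrafted_features` are illustrative assumptions, not the feature set of any cited work; a naive DFT is used only to keep the example dependency-free.

```python
import math
import statistics


def handcrafted_features(window):
    """A few typical time- and frequency-domain features of one sensor window."""
    n = len(window)
    feats = {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "rms": math.sqrt(sum(v * v for v in window) / n),
        "range": max(window) - min(window),
    }
    # Naive DFT magnitudes for the positive-frequency bins 1..n//2 (DC bin skipped).
    mags = []
    for k in range(1, n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(window))
        im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(window))
        mags.append(math.hypot(re, im))
    feats["dominant_bin"] = 1 + mags.index(max(mags))
    feats["spectral_energy"] = sum(m * m for m in mags) / n
    return feats


# Synthetic window: exactly 3 sine cycles over 32 samples.
window = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
print(handcrafted_features(window)["dominant_bin"])  # 3
```

Every line of this function encodes a human decision about what matters in the signal, which is exactly the domain-knowledge dependence that DL-based feature extraction aims to remove.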
Ronald et al. [11] devised a convolutional neural architecture based on an ensemble of transfer learning (TL)-based multichannel attention networks. In this study, four convolutional neural network (CNN) branches are used for feature-fusion-based ensembling, and in each branch an attention mechanism extracts contextual information from the feature maps generated by existing pretrained models. Finally, the feature maps extracted from the four branches are concatenated and fed into a fully connected network to produce the final recognition output. Hirooka et al. [12] offered a new hybrid DL network for HAR that uses multimodal sensor data; the presented method is a ConvLSTM pipeline that fully exploits, in every layer, the data derived from the temporal domain. A fully connected layer and a softmax function then compute the probability of each class. Lv et al. [13] introduced a technique that recognizes human actions from skeleton data captured by an RGB-D camera, the Kinect device. HAR is an active research topic in the CV field, and its applications include image processing, sign language learning, surveillance of the elderly, and HCI. This technique relies on skeleton data containing the coordinate values of every joint of the human body while a movement is performed; these data are classified with a support vector machine (SVM) to predict the name of the activity.
Komang et al. [14] modeled a new DL-based hybrid architecture for the prediction and recognition of human activities. The main contribution of this work was a novel hybrid architecture that combines four wide-ranging pretrained networks in an optimized way using a metaheuristic technique. Yilmaz et al. [15] proposed a robust classification method for HAR that leverages wearable sensor data and a hybrid of a CNN and bidirectional long short-term memory (BiLSTM). The devised multibranch CNN-BiLSTM network performs automated feature extraction from raw sensor data with minimal preprocessing. Combining the CNN and the BiLSTM enables the model to learn local features as well as long-term dependencies in sequential data.
In this article, a new FHODL-AR technique is developed for activity recognition for HCI-driven usability. In the presented FHODL-AR technique, input images are analyzed to identify different human activities. The overall block diagram is shown in Fig. 2. First, the input data are preprocessed, and features are then derived using the modified SqueezeNet model. Next, recognition is performed using the PNN. Finally, the FHO algorithm is used for parameter optimization.
3.1 Feature Extraction Using Modified SqueezeNet
For feature extraction, a modified SqueezeNet model is introduced by adding a few bypass connections between the Fire modules of SqueezeNet [16]. SqueezeNet is a small CNN architecture that uses few parameters while preserving competitive performance. It was designed around three strategies: (1) replace 3 × 3 filters with 1 × 1 filters, (2) decrease the number of input channels to the 3 × 3 filters, and (3) down-sample late in the network so that the convolutional layers have large activation maps. SqueezeNet is built chiefly from Fire modules, each consisting of a squeeze convolutional layer with 1 × 1 filters whose output is fed into an expand layer containing a mixture of 1 × 1 and 3 × 3 convolutions. SqueezeNet is applied here because it is more parameter-efficient than other CNN architectures such as AlexNet, which is useful in real-time scenarios. The basic SqueezeNet begins with a standalone convolutional layer, followed by eight Fire modules, and ends with a final convolutional layer. The number of filters per Fire module is gradually increased from the beginning to the end of the network.
To increase detection performance, an adapted SqueezeNet is developed by adding bypass connections between Fire modules. In this framework, bypass connections are added around Fire modules 3, 5, 7, and 9, requiring these modules to learn a residual function between their inputs and outputs. These modules are chosen because their input and output channel counts are equal, so the addition is well defined. To implement the bypass connection around Fire3, the input to Fire4 is set equal to the output of Fire2 plus the output of Fire3, where + denotes element-wise addition. This changes the regularization applied to the parameters of these Fire modules and, as in ResNet, can improve the final accuracy or trainability.
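The following minimal NumPy sketch shows one Fire module and the simple bypass around Fire3 described above. The weights are random and untrained, so only the shapes and parameter counts are meaningful. The channel sizes (96 input channels, squeeze 16, expand 64 + 64) are assumed from the fire2 and fire3 configuration of the original SqueezeNet v1.0, and the helper names `conv1x1`, `conv3x3_same`, and `Fire` are hypothetical; a practical implementation would use a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)  # untrained random weights, for illustration only


def conv1x1(x, w, b):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in), b (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def conv3x3_same(x, w, b):
    """3x3 convolution with zero padding 1 (keeps H, W); w (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


class Fire:
    """Squeeze (1x1) followed by expand (1x1 and 3x3, concatenated on channels)."""

    def __init__(self, c_in, s1x1, e1x1, e3x3):
        self.params = {
            "ws": rng.standard_normal((s1x1, c_in)) * 0.1, "bs": np.zeros(s1x1),
            "we1": rng.standard_normal((e1x1, s1x1)) * 0.1, "be1": np.zeros(e1x1),
            "we3": rng.standard_normal((e3x3, s1x1, 3, 3)) * 0.1, "be3": np.zeros(e3x3),
        }

    def __call__(self, x):
        p = self.params
        s = relu(conv1x1(x, p["ws"], p["bs"]))
        return np.concatenate([relu(conv1x1(s, p["we1"], p["be1"])),
                               relu(conv3x3_same(s, p["we3"], p["be3"]))], axis=0)

    def num_params(self):
        return sum(a.size for a in self.params.values())


fire2 = Fire(96, 16, 64, 64)    # 96 -> 128 channels
fire3 = Fire(128, 16, 64, 64)   # 128 -> 128 channels, so a bypass is possible

x = rng.standard_normal((96, 8, 8))   # stand-in for the output of the first conv/pool stage
out2 = fire2(x)
out3 = fire3(out2)
fire4_input = out2 + out3             # simple bypass around Fire3: element-wise addition

plain_3x3 = 96 * 128 * 9 + 128        # a plain 3x3 conv layer, 96 -> 128 channels
print(fire2.num_params(), fire3.num_params(), plain_3x3)  # 11920 12432 110720
```

The parameter counts illustrate strategies (1) and (2): with the same input and output widths, the Fire module needs roughly one ninth of the parameters of a plain 3 × 3 layer.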
3.2 Hyperparameter Tuning Using FHO Algorithm
Here, the FHO algorithm is used for hyperparameter optimization, which in turn improves classification performance [17]. The FHO metaheuristic simulates the foraging behavior of fire hawks, which involves catching prey as well as setting and spreading fires. First, each candidate solution (X) is defined by the location vectors of the prey and the fire hawks. An initialization process then determines the initial positions of these vectors in the search region.
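The FHO procedure summarized in this subsection can be sketched as follows. The update rules reflect our reading of the original FHO formulation by Azizi et al.: each fire hawk moves as FH_l + r1·GB − r2·FH_Near, prey inside territory l move as PR_q + r3·FH_l − r4·SP_l, and prey heading toward another fire hawk move as PR_q + r5·FH_Alter − r6·SP, where the r values are uniform random numbers and SP_l and SP are mean prey positions. Two simplifications are our own assumptions: each prey is assigned to its nearest fire hawk, and the best `n_pop` of the old and new candidates survive each iteration. The function name `fho_minimize`, its parameters, and the sphere test function are hypothetical, and the hyperparameters that FHODL-AR actually tunes are not specified in this section.

```python
import numpy as np


def fho_minimize(f, lower, upper, n_pop=30, n_iter=100, seed=0):
    """Simplified Fire Hawk Optimizer sketch for minimizing f over a box."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Initialization: candidates drawn uniformly inside [lower, upper].
    pop = lower + rng.random((n_pop, lower.size)) * (upper - lower)
    fit = np.array([f(x) for x in pop])
    for _ in range(n_iter):
        order = np.argsort(fit)
        pop, fit = pop[order], fit[order]
        # The best few candidates act as fire hawks; the rest are prey.
        n_fh = int(rng.integers(1, max(2, n_pop // 5) + 1))
        fh, prey = pop[:n_fh], pop[n_fh:]
        gb = pop[0]                           # global best ("main fire")
        # Simplified territories: each prey belongs to its nearest fire hawk.
        dist = np.linalg.norm(prey[:, None, :] - fh[None, :, :], axis=2)
        owner = dist.argmin(axis=1)
        sp_all = prey.mean(axis=0)            # overall safe place (mean of all prey)
        new = []
        for l in range(n_fh):
            others = [k for k in range(n_fh) if k != l] or [l]
            near = fh[rng.choice(others)]
            # Fire hawk move: FH_l + r1*GB - r2*FH_near
            new.append(fh[l] + rng.random() * gb - rng.random() * near)
            members = prey[owner == l]
            if len(members) == 0:
                continue
            sp_l = members.mean(axis=0)       # safe place inside territory l
            for pr in members:
                # Prey inside territory l: PR + r3*FH_l - r4*SP_l
                new.append(pr + rng.random() * fh[l] - rng.random() * sp_l)
                # Prey moving toward another fire hawk: PR + r5*FH_alter - r6*SP
                alter = fh[rng.choice(others)]
                new.append(pr + rng.random() * alter - rng.random() * sp_all)
        new = np.clip(np.array(new), lower, upper)
        new_fit = np.array([f(x) for x in new])
        # Elitist selection: keep the n_pop best of old and new candidates.
        cand = np.vstack([pop, new])
        cand_fit = np.concatenate([fit, new_fit])
        keep = np.argsort(cand_fit)[:n_pop]
        pop, fit = cand[keep], cand_fit[keep]
    return pop[0], fit[0]


def sphere(x):
    return float(np.sum(x ** 2))


best_x, best_f = fho_minimize(sphere, [-5.0] * 5, [5.0] * 5)
print(best_x, best_f)
```

For hyperparameter tuning, `f` would wrap a train-and-validate run and return, for example, the validation error for a given hyperparameter vector.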
Then, the movement of prey inside a Fire Hawk's territory, a major aspect of the animals' behavior, is taken into account in the position-updating process.
In addition, the prey may move toward the territories of other Fire Hawks, since there is a chance that it approaches a Fire Hawk waiting in a nearby ambush or hides in a safe place outside the territory of the Fire Hawk by which it is trapped.
3.3 Activity Recognition Using PNN
To detect and categorize different kinds of activities, a PNN classifier is applied [18]. The PNN is an effective neural network commonly employed for object classification. PNN is usually more accurate and faster than a multilayer perceptron network and is relatively insensitive to outliers. It uses the Parzen estimator to estimate the probability density function (PDF) of each class, and the multivariate Bayes rule assigns a new input to the class with the maximum posterior probability. The PNN architecture is described in the following.
The input unit is a distribution unit that supplies the same input values to all pattern units, and each pattern unit forms the dot product of the input pattern vector with its weight vector.
This is followed by a nonlinear function, which is equivalent to a Parzen estimator with a Gaussian kernel. The summation units sum the outputs of the pattern units belonging to each class to evaluate that class's PDF, and the output unit selects the class with the largest value as the predicted target class. Because the training samples are used directly as the connection weights, PNN does not require iterative adjustment of the connection weights. As a result, it trains faster than the conventional backpropagation neural network (BP-NN).
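A minimal sketch of the PNN decision rule, assuming an isotropic Gaussian kernel with a single shared width σ. For class k with n_k stored samples x_ki in d dimensions, the Parzen estimate is f_k(x) = 1 / (n_k (2π)^(d/2) σ^d) · Σ_i exp(−‖x − x_ki‖² / (2σ²)), and the predicted class maximizes π_k · f_k(x), where π_k is the class prior. The sketch uses the squared-distance form of the kernel; for unit-length vectors ‖x − w‖² = 2 − 2 x·w, so it is equivalent to the dot-product form of the pattern units. The class name, the value of σ, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


class PNN:
    """Minimal probabilistic neural network (Parzen estimator, Gaussian kernel)."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma  # kernel width; one candidate hyperparameter for tuning

    def fit(self, X, y):
        # Training only stores each sample as a pattern unit, grouped by class;
        # no iterative weight updates are needed.
        self.patterns = defaultdict(list)
        for x, label in zip(X, y):
            self.patterns[label].append([float(v) for v in x])
        self.priors = {c: len(p) / len(y) for c, p in self.patterns.items()}
        return self

    def class_scores(self, x):
        scores = {}
        for c, pats in self.patterns.items():
            # Pattern layer: one Gaussian kernel response per stored sample.
            k = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * self.sigma ** 2))
                 for p in pats]
            # Summation layer: class-wise average = Parzen PDF estimate, up to the
            # constant (2*pi)^(d/2) * sigma^d, which is identical for every class.
            scores[c] = self.priors[c] * sum(k) / len(pats)
        return scores

    def predict(self, x):
        # Output layer: class with the largest prior-weighted density (Bayes rule).
        scores = self.class_scores(x)
        return max(scores, key=scores.get)


X = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
y = ["walking", "walking", "running", "running"]
clf = PNN(sigma=0.5).fit(X, y)
print(clf.predict([0.1, 0.0]), clf.predict([2.9, 3.2]))  # walking running
```

In the FHODL-AR pipeline, the inputs would be the feature vectors produced by the modified SqueezeNet rather than these 2-D toy points.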
This section examines the HAR results of the FHODL-AR model on two datasets. The first, the KTH dataset (http://www.nada.kth.se/cvap/actions/), comprises 600 samples in six classes; Table 1 reports its details. The second, the UCF Sports dataset (http://crcv.ucf.edu/data/UCF_Sports_Action.php), comprises 1000 samples in ten classes; its details are given in Table 2.
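The results below are reported for a setting in which 80% of the data are used for training. The sketch assumes a stratified 80:20 split, so that every class keeps the same ratio; whether the authors stratified is not stated in this section. The per-class counts of KTH are given in Table 1, not here, so the label list below is a hypothetical balanced stand-in of 600 samples over the six KTH actions, and `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split sample indices class by class so each class keeps the same ratio."""
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(by_class):
        idx = by_class[c]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


# Hypothetical balanced labels: the six KTH actions, 100 samples each (600 total).
actions = ["boxing", "handclapping", "handwaving", "jogging", "running", "walking"]
labels = [a for a in actions for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 480 120
```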
The confusion matrices produced by the FHODL-AR model on the KTH dataset are portrayed in Fig. 3. The results indicate that the FHODL-AR model recognized all class labels effectively.
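The averaged measures in the following tables are derived from confusion matrices such as these. The conventions are not stated in this section, so the sketch assumes one-vs-rest per-class accuracy, precision, recall, and F1, macro-averaged over classes. The function name and the 3-class matrix are hypothetical and are not the paper's data.

```python
def per_class_metrics(cm):
    """One-vs-rest accuracy, precision, recall and F1 per class, plus macro averages.
    cm[i][j] counts samples whose true class is i and predicted class is j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rows = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true otherwise
        tn = total - tp - fn - fp
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows.append(((tp + tn) / total, prec, rec, f1))
    macro = tuple(sum(col) / k for col in zip(*rows))
    return rows, macro


cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]   # hypothetical 3-class confusion matrix
rows, macro = per_class_metrics(cm)
print(rows[0])  # approx. (0.9, 0.889, 0.8, 0.842)
print(macro)
```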
Table 3 presents detailed HAR results of the FHODL-AR model on the KTH dataset from different aspects. The results show that the FHODL-AR model achieves enhanced outcomes. With 80% of the data used for training (TR), the FHODL-AR model has obtained average
The training accuracy (TA) and validation accuracy (VA) achieved by the FHODL-AR algorithm on the KTH dataset are displayed in Fig. 4. The experimental outcome shows that the FHODL-AR system attains high TA and VA values; notably, VA is higher than TA.
The training loss (TL) and validation loss (VL) obtained by the FHODL-AR process on the KTH dataset are shown in Fig. 5. The experimental outcome implies that the FHODL-AR technique attains minimal TL and VL values; notably, VL is lower than TL.
Table 4 and Fig. 6 exhibit a comprehensive
The confusion matrices produced by the FHODL-AR technique on the UCF Sports dataset are shown in Fig. 7. The outcomes show that the FHODL-AR approach identified each class label effectively.
Table 5 presents comprehensive HAR results of the FHODL-AR method on the UCF Sports dataset from different aspects. The results show that the FHODL-AR technique achieves better results. With 80% of the data used for training (TR), the FHODL-AR approach has acquired average
Table 6 and Fig. 8 show a comprehensive
Then, the DBN technique has accomplished somewhat better
In this article, a new FHODL-AR technique has been developed for activity recognition for HCI-driven usability. In the presented FHODL-AR technique, input images are analyzed to identify different human activities. For feature extraction, a modified SqueezeNet model is introduced by adding a few bypass connections between the Fire modules of SqueezeNet. In addition, the FHO algorithm is used for hyperparameter optimization, which in turn improves classification performance. To detect and categorize different kinds of activities, a PNN classifier is applied. The FHODL-AR technique was validated experimentally on benchmark datasets, and the results show its improvements over other recent approaches, with a maximum accuracy of 99.10%. In the future, the detection performance of the FHODL-AR technique could be further improved with ensemble fusion approaches.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
1. A. Jorio, S. El Fkihi, B. Elbhiri and D. Aboutajdine, “An energy-efficient clustering routing algorithm based on geographic position and residual energy for wireless sensor network,” Journal of Computer Networks and Communications, vol. 2015, no. 8, pp. 1–11, 2015.
2. F. Ren and Y. Bao, “A review on human-computer interaction and intelligent robots,” International Journal of Information Technology & Decision Making, vol. 19, no. 1, pp. 5–47, 2020.
3. J. Preece, Y. Rogers and H. Sharp, Interaction Design: Beyond Human-Computer Interaction, 4th ed., Chichester: Wiley, 2015.
4. P. Pareek and A. Thakkar, “A survey on video-based Human Action Recognition: recent updates, datasets, challenges, and applications,” Artificial Intelligence Review, vol. 54, no. 3, pp. 2259–2322, 2020.
5. P. Antonik, N. Marsal, D. Brunner and D. Rontani, “Human action recognition with a large-scale brain-inspired photonic computer,” Nature Machine Intelligence, vol. 1, no. 11, pp. 530–537, 2019.
6. S. Majumder and N. Kehtarnavaz, “A review of real-time human action recognition involving vision sensing,” Real-Time Image Processing and Deep Learning, United States, vol. 11736, pp. 9, 2021.
7. F. Gu, M. H. Chung, M. Chignell, S. Valaee, B. Zhou et al., “A survey on deep learning for human activity recognition,” ACM Computing Surveys, vol. 54, no. 8, pp. 1–34, 2022.
8. K. Verma and B. Singh, “Deep multi-model fusion for human activity recognition using evolutionary algorithms,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 7, no. 2, pp. 44, 2021.
9. P. Shrivastava, K. Singh and A. Pancham, “Classification of grains and quality analysis using deep learning,” International Journal of Engineering and Advanced Technology, vol. 11, no. 1, pp. 244–250, 2021.
10. Y. Abdulazeem, H. M. Balaha, W. M. Bahgat and M. Badawy, “Human action recognition based on transfer learning approach,” IEEE Access, vol. 9, pp. 82058–82082, 2021.
11. M. Ronald, A. Poulose and D. S. Han, “iSPLInception: An inception-resnet deep learning architecture for human activity recognition,” IEEE Access, vol. 9, pp. 68985–69001, 2021.
12. K. Hirooka, M. A. M. Hasan, J. Shin and A. Y. Srizon, “Ensembled transfer learning based multichannel attention networks for human activity recognition in still images,” IEEE Access, vol. 10, pp. 47051–47062, 2022.
13. T. Lv, X. Wang, L. Jin, Y. Xiao and M. Song, “A hybrid network based on dense connection and weighted feature aggregation for human activity recognition,” IEEE Access, vol. 8, pp. 68320–68368, 2020.
14. M. G. A. Komang, M. N. Surya and A. N. Ratna, “Human activity recognition using skeleton data and support vector machine,” Journal of Physics: Conference Series, vol. 1192, pp. 1–9, 2019.
15. A. A. Yilmaz, M. S. Guzel, E. Bostanci and I. Askerzade, “A novel action recognition framework based on deep-learning and genetic algorithms,” IEEE Access, vol. 8, pp. 100631–100644, 2020.
16. S. K. Challa, A. Kumar and V. B. Semwal, “A multibranch CNN-BiLSTM model for human activity recognition using wearable sensor data,” The Visual Computer, vol. 33, no. 12, pp. 1529, 2021.
17. Y. Yang, R. Yang, L. Pan, J. Ma, Y. Zh et al., “A lightweight deep learning algorithm for inspection of laser welding defects on safety vent of power battery,” Computers in Industry, vol. 123, no. 4, pp. 103306, 2020.
18. M. Azizi, S. Talatahari and A. H. Gandomi, “Fire Hawk Optimizer: a novel metaheuristic algorithm,” Artificial Intelligence Review, vol. 376, no. 1, pp. 113609, 2022.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.