This paper proposes a method that detects whether a worker is wearing a helmet, which protects against workplace hazards, and a mask, which is worn indoors, and that verifies the worker’s identity for security while the helmet and mask are worn. The proposed method consists of a part that detects the worker’s helmet and mask and a part that verifies the worker’s identity. The helmet and mask detection algorithm is generated by transfer learning of Yolov5’s s-model and m-model. Both models are trained while varying the learning rate, batch size, and number of epochs, and the model with the best performance is selected for detecting masks and helmets. At a learning rate of 0.001, a batch size of 32, and 200 epochs, the s-model showed the best performance with a mAP of 0.954 and was selected as the optimal model. The worker identification algorithm consists of a facial feature extraction part and a classifier part. The facial feature extraction algorithm is generated by transfer learning of Facenet, and an SVM is used as the classifier for identification. Models are trained on two datasets: a masked face dataset containing only masked faces and a mixed face dataset containing both masked and unmasked faces. The model with the best performance among the trained models is selected as the optimal model for identification when a mask is worn. In the experiments, the model obtained by transfer learning of Facenet combined with an SVM trained on the mixed face dataset showed the best performance. When this optimal model was tested on the mixed face dataset, it achieved an accuracy of 95.4%. The proposed model was also evaluated on 500 images of 10 people taken with a mobile phone. The results showed that helmets and masks were detected well and that identification also performed well.
Today’s large-scale construction, manufacturing, and other industrial sites are complex, diverse, and unsafe, and workers’ activities there are constantly exposed to many risks. Because these sites have more risk factors than other industries, the frequency of accidents is high, and it is stipulated that personal protective equipment (PPE) that protects the body of workers from hazardous factors must be worn [
These CCTV monitoring systems and safety management systems have a problem in that they are not suitable for prevention due to the limitations of human control capabilities (decreased reliability due to detection errors). Therefore, smart safety management with cutting-edge technology that can immediately identify the situation of workers in the industrial site in real-time and manage them is required. In order to solve this problem, methods for automatically detecting the helmets of construction workers have been studied based on image data obtained from camera devices such as CCTVs in the workplace. Recently, with the development of deep learning technology, an automation method for preventing worker accidents by image recognition technology using deep learning is being actively studied. In a study regarding personal protective equipment detection, faster regions with convolutional neural networks features (faster R-CNN) [
A region-based object detection and classification algorithm based on a convolutional neural network (CNN) has been applied to construction site image data in order to improve the safety of construction site workers by establishing an effective automatic safety helmet detection system. Region-based fully convolutional networks (R-FCN) [
Before the worker enters the workplace, the identification of the worker must be simultaneously confirmed for security, along with the confirmation of wearing personal protective equipment for personal safety. Current identity recognition systems are based on the face recognition method. Facial recognition systems based on deep learning have demonstrated excellent accuracy [
Therefore, in this paper, a system that can detect a helmet and mask and confirm workers’ identities by face recognition before they enter the workplace is proposed. The proposed system combines the confirmation of compliance with the wearing of a safety helmet and mask and the confirmation of workers’ identity in real-time. The paper is organized as follows: Section 2 describes the related work on which this study is based. Section 3 explains the proposed method and analyzes the experimental results. Section 4 summarizes the proposed method and performance.
The YOLO algorithm [
The neck of the model that mixes the functions formed in the backbone uses the path aggregation network (PA-Net) [
Face recognition is a technology for identifying people and is broadly divided into face identification and face verification. Face verification is a 1:1 problem that determines whether two input face images show the same person. Face identification is a 1:N problem that determines which of N people registered in advance an input face image corresponds to.
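The difference between the 1:1 and 1:N settings can be illustrated with a minimal sketch. It assumes that each face has already been converted into an embedding vector and that closeness is measured by Euclidean distance; the function names, the toy three-dimensional vectors, and the threshold of 1.0 are hypothetical and are not taken from this study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(emb_a, emb_b, threshold=1.0):
    """1:1 verification: are the two embeddings close enough to be one person?"""
    return euclidean(emb_a, emb_b) <= threshold

def identify(probe, gallery, threshold=1.0):
    """1:N identification: return the closest enrolled identity, or None if
    even the closest one is farther away than the threshold."""
    name, dist = min(((n, euclidean(probe, e)) for n, e in gallery.items()),
                     key=lambda item: item[1])
    return name if dist <= threshold else None

# Toy 3-dimensional embeddings (assumed values; real embeddings are longer).
gallery = {"worker_a": [0.0, 0.0, 1.0], "worker_b": [1.0, 0.0, 0.0]}
probe = [0.1, 0.0, 0.9]
same_person = verify(probe, gallery["worker_a"])
who = identify(probe, gallery)
```

Verification compares the probe against one claimed identity, whereas identification searches the whole gallery and may reject a face that matches no enrolled person.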
In the face recognition technology, features such as histogram of oriented gradient (HOG) [
A support vector machine (SVM) is a supervised machine learning model for pattern recognition and data analysis that is mainly used for classification and regression analysis. Given a set of data in which each sample belongs to one of two categories, the SVM algorithm creates a non-probabilistic binary linear classification model that determines which category new data belongs to. The classification model is expressed as a boundary in the space where the data is mapped, and the SVM algorithm finds the boundary with the largest width. This separating boundary is called the decision boundary. It is a straight line in two dimensions and a plane in three dimensions; in higher dimensions, where it cannot be visualized, it is called a hyperplane. The data points closest to the decision boundary, which determine its position, are called support vectors, and the distance between the decision boundary and the support vectors is called the margin. The optimal decision boundary is the one that maximizes the margin. SVM is known to perform well among classification algorithms. The margin data of SVM is shown in
In this study, we implement a system that detects a person’s safety helmet and mask and checks his or her identity. For the detection of safety helmets and masks, Yolov5’s s-model and m-model are transfer-learned, and their performance is evaluated to select the optimal model for object detection. The facial feature extraction model for identification was derived by transfer learning of Facenet with masked faces and mixed faces (masked faces + unmasked faces) as input. The SVM classifier was trained on the embedded facial features for worker identification.
Item | Specification |
---|---|
OS | Ubuntu 18.04.5 LTS |
GPU | Tesla V100-SXM2 |
CPU | 16 core Intel (R) Xeon (R) Gold 5120 CPU @ 2.20 GHz |
RAM | 177 GB |
CUDA | 10.1 |
cuDNN | 7.6.0 |
Software | python/pytorch, YOLOv5/Facenet |
The safety helmet and mask detection models are trained by Kaggle’s face mask dataset [
In this study, in order to generate a model with optimal safety helmet/mask detection performance, mAP is obtained by changing the learning rate, batch size, and epoch, and the model with the best performance of the mAP is selected as the safety helmet/mask detection model. The learning rates were 0.01 and 0.001, the batch sizes were 16, 32, 64, and 128, and the epochs were 100, 200, and 500. Among the yolov5 models, the results were analyzed for the s-model and the m-model in consideration of the speed and performance for real-time processing.
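The search described above can be sketched as an exhaustive grid over the listed values. This is an illustrative outline only: `train_and_eval` stands for a hypothetical caller-supplied function that trains one configuration and returns its mAP, and `dummy_eval` is a placeholder with made-up scores that trains nothing and does not reproduce the results of this study.

```python
from itertools import product

# Search space as listed in the text.
MODELS = ["yolov5s", "yolov5m"]
LEARNING_RATES = [0.01, 0.001]
BATCH_SIZES = [16, 32, 64, 128]
EPOCHS = [100, 200, 500]

def all_configs():
    """Enumerate every (model, learning rate, batch size, epochs) combination."""
    return [
        {"model": m, "lr": lr, "batch": b, "epochs": e}
        for m, lr, b, e in product(MODELS, LEARNING_RATES, BATCH_SIZES, EPOCHS)
    ]

def select_best(configs, train_and_eval):
    """Return the configuration with the highest mAP together with that mAP.

    train_and_eval is a hypothetical function supplied by the caller; it is
    expected to train one configuration and return its validation mAP.
    """
    scored = [(train_and_eval(cfg), cfg) for cfg in configs]
    best_map, best_cfg = max(scored, key=lambda pair: pair[0])
    return best_cfg, best_map

def dummy_eval(cfg):
    """Placeholder scorer with arbitrary values; it does NOT train a model."""
    return 1.0 / cfg["batch"]

configs = all_configs()
best_cfg, best_map = select_best(configs, dummy_eval)
```

With two models, two learning rates, four batch sizes, and three epoch settings, the grid contains 48 training runs, which is why restricting the comparison to the s-model and m-model keeps the search tractable.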
To verify the speed and detection performance of the selected model, the mAP at an IoU threshold of 0.5 (mAP@0.5) and the frames per second (fps) of the YOLOv5s model and Yolov4 were measured after 200 epochs of training.
Model | Size | mAP | fps |
---|---|---|---|
Yolov4 | 512 | 0.944 | 95.7 |
Yolov5s | 640 | 0.943 | 104.2 |
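The "@0.5" attached to the mAP values conventionally denotes an intersection over union (IoU) threshold of 0.5: a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. The following minimal sketch shows that check; the box format `(x1, y1, x2, y2)` and the example coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, truth_box, threshold=0.5):
    """A prediction matches the ground truth if IoU reaches the threshold."""
    return iou(pred_box, truth_box) >= threshold

truth = (0, 0, 10, 10)
good = (1, 1, 11, 11)   # overlap 81, union 119, IoU about 0.68
poor = (6, 6, 16, 16)   # overlap 16, union 184, IoU about 0.09
```

Average precision is then computed per class from the matched and unmatched predictions, and mAP is the mean over the classes.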
In this paper, facial features are generated using Facenet among models for face recognition, and the identity of the worker is verified using the SVM classifier. The dataset is a masked face dataset [
The masked face dataset in [
In this paper, we used the LFW (eval plus test) dataset among the various datasets in [
The mixed face dataset is made from the masked face dataset in [
Facial features are extracted with Facenet models trained on the masked face dataset and on the mixed face dataset. 20% of the entire training dataset is used as the validation dataset, and facial feature extraction models are created while varying the learning rate, batch size, and number of epochs. The two best-performing models trained on the masked face dataset and the two best-performing models trained on the mixed face dataset are selected, and their performance is evaluated with the test dataset. Each model is tested on both the masked face dataset and the mixed face dataset. From the test results, two models with good performance are selected.
The worker’s identity verification models are also created by training the SVM classifier with the masked face dataset and the mixed face dataset. The SVM classifier is trained while varying the learning rate, batch size, and number of epochs, using the same datasets as for training Facenet. The input to the classifier is the facial features generated by the two selected Facenet models. Among the resulting models, the one with the best performance is selected as the optimal model.
Model | Training data | Test data |
---|---|---|
Facenet | Masked face dataset | Masked face dataset |
Facenet | Mixed face dataset | Masked face dataset |
Facenet+SVM | Masked face dataset | Masked face dataset |
Facenet+SVM | Mixed face dataset | Masked face dataset |
The feature extraction model experiments were performed with learning rates of 0.01 and 0.001, batch sizes of 16, 32, 64, and 128, and epochs of 500, 1000, 1500, and 2000; 20% of the total training data was used as validation data.
Both models (the masked face dataset and the mixed face dataset) achieved the best results at epochs of 2000, learning rate of 0.001, and batch sizes of 32 and 64 as shown in
Model name | Training dataset | Learning rate | Epoch | Batch size | Accuracy |
---|---|---|---|---|---|
Model1 | Masked face dataset | 0.001 | 2000 | 32 | 0.959 |
Model2 | Masked face dataset | 0.001 | 2000 | 64 | 0.948 |
Model3 | Mixed face dataset | 0.001 | 2000 | 32 | 0.949 |
Model4 | Mixed face dataset | 0.001 | 2000 | 64 | 0.941 |
The performance of the selected four models was evaluated with the test dataset. The test dataset consists of a masked face dataset and a mixed face dataset, and the evaluation results are shown in
Model | Test dataset: Masked face dataset | Test dataset: Mixed face dataset |
---|---|---|
Model1 | 0.947 | 0.933 |
Model2 | 0.941 | 0.924 |
Model3 | 0.939 | 0.941 |
Model4 | 0.928 | 0.933 |
In this paper, we combine model1 and model3, selected in the previous section, with the SVM classifier, train the combined models with the masked face dataset and the mixed face dataset, and then select the optimal model for identity recognition by evaluating their performance. During training, model1 and model3 were kept fixed, and only the SVM classifier was trained, with learning rates of 0.01 and 0.001, batch sizes of 16, 32, and 64, and epochs of 100, 200, 500, and 700. Accuracy was used to evaluate the performance of each model.
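Because the classifier is tuned through a learning rate, batch size, and number of epochs, one plausible reading is that the SVM is fitted by mini-batch gradient steps on the hinge loss while the feature extractor stays fixed. The sketch below illustrates that idea for the binary case only; the regularization constant, the toy two-dimensional "embeddings", and all function names are assumptions and do not describe the authors’ implementation.

```python
import random

def train_linear_svm(features, labels, lr=0.01, batch_size=16, epochs=100,
                     reg=0.01, seed=0):
    """Train a binary linear SVM (labels +1/-1) on fixed feature vectors.

    Minimizes reg/2 * ||w||^2 plus the mean hinge loss with mini-batch
    subgradient descent. The feature extractor itself is never updated.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    indices = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [reg * wj for wj in w]
            grad_b = 0.0
            for i in batch:
                x, y = features[i], labels[i]
                margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
                if margin < 1.0:  # inside the margin or misclassified
                    for j in range(dim):
                        grad_w[j] -= y * x[j] / len(batch)
                    grad_b -= y / len(batch)
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy "embeddings": two well-separated clusters standing in for two identities.
rng = random.Random(1)
features = ([[rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(20)]
            + [[rng.gauss(-2.0, 0.3), rng.gauss(-2.0, 0.3)] for _ in range(20)])
labels = [1] * 20 + [-1] * 20
w, b = train_linear_svm(features, labels, lr=0.01, batch_size=16, epochs=100)
accuracy = sum(predict(w, b, x) == y for x, y in zip(features, labels)) / len(labels)
```

Identifying more than two workers would require a multi-class extension such as one binary classifier per identity; that choice is not specified in the text.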
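For each of the four combinations (model1 or model3, with the SVM classifier trained on the masked face dataset or the mixed face dataset), the configuration with the best performance was selected. The performance of the four selected models was then evaluated with the test dataset.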
Model | Test dataset: Masked face dataset | Test dataset: Mixed face dataset |
---|---|---|
Model1+SVM by Masked face dataset | 0.948 | 0.912 |
Model1+SVM by Mixed face dataset | 0.911 | 0.929 |
Model3+SVM by Masked face dataset | 0.942 | 0.938 |
Model3+SVM by Mixed face dataset | 0.941 | 0.954 |
The proposed system was evaluated on 500 photos, consisting of 50 photos of each of 10 people taken with a mobile phone. Because the number of classes differs from that used in training, only the SVM classifier in the whole system was transfer-learned on this dataset. 20% of the data was used for testing.
Class no | Accuracy | Precision | Recall | F1 score |
---|---|---|---|---|
1 | 0.965 | 1.0 | 0.981 | 1.0 |
4 | 1.0 | 1.0 | 1.0 | 1.0 |
5 | 0.965 | 1.0 | 0.981 | 1.0 |
7 | 1.0 | 1.0 | 1.0 | 1.0 |
8 | 1.0 | 1.0 | 1.0 | 1.0 |
Over the 100 test images, the average accuracy, precision, and recall were 0.98, 1.0, and 0.98, respectively.
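For reference, per-class metrics of this kind are commonly computed one-vs-rest from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the labels are made-up values for three hypothetical identities and are not data from this experiment.

```python
def per_class_metrics(y_true, y_pred, cls):
    """One-vs-rest accuracy, precision, recall, and F1 for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels for three identities; NOT results from the experiment.
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
m1 = per_class_metrics(y_true, y_pred, 1)
m2 = per_class_metrics(y_true, y_pred, 2)
```

Since F1 is the harmonic mean of precision and recall, it always lies between the two and equals 1.0 only when both are 1.0.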
In this paper, a model for safety helmet/mask detection and a model for worker identification are proposed and evaluated. The model for safety helmet and mask detection is generated by transfer learning of Yolov5’s s-model and m-model, which were chosen among the yolov5 models in consideration of the speed and performance needed for real-time processing. The two models were transfer-learned while varying the learning rate, batch size, and number of epochs, their performance was measured with mAP, and the model with the best performance was selected as the optimal model. The learning rates were 0.01 and 0.001, the batch sizes were 16, 32, 64, and 128, and the epochs were 100, 200, and 500. At a learning rate of 0.001, a batch size of 32, and 200 epochs, the s-model showed the best performance with a mAP of 0.95 and was selected as the object detection model.
The worker identification part is composed of a facial feature extraction part and a classifier part. The facial feature extraction model is generated by transfer learning of Facenet, and an SVM is used as the classifier for identification. Two datasets are used: a masked face dataset containing only masked faces and a mixed face dataset containing both masked and unmasked faces. The optimal model is selected by comparing the performance of the models generated from the two datasets. The combined model of Facenet and the SVM classifier trained on the mixed face dataset showed the best performance. When this model was tested on the mixed face dataset, it showed an accuracy of 95.4%. The selected object detection model and identification model were tested with data from 10 people photographed with a smartphone. As a result, it was confirmed that safety helmets and masks were detected well and that the workers’ identities were confirmed.
This study shows good performance in detecting masks and helmets. However, personal protective equipment is not limited to helmets and masks. Future research can therefore address algorithms for detecting a wider variety of protective equipment.
This research was supported by a grant (20015427) of Regional Customized Disaster-Safety R&D Program, funded by Ministry of Interior and Safety (MOIS, Korea), and was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2022R1A6A1A03052954).
The authors received no specific funding for this study.
The authors declare that they have no conflicts of interest to report regarding the present study.