Real-time and effective monitoring of a driver’s physiological parameters and psychological states can provide early warnings and help avoid traffic accidents. In this paper, we propose a non-contact real-time monitoring algorithm for the physiological parameters of drivers under ambient light conditions. First, video sequences of the driver’s head are obtained by an ordinary USB camera, and the AdaBoost algorithm is used to locate the driver’s facial region. Second, a facial expression recognition algorithm based on an improved convolutional neural network (CNN) is proposed to recognize the driver’s facial expression. The forehead region is selected as the region of interest (ROI) and split into its three RGB channels, and the ICA algorithm is used to separate the ROI signals into three independent components. After that, the most significant component is selected for calculation of the driver’s heart rate and respiratory rate. Comparison of the experimental results with readings from finger-clip devices shows that the proposed algorithm can monitor a driver’s physiological parameters in real time in a non-contact way that does not interfere with normal driving. The results of facial expression recognition can help verify the monitoring results of physiological parameters and therefore more accurately evaluate a driver’s physical condition.
While the transportation industry facilitates economic development, it also brings great challenges to personal safety. The number of traffic accidents has increased greatly, with both human and economic losses: in 2019, there were nearly 200,000 traffic accidents in China, resulting in 52,388 deaths, 275,125 injuries, and 160 million dollars of direct property losses. Although the number of accidents decreased by 18.3% in 2019, the number of casualties is still substantial [
Real-time and effective monitoring of a driver’s physical and mental states can effectively reduce the probability of traffic accidents [
At present, ECG [
In 2000, Wu et al. [
At present, only a few research institutions in China have used imaging technology to detect non-contact physiological signals. Since 2012, Sun et al. [
In this paper, a non-contact real-time monitoring algorithm is proposed for examining a driver’s physiological parameters under ambient light conditions. This algorithm combines non-contact real-time physiological parameter monitoring techniques with deep learning algorithms for facial expression recognition to evaluate a driver’s physical condition. The remainder of the paper is organized as follows: Section 2 briefly reviews related previous research; Section 3 describes the proposed model in detail; Section 4 presents the experimental design and results; and the last section provides a discussion and conclusion.
PPG and IPPG share the same basic principle. Both make use of the optical characteristics of human tissue, capturing changes in light intensity caused by changes in blood volume and analyzing these changes to obtain the relevant physiological parameters. However, PPG and IPPG differ in their modes of implementation. In physiological signal measurement based on PPG, a dedicated light source emits light of a specific wavelength; this light penetrates the skin and is detected by a sensor after being transmitted through or scattered by the tissue. Because the light needs a strong ability to penetrate the skin, red or near-infrared light is generally selected as the light source. By contrast, in physiological signal measurement based on IPPG, imaging equipment collects the light reflected from the skin surface. Therefore, as long as the light can reach the dermis and be absorbed and reflected by the blood, any suitable light source can be used, or the visible ambient light can be used directly without an additional light source.
The basic principle of using PPG or IPPG to detect changes in blood volume is the Lambert-Beer law [
If monochromatic light with wavelength $\lambda$ and incident intensity $I_0$ is vertically irradiated into a medium, the transmitted light intensity $I$ after passing through the medium is given by formula (1):

$$I = I_0 \, e^{-\varepsilon(\lambda) c d} \quad (1)$$

where $\varepsilon(\lambda)$ is the absorption coefficient of the medium at wavelength $\lambda$, $c$ is the concentration of the absorbing substance, and $d$ is the optical path length through the medium.

The absorbance $A$ of the medium is then given by formula (2):

$$A = \ln\frac{I_0}{I} = \varepsilon(\lambda) c d \quad (2)$$

Formula (2) is referred to as the Lambert-Beer law.
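As a small worked example of formulas (1) and (2), the snippet below computes the transmitted intensity and recovers the absorbance; the numeric values are illustrative only, not physiological measurements.

```python
import math

def transmitted_intensity(i0, epsilon, c, d):
    """Formula (1): I = I0 * exp(-epsilon * c * d)."""
    return i0 * math.exp(-epsilon * c * d)

def absorbance(i0, i):
    """Formula (2): A = ln(I0 / I)."""
    return math.log(i0 / i)

# Illustrative values chosen so that epsilon * c * d = 0.5
i0 = 1.0
i = transmitted_intensity(i0, epsilon=0.5, c=1.0, d=1.0)
a = absorbance(i0, i)   # recovers epsilon * c * d = 0.5
```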
In 2005, Wieringa et al. [
Compared with the traditional expression recognition method [
A convolutional neural network (CNN) is a special artificial neural network. Its architecture mainly includes the feature extraction layer and the feature mapping layer. Different from general artificial neural networks, a CNN has two characteristics: local connection and weight sharing. A CNN can significantly reduce the number of training parameters and improve the learning efficiency of the network. Therefore, CNNs are widely applied in many fields, especially in image processing and pattern recognition. Thus far, CNNs have adhered to a standard design criterion, that is, the way to stack the convolution layers and pooling layers [
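The two CNN characteristics named above, local connection and weight sharing, can be illustrated with a naive 2D convolution: the same small set of kernel weights is applied at every spatial position, and each output value depends only on a local patch of the input. This is a didactic sketch, not part of the paper’s network.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: one shared kernel (weight sharing)
    is applied to every local patch of the input (local connection)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0       # only 9 shared weights
out = conv2d_valid(image, kernel)    # 4x4 feature map
```

For comparison, a fully connected layer mapping the same 36 inputs to 16 outputs would need 36 × 16 = 576 weights instead of 9, which is why weight sharing reduces the number of training parameters so sharply.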
In this paper, a real-time monitoring algorithm is proposed to examine a driver’s physiological parameters under ambient light conditions. The main flow of the proposed algorithm is shown in
After the video stream is obtained in real time through an ordinary USB camera, it is necessary to determine the position of the driver’s face in the video. In this paper, the AdaBoost face detection algorithm [
Verkruysse et al. [
In this section, a CNN was used to identify the driver’s expressions. The expression analysis results can be used further to verify the monitoring results of physiological parameters: if a driver is in a negative state, this may show on their face as expressions such as pain or excitement.
In view of the low recognition rate of the classical AlexNet, an expression recognition method based on an improved CNN is proposed. First, a small convolution kernel was used to extract local features carefully. Then, pairs of consecutive convolution layers were constructed to increase the nonlinear expression ability of the network. Finally, a batch normalization layer was added after each convolution layer to improve training speed and further enhance the feature expression ability of the network. The improved network model comprises six convolution layers (arranged as three consecutive pairs), three pooling layers, two fully connected layers, and one Softmax classification layer. As shown in
Layer | Kernel | Stride | Output
---|---|---|---
Convolution 1 | 5 | 2 | 48
Convolution 2 | 5 | 2 | 48
Pooling 1 | – | 2 | 24
Convolution 3 | 3 | 2 | 24
Convolution 4 | 3 | 2 | 24
Pooling 2 | – | 2 | 12
Convolution 5 | 3 | 2 | 12
Convolution 6 | 3 | 2 | 12
Pooling 3 | – | 2 | 6
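The Output column above follows a simple pattern: the convolutions preserve the spatial size (assuming “same” padding, which the table does not state explicitly), and each stride-2 pooling layer halves the side length, taking a 48×48 input down to 6×6. A small helper can reproduce this progression:

```python
def spatial_sizes(input_size, n_pool_layers, pool_stride=2):
    """Track the feature-map side length through conv/conv/pool blocks,
    assuming size-preserving convolutions and stride-2 pooling."""
    sizes = [input_size]
    for _ in range(n_pool_layers):
        sizes.append(sizes[-1] // pool_stride)
    return sizes

# 48 -> 24 -> 12 -> 6, matching the Output column of the table
print(spatial_sizes(48, 3))
```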
Network | Training time (s) | Epochs (times) | Recognition rate |
---|---|---|---|
AlexNet | 121 | 78 | 61% |
AlexNet+BN | 109 | 65 | 64% |
Proposed (No BN) | 132 | 70 | 66% |
Proposed | 112 | 60 | 69% |
Network | Anger | Disgust | Fear | Happy | Sad | Surprised | Normal
---|---|---|---|---|---|---|---
AlexNet+BN | 51% | 69% | 47% | 81% | 57% | 71% | 57%
Proposed | 53% | 54% | 58% | 84% | 68% | 75% | 65%
After the driver’s facial expressions, such as happy, surprised, and disgusted, were recognized, the recognition results were combined with the monitoring results of physiological parameters for validation analysis. This combined analysis allows the driver’s physical state to be evaluated more accurately.
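One way to sketch this cross-validation is a simple rule that escalates only when the expression channel and the physiological channel agree; the labels, thresholds, and return values below are illustrative assumptions, not rules specified in the paper.

```python
# Illustrative fusion rule: expression labels follow the recognition
# categories used in the experiments; the heart-rate and respiratory-rate
# bounds are the typical resting ranges cited in the paper.
NEGATIVE_EXPRESSIONS = {"anger", "disgust", "fear", "sad"}

def assess_driver(expression, heart_rate, resp_rate):
    """Cross-check the expression label against physiological parameters."""
    physio_abnormal = not (60 <= heart_rate <= 100) or not (12 <= resp_rate <= 20)
    if expression in NEGATIVE_EXPRESSIONS and physio_abnormal:
        return "warning"      # both channels indicate a problem
    if expression in NEGATIVE_EXPRESSIONS or physio_abnormal:
        return "attention"    # only one channel indicates a problem
    return "normal"

print(assess_driver("happy", 72, 17))   # -> normal
```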
In this section, the BVP signal was extracted from the ROI in the face video sequences.
The experimental data were a 10-second face video of a driver driving a vehicle. First, frames were extracted from the video at 25 frames per second. Second, the ROI in each frame was split into its three RGB channels, and the gray-level mean of the ROI was calculated for each channel. Third, the BVP signals under the three color channels were obtained, as shown in
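The per-channel averaging step can be sketched as follows; the frame dimensions and ROI coordinates are synthetic placeholders, and the subsequent ICA separation of the three resulting time series (e.g. with FastICA) is not shown.

```python
import numpy as np

def channel_means(frames, roi):
    """Reduce each frame to three numbers: the mean R, G, B value of the
    forehead ROI. `frames` is a list of HxWx3 arrays; `roi` is
    (top, bottom, left, right) in pixel coordinates."""
    top, bottom, left, right = roi
    series = np.array([f[top:bottom, left:right].reshape(-1, 3).mean(axis=0)
                       for f in frames])
    return series  # shape (n_frames, 3): one R/G/B time series per column

# 250 frames = 10 s at 25 fps of synthetic 120x160 RGB video
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (120, 160, 3)).astype(float) for _ in range(250)]
series = channel_means(frames, roi=(10, 40, 50, 110))
```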
The respiratory rate and heart rate of typical adults in a resting state are 12–20 breaths and 60–100 beats per minute, respectively. In consideration of these possible frequency ranges, the respiratory-rate extraction band was set to 0.15–0.6 Hz and the heart-rate extraction band to 0.8–2.4 Hz. The peak power within each band of the Fourier-transformed frequency spectrum gave the subject’s respiratory rate and heart rate, which were 0.3 Hz (18 breaths/min) and 1.2 Hz (72 beats/min), respectively (see
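This band-limited spectral-peak step can be sketched as below; a synthetic 10-second trace at 25 fps stands in for the real BVP component, with the two frequency bands taken from the text.

```python
import numpy as np

FPS = 25.0  # camera frame rate

def band_peak_hz(signal, fs, lo, hi):
    """Return the frequency (Hz) of the largest spectral peak in [lo, hi]."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return freqs[mask][np.argmax(spectrum[mask])]

# Synthetic 10 s BVP-like trace: 1.2 Hz cardiac + 0.3 Hz respiratory component
t = np.arange(250) / FPS
bvp = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.3 * t)

rr = band_peak_hz(bvp, FPS, 0.15, 0.6) * 60   # -> 18 breaths/min
hr = band_peak_hz(bvp, FPS, 0.8, 2.4) * 60    # -> 72 beats/min
```

With 250 samples at 25 fps, the spectral resolution is 0.1 Hz, so both synthetic components fall exactly on FFT bins; real signals would require peak interpolation or longer windows for comparable precision.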
The experimental environment was an automobile being driven under normal road conditions. An ordinary USB camera (see
The experimental environment is shown in
Item | Parameter |
---|---|
Brand | AONI A30 |
Interface | USB |
Pixels | 5 megapixels |
Resolution | 1920×1080 |
Sensor | CMOS |
Max FPS | 30 |
Five volunteers were selected to participate in the road experiment. Each volunteer was measured once in the morning and once in the afternoon, for a total of 10 measurements. For each measurement, data were recorded once the driver’s heart rate had stabilized, and the monitoring results were then compared with simultaneous readings from a finger-clip device. The comparison data are shown in
No. | HR, Proposed (beats/min) | HR, Reference (beats/min) | RR, Proposed (breaths/min) | RR, Reference (breaths/min)
---|---|---|---|---
1 | 66 | 69 | 17 | 18
2 | 68 | 70 | 17 | 18
3 | 72 | 71 | 18 | 19
4 | 75 | 72 | 19 | 19
5 | 63 | 67 | 16 | 17
6 | 64 | 68 | 16 | 17
7 | 71 | 73 | 18 | 19
8 | 72 | 73 | 18 | 19
9 | 76 | 75 | 19 | 18
10 | 74 | 74 | 19 | 19
The experimental results show that, compared with contact medical equipment, the monitoring results of the proposed algorithm are relatively accurate, with agreement remaining above 95%.
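The agreement figure can be checked numerically from the comparison data; the paper does not state which consistency metric it uses, so the mean of (1 − relative error) against the reference device is assumed here.

```python
# (proposed, reference) pairs from the comparison table
hr = [(66, 69), (68, 70), (72, 71), (75, 72), (63, 67),
      (64, 68), (71, 73), (72, 73), (76, 75), (74, 74)]
rr = [(17, 18), (17, 18), (18, 19), (19, 19), (16, 17),
      (16, 17), (18, 19), (18, 19), (19, 18), (19, 19)]

def agreement(pairs):
    """Mean (1 - relative error) with respect to the reference device."""
    return sum(1 - abs(p - ref) / ref for p, ref in pairs) / len(pairs)

hr_agree = agreement(hr)   # about 0.97
rr_agree = agreement(rr)   # about 0.96
```

Both values exceed 0.95 under this assumed metric, consistent with the stated agreement level.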
In this paper, a real-time monitoring algorithm is proposed to examine a driver’s physiological parameters under ambient light conditions. Video sequences of the driver’s head were captured by an ordinary USB camera. Subsequently, BVP signals of the driver were acquired in real time from the video, and both heart rate and respiratory rate were calculated from the BVP signals. Compared with the monitoring results of contact medical equipment, the proposed algorithm has relatively high accuracy, and the non-contact data acquisition method does not interfere with normal driving. In addition, a facial expression recognition algorithm based on an improved CNN is proposed to recognize the driver’s facial expressions. The results of facial expression recognition can help verify the monitoring results of physiological parameters, so as to evaluate the driver’s physical condition more accurately. The research in this paper has valuable practical application potential in areas such as real-time monitoring of driver status and related early warning.
We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.