Computer Modeling in Engineering & Sciences
Classification of Domestic Refuse in Medical Institutions Based on Transfer Learning and Convolutional Neural Network
1School of Control Engineering, Chengdu University of Information Technology, Chengdu, 610225, China
2College of Automation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
3Department of Informatics, University of Leicester, Leicester, LE1 7RH, UK
*Corresponding Author: Hanbing Yan. Email: email@example.com
Received: 01 September 2020; Accepted: 01 February 2021
Abstract: The problem of domestic refuse is becoming increasingly serious with the use of all kinds of equipment in medical institutions, and it has attracted widespread attention. Traditional manual waste classification is subjective and inaccurate; moreover, the working environment of sorting is poor and the efficiency is low. Therefore, automated and effective sorting is needed. Given the current development of deep learning, it can provide good auxiliary support for classification and realize automatic classification. In this paper, a ResNet-50 convolutional neural network based on the transfer learning method is applied to design an image classifier that achieves high-accuracy domestic refuse classification. By comparing the method designed in this paper with a back propagation neural network and a plain convolutional neural network, it is concluded that the CNN based on transfer learning achieves a higher accuracy rate and a lower false detection rate. Furthermore, when data samples are scarce, combining transfer learning with the ResNet-50 pre-trained model effectively improves the accuracy of image classification.
Keywords: Domestic refuse; image classification; deep learning; transfer learning; convolutional neural network
1 Introduction
Faced with the increasing output of domestic refuse in medical institutions and deteriorating environmental conditions, how to utilize waste resources and improve the quality of the living environment through garbage classification management is one of the urgent issues for all countries in the world. Medical waste management is of great importance due to its infectious and hazardous nature, which can cause undesirable effects on humans and the environment. Good garbage classification can not only save resources and reduce environmental pollution and land occupation, but also protect human health and support sustainable social development. At present, many countries are beginning to pay attention to garbage classification. Errors in garbage classification lead to unsatisfactory overall results, which requires arranging manpower for a second round of sorting. However, the working environment of this traditional garbage classification method is poor and the sorting efficiency is low. These results suggest that the quantities of medical waste are not controlled, and hospitals have a defective monitoring management system for their waste. Song et al. analyzed the harm and composition of medical waste and, in particular, pointed out the importance of medical waste management, such as awareness of the harm, medical waste sorting, and effective monitoring mechanisms. Medical waste is related to our health, the environment, the economy, and society. Therefore, more careful measures are needed in the collection, storage, transfer, and disposal of medical waste.
With the development of image recognition technology, image classification based on machine learning is involved in all aspects of human life. At present, image classification technology has been widely used to detect and identify various objects. In the field of medicine, due to its intuitive, non-invasive, safe and convenient characteristics, image recognition technology is widely applied in clinical diagnosis and pathological research to help complete automatic recognition in medical image diagnosis, digitize the auxiliary diagnosis process, and reduce the workload of medical workers. In crop disease prevention, the convolutional neural network (CNN) is used to automatically identify rice sheath blight, one of the three major diseases affecting rice production and planting, to compensate for the shortcomings of manual judgment, which is conducive to the accurate identification and prevention of rice sheath blight. A deep learning CNN was applied to classify types of e-waste, and a faster region-based convolutional neural network (R-CNN) was used to detect the category and size of the waste equipment in the images. Machine learning technology is gradually maturing and can be fully used to extract and classify garbage image features. While ensuring classification speed and accuracy, it saves labor and time costs and contributes to improving people's knowledge of medical waste classification.
If machine learning techniques are used to implement medical waste classification, they will improve scientific awareness of garbage classification and raise garbage sorting efficiency. This paper is based on deep learning technology among machine learning methods and designs an image classifier. By using a neural network algorithm of the deep learning method, it realizes garbage classification with high precision and high efficiency. Proper waste separation will greatly alleviate the environmental pollution caused by wrong garbage classification.
The organization of this paper is as follows. In Section 2, related works are briefly reviewed. In Section 3, the main algorithms are introduced, including the BP neural network, the ResNet-50 convolutional neural network, and the ResNet-50 convolutional neural network based on the transfer learning method. Section 4 describes the qualitative and quantitative experimental results demonstrating the good performance of the proposed algorithm. Finally, conclusions are drawn in Section 5.
2 Related Work
Regarding research on garbage image classification methods, Cai et al. added color classification of plastic garbage to material-based waste plastic classification in 2007. The color space features of different waste plastics were extracted, color space patterns based on I1, I2 and I3 were selected, and a back propagation (BP) neural network was used for pattern recognition of the various colors, realizing the classification of waste plastics. In 2016, Mittal et al. initialized the parameters of the target neural network with the pre-trained AlexNet convolutional neural network model, and took the GINI data set as the training set of the garbage image classification model. After 150,000 iterations of training, the accuracy of the final model in garbage image recognition reached 87.69%. In 2018, Rabano et al. used MobileNet deep learning and transfer learning with the aid of model parameters from the ImageNet large-scale visual recognition challenge data set. The training covered common garbage categories: glass, paper, cardboard, plastic, metal and other waste. After 500 training iterations, the resulting garbage image classifier achieved a testing accuracy of 87.2%. In 2019, Zhao et al. used reflectance spectrum information of garbage at different wavelengths of light; by establishing a classification model and analyzing the reflectance spectrum information, they realized garbage image recognition and classification. In 2020, Yuan et al. applied a 23-layer convolutional neural network model to garbage classification, emphasizing real-time garbage classification and addressing the low accuracy of garbage classification. In the same year, Lu et al. also used convolutional neural network training to obtain a classifier model for garbage classification.
Some machine learning, vision algorithms and fusion methods are employed in garbage sorting. For example, Chen et al. proposed a vision-based robotic grasping system with deep learning for garbage sorting, using region proposal generation and a modified VGG-16 model. An intelligent garbage classifier has also been proposed with an identification method, an efficient separation system and an intelligent control system based on a variety of sensors and machine vision technology. Cao et al. proposed a method of garbage classification and recognition by transfer learning, which migrated the recognition capability of the existing InceptionV3 model to a garbage dataset, with a training accuracy of 99.3% and a test accuracy of 93.2%, respectively. Ma et al. proposed an enhanced single shot multibox detector with a lightweight and novel feature fusion module to address the heavy workload and low sorting efficiency in garbage sorting, with a mean average precision of 83.48%. Another work proposed an architecture to classify waste videos through features learnt by 2D and 3D convolutional neural networks, achieving an accuracy of 79.99% on a challenging dataset.
The application of neural networks in garbage classifiers is becoming more and more popular, because convolutional neural networks overcome the limitation of the BP neural network on input dimension. In the convolutional framework, images can be used directly as input data, and because the convolution kernels share weights, training the network saves much time. For traditional machine learning, a large number of training samples is needed to extract effective image features and achieve the required accuracy. In some classification problems, it is not easy to collect certain types of images, so sufficient training data cannot be guaranteed. This leads to unsatisfactory classification accuracy; what is more, large-scale image feature extraction consumes a great amount of training time.
To solve these problems, the transfer learning method is adopted, which transfers the parameters of a pre-trained model, obtained from training on a large number of samples, to the model used for training the classifier. Since the transferred model already contains sufficient feature parameters for image classification, only the final full connection layer of the neural network needs to be fine-tuned according to the number of classification categories. When the classifier model is generated, it is only necessary to train the final full connection layer while fixing the weights of the transferred parameters. The transfer learning method not only shortens training time, but also ensures high accuracy of image classification.
3 Analysis Method
The selection of the neural network determines the quality of image feature extraction and the accuracy of classification. In this paper, the BP neural network, the ResNet-50 convolutional neural network, and the ResNet-50 convolutional neural network combined with transfer learning are discussed respectively. By training and validating each network on the same data set, the neural network with the best classification effect is selected to generate the final classifier model.
3.1 BP Neural Network
A neural network is a network with multiple levels and a feedback system, divided into the following three layers: input layer, hidden layer and output layer. Signals are transmitted layer by layer through the neurons. If there is an error between the output and the ideal value, back propagation is carried out. In the process of error back propagation, the weights and thresholds of the network are adjusted along the gradient of the error. After each iteration of training, the difference between the results predicted by the BP neural network and the real values gradually decreases.
In 1986, Rumelhart et al. proposed the error back propagation neural network (BP network). The network is a multi-layer forward network with one-way propagation. Its training algorithm based on error back propagation is called the BP algorithm, and its basic idea is the gradient descent method: a gradient search technique is used to minimize the mean square error between the actual output value and the expected output value. Here ui is the internal state of the neuron, θi is the threshold, xj is the input signal, wij is the connection weight coefficient from unit uj to unit ui, and si is the external input signal. The neuronal structural model in Fig. 1 is depicted as follows:
Generally, from Eq. (1), ui = Σj wij xj + si − θi; that is, yi = f(ui). Here i denotes the input layer, j the intermediate (hidden) layer, and k the output layer, and the BP neural network structure is described in Fig. 2.
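As an illustrative sketch (not part of the original experiments; the toy inputs and a sigmoid activation are our assumptions), the neuron model of Eq. (1) can be written directly in Python:

```python
import math

def neuron_output(x, w, s, theta):
    # u_i = sum_j w_ij * x_j + s_i - theta_i, then y_i = f(u_i),
    # here with a sigmoid activation f.
    u = sum(wj * xj for wj, xj in zip(w, x)) + s - theta
    return 1.0 / (1.0 + math.exp(-u))

# Toy inputs: two input signals, two weights, one external input, zero threshold.
y = neuron_output(x=[1.0, 0.5], w=[0.2, -0.4], s=0.1, theta=0.0)
```

With these toy values, u = 0.2 − 0.2 + 0.1 = 0.1, and the sigmoid maps it to an output slightly above 0.5.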
3.2 Convolutional Neural Network
CNN is a kind of feedforward neural network composed of an input layer, convolution layers, pooling layers, activation layers, a full connection layer and an output layer. Unlike the BP neural network, images can be used directly as input data to the convolutional neural network. The output of the convolutional neural network is converted into relative class probabilities through Softmax, and the category with the highest probability is taken as the final classification result. After feature extraction in the convolutional layer, the output feature map is transferred to the pooling layer for feature selection and information filtering. Because the number of feature maps is determined by the convolution kernels, the dimension of the feature maps obtained by the convolution operation is very large, which increases the computation cost and the computing burden of the equipment.
The activation layer carries out a nonlinear mapping of the output of the convolutional layer. The Sigmoid and ReLU functions are commonly used in neural networks; the ReLU function is preferred in convolutional neural networks, because using the Sigmoid function to calculate the error gradient in back propagation requires a relatively large amount of computation, and the gradient is likely to vanish in back propagation with deep networks. The output layer of the convolutional neural network converts the output of the full connection layer into the probability of each class through the Softmax function, shown in Eq. (2):

Si = exp(Zi) / Σj exp(Zj)   (2)

where Zi is the output value of the full connection layer for class i, and Si is the probability value of the i-th class.
To quantify the difference between the neural network's output and the actual category, the output of the Softmax is combined with the actual category labels to construct the cross entropy loss. The calculation formula is shown in Eq. (3), where N is the number of training samples in the batch, C is the number of classes, Si is the output of the i-th port of the convolutional neural network, and yi is the actual category label corresponding to the i-th port:

L = −(1/N) Σn Σi yi log(Si)   (3)
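A minimal sketch of the Softmax of Eq. (2) and the per-sample cross entropy of Eq. (3) (the score values are illustrative, not from the paper; the max is subtracted before exponentiation only for numerical stability and does not change the result):

```python
import math

def softmax(z):
    # Eq. (2): S_i = exp(Z_i) / sum_j exp(Z_j)
    m = max(z)                          # stability shift, cancels in the ratio
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(s, y):
    # Eq. (3) for a single sample: L = -sum_i y_i * log(S_i)
    return -sum(yi * math.log(si) for si, yi in zip(s, y))

scores = [2.0, 1.0, 0.1, 0.1]              # toy outputs of the full connection layer
probs = softmax(scores)                    # four class probabilities, summing to 1
loss = cross_entropy(probs, [1, 0, 0, 0])  # one-hot label: true class is the first
```

For a one-hot label the loss reduces to −log of the probability assigned to the true class, so a confident correct prediction gives a loss near zero.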
The parameter matrix of the convolutional neural network is updated using the gradients of the loss function with respect to the weights and offsets (biases). For example, consider a weight matrix W, a bias b, the loss function L, and the learning rate a. According to the partial derivatives of the loss function in Eq. (3) with respect to the parameters, the parameters are updated to form a new weight matrix W′ and bias b′:

W′ = W − a ∂L/∂W   (4)

b′ = b − a ∂L/∂b   (5)
The update of the weights and biases of the convolutional neural network during training is similar to that of the BP neural network. Both use the gradient of the loss function with respect to the parameters as the basis for updating, and the overall loss follows a downward trend. In this way, the parameters of the convolutional neural network model are adjusted to the image classification task to achieve the target classification accuracy.
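The element-wise gradient update W′ = W − a ∂L/∂W can be sketched as follows (the matrix and gradient values are toy numbers of our choosing, not measured gradients):

```python
def sgd_update(W, grad, lr):
    # W' = W - a * dL/dW, applied element-wise over the weight matrix.
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]

W = [[0.5, -0.2],
     [0.3, 0.8]]          # current weights (toy values)
dLdW = [[0.1, 0.0],
        [-0.2, 0.4]]      # loss gradient w.r.t. each weight (toy values)
W_new = sgd_update(W, dLdW, lr=0.05)
```

Each weight moves opposite to its gradient, scaled by the learning rate, which is why too large a learning rate can overshoot and oscillate, as observed in the tuning experiments later in the paper.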
3.2.1 ResNet-50 Convolutional Neural Network
As the number of layers increases in a convolutional neural network, gradient vanishing and gradient explosion are likely to cause a decrease in the accuracy of the training model. ResNet applies a residual structure to solve this problem, which ensures that the performance of the network can improve as the depth of the neural network increases. Fig. 3 is the structure diagram of a residual block.
ResNet involves two mapping relationships: identity mapping and residual mapping. The final output of the residual block is shown in Eq. (6), where x is the identity mapping and F(x) is the residual mapping:

y = F(x) + x   (6)
From Eq. (6), the original input of the residual block is added directly to the forward propagation path. When the gradient is computed by back propagation, the identity term x contributes a constant gradient of 1 even when the gradient through F(x) is small; this effectively strengthens the backward gradient flow and guarantees that the parameter update is not interrupted by vanishing gradients. In this study, we use the architectures for ImageNet in [27,28], which mitigate the network degradation problem. In view of speed and accuracy, ResNet-50 is selected. Built on existing deep network training, the residual learning framework has the advantages of easy optimization and low computational burden: residual connections are designed to solve the degradation and gradient problems, so that network performance can improve as depth increases. ResNet-50 consists of 49 convolutional layers and 1 full connection layer. Its structure is shown in Fig. 4.
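The gradient argument above can be checked numerically (the residual mapping F below is a deliberately tiny-slope toy function of our choosing, illustrating that the shortcut keeps the derivative near 1):

```python
def residual_block(x, F):
    # Eq. (6): y = F(x) + x, identity shortcut plus residual mapping.
    return F(x) + x

# Even when F is nearly flat (its gradient alone would almost vanish),
# dy/dx = F'(x) + 1 stays close to 1, so the backward gradient through
# the shortcut path cannot die out.
F = lambda x: 0.001 * x    # toy residual mapping with slope 0.001
x = 2.0
eps = 1e-6                 # central finite difference to estimate dy/dx
dydx = (residual_block(x + eps, F) - residual_block(x - eps, F)) / (2 * eps)
```

The finite-difference estimate comes out near 1.001 = F′(x) + 1, whereas without the shortcut it would be only 0.001.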
From stages 2 to 5, ID BLOCK ×2 represents two identity residual blocks that do not change the size of the input data, while CONV BLOCK represents a residual block that does change the size of the data; each such block is composed of three convolutional layers. CONV is the convolutional layer for the convolution operation, Batch Norm is the batch normalization operation, ReLU is the activation function, MAX POOL is the maximum pooling operation, and AVG POOL stands for global average pooling. The ResNet-50 structure requires the input data to be scaled or cropped to a specified size. After the convolution operations of the ResNet-50 residual blocks on the input image, the resulting feature maps are flattened into a one-dimensional vector by Flatten. Finally, the full connection layer is used to synthesize the image features, and the Softmax function is used to convert the output values of the convolutional neural network into the probability of each category.
3.2.2 ResNet-50 Convolutional Neural Network Based on Transfer Learning
Transfer learning is a machine learning method in which the model parameters developed for one task are reused in a different task, serving as the starting point for the new task's model. This is a common approach in deep learning. In computer vision and natural language processing, developing a neural network model requires a large amount of computing and time resources, and the technical span is also large. Therefore, the parameters of a pre-trained model are often reused as the starting point for computer vision and natural language processing tasks. The central idea of transfer learning is shown in Fig. 5.
Transfer learning takes the model developed for task A as the starting point and reuses it in the process of developing the model for task B. The main advantage of transfer learning is that it saves training time and, in most cases, the neural network can achieve better performance without a lot of data. Transfer learning is employed in situations where, for example, the current task lacks training data but neural networks already trained with a large amount of data exist in a similar field. Fig. 6 lists two strategies for transfer learning: fine-tuning, and freeze-and-train. Fine-tuning consists of taking the network pre-trained on the base data set and training all layers on the target data set. Freeze-and-train consists of freezing all layers except the last (their weights are not updated) and training only the last layer. One can also freeze just the first few layers and fine-tune the rest.
Transfer learning is often applied in the following scenarios. First, a small target set with similar images: when the target data set is smaller than the base data set and the images are similar, freeze-and-train is recommended, training only the last layer. Second, a large target set with similar images: fine-tuning is recommended. Third, a small target set with different images: freeze-and-train is recommended, training the last layer or the last few layers. Fourth, a large target set with different images: fine-tuning is recommended.
In the application of image classification, traditional machine learning algorithms require sufficient image data as training samples, some of which are difficult to obtain. As the number of neural network layers increases, updating the parameters of each layer through training takes a long time, and retraining the whole neural network consumes a lot of time and labor. The transfer learning approach can overcome these difficulties. The low-level image features extracted by a pre-trained convolutional neural network model are universal across many classification problems, so high classification accuracy can be achieved even with fewer training samples. The ResNet-50 convolutional neural network based on transfer learning combines the advantages of transfer learning and the ResNet residual structure, ensuring high classification accuracy while reducing training time.
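The freeze-and-train strategy can be sketched in a framework-free way (in a real framework such as PyTorch this corresponds to setting `requires_grad=False` on the transferred layers; the scalar "layers" and gradient values below are toy stand-ins of our own):

```python
class Layer:
    """Toy layer holding one scalar weight and a trainable flag."""
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

def train_step(layers, grads, lr):
    # Update only trainable layers; frozen (transferred) layers keep
    # their pre-trained weights unchanged.
    for layer, g in zip(layers, grads):
        if layer.trainable:
            layer.weight -= lr * g

# Transferred feature-extraction layers are frozen; only the final
# full connection layer is trained, mirroring freeze-and-train.
backbone = [Layer(0.7, trainable=False), Layer(-1.2, trainable=False)]
head = [Layer(0.1, trainable=True)]
train_step(backbone + head, grads=[0.5, 0.5, 0.5], lr=0.1)
```

After the step, only the head weight has moved; the backbone weights are exactly as transferred, which is what keeps training fast and data-efficient.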
4 Experimental Method
4.1 Data Set Collection and Image Preprocessing
According to the management measures on garbage classification, garbage is divided into the following four categories: other garbage, perishable garbage, recyclable garbage and hazardous garbage. Medical waste refers to waste with direct or indirect infectivity, toxicity and other hazards produced by medical and health institutions in medical treatment, prevention, health care and other related activities. Whether the management of medical waste can achieve standardization is a major concern of current society. The effective management of medical waste is an important way to control the spread of epidemics and an effective measure to prevent and control environmental pollution and harm to human survival. According to the circular, the household waste generated in medical institutions can be classified into four categories: hazardous waste (labeled waste 3), recyclable waste (labeled waste 2), perishable waste (labeled waste 1) and other waste (labeled waste 0). Hazardous waste mainly includes waste batteries, waste fluorescent tubes, waste film and waste paper. This category also includes harmful and highly polluting items, such as medicine boxes, drug bags, infusion tubes and needle tubes. Recyclable materials mainly include infusion bottles (bags) that have not been contaminated by patients' blood, body fluids or excreta, plastic packaging bags, packaging boxes, paper packaging, express packages, washbasins, clothes racks, empty beverage bottles, waste electrical and electronic products, and discarded hospital beds, wheelchairs, infusion racks, etc., after being wiped or fumigated. Perishable garbage mainly includes kitchen garbage, melon and fruit garbage, flower garbage, dishes, cigarette ends, eggshells and so on, generated in the canteen, office building and other areas.
In the data set, other waste generally refers to garbage whose name we do not know, and it is relatively infrequent in daily life.
For the above four types of sample data, the number of images of each type is 2000, divided into a training set of 1600 and a verification set of 400. As inputs of the neural network, the garbage images need preprocessing to meet the input requirements of the algorithm. At the same time, image preprocessing can improve the training speed and classification accuracy of the neural network. The steps of image preprocessing are as follows:
The input image size of the ResNet-50 convolutional neural network model is specified as 224 × 224. A cropping operation is carried out for images in the training and verification sets whose size does not meet this requirement. Since direct cropping may lose key image information, the bilinear interpolation algorithm is used to rescale the image before cropping. Then the pixel values are normalized: the data are mapped between 0 and 1 by dividing the pixel value of each channel in an image by 255, which improves the speed and accuracy of iterative training. Then the pixel values are standardized and scaled to fall into a small specific interval, so that features of different scales become comparable. The effect on the objective function is reflected in the geometric distribution rather than the numerical value, and the distribution of the original data is not changed.
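The two scaling steps, normalization by 255 and zero-mean/unit-variance standardization, can be sketched on a toy row of pixel values (the pixel values are illustrative; a real pipeline would apply this per channel over whole images):

```python
def normalize(pixels):
    # Map 8-bit channel values into [0, 1] by dividing by 255.
    return [p / 255.0 for p in pixels]

def standardize(values):
    # Zero-mean, unit-variance scaling: changes the scale of the data,
    # not the shape of its distribution.
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

channel = [0, 128, 255]                  # one toy row of 8-bit pixel values
scaled = standardize(normalize(channel))
```

After standardization the values sum to zero, i.e., they are centered, while their relative ordering and spacing (the distribution shape) are preserved.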
4.2 Confusion Matrix and Model Evaluation
The confusion matrix is mainly used to compare the classification results with the actual values, and it shows the accuracy of the classification results. The columns of the confusion matrix represent the predicted categories, and the rows represent the actual categories. The values on the diagonal of the confusion matrix represent the correctly predicted part, and the off-diagonal elements are the wrongly predicted part; the higher the diagonal values, the more correct the predictions. The evaluation of model classification mainly includes the accuracy rate and the false detection rate. Positive (P) and negative (N) represent the judgment results of the model, and true (T) and false (F) evaluate whether the judgment results of the model are correct. For example, FP means the model's judgment is positive, but that judgment is actually wrong. Accuracy = (TP + TN)/(TP + TN + FP + FN) is the proportion of correctly predicted data in all data. False detection rate = FP/(TP + FP) is the proportion of wrongly predicted data among the data predicted as positive.
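These two metrics can be computed directly from a confusion matrix; the sketch below uses a toy two-class matrix of our own (rows = actual, columns = predicted), not the paper's reported results:

```python
def accuracy(cm):
    # Diagonal (correct predictions) over the total number of samples.
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def false_detection_rate(cm, k):
    # For class k: FP / (TP + FP), computed over the predicted-k column.
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm)) if i != k)
    return fp / (tp + fp)

# Toy 2-class confusion matrix: rows are actual, columns are predicted.
cm = [[90, 10],
      [5, 95]]
acc = accuracy(cm)                  # (90 + 95) / 200
fdr = false_detection_rate(cm, 0)   # 5 / (90 + 5)
```

The same column-wise computation extends directly to the four-class matrices shown in Figs. 9, 12 and 15.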
4.3 The Simulation of BP Neural Network Classifier
4.3.1 Determine the Number of Neurons in Each Layer and Hidden Layers
The size of the input image is 256 × 256, so the number of input neurons is 65536. The output garbage is divided into four categories: hazardous waste, recyclable waste, perishable waste and other waste, so the number of output neurons is 4.
The number of neurons in the hidden layer is determined by the empirical Eq. (7):

h = √(m + n) + a   (7)

where h is the number of nodes in the hidden layer, m is the number of nodes in the input layer, n is the number of nodes in the output layer, and a is an adjustment constant between 1 and 10.
Substituting the corresponding parameters into Eq. (7) with a = 10, the number of hidden layer neurons in the experiment is 266.
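As a quick check (assuming the common form h = √(m + n) + a of the empirical rule, which is consistent with the reported value), the substitution can be reproduced:

```python
import math

def hidden_neurons(m, n, a):
    # Empirical rule: h = sqrt(m + n) + a, truncated to an integer count.
    return int(math.sqrt(m + n)) + a

# m = 65536 input nodes (256 x 256 image), n = 4 output classes, a = 10.
h = hidden_neurons(m=65536, n=4, a=10)   # sqrt(65540) ≈ 256, so h = 266
```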
4.3.2 Learning Rate and Batchsize Parameters
The parameter adjustment method is as follows. First, the batch size of each training run is fixed, and the learning rate is set to 0.75, 0.075 and 0.0075, respectively; each setting is trained for 50 iterations to obtain the learning rate with the best convergence of the loss function. Then the optimal learning rate is fixed, the Batchsize of each training run is set to 25, 50 and 75, respectively, and each setting is trained for 50 iterations to obtain the Batchsize with the best convergence of the loss function.
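This two-stage procedure amounts to a small coordinate-wise grid search. The sketch below stands in for it with toy final-loss values of our own invention (the real criterion is the loss curves of Figs. 7 and 8, not these numbers):

```python
def pick_best(candidates, final_loss):
    # Keep the setting whose final loss after training is smallest.
    return min(candidates, key=final_loss)

# Stage 1: fix the Batchsize and sweep the learning rate.
lr_loss = {0.75: 1.9, 0.075: 0.4, 0.0075: 1.1}   # illustrative losses only
best_lr = pick_best([0.75, 0.075, 0.0075], lr_loss.get)

# Stage 2: fix the chosen learning rate and sweep the Batchsize.
bs_loss = {25: 0.35, 50: 0.5, 75: 0.6}           # illustrative losses only
best_bs = pick_best([25, 50, 75], bs_loss.get)
```

With loss values shaped like the paper's curves, the search settles on learning rate 0.075 and Batchsize 25 for the BP network, matching the choices described below.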
It can be seen from Fig. 7 that too small a learning rate of 0.0075 leads to a slow convergence rate, while the learning rate of 0.75 converges fast at first but starts to oscillate and diverge after 10 iterations. Therefore, the learning rate is selected to be 0.075. Next, with the learning rate fixed, loss function reduction experiments with Batchsize 25, 50 and 75 are conducted, respectively.
In Fig. 8, the loss function curve with Batchsize 25 declines fastest and most stably. Based on the above parameter adjustment experiments, the BP neural network is trained for 100 iterations with learning rate 0.075 and Batchsize 25. The trained model is used to classify the garbage images, and its confusion matrix is shown in Fig. 9.
It can be seen from the confusion matrix of the BP neural network classification results in Fig. 9 that the classifier model trained by this method has the highest number of correct classifications for kitchen waste, with 64 frames. In contrast, the number of correctly classified hazardous waste images is the lowest, with 26 frames.
4.4 The Simulation of ResNet-50 Convolutional Neural Network Classifier
For the ResNet-50 convolutional neural network, parameter tuning mainly concerns the learning rate and the Batchsize. To save tuning time, the learning rate is varied over 0.5, 0.05 and 0.005, and the Batchsize over 25, 50 and 75, respectively; in each case, 50 iterations of training are conducted. The loss function decline curve is obtained through iterative training, and the learning rate with the fastest convergence speed is selected. Then the learning rate is fixed, and the Batchsize with the fastest convergence speed is likewise selected according to the loss function.
4.4.1 The Tuning of Learning Rate Parameter
The Batchsize of each training run is fixed at 50, and the learning rate is set to 0.5, 0.05 and 0.005, respectively. Each setting is trained for 50 iterations, and the learning rate with the best convergence of the loss function is obtained.
It can be seen from Fig. 10 that the learning rate 0.5 converges rapidly at the beginning, but the loss curve stops falling and even oscillates during training, while the convergence rate with learning rate 0.005 is too slow. Therefore, the learning rate 0.05 is selected.
4.4.2 The Tuning of Batchsize Parameter
With the optimal learning rate fixed, the Batchsize of each training run is set to 25, 50 and 75; after 50 iterations of training, the Batchsize with the best convergence of the loss function is obtained.
As can be seen from Fig. 11, the Batchsize with the fastest convergence speed of the loss curve is 50. Based on the above parameter adjustment experiments, the ResNet-50 convolutional neural network is trained for 100 iterations with learning rate 0.05 and Batchsize 50.
The trained model is used to classify the garbage images, and its confusion matrix is shown in Fig. 12.
It can be seen from the confusion matrix of the ResNet-50 classification results in Fig. 12 that the classifier model trained by this method reaches its highest number of correct classifications with 89 frames, while the number of correctly classified hazardous waste images is the lowest, with 63. As can be seen from the color depth of the heat map, the diagonal stands out clearly against the other positions, which indicates that the ResNet-50 convolutional neural network has a high recognition degree for these four types of garbage and is suitable for garbage image classification.
4.5 The Simulation of ResNet-50 Convolutional Neural Network Classifier with Transfer Learning
The parameter adjustment method of the ResNet-50 convolutional neural network based on transfer learning is the same as that of the plain ResNet-50 network: the learning rate and Batchsize are tuned. The learning rate is selected from 0.5, 0.05 and 0.005, and the Batchsize from 25, 50 and 75; 50 iterations of training are conducted for each case. First, according to the decline of the loss function during iterative training, the learning rate with the fastest convergence speed is selected; then the learning rate is fixed, and the Batchsize with the fastest convergence speed is selected through the loss function.
4.5.1 The Tuning of Learning Rate Parameters
With the Batchsize of each training run fixed, the learning rate is set to 0.5, 0.05 and 0.005, respectively, and 50 iterations of training are conducted to obtain the learning rate with the best convergence of the loss function.
It can be seen from Fig. 13 that too small a learning rate of 0.005 leads to a slow convergence rate, while with the learning rate 0.5 the loss no longer declines after 3 iterations. Therefore, the learning rate 0.05 is selected.
4.5.2 The Tuning of Batchsize Parameter
With the optimal learning rate fixed, the Batchsize for each training run is set to 25, 50 and 75, respectively. After 50 iterations of training, the Batchsize with the best convergence of the loss function is obtained.
It can be seen in Fig. 14 that the loss function curve with a batch size of 50 decreases fastest. Combining the above parameter adjustment experiments, the ResNet-50 convolutional neural network is trained for 100 iterations with a learning rate of 0.05 and a batch size of 50. The trained model is used to classify garbage images, and its confusion matrix is shown in Fig. 15.
It can be seen from the confusion matrix of the classification results of the ResNet-50 convolutional neural network based on transfer learning in Fig. 15 that the classifier model trained by this method correctly classifies more than 90 frames for each of the four types of garbage. The heat map corresponding to the confusion matrix has an obvious diagonal, and the minimum number of correct classifications is 93, which is higher than that of the BP neural network and the ResNet-50 convolutional neural network. The classifier obtained by ResNet-50 training with transfer learning thus has a high identification degree for the garbage types. This method combines the advantages of the ResNet-50 network structure and the transfer learning method, which not only ensures sufficient image feature extraction but also saves classifier training time.
4.6 Experimental Evaluation
The BP neural network has three layers, namely the input layer, the hidden layer, and the output layer. This three-layer BP neural network can realize nonlinear mapping and is suitable for training the medical waste image classifier. There is no standard library for medical waste classification yet; in this study, we use mobile phone shooting and a web crawler to collect and sort the images, which are derived from actual scenes. The parameters (number of iterations, batch size, number of hidden neurons, and learning rate) are set as listed, the optimizer is 'SGD', and the other parameters are 0.
4.6.1 Accuracy Comparison
The accuracy rate and its average value for the different types of garbage are calculated for the BP neural network, the ResNet-50 convolutional neural network, and the ResNet-50 convolutional neural network based on transfer learning. Under training with the same sample data, the ResNet-50 convolutional neural network based on transfer learning achieves the highest classification accuracy, with an average accuracy 47.8% higher than that of the BP neural network and 17.9% higher than that of the ResNet-50 convolutional neural network without transfer learning. The bar diagram comparing the accuracy of the three methods is shown in Fig. 16.
It can be seen from Fig. 16 that the classification accuracy of the ResNet-50 convolutional neural network based on transfer learning is higher than that of the BP neural network and the ResNet-50 convolutional neural network for all four types of garbage. The classifier model trained by this method overcomes the problem that the BP neural network cannot extract enough important image features, due to its limited number of layers, and also solves the problem that the accuracy of the ResNet-50 convolutional neural network cannot be improved when the number of samples is insufficient.
4.6.2 Comparison of False Detection Rate
The false detection rate and its average value for the different kinds of garbage are calculated for each method. As can be seen from the classification results, under training with the same sample data, the average false detection rate of the ResNet-50 convolutional neural network with transfer learning is 16.0% lower than that of the BP neural network and 5.9% lower than that of the ResNet-50 convolutional neural network. The comparison of the false detection rates of the three methods is shown in Fig. 17.
From Fig. 17, the false detection rate of the classifier model obtained by ResNet-50 convolutional neural network training with transfer learning is much lower than that of the BP neural network and the ResNet-50 convolutional neural network for all four types of garbage. This method combines the advantages of transfer learning and the ResNet-50 structure: it uses the ResNet-50 pre-trained model, with its rich sample features and residual structure, to improve the identification degree of the classifier, which enhances the ability to classify the four kinds of garbage and effectively reduces the false detection rate.
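The paper does not spell out its exact definition of the false detection rate; one plausible formalization, computable from the same confusion matrices, is the fraction of images assigned to a class that actually belong to another class (i.e., 1 − precision per class):

```python
def false_detection_rate(confusion):
    """confusion[i][j] = number of class-i images predicted as class j.
    For each class j: wrongly-assigned-to-j / total-assigned-to-j."""
    k = len(confusion)
    rates = []
    for j in range(k):
        assigned = sum(confusion[i][j] for i in range(k))  # column sum
        wrong = assigned - confusion[j][j]                 # off-diagonal part
        rates.append(wrong / assigned if assigned else 0.0)
    return rates
```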
4.6.3 Object Classification Results
It can be seen from the classification results that the classifier outputs the result for each frame within one second, and the overall classification accuracy is 96.7%. Partial results are shown in Fig. 18. The experimental results also show that the classifier exhibits a certain rotation invariance, for the following two reasons: (1) The training samples of the pre-trained model used in the transfer learning method are very rich, and image features that identify garbage from different angles have already been extracted, so the ResNet-50 network trained with transfer learning has a certain recognition ability for images from different angles. (2) The max pooling of the convolutional neural network helps in recognizing images with a rotation angle: after an image is rotated by a certain angle, the output obtained through max pooling may be the same as for the original image, so the rotation does not affect the classification.
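The max-pooling argument in point (2) can be demonstrated on a toy image: when a rotation keeps the peak activation inside the same pooling window, the pooled output is unchanged.

```python
def max_pool_2x2(img):
    # Non-overlapping 2x2 max pooling over a 2D list of numbers
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

original = [[0, 9],
            [0, 0]]
rotated_90 = [[0, 0],
              [0, 9]]  # the 90-degree rotation moved the peak within the window
```

Both inputs pool to the same output, so the layer after pooling sees identical values despite the rotation.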
4.6.4 Comparison with Other Waste Detection Methods
To verify the effectiveness of our method, we select some representative algorithms for comparison. In practical applications, some garbage items are entangled with others and are difficult to separate into complete individual objects. Our dataset is not public, and no standard datasets are available, so we use the accuracy rates reported directly in other authors' papers for comparison.
From the comparisons in Tab. 1, the accuracy of our method is better than the results of previous studies related to waste detection and classification. The improvement benefits from the employment of transfer learning and from the dataset we selected, which helps to further optimize the performance.
The selection of the neural network determines the speed and accuracy of image classification. There are great differences between the BP neural network and the convolutional neural network in structure and input requirements. The neurons between the layers of a BP neural network are fully connected through weights and biases, and an excessively large input image leads to an exponential increase in the amount of computation, which limits the ability of the BP neural network to classify images.
In a convolutional neural network, weights are shared: the weights contained in each convolution kernel are applied across all positions of the corresponding layer. Moreover, the convolution operation can control the size of the output feature map, and reducing the dimension of the feature map reduces computational complexity and improves training speed. A BP neural network needs to convert the input image's pixel matrix into a one-dimensional vector, while a convolutional neural network can take the original image as input directly, providing a guarantee for accurate extraction of image features.
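The claim that convolution controls the output feature map size follows the standard formula out = ⌊(in − kernel + 2·padding)/stride⌋ + 1, sketched here (the ResNet-50 example in the test reflects its well-known 7×7, stride-2 first convolution):

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    # Standard convolution output-size formula for one spatial dimension
    return (in_size - kernel + 2 * padding) // stride + 1
```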
The more layers a neural network has, the stronger its feature extraction ability, but the more likely the gradient vanishing problem becomes. The ResNet-50 convolutional neural network has a residual block structure that adds the original input to the block's output, which ensures that the back-propagated gradient does not become too small to update the parameters. Training the ResNet-50 convolutional neural network combined with transfer learning can effectively reduce training time and improve classification accuracy. At present, convolutional neural networks and transfer learning are widely used to solve classification problems. For the problem of insufficient data samples, transfer learning can aid ResNet-50 in training the classifier; compared with ResNet-50 without transfer learning, it improves the average accuracy rate and reduces the average false detection rate.
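The identity shortcut described above reduces to y = F(x) + x; a pure-Python sketch of the forward computation (`transform` is a stand-in for the block's convolutional branch F):

```python
def residual_block(x, transform):
    # Identity shortcut: add the original input to the transformed
    # features. Even if transform's contribution (and its gradient)
    # is tiny, the signal through the shortcut path survives.
    return [xi + fi for xi, fi in zip(x, transform(x))]
```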
It can be seen that the classification application of neural networks based on transfer learning [9,37] helps to improve the efficiency of medical waste sorting, the utilization of recycled resources, and the ecological environment. The convolutional neural network exhibits a certain rotation invariance: since the typical features of an image extracted during training do not change when the image is rotated, the classifier can still recognize and classify the rotated image.
Image feature extraction is a complicated process when a neural network is used to train an image classifier. Many limitations need to be considered, such as insufficient samples and low image quality, resolution, or contrast. Even the same image may yield completely inconsistent classification results after cropping, compression, or rotation.
Therefore, there are still many new problems to be solved in this field. We will consider optimization [38,39], wavelet transforms [40,41], heuristic neural networks, and improvements to transfer learning. Further research and development are needed in the following aspects:
(1) This study only uses an existing neural network model to train on the samples, without in-depth exploration of the internal structure of the neural network. The structure of the neural network can be further improved to enhance its feature extraction ability.
(2) As time goes on, new kinds of medical waste will appear, and the identification and classification database will need to be updated constantly. By increasing the amount of sample data, the image classifier can improve its ability to classify medical waste.
Funding Statement: This work was supported in part by the National Natural Science Foundation of China under Grant 61806028, Grant 61672437 and Grant 61702428, Sichuan Science and Technology Program under Grants 21ZDYF2484, 2021YFN0104, 21GJHZ0061, 21ZDYF3629, 21ZDYF2907, 21ZDYF0418, 21YYJC1827, 21ZDYF3537, 2019YJ0356, and the Chinese Scholarship Council under Grants 202008510036, 201908515022.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.|