To improve the accuracy of weed recognition under complex field conditions, a weed recognition method using a depthwise separable convolutional neural network based on deep transfer learning was proposed in this study. To improve classification accuracy, the Xception model was refined through model transfer and fine-tuning. Specifically, weight parameters trained on the ImageNet data set were transferred to the Xception model. Then a global average pooling layer replaced the full connection layer of the Xception model. Finally, an XGBoost classifier was added to the top layer of the model to output the results. The performance of the proposed model was validated on digital field weed images. The experimental results demonstrated that the proposed method significantly improved both classification accuracy and training speed compared with the VGG16, ResNet50, and Xception models. The test recognition accuracy of the proposed model reached 99.63%. Further, each training epoch took 208 s, less than the VGG16, ResNet50, and Xception models, which required 248 s, 245 s, and 217 s, respectively. Therefore, the proposed model shows promise for image detection with more accurate recognition results and can be applied to the precision management of other crops.
Adequate agricultural products are an important prerequisite for sustaining population growth. Although China's grain yield has grown steadily, problems remain. Weeds in the seedling stage are harmful to the yield and quality of major crops, such as maize and wheat [
Crop and weed recognition methods mainly include manual recognition, remote sensing recognition and machine learning recognition [
Deep learning eliminated the feature extraction disadvantage of traditional machine learning recognition methods and achieved better recognition results. However, deep neural network models usually require large amounts of sample data and long training times to ensure model performance.
In addition, higher recognition and classification accuracy often requires a large amount of labeled data. When the amount of labeled data is small, the neural network model will over-fit, resulting in low recognition accuracy. Meanwhile, it is difficult to obtain sufficient labeled data in practice, so the model will be under-trained. Transfer learning can transfer knowledge from existing domain data to help model training in new domains, which makes it possible to train neural network models with few data.
In this study, we propose a weed identification method using a depthwise separable convolutional network based on transfer learning. The method introduces and improves a lightweight network, the Xception model, based on a transfer learning strategy; the resulting model is named TX-XGBoost.
The implementation process of the proposed model (TX-XGBoost) is shown in
In our experiment, all the weed images were from the publicly available database (
The images of field plants (crops and weeds) have complex backgrounds, such as soil and gravel, which can hinder the model from extracting valuable features. Therefore, to improve recognition accuracy, it is necessary to segment the plant images and extract valuable features.
According to the green characteristics of seedling crops and weeds, we used the HSV-based green identification decision tree method [
where R, G, and B are the pixel channel values of the RGB color space
The HSV three-component threshold range for plants was determined by the following steps: (1) Randomly select 5 sample images from each plant species and convert them to HSV color space. (2) Extract 3 × 3 pixel regions from the plant area in each image. (3) Calculate the maximum and minimum values of each sample region to determine the HSV three-component segmentation threshold range. According to this threshold, plants were separated from non-target objects such as sand, rocks, and debris.
High-performance models require large amounts of training data, since deep learning model training is driven by big data [
Class | Original image | Expanded image | Test image |
---|---|---|---|
Black-grass | 263 | 779 | 156 |
Charlock | 390 | 1070 | 214 |
Cleavers | 287 | 861 | 172 |
Chickweed | 611 | 1533 | 307 |
Wheat | 221 | 683 | 137 |
Fat Hen | 475 | 1392 | 278 |
Loose Silky-bent | 654 | 1562 | 312 |
Maize | 221 | 683 | 137 |
Scentless Mayweed | 516 | 1548 | 310 |
Shepherd’s Purse | 231 | 703 | 141 |
Cranesbill | 496 | 1503 | 300 |
Sugar beet | 385 | 1175 | 234 |
Total | 4750 | 13492 | 2698 |
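The roughly threefold expansion shown in the table can be obtained with simple geometric augmentations. The paper does not specify which operations were used, so the flips and rotation below are assumptions.

```python
import numpy as np

# Illustrative augmentations; the exact operations used to expand the
# dataset are not specified in the paper.
AUGMENTATIONS = [
    np.fliplr,                     # horizontal mirror
    np.flipud,                     # vertical mirror
    lambda im: np.rot90(im, k=2),  # 180-degree rotation
]

def expand_dataset(images):
    """Return the original images plus one augmented copy per operation."""
    expanded = list(images)
    for im in images:
        expanded.extend(op(im) for op in AUGMENTATIONS)
    return expanded
```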
The core of a convolutional neural network is the convolutional layer. Increasing the number of convolutional layers tends to extract richer feature information and improve model performance. Convolutional networks have developed from the initial 8-layer AlexNet [
The Xception model separated inter-channel correlation mapping and spatial correlation mapping completely; this core concept of Xception is called depthwise separable convolution. Traditional standard convolution applied convolution kernels to every channel of the input feature map and mapped spatial and inter-channel correlations together. In contrast, depthwise separable convolution divided the standard convolution process into 2 steps, as shown in
Assuming that the size of input feature map is
The
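The parameter savings can be checked numerically. For a Dk × Dk kernel, M input channels, and N output channels, a standard convolution needs Dk·Dk·M·N weights, while a depthwise separable convolution needs Dk·Dk·M (depthwise step) plus M·N (pointwise step), a ratio of 1/N + 1/Dk²:

```python
def conv_params(k, m, n):
    """Weights in a standard k x k convolution with m inputs, n outputs."""
    return k * k * m * n

def dws_conv_params(k, m, n):
    """Depthwise separable convolution: k x k depthwise filters on m
    channels, then 1 x 1 pointwise filters mapping m to n channels."""
    return k * k * m + m * n

# Example: 3 x 3 kernel, 128 input channels, 256 output channels
std = conv_params(3, 128, 256)      # 294912 weights
dws = dws_conv_params(3, 128, 256)  # 33920 weights, about 8.7x fewer
```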
In addition, the Xception network model introduced a linear residual connection structure and Batch Normalization (BN) layers to further accelerate network convergence. Finally, the network output is mapped to a probability space by the Softmax classification layer, which in turn outputs the classification results. It should be noted that in this paper we used the Xception convolutional layers as an image feature extractor to learn abstract image features.
We modified the Xception model to reduce the number of model parameters and improve classification accuracy. The steps were: (1) replace the full connection layer in the Xception model with a global average pooling layer; (2) add an XGBoost classifier to the top layer of the model to classify the output data.
Global Average Pooling
Traditional convolutional neural networks used the full connection layer to reduce the dimensionality of the high-dimensional features extracted by the convolutional layers and to apply a non-linear transformation; the resulting data were then input into the classification layer for classification. This structure established a connection between the convolutional structure and the traditional neural network classifier. However, the full connection layer causes parameter redundancy and over-fitting, which weakens the generalization ability of the network and increases model training time.
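The contrast with global average pooling is easy to see in code: averaging each channel map to a single value introduces no trainable weights. A minimal sketch:

```python
import numpy as np

def global_average_pooling(feature_map):
    """Collapse an (H, W, C) feature map to a length-C vector by
    averaging over the spatial dimensions; unlike a full connection
    layer, this adds zero trainable parameters."""
    return feature_map.mean(axis=(0, 1))
```

By comparison, a full connection layer mapping Xception's 10 × 10 × 2048 pre-pooling output to, say, 1024 units would add over 200 million weights.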
Therefore, we replaced the full connection layer with the global average pooling layer [
XGBoost
Convolutional neural networks have an efficient feature extraction mechanism that can classify images effectively and has demonstrated its reliability. However, traditional classifiers cannot make full use of image information in image classification applications, and it is difficult for them to establish reliable recognition features. Therefore, traditional classifiers still leave considerable room for improvement in classification efficiency and accuracy.
The XGBoost (eXtreme Gradient Boosting) classifier is placed on the top layer of the model, connected to the global average pooling layer, to classify the images. In fact, XGBoost [
where
The introduced objective function was shown in
where
Transfer learning uses existing source-domain knowledge to solve different but similar domain problems. The purpose is to transfer existing knowledge to help solve learning problems in the target domain with fewer training samples [
The ImageNet dataset has more than 14 million labeled images covering more than 20,000 categories. It is recognized as a "standard" data set for verifying algorithms in deep learning image research and is currently widely used in computer vision. Therefore, we combined transfer learning and deep learning to build a weed recognition network model. As shown in
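A minimal Keras sketch of this transfer step: the ImageNet-pretrained Xception backbone is loaded without its full connection top, and `pooling="avg"` supplies the global average pooling layer. The 299 × 299 input shape is Xception's canonical size and is an assumption here, not taken from the paper.

```python
from tensorflow.keras.applications import Xception

def build_feature_extractor(weights="imagenet", input_shape=(299, 299, 3)):
    """Xception backbone with the full connection top removed and a
    global average pooling layer in its place; weights="imagenet"
    transfers the pretrained convolutional weights, which can then be
    fine-tuned on the weed images."""
    return Xception(weights=weights, include_top=False,
                    pooling="avg", input_shape=input_shape)
```

Each image is mapped to a 2048-dimensional feature vector, which is what feeds the top-layer classifier.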
In our validation experiment, Python 3.6 with the Keras library and a TensorFlow backend was used to build and train the TX-XGBoost network. The environment was an Ubuntu 16.04 system; the server was equipped with an Intel Xeon E5-2680 v4 CPU, a Samsung SSD 860 512 GB hard disk, and Kingston DDR4 64 GB memory. An NVIDIA TITAN Xp GPU with 12 GB video memory was used to increase training speed.
The model training process had 3 steps, as follows:
We resized each image to a fixed-size 229 × 229 RGB image to reduce the dimensionality of the training data while keeping the details of the input image. The batch size was 16, meaning that each batch used 16 samples to participate in training and update the weight parameters once. The learning rate was 0.0001, and the number of epochs was set to 60. The modified CNN was trained on the weed images using the Adam optimizer. After the model converged, it automatically extracted features from the input sample data, which served as the input of the XGBoost classifier. The completion of XGBoost training marked the end of training for the entire model.
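The reported hyperparameters can be collected into a single compile step; the categorical cross-entropy loss is an assumed choice, as the paper does not name the loss function.

```python
from tensorflow.keras import optimizers

# Training hyperparameters reported in the paper
LEARNING_RATE = 1e-4
BATCH_SIZE = 16
EPOCHS = 60

def compile_for_training(model):
    """Compile the modified CNN with the Adam optimizer at the reported
    learning rate; the loss function is an assumption."""
    model.compile(optimizer=optimizers.Adam(learning_rate=LEARNING_RATE),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `model.fit(..., batch_size=BATCH_SIZE, epochs=EPOCHS)` then trains the network; after convergence, the pooled features are passed on to the XGBoost classifier.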
The confusion matrix was presented to evaluate the performance of the classifier in weed identification, as shown in
We used the statistical parameters of the confusion matrix (precision, recall, and F1-score) to further analyze the performance of the classifier. Precision is the proportion of positive predictions that are correct; the higher the value (closer to 1), the more accurate the classifier. Recall is the proportion of actual positive samples that are correctly predicted. Precision and recall reveal the accuracy and the completeness of weed recognition, respectively. The F1-score is the harmonic mean of precision and recall; since precision and recall affect each other, the F1-score can be used to balance them. The definitions of these three evaluation indexes are shown in
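The three indexes can be computed directly from the confusion matrix; a minimal sketch, assuming rows are true classes and columns are predicted classes:

```python
import numpy as np

def prf_from_confusion(cm):
    """Per-class precision, recall, and F1-score from a confusion matrix
    whose rows are true classes and columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correct predictions per class
    precision = tp / cm.sum(axis=0)  # TP / (TP + FP)
    recall = tp / cm.sum(axis=1)     # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```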
Class | Precision | Recall | F1-score |
---|---|---|---|
Black-grass | 1 | 0.9744 | 0.987 |
Charlock | 1 | 0.986 | 0.993 |
Cleavers | 0.9885 | 1 | 0.9942 |
Chickweed | 1 | 1 | 1 |
Wheat | 1 | 1 | 1 |
Fat Hen | 1 | 1 | 1 |
Loose Silky-bent | 0.9873 | 0.9936 | 0.9931 |
Maize | 1 | 0.9927 | 0.9963 |
Scentless Mayweed | 0.9968 | 1 | 0.9984 |
Shepherd’s Purse | 1 | 1 | 1 |
Cranesbill | 1 | 1 | 1 |
Sugar beet | 0.9967 | 1 | 0.9983 |
The TX-XGBoost model was evaluated on the weed dataset and compared with the VGG16, ResNet50, and Xception models under the same experimental environment. Meanwhile, the accuracy-fitting curves of the four methods over the epochs are shown in
Datasets | Model | Conv-layers | Epoch | Accuracy/% | Epoch Time/s |
---|---|---|---|---|---|
Training set | VGG16 | 13 | 60 | 96.12 | 248 |
Training set | ResNet50 | 49 | 60 | 97.92 | 245 |
Training set | Xception | 36 | 60 | 96.36 | 217 |
Training set | TX-XGBoost | 36 | 60 | 99.78 | 208 |
Test set | VGG16 | 13 | 60 | 95.81 | 248 |
Test set | ResNet50 | 49 | 60 | 97.17 | 245 |
Test set | Xception | 36 | 60 | 96.85 | 217 |
Test set | TX-XGBoost | 36 | 60 | 99.63 | 208 |
To further demonstrate the performance of TX-XGBoost over 60 epochs, we compared the proposed model with the other models in terms of running time. As shown in
In this study, we proposed a novel weed recognition model using a depthwise separable convolutional neural network based on transfer learning. We introduced transfer learning into model training, which solved the network over-fitting problem caused by insufficient training data and improved the generalization ability of the model. Meanwhile, to address the large number of parameters and time-consuming training of conventional models, the lightweight Xception model was adopted and improved. A global average pooling layer was used to replace the full connection layer in the Xception model, and an XGBoost classifier was added to the top layer of the model to reduce model complexity and improve recognition accuracy. The numerical experiments demonstrated that our proposed model has advantages in accuracy and time savings over the traditional Xception, ResNet50, and VGG16 models. In the future, we will collect experimental field data and establish a weed data set to ensure the recognition accuracy of the proposed method in practical applications. Furthermore, the proposed method has broad applications in precision agriculture, such as weeding robots, pesticide spraying, and other fields.
The authors wish to thank R. He, who helped with the experiments.