Brain tumors are among the most common tumors and carry high mortality. Early detection is of great significance for the treatment and rehabilitation of patients. The single-channel convolution and pooling layers of a traditional convolutional neural network (CNN) can only capture limited local context information, and most current methods focus only on classifying tumors as benign or malignant; multi-class classification of brain tumors remains uncommon. To address these shortcomings, and considering that convolution kernels of different sizes can extract more comprehensive features, we propose a multi-size convolutional kernel module. Considering that combining average-pooling with max-pooling lets the high-dimensional information extracted by the two structures complement each other, we also propose a dual-channel pooling layer. Integrating the two structures into ResNet50, we obtain an improved ResNet50 CNN for multi-class brain tumor classification. We applied data augmentation before training to avoid overfitting and used five-fold cross-validation in the experiments. The experimental results show that the proposed network can effectively classify healthy brain, meningioma, diffuse astrocytoma, anaplastic oligodendroglioma and glioblastoma.
Brain tumors are among the deadliest cancers. Early detection plays an important role in the treatment and rehabilitation of patients. However, manual brain tumor diagnosis by physicians is less accurate and time-consuming [
Medical image analysis has become a recent research hotspot. Machine learning and, in particular, deep learning approaches play core roles in computer-assisted brain image analysis, segmentation, registration, and tumor tissue classification [
The detection of brain tumors has received extensive research attention, and various detection methods have been proposed in recent years. Abd-Ellah [
Sajjad [
To solve the problem of vanishing or exploding gradients that arises when a network is simply made deeper, He et al. proposed a deep convolutional residual network, ResNet50, in 2015 [
Finally, based on the 2048 extracted features, the tensor was flattened into a 4096-dimensional vector and then passed to softmax for classification. Taking the tumor classification in this paper as an example, let the final number of classes K be five, and give a training set
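The final classification stage described above can be sketched as follows: a flattened 4096-dimensional feature vector is mapped by a dense layer to K = 5 logits, and softmax converts them to class probabilities. The random weights here are placeholders, not the trained model.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)        # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal(4096)        # flattened feature vector
W = rng.standard_normal((5, 4096)) * 0.01   # dense weights for K = 5 classes (placeholder)
b = np.zeros(5)

probs = softmax(W @ features + b)           # probabilities over the 5 classes
print(probs.shape, float(probs.sum()))      # (5,) 1.0
```

The probabilities sum to one, and the predicted class is the index of the largest entry.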
Due to the excellent structure of the residual network, ResNet50 was selected for the experiment and two improvements were made to it. The improved network structure is shown in
In the literature [
Suppose that the input characteristic dimension of the multi-size convolutional kernel module is
When used to combine channels, the concat() function only increases the dimension that indexes the image's feature channels; the feature information within each channel is not summed. The add() function, by contrast, does not increase the channel dimension, but the amount of information within each channel increases, which clearly benefits the final image classification.
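The difference between concat() and add() described above can be shown directly on two feature maps of shape (H, W, C); the shapes below are illustrative.

```python
import numpy as np

a = np.ones((4, 4, 8))          # feature map 1, 8 channels
b = np.full((4, 4, 8), 2.0)     # feature map 2, 8 channels

concat = np.concatenate([a, b], axis=-1)  # channel dimension grows: 8 -> 16
added = a + b                             # channel dimension fixed, values summed

print(concat.shape)   # (4, 4, 16)
print(added.shape)    # (4, 4, 8)
print(added[0, 0, 0]) # 3.0
```

Concatenation preserves both sources side by side, while addition fuses them elementwise into the same number of channels.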
In the ResNet50 network, pooling layers appear only after Stage 1 and Stage 5: max-pooling and average-pooling, respectively. A pooling layer can be understood as extracting salient features once more, and repeated stacking then yields higher-level features. This operation reduces the amount of data to process while retaining useful information, achieving feature dimensionality reduction, data compression and over-fitting reduction. Average-pooling reduces the error caused by the increased estimation variance due to the limited neighborhood size and preserves most of the background information. Max-pooling, which focuses on texture information, counteracts the bias in the estimated mean caused by convolution parameter error [
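A minimal NumPy sketch of the dual-channel pooling idea: 2 × 2 max-pooling and 2 × 2 average-pooling are applied to the same feature map and concatenated along the channel axis, so detail (max) and background (average) information complement each other. The exact layer layout in the paper may differ; this only illustrates the principle.

```python
import numpy as np

def pool2x2(x, reducer):
    # Non-overlapping 2x2 pooling over an (H, W, C) feature map.
    h, w, c = x.shape
    x = x[:h - h % 2, :w - w % 2]                # crop to even size
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return reducer(blocks, axis=(1, 3))

def dual_channel_pool(x):
    # Concatenate max- and average-pooled maps: channel count doubles.
    return np.concatenate([pool2x2(x, np.max), pool2x2(x, np.mean)], axis=-1)

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
y = dual_channel_pool(x)
print(y.shape)  # (2, 2, 4)
```

Doubling the channel count in this way is also consistent with 2048 features becoming a 4096-dimensional vector before softmax.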
The dataset used in this paper consisted of healthy brain images and diseased brain images, totaling 555 MRI brain images. The healthy brain images were from
Rich data is key to building deep learning models effectively. The most common data augmentation methods add noise or apply geometric transformations to images, which helps prevent over-fitting of the network model. Therefore, this paper applies a series of extensions to the images in the dataset:
(1) Unify the image size: all images are resized to 224 × 224. (2) Image rotation and flipping: a CNN learns different features when images are presented in different orientations. Therefore, this paper uses rotation, horizontal-flip and vertical-flip augmentation, so that one image in the training set can be expanded into multiple images with the same essential content. The rotation and flipping operations are described as follows: suppose that, in the coordinate system, the coordinate of a point in the image is
If the height of the image is
(3) Add salt-and-pepper noise: salt-and-pepper noise appears as random black and white pixels scattered over the image, similar to the effect of adding Gaussian noise, but with a lower level of information distortion. Therefore, this paper uses salt-and-pepper noise as a data augmentation method.
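The augmentation steps listed above can be sketched in NumPy. Resizing is omitted (NumPy alone has no resampling), and the 5% noise ratio is an illustrative assumption, not a value given in the paper.

```python
import numpy as np

def augment(img, rng):
    # One input image expands into several variants with the same content.
    out = [img,
           np.rot90(img),    # 90-degree rotation
           np.fliplr(img),   # horizontal flip
           np.flipud(img)]   # vertical flip
    # Salt-and-pepper noise: set a random 5% of pixels to black or white.
    noisy = img.copy()
    mask = rng.random(img.shape) < 0.05
    noisy[mask] = rng.choice([0, 255], size=mask.sum())
    out.append(noisy)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224)).astype(np.uint8)
batch = augment(img, rng)
print(len(batch))  # 5
```

In practice the paper performs these transformations on the fly via Keras's ImageDataGenerator rather than by hand.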
The improved network model based on ResNet50 proposed in this paper was built with the Keras library in Python; the training batch size was set to 32 and the number of epochs to 100. During the augmentation experiment, the training images were traversed by ImageDataGenerator, which applied a random transformation to each image. The initial learning rate was set to 0.001 and the weight decay to 0.0002, and the improved network model was trained with stochastic gradient descent (SGD). The expression of SGD is as follows:
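The SGD expression is not reproduced in the extracted text. A standard formulation consistent with the settings above, with learning rate $\eta = 0.001$ and weight-decay coefficient $\lambda = 0.0002$ (the exact variant used is an assumption), is:

```latex
\theta_{t+1} = \theta_t - \eta \left( \nabla_{\theta} L\!\left(\theta_t; x_i, y_i\right) + \lambda\, \theta_t \right)
```

where $\theta_t$ are the network parameters at step $t$, $(x_i, y_i)$ is the current mini-batch sample, and $L$ is the loss function.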
After training was completed, the model was evaluated using K-fold cross-validation. Cross-validation is a model-selection method that estimates the generalization error of the model [
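The five-fold scheme can be sketched as a partition of the sample indices: each fold serves once as the validation set while the remaining folds train the model. The shuffling seed is an illustrative choice.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    # Shuffle the indices once, split into k folds, and yield
    # (train, validation) index pairs, one per fold.
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# 555 images, as in the dataset described above: 444 train / 111 validation per fold.
for train, val in kfold_indices(555, k=5):
    print(len(train), len(val))  # 444 111
```

Averaging the per-fold scores gives the cross-validated estimate of generalization performance.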
In order to comprehensively evaluate the performance of the proposed network, the following indicators are used as evaluation criteria for the experimental results.
Accuracy: the ratio of the number of correct predictions to the total number of samples. The calculation formula is as follows:
Precision: the ratio of correctly predicted positive samples to all samples predicted as positive. The calculation formula is as follows:
Recall: the ratio of correctly predicted positive samples to all actual positive samples. The calculation formula is as follows:
F1 score: this index takes into account both the precision and the recall of the classification model, and can be regarded as a harmonic mean of the two. The calculation formula is as follows:
Wherein
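The four metrics listed above follow their standard definitions from the confusion counts TP, FP, FN and TN of one class; the numeric example is illustrative.

```python
def metrics(tp, fp, fn, tn):
    # Standard classification metrics from one class's confusion counts.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, p, r, f1 = metrics(tp=40, fp=5, fn=10, tn=45)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
```

For a multi-class problem such as the five-way classification here, these per-class values are typically averaged across classes.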
In addition, the loss function can be used to estimate the degree of inconsistency between the model's predicted values and the true values; the smaller the loss, the more robust the model. Therefore, this paper uses the cross-entropy loss to judge the performance of the model. For the sample points
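The cross-entropy loss referred to above can be written out in its standard categorical form: the mean over samples of $-\sum_k y_{ik}\log p_{ik}$. This is a minimal sketch with an epsilon for numerical safety, not the Keras implementation.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted probabilities, both (N, K).
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = cross_entropy(y_true, np.array([[1.0, 0.0], [0.0, 1.0]]))  # near 0
uniform = cross_entropy(y_true, np.full((2, 2), 0.5))                # ln 2
print(perfect, uniform)
```

Confident correct predictions drive the loss toward zero, while uninformative uniform predictions give ln K per sample.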
In order to achieve the best classification effect, the size of the convolution kernel on the third branch of the multi-size convolutional kernel module was configured and tested in different ways. The experiment was carried out on the first cross-validation scheme, with the 7 × 7 single convolution layer in Stage 1 of ResNet50 replaced by the multi-size convolutional kernel module. Method A: a commonly used 5 × 5 convolutional kernel was appended after the 1 × 1 convolutional kernel. Method B: considering that convolutional kernels of different sizes extract features at different scales, the commonly used 5 × 5 kernel after the 1 × 1 kernel in the third branch was tentatively replaced with a 7 × 7 kernel. Method C: considering that replacing a large convolutional kernel with two small ones reduces the number of parameters and speeds up computation, two 3 × 3 kernels were tentatively used in place of the 5 × 5 kernel after the 1 × 1 kernel in the third branch. The specific parameters and experimental results of the multi-size convolutional kernel module are shown in
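The parameter-count rationale behind Method C can be checked with simple arithmetic: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field with fewer weights. The channel count below is an illustrative assumption, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    # Weights of one k x k convolution layer (biases ignored).
    return k * k * c_in * c_out

c = 64  # illustrative channel count
one_5x5 = conv_params(5, c, c)       # 25 * c * c weights
two_3x3 = 2 * conv_params(3, c, c)   # 18 * c * c weights
print(one_5x5, two_3x3, two_3x3 < one_5x5)  # 102400 73728 True
```

The stacked version saves 28% of the weights while adding an extra nonlinearity between the two 3 × 3 layers.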
As indicated by
In order to determine the optimal way of adding the dual-channel pooling layer, the following experiments were carried out with the multi-size convolution kernel module fixed to the combination of Method A in
The ResNet50 network and the improved network were each trained on the two training sets, before and after data augmentation, with the test-set images used for validation. The training curves are shown in
Method | Without data augmentation | With data augmentation |
---|---|---|
ResNet50 | 85.71% | 95.37% |
Improved network | 88.62% | 98.59% |
In order to verify the significance of the improvement to ResNet50, the improved network was experimentally compared with AlexNet, VGGNet, GoogLeNet and ResNet50. The experimental results are shown in
Network | Precision (%) | Recall (%) | F1Score (%) | Accuracy (%) |
---|---|---|---|---|
AlexNet | 65.36 | 67.24 | 66.25 | 84.27 |
VGGNet | 70.43 | 73.33 | 71.82 | 87.09 |
GoogLeNet | 76.10 | 77.74 | 76.72 | 89.04 |
ResNet50 | 90.30 | 91.85 | 91.04 | 95.37 |
Proposed network | 96.97 | 96.77 | 96.87 | 98.59 |
The AlexNet model [
From the above results, because the improved network is closely based on ResNet50, both networks effectively mitigate the vanishing-gradient problem through residual modules, which greatly reduces the number of parameters and increases feature utilization. Both models outperform AlexNet, VGGNet and GoogLeNet in precision, recall, F1 score and accuracy; among the baselines, AlexNet performs worst. The improved network achieves the highest results on all four indices, with an average classification precision of 96.97% and an average classification accuracy of 98.59%. These results show that the improved network extracts more features and effectively combines the advantages of max-pooling and average-pooling, so that the background information and detail information extracted by the two complement each other, yielding a relatively high final recognition rate.
For the multi-class classification of brain tumors, an improved convolutional neural network based on ResNet50 is proposed in this paper. The network uses a multi-size convolutional kernel module and a dual-channel pooling layer to improve on the single 7 × 7 convolution layer and single pooling layer in ResNet50. The multi-size convolutional kernel module consists of convolution branches with kernels of different sizes, which extract rich feature information from the input image; the dual-channel pooling layer combines the advantages of max-pooling and average-pooling, so that the extracted detail information and background information complement each other. Accuracy, precision, recall and F1 score were used to evaluate network performance, and five-fold cross-validation was used to analyze the classification effect. Finally, the average classification accuracy over healthy brain, meningioma, diffuse astrocytoma, anaplastic oligodendroglioma and glioblastoma reached 98.59%.
I would like to sincerely thank those who generously helped me during this research: the instructors who gave pertinent suggestions on revision, and the classmates who helped collect the dataset. With their support and encouragement, I was able to complete this paper.