Computers, Materials & Continua
VISPNN: VGG-Inspired Stochastic Pooling Neural Network
1School of Mathematics and Actuarial Science, University of Leicester, Leicester, LE1 7RH, United Kingdom
2Department of Computer Science, HITEC University Taxila, Taxila, 47080, Pakistan
3School of Informatics, University of Leicester, Leicester, LE1 7RH, United Kingdom
*Corresponding Author: Yu-Dong Zhang. Email: email@example.com
Received: 14 April 2021; Accepted: 28 June 2021
Abstract: Aim: Alcoholism is a disease in which a patient becomes dependent on or addicted to alcohol. This paper aims to design a novel artificial intelligence model that can recognize alcoholism more accurately. Methods: We propose the VGG-inspired stochastic pooling neural network (VISPNN) model based on three components: (i) a VGG-inspired mainstay network; (ii) the stochastic pooling technique, which aims to outperform traditional max pooling and average pooling; and (iii) an improved 20-way data augmentation (Gaussian noise, salt-and-pepper noise, speckle noise, Poisson noise, horizontal shear, vertical shear, rotation, Gamma correction, random translation, and scaling on both the raw image and its horizontally mirrored image). In addition, two networks (Net-I and Net-II) are proposed for the ablation studies. Net-I is derived from VISPNN by replacing stochastic pooling with ordinary max pooling. Net-II removes the 20-way data augmentation. Results: Ten runs of 10-fold cross-validation show that our VISPNN model achieves a sensitivity of 97.98 ± 1.32, a specificity of 97.80 ± 1.35, a precision of 97.78 ± 1.35, an accuracy of 97.89 ± 1.11, an F1 score of 97.87 ± 1.12, an MCC of 95.79 ± 2.22, an FMI of 97.88 ± 1.12, and an AUC of 0.9849. Conclusion: The performance of our VISPNN model is better than that of the two internal networks (Net-I and Net-II) and ten state-of-the-art alcoholism recognition methods.
Keywords: Deep learning; alcoholism; multiple-way data augmentation; VGG; convolutional neural network; stochastic pooling
Alcoholism (also known as alcohol use disorder) is a disease in which a patient becomes dependent on or addicted to alcohol. Patients with alcoholism continue to drink even when drinking causes negative consequences for themselves. The difference between alcoholism and alcohol abuse is that people who abuse alcohol are not physically dependent on it. Excessive alcohol use damages all organ systems, but it mainly affects the heart, liver, brain, pancreas, and immune system. Besides, alcoholism may cause schizophrenia, bipolar disorder, depression, irregular heartbeat, Wernicke–Korsakoff syndrome, etc.
Long-term alcoholism affects the brain: the volumes of white matter and grey matter in patients are smaller than those of age-matched controls. The brain shrinkage and lateral ventricle enlargement caused by alcoholism can be observed via magnetic resonance imaging (MRI) scanning, which helps doctors diagnose alcoholism.
Nevertheless, in the current clinical routine, the diagnosis of alcoholism is mainly based on manual observation of brain images, which is labor-intensive and onerous. In particular, the slight brain shrinkage in the early prodromal stage of alcoholism, associated with mild symptoms, is easily neglected by radiologists and clinicians, which may impose costs on the patient and his/her family. In light of these limitations, accurate and fast diagnostic artificial intelligence (AI) models that recognize alcoholism would benefit patients, families, and society.
In the past, various AI models have been proposed to recognize alcoholism. Fig. 1 shows the relationship between AI, machine learning (ML), and deep learning (DL): ML is a subfield of AI, while DL is a subfield of ML. Hou brought forward a novel algorithm, Predator-prey Adaptive-inertia Chaotic Particle Swarm Optimization (PACPSO), and applied it to identify alcoholism. Jenitta et al. presented a local mesh vector co-occurrence pattern (LMVCoP) feature for assisting diagnosis, which can be applied to alcoholism identification. Han proposed a three-segmented encoded Jaya (3SJ) method to identify alcoholism and found that 3SJ gave better performance than other optimization methods, such as the multi-objective genetic algorithm, plain Jaya, bee colony optimization, and particle swarm optimization. Lima presented a method utilizing the Haar wavelet transform (HWT) to extract features from brain scans of patients; it achieved an accuracy of 81.57 ± 2.18% on their dataset. Afterward, Macdonald presented a wavelet energy logistic regression (WELR) model and used 5-fold stratified cross-validation to verify its performance. Qian proposed a computer vision-based technique utilizing cat swarm optimization (CSO), which mimics the behaviour of cats; in their experiment, CSO showed better performance than four other bio-inspired algorithms. Chen presented a model combining a support vector machine (SVM) with a genetic algorithm (GA), abbreviated SVMGA; the authors stated their model was effective in alcoholism detection, reaching an average accuracy of 88.68 ± 0.30%. Chen presented an AI model based on a linear regression classifier (LRC) for alcoholism detection.
Recently, DL techniques have been successfully applied to alcoholism recognition. Lv created a 7-layer convolutional neural network (CNN); their experiments showed that stochastic pooling (SP) provided better performance than other pooling methods. Nevertheless, their CNN structure was simple, so its expressive ability was limited. Xie used an AlexNet transfer learning (ANTL) model; the authors fine-tuned the model and tested five different replacement configurations.
There are some other ML methods based on different data sources. For example, Kamarajan et al. used random forest with Electroencephalogram (EEG) source functional connectivity, neuropsychological functioning, and impulsivity measures to classify alcohol use disorder. Quaglieri et al. harnessed functional MRI (fMRI) to analyze the brain networks underlying executive functions in gambling disorder and alcohol use disorder. Many other scanning modalities and protocols may help identify alcoholism; however, we focus on MRI in this study due to its high-resolution three-dimensional imaging capability.
The motivation of this paper is to propose a novel model, the VGG-inspired stochastic pooling neural network (VISPNN), for alcoholism recognition, with the expectation of obtaining better performance than existing alcoholism identification approaches. The contributions of our study lie in the following four aspects.
(a) A VGG-inspired network is used as a mainstay network.
(b) Stochastic pooling is used to replace traditional max pooling.
(c) Improved multiple-way data augmentation is proposed to avoid overfitting.
(d) Our model is shown to deliver better performance than state-of-the-art methods.
2.1 Introduction of VGG
Tab. 1 displays the list of abbreviations used in this study for ease of reading. First, we introduce VGG, which stands for the Visual Geometry Group, an academic group at the University of Oxford. The VGG team presented two renowned networks, VGG-16 and VGG-19, which are included as library packages in prevalent programming platforms, e.g., Python and MATLAB.
Fig. 2 displays the structure of VGG-16, which is composed of five convolutional blocks (CBs) and three fully connected layers (FCLs). The size of the input of VGG-16 is 224×224×3. After the 1st CB, the output is 112×112×64. The components of the 1st CB are shown in Tab. 2. The 1st CB can be written as
2 × Conv(64, 3×3) + MP(2×2),
which means "2 repetitions of 64 kernels with sizes of 3×3, followed by a max-pooling with a size of 2×2."
Note that (i) the activation-function (rectified linear unit, ReLU) layers are omitted in the subsequent text by default, and (ii) stride and padding are not reported because they can be calculated easily. The five CBs are itemized in Tab. 3, and the feature map (FM) of the output is displayed in the final column. After the five CBs, the FM is compressed from the 224×224×3 input to a vector of 25,088 neurons (7×7×512). Three FCLs with 4096, 4096, and 1000 neurons are appended at the end.
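As a check on this arithmetic, the compression can be reproduced in a few lines of plain Python (a sketch assuming the standard VGG-16 channel configuration of 64, 128, 256, 512, 512, which Tab. 3 is assumed to mirror):

```python
# Spatial size halves after each conv block's 2x2 max-pooling;
# channel counts follow the standard VGG-16 configuration.
channels = [64, 128, 256, 512, 512]  # output channels of CB1..CB5

def vgg16_feature_size(side=224):
    """Return the flattened feature-map length after the five CBs."""
    for _ in channels:
        side //= 2  # each 2x2 max-pooling halves height and width
    return side * side * channels[-1]

print(vgg16_feature_size())  # 7 * 7 * 512 = 25088
```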
2.2 Improvement 1: Stochastic Pooling
Within standard CNNs, pooling is an essential component that follows a convolution layer (see Layer 5 in Tab. 2) to reduce the size of FMs. Traditional pooling methods are either max-pooling (MP) or average pooling (AP).
Suppose we have an FM that can be separated into blocks, each of size k×k. Now let us focus on the block B(i, j) at the i-th row and j-th column of blocks, whose entries are denoted a(m, n), where (m, n) ranges over the k×k pixels within the block (see Fig. 3).
The strided convolution (SC) traverses the input activation map with a stride equal to the block size k, so its output is a learned weighted sum over the block:
S_SC(i, j) = Σ_{(m,n)∈B(i,j)} w(m, n) · a(m, n),
where w denotes the learned kernel weights.
The l2-norm pooling (L2P), average pooling (AP), and max pooling (MP) output the l2-norm value, average value, and maximum value within the block B(i, j), respectively:
S_L2P(i, j) = sqrt( Σ_{(m,n)∈B(i,j)} a(m, n)^2 ),
S_AP(i, j) = (1/k^2) Σ_{(m,n)∈B(i,j)} a(m, n),
S_MP(i, j) = max_{(m,n)∈B(i,j)} a(m, n).
Nevertheless, AP outputs the average, downscaling the greatest values, where the essential features may lie. In contrast, MP stores the largest value but aggravates the overfitting problem.
Alternatively, stochastic pooling (SP) provides a remedy for the defects of AP and MP. Successful applications include SP in the stochastic resonance model, COVID-19 recognition, etc. SP is a four-step procedure. Step 1 generates the probability map (PM) for each entry in the block B(i, j):
p(m, n) = a(m, n) / Σ_{(m',n')∈B(i,j)} a(m', n'),
where p(m, n) stands for the PM value at pixel (m, n).
Step 2 creates a random location vector (RLV) l that obeys the discrete probability distribution (DPD)
P[l = (m, n)] = p(m, n),
where P represents the probability.
In Step 3, a sample location (m*, n*) is drawn from the RLV l.
In Step 4, the output of SP is the activation at location (m*, n*), namely
S_SP(i, j) = a(m*, n*).
Fig. 4 gives an example in which we can observe the block B(1, 2) at the 1st row and 2nd column, outlined by a red rectangle. The L2P calculates the l2-norm value of this block. The AP and MP output pooling values of 5.37 and 9.5, respectively. In contrast, SP generates the PM, selects the top-left pixel in the block, and outputs the SP value of 6.4.
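The four steps above can be sketched in plain Python (a minimal illustration of stochastic pooling on a single flattened block; the example entries are hypothetical, not the exact values of Fig. 4):

```python
import random

def stochastic_pool(block, rng=random):
    """Stochastic pooling over one k-by-k block, given as a flat list.
    Step 1: probability map; Steps 2-3: sample a location; Step 4: output."""
    total = sum(block)
    if total == 0:                      # degenerate all-zero block
        return 0.0
    probs = [a / total for a in block]  # Step 1: p = a / sum(a)
    r, acc = rng.random(), 0.0
    for a, p in zip(block, probs):      # Steps 2-3: draw a location from the DPD
        acc += p
        if r < acc:
            return a                    # Step 4: output the activation there
    return block[-1]                    # guard against floating-point round-off

# Hypothetical 2x2 block: MP would return 9.5, AP the mean,
# SP one of the four entries at random, weighted by magnitude.
block = [6.4, 9.5, 4.0, 1.58]
print(stochastic_pool(block) in block)  # True
```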
2.3 Improvement 2: VGG-Inspired Stochastic Pooling Neural Network
A novel VGG-inspired mainstay network is proposed. Tab. 4 shows the structure of the proposed 10-layer VGG-inspired mainstay network; the notation is defined in Tabs. 2 and 3. The variables W and b represent the weights and biases of each FCL, respectively. The NWL column in Tab. 4 represents the number of weighted layers; the total number of weighted layers in this VGG-inspired mainstay network is 10. We can observe that after four CBs, the output FM is flattened to a vector of 7,744 neurons, which is then sent through an FCL with 100 neurons, finally outputting two neurons indicating alcoholism or healthy.
The structure of our VGG-inspired mainstay network is displayed in Fig. 5a. If we replace the max-pooling in each CB with stochastic pooling, we can get the proposed VGG-Inspired Stochastic Pooling Neural Network (VISPNN), as shown in Fig. 5b.
2.4 Improvement 3: 20-Way Data Augmentation
The dataset in this study was reported in Ref. , comprising 188 alcoholic brain images and 191 non-alcoholic brain images. Fig. 6 shows two samples of our dataset.
The relatively small dataset may breed overfitting. To avoid this, data augmentation (DA) is a powerful tool because it generates fake images from the training set. Cheng presented a 16-way DA, in which 8 DA techniques were applied to both the raw image and its horizontally mirrored image. Multiple-way DA shows better performance than traditional DA. This study builds on the 16-way DA of Cheng; furthermore, we add two new DA techniques applied to both images. One is speckle noise (SN), which alters the image as
J(x, y) = I(x, y) + n(x, y) · I(x, y),
where n is uniformly distributed random noise. The mean and variance of n are set to 0 and 0.05, respectively.
The other new DA is Poisson noise (PN). In electronics, PN originates from the discrete nature of electric charge. Instead of adding artificial noise to the raw image, we generate PN from the raw image itself. The pixel values of raw images are stored in uint8 format; if a pixel has the value 20, then the corresponding pixel of the PN-altered image is drawn from a Poisson distribution with a mean of 20. Mathematically,
b(x, y) = P[a(x, y)], a_PN(x, y) = min[b(x, y), 255],
where P(λ) represents a Poisson distribution with mean λ, (x, y) are the coordinates, and b is a temporary variable. The min function ensures the final output stays within the uint8 range [0, 255]. A colour natural image is used to illustrate how these two noises alter the image, as shown in Fig. 7.
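The two noise models can be sketched with NumPy as follows (a minimal illustration; the function names and the toy image are ours, not taken from the paper's code, and the uniform-noise half-width is chosen so that its variance equals 0.05):

```python
import numpy as np

rng = np.random.default_rng(0)

def speckle_noise(img, variance=0.05):
    """Multiplicative speckle: J = I + n*I with n uniform, mean 0, var 0.05."""
    # A uniform distribution on [-L, L] has variance L^2 / 3.
    half_width = np.sqrt(3.0 * variance)
    n = rng.uniform(-half_width, half_width, size=img.shape)
    return np.clip(img + n * img, 0.0, 1.0)

def poisson_noise(img_uint8):
    """Each output pixel is drawn from Poisson(mean = raw pixel value)."""
    noisy = rng.poisson(img_uint8.astype(np.float64))
    return np.minimum(noisy, 255).astype(np.uint8)  # keep within uint8 range

img = rng.random((4, 4))                 # toy grey-scale image in [0, 1]
img8 = (img * 255).astype(np.uint8)
print(speckle_noise(img).shape, poisson_noise(img8).dtype)
```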
With the help of SN and PN, we propose a novel 20-way DA. First, D different DA methods, as shown in Fig. 8, are applied to the raw image x. Let O_d denote the d-th DA operation; we have augmented datasets on the raw image:
x_d = O_d(x), d = 1, …, D.
Let n stand for the number of new images generated by each DA method, i.e., |x_d| = n.
Second, the horizontally mirrored image is generated as
x^m = h(x),
where h stands for the horizontal mirror function.
Third, all D DA methods are performed on the mirrored image x^m, generating new datasets:
x^m_d = O_d(x^m), d = 1, …, D.
Fourth, the raw image x, the mirrored image x^m, all D-way results of the raw image, and all D-way results of the mirrored image are combined. The final dataset generated from x is defined as
X = c(x, x^m, x_1, …, x_D, x^m_1, …, x^m_D),
where c stands for the concatenation function. Let the augmentation factor A stand for the number of images in X; we obtain
A = 2 × D × n + 2.
Algorithm 1 summarizes the pseudocode of the proposed 20-way DA method. In this study, we set D = 10, i.e., a 20-way DA. We also set n = 30, thus A = 2 × 10 × 30 + 2 = 602, indicating each raw training image generates 602 images, including the raw image itself.
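The augmentation bookkeeping can be verified with a one-line helper (a sketch assuming the factor formula A = 2·D·n + 2, i.e., D methods × n images on both the raw and mirrored image, plus those two images themselves, consistent with the reported 602):

```python
def augmentation_factor(num_da_methods, images_per_method):
    """A = 2*D*n + 2: D DA methods producing n images each, applied to both
    the raw image and its mirror, plus the raw and mirrored images themselves."""
    return 2 * num_da_methods * images_per_method + 2

# With D = 10 DA methods and n = 30 images per method,
# each raw training image yields 602 images.
print(augmentation_factor(10, 30))  # 602
```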
K-fold cross-validation is employed. The whole dataset is divided into K folds F_1, …, F_K. At the k-th trial, k = 1, …, K, the k-th fold F_k is picked as the test set, and the remaining K − 1 folds are chosen as the training set (Fig. 9). We let K = 10, namely 10-fold cross-validation. Furthermore, we run the 10-fold cross-validation ten times.
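The fold-splitting procedure can be sketched in plain Python (a generic illustration of k-fold partitioning; the paper's actual splitting code is not available, and the shuffling seed is arbitrary):

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_splits(n_samples, k=10):
    """Yield (train, test) index lists: at the i-th trial the i-th fold
    is the test set and the remaining k-1 folds form the training set."""
    folds = k_fold_indices(n_samples, k)
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 379 images (188 alcoholic + 191 non-alcoholic) split into 10 folds.
splits = list(cv_splits(379, 10))
print(len(splits), len(splits[0][0]) + len(splits[0][1]))  # 10 379
```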
Seven measures are used based on the confusion matrix over the 10 runs of 10-fold cross-validation. Let C stand for the confusion matrix
C = [TP FN; FP TN],
where TP means true positive, FN false negative, FP false positive, and TN true negative. Sensitivity, specificity, precision, and accuracy are already familiar to readers, so we do not repeat their definitions. Besides, we use the F1 score, the Matthews correlation coefficient (MCC), and the Fowlkes–Mallows index (FMI).
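The seven measures can be computed from the four confusion-matrix entries with their standard formulas (a sketch; the toy counts below are illustrative, not the paper's results):

```python
import math

def metrics(tp, fn, fp, tn):
    """Seven measures from a 2x2 confusion matrix (standard definitions)."""
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    pre = tp / (tp + fp)                    # precision
    acc = (tp + tn) / (tp + fn + fp + tn)   # accuracy
    f1 = 2 * pre * sen / (pre + sen)        # F1 score
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews corr. coef.
    fmi = math.sqrt(pre * sen)              # Fowlkes-Mallows index
    return sen, spe, pre, acc, f1, mcc, fmi

# Toy confusion matrix, not the paper's numbers.
print([round(v, 4) for v in metrics(tp=95, fn=5, fp=4, tn=96)])
```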
The receiver operating characteristic (ROC) curve provides a graphical means of evaluating AI models. The ROC curve is produced by plotting the true positive rate against the false positive rate at various threshold levels; the area under the curve (AUC) is then calculated from the ROC.
3.1 20-Way DA Results
Fig. 10 shows the 10-way DA results of the raw image, chosen as Fig. 6a. Due to the page limit, we do not display the horizontally mirrored image or its corresponding 10-way DA results.
3.2 Statistical Results of Proposed Method
Tab. 5 itemizes the statistical results (10 runs of 10-fold cross-validation) of the proposed VISPNN method. The mean and standard deviation (MSD) over the ten runs are displayed in the last row. Our model reaches a sensitivity of 97.98 ± 1.32, a specificity of 97.80 ± 1.35, and an accuracy of 97.89 ± 1.11.
3.3 Ablation Studies
An ablation study is an experiment that removes a submodule of a network to understand that submodule's contribution better. Two ablation studies are carried out: (i) Net-I: we remove stochastic pooling from the proposed VISPNN model and replace it with max-pooling. (ii) Net-II: we remove the multiple-way data augmentation. The comparison of our VISPNN model with Net-I and Net-II is shown in Tab. 6.
Fig. 11 displays the ROC curve comparison of the proposed VISPNN model with Net-I and Net-II. The blue patches correspond to the lower and upper confidence bounds. The AUC of Net-I is 0.9683, compared to 0.9849 for VISPNN; therefore, we observe that stochastic pooling indeed increases performance. Meanwhile, the AUC of Net-II is 0.9602, a significant drop from VISPNN's 0.9849. This drop reflects that multiple-way data augmentation significantly increases prediction performance through its ability to generate diverse "fake" training images.
3.4 Comparison to Other Alcoholism Recognition Methods
The proposed VISPNN model is compared with ten state-of-the-art alcoholism recognition methods: PACPSO, LMVCoP, WRE, HWT, WELR, CSO, SVMGA, LRC, CNNSP, and ANTL. The comparison results are itemized in Tab. 7, with the cognate bar plot shown in Fig. 12, which ranks all the methods in order of MCC.
We can observe from Fig. 12 that the proposed VISPNN model beats all ten state-of-the-art methods in terms of all seven measures. The reason is threefold. First, the VGG-inspired mainstay network benefits from mimicking the structure of VGG-16. Second, stochastic pooling makes our model more robust than max pooling does. Third, the improved 20-way data augmentation generates diverse fake training images, making our model more resistant to overfitting.
To identify alcoholism more efficiently, we propose the VISPNN model based on a VGG-inspired mainstay network, the stochastic pooling technique, and an improved 20-way data augmentation. The results show that our model gains a sensitivity of 97.98 ± 1.32, a specificity of 97.80 ± 1.35, an accuracy of 97.89 ± 1.11, and an AUC of 0.9849. Its performance is better than that of ten state-of-the-art alcoholism recognition methods.
The limitations of this study are that the model has not gone through strict clinical verification and that the dataset is relatively small. Hence, we will try to collect more brain images of both alcoholic and healthy subjects. Meanwhile, we shall deploy our VISPNN model to a cloud server and invite clinicians and radiologists to use our web app, gathering their feedback to improve our model further.
Funding Statement: This paper is partially supported by the Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Hope Foundation for Cancer Research, UK (RM60G0680); British Heart Foundation Accelerator Award, UK; Sino-UK Industrial Fund, UK (RP202G0289); and Global Challenges Research Fund (GCRF), UK (P202PF11). In addition, we acknowledge Dr. Hemil Patel and Dr. Qinghua Zhou for their help with English correction.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.