#These authors contributed to the work equally and should be regarded as co-first authors
Since 2019, coronavirus disease 2019 (COVID-19) has been spreading rapidly worldwide, posing a serious threat to the global economy and human health. It is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a single-stranded RNA virus of the genus Betacoronavirus. The virus is highly infectious and relies on the angiotensin-converting enzyme 2 (ACE2) receptor to enter cells. As the number of confirmed COVID-19 cases grows, the difficulty of diagnosis caused by the global shortage of healthcare resources becomes increasingly apparent. Deep learning-based computer-aided diagnosis models with high generalisability can effectively alleviate this pressure. Hyperparameter tuning is essential in training such models and significantly impacts their final performance and training speed. However, traditional hyperparameter tuning methods are usually time-consuming and unstable. To solve this issue, we introduce Particle Swarm Optimisation (PSO) to build a PSO-guided Self-Tuning Convolutional Neural Network (PSTCNN), allowing the model to tune its hyperparameters automatically and thus reducing human involvement. Moreover, the optimisation algorithm selects combinations of hyperparameters in a targeted manner, stably reaching solutions closer to the global optimum. Experimentally, the PSTCNN achieves excellent results, with a sensitivity of 93.65% ± 1.86%, a specificity of 94.32% ± 2.07%, a precision of 94.30% ± 2.04%, an accuracy of 93.99% ± 1.78%, an F1-score of 93.97% ± 1.78%, a Matthews Correlation Coefficient of 87.99% ± 3.56%, and a Fowlkes-Mallows Index of 93.97% ± 1.78%. Our experiments demonstrate that, compared with traditional methods, hyperparameter tuning using an optimisation algorithm is faster and more effective.
COVID-19 is a new global epidemic characterised by high infectivity and variability, posing a significant threat to human life and the global economy (
Symptoms of COVID-19 vary significantly among individuals, ranging from a continuous cough, fever, and loss of taste to, in severe cases, death, which makes COVID-19 even more dangerous. In general, the spread of infectious diseases can be stopped by isolating the source of infection and blocking the transmission route. However, because of (1) the mutation of the new coronavirus, (2) the increase in the number of asymptomatic patients, and (3) the difficulties of diagnosis, isolating the source of COVID-19 infection becomes a challenge (
The most widely used test for COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) (
COVID-19 is an infectious emergency respiratory disease, and the disease state is usually reflected in the lungs. X-ray and CT, the most common medical imaging techniques in modern medicine, can play a vital role in diagnosing COVID-19 by revealing changes in the lung tissue through chest images. As a relatively new technology, computed tomography (CT) imaging allows multi-layered photography of the target area to form a three-dimensional image, providing multi-angle image data with a higher resolution than X-ray images (
As one of the most influential frontier technologies of the 20th century, artificial intelligence (AI) significantly impacts human work and life (
This paper uses the particle swarm optimisation (PSO) algorithm to optimise three hyperparameters of a CNN, and gradient-based localisation to generate visual explanations. The proposed approach uses PSO to perform automatic hyperparameter tuning, reducing the dependence of model construction on machine learning experts. In addition, PSO purposefully and more consistently finds hyperparameters close to the optimal solution. Our method achieves promising performance in COVID-19 diagnosis.
Our contributions to this study are as follows: (i) we experimentally demonstrated the possibility of using optimisation algorithms for hyperparameter tuning, (ii) we proposed a high-performance COVID-19 diagnostic method with a visual explanation based on CT chest images, and (iii) we further explored the potential of AI-based techniques in medical image processing. In the rest of the paper,
The experiment used a publicly available chest CT image slice dataset proposed by
Class | Ratio | No. of samples |
---|---|---|
Positive (COVID-19) | 0.5 | 148 |
Negative (Healthy Control) | 0.5 | 148 |
The dataset used for the study was small, so we introduced 10-Fold Cross Validation to train and evaluate the model. Specifically, we divided the dataset into ten groups and performed ten runs, with a different group selected as the test set and all other groups as the training set for each run. The model was thoroughly trained and evaluated in each run to obtain performance metric values. The final performance of the model was obtained by calculating the mean and standard deviation (MSD) of all ten sets of performance metric values. This approach allowed for efficient use of the samples in the dataset and effectively avoided overfitting.
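The splitting scheme described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function names are ours, and the sample count of 296 follows the dataset table (148 positive + 148 negative).

```python
import numpy as np

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds groups.

    Each group serves as the test set exactly once; the remaining
    groups form the training set for that run.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

def msd(scores):
    """Summarise per-fold scores as mean and standard deviation (MSD)."""
    return float(np.mean(scores)), float(np.std(scores))
```

Because every sample appears in exactly one test fold, the ten test sets together cover the whole dataset, which is what makes the scheme sample-efficient.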
CNN is one of the trendiest research directions in computer-aided diagnosis tasks. It comprises different network layers, e.g., the input layer, convolutional layer, activation layer, pooling layer, and output layer. By combining these network layers, CNNs can effectively avoid the loss of spatial information that occurs when images are flattened into vectors, as well as the inefficient training and network overfitting caused by the large parameter counts of fully connected neural networks when processing large images (
Many existing deep learning-based CAD methods are based on CNNs.
Our method uses a five-layer neural network, including three convolution layers and two fully connected layers.
Convolutional layers are based on the concept of convolution. They are the core components of CNNs and generate most of the computation in the network. A convolutional layer contains a number of learnable filters (kernels); a kernel is usually a square with a smaller width and height than the input image but the same depth. Convolution is the process by which the kernel slides over the image: from left to right along the width direction, then repeating row by row from the top of the image until reaching the bottom edge. At each step of the sliding process, each pixel in the region of the input image covered by the kernel is multiplied by the value at the corresponding location of the flipped kernel, and all the products are summed to aggregate the information. The region covered by the kernel in the input image is called the sliding window. The step size of the sliding,
In this process, the filters scan across the input image, so every neighbourhood of the image is processed by the same filter. This sharing weight feature reduces the number of parameters, thus reducing computational costs and preventing overfitting due to too many parameters. Assume the size of kernel
The output
where
where
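Since the original equations are not reproduced here, the sliding-window aggregation and the standard relationship between input size, kernel size, stride, and padding can be sketched as follows. This is an illustrative NumPy sketch using the conventional formula floor((W − K + 2P)/S) + 1, not the authors' code.

```python
import numpy as np

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Standard convolution output size: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum
    elementwise products in each window -- the aggregation step above."""
    k = np.flip(kernel)  # true convolution flips the kernel first
    kh, kw = k.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out
```

For example, a 7×7 input convolved with a 3×3 kernel at stride 1 and no padding yields a 5×5 output, matching the formula.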
Activation functions play an essential role in CNNs: they bring a non-linear factor to the neural network, enhancing its expressive power and thus improving the final classification performance. For high-dimensional input data such as images, it would be computationally prohibitive for each neuron in a network layer to connect fully to all the neurons in the previous layer (
Applying
where
Padding regulates the size of the output of a network layer. During convolution, pixels at the edges of the input image are never located at the centre of the kernel; these pixels are used far less than the pixels at the centre of the image, resulting in a significant loss of information at the image boundaries. In addition, the output of the convolution often does not maintain the same size as the input image, and different kernel sizes cause different degrees of shrinkage. Padding is designed to address these issues and has two modes: VALID and SAME.
In the VALID mode, padding does not perform any operations, and convolution performs a basic convolution operation, where the output image size is smaller than the input image. In the SAME mode, additional pixels are padded around the input image according to the kernel size (the padding value is usually 0). It allows the kernel to extend beyond the original image boundaries, thus allowing the output image to remain the same size as the original and avoiding losing information from the edges of the input image.
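The two modes can be illustrated by their output-size conventions. The sketch below follows the TensorFlow-style convention for SAME padding; this is an assumption for illustration, since the framework-specific details are not given above.

```python
import math

def padded_output_size(n, k, stride, mode):
    """Spatial output size of a convolution under VALID vs SAME padding
    (TF-style convention: SAME keeps size n at stride 1)."""
    if mode == "VALID":
        # No padding: the kernel never leaves the original image.
        return math.floor((n - k) / stride) + 1
    if mode == "SAME":
        # Zeros are added around the border so the kernel may extend
        # beyond the original boundary; at stride 1 the size is preserved.
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding mode: {mode}")
```

For a 5×5 input and a 3×3 kernel at stride 1, VALID shrinks the output to 3×3, while SAME keeps it at 5×5.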
Training algorithms determine how neural networks learn. The essence of deep learning is to (1) take the loss function as the objective function, (2) input a large amount of data, (3) calculate the value of the objective function, and then (4) adjust and optimise the learnable parameters of the model so that its output approaches the true value as closely as possible. This process is governed by the optimiser algorithm. The choice of optimiser plays a vital role in deep learning training, as it affects both the speed of convergence and the final performance of the model.
Adam (
Adam can adaptively adjust the learning rate of the model parameter updates, naturally implements a step-annealing process, and makes the parameter updates invariant to gradient scaling. In addition, Adam has the advantages of high computational efficiency and low memory consumption (
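A single Adam update can be sketched as follows. This is an illustrative implementation of the standard Adam rule; notably, the learning rate `lr` and the two decay coefficients `beta1` and `beta2` are exactly the three hyperparameters that PSO tunes in this work.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), bias correction, then a scaled step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Because the step is divided by the root of the second moment, the effective step size is roughly `lr` regardless of the gradient's scale, which is the scaling invariance mentioned above.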
Grad-CAM is a method for visualising the basis of CNN decisions. It uses a heat map to mark how much attention the neural network pays to different regions when classifying data, thus highlighting the regions on which the neural network focuses its attention. In detail, Grad-CAM uses the global average of the gradients to calculate the weight
where
where
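The Grad-CAM computation described above can be sketched as follows. This is an illustrative NumPy version with our own variable names: the channel weights are the global averages of the gradients, and the heat map is the ReLU of the weighted sum of the feature maps.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from conv feature maps A^k of shape (K, H, W)
    and the gradients of the class score w.r.t. those maps (K, H, W)."""
    # Global-average-pool the gradients: one weight alpha_k per channel.
    alphas = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted combination of the feature maps over the channel axis.
    cam = np.tensordot(alphas, feature_maps, axes=1)  # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class.
    return np.maximum(cam, 0.0)
```

The resulting map is typically upsampled to the input-image size and overlaid as a heat map to highlight the regions the network attends to.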
Almost all deep learning optimisers have customisable hyperparameters that can significantly influence the performance of the optimiser, hence the speed of convergence and the ultimate performance of the model. Many studies try to optimise hyperparameter tuning for better COVID-19 diagnostics.
However, the hyperparameters can have an infinite number of possible combinations. Therefore, hyperparameter tuning becomes a challenging, time-consuming and computationally expensive stage in training deep learning models. There is no straightforward and efficient way to accurately and quickly find the optimal hyperparameters. The most commonly used hyperparameter selection methods are Random Search and Grid Search.
Most hyperparameter tuning methods require many aimless attempts to find the most suitable hyperparameters, which are inefficient and ineffective, resulting in high consumption of time and computational resources. Optimisation algorithms could be a solution to this issue. This paper aims to discover the possibility of PSO in hyperparameter tuning.
Search method | Advantage | Disadvantage |
---|---|---|
Grid Search | The grid search process iterates through all possible combinations of hyperparameters without missing possible combinations as far as time and computational resources allow. | Grid search is difficult to traverse a nearly infinite number of all hyperparameter combinations, requiring huge time and computational costs, and is inefficient. |
Random Search | The random search process is stochastic and can cover a much larger range of hyperparameter combinations. | The search process is so random that, even though it can be optimised to avoid repeated evaluation of the same hyperparameter combination, the vast number of possible combinations makes it difficult to guarantee that the optimal hyperparameters are found. |
Particle swarm optimisation (PSO; Ours) | PSO uses many particles to purposefully find the optimal combination of hyperparameters, making the search process more efficient. | Updating particle positions requires training neural networks based on different combinations of hyperparameters, which requires high computational resources and time costs. |
To discover more possibilities of optimisation algorithm-based hyperparameter tuning, our experiments employed the PSO algorithm to adjust the three hyperparameters of the Adam optimiser. PSO (
In PSO, a swarm represents the collection of all particles. Each particle
The three essential hyperparameters of the Adam training algorithm are tuned to obtain the highest performance of the model (as shown in
At the beginning of PSO, all particles are initialised with random positions (hyperparameter configurations) and random velocity vectors. In each iteration, the CNN is trained under each particle's hyperparameter configuration, and the mean squared error (MSE) of every configuration is calculated as its fitness value (
where
where
Repeating the above steps, all particles keep moving to hyperparameter configurations that can obtain better performance until the algorithm reaches the maximum number of iterations.
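The particle-update loop described above can be sketched as follows. This is a generic PSO sketch minimising an arbitrary fitness function; in the paper, the fitness of a position is the CNN's MSE under the corresponding (learning rate, beta1, beta2) configuration, which is far more expensive to evaluate than the toy function used here.

```python
import numpy as np

def pso(fitness, bounds, n_particles=10, n_iters=30,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: particles move under inertia (w), attraction to
    their personal best (c1), and to the swarm's global best (c2)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))  # positions
    v = np.zeros_like(x)                                  # velocities
    pbest = x.copy()                                      # personal bests
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                  # global best
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)        # keep particles inside bounds
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()
```

Each particle's velocity blends its momentum with pulls toward its own best position and the swarm's best, which is what makes the search purposeful rather than random.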
In our research, a number of experiments were performed in incremental steps to find the neural network structure with the best performance. The final network is a five-layer CNN consisting of three convolution layers and two fully connected layers. A softmax function is introduced to classify the extracted features. All trainable parameters are updated by the Adam optimiser during training. In the tuning process, the neural networks are trained with hyperparameter combinations generated or updated by PSO to measure the performance of each combination. Finally, the output of PSO is the hyperparameter combination of the final model with the best performance.
The following values were used for various performance indicators to evaluate the model’s performance comprehensively. (1) True Positive (TP) represents the number of positive samples that the model correctly predicts as the positive class, (2) True Negative (TN) represents the number of negative samples that the model correctly predicts as the negative class, (3) False Positive (FP) represents the number of negative samples that the model incorrectly predicted as the positive class, and (4) False Negative (FN) represents the number of positive samples that the model incorrectly predicts as the negative class.
The seven performance metrics used to assess the model are accuracy, precision, sensitivity, specificity, F1-score, Matthews correlation coefficient, and the Fowlkes-Mallows index, which evaluate the model from a variety of perspectives to ensure a comprehensive assessment.
Accuracy is one of the most common metrics used to evaluate the performance of a model. The core idea is to calculate the number of correct predictions as a percentage of the total number of samples, covering both positive and negative samples. The formula for accuracy is shown in
Although accuracy can assess the overall performance of a model with a dataset containing both positive and negative samples, it is not rigorous for an unbalanced dataset. For example, suppose there is a dataset with 90% positive samples and only 10% negative samples. A model that predicts all samples as positive can achieve 90% accuracy, but this is not an accurate representation of the model’s performance. In short, accuracy is not an effective way to evaluate the predictive performance of a model for the positive and negative samples separately, so three performance metrics, precision, sensitivity, and specificity, were introduced to provide a more comprehensive evaluation of the model.
Precision evaluates model performance primarily based on the prediction results, calculating the number of samples correctly predicted as positive as a proportion of all samples predicted as positive, i.e., the probability that a sample predicted as positive is truly positive. The precision of a model increases as the FP decreases, which can be a guide for finding the lowest FP. The formula of precision is shown in
Specificity is a metric that measures the performance of a model for negative samples. Unlike precision, specificity is calculated based on the true labels of the data rather than the model's predictions: the performance of the model is assessed by calculating the number of negative samples correctly predicted as negative as a proportion of the total number of negative samples in the dataset. The specificity of a model also increases as the FP decreases and is calculated as shown in
Sensitivity is also calculated based on the true labels of the data, except that it evaluates the model's performance by calculating the proportion of positive samples correctly predicted as positive out of the total number of positive samples in the dataset. The phenomenon that sensitivity reflects is somewhat the opposite of that of precision: it increases as FN decreases, which can guide finding the lowest FN. Its formula is shown in
F1-score is the performance metric that considers both precision and sensitivity, tries to find the balance between these two metrics, and simultaneously makes them the highest possible values. The calculation of the F1-score is as shown in
Matthews Correlation Coefficient (MCC) compensates for the fact that the four elements TP, TN, FP, and FN are not all fully considered in the abovementioned metrics. It treats the true and predicted values as two variables and calculates their correlation coefficient: the higher the correlation between the true and predicted values, the better the model performance. An MCC value of 1 indicates a perfect positive correlation between the predicted and true results.
The Fowlkes-Mallows Index (FMI) is a performance metric that considers both precision and sensitivity. Its calculation is as shown in
Area Under the Curve (AUC) is an important performance metric for evaluating binary classification models, derived by calculating the area under the receiver operating characteristic (ROC) curve. The vertical axis is the true positive rate (TPR), equal to sensitivity, and the horizontal axis is the false positive rate (FPR), equal to 1 − specificity. The ROC curves were obtained by traversing
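The seven metrics above can all be computed from the four confusion-matrix counts. The sketch below is an illustrative implementation of the standard formulas, not the authors' evaluation code.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the seven metrics used above from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prc = tp / (tp + fp)                 # precision
    sen = tp / (tp + fn)                 # sensitivity / recall / TPR
    spc = tn / (tn + fp)                 # specificity; FPR = 1 - spc
    f1 = 2 * prc * sen / (prc + sen)     # harmonic mean of prc and sen
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(prc * sen)           # geometric mean of prc and sen
    return {"Acc": acc, "Prc": prc, "Sen": sen, "Spc": spc,
            "F1": f1, "MCC": mcc, "FMI": fmi}
```

Note that the F1-score is the harmonic mean of precision and sensitivity, while the FMI is their geometric mean, which is why the two values are so close in the balanced results reported below.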
To minimise the bias in the model performance evaluation results, we used the 10-fold cross-validation to test the model’s performance under the optimal hyperparameter configuration obtained by the optimisation algorithm. As a result, we obtained a sensitivity (Sen) of 93.65% ± 1.86%, a specificity (Spc) of 94.32% ± 2.07%, a precision (Prc) of 94.30% ± 2.04%, an accuracy (Acc) of 93.99% ± 1.78%, an F1-score (F1) of 93.97% ± 1.78%, Matthews correlation coefficient (MCC) of 87.99% ± 3.56%, and Fowlkes-Mallows index (FMI) of 93.97% ± 1.78%. The 10-runs 10-fold cross-validation results are shown in
Sen | Spc | Prc | Acc | F1 | MCC | FMI | |
---|---|---|---|---|---|---|---|
R1 | 89.86 | 91.89 | 91.72 | 90.88 | 90.78 | 81.77 | 90.79 |
R2 | 91.89 | 95.27 | 95.10 | 93.58 | 93.47 | 87.21 | 93.48 |
R3 | 94.59 | 93.92 | 93.96 | 94.26 | 94.28 | 88.52 | 94.28 |
R4 | 93.92 | 93.92 | 93.92 | 93.92 | 93.92 | 87.84 | 93.92 |
R5 | 92.57 | 91.89 | 91.95 | 92.23 | 92.26 | 84.46 | 92.26 |
R6 | 93.92 | 96.62 | 96.53 | 95.27 | 95.21 | 90.57 | 95.21 |
R7 | 93.92 | 91.89 | 92.05 | 92.91 | 92.98 | 85.83 | 92.98 |
R8 | 93.92 | ||||||
R9 | 93.92 | 93.92 | 93.92 | 93.92 | 93.92 | 87.84 | 93.92 |
R10 | 95.27 | 96.62 | 96.58 | 95.95 | 95.92 | 95.92 | |
MSD | 93.65 ± 1.86 | 94.32 ± 2.07 | 94.30 ± 2.04 | 93.99 ± 1.78 | 93.97 ± 1.78 | 87.99 ± 3.56 | 93.97 ± 1.78 |
Note: R1, R2, …, R10: Run 1, Run 2, …, Run 10; MSD: Mean ± Standard Deviation; Sen: Sensitivity; Spc: Specificity; Prc: Precision; Acc: Accuracy; F1: F1-score; Mcc: Matthews correlation coefficient; FMI: Fowlkes-Mallows index.
Model | Sen | Spc | Prc | Acc | F1 | MCC | FMI |
---|---|---|---|---|---|---|---|
WRE+ 3SBBO ( | 86.40 ± 3.00 | 85.81 ± 3.14 | 86.14 ± 3.03 | 86.12 ± 2.75 | 86.16 ± 2.77 | 72.42 ± 5.55 | 86.15 ± 2.76 |
GoogLeNet-COD-A ( | 90.54 ± 2.16 | 82.77 ± 2.65 | 84.07 ± 1.93 | 86.66 ± 1.14 | 87.15 ± 1.06 | 73.59 ± 2.25 | 87.23 ± 1.07 |
GLCM-SVM ( | 72.03 ± 2.94 | 78.04 ± 1.72 | 76.66 ± 1.07 | 75.03 ± 1.12 | 74.24 ± 1.57 | 50.20 ± 2.17 | 74.29 ± 1.53 |
6L-CNN ( | 89.47 ± 1.50 | 87.47 ± 2.11 | 87.75 ± 1.76 | 88.47 ± 1.05 | 88.59 ± 0.99 | 76.98 ± 2.09 | 88.60 ± 0.99 |
SIDCAN ( | 92.86 ± 1.59 | 93.64 ± 2.09 | 93.36 ± 2.02 | 93.26 ± 0.74 | 93.08 ± 0.71 | 86.55 ± 1.49 | 93.10 ± 0.72 |
PZM-DSSAE ( | 92.06 ± 1.54 | 92.56 ± 1.06 | 92.53 ± 1.03 | 92.31 ± 1.08 | 92.29 ± 1.10 | 84.64 ± 2.15 | 92.29 ± 1.10 |
GLCM-ELM ( | 74.19 ± 2.74 | 77.81 ± 2.03 | 77.01 ± 1.29 | 76.00 ± 0.98 | 75.54 ± 1.31 | 52.08 ± 1.95 | 75.57 ± 1.28 |
WE-Jaya ( | 73.31 ± 2.26 | 78.11 ± 1.92 | 77.03 ± 1.35 | 75.71 ± 1.04 | 75.10 ± 1.23 | 51.51 ± 2.07 | 75.14 ± 1.22 |
GLCM+SNN ( | 74.66 ± 1.87 | 78.00 ± 1.29 | 77.24 ± 1.15 | 76.33 ± 1.18 | 75.92 ± 1.31 | 52.70 ± 2.34 | 75.93 ± 1.30 |
WE-SAJ ( | 85.47 ± 1.84 | 87.23 ± 1.67 | 87.03 ± 1.34 | 86.35 ± 0.70 | 86.23 ± 0.77 | 72.75 ± 1.38 | 86.24 ± 0.76 |
PSTCNN (Ours) | 93.65 ± 1.86 | 94.32 ± 2.07 | 94.30 ± 2.04 | 93.99 ± 1.78 | 93.97 ± 1.78 | 87.99 ± 3.56 | 93.97 ± 1.78 |
Note: Sen: Sensitivity; Spc: Specificity; Prc: Precision; Acc: Accuracy; F1: F1-score; Mcc: Matthews correlation coefficient; FMI: Fowlkes-Mallows index.
Since 2019, the global economy and human health have been under continuous threat from COVID-19. In addition, the COVID-19 pandemic has highlighted the global shortage of healthcare resources. AI-aided diagnosis is one of the most viable options for alleviating this problem. This paper explores the possibility of further automation on top of traditional AI techniques, and confirms this possibility by automating hyperparameter tuning with an optimisation algorithm and by the excellent performance achieved with this method.
However, our method was only experimentally tested on three hyperparameters of the neural network training process, i.e., the learning rate, the coefficient controlling the exponential decay rate of the past gradients, and the coefficient controlling the exponential decay rate of the squared past gradients. Several other hyperparameters that we did not cover in this report could also be tuned, e.g., the number and type of network layers. This means that the proposed method is not yet sufficient as a final solution for self-tuning neural networks. In addition, the proposed method relies on the purposeful movement of particles in the particle swarm optimisation algorithm to move the hyperparameters towards the optimal solution in each iteration. However, each particle's movement is updated according to the direction of the best solutions found so far; if those solutions do not lie on the path between a particle and the global optimum, the search may deviate from the global optimum and fall into a local optimum.
In future research, we will further explore the applicability of other optimisation algorithms to this task and attempt to avoid locally optimal solutions when obtaining combinations of hyperparameters while covering more hyperparameters in the experiment. We believe that reducing the dependence of model training on machine learning experts can effectively accelerate the generalisation of AI technologies across different domains. AI techniques will therefore become an essential tool in human life and industries in the near future.