Shui-Hua Wang¹, Muhammad Attique Khan², Ziquan Zhu¹, Yu-Dong Zhang¹,*
¹ School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK
² Department of Computer Science, HITEC University Taxila, Taxila, Pakistan
* Corresponding Author: Yu-Dong Zhang. Email:
Computer Systems Science and Engineering 2023, 45(1), 21-34. https://doi.org/10.32604/csse.2023.031330
Received 14 April 2022; Accepted 17 May 2022; Issue published 16 August 2022
Community-acquired pneumonia (CAP) is pneumonia that develops outside hospitals, clinics, and infirmaries. CAP may affect people of any age, but it is more prevalent in the very young and the elderly, who may need hospital treatment if they develop it. Chest computed tomography (CCT) is a crucial tool that helps radiologists and physicians diagnose CAP. Recently, automatic diagnosis models based on artificial intelligence (AI) have achieved promising performance and attracted researchers' attention. For example, Heckerling, et al. employed a genetic algorithm for neural networks to predict CAP. This approach is abbreviated as GAN. Afterward, Liu, et al. proposed a computer-aided detection (CADe) model to detect lung nodules in CCT slices. Strehlitz, et al. presented several prediction systems based on support vector machines (SVMs) together with Monte Carlo cross-validation. Dong, et al. proposed an improved quantum neural network (IQNN) for pneumonia image recognition. Ishimaru, et al. proposed a decision tree (DT) model to predict the atypical pathogens of CAP. Zhou introduced the cat swarm optimization (CSO) method to recognize CAP. Wang, et al. proposed an advanced deep residual dense network for the image super-resolution problem. Wang, et al. proposed CFW-Net for X-ray-based COVID-19 detection.
However, the above methods still have room for improvement. Their recognition performance is limited: the reported accuracies are at most, or only barely above, 91.0%. We analyzed these models and believe the bottleneck lies in their training algorithms. After comparing recent global optimization algorithms, we find that particle swarm optimization (PSO) is among the most successful, compared with other optimization algorithms such as artificial bee colony and the bat algorithm. Hence, we adopt the framework in Zhou but replace CSO with an improved PSO. Specifically, we introduce a two-dimensional wavelet-entropy (2d-WE) layer and an improved PSO method, adaptive chaotic PSO (ACP), and combine them with a feed-forward neural network. The final combined model is named the WE-layer ACP-based network (WACPN). The experiments show the effectiveness of the proposed WACPN model. In all, we make three contributions:
(a) The 2d-WE layer serves as the feature extractor.
(b) ACP is used to train the neural network, yielding a robust classifier.
(c) The proposed WACPN is shown experimentally to give better results than six state-of-the-art models.
The dataset is described in Zhou, where we have 305 CAP images and 298 healthy control (HC) images. The detailed demographic information can be found in Ref. . Assume the raw CCT dataset is signified as , within which each image is signified as , and the total number of images of both classes is , we get . The size of each image can be obtained as:
where connotes the width and height of the image set and outputs the size of x. Here . Figs. 1a and 1b depict the preprocessing schematic, which aims to grayscale the raw images, enhance their contrast, cut the margins and text, and resize the images.
First, the color CCT image set is converted to grayscale by retaining the luminance channel. The grayscale CCT image set is symbolized as .
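Second, we apply histogram stretching (HS) to all images to enhance the contrast. Take the i-th grayscale image as an example, written here as $d(i)$ with pixel values $d(i \mid w, h)$. Its image-wise minimum and maximum grayscale values $d^{\min}(i)$ and $d^{\max}(i)$ are calculated as:
$$d^{\min}(i) = \min_{w}\min_{h}\, d(i \mid w, h), \qquad d^{\max}(i) = \max_{w}\max_{h}\, d(i \mid w, h),$$
where $w$ and $h$ are temporary variables signifying the indices along the width and height of the image $d(i)$, respectively. The HSed image $d^{HS}(i)$ can be determined as:
$$d^{HS}(i) = \frac{d(i) - d^{\min}(i)}{d^{\max}(i) - d^{\min}(i)}.$$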
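The histogram-stretching step can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `histogram_stretch`, the nested-list image representation, and the choice to map the output to the range [0, 1] are assumptions made for the example.

```python
def histogram_stretch(image):
    """Linearly rescale pixel values so the image minimum maps to 0.0
    and the image maximum maps to 1.0 (assumed output range)."""
    flat = [pixel for row in image for pixel in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # A constant image has no contrast to stretch; avoid dividing by zero.
        return [[0.0 for _ in row] for row in image]
    span = highest - lowest
    return [[(pixel - lowest) / span for pixel in row] for row in image]


# Hypothetical 2x2 grayscale image used only for illustration.
toy_image = [[50, 100], [150, 200]]
stretched = histogram_stretch(toy_image)
```

The minimum and maximum are taken per image, matching the image-wise definition in the text, so a dark scan and a bright scan end up using the same value range.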
Third, margin and text cropping (MTC) is applied to remove (a) the checkup bed in the bottom zone, (b) the privacy-related text in the margin or corner zones, and (c) the ruler adjacent to the right-side and bottom zones. The MTCed image set is obtained by cutting $g_l$, $g_r$, $g_t$, and $g_b$ pixels from the left, right, top, and bottom of every image, respectively. Writing the image size before cropping as $W \times H$ and the size after cropping as $W' \times H'$, a straightforward calculation gives
$$W' = W - g_l - g_r, \qquad H' = H - g_t - g_b.$$
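The cropping arithmetic above can be checked with a small sketch. The helper name `crop_margins`, the nested-list image, and the margin values are hypothetical and chosen only to make the size relation visible. They are not the margins used in the study.

```python
def crop_margins(image, left, right, top, bottom):
    """Cut the given number of pixels from each side of a row-major image."""
    height, width = len(image), len(image[0])
    if left + right >= width or top + bottom >= height:
        raise ValueError("margins would remove the whole image")
    return [row[left:width - right] for row in image[top:height - bottom]]


# Hypothetical 6-row by 8-column image whose pixel value encodes (row, column).
toy_image = [[r * 10 + c for c in range(8)] for r in range(6)]
cropped = crop_margins(toy_image, left=1, right=2, top=1, bottom=1)
cropped_height, cropped_width = len(cropped), len(cropped[0])
```

Here the width shrinks from 8 to 8 - 1 - 2 = 5 and the height from 6 to 6 - 1 - 1 = 4, which is the same subtraction described in the text.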
Lastly, each image in is resized to the extent of , acquiring the resized image set as , where signifies the resizing function.
Fig. 1c shows the extent of every raw image in is , and that of the final preprocessed image in is reduced to . In addition, the value of data-compression ratio (DCR) is obtained as . The value of space-saving ratio (SSR) is calculated as . Fig. 2 shows two examples of the preprocessed image set. We use 10-fold cross-validation in our experiment.
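The two ratios mentioned above follow the usual definitions: the data-compression ratio is the raw size divided by the reduced size, and the space-saving ratio is one minus the reciprocal of that value. The sketch below assumes those standard definitions. The image shapes are hypothetical placeholders, not the sizes reported in the paper.

```python
import math


def data_compression_ratio(raw_shape, final_shape):
    """Number of raw values divided by the number of values after preprocessing."""
    return math.prod(raw_shape) / math.prod(final_shape)


def space_saving_ratio(raw_shape, final_shape):
    """Fraction of storage saved by preprocessing."""
    return 1.0 - math.prod(final_shape) / math.prod(raw_shape)


# Hypothetical shapes: a 3-channel raw image and a single-channel resized image.
raw_shape = (600, 600, 3)
final_shape = (200, 200)
dcr = data_compression_ratio(raw_shape, final_shape)
ssr = space_saving_ratio(raw_shape, final_shape)
```

Grayscaling contributes a factor of three to the DCR in this toy case, and the resize contributes the rest.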
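Tab. 1 enumerates all abbreviations and their associated meanings. The advantage of the wavelet transform (WT) is that it retains both time/spatial and frequency information of the given signal/image. In practice, however, the discrete wavelet transform (DWT) is chosen to convert the raw signal into the wavelet coefficient domain. Suppose the signal $b(t)$ is one-dimensional. First, we define the continuous wavelet transform (CWT) of $b(t)$ as:
$$E(\varepsilon_s, \varepsilon_t) = \int_{-\infty}^{\infty} b(t)\, \psi(t \mid \varepsilon_s, \varepsilon_t)\, dt, \tag{5}$$
in which $E$ stands for the wavelet coefficient and $\psi$ for the mother wavelet. $\psi(t \mid \varepsilon_s, \varepsilon_t)$ is defined as:
$$\psi(t \mid \varepsilon_s, \varepsilon_t) = \frac{1}{\sqrt{\varepsilon_s}}\, \psi\!\left(\frac{t - \varepsilon_t}{\varepsilon_s}\right),$$
where $\varepsilon_s$ signifies the scale factor (SF) and $\varepsilon_t$ the translation factor (TF).
Now, we derive the definition of the DWT from the CWT. Eq. (5) is discretized by substituting $\varepsilon_s$ and $\varepsilon_t$ with two discrete variables (DVs) $c$ and $v$,
$$\varepsilon_s = 2^{c}, \qquad \varepsilon_t = 2^{c} v,$$
where $c$ signifies the DV of the SF $\varepsilon_s$, and $v$ the DV of the TF $\varepsilon_t$. Moreover, the original signal $b(t)$ is discretized to $b(q)$, in which $q$ signifies the DV of $t$. In this way, two subbands (SBs) can be calculated. The approximation SB is determined as:
$$A(c, v) = \mathrm{DS}\Bigl[\sum_{q} b(q)\, l_c(q - 2^{c} v)\Bigr],$$
where $l$ signifies the low-pass filter and $\mathrm{DS}$ is the down-sampling operation. The detail SB is determined as:
$$D(c, v) = \mathrm{DS}\Bigl[\sum_{q} b(q)\, h_c(q - 2^{c} v)\Bigr],$$
where $h$ signifies the high-pass filter.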
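To make the filter-then-downsample idea concrete, here is a minimal one-level 1d-DWT sketch. For brevity it uses the two-tap Haar filters rather than the db4 wavelet used in the study, and it fuses filtering and down-sampling into a single pass over sample pairs. The function name `dwt1d_haar` is an assumption for the example.

```python
import math


def dwt1d_haar(signal):
    """One-level Haar DWT: returns (approximation, detail) subbands.

    Each output sample combines one pair of inputs, which is equivalent to
    filtering with the Haar low-/high-pass filters and keeping every second
    sample (down-sampling by two).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical 4-sample signal.
signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = dwt1d_haar(signal)
```

Because the Haar filters are orthonormal, the energy of the signal equals the combined energy of the two subbands, which is a quick sanity check on any DWT implementation.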
Suppose we handle a two-dimensional (2d) image Q; the 2d-DWT is computed by applying the row-wise and column-wise 1d-DWT in succession. Initially, the 2d-DWT operates on the original image Q. Four SBs are then generated, where the subscript i denotes the i-th level of decomposition. Tab. 2 itemizes the description of the four SBs. Note that MDL means the maximum decomposition level.
Assuming signifies a 2D-DWT decomposition operation, we deduce
The subsequent decompositions run as:
where M is the MDL and m the current decomposition level .
The approximation subband of the 1st level is further decomposed into four SBs at the 2nd level. The resulting approximation SB is then decomposed at the 3rd level, and each subsequent approximation SB is decomposed accordingly. Fig. 3 portrays a diagram of a 5-level 2d-DWT, whose pseudocode is given in Algorithm 1. This study chooses an M-level decomposition. The optimal value of M is found via a trial-and-error approach and reported in Section 4.1.
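The recursive decomposition described above can be sketched as follows. This is an illustrative version under stated assumptions: it uses Haar filters instead of db4, stores images as nested lists, and the names `haar_pairs`, `dwt2d_level`, and `dwt2d` are hypothetical. Only the approximation subband is passed on to the next level.

```python
import math


def haar_pairs(values):
    """Haar low-/high-pass outputs for consecutive pairs (a trailing odd sample is dropped)."""
    scale = 1 / math.sqrt(2)
    low = [(values[i] + values[i + 1]) * scale for i in range(0, len(values) - 1, 2)]
    high = [(values[i] - values[i + 1]) * scale for i in range(0, len(values) - 1, 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def split_rows(matrix):
    """Apply the 1d transform to every row; return the low and high halves."""
    lows, highs = [], []
    for row in matrix:
        low, high = haar_pairs(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def dwt2d_level(image):
    """One decomposition level: row-wise 1d-DWT followed by column-wise 1d-DWT."""
    row_low, row_high = split_rows(image)
    ll, lh = (transpose(m) for m in split_rows(transpose(row_low)))
    hl, hh = (transpose(m) for m in split_rows(transpose(row_high)))
    return ll, (lh, hl, hh)


def dwt2d(image, levels):
    """Decompose the approximation subband repeatedly, `levels` times."""
    approx, details = image, []
    for _ in range(levels):
        approx, detail_triplet = dwt2d_level(approx)
        details.append(detail_triplet)
    return approx, details


# Hypothetical 8x8 image and a 2-level decomposition.
image = [[float((r * c) % 7) for c in range(8)] for r in range(8)]
approx, details = dwt2d(image, levels=2)
num_subbands = 1 + 3 * len(details)
```

Each level contributes three detail subbands, and one approximation subband remains at the end, so an M-level decomposition yields 3M + 1 subbands in total.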
The SBs may contain redundant features. Here we use the db4 wavelet. To decrease the number of features, we employ a two-dimensional wavelet entropy (2d-WE) layer. The pseudocode of 2d-WE is illustrated in Algorithm 2. For each SB $s$ among the generated SBs, we treat $s$ as a discrete random variable with $H$ quantization values $s_1, \ldots, s_H$. First, we estimate the corresponding probability mass function (PMF) $a = (a_1, \ldots, a_H)$:
$$a_h = \Pr(s = s_h), \qquad h = 1, \ldots, H,$$
where $\Pr$ signifies the probability function.
Second, the entropy of the PMF is calculated as:
$$Z(s) = -\sum_{h=1}^{H} a_h \log_2 a_h,$$
where $Z$ is the entropy function.
Lastly, the entropy values of all SBs are concatenated to form a feature vector $I$:
$$I = \bigl[Z(s^{(1)}), Z(s^{(2)}), \ldots, Z(s^{(3M+1)})\bigr],$$
where the number of features in $I$ is $3M + 1$, which equals the number of SBs produced by an M-level 2d-DWT.
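A compact sketch of the entropy step follows. It assumes the PMF is estimated by binning each subband's coefficients into a fixed number of equal-width levels and that the entropy uses a base-2 logarithm. The bin count and the names `shannon_entropy` and `wavelet_entropy_features` are assumptions for the example, not details taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=256):
    """Entropy (in bits) of the empirical PMF of `values` after equal-width binning."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return 0.0  # a constant subband carries no information
    width = (highest - lowest) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lowest) / width), bins - 1) for v in values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def wavelet_entropy_features(subbands, bins=256):
    """One entropy value per subband, concatenated into a feature vector."""
    return [
        shannon_entropy([v for row in subband for v in row], bins)
        for subband in subbands
    ]


# Two hypothetical 2x2 subbands: one with two equally likely levels, one constant.
toy_subbands = [[[0.0, 1.0], [0.0, 1.0]], [[5.0, 5.0], [5.0, 5.0]]]
features = wavelet_entropy_features(toy_subbands, bins=2)
```

The feature vector has exactly one entry per subband, so its length is independent of the image resolution. This is what makes the layer an effective dimensionality reducer.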
The features are fed into a feed-forward neural network (FNN), whose inner connections do not form a loop. A one-hidden-layer FNN (OFNN), represented in Fig. 4, is adopted on the strength of the universal approximation theorem. Assume $(I, t)$ stands for a training case: $I = (I_1, \ldots, I_{|I|})$ signifies the input feature vector of dimension $|I|$, $i$ denotes the neuron index at the input layer, and $t = (t_1, \ldots, t_K)$ is the corresponding target label, where $K$ signifies the number of prediction categories and $k$ the node index at the output layer. With $n$ the case index and $N$ the total number of training cases, this study writes the $n$-th training case as $(I(n), t(n))$, $n = 1, \ldots, N$. Training the weights/biases (WBs) of the OFNN is treated as an optimization problem that minimizes the loss between the target $t$ and the actual output $y$. This study chooses the loss as the sum, over the output nodes, of the mean-squared error (MSE):
$$E = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\bigl(y_k(n) - t_k(n)\bigr)^2.$$
Assume $f_O$ is the activation function (AF) in the output layer, and $w^{(2)}_{jk}$ and $b^{(2)}_{k}$ are the WBs of the neurons that connect the hidden layer (HL) to the output layer. The output is then
$$y_k = f_O\Bigl(\sum_{j=1}^{S} w^{(2)}_{jk}\, z_j + b^{(2)}_{k}\Bigr),$$
where $z_j$ signifies the output of the $j$-th neuron in the HL and $S$ the number of neurons in the HL. The description of $z_j$ is
$$z_j = f_H\Bigl(\sum_{i=1}^{|I|} w^{(1)}_{ij}\, I_i + b^{(1)}_{j}\Bigr),$$
where $w^{(1)}_{ij}$ and $b^{(1)}_{j}$ are the WBs of the neurons that connect the input layer with the HL, and $f_H$ is the AF linked to the HL.
Parameter training is therefore an optimization problem in which we search for the optimal WB parametric vector $\theta$. The length of $\theta$ is the number of parameters we need to optimize and is calculated as $|I| \times S + S + S \times K + K$. The training algorithm we choose is adaptive chaotic PSO (ACP).
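Since a swarm optimizer treats the network as a flat parameter vector, it helps to see how such a vector maps onto a one-hidden-layer network and its loss. The sketch below is a hedged illustration: the sigmoid activations, the packing order of the weights, and the names `num_parameters`, `forward`, and `mse_loss` are assumptions, not the authors' code.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def num_parameters(n_in, n_hidden, n_out):
    """Weights and biases of both layers of a one-hidden-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def dense_layer(theta, start, inputs, n_units):
    """Read `n_units` (weights, bias) groups from theta, starting at `start`."""
    outputs, position = [], start
    for _ in range(n_units):
        weights = theta[position:position + len(inputs)]
        bias = theta[position + len(inputs)]
        position += len(inputs) + 1
        outputs.append(sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias))
    return outputs, position


def forward(theta, features, n_hidden, n_out):
    """Network output for one feature vector, given a flat parameter vector."""
    if len(theta) != num_parameters(len(features), n_hidden, n_out):
        raise ValueError("theta has the wrong length for this architecture")
    hidden, position = dense_layer(theta, 0, features, n_hidden)
    outputs, _ = dense_layer(theta, position, hidden, n_out)
    return outputs


def mse_loss(theta, cases, n_hidden, n_out):
    """Squared error summed over output nodes, averaged over training cases."""
    total = 0.0
    for features, target in cases:
        outputs = forward(theta, features, n_hidden, n_out)
        total += sum((y - t) ** 2 for y, t in zip(outputs, target))
    return total / len(cases)


# Hypothetical architecture: 3 inputs, 4 hidden neurons, 2 outputs, all-zero parameters.
theta = [0.0] * num_parameters(3, 4, 2)
loss = mse_loss(theta, [([1.0, 2.0, 3.0], [1.0, 0.0])], n_hidden=4, n_out=2)
```

A particle's position in the swarm is simply one such `theta`, and `mse_loss` plays the role of the fitness function being minimized.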
Recall that two attributes are linked with each particle p in the standard PSO algorithm: the position of the particle (PoP) $x$ and the velocity of the particle (VoP) $v$. In each epoch, the fitness function E is re-calculated for all particles in the swarm. The VoP $v$ is then updated by keeping track of two best positions (BPs).
The first is the BP that particle p has traversed so far. It is dubbed pBest and symbolized as $x_{pB}$. The second is the BP that any neighbor of p has traversed so far. It is the neighborhood best, named nBest and symbolized as $x_{nB}$.
If p takes the entire swarm as its neighborhood, the nBest becomes the global best and is therefore named gBest. In standard PSO, the VoP $v$ of particle p is updated as:
$$v \leftarrow \omega v + c_1 r_1 (x_{pB} - x) + c_2 r_2 (x_{nB} - x),$$
where $\omega$ signifies the inertia weight (IW) controlling the influence of the particle's preceding velocity on its present one, $c_1$ and $c_2$ stand for two positive constants named acceleration coefficients, and $r_1$ and $r_2$ are two random numbers uniformly distributed in the range [0, 1]. $r_1$ and $r_2$ are re-drawn whenever they occur. The PoP $x$ of the particle p is updated as:
$$x \leftarrow x + v\,\Delta t,$$
where $\Delta t$ is the assumed time step, set to 1 for simplicity.
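The standard PSO update for a single particle can be sketched as below. The default values of the inertia weight and the acceleration coefficients are illustrative placeholders, and the function name `pso_step` is hypothetical. The random numbers are redrawn for every dimension, in line with the remark that they are re-calculated whenever they occur.

```python
import random


def pso_step(position, velocity, p_best, n_best,
             inertia=0.7, c1=1.5, c2=1.5, dt=1.0, rng=random):
    """One standard PSO update of a particle's velocity and position."""
    new_position, new_velocity = [], []
    for x, v, pb, nb in zip(position, velocity, p_best, n_best):
        r1, r2 = rng.random(), rng.random()  # fresh draws for each dimension
        v_next = inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (nb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next * dt)
    return new_position, new_velocity


# A particle already sitting on both best positions: only the inertia term acts.
next_position, next_velocity = pso_step(
    position=[1.0, 2.0], velocity=[0.5, -1.0],
    p_best=[1.0, 2.0], n_best=[1.0, 2.0], inertia=0.5,
)
```

In the degenerate case shown, both attraction terms vanish, so the particle simply coasts with its velocity scaled by the inertia weight.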
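The ACP algorithm adopts an adaptive IW factor (AIWF) strategy. It uses an epoch-dependent IW $\omega(k)$ to replace the fixed IW:
$$\omega(k) = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{k_{\max}}\, k, \qquad k \le k_{\max}.$$
Here, $\omega_{\max}$ signifies the maximum IW, $\omega_{\min}$ the minimum IW, $k_{\max}$ the epoch at which the IW reaches its final minimum, and $k$ the present epoch.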
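One common way to realize an adaptive inertia weight with exactly these four quantities is a linear decay from the maximum to the minimum, held constant afterwards. The sketch below assumes that linear form. The exact schedule used in ACP may differ, and the default values are hypothetical.

```python
def adaptive_inertia(epoch, w_max=0.9, w_min=0.4, k_max=100):
    """Inertia weight that decays linearly from w_max to w_min over k_max epochs.

    Assumption: linear decay, then constant at w_min once epoch >= k_max.
    """
    if epoch >= k_max:
        return w_min
    return w_max - (w_max - w_min) * epoch / k_max


schedule = [adaptive_inertia(k) for k in (0, 50, 100, 500)]
```

A large early weight favors exploration of the search space, while the smaller late weight lets particles settle around good solutions.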
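Another improvement in ACP concerns the two random numbers $r_1$ and $r_2$ of the velocity update. In practice, they are created by pseudo-random number generators (RNGs), which cannot guarantee the ergodicity of the optimization in the solution space because they are only pseudo-random. The Rössler attractor (RA) is a good alternative for calculating these two numbers. The RA equations are defined as:
$$\frac{dx_R}{dt} = -(y_R + z_R), \qquad \frac{dy_R}{dt} = x_R + a\, y_R, \qquad \frac{dz_R}{dt} = b + z_R\,(x_R - c),$$
where $(x_R, y_R, z_R)$ is the RA state and $a$, $b$, and $c$ are the RA parameters.
We derive $r_1$ and $r_2$ from the RA state variables to implant the chaotic properties of the RA into the two parameters of standard PSO. The corresponding plane of the RA is displayed in Fig. 5b.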
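The following sketch shows one way to turn a Rössler trajectory into number pairs in [0, 1]. Several choices here are assumptions rather than details from the paper: the classic parameter values a = 0.2, b = 0.2, c = 5.7, the fourth-order Runge-Kutta integrator, the use of the first two state variables, and the min-max rescaling to the unit interval.

```python
def rossler_derivatives(state, a, b, c):
    """Right-hand side of the Rössler system for state (x, y, z)."""
    x, y, z = state
    return (-(y + z), x + a * y, b + z * (x - c))


def rk4_step(state, dt, a, b, c):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(base, slope, scale):
        return tuple(s + scale * k for s, k in zip(base, slope))

    k1 = rossler_derivatives(state, a, b, c)
    k2 = rossler_derivatives(shifted(state, k1, dt / 2), a, b, c)
    k3 = rossler_derivatives(shifted(state, k2, dt / 2), a, b, c)
    k4 = rossler_derivatives(shifted(state, k3, dt), a, b, c)
    return tuple(
        s + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
        for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def chaotic_pairs(count, dt=0.05, a=0.2, b=0.2, c=5.7, state=(1.0, 1.0, 1.0)):
    """Return `count` (r1, r2) pairs taken from the x and y coordinates,
    each rescaled to [0, 1] over the generated trajectory."""
    xs, ys = [], []
    for _ in range(count):
        state = rk4_step(state, dt, a, b, c)
        xs.append(state[0])
        ys.append(state[1])

    def rescale(values):
        lowest, highest = min(values), max(values)
        return [(v - lowest) / (highest - lowest) for v in values]

    return list(zip(rescale(xs), rescale(ys)))


pairs = chaotic_pairs(1000)
```

Unlike independent uniform draws, consecutive pairs here are deterministic and correlated along the attractor, which is the property the chaotic variant exploits.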
Ten runs of 10-fold cross-validation are used to report a reliable performance of our WACPN model. We use the following measures to appraise the performance of different models: sensitivity (Sen), specificity (Spc), precision (Prc), accuracy (Acc), F1 score, Matthews correlation coefficient (MCC), Fowlkes–Mallows index (FMI), and the area under the curve (AUC).
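All of the measures above except AUC can be computed directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and unrelated to the results reported in this paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    fmi = math.sqrt(precision * sensitivity)
    return {
        "sen": sensitivity, "spc": specificity, "prc": precision,
        "acc": accuracy, "f1": f1, "mcc": mcc, "fmi": fmi,
    }


# Hypothetical counts: 100 positive cases and 100 negative cases.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

The F1 score is the harmonic mean of precision and sensitivity, while the FMI is their geometric mean, so the two stay close whenever precision and sensitivity are similar.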
The parameters of this study are listed in Tab. 3. The sizes of the original images are if we do not consider the number of color channels. The sizes of MTCed images are , and the sizes of preprocessed images are . The DCR is , and the SSR is . The MDL is . The number of features is . The number of neurons in HL is . The number of output neurons is . The number of parameters to be optimized is . The parameters in RA are .
Fig. 6 shows the wavelet decomposition results with . The raw image is shown in Fig. 2a. The reason why we choose is the trial-and-error method. We test other values of M and find can obtain the best result.
Tab. 4 shows the ten runs of 10-fold CV via the parameters shown in Tab. 3, where means the run index. The final row in Tab. 4 presents the mean and standard deviation (MSD) of the results of 10 runs. WACPN attains a sensitivity of 91.87 ± 1.37%, a specificity of 90.70 ± 1.19%, a precision of 91.01 ± 1.12%, an accuracy of 91.29 ± 1.09%, an F1 score of 91.43 ± 1.09%, an MCC of 82.59 ± 2.19%, and an FMI of 91.44 ± 1.09%.
If we remove the AIWF from our WACPN model, the results using the same configuration are shown in Tab. 5. Similarly, the results of removing RA from our WACPN model are shown in Tab. 6. After comparing the results in Tab. 4 against the results in Tabs. 5 and 6, we can deduce that both strategies—AIWF and RA—are beneficial to our WACPN model.
Fig. 7 represents the ROC curves together with their upper and lower bounds of the proposed WACPN model and its two ablation studies (without AIWF and without RA). The AUC of WACPN model is 0.9577. The AUCs of the models removing AIWF or RA are only 0.9319 and 0.9456, respectively, demonstrating that both AIWF and RA help improve the standard PSO.
The proposed WACPN model is compared with six state-of-the-art models: GAN, CADe, SVM, IQNN, DT, and CSO. The evaluation results on the same dataset via ten runs of 10-fold CV are listed in Tab. 7.
Error bars (EBs) make visual comparison easy. Fig. 8 presents the EBs of the model comparison, from which we can observe that the proposed WACPN model is superior to the six state-of-the-art models. There are three reasons. First, the 2d-WE layer is an effective way to describe CCT images. Second, ACP is efficient in training the FNN. Third, we fine-tune and select the best parameters for the RA. In the future, our model may be applied to other fields [21,22].
A novel WACPN method is proposed for diagnosing CAP in CCT images. In WACPN, the 2d-WE layer serves as the feature extractor, and the ACP optimization algorithm is used to train the neural network. The proposed WACPN model is shown to give better results than six state-of-the-art models.
The proposed WACPN model has three limitations: (i) Deep learning models are not used, because our image set is small. (ii) Strict clinical validation has not been carried out, either on-site or in cloud computing (CC) environments. (iii) The model is a black box, which makes it hard for patients and doctors to trust.
To address these three limitations, first, we shall use data augmentation to enlarge the number of images in the dataset. Second, our team shall deploy the proposed WACPN model to an online CC environment (such as Azure) and invite specialists, clinicians, and physicians to evaluate its effectiveness. Third, trustworthy AI and explainable AI, which may provide heatmaps pointing out the lesions, are two options for adding explainability to the proposed WACPN model.
Funding Statement: This paper is partially supported by Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Royal Society International Exchanges Cost Share Award, UK (RP202G0230); British Heart Foundation Accelerator Award, UK (AA/18/3/34220); Hope Foundation for Cancer Research, UK (RM60G0680); Global Challenges Research Fund (GCRF), UK (P202PF11); Sino-UK Industrial Fund, UK (RP202G0289); LIAS Pioneering Partnerships award, UK (P202ED10); Data Science Enhancement Fund, UK (P202RE237).
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.