Heart disease, also known as cardiovascular disease, covers a range of conditions that affect the heart and has been a major cause of death over the past decades. Accurate and timely detection is a key factor in appropriate investigation, treatment, and prescription of medication. Emerging technologies such as fog, cloud, and mobile computing provide substantial support for the diagnosis and prediction of fatal diseases such as diabetes, cancer, and cardiovascular disease. Cloud computing provides a cost-efficient infrastructure for data processing, storage, and retrieval, and much of the extant research recommends machine learning (ML) algorithms for building models from sample data. ML is well suited to exploring hidden patterns, which supports analysis and prediction. Accordingly, this study combines cloud computing with ML, collecting datasets from different geographical areas and applying fusion techniques to maintain data accuracy and consistency for the ML algorithms. The proposed model uses three ML techniques: Artificial Neural Network, Decision Tree, and Naïve Bayes. Real-time patient data were then classified using the fuzzy-based model stored in the cloud.
The clinical investigation of heart disease, which is also known as cardiovascular disease, constitutes a major topic of interest for medical research, both historically and in contemporary times. According to the World Health Organization, around 23 million cardiovascular disease patients die annually due to cardiac arrest and stroke [
Several underlying factors constitute the root causes of heart disease, including excessive intake of saturated fats, lack of exercise, and an imbalanced diet. In addition, genetic predisposition is increasingly recognized as a prominent cause [
Our approach is organized into seven phases. In Phase 1, data collection, we gathered datasets from geographically dispersed locations to ensure maximum coverage. Phase 2 consolidated all datasets into a single fused dataset. Phase 3 was a pre-processing layer that eliminated records with missing values, normalized the data, and finally split them into training and testing sets. Phase 4 was the training layer, in which we applied three algorithms: Artificial Neural Network (ANN), Decision Tree (DT), and Naïve Bayes (NB). In Phase 5, we evaluated the trained models against the target accuracy. In the ML-fusion phase (Phase 6), the fuzzy-based system accepted a prediction when it met our predefined criterion of agreement between at least two of the three classifiers. Finally, in Phase 7, the fuzzy model was compared with the model stored in the cloud.
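The pre-processing phase (Phase 3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `preprocess`, the use of `None` to mark a missing value, the choice of min-max scaling as the normalization scheme, and the shuffling seed are all assumptions made for illustration.

```python
import random


def preprocess(records, train_fraction=0.7, seed=0):
    """Drop incomplete records, min-max normalize each column, then split."""
    # Eliminate records that contain a missing value (marked as None here).
    complete = [r for r in records if all(v is not None for v in r)]

    # Min-max normalization per attribute (assumed scheme, not from the paper).
    columns = list(zip(*complete))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(r, lows, highs)]
        for r in complete
    ]

    # Shuffle reproducibly, then split into training and testing sets.
    rng = random.Random(seed)
    rng.shuffle(scaled)
    cut = round(train_fraction * len(scaled))
    return scaled[:cut], scaled[cut:]
```

With 1190 complete records and a 70:30 ratio, this split yields 833 training records and 357 testing records.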
Researchers have explored various alternative techniques for identifying cardiovascular disease. For example, some researchers have applied the neural method, obtaining results with 83% accuracy [
Meanwhile, ANN techniques have been widely used to predict heart disease. Generalized regression neural networks and radial basis functions have also been applied to investigate heart function problems, with experimental analysis indicating that ANNs provide more accurate results than the other techniques compared [
Early-stage mild cardiovascular disease is curable through significant lifestyle changes, including adopting a more balanced diet [
Dataset selection [
After cleaning and normalization, the dataset was divided into training data (70%) and test data (30%). Classification then began with training the three classifiers: ANN, NB, and DT. This produced three predictions, each from an algorithm optimized for maximum accuracy. The ANN was a multilayer perceptron with input and output layers and a hidden layer of 12 neurons, and backpropagation was used to fine-tune the weights. Training involved four steps: initialization of weights, feedforward, backpropagation of error, and weight updating. The sigmoid function for the input and hidden layers of the proposed backpropagation neural network was expressed as follows:
The input derived from the output layer is given by:
The output layer activation function is as follows:
where
After applying the chain rule method, this can be presented as:
By substituting the values in
where
Next, applying the chain rule for the updating of weights between input and hidden layers gives:
This can be presented as
where
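The four training steps named for the ANN (initialization of weights, feedforward, backpropagation of error, and weight updating) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' network: the class name `TinyMLP`, the single sigmoid output neuron, the squared-error loss, the learning rate, and the weight initialization range are all hypothetical choices. Only the hidden-layer size of 12 and the sigmoid activation come from the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer (12 neurons by default) and a single sigmoid output."""

    def __init__(self, n_inputs, n_hidden=12, seed=0):
        rng = random.Random(seed)
        # Step 1: initialization of weights (small random values, zero biases).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        # Step 2: feedforward through the hidden layer and the output neuron.
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Step 3: backpropagation of error for E = 0.5 * (target - out)^2,
        # using the chain rule and sigmoid'(z) = s * (1 - s).
        delta_out = (out - target) * out * (1.0 - out)
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(self.w_out, hidden)]
        # Step 4: weight updating by gradient descent.
        self.w_out = [w - lr * delta_out * h for w, h in zip(self.w_out, hidden)]
        self.b_out -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            self.w_hidden[j] = [w - lr * d * xi
                                for w, xi in zip(self.w_hidden[j], x)]
            self.b_hidden[j] -= lr * d


def mean_error(net, data):
    """Average squared-error loss over (inputs, target) pairs."""
    return sum(0.5 * (t - net.forward(x)[1]) ** 2 for x, t in data) / len(data)


def train(net, data, epochs=500, lr=0.5):
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)
```

In the setting described here, `n_inputs` would be the number of predictor attributes and each target would be the 0/1 heart disease label.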
For the DT, three optimizers were applied individually: random search, Bayesian optimization, and grid search. Bayesian optimization performed well and was therefore selected for this framework:
The GINI index is provided by
Information gain is provided by
Here,
This is intended to optimize expected improvement with respect to the proposed set of hyperparameters
The hyperparameter does not expect to produce any improvement if
There are two different distributions for the hyperparameters in this equation, one where the value of the objective function is less than
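The two split criteria named above for the DT, the Gini index and information gain, can be computed as in the sketch below. It uses the standard textbook definitions (Gini impurity as one minus the sum of squared class proportions, and information gain as the reduction in Shannon entropy after a split); the function names are hypothetical and the paper's own notation is not reproduced.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum over classes of p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted
```

For example, a node holding two negative and two positive records has a Gini impurity of 0.5, and a split that separates the two classes perfectly gains one full bit of information.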
For NB, three kernel types were used: box, Gaussian, and triangle.
The traditional NB classifier estimates probabilities by approximating the data through a function such as a Gaussian distribution:
where
The two-parameter Box-Cox transformation is defined as:
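As an illustration of the NB building blocks mentioned above, the sketch below defines the three kernel types, a kernel density estimate that could serve as a class-conditional likelihood for one attribute, and the two-parameter Box-Cox transformation. The kernel formulas are the usual normalized textbook forms and are an assumption about what the authors' software used; the function names and the parameter names `lam1` and `lam2` are hypothetical.

```python
import math


def box_kernel(u):
    """Uniform kernel on [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def triangle_kernel(u):
    """Triangular kernel on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, samples, bandwidth, kernel):
    """Kernel density estimate at x from one attribute's training values."""
    total = sum(kernel((x - s) / bandwidth) for s in samples)
    return total / (len(samples) * bandwidth)


def box_cox(x, lam1, lam2=0.0):
    """Two-parameter Box-Cox: ((x + lam2)^lam1 - 1) / lam1, or log(x + lam2) if lam1 == 0."""
    shifted = x + lam2
    if shifted <= 0:
        raise ValueError("x + lam2 must be positive")
    if lam1 == 0:
        return math.log(shifted)
    return (shifted ** lam1 - 1.0) / lam1
```

In a kernel-based NB classifier, `kernel_density` would be evaluated once per attribute and per class, and the per-attribute densities multiplied together with the class prior.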
After each optimization, the optimized model was stored in the cloud before creating and implementing fuzzy logic on the results of the optimized classification algorithms as shown in
Conditional (if–then) statements are used to construct fuzzy logic. Fuzzy rules are then constructed based on this logic. In these statements, HD represents heart disease:
IF (ANN is yes, and NB is yes, and DT is also yes) THEN (HD is yes).
IF (ANN is yes, and NB is yes, and DT is no) THEN (HD is yes).
IF (ANN is yes, and NB is no, and DT is yes) THEN (HD is yes).
IF (ANN is no, and NB is yes, and DT is yes) THEN (HD is yes).
IF (ANN is no, and NB is no, and DT is also no) THEN (HD is no).
IF (ANN is yes, and NB is no, and DT is no) THEN (HD is no).
IF (ANN is no, and NB is no, and DT is yes) THEN (HD is no).
IF (ANN is no, and NB is yes, and DT is no) THEN (HD is no).
The rules indicate that if at least two of the three supervised classifiers predict heart disease, it is considered present; otherwise, it is considered absent.
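The eight rules listed above amount to a two-out-of-three vote, which can be sketched in a few lines of Python. The function names are hypothetical, and crisp yes/no inputs are assumed because that is how the rules are written.

```python
from itertools import product


def fused_prediction(ann, nb, dt):
    """Return True (HD is yes) when at least two of the three classifiers say yes."""
    return int(ann) + int(nb) + int(dt) >= 2


def rule_table():
    """Enumerate all eight input combinations with the fused outcome."""
    return [(ann, nb, dt, fused_prediction(ann, nb, dt))
            for ann, nb, dt in product([True, False], repeat=3)]
```

Calling `rule_table()` reproduces the eight rules: the four combinations with two or three "yes" votes give HD = yes, and the remaining four give HD = no.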
The second layer of the proposed framework concerns the real-time classification of heart disease. Real-time patient data were fed into the ML-fused model; hypothetically, the results could then be used to schedule appointments. Patients predicted to have cardiovascular disease could be given appointments on an emergency basis, while patients predicted not to have cardiovascular disease could be given a regularly scheduled appointment.
Each stage feeds systematically into the next. To initiate the model, we generated a dataset by combining five databases. To improve accuracy, these sources were chosen to be geographically and demographically diverse.
Our experiment comprised 1190 cases and considered 12 attributes shown in
No. | Attribute | No. | Attribute |
---|---|---|---|
1 | Patient age | 7 | Resting electrocardiogram |
2 | Sex | 8 | Max heart rate |
3 | Chest pain | 9 | Exercise angina |
4 | Blood pressure | 10 | Old peak |
5 | Cholesterol | 11 | ST slope |
6 | Fasting blood sugar | 12 | Target |
The following mathematical equations were applied to obtain results:
First, we used a neural network to classify the data, which involved establishing an ANN structure using 70% of the cases for training data (833 of 1190) and the remaining 30% of cases (357) for testing data. As shown in
Expected output | Training: result negative | Training: result positive | Testing: result negative | Testing: result positive |
---|---|---|---|---|
Negative | 351 | 42 | 144 | 24 |
Positive | 40 | 400 | 28 | 161 |
The NB classification shown in
Expected output | Training: result negative | Training: result positive | Testing: result negative | Testing: result positive |
---|---|---|---|---|
Negative | 337 | 56 | 142 | 26 |
Positive | 74 | 366 | 31 | 158 |
The DT classification shown in
Expected output | Training: result negative | Training: result positive | Testing: result negative | Testing: result positive |
---|---|---|---|---|
Negative | 358 | 35 | 141 | 27 |
Positive | 41 | 399 | 15 | 174 |
Subsequently, the test data records were passed to the fuzzy-based system along with the output class to arrive at the final classification. The fuzzy-based system correctly classified 150 records as negative and 176 records as positive (
Expected output | Result negative | Result positive |
---|---|---|
Negative | 150 | 18 |
Positive | 13 | 176 |
The consolidated results of all classification techniques and the proposed model are presented in
ML algorithm | Type | Specificity (SPEC) % | Sensitivity (SEN) % | False positive value (FPV) % | False negative value (FNV) % | Likelihood ratio positive (LRP) | Likelihood ratio negative (LRN) | Positive prediction value (PPV) % | Negative prediction value (NPV) % |
---|---|---|---|---|---|---|---|---|---|
Naïve Bayes | Training | (0.8199) 81.9 | (0.8673) 86.7 | (0.1800) 18.0 | (0.1327) 13.3 | 4.82 | 0.16 | (0.8318) 83.2 | (0.8575) 85.8 |
Naïve Bayes | Testing | (0.8208) 82.1 | (0.8587) 85.9 | (0.1792) 17.9 | (0.1413) 14.1 | 4.79 | 0.17 | (0.8360) 83.6 | (0.8453) 84.5 |
Decision tree | Training | (0.8972) 89.7 | (0.9194) 91.9 | (0.1028) 10.3 | (0.0806) 8.1 | 8.94 | 0.09 | (0.9068) 90.7 | (0.9109) 91.1 |
Decision tree | Testing | (0.9038) 90.4 | (0.8657) 86.6 | (0.0962) 9.6 | (0.1343) 13.4 | 9.00 | 0.15 | (0.9206) 92.1 | (0.8393) 83.9 |
Artificial neural network | Training | (0.8977) 89.8 | (0.9049) 90.5 | (0.1023) 10.2 | (0.0950) 9.5 | 8.85 | 0.12 | (0.9090) 90.9 | (0.8931) 89.3 |
Artificial neural network | Testing | (0.8372) 83.7 | (0.8702) 87.0 | (0.1628) 16.3 | (0.1297) 12.9 | 5.35 | 0.15 | (0.8519) 85.2 | (0.8571) 85.7 |
Proposed fuzzy model | Testing | (0.9202) 92.0 | (0.9072) 90.7 | (0.0798) 7.9 | (0.0928) 9.3 | 11.38 | 0.10 | (0.9312) 93.1 | (0.8929) 89.3 |
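The figures reported for the proposed fuzzy model can be recomputed from its four confusion-matrix counts (150, 18, 13, 176) using the standard definitions, as in the sketch below. The function name is hypothetical. Which off-diagonal count is the false positive depends on whether the matrix rows are read as the expected or the predicted class; the assignment used in the usage note (FN = 18, FP = 13) is an assumption, chosen because it reproduces the reported specificity (0.9202) and sensitivity (0.9072). Swapping the two off-diagonal counts exchanges sensitivity with PPV and specificity with NPV, and leaves accuracy unchanged.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Standard binary-classification ratios from the four confusion counts."""
    total = tn + fp + fn + tp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        # Miss rate is taken here as 1 - accuracy (the share of misclassified records).
        "miss_rate": (fp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_value": fp / (fp + tn),
        "false_negative_value": fn / (fn + tp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }
```

Calling `confusion_metrics(tn=150, fp=13, fn=18, tp=176)` gives an accuracy of 326/357, or about 91.3%, and a miss rate of about 8.7%, which agree with the accuracy reported for the proposed model.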
Further analysis of the model in relation to input parameters was provided by the decision support system. Accordingly, the specific predictions of the three classifiers along with the results derived from the fuzzy-based system are presented in
Patient age | Sex | Chest pain type | Resting blood pressure | Cholesterol | Fasting blood sugar | Resting electrocardiogram | Maximum heart rate | Exercise angina | Old peak | ST slope | Class | ANN | NB | DT | Fuzzy-based system |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
48 | 1 | 2 | 130 | 245 | 0 | 2 | 180 | 0 | 0.2 | 2 | 0 | 0 | 0 | 0 | 0 |
44 | 1 | 2 | 120 | 263 | 0 | 0 | 173 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
41 | 1 | 2 | 110 | 235 | 0 | 0 | 153 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
55 | 0 | 2 | 135 | 250 | 0 | 2 | 161 | 0 | 1.4 | 2 | 0 | 0 | 0 | 0 | 0 |
41 | 0 | 2 | 105 | 198 | 0 | 0 | 168 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
56 | 1 | 4 | 120 | 85 | 0 | 0 | 140 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
52 | 1 | 3 | 172 | 199 | 1 | 0 | 162 | 0 | 0.5 | 1 | 0 | 0 | 0 | 0 | 0 |
54 | 1 | 4 | 140 | 239 | 0 | 0 | 160 | 0 | 1.2 | 1 | 0 | 0 | 0 | 0 | 0 |
47 | 1 | 3 | 130 | 253 | 0 | 0 | 179 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
59 | 1 | 3 | 130 | 318 | 0 | 0 | 120 | 1 | 1 | 2 | 0 | 1 | 1 | 1 | 1 |
54 | 0 | 3 | 110 | 214 | 0 | 0 | 158 | 0 | 1.6 | 2 | 0 | 1 | 0 | 1 | 1 |
44 | 0 | 4 | 120 | 218 | 0 | 1 | 115 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
54 | 1 | 3 | 133 | 203 | 0 | 1 | 137 | 0 | 0.2 | 1 | 1 | 0 | 0 | 0 | 0 |
62 | 1 | 2 | 128 | 208 | 1 | 2 | 140 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
52 | 1 | 2 | 128 | 205 | 1 | 0 | 184 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Finally, the proposed framework is compared with frameworks described in previous research (
Algorithms | Accuracy (%) | Miss rate (%) |
---|---|---|
HRFLM [ | 88.40 | 11.60 |
Naïve Bayes [ | 75.80 | 24.20 |
Decision Tree [ | 85.00 | 15.00 |
SVM (RBF) [ | 88.00 | 12.00 |
Logistic regression [ | 89.00 | 11.00 |
Logistic regression [ | 86.11 | 13.89 |
Framingham risk score (FRS) [ | 87.04 | 12.96 |
Proposed fuzzy-based ML | 91.30 | 8.70 |
Accurately predicting heart disease using ML techniques is a challenge. This paper proposed a cloud-based prediction model built on ML techniques. The approach features seven phases: dataset collection, data fusion, pre-processing, training, performance evaluation, ML fusion, and real-time testing. Three widely used ML techniques were applied: ANN, DT, and NB. Their combined classifications were tested using a fuzzy-based system. The ratio of training data to testing data was set to 70:30. The classification results of all three techniques were combined with those obtained by the fuzzy-based system, and the processes were repeated until accuracy levels could be observed. The results showed that the proposed fuzzy-based model is 91.30% accurate.
We are grateful to our families and colleagues for their emotional support.