Today, more families are affected by Diabetes Mellitus (DM) on account of its continually increasing prevalence. Most patients are unaware of their health status or of the risk factors for DM prior to diagnosis. Two main types of diabetes are recognized in medicine: a) Type-1 diabetes (T1D) and b) Type-2 diabetes (T2D). Because T2D also affects other organs of the body, the proposed system concentrates specifically on T2D. This work aims to detect cardiac disorders in T2D patients. The required data is gathered from an ECG dataset containing records of healthy volunteers and of patients with pathologies such as Myocardial Infarction, Cardiomyopathy, Bundle branch block, and Dysrhythmia. From this dataset, data from 245 persons were considered, of whom 160 are non-diabetic and 85 are diabetic. Classification is then performed: K-Nearest Neighbor (KNN), Multi-layer Perceptron (MLP), and Support Vector Machine (SVM) learning models are used to investigate typical cardiac abnormalities in diabetic persons. The results show that the proposed work achieves higher accuracy and a lower error rate, in less time, than existing machine learning algorithms. KNN, MLP, and SVM attained accuracies of 80%, 93.8%, and 96.25%, respectively.
Diabetes mellitus is a chronic disease that occurs when the level of glucose in the blood is remarkably high. Glucose is the body's main energy source and is absorbed from the food one consumes regularly [
The American Heart Association considers diabetes mellitus one of the major controllable risk factors for cardiovascular disease. In the Framingham study, a range of predictors was observed for the ECG-based prediction of heart disease in diabetes patients. Predictors such as body mass index (BMI), gender, age, fasting glucose, high-density lipoprotein, blood pressure, family history of diabetes mellitus, and triglycerides contribute to cardiovascular disease and heart attack. Predicting heart disease in diabetes mellitus patients is a major challenge. To address it, many studies have been conducted on detecting suitable features from ECG signal data and predicting heart disease [
Recently, a variety of machine learning approaches have been applied to heart disease prediction in diabetes patients [
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the data collection and the ECG signal. Section 4 presents the classification of disease and non-disease cases with different machine learning techniques. Section 5 presents and discusses the results, analyzing the performance of the proposed model. Finally, Section 6 concludes the work.
A deep transfer learning framework was proposed for the automatic diagnosis of diabetes mellitus, based on heart rate signals obtained from ECG data. Models pre-trained on large 2D image datasets were applied to 1D heart rate signals: the 1D signals were first converted into frequency spectrum images, which were then fed to well-known pre-trained models, namely AlexNet, DenseNet, VggNet, and ResNet. The DenseNet model achieved the highest classification accuracy (97.62%) and a sensitivity of 90% for detecting diabetes mellitus subjects from heart rate signal recordings. However, this framework required a long computational time [
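As a rough illustration of the 1D-to-image step described above, the sketch below computes a magnitude spectrogram (a Hann-windowed short-time Fourier transform) of a heart-rate-like series with NumPy and scales it to an 8-bit grayscale array. The window length, hop size, sampling rate, and synthetic test signal are assumptions for illustration; the exact transform and image size used in the cited framework are not given here.

```python
import numpy as np

def spectrogram(signal: np.ndarray, win_len: int = 64, hop: int = 32) -> np.ndarray:
    """Return a (freq_bins, frames) magnitude spectrogram via a Hann-windowed STFT."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    # rfft over each frame yields win_len // 2 + 1 non-negative frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T

def to_image(spec: np.ndarray) -> np.ndarray:
    """Log-scale and normalise to 0..255 so the array can be treated as a grayscale image."""
    log_spec = np.log1p(spec)
    span = log_spec.max() - log_spec.min()
    scaled = (log_spec - log_spec.min()) / span if span > 0 else np.zeros_like(log_spec)
    return (scaled * 255).astype(np.uint8)

if __name__ == "__main__":
    fs = 4.0                                       # hypothetical resampling rate of the heart-rate series (Hz)
    t = np.arange(256) / fs
    hr = 70 + 3 * np.sin(2 * np.pi * 0.25 * t)    # synthetic heart rate with a 0.25 Hz oscillation
    img = to_image(spectrogram(hr - hr.mean()))
    print(img.shape, img.dtype)
```

A pre-trained 2D network would typically also expect the image to be resized and replicated to three channels; that step is omitted here.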
Another study recommended an automatic system for classifying diabetic and normal classes using heart rate information obtained from ECG signals. A 5-level DWT decomposition was performed, and kurtosis, sample entropy, skewness, approximate entropy, and energy features were extracted from the detail coefficients at different levels to detect diabetes mellitus automatically. Ranking methods, namely the t-test, Wilcoxon test, Bhattacharyya space, and entropy test, were used to rank these features. The ranked features were fed to different classifiers, including NB, DT, KNN, and SVM. The results showed high diagnostic discrimination with a small number of features; however, they proved ineffective for training the machine learning models [
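The feature-extraction step described above can be sketched as follows. This minimal NumPy version assumes a Haar wavelet (the wavelet family used in the cited work is not stated here) and computes only three of the listed features (energy, skewness, kurtosis) at each of the five detail levels; sample entropy and approximate entropy are omitted for brevity.

```python
import numpy as np

def haar_dwt(x: np.ndarray, levels: int = 5) -> list[np.ndarray]:
    """Multi-level Haar DWT; returns detail coefficients [d1, d2, ..., d_levels]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:                    # pad odd-length input by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)     # approximation is decomposed again at the next level
    return details

def skewness(c: np.ndarray) -> float:
    std = c.std()
    return float(np.mean((c - c.mean()) ** 3) / std**3) if std > 0 else 0.0

def kurtosis(c: np.ndarray) -> float:
    """Pearson kurtosis (a normal distribution gives 3)."""
    std = c.std()
    return float(np.mean((c - c.mean()) ** 4) / std**4) if std > 0 else 0.0

def dwt_features(x: np.ndarray, levels: int = 5) -> np.ndarray:
    """Energy, skewness and kurtosis per detail level -> vector of length 3 * levels."""
    feats = []
    for d in haar_dwt(x, levels):
        feats += [float(np.sum(d**2)), skewness(d), kurtosis(d)]
    return np.array(feats)
```

The resulting feature vectors could then be ranked (for example with a t-test per feature) before being passed to a classifier.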
Another study built an improved fuzzy-logic-based artificial neural network (IFANN) classifier to predict coronary artery heart disease among diabetes mellitus patients. The data were compiled, and the IFANN was compared with other approaches on several performance metrics, with the Matthews correlation coefficient (MCC) used to test the competence of the classifier. In the Scilab implementation, the results confirmed that the IFANN performed better than the existing approaches. The classifier showed superior performance but suffered from over-fitting, which increased the false positive rate (FPR) [
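Since MCC is mentioned above as the competence measure, a short sketch of the standard formula from confusion-matrix counts may help: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the usage line are hypothetical and are not results from the cited study.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts (range -1..1)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, for illustration only
print(round(mcc(tp=40, tn=35, fp=2, fn=3), 4))
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, which is relevant for a cohort with unequal numbers of diabetic and non-diabetic subjects.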
Another study described signal processing approaches that extract features from heart rate signals and proposed an analysis process that uses these features to diagnose diabetes mellitus. Through statistical analysis, the correlation dimension, recurrence plot, and Poincaré geometry properties were identified as valuable features. These features distinguished the heart rate signals of diabetic individuals from those of normal individuals and were validated with an AdaBoost classifier using a perceptron weak learner, which achieved 86% classification accuracy. However, the selected features contained various irregular artifacts that prevented the model from predicting accurately [
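To make the Poincaré geometry features above concrete, the sketch below computes the two usual Poincaré-plot descriptors, SD1 and SD2, from a series of RR intervals by rotating the (RR_n, RR_n+1) point cloud by 45 degrees. These two descriptors are a common choice and are an assumption here; the cited study's exact Poincaré feature set is not listed in this text.

```python
import numpy as np

def poincare_sd1_sd2(rr: np.ndarray) -> tuple[float, float]:
    """SD1/SD2 of the Poincaré plot of successive RR intervals (RR_n, RR_{n+1}).

    SD1 is the spread perpendicular to the identity line (short-term variability);
    SD2 is the spread along the identity line (long-term variability).
    """
    rr = np.asarray(rr, dtype=float)
    x, y = rr[:-1], rr[1:]
    sd1 = np.std((y - x) / np.sqrt(2))
    sd2 = np.std((y + x) / np.sqrt(2))
    return float(sd1), float(sd2)

# Hypothetical RR series (seconds) that alternates between two values
print(poincare_sd1_sd2(np.array([0.8, 1.0, 0.8, 1.0, 0.8])))
```

An alternating series produces purely short-term variability (large SD1, SD2 near zero), while a slow drift in heart rate would mainly increase SD2.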
Another study proposed a data-mining framework for accurately diagnosing type 2 diabetes. Built on a number of preprocessing rules, the framework comprised two parts: logistic regression and an improved K-means algorithm. The Waikato Environment for Knowledge Analysis (WEKA) toolkit and the Pima Indians Diabetes Dataset were used to compare the outcomes with those reported by other researchers. The prediction accuracy of the framework was 3.04% higher than that reported by the other researchers. Additionally, the framework ensured the quality of the dataset. To further evaluate its performance, the framework was applied to two other diabetes mellitus datasets, and both experiments showed superior performance. However, the framework took more time to train and showed lower accuracy [
Another study elucidated two novel approaches for identifying risk factors and employed a machine learning pipeline for the long-term prediction of Type 2 Diabetes. The approaches were assessed with data from a longitudinal clinical study, the San Antonio Heart Study. The approach achieved 95.94% accuracy in predicting whether a person would develop Type 2 Diabetes within the subsequent 7–8 years; however, owing to improper data, there may be a higher chance of false positives [
This step collects and measures information from the dataset; in the proposed case, the PTB-Diagnostic ECG dataset is used. The PTB-Diagnostic database is available online at physionet.org. It contains records of healthy volunteers and of patients with pathologies such as Myocardial Infarction, Cardiomyopathy, Bundle branch block, and Dysrhythmia. Each ECG lead has 10000 samples, with an amplitude range of ±16.384 mV and a sampling frequency of 1000 Hz. For the experiment, ECG data from 245 persons whose medical history is available were selected, of whom 160 are non-diabetic and 85 are diabetic [
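A quick arithmetic check of the figures above: 10000 samples at 1000 Hz correspond to a 10 s record per lead, and the cohort is roughly 65% non-diabetic and 35% diabetic. The resolution line assumes 16-bit quantization over the ±16.384 mV range; the bit depth is an assumption, not a figure stated in this section.

```python
samples_per_lead = 10_000
fs_hz = 1_000
duration_s = samples_per_lead / fs_hz            # record length per lead in seconds

range_mv = 2 * 16.384                             # full-scale span of the ±16.384 mV range
assumed_bits = 16                                 # assumption: 16-bit quantisation
lsb_uv = range_mv / 2**assumed_bits * 1000        # size of one quantisation step in microvolts

non_diabetic, diabetic = 160, 85
total = non_diabetic + diabetic
print(f"duration={duration_s} s, step={lsb_uv:.3f} uV, "
      f"non-diabetic={non_diabetic / total:.1%}, diabetic={diabetic / total:.1%}")
```

The class imbalance (about 1.9 non-diabetic records per diabetic record) is worth keeping in mind when interpreting accuracy figures.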
The electrocardiogram (ECG) is a record of the electrical waves of the heart produced by electrocardiography. The periodic impulse propagation through the heart's pacemaker and conduction tissue (SA node, AV node, Purkinje fibers) generates cycles of depolarization and repolarization [ The P wave represents depolarization of the atrial muscle, the QRS complex represents depolarization of the ventricular muscle, and the T wave represents repolarization of the ventricular muscle.
Each ECG cycle contains an R wave, which is the largest potential difference and results from the maximal depolarization of the ventricular muscle. Because the human body is a good conductor, the impulse propagating from the SA node to the Purkinje fibers can be measured at the surface of the skin. The ECG signal is commonly employed in clinical practice. The Holter ECG device [
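Because the R wave is the largest deflection in each cycle, a simple way to locate beats is to search for local maxima above an amplitude threshold, separated by a refractory period. The sketch below is a hypothetical, deliberately minimal detector (threshold at 60% of the signal maximum, 200 ms refractory period) run on a synthetic spike train; practical detectors such as Pan-Tompkins add band-pass filtering and adaptive thresholds.

```python
import numpy as np

def detect_r_peaks(ecg: np.ndarray, fs: float, thresh_ratio: float = 0.6,
                   refractory_s: float = 0.2) -> np.ndarray:
    """Return sample indices of R peaks: local maxima above thresh_ratio * max(ecg),
    at least refractory_s seconds apart (a hypothetical, minimal detector)."""
    threshold = thresh_ratio * np.max(ecg)
    min_gap = int(refractory_s * fs)
    peaks: list[int] = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]
        if ecg[i] > threshold and is_local_max:
            if peaks and i - peaks[-1] < min_gap:
                if ecg[i] > ecg[peaks[-1]]:        # keep the taller of two close candidates
                    peaks[-1] = i
            else:
                peaks.append(i)
    return np.array(peaks, dtype=int)

def heart_rate_bpm(peaks: np.ndarray, fs: float) -> float:
    """Mean heart rate from the RR intervals between detected peaks."""
    rr_s = np.diff(peaks) / fs
    return float(60.0 / rr_s.mean())

if __name__ == "__main__":
    fs = 1000.0                    # same sampling rate as the PTB records described above
    ecg = np.zeros(10_000)         # 10 s synthetic trace
    ecg[500::800] = 1.0            # one synthetic "R wave" every 0.8 s
    peaks = detect_r_peaks(ecg, fs)
    print(len(peaks), heart_rate_bpm(peaks, fs))
```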
In this paper, we focus on classification using different machine learning algorithms: Multilayer Perceptron, Support Vector Machine, and K-Nearest Neighbor. After the data is gathered from the dataset, classification is performed, which is a vital task for distinguishing heart disease with diabetes mellitus from a healthy group. The proposed work uses these machine learning techniques as classifiers throughout the experiment, since machine learning classification techniques have been shown to potentially improve prediction results in coronary heart disease. These classification techniques are KNN, MLP-NN, and SVM. The structural design of the proposed method is shown in the below
KNN is a simple classifier that finds the k nearest neighbors of a test sample using the minimum distance between the testing and training data, and assigns the sample to the most common class among those neighbors. Its run-time performance is poor when the training set is large. Here, the proposed system uses k = 2, 5, 8, 11, 14, 17, 20, and 23. KNN uses 'feature similarity' to predict the values of new data points, which means that a new point is assigned a value based on how closely it matches the points in the training set. The KNN algorithm proceeds as follows,
Step 1: Initially, take an ECG dataset of
Step 2: Consider a test dataset of
Step 3: Next, compute the Euclidean distance between two points
Step 4: Subsequently, choose a random value of
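The four steps above (Euclidean distance, nearest neighbors, majority vote) can be sketched in NumPy as follows. The two-feature toy data are hypothetical stand-ins for ECG features, and ties in the vote are broken in favor of the smaller label, which is an assumption: the text does not specify a tie rule, and ties can occur with the even values of k listed above (2, 8, 14, 20).

```python
import numpy as np

def knn_predict(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray, k: int) -> np.ndarray:
    """Classify each test row by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_test, n_train)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]      # indices of the k closest training points
    votes = y_train[nearest]                        # their labels, shape (n_test, k)
    # bincount(...).argmax() returns the smallest label among tied counts
    return np.array([np.bincount(v).argmax() for v in votes])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: label 0 = healthy, 1 = heart disease with diabetes
    X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
    y = np.array([0] * 40 + [1] * 40)
    test_idx = rng.permutation(80)[:20]
    train_idx = np.setdiff1d(np.arange(80), test_idx)
    for k in [2, 5, 8, 11, 14, 17, 20, 23]:         # the k values used in this work
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        print(k, np.mean(pred == y[test_idx]))
```

The full distance matrix makes the run-time cost grow with the training-set size, which matches the drawback noted above.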
MLP is one of the main classes of feedforward artificial neural networks. An MLP consists of at least three layers of nodes and is trained with a supervised learning method (backpropagation). This structure can distinguish data that are not linearly separable. If the data were linearly separable, all neurons could use a linear activation function, which maps the input linearly to the output. For non-linearly separable data, a non-linear activation function is used, such as the sigmoid (logistic) function. MLPs are popular in diverse fields such as speech recognition, image recognition, and machine translation software [
Mathematically, it is illustrated as follows:
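The equation that followed here is not recoverable from this text. As a point of reference only, a standard single-hidden-layer form is given below. The notation is assumed rather than taken from the original: x is the input feature vector, W^(1) and b^(1) are the hidden-layer weights and biases, w^(2) and b^(2) are the output weights and bias, and σ is the logistic sigmoid named above as the activation. The 0/1 output coding follows the class labels used in this section.

```latex
\begin{aligned}
\mathbf{h} &= \sigma\!\left(W^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}, \\
\hat{y} &= \sigma\!\left(\mathbf{w}^{(2)\top}\mathbf{h} + b^{(2)}\right), \qquad
\text{class} = \begin{cases} 1, & \hat{y} \ge 0.5, \\ 0, & \hat{y} < 0.5. \end{cases}
\end{aligned}
```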
The lowest layer, which takes input from the ECG dataset, is called the input (visible) layer because it is the exposed part of the network. An NN is often designed with an input layer that has one neuron per input value or column in the ECG dataset. These are not neurons in the computational sense; they simply pass the input value on to the next layer.
The layers after the input layer are called hidden layers because they are not directly exposed to the input. The simplest network structure has a single neuron in the hidden layer that directly outputs the value.
The final layer is called the output layer, and it is responsible for outputting a value or vector of values in the format required by the problem. The MLP has the following structure.
Step 1: First, input the ECG dataset for training the structure and assign the corresponding weights; this is written as
Step 2: After initialization, the input training ECG dataset values “
Here,
Assigned value
Step 3: Next, the activation function of the network, also called the transfer function, is evaluated. It is a simple way of mapping the summed weighted input of a neuron to its output. It is called an activation function because it governs the strength of the output and the threshold at which the neuron is activated. It has the mathematical form:
Step 4: Then, the output of the first hidden layer is computed as,
The
Step 5: Next, the output for the given input is estimated. This computation yields the neuron values in the output layer (OL). It is evaluated mathematically as:
The hidden-layer (HL) outputs feed the inputs of the next layer's neurons and are not visible at the network output. The final output is coded as "0" or "1", where 0 denotes "healthy people" and 1 denotes "heart disease with diabetes patients".
Step 6: Finally, the error with respect to the desired outputs is evaluated as,
Here, a threshold is set as a minimum value for the loss function. If the loss satisfies this threshold, the current output is taken as the final output; otherwise, the weight values are updated. The output unit is then recomputed with the MLP algorithm, and the trained network is used for the retrieval (prediction) process.
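Steps 1 to 6 can be sketched as a small NumPy network with one sigmoid hidden layer, trained by full-batch gradient descent (backpropagation) until the loss falls below a threshold, which is the stopping rule described above, or an epoch limit is reached. The hidden-layer size, learning rate, binary cross-entropy loss, threshold value, and toy data are all assumptions for illustration; the section does not specify them.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.5, loss_threshold=0.05, max_epochs=5000, seed=0):
    """Train a 1-hidden-layer MLP (sigmoid units) with full-batch gradient descent.

    Stops early once the mean binary cross-entropy drops below loss_threshold."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))   # Step 1: initial weights
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    eps = 1e-12                                  # avoids log(0)
    for _ in range(max_epochs):
        H = sigmoid(X @ W1 + b1)                 # Steps 2-4: weighted sum + activation
        p = sigmoid(H @ W2 + b2)                 # Step 5: output probability in (0, 1)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))  # Step 6
        if loss < loss_threshold:                # threshold met: keep the current weights
            break
        # Backpropagation: gradient of the mean cross-entropy w.r.t. each parameter
        d_out = (p - y) / len(y)
        d_hidden = np.outer(d_out, W2) * H * (1 - H)
        W2 = W2 - lr * (H.T @ d_out)
        b2 = b2 - lr * d_out.sum()
        W1 = W1 - lr * (X.T @ d_hidden)
        b1 = b1 - lr * d_hidden.sum(axis=0)
    return (W1, b1, W2, b2), loss

def predict_mlp(params, X):
    W1, b1, W2, b2 = params
    p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return (p >= 0.5).astype(int)                # 0 = healthy, 1 = heart disease with diabetes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical, well-separated two-feature data
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
    y = np.array([0] * 30 + [1] * 30)
    params, final_loss = train_mlp(X, y)
    print("final loss:", final_loss, "training accuracy:", np.mean(predict_mlp(params, X) == y))
```

Cross-entropy is used here because, with a sigmoid output, its gradient with respect to the output pre-activation reduces to (p − y); a squared-error loss would also work but trains more slowly.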
SVM is a supervised learning algorithm used to classify sample data into different classes. It is widely used in medical diagnosis for classification as well as regression. SVM is well suited to binary classification: it builds a model during the training stage and creates a decision boundary between the sample groups using hyperplanes (HP). The larger the margin between the classes, the higher the classification accuracy tends to be.
As the considered samples are non-linear, they are not linearly separable. Therefore, a Kernel Function (KF) is used for the classification. A kernel function takes a low-dimensional feature space as input and maps the data into a high-dimensional space, where it becomes separable and can easily be classified. Here, the data is transformed from 1D to 2D. SVM finds a hyperplane that places the points of each class on their own side as far as possible. The hyperplane lying midway between the two parallel margin hyperplanes is called the optimal separating hyperplane. It maximizes the distance between the two parallel hyperplanes and thereby reduces the risk of misclassifying the testing dataset. The SVM algorithm performs the following steps,
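The idea that a kernel maps non-separable data into a higher-dimensional space where it becomes separable can be seen with a tiny 1D-to-2D example. The explicit feature map phi(x) = (x, x²) and the sample values are illustrative assumptions; a kernel SVM (for example with a polynomial kernel) applies such a mapping implicitly through the kernel function instead of computing it.

```python
import numpy as np

# Hypothetical 1D data: class 1 lies on both sides of class 0,
# so no single threshold on x can separate the two classes.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])

def phi(x: np.ndarray) -> np.ndarray:
    """Explicit 1D -> 2D feature map (x, x^2)."""
    return np.column_stack([x, x**2])

def separable_by_threshold(values: np.ndarray, labels: np.ndarray) -> bool:
    """True if all class-0 values lie strictly on one side of all class-1 values."""
    v0, v1 = values[labels == 0], values[labels == 1]
    return bool(v0.max() < v1.min() or v1.max() < v0.min())

print(separable_by_threshold(x, y))        # raw 1D values: not separable
Z = phi(x)
# In the 2D space the line x^2 = 1 (normal w = (0, 1), bias b = -1) separates the classes.
print(separable_by_threshold(Z[:, 1], y))
```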
Step 1: The input training ECG dataset
Step 2: The positive and negative data are separated by the separating HP as,
This leads to an optimization problem that minimizes an objective function,
Subject to the constraints,
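The equations for this step are not reproduced in this text. For reference, the standard hard-margin formulation is shown below, assuming training pairs (x_i, y_i) with labels y_i in {−1, +1} (the 0/1 classes mapped to −1/+1), a normal vector w, and a bias b:

```latex
\begin{aligned}
&\text{Separating hyperplane:} && \mathbf{w}^{\top}\mathbf{x} + b = 0, \\
&\text{Objective:} && \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}, \\
&\text{Constraints:} && y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1, \qquad i = 1,\dots,n.
\end{aligned}
```

The two supporting hyperplanes w^T x + b = ±1 are 2/‖w‖ apart, so minimizing ‖w‖ maximizes the separation between the classes described earlier.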
Step 3: A constrained optimization problem (minimization or maximization) places constraints on the variables being optimized. These constraints are multiplied by Lagrange multipliers and combined with the cost function. Accordingly, the Lagrangian function for SVM is formed by augmenting the objective function with a weighted sum of the constraints,
Step 4: The discriminant function is evaluated using,
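For reference, the usual Lagrangian of the hard-margin problem, with multipliers α_i ≥ 0, its stationarity conditions, and the resulting discriminant function with a kernel K are shown below. This is standard notation, assumed rather than taken from the source; with the linear kernel K(x_i, x) = x_i^T x the discriminant reduces to sign(w^T x + b).

```latex
\begin{aligned}
L(\mathbf{w}, b, \boldsymbol{\alpha}) &= \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \left[ y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) - 1 \right],
  \qquad \alpha_i \ge 0, \\
\frac{\partial L}{\partial \mathbf{w}} = 0 &\;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i,
  \qquad
  \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0, \\
f(\mathbf{x}) &= \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \right).
\end{aligned}
```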
Step 5: The input data contains high-level noise. Hence, this work uses a soft-margin SVM, described below, which introduces non-negative slack variables. The primal problem is now the minimization of an objective function (OF), written as,
Subject to the constraints:
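The objective and constraints are not reproduced in this text. The standard soft-margin primal, with non-negative slack variables ξ_i and a regularization constant C > 0 (notation assumed), is:

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
&\text{subject to}\quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
```

A large C penalizes margin violations heavily, while a small C tolerates more noisy points inside the margin.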
Here,
Where,
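Eliminating the slack variables turns the soft-margin primal into an unconstrained hinge-loss problem, min over (w, b) of ½‖w‖² + C·Σ max(0, 1 − y_i(w^T x_i + b)), which can be minimized by subgradient descent. The sketch below does this for a linear kernel in NumPy. The step size, epoch count, value of C, and toy data are assumptions; a practical system would more likely use a dedicated solver such as LIBSVM (an SMO-type method) with a non-linear kernel.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Soft-margin linear SVM via full-batch subgradient descent on the hinge loss.

    X: (n, d) features; y: labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violated = margins < 1                   # points inside the margin or misclassified
        # Subgradient of 0.5*||w||^2 + C * sum(max(0, 1 - margin))
        grad_w = w - C * (y[violated][:, None] * X[violated]).sum(axis=0)
        grad_b = -C * y[violated].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

def svm_predict(w, b, X):
    """Discriminant sign(w^T x + b), mapped back to 0/1 labels."""
    return (X @ w + b >= 0).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.5, 0.6, (40, 2)), rng.normal(1.5, 0.6, (40, 2))])
    labels01 = np.array([0] * 40 + [1] * 40)     # 0 = healthy, 1 = heart disease with diabetes
    y_pm = 2 * labels01 - 1                      # the SVM formulation expects -1/+1 labels
    w, b = train_linear_svm(X, y_pm)
    print("training accuracy:", np.mean(svm_predict(w, b, X) == labels01))
```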
In this section, the performance of the proposed system in predicting cardiac abnormality in diabetic patients using different machine learning algorithms is analyzed. The results were obtained with three algorithms (KNN, MLP, and SVM) to achieve high accuracy. Diagnostic test metrics are also included. The performance of the proposed approaches is validated and detailed in the following sections. As given in
Authors | AI approach | Performance evaluation |
---|---|---|
Bhatia et al. [ | GA + SVM | 90.57% |
Asyali et al. [ | Linear discriminant analysis (LDA) | 93% |
 | Bayesian classifier based on HRV features | |
Abusharian et al. [ | 1. ANN | 87.04% |
 | 2. ANFIS | 75.93% |
Chen et al. [ | Non-equilibrium decision tree based on support vector machine classifier | 96% |
This work applies different ML approaches (KNN, MLP, and SVM) to the ECG dataset.
The dataset used for training the classifier comprises 245 patient records, of which 85 are diabetic patients and the other 160 are non-diabetic persons. Analysis of the diabetic and non-diabetic patients with the proposed algorithms showed that, of the 245 patients, 186 had ECGs showing cardiac abnormalities such as myocardial infarction, cardiomyopathy, myocardial ischemia, and bundle branch block, while the ECGs of the remaining 59 persons were normal, i.e., healthy. Here, the performance of the proposed ML algorithms (KNN, MLP, and SVM) is analyzed. Quantitative metrics, namely specificity, accuracy, and sensitivity, are evaluated for this comparison and are described mathematically below,
It is the percentage of correct predictions made by a classifier when compared with the actual label values in the testing phase; that is, the ratio of the number of correct assessments to the total number of assessments, expressed as:
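The formula itself is missing from this text. In terms of confusion-matrix counts (TP, TN, FP, FN for true/false positives and negatives, taking "heart disease with diabetes" as the positive class, an assumed convention), the standard definition is:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
```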
It is the ability to recognize the proportions of
It is the percentage of
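The three metrics can be computed from binary predictions as in the sketch below, which assumes the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), with label 1 as the positive class. The example labels are hypothetical and are not outputs of the classifiers in this work.

```python
import numpy as np

def confusion_counts(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def diagnostic_metrics(y_true, y_pred) -> dict[str, float]:
    tp, tn, fp, fn = confusion_counts(np.asarray(y_true), np.asarray(y_pred))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),   # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),   # true negative rate
    }

if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical ground truth
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical classifier output
    print(diagnostic_metrics(y_true, y_pred))
```

Because accuracy = sensitivity × P/(P+N) + specificity × N/(P+N), where P and N are the numbers of positive and negative cases, accuracy is a prevalence-weighted average of the other two. When all three are computed from the same confusion matrix, accuracy therefore always lies between sensitivity and specificity, which makes a convenient consistency check on reported tables.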
These performance results are tabulated in the table below,
Here, the performance of the proposed KNN, SVM, and MLP is also represented graphically. Treatment decisions depend on the diagnosis. The appropriate test and medical treatment are chosen based on factors such as specificity, accuracy, and sensitivity, which are the statistics most commonly used to evaluate a diagnostic medical test. They are illustrated in the below
Metric | KNN | MLP | SVM |
---|---|---|---|
Accuracy | 80.0% | 93.8% | 96.25% |
Sensitivity | 85.3% | 97.2% | 97.6% |
Specificity | 86.9% | 90.6% | 94.5% |
Predicting diabetes mellitus is a hard task, but it can give people the advantage of early knowledge and intervention. Such prediction can improve people's quality of health and help avert heart disease. Accurate prediction could significantly reduce national healthcare expenditure, specifically for diabetes mellitus and its associated complications. Therefore, this work predicts the chances of heart disease in diabetes mellitus patients using different ML approaches, namely KNN, MLP, and SVM. The classifiers predict cardiovascular disease in diabetic and non-diabetic persons. The performance of the proposed KNN, MLP, and SVM is analyzed in terms of specificity, accuracy, and sensitivity. The proposed KNN, SVM, and MLP achieve accuracies of 80%, 96.25%, and 93.80%, respectively. From these outcomes, SVM is confirmed to have the highest accuracy and the lowest error rate compared with several existing classifiers. In the future, the research can be extended with deep convolutional approaches to achieve a lower error rate and higher accuracy in less time.
This research work was partially supported by Dr. Archana Gupta, General Physician Department, GRMC Hospital, Gwalior; Dr. Nishika Saraswat, OMFS Oncology Department, BSES Hospital, Mumbai; and the Biomedical Laboratory, MITS College, Gwalior, under the Quality Improvement Programme Scheme.