Computers, Materials & Continua
Automatic Heart Disease Detection by Classification of Ventricular Arrhythmias on ECG Using Machine Learning
1Department of Computer Science and Information Technology, University of Sargodha, Sargodha, 40100, Pakistan
2School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
3COMSAT University Islamabad, Wah Campus, Wah Cantt, Pakistan
4College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Khraj, Saudi Arabia
5Department of ICT Convergence, Soonchunhyang University, Asan, 31538, Korea
6Department of Computer Science, HITEC University Taxila, Pakistan
*Corresponding Author: Yunyoung Nam. Email: email@example.com
Received: 14 March 2021; Accepted: 18 April 2021
Abstract: This paper focuses on detecting diseased signals and classifying arrhythmias into two classes: ventricular tachycardia and premature ventricular contraction. The sole purpose of the signal detector is to determine whether a signal was collected from a healthy or a sick person. The proposed approach presents a mathematical model for the signal detector based on calculating the instantaneous frequency (IF). Once a signal is detected as coming from a patient, the classifier takes that signal as input and identifies the target disease by predicting the class label. For the classifier, templates are designed separately for ventricular tachycardia and premature ventricular contraction. Similarities of a given signal with both templates are computed in the spectral domain. The empirical analysis reveals that the precisions of the detector and the applied classifier are 100% and 77.27%, respectively. Moreover, instantaneous frequency analysis provides a benchmark: the IF of a normal signal ranges from 0.8 to 1.1 Hz, whereas the IF range for ventricular tachycardia and premature ventricular contraction is 0.08–0.6 Hz. This indicates a serious loss of high-frequency content in the spectrum, implying that the heart's overall activity is slowed down. This study may help medical practitioners detect the type of heart disease based on signal analysis.
Keywords: Heart disease; signals; preprocessing; detection; machine learning
Machine learning has shown significant success in medicine, for example in heart disease classification, brain tumor [2,3], lung cancer, skin cancer [5,6], stomach [7,8], and Covid-19 [9,10] analysis, and in the detection of cancer and hypertension [11,12]. These diseases have severe consequences, including death; cardiac arrhythmia is one of the most common causes of death in the world. Cardiac arrhythmia refers to disorders of the heartbeat. It indicates a perturbation in the normal sinus rhythm of the myocardium. In cardiac arrhythmia, the heartbeat may be very slow, very fast, or irregular. It remains an unresolved clinical problem that causes 0.4 million deaths in the United States annually, while in Pakistan the death rate attributed to it is almost 15.36% of total deaths (196,258 people). According to World Health Organization statistics, cardiovascular disease (CVD) causes up to 17.9 million deaths worldwide every year, accounting for about one third of all deaths in the world.
This paper focuses on detecting diseased signals and classifying arrhythmias into two target classes: ventricular tachycardia and premature ventricular contraction. The main aim of detection is to determine whether a signal was collected from a healthy or a sick person. The detector works by calculating the instantaneous frequency (IF). Once a signal is detected as coming from a patient, a classifier identifies the disease by predicting the class label. For the supervised learning-based classifier, templates are designed separately for ventricular tachycardia and premature ventricular contraction. Similarities of a given signal with both templates are computed in the spectral domain.
The heart is the central organ of the human body that supplies oxygenated blood to the entire body. It is composed of four valves and four chambers; the upper chambers are called atria, while the lower chambers are called ventricles. The right atrium is the upper right chamber of the heart and receives deoxygenated blood from the body. The right ventricle is the lower right part of the heart; it receives deoxygenated blood when the right atrium contracts. The left atrium is the upper left part of the heart and accepts oxygenated blood from the lungs. The left ventricle is the lower left part of the myocardium and accepts oxygenated blood from the left atrium. Once both atria are filled with blood, they contract, and the oxygenated blood from the left atrium flows into the left ventricle. Veins from the head and upper body, as well as veins from the lower parts of the body, empty into the right atrium of the heart.
An electrocardiogram (ECG) represents the heart's electrical activity as a function of time. Normally, the ECG signal frequency range is 0.05–100 Hz, and the dynamic range is 1–10 mV. An ECG signal is recognized by its five peaks, denoted by the letters P, Q, R, S, and T. Because these components characterize the ECG, they are presented under a separate heading below.
1.2 Components of ECG
As shown in Fig. 1, an ECG contains five waves (P, Q, R, S, and T). The P wave is generated by the contraction of the atria. It is a small upward wave produced by the first deflection of the heartbeat, and it indicates the presence of an electrical impulse in the atria. The magnitude of the P wave is 50–100 millivolts, and its duration is 100 ms. After the P wave, the Q wave appears as a downward deflection of the heartbeat. The duration of the Q wave is 0.08–0.10 s. The R wave is the second deflection after the P wave. It indicates the beginning of depolarization in the ventricles. The relaxation of the heart ventricles produces the T wave. The approximate deflection of the T wave is 0.5 mV, and its duration is 0.20 s.
In normal rhythm, the heart rate ranges from 60 to 100 beats per minute (bpm). In electrocardiography, if there is no disorder in the signal, the rhythm is called normal sinus rhythm. When the heart rate rises above 100 bpm, the rhythm is known as sinus tachycardia. If the heart rate is much slower, it is known as bradycardia. Heart arrhythmias can be categorized as sinus node arrhythmias, atrial arrhythmias, junctional arrhythmias, and ventricular arrhythmias. A disturbance in the sinoatrial node disturbs the normal rhythm of the signal and causes sinus node arrhythmias. During sinus node arrhythmias, the P wave in the ECG is normal. Atrial arrhythmias arise from electrical impulses formed outside the sinoatrial node. In this type of arrhythmia, the heartbeat is very fast, ranging from 160 to 240 bpm, and the QRS complex and T wave are normal. Junctional arrhythmias are caused by impulses generated at the atrioventricular junction. These arrhythmias cause abnormalities in the P wave. In ventricular arrhythmias, the impulses arise in the ventricles and spread outwards to the rest of the heart, and the QRS complexes are wide and odd in shape. There are three types of ventricular arrhythmias: premature ventricular contractions, ventricular tachycardia, and ventricular fibrillation, as shown in Figs. 2a–2c, respectively.
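The heart-rate ranges quoted at the start of this paragraph can be illustrated with a minimal sketch. The function names and the use of 60 bpm as the bradycardia cut-off are assumptions made for illustration (the text only says "much slower"); the rate is derived from R-R intervals using the standard relation bpm = 60 / mean R-R interval in seconds.

```python
def heart_rate_bpm(rr_intervals_s):
    """Mean heart rate in beats per minute from R-R intervals in seconds."""
    if not rr_intervals_s:
        raise ValueError("at least one R-R interval is required")
    mean_rr = sum(rr_intervals_s) / len(rr_intervals_s)
    return 60.0 / mean_rr


def rate_category(bpm, low=60.0, high=100.0):
    """Label a heart rate using the 60-100 bpm normal range from the text.

    Treating everything below `low` as bradycardia is an assumption.
    """
    if bpm < low:
        return "bradycardia"
    if bpm > high:
        return "sinus tachycardia"
    return "normal rate"


# Hypothetical R-R intervals of 0.8 s correspond to roughly 75 bpm.
example_bpm = heart_rate_bpm([0.8, 0.8, 0.8])
print(round(example_bpm, 1), rate_category(example_bpm))
```

A rate-based label of this kind only describes how fast the heart beats; it does not distinguish the arrhythmia categories listed above, which depend on wave morphology.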
This work aims at automatically detecting heart disease by classifying ventricular arrhythmias on ECG using machine learning techniques. The main contributions are as follows.
i) The proposed methodology determines whether a person belongs to the healthy or the diseased category.
ii) The proposed research methodology consists of preprocessing, resampling, and normalization, together with the design of a detector.
iii) Mathematical modeling is presented for signal detection, the template, and the detector.
iv) The proposed system shows improved performance compared with existing approaches in the relevant literature.
The remainder of the paper is organized as follows: Section 2 reviews the literature related to the problem and discusses methods for classifying ventricular arrhythmias from ECG; Section 3 describes the research methodology followed in this paper; Section 4 presents and discusses the results; and Section 5 concludes the overall research work.
2 Related Work
Digital signal and data processing is a significant research domain in biomedical engineering. Researchers from different parts of the world have applied different methods and techniques for the automatic detection of heart diseases. Various methodologies have been implemented for digital signal preprocessing, feature extraction from ECG signals, and the classification of ECG signals, specifically ventricular arrhythmia signals.
2.1 Artifacts in ECG Signals and Feature Extraction
ECG signals are normally corrupted by different types of noise. Signal preprocessing extracts the required information from a noisy ECG signal. Various types of artifacts can cause abnormal ECG patterns, making the signals noisy and distorted. Baseline drift can be caused in ECG signals by breathing; its frequency is 0.25 Hz. To remove baseline drift, a notch filter is used. The noise is assumed to be additive white Gaussian noise. To remove it, a low-pass filter with a cut-off frequency of 0.2 Hz is used.
The features of ECG signals carry the information needed for classification. To recognize and extract features of different ECG waveforms, a neuro-fuzzy approach has been used. It applied two techniques to characterize the QRS complex, using Hermite polynomials and Hermite kernel expansion coefficients. Other research used Markov modeling techniques to detect and analyze the QRS complex and R-R intervals as ECG features for classifying ventricular arrhythmia. This technique was also used to detect low-amplitude P waves in the ECG signal. A wavelet transform-based approach extracts P, T, and QRS waves and baseline artifacts. Frequency-domain, time-domain, and statistical features have also been utilized for feature extraction. According to this approach, for each segment extracted from the signal, the attributes P, Q, R, S, and T are determined by the discrete wavelet transform [25,26].
An efficient approach to retrieving features from ECG signals has also been reported, in which a synthesis coding technique was presented. The algorithm encodes each heartbeat with a predefined distortion to extract ECG features while maintaining the defined distortion level. Zhao et al. used the wavelet transform and a support vector machine for ECG feature extraction and heart rhythm classification. A multiresolution wavelet transform was introduced in 2005 to extract features from ECG signals. It used the modified lead II for signal processing. The researchers applied two wavelet filters, Daubechies4 and Daubechies6, to records from the MIT-BIH Database.
2.2 Ventricular Diseases Classification Approaches
In ECG classification analysis, most researchers have focused on increasing classification accuracy. Several computational approaches have been applied, such as neural networks, digital signal analysis, statistical techniques, and support vector machines. Detection of life-threatening ventricular arrhythmias in real time was addressed by [30,31]. An algorithm named DIAGNOSIS was developed to classify ECG signals using four frequency-domain parameters. It used sets of rules based on comparing the parameters with predefined thresholds. Four types of signals were discriminated: ventricular fibrillation, imitative artifacts, ventricular rhythms, and predominant sinus rhythm. A sequential hypothesis method was used to discriminate between two ventricular arrhythmias: ventricular fibrillation and ventricular tachycardia. In a related medical application, researchers used automatic early detection based on artificial intelligence to assist ophthalmologists in examining the eyes, in order to avoid blindness and provide a more precise and reliable assessment of a patient's condition. Diabetic retinopathy (DR) is an inevitable retinal disease caused by diabetes. The patient's elevated blood sugar level causes DR, which is difficult to treat. Since no signs occur at the initial stage, it is hard to identify early. A generalized discriminant analysis (GDA) technique and a multi-layer perceptron (MLP) neural network classifier were utilized to develop an effective algorithm for arrhythmia classification. Nine features were obtained from heart rate variability (HRV) signals.
A novel multi-threaded fitness evaluation approach and a genetic algorithm were used to handle many data sets. A backpropagation neural network has been used with the discrete wavelet transform to classify ECG signals. The features were divided into two classes: discrete wavelet transform-based features and morphological features. Feed-forward backpropagation and a logistic regression variable selection method were used for classification [36,37].
There are several shortcomings in the techniques discussed above. Some approaches require a long processing time, while others depend on the characteristics of the ECG signal. Some of the classifiers need an artifact removal procedure. A few algorithms analyze only one part of the signal and detect only one or two abnormalities. A common problem with these techniques is that structural complexity grows as the number of training parameters increases.
3 Research Methodology
This paper's research methodology consists of preprocessing, resampling, and normalization, which are discussed in detail in the following sections. In this paper, the signals are modeled as:
$x(n) = s(n) + w(n)$ (1)
where $s(n)$ and $w(n)$ are the true ECG signal and additive white Gaussian noise, respectively. The spectrum of $w(n)$ is:
$W(k) = N_0$ (2)
where $N_0$ is a constant.
We carry out the following preprocessing steps so that all signals have the same sampling rate and the same signal power. The ECG signals we obtained are sampled at different sampling rates. In addition, the signals were recorded using different machines, which may introduce relative amplification or attenuation between signals.
Resampling: This step resamples the signals so that all signals have the same sampling rate.
Normalization: Signals are normalized so that the power of every signal is unity. Let $x_i(n)$ and $\bar{x}_i(n)$ be a signal of length $N$ and its normalized form, for each signal index $i$.
$\bar{x}_i(n) = x_i(n) / d$ (3)
where $d$ is defined as follows.
$d = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x_i^2(n)}$ (4)
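The two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: linear interpolation is assumed for resampling (the text does not name a method), power is taken as the mean squared sample value, and the function names are hypothetical.

```python
import math


def resample_linear(signal, fs_in, fs_out):
    """Resample to fs_out by linear interpolation (assumed method)."""
    if len(signal) < 2:
        return list(signal)
    duration = (len(signal) - 1) / fs_in
    n_out = int(math.floor(duration * fs_out)) + 1
    out = []
    for m in range(n_out):
        pos = m / fs_out * fs_in                 # position in input samples
        i = min(int(math.floor(pos)), len(signal) - 2)
        frac = pos - i
        out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
    return out


def normalize_power(signal):
    """Scale a signal so that its mean squared value (power) is one."""
    power = sum(v * v for v in signal) / len(signal)
    if power == 0.0:
        raise ValueError("cannot normalize an all-zero signal")
    d = math.sqrt(power)
    return [v / d for v in signal]


# One second of a 1 Hz sine at 360 Hz, brought to 250 Hz and unit power.
raw = [3.0 * math.sin(2 * math.pi * n / 360) for n in range(361)]
resampled = resample_linear(raw, 360, 250)
normalized = normalize_power(resampled)
```

Normalizing after resampling keeps the unit-power property exact for the samples that are actually passed on to later stages.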
3.2 Design of Detector
For a time-frequency distribution $P(\omega, t)$, the expected value of the frequencies at a particular time is defined by Eq. (5). As IF has no single definition, one good possibility is to define IF as the expectation of the frequencies of all the tones in the signal. We find IF for respiration. As the signals are discrete, the process is as follows.
For a signal $x(n)$ of length $N$, the spectrum $\Phi(k)$ of $x(n)$ is computed as:
$\Phi(k) = \left| \sum_{n=0}^{N-1} x(n) e^{-j 2 \pi k n / N} \right|^2$
For each frequency index $k$, $\Phi$ and $f$ are vectors of the spectral density and the corresponding frequency values. The instantaneous frequency (IF), $f_{IF}$, is calculated as given below.
$f_{IF} = \frac{\sum_k f(k) \Phi(k)}{\sum_k \Phi(k)}$
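As a rough illustration of the detector quantity, the sketch below computes a power-weighted mean frequency from a one-sided power spectrum. It assumes that the IF described above is this spectral expectation, that only non-negative frequencies are used, and that a direct DFT is acceptable for short signals; the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum via a direct DFT (O(N^2), sketch only)."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def instantaneous_frequency(x, fs):
    """Power-weighted mean frequency in Hz (assumed reading of the IF)."""
    n = len(x)
    phi = power_spectrum(x)
    freqs = [k * fs / n for k in range(len(phi))]
    total = sum(phi)
    if total == 0.0:
        raise ValueError("signal has no power")
    return sum(f * p for f, p in zip(freqs, phi)) / total


# A pure 5 Hz tone sampled at 100 Hz has its spectral mean at 5 Hz.
fs = 100.0
tone = [math.sin(2 * math.pi * 5.0 * n / fs) for n in range(200)]
if_hz = instantaneous_frequency(tone, fs)
```

Because the result is a ratio of two sums over the same spectrum, any constant scaling of the signal or of the spectrum cancels out.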
3.3 Design of Template
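Let $\{x_i[n]\}$ be a set of $M$ ECG signals of a disease, used for training, where $i$ is an integer and $1 \le i \le M$. For brevity, we write $x_i$ in place of $x_i[n]$ and $f$ in place of $f[n]$. Let us consider a template vector $f$ as a linear combination of all $x_i$ as follows.
$f = \sum_{i=1}^{M} k_i x_i$ (8)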
where and . The coefficients kj are computed as:
where is a dot product of xj and f. Consider a matrix as follows.
where xi (for ) are vectors and are columns of X. Eq. (8) is rewritten in matrix form, as follows.
where for . The matrix C is computed as:
3.3.1 Complexity Analysis of the Template Design
The complexity of the equation is computed as follows. The equation involves the multiplication of two matrices of sizes $M \times N$ and $N \times M$. Therefore, the complexity of the multiplication is $O(M^2 N)$. Then the inverse of a matrix of size $M \times M$ requires a complexity of $O(M^3)$. Therefore, the overall complexity for the computation of Eq. (16) is $O(M^2 N) + O(M^3)$. As $N > M$, the complexity turns out to be $O(M^2 N)$. Eq. (11) involves the multiplication of a matrix of size $N \times M$ with a vector of size $M \times 1$. Therefore, the complexity of this multiplication is $O(MN)$. Therefore, the overall complexity is $O(M^2 N)$.
It is assumed that the templates for different diseases differ from each other. The classifier is designed to calculate a signal's similarity with both templates and to decide the class label. This task is performed in the spectral domain. Eq. (5), with a change of units, is rewritten as:
$\Phi_{dB}(k) = 10 \log_{10} \Phi(k)$
where $\Phi(k)$ denotes the spectrum of Eq. (5). The sequence of spectral coefficients given by the above equation is called the power spectral density in decibels, where $f$ is the same as given by Eq. (6). Let $\Phi_x(k)$, $\Phi_{f_1}(k)$, and $\Phi_{f_2}(k)$ be the power spectral densities of a given signal and of both templates, respectively, with means $m_x$, $m_{f_1}$, and $m_{f_2}$. The similarity metric $S(\cdot)$ between $\Phi_x(k)$ and $\Phi_{f_i}(k)$ is computed as follows.
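One common way to compare two decibel-scaled spectra using their means is a mean-removed normalized correlation. The sketch below assumes that form purely for illustration; it may differ from the similarity metric used in this paper. The function names, the small floor inside the logarithm, and the template labels are hypothetical.

```python
import math


def to_decibels(psd, floor=1e-12):
    """Convert a power spectral density to decibels (floor avoids log(0))."""
    return [10.0 * math.log10(max(p, floor)) for p in psd]


def similarity(psd_a_db, psd_b_db):
    """Mean-removed normalized correlation in [-1, 1] (assumed metric)."""
    if len(psd_a_db) != len(psd_b_db):
        raise ValueError("spectra must have the same length")
    m_a = sum(psd_a_db) / len(psd_a_db)
    m_b = sum(psd_b_db) / len(psd_b_db)
    num = sum((a - m_a) * (b - m_b) for a, b in zip(psd_a_db, psd_b_db))
    den = math.sqrt(sum((a - m_a) ** 2 for a in psd_a_db)
                    * sum((b - m_b) ** 2 for b in psd_b_db))
    if den == 0.0:
        raise ValueError("a flat spectrum has no shape to compare")
    return num / den


def classify(psd_x_db, templates_db):
    """Return the label of the template most similar to the signal."""
    return max(templates_db,
               key=lambda label: similarity(psd_x_db, templates_db[label]))


# Toy spectra: the signal has the same shape as the first template.
template_1 = to_decibels([1.0, 10.0, 100.0, 10.0])
template_2 = to_decibels([100.0, 10.0, 1.0, 10.0])
signal_db = to_decibels([2.0, 20.0, 200.0, 20.0])
label = classify(signal_db, {"VT": template_1, "PVC": template_2})
```

Removing the means makes the comparison insensitive to a constant offset in decibels, which corresponds to a constant gain difference between recordings.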
3.5 Detection Algorithm
The algorithm for detecting a patient's condition, that is, whether a person belongs to the healthy or the diseased category, is given below.
3.6 Classification Algorithms
Classification has two parts. The first is the training process, given as the Training Algorithm. In the second part, signals are passed to the classification algorithm, given as the Testing Algorithm, to determine whether the signal shows ventricular tachycardia or premature ventricular contraction. The Training Algorithm is given below.
3.7 Complexity Analysis of Algorithms
The complexities of the steps of the Training and Classification Algorithms are given in Tabs. 1 and 2, respectively.
4 Results and Discussions
We used three datasets, namely the MIT-BIH Normal Sinus Rhythm Database (nsrdb), the MIT-BIH Arrhythmia Database (mitdb), and the MIT-BIH Malignant Ventricular Ectopy Database (vfdb), for detection and classification. Every ECG is in vector form, and this form of data is the input to the experiments. We carried out the simulations in MATLAB.
Data preprocessing included two steps: normalization and resampling. Normalization is performed for all signals in both detection and classification. Tab. 3 shows the datasets and their respective sampling rates. Tab. 4 shows the datasets for the Training and Testing Algorithms. For detection, signals are not resampled, as it is possible to compute instantaneous frequencies for any arbitrary sampling rate. For the classification of diseases, ECGs taken from the MIT-BIH Arrhythmia Database are resampled to a sampling interval of 0.004 s (250 Hz), as shown in Tabs. 5 and 6.
4.2 Results of Detection and Classification
As given in Tab. 7, 30 records were given as input to the training part of the detector. Tab. 7 shows the instantaneous frequencies of all 30 signals. Based on Tab. 7, we set the IF threshold to 0.7 Hz. Therefore, any IF value smaller than 0.7 Hz indicates that the given signal is diseased.
The detection rate during training is 100%. Next, the diseased signals were passed to the training part of the classifier to obtain the power spectral densities of the templates. The coefficient sets K1 and K2 obtained in training are plotted in Figs. 3a and 3b, and Tab. 8 lists their values.
Figs. 4 and 5 show the templates and their power spectral densities, respectively. A wide difference is observed between the power spectral densities of the two templates.
As given in Tab. 7, 28 records were given as input to the testing part of the detector. Tab. 7 shows the instantaneous frequencies of all 28 signals. All the diseased signals are below 0.7 Hz, and all the normal signals are above 0.7 Hz. Therefore, the 22 signals of the two diseases were passed to the classifier. Results of the classifier for both diseases are given in Tabs. 9 and 10.
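The decision rule described above amounts to a single comparison. The sketch below is a minimal illustration with hypothetical names; the example IF values are made up to fall inside the ranges reported in the abstract (0.08–0.6 Hz for diseased and 0.8–1.1 Hz for normal signals) and are not taken from Tab. 7.

```python
IF_THRESHOLD_HZ = 0.7  # threshold chosen from the training records


def detect(if_hz, threshold=IF_THRESHOLD_HZ):
    """Label a signal from its instantaneous frequency in Hz."""
    return "diseased" if if_hz < threshold else "normal"


# Hypothetical IF values, not measurements from the paper.
example_ifs = [0.12, 0.55, 0.85, 1.05]
labels = [detect(value) for value in example_ifs]
print(labels)
```

Only signals labelled as diseased would then be passed on to the classifier.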
4.3 Performance Analysis
In order to test the effectiveness of the proposed detector and classifier, standard performance metrics have been used, such as sensitivity, specificity, and precision.
Considering Tab. 7, with the IF threshold set to 0.7, we have 6 normal (NP = 6) and 22 diseased signals (NN = 22) as the test dataset for the detector; thus N = 28. All 6 healthy signals were classified correctly. Therefore, TP = 6 and FP = 0. Likewise, TN = 22 and FN = 0. Ideally, the values of all these performance measures must be 1. We found that the detector achieves 100% on these measures, whereas the performance of the classifier in the classification of arrhythmias is 77.27 percent.
Considering Tab. 7, we have taken 15 signals from the MIT-BIH Arrhythmia Database with the label PVC (NP = 15) and 7 signals from the MIT-BIH Malignant Ventricular Ectopy Database with the label VF (NN = 7) as the test dataset for the classifier; thus N = 22. Again from Tab. 4, it is clear that TP = 12, FP = 3, TN = 5 and FN = 2. Ideally, the values of all these measurements must be high. With these counts, the fraction of correctly labelled signals is (TP + TN)/N = 17/22 ≈ 77.27%. Performance analysis of the classifier is given in Tab. 11.
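The arithmetic behind these figures can be checked with a short sketch. The functions use the textbook definitions of accuracy, precision, sensitivity, and specificity; whether Tab. 11 uses exactly these definitions and the same assignment of counts is an assumption, so only the accuracy values are compared with the text.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all signals that were labelled correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def _ratio(num, den):
    """Safe division that returns NaN for an empty denominator."""
    return num / den if den else float("nan")


def precision(tp, fp):
    return _ratio(tp, tp + fp)


def sensitivity(tp, fn):
    return _ratio(tp, tp + fn)


def specificity(tn, fp):
    return _ratio(tn, tn + fp)


# Counts as reported in the text.
detector = {"tp": 6, "fp": 0, "tn": 22, "fn": 0}
classifier = {"tp": 12, "fp": 3, "tn": 5, "fn": 2}

print(f"detector accuracy:   {accuracy(**detector):.4f}")
print(f"classifier accuracy: {accuracy(**classifier):.4f}")
```

For the classifier, 17 of the 22 signals are labelled correctly, which reproduces the 77.27% figure quoted for it.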
In this research, we designed algorithms for the detection and classification of ventricular cardiac diseases. First, the detection algorithm detects whether a signal is diseased. Normal signals were taken from the MIT-BIH Normal Sinus Rhythm Database. If a given signal is diseased, it is passed to the classifier to identify the disease. We considered only two diseases: ventricular tachycardia (taken from the MIT-BIH Malignant Ventricular Ectopy Database) and premature ventricular contraction (taken from the MIT-BIH Arrhythmia Database). The signals taken from every database were divided into two sets. One set was used to train the detector and the classifier. The other set was used to test and observe the performance of the proposed detector and classifier. The main conclusions drawn from the evaluation are as follows. Removal of noise and baseline wander is not necessary if the signals are transformed to the spectral domain. As the signals are assumed to contain additive white Gaussian noise, a constant value is added to the signals' spectrum. This does not affect the value of the instantaneous frequencies and does not affect decisions drawn from the spectra of the templates. As far as baseline wander is concerned, its magnitude is so small that it has little effect on the IF and on the spectra of the templates. It has been observed that the same results are obtained if X1 is resampled to the sampling rate of X2 and vice versa. The high values of instantaneous frequency for normal signals show that high-frequency components are suppressed in diseased signals.
Funding Statement: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (IITP-2021-2020-0-01832) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation) and the Soonchunhyang University Research Fund.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.