In recent times, big data analytics using Machine Learning (ML) has offered several merits for the assimilation and validation of massive quantities of complicated healthcare data. ML models are more scalable and flexible than conventional statistical tools, which makes them suitable for risk stratification, diagnosis, classification and survival prediction. In spite of these benefits, the use of ML in the healthcare sector faces challenges: it requires massive training data, data preprocessing, model training and parameter optimization tailored to the clinical problem. To resolve these issues, this paper presents a new Big Data Analytics with Optimal Elman Neural Network (BDA-OENN) model for clinical decision support systems. The focus of the BDA-OENN model is to design a diagnostic tool for Autism Spectrum Disorder (ASD), a neurological illness related to communication, social skills and repetitive behaviors. The presented BDA-OENN model involves different stages of operation: data preprocessing, synthetic data generation, classification and parameter optimization. The Synthetic Minority Over-sampling Technique (SMOTE) is used to generate synthetic data, and the Hadoop Ecosystem is employed to manage the big data. The OENN model is then used for classification, with the parameters of the ENN model optimally tuned using the Binary Grey Wolf Optimization (BGWO) algorithm. A detailed set of simulations was performed to highlight the improved performance of the BDA-OENN model. The experimental values show that the BDA-OENN model outperforms the other methods in terms of distinct performance measures.
Intelligent healthcare systems assist in making better decisions, which in turn enables improved medical services to be provided to the patient. At the same time, skin lesion is a deadly disease that affects people of all age groups. Early skin lesion segmentation and classification play a vital role in the precise diagnosis of skin cancer by an intelligent system. However, the automated diagnosis of skin lesions in dermoscopic images is challenging because of problems such as artifacts (hair, gel bubbles, ruler markers), blurry boundaries, poor contrast and the variable sizes and shapes of the lesions. To address these problems, this study develops an Intelligent Multi-Level Thresholding with Deep Learning (IMLT-DL) based skin lesion segmentation and classification model using dermoscopic images. First, the presented IMLT-DL model applies top-hat filtering and an inpainting technique to preprocess the dermoscopic images. Next, Mayfly Optimization (MFO) with multilevel Kapur's thresholding-based segmentation is used to determine the affected region. An Inception v3 based feature extractor is then applied to derive a useful set of feature vectors. Finally, classification is carried out using a Gradient Boosting Tree (GBT) model. The performance of the presented model is evaluated on the International Skin Imaging Collaboration (ISIC) dataset, and the experimental outcome is examined under distinct evaluation measures. The experimental values show that the proposed IMLT-DL model outperforms the existing methods, achieving a higher accuracy of 99.2%.
In recent times, big data in the healthcare field has grown significantly, with useful datasets that are highly complex and massive. In the medical field, the size of the information qualifies it as big data. Several limitations exist, such as the heterogeneity, speed and variation of information in healthcare [
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by pervasive deficits in social communication, together with restricted interests and repetitive behavior. The conventional concept covers conditions that were previously treated as distinct, such as disintegrative disorder, Asperger's syndrome and autistic disorder [
Wall et al. [
The workflow of BDA-OENN model is illustrated in
To manage the big data, the Hadoop Ecosystem and its components are used extensively. Hadoop is an open-source framework that allows stakeholders to store and process big data on a computer cluster in a shared platform, using simple programming methods. It scales from an individual server to more than 1000 nodes while providing fault tolerance and enhanced scalability. The three major components of Hadoop are (i) Hadoop YARN, (ii) MapReduce and (iii) the Hadoop Distributed File System (HDFS).
HDFS is modeled on the Google File System (GFS). It follows a master/slave architecture in which a name node holds the metadata and one or more data nodes hold the actual data.
Hadoop MapReduce is the programming architecture at the heart of Apache Hadoop, and it provides massive scalability across Hadoop clusters of 1000 nodes. MapReduce is used to process huge volumes of data on large clusters. Task processing in MapReduce consists of two significant phases, the Map stage and the Reduce stage. Both phases take key-value pairs as input and produce key-value pairs as output, and the input and output of a task are stored in the file system. The framework handles task scheduling, failure handling and task re-execution. The MapReduce framework comprises one master resource manager for the entire cluster and one slave node manager per cluster node.
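The Map and Reduce stages described above can be illustrated with a minimal single-process sketch in plain Python. It is not Hadoop code and not the authors' implementation: the record layout (a dict with a `label` field) and the helper names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are assumptions made for illustration. The job counts how many records fall into each class.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    # Assumption: each record is a dict with a "label" field ("YES"/"NO").
    yield record["label"], 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single output pair."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical toy input: three screening records with their class labels.
records = [{"label": "YES"}, {"label": "NO"}, {"label": "YES"}]
class_counts = run_job(records)
```

On a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data across the network; the sketch only shows the key-value contract between the two stages.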
Hadoop YARN is used to manage the cluster. It is the key feature of second-generation Hadoop and was designed from the knowledge gained with the first Hadoop generation. YARN acts as the central architecture and resource manager of a Hadoop cluster, providing data governance tools, security and consistent operation. Other tools and framework components for dealing with big data may be installed on top of the Hadoop framework.
The SMOTE technique is used to generate synthetic samples that expand the input medical data into a large volume of big data. SMOTE is an oversampling method presented by Chawla et al. [
where
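The interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration of the general technique, not the configuration used in this work: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters and the toy data are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors.

    Assumes at least two minority samples are available.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point on the segment between the base sample and its neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class: four points on the unit square.
minority = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(minority, n_new=10, k=2)
```

Because every synthetic point lies between two existing minority samples, the generated data stays inside the region already occupied by that class.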
Once the synthesized data has been generated, the ENN model is applied for classification of medical data. The ENN presented by Xiaobo et al. [
The ENN has
with
The input of the hidden layer comprises two parts, the external input and the context input, given by
The aim of the network is to minimize the error, which can be given by:
To reduce
At this point,
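As an illustration of the two-part hidden-layer input and the error objective, the following is a minimal sketch of an Elman-style forward pass. It assumes a tanh hidden activation, a linear output layer and a sum-of-squares error, none of which is confirmed by the text; the names `elman_forward`, `sum_squared_error`, `w_in`, `w_ctx` and `w_out` are hypothetical.

```python
import math


def elman_forward(sequence, w_in, w_ctx, w_out):
    """Forward pass of an Elman-style network over a sequence of input vectors.

    w_in[j]  : weights from the external input to hidden unit j
    w_ctx[j] : weights from the context layer to hidden unit j
    w_out[k] : weights from the hidden layer to output unit k
    """
    n_hidden = len(w_in)
    context = [0.0] * n_hidden  # context layer starts at zero
    outputs = []
    for x in sequence:
        hidden = []
        for j in range(n_hidden):
            external = sum(w * xi for w, xi in zip(w_in[j], x))
            recurrent = sum(w * c for w, c in zip(w_ctx[j], context))
            # Hidden input = external part + context part (assumed tanh unit).
            hidden.append(math.tanh(external + recurrent))
        outputs.append([sum(w * h for w, h in zip(row, hidden)) for row in w_out])
        context = hidden  # the context layer stores the previous hidden state
    return outputs


def sum_squared_error(outputs, targets):
    """Assumed objective: half the sum of squared output errors."""
    return 0.5 * sum(
        (t - o) ** 2
        for ys, ts in zip(outputs, targets)
        for o, t in zip(ys, ts)
    )


# Tiny example: one input, one hidden unit, one output, all weights equal to 1.
out = elman_forward([[1.0], [0.0]], w_in=[[1.0]], w_ctx=[[1.0]], w_out=[[1.0]])
```

In the second time step the external input is zero, so the output depends only on the context layer, which is what distinguishes this network from a plain feed-forward one.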
To tune the performance of the ENN model, parameter optimization is carried out using the BGWO algorithm. GWO is a recently developed metaheuristic algorithm inspired by the way grey wolves search for and hunt their prey. Generally, the wolves live in a group of 5–12 members. The wolves in GWO are separated as
where
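The search procedure can be sketched as below. This is a generic binary GWO in the spirit of the sigmoid-transfer variant from the literature, not the exact formulation of this work: the three best wolves (alpha, beta, delta) guide the others, and the averaged continuous position is converted to a bit through a sigmoid. The function name `bgwo`, the population size, the iteration count and the toy fitness function are assumptions; in the setting of this paper the fitness would be the classification error of the ENN for a candidate parameter setting.

```python
import math
import random


def bgwo(fitness, n_bits, n_wolves=8, n_iter=30, seed=0):
    """Minimise `fitness` over bit vectors with a binary grey wolf search.

    Requires n_wolves >= 3 so that alpha, beta and delta exist.
    """
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for it in range(n_iter):
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = ranked[0][:], ranked[1][:], ranked[2][:]
        if fitness(alpha) < fitness(best):
            best = alpha[:]
        a = 2.0 * (1 - it / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            new = []
            for d in range(n_bits):
                steps = []
                for leader in (alpha, beta, delta):
                    coef_a = 2 * a * rng.random() - a
                    coef_c = 2 * rng.random()
                    dist = abs(coef_c * leader[d] - wolf[d])
                    steps.append(leader[d] - coef_a * dist)
                x = sum(steps) / 3.0  # average of the three guided moves
                # Sigmoid transfer: continuous position -> probability of bit 1.
                prob = 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5)))
                new.append(1 if rng.random() < prob else 0)
            new_wolves.append(new)
        wolves = new_wolves
    final = min(wolves, key=fitness)
    return final if fitness(final) < fitness(best) else best


# Hypothetical toy fitness: Hamming distance to a fixed target mask.
target = [1, 0, 1, 1, 0, 0]


def toy_fitness(bits):
    return sum(b != t for b, t in zip(bits, target))


solution = bgwo(toy_fitness, n_bits=len(target))
```

Keeping track of the best wolf seen so far guarantees that the returned solution is never worse than the best member of the initial population.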
This section validates the ASD diagnostic performance of the BDA-OENN model on three benchmark datasets namely ASD-Children Dataset, ASD-Adolescent Dataset and ASD-Adult Dataset. The details related to the dataset are provided in
S.No. | Dataset Name | Sources | Number of Attributes | Number of Instances |
---|---|---|---|---|
1 | ASD-Children Dataset | UCI | 21 | 292 |
2 | ASD-Adolescent Dataset | UCI | 21 | 104 |
3 | ASD-Adult Dataset | UCI | 21 | 704 |
Number | Attributes Description |
---|---|
1 | Patient age |
2 | Sex |
3 | Ethnicity |
4 | Born with jaundice |
5 | Family Member with Pervasive Developmental Disorders (PDD) |
6 | Who Completed the Test |
7 | Country of Residence |
8 | Whether the Screening App Was Used Before |
9 | Screening Test Type |
10–19 | Answers to the 10 Questions of the Screening Method |
20 | Screening Score |
21 | Target Class [Yes/No] |
Dataset | Sensitivity | Specificity | Accuracy | F-score | Kappa |
---|---|---|---|---|---|
Proposed OENN | |||||
ASD-Children | 98.13 | 98.65 | 98.17 | 98.25 | 98.02 |
ASD-Adolescent | 96.47 | 98.94 | 97.86 | 96.90 | 97.36 |
ASD-Adult | 97.43 | 98.21 | 97.94 | 97.80 | 97.21 |
Proposed BDA-OENN | |||||
ASD-Children | 98.83 | 98.90 | 98.89 | 98.65 | 98.42 |
ASD-Adolescent | 97.21 | 99.10 | 98.43 | 98.12 | 98.23 |
ASD-Adult | 98.89 | 99.34 | 98.95 | 98.86 | 98.67 |
On the ASD-Adult dataset, the OENN approach reached a sensitivity, specificity, accuracy, F-score and kappa of 97.43%, 98.21%, 97.94%, 97.80% and 97.21% respectively. On the ASD-Children dataset, the BDA-OENN model obtained a sensitivity, specificity, accuracy, F-score and kappa of 98.83%, 98.90%, 98.89%, 98.65% and 98.42% respectively. On the ASD-Adolescent dataset, the BDA-OENN model obtained 97.21%, 99.10%, 98.43%, 98.12% and 98.23% for the same measures. Likewise, on the ASD-Adult dataset, the BDA-OENN technique attained a sensitivity, specificity, accuracy, F-score and kappa of 98.89%, 99.34%, 98.95%, 98.86% and 98.67% respectively.
A detailed comparison of the proposed BDA-OENN model with other existing techniques is given in
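For reference, the five measures reported here can be computed from a binary confusion matrix as in the following sketch. It uses the standard definitions, with Cohen's kappa as the agreement measure; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return the five measures (as fractions) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fn) / total) * ((tp + fp) / total)
    p_no = ((fp + tn) / total) * ((fn + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
        "kappa": kappa,
    }


# Hypothetical counts: 50 positive and 50 negative cases.
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
```

Multiplying each value by 100 gives percentages in the same form as the table entries.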
Methods | Sensitivity | Specificity | Accuracy | F-score | Kappa |
---|---|---|---|---|---|
BDA-OENN (Children) | 98.83 | 98.90 | 98.89 | 98.65 | 98.42 |
BDA-OENN (Adolescent) | 97.21 | 99.10 | 98.43 | 98.12 | 98.23 |
BDA-OENN (Adult) | 98.89 | 99.34 | 98.95 | 98.86 | 98.67 |
OENN (Children) | 98.13 | 98.65 | 98.17 | 98.25 | 98.02 |
OENN (Adolescent) | 96.47 | 98.94 | 97.86 | 96.90 | 97.36 |
OENN (Adult) | 97.43 | 98.21 | 97.94 | 97.80 | 97.21 |
QODF-DSAN | 97.86 | 97.37 | 97.60 | 97.51 | 95.19 |
Decision tree | 53.30 | 54.90 | 54.70 | – | – |
Logistic regression | 55.50 | 62.60 | 59.10 | – | – |
Neural network | 53.30 | 71.20 | 62.00 | – | – |
k-Nearest neighbor | 46.60 | 72.10 | 61.80 | – | – |
SVM (linear) | 57.10 | 66.70 | 61.40 | – | – |
RF-CART | 82.06 | 77.02 | 80.71 | – | – |
Opt. KNN | – | – | 69.20 | – | – |
Opt. LR | – | – | 68.60 | – | – |
Opt. RF | – | – | 67.78 | – | – |
This paper develops an effective BDA-OENN model for clinical decision support systems to diagnose ASD accurately. The presented BDA-OENN model involves different stages of operation: data preprocessing, synthetic data generation, classification and parameter optimization. The medical data is first preprocessed in three ways: data transformation, class labeling and min-max based data normalization. Next, the preprocessed data is fed into the SMOTE technique to create big healthcare data. The big data is then analyzed in the Hadoop Ecosystem environment, where the actual classification process is executed. Lastly, the OENN based classification model is applied to determine the class labels, with the parameters of the OENN model tuned using the BGWO algorithm. Extensive experimental analysis was carried out to assess the classification performance of the BDA-OENN model on the applied ASD datasets. The experimental values show that the BDA-OENN model outperforms the other methods in terms of distinct performance measures. The BDA-OENN model achieved a maximum sensitivity of 98.89% and specificity of 99.34% on the ASD-Adult dataset. On the ASD-Children dataset, it achieved an F-score of 98.65% and a kappa of 98.42%, while on the ASD-Adult dataset it achieved a higher F-score of 98.86% and kappa of 98.67%. In future, the proposed BDA-OENN method can be extended to social media information using dimensionality reduction and clustering techniques. Different machine learning algorithms may also be applied to reduce time complexity and improve the performance of BDA-OENN.