Lung cancer is a poorly understood disease. Smokers may develop lung cancer through the inhalation of carcinogenic substances while smoking, but non-smokers may develop the disease as well. Lung cancer can spread to other parts of the body, a process called metastasis. Because lung cancer is difficult to identify in its initial stages, the objective of this work is to reduce the mortality rate of the disease by identifying it at an earlier stage based on the existing symptoms. Artificial intelligence plays an active role in tasks such as entropy extraction through preprocessing strategies, ordinal-to-cardinal value conversions, table normalizations for easy meta computations, and preparation of machine learning tools for iterative processes to achieve rational convergence. The machine learning methodologies incorporated in this work are the cross-validation classification tree, random forest cross-validation classification, and random tree, all of which are combined in an ensemble algorithm. The ensemble algorithm classifies lung cancer with maximum precision. The classification achieves 94.3% accuracy, the highest precision rate in comparison with the conventional methodologies. Semantic preprocessing of a lung cancer training set is performed with least entropy, and then translation-, aggregation-, and navigation-based methodologies are applied to identify the disease at its initial stage.
Lung cancer is one of the most common types of cancer today. It is a malignant tumour capable of growing at rapid rates in an uncontrolled manner. The malignancy of the tumour can be determined with the help of the ground-glass opacity strategy as well as image cropping and feature extraction using gray-level co-occurrence matrices (GLCM). Classification procedures have been accomplished with the help of the naive Bayes classifier. The outcome obtained by using these approaches in a previous study was an increase of 8.34% in the accuracy rate, 11.76% in the sensitivity rate, and 5.26% in the specificity [
In a study by Wu et al., classifiers were employed to identify lung cancer in its earlier stages. The association between the radiomic features and tumour histologic sub-types of lung cancer patients was revealed. Random forest, naive Bayes, and K-nearest neighbours (KNN) classification methods were used. The naive Bayes classifier outperformed the other classifiers and thus achieved the highest area under the curve (AUC) [
Every year, around 1,600,000 deaths due to lung cancer are recorded, which is higher than the number of deaths caused by other types of cancer (including breast and prostate cancer). Tobacco use has been the reason for the death of approximately 7 million people every year globally, and more than 89,000 deaths have been recorded due to exposure to second-hand smoke. Cigarette smoking is the main cause of lung cancer, contributing to 80% of lung cancer cases worldwide. According to the American Cancer Society, there would be around 235,760 new cases of lung cancer in the United States in 2021 (119,100 in men and 116,660 in women), and lung cancer was expected to claim the lives of 131,880 people (69,410 men and 62,470 women).
In order to analyze the prediction of survival rates from electronic health records and to guide treatment from there on, we use methods such as naïve Bayes, support vector machine (SVM), and classification trees (C4.5) in this study; the latter method is selected because classification trees have been found to produce enhanced lung cancer prediction results [
This study proposes that the specimen of bronchial biopsy could be used as a substitute for the analysis of DNA methylation in patients with untreatable lung cancer [
The main goal of a study by Kureshi et al. was to represent the relationships between patients' symptoms and tumour responses in the later stages of NSCLC. A support vector machine, a supervised learning model, and a rule-based classifier were used. These methods were observed to be promising approaches in supporting the selection of patients for the targeted treatments of advanced NSCLC [
Dass et al. analyzed gene mutations and gene expression data for the phenotypic classification of lung cancer. The methods involved included an integrated classification hierarchical induction algorithm, a cross-validation technique, and the J48 Weka tool. The outcomes indicated that the improved decision tree worked best, resulting in higher accuracy, which could reduce the burden of examination on patients [
In a study by Naftchali and colleagues, the goal was to produce a computationally intelligent predictive model of chemotherapy effectiveness/futility in patients in order to prevent unnecessary treatment. The method was applied in two steps. The first step was a purposeful cleansing technique involving the chi-square distribution, SVM recursive feature elimination (SVM-RFE), and a correlation 2D matrix, all of which were employed on the NSCLC gene expression dataset as a novel dimensionality reduction method to tackle the curse of dimensionality and to identify the chemotherapy target genes from tens of thousands of features. A basic mathematical approach to the issue of pattern classification is Bayesian decision theory, which uses probability to calculate the tradeoffs between different classification decisions and the costs associated with them. The results of this study suggested that the deep learning feature selection approach improved the precision of classifying patients eligible for chemotherapy by minimizing the dimensionality. The results also indicated the approach would be powerful when used on medical datasets containing a small training set coupled with numerous features [
The aim of the study by Dong et al. was to develop a small-cell lung cancer (SCLC) genetic database through comprehensive ResNet relationship data analysis, in which 557 SCLC target genes were curated. Multiple levels of associations between these genes and SCLC were analyzed. The methods included sparse representation-based variable selection (SRVS) for gene selection in four SCLC gene expression datasets, followed by a case-control classification procedure. The results suggested that, for a given SCLC patient group, a gene vector may be present among the 557 recorded SCLC genes that possesses notable prediction power. Thus, SRVS is effective in identifying the optimal gene subset for customized treatment [
The previous related works illustrate that lung cancer detection has been accomplished with the help of various classification methods and genetic algorithms. Classification mechanisms such as naïve Bayes, ANN, DBN, KNN, Fisher linear discriminant analysis, self-adaptive machine learning, SVM, computed tomography, and the K-means neighbour network were used, and feature selection using deep learning methodologies was applied in the previous strategies. In our proposed work, three different methodologies are implemented to determine their accuracy levels: a cross-validation classification tree using the rpart function in R, random forest cross-validation classification, and a random tree.
Dataset pre-processing can essentially eliminate the outliers and inconsistencies in real-time data. Statistical modelling helps in resolving the problem of missing data in real-time systems. Entropy is created by the irrelevant and incomplete data in the dataset. Highly precise cells can be formed by converting the dataset to its rational format; this transformation further assists in the elimination of the newly created entropies. Such converted datasets can now be used for analysing real-time systems in applications such as hospital information systems, enterprise resource planning, customer relationship management, and finance management in the banking sector.
The ensemble and hybrid ensemble models have been observed to offer greater accuracy for applications such as healthcare analysis of lung carcinoma because of their cascading classification methodologies. A single-classifier prediction model would not be accepted in industry today; rather, there is great demand for numerous methodologies for choosing between existing alternatives. In mathematical modelling, a hybrid ensemble is composed of concurrent classifiers that can be applied to a single dataset to obtain highly accurate outcomes. At times, the dataset can be trained according to its associated models and incorporated into ensemble models or classifiers; otherwise, a special type of classifier such as an SVM can be applied to enhance the accuracy of the outcomes. Hybrid ensemble models can be created by applying artificial intelligence to the ensemble models. Intelligence can be achieved via heuristic and meta-heuristic methods. Currently, many applications rely on empirical methodologies for achieving intelligent hidden answers to their specific problems.
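As a minimal illustration of the ensemble idea described above, the following Python/scikit-learn sketch combines a classification tree, a random forest, and a randomized tree under hard (majority) voting with 10-fold cross-validation. This is not the paper's actual R/Weka pipeline; the synthetic dataset and all estimator settings are assumptions for the demo.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Placeholder data standing in for a symptom table (13 features, binary class)
X, y = make_classification(n_samples=200, n_features=13, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("cart", DecisionTreeClassifier(random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("rand_tree", ExtraTreeClassifier(random_state=0)),
    ],
    voting="hard",  # each member casts one vote; the majority class wins
)

# Mean accuracy of the combined classifier over 10 cross-validation folds
scores = cross_val_score(ensemble, X, y, cv=10)
print(round(scores.mean(), 3))
```

Hard voting mirrors the equally weighted voting model defined later in the paper: each constituent classifier contributes one unit vote per input.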
The sample training set for lung cancer prediction is shown in the
By applying the method of linear regression analysis, the real-time response data are denoted as
Using the principle of least squares, we can construct a function m = γ + δᵀl to fit the data (lᵢ, mᵢ), 1 ≤ i ≤ n. Using the mean squared error (MSE), we can find the appropriate fit. The parameter values γ and δ are estimated by minimizing the MSE on the observed set (
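The least-squares fit of m = γ + δᵀl can be sketched numerically as follows. The data here are synthetic, with assumed true values γ = 2.0 and δ = 3.0, chosen purely for illustration.

```python
import numpy as np

# Synthetic (l_i, m_i) pairs lying near a line m = gamma + delta * l
rng = np.random.default_rng(0)
l = np.linspace(0.0, 10.0, 50)
m = 2.0 + 3.0 * l + rng.normal(scale=0.1, size=l.size)

# Least-squares estimates: solve for the intercept gamma and slope delta
# that minimise the mean squared error on the observed set
A = np.column_stack([np.ones_like(l), l])
(gamma, delta), *_ = np.linalg.lstsq(A, m, rcond=None)

# Mean squared error of the fitted line on the observed set
mse = np.mean((m - (gamma + delta * l)) ** 2)
print(round(gamma, 1), round(delta, 1))
```

With small noise, the recovered estimates land close to the assumed values, and the MSE measures the quality of the fit.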
To perform cross-validation, the test error rate is evaluated on the held-out point after fitting the model on every point except
represents the identifying sample ‘
For a single classification,
For linear regression, let us consider
where
For
This approximation is valid for small values of
The trees are built from independent and identically distributed random vectors, and each tree casts a unit vote for the most popular class at a given input. The equally weighted voting model is defined as
Given an ensemble of classifiers {h1(l), h2(l), …, hm(l)}, each of which can acquire a classification procedure, a classifier hk(l) is shorthand for h(l, μk). Using the training set drawn at random from the distribution of the random vectors, the margin function is defined as
where I(.) is the indicator function. The margin measures the extent to which the average number of votes for the right class exceeds the average vote for any other class. The larger the margin, the greater the confidence in the classification.
Generalization error is defined by PE* = PL,M(mg(L, M) < 0), and its upper bound is determined by
The strength of the classifier set can be determined by the following relationship:
The strength of the set of classifiers {h(l, μ)} is S = EL,M[Pμ(h(L, μ) = M) − maxj≠M Pμ(h(L, μ) = j)].
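The margin function above can be computed directly from the votes of an ensemble. The sketch below is a minimal Python illustration of this Breiman-style margin; the vote lists are hypothetical, not taken from the paper's experiments.

```python
from collections import Counter

def margin(votes, true_class):
    """Breiman-style margin: average vote share for the true class minus
    the largest average vote share for any other class."""
    k = len(votes)
    counts = Counter(votes)
    right = counts[true_class] / k
    wrong = max((c / k for cls, c in counts.items() if cls != true_class),
                default=0.0)
    return right - wrong

# Hypothetical ensemble of 10 trees: 7 vote YES for a truly-YES sample,
# so the margin is 0.7 - 0.3 = 0.4 (positive => confident, correct vote).
print(round(margin(["YES"] * 7 + ["NO"] * 3, "YES"), 2))
```

A negative margin corresponds to a misclassified input, which is exactly what the generalization error PE* counts.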
The training dataset for lung carcinoma is taken for the illustration work. The trained data are depicted in
Tr | Gn | Ag | Sm | Yf | Ax | Pp | Cd | Ft | Al | Wh | Ac | Co | Sob | Sd | Cp | Lc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | M | 69 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | YES |
2 | M | 74 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | YES |
3 | F | 59 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | NO |
4 | M | 63 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 | NO |
5 | F | 63 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | NO |
6 | F | 75 | 1 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 1 | YES |
7 | M | 52 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | YES |
8 | F | 51 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | YES |
9 | F | 68 | 2 | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NO |
10 | M | 53 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | YES |
11 | F | 61 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | 1 | YES |
12 | M | 72 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | YES |
13 | F | 60 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | NO |
14 | M | 58 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | YES |
15 | M | 69 | 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 2 | NO |
16 | F | 48 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1 | YES |
17 | M | 75 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | YES |
18 | M | 57 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 2 | 2 | YES |
19 | F | 68 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | YES |
20 | F | 61 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | NO |
21 | F | 44 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | YES |
22 | F | 64 | 1 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | YES |
23 | F | 21 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | NO |
24 | M | 60 | 2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | YES |
25 | M | 72 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | YES |
26 | M | 65 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | YES |
27 | F | 61 | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 2 | YES |
28 | M | 69 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | NO |
29 | F | 53 | 2 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | 1 | 2 | 2 | 2 | YES |
Legend 1: Tr-Transaction Table, Gn-Gender, Ag-Age, Sm-Smoking, Yf-Yellow_Fingers, Ax-Anxiety, Pp-Peer Pressure, Cd-Chronic Disease, Ft-Fatigue, Al-Allergy, Wh-Wheezing, Ac-Alcohol Consuming, Co-Coughing, Sob-Shortness Of Breath, Sd-Swallowing Difficulty, Cp-Chest Pain, Lc-Lung_Cancer. In
Tr | Gn | Ag | Sm | Yf | Ax | Pp | Cd | Ft | Al | Wh | Ac | Co | Sob | Sd | Cp | Class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | M | 69 | 0.021 | 0.044 | 0.045 | 0.025 | 0.023 | 0.041 | 0.023 | 0.043 | 0.047 | 0.043 | 0.039 | 0.047 | 0.043 | YES |
2 | M | 74 | 0.042 | 0.022 | 0.023 | 0.025 | 0.047 | 0.041 | 0.047 | 0.022 | 0.023 | 0.022 | 0.039 | 0.047 | 0.043 | NO |
3 | F | 59 | 0.021 | 0.022 | 0.023 | 0.05 | 0.023 | 0.041 | 0.023 | 0.043 | 0.023 | 0.043 | 0.039 | 0.023 | 0.043 | NO |
4 | M | 63 | 0.042 | 0.044 | 0.045 | 0.025 | 0.023 | 0.02 | 0.023 | 0.022 | 0.047 | 0.022 | 0.02 | 0.047 | 0.043 | NO |
5 | F | 63 | 0.021 | 0.044 | 0.023 | 0.025 | 0.023 | 0.02 | 0.023 | 0.043 | 0.023 | 0.043 | 0.039 | 0.023 | 0.022 | NO |
6 | F | 75 | 0.021 | 0.044 | 0.023 | 0.025 | 0.047 | 0.041 | 0.047 | 0.043 | 0.023 | 0.043 | 0.039 | 0.023 | 0.022 | NO |
7 | M | 52 | 0.042 | 0.022 | 0.023 | 0.025 | 0.023 | 0.041 | 0.023 | 0.043 | 0.047 | 0.043 | 0.039 | 0.023 | 0.043 | NO |
8 | F | 51 | 0.042 | 0.044 | 0.045 | 0.05 | 0.023 | 0.041 | 0.047 | 0.022 | 0.023 | 0.022 | 0.039 | 0.047 | 0.022 | YES |
9 | F | 68 | 0.042 | 0.022 | 0.045 | 0.025 | 0.023 | 0.041 | 0.023 | 0.022 | 0.023 | 0.022 | 0.02 | 0.023 | 0.022 | NO |
10 | M | 53 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.02 | 0.047 | 0.022 | 0.047 | 0.022 | 0.02 | 0.047 | 0.043 | YES |
11 | F | 61 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.041 | 0.023 | 0.043 | 0.023 | 0.043 | 0.039 | 0.047 | 0.022 | YES |
12 | M | 72 | 0.021 | 0.022 | 0.023 | 0.025 | 0.047 | 0.041 | 0.047 | 0.043 | 0.047 | 0.043 | 0.039 | 0.023 | 0.043 | YES |
13 | F | 60 | 0.042 | 0.022 | 0.023 | 0.025 | 0.023 | 0.041 | 0.023 | 0.022 | 0.023 | 0.022 | 0.039 | 0.023 | 0.022 | NO |
14 | M | 58 | 0.042 | 0.022 | 0.023 | 0.025 | 0.023 | 0.041 | 0.047 | 0.043 | 0.047 | 0.043 | 0.039 | 0.023 | 0.043 | YES |
15 | M | 69 | 0.042 | 0.022 | 0.023 | 0.025 | 0.023 | 0.02 | 0.047 | 0.043 | 0.047 | 0.043 | 0.02 | 0.023 | 0.043 | NO |
16 | F | 48 | 0.021 | 0.044 | 0.045 | 0.05 | 0.047 | 0.041 | 0.047 | 0.043 | 0.023 | 0.043 | 0.039 | 0.047 | 0.022 | YES |
17 | M | 75 | 0.042 | 0.022 | 0.023 | 0.025 | 0.047 | 0.02 | 0.047 | 0.043 | 0.047 | 0.043 | 0.039 | 0.023 | 0.043 | YES |
18 | M | 57 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.02 | 0.023 | 0.022 | 0.047 | 0.022 | 0.02 | 0.047 | 0.043 | YES |
19 | F | 68 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.041 | 0.023 | 0.022 | 0.023 | 0.043 | 0.039 | 0.023 | 0.022 | YES |
20 | F | 61 | 0.021 | 0.022 | 0.023 | 0.025 | 0.047 | 0.041 | 0.023 | 0.022 | 0.023 | 0.022 | 0.039 | 0.023 | 0.022 | NO |
21 | F | 44 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.041 | 0.023 | 0.022 | 0.023 | 0.022 | 0.039 | 0.047 | 0.022 | YES |
22 | F | 64 | 0.021 | 0.044 | 0.045 | 0.05 | 0.023 | 0.02 | 0.047 | 0.043 | 0.023 | 0.043 | 0.02 | 0.047 | 0.022 | YES |
23 | F | 21 | 0.042 | 0.022 | 0.023 | 0.025 | 0.047 | 0.041 | 0.047 | 0.022 | 0.023 | 0.022 | 0.039 | 0.023 | 0.022 | NO |
24 | M | 60 | 0.042 | 0.022 | 0.023 | 0.025 | 0.023 | 0.041 | 0.047 | 0.043 | 0.047 | 0.043 | 0.039 | 0.023 | 0.043 | YES |
25 | M | 72 | 0.042 | 0.044 | 0.045 | 0.05 | 0.047 | 0.02 | 0.047 | 0.043 | 0.047 | 0.043 | 0.02 | 0.047 | 0.043 | YES |
26 | M | 65 | 0.021 | 0.044 | 0.045 | 0.025 | 0.023 | 0.041 | 0.023 | 0.043 | 0.047 | 0.043 | 0.039 | 0.047 | 0.043 | YES |
27 | F | 61 | 0.042 | 0.044 | 0.045 | 0.025 | 0.023 | 0.041 | 0.047 | 0.022 | 0.047 | 0.022 | 0.039 | 0.047 | 0.043 | YES |
28 | M | 69 | 0.021 | 0.022 | 0.023 | 0.05 | 0.023 | 0.041 | 0.023 | 0.043 | 0.023 | 0.043 | 0.039 | 0.023 | 0.043 | NO |
29 | F | 53 | 0.042 | 0.044 | 0.045 | 0.025 | 0.047 | 0.02 | 0.023 | 0.043 | 0.047 | 0.022 | 0.039 | 0.047 | 0.043 | YES |
Cross-validation is a rotational resampling method used in statistics for predictive analysis. The training set is partitioned, and the model is trained and tested on the rotating partitions via sampling. The mean of the per-fold results is taken into account for accurate prediction; the reported performance tends to be on the higher side because of this averaging.
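The rotational scheme described above can be sketched as follows. This Python/scikit-learn fragment stands in for the paper's R-based cross-validated classification tree; the dataset is synthetic, not the 29-record lung-cancer table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data with 13 symptom-like features
X, y = make_classification(n_samples=120, n_features=13, random_state=1)

# 10-fold rotation: each fold serves once as the held-out test set,
# and the fold accuracies are averaged into a single estimate
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=10)
print(round(scores.mean(), 3))
```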
C1-Smoking, C2-Yellow Fingers, C3-Anxiety, C4-Peer Pressure, C5-Chronic Disease, C6-Fatigue, C7-Allergy, C8-Wheezing, C9-Alcohol Consuming, C10-Coughing, C11-Shortness Of Breath, C12-Swallowing Difficulty, C13-Chest Pain.
Legends: Attributes available in the sample training set
rpart(formula = CLASS ~ AGE + C1 + C2 + C3 + C4 + C5 + C6 + C7 +
    C8 + C9 + C10 + C11 + C12 + C13, data = data1, method = "class")
Variables actually used in tree construction: [
Root node error: 12/29 = 0.41379
No of observations = 29
   CP nsplit rel error  xerror    xstd
1 0.50      0       1.0 1.00000 0.22102
2 0.01      1       0.5 0.66667 0.20057
Call: rpart(formula = CLASS ~ AGE + C1 + C2 + C3 + C4 + C5 + C6 + C7 +
    C8 + C9 + C10 + C11 + C12 + C13, data = data1, method = "class")
Variable importance
 C3 C12  C2  C4 AGE C11
 28  22  22  12   8   8
Node number 1: 29 observations, complexity param = 0.5
  predicted class = YES, expected loss = 0.4137931, P(node) = 1
  class counts: 0 12 17, probabilities: 0.000 0.414 0.586
left son = 2 (14 obs) right son = 3 (15 obs)
Primary splits:
  C3  < 0.034  to the left, improve = 4.888013, (0 missing)
  C12 < 0.035  to the left, improve = 3.973727, (0 missing)
  C2  < 0.033  to the left, improve = 3.655504, (0 missing)
  C9  < 0.035  to the left, improve = 2.154680, (0 missing)
  C4  < 0.0375 to the left, improve = 1.907349, (0 missing)
Surrogate splits:
  C2  < 0.033  to the left,  agree = 0.897, adj = 0.786, (0 split)
  C12 < 0.035  to the left,  agree = 0.897, adj = 0.786, (0 split)
  C4  < 0.0375 to the left,  agree = 0.724, adj = 0.429, (0 split)
  AGE < 68.5   to the right, agree = 0.655, adj = 0.286, (0 split)
  C11 < 0.0295 to the right, agree = 0.655, adj = 0.286, (0 split)
Node number 2: 14 observations
  predicted class = NO, expected loss = 0.2857143, P(node) = 0.4827586
  class counts: 0 10 4, probabilities: 0.000 0.714 0.286

Node number 3: 15 observations
  predicted class = YES, expected loss = 0.1333333, P(node) = 0.5172414
  class counts: 0 2 13, probabilities: 0.000 0.133 0.867
Loss and probability are the two factors taken into account for finding the right prediction for the patterns. Because of the low entropy of the training set, the predictions given below are produced for the following patterns.
Pattern: M | 63 | 0.042 | 0.044 | 0.045 | 0.025 | 0.023 | 0.02 | 0.023 | 0.022 | 0.047 | 0.022 | 0.02 | 0.047 | 0.043 |
Result obtained: NO
Pattern: F | 51 | 0.042 | 0.044 | 0.045 | 0.05 | 0.023 | 0.041 | 0.047 | 0.022 | 0.023 | 0.022 | 0.039 | 0.047 | 0.022 |
Result obtained: YES
From the above patterns, it is clear that the obtained results accurately match the selected patterns due to their equal distributions.
Random forest is a tree-induction method for classification in which a multitude of trees is formed during training. Statistically, random forest and cross-validation are not usually applied together, because the forest's built-in out-of-bag estimate already plays a similar role. However, as a simple confirmation in our research work, both are applied as a hybrid technique for more reliable prediction.
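The hybrid check described above can be sketched as follows: a random forest evaluated both by its built-in out-of-bag (OOB) estimate and, redundantly, by 10-fold cross-validation. This is an illustrative Python/scikit-learn stand-in for the Weka run reported below, on synthetic placeholder data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=15, random_state=2)

# OOB estimate: each tree is tested on the samples left out of its
# bootstrap draw, so no separate hold-out rotation is strictly needed
forest = RandomForestClassifier(n_estimators=50, oob_score=True,
                                random_state=2)
forest.fit(X, y)

# Cross-validation applied on top, as a second confirmation
cv_acc = cross_val_score(forest, X, y, cv=10).mean()
print(round(forest.oob_score_, 3), round(cv_acc, 3))
```

When the two estimates agree, the model's accuracy figure can be trusted with more confidence, which is the point of the hybrid arrangement.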
Scheme: weka.classifiers.trees.RandomForest-I 10-K 0-S 1
Relation: modifiedLung Instances: 29, Attributes: 16
Attributes are GEN AGE C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 CLASS
C1-Smoking, C2-Yellow Fingers, C3-Anxiety, C4-Peer Pressure, C5-Chronic Disease, C6-Fatigue, C7-Allergy, C8-Wheezing, C9-Alcohol Consuming, C10-Coughing, C11-Shortness Of Breath, C12-Swallowing Difficulty, C13-Chest Pain.
Legends: Attributes and their corresponding strings
Test mode: 10-fold cross-validation
Trained dataset: Random forest of 10 trees, each constructed while considering 5 random features.
Out of bag error: 0.2414, Time taken to build model: 0.03 seconds
Correctly classified instances | 21 | 72.41% |
Incorrectly classified instances | 8 | 27.59% |
Kappa statistic | 0.4021 | – |
Mean absolute error | 0.3517 | – |
Root mean squared error | 0.4418 | – |
Relative absolute error | 71.76% | – |
Root relative squared error | 88.94% | – |
Total Number of Instances | 29 | – |
TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
---|---|---|---|---|---|---|
0.882 | 0.5 | 0.714 | 0.882 | 0.789 | 0.728 | YES |
0.5 | 0.118 | 0.75 | 0.5 | 0.6 | 0.728 | NO |
0.724 | 0.342 | 0.729 | 0.724 | 0.711 | 0.728 | Weighted Avg. |
The confusion matrix shows how the predictions are distributed across the actual classes.
a | b | <- classified as |
15 | 2 | a = YES |
6 | 6 | b = NO |
Pattern: M 63 0.042 0.044 0.045 0.025 0.023 0.02 0.023 0.022 0.047 0.022 0.02 0.047 0.043
Result obtained: NO
Pattern: F 51 0.042 0.044 0.045 0.05 0.023 0.041 0.047 0.022 0.023 0.022 0.039 0.047 0.022
Result obtained: YES
From the above patterns it is clear that the obtained results accurately match due to their equal distributions.
A random tree is a tree-structured classifier with a root and sibling nodes in every layer. Identification and prediction are easy to follow because of the hierarchical structure: each leaf node is a class node, which drives the prediction of patterns. The cross-validation diagram is shown in
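The hierarchical root-to-leaf structure described above can be made concrete with a small Python/scikit-learn sketch that trains a shallow tree and prints its layered listing, in the same spirit as the Weka random-tree output that follows. The data and feature names here are synthetic placeholders, not the paper's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=60, n_features=4, random_state=3)
tree = DecisionTreeClassifier(max_depth=2, random_state=3).fit(X, y)

# Each indented level is one layer of the hierarchy; every leaf line
# ends in the class label used for prediction
listing = export_text(tree, feature_names=["C1", "C2", "C3", "C4"])
print(listing)
```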
Relation: modifiedLung
Instances: 29, Attributes: 16
Test mode: 10-fold cross-validation
C12 < 0.04
|   C10 < 0.03 : NO (4/0)
|   C10 >= 0.03
|   |   C1 < 0.03
|   |   |   C7 < 0.04 : NO (3/0)
|   |   |   C7 >= 0.04
|   |   |   |   C2 < 0.03 : YES (1/0)
|   |   |   |   C2 >= 0.03 : NO (1/0)
|   |   C1 >= 0.03
|   |   |   C11 < 0.03 : NO (1/0)
|   |   |   C11 >= 0.03
|   |   |   |   C2 < 0.03
|   |   |   |   |   AGE < 55 : NO (1/0)
|   |   |   |   |   AGE >= 55 : YES (3/0)
|   |   |   |   C2 >= 0.03 : YES (1/0)
C12 >= 0.04
|   GEN = M
|   |   C6 < 0.03
|   |   |   C7 < 0.04
|   |   |   |   C5 < 0.04 : NO (1/0)
|   |   |   |   C5 >= 0.04 : YES (1/0)
|   |   |   C7 >= 0.04 : YES (2/0)
|   |   C6 >= 0.03
|   |   |   C9 < 0.04 : NO (1/0)
|   |   |   C9 >= 0.04 : YES (2/0)
|   GEN = F : YES (7/0)

Size of the tree: 27, Time taken to build model: 0.01 seconds
TP Rate | FP Rate | Precision | Recall | F-Measure | ROC Area | Class |
---|---|---|---|---|---|---|
0.765 | 0.417 | 0.722 | 0.765 | 0.743 | 0.674 | YES |
0.583 | 0.235 | 0.636 | 0.583 | 0.609 | 0.674 | NO |
0.69 | 0.342 | 0.687 | 0.69 | 0.687 | 0.674 | Weighted Avg. |
The confusion matrix shows how the predictions are distributed across the actual classes.
Correctly classified instances | 20 | 68.97% |
Incorrectly classified instances | 9 | 31.03% |
Kappa statistic | 0.35 | – |
Mean absolute error | 0.31 | – |
Root mean squared error | 0.56 | – |
Relative absolute error | 63.32% | |
Root relative squared error | 112.15% | |
Total Number of Instances | 29 |
a | b | <- classified as |
13 | 4 | a = YES |
5 | 7 | b = NO |
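The summary figures above can be recomputed directly from the confusion matrix: 13 + 7 of the 29 instances fall on the diagonal.

```python
# Counts taken from the random-tree confusion matrix above:
# YES row: 13 correct, 4 misclassified; NO row: 5 misclassified, 7 correct
tp, fn = 13, 4
fp, tn = 5, 7

total = tp + fn + fp + tn          # 29 instances
accuracy = (tp + tn) / total       # 20 / 29
print(round(accuracy * 100, 2))    # 68.97, matching the reported rate
```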
Pattern: M 63 0.042 0.044 0.045 0.025 0.023 0.02 0.023 0.022 0.047 0.022 0.02 0.047 0.043
Result obtained: NO
Pattern: F 51 0.042 0.044 0.045 0.05 0.023 0.041 0.047 0.022 0.023 0.022 0.039 0.047 0.022
Result obtained: YES
In this study on the clinical evaluation of lung cancer, 4337 records were procured from the repository data world. The application of crisp ensemble modelling approaches such as random forest, cross-validation, and decision tree classification is found to offer high-precision results, as demonstrated in the results and discussion sections. Among the classifier models, the ensemble classifier, cascading classifier, and concurrent classifier consistently result in good predictions of the incoming patterns. Reduction in entropy levels is achieved through the execution of appropriate preprocessing procedures. The results obtained were compared with the ensemble model, providing the predicted accuracy.
We thank LetPub (