Open Access
ARTICLE
Ensemble Methods for the Medical Insurance Costs Prediction Task
1 Department of Artificial Intelligence, Lviv Polytechnic National University, Lviv, 79013, Ukraine
2 Department of Clinical Immunology and Allergology, Danylo Halytsky Lviv National Medical University, Lviv, 79010, Ukraine
3 Faculty of Management, Comenius University, Bratislava, 814 99, Slovakia
* Corresponding Author: Nataliia Melnykova. Email:
(This article belongs to the Special Issue: Machine Learning Applications in Medical, Finance, Education and Cyber Security)
Computers, Materials & Continua 2022, 70(2), 3969-3984. https://doi.org/10.32604/cmc.2022.019882
Received 29 April 2021; Accepted 15 June 2021; Issue published 27 September 2021
Abstract
The paper reports three new ensembles of supervised learning predictors for managing medical insurance costs. An open dataset is used to develop the data analysis methods. The use of artificial intelligence in the management of financial risks will help save time and money and protect patients' health. Machine learning is associated with many expectations, but its quality is determined by choosing a good algorithm and the proper steps to plan, develop, and implement the model. The paper aims to develop three new ensembles for individual insurance cost prediction that provide high prediction accuracy. The Pearson correlation coefficient and the Boruta algorithm are used for feature selection. Boosting, stacking, and bagging ensembles are built, and a comparison with existing machine learning algorithms is given. Boosting models based on a regression tree and stochastic gradient boosting are built. Bagged CART and Random Forest algorithms are proposed for bagging. The boosting and stacking ensembles show better accuracy than bagging. Tuning the bagging parameters does not allow the RMSE to be decreased either, so bagging shows its weakness in generalizing the prediction. The stacking ensemble is developed using K Nearest Neighbors (KNN), Support Vector Machine (SVM), Regression Tree, Linear Regression, and Stochastic Gradient Boosting as weak predictors. The Random Forest (RF) algorithm, built with one hundred trees, is used to combine their predictions. The resulting Root Mean Square Error (RMSE) is 3173.213, lower than that of the other predictors. In terms of the RMSE metric, the quality of the developed ensemble is 1.47 times better than that of the best weak predictor (SVR).
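For illustration, the following is a minimal sketch of the stacking scheme described in the abstract, assuming Python with scikit-learn and the public medical-insurance-cost dataset layout (age, sex, bmi, children, smoker, region, charges in an insurance.csv file); the file name, column names, and hyperparameter values are assumptions, not the authors' settings, and the original implementation may differ.

```python
# Hypothetical sketch of the stacking ensemble: five weak predictors (KNN, SVR,
# regression tree, linear regression, stochastic gradient boosting) combined by
# a 100-tree random forest, evaluated with RMSE. Assumes scikit-learn and the
# open insurance.csv dataset; not the authors' original code.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Load the open insurance-cost dataset (features -> target column "charges").
data = pd.read_csv("insurance.csv")
X, y = data.drop(columns="charges"), data["charges"]

# One-hot encode categorical features and scale numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"]),
    ("num", StandardScaler(), ["age", "bmi", "children"]),
])

# Weak predictors listed in the abstract (hyperparameters are placeholders).
base_learners = [
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("svr", SVR(kernel="rbf", C=100.0)),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=42)),
    ("lm", LinearRegression()),
    ("sgb", GradientBoostingRegressor(random_state=42)),
]

# A random forest with one hundred trees combines the base predictions.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=RandomForestRegressor(n_estimators=100, random_state=42),
    cv=5,
)
model = make_pipeline(preprocess, stack)

# Hold-out evaluation with RMSE.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model.fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Stacking RMSE: {rmse:.3f}")
```

The cross-validated out-of-fold predictions produced by StackingRegressor keep the random-forest combiner from overfitting to the base learners' training-set outputs, which mirrors the intent of the stacking design discussed in the abstract.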
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.