Open Access

ARTICLE

An Ensemble Methods for Medical Insurance Costs Prediction Task

Nataliya Shakhovska1, Nataliia Melnykova1,*, Valentyna Chopiyak2, Michal Gregus ml3
1 Department of Artificial Intelligence, Lviv Polytechnic National University, Lviv, 79013, Ukraine
2 Department of Clinical Immunology and Allergology, Danylo Halytsky Lviv National Medical University, Lviv, 79010, Ukraine
3 Faculty of Management, Comenius University, Bratislava, 814 99, Slovakia
* Corresponding Author: Nataliia Melnykova. Email:
(This article belongs to the Special Issue: Machine Learning Applications in Medical, Finance, Education and Cyber Security)

Computers, Materials & Continua 2022, 70(2), 3969-3984. https://doi.org/10.32604/cmc.2022.019882

Received 29 April 2021; Accepted 15 June 2021; Issue published 27 September 2021

Abstract

The paper reports three new ensembles of supervised learning predictors for managing medical insurance costs. An open dataset is used to develop the data analysis methods. Using artificial intelligence to manage financial risks will save time and money and protect patients’ health. Machine learning is associated with many expectations, but its quality is determined by choosing a good algorithm and taking the proper steps to plan, develop, and implement the model. The paper aims to develop three new ensembles that predict individual insurance costs with high accuracy. The Pearson coefficient and the Boruta algorithm are used for feature selection. Boosting, stacking, and bagging ensembles are built and compared with existing machine learning algorithms. The boosting models are based on a regression tree and stochastic gradient descent. Bagged CART and Random Forest algorithms are proposed for bagging. The boosting and stacking ensembles show better accuracy than bagging. Tuning the parameters for boosting does not decrease the Root Mean Square Error (RMSE) either. Thus, bagging shows its weakness in generalizing the prediction. The stacking ensemble is developed using K Nearest Neighbors (KNN), Support Vector Machine (SVM), Regression Tree, Linear Regression, and Stochastic Gradient Boosting. The random forest (RF) algorithm, built with one hundred trees, is used to combine the predictions. The RMSE is improved to 3173.213 in comparison with other predictors. In terms of the RMSE metric, the developed ensemble is 1.47 better than the best weak predictor (SVR).
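
The two quantities the abstract relies on, the Pearson coefficient used for feature selection and the RMSE used to compare predictors, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pearson`, `rmse`, `rank_features`), the ranking-by-absolute-correlation rule, and the toy numbers are assumptions made only for illustration and are not taken from the paper's dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rank_features(features, target):
    """Sort feature names by absolute Pearson correlation with the target.

    Hypothetical helper: the paper does not state how the coefficient
    is turned into a selection decision.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data, not from the paper's dataset.
features = {
    "age": [19, 33, 45, 52, 60],
    "bmi": [27.9, 22.7, 30.1, 25.3, 31.0],
}
charges = [1700.0, 4400.0, 8200.0, 10500.0, 13000.0]

ranking = rank_features(features, charges)
error = rmse(charges, [2000.0, 4000.0, 8000.0, 11000.0, 12500.0])
```

Here `ranking` orders the toy features from most to least linearly related to the cost, and `error` is expressed in the same units as the cost itself, which is why RMSE values from different predictors on the same data can be compared directly.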

Keywords

Healthcare; medical insurance; prediction task; machine learning; ensemble; data analysis

Cite This Article

N. Shakhovska, N. Melnykova, V. Chopiyak and M. Gregus ml, "An ensemble methods for medical insurance costs prediction task," Computers, Materials & Continua, vol. 70, no. 2, pp. 3969–3984, 2022.



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.