Comparative Analysis of Machine Learning Models for Health Insurance Premium Prediction Using R

Journal: Modern Economics & Management Forum
DOI: 10.32629/memf.v6i3.4029

Xiaowei Fang, Zhuoxuan Zhang

Xi'an Jiaotong-Liverpool University, Suzhou 215123, Jiangsu, China

Abstract

This study explores methodologies for forecasting health insurance premiums, focusing on predictive accuracy and reliability. Using a dataset with variables such as age, gender, BMI, and diseases, we apply multiple techniques, including the K-Nearest Neighbors (KNN) algorithm, voting methods, and other machine learning algorithms, to predict premiums. A comparative analysis highlights each method's strengths and limitations and indicates which approach provides the most accurate and practical predictions. The findings aim to guide insurers in selecting effective forecasting methods to enhance premium pricing strategies and improve risk management.

Keywords

Health Insurance; Machine Learning Models; Random Forest; XGBoost; Gradient Boosting; Predictive Modeling.

Copyright © 2025 Xiaowei Fang, Zhuoxuan Zhang

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.