Document Type : Original Research Paper

Authors

1 Department of Computer, Faculty of Computer Engineering, Al Taha Institute of Higher Education, Tehran, Iran

2 Department of Artificial Intelligence, Faculty of Computer Engineering, Isfahan University, Isfahan, Iran

Abstract

BACKGROUND AND OBJECTIVES: The healthcare insurance industry faces a significant challenge in predicting individuals' insurance costs, which depend on complex parameters such as age and physical characteristics. Insurance companies categorize policyholders into high-risk and low-risk groups to manage risk and avoid potential losses, but accurately estimating costs for each individual is difficult. By leveraging data science and machine learning techniques, insurers can improve the accuracy of their cost estimates and manage risk more effectively. This in turn allows them to offer more accurate coverage and pricing, leading to higher customer satisfaction and lower financial losses.
METHODS: To address this challenge, a data science and machine learning-based approach is used that applies ensemble learning to predict high-risk and low-risk individuals. The method involves several steps, including data preprocessing, feature engineering, ensemble modeling, and cross-validation to evaluate the model's performance. The first step preprocesses the data by cleaning it, handling missing values, and encoding categorical variables. The second step applies feature engineering techniques such as scaling, normalization, and dimensionality reduction. Next, ensemble learning is used to combine multiple learning methods, namely logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. Combining these methods is intended to leverage their strengths and offset their weaknesses, and so achieve better prediction accuracy. Finally, the model's performance is evaluated using cross-validation techniques such as k-fold cross-validation, which give a more reliable estimate of the model's accuracy and help detect overfitting.
FINDINGS: The proposed approach achieves an AUC of 0.73, demonstrating its effectiveness in predicting high-risk and low-risk individuals.
CONCLUSION: The healthcare insurance industry can benefit greatly from data science and machine learning-based approaches. By accurately predicting high-risk and low-risk individuals, insurance companies can better manage risks and provide more accurate coverage and pricing for their customers. This can improve customer satisfaction and reduce financial losses for insurance companies.
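
The ensemble and evaluation steps named under METHODS can be illustrated with a minimal, library-free sketch. It rests on several assumptions that the abstract does not confirm: the models are combined by soft voting (averaging their predicted probabilities), AUC is computed as the fraction of positive/negative pairs ranked correctly, and folds are contiguous. The function names and the toy scores are hypothetical and are not the paper's data or results. In practice a library such as scikit-learn would normally handle all three steps.

```python
from statistics import mean


def soft_vote(model_probs):
    """Average each sample's predicted probability across models.

    Soft voting is an assumed combination rule; the abstract does not
    state how the individual models are combined.
    """
    return [mean(sample) for sample in zip(*model_probs)]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Hypothetical toy data: 1 = high-risk, 0 = low-risk.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]  # made-up scores from one model
model_b = [0.7, 0.2, 0.3, 0.5, 0.9, 0.1]  # made-up scores from another

ensemble = soft_vote([model_a, model_b])
overall_auc = auc(labels, ensemble)

# Per-fold AUC on held-out indices. In a real run each model would be
# refit on train_idx before scoring test_idx; that step is omitted here.
fold_aucs = [
    auc([labels[i] for i in test_idx], [ensemble[i] for i in test_idx])
    for _, test_idx in k_fold_indices(len(labels), 3)
]

print(f"overall AUC: {overall_auc:.3f}")
print(f"mean fold AUC: {mean(fold_aucs):.3f}")
```

The pairwise AUC definition is quadratic in the number of samples, which is fine for illustration but would be replaced by a rank-based computation on a real insurance dataset.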

