Predicting people's health insurance costs using machine learning and ensemble learning methods

Tajaddodi Nodehi, Mahsa; Hosseini Khatibani, Samaneh; Yazdinejad, Mohsen; Zolfi, Somayeh

doi:10.22056/ijir.2024.01.01

Article type: Scientific article

Authors

¹ Department of Computer, Faculty of Computer Engineering, Al Taha Institute of Higher Education, Tehran, Iran

² Department of Artificial Intelligence, Faculty of Computer Engineering, Isfahan University, Isfahan, Iran

Abstract

Background and objectives: The health insurance industry faces a significant challenge in predicting individuals' insurance costs, which depend on complex parameters such as age and physical characteristics. To manage risk and avoid potential losses, insurance companies classify policyholders into high-risk and low-risk groups. However, accurately estimating costs for each individual can be difficult. To address this challenge, we propose an approach based on data science and machine learning that uses ensemble learning to predict high-risk and low-risk individuals.
Methodology: The proposed method comprises several stages, including data preprocessing, feature engineering, and cross-validation to evaluate model performance. In the first stage, we preprocess the data by cleaning it, handling missing values, and encoding categorical variables. In the second stage, we generate new features using feature engineering techniques such as scaling, normalization, and dimensionality reduction. These techniques help extract meaningful information from the data and improve model performance. In the next stage, we use ensemble learning to combine multiple base models, such as logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. The aim of combining these methods is to exploit their strengths and minimize their weaknesses in order to achieve better prediction accuracy. Finally, we evaluate model performance using k-fold cross-validation. This method helps validate the model's accuracy and guard against overfitting.
Findings: Our proposed approach achieves an AUC of 0.73, which demonstrates its effectiveness in predicting high-risk and low-risk individuals.
Conclusion: Using data science and machine learning methods, insurance companies can improve the accuracy of their cost estimates and manage risk better. This approach can help insurance companies offer more accurate coverage and pricing for individuals, leading to greater customer satisfaction and lower financial losses.
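
To make the reported metric concrete, the following is a minimal sketch of how an AUC value can be read: it is the probability that a randomly chosen high-risk individual receives a higher score than a randomly chosen low-risk one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for high-risk, 0 for low-risk (hypothetical encoding).
    scores: model scores, higher meaning "more likely high-risk".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 9 of the 12 pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

Under this reading, an AUC of 0.73 means that roughly 73% of such high-risk/low-risk pairs are ordered correctly, while 0.5 corresponds to random ranking.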

Subjects

Risk assessment in insurance lines

Article title [English]

Predicting people's health insurance costs using machine learning and ensemble learning methods

Authors [English]

M. Tajaddodi Nodehi ¹
S. Hosseini Khatibani ¹
M. Yazdinejad ²
S. Zolfi ¹

¹ Department of Computer, Faculty of Computer Engineering, Al Taha Institute of Higher Education, Tehran, Iran

² Department of Artificial Intelligence, Faculty of Computer Engineering, Isfahan University, Isfahan, Iran

Abstract [English]

BACKGROUND AND OBJECTIVES: The healthcare insurance industry faces a significant challenge in predicting individuals' insurance costs, which depend on complex parameters such as age and physical characteristics. Insurance companies categorize policyholders into high-risk and low-risk groups to manage risk and avoid potential losses. However, accurately estimating costs for each individual can be a daunting task. By leveraging data science and machine learning techniques, insurance companies can improve their cost estimation accuracy and manage risk better. This approach can help insurance companies provide more accurate coverage and pricing for individuals, leading to higher customer satisfaction and lower financial losses.
METHODS: To address this challenge, an approach based on data science and machine learning is proposed that uses ensemble learning to predict high-risk and low-risk individuals. The method involves several steps, including data preprocessing, feature engineering, and cross-validation to evaluate the model's performance. The first step preprocesses the data by cleaning it, handling missing values, and encoding categorical variables. The second step generates new features using feature engineering techniques such as scaling, normalization, and dimensionality reduction. Next, ensemble learning is used to combine multiple base models such as logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. The aim of combining these methods is to leverage their strengths and minimize their weaknesses to achieve better prediction accuracy. Finally, the model's performance is evaluated using k-fold cross-validation, which helps validate the model's accuracy and guard against overfitting.
FINDINGS: The proposed approach achieves an AUC of 0.73, demonstrating its effectiveness in predicting high-risk and low-risk individuals.
CONCLUSION: The healthcare insurance industry can benefit greatly from approaches based on data science and machine learning. By accurately predicting high-risk and low-risk individuals, insurance companies can manage risk better and provide more accurate coverage and pricing for their customers. This can improve customer satisfaction and reduce financial losses for insurance companies.
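
The stages named in the abstract (preprocessing, ensemble learning, k-fold cross-validation scored by AUC) can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the column layout and the `region` feature are hypothetical, soft voting is only one possible way to combine the models, and `GradientBoostingClassifier` stands in for LightGBM and XGBoost to avoid extra dependencies.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for policyholder data (assumption: not the paper's dataset).
X_num, y = make_classification(n_samples=400, n_features=6,
                               n_informative=4, random_state=0)
rng = np.random.default_rng(0)
region = rng.integers(0, 3, size=(400, 1)).astype(float)  # hypothetical categorical code
X = np.hstack([X_num, region])
X[rng.random(X.shape) < 0.02] = np.nan  # inject a few missing values

numeric_cols = list(range(6))
categorical_cols = [6]

# Stage 1 and 2: impute missing values, scale numeric features, encode categories.
preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"),
                          StandardScaler()), numeric_cols),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")), categorical_cols),
])

# Stage 3: soft-voting ensemble that averages predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)

model = Pipeline([("preprocess", preprocess), ("ensemble", ensemble)])

# Stage 4: stratified 5-fold cross-validation, scored by ROC AUC.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("mean AUC over folds:", auc_scores.mean())
```

Placing the preprocessing inside the pipeline means the imputer, scaler, and encoder are refit on each training fold, so no statistics from the held-out fold leak into training.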

Keywords [English]

Data mining
Ensemble learning
Healthcare insurance cost
Machine learning
Risk

Iranian Journal of Insurance Research

Volume 13, Issue 1 - Serial Number 47
Dey 1402 (January 2024)
Pages 1-14
