1 Faculty of Mathematical and Computer Sciences, Allameh Tabatabai University, Tehran, Iran

2 Insurance Research Institute; in charge of the Specialized Desk for Algorithm Design and Machine Learning, Tehran, Iran

3 Insurance Research Institute; Head of the Specialized Car Insurance Desk, Tehran, Iran


Objective: Classifying policyholders by risk on the basis of observable characteristics can help insurance companies reduce losses, identify customers more accurately, and prevent adverse selection in the insurance market. This article examines the financial losses arising from third-party insurance and predicts the risk posed by policyholders in the event of an accident.
Methodology: Using decision tree, support vector machine, naive Bayes, and neural network algorithms, hidden patterns in the data were identified in order to classify third-party insurance policyholders. In addition, the imbalanced distribution of the data between the two groups (damaged and undamaged) poses an important challenge for machine learning and data mining methods; this challenge is addressed in the article.
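The abstract does not state which rebalancing technique was used for the damaged/undamaged imbalance. One common option is to weight each class inversely to its frequency, using n_samples / (n_classes * class_count), which is the formula scikit-learn applies for `class_weight="balanced"`. The minimal sketch below assumes that approach; the function name `balanced_class_weights` and the 9:1 toy labels are hypothetical and not taken from the paper's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (here, damaged policies) receive larger weights, so a
    weighted loss penalizes misclassifying them more heavily.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 1 = damaged, 0 = undamaged (9:1 imbalance).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights[1])  # 5.0; the majority class gets about 0.556
```

With these weights, each class contributes the same total weight (90 * 0.556 ≈ 10 * 5.0 = 50), so the minority class is no longer swamped during training. Resampling (undersampling the majority class or oversampling the minority class) is an alternative with a similar aim.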
Findings: The data set, provided by one insurance company, contains more than 400,000 records collected over five years. It includes four independent variables (car type, car group, license plate type, and car age) and one binary dependent variable (financial damage). According to the results, the decision tree model achieved the best predictive performance (F1 = 0.72 ± 0.01).
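F1 rather than plain accuracy matters on imbalanced data: a model that labels every policy as undamaged can score high accuracy while never detecting a damaged one. The abstract does not specify how F1 was averaged; the sketch below assumes F1 for the damaged (positive) class only, uses the identity F1 = 2TP / (2TP + FP + FN) (the harmonic mean of precision and recall), and runs on hypothetical toy labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1 for the positive class, computed as 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 90 undamaged (0), 10 damaged (1).
y_true = [0] * 90 + [1] * 10

# Baseline that always predicts "undamaged".
always_undamaged = [0] * 100
print(accuracy(y_true, always_undamaged))     # 0.9
print(f1_positive(y_true, always_undamaged))  # 0.0

# Classifier that catches 8 of 10 damaged cases with 4 false alarms.
y_pred = [0] * 86 + [1] * 4 + [1] * 8 + [0] * 2
print(accuracy(y_true, y_pred))               # 0.94
print(round(f1_positive(y_true, y_pred), 3))  # 0.727
```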
Conclusion: In decreasing order of importance, the variables affecting the occurrence of damage are car type, license plate type, car age, and car group. The evaluation results show that more data on driver characteristics is needed to predict damage and identify high-risk customers more accurately.
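The abstract does not say how the variables were ranked. One common way to rank categorical predictors of a binary outcome is information gain, the entropy-reduction criterion used by ID3/C4.5-style decision trees. The sketch below illustrates that idea only and is not necessarily the authors' method; the feature names and the four records are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting `labels` on a categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data; 1 = damaged, 0 = undamaged.
features = {
    "car_type": ["truck", "truck", "sedan", "sedan"],
    "plate_type": ["public", "private", "public", "private"],
    "car_age_band": ["old", "old", "new", "old"],
}
damaged = [1, 1, 0, 0]

gains = {name: information_gain(col, damaged) for name, col in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)  # ['car_type', 'car_age_band', 'plate_type']
```

Here `car_type` separates the outcome perfectly (gain 1 bit), `car_age_band` helps partially, and `plate_type` carries no information. On real data, tree-based importances or permutation importance are common alternatives.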


