Analysis of mortality models with covariates missing at random

Shoaee, S.; Fathi, R.

doi:10.22056/ijir.2021.03.02

Document Type: Original Research Paper

Authors

Department of Actuarial Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran

https://doi.org/10.22056/ijir.2021.03.02

Abstract

Objective: Demographic indicators such as mortality rates play an important role in health, financial, and pension policy, so the accuracy of the mathematical models used to estimate them matters. One task of actuaries is to fit a suitable mortality model to the available data, so that the model can calculate mortality at different ages, and longevity, given the different information held on individual members of a retirement plan. Actuaries analyzing real data may face missing values, which arise for a variety of reasons, such as non-response or censoring, and which can threaten the accuracy of the analysis. The purpose of this study is to model mortality in a retirement plan. It is assumed that data are available at the individual level, including date of birth, date of joining the plan, date of the end of observation, and reason for exit (usually death or right censoring). Information on covariates such as gender, benefit or pension size, geodemographic characteristics, or health status is also assumed to be available. More precisely, the study aims to model mortality in a retirement plan from data with missing values using the available covariate information, to analyze the structure of different models, to estimate their parameters, and finally to investigate the financial implications for different mortality experiences containing missing data.
Methodology: We consider a pension plan in which each member's future lifetime is modeled using parametric survival models incorporating covariates that may be missing for some individuals. Parameters are estimated with likelihood-based techniques, and an algorithm is proposed to carry out the estimation. A necessary property for checking the adequacy of a statistical model, especially when the data contain missing values, is identifiability. If the model is not identifiable, it is not of full rank and is not a suitable model for the data. Verifying identifiability requires calculating the Jacobian matrix. When a mortality model is fitted by maximum likelihood in the presence of missing values, estimation error often arises, and it can be reduced by modeling a larger population. For this reason, retirement plans are often combined, provided the combined population remains homogeneous. The proposed method is also useful for calculating financial quantities based on annuity factors: data sets with equal or similar mortality experience are combined, so the sample size increases and parameter risk decreases, which in turn reduces the capital requirement. Socio-economic variables such as benefit level and the geographical characteristics of the population matter more when interest rates are low.
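To make the likelihood-based approach concrete, the following is a minimal sketch of an observed-data log-likelihood when a covariate is missing at random. It is not the algorithm proposed in the paper, which the abstract does not specify. The Gompertz hazard exp(alpha + beta * age + gamma * z), the single binary covariate z, the probability pi = P(z = 1), and all function and variable names are assumptions made for illustration.

```python
import math


def life_log_likelihood(entry_age, exit_age, died, z, alpha, beta, gamma):
    """Log-likelihood of one life with known covariate z (hypothetical helper).

    Assumes a Gompertz hazard exp(alpha + beta * age + gamma * z), observation
    from entry_age (left truncation) to exit_age, and died = 1 for a death,
    0 for right censoring.
    """
    level = alpha + gamma * z
    # Integrated hazard between entry and exit ages.
    cum_hazard = math.exp(level) * (
        math.exp(beta * exit_age) - math.exp(beta * entry_age)
    ) / beta
    log_hazard_at_exit = level + beta * exit_age
    return died * log_hazard_at_exit - cum_hazard


def observed_log_likelihood(records, alpha, beta, gamma, pi):
    """Sum of log-likelihood contributions; z is None when the covariate is missing.

    For a missing z the contribution is the log of the mixture
    pi * L(z = 1) + (1 - pi) * L(z = 0), which is valid under missing at random.
    """
    total = 0.0
    for entry_age, exit_age, died, z in records:
        if z is None:
            l1 = math.log(pi) + life_log_likelihood(
                entry_age, exit_age, died, 1, alpha, beta, gamma)
            l0 = math.log(1.0 - pi) + life_log_likelihood(
                entry_age, exit_age, died, 0, alpha, beta, gamma)
            # Log-sum-exp for numerical stability.
            m = max(l0, l1)
            total += m + math.log(math.exp(l0 - m) + math.exp(l1 - m))
        else:
            prior = pi if z == 1 else 1.0 - pi
            total += math.log(prior) + life_log_likelihood(
                entry_age, exit_age, died, z, alpha, beta, gamma)
    return total


# Illustrative, made-up records: (entry age, exit age, died, covariate or None).
records = [(60.0, 72.5, 1, 1), (65.0, 80.0, 0, 0), (62.0, 70.0, 1, None)]
print(observed_log_likelihood(records, alpha=-11.0, beta=0.1, gamma=0.4, pi=0.3))
```

Maximizing this function over (alpha, beta, gamma, pi), for example with a general-purpose optimizer or an EM-type iteration, would give maximum likelihood estimates under the stated assumptions.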

Findings: First, complete data on the members of a retirement plan, comprising the survival time and covariates of each individual, are analyzed and modeled, and the parameters are estimated by maximum likelihood. When some data are missing, however, maximum likelihood estimation is no longer straightforward. In this case the maximum likelihood estimates are computed with the proposed algorithm, and diagnostics such as the identifiability of the parameters are then used to evaluate the performance of the proposed structure and algorithm. Furthermore, the financial effects, in particular the annuity factors and the mis-estimation risk capital requirements, are calculated for the mortality experience that includes the largest set of covariates and compared with those for the individual segments when data are missing. In addition, when two covariates are never observed together, the model is not identifiable from the data.
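The identifiability point can be illustrated with a toy Jacobian rank check. This is a sketch under strong simplifying assumptions, not the paper's model: two binary covariates with joint probabilities theta = (p11, p10, p01), one data source that records only the first covariate, and another that records only the second. The functions `margins_only`, `margins_plus_joint`, `numerical_jacobian`, and `matrix_rank` are hypothetical names introduced here. If the Jacobian of the quantities the data can estimate has rank below the number of parameters, some parameters cannot be recovered.

```python
def margins_only(theta):
    """Quantities estimable when the two covariates are never observed together."""
    p11, p10, p01 = theta
    return [p11 + p10, p11 + p01]  # P(X1 = 1), P(X2 = 1)


def margins_plus_joint(theta):
    """Same, plus a combined data source in which both covariates are observed."""
    p11, p10, p01 = theta
    return [p11 + p10, p11 + p01, p11]  # adds P(X1 = 1, X2 = 1)


def numerical_jacobian(summary, theta, step=1e-6):
    """Central-difference Jacobian of summary(theta) with respect to theta."""
    n_out = len(summary(theta))
    jac = [[0.0] * len(theta) for _ in range(n_out)]
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += step
        down[j] -= step
        f_up, f_down = summary(up), summary(down)
        for i in range(n_out):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * step)
    return jac


def matrix_rank(matrix, tol=1e-8):
    """Rank by Gaussian elimination with partial pivoting (pure Python)."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


theta = [0.2, 0.3, 0.1]  # made-up parameter values
rank_separate = matrix_rank(numerical_jacobian(margins_only, theta))
rank_combined = matrix_rank(numerical_jacobian(margins_plus_joint, theta))
print(rank_separate, rank_combined, len(theta))
```

In this toy setting the rank falls short of the three parameters when the covariates are never seen together, and reaches full rank once a source observing both is added, which mirrors the role of combining experiences.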

Conclusion: When data are missing, the statistical model is not always identifiable under maximum likelihood, and combining data from two or more mortality experiences can overcome these identifiability problems. The methods proposed in this paper can be useful for actuaries when calculating financial quantities based on annuity factors. They combine data sets with equal or similar mortality experience, which increases the sample size and reduces parameter risk, thereby reducing capital requirements. Socio-economic variables such as benefit level and the geographical characteristics of the population deserve more attention when interest rates are low.
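The remark about low interest rates can be illustrated with a small annuity-factor calculation. This is a sketch only: the Gompertz survival function, the parameter values, the annual annuity-due, and the names `survival_probability`, `annuity_due`, and `relative_gap` are assumptions, not quantities taken from the paper. It compares two hypothetical segments that differ only in their mortality level.

```python
import math


def survival_probability(age, years, alpha, beta):
    """Probability that a life aged `age` survives `years` more years,
    assuming a Gompertz hazard exp(alpha + beta * age)."""
    cum_hazard = math.exp(alpha) * (
        math.exp(beta * (age + years)) - math.exp(beta * age)
    ) / beta
    return math.exp(-cum_hazard)


def annuity_due(age, alpha, beta, interest, max_age=120):
    """Expected present value of 1 per year paid in advance while alive."""
    v = 1.0 / (1.0 + interest)
    return sum(
        v ** t * survival_probability(age, t, alpha, beta)
        for t in range(max_age - age + 1)
    )


def relative_gap(interest, age=65, beta=0.1, alpha_heavy=-11.0, alpha_light=-11.3):
    """Relative difference in annuity factor between a lighter-mortality and a
    heavier-mortality segment (made-up parameter values)."""
    heavy = annuity_due(age, alpha_heavy, beta, interest)
    light = annuity_due(age, alpha_light, beta, interest)
    return light / heavy - 1.0


for rate in (0.01, 0.03, 0.05):
    print(rate, relative_gap(rate))
```

Under these assumptions the gap between the two segments widens as the interest rate falls, because payments at advanced ages, where the segments differ most, are discounted less heavily.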
JEL-Classification: C13, C24, C51

Iranian Journal of Insurance Research

Volume 10, Issue 3 - Serial Number 37
July 2021
Pages 169-184
