Abstract
This study investigates the application of machine learning (ML) algorithms to predicting lapses in investment policies, a significant challenge for the insurance and financial services industry because lapses can destabilise financial planning and complicate risk management. It focuses on three ensemble learning techniques, random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGBoost), and compares how well they predict lapses and identify the factors that drive them. The research objectives are to identify these key influencing features, evaluate the predictive accuracy of the ML models, compare the models’ performance to determine the best predictor, and use local interpretable model-agnostic explanations (LIME) to enhance model interpretability.
The study is exploratory in purpose and grounded in a positivist research philosophy, which supports objectivity in observation and analysis. A quantitative approach is used to analyse numerical data from a Kaggle-sourced dataset containing 51,685 policy records from a Zimbabwean insurance company, covering the period 2017 to 2020. The experimental strategy involves model implementation and evaluation, beginning with comprehensive data preprocessing: handling missing values, addressing outliers, scaling numerical features, and encoding categorical variables. The data are divided using an 80/20 train-test split, and the models are validated through 10-fold cross-validation to ensure reliability. Hyperparameter tuning via GridSearchCV optimises the performance of each algorithm.
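The workflow described above can be illustrated with a minimal scikit-learn sketch. It is not the study's actual code: the data are synthetic, the column names (`tenure_months`, `missed_payments`, `payment_mode`) and the parameter grid are hypothetical, and outlier handling is omitted. It also assumes a stratified split and that the 10-fold cross-validation is run on the training portion inside GridSearchCV, neither of which the abstract states. Only RF is shown; GB and XGBoost would slot into the same pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the policy records (hypothetical columns, not the real dataset).
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "tenure_months": rng.integers(1, 120, n).astype(float),
    "missed_payments": rng.integers(0, 6, n).astype(float),
    "payment_mode": rng.choice(["monthly", "quarterly", "annual"], n),
})
# Inject some missing values so the imputation step has work to do.
X.loc[rng.choice(n, 30, replace=False), "tenure_months"] = np.nan

# Synthetic label: lapse (1) is likelier with short tenure and many missed payments.
score = 0.6 * X["missed_payments"] - 0.02 * X["tenure_months"].fillna(60)
y = (score + rng.normal(0, 0.5, n) > 0.3).astype(int)

numeric_cols = ["tenure_months", "missed_payments"]
categorical_cols = ["payment_mode"]

# Preprocessing: impute and scale numeric features, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=0)),
])

# 80/20 train-test split (stratification is an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Grid search with 10-fold cross-validation on the training portion.
search = GridSearchCV(
    model,
    param_grid={"clf__n_estimators": [25, 50], "clf__max_depth": [None, 5]},
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the refitted best model on the held-out 20%.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refitted on each training fold, so no information from the validation folds or the held-out test set leaks into model selection.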
The findings show that RF is the top-performing model, outperforming both GB and XGBoost. Shorter policy tenure and a higher number of missed payments correlate strongly with policy lapses, while longer tenure contributes positively to policy retention. LIME further strengthens the study by providing transparent and actionable insights into the influence of specific features on individual predictions, offering interpretability for both lapse and non-lapse cases.
This study demonstrates that ML models, particularly RF, are highly effective in predicting policy lapses, giving insurance and financial services companies insights they can use to manage and reduce lapses.