Abstract
Background: Mortality among children under five years of age remains a pressing global health challenge, particularly in developing regions. This study aims to develop a machine learning model to predict under-five mortality in South Africa and to identify its key determinants.
Methods: Data from the 2016 South Africa Demographic and Health Survey (3,548 children) were used to identify the model that best predicts under-five mortality. The chi-square test and analysis of variance were used for feature selection, and the synthetic minority over-sampling technique (SMOTE) was used to address class imbalance. Models were evaluated using accuracy, precision, recall, specificity, F1-score, and the area under the curve (AUC). The best-performing models were then used to identify the key predictors of under-five mortality.
Results: Among the models tested, random forest, XGBoost and logistic regression outperformed the others, achieving accuracy scores of 0.93, 0.94 and 0.89, respectively. The most influential predictors of under-five mortality were breastfeeding status and the number of children under five years in the household. Other influential variables were being a twin, the total number of children born to the mother, and access to clean drinking water.
Conclusion: The results show the potential of machine learning models to predict under-five mortality and identify key risk factors. Random forest, XGBoost and logistic regression were the best-performing models for predicting under-five mortality. Breastfeeding status and the number of children under five years in the household had the greatest influence on under-five mortality. The results point to the need for targeted policy interventions, particularly promoting breastfeeding, expanding access to essential basic services, and supporting larger families with multiple children under the age of five in the household. They also provide policymakers with insights for designing strategies that will help the country achieve Sustainable Development Goal 3. Future research should validate these models with more recent datasets to improve generalisability.