Abstract
Credit risk is a crucial component of daily financial services operations; it measures the likelihood that a borrower will default on a loan and cause an economic loss. Lenders can reduce credit risk by analyzing historical data to assess a borrower's creditworthiness. Assessing and controlling credit risk can be difficult when data quality is low. A lender that holds an incomplete credit history for a customer might grant a loan that carries a high default risk.
In the finance sector, machine learning has transformed credit risk assessment by delivering a data-driven method of assessing borrowers' creditworthiness. The regulatory environment, industry developments, economic conditions, and borrower creditworthiness are some of the factors that affect credit risk. If not adequately addressed, low-quality data can lead to substantial problems, such as incorrect decisions, ineffective operations, compliance failures, and reputational harm.
Machine learning has substantial economic ramifications for financial organizations beyond its technical potential. These include improved risk management and regulatory compliance, greater accuracy and predictive capacity, and cost savings and efficiency gains. Essentially, businesses can improve operational efficiency, minimize risks, obtain insightful information, and provide customers with better offerings when they have access to high-quality data to power their artificial intelligence processes.
While the adoption of machine learning plays a significant role, data sits at the core of credit decision-making processes. Decision-making depends heavily on accurate, complete data, and a failure to harness high-quality data affects credit lenders when they assess loan applicants' risk profiles. "Fit for purpose" is a standard interpretation of data quality: it assesses how well specific data suits the user's needs.
Flawed judgments resulting from poor data quality can harm business success. Furthermore, financial institutions might forfeit revenue through marketing and pricing decisions if they rely on incomplete data to evaluate a product's performance. Creditors must also meet strict regulatory criteria, so a failure to address the causes of poor data can lead to expensive fines and reputational damage.
This thesis considers contributions from statistics and artificial intelligence that acknowledge the significant impact of poor data on credit risk modeling. Recent machine learning methods for dealing with incomplete data are also discussed, and the integration of multiple classifier learning systems (ensembles) is proposed as a robust technique for managing poor data.
The first contribution is an empirical comparison of the robustness of seven machine learning algorithms for credit risk, namely Support Vector Machines (SVM), Naïve Bayes, Decision Trees (D.T.), Random Forest (R.F.), Gradient Boosting (G.B.), K-Nearest Neighbors (k-NN), and Logistic Regression (L.R.). The experiment is carried out using the Lending Club credit data from Kaggle. This task uses seven performance measures: F1 score, recall, accuracy, precision, ROC-AUC, H.L., and MCC.
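As an illustration of how several of the listed measures relate to one another, the following minimal sketch derives accuracy, precision, recall, F1 score, and MCC from the four confusion-matrix counts of a binary classifier. The function names and the toy labels are hypothetical and are not taken from the thesis; ROC-AUC and H.L. are not shown because they require predicted scores or further definitions that the abstract does not give.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and MCC as a dictionary."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Toy labels (hypothetical): 1 = default, 0 = no default.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced credit data, accuracy alone can look favorable even when defaults are rarely detected, which is one reason measures such as F1 and MCC are commonly reported alongside it.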
For the complete-case approach, the most robust machine learning algorithm when dealing with missing values is random forest (93.01%), followed by gradient boosting (92.67%), naïve Bayes (90.01%), logistic regression (88.43%), k-NN (87.34%), and SVM (86.84%). The worst performer is the decision tree, with an accuracy rate of 85.01%. The differences in performance among the seven machine learning algorithms are statistically significant at the 95% confidence level.
The second contribution is the proposal to use ensemble learning to improve the credit risk classification accuracy of single classifiers. The experiments use the same state-of-the-art machine learning algorithms, imputation techniques, and datasets. A total of 120 multiple classifier learning systems (ensembles) are generated. These combinations are then divided into six groups (MC2, MC3, MC4, MC5, MC6, MC7, where MC2 is an ensemble with two individual classifiers, MC3 is an ensemble with three individual classifiers, and so on).
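The figure of 120 ensembles is consistent with taking every subset of two or more of the seven classifiers. The sketch below enumerates those subsets and counts each group; treating an ensemble as an unordered subset of classifiers is an assumption made here for illustration, and the variable names are hypothetical.

```python
from itertools import combinations

# The seven base classifiers named in the abstract.
classifiers = [
    "SVM",
    "Naive Bayes",
    "Decision Tree",
    "Random Forest",
    "Gradient Boosting",
    "k-NN",
    "Logistic Regression",
]

# Group MCk holds every unordered combination of k classifiers (k = 2..7).
groups = {
    f"MC{k}": list(combinations(classifiers, k))
    for k in range(2, len(classifiers) + 1)
}

group_sizes = {name: len(members) for name, members in groups.items()}
total_ensembles = sum(group_sizes.values())

for name, size in group_sizes.items():
    print(name, size)
print("total", total_ensembles)
```

Under this assumption the groups contain 21, 35, 35, 21, 7, and 1 ensembles respectively, which sum to 120.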
For the complete-case approach, MC3, MC4, and MC5 achieve the highest accuracy rate of 93.01%, followed by MC6 (92.92%), MC2 (92.84%), and MC7 (92.42%). For the imputation approaches, MC4 achieves the highest accuracy rate with mode imputation (92.68%) and with median imputation (92.62%). For multiple imputation, the best-performing ensemble is MC6, with an accuracy rate of 92.62%. When k-NN imputation is used, MC2 achieves the highest accuracy rate (92.56%), while MC4 achieves the lowest (83.64%).
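To make the difference between the complete-case approach and the simpler imputation strategies concrete, the following minimal sketch applies complete-case deletion, median imputation, and mode imputation to a small table. The data, column meanings, and function names are hypothetical; k-NN imputation and multiple imputation are not shown, and nothing here is taken from the thesis's own implementation.

```python
from statistics import median, mode


def complete_case(rows):
    """Keep only the rows that have no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_column(values, strategy):
    """Fill None entries with the median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def impute_rows(rows, strategy):
    """Impute every column of a row-oriented table independently."""
    columns = [list(col) for col in zip(*rows)]
    filled = [impute_column(col, strategy) for col in columns]
    return [list(row) for row in zip(*filled)]


# Hypothetical table: column 0 is numeric, column 1 is a 0/1 category.
rows = [
    [35, 1],
    [None, 0],
    [50, None],
    [41, 0],
    [29, 0],
]

kept = complete_case(rows)              # drops the two incomplete rows
median_filled = impute_rows(rows, "median")
mode_filled = impute_rows(rows, "mode")
```

Complete-case deletion shrinks the training set, whereas imputation keeps every row at the cost of introducing estimated values; the accuracy rates reported above compare classifiers under these different treatments.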
The results suggest that most ensembles are less robust to incomplete data when k-NN imputation is used than with the other imputation strategies. The findings further indicate that classification accuracy improves for ensembles that include random forest and gradient boosting classifiers as components. Random forest and gradient boosting are the best-performing algorithms, with the lowest error rates, which is attributed to the fact that both belong to the ensemble learning family. However, the robustness of an ensemble to incomplete data is compromised when it is composed of decision tree and Naïve Bayes classifiers.
When decision trees and Naïve Bayes are used as ensemble components in conjunction with the k-NN imputer, the resulting ensemble model is the least resistant to low-quality data. Despite this drawback, most ensembles achieve lower accuracy rates.
Furthermore, the results show that combining many algorithms is not more effective than using specific algorithms to deal with poor data. The MC7 ensemble, in which all seven classifiers are combined, produces consistently good accuracy, but not as high as when specific classifiers are selected as components. This holds regardless of which imputation method is used, and also for the complete-case approach.
The third contribution proposes the use of generative adversarial networks (GANs) as an imputation technique. GANs use an unsupervised learning approach. The same performance metrics, seven machine learning algorithms, and datasets are employed for model evaluation.
The results show that when GANs imputation is incorporated, the decision tree is the best-performing classifier with an accuracy rate of 93.01%, followed by random forest (92.92%), gradient boosting (92.33%), support vector machine (90.83%), logistic regression (90.76%), and Naïve Bayes (89.29%). The worst-performing classifier is k-NN, with an accuracy rate of 88.68%. When the GANs are subsequently optimized, the accuracy rate of the Naïve Bayes classifier improves significantly to 90%. Additionally, the average error rate for these classifiers is over 9%, which implies that the estimates are not far from the actual values.
Additionally, when the 120 hybrid combinations are simulated using the GANs imputation approach, the MC3 and MC4 ensembles both achieve the highest accuracy rate of 93.09%, followed by MC2 and MC5 (93.01% each) and MC7 (92.34%). Furthermore, any ensemble that includes decision trees and gradient boosting as components appears to be more robust to missing data, as shown by the improvement in classification accuracy compared with when GANs are not used. In summary, most individual classifiers are more robust to missing data when GANs are used as an imputation technique. The differences in performance among the seven machine learning algorithms are statistically significant at the 95% confidence level.