Abstract
Technological advances such as online payments, mobile financial technology (FinTech) software
systems and e-commerce have led to a rise in the volume of daily credit card payments
made online. Consequently, there has been a proliferation of fraudulent transactions that
affect credit card issuers, financial establishments, and users. In addition to credit card fraud
(CCF), financial institutions face the problem of credit risk (CR) prediction. In both
cases, the datasets used for modelling are highly skewed. Many rule-based systems
are used to tackle CCF detection and CR prediction. However, these systems are
prone to various breaches that result in massive losses for financial establishments. In this thesis,
machine learning (ML) frameworks are proposed to address CCF detection and CR prediction.
First, an ML-based CCF detection engine that uses a genetic algorithm (GA)
for feature selection is proposed. This framework is coupled with ML methods such as Decision
Tree (DT), Artificial Neural Network (ANN), Naive Bayes (NB), Random Forest (RF), and
Logistic Regression (LR). The results demonstrated that using an efficient approach to select
the best attributes to model CCF has the potential to improve the results of ML models. In
the second part of the thesis, an ML-based CCF detection framework using the Synthetic Minority
Over-sampling Technique (SMOTE) and the Adaptive Boosting (AdaBoost) method is presented. The
CCF-SMOTE-AdaBoost framework was evaluated using the following ML algorithms: Logistic Regression
(LR), Decision Tree (DT), Extra Tree (ET), Extreme Gradient Boosting (XGBoost),
Support Vector Machine (SVM) and Random Forest (RF). In addition to the real CCF dataset,
the proposed CCF-SMOTE-AdaBoost framework was trained and tested on a highly imbalanced
synthetic CCF dataset to further assess and evaluate the results that were obtained using
the real CCF dataset. The experimental results demonstrated that using AdaBoost
in conjunction with the SMOTE algorithm improved the performance of the individual ML models. In
the third part of this thesis, a stacked classifier for CR prediction is proposed. This model is
built using Gradient Boosting (GB), XGBoost, and RF estimators.
The following CR datasets were considered: the German, Australian, and Taiwan datasets.
The experimental results attained by the stacked model were compared to those attained by the
following individual algorithms: ANN, RF, GB, XGBoost, and K-Nearest Neighbor (KNN). The
performance analysis confirmed that using a stacked approach for CR prediction is superior to
using individual classifiers.
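
For readers unfamiliar with SMOTE, the interpolation idea behind the oversampling step mentioned above can be sketched as follows. This is a minimal, standard-library illustration and not the implementation used in this thesis: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are assumptions introduced purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up for illustration): four "fraud" points with two features.
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(fraud, n_new=6)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class is enlarged without duplicating records exactly. In practice, oversampling of this kind would normally be applied to the training split only.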