Improved Machine Learning methods for enhanced credit card fraud detection

Emmanuel Ileberi

Technological advances such as online payments, mobile financial technology (FinTech) software systems and e-commerce have led to a rise in the volume of daily credit card payments made online. Consequently, there has been a proliferation of fraudulent transactions that affect credit card issuers, financial establishments, and users. In addition to credit card fraud (CCF), financial institutions face the problem of credit risk (CR) prediction. In both cases, the datasets used for modelling are highly skewed. Many rule-based systems exist to tackle CCF and CR prediction. However, these systems are prone to various breaches that result in massive losses for financial establishments. In this thesis, machine learning (ML) frameworks are proposed to address CCF detection and CR prediction. Firstly, an ML-based CCF detection engine utilizing the genetic algorithm (GA) for feature selection is proposed. This framework is coupled with ML methods such as Decision Tree (DT), Artificial Neural Network (ANN), Naive Bayes (NB), Random Forest (RF) and Logistic Regression (LR). The results demonstrated that using an efficient approach to pick the best attributes to model CCF has the potential to improve the results of ML models. In the second part of the thesis, an ML-based CCF detection framework using the Synthetic Minority Over-sampling Technique (SMOTE) and the Adaptive Boosting (AdaBoost) method is presented. The CCF-SMOTE-AdaBoost framework is evaluated using the following ML algorithms: LR, DT, Extra Tree (ET), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM) and RF. In addition to the real CCF dataset, the proposed CCF-SMOTE-AdaBoost framework was trained and tested on a highly imbalanced synthetic CCF dataset to further assess the results obtained using the real CCF dataset.
The experimental results demonstrated that using AdaBoost in conjunction with the SMOTE algorithm improved the performance of individual ML models. In the third part of this thesis, a stacked classifier for CR prediction is proposed. This model is built using the Gradient Boosting (GB), XGBoost, and RF estimators. The following CR datasets were considered: the German, Australian and Taiwan datasets. The results attained by the stacked model were compared to those attained by the following individual algorithms: ANN, RF, GB, XGBoost and K-Nearest Neighbor (KNN). The performance analysis confirmed that the stacked approach to CR prediction outperformed the individual classifiers.
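
As an illustration of the oversampling step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the thesis's implementation, which is not given here. The function name `smote`, the parameters `n_new`, `k` and `seed`, and the toy `fraud` data are all hypothetical, and the subsequent AdaBoost training step is not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration only: `minority` is a list of
    equal-length numeric feature vectors from the minority (fraud) class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)        # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap drawn from [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (made-up values, two features each).
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
new_points = smote(fraud, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice the oversampling would be applied to the training split only, so that the test data keeps its original class distribution.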
