Abstract
M.Ing. (Electrical Engineering)
The emergence of Big Data and machine learning (ML) has paved the way for numerous scientific advancements. One challenge that has hindered the progress and application of ML algorithms in certain classification tasks is the class imbalance problem, in which the distribution of the target classes is skewed. The problem arises in several domains, including medical diagnosis, credit risk prediction, fraud detection, and other areas in which negatively labelled samples considerably outnumber positively labelled samples. Training ML models on imbalanced data often results in poor performance. Several research works have proposed diverse methods to mitigate the class imbalance problem, including data sampling, ensemble learning, and feature learning. This research focuses on effective feature learning. The dissertation presents two ML methods that enhance the performance of diverse classifiers on publicly available imbalanced datasets. Firstly, a thorough literature review is conducted on various ML algorithms developed to solve the class imbalance problem. Secondly, a method is developed to improve the classification performance of some classifiers using a stacked sparse autoencoder, with application to credit risk prediction. Thirdly, a method is introduced for medical diagnosis using an enhanced sparse autoencoder and softmax regression. The methods implemented in this research outperformed most ML algorithms and methods reported in related scholarly works. Furthermore, this research demonstrates the impact of effective feature learning on classifier performance and the importance of training classifiers with relevant data.