Abstract
This thesis investigates the performance of machine learning (ML) techniques in predicting stock market trends and returns. Previous studies (e.g. Bonga-Bonga and Muteba Mwamba, 2011) have shown that stock market returns can be predicted by fundamental variables. However, one of the most important questions in finance is whether fundamental variables can also explain changes in stock market trends (ups and downs). To answer this question, the thesis uses regression techniques to predict stock market returns and classification techniques to predict stock market trends (directions), in both cases with fundamental variables as predictors. Four fundamental variables are considered: the price-earnings (PE) ratio, the dividend yield (DY), the inflation rate as measured by the consumer price index (CPI), and the interest rate spread (SPREAD). The study covers three stock markets, each represented by an index: the US S&P 500, the UK FTSE 100, and the South African ALSI. Monthly stock market and fundamental variable data are collected from February 1996 to August 2017. The empirical analysis is carried out in two frameworks. In the first framework, the thesis predicts stock market trends using ML classification techniques; in the second, it predicts stock market returns using ML regression techniques. The ML techniques used for both classification and regression are Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbours (KNN). Each ML technique is then compared with a benchmark model: Linear Discriminant Analysis (LDA) in the classification framework and the ARIMA-X model in the regression framework.
Furthermore, the thesis uses the F-score, the confusion matrix and the ROC curve to evaluate models in the classification framework, and the predicted mean square error in the regression framework. Based on the out-of-sample ROC curve, confusion matrix and F-score, the results show that Random Forest predicts stock market trends better than any other classification technique considered. Based on the predicted mean square error, SVM predicts stock market returns better than any other ML regression technique considered. The thesis also uses variable importance analysis to identify the fundamental variables that drive stock market trends. The findings show that the inflation rate plays an insignificant role in driving stock market trends, whereas the PE ratio and the dividend yield are significant drivers. Overall, the ML techniques are found to forecast both stock market trends and returns better than the traditional benchmark models.
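For illustration only, the evaluation measures named above can be sketched in a few lines of Python. This is a minimal sketch rather than the thesis's own implementation: it assumes trends are coded as 1 (up) and 0 (down), that the F-score is the balanced F1 variant, and that the predicted mean square error is the mean squared error of out-of-sample predictions. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 = up and 0 = down."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted returns."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy out-of-sample data (made up for illustration, not thesis results).
trend_true = [1, 0, 1, 1, 0, 1]
trend_pred = [1, 0, 0, 1, 1, 1]
returns_true = [0.02, -0.01, 0.03]
returns_pred = [0.01, 0.00, 0.03]

print(confusion_counts(trend_true, trend_pred))
print(f1_score(trend_true, trend_pred))
print(mean_squared_error(returns_true, returns_pred))
```

In a model-to-model comparison of the kind described, the same functions would be applied to each model's out-of-sample predictions, with a higher F-score or a lower mean squared error indicating the better model.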
M.Com. (Financial Economics)