Abstract
Quantitative portfolio management depends heavily on mathematical and statistical
analysis to understand and investigate various patterns or phenomena within the financial
markets. Today, quantitative portfolio managers also need a thorough understanding of
different technologies and their applications. This need arises mainly because modern
quantitative portfolio managers must apply their skills at scale, given the growth in
data production and computing power.
Given this reality, the central inquiry of this thesis is: How can we overcome various
challenges in financial data, such as utilizing alternative data to enhance forecasting
accuracy or dealing with insufficient data, through machine learning techniques? To
answer this question, we present innovative viewpoints and viable solutions to a range of
financial data challenges that a quantitative portfolio manager may encounter in practice.
This thesis presents four contributions tackling specific problems related to synthetic
financial data generation and using alternative data to enhance stock price prediction.
The first contribution presents the first use in the literature of FinBERT to generate
sentiment scores from high-frequency news headlines for stock price prediction. Specifically,
it is found that using FinBERT improves stock price predictions compared to using the
original generic language model, BERT, or no sentiment score at all. The second contribution
of this thesis addresses the issue of biased recommendations or predictions from
sell-side analysts, specifically in the context of directional stock prediction. This issue is
addressed by introducing a feature engineering process that generates rolling feature sets
from sell-side analyst reports to measure accuracy performance. These rolling features,
combined with other market regime features, are then used to train machine learning
models for unbiased and improved bidirectional stock price prediction. The third contribution
proposes the first use in the literature of rough path theory to compress limit order
book (LOB) data and generate input features, known as signatures, for machine learning
models to enhance stock price prediction. In particular, it is found that using rough
path theory improves prediction performance and model efficiency of specific models
compared to traditional methods like autoregression or utilizing raw LOB data. The
last contribution addresses the problem of limited implied volatility data, which is crucial
for options pricing but often difficult to obtain. The thesis proposes using generative
adversarial networks with novel static arbitrage loss conditions to generate synthetic
implied volatility data faithful to the distribution of the original dataset.
Keywords: Sentiment, FinBERT, Sell-Side Analyst, Forecast Accuracy, Rough Paths,
Signatures, Equity Volatility Surface, GAN, Static Arbitrage, Machine Learning.
Supervisor: Dr Peter Olukanmi
Co-supervisor: Assoc. Prof. Rendani Mbuvha
Co-supervisor: Dr Wilson Tsakane Mongwe
Co-supervisor: Prof. Tshilidzi Marwala
School: Electrical and Electronic Engineering
“If we look deep enough within ourselves in the pursuit of finding our purpose,
we find greatness. Not just ordinary greatness but the greatness uniquely
visible even in those around us.”
– Unknown