Machine learning methods for financial data challenges in quantitative finance

Thendo Sidogi

Quantitative portfolio management depends heavily on mathematical and statistical analysis to understand and investigate patterns and phenomena in financial markets. Today, quantitative portfolio managers also need a thorough understanding of different technologies and their applications, because they must apply their skills at scale given the growth in data production and computing power. Given this reality, the central question of this thesis is: How can machine learning techniques overcome challenges in financial data, such as using alternative data to enhance forecasting accuracy or dealing with insufficient data? To answer this question, we present innovative viewpoints and viable solutions to a range of financial data challenges that a quantitative portfolio manager may encounter in practice. This thesis presents four contributions that tackle specific problems in synthetic financial data generation and in using alternative data to enhance stock price prediction. The first contribution presents a first-in-literature use of FinBERT to generate sentiment scores from high-frequency news headlines for stock price prediction. Specifically, it is found that using FinBERT improves stock price predictions compared with using the original generic language model, BERT, or no sentiment score at all. The second contribution addresses the issue of biased recommendations or predictions from sell-side analysts, specifically in the context of directional stock prediction. This issue is addressed by introducing a feature engineering process that generates rolling feature sets from sell-side analyst reports to measure forecast accuracy.
These rolling features, combined with other market regime features, are then used to train machine learning models for unbiased and improved bidirectional stock price prediction. The third contribution proposes a first-in-literature use of rough path theory to compress limit order book (LOB) data and generate input features, known as signatures, for machine learning models to enhance stock price prediction. In particular, it is found that using rough path theory improves the prediction performance and efficiency of specific models compared with traditional methods such as autoregression or using raw LOB data. The last contribution addresses the problem of limited implied volatility data, which is crucial for options pricing but often challenging to obtain. The thesis proposes using generative adversarial networks with novel static arbitrage loss conditions to generate synthetic implied volatility data faithful to the distribution of the original dataset.

Keywords: Sentiment, FinBERT, Sell-Side Analyst, Forecast Accuracy, Rough Paths, Signatures, Equity Volatility Surface, GAN, Static Arbitrage, Machine Learning.

Supervisor: Dr Peter Olukanmi
Co-supervisor: Assoc. Prof. Rendani Mbuvha
Co-supervisor: Dr Wilson Tsakane Mongwe
Co-supervisor: Prof. Tshilidzi Marwala
School: Electrical and Electronic Engineering

“If we look deep enough within ourselves in the pursuit of finding our purpose, we find greatness. Not just ordinary greatness but the greatness uniquely visible even in those around us.” – Unknown
