Abstract
Healthcare fraud poses a significant global financial and operational challenge, undermining
the economic stability of healthcare systems and eroding trust in service delivery. Traditional
approaches to fraud detection, particularly those based on machine learning, struggle to adapt
to the complexity and dynamic nature of fraudulent activities. This limited-scope dissertation
addresses these challenges by investigating the use of three deep learning (DL) models, namely,
artificial neural network (ANN), convolutional neural network (CNN), and long short-term
memory (LSTM) networks, alongside a baseline random forest (RF) classifier, for healthcare
insurance fraud detection. The study utilises a publicly available dataset from data.world, which
includes patient demographics, claim amounts, diagnostic codes, and procedure types. The
methodology involves comprehensive data preprocessing to address missing values and class
imbalance, followed by training the DL models and the RF classifier using standard architectures
optimised for the task. Model performance was assessed using accuracy, precision, recall, and
F1-score. Results showed that the LSTM achieved the best overall
performance with an accuracy of 0.94, precision of 0.78, recall of 0.5, and F1-score of 0.6, while
the CNN excelled in accuracy, and the ANN effectively reduced false negatives, a crucial consideration in fraud
detection. The RF classifier served as a baseline model, providing a comparative benchmark
for evaluating the effectiveness of the DL approaches. Furthermore, to enhance model
transparency and interpretability, the local interpretable model-agnostic explanations (LIME) technique was
incorporated. This study underscores the potential of integrating DL models with explainable
artificial intelligence (XAI) techniques to improve decision-making and address the complex
challenges of healthcare insurance fraud detection, while highlighting the comparative advantage
of deep learning over traditional machine learning approaches.
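
As a brief illustration of how the reported metrics relate to one another, the sketch below recomputes the F1-score from the LSTM precision and recall quoted above, using the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` and the zero-division handling are assumptions made for this example and are not taken from the dissertation's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero (an assumed convention
    that avoids a division by zero).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LSTM precision and recall as reported in the abstract
lstm_f1 = f1_score(precision=0.78, recall=0.5)
print(round(lstm_f1, 2))  # 0.61
```

The recomputed value of about 0.61 agrees with the reported F1-score of 0.6 at one decimal place. The gap between the accuracy of 0.94 and the recall of 0.5 is typical of class-imbalanced data, which is why precision, recall, and F1-score are reported alongside accuracy.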