Abstract
Outlier detection, the discovery of objects that deviate from normal behaviour, is crucial in many application domains, and a variety of algorithms have been proposed for it. Each algorithm relies on a specific definition of what an outlier is, so its performance depends mainly on the context in which it is applied. Performance also depends on the evaluation measure used. It is therefore crucial to use relevant evaluation measures when assessing detection algorithms, since an algorithm can obtain a high detection score by correctly classifying the non-outliers alone. In this research, five outlier detection algorithms, namely k-Nearest Neighbour (k-NN), Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), Influenced Outlierness (INFLO), and Angle-Based Outlier Detection (ABOD), are investigated to determine how accurately they detect outliers in data mining. The algorithms are applied to synthetic data with 10,000 simulations and to a real-world dataset of credit card transactions, and their performance is evaluated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, along with the Area Under the Curve (AUC) of each.
Keywords: outlier detection, evaluation measures, cluster analysis, outliers, objects.
M.Sc. (Mathematical Statistics)
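The abstract's remark that an algorithm can obtain a high detection score by correctly classifying the non-outliers alone can be illustrated with a minimal sketch. The data below are hypothetical (990 normal objects and 10 outliers, not taken from the thesis), and the helper names `confusion_counts`, `accuracy`, `precision` and `recall` are introduced here purely for illustration. A trivial detector that never flags anything reaches high accuracy yet finds no outliers, which is why measures that focus on the outlier class, such as precision and recall, are informative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as 'outlier'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all objects that are labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of flagged objects that are true outliers (0.0 if none flagged)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Fraction of true outliers that were flagged (0.0 if there are none)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, heavily imbalanced data: 990 normal objects, 10 outliers.
y_true = [0] * 990 + [1] * 10

# A trivial "detector" that labels every object as a non-outlier.
y_pred_trivial = [0] * 1000

acc = accuracy(y_true, y_pred_trivial)
prec = precision(y_true, y_pred_trivial)
rec = recall(y_true, y_pred_trivial)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Accuracy rewards the 990 correctly labelled normal objects, while recall shows that none of the 10 outliers was found. The same imbalance is the reason PR curves are often read alongside ROC curves when outliers are rare.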