Measuring the effectiveness of outlier detection techniques in cluster analysis

Godfrey Kgoedi

Thesis

Open access

Master of Science (MSc), University of Johannesburg

2021

Handle:

https://hdl.handle.net/10210/498280

Abstract

Outliers (Statistics)

Cluster analysis

Mathematical statistics

Outlier detection, the discovery of objects that deviate from normal behaviour, is crucial in many application domains, and a variety of algorithms have been proposed for it. Each algorithm detects outliers using a specific definition of what an outlier is, so its performance depends mainly on the context in which it is applied. Performance also depends on the evaluation measure used. It is therefore crucial to use relevant evaluation measures when evaluating detection algorithms, since an algorithm can obtain a high detection score by correctly classifying the non-outliers alone. In this research, five outlier detection algorithms, namely k-Nearest Neighbour (k-NN), Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), Influenced Outlierness (INFLO), and Angle-Based Outlier Detection (ABOD), are investigated to determine their accuracy in detecting outliers within data mining. These algorithms are applied to synthetic data with 10,000 simulations and to a real-world dataset of credit card transactions, and their performance is evaluated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves, along with the Area Under the Curve (AUC) of each.

Keywords: outlier detection, evaluation measures, cluster analysis, outliers, objects.

M.Sc. (Mathematical Statistics)
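The abstract's point about evaluation measures can be illustrated with a minimal sketch. It is not the thesis's implementation: the synthetic data, the choice of k = 5, and the helper names (`knn_scores`, `roc_auc`, `average_precision`, `make_data`) are all assumptions made for illustration. It scores points by their distance to the k-th nearest neighbour, then compares ROC AUC and a step-wise PR AUC (average precision) with the accuracy of a trivial rule that labels every point a non-outlier.

```python
import math
import random


def knn_scores(points, k):
    """Outlier score: distance to the k-th nearest neighbour (larger = more outlying)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def roc_auc(labels, scores):
    """Chance that a random outlier outscores a random non-outlier (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the PR curve, ranking points by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recovered outlier
    return total / sum(labels)


def make_data(n_inliers=300, n_outliers=6, seed=0):
    """Hypothetical data: one Gaussian cluster plus a few uniformly scattered outliers."""
    rng = random.Random(seed)
    inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_inliers)]
    outliers = [(rng.uniform(-6, 6), rng.uniform(-6, 6)) for _ in range(n_outliers)]
    return inliers + outliers, [0] * n_inliers + [1] * n_outliers


points, labels = make_data()
scores = knn_scores(points, k=5)  # k = 5 is an assumed setting

# Accuracy of a rule that never flags anything: high purely because outliers are rare.
trivial_accuracy = labels.count(0) / len(labels)

print(f"trivial accuracy : {trivial_accuracy:.3f}")
print(f"k-NN ROC AUC     : {roc_auc(labels, scores):.3f}")
print(f"k-NN PR AUC (AP) : {average_precision(labels, scores):.3f}")
```

With about 2% outliers, the trivial rule is right on roughly 98% of points while detecting nothing. This is why ranking-based measures such as ROC and PR curves are more informative here. The PR curve is generally the more sensitive of the two to how well the rare class is ranked.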

Files and links (1)

Kgoedi_Godfrey Wtm.pdf (PDF, 1.46 MB)

Details

Title: Measuring the effectiveness of outlier detection techniques in cluster analysis
Creators - without role: Godfrey Kgoedi
Contributors - without role: E. Smit; Y. Shiferaw
Awarding Institution: University of Johannesburg; Master of Science (MSc)
Theses and Dissertations: Master of Science (MSc), University of Johannesburg
Identifiers: 9910125507691
Copyright: University of Johannesburg
Academic Unit: Department of Applied Mathematics
Resource Type: Thesis