Modifying class distributions to improve the classification of minority group examples in a class-imbalanced dataset

Banele Mdluli; Terence Lesley Van Zyl

Journal article

Open access

2025

Handle:

https://hdl.handle.net/10210/518586

Keywords: Class imbalance · Oversampling · ADASYN

Abstract

Class-imbalanced datasets are common in real-world applications. Imbalance arises when one class is over-represented relative to another, and it may reflect a system's behaviour over time. However, class imbalance degrades the performance of machine learning models that predict the system's future behaviour. Various techniques are used to reduce this negative impact. Data resampling is one of the main approaches and is subdivided into oversampling and undersampling. Oversampling techniques have outperformed undersampling techniques in most studies, and most data resampling techniques are derived from oversampling. However, some oversampling techniques are ineffective when the minority class lacks within-class variation and the class imbalance is high. In this study, the change in within-class variation before and after oversampling was analysed for nine datasets. Additionally, classification performance was measured for datasets oversampled with standard and hybrid techniques. A novel hybrid oversampling technique that combines k-Means and ADASYN was implemented. Across the nine datasets, the hybrid oversampling techniques generated synthetic examples that only marginally changed the within-class variation and achieved the highest F1 score compared with standard oversampling techniques.
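The abstract does not say how k-Means and ADASYN are combined, so the following Python sketch shows only one plausible reading, not the authors' implementation. It assumes that the minority class is first clustered with k-means, that seed points are then chosen with ADASYN-style weights (the share of majority-class points among each seed's nearest neighbours), and that each synthetic example is interpolated between two minority points from the same cluster. The function names (`kmeans`, `hybrid_oversample`, `within_class_variation`), the toy data, and the choice of mean squared distance to the centroid as the variation measure are all illustrative assumptions.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def within_class_variation(points):
    """Mean squared distance to the class centroid (an assumed measure)."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's algorithm; returns the non-empty clusters."""
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre when a cluster ends up empty.
        centres = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def hybrid_oversample(minority, majority, n_new, n_clusters=2, k=5, seed=0):
    """Assumed hybrid: k-means clusters plus ADASYN-style seed weighting."""
    rng = random.Random(seed)
    labelled = [(p, True) for p in minority] + [(q, False) for q in majority]

    def difficulty(p):
        # Share of majority points among the k nearest neighbours of p
        # (index 0 of the sorted list is p itself, so it is skipped).
        nearest = sorted(labelled, key=lambda item: math.dist(p, item[0]))[1:k + 1]
        return sum(not is_minority for _, is_minority in nearest) / len(nearest)

    seeds = [(p, cluster) for cluster in kmeans(minority, n_clusters, rng) for p in cluster]
    weights = [difficulty(p) for p, _ in seeds]
    if sum(weights) == 0:  # no minority point borders the majority class
        weights = [1.0] * len(seeds)

    synthetic = []
    for p, cluster in rng.choices(seeds, weights=weights, k=n_new):
        partner = rng.choice(cluster)  # may equal p, which yields a copy of p
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(p, partner)])
    return synthetic


# Toy data: two small minority groups on either side of a dense majority block.
minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
majority = [[1.0 + 0.1 * i, 1.0 + 0.05 * j] for i in range(6) for j in range(5)]

synthetic = hybrid_oversample(minority, majority, n_new=24)
before = within_class_variation(minority)
after = within_class_variation(minority + synthetic)
print(f"within-class variation before: {before:.3f}, after: {after:.3f}")
```

Restricting interpolation to pairs within one cluster keeps synthetic points from being placed on the line between distant minority groups, which in this toy layout would cross the majority region. In practice one would more likely combine `KMeans` from scikit-learn with `ADASYN` from imbalanced-learn; the pure-Python version is used here only to keep the sketch self-contained.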

Details

Title: Modifying class distributions to improve the classification of minority group examples in a class-imbalanced dataset
Creators: Banele Mdluli; Terence Lesley Van Zyl
Identifiers: 9959608907691
Academic Unit: University of Johannesburg
Language: English
Resource Type: Journal article