Software reliability prediction using ensemble learning on selected features in imbalanced and balanced datasets: A review

Suneel Kumar Rath; Madhusmita Sahu; Shom Prasad Das; Junali Jasmine Jena; Chitralekha Jena; Baseem Khan; Ahmed Ali; Pitshou Bokoro

doi:10.32604/csse.2024.057067

Journal article

Open access

Peer reviewed

Computer Systems Science and Engineering, Vol.48(6), pp.1513-1536

2024

DOI: https://doi.org/10.32604/csse.2024.057067

Handle: https://hdl.handle.net/10210/512783

Abstract

Redundancy, correlation, feature irrelevance, and missing samples are just a few of the problems that make software defect data difficult to analyze. It can also be challenging to maintain an even distribution of data between defective and non-defective software: in most experimental settings, data from the non-defective class predominate in the dataset. The objective of this review is to demonstrate the effectiveness of combining ensemble learning and feature selection in improving defect classification performance. Alongside a successful feature selection approach, a novel variant of the ensemble learning technique is analyzed to address feature redundancy and data imbalance, providing robustness in the classification process. To overcome these problems and lessen their impact on fault classification performance, the authors carefully integrate effective feature selection with ensemble learning models. Forward selection demonstrates that a large share of the area under the receiver operating characteristic (ROC) curve can be attributed to only a small subset of features. In the evaluation of feature selection techniques on the datasets, the greedy forward selection (GFS) technique outperformed Pearson’s correlation method. Ensemble learners, such as random forests (RF) and the proposed average probability ensemble (APE), are more resistant to the impact of weak features than weighted support vector machines (W-SVMs) and extreme learning machines (ELM). Furthermore, on the NASA and Java datasets, the enhanced average probability ensemble model, which combines greedy forward selection with the average probability ensemble, achieved a remarkably high area under the ROC curve, approaching 1.0.
This review emphasizes the importance of meticulously selecting attributes in a software dataset to accurately classify defective components. In addition, the suggested ensemble learning model successfully addressed the aforementioned problems with software data and produced outstanding classification performance.
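
The two ideas named in the abstract, greedy forward selection scored by area under the ROC curve and an ensemble that averages the base learners' predicted probabilities, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the two stand-in "learners" (a row mean and a row maximum in place of trained classifiers such as RF or W-SVM) are assumptions made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_probability(per_learner: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample defect probabilities of several base learners."""
    return [sum(ps) / len(ps) for ps in zip(*per_learner)]


def greedy_forward_selection(
    n_features: int,
    score_fn: Callable[[List[int]], float],
    min_gain: float = 1e-9,
) -> Tuple[List[int], float]:
    """Add the single best feature per round; stop when the score stops improving."""
    selected: List[int] = []
    best = float("-inf")
    remaining = list(range(n_features))
    while remaining:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        cand_score, cand = max(scored, key=lambda t: t[0])
        if cand_score - best <= min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# misleading, feature 2 is constant. Label 1 marks a defective module.
X = [
    [0.9, 0.2, 0.5],
    [0.8, 0.1, 0.5],
    [0.7, 0.9, 0.5],
    [0.2, 0.8, 0.5],
    [0.1, 0.3, 0.5],
    [0.3, 0.7, 0.5],
]
y = [1, 1, 1, 0, 0, 0]


def ensemble_auc(subset: List[int]) -> float:
    """Score a feature subset by the AUC of the averaged stand-in learners."""
    rows = [[row[f] for f in subset] for row in X]
    mean_scores = [sum(r) / len(r) for r in rows]  # stand-in learner 1 (assumption)
    max_scores = [max(r) for r in rows]            # stand-in learner 2 (assumption)
    return auc(y, average_probability([mean_scores, max_scores]))


selected, best = greedy_forward_selection(3, ensemble_auc)
print(selected, best)
```

In practice the scoring function would train real classifiers on the candidate subset and evaluate them on held-out data; scoring on the same toy rows, as here, only keeps the sketch short.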

Details

Title: Software reliability prediction using ensemble learning on selected features in imbalanced and balanced datasets: A review
Creators: Suneel Kumar Rath; Madhusmita Sahu; Shom Prasad Das; Junali Jasmine Jena; Chitralekha Jena (KIIT University); Baseem Khan (Universidad Internacional); Ahmed Ali (University of Johannesburg); Pitshou Bokoro
Publication Details: Computer Systems Science and Engineering, Vol.48(6), pp.1513-1536
Identifiers: 9948204907691
ISSN: 0267-6192
Academic Unit: Department of Electrical Engineering Technology; Faculty of Engineering & the Built Environment; University of Johannesburg
Language: English
Resource Type: Journal article