Abstract
A network intrusion detection system (NIDS) plays a vital role in maintaining the security of computer networks by identifying and responding to malicious activity and cyber threats. However, the performance of a NIDS can be significantly affected by class imbalance in the training dataset and by the costs associated with misclassification. This research evaluates how varying the cost values in a cost matrix, and varying the distribution spread value in the spread-subsample function, affect the performance metrics of a NIDS trained on the popular NSL-KDD dataset.
The primary aims were to determine the cost value that best balances the trade-off between false positives (FP) and false negatives (FN) in cost-sensitive learning, and to identify the distribution spread value in the spread-subsample function that best addresses class imbalance in the training set. These aims matter because misclassifications in a NIDS lead to undetected threats or false alarms, both of which have serious security and operational consequences.
The experimental results indicate that a uniform class distribution (M=1) best balances class representation in the training set, reducing the dataset from 125977 to 117260 instances while maintaining high performance metrics under 10-fold cross-validation. For cost-sensitive learning, the optimal cost of a false negative (C_FN) was found to lie in the range C_FN = 5 to 30, giving an accuracy of 90.0% under 10-fold cross-validation and 79.41% on a separate test set (KDDTest) used for final evaluation. This range balances the trade-off between false positives and false negatives, improving NIDS performance by reducing misclassification costs.
This research extends the current understanding of cost-sensitive learning and class-imbalance handling in NIDS. It provides empirical evidence that choosing appropriate cost values and distribution spread values can significantly improve NIDS performance.
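As a rough illustration of the two mechanisms evaluated here, the following Python sketch shows one way a spread subsample and a cost-weighted error total could be computed. It assumes the spread-subsample function caps every class at M times the size of the smallest class (so M=1 yields a uniform distribution), and that the cost matrix reduces to one cost per false positive and one per false negative. The function names, the toy label counts, and the default cost values are hypothetical and are not taken from the experiments.

```python
import random
from collections import defaultdict


def spread_subsample(labels, max_spread, seed=0):
    """Return sorted indices to keep so that no class has more than
    max_spread * (size of the smallest class) instances.
    A max_spread of 0 or less is treated as "no limit" (assumption)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    if max_spread <= 0:
        return list(range(len(labels)))
    smallest = min(len(indices) for indices in by_class.values())
    cap = int(max_spread * smallest)
    rng = random.Random(seed)
    kept = []
    for indices in by_class.values():
        if len(indices) > cap:
            # Randomly drop surplus instances from over-represented classes.
            indices = rng.sample(indices, cap)
        kept.extend(indices)
    return sorted(kept)


def misclassification_cost(fp, fn, c_fp=1.0, c_fn=5.0):
    """Total cost when each false positive costs c_fp and each
    false negative costs c_fn (illustrative defaults)."""
    return fp * c_fp + fn * c_fn


# Toy data (hypothetical): 700 normal and 300 attack records.
labels = ["normal"] * 700 + ["attack"] * 300
kept = spread_subsample(labels, max_spread=1)
print(len(kept))                                   # 600: 300 per class
print(misclassification_cost(fp=40, fn=10, c_fn=5))  # 40*1 + 10*5 = 90.0
```

Raising c_fn in this sketch makes each missed attack weigh more heavily in the total, which is the trade-off against false alarms that the cost values in the study are tuned to balance.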