Abstract
The achievable performance of an intrusion detection system is not known beforehand. Currently, intrusion detection researchers must implement their systems before they can determine how well those systems will perform, or must compare them against existing systems in order to evaluate them. Another challenge facing network researchers is the unavailability of real-world network traffic traces, owing to privacy and legal restrictions. This thesis makes two contributions to the literature. First, it presents the achievable performance of existing anomaly-based and learning-based network intrusion detection systems (NIDSs) in detecting Transmission Control Protocol (TCP) synchronise (SYN) flooding attacks. Two anomaly-based algorithms, the adaptive threshold algorithm and the cumulative sum algorithm, were used to build the anomaly-based NIDSs. The logical OR operator was used to combine the outcomes of the two algorithms in order to enhance their performance. The three resulting detectors were used to detect TCP SYN flooding attacks that were synthetically generated according to a Poisson process and with constant interarrival times. The OR combination performed better than either algorithm alone, and all three detectors detected the Poisson-process attacks better than the constant-interarrival-time attacks. For the learning-based NIDSs, a decision tree and a novel fuzzy logic based NIDS were used to detect Neptune, which is a type of TCP SYN flooding attack. The decision tree outperformed the fuzzy logic system. Second, it provides achievable upper bounds on the accuracies of two NIDSs based on ensembles of classifiers. The first NIDS is an AdaBoost-based ensemble that uses a decision stump as its base learner. The second NIDS is a Bagging-based ensemble that uses a decision tree as its base learner.
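As an illustration of the anomaly-based part described above, the following is a minimal sketch of an adaptive threshold detector, a cumulative sum detector, and their logical OR combination, applied to per-interval SYN packet counts. The update rules are common textbook forms and are an assumption here; the thesis may use different variants. The function names, the parameters `alpha`, `beta` and `h`, and the sample counts are hypothetical and chosen only for illustration.

```python
def adaptive_threshold(counts, alpha=0.5, beta=0.9):
    """Alarm when a count exceeds (1 + alpha) times a running mean (assumed form)."""
    alarms = []
    mean = counts[0]  # running mean initialised from the first interval
    for x in counts:
        alarms.append(x > (1 + alpha) * mean)
        mean = beta * mean + (1 - beta) * x  # exponentially weighted moving average
    return alarms


def cusum(counts, alpha=0.1, beta=0.9, h=4.5):
    """Alarm when the accumulated excess over (1 + alpha) * mean exceeds h (assumed form)."""
    alarms = []
    mean = counts[0]
    g = 0.0  # cumulative sum statistic, clipped at zero
    for x in counts:
        g = max(0.0, g + x - (1 + alpha) * mean)
        alarms.append(g > h)
        mean = beta * mean + (1 - beta) * x
    return alarms


def or_combine(alarms_a, alarms_b):
    """Raise an alarm in an interval if either detector raises one."""
    return [a or b for a, b in zip(alarms_a, alarms_b)]


# Hypothetical SYN counts per interval: baseline, a sustained low-rate rise,
# a return to baseline, then a single large spike.
counts = [10] * 10 + [13] * 10 + [10] * 10 + [30] + [10] * 5
at = adaptive_threshold(counts)
cs = cusum(counts)
combined = or_combine(at, cs)
```

With these illustrative settings, the adaptive threshold detector does not react to the sustained low-rate rise, whereas the cumulative sum detector does; the OR combination raises an alarm whenever either detector does, so it can never raise fewer alarms than either detector alone.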
The obtained bounds will enable researchers to estimate the performance of their ensemble-based NIDSs before implementing them, and to determine how well those NIDSs perform relative to the bounds. From the empirical studies, it was deduced that if the dataset entropy with respect to the features falls between 0.9578 and 0.9586, and the average information gain among the features used in the ensemble falls between 0.045615 and 0.25615, then the accuracy of the first NIDS will be at most 0.9065 and the accuracy of the second NIDS will be at most 0.9193. These ensemble accuracy upper bounds hold irrespective of the attack or dataset, provided that the features used in the ensemble (AdaBoosted decision stump ensemble or Bagged decision tree ensemble) have the same characteristics as the features used in this thesis and are discretised in the same way as in this work...
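The two quantities on which the bounds above are conditioned can be computed along the following lines. This is a minimal sketch that assumes the entropy is the Shannon entropy (in bits) of the class labels and that the information gain of a discretised feature is the reduction in that entropy after splitting on the feature's values; the exact definitions used in the thesis may differ. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on a discretised feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def average_information_gain(features, labels):
    """Mean information gain over a list of discretised feature columns."""
    return sum(information_gain(f, labels) for f in features) / len(features)


# Toy data (hypothetical): one fully informative and one uninformative feature.
labels = ["attack", "attack", "normal", "normal"]
informative = [1, 1, 0, 0]
uninformative = [0, 1, 0, 1]
dataset_entropy = entropy(labels)
avg_gain = average_information_gain([informative, uninformative], labels)
```

Values computed in this way for a given dataset and feature set could then be compared against the entropy and average information gain ranges stated above, subject to the stated conditions on the features and their discretisation.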
D.Phil.