Abstract
Despite significant advances in machine learning over the last decade, handling concept drift in dynamic, time-evolving and non-stationary environments remains largely unsolved. A plethora of machine learning techniques designed to handle concept drift has emerged in the literature. Recently, ensemble classifiers have emerged as strong candidates for handling drift because of their modularity: they can easily update existing models to deal with new concepts in streaming data, and they have been shown to outperform several previously suggested computational intelligence techniques for handling concept drift.
In non-stationary learning scenarios associated with concept drift, computational intelligence techniques must process data that is neither independent nor identically distributed, and the learning model must not only generalise well but also adapt to changes in the underlying data distribution. Ensemble learning machines have often been applied to classification problems in non-stationary, dynamic and time-varying environments, and they generally improve upon the predictive performance of the individual experts that constitute the ensemble. Although ensembles of learning machines have widely been used to handle drifting concepts, the literature contains no deep study of why they are helpful for this purpose or of which of their features contributes most to accurate adaptation to drift. Such a study is important because a better understanding of the behaviour of ensembles in the presence of concept drift allows their features to be exploited more effectively. Voting margins provide an alternative explanation for the behaviour of ensemble learning machines; they have been used prominently in the interpretation of the AdaBoost algorithm, and the literature suggests that large margins are beneficial. Building on the work of Minku and Yao (2010), I hypothesise that different kinds of drift require different amounts of diversity, which determines the rate at which the learning model can adapt. I argue that ensemble behaviour can be sufficiently understood by first quantifying diversity, since for two-class problems large margins correspond to low diversity. I explore the connection between diversity and margins to formalise this intuition, and then derive an adaptive algorithm, a variant of AdaBoost that explicitly manages and controls the amount of diversity required for each type of concept, in order to test the hypothesis.
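For intuition, the two-class margin–diversity link can be stated precisely. The following derivation uses my own notation, under the standard uniform-voting assumption; it is a supporting sketch rather than the thesis's own formalisation.

```latex
% Voting margin of an ensemble of L base classifiers h_i(x) \in \{-1,+1\}
% on an example x with true label y \in \{-1,+1\}:
m(x) = y \cdot \frac{1}{L}\sum_{i=1}^{L} h_i(x)

% Let d(x) be the fraction of classifier pairs that disagree on x
% (a standard pairwise diversity measure). Using h_i(x)^2 = 1:
m(x)^2 = \frac{1}{L} + \frac{L-1}{L}\bigl(1 - 2\,d(x)\bigr)
       = 1 - 2\,\frac{L-1}{L}\,d(x)
```

Hence greater pairwise disagreement on an example forces a smaller squared margin, which is the sense in which large margins correspond to low diversity.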
This study contributes to knowledge in several ways. Firstly, it derives an adaptive algorithm, a variant of AdaBoost, capable of explicitly managing and controlling diversity. The algorithm exploits the diversity–margin connection to tune diversity through a proxy parameter, and it is used to identify, empirically, time-evolving scenarios in which different levels of diversity contribute to the adaptation of ensemble learning machines. Because it is controlled by a proxy parameter, AceBoost, derived from AdaBoost, makes it possible to consistently encourage more or less diversity in online ensemble learning. In some learning scenarios associated with concept drift, ensembles characterised by less diversity lead to lower test errors. Highly diverse ensembles are associated with lower test errors when drift occurs, regardless of the type of drift, and they play a pivotal role in addressing drifts of high severity. Diversity often minimises the initial increase in error caused by the drift, but leads to slower recovery from the drift in the long run. The performance improvement is largely attributed to diversity, and the literature strongly supports the idea that diversity plays an important role in improving the predictive accuracy of ensemble classifiers.
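To illustrate how a single proxy parameter can encourage more or less diversity in an online ensemble, here is a minimal sketch in the spirit of online bagging, where each base learner sees each incoming example k ~ Poisson(lam) times and smaller lam induces more diverse ensembles. The class names and the perceptron base learner are illustrative assumptions, not the thesis's actual AceBoost implementation.

```python
import math
import random

def poisson(lam, rng):
    """Sample k ~ Poisson(lam) using Knuth's algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

class OnlinePerceptron:
    """Trivial online base learner with labels in {-1, +1}."""
    def __init__(self, dim):
        self.w = [0.0] * dim

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s >= 0 else -1

    def update(self, x, y):
        if self.predict(x) != y:  # classic mistake-driven update
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]

class DiversityControlledEnsemble:
    """Each learner trains on each example k ~ Poisson(lam) times;
    lam acts as the proxy parameter controlling diversity."""
    def __init__(self, n_learners, dim, lam, seed=0):
        self.lam = lam
        self.rng = random.Random(seed)
        self.learners = [OnlinePerceptron(dim) for _ in range(n_learners)]

    def partial_fit(self, x, y):
        for learner in self.learners:
            for _ in range(poisson(self.lam, self.rng)):
                learner.update(x, y)

    def predict(self, x):
        votes = sum(learner.predict(x) for learner in self.learners)
        return 1 if votes >= 0 else -1
```

Setting lam below 1 presents each example to fewer learners on average, so their effective training sets overlap less and the ensemble becomes more diverse; lam near 1 recovers behaviour close to training every learner on roughly the full stream.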
An efficient and effective ensemble learning machine designed for dynamic environments must handle all types of drift with minimal computational overhead. To reduce this overhead, existing state-of-the-art ensemble learning machines discard experts as soon as the ensemble reaches a predefined fixed size or as soon as a new concept is observed. This makes them unsuitable for handling all types of drift, especially recurring contexts, since they discard previously learned knowledge. This study contributes to knowledge by providing a technique for learning recurring concepts that reuses previously learned knowledge to reduce computational complexity and execution overhead. Convergence to a new concept can be facilitated by exploiting highly diverse ensembles trained on the old concept, which is achieved by learning the new concept with low diversity. Newly created low-diversity ensembles tend to be more accurate when the drift is severe and occurs suddenly, whereas low-diversity ensembles trained on the old concept tend to be more accurate soon after the drift starts.
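One way to picture the recurring-concept idea is a pool that archives ensembles instead of discarding them, then re-scores them on a short window of recent examples when drift is detected. This is a hedged sketch under my own naming (ConceptPool, ConstantModel), not the thesis's actual technique.

```python
class ConstantModel:
    """Trivial stand-in for a trained ensemble: always predicts one label."""
    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return self.label

class ConceptPool:
    """Archive ensembles trained on past concepts; when a concept recurs,
    reuse the archived ensemble that scores best on recent data instead
    of training a new model from scratch."""
    def __init__(self):
        self.pool = []

    def archive(self, ensemble):
        self.pool.append(ensemble)

    def best_for(self, window):
        # window: list of (x, y) pairs drawn from the current concept.
        def accuracy(ens):
            return sum(ens.predict(x) == y for x, y in window) / len(window)
        return max(self.pool, key=accuracy) if self.pool else None
```

If an old concept recurs, its archived ensemble scores well on the window and can be reused, avoiding the cost of relearning; the window length trades off reaction speed against the reliability of the accuracy estimate.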
It has often been argued that adaptation to drift depends on the type of ensemble, and some authors suggest that homogeneous ensembles adapt to concept drift better than online heterogeneous ensembles. This study develops a heterogeneous online learning ensemble called Heterogeneous Dynamic Ensemble Selection based on Accuracy and Diversity (HDES-AD) and compares its predictive accuracy with that of existing homogeneous online learning ensembles on non-stationary time series data. The algorithm adapts responsively, dealing appropriately with changing environments while using limited CPU and memory, which increases the reliability and predictive accuracy of the learning model. The superior predictive accuracy of HDES-AD is attributed to the heterogeneity of the ensemble, and the algorithm improves predictive accuracy in the presence of different types of drift, such as gradual, sudden and recurring concepts.
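A dynamic-selection step of this kind can be sketched as ranking a heterogeneous pool of experts by accuracy on a sliding window and voting among the top k. This is a deliberately simplified illustration; the actual HDES-AD also incorporates a diversity criterion, and all names below are mine.

```python
from collections import deque

class WindowedSelector:
    """Keep a per-expert record of correctness over the last `window`
    labelled examples; predict by majority vote of the k currently most
    accurate experts (an accuracy-only simplification of
    accuracy-and-diversity selection)."""
    def __init__(self, experts, window=100, k=3):
        self.experts = experts  # heterogeneous base learners
        self.k = k
        self.history = {id(e): deque(maxlen=window) for e in experts}

    def record(self, x, y):
        # Called once per labelled example from the stream.
        for e in self.experts:
            self.history[id(e)].append(e.predict(x) == y)

    def predict(self, x):
        def recent_accuracy(e):
            h = self.history[id(e)]
            return sum(h) / len(h) if h else 0.0
        chosen = sorted(self.experts, key=recent_accuracy, reverse=True)[:self.k]
        votes = [e.predict(x) for e in chosen]
        return max(set(votes), key=votes.count)
```

Because selection is recomputed from the sliding window at prediction time, experts that suit the current concept dominate the vote shortly after a drift, without retraining the whole pool.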
The last contribution of this study is a novel ensemble learning machine suitable for detecting and handling all types of drift. The Adaptive Diversified Ensemble Selection algorithm (ADES) is accurate and robust to false alarms, handles all types of drift, and achieves good accuracy during stable periods. The advantage of the proposed approach is that it is based on the premise of ensemble selection: all generated models are stored in a model library, and only models that are representative of the current concept are selected and combined. The library is composed of heterogeneous models generated by different types of learning algorithms, thereby enhancing diversity among the models.
Empirical results obtained using both real and synthetic data are provided and give valuable and promising insight into the problem of handling drifting concepts. The work provides strong evidence that ensemble selection is a reliable tool for addressing the problem of concept drift in streaming data.