Hybrid and ensemble artificial intelligence-based time series techniques with applications in power and health sectors

Solomon Oluwole Akinola

In the current fourth industrial revolution (4IR), governments and businesses increasingly leverage large datasets and artificial intelligence (AI) to glean insights for competitive advantage. One crucial application is forecasting to support effective decision-making. This thesis studies AI forecasting techniques in two practical fields: public health and the electricity industry. In recent years, there has been an increase in studies on time series forecasting of future disease incidence and on heterogeneous ensemble classification of disruptions in electric power systems. Deep learning, hybridisation, and feature enrichment techniques are helpful in modelling long-term temporal relationships. However, these techniques are painstaking, prone to errors, and require human expertise, which makes them difficult to implement for epidemiological forecasting and supply interruption problems in electric power systems. Hence, careful consideration is required to achieve better performance. We first present bFilter+GRU-Seq2Seq, an approach to time series forecasting that combines feature-enriched filters and an evolutionary neural architecture search (NAS) with sequence-to-sequence gated recurrent units (GRU-Seq2Seq). bFilter+GRU-Seq2Seq is applied to the prediction of daily COVID-19 case counts in South Africa. The incident data for the highly pathogenic coronavirus pandemic was modelled with filters, optimised hyperparameter search trials, and an evolutionary neural algorithm. The proposed model was compared with autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA) models. The model predicted trends for 30-, 60-, and 90-day horizons and was evaluated for 7, 14, and 31 days. 
Simulation results demonstrate that enriching observed daily case counts with filters and applying evolutionary search optimisation improves forecasting accuracy. Overall, the proposed bFilter+GRU-Seq2Seq outperformed GRU-RNN, ARIMA, and SARIMA, with a coefficient of determination of 0.748 for a forecast horizon of 30 days. The second model addresses the onset of monkeypox virus (MPXV) using surveillance data to track the outbreak surge. MPXV surveillance data received considerable attention after multiple European countries recorded cases. Data from May 9, 2022, to August 10, 2022, were used to model cumulative case trajectories of MPXV in five countries. Our study employed ARIMA, neural network autoregression (NNETAR), exponential smoothing (ETS), and seasonal naïve regression (SNAIVE) for training and evaluation. The Box-Cox transformation is a statistical technique that stabilises variance and improves a dataset’s conformity to a normal distribution. It is beneficial when dealing with data that violate the assumptions of normality or homoscedasticity and is frequently used in statistical procedures such as linear regression. We applied the Box-Cox transformation as a preprocessing step, experimented with linear and nonlinear models, and modelled the five countries with the most cases during the rapid rise in MPXV cases. The results were evaluated using three metrics: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). With the Box-Cox transformation, NNETAR was the most accurate of the four models (NNETAR, ETS, ARIMA, and SNAIVE) in three of the five countries, while ARIMA and ETS each performed best in one country. 
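As an illustration of the preprocessing and evaluation steps described above, the following is a minimal sketch of the Box-Cox transformation with a fixed parameter and of the three error metrics. The function names and the choice of a fixed `lam` are assumptions made for this example; the thesis does not state how the parameter was selected or which software was used.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda (assumed given)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values (e.g. forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical cumulative case counts, for illustration only.
cases = [1.0, 10.0, 100.0]
transformed = box_cox(cases, 0.5)
restored = inverse_box_cox(transformed, 0.5)
```

In a forecasting workflow of this kind, a model would typically be fitted on the transformed series and its forecasts passed through `inverse_box_cox` before the metrics are computed on the original scale.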
With the Box-Cox transformation, the RMSE of the best model in each country was: France (ARIMA = 92.40), Germany (ETS = 50.40), Spain (NNETAR = 220.00), the United Kingdom (NNETAR = 32.00), and the United States (NNETAR = 327.00). The corresponding MAE values were: France (ARIMA = 76.10), Germany (ETS = 41.60), Spain (NNETAR = 172.00), the United Kingdom (NNETAR = 33.40), and the United States (NNETAR = 279.00). The corresponding MAPE values were: France (ARIMA = 3.25), Germany (ETS = 1.44), Spain (NNETAR = 3.45), the United Kingdom (NNETAR = 1.12), and the United States (NNETAR = 3.68). Without the Box-Cox transformation, ARIMA performed best in three countries and NNETAR in two. The RMSE values were: France (ARIMA = 163.00), Germany (ARIMA = 48.4), Spain (NNETAR = 240.00), the United Kingdom (NNETAR = 46.10), and the United States (ARIMA = 280.00). The MAE values were: France (ARIMA = 156.00), Germany (ARIMA = 45.30), Spain (NNETAR = 193.00), the United Kingdom (NNETAR = 39.50), and the United States (ARIMA = 242.00). The MAPE values were: France (ARIMA = 6.72), Germany (ARIMA = 1.55), Spain (NNETAR = 3.88), the United Kingdom (NNETAR = 1.36), and the United States (ARIMA = 3.28). Our findings show that careful model selection and Box-Cox transformation significantly improve incidence case predictions, even in sparse surveillance data. Our research supports early detection and may contribute to a better understanding of predictions for MPXV cases using combinations of linear and nonlinear models. The third model is a heterogeneous ensemble technique for classifying manual load reduction based on the contributing features of electricity generation sources and demand. Classical random forests (RF), sparse partial least squares (SPLS), and averaged neural networks (AVNNET) were used as benchmark machine learning techniques. Three ensemble approaches were explored: the average ensemble, the majority-voting ensemble, and the weighted average ensemble. 
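The three ensemble approaches named above can be sketched as follows for a binary problem, where each base classifier outputs a probability of the anomaly class. This is a minimal sketch, not the thesis implementation: the function names, the 0.5 threshold, and the example probabilities and weights are all hypothetical, and the thesis abstract does not state how the weights were chosen.

```python
def average_ensemble(model_probs):
    """Unweighted mean of the per-model anomaly probabilities for each sample."""
    n_models = len(model_probs)
    return [sum(sample) / n_models for sample in zip(*model_probs)]


def weighted_average_ensemble(model_probs, weights):
    """Weighted mean of the per-model probabilities; weights are normalised to sum to 1."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample)) / total
        for sample in zip(*model_probs)
    ]


def majority_vote_ensemble(model_probs, threshold=0.5):
    """Each model casts a hard vote; a sample is labelled 1 if more than half vote 1."""
    votes = [[1 if p >= threshold else 0 for p in probs] for probs in model_probs]
    return [1 if 2 * sum(sample) > len(sample) else 0 for sample in zip(*votes)]


# Hypothetical outputs of three base classifiers on three samples.
model_probs = [
    [0.9, 0.2, 0.6],  # e.g. a random forest
    [0.8, 0.4, 0.3],  # e.g. sparse partial least squares
    [0.4, 0.1, 0.7],  # e.g. an averaged neural network
]
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. derived from validation performance

avg = average_ensemble(model_probs)
wavg = weighted_average_ensemble(model_probs, weights)
votes = majority_vote_ensemble(model_probs)
```

The weighted variant lets a stronger base classifier dominate the combined probability, which is one plausible reason it can outperform plain averaging when the base models differ in quality.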
Our results showed that the weighted average technique outperformed the other ensemble techniques investigated. Its precision (65.40%), F1 score (78.52%), balanced accuracy (97.15%), Kappa (76.53%), and confusion matrix results were better than those of the averaging and majority-voting ensembles. Recall is the lone exception: the weighted average achieved 98.24%, whereas majority voting achieved 98.81%. The ensemble correctly classified 89.3% of observations as Eskom’s no load reduction (normal) and 6.9% as load reduction (anomaly). Type-I and type-II errors accounted for misclassification rates of 3.7% and 0.1%, respectively. The fourth model addresses load shedding, which is vital for managing electric power shortages and avoiding grid collapse. However, load shedding poses an imminent threat to the overall stability of the power grid system (PGS) and its ability to run safely and reliably. Load shedding strategies can be complicated and difficult to apply efficiently. The study proposes a data-driven load shedding time series classification (TSC) technique employing a hybrid ensemble super learner (eSL) to categorise load shedding based on contributing features. The model investigated the challenges of binary classification using a multidimensional time series of South Africa’s hourly load shedding stages in MW collected from PGS data. Because load shedding is planned and predicted based on contributing features, we use these features as strong indicators to classify the expected outcome as load shedding or no load shedding. Validation tests for the proposed technique included the precision-recall curve, the confusion matrix, the class likelihood ratio, the Brier skill score, and the critical difference factor (CDF). Logistic regression produced the highest average CDF score, while the support vector classifier (SVC) had the highest balanced precision (90.694%). 
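Of the validation tests listed above, the Brier skill score is straightforward to illustrate. The sketch below assumes the reference forecast is the constant base rate of the positive class (a common choice, but not one the abstract confirms); the function names and example values are hypothetical.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def brier_skill_score(y_true, probs):
    """Skill relative to a constant base-rate forecast (assumed reference).

    1.0 is a perfect forecast, 0.0 matches the reference, negative is worse.
    """
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    if reference == 0:
        raise ValueError("Reference Brier score is zero: only one class present")
    return 1.0 - brier_score(y_true, probs) / reference


# Hypothetical labels (1 = load shedding) and predicted probabilities.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.2]
bss = brier_skill_score(y_true, probs)
```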
The recursive feature elimination (RFE) model exhibited the highest true negative and true positive counts, at 50.59% and 40.84%, respectively, and the highest proportion of correct classifications.

Keywords: time series classification, time series forecasting, boosting, filters, epidemiological, electric load interruption
