Abstract
In the current fourth industrial revolution (4IR), governments and businesses
increasingly leverage large datasets and artificial intelligence (AI) to glean insights
for competitive advantage. One crucial application is forecasting to support
effective decision-making. This thesis studies AI forecasting techniques in two
practical fields: public health and the electricity industry. In recent years,
research has grown on time series forecasting of future disease incidence and on
heterogeneous ensemble classification of disruptions in electric power systems.
Deep learning, hybridisation, and feature enrichment techniques are helpful for
modelling long-term temporal relationships. However, these techniques are
painstaking to apply, prone to errors, and require human expertise. Implementing
them for epidemiological forecasting and for supply interruption problems in
electric power systems is therefore difficult, and careful design is required to
achieve better performance.
We first present bFilter+GRU-Seq2Seq, an approach to time series forecasting of
COVID-19 case counts in South Africa. It combines feature-enriched filters and an
evolutionary neural architecture search (NAS) with sequence-to-sequence gated
recurrent units (GRU-Seq2Seq) and is applied to predicting daily coronavirus
disease cases. The incidence data for this highly pathogenic coronavirus were
modelled with filters, optimised hyperparameter search trials, and an evolutionary
neural algorithm. The proposed model was compared with the autoregressive
integrated moving average (ARIMA) and seasonal autoregressive integrated moving
average (SARIMA) models. It predicted trends over 30-, 60-, and 90-day horizons
and was evaluated over 7, 14, and 31 days. Simulation results demonstrate that
augmenting the observed daily case counts with filters and evolutionary search
optimisation improves forecasting accuracy. Overall, the proposed
bFilter+GRU-Seq2Seq outperformed GRU-RNN, ARIMA, and SARIMA, achieving a
coefficient of determination (R²) of 0.748 for a 30-day forecast horizon.
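The coefficient of determination reported above measures how much of the variance in the observed daily counts a forecast explains, as R² = 1 − SS_res / SS_tot. The minimal Python sketch below illustrates the standard definition; the function name `r_squared` and all numbers are hypothetical and are not the thesis's code or data. R² equals 1 for a perfect forecast and can drop below 0 when a forecast does worse than predicting the observed mean.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least two points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared forecast errors.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily case counts and 7-day forecasts (not thesis data).
observed = [120, 135, 150, 160, 158, 170, 185]
predicted = [118, 140, 145, 162, 165, 168, 180]
print(round(r_squared(observed, predicted), 3))  # prints 0.951
```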
The second model addresses the onset of the monkeypox virus (MPXV) outbreak,
using surveillance data to track the surge in cases. MPXV surveillance data
received considerable attention after multiple European countries recorded cases.
Data from May 9, 2022, to August 10, 2022, were used to model the cumulative case
trajectories of MPXV in five countries. Our study employed ARIMA, neural network
autoregression (NNETAR), exponential smoothing (ETS), and seasonal naïve
(SNAIVE) models for training and evaluation. The Box-Cox transformation is a
statistical technique that stabilises variance and improves a dataset's conformity
to a normal distribution. It is beneficial for data that violate the assumptions
of normality or homoscedasticity and is frequently used in statistical procedures
such as linear regression. We applied the Box-Cox transformation as a
preprocessing step, experimented with linear and nonlinear models, and modelled
the five most affected countries during the sudden rise in MPXV cases. Results
for each country were evaluated using three metrics: root mean squared error
(RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
With the Box-Cox transformation, NNETAR was the most accurate of the four models
(NNETAR, ETS, ARIMA, and SNAIVE) in three of the five countries, while ARIMA and
ETS each performed best in one country. The best RMSE values were France (ARIMA =
92.40), Germany (ETS = 50.40), Spain (NNETAR = 220.00), the United Kingdom
(NNETAR = 32.00), and the United States (NNETAR = 327.00); the best MAE values
were France (ARIMA = 76.10), Germany (ETS = 41.60), Spain (NNETAR = 172.00), the
United Kingdom (NNETAR = 33.40), and the United States (NNETAR = 279.00); and the
best MAPE values were France (ARIMA = 3.25), Germany (ETS = 1.44), Spain (NNETAR
= 3.45), the United Kingdom (NNETAR = 1.12), and the United States (NNETAR =
3.68). Without the Box-Cox transformation, ARIMA performed best in three
countries and NNETAR in two. The best RMSE values were France (ARIMA = 163.00),
Germany (ARIMA = 48.40), Spain (NNETAR = 240.00), the United Kingdom (NNETAR =
46.10), and the United States (ARIMA = 280.00); the best MAE values were France
(ARIMA = 156.00), Germany (ARIMA = 45.30), Spain (NNETAR = 193.00), the United
Kingdom (NNETAR = 39.50), and the United States (ARIMA = 242.00); and the best
MAPE values were France (ARIMA = 6.72), Germany (ARIMA = 1.55), Spain (NNETAR =
3.88), the United Kingdom (NNETAR = 1.36), and the United States (ARIMA = 3.28).
Our findings show that careful model selection and the Box-Cox transformation
significantly improve incidence case predictions, even with sparse surveillance
data. Our research supports early detection and may contribute to a better
understanding of MPXV case predictions using combinations of linear and nonlinear
models.
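The preprocessing and evaluation described above can be sketched in a few lines of Python. The Box-Cox transform maps a positive value y to (y^λ − 1)/λ, or ln y when λ = 0, and forecasts made on the transformed scale are typically mapped back with the inverse before RMSE, MAE, and MAPE are computed. This is a minimal, hypothetical sketch: the helper names, the fixed λ = 0.5, and the counts are assumptions, not the thesis's code or data. In practice λ is estimated from the data, for example by maximum likelihood as `scipy.stats.boxcox` does when no λ is supplied.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inv_box_cox(values, lam):
    """Map transformed values (e.g. forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1) ** (1 / lam) for v in values]


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    # Expressed in percent; undefined if any actual value is zero.
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical cumulative case counts and forecasts (not thesis data).
cumulative = [100.0, 140.0, 200.0, 290.0, 400.0]
lam = 0.5  # assumed value; normally estimated from the series
transformed = box_cox(cumulative, lam)
restored = inv_box_cox(transformed, lam)  # round trip recovers the counts
forecast = [110.0, 130.0, 210.0, 280.0, 420.0]
print(rmse(cumulative, forecast), mae(cumulative, forecast), mape(cumulative, forecast))
```

Because MAE weights all errors equally while RMSE squares them, RMSE is always at least as large as MAE for the same forecasts; MAPE additionally rescales each error by the size of the observed count.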
The third model is a heterogeneous ensemble technique for classifying manual
load reduction based on contributing features of electricity generation sources
and demand. Classical machine learning techniques, namely random forests (RF),
sparse partial least squares (SPLS), and averaged neural networks (AVNNET), were
used as benchmarks. Three ensemble approaches were explored: the average
ensemble, the majority-voting ensemble, and the weighted average ensemble. The
weighted average ensemble outperformed the averaging and majority-voting
ensembles in precision (65.40%), F1 score (78.52%), balanced accuracy (97.15%),
Kappa (76.53%), and the confusion matrix. Recall was the lone exception: majority
voting (98.81%) performed better than the weighted average (98.24%). Correct
classifications of Eskom's no-load-reduction (normal) and load-reduction
(anomaly) classes made up 89.3% and 6.9% of instances, respectively. Type I and
type II errors (misclassifications) accounted for 3.7% and 0.1% of instances,
respectively.
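The three ensemble strategies named above differ only in how the base learners' outputs are combined: the average ensemble takes the unweighted mean of predicted probabilities, majority voting takes the most common hard label, and the weighted average ensemble gives some learners more influence. The Python sketch below illustrates these standard combination rules; the function names, probabilities, and weights are hypothetical, and the thesis does not state here how its weights were chosen (a common choice is validation performance).

```python
from collections import Counter


def average_ensemble(probas):
    """Unweighted mean of each model's P(load reduction) per sample."""
    return [sum(sample) / len(sample) for sample in zip(*probas)]


def weighted_average_ensemble(probas, weights):
    """Weighted mean of per-model probabilities; weights need not sum to 1."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, sample)) / total for sample in zip(*probas)]


def majority_vote(labels):
    """Most common hard label per sample (an odd number of models avoids ties)."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*labels)]


def to_labels(scores, threshold=0.5):
    """Convert probabilities to 1 (load reduction) or 0 (normal)."""
    return [int(s >= threshold) for s in scores]


# Hypothetical probabilities from three base learners for four samples.
rf = [0.9, 0.2, 0.6, 0.4]
spls = [0.7, 0.4, 0.3, 0.6]
avnnet = [0.8, 0.1, 0.4, 0.7]
models = [rf, spls, avnnet]
weights = [0.6, 0.1, 0.3]  # hypothetical weights

avg_labels = to_labels(average_ensemble(models))
vote_labels = majority_vote([to_labels(m) for m in models])
weighted_labels = to_labels(weighted_average_ensemble(models, weights))
print(avg_labels, vote_labels, weighted_labels)  # prints [1, 0, 0, 1] [1, 0, 0, 1] [1, 0, 1, 1]
```

In this toy example the heavier weight on the first learner flips the third sample to the load-reduction class, which is the kind of behaviour that lets a weighted ensemble trade recall against precision differently from a plain vote.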
The fourth model addresses load shedding, which is vital for managing electric
power shortages and avoiding grid collapse. However, load shedding poses an
imminent threat to the overall stability of the power grid system (PGS) and its
ability to run safely and reliably, and load shedding strategies can be complex
and difficult to manage efficiently. The study proposes a data-driven load
shedding time series classification (TSC) technique that employs a hybrid
ensemble super learner (eSL) to categorise load shedding based on contributing
features. The model addresses a binary classification problem using a
multidimensional time series of South Africa's hourly load shedding stages (in
MW) collected from PGS data. Because load shedding is planned and predicted from
these contributing features, we use them as strong indicators to classify the
expected outcome as load shedding or no load shedding. The proposed technique was
validated using the precision-recall curve, the confusion matrix, class
likelihood ratios, the Brier skill score, and the critical difference factor
(CDF). Logistic regression produced the highest average CDF score, while the
support vector classifier (SVC) had the highest balanced precision (90.694%). The
recursive feature elimination (RFE) model produced the highest proportions of
true negatives and true positives (50.59% and 40.84% of instances, respectively)
and hence the highest proportion of correct classifications.
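Several of the validation measures listed above can be derived from binary labels and predicted probabilities alone. The dependency-free Python sketch below computes confusion-matrix counts, precision, recall, balanced accuracy, Cohen's kappa, the positive and negative class likelihood ratios, and the Brier skill score. All helper names and data are hypothetical, and the Brier skill score assumes a base-rate (climatological) reference forecast because the abstract does not name the reference. scikit-learn offers equivalents such as `sklearn.metrics.brier_score_loss` and `sklearn.metrics.class_likelihood_ratios`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = load shedding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def ratio(num, den):
    # Returns NaN instead of raising when a metric is undefined.
    return num / den if den else float("nan")


def classification_summary(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    sensitivity = ratio(tp, tp + fn)  # recall / true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    observed = (tp + tn) / n  # observed agreement
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2  # chance agreement
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "precision": ratio(tp, tp + fp),
        "recall": sensitivity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": ratio(observed - expected, 1 - expected),  # Cohen's kappa
        "lr_positive": ratio(sensitivity, 1 - specificity),  # LR+
        "lr_negative": ratio(1 - sensitivity, specificity),  # LR-
    }


def brier_score(y_true, prob):
    return sum((p - y) ** 2 for y, p in zip(y_true, prob)) / len(y_true)


def brier_skill_score(y_true, prob):
    """BSS = 1 - BS / BS_ref, using the base-rate forecast as the assumed reference."""
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    if reference == 0:
        raise ValueError("BSS is undefined when only one class is present")
    return 1 - brier_score(y_true, prob) / reference


# Hypothetical hourly labels (1 = load shedding) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
prob = [0.9, 0.2, 0.7, 0.6, 0.6, 0.1, 0.8, 0.3]
y_pred = [int(p >= 0.5) for p in prob]
print(classification_summary(y_true, y_pred))
print(round(brier_skill_score(y_true, prob), 3))
```

A Brier skill score above 0 means the probabilistic forecast beats the reference; here the single false positive lowers specificity and therefore raises the denominator of LR+.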
Keywords: Time Series Classification, Time Series Forecasting, Boosting,
Filters, Epidemiological, Electric Load Interruption