Abstract
Complex non-linear prediction systems have become ubiquitous in numerous decision-making and other socio-technical systems. In recent years, the adoption of these systems has been dominated by universal approximators such as neural networks and Gaussian Processes. Their applications span a large number of critical domains, including transportation, drug design, law enforcement, financial services, energy planning, and pandemic forecasting. The critical nature of these application domains necessitates the study of inference methods for training or calibrating these systems’ parameters. Furthermore, inference methods coupled with estimators of the uncertainty around the system’s predictions and measures of the relative influence of its inputs aid in managing the very high societal risks associated with incorrect predictions. This thesis investigates probabilistic parameter inference methods that provide both the required uncertainty and relevance measures. We first introduce Metropolis-Hastings (MH) and Hybrid Monte Carlo (HMC) methods for parameter inference in Bayesian Neural Networks (BNNs) with applications in credit risk modelling and South African wind energy resource planning. We further utilise a Separable Shadow Hamiltonian Hybrid Monte Carlo (S2HMC) method for the first time in the inference of BNN parameters. S2HMC addresses traditional MCMC methods’ discretisation constraints by using a perturbed Hamiltonian, which is conserved to a higher order by the numerical integration scheme. Experimental results on wind energy and credit datasets show that S2HMC yields higher effective sample sizes than HMC. The predictive performance of S2HMC-based and HMC-based BNNs is found to be similar.
We then perform hierarchical inference for BNN parameters by combining the S2HMC sampler with Gibbs sampling of hyperparameters for Automatic Relevance Determination (ARD). A generalisable ARD committee framework is introduced to synthesise various samplers’ ARD outputs into robust feature selections. Experimental results show that this ARD committee approach selects features of high predictive information value. Further, the results show that dimensionality reduction performed through this approach improves the sampling performance of samplers that suffer from random walk behaviour, such as Metropolis-Hastings (MH). The thesis also addresses predictive distribution calibration pathologies of existing product of Gaussian Process experts models. We introduce a solution to the predictive dominance of uninformed experts through expert combination via the Wasserstein barycenter and sparsity control through tempered softmax weightings. These proposals are empirically shown to outperform other product of experts (PoE) methods. The proposed PoE methods are also found to outperform BNNs on wind speed forecasting regression tasks. Finally, the thesis provides a Bayesian inference approach to change point determination in the spreading rates of the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in South Africa. This approach is the first probabilistically principled method in the literature for quantifying the relative efficacy of the various South African government interventions to slow the spread of SARS-CoV-2.
Ph.D. (Electrical and Electronic Engineering)