Abstract
Improving water quality is essential for better public health, productivity, and economic prosperity. However, many important water bodies that supply water for domestic, agricultural, and industrial purposes show unacceptable levels of contamination. Addressing this issue requires innovative water management technologies coupled with adequate monitoring systems. These systems must be robust, low cost, and easy to maintain, and should operate in real time. Yet, in spite of recent technological improvements in monitoring sensors, high-frequency monitoring of certain water quality variables remains economically impractical. As a viable alternative, data-driven virtual sensors can provide reliable estimates of these variables by using those that are commonly measured in situ as surrogates. Despite extensive research on the feasibility of virtual sensing for water quality monitoring, a review of current studies shows that shortcomings remain. For instance, (i) data quality and quantity issues are not handled appropriately, (ii) virtual sensors are developed inconsistently, and (iii) predictive performance is not state-of-the-art. Therefore, while we advocate for the broader adoption of virtual sensing for water quality monitoring, these drawbacks may impede its uptake for operational purposes. This thesis addressed these limitations by (i) formalizing the virtual sensing concept through the development of specification books, (ii) developing qualitative cost models for water quality monitoring systems, (iii) assessing the impact of various data scaling and missing value imputation methods, and (iv) assessing the efficacy of data augmentation and hyperparameter optimization in improving the prediction performance for nutrient concentrations.
To test and validate our predictive models, we used publicly available water quality data from two catchments with contrasting land uses. The predictive performance, in terms of the coefficient of determination, ranged between 86% (in the urban catchment) and 97% (in the rural catchment). Importantly, monitoring in the urban catchment requires about six surrogate sensors to achieve the 86%, whereas the rural catchment requires only three to achieve 95%. Notably, although the 86% obtained in the urban catchment is superior to the current benchmark, the monitoring cost will remain relatively high because the selected surrogate sensors must stay in service during the practical operation of the virtual sensor.