Abstract
Monitoring water quality is crucial for the sustainable management of freshwater resources and for meeting the United Nations Sustainable Development Goals. Urbanisation, agricultural practices, industrial activities, and population growth alter the biological, chemical and physical properties of water bodies. Traditional monitoring methods (in-situ measurements and laboratory tests), while accurate, are costly and limited in spatial and temporal coverage. Satellite-based remote sensing offers a cost-effective, repeatable, and near-real-time alternative, especially when integrated with in-situ data and machine learning models. This study aimed to determine the sensitivity of Sentinel-2 Multispectral Instrument (MSI) bands to both optically and non-optically active water quality parameters (dissolved oxygen [DO], pH, temperature, electrical conductivity [EC], chlorophyll-a, and suspended solids) in the Cradle of Humankind World Heritage Site (COHWHS), South Africa. Field data were collected from 40 sampling points during high- and low-flow conditions. Corresponding Sentinel-2 Level-1C images were atmospherically corrected and resampled to 10 m resolution, and band values were extracted at each sampling point. Multiple Linear Regression (MLR), Random Forest (RF), and Partial Dependence Plots (PDPs) were used to assess the sensitivity of Sentinel-2 spectral bands (B1–B8, B8A, B11 and B12) to water quality parameters. Two RF models (Model 1 [spectral bands only] and Model 2 [spectral bands + indices]) were used to characterise the spatio-temporal patterns of optically and non-optically active water quality parameters within the COHWHS. The sensitivity analysis showed that, among the non-optically active parameters, DO had the strongest relationship with the Sentinel-2 MSI bands (except the coastal band [B1]), with a strong negative correlation under both flow conditions. 
Suspended solids were the dominant optically active parameter, showing strong positive correlations with the visible (B2, B4), near-infrared (NIR; B8), and shortwave infrared (SWIR; B11, B12) bands under both flow conditions. Cradlemoon Lake, which remained clearly distinguishable in Sentinel-2 imagery, was used as a case study to characterise and map seasonal and spatial variability. DO had the strongest predictive performance under low-flow conditions (Model 2: R² = 0.88, RMSE = 1.37), followed by EC (Model 1: R² = 0.63, RMSE = 291.48). Among the optically active parameters, suspended solids were predicted most accurately under high-flow conditions (Model 2: R² = 0.55, RMSE = 118.19). Variations in the results were influenced by runoff dynamics and upstream pollution: lower temperatures and suspended solid concentrations under low-flow conditions increased DO concentrations, whereas higher suspended solid concentrations under high-flow conditions likely reduced light penetration, resulting in lower spectral reflectance and chlorophyll-a levels. This study demonstrates the capability of Sentinel-2 MSI data, in combination with machine learning algorithms, to monitor both optically and non-optically active water quality parameters in dynamic freshwater systems. The findings contribute to the growing body of knowledge on remote sensing for water quality and support the development of scalable, data-driven monitoring solutions for inland water bodies in the Global South.