Abstract
This study addresses the challenges of water quality (WQ) monitoring, focusing on the detection and quantification of nitrogen (N) and phosphorus (P). Traditional assessment relies primarily on manual sample collection and laboratory analysis, which is resource-intensive and error-prone, particularly in developing regions such as South Africa, where continuous monitoring is lacking. The degradation of water quality caused by industrial discharge, agricultural runoff, and rapid urbanization calls for alternative monitoring solutions.
This research applies Machine Learning (ML) techniques, with an emphasis on feature selection methods that improve model performance. The findings show that hydrological and physical parameters (flow, conductivity, and temperature) matter more than biochemical parameters, such as total chlorophyll, for accurately predicting nutrient levels. Tree-based ensemble models, particularly Random Forest and Extremely Randomized Trees, achieved the highest predictive accuracy among the models evaluated.
A comparison of feature selection methods showed that a minimal set of three features (conductivity, flow, and temperature) can achieve over 90% accuracy in predicting total reactive phosphorus (TRP) and nitrate levels. The study also highlights the potential for substantial cost savings in WQ monitoring by focusing on these critical predictors and measuring them with affordable sensors.
This research advances the scientific understanding of how to optimize ML models for WQ monitoring and presents practical, cost-effective solutions with significant implications for resource allocation and sustainable environmental management. The feature selection approach improves monitoring outcomes, aligns with national priorities for clean and safe water, and offers a framework for technology-driven environmental conservation efforts globally.
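
The three-predictor setup summarized above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data: the value ranges, the formula generating the nitrate target, and the hyperparameters are all invented for illustration and are not the study's dataset, configuration, or results. It fits a Random Forest and an Extremely Randomized Trees regressor on conductivity, flow, and temperature, then reports a held-out R^2 score and the per-feature importances.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the three predictors (NOT the study's data).
conductivity = rng.uniform(100.0, 900.0, n)  # uS/cm, assumed range
flow = rng.uniform(0.1, 50.0, n)             # m^3/s, assumed range
temperature = rng.uniform(5.0, 30.0, n)      # deg C, assumed range
feature_names = ["conductivity", "flow", "temperature"]
X = np.column_stack([conductivity, flow, temperature])

# Hypothetical nitrate target: an invented relationship plus Gaussian noise,
# used only so the models have something to learn.
nitrate = (
    0.004 * conductivity
    + 2.0 / (1.0 + flow)
    + 0.02 * temperature
    + rng.normal(0.0, 0.05, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, nitrate, test_size=0.25, random_state=0
)

# The two tree ensembles named in the abstract; hyperparameters are arbitrary.
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
}

scores = {}
importances = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Held-out R^2 as one possible performance measure.
    scores[name] = r2_score(y_test, model.predict(X_test))
    # Impurity-based importances, one value per input feature.
    importances[name] = dict(zip(feature_names, model.feature_importances_))
    print(name, round(scores[name], 3), importances[name])
```

On real monitoring data, the same loop could be rerun with different candidate feature subsets to compare how much predictive performance each subset retains. The importances printed here reflect only the invented target formula, not any finding about real water bodies.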