Abstract
Air pollution is a global challenge that poses a major threat to environmental and population health. Epidemiological studies indicate that the health effects of exposure to air pollution include asthma, cardiorespiratory illnesses and lung cancer. Furthermore, 6.5 million people die each year owing to air pollution, with low-income communities and developing nations affected most. Nitrogen dioxide (NO2) is among the most abundant pollutants in the atmosphere. It results mainly from the combustion of fossil fuels (coal, oil and gas) and wood in activities such as power generation and biomass burning. Natural sources of NO2 include volcanoes, lightning strikes and biological decay. The north-eastern region of South Africa hosts the provinces of Mpumalanga, Gauteng, Limpopo and Free State, which are home to the most intensive industrial activities, such as mining, agriculture, transportation and electricity generation. These provinces emit large quantities of pollutants, accounting for 89% of PM10, 90% of NOx and 99% of SO2 emissions. Moreover, household use of wood, coal, paraffin and gas for cooking and heating has been identified as a major source of indoor air pollution. These fuels are used more widely in low-income households and rural areas of South Africa, owing to the proximity and relative affordability of sources such as wood and coal. Hence, socio-environmental variables must be explored to investigate the dynamics of NO2 pollution in South Africa and to mitigate the impacts of air pollution. The aim of the study was to predict tropospheric NO2 column density at the scale of South African local municipalities using socio-environmental variables and statistical analysis, and three objectives were set to achieve this aim. Objective 1 was to predict annual tropospheric NO2 column density across South Africa using socio-environmental variables and multiscale geographically weighted regression (MGWR).
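For reference, the MGWR model named in Objective 1 is commonly written as below. This is the standard textbook formulation, not necessarily the exact specification used in the study, and the notation is assumed for illustration: y_i is NO2 column density in municipality i, (u_i, v_i) are assumed centroid coordinates of that municipality, x_ij is the j-th socio-environmental variable (with x_i0 = 1 as the intercept), and bw_j is the bandwidth calibrated separately for each variable. Ordinary GWR is the special case in which all coefficients share one bandwidth and are estimated by local weighted least squares with a diagonal spatial weight matrix W_i. The coefficient of determination used to report model fit throughout is included as well.

```latex
% MGWR: each coefficient surface has its own bandwidth bw_j (illustrative notation)
\[
y_i = \sum_{j=0}^{k} \beta_{bw_j}(u_i, v_i)\, x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
\]

% Single-bandwidth GWR special case: local weighted least squares at location i
\[
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left(\mathbf{X}^{\top}\mathbf{W}_i\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{W}_i\,\mathbf{y}
\]

% Coefficient of determination between observed y_i and predicted \hat{y}_i
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]
```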
Objective 2 was to use MGWR to estimate the seasonal variation of NO2 in relation to socio-environmental variables across South African local municipalities. Finally, Objective 3 was to investigate the efficacy of random forest regression in forecasting NO2 column density using principal components (PCs) derived separately from social and environmental variables across local municipalities in South Africa. Addressing Objective 1, Chapter 2 used environmental variables (n = 3) and social variables (n = 32), covering energy use, demography, dwelling type and age distribution, as explanatory variables, with NO2 derived from Sentinel-5P as the dependent variable. Observed and predicted NO2 agreed closely, with a coefficient of determination (R²) of 0.92. NO2 hotspots were observed in the north-eastern region of the country, and aerosol optical depth (AOD) contributed most to the spatial distribution of NO2 column density. In the energy use category, wood and electricity usage had the greatest influence on NO2 levels, while residential clusters and apartments were associated with lower NO2 levels across municipalities. It was therefore recommended that additional environmental variables be included to improve the accuracy of NO2 estimates. Building on these findings, Objective 2 (i.e., Chapter 3) added seasonally averaged environmental variables (including fire count, precipitation, wind speed and relative humidity) to estimate the seasonal variation in the relationships between NO2 and socio-environmental variables. NO2 column density was highest during autumn and winter in the north-eastern region of South Africa. Observed and predicted seasonal NO2 again agreed closely, with overall R² values of 0.85, 0.92, 0.94 and 0.89 for summer, autumn, winter and spring, respectively. As in Objective 1, AOD showed the strongest correlation with NO2 in all seasons, followed by precipitation. Coal used for cooking and space heating had the greatest impact on NO2 levels during winter.
Objective 3 (i.e., Chapter 4) investigated the efficacy of random forest regression in forecasting NO2 column density using PCs derived from social and environmental variables in local municipalities of South Africa. The purpose of this component was to explore a non-linear machine learning method with proven performance on high-dimensional input datasets, such as those used in Objectives 1 and 2. Two random forest regression models were developed to identify the variables most important for estimating NO2: one with PCs as predictors (Model 1) and one with the original environmental variables as predictors (Model 2). Both models performed satisfactorily, achieving R² values of 0.69 and 0.82, respectively. In Model 1, PC2 for environmental variables, age and dwellings were the most important in estimating NO2. In Model 2, fires, AOD, precipitation and wind speed contributed the most to predicting NO2. Random forest estimates NO2 by aggregating the predictions of many decision trees built on the predictor variables; this made it useful for identifying the variables that best explain the distribution of NO2 while limiting overfitting. Overall, this study is significant for understanding the spatiotemporal dynamics of NO2 in relation to the socio-environmental characteristics of South Africa.
In turn, it can inform pollution reduction interventions tailored to each locality, allowing sufficient resources to be directed to the respective municipalities for air quality mitigation initiatives.
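Objective 3 derives principal components separately from the social and the environmental variables before using them as random forest predictors. The preprocessing step can be sketched in NumPy as below, assuming column standardisation and SVD-based PCA. The function names (`group_pcs`, `build_pc_predictors`, `r_squared`), the number of components kept and the synthetic data are hypothetical illustrations, not the study's code or data.

```python
import numpy as np


def group_pcs(X, n_components):
    """PCA on one variable group (e.g. only social or only environmental columns).

    Returns the first `n_components` PC scores and their explained-variance ratios.
    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    # Standardise each column so variables measured in large units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Thin SVD: rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def build_pc_predictors(social, environmental, k_social, k_env):
    """Derive PCs separately for each group, then stack them column-wise."""
    social_pcs, social_var = group_pcs(social, k_social)
    env_pcs, env_var = group_pcs(environmental, k_env)
    return np.hstack([social_pcs, env_pcs]), social_var, env_var


def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic stand-in data: 50 hypothetical municipalities,
    # 32 social and 6 environmental variables (sizes chosen for illustration only).
    rng = np.random.default_rng(0)
    social = rng.normal(size=(50, 32))
    environmental = rng.normal(size=(50, 6))
    X_pcs, social_var, env_var = build_pc_predictors(
        social, environmental, k_social=4, k_env=2
    )
    print(X_pcs.shape)
    print(social_var.round(3), env_var.round(3))
```

Deriving the PCs per group keeps each component interpretable as a "social" or "environmental" axis rather than a mixture of both. The stacked scores could then be passed to a random forest regressor, for example scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute offers one way to rank predictors such as the PCs in Model 1; that step is left out here to keep the sketch dependent on NumPy only.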