Abstract
Despite increasing regulatory scrutiny, greenwashing, where companies falsely present themselves as environmentally friendly, remains an obstacle to authentic corporate environmental accountability. Conventional approaches to identifying greenwashing are constrained by the complexity and scale of disclosure data, leaving critical gaps in both practice and the academic literature. This thesis investigates the application of artificial intelligence (AI), specifically large language models (LLMs), to mitigating greenwashing in corporate sustainability reporting disclosures, where it persistently undermines transparency and accountability.
The research begins with a comprehensive literature review that examines sustainability, sustainability reporting, and greenwashing through the lenses of stakeholder and signalling theories. It highlights the rise of AI technologies, particularly natural language processing (NLP) and transformer-based LLMs, and their potential to enhance the accuracy and efficiency of identifying greenwashing and green claims in large volumes of text data.
A mixed-methods approach was adopted to address the interdisciplinary nature of this study, integrating concepts from sustainability reporting, AI, and corporate reporting. An interdisciplinary research framework was developed, leading to the creation of a conceptual model for applying LLMs to the detection of green claims within sustainability reporting disclosures.
The empirical component involved the development and evaluation of "EmissionsBert," a BERT-based LLM further pre-trained on subdomain-specific emissions data and fine-tuned for binary text classification. EmissionsBert outperformed existing models (ClimateBert and DistilRoBERTa) in identifying company emissions claims (green claims) across various evaluation metrics, confirming the efficacy of further pre-training and fine-tuning on subdomain-specific data in improving model accuracy, precision, and F1 scores.
Further, the practical application of EmissionsBert to real-world sustainability reporting disclosure data from JSE-listed companies demonstrated its effectiveness in classifying emissions-related green claims. The results underscored its potential utility in automating the detection of green claims, thereby contributing to the broader understanding of how AI can be leveraged to promote transparency and accountability in corporate sustainability reporting.
This thesis advances the discourse on the use of AI in sustainability contexts and lays the foundation for future research into developing subdomain-specific LLMs for other areas of sustainability reporting disclosure.
Keywords: Greenwashing, Sustainability reporting, Artificial Intelligence (AI), Large Language Models (LLMs)