Abstract
Sentiment analysis for low-resource African languages such
as Sepedi, Sesotho, and Setswana (Sotho-Tswana family) and isiZulu and
isiXhosa (Nguni family) remains underexplored due to data scarcity and
linguistic complexities. This study investigates the application of generative
AI models—including large multilingual language models (e.g.,
GPT-4, BLOOM) and Afrocentric language models (AfroLM and AfroXLMR)—
to sentiment classification in these languages. The study leverages
a multilingual Twitter corpus covering the five African languages and
fine-tunes models using cross-lingual and language-specific strategies.
The results show that Afrocentric transformer-based models achieve the
best performance for sentiment detection in these languages (e.g.,
AfroXLMR reaching over 75% F1 score on average), outperforming
general-purpose multilingual language models.
GPT-4 demonstrates reasonable zero-shot
accuracy but still lags behind fine-tuned domain-specific
models. For instance, the fine-tuned AfroXLMR model achieved over
75% F1 score (≈80% accuracy) on isiZulu and isiXhosa, whereas
GPT-4 reached only about 52-55% accuracy in zero-shot mode. The study
also examines how morphological richness (complex word structures)
and code-switching (language mixing, especially with English) in the
Nguni and Sotho-Tswana language families pose significant challenges
for sentiment classification.
These results suggest that generative AI holds promise
for sentiment analysis in under-resourced African languages. Finally,
this work emphasises the value of creating language-specific models
and proposes future research directions for natural language processing
(NLP) in African languages.