Abstract
Sentiment analysis for low-resource African languages such
as Sepedi, Sesotho, and Setswana (Sotho-Tswana family) and isiZulu and
isiXhosa (Nguni family) remains underexplored due to data scarcity and
linguistic complexities. This study investigates the application of generative
AI models—including large multilingual language models (e.g.,
GPT-4, BLOOM) and Afrocentric language models (AfroLM and AfroXLMR)—
to sentiment classification in these languages. The study leverages
a multilingual Twitter corpus covering the five African languages and
fine-tunes models using cross-lingual and language-specific strategies.
The results show that Afrocentric transformer-based models achieve the
best performance for sentiment detection in these languages (e.g.,
AfroXLMR reaching over 75% F1 score on average), outperforming
general-purpose multilingual language models.
GPT-4 demonstrates reasonable zero-shot
accuracy but still lags behind fine-tuned domain-specific
models. For instance, the fine-tuned AfroXLMR model achieved over
75% F1 score (≈80% accuracy) on isiZulu and isiXhosa, whereas
GPT-4 reached only about 52-55% accuracy in zero-shot mode. The study
also examines how morphological richness (complex word structures)
and code-switching (language mixing, especially with English) in the
Nguni and Sotho-Tswana language families pose significant challenges
for sentiment classification.
These results suggest that generative AI holds promise
for sentiment analysis in under-resourced African languages. Finally,
this work emphasises the value of creating language-specific models
and proposes future research directions for natural language processing
(NLP) in African languages.