Abstract
Efficient language analysis techniques and models are crucial in the artificial
intelligence age for enhancing cross-lingual question answering. Transfer learning with
state-of-the-art models has been beneficial in this regard, but performance on low-resource
African languages with morphologically rich grammatical structures and unique
typologies has shown deficiencies attributable to evaluation techniques and scarce training
data. To address the former, this paper proposes an evaluation pipeline leveraging the
semantic answer similarity method enhanced with automatic answer annotation. The
pipeline uses the Language-agnostic BERT Sentence Embedding model integrated with
an adapted vector measure to perform cross-lingual text analysis after answer prediction.
Experimental results from the multilingual-T5 and AfroXLMR models on nine languages
of the AfriQA dataset surpassed existing benchmarks that use string-based methods for
question-answer evaluation. The results also exceed the F1-score-based performance of GPT-4 and
Llama-2 on the same downstream task. The automatic answer annotation
technique reduced labelling time while maintaining high performance.
Thus, the proposed pipeline is more efficient than the prevailing string-based F1 and Exact
Match metrics in mixed-answer-type question–answer evaluations, and it is a more natural
performance estimator for models targeting real-world deployment.
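
The contrast between string-based metrics and semantic answer similarity can be illustrated with a minimal sketch. It is not the paper's implementation: cosine similarity is assumed here as the vector measure, the 0.8 acceptance threshold is hypothetical, and the small `TOY_EMBEDDINGS` table is a hand-written stand-in for the vectors a sentence encoder such as the Language-agnostic BERT Sentence Embedding model would produce.

```python
import math
import re
from collections import Counter

def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()

def exact_match(pred, gold):
    """String-based Exact Match on normalized tokens."""
    return normalize(pred) == normalize(gold)

def token_f1(pred, gold):
    """String-based token-overlap F1, as commonly used in QA evaluation."""
    p, g = normalize(pred), normalize(gold)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 3-dimensional vectors standing in for real sentence embeddings.
TOY_EMBEDDINGS = {
    "Nairobi": [0.90, 0.10, 0.00],
    "la ville de Nairobi": [0.85, 0.20, 0.05],
    "Mombasa": [0.10, 0.90, 0.10],
}

def semantic_answer_similarity(pred, gold, embed=TOY_EMBEDDINGS.get, threshold=0.8):
    """Return (similarity, accepted) for a predicted and a gold answer.

    `embed` maps a string to a vector; `threshold` is an assumed cut-off.
    """
    score = cosine_similarity(embed(pred), embed(gold))
    return score, score >= threshold

gold = "Nairobi"
pred = "la ville de Nairobi"  # correct answer phrased in another language
em = exact_match(pred, gold)
f1 = token_f1(pred, gold)
sas, accepted = semantic_answer_similarity(pred, gold)
wrong_sas, wrong_accepted = semantic_answer_similarity("Mombasa", gold)
print(em, round(f1, 2), accepted, wrong_accepted)
```

In this toy case the correct but differently phrased answer fails Exact Match and receives a low token F1, while the embedding-based score accepts it and still rejects the wrong answer. With a real encoder the vectors and a suitable threshold would have to be determined empirically.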