Abstract
Stylometry uses computational techniques to quantify the style of a given author and is often adopted to resolve questions of authorship. Authorship attribution leverages stylometry to assign an author to a document of unknown authorship by comparing it with the writing patterns of various authors within a corpus. Authorship attribution has several applications, such as plagiarism detection and cybercrime investigation. There are many advanced techniques for authorship attribution, such as Burrows' Delta, which computes the distance between two documents by comparing the frequencies of the most common words. There is a plethora of research on Burrows' Delta across several languages. However, this research lies outside the South African context, a country with 11 official languages, including Afrikaans, a language with roots in Dutch that is spoken by 12% of the South African population and has a rich literary history. This article explores the effectiveness of Burrows' Delta on a corpus of Afrikaans novels. It addresses three questions. The first is whether Burrows' Delta is effective on an Afrikaans corpus. The second is the optimal number of most frequent words to examine. The third is the impact of stop words on Burrows' Delta on an Afrikaans corpus. Burrows' Delta was implemented in Python to address these three questions. The results showed that Burrows' Delta can be used on an Afrikaans corpus and that the optimal number of most frequent words was between 50 and 100. Lastly, the study showed that removing pronouns and stop words degraded the Burrows' Delta scores, although the lowest score was still allocated to the correct author. Ultimately, the study showed that Burrows' Delta can be used for authorship attribution on an Afrikaans corpus.
Keywords: Burrows' Delta, Authorship Attribution, Stylometry, Afrikaans
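
For readers unfamiliar with the measure, the following is a minimal Python sketch of the classic Burrows' Delta calculation summarised above. It is illustrative only and is not the implementation used in this study. The function names (relative_frequencies, burrows_delta), the pre-tokenised input format, the use of the population standard deviation, and the toy sentences are all assumptions made for the example. The number of most frequent words is exposed as the n_mfw parameter, and the candidate author with the lowest Delta is taken as the most likely author.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one tokenised text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def burrows_delta(known_texts, unknown_tokens, n_mfw=50):
    """Return a Delta score per candidate author (lower means more similar).

    known_texts: dict mapping author name -> list of tokens (hypothetical format).
    unknown_tokens: list of tokens from the disputed text.
    n_mfw: number of most frequent words taken from the known corpus.
    """
    # 1. Select the n_mfw most frequent words across the known corpus.
    corpus_counts = Counter()
    for tokens in known_texts.values():
        corpus_counts.update(tokens)
    vocabulary = [word for word, _ in corpus_counts.most_common(n_mfw)]

    # 2. Relative frequencies for each candidate author and the unknown text.
    freqs = {a: relative_frequencies(t, vocabulary) for a, t in known_texts.items()}
    unknown = relative_frequencies(unknown_tokens, vocabulary)

    # 3. Sum the absolute z-score differences word by word.
    totals = {author: 0.0 for author in known_texts}
    used = 0
    for i in range(len(vocabulary)):
        column = [freqs[author][i] for author in known_texts]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # the word does not vary between authors, so skip it
        used += 1
        z_unknown = (unknown[i] - mu) / sigma
        for author in known_texts:
            z_author = (freqs[author][i] - mu) / sigma
            totals[author] += abs(z_author - z_unknown)

    # 4. Delta is the mean absolute difference over the words actually used.
    if used == 0:
        return totals
    return {author: total / used for author, total in totals.items()}


# Toy usage with invented sentences (not data from the study).
known = {
    "author_a": "die kat sit op die mat en die hond slaap".split(),
    "author_b": "en toe en daarna en die kat en die hond".split(),
}
disputed = "die kat en die hond sit op die mat".split()
scores = burrows_delta(known, disputed, n_mfw=4)
print(min(scores, key=scores.get), scores)
```

In this sketch the stop-word question could be explored by filtering the token lists before calling burrows_delta, and the most-frequent-word question by varying n_mfw.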