Abstract
Source code plagiarism is common among undergraduate computer
science students, and much research has been carried out on how it can
be detected, penalised, controlled or even stopped. In this paper, we propose a
new approach to the detection of possibly plagiarised programs written in C++
using deterministic finite automaton (DFA) abstractions. The two programs to
be checked for similarity are first normalised, granulated and abstracted to a
DFA structure referred to as Single Program Deterministic Finite Automata or
SPDFA. Then a newly proposed algorithm is used to map the alphabets of the
two SPDFAs. If there is a one-to-one mapping of the symbols in both alphabets,
we conclude that the programs are totally similar. We also present a
prototype software application called the Exact Code Matcher that implements
this technique as a proof of concept. This detection technique is a new application
of finite automata theory: it is efficient for cloned or lexically altered
programs, precise with no false positives, and portable across different platforms.
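
The one-to-one mapping test can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: it assumes, for illustration only, that each normalised program has already been reduced to a flat sequence of tokens standing in for the transition labels of its SPDFA. The function name one_to_one_mapping and the token lists are hypothetical.

```python
def one_to_one_mapping(tokens_a, tokens_b):
    """Return a symbol bijection between two token sequences, or None.

    Assumption: each sequence stands in for the transition labels of one
    SPDFA, read in order. The real SPDFA structure may be richer than this.
    """
    if len(tokens_a) != len(tokens_b):
        return None  # different lengths cannot be matched symbol by symbol

    forward = {}   # symbol in program A -> symbol in program B
    backward = {}  # symbol in program B -> symbol in program A
    for a, b in zip(tokens_a, tokens_b):
        # Record the pairing on first sight, otherwise check it is consistent
        # in both directions, which is what makes the mapping one-to-one.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


# A lexically altered clone: only the identifier has been renamed.
original = ["int", "x", "=", "x", "+", "1"]
renamed = ["int", "y", "=", "y", "+", "1"]
mapping = one_to_one_mapping(original, renamed)
```

Under these assumptions, a renamed identifier still yields a consistent mapping (here x is paired with y), whereas two distinct symbols collapsing onto the same symbol, or sequences of different lengths, yield None.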