Plagiarism Detection Using an N-Gram Model
Abstract
The vast increase in documents available on the World Wide Web (WWW) and the ease of
access to these documents have led to a serious problem: using others' work without giving credit. Although many methods have been developed to detect some forms of plagiarism, such as changes to sentence structure or the replacement of words with synonyms, it is often hard to reveal plagiarism when the copied
sentences are deliberately modified. This project proposes a syntactic plagiarism
detection algorithm based on 1-gram and 2-gram extraction. The Jaccard
similarity coefficient is applied to measure the similarity between documents in an English
corpus from the engineering field, and the method is implemented in the C programming language. Judged by
precision, recall, and F-measure, 2-gram extraction shows greater potential for plagiarism detection than 1-gram
extraction, achieving 0.983 for precision, 0.380 for recall, and 0.548 for F-measure.
Combining the Jaccard similarity coefficient with the N-gram method is therefore suitable
for measuring word similarity. In terms of efficiency, the program calculates word similarity
with stable performance.
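
The approach summarized above can be illustrated with a minimal C sketch. It is not the project's actual program: the function names (jaccard_ngram, f_measure), the fixed buffer limits, and the choice of lowercase word-level N-grams are all assumptions made for illustration. It computes the Jaccard coefficient |A ∩ B| / |A ∪ B| over the sets of distinct N-grams of two texts, and the F-measure as the harmonic mean of precision and recall.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Illustrative limits (assumptions, not taken from the project). */
#define MAX_WORDS    256
#define MAX_WORD_LEN 32
#define MAX_GRAM_LEN 128

typedef struct {
    char items[MAX_WORDS][MAX_GRAM_LEN];
    int count;
} GramSet;

/* Split text into lowercase alphanumeric words; returns the word count.
   Words beyond MAX_WORDS are dropped and long words are truncated. */
static int tokenize(const char *text, char words[][MAX_WORD_LEN])
{
    int count = 0, len = 0;
    for (const char *p = text; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c)) {
            if (count < MAX_WORDS && len < MAX_WORD_LEN - 1)
                words[count][len++] = (char)tolower(c);
        } else {
            if (len > 0) {            /* a word just ended */
                words[count][len] = '\0';
                count++;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}

/* Linear search: returns 1 if gram is already in the set. */
static int set_contains(const GramSet *s, const char *gram)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->items[i], gram) == 0)
            return 1;
    return 0;
}

/* Build the set of distinct word n-grams of a text. */
static void build_gram_set(const char *text, int n, GramSet *set)
{
    char words[MAX_WORDS][MAX_WORD_LEN];
    int word_count = tokenize(text, words);
    set->count = 0;
    for (int i = 0; i + n <= word_count; i++) {
        char gram[MAX_GRAM_LEN] = "";
        for (int j = 0; j < n; j++) {
            if (j > 0)
                strncat(gram, " ", sizeof gram - strlen(gram) - 1);
            strncat(gram, words[i + j], sizeof gram - strlen(gram) - 1);
        }
        if (!set_contains(set, gram))
            strcpy(set->items[set->count++], gram);
    }
}

/* Jaccard coefficient of the n-gram sets of two documents.
   Uses static buffers, so it is not reentrant. */
double jaccard_ngram(const char *doc_a, const char *doc_b, int n)
{
    static GramSet a, b;
    assert(n >= 1);
    build_gram_set(doc_a, n, &a);
    build_gram_set(doc_b, n, &b);
    int common = 0;
    for (int i = 0; i < a.count; i++)
        if (set_contains(&b, a.items[i]))
            common++;
    int total = a.count + b.count - common;   /* size of the union */
    return total == 0 ? 0.0 : (double)common / total;
}

/* F-measure: harmonic mean of precision and recall. */
double f_measure(double precision, double recall)
{
    double sum = precision + recall;
    return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
}
```

For example, with the texts "the quick brown fox" and "the quick red fox", the 1-gram sets share 3 of 5 distinct words, so jaccard_ngram returns 0.6. The 2-gram sets share only "the quick" out of 5 distinct pairs, so it returns 0.2. This shows why 2-grams are stricter about word order. Calling f_measure(0.983, 0.380) gives approximately 0.548, which agrees with the F-measure reported above.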