dc.description.abstract |
The problem of plagiarism is growing with the widespread use of online documents, the Internet and e-learning systems. It has been identified as one of the most crucial issues to address in order to maintain the quality and effectiveness of the learning and teaching process, especially in the higher and university education sector. Tackling this problem requires free, efficient and reliable methods for identifying plagiarized versions of documents within the large document bases stored in Learning Management Systems (LMS). The main problem addressed in this thesis is detecting plagiarized versions of documents among the tutorials, assignments and other documents submitted by students in an LMS. As an alternative to traditional plagiarism detection approaches, a new framework called MAPDetect is introduced for detecting plagiarism in this kind of corpus, covering all the inherent tendencies of the plagiarizer. The core of the framework consists of several metrics that provide evidence of different types of plagiarism, such as verbatim copying, paraphrasing and collusion, structural changes to the content, and changes of formatting. Algorithms operating on the document representation calculate the word-level correlation among documents, which corresponds to surface-level document similarity analysis. Analyses of the deep structure of a document, namely its syntax and semantics, are used to detect paraphrasing and collusion. The formatting structure of a document, which provides a further area of evidence of plagiarism, is also given particular attention. Authorship verification, a technique from the field of intrinsic plagiarism detection, is also used in the proposed framework. A modular architecture is used to implement the plagiarism detection techniques together with preprocessing subsystems.
Real document sets submitted by university students were used to test the improved surface-level detection of the framework. The deep-level detection was tested with a manually created corpus. Exploratory experiments on the proposed algorithms of each module gave promising results. They demonstrate that integrating several metrics from different areas provides significant evidence for discriminating plagiarized documents more accurately. The user thereby obtains more evidence to support the identification of plagiarized segments of the documents. |
en_US |