Please use this identifier to cite or link to this item: http://archive.cmb.ac.lk:8080/xmlui/handle/70130/4031
Title: Automatic Text Summarization for Sinhala
Authors: Welgama, W.V.
Issue Date: 2012
Citation: A Thesis submitted for the Degree of Master of Philosophy
Abstract: With the rapid development of information and communication technology, people are surrounded by vast amounts of information, albeit with less and less time or ability to make sense of it. The field of automatic summarization, which has been in existence since the 1950s, is anticipated to find solutions to this issue. With the adoption of Unicode technology in 2004, the Sinhala language began to appear on computers rapidly, and Sinhala language users also began to experience the above issue. This research on Automatic Text Summarization in Sinhala was carried out to find possible approaches to address this issue with minimal linguistic resources. The field of automatic text summarization began with some classical approaches which attempted to identify the most salient information in an article using thematic features. This research was intended to identify such features for the Sinhala language, together with the most suitable approach to defining each of these features, in order to achieve accurate summaries. To benefit from all these features, this research proposes the best possible linear combination of the identified features. The proposed method was evaluated by comparing the machine-generated and human-extracted summaries, based on the primary assumption that the human summaries are perfect. Results show that the sentence location feature is the best individual feature for extracting the most informative sentences from Sinhala articles, while the linear combination of the keyword feature, the title words feature and the sentence location feature gives the best performance for a summarizer. The results also yielded some equations that define the flow of information over a Sinhala article, which can be used in many such applications. Further, this research provides a benchmark for future research on Sinhala automatic text summarization.
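The feature combination described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the feature definitions, the equal default weights, the `top_k` cutoff and all function names below are assumptions made for illustration, and the location score is a simple linear decay rather than the equations derived in the thesis. Tokens are split on whitespace because Python's `\w` does not match Sinhala combining vowel signs.

```python
import re
from collections import Counter

PUNCT = ".,;:!?\"'()[]"

def tokenize(text):
    # Whitespace tokenization keeps Sinhala words intact (assumption:
    # words are space-separated and only edge punctuation is stripped).
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def split_sentences(text):
    # Naive splitter on ., ! and ? followed by whitespace (assumption).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]

def score_sentences(title, text, weights=(1.0, 1.0, 1.0), top_k=5):
    """Score each sentence as a weighted sum of three features."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    # Keyword feature: the top_k most frequent words in the article
    # (hypothetical definition; no stop-word removal is done here).
    counts = Counter(t for toks in tokenized for t in toks)
    keywords = {w for w, _ in counts.most_common(top_k)}
    title_words = set(tokenize(title))
    n = len(sentences)
    w_kw, w_title, w_loc = weights
    scores = []
    for i, toks in enumerate(tokenized):
        uniq = set(toks)
        kw = len(uniq & keywords) / len(keywords) if keywords else 0.0
        tw = len(uniq & title_words) / len(title_words) if title_words else 0.0
        loc = 1.0 - i / n  # earlier sentences score higher (assumption)
        scores.append(w_kw * kw + w_title * tw + w_loc * loc)
    return sentences, scores

def summarize(title, text, n_sentences=2, weights=(1.0, 1.0, 1.0)):
    """Return the n_sentences highest-scoring sentences in article order."""
    sentences, scores = score_sentences(title, text, weights)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Setting a weight to zero switches the corresponding feature off, so `weights=(0, 0, 1)` ranks sentences by location alone, which is one way to compare individual features against their combination.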
URI: http://archive.cmb.ac.lk:8080/xmlui/handle/70130/4031
Appears in Collections: MPhil/PhD theses

Files in This Item:
File: MPhil2013-WV Welgama.pdf
Size: 968.8 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.