Automatic Text Summarization for Sinhala

Welgama, W.V.

dc.contributor.author	Welgama, W.V.
dc.date.accessioned	2013-07-08T03:52:04Z
dc.date.available	2013-07-08T03:52:04Z
dc.date.issued	2012
dc.identifier.citation	A Thesis submitted for the Degree of Master of Philosophy	en_US
dc.identifier.uri	http://archive.cmb.ac.lk:8080/xmlui/handle/70130/4031
dc.description.abstract	With the rapid development of information and communication technology, people are surrounded by vast amounts of information but have less and less time or ability to make sense of it. The field of automatic summarization, which has existed since the 1950s, is expected to offer solutions to this problem. With the adoption of Unicode in 2004, the Sinhala language began to appear on computers rapidly, and Sinhala language users began to experience the same problem. This research on automatic text summarization for Sinhala was carried out to find possible approaches to the problem that require minimal linguistic resources. The field of automatic text summarization began with classical approaches that attempted to identify the most salient information in an article using thematic features. This research set out to identify such features for the Sinhala language, together with the most suitable way to define each feature in order to achieve accurate summaries. To benefit from all of these features, the research proposes the best possible linear combination of the identified features. The proposed method was evaluated by comparing machine-generated summaries with human-extracted summaries, under the primary assumption that the human summaries are perfect. Results show that sentence location is the best individual feature for extracting the most informative sentences from Sinhala articles, while the linear combination of the keyword feature, the title words feature and the sentence location feature gives the best summarizer performance. The results also yielded equations that describe the flow of information across a Sinhala article, which can be used in other such applications. Further, this research provides a benchmark for future research on Sinhala automatic text summarization.
dc.language.iso	en	en_US
dc.title	Automatic Text Summarization for Sinhala	en_US
dc.type	Thesis full-text	en_US
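
The abstract describes scoring sentences by a linear combination of a keyword feature, a title words feature and a sentence location feature. The following is a minimal sketch of that general idea only, not the method from the thesis: the feature definitions (whitespace tokenization, normalized term frequency, title-word overlap, linear decay with position), the function names and the equal default weights are all assumptions made for illustration. The thesis's own feature definitions, equations and weights are not given in this record.

```python
from collections import Counter


def score_sentences(title, sentences, weights=(1.0, 1.0, 1.0)):
    """Score each sentence as a weighted sum of three features in [0, 1].

    All feature definitions here are illustrative assumptions.
    """
    w_kw, w_title, w_loc = weights
    # Whitespace tokenization (assumed); no stop-word removal or stemming.
    tokenized = [s.split() for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    max_freq = max(freq.values()) if freq else 1
    title_words = set(title.split())
    n = len(sentences)

    scores = []
    for i, toks in enumerate(tokenized):
        # Keyword feature: mean document frequency of the sentence's words,
        # normalized by the most frequent word in the document.
        kw = sum(freq[t] for t in toks) / (len(toks) * max_freq) if toks else 0.0
        # Title words feature: fraction of title words present in the sentence.
        tw = len(title_words & set(toks)) / len(title_words) if title_words else 0.0
        # Sentence location feature: assumed linear decay from first to last.
        loc = 1.0 - i / n
        scores.append(w_kw * kw + w_title * tw + w_loc * loc)
    return scores


def summarize(title, sentences, k=2, weights=(1.0, 1.0, 1.0)):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(title, sentences, weights)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Setting the weights to (0, 0, 1) reduces the sketch to a location-only baseline that returns the first k sentences. In an extractive setup like the one the abstract describes, the weights would be tuned by comparing the output against human-extracted summaries.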