Automatic Text Summarization for Sinhala

dc.contributor.author Welgama, W.V.
dc.date.accessioned 2013-07-08T03:52:04Z
dc.date.available 2013-07-08T03:52:04Z
dc.date.issued 2012
dc.identifier.citation A Thesis submitted for the Degree of Master of Philosophy en_US
dc.identifier.uri http://archive.cmb.ac.lk:8080/xmlui/handle/70130/4031
dc.description.abstract With the rapid development of information and communication technology, people are surrounded by vast amounts of information, albeit with less and less time or ability to make sense of it. The field of automatic summarization, which has been in existence since the 1950s, is anticipated to find solutions to this issue. With the adoption of Unicode technology in 2004, the Sinhala language began to appear rapidly on computers, and Sinhala language users also began to experience the above issue. This research on Automatic Text Summarization in Sinhala was carried out to find possible approaches to address the above issue with minimal linguistic resources. The field of automatic text summarization began with some classical approaches which attempted to identify the most salient information of an article using some thematic features. This research was intended to identify such features for the Sinhala language, along with the most suitable approach to define each of these features for achieving accurate summaries. In order to benefit from all these features, this research proposes the best possible linear combination of the identified features. The proposed method was evaluated by comparing the machine-generated and human-extracted summaries, based on the primary assumption that the human summaries are perfect. Results show that the sentence location feature is the best individual feature for extracting the most informative sentences from Sinhala articles, while the linear combination of the keyword feature, the title words feature and the sentence location feature gives the best performance for a summarizer. Results also revealed some equations that define the flow of information over a Sinhala article, which can be used in many such applications. Further, this research provides a benchmark for future research on Sinhala automatic text summarization.
dc.language.iso en en_US
dc.title Automatic Text Summarization for Sinhala en_US
dc.type Thesis full-text en_US
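
The abstract describes scoring sentences by a linear combination of a keyword feature, a title words feature and a sentence location feature. The following is a minimal sketch of that general idea only. The feature definitions, the equal weights, the naive full-stop sentence splitting and the whitespace tokenization are all assumptions made for illustration; this record does not give the thesis's actual equations or weights, and names such as `summarize` and `location_score` are hypothetical.

```python
from collections import Counter


def split_sentences(text):
    # Assumption: sentences end with a full stop; real text needs a proper splitter.
    return [s.strip() for s in text.split(".") if s.strip()]


def keyword_score(tokens, norm_freq):
    # Mean normalized document frequency of the sentence's tokens.
    # Assumption: no stop-word removal or stemming is applied.
    if not tokens:
        return 0.0
    return sum(norm_freq[t] for t in tokens) / len(tokens)


def title_score(tokens, title_tokens):
    # Fraction of title words that also occur in the sentence.
    if not title_tokens:
        return 0.0
    return len(set(tokens) & title_tokens) / len(title_tokens)


def location_score(index, total):
    # Assumption: importance decays linearly with position in the article.
    return 1.0 - index / total


def summarize(title, text, n=2, weights=(1.0, 1.0, 1.0)):
    """Return the n highest-scoring sentences, kept in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(t for toks in tokenized for t in toks)
    max_freq = max(freq.values())
    norm_freq = {t: c / max_freq for t, c in freq.items()}
    title_tokens = set(title.lower().split())
    w_kw, w_title, w_loc = weights  # hypothetical weights, not from the thesis

    scored = []
    for i, toks in enumerate(tokenized):
        score = (w_kw * keyword_score(toks, norm_freq)
                 + w_title * title_score(toks, title_tokens)
                 + w_loc * location_score(i, len(sentences)))
        scored.append((score, i))

    # Highest score first; ties broken by earlier position.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Setting a weight to zero switches the corresponding feature off, which is one way to compare individual features against the combination, as the abstract reports doing.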

