dc.description.abstract |
With the rapid development of information and communication technology, people are
surrounded by vast amounts of information but have less and less time or ability to make
sense of it. The field of automatic summarization, which has existed since the
1950s, is expected to offer solutions to this problem. With the adoption of Unicode technology
in 2004, the Sinhala language began to appear on computers rapidly, and Sinhala language
users began to experience the same problem. This research on Automatic Text
Summarization in Sinhala was carried out to find possible approaches to addressing this
problem with minimal linguistic resources.
The field of automatic text summarization began with classical approaches that
attempted to identify the most salient information in an article using thematic features.
This research set out to identify such features for the Sinhala language, together with the most
suitable way of defining each feature, in order to produce accurate summaries. To
benefit from all of these features, this research proposes the best possible linear combination of
the identified features.
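
A linear combination of sentence-level features of this kind can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from this research: the function names, the frequency-based keyword definition, the linearly decaying location score, the equal default weights, and the input format (sentences already split into word tokens).

```python
from collections import Counter


def keyword_scores(sentences, top_k=5):
    """Fraction of each sentence's tokens that are among the top_k most frequent words."""
    counts = Counter(word for sent in sentences for word in sent)
    keywords = {word for word, _ in counts.most_common(top_k)}
    return [
        sum(word in keywords for word in sent) / len(sent) if sent else 0.0
        for sent in sentences
    ]


def title_scores(sentences, title):
    """Fraction of distinct title words that occur in each sentence."""
    title_set = set(title)
    if not title_set:
        return [0.0] * len(sentences)
    return [len(title_set & set(sent)) / len(title_set) for sent in sentences]


def location_scores(sentences):
    """Hypothetical location feature: earlier sentences score higher, decaying linearly."""
    n = len(sentences)
    return [(n - i) / n for i in range(n)]


def summarize(sentences, title, weights=(1.0, 1.0, 1.0), size=2):
    """Return the indices of the `size` best-scoring sentences, in document order."""
    w_key, w_title, w_loc = weights  # illustrative weights, not values from the research
    features = zip(
        keyword_scores(sentences),
        title_scores(sentences, title),
        location_scores(sentences),
    )
    scores = [w_key * k + w_title * t + w_loc * l for k, t, l in features]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:size])
```

Setting a weight to zero switches the corresponding feature off, so the same function can be used to compare single features against combinations.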
The proposed method was evaluated by comparing machine-generated summaries with human-extracted
summaries, on the primary assumption that the human summaries are perfect.
The results show that sentence location is the best individual feature for extracting
the most informative sentences from Sinhala articles, while the linear combination of the keyword
feature, the title words feature and the sentence location feature gives the best performance for a
summarizer. The results also yielded equations that describe the flow of information through a Sinhala
article, which can be used in many similar applications. Further, this research provides a
benchmark for future research on Sinhala automatic text summarization. |
|