Abstract:
Open-ended questions are an essential part of survey questionnaires. They give
researchers an opportunity to discover unanticipated information about the domain of
study. However, they are difficult to process because they are unstructured: no possible
answers are suggested, and the respondent is free to answer in his or her own
words. This thesis presents novel methods of categorizing such open-ended survey responses.
A document clustering technique is employed to categorize the responses, and both supervised
and unsupervised methods of categorization are tested.
Initially, the author proposed a hierarchical clustering-based algorithm as the unsupervised
method for coding open-ended responses that were not labelled at all. The algorithm employs
several natural language processing techniques to extract a classification of the responses
automatically. Naive Bayes classification was proposed as the supervised solution, for
open-ended responses that were partially labelled.
Two experiments were carried out to determine the accuracy of the proposed algorithms, which
proved to be promising. The hierarchical clustering-based algorithm showed more than 70% accuracy
when compared with the manually coded responses. The proposed Naive Bayes algorithm did
not produce the expected results. Therefore, a Positive Naive Bayes algorithm was introduced,
and it achieved an overall performance of 80%.