Selecting the most suitable classification algorithm for tiger beetle identification using morphometric data and habitat data

Show simple item record

dc.contributor.author Abeywardhana, D. L.
dc.contributor.author Dangalle, C. D.
dc.contributor.author Nugaliyadde, A.
dc.contributor.author Mallawarachchi, Y.W.
dc.date.accessioned 2022-02-02T08:43:01Z
dc.date.available 2022-02-02T08:43:01Z
dc.date.issued 2020
dc.identifier.citation Abeywardhana D. L.; Dangalle C.D.; Nugaliyadde A.; Mallawarachchi Y.W. (2020), Selecting the most suitable classification algorithm for tiger beetle identification using morphometric data and habitat data, Proceedings of the Annual Research Symposium, 2020, University of Colombo, 41 en_US
dc.identifier.uri http://archive.cmb.ac.lk:8080/xmlui/handle/70130/6447
dc.description.abstract Habitat heterogeneity is a main factor in ecology which affects species diversity. Therefore, habitat details can be used as factors that influence species identification. Tiger beetles are highly habitat specific species. Different species of tiger beetles that have morphometric variation can be found restricted to different habitat types in temperate and tropical areas of the world. Therefore, habitat and morphometric data of tiger beetle species were used to develop a predictive model for the identification of tiger beetle species. Data gathered on grounddwelling tiger beetle species collected from 45 locations by Dangalle (2002-2015), and 150 locations by Thotagamuwa (2014 -2017) were used to construct the dataset required for the study. Then data pre-processing was done to convert nominal data to numerical data, detach records with missing data and correct imbalanced data. Data resampling was also done. Further, an evaluation was performed for feature selection to determine the most important attributes for species identification predictions. Finally, a dataset containing 468 records (individuals of species) having 13 attributes of 14 species was constructed and 351 records were selected for the training set while 117 records were selected for the test set. This dataset was fed to several multiclass algorithms belonging to both single (KNN, Naïve-Bayes, SVM) and ensemble classifiers (Gradient Tree Boosting, Extra Tree Classifier and Random Forest). The performance of each algorithm was evaluated by calculating accuracy values for each model and the results revealed that ensemble classifiers yield a higher accuracy than that of single classifiers. Hence it was proven that ensemble models have a positive effect on the overall quality of predictions, in terms of accuracy, generalizability and lower misclassification costs and are more stable than single classifiers. Further, when considering the different types of ensemble classification algorithms, bagging (averaging) ensemble classification algorithm performed better than boosting methods. When considering the two bagging ensemble classification algorithms - Ensemble Extra Tree Classifier and Random Forest algorithm, both revealed almost the same overall accuracy (85%) with less than 0.12% difference. Therefore, both ensemble classification algorithms are effective for species prediction using habitat and morphometric data. However, when considering the computational time with performance, Ensemble Extra Trees Classifier can be considered as the most suitable algorithm for the scenario. en_US
dc.language.iso en en_US
dc.publisher University of Colombo en_US
dc.subject Ensemble classifiers en_US
dc.subject single classifiers en_US
dc.subject tabular data en_US
dc.subject tiger beetles en_US
dc.title Selecting the most suitable classification algorithm for tiger beetle identification using morphometric data and habitat data en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Advanced Search

Browse

My Account