Multi-modal Machine Learning-based Birds Diversity Identification Using the Merlin Bird ID Application
Publisher
Department of Wildlife Conservation, Sri Lanka
Abstract
This study presents an in-depth evaluation of the Merlin Bird ID application for automated bird species identification using both audio and image inputs. Techniques such as Convolutional Neural Networks (CNNs), Recurrent Convolutional Neural Networks (RCNNs), and Deep Convolutional Neural Networks (DCNNs) are employed to analyze bird calls through spectrograms and to extract visual features from images, enabling accurate classification of bird species. Audio signals were converted into time-frequency representations using the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCCs), while image data underwent preprocessing and augmentation to enhance model robustness. The dataset consists of 390 bird observations, including images and audio recordings, collected using the Merlin Bird ID application on the premises of South Eastern University in Sri Lanka. Performance was evaluated using standard metrics: accuracy, precision, recall, and F1-score were 87.7%, 74.1%, 73.5%, and 87.2%, respectively. These results indicate the model’s effectiveness in identifying species with high sensitivity and specificity, especially in challenging field environments. However, misclassifications were observed among some visually or acoustically similar species, suggesting areas for further refinement. This study also highlights the application’s value in biodiversity monitoring, citizen science, and ecological research. To further enhance performance, future work should address data imbalance, integrate expert annotations, apply advanced augmentation techniques, and incorporate geospatial or temporal data. Overall, the Merlin Bird ID application demonstrates strong potential as a reliable tool for automated bird classification and long-term avian biodiversity documentation. This research presents a novel application of multimodal deep learning for bird diversity identification in a tropical field setting, contributing to both the ecological monitoring and machine learning domains.
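The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' evaluation code: the function name classification_metrics, the choice of macro averaging across species, and the example species labels are all assumptions made for illustration, since the abstract does not state how the per-species scores were averaged.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Hypothetical helper: macro averaging (an unweighted mean over
    species) is an assumption, not a detail taken from the study.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct identification of this species
        else:
            fp[pred] += 1    # predicted species was wrongly reported
            fn[truth] += 1   # true species was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Illustrative labels only; these are not observations from the dataset.
y_true = ["koel", "koel", "myna", "myna", "crow", "crow"]
y_pred = ["koel", "myna", "myna", "myna", "crow", "koel"]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under a different averaging scheme, such as weighting each species by its number of observations, the precision, recall, and F1 values would generally differ from the macro-averaged ones, so the averaging choice should be reported alongside the scores.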
Keywords
Neural Network, Multi-modal, Deep Learning, Bird Species, biodiversity
Citation
Ameer, M. L. F., Ruzaik, F., Zahir, I. L. M., Iyoob, A. L., Nuskiya, M. H. F., & Hewapathirana, I. S. A. (2025). Multi-modal Machine Learning-based Birds Diversity Identification Using the Merlin Bird ID Application. WILDLANKA, 13(2), 242-256.
