Deep Sound Search Leveraging Pre-trained CNNs and Faiss for Animal Vocalization Analysis
- DOI
- 10.2991/978-94-6463-718-2_36
- Keywords
- deep learning; animal vocalization analysis; dual audio recordings; ensemble methods; autoencoders; wildlife monitoring
- Abstract
This work applies pre-trained Convolutional Neural Networks (CNNs) together with the Faiss similarity-search library to animal vocalization analysis. Recent work on comprehensive bioacoustic datasets has shown the versatility and power of CNN-based models, which have performed strongly in classifying a wide variety of animal sounds. The proposed architecture incorporates self-supervised transformers such as animal2vec to improve the handling of raw audio input, and it is designed for the rare-event nature of the data by including species that vocalize less frequently in the database. The use of dual audio recording systems provides reliable source attribution, and ensemble methods, in which multiple classification models are trained, can further improve accuracy across a range of environments. Using pre-trained models accelerates adaptation to new species by reducing training time, which makes these techniques more broadly applicable. Moreover, autoencoder-based vocal analysis helps suppress noise and capture minute patterns, enabling deep learning models to be more effective in challenging environments. Together, these deep learning methods highlight the scalability, efficiency, and versatility of the approach, which can be adopted as a tool for wildlife monitoring and species identification in diverse ecosystems.
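The abstract does not spell out how the sound search itself works, so the following is only a minimal sketch of the general idea under stated assumptions: each recording is represented by an embedding vector (here, hand-written toy vectors standing in for CNN outputs), and a query is matched to its nearest reference embeddings by cosine similarity. Faiss is designed to run this kind of similarity search efficiently over large collections; the sketch uses an exhaustive pure-Python search instead so that it stays self-contained. The labels, vectors, and function names are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(index, query, k=1):
    """Return the k (label, score) pairs most similar to the query.

    `index` is a list of (label, unit_vector) pairs. The search is exhaustive:
    every stored vector is compared with the query.
    """
    q = l2_normalize(query)
    scored = []
    for label, emb in index:
        score = sum(a * b for a, b in zip(q, emb))
        scored.append((score, label))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; a real system would obtain
# much higher-dimensional vectors from a pre-trained network.
reference = [
    ("owl_call", [0.9, 0.1, 0.0]),
    ("frog_croak", [0.0, 1.0, 0.2]),
    ("wolf_howl", [0.1, 0.0, 1.0]),
]
index = [(label, l2_normalize(vec)) for label, vec in reference]

# A query embedding that lies close to the "owl_call" reference.
results = search(index, [0.8, 0.2, 0.1], k=2)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

With a larger reference collection, the exhaustive loop in `search` is the part a dedicated similarity-search library would replace, while the normalization step and the (label, vector) bookkeeping would stay conceptually the same.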
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - D. Sathiya
AU - P. Palanisamy
AU - E. Nandhini
AU - V. Jayamurugan
AU - K. Kirthikcharan
AU - N. Manoj
PY - 2025
DA - 2025/05/23
TI - Deep Sound Search Leveraging Pre-trained CNNs and Faiss for Animal Vocalization Analysis
BT - Proceedings of the International Conference on Sustainability Innovation in Computing and Engineering (ICSICE 2024)
PB - Atlantis Press
SP - 410
EP - 423
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-718-2_36
DO - 10.2991/978-94-6463-718-2_36
ID - Sathiya2025
ER -