A Multilingual Web-Based Approach to Speech Emotion Recognition: Challenges and Solutions
- DOI
- 10.2991/978-94-6463-831-8_40
- Keywords
- speech emotion recognition; HuBERT; RAVDESS; EMO-DB
- Abstract
Speech Emotion Recognition (SER) has gained significant attention in recent years because of its applications in human-computer interaction, mental health monitoring, virtual assistants, and customer service. Accurately recognizing human emotions from speech signals can improve user experience and system responsiveness. However, several challenges continue to hinder the development of robust and generalizable SER models: the lack of multilingual integration, limited support for real-time processing, cultural and dialectal variation in emotional expression, the absence of standardized evaluation metrics, the scarcity and imbalance of publicly available datasets, and the difficulty of recognizing emotions from multiple users simultaneously. To address some of these issues, this paper presents a web-based SER application that uses HuBERT, a self-supervised speech representation model, for multilingual emotion classification. The system identifies a wide range of emotional states, including happiness, sadness, anger, neutrality, fear, disgust, boredom, anxiety, surprise, and calmness. The frontend is built with Angular to provide a responsive, user-friendly interface, and the backend is built with FastAPI, which handles API communication and the integration of user feedback. The system is trained and evaluated on two well-established emotional speech datasets, RAVDESS and the German EMO-DB, which together enhance its generalizability across languages and cultural contexts. By combining modern deep learning techniques with a practical web-based deployment, this work aims to bridge the gap between SER research and real-world applications, offering a scalable and interactive solution for multilingual emotion recognition.
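As a rough illustration of how RAVDESS and EMO-DB might be brought into one label space before training, the sketch below maps each corpus's file-name emotion code to the ten emotional states listed in the abstract. The file-name conventions are those published with the two corpora. The names `UNIFIED_LABELS`, `label_from_filename`, and `label_index` are hypothetical, and treating EMO-DB's "A" (Angst) as "anxiety" rather than "fear" is an assumption. The abstract does not describe the authors' actual preprocessing.

```python
from pathlib import Path

# The ten emotional states named in the abstract, used here as a shared
# label space (the ordering is arbitrary and an assumption of this sketch).
UNIFIED_LABELS = [
    "happiness", "sadness", "anger", "neutrality", "fear",
    "disgust", "boredom", "anxiety", "surprise", "calmness",
]

# RAVDESS: the third dash-separated field of the file name is the emotion code.
RAVDESS_CODES = {
    "01": "neutrality", "02": "calmness", "03": "happiness", "04": "sadness",
    "05": "anger", "06": "fear", "07": "disgust", "08": "surprise",
}

# EMO-DB: the sixth character of the file name is a German emotion initial.
# Mapping "A" (Angst) to "anxiety" instead of "fear" is an assumption.
EMODB_CODES = {
    "W": "anger", "L": "boredom", "E": "disgust", "A": "anxiety",
    "F": "happiness", "T": "sadness", "N": "neutrality",
}


def label_from_filename(filename: str) -> str:
    """Return the unified emotion label encoded in a corpus file name."""
    stem = Path(filename).stem
    parts = stem.split("-")
    if len(parts) == 7:
        # RAVDESS pattern, e.g. 03-01-06-01-02-01-12.wav
        return RAVDESS_CODES[parts[2]]
    if len(stem) == 7:
        # EMO-DB pattern, e.g. 03a01Fa.wav
        return EMODB_CODES[stem[5]]
    raise ValueError(f"unrecognized file name pattern: {filename}")


def label_index(filename: str) -> int:
    """Return the integer class id a classifier head could be trained on."""
    return UNIFIED_LABELS.index(label_from_filename(filename))
```

For example, `label_from_filename("03-01-06-01-02-01-12.wav")` resolves a RAVDESS clip to `"fear"`, and `label_from_filename("03a01Fa.wav")` resolves an EMO-DB clip to `"happiness"`. Under this mapping the two corpora together cover all ten labels, with "boredom" and "anxiety" coming only from EMO-DB and "calmness" and "surprise" only from RAVDESS, which is one concrete form of the dataset imbalance the abstract mentions.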
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Arundhati Wani
AU  - Rujuta Kulkarni
AU  - Ananthan Nair
AU  - Vishvjita Savkare
AU  - Punam Chavan
PY  - 2025
DA  - 2025/08/31
TI  - A Multilingual Web-Based Approach to Speech Emotion Recognition: Challenges and Solutions
BT  - Proceeding of the 1st International Conference on Lifespan Innovation (ICLI 2025)
PB  - Atlantis Press
SP  - 331
EP  - 338
SN  - 2468-5739
UR  - https://doi.org/10.2991/978-94-6463-831-8_40
DO  - 10.2991/978-94-6463-831-8_40
ID  - Wani2025
ER  -