A Multilingual Web-Based Approach to Speech Emotion Recognition: Challenges and Solutions

Arundhati Wani; Rujuta Kulkarni; Ananthan Nair; Vishvjita Savkare; Punam Chavan

doi:10.2991/978-94-6463-831-8_40

Authors

Arundhati Wani¹,*, Rujuta Kulkarni¹, Ananthan Nair¹, Vishvjita Savkare¹, Punam Chavan¹

¹Marathwada Mitra Mandal’s College of Engineering, Pune, India

*Corresponding author. Email: arundhatim.wani@gmail.com

Available Online 31 August 2025.

DOI: 10.2991/978-94-6463-831-8_40
Keywords: speech emotion recognition; HuBERT; RAVDESS; EMO-DB
Abstract: Speech Emotion Recognition (SER) has attracted growing attention because of its applications in human-computer interaction, mental health monitoring, virtual assistants, and customer service. Accurately recognizing emotions from speech signals can improve user experience and system responsiveness. However, several challenges continue to hinder the development of robust and generalizable SER models: the lack of multilingual integration, limited support for real-time processing, cultural and dialectal variation in emotional expression, the absence of standardized evaluation metrics, the scarcity and imbalance of publicly available datasets, and the difficulty of recognizing emotions from multiple users simultaneously. To address some of these issues, this paper presents a web-based SER application that uses HuBERT, a self-supervised speech representation model, for multilingual emotion classification. The system identifies a range of emotional states, including happiness, sadness, anger, neutrality, fear, disgust, boredom, anxiety, surprise, and calmness. The frontend is built with Angular to provide a responsive, user-friendly interface, and the backend uses FastAPI for API communication and the integration of user feedback. The system is trained and evaluated on two established emotional speech datasets, RAVDESS and the German EMO-DB, which together are intended to improve its generalizability across languages and cultural contexts. By combining modern deep learning techniques with a practical web-based deployment, this work aims to bridge the gap between SER research and real-world applications, offering a scalable and interactive solution for multilingual emotion recognition.
Copyright: © 2025 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
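
Illustrative note: the abstract names ten emotional states and two training corpora, but it does not say how the corpora's labels are combined. The following is a minimal sketch, not the authors' method, of one way to derive a shared label set from the standard RAVDESS and EMO-DB filename conventions. The function name `label_from_filename`, the label spellings, and the choice to map EMO-DB's "Angst" code to "anxiety" (kept separate from RAVDESS "fear") are assumptions made here so that the union matches the ten states listed in the abstract.

```python
from pathlib import Path

# RAVDESS filenames have seven hyphen-separated fields; the third is the emotion code.
RAVDESS_EMOTIONS = {
    "01": "neutrality", "02": "calmness", "03": "happiness", "04": "sadness",
    "05": "anger", "06": "fear", "07": "disgust", "08": "surprise",
}

# EMO-DB filenames are seven characters; the sixth is the German initial of the emotion.
EMODB_EMOTIONS = {
    "W": "anger",       # Wut
    "L": "boredom",     # Langeweile
    "E": "disgust",     # Ekel
    "A": "anxiety",     # Angst (assumption: kept distinct from RAVDESS "fear")
    "F": "happiness",   # Freude
    "T": "sadness",     # Trauer
    "N": "neutrality",  # neutral
}

# Union of both corpora's labels: the shared classification target.
LABELS = sorted(set(RAVDESS_EMOTIONS.values()) | set(EMODB_EMOTIONS.values()))


def label_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS or EMO-DB filename."""
    stem = Path(path).stem
    fields = stem.split("-")
    if len(fields) == 7:          # RAVDESS, e.g. 03-01-06-01-02-01-12.wav
        return RAVDESS_EMOTIONS[fields[2]]
    if len(stem) == 7:            # EMO-DB, e.g. 03a01Fa.wav
        return EMODB_EMOTIONS[stem[5]]
    raise ValueError(f"unrecognized filename pattern: {path}")


if __name__ == "__main__":
    print(label_from_filename("03-01-06-01-02-01-12.wav"))
    print(label_from_filename("03a01Fa.wav"))
    print(len(LABELS), LABELS)
```

Under these assumptions the two corpora overlap on five labels, RAVDESS alone contributes calmness, fear, and surprise, and EMO-DB alone contributes boredom and anxiety, so a classifier head trained on the union would see some classes in only one language.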

Volume Title: Proceeding of the 1st International Conference on Lifespan Innovation (ICLI 2025)
Series: Advances in Health Sciences Research
Publication Date: 31 August 2025
ISBN: 978-94-6463-831-8
ISSN: 2468-5739

Cite this article

TY  - CONF
AU  - Arundhati Wani
AU  - Rujuta Kulkarni
AU  - Ananthan Nair
AU  - Vishvjita Savkare
AU  - Punam Chavan
PY  - 2025
DA  - 2025/08/31
TI  - A Multilingual Web-Based Approach to Speech Emotion Recognition: Challenges and Solutions
BT  - Proceeding of the 1st International Conference on Lifespan Innovation (ICLI 2025)
PB  - Atlantis Press
SP  - 331
EP  - 338
SN  - 2468-5739
UR  - https://doi.org/10.2991/978-94-6463-831-8_40
DO  - 10.2991/978-94-6463-831-8_40
ID  - Wani2025
ER  -