Emotion Detection from Speech Using Deep Neural Networks
- DOI
- 10.2991/978-94-6463-858-5_61
- Keywords
- Speech Emotion Recognition (SER); HuBERT model; Prosody2Vec; Semantic Content Extraction; Prosodic Feature Disentanglement; RAVDESS Dataset
- Abstract
The primary objective of this project is to improve speech emotion recognition (SER) by combining HuBERT, which extracts the semantic content of spoken language, with Prosody2Vec, which disentangles prosodic features. The embedding space learned by HuBERT captures the semantics of speech, while Prosody2Vec isolates paralinguistic variation such as accent, rhythm, and fundamental frequency. A model integrating both representations is trained on the RAVDESS corpus, which covers a rich diversity of acted emotions, to improve emotion recognition. Because the approach yields a richer representation of spoken language, it can benefit verbal communication with computers and any system that must respond to emotion. For domains such as virtual assistants, customer service systems, and mental health aids, emotional understanding is essential. The project therefore aims for more natural, intuitive, and responsive verbal interaction with machines through an improved spoken-language representation that preserves emotional content. By combining HuBERT's ability to capture speech semantics with Prosody2Vec's focus on prosodic features, it offers a new approach to the challenges of speech emotion recognition, enabling not only higher accuracy in emotion recognition but also better-quality communication between humans and machines.
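The fusion described in the abstract, a semantic embedding from HuBERT combined with a prosodic embedding from Prosody2Vec and classified into RAVDESS emotion categories, can be sketched as a simple late-fusion classifier. This is a minimal illustration only: the embedding sizes (768 and 256), the random stand-in embeddings, and the linear softmax head are assumptions for the sketch, not the paper's actual architecture.

```python
# Hypothetical late-fusion sketch: concatenate a HuBERT-style semantic
# embedding with a Prosody2Vec-style prosody embedding, then classify
# into the 8 RAVDESS emotion classes. All dimensions are assumptions.
import numpy as np

RAVDESS_EMOTIONS = ["neutral", "calm", "happy", "sad",
                    "angry", "fearful", "disgust", "surprised"]

SEMANTIC_DIM = 768   # assumed HuBERT-base hidden size
PROSODY_DIM = 256    # assumed Prosody2Vec embedding size

rng = np.random.default_rng(0)
# Stand-ins for utterance-level embeddings from the two encoders.
semantic_emb = rng.standard_normal(SEMANTIC_DIM)
prosody_emb = rng.standard_normal(PROSODY_DIM)

def classify(semantic, prosody, weights, bias):
    """Fuse the two embeddings by concatenation, then apply a
    linear layer and a softmax over the emotion classes."""
    fused = np.concatenate([semantic, prosody])
    logits = weights @ fused + bias
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Randomly initialised classifier head (would be trained on RAVDESS).
W = rng.standard_normal((len(RAVDESS_EMOTIONS),
                         SEMANTIC_DIM + PROSODY_DIM)) * 0.01
b = np.zeros(len(RAVDESS_EMOTIONS))

probs = classify(semantic_emb, prosody_emb, W, b)
predicted = RAVDESS_EMOTIONS[int(np.argmax(probs))]
print(predicted, round(float(probs.sum()), 6))
```

In practice, the stand-in vectors would be replaced by real encoder outputs, and the head trained with cross-entropy on RAVDESS labels; concatenation is only one of several possible fusion strategies.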
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - S. Shoba Rani
AU  - Chinnam Chandana
AU  - Palnati Harshitha Naidu
AU  - Tadimarri Muzammil
AU  - T. Satya Kiranmai
PY  - 2025
DA  - 2025/11/04
TI  - Emotion Detection from Speech Using Deep Neural Networks
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 715
EP  - 725
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_61
DO  - 10.2991/978-94-6463-858-5_61
ID  - Rani2025
ER  -