Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)

Emotion Detection from Speech Using Deep Neural Networks

Authors
S. Shoba Rani1, *, Chinnam Chandana1, Palnati Harshitha Naidu1, Tadimarri Muzammil1, T. Satya Kiranmai1
1Chaitanya Bharathi Institute of Technology, Hyderabad, India
*Corresponding author. Email: shobaraniaids@cbit.ac.in
Available Online 4 November 2025.
DOI
10.2991/978-94-6463-858-5_61
Keywords
Speech Emotion Recognition (SER); HuBERT model; Prosody2Vec; Semantic Content Extraction; Prosodic Feature Disentanglement; RAVDESS Dataset
Abstract

The primary objective of this project is to improve speech emotion recognition (SER) by combining HuBERT, which extracts the semantic content of spoken language, with Prosody2Vec, which disentangles prosodic features. The embedding space learned by HuBERT captures the semantics of speech, while Prosody2Vec isolates paralinguistic variation such as accent, rhythm, and fundamental frequency. The model integrating both representations is trained on the RAVDESS corpus, whose rich diversity of acted emotions supports improved emotion recognition. By improving the representation of spoken language, the approach aims to enhance verbal interaction with computers and with systems that depend on emotional understanding. For domains such as virtual assistants, customer service systems, and mental health aids, emotional understanding is essential. The project therefore aims for more natural, intuitive, and responsive spoken interaction with machines through richer spoken-language representations that preserve emotional content. Combining HuBERT's ability to capture speech semantics with Prosody2Vec's focus on prosodic features offers a new solution to the challenges of speech emotion recognition, enabling not only higher accuracy in emotion recognition but also better-quality communication between humans and machines.
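The fusion described above can be sketched as a late-fusion classification head. This is an illustrative sketch only: the layer sizes, embedding dimensions, and dropout rate are assumptions, not taken from the paper, and the `FusionEmotionClassifier` name is hypothetical. In practice, the semantic embedding would come from HuBERT (e.g. mean-pooled hidden states) and the prosodic embedding from a prosody encoder such as Prosody2Vec; here they are stand-in tensors.

```python
import torch
import torch.nn as nn

# Hypothetical late-fusion head; dimensions are illustrative assumptions.
class FusionEmotionClassifier(nn.Module):
    def __init__(self, semantic_dim=768, prosody_dim=256, num_emotions=8):
        super().__init__()
        # Concatenate the two utterance-level embeddings, then classify
        # into the 8 RAVDESS emotion categories.
        self.head = nn.Sequential(
            nn.Linear(semantic_dim + prosody_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_emotions),
        )

    def forward(self, semantic_emb, prosody_emb):
        fused = torch.cat([semantic_emb, prosody_emb], dim=-1)
        return self.head(fused)  # logits over the emotion classes

# Stand-in embeddings: real inputs would come from HuBERT and a
# prosody encoder such as Prosody2Vec.
model = FusionEmotionClassifier()
semantic = torch.randn(4, 768)   # batch of 4 semantic embeddings
prosody = torch.randn(4, 256)    # batch of 4 prosodic embeddings
logits = model(semantic, prosody)
print(tuple(logits.shape))       # (4, 8): one logit vector per utterance
```

Concatenation followed by a small MLP is one common fusion choice; attention-based or gated fusion are alternatives the paper may use instead.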

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
Series
Advances in Computer Science Research
Publication Date
4 November 2025
ISBN
978-94-6463-858-5
ISSN
2352-538X
DOI
10.2991/978-94-6463-858-5_61

Cite this article

TY  - CONF
AU  - S. Shoba Rani
AU  - Chinnam Chandana
AU  - Palnati Harshitha Naidu
AU  - Tadimarri Muzammil
AU  - T. Satya Kiranmai
PY  - 2025
DA  - 2025/11/04
TI  - Emotion Detection from Speech Using Deep Neural Networks
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 715
EP  - 725
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_61
DO  - 10.2991/978-94-6463-858-5_61
ID  - Rani2025
ER  -