Deep Learning Approach for Human Emotion From Speech

Valarmathi Ramasamy; P. Bhavadharani; N. Divya Geetha; S. Sugumaran

doi:10.2991/978-94-6463-858-5_246


Authors

Valarmathi Ramasamy¹*, P. Bhavadharani¹, N. Divya Geetha¹, S. Sugumaran²

¹Department of ECE, St. Peter’s College of Engineering and Technology, Avadi, India

²Department of ECE, Vishnu Institute of Technology, Bhimavaram, India

*Corresponding author. Email: valarmathy@spcet.ac.in


Available Online 4 November 2025.

DOI: 10.2991/978-94-6463-858-5_246
Keywords: Emotion detection; speech recognition; prosodic features; pitch; intensity; tone; Convolutional Neural Network (CNN); transformer model; real-time processing; audio signal processing; feature extraction; emotion classification; deep learning; human-computer
Abstract: This project addresses real-time emotion recognition from speech, using machine learning methods to interpret vocal expressions. The system records audio through a microphone, extracts key prosodic features such as pitch, intensity, and tone using the Librosa library, and classifies the emotion with either a Convolutional Neural Network (CNN) or a Transformer-based model. These models learn to detect emotional states such as neutrality, happiness, sadness, anger, fear, disgust, surprise, love, and joy. The speech input is also transcribed into text using the Speech Recognition library, which adds context for interpretation. Designed to support multiple languages, including Tamil, Telugu, and English, the system continuously monitors and categorizes emotions in real time from vocal tone, making it suitable for applications such as AI-powered assistants, customer service, and emotional wellness tracking.
Copyright: © 2025 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
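
Illustrative sketch (not from the paper): the abstract describes extracting prosodic features such as pitch and intensity from each audio frame before classification. The authors use the Librosa library for this step; the code below is only a standard-library stand-in that shows the idea on a synthetic tone. The function names `sine_frame`, `rms_intensity`, and `estimate_pitch`, the 50 to 500 Hz search range, and the autocorrelation method are all assumptions chosen for illustration. Librosa offers higher-level equivalents (for example `librosa.feature.rms` and `librosa.pyin`), and the paper's actual settings are not given in the abstract.

```python
import math


def sine_frame(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Hypothetical helper: synthesize one frame of a pure tone for testing."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def rms_intensity(frame):
    """Root-mean-square amplitude of a frame, a simple proxy for intensity."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Searches lags that correspond to fmin..fmax and returns 0.0 when no lag
    has a positive autocorrelation (for example, a silent frame).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

Calling `estimate_pitch(sine_frame(200.0, 8000, 400), 8000)` recovers the 200 Hz tone, and `rms_intensity` on the same frame gives the RMS of a unit sine. In a pipeline like the one described, per-frame values of this kind would be stacked into a feature matrix and passed to the CNN or Transformer classifier; real speech would also need voicing detection and a more robust pitch tracker than this sketch.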


Volume Title: Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
Series: Advances in Computer Science Research
Publication Date: 4 November 2025
ISBN: 978-94-6463-858-5
ISSN: 2352-538X

Cite this article


TY  - CONF
AU  - Valarmathi Ramasamy
AU  - P. Bhavadharani
AU  - N. Divya Geetha
AU  - S. Sugumaran
PY  - 2025
DA  - 2025/11/04
TI  - Deep Learning Approach for Human Emotion From Speech
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 2929
EP  - 2945
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_246
DO  - 10.2991/978-94-6463-858-5_246
ID  - Ramasamy2025
ER  -
