Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)

Deep Learning Approach for Human Emotion From Speech

Authors
Valarmathi Ramasamy1, *, P. Bhavadharani1, N. Divya Geetha1, S. Sugumaran2
1Department of ECE, St. Peter’s College of Engineering and Technology, Avadi, India
2Department of ECE, Vishnu Institute of Technology, Bhimavaram, India
*Corresponding author. Email: valarmathy@spcet.ac.in
Available Online 4 November 2025.
DOI
10.2991/978-94-6463-858-5_246
Keywords
Emotion detection; speech recognition; prosodic features; pitch; intensity; tone; Convolutional Neural Network (CNN); transformer model; real-time processing; audio signal processing; feature extraction; emotion classification; deep learning; human-computer
Abstract

This project addresses real-time emotion recognition from speech, using machine learning methods to interpret vocal expressions. The system records audio through a microphone, extracts key prosodic features such as pitch, intensity, and tone using the Librosa library, and then classifies the emotion with either a Convolutional Neural Network (CNN) or a Transformer-based model. These models learn to detect emotional states such as neutrality, happiness, sadness, anger, fear, disgust, surprise, love, and joy. The speech input is also transcribed into text using the Speech Recognition library, which adds context for interpretation. Built to support multiple languages, including Tamil, Telugu, and English, the system continuously monitors and categorizes emotions in real time from vocal tone, making it suitable for applications such as AI-powered assistants, customer service, and emotional wellness tracking.
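
As a rough illustration of what two of the prosodic features named in the abstract measure, the sketch below estimates pitch and intensity for a single audio frame. It is not the authors' implementation: the paper extracts features with Librosa, whereas this stand-in uses only the Python standard library, with autocorrelation peak picking for pitch and root-mean-square amplitude for intensity. The function names, the 80 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import math


def rms_intensity(frame):
    """Root-mean-square amplitude of one frame (a simple intensity proxy)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) by autocorrelation peak picking.

    fmin and fmax bound the search; the defaults are assumed values that
    roughly cover adult speech, not figures taken from the paper.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # No positive peak found: report 0.0 to mean "unvoiced or unknown".
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic stand-in for a microphone frame: 0.1 s of a 200 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(800)]

pitch_hz = estimate_pitch(frame, SAMPLE_RATE)
intensity = rms_intensity(frame)
print(f"pitch: {pitch_hz:.1f} Hz, intensity (RMS): {intensity:.3f}")
```

In a pipeline like the one described, per-frame values of this kind would be stacked over time and passed to the classifier. Real speech is noisier than a pure tone, so a production system would normally rely on a dedicated pitch tracker rather than raw autocorrelation.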

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
Series
Advances in Computer Science Research
Publication Date
4 November 2025
ISBN
978-94-6463-858-5
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Valarmathi Ramasamy
AU  - P. Bhavadharani
AU  - N. Divya Geetha
AU  - S. Sugumaran
PY  - 2025
DA  - 2025/11/04
TI  - Deep Learning Approach for Human Emotion From Speech
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 2929
EP  - 2945
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_246
DO  - 10.2991/978-94-6463-858-5_246
ID  - Ramasamy2025
ER  -