Deep Learning Approach for Human Emotion From Speech
- DOI
- 10.2991/978-94-6463-858-5_246
- Keywords
- Emotion detection; speech recognition; prosodic features; pitch; intensity; tone; Convolutional Neural Network (CNN); transformer model; real-time processing; audio signal processing; feature extraction; emotion classification; deep learning; human-computer
- Abstract
This project addresses real-time emotion recognition from speech, using machine learning methods to interpret vocal expressions. The system records audio through a microphone, extracts key prosodic features such as pitch, intensity, and tone with the Librosa library, and then classifies the emotion using either a Convolutional Neural Network (CNN) or a Transformer-based model. These models learn to detect emotional states such as neutrality, happiness, sadness, anger, fear, disgust, surprise, love, and joy. The speech input is also transcribed into text using the Speech Recognition library to improve contextual understanding. Built to support multiple languages, including Tamil, Telugu, and English, the system continuously monitors and categorizes emotions in real time from vocal tone, making it suitable for applications such as AI-powered assistants, customer service, and emotional wellness tracking.
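To make the feature-extraction step concrete, the following is a minimal sketch of how pitch and intensity might be computed for a single audio frame. It is not the authors' implementation: the abstract states that Librosa is used, whereas this sketch substitutes a plain autocorrelation pitch estimate and a root-mean-square intensity written with the Python standard library only. All function names (`sine_frame`, `rms_intensity`, `autocorr_pitch`, `prosodic_features`), the 80 to 400 Hz search range, and the synthetic test tone are assumptions made for illustration.

```python
import math


def sine_frame(freq_hz, sample_rate, n_samples, amplitude=0.5):
    """Synthetic test tone standing in for one frame of microphone audio."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def rms_intensity(frame):
    """Root-mean-square amplitude of the frame, a simple intensity measure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    Only lags corresponding to fmin..fmax are searched (an assumed range for
    speech). Returns 0.0 when no lag has positive correlation, for example
    on a silent frame.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame with itself at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def prosodic_features(frame, sample_rate):
    """Bundle the two features into a dict a downstream classifier could use."""
    return {"pitch_hz": autocorr_pitch(frame, sample_rate),
            "intensity_rms": rms_intensity(frame)}


if __name__ == "__main__":
    frame = sine_frame(200.0, 8000, 1024)
    print(prosodic_features(frame, 8000))
```

For the 200 Hz test tone sampled at 8 kHz, the pitch estimate should come out close to 200 Hz and the intensity close to the tone's amplitude divided by the square root of two. A real system would compute such features over successive short frames of recorded speech and pass the resulting sequence to the CNN or Transformer classifier.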
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Valarmathi Ramasamy
AU - P. Bhavadharani
AU - N. Divya Geetha
AU - S. Sugumaran
PY - 2025
DA - 2025/11/04
TI - Deep Learning Approach for Human Emotion From Speech
BT - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB - Atlantis Press
SP - 2929
EP - 2945
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-858-5_246
DO - 10.2991/978-94-6463-858-5_246
ID - Ramasamy2025
ER -