Speech Emotion Recognition using Constant-Q Transform and Deep Learning
- DOI
- 10.2991/978-94-6463-976-6_12
- Keywords
- Speech Emotion Recognition; Constant-Q Transform; Recurrent Neural Networks; Deep Learning; Emotion Classification
- Abstract
In today’s digital world, machines are evolving beyond basic speech-to-text systems. This project develops an emotionally aware solution that recognizes both spoken words and the underlying emotions. The system combines the Constant-Q Transform (CQT), which captures time-frequency audio features, with Recurrent Neural Networks (RNNs) to analyze and classify emotional tones in real time. By integrating emotional audio analysis into traditional speech recognition, it detects emotions including happiness, sadness, anger, fear, and neutrality directly from raw voice signals.
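To make the CQT step concrete, the following is a minimal numpy-only sketch of a naive constant-Q transform (the paper itself does not specify an implementation; a production system would more likely use an optimized library routine). Each frequency bin gets its own analysis window whose length shrinks as the center frequency rises, so the ratio Q = f/Δf stays constant across bins. All names and parameter defaults here are illustrative assumptions.

```python
import numpy as np

def naive_cqt(x, sr, fmin=32.7, n_bins=48, bins_per_octave=12, hop=512):
    """Naive constant-Q transform of a mono signal `x` sampled at `sr` Hz.

    Bin k has center frequency fmin * 2**(k / bins_per_octave); its
    window length N_k = ceil(Q * sr / f_k) scales inversely with f_k,
    which is what keeps the quality factor Q constant across bins.
    Returns an (n_bins, n_frames) magnitude array.
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_frames = 1 + (len(x) - 1) // hop
    out = np.zeros((n_bins, n_frames))
    for k in range(n_bins):
        fk = fmin * 2.0 ** (k / bins_per_octave)
        Nk = int(np.ceil(Q * sr / fk))
        n = np.arange(Nk)
        # Hanning-windowed complex exponential at the bin's center frequency.
        kernel = np.hanning(Nk) * np.exp(-2j * np.pi * fk * n / sr) / Nk
        for t in range(n_frames):
            seg = x[t * hop : t * hop + Nk]
            out[k, t] = np.abs(np.dot(seg, kernel[: len(seg)]))
    return out
```

For example, a pure 220 Hz tone produces its largest response in the bin whose center frequency is 220 Hz (bin 33 with the defaults above, since 32.7 · 2^(33/12) ≈ 220). The geometric frequency spacing is what makes CQT well suited to pitch-centric cues like the intonation patterns this system relies on.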
The model is trained on diverse emotional datasets and fine-tuned to recognize subtle variations in pitch, tone, and energy levels. Audio signals are transformed into spectrogram-like representations using CQT, then processed through RNN layers to capture the temporal dynamics of emotional speech. A Django-based user interface provides an interactive platform where users can upload audio files or record their voice directly in the browser. The system outputs transcribed text with identified emotions, visual cues, and probability scores.
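The RNN stage described above can be sketched as a single-layer Elman recurrence over the CQT frames, with a softmax head producing the per-emotion probability scores the interface displays. This is an illustrative numpy sketch with randomly initialized weights, not the authors' trained architecture; the layer sizes, emotion label set, and function names are assumptions for demonstration.

```python
import numpy as np

def rnn_emotion_probs(features, Wx, Wh, b, Wo, bo):
    """Run an Elman RNN over CQT frames and return emotion probabilities.

    `features` is an (n_bins, n_frames) CQT magnitude array; each column
    is one time frame. The final hidden state summarizes the utterance
    and is mapped to class logits, then normalized with a softmax.
    """
    h = np.zeros(Wh.shape[0])
    for frame in features.T:                    # step through time frames
        h = np.tanh(Wx @ frame + Wh @ h + b)    # Elman recurrence
    logits = Wo @ h + bo
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()

# Illustrative usage with random weights (hypothetical sizes/labels):
EMOTIONS = ["happy", "sad", "angry", "fear", "neutral"]
rng = np.random.default_rng(0)
H, F, C = 32, 48, len(EMOTIONS)                 # hidden, feature, class dims
Wx = 0.1 * rng.standard_normal((H, F))
Wh = 0.1 * rng.standard_normal((H, H))
Wo = 0.1 * rng.standard_normal((C, H))
b, bo = np.zeros(H), np.zeros(C)
probs = rnn_emotion_probs(rng.standard_normal((F, 16)), Wx, Wh, b, Wo, bo)
```

In a trained system the weights would come from fitting on labeled emotional speech, and the probability vector would feed directly into the visual cues and scores shown in the Django front end.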
This integration of audio signal processing, deep learning, and web technology creates a powerful tool with applications in mental health monitoring, therapy support, virtual assistants, call centers, and education.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
- Cite this article
TY  - CONF
AU  - S. Revathi
AU  - V. MuthuPriya
AU  - R. Akila
AU  - Salman Faris Syed
AU  - K. Mohammed Zahaan Dhande
PY  - 2025
DA  - 2025/12/29
TI  - Speech Emotion Recognition using Constant-Q Transform and Deep Learning
BT  - Proceedings of the International Conference on Intelligent Information Systems Design and Indian Knowledge System Applications (ICISDIKSA 2026)
PB  - Atlantis Press
SP  - 173
EP  - 183
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-976-6_12
DO  - 10.2991/978-94-6463-976-6_12
ID  - Revathi2025
ER  -