Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)

Multimodal Emotion Recognition Using Deep Learning with Voice, Text, and Facial Expression Analysis

Authors
P. Praveenkumar¹, S. Yogithaa¹,*, R. Soundarya¹, M. Harshini¹
¹Sri Manakula Vinayagar Engineering College, Madagadipet, Puducherry, 605107, India
*Corresponding author. Email: yogithaasendhil@gmail.com
Corresponding Author
S. Yogithaa
Available Online 31 March 2026.
DOI
10.2991/978-94-6239-616-6_21
Keywords
Multimodal emotion recognition; BiLSTM; CNN-RNN; ResNet-101; feature-level fusion; audio-text-visual integration; affective computing; human–computer interaction
Abstract

Emotion recognition plays a crucial role in intelligent systems, as emotions influence communication, decision-making, and human–machine interaction. Audio-only methods such as CNN-BiLSTM often perform poorly because emotional expression varies across speech, facial cues, and textual semantics. This study proposes a multimodal framework integrating text, audio, and facial expressions for robust emotion detection. Text is modeled with BiLSTM to capture contextual meaning, audio is processed through a CNN-RNN hybrid to learn spectral–temporal cues, and visual data is analyzed using ResNet-101 for deep facial feature extraction. Feature-level fusion combines all modalities into a unified emotional representation, improving accuracy and stability across real-world conditions. The approach benefits applications in HCI, e-learning, affective computing, and mental-health monitoring.
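The feature-level fusion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three encoders (BiLSTM, CNN-RNN, ResNet-101) are replaced by random stand-in vectors, and the embedding sizes, the emotion label set, the function names, and the single linear classifier are all assumptions that the abstract does not specify. It shows only the core idea, which is concatenating per-modality feature vectors into one representation before classification.

```python
import math
import random

# Hypothetical embedding sizes and label set; the abstract specifies neither.
TEXT_DIM, AUDIO_DIM, VISUAL_DIM = 4, 3, 5
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def fuse_features(text_vec, audio_vec, visual_vec):
    """Feature-level (early) fusion: concatenate per-modality embeddings."""
    return list(text_vec) + list(audio_vec) + list(visual_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Apply one linear layer plus softmax to the fused vector."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


rng = random.Random(0)

# Stand-ins for encoder outputs (BiLSTM, CNN-RNN, ResNet-101 in the paper).
text_vec = [rng.uniform(-1, 1) for _ in range(TEXT_DIM)]
audio_vec = [rng.uniform(-1, 1) for _ in range(AUDIO_DIM)]
visual_vec = [rng.uniform(-1, 1) for _ in range(VISUAL_DIM)]

fused = fuse_features(text_vec, audio_vec, visual_vec)

# Untrained random weights, used here only to illustrate the shapes involved.
fused_dim = TEXT_DIM + AUDIO_DIM + VISUAL_DIM
weights = [[rng.uniform(-0.5, 0.5) for _ in range(fused_dim)] for _ in EMOTIONS]
biases = [0.0] * len(EMOTIONS)

probs = classify(fused, weights, biases)
predicted = EMOTIONS[probs.index(max(probs))]
print(len(fused), predicted)
```

Because the weights are untrained, the predicted label carries no meaning; the point is that the classifier sees a single vector whose length is the sum of the three modality dimensions, in contrast to decision-level fusion, where each modality would be classified separately and the outputs combined afterwards.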

Copyright
© 2026 The Author(s)
Open Access
This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
31 March 2026
ISBN
978-94-6239-616-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - P. Praveenkumar
AU  - S. Yogithaa
AU  - R. Soundarya
AU  - M. Harshini
PY  - 2026
DA  - 2026/03/31
TI  - Multimodal Emotion Recognition Using Deep Learning with Voice, Text, and Facial Expression Analysis
BT  - Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025)
PB  - Atlantis Press
SP  - 249
EP  - 261
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-616-6_21
DO  - 10.2991/978-94-6239-616-6_21
ID  - Praveenkumar2026
ER  -