Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)

Intelligent Question Answering System Based on Multimodal Fusion

Authors
Mengcong Zhang1, *
1College of Engineering, Environment and Science, Coventry University, Coventry, United Kingdom
*Corresponding author. Email: zhaom18@uni.coventry.ac.uk
Available Online 24 April 2026.
DOI
10.2991/978-94-6239-648-7_36
Keywords
Multimodal Learning; Image-text Intelligent Question Answering; Data Fusion; Shared Representation Learning; Collaborative Learning
Abstract

This article examines the application of multimodal learning to intelligent question answering, emphasizing the importance of integrating data from multiple modalities to fully understand complex scenarios. It reviews the development of intelligent question answering systems, from the early Turing test to modern models such as BERT and GPT-3, and discusses in detail key multimodal learning techniques such as data fusion, shared representation learning, and collaborative learning. Through case studies of Large Language and Vision Assistant, Visual Language Environment, and JBoltAI, it demonstrates specific applications of multimodal learning in intelligent question answering systems. Finally, the article outlines future development directions, including agent-based intelligence, personalization, and neural-symbolic fusion reasoning, which are expected to extend multimodal learning to more fields and provide users with smarter and more convenient services. Multimodal learning overcomes the information bottleneck of a single modality, giving machines ‘synaesthesia’ capabilities closer to those of humans and supporting explainable decisions in high-trust scenarios such as healthcare, finance, and justice. It also lowers the barrier to entry for AI services and promotes industrial transformation.

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
Series
Advances in Computer Science Research
Publication Date
24 April 2026
ISBN
978-94-6239-648-7
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Mengcong Zhang
PY  - 2026
DA  - 2026/04/24
TI  - Intelligent Question Answering System Based on Multimodal Fusion
BT  - Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
PB  - Atlantis Press
SP  - 329
EP  - 338
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6239-648-7_36
DO  - 10.2991/978-94-6239-648-7_36
ID  - Zhang2026
ER  -