Intelligent Question Answering System Based on Multimodal Fusion

Mengcong Zhang

doi:10.2991/978-94-6239-648-7_36

Authors

Mengcong Zhang¹,*

¹College of Engineering, Environment and Science, Coventry University, Coventry, United Kingdom

*Corresponding author. Email: zhaom18@uni.coventry.ac.uk

Available Online 24 April 2026.

DOI: 10.2991/978-94-6239-648-7_36
Keywords: Multimodal Learning; Image-text Intelligent Question Answering; Data Fusion; Shared Representation Learning; Collaborative Learning
Abstract: This article examines the application of multimodal learning to intelligent question answering, emphasizing the importance of integrating data from multiple modalities in order to fully understand complex scenarios. It reviews the development of intelligent question answering systems, from the early Turing test to modern models such as BERT and GPT-3, and discusses key multimodal learning techniques in detail: data fusion, shared representation learning, and collaborative learning. Case studies of Large Language and Vision Assistant (LLaVA), Visual Language Environment, and JBoltAI illustrate specific applications of multimodal learning in intelligent question answering systems. Finally, the article outlines future development directions, including intelligent agents, personalization, and neural-symbolic fusion reasoning, which are expected to extend multimodal learning to more fields and provide users with smarter and more convenient services. Multimodal learning overcomes the information bottleneck of a single modality, gives machines ‘synaesthesia’ capabilities closer to those of humans, and supports explainable decisions in high-trust scenarios such as healthcare, finance, and justice. It also lowers the barrier to entry for AI services and promotes industrial transformation.
Copyright: © 2026 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
Series: Advances in Computer Science Research
Publication Date: 24 April 2026
ISBN: 978-94-6239-648-7
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Mengcong Zhang
PY  - 2026
DA  - 2026/04/24
TI  - Intelligent Question Answering System Based on Multimodal Fusion
BT  - Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
PB  - Atlantis Press
SP  - 329
EP  - 338
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6239-648-7_36
DO  - 10.2991/978-94-6239-648-7_36
ID  - Zhang2026
ER  -