Intelligent Question Answering System Based on Multimodal Fusion
- DOI
- 10.2991/978-94-6239-648-7_36
- Keywords
- Multimodal Learning; Image-text Intelligent Question Answering; Data Fusion; Shared Representation Learning; Collaborative Learning
- Abstract
This article explores the application of multimodal learning to intelligent question answering, emphasizing that data from multiple modalities must be integrated to understand complex scenarios fully. It reviews the development of intelligent question answering systems, from the early Turing test to modern models such as BERT and GPT-3, and discusses in detail the key multimodal learning technologies: data fusion, shared representation learning, and collaborative learning. Case studies of Large Language and Vision Assistant (LLaVA), Visual Language Environment, and JBoltAI illustrate specific applications of multimodal learning in intelligent question answering systems. Finally, the article outlines future development directions, including the evolution toward intelligent agents, personalization, and reasoning that fuses neural and symbolic methods. These directions are expected to extend multimodal learning technology to more fields and to provide users with smarter and more convenient services. Multimodal learning overcomes the information bottleneck of a single modality, giving machines ‘synaesthesia’ capabilities closer to those of humans and supporting explainable decisions in high-trust scenarios such as healthcare, finance, and justice. It also lowers the barrier to entry for AI services and promotes industrial transformation.
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Mengcong Zhang
PY  - 2026
DA  - 2026/04/24
TI  - Intelligent Question Answering System Based on Multimodal Fusion
BT  - Proceedings of the International Workshop on Advances in Deep Learning for Image Analysis and Computer Vision (IWADIC 2025)
PB  - Atlantis Press
SP  - 329
EP  - 338
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6239-648-7_36
DO  - 10.2991/978-94-6239-648-7_36
ID  - Zhang2026
ER  -