Enabling Multimodal Understanding: Lidar Data Meets VQA
- DOI
- 10.2991/978-94-6463-784-7_11
- Keywords
- LiDAR; Visual Question Answering (VQA); VQA Applications; Multimodality; Computer Vision; Natural Language Processing
- Abstract
This chapter explores the integration of Light Detection and Ranging (LiDAR) data with multimodal systems such as Visual Question Answering (VQA) to enable robust contextual understanding. It begins with an overview of VQA, including its architecture, common modeling approaches, and practical applications. It then introduces LiDAR data, particularly point clouds, and highlights how this 3D information enhances traditional visual input in machine learning tasks. Special attention is given to recent efforts to combine point cloud data with VQA, examining relevant datasets, deep learning models, and fusion techniques. The chapter concludes by outlining current limitations and future directions for advancing LiDAR-based multimodal understanding in VQA.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
- Cite this article
TY - CONF
AU - Muhammad Zeeshan Khan
AU - Anuroop Gaddam
AU - Dhananjay Thiruvady
AU - N. K. Suryadevara
PY - 2025
DA - 2025/07/28
TI - Enabling Multimodal Understanding: Lidar Data Meets VQA
BT - Proceedings of the IoT AND LiDAR Technologies in Healthcare Workshop (ILTH 2024)
PB - Atlantis Press
SP - 110
EP - 127
SN - 2589-4919
UR - https://doi.org/10.2991/978-94-6463-784-7_11
DO - 10.2991/978-94-6463-784-7_11
ID - Khan2025
ER -