Multimodal NLP and Artificial Intelligence: Cross-Media Information Understanding and Generation

Yicheng Liu

doi:10.2991/978-2-38476-327-6_24


Authors

Yicheng Liu¹,*

¹National University of Singapore, Singapore, Singapore

*Corresponding author. Email: 1280538273@qq.com

Available Online 17 December 2024.

DOI: 10.2991/978-2-38476-327-6_24
Keywords: Multimodal NLP; Artificial Intelligence; Cross-media
Abstract: With the rapid development of artificial intelligence, multimodal natural language processing (NLP) has become a crucial field. This paper explores the significance and applications of multimodal NLP in cross-media information understanding and generation. By integrating multiple modalities such as text, images, audio, and video, multimodal NLP aims to make language understanding and generation more accurate and more comprehensive. The paper discusses techniques and models used in multimodal NLP, including deep learning architectures and attention mechanisms, and examines the field's challenges and future directions, highlighting its potential to improve human-computer interaction and intelligent applications. Through case studies and experimental results, the paper demonstrates the effectiveness of multimodal NLP in tasks such as image captioning, video description generation, and cross-modal retrieval. Overall, multimodal NLP holds great promise for advancing the capabilities of artificial intelligence and for enabling more natural, seamless interaction between humans and machines.
Copyright: © 2024 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
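The abstract names attention mechanisms as a core technique for combining modalities but gives no implementation details. The following is a minimal, generic sketch of scaled dot-product cross-attention, assuming that text token embeddings act as queries and image region features act as keys and values. It is not the author's model; the function names and the toy 2-d embeddings are hypothetical and chosen only to make the mechanism visible.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention where each text query attends over image features.

    Returns (context vectors, attention weights), one of each per query.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))  # 1/sqrt(d_k) scaling
    contexts, weights = [], []
    for q in queries:
        # Similarity of this text token to every image region, then normalize.
        w = softmax([dot(q, k) * scale for k in keys])
        # Context vector: attention-weighted sum of the image region values.
        ctx = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


# Hypothetical toy embeddings: two text tokens, three image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
contexts, weights = cross_attention(text_tokens, image_regions, image_regions)
for i, row in enumerate(weights):
    print(f"token {i}: " + ", ".join(f"{w:.3f}" for w in row))
```

Each text token places the largest weight on the image region most similar to it, so its context vector is dominated by that region's features. Real systems would use learned projections and batched tensor operations rather than Python lists.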

Volume Title: Proceedings of the 2024 4th International Conference on Social Development and Media Communication (SDMC 2024)
Series: Advances in Social Science, Education and Humanities Research
Publication Date: 17 December 2024
ISBN: 978-2-38476-327-6
ISSN: 2352-5398
Cite this article

TY  - CONF
AU  - Yicheng Liu
PY  - 2024
DA  - 2024/12/17
TI  - Multimodal NLP and Artificial Intelligence: Cross-Media Information Understanding and Generation
BT  - Proceedings of the 2024 4th International Conference on Social Development and Media Communication (SDMC 2024)
PB  - Atlantis Press
SP  - 202
EP  - 209
SN  - 2352-5398
UR  - https://doi.org/10.2991/978-2-38476-327-6_24
DO  - 10.2991/978-2-38476-327-6_24
ID  - Liu2024
ER  -