Proceedings of the 2024 4th International Conference on Social Development and Media Communication (SDMC 2024)

Multimodal NLP and Artificial Intelligence: Cross-Media Information Understanding and Generation

Authors
Yicheng Liu¹,*
¹National University of Singapore, Singapore, Singapore
*Corresponding author. Email: 1280538273@qq.com
Corresponding Author
Yicheng Liu
Available Online 17 December 2024.
DOI
10.2991/978-2-38476-327-6_24
Keywords
Multimodal NLP; Artificial Intelligence; Cross-media
Abstract

Multimodal natural language processing (NLP) has emerged as a crucial field amid the rapid development of artificial intelligence. This paper explores the significance and applications of multimodal NLP in cross-media information understanding and generation. By integrating modalities such as text, images, audio, and video, multimodal NLP aims to make language understanding and generation more accurate and comprehensive. The paper discusses the techniques and models used in multimodal NLP, including deep learning architectures and attention mechanisms, and examines the field's challenges and future directions, highlighting the potential for improved human-computer interaction and intelligent applications. Through case studies and experimental results, it demonstrates the effectiveness of multimodal NLP in tasks such as image captioning, video description generation, and cross-modal retrieval. Overall, multimodal NLP holds great promise for advancing the capabilities of artificial intelligence and enabling more natural, seamless interaction between humans and machines.

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2024 4th International Conference on Social Development and Media Communication (SDMC 2024)
Series
Advances in Social Science, Education and Humanities Research
Publication Date
17 December 2024
ISBN
978-2-38476-327-6
ISSN
2352-5398

Cite this article

TY  - CONF
AU  - Yicheng Liu
PY  - 2024
DA  - 2024/12/17
TI  - Multimodal NLP and Artificial Intelligence: Cross-Media Information Understanding and Generation
BT  - Proceedings of the 2024 4th International Conference on Social Development and Media Communication (SDMC 2024)
PB  - Atlantis Press
SP  - 202
EP  - 209
SN  - 2352-5398
UR  - https://doi.org/10.2991/978-2-38476-327-6_24
DO  - 10.2991/978-2-38476-327-6_24
ID  - Liu2024
ER  -