An Efficient Hindi Image Captioning with Transformer Model
- DOI
- 10.2991/978-94-6463-738-0_4
- Keywords
- Image Captioning; Transformer; DenseNet121; Data Augmentation; BLEU; pre-trained model
- Abstract
Automatic image captioning in Indian languages like Hindi presents unique challenges due to complex grammar and the lack of large-scale datasets compared to English. This research introduces a new approach to generating Hindi image captions, using a pre-trained DenseNet121 as the encoder and a transformer as the decoder. The model incorporates data augmentation to improve the generalization of the generated captions. We evaluated the model on a benchmark Hindi image captioning dataset. The experimental results show that the proposed model outperforms state-of-the-art models in Hindi image captioning, achieving BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 76.54, 57.41, 36.64, and 20.71, respectively. The model also delivers competitive performance compared to other existing methods.
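To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' released code: part (a) extracts image features with a pre-trained DenseNet121 encoder, and part (b) computes corpus-level BLEU-1 through BLEU-4 for generated Hindi captions using the standard cumulative n-gram weights. The file path handling, ImageNet weights, whitespace tokenization, and example captions are illustrative assumptions; the transformer decoder itself is omitted.

```python
# Sketch of (a) DenseNet121 feature extraction and (b) BLEU-1..4 evaluation.
# Assumptions: ImageNet weights, 224x224 inputs, whitespace-tokenized captions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.densenet import preprocess_input
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# (a) Encoder: DenseNet121 without its classification head; global average
# pooling turns each image into a 1024-dimensional feature vector that a
# transformer decoder could attend over.
encoder = DenseNet121(include_top=False, weights="imagenet", pooling="avg")

def encode_image(path: str) -> np.ndarray:
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])
    return encoder.predict(x, verbose=0)  # shape: (1, 1024)

# (b) Evaluation: cumulative n-gram weights corresponding to BLEU-1..BLEU-4.
BLEU_WEIGHTS = {
    "BLEU-1": (1.0, 0.0, 0.0, 0.0),
    "BLEU-2": (0.5, 0.5, 0.0, 0.0),
    "BLEU-3": (1 / 3, 1 / 3, 1 / 3, 0.0),
    "BLEU-4": (0.25, 0.25, 0.25, 0.25),
}

def bleu_report(references, hypotheses):
    # references: one list of tokenized reference captions per image;
    # hypotheses: one tokenized generated caption per image.
    smooth = SmoothingFunction().method1
    return {
        name: 100.0 * corpus_bleu(references, hypotheses,
                                  weights=w, smoothing_function=smooth)
        for name, w in BLEU_WEIGHTS.items()
    }

if __name__ == "__main__":
    # Hypothetical example with whitespace-tokenized Hindi captions.
    refs = [[["एक", "कुत्ता", "घास", "पर", "दौड़", "रहा", "है"]]]
    hyps = [["एक", "कुत्ता", "घास", "में", "दौड़", "रहा", "है"]]
    print(bleu_report(refs, hyps))
```

In practice the reported scores would be computed over the full test split of the Hindi captioning dataset rather than a single example, and the tokenizer should match the one used during training.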
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
- Cite this article
TY - CONF
AU - Ashutosh Chandra Bhensle
AU - Jyoti Prakash Patra
AU - Sumitra Samal
PY - 2025
DA - 2025/06/22
TI - An Efficient Hindi Image Captioning with Transformer Model
BT - Proceedings of the International Conference on Advances and Applications in Artificial Intelligence (ICAAAI 2025)
PB - Atlantis Press
SP - 32
EP - 43
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6463-738-0_4
DO - 10.2991/978-94-6463-738-0_4
ID - Bhensle2025
ER -