Proceedings of the International Conference on Advances and Applications in Artificial Intelligence (ICAAAI 2025)

An Efficient Hindi Image Captioning with Transformer Model

Authors
Ashutosh Chandra Bhensle¹,*, Jyoti Prakash Patra², Sumitra Samal³
¹Computer Science and Engineering, Govt. Girls Polytechnic, Raipur, India
²Computer Science Engineering, University Teaching Department, Chhattisgarh Swami Vivekanand Technical University, Bhilai, India
³Rungta College of Engineering and Technology, Bhilai, India
*Corresponding author. Email: bhensle.ashu@gmail.com
Available Online 22 June 2025.
DOI
10.2991/978-94-6463-738-0_4
Keywords
Image Captioning; Transformer; DenseNet121; Data Augmentation; BLEU; pre-trained model
Abstract

Automatic image captioning in Indian languages such as Hindi is challenging because of the language's complex grammar and the lack of large-scale datasets comparable to those available for English. This paper proposes a Hindi image captioning model that uses a pre-trained DenseNet121 as the encoder and a Transformer as the decoder, and applies data augmentation to improve the generalization of the generated captions. We evaluated the model on a benchmark Hindi image captioning dataset. The experimental results show that the proposed model outperforms state-of-the-art models for Hindi paragraph captioning, achieving BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 76.54, 57.41, 36.64, and 20.71, respectively, and that it performs competitively against other existing methods.
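The abstract reports BLEU-1 through BLEU-4 on a 0 to 100 scale but does not state which BLEU variant or toolkit was used. As a minimal sketch, assuming the common cumulative corpus-level variant (uniform weights over the 1..n-gram precisions, clipped n-gram counts, a brevity penalty, and no smoothing, similar in spirit to the weighted cumulative scores produced by NLTK's `corpus_bleu`), the metric can be computed as below. The function names `ngram_counts` and `corpus_bleu` and the toy Hindi captions are illustrative assumptions, not taken from the paper or its dataset.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Cumulative corpus-level BLEU-max_n in [0, 1], uniform weights, no smoothing.

    Assumed variant for illustration; the paper does not specify its BLEU setup.
    references: one list of reference token lists per image
    hypotheses: one generated token list per image
    """
    matches = [0] * max_n  # clipped n-gram matches for orders 1..max_n
    totals = [0] * max_n   # hypothesis n-gram counts for orders 1..max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to hyp
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            # Max count of each n-gram in any single reference (Counter | keeps the max)
            ref_max = Counter()
            for r in refs:
                ref_max |= ngram_counts(r, n)
            matches[n - 1] += sum(min(c, ref_max[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(matches) == 0:
        return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
    # Geometric mean of the modified precisions p_1..p_max_n
    log_mean = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_mean)


if __name__ == "__main__":
    # Toy tokenized captions (illustrative only, not from the paper's dataset)
    refs = [[["एक", "कुत्ता", "घास", "पर", "दौड़", "रहा", "है"]]]
    hyps = [["एक", "कुत्ता", "घास", "पर", "खेल", "रहा", "है"]]
    for n in range(1, 5):
        # Scale by 100 to match the 0-100 scale used in the abstract
        print(f"BLEU-{n}: {100 * corpus_bleu(refs, hyps, n):.2f}")
```

For this toy pair, BLEU-1 is 6/7 (six of the seven generated words appear in the reference) and BLEU-4 is (2/35)^(1/4), which shows why the scores fall as n grows, as they do in the reported results. Published numbers also depend on tokenization, smoothing, and whether cumulative or individual n-gram scores are reported, so this sketch is not a reproduction of the paper's evaluation.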

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Advances and Applications in Artificial Intelligence (ICAAAI 2025)
Series
Advances in Intelligent Systems Research
Publication Date
22 June 2025
ISBN
978-94-6463-738-0
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Ashutosh Chandra Bhensle
AU  - Jyoti Prakash Patra
AU  - Sumitra Samal
PY  - 2025
DA  - 2025/06/22
TI  - An Efficient Hindi Image Captioning with Transformer Model
BT  - Proceedings of the International Conference on Advances and Applications in Artificial Intelligence (ICAAAI 2025)
PB  - Atlantis Press
SP  - 32
EP  - 43
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-738-0_4
DO  - 10.2991/978-94-6463-738-0_4
ID  - Bhensle2025
ER  -