Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)

Attention Guided Medical Image Captioning Using ResNet–LSTM

Authors
S. N. Muralikrishna (1), V. S. Shrishma Rao (2, *), Poornima Shetty (2), Aruna Doreen Manezes (2)
(1) Manipal Institute of Technology, Manipal Academy of Higher Education (MAHE), Manipal, India
(2) Directorate of Online Education, Manipal Academy of Higher Education (MAHE), Manipal, India
*Corresponding author. Email: shrishma.rao@manipal.edu
Corresponding Author
V. S. Shrishma Rao
Available Online 31 December 2025.
DOI
10.2991/978-94-6463-978-0_29
Keywords
Medical image captioning; Encoder–decoder framework; Attention mechanism; Radiology; Deep learning; Natural language processing; Multimodal learning; Knowledge graphs
Abstract

Medical image captioning is an emerging area that combines computer vision and natural language processing to automatically generate descriptive text for medical images. It holds significant promise for improving clinical documentation, diagnostic accuracy, and decision support. Unlike general image captioning, the medical domain requires precise recognition of anatomical structures, subtle abnormalities, and clinically meaningful interpretations. In this study, we propose a deep learning framework for medical image captioning that employs an encoder–decoder architecture with attention mechanisms. A modified ResNet-50 extracts visual features, and an attention-based LSTM decoder generates natural language descriptions. The system was trained on the ROCOv2 radiology dataset with early stopping, gradient clipping, and multi-GPU optimization to enhance efficiency and stability. Evaluation used standard natural language generation metrics, including BLEU, METEOR, ROUGE, and CIDEr, alongside loss-based performance. The results demonstrate that the proposed approach can produce coherent, accurate, and semantically relevant captions, showing improvement over baseline methods. These findings highlight the potential of medical image captioning systems for assisting radiologists in report generation, supporting clinical education, and enabling content-based retrieval of medical data. The source code and the model weights are available at https://github.com/muralikrishnasn/MedicalImageCaption
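
To illustrate what one attention step in such a decoder might compute, the following is a minimal pure-Python sketch, not the authors' implementation. The abstract does not state the scoring function, so dot-product scoring is an assumption here, and the names `softmax`, `attend`, `regions`, and `state` as well as the toy values are hypothetical. At each decoding step the region features from the image encoder are scored against the current decoder state, the scores are normalized into weights, and the weighted sum forms the context vector passed to the LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of region vectors produced by the image encoder
    hidden_state: current decoder state, same length as each region vector
    """
    # Score each region by its dot product with the decoder state
    # (assumed scoring function; the paper may use a different one).
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted sum of the region vectors.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: 3 image regions with 2-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy case the first and third regions align with the decoder state and receive equal, larger weights than the second, so the context vector is dominated by them. In a real model the region features would come from the ResNet-50 feature map and the scoring would use learned parameters.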

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)
Series
Advances in Engineering Research
Publication Date
31 December 2025
ISBN
978-94-6463-978-0
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - S. N. Muralikrishna
AU  - V. S. Shrishma Rao
AU  - Poornima Shetty
AU  - Aruna Doreen Manezes
PY  - 2025
DA  - 2025/12/31
TI  - Attention Guided Medical Image Captioning Using ResNet–LSTM
BT  - Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)
PB  - Atlantis Press
SP  - 322
EP  - 331
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-978-0_29
DO  - 10.2991/978-94-6463-978-0_29
ID  - Muralikrishna2025
ER  -