Attention Guided Medical Image Captioning Using ResNet–LSTM
- DOI
- 10.2991/978-94-6463-978-0_29
- Keywords
- Medical image captioning; Encoder–decoder framework; Attention mechanism; Radiology; Deep learning; Natural language processing; Multimodal learning; Knowledge graphs
- Abstract
Medical image captioning is an emerging area that combines computer vision and natural language processing to automatically generate descriptive text for medical images. It holds significant promise for improving clinical documentation, diagnostic accuracy, and decision support. Unlike general image captioning, the medical domain requires precise recognition of anatomical structures, subtle abnormalities, and clinically meaningful interpretations. In this study, we propose a deep learning framework that uses an encoder–decoder architecture with attention mechanisms for medical image captioning. A modified ResNet-50 extracts visual features, and an attention-based LSTM decoder generates natural language descriptions. The system was trained on the ROCOv2 radiology dataset with early stopping, gradient clipping, and multi-GPU optimization to improve training efficiency and stability. Evaluation used standard natural language generation metrics, including BLEU, METEOR, ROUGE, and CIDEr, alongside loss-based performance. The results demonstrate that the proposed approach can produce coherent, accurate, and semantically relevant captions, showing improvement over baseline methods. These findings highlight the potential of medical image captioning systems for assisting radiologists in report generation, supporting clinical education, and enabling content-based retrieval of medical data. The source code and model weights are available at https://github.com/muralikrishnasn/MedicalImageCaption
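To make the role of attention in an encoder–decoder captioner concrete, the sketch below shows one soft-attention step in plain Python: the decoder state scores each image region, the scores are normalized with a softmax, and the context vector is the weighted sum of region features. It is an illustration only, not the authors' implementation. The abstract does not state which scoring function is used, so dot-product scoring is an assumption here, and the names `softmax`, `attend`, `region_features`, and `hidden_state` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Run one soft-attention step.

    region_features: list of feature vectors, one per image region
                     (for example, the cells of a CNN feature map).
    hidden_state:    current decoder state, same length as each feature vector.
    Returns (weights, context).
    """
    # Assumption: dot-product scoring. The paper's scoring function may differ.
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    weights = softmax(scores)

    # Context vector: attention-weighted sum of the region features.
    dim = len(hidden_state)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy input: three regions with two-dimensional features.
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    state = [2.0, 0.0]
    weights, context = attend(regions, state)
    print("attention weights:", weights)
    print("context vector:", context)
```

In a full captioner, the context vector would be fed to the LSTM decoder together with the previous word embedding at every time step, so the weights can shift to different image regions as the caption is generated.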
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - S. N. Muralikrishna
AU  - V. S. Shrishma Rao
AU  - Poornima Shetty
AU  - Aruna Doreen Manezes
PY  - 2025
DA  - 2025/12/31
TI  - Attention Guided Medical Image Captioning Using ResNet–LSTM
BT  - Proceedings of the 1st Engineering Data Analytics and Management Conference (EAMCON 2025)
PB  - Atlantis Press
SP  - 322
EP  - 331
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-978-0_29
DO  - 10.2991/978-94-6463-978-0_29
ID  - Muralikrishna2025
ER  -