Enhanced Feature Representation for Otoscopic Image Classification Using Supervised Contrastive Learning
- DOI
- 10.2991/978-94-6239-668-5_31
- Keywords
- otoscopy; deep learning; contrastive learning; fine-tune; supervised learning
- Abstract
Accurate interpretation of otoscopic images is essential for the early detection of middle-ear disorders, yet diagnostic variability among clinicians and the scarcity of large, annotated datasets make automated screening a challenging task. Traditional deep learning methods, typically based on fine-tuning convolutional networks with cross-entropy loss, may not fully capture inter-class relationships in small datasets. This study investigates whether supervised contrastive learning can produce more discriminative feature representations for otoscopic image classification than conventional fine-tuning. Using the publicly available Eardrum Dataset, images were reorganized into two categories, normal (534 images) and abnormal (391 images), and split into training (70%), validation (15%), and test (15%) sets using stratified sampling. Two pipelines were compared: (1) a baseline model that fine-tunes a pretrained ResNet-50 with cross-entropy loss, and (2) a supervised contrastive learning approach, in which a ResNet-50 encoder was trained to minimize intra-class distance and maximize inter-class separation. Contrastive pairs were generated via strong augmentations including random cropping, color jitter, grayscale conversion, flipping, and erasing. After training, the encoder was frozen, and a linear classifier was trained on the learned embeddings. A temperature ablation (τ = 0.03, 0.07, 0.10, 0.20) was performed to assess sensitivity. The contrastive learning model achieved superior performance, reaching 85.82% accuracy and an 85.36% F1-score, compared to the baseline’s 82.27% accuracy and 81.43% F1-score. The improvement was most evident in detecting abnormal cases. These findings demonstrate that a supervised contrastive learning stage can enhance diagnostic reliability in otoscopic image classification, particularly under limited and imbalanced data conditions.
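The abstract does not state the exact loss formulation, so the following is only a minimal sketch of a supervised contrastive loss in its commonly used form, assuming L2-normalised embeddings and a single temperature τ. The function name `supcon_loss` and the toy embeddings are hypothetical and are not taken from the paper; a real pipeline would compute this on the ResNet-50 encoder outputs of augmented views.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over one batch (illustrative sketch only)."""
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other samples in the batch that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Log-softmax over all other samples; the anchor itself is excluded
    # from the denominator. Subtracting the row maximum aids stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    # Mean log-probability of the positives, for anchors that have any.
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[has_pos] / pos_counts[has_pos]
    return float(-mean_log_prob_pos.mean())

# Toy 2-D embeddings: the first two and the last two points lie close together.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
for tau in (0.03, 0.07, 0.10, 0.20):  # temperatures listed in the ablation
    print(tau, supcon_loss(toy, [0, 0, 1, 1], temperature=tau))
```

Lower values of τ sharpen the softmax, so hard negatives weigh more heavily in the loss; this is the sensitivity that a temperature ablation of this kind probes.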
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Mesut Şeker
PY - 2026
DA - 2026/05/14
TI - Enhanced Feature Representation for Otoscopic Image Classification Using Supervised Contrastive Learning
BT - Proceedings of the International Conference on Current Problems in Engineering and Applied Sciences (ICCPEAS 2025)
PB - Atlantis Press
SP - 291
EP - 298
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6239-668-5_31
DO - 10.2991/978-94-6239-668-5_31
ID - Şeker2026
ER -