A Comparative Study on the Effectiveness of Data Augmentation Techniques for Cervical Cancer Detection
- DOI
- 10.2991/978-94-6239-664-7_19
- Keywords
- cervical cancer; data augmentation; class imbalance; Forest-Diffusion; TabNet
- Abstract
This paper presents a comparative study of traditional and generative data augmentation techniques for cervical cancer detection using the UCI Cervical Cancer Risk Factors dataset. Conventional oversampling methods (SMOTE and ADASYN) are evaluated alongside generative approaches: the diffusion-based Forest-Diffusion and the adversarial CTGAN. Three machine learning classifiers (XGBoost, CatBoost, and TabNet) are used to assess predictive performance in terms of accuracy, precision, recall, F1-score, and AUC. Experimental results show that the optimal augmentation strategy varies across diagnostic targets, with TabNet consistently outperforming the gradient boosting methods. Specifically, the Forest-Diffusion–TabNet model achieved a 42.5% improvement in F1-score for Hinselmann prediction, while the SMOTE–TabNet model yielded increases of 112.12% and 134.7% for Schiller and Cytology predictions, respectively. Furthermore, the ADASYN–TabNet model improved Biopsy prediction performance by 57.89%. Ten-fold cross-validation confirmed model stability, although severe class imbalance continues to limit performance on imbalanced test sets. These findings indicate that distinct augmentation methods capture complementary data characteristics, underscoring the potential of tailored augmentation strategies for robust cervical cancer screening.
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - All-Marufi Rahaman Sajon
AU - Sadia Parvin Ripa
AU - Sadat Iqbal Priom
AU - Shamima Afrin Sweety
PY - 2026
DA - 2026/06/08
TI - A Comparative Study on the Effectiveness of Data Augmentation Techniques for Cervical Cancer Detection
BT - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB - Atlantis Press
SP - 250
EP - 266
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6239-664-7_19
DO - 10.2991/978-94-6239-664-7_19
ID - Sajon2026
ER -