Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)

A Comparative Study on the Effectiveness of Data Augmentation Techniques for Cervical Cancer Detection

Authors
All-Marufi Rahaman Sajon1, *, Sadia Parvin Ripa2, Sadat Iqbal Priom2, Shamima Afrin Sweety3
1University of Dhaka, Dhaka, 1000, Bangladesh
2East West University, Dhaka, 1212, Bangladesh
3Shaheed Suhrawardy Medical College Hospital, Dhaka, 1207, Bangladesh
*Corresponding author. Email: allmarufirahaman-2019415666@devs.du.ac.bd
Corresponding Author
All-Marufi Rahaman Sajon
Available Online 8 June 2026.
DOI
10.2991/978-94-6239-664-7_19
Keywords
cervical cancer; data augmentation; class imbalance; Forest-Diffusion; TabNet
Abstract

This paper presents a comparative study of traditional and generative data augmentation techniques for cervical cancer detection using the UCI Cervical Cancer Risk Factors dataset. Conventional oversampling methods, namely SMOTE and ADASYN, are evaluated alongside generative approaches, including diffusion-based (Forest-Diffusion) and adversarial (CTGAN) models. Three machine learning classifiers, namely XGBoost, CatBoost, and TabNet, are used to assess predictive performance in terms of accuracy, precision, recall, F1-score, and AUC. Experimental results show that the optimal augmentation strategy varies across diagnostic targets, with TabNet consistently outperforming the gradient boosting methods. Specifically, the Forest-Diffusion–TabNet model achieved a 42.5 percent improvement in F1-score for Hinselmann prediction, while the SMOTE–TabNet model yielded increases of 112.12 percent and 134.7 percent for Schiller and Cytology predictions, respectively. Furthermore, the ADASYN–TabNet model improved the F1-score for Biopsy prediction by 57.89 percent. Ten-fold cross-validation confirmed model stability, though severe class imbalance continues to limit performance on imbalanced test sets. These findings indicate that distinct augmentation methods capture complementary data characteristics, underscoring the potential of tailored augmentation strategies for robust cervical cancer screening.
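
As an illustration of the conventional oversampling mentioned in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority row is placed at a random point on the line segment between a minority row and one of its k nearest minority neighbours. It is not the authors' implementation. The function name smote_oversample, its parameters, and the toy data are hypothetical and chosen only for illustration, and the sketch assumes purely numeric features. Oversampling is applied to the training portion only, so that any held-out test set keeps its original class imbalance.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class rows.

    minority: list of equal-length numeric feature lists (minority class only).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # New point lies at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy training fold: 3 positive rows and 9 negative rows.
train_pos = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
train_neg = [[float(i), float(i % 3)] for i in range(5, 14)]

# Generate just enough synthetic positives to balance the two classes.
new_pos = smote_oversample(train_pos, n_new=len(train_neg) - len(train_pos), k=2)
balanced_pos = train_pos + new_pos
print(len(balanced_pos), len(train_neg))  # prints: 9 9
```

Because every synthetic row is a convex combination of two real minority rows, the generated points stay inside the region already spanned by the minority class. ADASYN follows the same interpolation scheme but generates more points around minority rows that are harder to classify.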

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
8 June 2026
ISBN
978-94-6239-664-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - All-Marufi Rahaman Sajon
AU  - Sadia Parvin Ripa
AU  - Sadat Iqbal Priom
AU  - Shamima Afrin Sweety
PY  - 2026
DA  - 2026/06/08
TI  - A Comparative Study on the Effectiveness of Data Augmentation Techniques for Cervical Cancer Detection
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 250
EP  - 266
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_19
DO  - 10.2991/978-94-6239-664-7_19
ID  - Sajon2026
ER  -