Proceedings of the International Conference on Smart Systems and Social Management (ICSSSM-2 2025)

Enhancing Gender Prediction in Consolidated College Admission Records using Machine Learning Models through Data Transformation

Authors
Tripti Barnwal1, Dillip Rout2, *, Abinas Panda3, B. Ravi Kiran Patnaik4, Deepshikha Routray5
1Dept. of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, India
2Dept. of Computer Science and Engineering, Royal Global University, Guwahati, India
3School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT) Deemed to be University, Bhubaneswar, India
4Dept. of Computer Science and Engineering, The ICFAI University, Raipur, India
5P.G. Dept. of English, S.C.S. Autonomous College, Puri, India
*Corresponding author. Email: dillip.rout.iitb@gmail.com
Corresponding Author
Dillip Rout
Available Online 31 December 2025.
DOI
10.2991/978-2-38476-533-1_56
Keywords
Textual classification; Random forest; CNN; Class imbalance; Data normalization
Abstract

This study presents an extensive analysis of gender classification in college admission data. A wide range of algorithms is investigated for suitability for this prediction task: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), K-Nearest Neighbor (KNN), XGBoost (XGB), and AdaBoost (ADB). Both traditional Machine Learning (ML) and Deep Learning (DL) models are thus considered for a comprehensive comparison. Data preprocessing techniques such as data profiling, scaling, one-hot encoding, and class-imbalance handling are emphasized to enhance model performance. The influence of data preparation is assessed through an ablation study that evaluates the models with and without data transformation. Furthermore, 5-fold cross-validation is employed to reduce possible bias in the estimated model performance. Each model is also evaluated with varying test sizes (20%, 30%, and 40%) under multiple classification metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC score, and Cross-Validation (CV) score. The results indicate that data transformation yields a significant improvement in performance. All the ensemble models (RF, XGB, and ADB) demonstrate high CV scores (more than 70%). Nevertheless, RF stands out as the most suitable in this context due to its high scores with transformed data, a CV score of more than 80%, and consistency under all conditions, including varying test sizes and several metrics. The findings are expected to provide valuable insights for future gender classification in education data and beyond.
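
The preprocessing steps named in the abstract (one-hot encoding, scaling, class-imbalance handling) and the 5-fold split can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the helper names, the column names `branch` and `marks`, the toy records, and the choice of random oversampling as the imbalance remedy are all assumptions made for illustration.

```python
import random
from collections import Counter

def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(v - lo) / span for v in values]

def oversample(rows, labels, rng):
    """Randomly duplicate minority-class rows until all classes are equal in size."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy records (hypothetical; not taken from the paper's dataset).
rng = random.Random(0)
branch = ["CSE", "ECE", "CSE", "MECH", "ECE", "CSE", "MECH", "ECE", "CSE", "ECE"]
marks = [72.0, 85.0, 64.0, 90.0, 78.0, 55.0, 81.0, 69.0, 88.0, 60.0]
gender = ["M", "M", "F", "M", "M", "F", "M", "M", "F", "M"]

# Transformation: one-hot encode the categorical column, scale the numeric one.
features = [oh + [m] for oh, m in zip(one_hot(branch), min_max_scale(marks))]

# 5-fold split; oversampling is applied to the training part of each fold only.
folds = k_fold_indices(len(features), 5, rng)
train_class_counts = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(features)) if i not in held_out]
    X_tr, y_tr = oversample([features[i] for i in train_idx],
                            [gender[i] for i in train_idx], rng)
    train_class_counts.append(Counter(y_tr))
```

Rebalancing only the training portion of each fold keeps duplicated rows out of the held-out portion, so the fold score is not inflated by leakage. An ablation in the spirit of the abstract would fit the same model once on the raw columns and once on `features` and compare the fold scores.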

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Smart Systems and Social Management (ICSSSM-2 2025)
Series
Advances in Social Science, Education and Humanities Research
Publication Date
31 December 2025
ISBN
978-2-38476-533-1
ISSN
2352-5398

Cite this article

TY  - CONF
AU  - Tripti Barnwal
AU  - Dillip Rout
AU  - Abinas Panda
AU  - B. Ravi Kiran Patnaik
AU  - Deepshikha Routray
PY  - 2025
DA  - 2025/12/31
TI  - Enhancing Gender Prediction in Consolidated College Admission Records using Machine Learning Models through Data Transformation
BT  - Proceedings of the International Conference on Smart Systems and Social Management (ICSSSM-2 2025)
PB  - Atlantis Press
SP  - 940
EP  - 959
SN  - 2352-5398
UR  - https://doi.org/10.2991/978-2-38476-533-1_56
DO  - 10.2991/978-2-38476-533-1_56
ID  - Barnwal2025
ER  -