Enhancing Gender Prediction in Consolidated College Admission Records using Machine Learning Models through Data Transformation
- DOI
- 10.2991/978-2-38476-533-1_56
- Keywords
- Textual classification; Random forest; CNN; Class imbalance; Data normalization
- Abstract
This study presents an extensive analysis of gender classification in college admission data. A wide range of algorithms, namely Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), K-Nearest Neighbor (KNN), XGBoost (XGB), and AdaBoost (ADB), are investigated for their suitability for this prediction task. Both traditional Machine Learning (ML) and Deep Learning (DL) models are thus included for a comprehensive comparison. Data preprocessing techniques such as data profiling, scaling, one-hot encoding, and class-imbalance handling are emphasized as means of enhancing model performance. The influence of data preparation is assessed through an ablation study that evaluates each model without and with data transformation. Furthermore, 5-fold cross-validation is employed to reduce possible bias in the performance estimates. Each model is also evaluated at several test sizes (20%, 30%, and 40%) under multiple classification metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC score, and cross-validation (CV) score. The results indicate that data transformation yields a significant improvement in performance. All three ensemble models (RF, XGB, and ADB) achieve CV scores above 70%. Nevertheless, RF stands out as the most suitable model in this context owing to its high scores on transformed data, a CV score above 80%, and its consistency under all conditions, including varying test sizes and several metrics. The findings will provide valuable insights for future gender classification in education data and beyond.
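The abstract names the preprocessing steps (scaling, one-hot encoding, class-imbalance handling) and the 5-fold evaluation protocol, but not their exact settings. The following is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation: min-max scaling and random oversampling stand in for whichever scaler and imbalance technique the study actually used, and the record fields (stream, marks, gender) and all function names are hypothetical. The design point it illustrates is that every transformation is fitted on the training fold only, and oversampling is applied to the training fold only, so no information from the held-out fold leaks into fitting.

```python
import random
from collections import Counter

# Hypothetical record layout: (stream, marks, gender). Not taken from the study's dataset.


def fit_one_hot(values):
    """Learn the category vocabulary from training data only (sorted for determinism)."""
    return sorted(set(values))


def encode_one_hot(value, categories):
    """0/1 indicator vector; a category unseen during fitting maps to all zeros."""
    return [1 if value == c else 0 for c in categories]


def fit_min_max(column):
    """Learn the min and max of a numeric column from training data only."""
    return min(column), max(column)


def scale_min_max(x, lo, hi):
    """Map the training range onto [0, 1]; values outside it (e.g. in a test fold) land outside [0, 1].
    A constant training column maps to 0.0."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0


def random_oversample(rows, labels, seed=0):
    """Assumed imbalance fix: duplicate random minority-class rows until all classes match the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def transform_fold(records, train_idx, test_idx):
    """Fit encoders on the training fold, apply them to both folds, oversample the training fold only."""
    train = [records[i] for i in train_idx]
    test = [records[i] for i in test_idx]
    cats = fit_one_hot([r[0] for r in train])
    lo, hi = fit_min_max([r[1] for r in train])

    def featurize(r):
        return encode_one_hot(r[0], cats) + [scale_min_max(r[1], lo, hi)]

    X_train, y_train = random_oversample([featurize(r) for r in train], [r[2] for r in train])
    X_test = [featurize(r) for r in test]
    y_test = [r[2] for r in test]
    return X_train, y_train, X_test, y_test


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```

For the ablation described above, the "without transformation" arm would simply skip `transform_fold` and feed raw features to the model, while the "with transformation" arm calls it once per fold. In a production setting, scikit-learn (`Pipeline`, `OneHotEncoder`, `MinMaxScaler`, `StratifiedKFold`) and imbalanced-learn (`RandomOverSampler`) provide these steps; ROC-AUC is omitted here because it requires predicted scores rather than hard labels.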
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Tripti Barnwal
AU - Dillip Rout
AU - Abinas Panda
AU - B. Ravi Kiran Patnaik
AU - Deepshikha Routray
PY - 2025
DA - 2025/12/31
TI - Enhancing Gender Prediction in Consolidated College Admission Records using Machine Learning Models through Data Transformation
BT - Proceedings of the International Conference on Smart Systems and Social Management (ICSSSM-2 2025)
PB - Atlantis Press
SP - 940
EP - 959
SN - 2352-5398
UR - https://doi.org/10.2991/978-2-38476-533-1_56
DO - 10.2991/978-2-38476-533-1_56
ID - Barnwal2025
ER -