Enhanced Hybrid Sampling Technique for Classifying Unevenly Distributed Data to Forecast Student Performance
- DOI
- 10.2991/978-2-38476-408-2_20
- Keywords
- imbalanced dataset; Random Forest; undersampling; oversampling
- Abstract
Training a classifier on an imbalanced dataset can bias it toward the majority class, raising the risk of information loss for the minority class. An uneven distribution of classes in the collected data is one of the most common obstacles to building successful classification and prediction systems. Furthermore, accuracy alone may not be a good indicator of a classifier’s performance. This study employed SMOTE, SMOTENC, SVM-SMOTE, Tomek Links, Edited Nearest Neighbors, and NearMiss. A Random Forest (RF) classifier is applied with each resampling technique to explore its utility in dealing with the imbalanced data problem and in predicting students’ graduation grades based on their performance. In the first stage, four performance criteria are used to assess the influence of class imbalance on educational data. An RF model was then used to compare the outcomes before and after resampling the data. In terms of balanced accuracy and sensitivity, RF trained on a randomly undersampled dataset outperformed all other techniques. When each performance criterion was examined separately, the findings indicated that the classifier in question performed better in terms of specificity and accuracy. Nonetheless, employing the complete resampled dataset enhanced RF classifier performance in terms of balanced accuracy and sensitivity.
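The abstract's point that accuracy can mislead on imbalanced data can be illustrated with a minimal sketch. It assumes the four criteria are accuracy, balanced accuracy, sensitivity, and specificity (the four measures the abstract names), and it assumes a binary labeling with the minority class as the positive class; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute four common metrics from true and predicted binary labels.

    Hypothetical helper for illustration; the paper's own evaluation
    code and class encoding are not given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Sensitivity: share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were detected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs)
    # Balanced accuracy: mean of sensitivity and specificity.
    balanced_accuracy = (sensitivity + specificity) / 2

    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Toy data (assumed): 95 majority-class samples (0), 5 minority-class samples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the always-majority classifier reaches 0.95 accuracy while its sensitivity is 0.0 and its balanced accuracy is 0.5, which is why balanced accuracy and sensitivity are reported alongside accuracy when classes are unevenly distributed.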
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Mohamed Bellaj
AU - Ahmed Ben Dahmane
PY - 2025
DA - 2025/06/20
TI - Enhanced Hybrid Sampling Technique for Classifying Unevenly Distributed Data to Forecast Student Performance
BT - Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2024)
PB - Atlantis Press
SP - 268
EP - 277
SN - 2667-128X
UR - https://doi.org/10.2991/978-2-38476-408-2_20
DO - 10.2991/978-2-38476-408-2_20
ID - Bellaj2025
ER -