Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2024)

Enhanced Hybrid Sampling Technique for Classifying Unevenly Distributed Data to Forecast Student Performance

Authors
Mohamed Bellaj1, *, Ahmed Ben Dahmane1
1Abdelmalek Essadi University, High Normal School, Tetouan, Morocco
*Corresponding author. Email: mohamed.bellaj@etu.uae.ac.ma
Corresponding Author
Mohamed Bellaj
Available Online 20 June 2025.
DOI
10.2991/978-2-38476-408-2_20
Keywords
imbalanced dataset; Random Forest; undersampling; oversampling
Abstract

Training on an imbalanced dataset may cause classifiers to overfit the majority class, raising the risk of information loss for the minority class. An uneven distribution of classes in the collected data is one of the most common obstacles to building successful classification and prediction systems. Furthermore, accuracy may not be a good indicator of a classifier’s performance on such data. This study employed six resampling techniques: SMOTE, SMOTENC, SVM-SMOTE, Tomek Links, Edited Nearest Neighbors, and NearMiss. A Random Forest (RF) classifier was trained with each resampling technique to explore its utility in dealing with the imbalanced-data problem and in predicting students’ graduation grades based on their performance. In the first stage, four performance criteria were used to assess the influence of class imbalance on educational data. The RF model was then used to compare outcomes before and after resampling the data. In terms of balanced accuracy and sensitivity, RF trained on a randomly undersampled dataset outperformed all other techniques. When each performance criterion was studied separately, the findings indicated that the classifier performed better in terms of specificity and accuracy. Nonetheless, employing the complete resampled dataset enhanced RF classifier performance in terms of balanced accuracy and sensitivity.
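
To illustrate why accuracy alone can mislead on imbalanced data, the following minimal sketch computes the four criteria the abstract mentions (accuracy, sensitivity, specificity, balanced accuracy) and applies a simple random undersampling step. It is not the authors' implementation: the binary setting, the function names binary_metrics and random_undersample, and the toy labels are assumptions made purely for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity and balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def random_undersample(labels, seed=0):
    """Return sorted indices that keep an equal number of samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Every class is cut down to the size of the smallest class.
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)

kept = random_undersample(y_true)
print(Counter(y_true[i] for i in kept))
```

On these toy labels the majority-only classifier reaches an accuracy of 0.95 while its sensitivity is 0 and its balanced accuracy is 0.5, which is the kind of gap that motivates reporting balanced accuracy and sensitivity alongside accuracy. After undersampling, both classes retain five samples each.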

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2024)
Series
Atlantis Highlights in Social Sciences, Education and Humanities
Publication Date
20 June 2025
ISBN
978-2-38476-408-2
ISSN
2667-128X

Cite this article

TY  - CONF
AU  - Mohamed Bellaj
AU  - Ahmed Ben Dahmane
PY  - 2025
DA  - 2025/06/20
TI  - Enhanced Hybrid Sampling Technique for Classifying Unevenly Distributed Data to Forecast Student Performance
BT  - Proceedings of the E-Learning and Smart Engineering Systems (ELSES 2024)
PB  - Atlantis Press
SP  - 268
EP  - 277
SN  - 2667-128X
UR  - https://doi.org/10.2991/978-2-38476-408-2_20
DO  - 10.2991/978-2-38476-408-2_20
ID  - Bellaj2025
ER  -