A Comparative Analysis of Machine Learning Models for Predicting Loan Defaults under Imbalanced Data Conditions
- DOI
- 10.2991/978-2-38476-585-0_25
- Keywords
- LightGBM; XGBoost; Predicting Loan Defaults
- Abstract
Accurate loan default prediction is crucial for credit risk management. This study used Kaggle's Loan Default Prediction dataset, applied preprocessing (cleaning, encoding, and scaling), and tested six models: Logistic Regression, Decision Trees, Random Forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and a neural network. Performance was evaluated with precision, recall, F1-score, and the area under the Receiver Operating Characteristic curve (AUC-ROC), with emphasis on handling class imbalance. Gradient boosting models (especially LightGBM, with an AUC of about 0.76) outperformed the linear model and the other tree-based models. Adjusting XGBoost's decision threshold improved the minority-class F1-score from 0.17 to 0.36 with no loss of AUC, which does not depend on the threshold. Ensemble methods such as voting classifiers balanced recall and precision effectively. The key takeaways concern model selection, threshold tuning, and ensemble strategies for imbalanced data. Future work could explore richer features, imbalance-aware loss functions, and explainability tools for transparency. This study provides a reproducible benchmark for default prediction, laying the groundwork for more robust, interpretable, and fair credit scoring systems in real-world financial applications.
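Why threshold tuning can raise the minority-class F1-score while leaving AUC untouched can be shown with a small, dependency-free sketch. It assumes the scores are predicted default probabilities, such as what a gradient boosting classifier would output for the positive class. The helper names `roc_auc`, `f1_at`, and `best_threshold` and the toy data are hypothetical illustrations, not the study's code or results. AUC is computed from the ranking of scores alone, so no cut-off appears in it, whereas F1 depends directly on where the cut-off sits.

```python
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (ties count 1/2). No decision threshold is involved."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 for the positive (minority, 'default') class when predicting
    1 iff score >= threshold. Defined as 0 when there are no true positives."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try every distinct score as a cut-off, starting from the default 0.5,
    and return the (threshold, F1) pair with the highest F1."""
    best = (0.5, f1_at(y_true, scores, 0.5))
    for t in sorted(set(scores)):
        f1 = f1_at(y_true, scores, t)
        if f1 > best[1]:
            best = (t, f1)
    return best


# Hypothetical toy data: 8 non-defaults, 2 defaults (imbalanced).
y_toy: List[int] = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
s_toy: List[float] = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.38, 0.45]

if __name__ == "__main__":
    print(roc_auc(y_toy, s_toy))        # 0.9375, whatever threshold is used later
    print(f1_at(y_toy, s_toy, 0.5))     # 0.0: the default cut-off flags no defaults
    t, f1 = best_threshold(y_toy, s_toy)
    print(t, round(f1, 3))              # 0.38 0.8
```

In practice the threshold would be chosen on a validation split rather than on the test set; otherwise the reported F1 is optimistically biased.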
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Yi Zhou
PY  - 2026
DA  - 2026/06/18
TI  - A Comparative Analysis of Machine Learning Models for Predicting Loan Defaults under Imbalanced Data Conditions
BT  - Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
PB  - Atlantis Press
SP  - 213
EP  - 220
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-2-38476-585-0_25
DO  - 10.2991/978-2-38476-585-0_25
ID  - Zhou2026
ER  - 