Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)

A Comparative Analysis of Machine Learning Models for Predicting Loan Defaults under Imbalanced Data Conditions

Authors
Yi Zhou1, *
1Department of Mathematics, Imperial College London, London, UK
*Corresponding author. Email: mz2023@ic.ac.uk
Corresponding Author
Yi Zhou
Available Online 18 June 2026.
DOI
10.2991/978-2-38476-585-0_25
Keywords
LightGBM; XGBoost; Predicting Loan Defaults
Abstract

Accurate loan default prediction is crucial for credit risk management. This study used Kaggle’s Loan Default Prediction dataset, applying preprocessing (cleaning, encoding, scaling) and testing six models: Logistic Regression, Decision Tree, Random Forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and a neural network. Performance was evaluated using precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), with emphasis on handling class imbalance. Gradient boosting (especially LightGBM, AUC ~0.76) outperformed the linear model and the other tree-based models. Adjusting XGBoost’s decision threshold improved the minority-class F1-score from 0.17 to 0.36 while leaving the AUC unchanged, since AUC depends only on how the model ranks cases and not on the cutoff applied to its scores. Ensemble methods such as Voting Classifiers balanced recall and precision effectively. The key takeaways concern model selection, threshold tuning, and ensemble strategies for imbalanced data. Future work could explore richer features, imbalance-aware loss functions, and explainability tools for transparency. This study provides a reproducible benchmark for default prediction, laying the groundwork for more robust, interpretable, and fair credit scoring systems in real-world financial applications.
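
The threshold adjustment mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's code: the labels and scores below are synthetic (a roughly 10% positive rate is an assumption, not a figure from the paper), and the helper names f1_at_threshold, auc_roc, and best_threshold are hypothetical. The sketch only shows the general mechanism, namely that moving the cutoff changes the minority-class F1-score while the AUC, which is computed from the ranking of scores, stays the same.

```python
import random


def f1_at_threshold(y_true, scores, threshold):
    """F1-score of the positive (minority) class, predicting 1 when score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(y_true, scores, candidates):
    """Return the candidate cutoff with the highest minority-class F1-score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Synthetic, imbalanced data (assumption): about 10% defaults, with scores
# that are only moderately higher for defaults than for non-defaults.
rng = random.Random(0)
y_true = [1 if rng.random() < 0.10 else 0 for _ in range(2000)]
scores = [min(1.0, max(0.0, rng.gauss(0.30 if y else 0.15, 0.10))) for y in y_true]

candidates = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95
tuned = best_threshold(y_true, scores, candidates)

f1_default = f1_at_threshold(y_true, scores, 0.5)
f1_tuned = f1_at_threshold(y_true, scores, tuned)
auc = auc_roc(y_true, scores)  # takes no threshold argument at all

print(f"AUC: {auc:.3f}")
print(f"F1 at 0.50: {f1_default:.3f}")
print(f"F1 at tuned cutoff {tuned:.2f}: {f1_tuned:.3f}")
```

For brevity the sketch selects and evaluates the cutoff on the same data; in practice the cutoff would normally be chosen on a validation split and reported on a held-out test set.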

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
Series
Advances in Economics, Business and Management Research
Publication Date
18 June 2026
ISBN
978-2-38476-585-0
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - Yi Zhou
PY  - 2026
DA  - 2026/06/18
TI  - A Comparative Analysis of Machine Learning Models for Predicting Loan Defaults under Imbalanced Data Conditions
BT  - Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
PB  - Atlantis Press
SP  - 213
EP  - 220
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-2-38476-585-0_25
DO  - 10.2991/978-2-38476-585-0_25
ID  - Zhou2026
ER  -