Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)

A Comparative Study on Loan Default Classification with Imbalanced Data Processing

Authors
Ziang Wang1, *
1Department of Mathematics & Statistics, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
*Corresponding author. Email: wangz834@mcmaster.ca
Corresponding Author
Ziang Wang
Available Online 18 June 2026.
DOI
10.2991/978-2-38476-585-0_33
Keywords
Weight processing; Logistic Regression; Tree-based Models
Abstract

Credit risk default classification is a cornerstone of modern financial risk management: it helps institutions optimize lending, allocate capital efficiently, and mitigate losses, and the accuracy of its predictions directly affects financial system stability during economic volatility. A critical challenge is class imbalance. Default samples typically make up only 5–15% of a dataset, which biases models toward the majority class and harms recall, the key metric for minimizing losses. This study compares four models (Logistic Regression, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest), each combined with four imbalance-handling methods, using Accuracy, Recall, and F1 score as metrics. Results show that tree-based models outperform Logistic Regression across metrics. For Logistic Regression, class weighting effectively improves recall; for tree-based models, class weighting boosts recall but slightly reduces F1, while the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) improve F1 but risk introducing noise. These findings indicate which imbalance-handling strategy suits each model family; future work is needed on ensemble methods and interpretability to refine credit risk assessment.
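
The effect of imbalance on the three reported metrics can be illustrated with a minimal sketch. It is not the paper's code: the toy labels, the function names `balanced_class_weights` and `classification_metrics`, and the inverse-frequency weighting formula are assumptions made here for illustration, since the abstract does not specify which weighting scheme was used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: a common "balanced" heuristic, not necessarily the
    weighting used in the study.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, and F1, treating `positive` as the default class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical toy data: 10 defaults (label 1) among 100 loans.
y_true = [1] * 10 + [0] * 90

# A degenerate model that always predicts "no default".
always_majority = [0] * 100

weights = balanced_class_weights(y_true)
baseline = classification_metrics(y_true, always_majority)
```

On this toy data the always-majority predictor reaches high accuracy while recalling no defaults, which is why recall and F1 are reported alongside accuracy. The computed weights make each default sample count several times more than a non-default sample during training, which is the general idea behind class weighting.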

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
Series
Advances in Economics, Business and Management Research
Publication Date
18 June 2026
ISBN
978-2-38476-585-0
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - Ziang Wang
PY  - 2026
DA  - 2026/06/18
TI  - A Comparative Study on Loan Default Classification with Imbalanced Data Processing
BT  - Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
PB  - Atlantis Press
SP  - 280
EP  - 287
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-2-38476-585-0_33
DO  - 10.2991/978-2-38476-585-0_33
ID  - Wang2026
ER  -