A Comparative Study on Loan Default Classification with Imbalanced Data Processing
- DOI
- 10.2991/978-2-38476-585-0_33
- Keywords
- Weight processing; Logistic Regression; Tree-based Models
- Abstract
Credit risk default classification is a cornerstone of modern financial risk management: it enables institutions to optimize lending, allocate capital efficiently, and mitigate losses, and the accuracy of its predictions directly affects financial system stability amid economic volatility. A critical challenge is data imbalance. Default samples typically make up only 5–15% of a dataset, which biases models toward the majority class and harms recall, the key metric for minimizing losses. This study compares four models (Logistic Regression, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest), each combined with four imbalance-handling methods, using Accuracy, Recall, and F1 score as metrics. The results show that tree-based models outperform Logistic Regression across these metrics. For Logistic Regression, class weighting effectively improves recall; for tree-based models, class weighting boosts recall but slightly reduces F1, while Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) improve F1 but risk introducing noise. These findings indicate which imbalance-handling strategy suits each model family; future work on ensemble methods and interpretability is needed to further refine credit risk assessment.
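To make the imbalance problem concrete, the following minimal Python sketch works through the arithmetic on a hypothetical portfolio of 1,000 loans with a 10% default rate. All counts, both sets of predictions, and the function names are illustrative assumptions and not results from the paper. The weighting rule shown, n_samples / (n_classes × class_count), is the common "balanced" heuristic (the same rule scikit-learn applies for `class_weight="balanced"`); the abstract does not state which weighting scheme the study used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy_recall_f1(y_true, y_pred, positive=1):
    """Return (accuracy, recall, F1) with `positive` as the default class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, recall, f1


# Hypothetical portfolio: 100 defaults (label 1) and 900 repaid loans (label 0).
y_true = [1] * 100 + [0] * 900

# A model that always predicts "repaid": high accuracy, no defaults caught.
always_repaid = [0] * 1000

# A hypothetical recall-oriented model: catches 80 of the 100 defaults
# but also flags 150 of the 900 good loans.
recall_oriented = [1] * 80 + [0] * 20 + [1] * 150 + [0] * 750

weights = balanced_class_weights(y_true)
baseline = accuracy_recall_f1(y_true, always_repaid)
flagged = accuracy_recall_f1(y_true, recall_oriented)

print("class weights:", weights)
print("always-repaid   (accuracy, recall, F1):", baseline)
print("recall-oriented (accuracy, recall, F1):", flagged)
```

In this toy setting the always-repaid model reaches 90% accuracy with zero recall, which is why accuracy alone is a poor guide on imbalanced data, and the balanced rule weights each default nine times as heavily as each repaid loan.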
- Copyright
- © 2026 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Ziang Wang
PY  - 2026
DA  - 2026/06/18
TI  - A Comparative Study on Loan Default Classification with Imbalanced Data Processing
BT  - Proceedings of the 2025 International Conference on Hybrid Commerce, Human Capital, and Economic Dynamics (ICHCH 2025)
PB  - Atlantis Press
SP  - 280
EP  - 287
SN  - 2352-5428
UR  - https://doi.org/10.2991/978-2-38476-585-0_33
DO  - 10.2991/978-2-38476-585-0_33
ID  - Wang2026
ER  -