Classification Of Imbalanced Data On Crotonylation Sites Using Lightgbm With Adasyn Oversampling
- DOI
- 10.2991/978-94-6463-730-4_9
- Keywords
- Classification; crotonylation; oversampling; ADASYN; LightGBM
- Abstract
Histone crotonylation is a newly identified post-translational modification. It may alter the structure and function of proteins through its role in regulating gene expression. Various diseases have been found to be associated with this modification, so a reliable method that can recognize this new type of acylation accurately, easily, and quickly is indispensable. This study used LightGBM to compare classification results with and without ADASYN oversampling for handling the imbalance between the positive and negative classes. The data were collected from the UniProt database and consist of 1006 sequences in total. A combination of five feature extraction methods was used to transform the raw sequences into feature vectors. We employed 5-fold cross-validation to determine the optimal hyperparameter combination during model training. The best model was then evaluated on the test data. The results suggest that classification using oversampled data yields a higher MCC than classification using the original imbalanced data. However, the difference obtained by applying ADASYN was not statistically significant.
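The abstract reports results in terms of MCC (Matthews correlation coefficient), a metric that stays informative when classes are imbalanced. The following is a minimal sketch of how MCC can be computed from binary labels and why it differs from plain accuracy. It is not the authors' code: the function names and the toy labels (10 positives, 90 negatives) are hypothetical and chosen only for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient.

    Returns 0.0 when the denominator is zero (a common convention,
    for example when only one class is ever predicted).
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels: 10 positive sites, 90 negative sites.
y_true = [1] * 10 + [0] * 90

# A classifier that always predicts "negative".
y_all_neg = [0] * 100

# A classifier that finds half of the positives and raises 5 false alarms.
y_mixed = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85

for name, y_pred in [("always negative", y_all_neg), ("mixed", y_mixed)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "MCC:", mcc(y_true, y_pred))
```

Both toy classifiers reach the same accuracy, yet only the second one recovers any positive sites, and MCC is the measure that separates them. This is the usual motivation for preferring MCC over accuracy when comparing models trained on imbalanced versus oversampled data.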
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Favorisen R. Lumbanraja
AU  - Ardella Dean Awalia
AU  - Sri Karnila
AU  - Mohammad Reza Faisal
AU  - Aristoteles
AU  - Akmal Junaidi
PY  - 2025
DA  - 2025/05/27
TI  - Classification Of Imbalanced Data On Crotonylation Sites Using Lightgbm With Adasyn Oversampling
BT  - Proceedings of the 5th International Conference on Applied Sciences, Mathematics, and Informatics (ICASMI 2024)
PB  - Atlantis Press
SP  - 93
EP  - 107
SN  - 2352-541X
UR  - https://doi.org/10.2991/978-94-6463-730-4_9
DO  - 10.2991/978-94-6463-730-4_9
ID  - Lumbanraja2025
ER  -