Proceedings of the 5th International Conference on Applied Sciences, Mathematics, and Informatics (ICASMI 2024)

Classification of Imbalanced Data on Crotonylation Sites Using LightGBM with ADASYN Oversampling

Authors
Favorisen R. Lumbanraja1, *, Ardella Dean Awalia1, Sri Karnila2, Mohammad Reza Faisal3, Aristoteles1, Akmal Junaidi1
1Computer Science, Faculty of Mathematics and Natural Sciences, Lampung University, Bandar Lampung, Indonesia
2Doctoral Program, Faculty of Mathematics and Natural Sciences, Lampung University, Bandar Lampung, Indonesia
3Computer Science, Faculty of Mathematics and Natural Sciences, Lambung Mangkurat University, Banjarbaru, Indonesia
*Corresponding author. Email: favorisen.lumbanraja@fmipa.unila.ac.id
Available Online 27 May 2025.
DOI
10.2991/978-94-6463-730-4_9
Keywords
Classification; crotonylation; oversampling; ADASYN; LightGBM
Abstract

Histone crotonylation is a newly identified post-translational modification. It may change the structure and function of proteins because of its role in regulating gene expression, and various diseases have been found to be associated with it. A reliable method that can recognize this new type of acylation accurately, easily, and quickly is therefore indispensable. This study used LightGBM to compare classification results with and without ADASYN oversampling as a way of handling the imbalance between the positive and negative classes. The data were collected from the UniProt database and consist of 1006 sequences in total. A combination of five feature extraction methods was used to transform the raw sequences into feature vectors. We employed 5-fold cross-validation to determine the optimal hyperparameter combination during model training. The best model was then evaluated on the test data. The results suggest that classification using oversampled data yields a higher Matthews correlation coefficient (MCC) than classification using the original imbalanced data. However, the difference obtained by applying ADASYN was not statistically significant.
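
The abstract reports MCC rather than accuracy. As a minimal sketch of why that choice matters on imbalanced data, the Python snippet below computes both metrics from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not results from the paper, and the function names are assumptions of this sketch.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when a row or column
        # of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 20 positive sites, 180 negative sites.
# Classifier A labels every sample as negative.
always_negative = dict(tp=0, tn=180, fp=0, fn=20)
# Classifier B finds 15 of the 20 positives, with 18 false positives.
finds_positives = dict(tp=15, tn=162, fp=18, fn=5)

for name, counts in [("A", always_negative), ("B", finds_positives)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

In this illustration the classifier that never predicts a positive site has the higher accuracy but an MCC of zero, while the classifier that recovers most positives scores lower on accuracy and clearly higher on MCC.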

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 5th International Conference on Applied Sciences, Mathematics, and Informatics (ICASMI 2024)
Series
Advances in Physics Research
Publication Date
27 May 2025
ISBN
978-94-6463-730-4
ISSN
2352-541X

Cite this article

TY  - CONF
AU  - Favorisen R. Lumbanraja
AU  - Ardella Dean Awalia
AU  - Sri Karnila
AU  - Mohammad Reza Faisal
AU  - Aristoteles
AU  - Akmal Junaidi
PY  - 2025
DA  - 2025/05/27
TI  - Classification of Imbalanced Data on Crotonylation Sites Using LightGBM with ADASYN Oversampling
BT  - Proceedings of the 5th International Conference on Applied Sciences, Mathematics, and Informatics (ICASMI 2024)
PB  - Atlantis Press
SP  - 93
EP  - 107
SN  - 2352-541X
UR  - https://doi.org/10.2991/978-94-6463-730-4_9
DO  - 10.2991/978-94-6463-730-4_9
ID  - Lumbanraja2025
ER  -