Impact of Pre-processing for Marathi Text Classification using SVM and NB
- DOI
- 10.2991/978-94-6463-858-5_74
- Keywords
- Natural Language Processing (NLP); Machine Learning; Text Classification; Support Vector Machine (SVM); Naïve Bayes (NB)
- Abstract
Digital content in regional languages such as Marathi is growing rapidly across online platforms, making manual categorization impractical. Marathi, a morphologically rich language, presents unique challenges for automated text classification. This study examines the performance of Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers on the L3Cube-MahaNews Marathi dataset, with emphasis on the impact of stemming on text classification. Model performance was assessed using Precision, Recall, Accuracy, and F-Measure. Experimental results show that stemming significantly improves classification accuracy, with an observed improvement of 20% for SVM and 10% for NB. Of the two algorithms tested, SVM consistently outperforms NB across various train-test split combinations. This paper highlights the underexplored role of stemming in Marathi text classification and emphasizes the effectiveness of SVM in handling complex linguistic data. Future work will extend the analysis by incorporating additional machine learning algorithms on the LDC-L3Cube-MahaNews dataset to further improve Marathi text classification accuracy.
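The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes macro-averaging over classes (the abstract does not say which averaging was used), and the function name, category labels, and sample predictions below are hypothetical, not taken from the paper or the L3Cube-MahaNews dataset.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F-measure.

    Macro-averaging is an assumption made for this sketch; the paper's
    abstract does not state how per-class scores were combined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(numerator, denominator):
        # Avoid ZeroDivisionError for classes that are never predicted or never occur.
        return numerator / denominator if denominator else 0.0

    precisions, recalls, f_scores = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f_scores.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


# Hypothetical news categories and predictions, for illustration only.
y_true = ["sports", "sports", "politics", "politics", "entertainment", "entertainment"]
y_pred = ["sports", "politics", "politics", "politics", "entertainment", "sports"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing two classifiers, or the same classifier with and without stemming, then amounts to calling the function once per run on the same held-out labels and comparing the resulting dictionaries.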
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Madhuri P. Narkhede
AU - Harshali B. Patil
PY - 2025
DA - 2025/11/04
TI - Impact of Pre-processing for Marathi Text Classification using SVM and NB
BT - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB - Atlantis Press
SP - 875
EP - 885
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-858-5_74
DO - 10.2991/978-94-6463-858-5_74
ID - Narkhede2025
ER -