Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)

Impact of Pre-processing for Marathi Text Classification using SVM and NB

Authors
Madhuri P. Narkhede1, *, Harshali B. Patil2
1Department of Computer Science and Information Technology, Smt. Devkiba Mohansinhji Chauhan College of Commerce & Sciences, Silvassa, UT of DD & DNH, 396230, India
2Department of Computer Science, Dr. Annasaheb G. D. Bendale Mahila Mahavidyalaya, Jalgaon, Maharashtra, India
*Corresponding author. Email: madhurinarkhede.research@gmail.com
Available Online 4 November 2025.
DOI
10.2991/978-94-6463-858-5_74
Keywords
Natural Language Processing (NLP); Machine Learning; Text Classification; Support Vector Machine (SVM); Naïve Bayes (NB)
Abstract

Digital content in regional languages such as Marathi is growing rapidly across online platforms, making manual categorization impractical. Marathi is a morphologically rich language, which poses particular challenges for automated text classification. This study examines the performance of Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers on the L3Cube-MahaNews Marathi dataset, with emphasis on the impact of stemming on text classification. Model performance was assessed using Precision, Recall, Accuracy, and F-Measure. Experimental results show that stemming significantly improves classification accuracy, with an observed improvement of 20% for SVM and 10% for NB. SVM consistently outperforms NB across the tested train-test split combinations. This paper highlights the underexplored role of stemming in Marathi text classification and the effectiveness of SVM in handling complex linguistic data. Future work will extend the analysis by applying additional machine learning algorithms to the LDC-L3Cube-MahaNews dataset to further improve Marathi text classification accuracy.
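The four evaluation measures named in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not say how per-class scores were averaged, so macro-averaging is an assumption here; the function name `evaluate` and the category labels in the usage note are hypothetical and are not taken from the paper or the dataset.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F-measure.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f_measures = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_measures.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }
```

As a usage example, `evaluate(["sports", "sports", "politics", "politics"], ["sports", "politics", "politics", "politics"])` gives an accuracy of 0.75, since three of the four predictions are correct. Running the same function on predictions obtained with and without stemming is one way the kind of comparison described in the abstract could be set up.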

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
Series
Advances in Computer Science Research
Publication Date
4 November 2025
ISBN
978-94-6463-858-5
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Madhuri P. Narkhede
AU  - Harshali B. Patil
PY  - 2025
DA  - 2025/11/04
TI  - Impact of Pre-processing for Marathi Text Classification using SVM and NB
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 875
EP  - 885
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_74
DO  - 10.2991/978-94-6463-858-5_74
ID  - Narkhede2025
ER  -