Impact of Pre-processing for Marathi Text Classification using SVM and NB

Madhuri P. Narkhede; Harshali B. Patil

doi:10.2991/978-94-6463-858-5_74

Authors

Madhuri P. Narkhede¹,*, Harshali B. Patil²

¹Department of Computer Science and Information Technology, Smt. Devkiba Mohansinhji Chauhan College of Commerce & Sciences, Silvassa, UT of DD & DNH, 396230, India

²Department of Computer Science, Dr. Annasaheb G. D. Bendale Mahila Mahavidyalaya, Jalgaon, Maharashtra, India

*Corresponding author. Email: madhurinarkhede.research@gmail.com

Available Online 4 November 2025.

DOI: 10.2991/978-94-6463-858-5_74
Keywords: Natural Language Processing (NLP); Machine Learning; Text Classification; Support Vector Machine (SVM); Naïve Bayes (NB)
Abstract: Digital content in regional languages such as Marathi is growing rapidly across online platforms, making manual categorization impractical. Marathi is a morphologically rich language, which poses particular challenges for automated text classification. This study examines the performance of Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers on the L3Cube-MahaNews Marathi dataset, with emphasis on the impact of stemming on text classification. Model performance was assessed using Precision, Recall, Accuracy, and F-Measure. Experimental results show that stemming significantly improves classification accuracy, with an observed improvement of 20% for SVM and 10% for NB. SVM consistently outperforms NB across the tested train-test split combinations. The paper highlights the underexplored role of stemming in Marathi text classification and the effectiveness of SVM in handling complex linguistic data. Future work will extend the analysis by applying additional machine learning algorithms to the LDC-L3Cube-MahaNews dataset to further improve Marathi text classification accuracy.
Copyright: © 2025 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
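
The kind of comparison the abstract describes (a classifier evaluated with and without stemming, scored by Precision, Recall, Accuracy, and F-Measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the suffix list, the toy sentences, and the helper names (`stem`, `train_nb`, `predict_nb`, `evaluate`, `run`) are all hypothetical, the stemmer is a naive longest-suffix stripper, only a hand-written multinomial NB is shown (SVM is omitted), and the toy numbers say nothing about the results reported in the paper.

```python
import math
from collections import Counter

# Hypothetical suffix list (longest first) for illustration only;
# this is NOT the stemmer used in the paper.
SUFFIXES = ("ाने", "ाला", "ाचे", "ने", "ला", "चे")

def stem(word, min_stem_len=2):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

def tokenize(text, use_stemming):
    tokens = text.split()
    return [stem(t) for t in tokens] if use_stemming else tokens

def train_nb(docs, use_stemming):
    """Collect per-class document counts and token counts for multinomial NB."""
    class_docs, token_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        class_docs[label] += 1
        counts = token_counts.setdefault(label, Counter())
        for tok in tokenize(text, use_stemming):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, token_counts, vocab

def predict_nb(model, text, use_stemming):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, token_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        counts = token_counts[label]
        total = sum(counts.values())
        score = math.log(class_docs[label] / n_docs)
        for tok in tokenize(text, use_stemming):
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(y_true, y_pred, positive):
    """Accuracy plus precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Contrived toy data: test sentences use different inflections than training.
TRAIN = [("सरकारने निवडणूक जाहीर केली", "politics"),
         ("संघाने सामना जिंकला", "sports")]
TEST = [("सरकारचे धोरण", "politics"),
        ("संघाला पदक", "sports")]

def run(use_stemming):
    model = train_nb(TRAIN, use_stemming)
    y_true = [label for _, label in TEST]
    y_pred = [predict_nb(model, text, use_stemming) for text, _ in TEST]
    return evaluate(y_true, y_pred, positive="politics")

print("without stemming:", run(False))
print("with stemming:   ", run(True))
```

On this contrived data the unstemmed model never sees the inflected test forms during training, so it cannot tell the classes apart, while the stemmed model maps each inflected form back to a stem it was trained on. That is the mechanism by which stemming can help in a morphologically rich language; how much it helps on real data is an empirical question the paper addresses.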

Volume Title: Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
Series: Advances in Computer Science Research
Publication Date: 4 November 2025
ISBN: 978-94-6463-858-5
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Madhuri P. Narkhede
AU  - Harshali B. Patil
PY  - 2025
DA  - 2025/11/04
TI  - Impact of Pre-processing for Marathi Text Classification using SVM and NB
BT  - Proceedings of International Conference on Computer Science and Communication Engineering (ICCSCE 2025)
PB  - Atlantis Press
SP  - 875
EP  - 885
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-858-5_74
DO  - 10.2991/978-94-6463-858-5_74
ID  - Narkhede2025
ER  -
