Human Versus Machine Generated Text Authenticity Detection System
- DOI
- 10.2991/978-94-6463-940-7_27
- Keywords
- Text Classification; ANN; DistilBERT; Model Generalization; Transformer Models; NLP Robustness; Tokenization Strategy
- Abstract
With the emergence of generative AI models such as GPT and of deepfake technologies, distinguishing AI-written from human-written text has become increasingly difficult. These models can produce highly realistic and consistent content that is hard to detect, creating a growing demand for effective AI-versus-human text authenticity detection systems. Such systems identify subtle deviations in writing style, structure, and language patterns through linguistic analysis and machine learning. They have applications in education (e.g., detecting AI-written assignments), media (e.g., authenticating human authorship of news headlines), and cybersecurity (e.g., spam filters for AI-generated phishing messages). This research focuses on classifying text as either AI-generated or human-written. It uses five datasets sourced from Hugging Face, Google, and Kaggle, covering a variety of text types such as literature, essays, and news. DistilBERT achieved 96.32% accuracy on internal validation, surpassing the ANN model’s 90.78%. However, when tested on unseen data, the ANN generalized better, with 95.2% accuracy, while DistilBERT’s accuracy dropped to 49.4%. This suggests that, for basic text classification, simpler models that are trained carefully can generalize more readily than their more complex counterparts.
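The comparison in the abstract comes down to the gap between internal-validation accuracy and unseen-data accuracy for each model. The short Python sketch below works through that arithmetic using only the four figures reported above. The helper name `generalization_gap` and the dictionary layout are illustrative assumptions, not part of the authors' code.

```python
def generalization_gap(internal_acc: float, unseen_acc: float) -> float:
    """Internal-validation accuracy minus unseen-data accuracy, in percentage points.

    A positive value means the model did worse on unseen data.
    (Hypothetical helper, introduced here for illustration only.)
    """
    return round(internal_acc - unseen_acc, 2)


# Accuracies (%) exactly as reported in the abstract.
reported = {
    "DistilBERT": {"internal": 96.32, "unseen": 49.4},
    "ANN": {"internal": 90.78, "unseen": 95.2},
}

gaps = {
    name: generalization_gap(acc["internal"], acc["unseen"])
    for name, acc in reported.items()
}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap:+.2f} percentage points")
```

Under these figures, DistilBERT loses 46.92 percentage points on unseen data, while the ANN gains 4.42. If the unseen set were balanced between the two classes (an assumption; the abstract does not state the class balance), 49.4% would be close to the 50% expected from random guessing.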
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Akella Hima Bala Padmini
AU - G. N. V. G. Sirisha
PY - 2025
DA - 2025/12/31
TI - Human Versus Machine Generated Text Authenticity Detection System
BT - Proceedings of the Conference on Social and Sustainable Innovation in Technology & Engineering (SASI-ITE 2025)
PB - Atlantis Press
SP - 363
EP - 373
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6463-940-7_27
DO - 10.2991/978-94-6463-940-7_27
ID - Padmini2025
ER -