Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)

Document Summarizer: A Machine Learning Approach to PDF Summarization

Authors
Prajakta Dhamdhere1, *, Aarti Sardhara2, Piyush Dhoka3, Vedant Pandhare4, Varun Inamdar5, Shriram Dixit6
1Department of Artificial Intelligence, Vishwakarma University, Pune, India
2Department of Computer Engineering, Vishwakarma University, Pune, India
3Department of Artificial Intelligence, Vishwakarma University, Pune, India
4Department of Artificial Intelligence, Vishwakarma University, Pune, India
5Department of Artificial Intelligence, Vishwakarma University, Pune, India
6Department of Artificial Intelligence, Vishwakarma University, Pune, India
*Corresponding author. Email: prajakta.dhamdhere@vupune.ac.in
Corresponding Author
Prajakta Dhamdhere
Available Online 6 January 2026.
DOI
10.2991/978-94-6463-948-3_50
Keywords
Abstractive; Embedding; Extractive; K-Means Clustering; Sentence Transformer
Abstract

This paper addresses the growing challenge posed by the increasing number of PDF files and the resulting need to summarize and extract information efficiently. Reading lengthy documents is tedious and time-consuming in many sectors. To help readers save time and quickly grasp the key points of a PDF, a summarizer tool has been developed that reduces document size substantially while retaining the essential content. In today's professional environments, gathering and managing data from documents is critical, and the original meaning must be conveyed accurately. This article introduces a solution called ‘DocSum’, which automates the summarization of extracted data and reduces document size by up to 70%. The system provides a user-friendly interface that encourages interaction and engagement, and uses artificial intelligence and machine learning techniques to streamline document handling. Users can request specific summaries, enabling efficient document management workflows. By letting users interact seamlessly with large amounts of information, ‘DocSum’ enhances productivity and explores new ways to optimize document management through effective summaries. Such a solution suits professionals who want to manage data efficiently and stay up to date.
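
The abstract does not spell out the pipeline, but the keywords (Extractive, Embedding, K-Means Clustering, Sentence Transformer) suggest an extractive approach in which sentences are embedded, clustered, and one representative sentence per cluster is kept. The following is a minimal Python sketch of that general idea, not the authors' implementation. It makes several assumptions: a normalised bag-of-words vector stands in for a sentence-transformer embedding so the example runs with the standard library only, the sentence splitter is deliberately naive, and all function names (split_sentences, embed, kmeans, summarize, reduction) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentences):
    """Stand-in embedding (assumption): L2-normalised bag-of-words counts.
    A real system would likely call a sentence-transformer model here."""
    tokenised = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenised for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenised:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm with deterministic, evenly spaced seeding."""
    n = len(vectors)
    centroids = [list(vectors[(i * n) // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def summarize(text, k=3):
    """Keep the sentence closest to each cluster centroid, in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    vectors = embed(sentences)
    chosen = set()
    for centroid in kmeans(vectors, k):
        chosen.add(min(range(len(sentences)),
                       key=lambda i: sq_dist(vectors[i], centroid)))
    return [sentences[i] for i in sorted(chosen)]


def reduction(original, summary_sentences):
    """Fraction of characters removed; 0.7 would mean 70% shorter."""
    return 1.0 - len(" ".join(summary_sentences)) / len(original)
```

Calling summarize(text, k) returns at most k sentences, and reduction(text, summary) reports the fraction of characters removed, which is one way a figure such as "up to 70%" could be measured. The number of clusters k controls the trade-off between brevity and coverage; how DocSum actually sets it is not stated in the abstract.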

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
Series
Advances in Intelligent Systems Research
Publication Date
6 January 2026
ISBN
978-94-6463-948-3
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Prajakta Dhamdhere
AU  - Aarti Sardhara
AU  - Piyush Dhoka
AU  - Vedant Pandhare
AU  - Varun Inamdar
AU  - Shriram Dixit
PY  - 2026
DA  - 2026/01/06
TI  - Document Summarizer: A Machine Learning Approach to PDF Summarization
BT  - Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
PB  - Atlantis Press
SP  - 720
EP  - 731
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-948-3_50
DO  - 10.2991/978-94-6463-948-3_50
ID  - Dhamdhere2026
ER  -