Document Summarizer: A Machine Learning Approach to PDF Summarization
- DOI
- 10.2991/978-94-6463-948-3_50
- Keywords
- Abstractive; Embedding; Extractive; K-Means Clustering; Sentence Transformer
- Abstract
This paper highlights the growing challenge posed by the increasing number of PDF files and the resulting need to summarize and extract information efficiently. Reading lengthy documents is a tedious and time-consuming task in many sectors. To save time and help readers quickly grasp the key points of a PDF, a PDF summarizer tool has been developed that reduces the document to a reasonable size without losing its actual content. In today's professional environments, gathering and managing data from documents is critical, and the meaning of the source must be conveyed accurately. This article introduces a solution called ‘DocSum’ that automates the summarization of extracted data, reducing document size by up to 70%. The system provides a user-friendly interface that encourages interaction and engagement, and it uses artificial intelligence and machine learning techniques to streamline document handling. Users can request specific summaries, enabling efficient document management workflows. By enabling users to interact seamlessly with large amounts of information, ‘DocSum’ enhances productivity and explores new ways to optimize document management through effective summaries. Such a solution suits professionals who want to keep up with data management efficiently.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Prajakta Dhamdhere
AU - Aarti Sardhara
AU - Piyush Dhoka
AU - Vedant Pandhare
AU - Varun Inamdar
AU - Shriram Dixit
PY - 2026
DA - 2026/01/06
TI - Document Summarizer: A Machine Learning Approach to PDF Summarization
BT - Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
PB - Atlantis Press
SP - 720
EP - 731
SN - 1951-6851
UR - https://doi.org/10.2991/978-94-6463-948-3_50
DO - 10.2991/978-94-6463-948-3_50
ID - Dhamdhere2026
ER -