Artificial Intelligence Based PDF and Document Extractor Using Retrieval Augmented Generation
Corresponding Author
Swanand Kulkarni
Available Online 31 August 2025.
- DOI
- 10.2991/978-94-6463-831-8_52
- Keywords
- Artificial Intelligence (AI); Retrieval Augmented Generation (RAG); Natural Language Processing (NLP); Large Language Models (LLM)
- Abstract
This paper introduces a Retrieval Augmented Generation (RAG) system for extracting semantically meaningful information from PDF and DOCX documents. It pairs a Retriever, built on Gemini Embeddings and FAISS indexing, with a Generator built on Gemini 2.0 Flash to produce fast, context-aware responses. Evaluated on 30 documents, the system outperforms conventional approaches in semantic accuracy and retrieval precision. It is modular and scalable, applies to diverse real-world document understanding tasks, and leaves room for future real-time and multimodal extensions.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
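The retrieve-then-generate flow outlined in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for Gemini Embeddings, the brute-force `FlatIndex` class stands in for a FAISS index, and `build_prompt` only assembles the text that a generator such as Gemini 2.0 Flash would receive, without calling any model. All function names, chunk sizes, and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy stand-in for an embedding model: L2-normalized token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two normalized sparse vectors (dot product)."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


class FlatIndex:
    """Brute-force nearest-neighbor search, standing in for a vector index."""

    def __init__(self):
        self.chunks = []
        self.vectors = []

    def add(self, chunks):
        for chunk in chunks:
            self.chunks.append(chunk)
            self.vectors.append(embed(chunk))

    def search(self, query, k=2):
        q = embed(query)
        scored = [(cosine(q, v), i) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
        return [(self.chunks[i], score) for score, i in scored[:k]]


def build_prompt(question, retrieved):
    """Assemble the grounded prompt a generator model would receive."""
    context = "\n".join(
        f"[{n}] {chunk}" for n, (chunk, _) in enumerate(retrieved, 1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical extracted passages; real input would come from PDF/DOCX parsing.
passages = [
    "Invoices must be paid within 30 days of receipt.",
    "The warranty covers hardware defects for two years.",
    "Support is available by email on weekdays.",
]
index = FlatIndex()
index.add(passages)

question = "How long does the warranty last?"
hits = index.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In a fuller system the retrieved passages would come from chunked document text (see `chunk_text`), and the assembled prompt would be sent to the generator model rather than printed.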
Cite this article
TY  - CONF
AU  - Swanand Kulkarni
AU  - Kalpana Thakre
AU  - Varad Kulkarni
AU  - Aneesh Pandit
AU  - Atharva Naik
PY  - 2025
DA  - 2025/08/31
TI  - Artificial Intelligence Based PDF and Document Extractor Using Retrieval Augmented Generation
BT  - Proceeding of the 1st International Conference on Lifespan Innovation (ICLI 2025)
PB  - Atlantis Press
SP  - 428
EP  - 435
SN  - 2468-5739
UR  - https://doi.org/10.2991/978-94-6463-831-8_52
DO  - 10.2991/978-94-6463-831-8_52
ID  - Kulkarni2025
ER  -