Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)

RAG-Driven Scholarly Assistant: Automating Research Paper Analysis with Open-Source LLM Benchmarking

Authors
Irtefa Waseek1, *, Md Rezaul Karim1, Md Efatuzzaman Efat2, Sumiya Afrose3
1Department of Statistics and Data Science, Jahangirnagar University, Savar, Dhaka, 1342, Bangladesh
2Institute of Information and Communication Technology (IICT), Bangladesh University of Engineering and Technology (BUET), Dhaka, 1000, Bangladesh
3Department of Robotics and Mechatronics Engineering, University of Dhaka, Dhaka, 1000, Bangladesh
*Corresponding author. Email: waseekirtefa@gmail.com
Corresponding Author
Irtefa Waseek
Available Online 8 June 2026.
DOI
10.2991/978-94-6239-664-7_77
Keywords
Retrieval-Augmented Generation (RAG); Large Language Models (LLMs); Automated Literature Review; AI-Driven Document Analysis; OCR; Citation Network Analysis; Benchmarking; NLP
Abstract

This work introduces a Retrieval-Augmented Generation (RAG)-based scholarly assistant for automated reading of research papers and uses it to benchmark several open-source LLMs. The system uses a pipeline of document processing, citation and structural analysis, and LLM-based question answering to produce summaries and insights from academic literature. Benchmarking relies on a range of quantitative metrics, chiefly BLEU, METEOR, and ROUGE scores; perplexity, factual consistency, and computational resource usage are also taken into account. The tool generates an evaluation report and provides it as a downloadable CSV file, and the user interface includes visualizations of the results. The toolkit is assessed on five domain-specific research papers (in medicine, literature, economics, computer science, and mathematics) to allow an even comparison across domains. The smaller RAG-based models (DeepSeek-1.5B and 8B) respond faster and, on average, exhibit higher factual consistency. In contrast, the larger generative models (Mistral-7B and LLaMA3-8B) provide more detailed answers with higher overlap scores, at the cost of more computation and occasional factually inaccurate outputs. This evaluation supports the potential of an open scholarly assistant. It also gives a clearer picture of domain-dependent challenges and strengths and suggests directions for future work.
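To make the overlap-based metrics and the CSV report concrete, the following is a minimal sketch in Python, not the authors' implementation. It computes a simplified ROUGE-1 (unigram-overlap F1 on lowercased, whitespace-split tokens, with no stemming) and serializes scores to CSV text. The function names `rouge1_f1` and `scores_to_csv` and the column names are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def scores_to_csv(rows):
    """Serialize (model, domain, score) rows to CSV text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["model", "domain", "rouge1_f1"])
    for model, domain, score in rows:
        writer.writerow([model, domain, f"{score:.3f}"])
    return buffer.getvalue()
```

A report row would be built by scoring a model answer against a reference answer, for example `scores_to_csv([("model-name", "domain-name", rouge1_f1(reference, answer))])`. A production evaluation would more likely rely on an established metrics library with proper tokenization and stemming; this sketch only illustrates the overlap idea.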

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
Series
Advances in Intelligent Systems Research
Publication Date
8 June 2026
ISBN
978-94-6239-664-7
ISSN
1951-6851
DOI
10.2991/978-94-6239-664-7_77

Cite this article

TY  - CONF
AU  - Irtefa Waseek
AU  - Md Rezaul Karim
AU  - Md Efatuzzaman Efat
AU  - Sumiya Afrose
PY  - 2026
DA  - 2026/06/08
TI  - RAG-Driven Scholarly Assistant: Automating Research Paper Analysis with Open-Source LLM Benchmarking
BT  - Proceedings of the International Conference on Intelligent Data Analysis and Applications (IDAA 2025)
PB  - Atlantis Press
SP  - 1127
EP  - 1143
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-664-7_77
DO  - 10.2991/978-94-6239-664-7_77
ID  - Waseek2026
ER  -