Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)

Hybrid Semantic Retrieval: Augmenting Weighted TF–IDF with BERT for Enhanced Question Answering

Authors
Dinesh Kumar Koilada1, *
1Independent Researcher, San Jose, CA, United States
*Corresponding author. Email: dineshkoilada@gmail.com
Available Online 6 January 2026.
DOI
10.2991/978-94-6463-948-3_26
Keywords
Semantic Search; TF–IDF; BERT; Question Answering; Information Retrieval; Hybrid Retrieval; BiLSTM–CRF
Abstract

Question-answering (QA) systems face a trade-off between the speed of inverted indices and the contextual understanding of neural models. Traditional TF–IDF is fast but brittle when query wording shifts, while BERT provides deep contextual matching at a high computational cost. We bridge this divide with a hybrid architecture centered on "questionable spans": text segments that are statistically likely to hold answers. A BiLSTM–CRF model is trained to detect these high-value spans, and their terms are up-weighted in a standard TF–IDF index to produce a semantically sharpened candidate set. Expensive BERT re-ranking is then applied only to that candidate set. Experiments on Yahoo! Answers show that this approach significantly improves both recall and precision and recovers relevant documents that standard lexical search misses.
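The two-stage idea in the abstract (span-weighted TF–IDF retrieval followed by re-ranking of a small candidate set) can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not give the boost factor, the IDF formula, or the tagger's output format, so the `boost` value, the smoothed IDF, the `(text, span)` document layout, and every function name below are assumptions made for illustration. The span is supplied by hand where the paper would use a BiLSTM–CRF tagger, and `overlap_scorer` is a simple stand-in for a BERT relevance model.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, span), where span is the answer-bearing
# substring a tagger would flag ("" if none). Spans are hand-written here.
DOCS = [
    ("to reset the router hold the power button for ten seconds",
     "hold the power button for ten seconds"),
    ("the power cable and power light sit near the power button", ""),
    ("bananas are rich in potassium", ""),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()

def build_index(docs, boost=3.0):
    """Return L2-normalized TF-IDF vectors and the IDF table.
    Tokens inside a flagged span count `boost` times (assumed scheme)."""
    tfs = []
    for text, span in docs:
        tf = Counter(tokenize(text))       # weight 1 per occurrence
        for tok in tokenize(span):         # extra weight for span tokens
            tf[tok] += boost - 1.0
        tfs.append(tf)
    n = len(docs)
    df = Counter(tok for tf in tfs for tok in tf)
    # Smoothed IDF; the abstract does not say which variant the paper uses.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vecs = []
    for tf in tfs:
        vec = {tok: w * idf[tok] for tok, w in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vecs.append({tok: v / norm for tok, v in vec.items()})
    return vecs, idf

def score_all(query, vecs, idf):
    """Cosine similarity between the query and every indexed document."""
    counts = Counter(tokenize(query))
    q = {t: c * idf[t] for t, c in counts.items() if t in idf}
    q_norm = math.sqrt(sum(v * v for v in q.values())) or 1.0
    return [sum(w * vec.get(t, 0.0) for t, w in q.items()) / q_norm
            for vec in vecs]

def top_k(scores, k):
    """Indices of the k highest-scoring documents (the candidate set)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def overlap_scorer(query, text):
    """Stand-in for a BERT relevance model: Jaccard overlap of token sets."""
    q, d = set(tokenize(query)), set(tokenize(text))
    return len(q & d) / len(q | d) if q | d else 0.0

def rerank(query, candidate_ids, docs, scorer):
    """Reorder only the candidates using the (expensive) second-stage scorer."""
    return sorted(candidate_ids,
                  key=lambda i: scorer(query, docs[i][0]), reverse=True)

if __name__ == "__main__":
    vecs, idf = build_index(DOCS, boost=3.0)
    query = "power button"
    candidates = top_k(score_all(query, vecs, idf), k=2)
    print(rerank(query, candidates, DOCS, overlap_scorer))
```

In this sketch the second-stage scorer is called only on the `k` candidates rather than on the whole corpus, which is the cost-saving point the abstract makes. Replacing `overlap_scorer` with a real BERT-based scorer would not change the surrounding code, since `rerank` accepts any callable that takes a query and a document text.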

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
Series
Advances in Intelligent Systems Research
Publication Date
6 January 2026
ISBN
978-94-6463-948-3
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Dinesh Kumar Koilada
PY  - 2026
DA  - 2026/01/06
TI  - Hybrid Semantic Retrieval: Augmenting Weighted TF–IDF with BERT for Enhanced Question Answering
BT  - Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
PB  - Atlantis Press
SP  - 359
EP  - 365
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-948-3_26
DO  - 10.2991/978-94-6463-948-3_26
ID  - Koilada2026
ER  -