Smart Text-to-Speech System for Visually Impaired Users: A Hybrid AI-Based Solution

Sunita Patil; Aaryan Pawar; Pranit Pawar; Aditya Kulkarni; Vedant Gaikwad; Shubhangi Vairagar; Chetana Shravage; Priya Metri

doi:10.2991/978-94-6463-948-3_20



Authors

Sunita Patil¹,*, Aaryan Pawar¹, Pranit Pawar¹, Aditya Kulkarni¹, Vedant Gaikwad¹, Shubhangi Vairagar¹, Chetana Shravage¹, Priya Metri¹

¹Department of Computer Engineering, Dr. D. Y. Patil Institute of Technology, Pimpri, India

*Corresponding author. Email: Sunitapatil6@gmail.com

Corresponding Author

Sunita Patil

Available Online 6 January 2026.

DOI: 10.2991/978-94-6463-948-3_20
Keywords: Optical Character Recognition; AI-based Image Description; Text-to-Speech; PaddleOCR; TrOCR; Google Vision API; Hybrid AI; Accessibility; Visually Impaired Assistance
Abstract: This paper presents a new hybrid AI-driven assistive system that helps visually impaired people by converting visual content into high-quality audio descriptions. The system combines several Optical Character Recognition (OCR) engines, AI-driven image description models, and state-of-the-art Text-to-Speech (TTS) synthesis to provide accurate, real-time access. The OCR stage uses PaddleOCR for offline recognition, TrOCR for deep-learning-based OCR, and the Google Vision API for high-precision cloud recognition. A dynamic switching mechanism selects the most suitable OCR engine based on confidence scores, image complexity, and internet availability. When an image contains no text, Vision-Language Models (VLMs) such as BLIP-2 generate contextual descriptions. The extracted or generated text is then converted into natural-sounding speech using Google WaveNet or VITS. This combined strategy ensures high accuracy, efficiency, and flexibility across varied environments. The paper outlines the system's architecture, methodology, implementation, and evaluation, and illustrates its performance in practical applications.
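The switching mechanism described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the thresholds, the engine ordering, the `OcrResult` type, and the function names `choose_engines` and `image_to_text` are all assumptions, and the OCR engines and the captioner are passed in as plain callables so that no real library API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumed to be normalized to the range 0.0-1.0


# An OCR engine is modeled as any callable from image bytes to an OcrResult.
OcrEngine = Callable[[bytes], OcrResult]

# Hypothetical thresholds; the abstract gives no numeric values.
MIN_CONFIDENCE = 0.80
COMPLEXITY_LIMIT = 0.50


def choose_engines(complexity: float, online: bool) -> list[str]:
    """Return engine names in the order they should be tried.

    Assumed policy: local engines first, the deep-learning engine first
    for complex images, and the cloud engine only when online.
    """
    if complexity > COMPLEXITY_LIMIT:
        order = ["trocr", "paddleocr"]
    else:
        order = ["paddleocr", "trocr"]
    if online:
        order.append("google_vision")
    return order


def image_to_text(
    image: bytes,
    complexity: float,
    online: bool,
    engines: dict[str, OcrEngine],
    captioner: Callable[[bytes], str],
) -> tuple[str, str]:
    """Return (source, text) to hand to the speech synthesizer."""
    best_name, best = "", None
    for name in choose_engines(complexity, online):
        result = engines[name](image)
        if not result.text.strip():
            continue  # this engine found no text
        if result.confidence >= MIN_CONFIDENCE:
            return name, result.text  # confident enough, stop switching
        if best is None or result.confidence > best.confidence:
            best_name, best = name, result
    if best is not None:
        # No engine reached the threshold; use the most confident reading.
        return best_name, best.text
    # No text at all: fall back to a generated image description.
    return "captioner", captioner(image)
```

In a full pipeline the returned text would be passed on to the TTS stage; that step is left out here because the abstract does not describe how the synthesizer is invoked.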
Copyright: © 2025 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title: Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
Series: Advances in Intelligent Systems Research
Publication Date: 6 January 2026
ISBN: 978-94-6463-948-3
ISSN: 1951-6851

Cite this article


TY  - CONF
AU  - Sunita Patil
AU  - Aaryan Pawar
AU  - Pranit Pawar
AU  - Aditya Kulkarni
AU  - Vedant Gaikwad
AU  - Shubhangi Vairagar
AU  - Chetana Shravage
AU  - Priya Metri
PY  - 2026
DA  - 2026/01/06
TI  - Smart Text-to-Speech System for Visually Impaired Users: A Hybrid AI-Based Solution
BT  - Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
PB  - Atlantis Press
SP  - 287
EP  - 296
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-948-3_20
DO  - 10.2991/978-94-6463-948-3_20
ID  - Patil2026
ER  -
