Smart Text-to-Speech System for Visually Impaired Users: A Hybrid AI-Based Solution
- DOI
- 10.2991/978-94-6463-948-3_20
- Keywords
- Optical Character Recognition; AI-based Image Description; Text-to-Speech; PaddleOCR; TrOCR; Google Vision API; Hybrid AI; Accessibility; Visually Impaired Assistance
- Abstract
This paper presents a new hybrid AI-driven assistive system that helps visually impaired people by converting images into high-quality audio descriptions. The system combines several Optical Character Recognition (OCR) engines, AI-driven image description models, and state-of-the-art Text-to-Speech (TTS) synthesis to provide accurate, real-time access to visual content. For OCR, it uses PaddleOCR for offline recognition, TrOCR for deep-learning-based recognition, and the Google Vision API for high-precision cloud-based recognition. A dynamic switching mechanism selects the most suitable OCR engine based on confidence scores, image complexity, and internet availability. When an image contains no text, a Vision-Language Model (VLM) such as BLIP-2 generates a contextual description. The extracted or generated text is then converted into natural-sounding speech using Google WaveNet or VITS. This hybrid strategy ensures high accuracy, efficiency, and flexibility across diverse environments. This paper outlines the system’s architecture, methodology, implementation, and evaluation, and illustrates its performance in practical applications.
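The engine-selection and fallback flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `OcrResult` type, the engine names, the 0.8 confidence threshold, the 0.6 complexity threshold, the idea that image complexity arrives as a precomputed score in [0, 1], and the rule of using WaveNet when online and VITS when offline are all hypothetical. Engines are passed in as plain callables, so real PaddleOCR, TrOCR, Google Vision, or BLIP-2 wrappers could be plugged in without changing the selection logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumed normalized to [0, 1]


# Hypothetical engine interface: raw image bytes in, recognized text plus confidence out.
OcrEngine = Callable[[bytes], OcrResult]


def engine_order(online: bool, complexity: float,
                 complexity_threshold: float = 0.6) -> List[str]:
    """Return the order in which OCR engines are tried (assumed policy)."""
    if complexity < complexity_threshold:
        # Simple, printed-looking images: try the fast offline engine first.
        order = ["paddleocr", "trocr"]
    else:
        # Complex images: try the transformer-based recognizer first.
        order = ["trocr", "paddleocr"]
    if online:
        # The cloud engine is only reachable with internet access.
        order.append("google_vision")
    return order


def image_to_text(image: bytes,
                  engines: Dict[str, OcrEngine],
                  describe: Callable[[bytes], str],
                  online: bool,
                  complexity: float,
                  min_confidence: float = 0.8) -> str:
    """Run OCR engines in order; fall back to an image description if no text is found."""
    best: Optional[OcrResult] = None
    for name in engine_order(online, complexity):
        engine = engines.get(name)
        if engine is None:
            continue  # engine not configured on this device
        result = engine(image)
        if not result.text.strip():
            continue  # this engine found no text
        if result.confidence >= min_confidence:
            return result.text  # confident enough: stop early
        if best is None or result.confidence > best.confidence:
            best = result
    if best is not None:
        return best.text  # no engine cleared the threshold; keep the most confident text
    return describe(image)  # no text at all: use a VLM caption (e.g. BLIP-2) instead


def choose_tts_backend(online: bool) -> str:
    """Assumed rule: cloud WaveNet voices when online, a local VITS model otherwise."""
    return "wavenet" if online else "vits"


if __name__ == "__main__":
    engines = {
        "paddleocr": lambda img: OcrResult("EXIT", 0.55),
        "trocr": lambda img: OcrResult("EXIT", 0.91),
    }
    text = image_to_text(b"...", engines, lambda img: "a door", online=False, complexity=0.3)
    print(text, "->", choose_tts_backend(online=False))  # prints: EXIT -> vits
```

Keeping the lowest-cost engine first and stopping at the first confident result is one way to trade accuracy against latency; a real system might instead run engines in parallel or weight confidences per engine.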
- Copyright
- © 2025 The Author(s)
- Open Access
- This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Sunita Patil
AU  - Aaryan Pawar
AU  - Pranit Pawar
AU  - Aditya Kulkarni
AU  - Vedant Gaikwad
AU  - Shubhangi Vairagar
AU  - Chetana Shravage
AU  - Priya Metri
PY  - 2026
DA  - 2026/01/06
TI  - Smart Text-to-Speech System for Visually Impaired Users: A Hybrid AI-Based Solution
BT  - Proceedings of the International Conference on Sustainable Innovation with Artificial Intelligence and Machine Learning 2025 (ICSIAIML 2025)
PB  - Atlantis Press
SP  - 287
EP  - 296
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-948-3_20
DO  - 10.2991/978-94-6463-948-3_20
ID  - Patil2026
ER  -