This project aims to support individuals with varying degrees of visual impairment, enabling them to access textual information contained within images. Shown below is the pipeline of this project:
For each mode, we provide a Jupyter notebook that gives a more interactive overview of our study.
For Scene-Text Detection, we used the TextOCR dataset. Please download the data from here. We compared three OCR tools:
The data folder also includes our text-detection and method-comparison results in pickle format, which can be loaded for inspection in the Scene-Text Detection notebook.
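The following is a minimal sketch of loading one of these pickled result files outside the notebook. The file name `data/results.pkl` is a placeholder, not a path taken from this repository. Because the structure of the stored object is not described here, the helper only reports its top-level type and size. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a pickled results object from disk."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def summarize(obj):
    """Return a short description of the top-level object."""
    kind = type(obj).__name__
    if isinstance(obj, dict):
        # Show at most five keys so large result tables stay readable.
        return f"{kind} with {len(obj)} keys: {sorted(map(str, obj))[:5]}"
    if isinstance(obj, (list, tuple)):
        return f"{kind} with {len(obj)} items"
    return kind


if __name__ == "__main__":
    results_path = Path("data/results.pkl")  # hypothetical file name
    if results_path.exists():
        print(summarize(load_results(results_path)))
```

If the object turns out to be a dictionary, the listed keys are a reasonable starting point for further inspection.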
For Handwritten-Text Detection, we used the IAM Handwriting Database. Please download the data from here. For training, we fine-tuned the Hugging Face TrOCR base handwritten model on the IAM Handwriting Database.
The trained model can be downloaded from here.
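As a rough illustration of inference with a TrOCR model, the sketch below uses the Hugging Face `transformers` API. It assumes the public `microsoft/trocr-base-handwritten` checkpoint as the default. To try the fine-tuned model instead, pass the local directory it was downloaded to as `model_id`, assuming that directory holds both the model and the processor files. The function name `recognize_handwriting` is our own and is not taken from the project's scripts.

```python
MODEL_ID = "microsoft/trocr-base-handwritten"  # public base checkpoint (assumed default)


def recognize_handwriting(image_path, model_id=MODEL_ID):
    """Return the text recognized in an image of a single handwritten line."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; the processor resizes and normalizes it.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively generate token ids, then decode them to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

TrOCR operates on cropped text-line images, so a full page would first need to be segmented into lines.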
For Text-to-Speech Conversion, we used Google Text-to-Speech (gTTS) to transform the identified text into speech.
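A minimal sketch of this step with the `gTTS` package is shown below. The helper names `clean_text` and `text_to_speech`, the output file name, and the whitespace clean-up are illustrative assumptions rather than the project's actual code. gTTS calls an online service, so synthesis needs network access.

```python
def clean_text(fragments):
    """Join detected text fragments into one whitespace-normalized string."""
    return " ".join(" ".join(fragments).split())


def text_to_speech(text, out_path="speech.mp3", lang="en"):
    """Synthesize `text` to an MP3 file with gTTS and return the file path."""
    if not text:
        raise ValueError("no text to synthesize")

    # Imported lazily so the text helper works without gTTS installed.
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


if __name__ == "__main__":
    detected = ["EXIT ", " ONLY", "\nNo entry"]  # made-up OCR output
    print(clean_text(detected))
```

Calling `text_to_speech(clean_text(detected))` would then write `speech.mp3` to the current directory.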
We provide two Python scripts, each implementing the complete end-to-end pipeline for one of the two text-detection modes.