freedmand / textra

580.0 9.0 19.0 1.48 MB

A command-line application to convert images, PDFs, and audio files to text using Apple's APIs

License: MIT License

Swift 83.99% Shell 5.27% Python 10.74%

command-line-tool macos ocr transcription


textra's People

Contributors

Stargazers

Watchers

Forkers

altoplano brandonrobertz astrowonk penguinmd baro77 personx000 rolandcrosby mkll disco0 fzmdesign righteousgambit longddsf haoyitedaniu matthewgard1

textra's Issues

Transcript to STDOUT?

It would be very convenient if textra could write its output to STDOUT via an optional command-line argument.

This would make it easier to, say, call textra from Python's subprocess module, run it on a batch of images, and capture the text as a string (which we could then use programmatically however we like).

It's not hard to work around this by going through the file system, but STDOUT support would make it a little easier.
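The file-system workaround mentioned above could look roughly like the following sketch. The helper names `run_and_capture_text` and `fake_ocr_argv` are hypothetical, and the real textra argument list is deliberately not spelled out: the caller supplies a function that builds it. The stand-in command here is a tiny Python script that writes to the output path, used only so the sketch runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_capture_text(build_argv, input_path):
    """Run a CLI that writes text to a file and return that text as a string.

    build_argv(input_path, output_path) must return the full argument list.
    How textra names its output option is not assumed here; the caller decides.
    """
    with tempfile.TemporaryDirectory() as tmp:
        output_path = Path(tmp) / "output.txt"
        argv = build_argv(str(input_path), str(output_path))
        # check=True raises CalledProcessError if the tool exits with an error.
        subprocess.run(argv, check=True, capture_output=True, text=True)
        return output_path.read_text(encoding="utf-8")


def fake_ocr_argv(input_path, output_path):
    """Hypothetical stand-in for the real tool: writes a fixed string to output_path."""
    script = (
        "import sys; open(sys.argv[2], 'w', encoding='utf-8')"
        ".write('text from ' + sys.argv[1])"
    )
    return [sys.executable, "-c", script, input_path, output_path]


text = run_and_capture_text(fake_ocr_argv, "page.png")
print(text)
```

To use this with textra, replace `fake_ocr_argv` with a builder that returns the real command line for your installed version. The temporary directory is removed automatically once the text has been read.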

ImageAnalyzer running VNRecognizeTextRequest in the background?

Using Apple's Vision and VisionKit frameworks, there are two main ways to get text from images: VNRecognizeTextRequest (Vision) and ImageAnalyzer (VisionKit).

Based on some OCR tests, I'm seeing that the outputs from these two methods are different. Initially, I thought ImageAnalyzer was running VNRequestTextRecognitionLevel.fast because it's for Live Text, but the outputs from ImageAnalyzer are sometimes better than VNRequestTextRecognitionLevel.accurate.

VNRecognizeTextRequest does have more options, including language correction and custom words.

Do you know what ImageAnalyzer is calling in the background? Is it essentially running VNRecognizeTextRequest, or is it a separate model/pipeline? And this naturally raises the question: which model would be better for OCR? My initial tests show pretty similar performance in aggregate between ImageAnalyzer and VNRequestTextRecognitionLevel.accurate, but the results per test case can sometimes vary a lot between the two.
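One way to make the "similar in aggregate, variable per test case" observation concrete is to score each engine's output against a reference text and look at both the per-case scores and the mean. The sketch below assumes the OCR outputs have already been collected as strings; the function names are hypothetical, the sample strings are made-up placeholders rather than real OCR results, and character-level similarity from `difflib` is just one possible metric.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(expected, actual):
    """Character-level similarity in [0, 1] between reference text and OCR output."""
    return SequenceMatcher(None, expected, actual).ratio()


def compare_engines(cases):
    """Score two engines on a list of (reference, output_a, output_b) strings.

    Returns the per-case score pairs and the mean score of each engine.
    """
    per_case = [(similarity(ref, a), similarity(ref, b)) for ref, a, b in cases]
    mean_a = mean(score_a for score_a, _ in per_case)
    mean_b = mean(score_b for _, score_b in per_case)
    return per_case, mean_a, mean_b


# Made-up placeholder strings, NOT real OCR results from either API.
cases = [
    ("hello world", "hello world", "hello w0rld"),
    ("invoice 42", "inv0ice 42", "invoice 42"),
]

per_case, mean_a, mean_b = compare_engines(cases)
for index, (score_a, score_b) in enumerate(per_case):
    print(f"case {index}: A={score_a:.2f} B={score_b:.2f} diff={score_a - score_b:+.2f}")
print(f"mean: A={mean_a:.2f} B={mean_b:.2f}")
```

Two engines can have close means while the per-case differences swing in both directions, which is the pattern described above.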

For the record, and in case this is outside your area of expertise, I've also asked the same question on the Apple Developer Forums.

Add support for online speech recognition (behind a flag)

Apparently, on-device speech recognition is only supported on some Mac devices.

See: https://support.apple.com/guide/mac-help/if-dictation-on-mac-doesnt-work-as-expected-mchlc480652b/mac

This issue tracks adding support for Apple's online (server-based) speech recognition. Since it would require sending data to Apple's servers, it should not be enabled by default and should require a flag.

Also, the current error message, "ERROR: Speech recognizer does not support on-device recognition", is not very descriptive. It could be revised to include more context or to suggest the proposed flag.

Package manager support

Seems like a great project that would benefit from being submitted to Homebrew?

locale support?

Hello,
I get the following error when trying to convert a file. I think it might be because either my system locale is set to Spanish (probably) or because the file I tried to convert was in Spanish. Any hints? The system should have the locale resources installed. Or does this recognizer only work when the system is set to English?

2023-04-03 15:20:48.276 textra[54391:4225664] Required assets are not available for Locale:es_ES
ERROR: Speech recognizer does not support on-device recognition
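For anyone wrapping textra in a script, the two log lines above can be turned into a more descriptive message. This is only a sketch: `explain_failure` is a hypothetical helper, and the pattern is based solely on the log format shown in this report, which may differ between macOS versions.

```python
import re

# Matches the locale in lines like:
#   "... Required assets are not available for Locale:es_ES"
ASSETS_PATTERN = re.compile(
    r"Required assets are not available for Locale:\s*"
    r"([A-Za-z]+(?:[_-][A-Za-z0-9]+)*)"
)
ON_DEVICE_ERROR = "Speech recognizer does not support on-device recognition"


def explain_failure(stderr_text):
    """Return a friendlier message for the on-device failure, or None if absent."""
    if ON_DEVICE_ERROR not in stderr_text:
        return None
    match = ASSETS_PATTERN.search(stderr_text)
    locale = match.group(1) if match else "unknown"
    return (
        f"On-device speech recognition is unavailable for locale {locale!r}: "
        "the log reports that the required assets are not available."
    )


log = (
    "2023-04-03 15:20:48.276 textra[54391:4225664] "
    "Required assets are not available for Locale:es_ES\n"
    "ERROR: Speech recognizer does not support on-device recognition\n"
)
print(explain_failure(log))
```

The function returns None for output that does not contain the on-device error, so it can be called on any captured stderr.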

2023-03-24 13:33:04.515 textra[46484:13310980] Required assets are not available for Locale:en-US ERROR: Speech recognizer does not support on-device recognition

When running textra on an MP3 audio file, I get the following error:

2023-03-24 13:33:04.515 textra[46484:13310980] Required assets are not available for Locale:en-US
ERROR: Speech recognizer does not support on-device recognition


Support for Recognizing Structured Text?

any plans to add ability to embed the extracted text back into the pdf?

Textra fails on images that are too large
