Comments (4)
Actually, the plugin is currently processing every PDF regardless of whether it contains only images or also a text layer. Try it out yourself.
However, I would like to add an option so that PDFs that already contain some text can be skipped (filtered out) from OCR processing. I imagine this as an option that can be checked or unchecked, but it is still a TODO mentioned somewhere in the code.
BTW the plugin here does not use ocrmypdf.
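A minimal sketch of what such a skip option could check, written in Python for illustration only. It is not the plugin's code. It assumes poppler's `pdftotext` is on the PATH; the names `has_text_layer`, `extract_text`, `should_ocr`, `skip_if_text` and the 20-character threshold are all hypothetical.

```python
import subprocess


def has_text_layer(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True if the text has at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; it only guards against
    stray characters in an otherwise image-only PDF.
    """
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def extract_text(pdf_path: str) -> str:
    """Run poppler's pdftotext and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def should_ocr(pdf_path: str, skip_if_text: bool) -> bool:
    """Decide whether a PDF should be sent to OCR.

    With the (hypothetical) option unchecked, every PDF is processed,
    which matches the current behaviour described above.
    """
    if not skip_if_text:
        return True
    return not has_text_layer(extract_text(pdf_path))
```

With `skip_if_text=False` the function never calls `pdftotext`, so the default behaviour stays the same as today.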
from zotero-ocr.
Thank you very much for your response.
"Actually, the plugin is currently processing every PDF regardless of whether it contains only images or also a text layer. Try it out yourself."
This is true. It was entirely my error: I hadn't realised that I had changed my language setting to German while trying to process an English text.
"However, I would like to add an option so that PDFs that already contain some text can be skipped (filtered out) from OCR processing. I imagine this as an option that can be checked or unchecked, but it is still a TODO mentioned somewhere in the code."
This sounds very interesting; I would really like to see it.
As I was wrong though, I will close this issue as it actually is a non-issue. Thank you for your time and the response!
@opal06 For languages written in Latin script, such as English, German, etc., you can also try script/Latin, which usually works quite nicely. It also avoids the problem that a German text often contains French, Spanish, ... words, e.g. in names.
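As a rough illustration of the difference, here is a small Python sketch that builds a Tesseract command line. It assumes the `tesseract` binary is installed together with the `script/Latin` traineddata file; the helper name `build_tesseract_command` is hypothetical, and how the plugin itself calls Tesseract is not shown here.

```python
def build_tesseract_command(image_path, output_base, languages=("script/Latin",)):
    """Build a tesseract invocation that writes a searchable PDF.

    Several models can be combined with '+', e.g. ("eng", "deu"),
    while "script/Latin" is a single model covering Latin-script languages.
    """
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg, "pdf"]


# One script-level model instead of a specific language:
print(build_tesseract_command("page.png", "page"))
# Explicit combination of two language models:
print(build_tesseract_command("page.png", "page", ("eng", "deu")))
```

The resulting list can be passed to `subprocess.run` as is. The script-level model avoids having to guess which languages occur in a given document.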
This is very useful information, thank you very much. Just earlier today I was trying to find out whether there was multi-language support.