Comments (3)
Thanks for the suggestion. It should actually work with any language supported by Tesseract now (I disable spellchecking for those cases). Just add the language to settings.py and set its spell check language to None (see the examples for Chinese, etc.).
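For anyone reading later, here is a minimal sketch of what that kind of configuration could look like. The dictionary names, the `spellcheck_enabled` helper, and the Korean entry are assumptions made for illustration, not necessarily what marker's settings.py actually contains, so check the real file (and its Chinese entry) before copying anything.

```python
from typing import Dict, Optional

# Hypothetical mapping from a display name to a Tesseract language code.
# The real setting name in settings.py may differ.
TESSERACT_LANGUAGES: Dict[str, str] = {
    "English": "eng",
    "Chinese": "chi_sim",
    "Korean": "kor",  # example of a newly added language
}

# Hypothetical mapping from a display name to a spell check language.
# None means "skip spellchecking for this language".
SPELLCHECK_LANGUAGES: Dict[str, Optional[str]] = {
    "English": "en",
    "Chinese": None,
    "Korean": None,  # no spell checker configured, so set it to None
}


def spellcheck_enabled(language: str) -> bool:
    """Return True only if a spell check language is configured."""
    return SPELLCHECK_LANGUAGES.get(language) is not None
```

With a layout like this, OCR would still run for the new language through its Tesseract code, while the spell check step is skipped for it.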
from marker.
I'm planning to get rid of this entirely, btw, by training a model to tell if OCR was bad instead of relying on heuristics. ETA is the next month or two.
Thx, None has worked.