Comments (5)
Hi there!
You should be able to specify the OCR agent with the OCR_AGENT environment variable (from link).
Please set it like this, taking the value from the constant definition here:
export OCR_AGENT="unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"
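In a Python script, you can make the same setting through os.environ. The sketch below uses only the standard library. It assumes that setting the variable before importing or calling unstructured is enough for the library to pick it up; exactly when unstructured reads the variable is not confirmed here, so setting it first is the cautious choice.

```python
import os

# Python equivalent of:
#   export OCR_AGENT="unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"
# In the shell, the surrounding double quotes are removed before the value is exported.
# In Python, the string literal's own quotes delimit the value, so no extra quotes go inside.
PADDLE_AGENT = "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"
os.environ["OCR_AGENT"] = PADDLE_AGENT

# Do this before importing or calling unstructured (assumption: the value is read
# when the library looks it up, so it must already be in the environment).
print(os.environ["OCR_AGENT"])
```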
Hi @Timotheevin! We still need to update our documentation, but you can specify which agent to use, or even provide your own, with the OCR_AGENT environment variable.
Here's the commit where this was added: https://github.com/Unstructured-IO/unstructured/pull/2462/files
Hi, may I ask which version I should use to enable this feature?
Hi @peixin-lin,
It looks like this was introduced in version 0.12.4, so 0.12.4 or any later version should be fine.
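To check whether an installed copy is new enough, you could use something like the following. It is a sketch that uses only the standard library. The 0.12.4 threshold comes from the reply above, and the helper names parse_release and supports_ocr_agent_env are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release that, per the reply above, introduced the OCR_AGENT variable.
MIN_VERSION = (0, 12, 4)


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '0.12.4' or '0.12.4.dev1' into (0, 12, 4).

    Simplification: only the first three numeric components are compared, so
    pre-release/dev suffixes are ignored (a 0.12.4 dev build counts as 0.12.4).
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in m.groups())


def supports_ocr_agent_env(v: str) -> bool:
    # Tuple comparison is element by element, so (0, 13, 0) >= (0, 12, 4).
    return parse_release(v) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("unstructured")
    except PackageNotFoundError:
        print("unstructured is not installed")
    else:
        print(installed, "supports OCR_AGENT:", supports_ocr_agent_env(installed))
```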
Hello,
I tried setting os.environ['OCR_AGENT'] = '"unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"'
but I get this error:
ValueError: Environment variable OCR_AGENT must be set to an existing OCR agent module, not unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle.
But that is exactly how the environment variable should be set, isn't it?
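One difference from the shell example earlier in the thread is the extra pair of double quotes inside the Python string. In the shell, the quotes around the value are removed before it is exported, but os.environ stores them as literal characters. It is not certain that this is what triggers this particular ValueError, because the value printed in the error shows no quotes. The sketch below shows how a dotted path with embedded quotes fails to import. It assumes the library resolves OCR_AGENT by importing the module part of the path and then looking up the class name. load_class_from_dotted_path is a hypothetical helper, not unstructured's code, and collections.OrderedDict stands in for the PaddleOCR agent so that the example runs with the standard library only.

```python
import importlib


def load_class_from_dotted_path(dotted_path: str) -> type:
    """Hypothetical helper: import 'package.module.ClassName' and return the class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # ImportError if the module path is wrong
    return getattr(module, class_name)  # AttributeError if the class name is wrong


good = "collections.OrderedDict"
quoted = '"collections.OrderedDict"'  # same value with literal quote characters embedded

print(load_class_from_dotted_path(good))  # <class 'collections.OrderedDict'>

try:
    load_class_from_dotted_path(quoted)
    quoted_ok = True
except ImportError as exc:
    # The module part becomes '"collections', which no finder can locate.
    quoted_ok = False
    print("failed:", exc)
```

If this is the cause, writing the value without inner quotes, as in os.environ['OCR_AGENT'] = 'unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle', would likely avoid it. A missing PaddleOCR dependency could also make the import fail, which this sketch does not cover.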