Comments (8)
Does unpkg support langPath?
Here is a working example that uses unpkg for all 3 resources: `corePath`/`langPath`/`workerPath`.
const lang = 'eng';
const langPath = `https://unpkg.com/@tesseract.js-data/${lang}/4.0.0_best_int`;
// A worker is created once and used every time a user uploads a new file.
const worker = await Tesseract.createWorker(lang, 1, {
  corePath: 'https://unpkg.com/tesseract.js-core@v5',
  workerPath: 'https://unpkg.com/tesseract.js@v5/dist/worker.min.js',
  langPath: langPath,
  logger: function (m) { console.log(m); }
});
This loads the LSTM-only data, so it will only work with `oem` set to `1` (the default). To use the Legacy model, you would replace `4.0.0_best_int` with `4.0.0`. That data is significantly larger, so do not do that unless you are actually using the Legacy model.
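The choice between the two data sets could be wrapped in a small helper. This is only a minimal sketch: `langPathFor` is a hypothetical name, not part of Tesseract.js, and the URL layout is assumed from the example above.

```javascript
// Hypothetical helper (not part of Tesseract.js): builds the unpkg langPath
// for a language, following the URL layout used in the example above.
function langPathFor(lang, useLegacy = false) {
  // 4.0.0_best_int: LSTM-only data (works with oem 1, the default).
  // 4.0.0: the significantly larger data needed for the Legacy model.
  const variant = useLegacy ? '4.0.0' : '4.0.0_best_int';
  return `https://unpkg.com/@tesseract.js-data/${lang}/${variant}`;
}

console.log(langPathFor('eng'));
console.log(langPathFor('eng', true));
```

The result would be passed as `langPath` in the options given to `Tesseract.createWorker`.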
from tesseract.js.
Users may run Tesseract.js in browsers without having a site of their own to host the files, for example in a userscript: https://greasyfork.org/scripts/482236/code
If you do not want to host these files yourself, you can set `workerPath`/`langPath`/`corePath` to an alternative CDN. For example, you could try unpkg.
I agree that having a default CDN that does not support China is not ideal, and would be open to adding some fallback for the default CDN in the future. However, if individual developers want to be sure that mainland China is supported in their applications, I believe that the existing options in Tesseract.js do allow them to do that.
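Until such a fallback exists in the library, an application could try several CDNs in turn itself. The following is a rough sketch, not something Tesseract.js provides: `createWorkerWithFallback` is a hypothetical helper, and it assumes that a failed resource load either rejects the promise or hangs, which is why each attempt gets a timeout.

```javascript
// Hypothetical helper (not part of Tesseract.js): tries each set of options in
// order and resolves with the first worker that is created successfully.
// `createWorker` is injected, e.g. (opts) => Tesseract.createWorker('eng', 1, opts).
async function createWorkerWithFallback(createWorker, optionSets, timeoutMs = 15000) {
  let lastError = new Error('No option sets provided');
  for (const options of optionSets) {
    let timer;
    // Rejects if the attempt takes longer than timeoutMs (assumed failure mode: a hang).
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('Timed out creating worker')), timeoutMs);
    });
    try {
      return await Promise.race([createWorker(options), timeout]);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next CDN
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}

// Usage sketch (browser): default CDN first, then the unpkg paths from this thread.
// const worker = await createWorkerWithFallback(
//   (opts) => Tesseract.createWorker('eng', 1, opts),
//   [
//     {},
//     {
//       corePath: 'https://unpkg.com/tesseract.js-core@v5',
//       workerPath: 'https://unpkg.com/tesseract.js@v5/dist/worker.min.js',
//       langPath: 'https://unpkg.com/@tesseract.js-data/eng/4.0.0_best_int',
//     },
//   ]
// );
```

Note that an attempt abandoned by the timeout is not cleaned up here, so a real implementation would probably also want to terminate the stalled worker.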
Converting `*.traineddata.gz` files to pure JavaScript could solve this issue. `*.js` files could be saved and synced anywhere.
@ivysrono There are 3 types of files loaded from the CDN by default for browser (`workerPath`, `langPath`, `corePath`) and 1 type of file loaded from the CDN for Node.js (`langPath`). For all of these files, the jsDelivr CDN is simply the default value; you do not need to use it. All of these files can be hosted on your site (for browser) or local file system (for Node.js), and `workerPath`, `langPath`, and `corePath` can be changed to point to those. This is explained in the following document.
https://github.com/naptha/tesseract.js/blob/master/docs/local-installation.md
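As a rough illustration of pointing all three options at self-hosted copies, here is a minimal sketch. `selfHostedOptions` is a hypothetical helper, and the file and directory names under the base URL are assumptions; the linked document is the authority on which files need to be copied.

```javascript
// Hypothetical helper (not part of Tesseract.js): builds options pointing at
// self-hosted copies of the three resources. The layout under `baseUrl`
// (worker.min.js, core/, lang-data/) is an assumption, not a requirement.
function selfHostedOptions(baseUrl) {
  const base = baseUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return {
    workerPath: `${base}/worker.min.js`,
    corePath: `${base}/core`,
    langPath: `${base}/lang-data`,
  };
}

console.log(selfHostedOptions('/vendor/tesseract/'));

// Usage sketch (browser):
// const worker = await Tesseract.createWorker('eng', 1, selfHostedOptions('/vendor/tesseract'));
```

For Node.js, only `langPath` should need to be overridden, since that is the one type of file loaded from the CDN there.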
Does unpkg support langPath?
Thank you very much, I will try.
@Balearica - just wanted to say thank you for saving me some time with that comment. Cheers 👍