tessdata_fast's People

Contributors

jbreiden2, shreeshrii, stweil, tfmorris, theraysmith, zdenop

tessdata_fast's Issues

Tesseract_fast trained data cannot be used in .NET wrapper Tesseract4.0 engine

I am using the Tesseract 4.0 .NET wrapper DLL and trying to use the tessdata_fast trained data, but the engine does not get initialized and throws an unhandled exception (read/write). I tried the engine modes "LstmOnly" and "Default" but could not initialize it. Kindly tell me whether I am using it the wrong way, or whether I should not use this fast traineddata at all.
The error is as follows:
[image]

Trained data doesn't seem to be working

I downloaded por.traineddata (for the Portuguese language) to the StreamingAssets/tessdata folder and I get "TessAPIInit failed. Output: -1". Should this be enough? I notice there are a bunch of different files starting with eng, like "eng.cube.params", "eng.cube.nn" and many others. Should there be the same for Portuguese?

Can we use the fast dataset with a Java program? Is it supported?

On Windows 7, I am trying to use the fast traineddata files in my Java project, but I get "Invalid memory access" when using them, even after setting the datapath.

I have also tried the best data files, which give the same error. The default data files work, but they are huge, so I wanted to use the fast files instead.

Tesseract tess = new Tesseract();

tess.setDatapath("C:\\Users\\U6070534\\Downloads\\tess4j\\tessdata");
tess.setLanguage("eng");

String inputFilePath = "C:\\Users\\U6070534\\IdeaProjects\\ocrsample\\screenshot\\craft0.png";
try {
    textpath.add(tess.doOCR(new File(inputFilePath)));
} catch (TesseractException e1) {
    e1.printStackTrace();
}

Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:470)
at com.sun.jna.Function.invoke(Function.java:404)
at com.sun.jna.Function.invoke(Function.java:315)
at com.sun.jna.Library$Handler.invoke(Library.java:212)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:437)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:292)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197)
at OcrReader.main(OcrReader.java:25)
Failed loading language 'eng'
Tesseract couldn't load any languages!
Process finished with exit code 1
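
The output above ends with "Failed loading language 'eng'", so one thing worth ruling out before calling doOCR is whether eng.traineddata is actually present in the directory passed to setDatapath. The following is a minimal sketch of such a pre-flight check using only the Java standard library. The class and method names (TessdataCheck, trainedDataFile, isUsable) are hypothetical and not part of Tess4J. The check only confirms that the file exists and is non-empty. It does not confirm that the file is compatible with the native Tesseract version bundled with the wrapper, which may be the real cause here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TessdataCheck {
    /** Expected location of <lang>.traineddata inside the data directory. */
    static Path trainedDataFile(Path dataDir, String lang) {
        return dataDir.resolve(lang + ".traineddata");
    }

    /** True only if the file exists, is a regular file, and is not empty. */
    static boolean isUsable(Path dataDir, String lang) {
        Path file = trainedDataFile(dataDir, lang);
        return Files.isRegularFile(file) && file.toFile().length() > 0;
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass the real datapath and language as arguments.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        String lang = args.length > 1 ? args[1] : "eng";
        if (isUsable(dataDir, lang)) {
            System.out.println("Found " + trainedDataFile(dataDir, lang));
        } else {
            System.out.println("Missing or empty: " + trainedDataFile(dataDir, lang));
        }
    }
}
```

If the file is reported as found and loading still fails, the problem lies elsewhere, for example in the match between the traineddata format and the engine version.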

Create a new tessdata_fast release/tag?

As emailed to the mailing list, does it make sense to tag another release?

> Hi,
>
> With Tesseract now switching to regular (alpha) releases of 5.0.0; does
> it make sense to consider some versioning for language files as well?
>
> The Internet Archive has switched to using Tesseract for all our OCR,
> and I'm hoping that we can record exactly what version of language files
> was used for a specific OCR job. Currently, the answer is simple, since
> we're using the default packages from Ubuntu focal, but I am working on
> switching to Tesseract release/tag 5.0.0-20201231.
>
> But the tessdata_fast (or tessdata_best, for that matter) do not seem to
> have any recent 5.x releases:
> https://github.com/tesseract-ocr/tessdata_fast/releases
>
> Are there plans to create a release/tag for the tessdata_* repositories?
>
> Cheers,
> Merlijn

And the follow-up:

On 27/01/2021 12:42, Shree Devi Kumar wrote:
>> The Internet Archive has switched to using Tesseract for all our OCR,
> 
> I am so happy to hear this. It will be great to have the Indic languages
> that were marked as non-ocrable so far be converted to text correctly on
> Internet Archive.
> 
> Is there any page with instructions to do this? Can a language be specified
> while OCRing? eg. Better results are many times received using
> script/Devanagari instead of san for Sanskrit.
> 
> Regarding your question about tessdata, there have only been minor changes
> to tessdata files but adding a tag is a good idea. I suggest you post this
> as a feature request in the repo.
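
Until the repository carries tags, one way to record exactly which language file an OCR job used is to store a content hash of the traineddata file next to the OCR output. The sketch below shows that idea with the Java standard library only. It is an illustration and not something proposed in the thread; the class name LanguageFileFingerprint and the method sha256Hex are hypothetical. It reads the whole file into memory, which assumes the file is small enough for that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class LanguageFileFingerprint {
    /** Returns the SHA-256 digest of the given bytes as lowercase hex. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LanguageFileFingerprint <path-to-traineddata>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Prints "<file name> <hash>", suitable for storing with the OCR job metadata.
        System.out.println(file.getFileName() + " " + sha256Hex(Files.readAllBytes(file)));
    }
}
```

A hash identifies the file contents regardless of where it was downloaded from, but unlike a tag it says nothing about ordering or release history.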

Duplicate name problem with Lao / lao

Installations on case-insensitive filesystems (macOS, Windows, ...) cannot install both Lao.traineddata and lao.traineddata. Users might also find it confusing to tell the difference between -l Lao and -l lao.

I suggest renaming the first one. Which name would be suitable?
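
The collision described above can be detected mechanically by grouping file names by their lowercase form. The sketch below shows one way to do that in Java. The class name CaseCollisionCheck and the method collisions are hypothetical, and the three file names in main are only sample input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CaseCollisionCheck {
    /** Groups names that differ only by letter case; keeps groups with two or more members. */
    static Map<String, List<String>> collisions(List<String> fileNames) {
        Map<String, List<String>> byLowerCase = new TreeMap<>();
        for (String name : fileNames) {
            byLowerCase
                .computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                .add(name);
        }
        // Drop names that are unique even when case is ignored.
        byLowerCase.values().removeIf(group -> group.size() < 2);
        return byLowerCase;
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("Lao.traineddata", "lao.traineddata", "eng.traineddata");
        System.out.println(collisions(names));
    }
}
```

For the sample input, only the Lao/lao pair is reported. A check like this could run over the repository's file list to catch similar pairs before they are added.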

equ.traineddata is not included

According to the wiki, the equ and osd trained data reuse the 3.x data files. Oddly, osd has been copied into this repository but equ has not. Is there a reason for this, for example that equ is deprecated in 4.x?
