How to export a Keras model of the English language? Is it possible to export the corpus t

convert eng training to h5 model (tessdata_best, 3 comments, open)

tesseract-ocr commented on July 17, 2024

convert eng training to h5 model

from tessdata_best.

Comments (3)

stweil commented on July 17, 2024

Good question. Tesseract uses its own model file format. But it should be possible to convert the included neural network to any other model format which supports the same network specification.

We still have to find someone who wants to implement that (and the conversion in the other direction, too).


stefan6419846 commented on July 17, 2024

Is there any documentation available on the model file format Tesseract uses (*.traineddata file format specification)?


stweil commented on July 17, 2024

The command-line tool combine_tessdata can list and extract all components of a model file:

% combine_tessdata -d /opt/homebrew/share/tessdata/eng.traineddata 
Version:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054
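The listing is regular enough to process with a script. The Python sketch below (not part of Tesseract) parses the output of `combine_tessdata -d` and checks that each component starts exactly where the previous one ends, which holds for the eng listing shown (192 + 401636 = 401828, 401828 + 4322 = 406150, and so on). The names `Component`, `parse_directory` and `is_contiguous` are hypothetical helpers introduced here for illustration.

```python
import re
from dataclasses import dataclass

# Matches lines such as "17:lstm:size=401636, offset=192" printed by
# `combine_tessdata -d`; the leading "Version:..." line does not match.
ENTRY_RE = re.compile(r"^(\d+):([\w-]+):size=(\d+), offset=(\d+)$")


@dataclass
class Component:
    index: int
    name: str
    size: int
    offset: int


def parse_directory(listing: str) -> list[Component]:
    """Parse the component table printed by `combine_tessdata -d`."""
    components = []
    for line in listing.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            index, name, size, offset = match.groups()
            components.append(Component(int(index), name, int(size), int(offset)))
    return components


def is_contiguous(components: list[Component]) -> bool:
    """True if every component begins where the previous one ends."""
    return all(
        prev.offset + prev.size == cur.offset
        for prev, cur in zip(components, components[1:])
    )


# The eng listing from the comment above, pasted verbatim.
LISTING = """\
Version:4.00.00alpha:eng:synth20170629
17:lstm:size=401636, offset=192
18:lstm-punc-dawg:size=4322, offset=401828
19:lstm-word-dawg:size=3694794, offset=406150
20:lstm-number-dawg:size=4738, offset=4100944
21:lstm-unicharset:size=6360, offset=4105682
22:lstm-recoder:size=1012, offset=4112042
23:version:size=30, offset=4113054
"""

if __name__ == "__main__":
    comps = parse_directory(LISTING)
    for c in comps:
        print(f"{c.index:>2} {c.name:18} {c.size:>9} bytes")
    print("contiguous:", is_contiguous(comps))
```

The lstm component is about 400 KB, while the word dawg dominates the file at roughly 3.7 MB.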

Another tool, dawg2wordlist, can convert the dawg components to plain-text word lists, and the unicharset is already a text file. That's the easy part.
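A minimal Python sketch of that easy part. It only assembles the shell commands and does not run them. Assumptions: `combine_tessdata -u FILE PREFIX` writes every component to PREFIX followed by the component name (e.g. `eng.lstm-word-dawg`), and dawg2wordlist takes the unicharset, the dawg and the output file in that order. Check the usage text each tool prints when run without arguments. The prefix `eng.` and the `.txt` output names are illustrative choices, and `unpack_commands` is a hypothetical helper.

```python
import shlex

# Dawg component names as they appear in the `combine_tessdata -d` listing.
DAWG_COMPONENTS = ["lstm-punc-dawg", "lstm-word-dawg", "lstm-number-dawg"]


def unpack_commands(traineddata: str, prefix: str) -> list[list[str]]:
    """Build (but do not execute) commands that unpack a model and dump its dawgs.

    Assumed CLI forms (verify locally):
      combine_tessdata -u TRAINEDDATA PREFIX
      dawg2wordlist UNICHARSET DAWG WORDLIST
    """
    commands = [["combine_tessdata", "-u", traineddata, prefix]]
    unicharset = f"{prefix}lstm-unicharset"  # already plain text, no conversion needed
    for component in DAWG_COMPONENTS:
        dawg = f"{prefix}{component}"
        wordlist = f"{prefix}{component.replace('-dawg', '')}.txt"  # hypothetical output name
        commands.append(["dawg2wordlist", unicharset, dawg, wordlist])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them in a shell.
    for cmd in unpack_commands("eng.traineddata", "eng."):
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch usable on machines without Tesseract installed and makes the assumed argument orders easy to check.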

The interesting part is the lstm component, which contains the neural network. Its format is not documented, so the program code is the reference. Look for DeSerialize in the lstm code.
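Before any of the network can be decoded, the raw lstm bytes have to be sliced out of the container. The Python sketch below assumes a container layout recalled from Tesseract's TessdataManager, not from a published specification: a little-endian int32 entry count, then that many int64 offsets, with -1 marking an absent component. Each present component is assumed to run until the next present offset or the end of the file. If this is right, 24 entries × 8 bytes = 192 would explain the offset shown for lstm, with the 4-byte count not included in the printed offsets. Byte-swapped files are not handled. `split_traineddata` and `LSTM_INDEX` are hypothetical names, and the sketch only returns raw bytes. Interpreting them still means porting the DeSerialize code.

```python
import struct
import sys

# Index of the lstm component, taken from the `combine_tessdata -d` listing.
LSTM_INDEX = 17


def split_traineddata(data: bytes) -> dict[int, bytes]:
    """Return {component index: raw bytes} for every present component.

    ASSUMED layout: int32 count, int64 offsets[count] (little-endian),
    -1 for missing components, components stored back to back.
    """
    (num_entries,) = struct.unpack_from("<i", data, 0)
    if not 0 < num_entries <= 1000:  # sanity bound chosen for this sketch
        raise ValueError("unexpected header; byte order or format may differ")
    offsets = struct.unpack_from(f"<{num_entries}q", data, 4)
    present = [(i, off) for i, off in enumerate(offsets) if off >= 0]
    components = {}
    for k, (index, start) in enumerate(present):
        # A component ends where the next present one starts, or at end of file.
        end = present[k + 1][1] if k + 1 < len(present) else len(data)
        components[index] = data[start:end]
    return components


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        parts = split_traineddata(f.read())
    print(f"lstm component: {len(parts[LSTM_INDEX])} bytes")
```

Comparing the printed size with the `size=` value from `combine_tessdata -d` (401636 for eng) is a quick way to test the assumed layout.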
