Hi,
Please excuse me if this is the wrong place to report the issue; I was not sure whether it belongs in this repository or in https://github.com/tesseract-ocr/tesstrain.
In short, the --exposures option is not respected.
According to arguments.py, which reads:
parser.add_argument(
    "--exposures",
    metavar="EXPOSURES",
    action="append",
    nargs="+",
    help="A list of exposure levels to use (e.g. -1,0,1).",
)
the option can be used more than once and accepts more than one value. Therefore I tried the following:
$ python -m tesstrain --exposures 1 5 ...
then in tesseract.log we can see the following line, which indicates that the values provided on the command line are overridden:
... - DEBUG - tesstrain.language_specific - exposures = [0] (was [['1', '5']])
An even more complex case:
$ python -m tesstrain --exposures 1 5 ... --exposures -1 ...
then in tesseract.log:
... DEBUG - tesstrain.language_specific - exposures = [0] (was [['1', '5'], ['-1']])
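For reference, the nested list shown in the log is what argparse itself produces when action="append" is combined with nargs="+". The following is a minimal standalone sketch, not the package's code: it builds a fresh parser with the same option definition as quoted above, and parse_exposures is a hypothetical helper used only for illustration.

```python
import argparse

# Standalone parser that copies the option definition quoted from arguments.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--exposures",
    metavar="EXPOSURES",
    action="append",
    nargs="+",
    help="A list of exposure levels to use (e.g. -1,0,1).",
)


def parse_exposures(argv):
    """Return the raw value argparse stores for --exposures (hypothetical helper)."""
    return parser.parse_args(argv).exposures


# Each occurrence of the option appends one list of strings.
print(parse_exposures(["--exposures", "1", "5"]))
# → [['1', '5']]
print(parse_exposures(["--exposures", "1", "5", "--exposures", "-1"]))
# → [['1', '5'], ['-1']]
# When the option is not given at all, the value is None.
print(parse_exposures([]))
# → None
```

So the value that reaches the rest of the code is a list of lists of strings (or None), not a flat list of integers.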
I would expect the files *.exp1.{tif,box,lstmf}, *.exp5.{tif,box,lstmf}, and *.exp-1.{tif,box,lstmf} to be generated, but only the files for exposure=0 are present: *.exp0.{tif,box,lstmf}
In the code, the value provided on the command line is overridden by these lines https://github.com/stefan6419846/tesstrain_package/blob/main/tesstrain/language_specific.py#L1327-L1328 that read:
if not EXPOSURES:
    EXPOSURES = [0]
If I understand correctly, the root cause is this line https://github.com/stefan6419846/tesstrain_package/blob/main/tesstrain/language_specific.py#L920, which does not use the values from the command line:
EXPOSURES: List[int] = []
If it is changed to something like:
EXPOSURES: List[int] = [v for vs in ctx.exposures for v in vs]
this should fix the issue.
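A slightly more defensive variant of the same idea is sketched below. It assumes that ctx.exposures holds the raw argparse value, that is, None when the option is absent and otherwise a list of lists of strings, and that the rest of the code expects integers, as the List[int] annotation suggests. The name flatten_exposures is hypothetical and not part of tesstrain.

```python
from typing import List, Optional, Sequence


def flatten_exposures(raw: Optional[Sequence[Sequence[str]]]) -> List[int]:
    """Flatten argparse's list of lists into a flat list of ints.

    Hypothetical helper, not part of tesstrain. Returns an empty list when
    the option was not given, so an existing `if not EXPOSURES` fallback
    to [0] would still apply.
    """
    if not raw:
        return []
    return [int(value) for group in raw for value in group]


print(flatten_exposures([["1", "5"], ["-1"]]))
# → [1, 5, -1]
print(flatten_exposures(None))
# → []
```

With such a helper, the assignment could become EXPOSURES: List[int] = flatten_exposures(ctx.exposures), assuming ctx is available at that point as in the suggestion above.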
Thanks again,
BR, Nikolai