bartzi / kiss
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"
License: GNU General Public License v3.0
Hello,
I want to check the performance of the pretrained model, but the link is not clickable and the model cannot be downloaded; the same goes for some other files.
Can you check on it please?
Thank you.
Thanks for sharing the model. I just want to test the pretrained model you provided. Do I still need to download the image data (SynthText/MJSynth) if I'm using the pretrained model? If not, how can I run the pretrained model on test datasets like CUTE80, ICDAR, etc.? I have already downloaded the datasets (cute80, icdar2013, icdar2015, iiit5k, svt, svtp) and their respective npz files. How can I run the evaluation on these datasets?
Can we predict multiple words from a single image by changing num_words_per_image?
I tried changing it in recognizer_class in the evaluate.py file, but I am facing this error:
InvalidType:
Invalid operation is performed in: Reshape (Forward)
Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0
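The error can be reproduced arithmetically: Chainer's Reshape checks that the total number of elements is divisible by the known part of the target shape, and halving/doubling num_words_per_image changes the flattened feature size so the check fails. A minimal sketch (the shapes here are taken from the error message, not from the repository's code):

```python
import numpy as np

known_size = 3072          # flattened feature size the recognizer expects
x = np.zeros((1, 1536))    # actual tensor after changing num_words_per_image

# Chainer's Reshape performs this divisibility check before reshaping
# with a -1 ("unknown") dimension; a nonzero remainder raises InvalidType.
total = int(np.prod(x.shape))
print(total % known_size)  # 1536, not 0 -> the reported InvalidType error
```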
Also, can I ask why there are no spaces in char_map? (This might be what is needed to predict multiple words in an image.)
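To illustrate why a space entry in the char map matters: if the mapping from class ids to characters contains a space, the decoder can emit word boundaries directly. This is a hypothetical sketch; the ids and labels below are made up and do not reflect the repository's actual char_map file.

```python
# Hypothetical char_map: class id -> character (ids here are invented).
char_map = {0: "<blank>", 1: "a", 2: "b", 3: "c", 4: " "}

def decode(ids):
    """Turn predicted class ids into text, dropping the blank label."""
    return "".join(char_map[i] for i in ids if char_map[i] != "<blank>")

print(decode([1, 2, 4, 3, 1]))  # "ab ca" -- the space id separates two words
```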
I would like to start by saying I really enjoyed reading your paper and I am currently porting it to Pytorch.
I was going through the steps to download the synth data (outlined on your GitHub) and, during the filter_word_length step, I noticed that the "text" array in mjsynth.npz only contains the first letter of each word in the dataset. Is there a reason for this? Thank you.
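A quick way to check what the "text" array actually holds is to load the npz file with NumPy and inspect it. The sketch below builds a small stand-in file (the array names "file_name" and "text" are assumptions based on the question, not verified against the real mjsynth.npz):

```python
import numpy as np

# Build a tiny stand-in for the ground-truth npz file.
np.savez(
    "example_gt.npz",
    file_name=np.array(["0.jpg", "1.jpg"]),
    text=np.array(["hello", "world"]),
)

gt = np.load("example_gt.npz")
print(gt["text"])                   # full words should appear here
print([w[0] for w in gt["text"]])   # only first letters -> the reported bug
```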
Hello,
When I try to train the network with train_text_recognition.py using my own images,
I get a Windows fatal exception: access violation.
This is followed by several thread traces, mostly coming from chainer, tensorboard, and multi_node_mean.
Do you have any idea where that could come from?
Thank you in advance,
Clément
I cannot register a Baidu account, so I am not able to download this dataset.
Could anyone who has downloaded it share an alternative download link?
Thanks
As the project is really huge, I don't understand how you process the text labels. Usually, in an attention-based text recognizer, there are "[GO]" and "[EOS]" labels that get converted into numerical class ids. But I don't understand how, in the transformer model, the code converts the labels into class ids and generates the mask, as below:
kiss/text/transformer_recognizer.py
Line 23 in 25038d0
Your code also differs in how it generates the mask from
https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/76762bb08225014fb3055a9d07f0043aba972d68/transformer/Models.py#L169
Did you use "pad_idx", and where can I find it? What is the difference between using "pad_idx" and not using it? I'm really confused about "pad_idx", "GO_idx", and "EOS_idx": how do you process that part?
I don't quite know how to proceed with it. Could you give me some advice?
Is it possible to train on multi-gpus? Thanks!
@Bartzi Thank you for your hard work
Could you post the link for the pretrained model and the sample dataset?
Hi. It looks like there is a problem when I try to do the install on my MacBook with no GPU:
$ pip install -r requirements.txt
Collecting chainer==6.5.0
Downloading https://files.pythonhosted.org/packages/1d/59/aa63339001ca8e15ebb560d0c33333ef465c479e165d967e64c7611b6e67/chainer-6.5.0.tar.gz (876kB)
|████████████████████████████████| 880kB 508kB/s
Collecting chainercv==0.13.1
Downloading https://files.pythonhosted.org/packages/e8/1c/1f267ccf5ebdf1f63f1812fa0d2d0e6e35f0d08f63d2dcdb1351b0e77d85/chainercv-0.13.1.tar.gz (260kB)
|████████████████████████████████| 266kB 676kB/s
Collecting cupy==6.5.0
Downloading https://files.pythonhosted.org/packages/67/4b/6960cdfeee8bbfa12450da6b83206b57f6d6951a74043f055905449bb657/cupy-6.5.0.tar.gz (3.1MB)
|████████████████████████████████| 3.1MB 959kB/s
ERROR: Command errored out with exit status 1:
command: /Users/sebastienvincent/.virtualenvs/kiss/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"'; __file__='"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/pip-egg-info
cwd: /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/
Complete output (46 lines):
Options: {'package_name': 'cupy', 'long_description': None, 'wheel_libs': [], 'wheel_includes': [], 'no_rpath': False, 'profile': False, 'linetrace': False, 'annotate': False, 'no_cuda': False}
-------- Configuring Module: cuda --------
/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/tmpde2uw1h1/a.cpp:1:10: fatal error: 'cublas_v2.h' file not found
#include <cublas_v2.h>
^~~~~~~~~~~~~
1 error generated.
command 'gcc' failed with exit status 1
************************************************************
* CuPy Configuration Summary *
************************************************************
Build Environment:
Include directories: []
Library directories: []
nvcc command : (not found)
Environment Variables:
CFLAGS : (none)
LDFLAGS : (none)
LIBRARY_PATH : (none)
CUDA_PATH : (none)
NVTOOLSEXT_PATH : (none)
NVCC : (none)
Modules:
cuda : No
-> Include files not found: ['cublas_v2.h', 'cuda.h', 'cuda_profiler_api.h', 'cuda_runtime.h', 'cufft.h', 'curand.h', 'cusparse.h', 'nvrtc.h']
-> Check your CFLAGS environment variable.
ERROR: CUDA could not be found on your system.
Please refer to the Installation Guide for details:
https://docs-cupy.chainer.org/en/stable/install.html
************************************************************
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py", line 132, in <module>
ext_modules = cupy_setup_build.get_ext_modules()
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 632, in get_ext_modules
extensions = make_extensions(arg_options, compiler, use_cython)
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 387, in make_extensions
raise Exception('Your CUDA environment is invalid. '
Exception: Your CUDA environment is invalid. Please check above error log.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Is it possible to use kiss with CPU only?
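Chainer itself runs on the CPU without CuPy, so one workaround (an assumption about this repo, not something verified against it) is to install the requirements without the cupy pin, which is what fails to build on a machine without CUDA. A minimal sketch of filtering it out:

```python
# Sketch: strip the cupy pin from the requirements so pip never tries to
# build CUDA extensions on a GPU-less machine. The pins mirror the ones
# shown in the install log above.
requirements = ["chainer==6.5.0", "chainercv==0.13.1", "cupy==6.5.0"]
cpu_only = [r for r in requirements if not r.startswith("cupy")]
print(cpu_only)  # ['chainer==6.5.0', 'chainercv==0.13.1']
```

Whether the training and evaluation scripts themselves have a CPU code path is a separate question for the maintainer.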
Is there a way to test KISS on a single image? E.g.:
python recognize.py ./image1.jpg
which would return the recognized text in image1?
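Whatever entry point the repo exposes, a single image would first need to be turned into the network's input format. The sketch below shows a generic preprocessing step; the expected input layout (normalized float32, CHW, batch of one) is an assumption, not the repository's actual pipeline:

```python
import numpy as np

def preprocess(image):
    """Prepare an HWC uint8 image for a typical recognizer: scale to
    [0, 1], move channels first, and add a batch dimension."""
    x = image.astype(np.float32) / 255.0
    x = x.transpose(2, 0, 1)   # HWC -> CHW
    return x[np.newaxis]       # add batch axis

dummy = np.zeros((64, 200, 3), dtype=np.uint8)  # stand-in for image1.jpg
batch = preprocess(dummy)
print(batch.shape)  # (1, 3, 64, 200)
```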
Hey again,
I had a few questions about the loss functions you used for the localization net during training.
In the out-of-image loss calculation you add/subtract 1.5 to the bbox instead of +/- 1 (as in your paper); why do you do this?
Also, why do you use corner coordinates for the loss calculations?
Was the DirectionLoss used in your paper?
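An out-of-image penalty of this kind can be sketched as a hinge on the predicted corner coordinates in the normalized sampling-grid frame, where the +/- 1.5 in the code would simply be a looser bound than the [-1, 1] frame from the paper. This is a reconstruction from the question, not the repository's exact loss:

```python
import numpy as np

def out_of_image_loss(corners, bound=1.5):
    """Penalize corner coordinates outside [-bound, bound] in the
    normalized grid frame; zero when all corners stay inside."""
    return np.maximum(np.abs(corners) - bound, 0.0).sum()

inside = np.array([[-0.9, 0.9], [0.9, -0.9]])
outside = np.array([[-2.0, 0.5], [1.8, 0.0]])
print(out_of_image_loss(inside))   # 0.0 -- bbox fully inside the frame
print(out_of_image_loss(outside))  # 0.5 + 0.3 over the bound
```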
The SVT test_crop=img folder contains 0-byte images, and I am getting an error because of that.
I have both the labels in gt texts and the images of real-world data. How can I generate the gt.mat file or crop the words using crop_words_from_oxford.py?
Hi, thanks for this great project. Is it possible to use the pretrained model and continue training on a custom training set containing 5,000 images (grayscale images with a dotted font)? Do you think the results will be good?
Hi. Thanks for the amazing work.
I was wondering what the inference speed of KISS is on:
Thanks for checking this out :P