bartzi / kiss
Code for the paper "KISS: Keeping it Simple for Scene Text Recognition"
License: GNU General Public License v3.0
Hello,
I want to check the performance of the pretrained model, but the link is not clickable and the model cannot be downloaded; the same goes for some other files.
Can you check on it please?
Thank you.
Thanks for sharing the model. I just want to test the pretrained model you provided. Do I still need to download the image data (SynthText/MJSynth) if I'm using the pretrained model? If not, how can I run the pretrained model on test datasets like CUTE80, ICDAR, etc.? I have already downloaded the datasets (cute80, icdar2013, icdar2015, iiit5k, svt, svtp) and their respective npz files. How can I run the evaluation on these datasets?
Can we predict multiple words from a single image by changing num_words_per_image?
I tried changing it in recognizer_class in the evaluate.py file, but I am facing this error:
InvalidType:
Invalid operation is performed in: Reshape (Forward)
Expect: prod(x.shape) % known_size(=3072) == 0
Actual: 1536 != 0
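The error can be reproduced arithmetically: Chainer's Reshape checks that the total number of elements is divisible by the known part of the target shape, and halving/doubling num_words_per_image changes the flattened feature size so the check fails. A minimal sketch (the shapes here are taken from the error message, not from the repository's code):

```python
import numpy as np

known_size = 3072          # flattened feature size the recognizer expects
x = np.zeros((1, 1536))    # actual tensor after changing num_words_per_image

# Chainer's Reshape performs this divisibility check before reshaping
# with a -1 ("unknown") dimension; a nonzero remainder raises InvalidType.
total = int(np.prod(x.shape))
print(total % known_size)  # 1536, not 0 -> the reported InvalidType error
```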
Also, can I ask why there are no spaces in char_map? (This might be what is needed to predict multiple words in an image.)
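To illustrate why a space entry in the char map matters: if the mapping from class ids to characters contains a space, the decoder can emit word boundaries directly. This is a hypothetical sketch; the ids and labels below are made up and do not reflect the repository's actual char_map file.

```python
# Hypothetical char_map: class id -> character (ids here are invented).
char_map = {0: "<blank>", 1: "a", 2: "b", 3: "c", 4: " "}

def decode(ids):
    """Turn predicted class ids into text, dropping the blank label."""
    return "".join(char_map[i] for i in ids if char_map[i] != "<blank>")

print(decode([1, 2, 4, 3, 1]))  # "ab ca" -- the space id separates two words
```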
I would like to start by saying I really enjoyed reading your paper and I am currently porting it to Pytorch.
I was going through the steps to download the synth data (outlined on your GitHub) and, during the filter_word_length step, I noticed that the "text" array in mjsynth.npz only contains the first letter of each word in the dataset. Is there a reason for this? Thank you.
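A quick way to check what the "text" array actually holds is to load the npz file with NumPy and inspect it. The sketch below builds a small stand-in file (the array names "file_name" and "text" are assumptions based on the question, not verified against the real mjsynth.npz):

```python
import numpy as np

# Build a tiny stand-in for the ground-truth npz file.
np.savez(
    "example_gt.npz",
    file_name=np.array(["0.jpg", "1.jpg"]),
    text=np.array(["hello", "world"]),
)

gt = np.load("example_gt.npz")
print(gt["text"])                   # full words should appear here
print([w[0] for w in gt["text"]])   # only first letters -> the reported bug
```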
Hello,
When I try to train the network with train_text_recognition.py using my own images,
I get a Windows fatal exception: access violation.
This is followed by several thread traces, mostly coming from chainer, tensorboard, and multi_node_mean.
Do you have any idea where that could come from?
Thank you in advance,
Clément
I cannot register a Baidu account, so I am not able to download this dataset.
Could anyone who has downloaded it share an alternative download link?
Thanks
As the project is really huge, I don't understand how you process the text labels. Usually, in an attention-based text recognizer, there are "[GO]" and "[EOS]" labels that get converted into numerical class ids. But I don't understand how, in the transformer model, the code converts the labels into class ids and generates the mask, as below:
kiss/text/transformer_recognizer.py
Line 23 in 25038d0
Your code also differs in how it generates the mask from
https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/76762bb08225014fb3055a9d07f0043aba972d68/transformer/Models.py#L169
Did you use "pad_idx", and where can I find it? What is the difference between using "pad_idx" and not using it? I'm really confused about "pad_idx", "GO_idx", and "EOS_idx": how do you process that part?
I don't quite know how to proceed with it. Could you give me some advice?
Is it possible to train on multi-gpus? Thanks!
@Bartzi Thank you for your hard work
Could you post the link for the pretrained model and the sample dataset?
Hi. It looks like there is a problem when I try to do the install on my MacBook with no GPU:
$ pip install -r requirements.txt
Collecting chainer==6.5.0
Downloading https://files.pythonhosted.org/packages/1d/59/aa63339001ca8e15ebb560d0c33333ef465c479e165d967e64c7611b6e67/chainer-6.5.0.tar.gz (876kB)
|████████████████████████████████| 880kB 508kB/s
Collecting chainercv==0.13.1
Downloading https://files.pythonhosted.org/packages/e8/1c/1f267ccf5ebdf1f63f1812fa0d2d0e6e35f0d08f63d2dcdb1351b0e77d85/chainercv-0.13.1.tar.gz (260kB)
|████████████████████████████████| 266kB 676kB/s
Collecting cupy==6.5.0
Downloading https://files.pythonhosted.org/packages/67/4b/6960cdfeee8bbfa12450da6b83206b57f6d6951a74043f055905449bb657/cupy-6.5.0.tar.gz (3.1MB)
|████████████████████████████████| 3.1MB 959kB/s
ERROR: Command errored out with exit status 1:
command: /Users/sebastienvincent/.virtualenvs/kiss/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"'; __file__='"'"'/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/pip-egg-info
cwd: /private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/
Complete output (46 lines):
Options: {'package_name': 'cupy', 'long_description': None, 'wheel_libs': [], 'wheel_includes': [], 'no_rpath': False, 'profile': False, 'linetrace': False, 'annotate': False, 'no_cuda': False}
-------- Configuring Module: cuda --------
/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/tmpde2uw1h1/a.cpp:1:10: fatal error: 'cublas_v2.h' file not found
#include <cublas_v2.h>
^~~~~~~~~~~~~
1 error generated.
command 'gcc' failed with exit status 1
************************************************************
* CuPy Configuration Summary *
************************************************************
Build Environment:
Include directories: []
Library directories: []
nvcc command : (not found)
Environment Variables:
CFLAGS : (none)
LDFLAGS : (none)
LIBRARY_PATH : (none)
CUDA_PATH : (none)
NVTOOLSEXT_PATH : (none)
NVCC : (none)
Modules:
cuda : No
-> Include files not found: ['cublas_v2.h', 'cuda.h', 'cuda_profiler_api.h', 'cuda_runtime.h', 'cufft.h', 'curand.h', 'cusparse.h', 'nvrtc.h']
-> Check your CFLAGS environment variable.
ERROR: CUDA could not be found on your system.
Please refer to the Installation Guide for details:
https://docs-cupy.chainer.org/en/stable/install.html
************************************************************
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/setup.py", line 132, in <module>
ext_modules = cupy_setup_build.get_ext_modules()
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 632, in get_ext_modules
extensions = make_extensions(arg_options, compiler, use_cython)
File "/private/var/folders/z_/yklxqshn4nv69bd4t63rsln40000gn/T/pip-install-yidh2_r0/cupy/cupy_setup_build.py", line 387, in make_extensions
raise Exception('Your CUDA environment is invalid. '
Exception: Your CUDA environment is invalid. Please check above error log.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Is it possible to use kiss with CPU only?
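Chainer itself runs on the CPU without CuPy, so one workaround (an assumption about this repo, not something verified against it) is to install the requirements without the cupy pin, which is what fails to build on a machine without CUDA. A minimal sketch of filtering it out:

```python
# Sketch: strip the cupy pin from the requirements so pip never tries to
# build CUDA extensions on a GPU-less machine. The pins mirror the ones
# shown in the install log above.
requirements = ["chainer==6.5.0", "chainercv==0.13.1", "cupy==6.5.0"]
cpu_only = [r for r in requirements if not r.startswith("cupy")]
print(cpu_only)  # ['chainer==6.5.0', 'chainercv==0.13.1']
```

Whether the training and evaluation scripts themselves have a CPU code path is a separate question for the maintainer.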
Is there a way to test KISS on a single image? E.g.:
python recognize.py ./image1.jpg
which would return the recognized text in image1?
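Whatever entry point the repo exposes, a single image would first need to be turned into the network's input format. The sketch below shows a generic preprocessing step; the expected input layout (normalized float32, CHW, batch of one) is an assumption, not the repository's actual pipeline:

```python
import numpy as np

def preprocess(image):
    """Prepare an HWC uint8 image for a typical recognizer: scale to
    [0, 1], move channels first, and add a batch dimension."""
    x = image.astype(np.float32) / 255.0
    x = x.transpose(2, 0, 1)   # HWC -> CHW
    return x[np.newaxis]       # add batch axis

dummy = np.zeros((64, 200, 3), dtype=np.uint8)  # stand-in for image1.jpg
batch = preprocess(dummy)
print(batch.shape)  # (1, 3, 64, 200)
```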
Hey again,
I had a few questions about the loss functions you used for the localization net during training.
In the out-of-image loss calculation you add/subtract 1.5 to the bbox instead of +/- 1 (as in your paper); why do you do this?
Also, why do you use corner coordinates for the loss calculations?
Was the DirectionLoss used in your paper?
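An out-of-image penalty of this kind can be sketched as a hinge on the predicted corner coordinates in the normalized sampling-grid frame, where the +/- 1.5 in the code would simply be a looser bound than the [-1, 1] frame from the paper. This is a reconstruction from the question, not the repository's exact loss:

```python
import numpy as np

def out_of_image_loss(corners, bound=1.5):
    """Penalize corner coordinates outside [-bound, bound] in the
    normalized grid frame; zero when all corners stay inside."""
    return np.maximum(np.abs(corners) - bound, 0.0).sum()

inside = np.array([[-0.9, 0.9], [0.9, -0.9]])
outside = np.array([[-2.0, 0.5], [1.8, 0.0]])
print(out_of_image_loss(inside))   # 0.0 -- bbox fully inside the frame
print(out_of_image_loss(outside))  # 0.5 + 0.3 over the bound
```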
The SVT test_crop=img folder contains 0-byte images, and I am getting an error because of that.
I have both the labels in gt texts and the images of real-world data. How can I generate the gt.mat file or crop the words using crop_words_from_oxford.py?
Hi, thanks for this great project. Is it possible to use the pretrained model and continue training on a custom training set containing 5,000 images (grayscale images with a dotted font)? Do you think the results will be good?
Hi. Thanks for the amazing work.
I was wondering what the inference speed of KISS is on:
Thanks for checking this out :P