tongkunguan / siga

[CVPR2023] Self-supervised Implicit Glyph Attention for Text Recognition

Home Page: https://openaccess.thecvf.com/content/CVPR2023/papers/Guan_Self-Supervised_Implicit_Glyph_Attention_for_Text_Recognition_CVPR_2023_paper.pdf

Python 100.00%

scene-text-recognition scene-text-detection-recognition

siga's Issues

Model loading error in test

Hello, when loading the model for testing, the following error is shown:
Traceback (most recent call last):
  File "/content/drive/MyDrive/SRresaerch/SIGA/SIGA_R/test.py", line 223, in <module>
    test(opt)
  File "/content/drive/MyDrive/SRresaerch/SIGA/SIGA_R/test.py", line 127, in test
    model.load_state_dict(pretrained_state_dict['net'])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Unexpected key(s) in state_dict: "module.model_one.Transformation.GridGenerator.inv_delta_C", "module.model_one.Transformation.GridGenerator.P_hat".
What might be causing this problem?
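An "Unexpected key(s)" error from load_state_dict means the checkpoint holds entries that the current model does not register. One plausible cause, stated here as an assumption and not confirmed by the repository, is that the checkpoint was saved from a version of the code in which GridGenerator stored inv_delta_C and P_hat as buffers, while the local code keeps them as plain attributes. A minimal workaround sketch is to drop the entries the model does not expect before loading. The helper name drop_unexpected_keys and the key "module.model_one.Prediction.weight" are hypothetical and used only for illustration.

```python
def drop_unexpected_keys(state_dict, expected_keys):
    """Split a checkpoint dict into entries the model expects and the rest."""
    expected = set(expected_keys)
    kept = {k: v for k, v in state_dict.items() if k in expected}
    dropped = sorted(k for k in state_dict if k not in expected)
    return kept, dropped


# Hypothetical stand-ins: in test.py these would be
# pretrained_state_dict['net'] and model.state_dict().keys().
checkpoint = {
    "module.model_one.Transformation.GridGenerator.inv_delta_C": "tensor",
    "module.model_one.Transformation.GridGenerator.P_hat": "tensor",
    "module.model_one.Prediction.weight": "tensor",  # hypothetical key
}
model_keys = ["module.model_one.Prediction.weight"]

kept, dropped = drop_unexpected_keys(checkpoint, model_keys)
print("dropped:", dropped)
# With the real objects, the next step would be: model.load_state_dict(kept)
```

Passing strict=False to load_state_dict has a similar effect, but filtering explicitly makes it visible which entries were skipped. Either approach is only safe if the skipped tensors are recomputed by the model itself, which should be checked against the GridGenerator code.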

Generate mask costs much time

Hello,
After reading your paper, I would like to use the segmentation method for my own work on scene text recognition. However, I find that generating the image masks during training takes a lot of time and lengthens training considerably. I first considered generating the masks in advance on a local machine, but the Synth dataset has 15M images, so generating all the masks would also take many days. How did you deal with this problem during training?
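One common way to avoid paying the mask cost more than once is to cache each mask on disk the first time it is computed, so later epochs only read it back. The sketch below illustrates that idea only; it is not the authors' method, and the names cached_mask and slow_mask, as well as the thresholding placeholder, are hypothetical.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached_mask(image_bytes, cache_dir, compute_mask):
    """Return the mask for an image, computing it only on the first request."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The cache key is a hash of the raw image content.
    key = hashlib.sha1(image_bytes).hexdigest()
    path = cache_dir / f"{key}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    mask = compute_mask(image_bytes)
    with path.open("wb") as f:
        pickle.dump(mask, f)
    return mask


calls = []


def slow_mask(image_bytes):
    # Hypothetical placeholder for the expensive clustering step.
    calls.append(1)
    return [b > 127 for b in image_bytes]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_mask(bytes([0, 200, 50, 255]), tmp, slow_mask)
    second = cached_mask(bytes([0, 200, 50, 255]), tmp, slow_mask)

print(first, len(calls))
```

The first epoch still pays the full cost, so this helps only when training runs for several epochs or is repeated. With several data-loading workers, writing to a temporary file and renaming it afterwards would avoid reading a partially written cache entry.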

NO issues

code of Transformer architecture

Have you released the code for the Transformer architecture? Please forgive my ignorance, but I can't seem to find it.
Additionally, I couldn't find Glyph Pseudo-label Construction (GPC), the Glyph Attention Network (GLAN), or the Attention-based Character Fusion Module (ACFM) when I searched the code for their abbreviations. I guess they are implemented across a couple of files under the modules folder. Could you offer more information about where each of them lives in the code, and how I can find and use these three modules for ablation studies? Again, forgive my ignorance, I'm really a budding nerd. Thank you so much.

Regarding Text Mask Generation

Hello, thanks for your work. I thoroughly enjoyed reading the paper. I have a couple of questions regarding text mask generation.

During the training of the segmentation network on the labels generated with k-means, did you employ image augmentations such as random transformations and color jittering? I have faced challenges with k-means on images that have color jittering.
I have also observed that, after performing k-means, the text pixels are assigned to cluster 0 for some images and to cluster 1 for others, depending on the color of the text. Could this potentially lead to challenges during the training of the segmentation model?
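The cluster-index ambiguity described above can be removed with a post-processing step that gives every mask the same polarity. The sketch below uses a simple heuristic, assuming that pixels on the image border are mostly background; this is an illustrative assumption and not necessarily what the paper does. The function name normalize_polarity is hypothetical.

```python
def normalize_polarity(mask):
    """Flip a binary mask if the image border is mostly labeled 1.

    Assumption: border pixels are mostly background, so after this call
    label 1 should correspond to text and label 0 to background.
    """
    height, width = len(mask), len(mask[0])
    border = [
        mask[r][c]
        for r in range(height)
        for c in range(width)
        if r in (0, height - 1) or c in (0, width - 1)
    ]
    # More than half of the border labeled 1 means the labels are swapped.
    if sum(border) * 2 > len(border):
        return [[1 - v for v in row] for row in mask]
    return [row[:] for row in mask]


# Two k-means outputs for the same glyph with opposite cluster indices.
dark_text = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
light_text = [[1 - v for v in row] for row in dark_text]

print(normalize_polarity(light_text) == normalize_polarity(dark_text))
```

The heuristic fails for tightly cropped words whose strokes touch most of the border, so a fallback such as treating the smaller cluster as text may be needed for those cases.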

Dataset link not found

The first dataset link seems to be inaccessible. Can it be fixed?
Also, I would like to ask about Tables 2 and 3, where some datasets in the first row appear with two different numbers, such as 'IC13-857' and 'IC13-1015'. Does the number represent the number of samples in the test set?
