summerlvsong / aggregation-cross-entropy
Aggregation Cross-Entropy for Sequence Recognition. CVPR 2019.
I recreated your project and found that the input ground truth is converted into a bag of characters, so its order is lost, and the prediction only gives character counts. The order can only barely be judged from the positions in the network's output 2D matrix. How do you accurately predict the character order of a word?
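For reference, one way to read an order out of the output matrix is a CTC-style greedy pass: argmax each step, then collapse repeats and blanks. This is only my own sketch, not the paper's decoding procedure; the output shape (W, C) and blank index 0 are assumptions:

    import torch

    def greedy_readout(probs):
        # probs: (W, C) per-step class scores; class 0 assumed to be blank
        ids = probs.argmax(dim=1).tolist()
        decoded, prev = [], None
        for k in ids:
            if k != prev and k != 0:  # drop repeated steps and blanks
                decoded.append(k)
            prev = k
        return decoded  # ordered class indices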
I pretrained a model with CTC loss and it works well. Then I loaded the weights and continued training with the ACE loss. The loss seemed to be coming down, but the test results were terrible, almost all wrong.
Here is my implementation of ACELoss.
    import torch
    import torch.nn as nn

    # cfg is the project config (defines TRAIN.GPU_ID and ARCH.NUM_CLASS)
    device = torch.device("cuda:" + cfg.TRAIN.GPU_ID if torch.cuda.is_available() else "cpu")

    class ACELoss(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, input_, target, target_lens):
            # input_: (W, B, C) softmax probabilities over W steps
            # target: 1-D tensor of concatenated labels; target_lens: per-sample lengths
            w, bs, num_class = input_.size()
            aggregations = torch.zeros(bs, cfg.ARCH.NUM_CLASS)
            idx = 0  # running offset into the flat, concatenated target tensor
            for i in range(bs):
                for _ in range(target_lens[i]):
                    aggregations[i][target[idx]] += 1
                    idx += 1
                aggregations[i][0] = w - target_lens[i]  # remaining steps count as blank
            target = aggregations.to(device)
            input_ = input_ + 1e-10            # keep log() away from exact zero
            input_ = torch.sum(input_, 0) / w  # per-class aggregate score, normalized
            target = target / w
            loss = (-torch.sum(torch.log(input_) * target)) / bs
            return loss
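A quick smoke test for the loss above; the toy shapes and labels are my assumptions, and cfg must be defined as in the snippet:

    # W=35 steps, batch of 2, cfg.ARCH.NUM_CLASS classes (class 0 = blank)
    probs = torch.softmax(torch.randn(35, 2, cfg.ARCH.NUM_CLASS), dim=2)
    flat_target = torch.tensor([1, 2, 3, 2, 4])  # labels of both samples, concatenated
    lens = torch.tensor([3, 2])                  # per-sample label lengths
    print(ACELoss()(probs.to(device), flat_target, lens))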
In the paper (https://arxiv.org/pdf/1904.08364.pdf), Sec. 3.2 mentions:
"We borrow the concept of cross-entropy from information theory, which is designed to measure the “distance” between two probability
distributions."
Won't KL divergence be a better way to measure the distance between the two probability distributions?
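For what it's worth, the two objectives coincide here by the standard identity (my note, not from the paper):

    H(p, q) = H(p) + D_KL(p || q)

Since the target distribution p (the normalized character counts) is fixed for a given label, H(p) is a constant, so minimizing the cross-entropy and minimizing the KL divergence give the same minimizer and the same gradients with respect to the prediction q.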
Hi, thanks for the amazing work.
I was wondering whether you could provide the tool you used to generate these toy datasets?
Thank you.
What does -u stand for? I couldn't find it anywhere.
For this line: torch.log(input)
The 'input' is the softmax score (0-1).
If the k-th class does not appear in an input, the accumulated softmax score over all time steps for the k-th class is very likely to be 0, and then torch.log(input) = nan.
How do you make sure that 'input' does not equal 0 in 'torch.log(input)'?
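One common guard is to floor the aggregated probabilities before the log; the epsilon below is an assumed value, matching the 1e-10 used in the snippet earlier in this thread:

    # probs: (B, C) aggregated softmax scores; target: (B, C) normalized counts
    probs = probs.clamp(min=1e-10)  # avoid log(0) -> -inf / nan
    loss = -(torch.log(probs) * target).sum() / probs.size(0)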
Table 3 (comparison with previous methods for HCTR) lists ACE (1D) at 91.68 / 91.25 / 96.70 / 96.22.
Will the code for this experiment be made public or not?
3C-FCRN+B_SLD+SLD (with the proposed residual LSTM) gets ICDAR CR 97.15 and AR 96.50, but ACE gets only CR 96.70 and AR 96.22.
So does the improvement on HCTR come from ACE, or from the residual LSTM?
Training with a fixed input size (32 × 280) and a fixed number of characters (10) gives good results only on short text.
I trained a CRNN model on the Synth90k dataset; although the loss declines step by step, the accuracy stays near 0 the whole time. What causes this problem?
Since the model can only recognize which characters occur and how many times, what is the accuracy criterion for 2D prediction in the paper?
In https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L65, should it be pred_string_set = [pred_string[i:i+self.w] for i in xrange(0, len(pred_string), self.w)] instead of self.w*2?
Please verify. Thanks.
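To illustrate the difference between the two step sizes (toy values; self.w = 3 is an assumption):

    pred_string = 'abcdefghi'
    w = 3
    print([pred_string[i:i+w] for i in range(0, len(pred_string), w)])      # ['abc', 'def', 'ghi']
    print([pred_string[i:i+w] for i in range(0, len(pred_string), w * 2)])  # ['abc', 'ghi'] -- every other row is skipped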
I use the ACE loss for English handwriting recognition. When I train the model, it does not converge, but with CTC it converges well. How can this happen?
Hello, excellent work on your "Aggregation Cross-Entropy for Sequence Recognition" paper. I just want to check whether you will release the code or not. Thanks!
I reproduced CRNN+CTC and tested it on the IIIT5K + SVT + IC03 + IC13 test databases, getting WER 0.153, the same as the result reported in the paper.
I also reproduced CRNN+ACE loss, but only got WER 0.205 on the same test databases. Any advice?
My environment:
pytorch 1.2.0
batch size 60
trained only on the 8-million synthetic images released by Jaderberg
1000k iterations
Adadelta, rho 0.9
Hello, I am opening this issue to ask whether ACE will be useful for speech recognition tasks. I am going to test your ACE loss on my acoustic model and hope it produces comparable performance. I will post the result later.
Nice work!
I found this network architecture in the paper for Handwritten Chinese Text Recognition:
Input(126, 576) − 8C3 − MP2 − 32C3 − MP2 − 128C3 − MP2 − 5×256C3 − MP2 − 512C3 − 512C3 − MP2 − 512C2 − 3×512ResLSTM − 7357FC − Output
(where kCn presumably denotes a convolution with k feature maps and an n×n kernel, MP2 a 2×2 max-pool, and FC a fully connected layer).
Is it necessary for Chinese text recognition?
The result is an n × 77 × 26 matrix; how do I match it to a word?
Hi, have you experimented on HWDB 2.0-2.2? Could you share your results for ACE? Thanks.
Why can the general loss function be approximated by Equation (2) in Section 3? Isn't each probability term in the approximation much larger than the corresponding term in the general loss function?
@summerlvsong @whang94 Thank you for your hard work.
Could you post a script that uses the trained model to run prediction?
Very nice work!
Can this method be combined with CTC in the 1D case to improve performance further? Does it conflict with CTC during training?
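One plausible way to combine the two losses is a weighted sum; this is my own sketch (the mixing weight lambda_ace and all variable names are assumptions, not something the paper prescribes):

    import torch.nn as nn

    ctc_criterion = nn.CTCLoss(blank=0)
    lambda_ace = 0.1  # assumed mixing weight, to be tuned

    # log_probs: (W, B, C) log-softmax output; probs: (W, B, C) softmax output
    ctc = ctc_criterion(log_probs, targets, input_lens, target_lens)
    ace = ACELoss()(probs, flat_targets, target_lens)
    loss = ctc + lambda_ace * ace
    loss.backward()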