
Comments (12)

mvoelk avatar mvoelk commented on July 30, 2024 1

Which dataset are you using? Can you provide one of these samples as well as a piece of code?

from ssd_detectors.

mvoelk avatar mvoelk commented on July 30, 2024 1

Okay, if you do something like

plt.imshow(images[i])                                        # show the input image
egt = prior_util.encode(data[i])                             # encode the ground truth to the prior boxes
prior_util.plot_gt()                                         # plot the original ground truth
prior_util.plot_results(prior_util.decode(egt), color='r')   # plot the encoded and then decoded ground truth

you may observe the following behavior

[image: index]

I can confirm that this is a major issue with the segment width, both in the implementation and in the SegLink approach in general. When I wrote the code, there was no reference implementation available and I was not quite sure how to handle the segment width properly.

Let's look at Figure 5 (3) in the SegLink paper. There are exactly two cases that can occur on the left side of the shown segment. In the first case, another prior box is assigned to the word bounding box and the ground truth width of the corresponding segment is defined by means of the intersection between the prior and the word bounding box. This is also done in the implementation of the SegLink authors. The second case is when no further prior box can be assigned to the word bounding box and the decoded bounding box shrinks. Hence, the ground truth width of a segment is always less than or equal to the width of the prior box.
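The first case can be sketched as follows; the function name and the representation of boxes as horizontal (x_min, x_max) intervals are my own for illustration, not code from the repository:

```python
def segment_gt_width(prior_xmin, prior_xmax, word_xmin, word_xmax):
    """Ground-truth width of a segment, defined as the horizontal
    intersection between the prior box and the word bounding box.
    By construction the result is <= the prior box width."""
    inter = min(prior_xmax, word_xmax) - max(prior_xmin, word_xmin)
    return max(inter, 0.0)

# A prior of width 1.0 that only partially overlaps the word gets a
# ground-truth width equal to the overlap, here 0.6.
print(segment_gt_width(0.0, 1.0, 0.4, 3.0))  # 0.6
```

This makes the second case visible: when the word extends beyond the last assigned prior, the clipped width loses the part of the word that no other prior covers, so the decoded box shrinks.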

In my implementation, I found that only the second case is a problem when the cropped bounding box is passed to the recognition stage. For that reason, I decided to add some padding to the resulting bounding box.

A pragmatic fix could be to allow the leftmost and rightmost segments to have a width larger than the width of the prior box and to consider only the width of these segments in the loss function.
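A minimal sketch of that fix, assuming an L1 width loss and a boolean mask marking the leftmost/rightmost segments of each word (all names are hypothetical, not from the repository):

```python
def width_loss(pred_w, gt_w, is_end_segment):
    """L1 width loss computed only on the end segments, where the
    ground-truth width may exceed the prior box width. Interior
    segments are masked out and contribute nothing."""
    return sum(abs(p - g)
               for p, g, m in zip(pred_w, gt_w, is_end_segment)
               if m)

# Only the first and last segments contribute: |1.0-1.5| + |1.0-1.2|
print(width_loss([1.0, 1.0, 1.0], [1.5, 1.0, 1.2], [1, 0, 1]))  # 0.7
```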

In the case of your dataset, I'm assuming that the aspect ratio is not too large and the text is aligned almost horizontally. You will probably get better results with TextBoxes++ or even with TextBoxes.


mvoelk avatar mvoelk commented on July 30, 2024 1

@trungpham2606 I spent some time and took a closer look at the issue. It turned out that there is indeed an issue with the decoding as described in the SegLink paper. In Algorithm 1, step 6 only makes sense if x_p and x_q are on the left and right edges of the bounding box, and step 8 only makes sense if x_p and x_q are the centers of the leftmost and rightmost segments.
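The corrected combination step can be sketched as follows, taking x_p and x_q as the centers of the leftmost and rightmost segments and extending the box by half of each segment's width; the function and variable names are illustrative, not from the repository:

```python
def combine_segments(centers, widths):
    """Combine linked segments into one horizontal bounding box.
    Since the reference points are segment *centers*, half of the
    end segments' widths must be added on either side; otherwise
    the decoded box shrinks."""
    i_l = min(range(len(centers)), key=lambda i: centers[i])  # leftmost
    i_r = max(range(len(centers)), key=lambda i: centers[i])  # rightmost
    x_min = centers[i_l] - widths[i_l] / 2
    x_max = centers[i_r] + widths[i_r] / 2
    return x_min, x_max

# Three unit-width segments centered at 1, 3, 5 span [0.5, 5.5]
print(combine_segments([1.0, 3.0, 5.0], [1.0, 1.0, 1.0]))  # (0.5, 5.5)
```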

I have changed the decoding method to fix this issue and updated my previous comment to avoid confusion. The encoding works as described in the paper, but the issue I mentioned still remains.

The example from above now looks like this:
[image: index2]

The modified decoding slightly increased the f-measure of the SegLink model from 0.868 to 0.869.

Thank You!



trungpham2606 avatar trungpham2606 commented on July 30, 2024

[image attachment]


trungpham2606 avatar trungpham2606 commented on July 30, 2024

@mvoelk thank you so much for your support. I will apply your change to my dataset and show you my results then.


trungpham2606 avatar trungpham2606 commented on July 30, 2024

@mvoelk I just tested your new decode script. The results look better than before, though there are still some images where the ground truth bounding boxes do not fit the text. But the results are way better, at least in my case.
Thank you so much for your help. If you figure out anything else to improve, or fix the decode part completely, just let me know.


mvoelk avatar mvoelk commented on July 30, 2024

F-measure of SegLink with DenseNet and Focal Loss increased from 0.922 to 0.932.


trungpham2606 avatar trungpham2606 commented on July 30, 2024

Oh nice, can you provide the parameters you chose for training with Focal Loss? I set them to your defaults, but the loss was way worse than with the normal loss.
Thanks in advance!


mvoelk avatar mvoelk commented on July 30, 2024

@trungpham2606 I'm not sure if the default values in sl_training.py are correct. Can you try lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0 and report whether the scale is roughly the same as in the log file I provided with the model? Which f-measure do you get on segments?
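For illustration, these three weights simply scale the terms of the total loss, so setting them all to 1.0 makes the terms directly comparable in the training log. A minimal sketch (the function name is an assumption, not the actual sl_training.py API):

```python
def total_loss(seg_loss, offset_loss, link_loss,
               lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0):
    """Weighted sum of the three SegLink loss terms: segment
    classification, offset regression, and link classification."""
    return (lambda_segments * seg_loss
            + lambda_offsets * offset_loss
            + lambda_links * link_loss)

# With all lambdas at 1.0 the total is just the plain sum of the terms
print(total_loss(1.0, 2.0, 3.0))  # 6.0
```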


trungpham2606 avatar trungpham2606 commented on July 30, 2024

@mvoelk Actually, I tried it on my dataset (the images I showed you above). I observed that the focal loss started at 10000 or even more, then decreased, but only slowly.
I will try your suggestion and show you the results as soon as possible.


mvoelk avatar mvoelk commented on July 30, 2024

I usually divide the loss terms by the number of instances. In SegLinkFocalLoss I commented this normalization out. You should get the old behavior if you uncomment the corresponding lines.
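A minimal sketch of this normalization with hypothetical names (the actual SegLinkFocalLoss code differs; this only illustrates the effect of toggling the division):

```python
def normalized_loss(per_instance_loss, num_instances, normalize=True):
    """Sum of per-instance losses, optionally divided by the instance
    count. Normalizing keeps the scale independent of how many
    segments/links appear in a batch; disabling it reproduces the
    plain summed behavior."""
    total = sum(per_instance_loss)
    if normalize:
        total = total / max(num_instances, 1)  # guard against empty batches
    return total

print(normalized_loss([2.0, 4.0, 6.0], 3))                   # 4.0
print(normalized_loss([2.0, 4.0, 6.0], 3, normalize=False))  # 12.0
```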

