Comments (12)
Which dataset did you use? Can you provide one of these samples, as well as a piece of code?
from ssd_detectors.
Okay, if you do something like

plt.imshow(images[i])
egt = prior_util.encode(data[i])
prior_util.plot_gt()
prior_util.plot_results(prior_util.decode(egt), color='r')
you may observe the following behavior
I confirm, this is a major issue with the segment width in the implementation and in the SegLink approach in general. When I wrote the code, there was no reference implementation available and I was not quite sure how to handle the segment width properly.
Let's look at Figure 5 (3) in the SegLink paper. There are exactly two cases that can occur on the left side of the shown segment. In the first case, another prior box is assigned to the word bounding box, and the ground truth width of the corresponding segment is defined by the intersection between the prior box and the word bounding box. This is also what the SegLink authors' implementation does. The second case is when no further prior box can be assigned to the word bounding box and the decoded bounding box shrinks. Hence, the ground truth width of a segment is always less than or equal to the width of the prior box.
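To make the first case concrete, here is a minimal sketch of how the horizontal ground-truth width of a segment can be computed as the intersection of the prior box and the word bounding box. The function name and signature are my own illustration, not the actual code from the repository:

```python
def encode_segment_width(prior_cx, prior_w, word_x_min, word_x_max):
    """Sketch: ground-truth segment width as the horizontal overlap
    of the prior box with the word bounding box. By construction the
    result can never exceed the prior width."""
    prior_x_min = prior_cx - prior_w / 2.0
    prior_x_max = prior_cx + prior_w / 2.0
    # horizontal intersection of prior box and word box
    return max(0.0, min(prior_x_max, word_x_max) - max(prior_x_min, word_x_min))
```

For a prior fully inside the word box, this returns the full prior width; near the word edge it returns the clipped, smaller width, which is the shrinking effect described above.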
In my implementation, I found that only the second case is a problem when the cropped bounding box is passed to the recognition stage. For that reason, I decided to add some padding to the resulting bounding box.
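The padding workaround can be sketched as follows. Both the function and the padding fraction are assumptions for illustration, not the values used in the repository:

```python
def pad_box(x_min, y_min, x_max, y_max, pad_frac=0.1):
    """Sketch of the padding workaround: enlarge the decoded box
    horizontally before cropping it for the recognition stage, so
    shrunken edge segments do not cut off characters.
    pad_frac is an assumed parameter, not the repository's value."""
    pad = pad_frac * (x_max - x_min)  # pad relative to box width
    return x_min - pad, y_min, x_max + pad, y_max
```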
A pragmatic fix could be to allow the leftmost and rightmost segments to have a width larger than the width of the prior box, and to then consider only the width of these segments in the loss function.
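The proposed fix could look roughly like the following: a width loss that is masked so only the edge segments of a word contribute. This is a hypothetical sketch of the idea, not code from the repository:

```python
import numpy as np

def masked_width_loss(pred_w, gt_w, edge_mask):
    """Sketch of the proposed fix: only the leftmost and rightmost
    segments of a word contribute to the width loss, so their
    ground-truth width may exceed the prior width without distorting
    the interior segments. edge_mask is 1 for edge segments, else 0."""
    diff = np.abs(pred_w - gt_w)              # L1 width error per segment
    n = max(np.sum(edge_mask), 1)             # avoid division by zero
    return np.sum(diff * edge_mask) / n       # average over edge segments only
```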
In the case of your dataset, I assume that the aspect ratio is not too large and that the text is almost horizontally aligned. You may get better results with TextBoxes++ or even with TextBoxes.
@trungpham2606 I spent some time and took a closer look at the issue. It turned out that there is indeed an issue with the decoding as described in the SegLink paper. In Algorithm 1, step 6 only makes sense if x_p and x_q are on the left and right edges of the bounding box, and step 8 only makes sense if x_p and x_q are the centers of the leftmost and rightmost segments.
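With the interpretation above, the horizontal part of combining segments into a word box can be sketched as follows. This is my own illustration of the geometry under the stated assumption (x_p and x_q are segment centers), not the repository's decoding code:

```python
def combine_segments_x(centers, widths):
    """Sketch of the horizontal part of SegLink decoding: the combined
    word box spans from the left edge of the leftmost segment to the
    right edge of the rightmost segment, where centers/widths describe
    the individual segments."""
    segs = sorted(zip(centers, widths))   # sort segments by center x
    x_p, w_p = segs[0]                    # leftmost segment
    x_q, w_q = segs[-1]                   # rightmost segment
    x_min = x_p - w_p / 2.0
    x_max = x_q + w_q / 2.0
    x_b = (x_min + x_max) / 2.0           # center of the combined box
    w_b = x_max - x_min                   # equals x_q - x_p + (w_p + w_q) / 2
    return x_b, w_b
```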
I have changed the decoding method to fix this issue and updated my previous comment to avoid confusion. The encoding works as described in the paper, but the issue I mentioned still remains.
The example from above now looks like this:
The modified decoding slightly increased the f-measure of the SegLink model from 0.868 to 0.869.
Thank You!
@mvoelk thank you so much for your support. I will apply your change to my dataset and show you my results then.
@mvoelk I just tested your new decode script. The results look better than before, though there are still some images where the ground truth bounding boxes don't fit the text. But the results are much better, at least in my case.
Thank you so much for your help. If you figure out anything else to improve, or how to fix the decoding part completely, just let me know.
F-measure of SegLink with DenseNet and Focal Loss increased from 0.922 to 0.932.
Oh nice, can you provide the parameters you chose for training with Focal Loss? I set them to your defaults, but the loss was much worse than with the normal loss.
Thanks in advance!
@trungpham2606 I'm not sure if the default values in sl_training.py are correct. Can you try lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0 and report whether the scale is roughly the same as in the log file I provided with the model? Which f-measure do you get on segments?
@mvoelk Actually, I tried it on my dataset (the images which I showed you above). I observed that the Focal Loss started at 10000 or even more, then slowly decreased, but not by much.
I will try your suggestion and show you the results as soon as possible.
I usually divide the loss terms by the number of instances. In SegLinkFocalLoss I commented this normalization out. You should get the old behavior if you uncomment the corresponding lines.
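The effect of that normalization can be sketched as follows; the function is illustrative and not taken from SegLinkFocalLoss itself:

```python
import numpy as np

def reduce_loss(per_instance_loss, normalize=True):
    """Sketch: summing per-instance loss terms, optionally divided by
    the number of instances. With normalize=False the raw sum is
    returned, which explains the much larger absolute loss values."""
    total = np.sum(per_instance_loss)
    if normalize:
        return total / max(len(per_instance_loss), 1)
    return total
```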