Comments (12)
Which dataset did you use? Can you provide one of these samples, as well as a piece of code?
from ssd_detectors.
Okay, if you do something like

plt.imshow(images[i])
egt = prior_util.encode(data[i])
prior_util.plot_gt()
prior_util.plot_results(prior_util.decode(egt), color='r')
you may observe the following behavior
I confirm, this is a major issue with the segment width in the implementation and in the SegLink approach in general. When I wrote the code, there was no reference implementation available and I was not quite sure how to handle the segment width properly.
Let's look at Figure 5 (3) in the SegLink paper. There are exactly two cases that can occur on the left side of the shown segment. In the first case, another prior box is assigned to the word bounding box, and the ground truth width of the corresponding segment is defined by the intersection between the prior box and the word bounding box. This is also what the SegLink authors' implementation does. The second case is when no further prior box can be assigned to the word bounding box and the decoded bounding box shrinks. Hence, the ground truth width of a segment is always less than or equal to the width of the prior box.
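To make the first case concrete, here is a minimal sketch of how the horizontal ground-truth width of a segment can be computed as the intersection of the prior box and the word bounding box. The function name and signature are my own illustration, not the actual code from the repository:

```python
def encode_segment_width(prior_cx, prior_w, word_x_min, word_x_max):
    """Sketch: ground-truth segment width as the horizontal overlap
    of the prior box with the word bounding box. By construction the
    result can never exceed the prior width."""
    prior_x_min = prior_cx - prior_w / 2.0
    prior_x_max = prior_cx + prior_w / 2.0
    # horizontal intersection of prior box and word box
    return max(0.0, min(prior_x_max, word_x_max) - max(prior_x_min, word_x_min))
```

For a prior fully inside the word box, this returns the full prior width; near the word edge it returns the clipped, smaller width, which is the shrinking effect described above.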
In my implementation, I found that only the second case is a problem when the cropped bounding box is passed to the recognition stage. For that reason, I decided to add some padding to the resulting bounding box.
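The padding workaround can be sketched as follows. Both the function and the padding fraction are assumptions for illustration, not the values used in the repository:

```python
def pad_box(x_min, y_min, x_max, y_max, pad_frac=0.1):
    """Sketch of the padding workaround: enlarge the decoded box
    horizontally before cropping it for the recognition stage, so
    shrunken edge segments do not cut off characters.
    pad_frac is an assumed parameter, not the repository's value."""
    pad = pad_frac * (x_max - x_min)  # pad relative to box width
    return x_min - pad, y_min, x_max + pad, y_max
```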
A pragmatic fix could be to allow the leftmost and rightmost segments to have a width larger than the width of the prior box, and to then consider only the width of these segments in the loss function.
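The proposed fix could look roughly like the following: a width loss that is masked so only the edge segments of a word contribute. This is a hypothetical sketch of the idea, not code from the repository:

```python
import numpy as np

def masked_width_loss(pred_w, gt_w, edge_mask):
    """Sketch of the proposed fix: only the leftmost and rightmost
    segments of a word contribute to the width loss, so their
    ground-truth width may exceed the prior width without distorting
    the interior segments. edge_mask is 1 for edge segments, else 0."""
    diff = np.abs(pred_w - gt_w)              # L1 width error per segment
    n = max(np.sum(edge_mask), 1)             # avoid division by zero
    return np.sum(diff * edge_mask) / n       # average over edge segments only
```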
In the case of your dataset, I assume that the aspect ratio is not too large and that the text is almost horizontally aligned. You may get better results with TextBoxes++ or even with TextBoxes.
@trungpham2606 I spent some time and took a closer look at the issue. It turned out that there is indeed an issue with the decoding as described in the SegLink paper. In Algorithm 1, step 6 only makes sense if x_p and x_q are on the left and right edges of the bounding box, and step 8 only makes sense if x_p and x_q are the centers of the leftmost and rightmost segments.
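With the interpretation above, the horizontal part of combining segments into a word box can be sketched as follows. This is my own illustration of the geometry under the stated assumption (x_p and x_q are segment centers), not the repository's decoding code:

```python
def combine_segments_x(centers, widths):
    """Sketch of the horizontal part of SegLink decoding: the combined
    word box spans from the left edge of the leftmost segment to the
    right edge of the rightmost segment, where centers/widths describe
    the individual segments."""
    segs = sorted(zip(centers, widths))   # sort segments by center x
    x_p, w_p = segs[0]                    # leftmost segment
    x_q, w_q = segs[-1]                   # rightmost segment
    x_min = x_p - w_p / 2.0
    x_max = x_q + w_q / 2.0
    x_b = (x_min + x_max) / 2.0           # center of the combined box
    w_b = x_max - x_min                   # equals x_q - x_p + (w_p + w_q) / 2
    return x_b, w_b
```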
I have changed the decoding method to fix this issue and updated my previous comment to avoid confusion. The encoding works as described in the paper, but the issue I mentioned still remains.
The example from above now looks like this:
The modified decoding slightly increased the f-measure of the SegLink model from 0.868 to 0.869.
Thank You!
@mvoelk thank you so much for your support. I will apply your change to my dataset and show you my results then.
@mvoelk I just tested your new decode script. The results look better than before, though there are still some images where the ground truth bounding boxes don't fit the text. But the results are much better, at least in my case.
Thank you so much for your help. If you figure out anything else to improve, or how to fix the decoding part completely, just let me know.
F-measure of SegLink with DenseNet and Focal Loss increased from 0.922 to 0.932.
Oh nice, can you provide the parameters you chose for training with Focal Loss? I set them to your defaults, but the loss was much worse than with the normal loss.
Thanks in advance!
@trungpham2606 I'm not sure if the default values in sl_training.py are correct. Can you try lambda_segments=1.0, lambda_offsets=1.0, lambda_links=1.0 and report whether the scale is roughly the same as in the log file I provided with the model? Which f-measure do you get on segments?
@mvoelk Actually, I tried it on my dataset (the images which I showed you above). I observed that the Focal Loss started at 10000 or even more, then slowly decreased, but not by much.
I will try your suggestion and show you the results as soon as possible.
I usually divide the loss terms by the number of instances. In SegLinkFocalLoss I commented this normalization out. You should get the old behavior if you uncomment the corresponding lines.
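The effect of that normalization can be sketched as follows; the function is illustrative and not taken from SegLinkFocalLoss itself:

```python
import numpy as np

def reduce_loss(per_instance_loss, normalize=True):
    """Sketch: summing per-instance loss terms, optionally divided by
    the number of instances. With normalize=False the raw sum is
    returned, which explains the much larger absolute loss values."""
    total = np.sum(per_instance_loss)
    if normalize:
        return total / max(len(per_instance_loss), 1)
    return total
```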