
Comments (5)

czq142857 commented on July 24, 2024

Hi,

I am not familiar with your experimental setup and have never experienced such collapses, so my comments might not be very helpful.

I suspect the collapses have something to do with the gradients. There are many ReLU and clipping operations in the shallow MLP, so once the network outputs zeros it receives no gradients and stays stuck there forever. My guess is that, occasionally, a specific batch brings abnormally strong gradients and makes the network stuck instantly, and this is essentially random. I am not sure how this could be fixed.
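The dead-gradient behaviour described above can be reproduced in a minimal sketch (an illustrative toy example, not the actual BSP-Net layers):

```python
import torch

# Toy demonstration of the dead-gradient problem: once a value falls
# below the clamp threshold, it receives zero gradient and can never
# recover. (Illustrative only, not the actual BSP-Net code.)
x = torch.tensor([-1.0, 2.0], requires_grad=True)
y = torch.clamp(x, min=0.0)  # mimics the ReLU/clipping in the shallow MLP
y.sum().backward()

print(x.grad)  # the clamped (negative) entry gets gradient 0
```

If every output of the network is clamped to zero at once, every gradient is zero and training halts permanently, which matches the observed collapse.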

from bsp-net-pytorch.

yeshwanth95 commented on July 24, 2024

Hi @czq142857. Thanks for the reply!

Actually, you are right about the network being stuck after it outputs zeros; gradient propagation stops entirely at that point. However, this happens at roughly the same epoch in each run, so it could be caused by a specific 'difficult' batch, given that the batch ordering is not randomized.
I also think the many clamping operations could cause this behaviour. I'll have to take a closer look, though.
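One way to confirm where propagation stops is to log per-parameter gradient norms after each `backward()`. A minimal sketch (the tiny model here is a stand-in, not the actual network):

```python
import torch
import torch.nn as nn

# Hypothetical diagnostic: dump per-parameter gradient norms after
# backward() to spot when (and in which layer) gradients die.
# The tiny model below is a stand-in for the shallow MLP.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
out = model(torch.randn(16, 4)).clamp(min=0.0, max=1.0)
out.mean().backward()

grad_norms = {name: p.grad.norm().item()
              for name, p in model.named_parameters() if p.grad is not None}
for name, norm in grad_norms.items():
    print(f"{name}: {norm:.6f}")  # an all-zero dump signals the collapse
```

Logging this once per epoch (or per batch around the epoch where the collapse occurs) should pinpoint the offending batch.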


yeshwanth95 commented on July 24, 2024

Hi @czq142857. A few quick questions.

  1. Is there a reason you used a learning rate of 0.00002 (quite a low value) in the 2D experiments with the toy dataset? The collapse described above only seems to occur when I increase the learning rate beyond this value on the toy dataset.

  2. Also, with more complex binary masks the learning rate needs to be lowered even further. Lowering it lets me avoid the outputs collapsing to zero, but at the cost of extremely slow convergence. Are there any limitations of BSP-Net I'm overlooking that need to be addressed for it to generalise to more complex shapes?

I've attached some examples of the binary masks I'm trying to train BSPNet on. Would love to hear your thoughts!
[Attached: five example binary masks, 0_gt through 4_gt]


czq142857 commented on July 24, 2024
  1. I cannot remember why I set the learning rate to 0.00002. I tried 0.0001 just now and it also worked, but the loss fluctuated between 0.001 and 0.005 at the end, so I suppose I used the small learning rate to avoid that fluctuation. In any case, it should not cause the collapse. Please try the original implementation and see if something differs in your PyTorch implementation.
  2. BSP-Net requires the shapes in the dataset to have some sort of correspondences (see Figure 4 in the paper). Each part (convex) is shared by different shapes, and ideally it should represent the same or corresponding parts in those different shapes. I do not see such correspondences in your dataset, therefore I do not think BSP-Net would work, unless you try to overfit on a handful of samples.
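Regarding the abnormally strong gradients suspected earlier in the thread, one common mitigation worth trying is clipping the global gradient norm before each optimizer step. A minimal sketch, assuming a standard PyTorch training loop (the model, data, and `max_norm=1.0` below are placeholders, not values from BSP-Net):

```python
import torch
import torch.nn as nn

# Hypothetical mitigation: clip the global gradient norm so a single
# "difficult" batch cannot push the network into the all-zero regime.
# (Sketch only -- model, data, and max_norm are placeholders.)
model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x, target = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), target)

opt.zero_grad()
loss.backward()
# Returns the total norm before clipping; grads are rescaled in place.
clipped = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

With clipping in place it may be possible to raise the learning rate above 0.00002 without triggering the collapse, though this is untested on BSP-Net itself.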


yeshwanth95 commented on July 24, 2024

Hi @czq142857! Thanks for the clarifications. I'll check again to see if there are any differences between my implementation and the original.

That said, I think you are right about the dataset. I'm running some experiments with a simpler version of it (smaller patches, basically), and BSP-Net seems to perform better than in the earlier runs. However, I still feel my dataset has correspondences between parts across different objects, since it consists of ground-truth masks of mostly rectangular buildings in airborne images. I'll run a few more experiments on the same dataset with the original TF1 implementation to make sure this is not an implementation issue.

