Comments (16)

forever208 commented on May 9, 2024

If your GIoU loss is the first term to turn NaN, the GIoU function itself is the likely culprit. In my experiments union_area became 0, so computing the IoU divided by zero and the value blew up to infinity. You can debug this by editing the GIoU function. My workaround, which is not a real fix because I haven't found the root cause, is to add a small epsilon at the end of this line:

union_area = boxes1_area + boxes2_area - inter_area + 1e-10
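To see why the epsilon matters, here is a minimal plain-Python sketch of GIoU for a single pair of boxes, assuming (x1, y1, x2, y2) corner coordinates. It is only an illustration, not the repository's vectorized TensorFlow bbox_giou. The `eps` parameter mirrors the `+ 1e-10` workaround above; also adding it to the enclosing-box area is an extra guard of my own, not part of the comment. Note that plain Python raises ZeroDivisionError on 0/0, where TensorFlow would silently return NaN.

```python
def giou(box1, box2, eps=1e-10):
    """GIoU of two boxes given as (x1, y1, x2, y2).

    eps is added to the union area (the workaround discussed above) and,
    as an extra assumption, to the enclosing-box area. With eps=0.0, two
    zero-area boxes make the IoU division fail.
    """
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # Intersection rectangle, clamped at zero when the boxes do not overlap
    iw = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    ih = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    inter = iw * ih

    union = area1 + area2 - inter + eps  # union_area == 0 is the failure case
    iou = inter / union

    # Smallest box enclosing both inputs
    ew = max(box1[2], box2[2]) - min(box1[0], box2[0])
    eh = max(box1[3], box2[3]) - min(box1[1], box2[1])
    enclose = ew * eh + eps
    return iou - (enclose - union) / enclose


print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: close to 1
print(giou((1, 1, 1, 1), (1, 1, 1, 1)))  # degenerate boxes: 0.0 instead of a crash
```

Calling `giou((1, 1, 1, 1), (1, 1, 1, 1), eps=0.0)` reproduces the division by zero, which is the situation described in the comment above.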

from tensorflow2.0-examples.

YunYang1994 commented on May 9, 2024

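It looks like the NaN comes from the learning rate rising the whole time. Try lowering the learning rate a bit. By the way, which dataset are you training on?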

dvlee1024 commented on May 9, 2024

Faces: WIDER FACE.
Isn't the learning rate supposed to keep decreasing? @YunYang1994

dvlee1024 commented on May 9, 2024

Got it. My dataset is large: steps_per_epoch is 1250, so with 10 warmup epochs, warmup_steps is 12500.
My global_steps never got past warmup_steps, so the lr stayed in the rising phase the whole time.

import numpy as np
import tensorflow as tf

steps_per_epoch = len(trainset)
warmup_steps = cfg.TRAIN.WARMUP_EPOCHS * steps_per_epoch
total_steps = cfg.TRAIN.EPOCHS * steps_per_epoch

if global_steps < warmup_steps:
    # Linear warmup: lr rises from 0 toward LR_INIT
    lr = global_steps / warmup_steps * cfg.TRAIN.LR_INIT
else:
    # Cosine decay from LR_INIT down to LR_END over the remaining steps
    lr = cfg.TRAIN.LR_END + 0.5 * (cfg.TRAIN.LR_INIT - cfg.TRAIN.LR_END) * (
        1 + tf.cos((global_steps - warmup_steps) / (total_steps - warmup_steps) * np.pi)
    )
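The schedule above can be sanity-checked without TensorFlow. This is a minimal plain-Python sketch of the same linear-warmup-plus-cosine-decay rule, with `math.cos` standing in for `tf.cos` and the config values passed as arguments. The thread only gives steps_per_epoch = 1250 and a 10-epoch warmup; the 30 total epochs and both learning-rate values below are hypothetical.

```python
import math


def learning_rate(global_step, steps_per_epoch, warmup_epochs, epochs,
                  lr_init, lr_end):
    """Linear warmup followed by cosine decay, mirroring the snippet above."""
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = epochs * steps_per_epoch
    if global_step < warmup_steps:
        # Warmup phase: lr grows linearly from 0 toward lr_init
        return global_step / warmup_steps * lr_init
    # Decay phase: cosine from lr_init (at warmup_steps) to lr_end (at total_steps)
    progress = (global_step - warmup_steps) / (total_steps - warmup_steps)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + math.cos(progress * math.pi))


# dvlee1024's setup: 1250 steps/epoch * 10 warmup epochs = 12500 warmup steps.
# epochs=30, lr_init=1e-3 and lr_end=1e-6 are hypothetical values.
for step in (0, 6250, 12499, 12500, 25000, 37500):
    lr = learning_rate(step, steps_per_epoch=1250, warmup_epochs=10, epochs=30,
                       lr_init=1e-3, lr_end=1e-6)
    print(f"step {step:>5}: lr = {lr:.2e}")
```

Every step below 12500 takes the warmup branch, so a run that never reaches that step only ever sees a rising learning rate. Note that this sketch, like the original, divides by zero in the decay branch if EPOCHS equals WARMUP_EPOCHS.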

YunYang1994 commented on May 9, 2024

You'd have seen that right away if you had opened TensorBoard.

YunYang1994 commented on May 9, 2024
__C.TRAIN.LR_INIT             = 1e-4
__C.TRAIN.LR_END              = 1e-6
__C.TRAIN.WARMUP_EPOCHS       = 4

Try these settings?

dvlee1024 commented on May 9, 2024

What is warmup actually for, anyway? I was planning to set it to 0.

YunYang1994 commented on May 9, 2024

I'm speechless. What is it for? Read it yourself: https://arxiv.org/pdf/1812.01187.pdf

dvlee1024 commented on May 9, 2024

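If I restore the previous weights and continue training, do I still need warmup?
I'm an outsider just getting started; I really need to find time to read up on this 😂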

YunYang1994 commented on May 9, 2024

If the loss isn't going NaN, you don't need warmup.

SinclairHudson commented on May 9, 2024

I'm having the same issue. Could I please get an English explanation?

SinclairHudson commented on May 9, 2024

@YunYang1994 could I get a quick English translation, please?

aHandToHelp commented on May 9, 2024

Any update on this?

k-maheshkumar commented on May 9, 2024

I am facing the same problem. Any updates on this?

SinclairHudson commented on May 9, 2024

I solved the issue by reducing the learning rate and using warmup epochs. With warmup, the learning rate rises gradually and then decays, so it never gets too high. That kept the model from diverging (NaN loss). Hope this helps!
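As a toy illustration of why a learning rate that is too high ends in NaN, consider plain gradient descent on the one-dimensional quadratic f(x) = 0.5 * c * x². Each update multiplies x by (1 - lr * c), so it converges only when lr < 2 / c. A real detection network is not a quadratic, so this is a sketch of the mechanism under that simplifying assumption, not a model of YOLOv3 training.

```python
import math


def gradient_descent(lr, curvature=10.0, x0=1.0, steps=100):
    """Plain gradient descent on f(x) = 0.5 * curvature * x**2.

    Each update is x <- (1 - lr * curvature) * x, so it converges only when
    lr < 2 / curvature (0.2 here); otherwise |x| grows every step.
    """
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient f'(x) = curvature * x
    return x


print(gradient_descent(lr=0.05))                         # small lr: x shrinks toward 0
print(math.isnan(gradient_descent(lr=0.3, steps=2000)))  # too large: inf, then nan; prints True
```

With lr = 0.3 the iterate doubles in magnitude every step until the float overflows to infinity, and the next update computes inf - inf, which is NaN; lowering the learning rate below 2 / c removes the problem entirely.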

IqbalLx commented on May 9, 2024

I tried forever208's workaround of adding 1e-10 to union_area, and it seems to be working fine. Thanks!
