Comments (16)
Any update on this?
If your GIoU is the first value to turn NaN, something is likely wrong in the giou function itself. In my experiment, I found that union_area was 0, so the IoU became infinite. You can debug this by editing the giou function. My workaround, which is not a proper fix because I haven't found the root cause of the bug, is to add a small constant at the end of this line:
union_area = boxes1_area + boxes2_area - inter_area + 1e-10
from tensorflow2.0-examples.
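The epsilon workaround above can be sketched in isolation. This is a minimal NumPy stand-in, not the repository's TensorFlow function: the name `bbox_giou`, the `[x_min, y_min, x_max, y_max]` box layout, and applying the same epsilon to the enclosing area are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-10  # the small constant from the comment above


def bbox_giou(boxes1, boxes2, eps=EPS):
    """GIoU for boxes laid out as [x_min, y_min, x_max, y_max] (assumed layout)."""
    boxes1 = np.asarray(boxes1, dtype=np.float64)
    boxes2 = np.asarray(boxes2, dtype=np.float64)

    boxes1_area = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    boxes2_area = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    # Intersection rectangle, clipped to zero size when the boxes do not overlap.
    left_up = np.maximum(boxes1[..., :2], boxes2[..., :2])
    right_down = np.minimum(boxes1[..., 2:], boxes2[..., 2:])
    inter_wh = np.maximum(right_down - left_up, 0.0)
    inter_area = inter_wh[..., 0] * inter_wh[..., 1]

    # Guard the denominator so zero-area boxes give a finite IoU instead of inf/NaN.
    union_area = boxes1_area + boxes2_area - inter_area + eps
    iou = inter_area / union_area

    # Smallest box enclosing both inputs; guarded the same way (an assumption).
    enclose_left_up = np.minimum(boxes1[..., :2], boxes2[..., :2])
    enclose_right_down = np.maximum(boxes1[..., 2:], boxes2[..., 2:])
    enclose_wh = np.maximum(enclose_right_down - enclose_left_up, 0.0)
    enclose_area = enclose_wh[..., 0] * enclose_wh[..., 1] + eps

    return iou - (enclose_area - union_area) / enclose_area


# Two degenerate (zero-area) boxes no longer produce NaN.
print(np.isfinite(bbox_giou([0, 0, 0, 0], [0, 0, 0, 0])))
```

The epsilon only hides the symptom. If zero-area boxes come from the labels or from decoding, that is probably still worth tracking down.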
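It looks like the NaN is caused by the learning rate rising continuously. You could lower the learning rate. By the way, which dataset are you training on?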
Faces, the WIDER FACE dataset.
Shouldn't the learning rate be decreasing the whole time? @YunYang1994
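I figured it out. My dataset is large: steps_per_epoch is 1250, so with 10 warmup epochs, warmup_steps is 12500.
My global_steps has always been smaller than warmup_steps, so lr has been in the rising phase the whole time.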
steps_per_epoch = len(trainset)
warmup_steps = cfg.TRAIN.WARMUP_EPOCHS * steps_per_epoch
total_steps = cfg.TRAIN.EPOCHS * steps_per_epoch
if global_steps < warmup_steps:
    lr = global_steps / warmup_steps * cfg.TRAIN.LR_INIT
else:
    lr = cfg.TRAIN.LR_END + 0.5 * (cfg.TRAIN.LR_INIT - cfg.TRAIN.LR_END) * (
        (1 + tf.cos((global_steps - warmup_steps) / (total_steps - warmup_steps) * np.pi))
    )
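To see how long the rising phase lasts with these numbers, the schedule can be evaluated on its own. This is a pure-Python sketch of the same formula with `math.cos` swapped in for `tf.cos`. The function name `lr_at_step` is made up for illustration, `total_epochs=30` is a hypothetical value since the thread does not give it, and the `lr_init` and `lr_end` defaults are the values suggested later in the thread.

```python
import math


def lr_at_step(global_step, steps_per_epoch, warmup_epochs, total_epochs,
               lr_init=1e-4, lr_end=1e-6):
    """Linear warmup followed by cosine decay, mirroring the snippet above."""
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if global_step < warmup_steps:
        # Rising phase: lr grows linearly from 0 towards lr_init.
        return global_step / warmup_steps * lr_init
    # Falling phase: cosine decay from lr_init down to lr_end.
    progress = (global_step - warmup_steps) / (total_steps - warmup_steps)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + math.cos(progress * math.pi))


# steps_per_epoch=1250 and warmup_epochs=10 come from the comment above;
# total_epochs=30 is a hypothetical value chosen only for this example.
for step in (0, 6250, 12500, 25000, 37500):
    lr = lr_at_step(step, steps_per_epoch=1250, warmup_epochs=10, total_epochs=30)
    print(f"step {step:>6}: lr = {lr:.3e}")
```

With these settings the learning rate keeps climbing for the first 12500 steps and only starts to decay after that, which matches the behaviour described above.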
Just open TensorBoard and you'll see.
__C.TRAIN.LR_INIT = 1e-4
__C.TRAIN.LR_END = 1e-6
__C.TRAIN.WARMUP_EPOCHS = 4
Try this?
What is warmup actually for? I was even planning to set it to 0.
Seriously, what is it for? Read it yourself: https://arxiv.org/pdf/1812.01187.pdf
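If I restore the weights from the last run and continue training, do I still need warmup?
I'm a beginner from outside the field, so I should find time to read up on this 😂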
If the loss never turns NaN, you don't need warmup.
I'm having the same issue. Could I please get an English explanation?
@YunYang1994 could I get a quick English translation please?
Any update on this?
I am facing the same problem. Any updates on this?
I solved the issue by reducing the learning rate and using warmup epochs. The learning rate slowly increases and then decreases, and never gets too high. This will prevent the model from diverging (NaN loss). Hope this helps!
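When the loss still diverges, it can help to know which term goes non-finite first before changing the schedule again. The helper below is a hypothetical sketch, not part of the repository: the function name and the loss names `giou_loss`, `conf_loss`, and `prob_loss` are assumptions, and the values are made up for the example.

```python
import math


def first_non_finite(losses):
    """Return the name of the first loss term that is NaN or infinite, else None."""
    for name, value in losses.items():
        # float() also accepts scalar values from most numeric libraries.
        if not math.isfinite(float(value)):
            return name
    return None


# Made-up per-step loss values; in a real loop these would come from the model.
step_losses = {"giou_loss": float("nan"), "conf_loss": 0.42, "prob_loss": 0.13}

bad_term = first_non_finite(step_losses)
if bad_term is not None:
    print(f"{bad_term} is not finite; stop and inspect this step")
```

Calling a check like this once per training step and stopping at the first hit keeps the failing batch and the learning rate at that step available for inspection.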
I already tried the union_area fix from the first comment (adding 1e-10), and it seems to be working fine. Thanks!