

waleedka commented on May 18, 2024

@Dref360 Did you change anything around step 7?

The main losses to pay attention to are the individual losses like rpn_class_loss, mrcnn_bbox_loss, etc. You'd want to see nice graphs for those, like the ones posted by @Dref360 above.

The total loss is the sum of the individual losses plus the L2 weight regularization loss. The L2 weight regularization loss is summed across all trainable weights, so it can change drastically if you change the number of layers included in training. If you train the heads only and then switch to training all layers, you'll see a big jump in the total loss because more layers are included, and therefore the sum of the L2 penalties over the weights is larger. This is okay.

It might be a good idea to divide the L2 regularization by the number of weights to get a mean rather than a sum, which should remove that unexpected behavior. I'll look into doing that this weekend.
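To make the jump concrete, here is a minimal standalone sketch (not the Mask R-CNN code; the toy model and the WEIGHT_DECAY value are made up for illustration) showing that a summed L2 penalty grows as soon as more weight tensors enter the sum:

import tensorflow as tf
from tensorflow import keras

WEIGHT_DECAY = 0.0001  # illustrative value

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(64,), name="backbone"),
    keras.layers.Dense(10, name="head"),
])

def summed_l2(weights):
    # Sum of l2(w) over the given weight tensors; every tensor you add
    # makes this strictly larger, hence the jump in the total loss.
    return tf.add_n([keras.regularizers.l2(WEIGHT_DECAY)(w) for w in weights])

heads_only = summed_l2(model.get_layer("head").trainable_weights)  # "heads" phase
all_layers = summed_l2(model.trainable_weights)                    # "all layers" phase
print(float(heads_only), float(all_layers))  # second number is larger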


waleedka commented on May 18, 2024

Doesn't seem normal. Does it go down afterwards? You can try a smaller learning rate and see if that improves the training.


YueLiao commented on May 18, 2024

The rpn_loss and mrcnn_loss are normal, while the total loss (the weight regularization loss) jumps to a high value (e.g. epoch 40: loss = 1.9, epoch 41: loss = 13.1; the other losses are normal). I tried smaller learning rates (lr = 0.001, 0.0001), but the same thing happens.


Dref360 commented on May 18, 2024

Yeah, I have a similar problem. All the losses are small except this one.
[screenshot: loss graph]



leicaand commented on May 18, 2024

# Add L2 Regularization
reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
              for w in self.keras_model.trainable_weights
              if 'gamma' not in w.name and 'beta' not in w.name]

Gamma and beta parameters shouldn't be included in the regularization loss (batch norm isn't updated by backprop).


waleedka commented on May 18, 2024

@leicaand Good catch. I pushed the fix. Thanks.

I also pushed an update that divides the weight regularization by the number of weights, so the loss is the mean of the L2 rather than the sum. This removes the confusing jump in the total loss graphs.
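For reference, the pushed change amounts to dividing each term by the tensor's element count, roughly like this (same model.py context as the snippet above, assuming tensorflow is imported as tf):

# Average the L2 penalty over each tensor's weights instead of summing it.
reg_losses = [
    keras.regularizers.l2(self.config.WEIGHT_DECAY)(w) / tf.cast(tf.size(w), tf.float32)
    for w in self.keras_model.trainable_weights
    if 'gamma' not in w.name and 'beta' not in w.name]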


DingkunLiu commented on May 18, 2024

I am confused about this issue.
First, in a batch norm layer, setting trainable=False means the running mean and variance are not updated, but not beta and gamma: those are still trainable, because they are updated via gradients rather than by that update op. As further evidence, beta and gamma in the trained model are not zero and one, indicating they were updated during training.
Second, does it make sense to divide the L2 loss by its size? Since its gradient is also divided by that factor, the bigger a weight matrix is, the less it is moved each step by the weight regularization loss. I don't think that's a good idea.
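To see that second point numerically, here is a small standalone TF2 sketch (illustrative size and decay values): dividing the penalty by the tensor size divides its gradient by the same factor, so a 1000-element tensor is pulled toward zero 1000x more weakly per weight:

import tensorflow as tf

decay = 0.0001
w = tf.Variable(tf.ones([1000]))

with tf.GradientTape(persistent=True) as tape:
    l2_sum = decay * tf.reduce_sum(tf.square(w))        # summed penalty
    l2_mean = l2_sum / tf.cast(tf.size(w), tf.float32)  # mean penalty

g_sum = tape.gradient(l2_sum, w)    # per-weight gradient: 2 * decay * w = 2e-4
g_mean = tape.gradient(l2_mean, w)  # per-weight gradient: 2 * decay * w / 1000 = 2e-7
print(float(g_sum[0]), float(g_mean[0]))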


DingkunLiu commented on May 18, 2024

Batch norm has 4 different weights: the running mean and variance are updated by a moving-average operation, while beta and gamma are updated via gradients. If you want to skip the weights that aren't updated during backprop, you should exclude 'moving_mean' and 'moving_variance', not 'beta' and 'gamma'.
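Taken literally, that suggestion would change the filter like this (a sketch against the same model.py snippet; note that in Keras the moving statistics are non-trainable, so they never appear in trainable_weights and the filter below is effectively a no-op there):

reg_losses = [
    keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
    for w in self.keras_model.trainable_weights
    # Skip only the weights that receive no gradients during backprop.
    if 'moving_mean' not in w.name and 'moving_variance' not in w.name]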

