

waleedka commented on May 18, 2024

@Dref360 Did you change anything around step 7?

The main losses to pay attention to are the individual losses like rpn_class_loss, mrcnn_bbox_loss, etc. You'd want to see nice graphs for those, like the ones posted by @Dref360 above.

The total loss is the sum of the individual losses plus the L2 weight regularization loss. The L2 weight regularization loss is summed across all trainable weights, so it can change drastically if you change the number of layers included in training. If you train the heads only and then switch to training all layers, you'll see a big jump in the total loss because more layers are included, and therefore the sum of the L2 penalties over the weights is larger. This is okay.

It might be a good idea to divide the L2 regularization by the number of weights to get a mean rather than a sum, which should remove that unexpected behavior. I'll look into doing that this weekend.
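To make the jump concrete, here is a minimal standalone sketch (not the Mask R-CNN code; the toy model and the WEIGHT_DECAY value are made up for illustration) showing that a summed L2 penalty grows as soon as more weight tensors enter the sum:

import tensorflow as tf
from tensorflow import keras

WEIGHT_DECAY = 0.0001  # illustrative value

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(64,), name="backbone"),
    keras.layers.Dense(10, name="head"),
])

def summed_l2(weights):
    # Sum of l2(w) over the given weight tensors; every tensor you add
    # makes this strictly larger, hence the jump in the total loss.
    return tf.add_n([keras.regularizers.l2(WEIGHT_DECAY)(w) for w in weights])

heads_only = summed_l2(model.get_layer("head").trainable_weights)  # "heads" phase
all_layers = summed_l2(model.trainable_weights)                    # "all layers" phase
print(float(heads_only), float(all_layers))  # second number is larger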


waleedka commented on May 18, 2024

Doesn't seem normal. Does it go down afterwards? You can try a smaller learning rate and see if that improves the training.


YueLiao commented on May 18, 2024

The rpn_loss and mrcnn_loss are normal, while the total loss (the weight regularization loss) jumps to a high value (e.g. epoch 40: loss = 1.9, epoch 41: loss = 13.1; the other losses are normal). I tried smaller learning rates (lr = 0.001, 0.0001), but the same thing happens.


Dref360 commented on May 18, 2024

Yeah, I have a similar problem. All the losses are small except this one.
[screenshot: loss graph]



leicaand commented on May 18, 2024

# Add L2 Regularization
reg_losses = [keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
              for w in self.keras_model.trainable_weights
              if 'gamma' not in w.name and 'beta' not in w.name]

Gamma and beta parameters shouldn't be included in the regularization loss (batch norm isn't updated by backprop).


waleedka commented on May 18, 2024

@leicaand Good catch. I pushed the fix. Thanks.

I also pushed an update that divides the weight regularization by the number of weights, so the loss is the mean of the L2 rather than the sum. This removes the confusing jump in the total loss graphs.
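For reference, the pushed change amounts to dividing each term by the tensor's element count, roughly like this (same model.py context as the snippet above, assuming tensorflow is imported as tf):

# Average the L2 penalty over each tensor's weights instead of summing it.
reg_losses = [
    keras.regularizers.l2(self.config.WEIGHT_DECAY)(w) / tf.cast(tf.size(w), tf.float32)
    for w in self.keras_model.trainable_weights
    if 'gamma' not in w.name and 'beta' not in w.name]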


DingkunLiu commented on May 18, 2024

I am confused about this issue.
First, in a batch norm layer, setting trainable=False means the running mean and variance are not updated, but not beta and gamma: those are still trainable, because they are updated via gradients rather than by that update op. As further evidence, beta and gamma in the trained model are not zero and one, indicating they were updated during training.
Second, does it make sense to divide the L2 loss by its size? Since its gradient is also divided by that factor, the bigger a weight matrix is, the less it is moved each step by the weight regularization loss. I don't think that's a good idea.
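To see that second point numerically, here is a small standalone TF2 sketch (illustrative size and decay values): dividing the penalty by the tensor size divides its gradient by the same factor, so a 1000-element tensor is pulled toward zero 1000x more weakly per weight:

import tensorflow as tf

decay = 0.0001
w = tf.Variable(tf.ones([1000]))

with tf.GradientTape(persistent=True) as tape:
    l2_sum = decay * tf.reduce_sum(tf.square(w))        # summed penalty
    l2_mean = l2_sum / tf.cast(tf.size(w), tf.float32)  # mean penalty

g_sum = tape.gradient(l2_sum, w)    # per-weight gradient: 2 * decay * w = 2e-4
g_mean = tape.gradient(l2_mean, w)  # per-weight gradient: 2 * decay * w / 1000 = 2e-7
print(float(g_sum[0]), float(g_mean[0]))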


DingkunLiu commented on May 18, 2024

Batch norm has 4 different weights: the running mean and variance are updated by a moving-average operation, while beta and gamma are updated via gradients. If you want to skip the weights that aren't updated during backprop, you should exclude 'moving_mean' and 'moving_variance', not 'beta' and 'gamma'.
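Taken literally, that suggestion would change the filter like this (a sketch against the same model.py snippet; note that in Keras the moving statistics are non-trainable, so they never appear in trainable_weights and the filter below is effectively a no-op there):

reg_losses = [
    keras.regularizers.l2(self.config.WEIGHT_DECAY)(w)
    for w in self.keras_model.trainable_weights
    # Skip only the weights that receive no gradients during backprop.
    if 'moving_mean' not in w.name and 'moving_variance' not in w.name]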

