
Comments (11)

moono commented on July 23, 2024

Hello, @arnoegw,

I have a question about the usage of REGULARIZATION_LOSSES.
Is it like the following? Am I on the right track?

...

# Instantiate the module in trainable mode; its regularizers are added
# to the REGULARIZATION_LOSSES collection.
module = hub.Module('...', trainable=True, tags={'train'})
start_from = module(inputs)
logits = tf.layers.dense(start_from, units=n_output_class, activation=None)
...

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Add the collected regularization losses to the task loss.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss += tf.add_n(reg_losses)

arnoegw commented on July 23, 2024

Hi @moono

Yup, this looks right for training -- just be sure not to set tags={'train'} during eval or inference.
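
For eval or inference, a minimal sketch (reusing the names from your snippet) would instantiate the module without the tag:

# Eval/inference graph: default (empty) tags, so batch norm uses its
# moving averages instead of per-batch statistics.
module = hub.Module('...')
start_from = module(inputs)
logits = tf.layers.dense(start_from, units=n_output_class, activation=None)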

TensorFlow offers some syntactic sugar for getting the regularization losses:
tf.losses.get_regularization_losses() for the list,
tf.losses.get_regularization_loss() for its sum.
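
For example, a minimal sketch of the loss computation above using that sugar:

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Sums the REGULARIZATION_LOSSES collection; returns a 0.0 constant if the
# collection is empty, so no special-casing is needed.
loss += tf.losses.get_regularization_loss()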

moono commented on July 23, 2024

@arnoegw

I was seeing much higher training accuracy than evaluation accuracy.
But as you mentioned, removing tags={"train"} from the eval graph fixed the issue.
Thank you so much :)

arnoegw commented on July 23, 2024

Hello @himaprasoonpt, please see this StackOverflow answer. (In short: solutions differ between TF1 and TF2. In TF1, you'd need to checkpoint the weights and restore them into a new graph built with switched tags.)
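
For reference, a minimal TF1 sketch of that checkpoint-and-rebuild approach (the checkpoint path is a placeholder; session handling is elided):

train_graph = tf.Graph()
with train_graph.as_default():
    # Training graph: batch norm runs in training mode.
    module = hub.Module('...', trainable=True, tags={'train'})
    # ... build the head and train, then, inside a tf.Session:
    # tf.train.Saver().save(sess, '/tmp/finetuned.ckpt')

infer_graph = tf.Graph()
with infer_graph.as_default():
    # Inference graph: same module and head, but default tags.
    module = hub.Module('...')
    # ... rebuild the head identically, then, inside a tf.Session:
    # tf.train.Saver().restore(sess, '/tmp/finetuned.ckpt')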

alabatie commented on July 23, 2024

Hello @arnoegw and @moono,

I am a colleague of @MeTaNoV. As suggested by his question, we are currently trying to fine-tune a pretrained Inception-v3 from TF Hub on our specific classification task. Our first goal (which we haven't achieved yet) is simply to reproduce the results previously obtained with the Caffe framework.

Following your response, we implemented a train graph that instantiates the TF module with hub.Module(module_spec, trainable=True, tags={"train"}) and a test graph that instantiates it with hub.Module(module_spec). As in our Caffe implementation, we reduce the learning rate for the convolutional layers by a factor of 10 compared to the final classification layer, using the following trick before the classification layer (the forward value is unchanged, but the gradient flowing back into the module is scaled by 1/10):

cnn_output_tensor = 1/10. * cnn_output_tensor + (1 - 1/10.) * tf.stop_gradient(cnn_output_tensor)

An additional important problem was related to the batch normalization layers. In order to work correctly at test time, the moving averages of the batch means and variances need to be updated during training. It seems that these updates are not done by default, which requires either running the update ops manually or attaching them as a control dependency. Here is what we implemented to perform the updates automatically:

from tensorflow.python.ops import control_flow_ops

# Collect the moving-average update ops created by batch normalization.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
updates = tf.group(*update_ops)
# Make evaluating cross_entropy also trigger the updates.
cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)

Even with this implementation, we still don't manage to reproduce our previous results with Caffe.

Today I implemented fetching the variables from the batch normalization layers and writing their histograms to summaries for TensorBoard visualization. The visualization shows that the moving averages are indeed updated during training, but it also shows that the beta variables seem to stay fixed throughout training.

I understand that the gamma variables are not present, since they are redundant with the next convolutional layer in the case of a ReLU activation. However, I would expect the beta variables to be very important before a ReLU activation, and I would expect the normalization effect of batch normalization combined with non-trainable beta variables to be very detrimental (from our tests, it seems we lose ~4% in our final top-1 accuracy). Is this analysis correct? Would you have a fix for this?

Thank you very much in advance.
Antoine

arnoegw commented on July 23, 2024

Hi Antoine,

What you describe is a general TensorFlow subtlety about running UPDATE_OPS. As far as I know, it's all the same whether they come from a TensorFlow Hub module or directly from Python code using batch normalization.

Usually, training is done with a train_op that combines the gradient updates from the optimizer with the elements of the UPDATE_OPS collection. The helper function tf.contrib.training.create_train_op does that by returning a train_op that is the total_loss with control_dependencies on both the update_ops and the grad_updates.

I recommend doing something similar in your code.

Just putting a control dependency on the loss does not automatically put a control dependency on its gradient; cf. the final example snippet in the API docs for tf.Graph.control_dependencies.
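
Without the contrib helper, a minimal sketch of the same pattern against the core API (the optimizer choice is illustrative):

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # The control dependency guards the training step itself, so every
    # gradient update also runs the batch-norm moving-average updates.
    train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)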

I agree that not running the UPDATE_OPS, which keep batch norm's moving averages for inference in sync with the per-batch statistics seen during training (or fine-tuning), will likely cause a serious degradation of quality.

Hope that helps,
Arno

alabatie commented on July 23, 2024

Thank you very much Arno for the quick answer.

I don't think there's a problem with our moving-average updates, since we can now see these variables evolving during training.

What concerned me was that the beta variables didn't seem to be updated. However, I managed to spot slight variations of beta (probably only slight because of how small we set the learning rate in the module part) in my latest visualizations: https://screenshots.firefox.com/Gc0s298lAiaIFpIP/localhost

This means that these layers are correctly trained. Thus we are still wondering why we can't reproduce the results we obtained with Caffe.

arnoegw commented on July 23, 2024

Hi Antoine, I'm glad you could clear up the UPDATE_OPS issue (if you evaluate cross_entropy at every step for loss reporting, your code will work, albeit not from backprop alone) and also the training of beta (batch norm's learned output mean).

Are you still seeing a difference from Caffe? That's such a broad question that it's hard for me to answer. The module's REGULARIZATION_LOSSES were already discussed upthread. There might also be differences in the regularization of the classifier you put on top (dropout? weight decay?), data augmentation, the optimizer and its learning-rate schedule, Polyak averaging of the model weights, ...

rsethur commented on July 23, 2024

Hello @alabatie, can you share your findings, please?
I'm leveraging TF Hub as well and would appreciate them.

himaprasoonpt commented on July 23, 2024

If I am using a hub module as follows:

module = hub.Module('...', trainable=True, tags={'train'})
module_out = module(inputs)
layer2 = some_layer(module_out)
# ... define losses and optimizer ...

After training is complete, when I run layer2 (the final layer) in inference mode, should I change the module tags? If yes, how can I do that? Should I use some sort of placeholder to switch tags? The batch norm mode has to be changed, right?
@arnoegw
