Hello @arnoegw,
I have a question about the usage of REGULARIZATION_LOSSES. Is it like the following? Am I on the right track?
```python
...
module = hub.Module('...', trainable=True, tags={'train'})
start_from = module(inputs)
logits = tf.layers.dense(start_from, units=n_output_class, activation=None)
...
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss += tf.add_n(reg_losses)
```
Hi @moono
Yup, this looks right (for training -- be sure to not set tags={'train'} during eval or inference).
TensorFlow offers some syntactic sugar for getting the regularization losses: tf.losses.get_regularization_losses() for the list, and tf.losses.get_regularization_loss() for their sum.
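Put together with the snippet above, the total training loss might be assembled like this (a minimal sketch with stand-in labels and logits; in the thread, the logits come from the module followed by the dense classification layer):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Hypothetical stand-ins for illustration; real logits would come from
# the hub module followed by the dense classification layer.
labels = tf.constant([0, 1])
logits = tf.zeros([2, 10])

data_loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
# Sugar around the REGULARIZATION_LOSSES collection; unlike a bare
# tf.add_n(...), it returns a zero constant when the collection is empty.
reg_loss = tf.losses.get_regularization_loss()
total_loss = data_loss + reg_loss

with tf.Session() as sess:
    print(sess.run(total_loss))  # ln(10) here: uniform logits, empty collection
```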
I was seeing much higher train accuracy than evaluation accuracy. But as you mentioned, removing tags={"train"} fixed the issue.
Thank you so much :)
Hello @himaprasoonpt, please see this StackOverflow answer. (In short: solutions differ for TF1 and TF2. In TF1, you'd need to checkpoint the weights and restore them into a new graph built with switched tags.)
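A rough TF1 sketch of that checkpoint-and-restore approach (the module handle, checkpoint path, and classifier head are all placeholders here):

```python
import tensorflow.compat.v1 as tf
import tensorflow_hub as hub

CKPT = '/tmp/finetuned.ckpt'  # hypothetical path

# Training graph: module in training mode.
train_graph = tf.Graph()
with train_graph.as_default():
    module = hub.Module('...', trainable=True, tags={'train'})
    # ... build classifier head, loss and train_op ...
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training loop ...
        saver.save(sess, CKPT)

# Inference graph: same variables, but built without tags={'train'}.
infer_graph = tf.Graph()
with infer_graph.as_default():
    module = hub.Module('...')  # inference mode
    # ... build the same classifier head on top ...
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, CKPT)  # weights carry over; graph behavior differs
```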
I am a colleague of @MeTaNoV. As suggested by his question, we are currently trying to fine-tune a pretrained inception-v3 from TF Hub on our specific classification task. Our first goal (that we haven't yet achieved) is simply to reproduce the results previously obtained with the Caffe framework.
Following your response, we implemented a train graph that instantiates the TF module with hub.Module(module_spec, trainable=True, tags={"train"}), and a test graph that instantiates the TF module with hub.Module(module_spec). As in our Caffe implementation, we reduce the learning rate of the convolutional layers by a factor of 10 compared to the final classification layer, using the following trick before the classification layer:

```python
cnn_output_tensor = 1/10. * cnn_output_tensor + (1 - 1/10.) * tf.stop_gradient(cnn_output_tensor)
```
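For what it's worth, the effect of that trick (forward value unchanged, gradients scaled by 1/10) can be checked on a tiny standalone example with plain TF1 ops, no hub module involved:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

a = 0.1  # learning-rate scale factor for the lower layers
x = tf.constant([2.0, -3.0])
# Forward: a*x + (1-a)*x == x, so activations are unchanged.
# Backward: the stop_gradient branch contributes nothing, so any
# gradient flowing through y back to x is scaled by a.
y = a * x + (1.0 - a) * tf.stop_gradient(x)
loss = tf.reduce_sum(y * y)   # d(loss)/dy = 2*y
grad_x, = tf.gradients(loss, [x])

with tf.Session() as sess:
    print(sess.run(y))       # identical to x
    print(sess.run(grad_x))  # 0.1 * 2*x, i.e. one tenth of the full gradient
```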
An additional important problem was related to the batch normalization layers. In order to work correctly at test time, the moving averages of the batch means and variances need to be updated during training. It seems that these updates are not done by default, which requires either running the update_op manually or including it in a control dependency. Here is what we implemented to perform the updates automatically:
```python
from tensorflow.python.ops import control_flow_ops

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
updates = tf.group(*update_ops)
cross_entropy = control_flow_ops.with_dependencies([updates], cross_entropy)
```
Even with this implementation, we still don't manage to reproduce our previous results with Caffe.
Today I implemented fetching the variables from the batch normalization layers and writing their histograms to summaries for TensorBoard visualization. The visualization shows that the moving averages are indeed updated during training, but it also shows that the beta variables seem to stay fixed throughout training.
I understand that the gamma variables are not present, since they are redundant with the next convolutional layer in the case of ReLU activations. However, I would expect the beta variables to be very important before a ReLU activation, and I would expect the normalization effect of batch normalization layers combined with non-trainable beta variables to be very detrimental (from our tests, it seems we lose ~4% in our final top-1 accuracy). Is this analysis correct? Would you have a fix for this?
Thank you very much in advance.
Antoine
Hi Antoine,
what you describe is a general TensorFlow subtlety about running UPDATE_OPS. As far as I know, it's all the same whether they come out of a TensorFlow Hub module, or directly from Python code using batch normalization.
Usually, training is done with a train_op that combines the gradient updates from the optimizer with the elements of the UPDATE_OPS collection. The helper function tf.contrib.training.create_train_op does that by returning a train_op that is the total_loss with control_dependencies on both the update_ops and the grad_updates.
I recommend doing something similar in your code.
Just putting a control dependency on the loss does not automatically put a control dependency on its gradient; cf. the final example snippet in the API docs for tf.Graph.control_dependencies.
I agree that not running UPDATE_OPS to keep the moving averages of batch norm for inference in sync with the per-batch statistics seen during training (or fine-tuning) will likely cause a serious degradation of quality.
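Without tf.contrib, the same pattern can be written directly with tf.control_dependencies. A minimal self-contained sketch (a toy batch-norm graph stands in for the module here):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy graph; in the thread, total_loss would come from the module plus
# classifier, and UPDATE_OPS from the module's batch-norm layers.
x = tf.placeholder(tf.float32, [None, 4])
is_training = tf.placeholder(tf.bool, [])
net = tf.layers.batch_normalization(x, training=is_training)
total_loss = tf.reduce_mean(tf.square(net))

optimizer = tf.train.GradientDescentOptimizer(0.1)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# Running train_op now also runs the moving-average updates, whether or
# not the loss value itself is ever fetched.
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(total_loss)
```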
Hope that helps,
Arno
Thank you very much Arno for the quick answer.
I don't think there's a problem with our moving average updates, since we can now visualize these variables evolving during the training.
What concerned me was beta variables that didn't seem to be updated. However I managed to spot slight variations of beta (probably only slight due to how small we set the learning rate in the module part) in my latest visualizations: https://screenshots.firefox.com/Gc0s298lAiaIFpIP/localhost
This means that these layers are correctly trained. Thus we are still wondering why we can't reproduce the results we obtained with Caffe.
Hi Antoine, I'm glad you could clear up the UPDATE_OPS issue (if you evaluate cross_entropy at every step for loss reporting, your code will work, albeit not from backprop alone), and also the training of beta (batch norm's learned output mean).
Are you still seeing a difference from Caffe? That's such a broad question, it's hard for me to answer. The REGULARIZATION_LOSSES of the module were already discussed upthread. There might also be differences in how you regularize the classifier you put on top (dropout? weight decay?), in data augmentation, in the optimizer and its learning rate schedule, in Polyak averaging of the model weights, ...
Hello @alabatie can you share your findings, please?
I'm leveraging TF Hub as well and would appreciate your findings.
If I am using a hub module as follows:

```python
module = hub.Module('...', trainable=True, tags={'train'})
module_out = module(inputs)
layer2 = somelayer(module_out)
# ... define losses and optimizer ...
```
After training is complete, when I run layer2 (the final layer) in inference mode, should I change the module tags? If yes, how can I do that? Should I be using some sort of placeholder to switch tags? The batch norm mode has to be changed, right?
@arnoegw