clovaai / assembled-cnn
Tensorflow implementation of "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network"
License: Apache License 2.0
Strangely, when I print the shapes of logits and labels, both are [None, 1001], not [1, 101].
Thank you very much for any help!
Ubuntu 18.04 LTS
Tensorflow 1.15, Cuda 10.0.13
CUDA_VISIBLE_DEVICES=1 python main_classification.py \
--dataset_name=imagenet \
--data_dir=${DATA_DIR} \
--model_dir=${MODEL_DIR} \
--benchmark_log_dir=train.log \
--benchmark_logger_type=BenchmarkFileLogger \
--preprocessing_type=imagenet_224_256 \
--batch_size=1 \
--mixup_type=1 \
--autoaugment_type=imagenet \
--resnet_version=2 \
--resnet_size=50 \
--use_sk_block=True \
--anti_alias_type=sconv \
--anti_alias_filter_size=3 \
--use_dropblock=True \
--num_gpus=1 \
--learning_rate_decay_type=cosine \
--weight_decay=1e-4 \
--base_learning_rate=0.4 \
--momentum=0.9 \
--lr_warmup_epochs=5 \
--zero_gamma=True \
--label_smoothing=0.1 \
--kd_temp=1 \
--dtype=fp16 \
--epochs_between_evals=10 \
--train_epochs=600
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: logits and labels must be broadcastable: logits_size=[1,101] labels_size=[1,1001]
[[node softmax_cross_entropy_loss/xentropy (defined at /anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[cross_entropy/_13281]]
(1) Invalid argument: logits and labels must be broadcastable: logits_size=[1,101] labels_size=[1,1001]
[[node softmax_cross_entropy_loss/xentropy (defined at /anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'softmax_cross_entropy_loss/xentropy':
File "/anaconda3/envs/tf15/lib/python3.7/threading.py", line 890, in _bootstrap
self._bootstrap_inner()
File "/anaconda3/envs/tf15/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 880, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/assemble_cnn/Assemble-CNN/functions/model_fns.py", line 239, in model_fn_cls
p=params)
File "/assemble_cnn/Assemble-CNN/nets/run_loop_classification.py", line 144, in resnet_model_fn
cross_entropy = cls_losses.get_sup_loss(logits, onehot_labels, global_step, num_classes, p)
File "/assemble_cnn/Assemble-CNN/losses/cls_losses.py", line 33, in get_sup_loss
label_smoothing=p['label_smoothing'], weights=weights)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/losses/losses_impl.py", line 782, in softmax_cross_entropy
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3105, in softmax_cross_entropy_with_logits_v2
labels=labels, logits=logits, axis=axis, name=name)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3206, in softmax_cross_entropy_with_logits_v2_helper
precise_logits, labels, name=name)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 11458, in softmax_cross_entropy_with_logits
name=name)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
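The error above reports a last dimension of 101 for the logits and 1001 for the labels, so the model head and the one-hot labels disagree on the number of classes. A minimal sketch of a pre-flight check that surfaces this before the loss op is built is shown below. `check_class_dims` is a hypothetical helper written for illustration and is not part of the repository; the shapes are the ones copied from the error message.

```python
def check_class_dims(logits_shape, labels_shape):
    """Compare the class (last) dimension of two shapes.

    Returns None when they agree, otherwise a readable message.
    Hypothetical helper for illustration only.
    """
    logits_classes = logits_shape[-1]
    labels_classes = labels_shape[-1]
    if logits_classes == labels_classes:
        return None
    return ("class dimension mismatch: logits have %d classes, "
            "one-hot labels have %d" % (logits_classes, labels_classes))


# Shapes taken from the InvalidArgumentError above.
message = check_class_dims([1, 101], [1, 1001])
print(message)
```

Only the last dimension is compared, so a statically unknown batch dimension (`None`) does not affect the result.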
I want to use it to classify my pictures
Hi, I'm trying to fine-tune on my own dataset, but I don't know how to write the run script. Your demo finetuning_assemble_on_food101.sh requires two model paths, but the README provides only one pretrained model, Assemble-R50. I don't know what to provide as PRETRAINED_PATH. Can you give me some suggestions? Thank you very much!
Would you please share the configuration for the best ResNet50 model?
Thanks for your great work. The baseline performance in Tables 1, 2, and 3 of the paper is 76.3 in every case, but in Tables 6, 7, 8, and 10 it is 76.87. What is the difference between the baselines?
I want to use assembled-cnn to fine-tune on my own dataset, with the following script:
but it fails with the following error:
2020-04-07 13:49:57.225 I: Running local_init_op.
2020-04-07 13:49:57.382 I: Done running local_init_op.
2020-04-07 13:49:59.904892: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-07 13:50:00.878725: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-07 13:50:14.567 I: Finished evaluation at 2020-04-07-13:50:14
2020-04-07 13:50:14.568 I: Saving dict for global step 0: accuracy = 0.005, accuracy_top_5 = 0.024, ece = 0.99431515, global_step = 0, loss = 94.326965
---->>>>>>> current_ckpt= /home/zhaoyanmei/projects/assembled-cnn/models/
2020-04-07 13:50:15.498 W: From /home/zhaoyanmei/projects/assembled-cnn/utils/checkpoint_utils.py:72: The name tf.gfile.IsDirectory is deprecated. Please use tf.io.gfile.isdir instead.
Traceback (most recent call last):
File "main_classification.py", line 64, in
absl_app.run(main)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main_classification.py", line 58, in main
run(flags.FLAGS)
File "main_classification.py", line 54, in run
num_images=dataset.num_images, zeroshot_eval=flags_obj.zeroshot_eval)
File "/home/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 480, in resnet_main
ckpt_keeper.save(metric, flags_obj.model_dir)
File "/home/zhaoyanmei/projects/assembled-cnn/utils/checkpoint_utils.py", line 75, in save
if current_ckpt.startswith('hdfs'):
AttributeError: 'NoneType' object has no attribute 'startswith'
Does anyone know what causes this problem?
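The traceback shows `startswith` being called on `None` at line 75 of checkpoint_utils.py, which means no checkpoint path was available at that point. The log does not show why the path is missing, so the following is only a sketch of a defensive guard, not the repository's fix. `is_hdfs_checkpoint` is a hypothetical name introduced here.

```python
def is_hdfs_checkpoint(current_ckpt):
    """Return True only for a non-empty path that starts with 'hdfs'.

    Hypothetical guard: a missing path (None) is treated as 'not on HDFS'
    instead of raising AttributeError.
    """
    if current_ckpt is None:
        # No checkpoint path was found, so there is nothing to inspect.
        return False
    return current_ckpt.startswith('hdfs')


print(is_hdfs_checkpoint(None))
print(is_hdfs_checkpoint('hdfs://cluster/models/model.ckpt-0'))
```

A guard like this only avoids the crash. If the path is `None` because no checkpoint had been written yet when evaluation ran at global step 0, the underlying cause would still need to be addressed.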
Hi, I can fine-tune with Assemble-ResNet50, but when I try Assemble-ResNet152, the following errors occur:
2020-04-09 00:59:00.514 I: resnet_model/stage4/batch_normalization_9/gamma:0
2020-04-09 00:59:00.514 I: resnet_model/stage4/batch_normalization_9/beta:0
2020-04-09 00:59:00.514 I: *********** end var_list_warm_start **************
2020-04-09 00:59:00.514 I: Fine-tuning from /data/zhaoyanmei/projects/assembled-cnn/pretrain/Assemble-ResNet152/model.ckpt-750683
2020-04-09 00:59:00.514 W: From /root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-04-09 00:59:00.520 I: Restoring parameters from /data/zhaoyanmei/projects/assembled-cnn/pretrain/Assemble-ResNet152/model.ckpt-750683
2020-04-09 01:04:37.134675: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-09 01:04:43.513984: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [32] vs. [64]
[[{{node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1}}]]
(1) Invalid argument: Incompatible shapes: [32] vs. [64]
[[{{node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1}}]]
[[Momentum/update_3_826/update/_64128]]
0 successful operations.
3 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main_classification.py", line 64, in
absl_app.run(main)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main_classification.py", line 58, in main
run(flags.FLAGS)
File "main_classification.py", line 54, in run
num_images=dataset.num_images, zeroshot_eval=flags_obj.zeroshot_eval)
File "/data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 475, in resnet_main
eval_results = train_and_evaluate(train_hooks)
File "/data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 435, in train_and_evaluate
steps=flags_obj.max_train_steps)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
self._config._train_distribute, input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1329, in _actual_train_model_distributed
saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/root/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [32] vs. [64]
[[node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1 (defined at data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py:37) ]]
(1) Invalid argument: Incompatible shapes: [32] vs. [64]
[[node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1 (defined at data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py:37) ]]
[[Momentum/update_3_826/update/_64128]]
0 successful operations.
3 derived errors ignored.
Original stack trace for 'replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1':
File "root/miniconda3/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "root/miniconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "data/zhaoyanmei/projects/assembled-cnn/functions/model_fns.py", line 239, in model_fn_cls
p=params)
File "data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 122, in resnet_model_fn
use_resnet_d=p['use_resnet_d'], keep_prob=keep_prob)
File "data/zhaoyanmei/projects/assembled-cnn/nets/resnet_model.py", line 402, in call
little0 = batch_norm(little0, training, self.data_format, momentum=self.bn_momentum)
File "data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py", line 37, in batch_norm
scale=True, training=training, fused=True, gamma_initializer=gamma_initializer, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/normalization.py", line 327, in batch_normalization
return layer.apply(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.call(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 537, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/normalization.py", line 167, in call
return super(BatchNormalization, self).call(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 651, in call
outputs = self._fused_batch_norm(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 534, in _fused_batch_norm
self.add_update(variance_update, inputs=True)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1082, in add_update
updates = [process_update(x) for x in updates]
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1082, in
updates = [process_update(x) for x in updates]
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1070, in process_update
return update()
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1066, in
update = lambda: process_update(x())
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 531, in variance_update
momentum, inputs_size)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 454, in _assign_moving_average
variable - math_ops.cast(value, variable.dtype)) * decay
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 386, in sub
def sub(self, o): return self.get() - o
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 1045, in _run_op
return tensor_oper(a.value(), *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 10855, in sub
"Sub", x=x, y=y, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()
Is there any parameter I need to adjust?
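The `Incompatible shapes: [32] vs. [64]` error comes from a batch-norm moving-average update, which suggests that a variable restored from the checkpoint and the tensor built by the graph differ in channel count. One way to narrow this down is to compare variable shapes from the checkpoint with those in the graph. The sketch below does this for two plain dictionaries. `find_shape_mismatches` and the example entries are hypothetical; in TF 1.x the checkpoint side could likely be filled from `tf.train.list_variables`, but that wiring is an assumption and is not shown.

```python
def find_shape_mismatches(checkpoint_shapes, graph_shapes):
    """List variables present in both mappings whose shapes differ.

    Both arguments map variable name -> shape (list or tuple of ints).
    Returns a sorted list of (name, checkpoint_shape, graph_shape).
    """
    mismatches = []
    for name in sorted(checkpoint_shapes):
        if name not in graph_shapes:
            continue  # variables missing from the graph are ignored here
        ckpt_shape = list(checkpoint_shapes[name])
        graph_shape = list(graph_shapes[name])
        if ckpt_shape != graph_shape:
            mismatches.append((name, ckpt_shape, graph_shape))
    return mismatches


# Illustrative values only; which side holds 32 and which 64 is not known
# from the log, and the variable names are made up for the example.
checkpoint_shapes = {'stage0/bn/moving_variance': [64], 'stage0/bn/gamma': [64]}
graph_shapes = {'stage0/bn/moving_variance': [32], 'stage0/bn/gamma': [64]}

for name, ckpt_shape, graph_shape in find_shape_mismatches(checkpoint_shapes, graph_shapes):
    print(name, ckpt_shape, graph_shape)
```

Any variable reported here would point at a model flag whose value differs between the run that produced the checkpoint and the fine-tuning run.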