clovaai / assembled-cnn

Tensorflow implementation of "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network"

License: Apache License 2.0

Python 98.55% Shell 1.45%
convolutional-neural-networks tensorflow image-classification deep-learning transfer-learning computer-vision mce imagenet inference-throughput robustness food-101

assembled-cnn's Issues

Invalid argument: logits and labels must be broadcastable: logits_size=[1,101] labels_size=[1,1001]

Strangely, when I print the shapes of logits and labels, both are [None, 1001]; neither is [1, 101].
Thank you very much for any help!

Environment

Ubuntu 18.04 LTS
TensorFlow 1.15, CUDA 10.0.13

Run

CUDA_VISIBLE_DEVICES=1 python main_classification.py \
--dataset_name=imagenet \
--data_dir=${DATA_DIR} \
--model_dir=${MODEL_DIR} \
--benchmark_log_dir=train.log \
--benchmark_logger_type=BenchmarkFileLogger \
--preprocessing_type=imagenet_224_256 \
--batch_size=1 \
--mixup_type=1 \
--autoaugment_type=imagenet \
--resnet_version=2 \
--resnet_size=50 \
--use_sk_block=True \
--anti_alias_type=sconv \
--anti_alias_filter_size=3 \
--use_dropblock=True \
--num_gpus=1 \
--learning_rate_decay_type=cosine \
--weight_decay=1e-4 \
--base_learning_rate=0.4 \
--momentum=0.9 \
--lr_warmup_epochs=5 \
--zero_gamma=True \
--label_smoothing=0.1 \
--kd_temp=1 \
--dtype=fp16 \
--epochs_between_evals=10 \
--train_epochs=600

Error

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: logits and labels must be broadcastable: logits_size=[1,101] labels_size=[1,1001]
	 [[node softmax_cross_entropy_loss/xentropy (defined at /anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[cross_entropy/_13281]]
  (1) Invalid argument: logits and labels must be broadcastable: logits_size=[1,101] labels_size=[1,1001]
	 [[node softmax_cross_entropy_loss/xentropy (defined at /anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'softmax_cross_entropy_loss/xentropy':
  File "/anaconda3/envs/tf15/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/anaconda3/envs/tf15/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 880, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/assemble_cnn/Assemble-CNN/functions/model_fns.py", line 239, in model_fn_cls
    p=params)
  File "/assemble_cnn/Assemble-CNN/nets/run_loop_classification.py", line 144, in resnet_model_fn
    cross_entropy = cls_losses.get_sup_loss(logits, onehot_labels, global_step, num_classes, p)
  File "/assemble_cnn/Assemble-CNN/losses/cls_losses.py", line 33, in get_sup_loss
    label_smoothing=p['label_smoothing'], weights=weights)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/losses/losses_impl.py", line 782, in softmax_cross_entropy
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3105, in softmax_cross_entropy_with_logits_v2
    labels=labels, logits=logits, axis=axis, name=name)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 3206, in softmax_cross_entropy_with_logits_v2_helper
    precise_logits, labels, name=name)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 11458, in softmax_cross_entropy_with_logits
    name=name)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/anaconda3/envs/tf15/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
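
The error above reports 101 classes for logits and 1001 for labels at run time. One way to surface this kind of mismatch with a clearer message is to compare the class dimension of both inputs before computing the loss. The following is a minimal sketch in plain Python, not code from assembled-cnn: `check_class_dims` and `softmax_cross_entropy` are hypothetical helpers, and the inputs are nested lists rather than tensors.

```python
import math


def check_class_dims(logits, onehot_labels):
    """Raise a readable error if any row's class dimension disagrees."""
    for row_logits, row_labels in zip(logits, onehot_labels):
        if len(row_logits) != len(row_labels):
            raise ValueError(
                "class dimension mismatch: logits have %d classes, "
                "labels have %d" % (len(row_logits), len(row_labels)))


def softmax_cross_entropy(logits, onehot_labels):
    """Mean softmax cross-entropy over the batch (no label smoothing)."""
    check_class_dims(logits, onehot_labels)
    losses = []
    for row_logits, row_labels in zip(logits, onehot_labels):
        peak = max(row_logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row_logits))
        losses.append(-sum(y * (x - log_norm)
                           for x, y in zip(row_logits, row_labels)))
    return sum(losses) / len(losses)


# Sizes taken from the error message: 101 logits vs. 1001-way one-hot labels.
logits = [[0.0] * 101]
labels = [[1.0] + [0.0] * 1000]

try:
    softmax_cross_entropy(logits, labels)
except ValueError as err:
    print(err)
```

In a TF 1.x graph a comparable run-time check could be built with `tf.debugging.assert_equal` on `tf.shape(...)[-1]` of both tensors; wiring that into the model function is left out here.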

The baseline performance of R50

Thanks for your great work. The baseline performance in Tables 1, 2, and 3 of the paper is 76.3 in every case, but in Tables 6, 7, 8, and 10 it is 76.87. What is the difference between these baselines?

'NoneType' object has no attribute 'startswith'

I want to use assembled-cnn to fine-tune on my own dataset, with the following script:
[image]

but the run fails with the following error:
2020-04-07 13:49:57.225 I: Running local_init_op.
2020-04-07 13:49:57.382 I: Done running local_init_op.
2020-04-07 13:49:59.904892: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-07 13:50:00.878725: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-07 13:50:14.567 I: Finished evaluation at 2020-04-07-13:50:14
2020-04-07 13:50:14.568 I: Saving dict for global step 0: accuracy = 0.005, accuracy_top_5 = 0.024, ece = 0.99431515, global_step = 0, loss = 94.326965
---->>>>>>> current_ckpt= /home/zhaoyanmei/projects/assembled-cnn/models/
2020-04-07 13:50:15.498 W: From /home/zhaoyanmei/projects/assembled-cnn/utils/checkpoint_utils.py:72: The name tf.gfile.IsDirectory is deprecated. Please use tf.io.gfile.isdir instead.

Traceback (most recent call last):
File "main_classification.py", line 64, in <module>
absl_app.run(main)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main_classification.py", line 58, in main
run(flags.FLAGS)
File "main_classification.py", line 54, in run
num_images=dataset.num_images, zeroshot_eval=flags_obj.zeroshot_eval)
File "/home/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 480, in resnet_main
ckpt_keeper.save(metric, flags_obj.model_dir)
File "/home/zhaoyanmei/projects/assembled-cnn/utils/checkpoint_utils.py", line 75, in save
if current_ckpt.startswith('hdfs'):
AttributeError: 'NoneType' object has no attribute 'startswith'

Does anyone know what is causing this problem?
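
The traceback shows that `current_ckpt` is `None` when `startswith` is called on it. How `current_ckpt` is computed is not visible in the log; one common source of a `None` checkpoint path in TensorFlow is `tf.train.latest_checkpoint`, which returns `None` when it finds no checkpoint in the given directory, but whether that applies here is an assumption. The sketch below shows a guard that avoids the `AttributeError`. `is_hdfs_path` and `describe_checkpoint` are hypothetical names, and the example paths are made up.

```python
def is_hdfs_path(path):
    """Return True only for a string that starts with 'hdfs'."""
    return isinstance(path, str) and path.startswith("hdfs")


def describe_checkpoint(current_ckpt):
    """Hypothetical stand-in for the branch that failed in the traceback."""
    if current_ckpt is None:
        # No checkpoint was found, so there is nothing to inspect or copy.
        return "no checkpoint found"
    if is_hdfs_path(current_ckpt):
        return "hdfs checkpoint: " + current_ckpt
    return "local checkpoint: " + current_ckpt


# Made-up paths, for illustration only.
print(describe_checkpoint(None))
print(describe_checkpoint("/tmp/models/model.ckpt-0"))
print(describe_checkpoint("hdfs://namenode/models/model.ckpt-0"))
```

A guard like this only hides the symptom; it is still worth checking that the model directory actually contains a checkpoint at the point where it is saved as the best one.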

Finetune with Assemble-ResNet152, but failed

Hi, I can fine-tune with Assemble-ResNet50, but when I try Assemble-ResNet152, the following errors occur:

2020-04-09 00:59:00.514 I: resnet_model/stage4/batch_normalization_9/gamma:0
2020-04-09 00:59:00.514 I: resnet_model/stage4/batch_normalization_9/beta:0
2020-04-09 00:59:00.514 I: *********** end var_list_warm_start **************
2020-04-09 00:59:00.514 I: Fine-tuning from /data/zhaoyanmei/projects/assembled-cnn/pretrain/Assemble-ResNet152/model.ckpt-750683
2020-04-09 00:59:00.514 W: From /root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-04-09 00:59:00.520 I: Restoring parameters from /data/zhaoyanmei/projects/assembled-cnn/pretrain/Assemble-ResNet152/model.ckpt-750683
2020-04-09 01:04:37.134675: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-09 01:04:43.513984: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [32] vs. [64]
[[{{node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1}}]]
(1) Invalid argument: Incompatible shapes: [32] vs. [64]
[[{{node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1}}]]
[[Momentum/update_3_826/update/_64128]]
0 successful operations.
3 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main_classification.py", line 64, in <module>
absl_app.run(main)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/root/miniconda3/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main_classification.py", line 58, in main
run(flags.FLAGS)
File "main_classification.py", line 54, in run
num_images=dataset.num_images, zeroshot_eval=flags_obj.zeroshot_eval)
File "/data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 475, in resnet_main
eval_results = train_and_evaluate(train_hooks)
File "/data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 435, in train_and_evaluate
steps=flags_obj.max_train_steps)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
self._config._train_distribute, input_fn, hooks, saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1329, in _actual_train_model_distributed
saving_listeners)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
raise six.reraise(*original_exc_info)
File "/root/miniconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
return self._sess.run(*args, **kwargs)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1411, in run
run_metadata=run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1169, in run
return self._sess.run(*args, **kwargs)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/root/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incompatible shapes: [32] vs. [64]
[[node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1 (defined at data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py:37) ]]
(1) Invalid argument: Incompatible shapes: [32] vs. [64]
[[node replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1 (defined at data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py:37) ]]
[[Momentum/update_3_826/update/_64128]]
0 successful operations.
3 derived errors ignored.

Original stack trace for 'replica_2/resnet_model/stage0/pool/batch_normalization_1/AssignMovingAvg_1/sub_1':
File "root/miniconda3/lib/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "root/miniconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 911, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "data/zhaoyanmei/projects/assembled-cnn/functions/model_fns.py", line 239, in model_fn_cls
p=params)
File "data/zhaoyanmei/projects/assembled-cnn/nets/run_loop_classification.py", line 122, in resnet_model_fn
use_resnet_d=p['use_resnet_d'], keep_prob=keep_prob)
File "data/zhaoyanmei/projects/assembled-cnn/nets/resnet_model.py", line 402, in call
little0 = batch_norm(little0, training, self.data_format, momentum=self.bn_momentum)
File "data/zhaoyanmei/projects/assembled-cnn/nets/model_helper.py", line 37, in batch_norm
scale=True, training=training, fused=True, gamma_initializer=gamma_initializer, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/normalization.py", line 327, in batch_normalization
return layer.apply(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.call(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 537, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in call
outputs = call_fn(inputs, *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
return _call_unconverted(f, args, kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/normalization.py", line 167, in call
return super(BatchNormalization, self).call(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 651, in call
outputs = self._fused_batch_norm(inputs, training=training)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 534, in _fused_batch_norm
self.add_update(variance_update, inputs=True)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1082, in add_update
updates = [process_update(x) for x in updates]
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1082, in <listcomp>
updates = [process_update(x) for x in updates]
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1070, in process_update
return update()
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1066, in <lambda>
update = lambda: process_update(x())
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 531, in variance_update
momentum, inputs_size)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/normalization.py", line 454, in _assign_moving_average
variable - math_ops.cast(value, variable.dtype)) * decay
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 386, in sub
def sub(self, o): return self.get() - o
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 1045, in _run_op
return tensor_oper(a.value(), *args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 884, in binary_op_wrapper
return func(x, y, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 10855, in sub
"Sub", x=x, y=y, name=name)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "root/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()

Is there a parameter I need to adjust?
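
The failing node subtracts a batch-norm moving statistic from a batch statistic and finds shapes [32] and [64]. One possible cause, which the log does not confirm, is that the flags used for fine-tuning build a different architecture from the one stored in the Assemble-ResNet152 checkpoint. A way to test that assumption is to compare variable shapes on both sides. The sketch below is plain Python: `find_shape_mismatches` is a hypothetical helper, and the variable names and shapes in the example are invented for illustration.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (ckpt_shape, model_shape)} for variables that appear in
    both mappings but with different shapes."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and list(model_shape) != list(ckpt_shape):
            mismatches[name] = (list(ckpt_shape), list(model_shape))
    return mismatches


# In TF 1.x the two mappings could be collected roughly like this:
#   ckpt_shapes = dict(tf.train.list_variables(ckpt_path))
#   model_shapes = {v.op.name: v.shape.as_list() for v in tf.global_variables()}
# The values below are invented; which side holds 32 and which 64 is unknown.
ckpt_shapes = {
    "block_a/bn/moving_variance": [64],
    "block_b/bn/gamma": [128],
}
model_shapes = {
    "block_a/bn/moving_variance": [32],
    "block_b/bn/gamma": [128],
}

for name, (in_ckpt, in_model) in find_shape_mismatches(
        ckpt_shapes, model_shapes).items():
    print("%s: checkpoint %s vs. model %s" % (name, in_ckpt, in_model))
```

If this reports mismatches, the model-defining flags are the first thing to compare against the settings the checkpoint was trained with.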
