Comments (8)
You need to either provide the model to the test script (--model) or specify it in the config file.
from gossipnet.
Also, could you create a pull request for the TensorFlow version update once you're relatively certain your changes are correct? Have you already trained a model?
from gossipnet.
Thank you for your attention!
OK, once I am sure there are no mistakes, I will open a pull request.
I have trained the coco_person model and everything goes well. However, when I try to train the coco_multiclass model with the command python train.py -c experiments/coco_multiclass/conf.yaml, an error occurs.
The command python train.py -c experiments/coco_multiclass/conf.yaml -v runs fine.
python train.py -c experiments/coco_multiclass/conf.yaml
{'ROOT_DIR': '/mnt/geekvc/gossipnet',
'gnet': {'bias_const_init': 0.01,
'block_dim': 64,
'freeze_n_imfeat_layers': 3,
'gt_match_thresh': 0.5,
'imfeat_dim': 1024,
'imfeats': False,
'load_imfeats': False,
'neighbor_feats': False,
'neighbor_thresh': 0.2,
'num_block_fc': 2,
'num_block_pw_fc': 2,
'num_blocks': 16,
'num_predict_fc': 3,
'num_pwfeat_fc': 3,
'pairfeat_dim': 64,
'predict_fc_dim': 128,
'pw_feat_multiplyer': 1.0,
'pwfeat_dim': 256,
'pwfeat_narrow_dim': 32,
'reduced_dim': 32,
'shortcut_dim': 128,
'weight_init': 'xavier'},
'image_max_size': 1000,
'image_target_size': 600,
'imfeat_crop_height': 7,
'imfeat_crop_width': 7,
'log_dir': './log',
'pixel_mean': [123.68, 116.779, 103.939],
'prefetch_q_size': 20,
'random_seed': 42,
'resnet_type': '101',
'test': {'imdb': 'coco_2014_minival'},
'train': {'det_min_size': 4,
'detector': 'FRCN_train',
'display_iter': 20,
'flip': True,
'gradient_clipping': -1.0,
'histograms': False,
'imdb': 'coco_2014_train',
'loss_multiplyer': 1.0,
'lr_multi_step': [[800000, 0.0001], [2000000, 1e-05]],
'max_num_detections': 600,
'model_init': None,
'momentum': 0.9,
'normalize_loss': False,
'num_iter': 2000000,
'only_class': '',
'optimizer': 'adam',
'pos_weight': 0.3,
'pretrained_model': '',
'resume': None,
'save_iter': 20000,
'val_imdb': 'coco_2014_minival',
'val_iter': 20000,
'weight_decay': 0.0005}}
reading /mnt/geekvc/gossipnet/data/cache/coco_2014_train_FRCN_train_imdb_cache.pkl
preparing train imdb
82783 images: 29343449 detections, 7038 crowd annotations, 590009 non-crowd annotations
dropping images without detections
82778 images: 29343449 detections, 7038 crowd annotations, 589998 non-crowd annotations
dropping all but 600 highest scoring detections
82778 images: 28675419 detections, 7038 crowd annotations, 589998 non-crowd annotations
appending flipped images
165556 images: 57350838 detections, 14076 crowd annotations, 1179996 non-crowd annotations
done
doing multiclass NMS
WARNING:tensorflow:From /mnt/geekvc/gossipnet/nms_net/network.py:314: add_loss (from tensorflow.contrib.framework.python.ops.arg_scope) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
WARNING:tensorflow:From train.py:238: get_total_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_total_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:261: get_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_losses instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:263: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:95: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
reading /mnt/geekvc/gossipnet/data/cache/coco_2014_minival_FRCN_train_imdb_cache.pkl
preparing test imdb
5000 images: 1821412 detections, 446 crowd annotations, 35821 non-crowd annotations
dropping images without detections
4999 images: 1821412 detections, 446 crowd annotations, 35820 non-crowd annotations
done
doing multiclass NMS
WARNING:tensorflow:From /mnt/geekvc/gossipnet/nms_net/network.py:314: add_loss (from tensorflow.contrib.framework.python.ops.arg_scope) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
2017-11-01 01:42:36.029294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.72GiB
2017-11-01 01:42:36.202270: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f1b550 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.203458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.401837: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f1ed40 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.402953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.635491: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f22530 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.636130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 3 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:84:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.636305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 2
2017-11-01 01:42:36.636349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 3
2017-11-01 01:42:36.636401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 2
2017-11-01 01:42:36.636436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 3
2017-11-01 01:42:36.636570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 0
2017-11-01 01:42:36.636652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 1
2017-11-01 01:42:36.636818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 0
2017-11-01 01:42:36.636864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 1
2017-11-01 01:42:36.637010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2 3
2017-11-01 01:42:36.637041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y Y N N
2017-11-01 01:42:36.637061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1: Y Y N N
2017-11-01 01:42:36.637079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2: N N Y Y
2017-11-01 01:42:36.637097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 3: N N Y Y
2017-11-01 01:42:36.637124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
2017-11-01 01:42:36.637147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:03:00.0)
2017-11-01 01:42:36.637168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K40c, pci bus id: 0000:83:00.0)
2017-11-01 01:42:36.637188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K40c, pci bus id: 0000:84:00.0)
2017-11-01 01:43:15.547229: W tensorflow/core/kernels/queue_base.cc:295] _0_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "train.py", line 416, in <module>
main()
File "train.py", line 412, in main
train(args.resume, args.visualize)
File "train.py", line 320, in train
feed_dict={learning_rate: lr_gen.get_lr(it)})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1118, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1315, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentErrorException in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "train.py", line 86, in load_and_enqueue
sess.run(enqueue_op, feed_dict=food)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1118, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1315, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue, _arg_Placeholder_0_0, _arg_Placeholder_1_0_1, _arg_Placeholder_2_0_2, _arg_Placeholder_3_0_3, _arg_Placeholder_4_0_4, _arg_Placeholder_5_0_5)]]
Caused by op u'fifo_queue_enqueue', defined at:
File "train.py", line 416, in <module>
main()
File "train.py", line 412, in main
train(args.resume, args.visualize)
File "train.py", line 230, in train
Gnet.get_batch_spec(train_imdb['num_classes']))
File "train.py", line 95, in setup_preloading
enqueue_op = q.enqueue([ph for _, ph in enqueue_placeholders])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 321, in enqueue
self._queue_ref, vals, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1589, in _queue_enqueue_v2
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue, _arg_Placeholder_0_0, _arg_Placeholder_1_0_1, _arg_Placeholder_2_0_2, _arg_Placeholder_3_0_3, _arg_Placeholder_4_0_4, _arg_Placeholder_5_0_5)]]
: Input to reshape is a tensor with 5129792 values, but the requested shape has 177812937145164
[[Node: gradients/gnet/block16/build_context/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/gnet/block16/build_context/concat_grad/tuple/control_dependency_1, gradients/gnet/block16/build_context/Gather_grad/concat)]]
[[Node: Adam/update/_400 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9229_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'gradients/gnet/block16/build_context/Gather_grad/Reshape', defined at:
File "train.py", line 416, in <module>
main()
File "train.py", line 412, in main
train(args.resume, args.visualize)
File "train.py", line 240, in train
optimized_loss, net.trainable_variables)
File "train.py", line 76, in get_optimizer
clip_gradient_norm=cfg.train.gradient_clipping)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 440, in create_train_op
check_numerics=check_numerics)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/training.py", line 439, in create_train_op
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 348, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_grad.py", line 371, in _GatherGrad
values = array_ops.reshape(grad, values_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2602, in reshape
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'gnet/block16/build_context/Gather', defined at:
File "train.py", line 416, in <module>
main()
[elided 0 identical lines from previous traceback]
File "train.py", line 412, in main
train(args.resume, args.visualize)
File "train.py", line 233, in train
weight_reg=reg, class_weights=class_weights)
File "/mnt/geekvc/gossipnet/nms_net/network.py", line 253, in __init__
pair_n_idxs, pw_feats, weight_reg)
File "/mnt/geekvc/gossipnet/nms_net/network.py", line 368, in _block
c_feats = tf.gather(feats, pair_c_idxs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2410, in gather
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1202, in gather
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 5129792 values, but the requested shape has 177812937145164
[[Node: gradients/gnet/block16/build_context/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/gnet/block16/build_context/concat_grad/tuple/control_dependency_1, gradients/gnet/block16/build_context/Gather_grad/concat)]]
[[Node: Adam/update/_400 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9229_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
from gossipnet.
The backprop through a gather operation seems to go wrong in the multiclass version only. Did you add gather operations while upgrading the code? I think it happens in the last block first because that's where backprop touches the blocks first.
I hope this helps a bit. I'll see if I can find time to work on upgrading the code to newer tensorflow.
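For readers following the traceback: the failing Reshape sits in the gradient of tf.gather (the _GatherGrad frame in array_grad.py above). Below is a minimal pure-Python sketch of what that gradient computes, assuming 2-D features gathered along the first axis. The function names gather and gather_grad and the toy data are illustrative only; they are not part of gossipnet or TensorFlow.

```python
def gather(params, indices):
    """Forward pass: pick rows of `params` by index (like a gather on axis 0)."""
    return [list(params[i]) for i in indices]


def gather_grad(params_shape, indices, grad_out):
    """Backward pass: scatter-add each output-row gradient into its source row."""
    num_rows, row_dim = params_shape
    # The incoming gradient must hold exactly len(indices) * row_dim values.
    # A mismatch here corresponds to the kind of reshape failure in the log.
    expected = len(indices) * row_dim
    actual = sum(len(row) for row in grad_out)
    if actual != expected:
        raise ValueError(
            "gradient has %d values, but the gather output needs %d"
            % (actual, expected))
    grad_params = [[0.0] * row_dim for _ in range(num_rows)]
    for out_row, src_row in enumerate(indices):
        for col in range(row_dim):
            # Rows gathered more than once accumulate their gradients.
            grad_params[src_row][col] += grad_out[out_row][col]
    return grad_params


# Toy stand-ins for `feats` and `pair_c_idxs` from network.py (values made up).
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pair_c_idxs = [0, 2, 2]

c_feats = gather(feats, pair_c_idxs)
grad = gather_grad((3, 2), pair_c_idxs, [[1.0, 1.0]] * 3)
```

In this picture the gradient arriving at the gather should have len(indices) * row_dim values. The log reports a requested shape far larger than the 5129792 values that actually arrived, so the shape passed to the reshape, rather than the gradient itself, is what looks suspicious.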
from gossipnet.
Was this issue resolved?
from gossipnet.
Hi friends, I am facing a problem with the argument parser in the Jupyter IDE. I tried using EasyDict, but it is still giving errors. Kindly help me resolve this issue.
from gossipnet.
usage: ipykernel_launcher.py [-h] [--no-cuda] [--model_name MODEL_NAME]
[--l_r L_R] [--weight_decay WEIGHT_DECAY]
[--batch_size BATCH_SIZE]
[--dim_latent DIM_LATENT] [--num_epoch NUM_EPOCH]
[--num_user NUM_USER] [--num_item NUM_ITEM]
[--aggr_mode AGGR_MODE] [--num_layer NUM_LAYER]
[--has_id HAS_ID] [--concat CONCAT]
ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-4b72a416-91ce-4374-aa63-53a799d22613.json
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2
How can I resolve this error? Please help me.
from gossipnet.
I tried it with EasyDict, but it also gives an error. Do I have to write all the code in the EasyDict?
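A note on the error above: Jupyter starts the kernel with its own -f <connection file> argument, and a plain parser.parse_args() call sees that argument and rejects it. Below is a minimal sketch of two common workarounds using only the standard argparse module. The three flags are taken from the usage message above; their defaults and the example kernel path are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--no-cuda", action="store_true")
parser.add_argument("--model_name", default="model")    # hypothetical default
parser.add_argument("--l_r", type=float, default=1e-3)  # hypothetical default

# Stand-in for the extra arguments Jupyter passes to the kernel process.
jupyter_argv = ["-f", "/tmp/kernel-example.json"]

# Option 1: parse_known_args() returns unrecognized arguments separately
# instead of exiting with "unrecognized arguments".
args, unknown = parser.parse_known_args(jupyter_argv)

# Option 2: pass an explicit (here empty) list so sys.argv is not read at all
# and every option takes its default.
defaults = parser.parse_args(args=[])
```

Inside a notebook, Option 1 would normally be called without arguments (parser.parse_known_args()), which reads sys.argv and sets the -f entry aside. With either option there is no need to move the rest of the code into a dictionary; only the call that parses the arguments changes.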
from gossipnet.