Comments (8)

hosang avatar hosang commented on September 28, 2024

You need to either provide the model to the test script (--model) or specify it in the config file.
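
As a rough illustration of the "flag or config" rule above, here is a minimal sketch in which a command-line `--model` takes precedence over a value from the config. The test script's real option handling is not shown in this thread, so the function name `resolve_model_path` and the config key `test_model` are hypothetical.

```python
import argparse


def resolve_model_path(cli_args, config):
    """Return the model path from --model, else from the config, else fail."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=None)
    # Ignore any other options so this sketch only deals with --model.
    args, _unknown = parser.parse_known_args(cli_args)
    # "test_model" is a hypothetical config key used only for this sketch.
    model = args.model or config.get("test_model")
    if not model:
        raise SystemExit("no model given: pass --model or set it in the config")
    return model
```

Called with `["--model", "a.ckpt"]` it returns the flag value; called with an empty argument list it falls back to whatever the config dictionary holds.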

from gossipnet.

hosang avatar hosang commented on September 28, 2024

Also, could you create a pull request for the update of tensorflow version once you're relatively certain your changes are correct? Did you already train a model?

geekvc avatar geekvc commented on September 28, 2024

Thank you for your attention!
OK, once I'm sure there are no mistakes, I will open a pull request.
I have trained the coco_person model and everything went well. However, when I try to train coco_multiclass with the command python train.py -c experiments/coco_multiclass/conf.yaml, an error occurs.
The command python train.py -c experiments/coco_multiclass/conf.yaml -v runs fine.

python train.py -c experiments/coco_multiclass/conf.yaml
{'ROOT_DIR': '/mnt/geekvc/gossipnet',
 'gnet': {'bias_const_init': 0.01,
          'block_dim': 64,
          'freeze_n_imfeat_layers': 3,
          'gt_match_thresh': 0.5,
          'imfeat_dim': 1024,
          'imfeats': False,
          'load_imfeats': False,
          'neighbor_feats': False,
          'neighbor_thresh': 0.2,
          'num_block_fc': 2,
          'num_block_pw_fc': 2,
          'num_blocks': 16,
          'num_predict_fc': 3,
          'num_pwfeat_fc': 3,
          'pairfeat_dim': 64,
          'predict_fc_dim': 128,
          'pw_feat_multiplyer': 1.0,
          'pwfeat_dim': 256,
          'pwfeat_narrow_dim': 32,
          'reduced_dim': 32,
          'shortcut_dim': 128,
          'weight_init': 'xavier'},
 'image_max_size': 1000,
 'image_target_size': 600,
 'imfeat_crop_height': 7,
 'imfeat_crop_width': 7,
 'log_dir': './log',
 'pixel_mean': [123.68, 116.779, 103.939],
 'prefetch_q_size': 20,
 'random_seed': 42,
 'resnet_type': '101',
 'test': {'imdb': 'coco_2014_minival'},
 'train': {'det_min_size': 4,
           'detector': 'FRCN_train',
           'display_iter': 20,
           'flip': True,
           'gradient_clipping': -1.0,
           'histograms': False,
           'imdb': 'coco_2014_train',
           'loss_multiplyer': 1.0,
           'lr_multi_step': [[800000, 0.0001], [2000000, 1e-05]],
           'max_num_detections': 600,
           'model_init': None,
           'momentum': 0.9,
           'normalize_loss': False,
           'num_iter': 2000000,
           'only_class': '',
           'optimizer': 'adam',
           'pos_weight': 0.3,
           'pretrained_model': '',
           'resume': None,
           'save_iter': 20000,
           'val_imdb': 'coco_2014_minival',
           'val_iter': 20000,
           'weight_decay': 0.0005}}
reading /mnt/geekvc/gossipnet/data/cache/coco_2014_train_FRCN_train_imdb_cache.pkl
preparing train imdb
82783 images: 29343449 detections, 7038 crowd annotations, 590009 non-crowd annotations
dropping images without detections
82778 images: 29343449 detections, 7038 crowd annotations, 589998 non-crowd annotations
dropping all but 600 highest scoring detections
82778 images: 28675419 detections, 7038 crowd annotations, 589998 non-crowd annotations
appending flipped images
165556 images: 57350838 detections, 14076 crowd annotations, 1179996 non-crowd annotations
done
doing multiclass NMS
WARNING:tensorflow:From /mnt/geekvc/gossipnet/nms_net/network.py:314: add_loss (from tensorflow.contrib.framework.python.ops.arg_scope) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
WARNING:tensorflow:From train.py:238: get_total_loss (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_total_loss instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:261: get_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_losses instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/losses/python/losses/loss_ops.py:263: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:95: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
reading /mnt/geekvc/gossipnet/data/cache/coco_2014_minival_FRCN_train_imdb_cache.pkl
preparing test imdb
5000 images: 1821412 detections, 446 crowd annotations, 35821 non-crowd annotations
dropping images without detections
4999 images: 1821412 detections, 446 crowd annotations, 35820 non-crowd annotations
done
doing multiclass NMS
WARNING:tensorflow:From /mnt/geekvc/gossipnet/nms_net/network.py:314: add_loss (from tensorflow.contrib.framework.python.ops.arg_scope) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.add_loss instead.
2017-11-01 01:42:36.029294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:02:00.0
Total memory: 12.00GiB
Free memory: 11.72GiB
2017-11-01 01:42:36.202270: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f1b550 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.203458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.401837: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f1ed40 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.402953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:83:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.635491: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x58f22530 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-11-01 01:42:36.636130: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 3 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:84:00.0
Total memory: 12.00GiB
Free memory: 11.90GiB
2017-11-01 01:42:36.636305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 2
2017-11-01 01:42:36.636349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 3
2017-11-01 01:42:36.636401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 2
2017-11-01 01:42:36.636436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 3
2017-11-01 01:42:36.636570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 0
2017-11-01 01:42:36.636652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 1
2017-11-01 01:42:36.636818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 0
2017-11-01 01:42:36.636864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 1
2017-11-01 01:42:36.637010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2 3
2017-11-01 01:42:36.637041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y Y N N
2017-11-01 01:42:36.637061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1:   Y Y N N
2017-11-01 01:42:36.637079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2:   N N Y Y
2017-11-01 01:42:36.637097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 3:   N N Y Y
2017-11-01 01:42:36.637124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0)
2017-11-01 01:42:36.637147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:03:00.0)
2017-11-01 01:42:36.637168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K40c, pci bus id: 0000:83:00.0)
2017-11-01 01:42:36.637188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K40c, pci bus id: 0000:84:00.0)
2017-11-01 01:43:15.547229: W tensorflow/core/kernels/queue_base.cc:295] _0_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "train.py", line 416, in <module>
    main()
  File "train.py", line 412, in main
    train(args.resume, args.visualize)
  File "train.py", line 320, in train
    feed_dict={learning_rate: lr_gen.get_lr(it)})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1118, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1315, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentErrorException in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "train.py", line 86, in load_and_enqueue
    sess.run(enqueue_op, feed_dict=food)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1118, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1315, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    raise type(e)(node_def, op, message)
CancelledError: Enqueue operation was cancelled
         [[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue, _arg_Placeholder_0_0, _arg_Placeholder_1_0_1, _arg_Placeholder_2_0_2, _arg_Placeholder_3_0_3, _arg_Placeholder_4_0_4, _arg_Placeholder_5_0_5)]]

Caused by op u'fifo_queue_enqueue', defined at:
  File "train.py", line 416, in <module>
    main()
  File "train.py", line 412, in main
    train(args.resume, args.visualize)
  File "train.py", line 230, in train
    Gnet.get_batch_spec(train_imdb['num_classes']))
  File "train.py", line 95, in setup_preloading
    enqueue_op = q.enqueue([ph for _, ph in enqueue_placeholders])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 321, in enqueue
    self._queue_ref, vals, name=scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1589, in _queue_enqueue_v2
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

CancelledError (see above for traceback): Enqueue operation was cancelled
         [[Node: fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](fifo_queue, _arg_Placeholder_0_0, _arg_Placeholder_1_0_1, _arg_Placeholder_2_0_2, _arg_Placeholder_3_0_3, _arg_Placeholder_4_0_4, _arg_Placeholder_5_0_5)]]


: Input to reshape is a tensor with 5129792 values, but the requested shape has 177812937145164
         [[Node: gradients/gnet/block16/build_context/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/gnet/block16/build_context/concat_grad/tuple/control_dependency_1, gradients/gnet/block16/build_context/Gather_grad/concat)]]
         [[Node: Adam/update/_400 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9229_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'gradients/gnet/block16/build_context/Gather_grad/Reshape', defined at:
  File "train.py", line 416, in <module>
    main()
  File "train.py", line 412, in main
    train(args.resume, args.visualize)
  File "train.py", line 240, in train
    optimized_loss, net.trainable_variables)
  File "train.py", line 76, in get_optimizer
    clip_gradient_norm=cfg.train.gradient_clipping)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 440, in create_train_op
    check_numerics=check_numerics)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/training/python/training/training.py", line 439, in create_train_op
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 348, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_grad.py", line 371, in _GatherGrad
    values = array_ops.reshape(grad, values_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2602, in reshape
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op u'gnet/block16/build_context/Gather', defined at:
  File "train.py", line 416, in <module>
    main()
[elided 0 identical lines from previous traceback]
  File "train.py", line 412, in main
    train(args.resume, args.visualize)
  File "train.py", line 233, in train
    weight_reg=reg, class_weights=class_weights)
  File "/mnt/geekvc/gossipnet/nms_net/network.py", line 253, in __init__
    pair_n_idxs, pw_feats, weight_reg)
  File "/mnt/geekvc/gossipnet/nms_net/network.py", line 368, in _block
    c_feats = tf.gather(feats, pair_c_idxs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2410, in gather
    validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1202, in gather
    validate_indices=validate_indices, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 5129792 values, but the requested shape has 177812937145164
         [[Node: gradients/gnet/block16/build_context/Gather_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/gnet/block16/build_context/concat_grad/tuple/control_dependency_1, gradients/gnet/block16/build_context/Gather_grad/concat)]]
         [[Node: Adam/update/_400 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9229_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

hosang avatar hosang commented on September 28, 2024

The backprop in a gather operation seems to go wrong in the multiclass version only. Did you add gather operations while upgrading the code? I think it shows up in the last block first because that is the first block backprop reaches.
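
For anyone unfamiliar with what the gradient of a gather looks like, here is a minimal pure-Python sketch (not TensorFlow's implementation). The forward pass picks rows by index; the backward pass has one gradient row per index and adds it back to the row it came from. The failing reshape in the traceback (`values = array_ops.reshape(grad, values_shape)` in `_GatherGrad`) is the step that lays the incoming gradient out as one row per index, so a nonsensical requested size may point to a bad shape rather than bad data. The helper names `gather` and `gather_grad` and the toy values are assumptions for illustration only.

```python
def gather(feats, idxs):
    """Forward pass: pick rows of feats by index (rows may repeat)."""
    return [list(feats[i]) for i in idxs]


def gather_grad(grad_out, idxs, num_rows):
    """Backward pass: add each output-row gradient back to its source row."""
    dim = len(grad_out[0]) if grad_out else 0
    grad_feats = [[0.0] * dim for _ in range(num_rows)]
    for row, i in zip(grad_out, idxs):
        for d, g in enumerate(row):
            grad_feats[i][d] += g
    return grad_feats


feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pair_c_idxs = [0, 0, 2]  # toy indices: row 0 is gathered twice
c_feats = gather(feats, pair_c_idxs)

# A gradient of ones for every gathered row; row 0 receives it twice.
grad_feats = gather_grad([[1.0, 1.0]] * 3, pair_c_idxs, num_rows=len(feats))
```

Rows that are gathered more than once accumulate gradient from every use, and rows that are never gathered get zero.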

I hope this helps a bit. I'll see if I can find time to work on upgrading the code to newer tensorflow.

hosang avatar hosang commented on September 28, 2024

Was this issue resolved?

junaidtariqjojo13 avatar junaidtariqjojo13 commented on September 28, 2024

Hi friends, I am facing a problem with ArgumentParser in the Jupyter IDE. I tried using EasyDict instead, but it still gives errors. Kindly help me resolve this issue.

junaidtariqjojo13 avatar junaidtariqjojo13 commented on September 28, 2024

usage: ipykernel_launcher.py [-h] [--no-cuda] [--model_name MODEL_NAME]
                             [--l_r L_R] [--weight_decay WEIGHT_DECAY]
                             [--batch_size BATCH_SIZE]
                             [--dim_latent DIM_LATENT] [--num_epoch NUM_EPOCH]
                             [--num_user NUM_USER] [--num_item NUM_ITEM]
                             [--aggr_mode AGGR_MODE] [--num_layer NUM_LAYER]
                             [--has_id HAS_ID] [--concat CONCAT]
ipykernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-4b72a416-91ce-4374-aa63-53a799d22613.json
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

How can I resolve this error? Please help me.

junaidtariqjojo13 avatar junaidtariqjojo13 commented on September 28, 2024

I tried it with EasyDict, but that also gives an error. Do I have to write all of the code with EasyDict?
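
The error above comes from argparse reading the notebook kernel's own command line, which contains the `-f .../kernel-....json` argument. Two common workarounds are sketched below, assuming the parser is built in a notebook cell. The option names come from the usage message above; the default values and the `/tmp/kernel.json` path are placeholders, not values from the original script.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-cuda", action="store_true")
    parser.add_argument("--model_name", default="model")       # placeholder default
    parser.add_argument("--batch_size", type=int, default=32)  # placeholder default
    return parser


# Option 1: pass an empty list so argparse ignores the kernel's command line
# and every option takes its default value.
args = build_parser().parse_args(args=[])

# Option 2: parse what is recognised and collect the rest (such as -f ...)
# instead of exiting with "unrecognized arguments".
known, unknown = build_parser().parse_known_args(["-f", "/tmp/kernel.json"])
```

In a notebook, Option 2 would normally be called with no arguments, so that it reads the real command line. If the options are kept in an EasyDict instead, only the option values need to go there; the rest of the code can stay as it is, as long as it reads them as attributes such as `args.batch_size`.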
