I've extracted image patches successfully, however I get the following error when runn


Errors with Train.py about hover_net HOT 18 CLOSED

vqdang commented on August 20, 2024

Errors with Train.py

from hover_net.

Comments (18)

luke2997 commented on August 20, 2024

I mean, the virtual environment is fully set up with Python 3.6. However, it seems to be initialising with 2.7, as you said. I'll try to see how I can fix this.

Yeah, it seems one of the packages I installed manually through conda has simultaneously downgraded Python. Will update.

simongraham commented on August 20, 2024

Hi @luke2997

It seems that you have not put the GPU ids as a string. If using GPUs 0 and 1, use:

python train.py --gpu='0,1'

Please let us know if this fixes the issue.
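For readers following along, here is a minimal sketch of why the ids go in as one string. It assumes the script reads `--gpu` as a comma-separated string and exposes it through the standard `CUDA_VISIBLE_DEVICES` environment variable. The helper `parse_gpu_ids` is hypothetical and is not taken from train.py.

```python
import argparse
import os


def parse_gpu_ids(argv):
    """Parse a comma-separated GPU list such as '0,1' (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", default="0", help="comma-separated list of GPU ids")
    args = parser.parse_args(argv)
    # Restrict CUDA to the requested devices (standard CUDA environment variable).
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    # The number of GPUs follows from the number of comma-separated entries.
    nr_gpus = len(args.gpu.split(","))
    return args.gpu, nr_gpus


gpu_string, nr_gpus = parse_gpu_ids(["--gpu=0,1"])
print(gpu_string, nr_gpus)
```

The shell strips the quotes in `--gpu='0,1'`, so the script receives the single string `0,1`.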

luke2997 commented on August 20, 2024

@simongraham - this doesn't fix the problem unfortunately.

simongraham commented on August 20, 2024

Can you let me know what tensorpack version you are using, and then copy the command that you use in the terminal?

luke2997 commented on August 20, 2024

I have both TensorFlow v1.12 and tensorpack 0.9.0.1. In the command line I type:

python train.py --gpu='0,1'

I also get the below output file, not sure if related.

simongraham commented on August 20, 2024

The output is telling you that there is already a checkpoint file where you plan to save your logs. You need to press k (keep), d (delete) or q (quit) depending on what you want to do.

I'm still not sure exactly what you are requesting here. Are you getting an error? If so, please supply the terminal output.

luke2997 commented on August 20, 2024

Yeah, I did delete it. I think I now realise the source of the issue: a couple of modules are not importing properly due to libpng12. I'll update when I get this resolved.

simongraham commented on August 20, 2024

I will close this issue for now. Please reopen if necessary with a specific question; then we can be of more assistance.

luke2997 commented on August 20, 2024

Please reopen: I had to start all over again as there were issues with my venv. extract_patches.py worked before, but when I run it now I get the error below. Is it something to do with the dataset?

vqdang commented on August 20, 2024

@luke2997, for the record, you can reopen it yourself.

Please show me these lines as they appear in your code:

hover_net/src/extract_patches.py

Lines 27 to 28 in 909ef03

step_size = [80, 80] # should match self.train_mask_shape (config.py)
win_size = [540, 540] # should be at least twice time larger than

Read the error message: it says that the input is of the wrong format (float).

Also, please provide as many details as possible for what you have changed in the code compared to the github version.

luke2997 commented on August 20, 2024

Well I only changed the paths which is why I didn't understand why the error came. So I have the same lines as those above.

I had a look and tried running config.py, and I get an error related to #12 (comment). So it seems to be an error with my paths, although they are the same paths I used before, which worked then!

One thing that may be causing it is that when I first run the code I get this error:

/lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found

And I export it using

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/*username*/.conda/envs/*VirtualEnvironment*/lib/

which then fixes it but causes the error above.

vqdang commented on August 20, 2024

That is very strange; maybe your new environment broke something. For now, you can try changing this

hover_net/src/misc/patch_extractor.py

Line 82 in 909ef03

return flag, last_step

into return flag, int(last_step) to ensure that every input to range is an int.
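As a small illustration of the error class involved, here is a sketch showing that Python 3's `range` rejects a float and accepts the same value once cast with `int()`. The function `count_steps` and its numbers are made up for the example and are not the code in patch_extractor.py.

```python
def count_steps(length, win_size, step_size):
    """Hypothetical stand-in for a step computation that returns a float."""
    # In Python 3, `/` on ints is true division and always yields a float.
    return (length - win_size) / step_size


last_step = count_steps(1000, 540, 80)

try:
    list(range(0, last_step))
except TypeError as err:
    # range only accepts integers, so the float is rejected here.
    print("range rejected the float:", err)

# Casting first, as suggested above, gives range an int to work with.
steps = list(range(0, int(last_step)))
print(steps)
```

Note that `int()` truncates towards zero, so a fractional step count is rounded down.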

vqdang commented on August 20, 2024

Also, the preferred library versions are listed here: https://github.com/vqdang/hover_net/blob/master/requirements.txt

You may want to check whether yours match. If you need to reinstall, use the following as a guideline.

conda create --name test python=3.6
conda activate test
pip install "opencv-python==3.2.*" scipy scikit-image pandas matplotlib
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git@0.9.0.1
pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.12.0-cp36-cp36m-linux_x86_64.whl
source activate test
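After setting up the environment, a quick check along these lines can confirm which interpreter is actually running and which library versions it sees. This is only a sketch: `interpreter_matches` is a hypothetical helper, and the module names are the import names of the packages mentioned above.

```python
import sys


def interpreter_matches(version_info, major=3, minor=6):
    """Return True when the interpreter is the expected major.minor version."""
    return (version_info[0], version_info[1]) == (major, minor)


print("executable:", sys.executable)
print("version:", sys.version.split()[0])
if not interpreter_matches(sys.version_info):
    print("warning: expected Python 3.6; check which environment is active")

# Report library versions without stopping if one of them fails to import.
for name in ("tensorflow", "tensorpack", "cv2"):
    try:
        module = __import__(name)
        print(name, getattr(module, "__version__", "unknown"))
    except Exception as err:
        print(name, "could not be imported:", err)
```

If the executable path points outside the new environment, the environment is probably not the active one.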

luke2997 commented on August 20, 2024

Perfect, thanks a lot! Changing that line of code fixed it and I was able to extract patches successfully. However, I get a few errors with train.py suggesting that TensorFlow GPU is not working, so I will create a new virtual environment as you suggested and try again, as perhaps there was a fault with the tensorflow-gpu install. It is a bit of a pain getting these packages installed all together for some reason, though.

  File "train.py", line 278, in <module>
    trainer.run()
  File "train.py", line 244, in run
    self.run_once(opt, sess_init=init_weights, save_dir=log_dir)
  File "train.py", line 188, in run_once
    model = self.get_model()(**model_flags)
  File "/lustre/home/acct-clsyzs/clsyzs/Luke/hover_net-master/src/model/graph.py", line 113, in __init__
    assert tf.test.is_gpu_available()

simongraham commented on August 20, 2024

Hi @luke2997,

As @vqdang suggested, please set up your environment from scratch to ensure there are no issues with library versions.

"Although it is a bit of a pain getting these packages installed all together for some reason."

Installing the libraries should be simple enough if you follow @vqdang's instructions; it is only a few lines in the terminal. Of course, you need to make sure you run the commands separately, line by line.

After this has been done, let us know and we can advise how to proceed. What CUDA version do you have installed?

luke2997 commented on August 20, 2024

I'm in mainland China, so some channels get blocked; for example, the command above doesn't work for TensorFlow. Anyway, I believe I have all the requirements. I have restarted and now got a little further, so I assume the GPU is working now! However, I do get output, followed by the long error below. Also, I have cudatoolkit 9.2 and cudnn 7.6.5.

Traceback (most recent call last):
  File "train.py", line 278, in <module>
    trainer.run()
  File "train.py", line 244, in run
    self.run_once(opt, sess_init=init_weights, save_dir=log_dir)
  File "train.py", line 215, in run_once
    launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(nr_gpus))
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/interface.py", line 97, in launch_train_with_config
    extra_callbacks=config.extra_callbacks)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/base.py", line 341, in train_with_defaults
    steps_per_epoch, starting_epoch, max_epoch)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/base.py", line 313, in train
    self.main_loop(steps_per_epoch, starting_epoch, max_epoch)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/utils/argtools.py", line 176, in wrapper
    return func(*args, **kwargs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/base.py", line 278, in main_loop
    self.run_step() # implemented by subclass
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/base.py", line 181, in run_step
    self.hooked_sess.run(self.train_op)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
    run_metadata=run_metadata)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    raise six.reraise(*original_exc_info)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
    return self._sess.run(*args, **kwargs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1312, in run
    run_metadata=run_metadata)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
    return self._sess.run(*args, **kwargs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Integer division by zero
  [[node tower0/div_12 (defined at /lustre/home/acct-clsyzs/clsyzs/Luke/hover_net-master/src/model/utils.py:182) = FloorDiv[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower0/sub_13, tower0/sub_14)]]

Caused by op u'tower0/div_12', defined at:
  File "train.py", line 278, in <module>
    trainer.run()
  File "train.py", line 244, in run
    self.run_once(opt, sess_init=init_weights, save_dir=log_dir)
  File "train.py", line 215, in run_once
    launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(nr_gpus))
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/interface.py", line 87, in launch_train_with_config
    model._build_graph_get_cost, model.get_optimizer)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/utils/argtools.py", line 176, in wrapper
    return func(*args, **kwargs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/tower.py", line 204, in setup_graph
    train_callbacks = self._setup_graph(input, get_cost_fn, get_opt_fn)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/trainers.py", line 106, in _setup_graph
    self._make_get_grad_fn(input, get_cost_fn, get_opt_fn), get_opt_fn)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/graph_builder/training.py", line 161, in build
    grad_list = DataParallelBuilder.build_on_towers(self.towers, get_grad_fn, devices)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/graph_builder/training.py", line 119, in build_on_towers
    ret.append(func())
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/train/tower.py", line 232, in get_grad_fn
    cost = get_cost_fn(*input.get_input_tensors())
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/tfutils/tower.py", line 284, in __call__
    output = self._tower_fn(*args)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/graph_builder/model_desc.py", line 246, in _build_graph_get_cost
    ret = self.build_graph(*inputs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorpack/graph_builder/model_desc.py", line 162, in build_graph
    return self._build_graph(args)
  File "/lustre/home/acct-clsyzs/clsyzs/Luke/hover_net-master/src/model/graph.py", line 305, in _build_graph
    true_np = colorize(true_np[...,0], cmap='jet')
  File "/lustre/home/acct-clsyzs/clsyzs/Luke/hover_net-master/src/model/utils.py", line 182, in colorize
    value = (value - vmin) / (vmax - vmin) # vmin..vmax
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 999, in _div_python2
    return gen_math_ops.floor_div(x, y, name=name)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 3079, in floor_div
    "FloorDiv", x=x, y=y, name=name)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/lustre/home/acct-clsyzs/clsyzs/.conda/envs/hovernew/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Integer division by zero
  [[node tower0/div_12 (defined at /lustre/home/acct-clsyzs/clsyzs/Luke/hover_net-master/src/model/utils.py:182) = FloorDiv[T=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower0/sub_13, tower0/sub_14)]]

simongraham commented on August 20, 2024

It looks like you are using Python 2.7 (the paths in your traceback point to lib/python2.7). As stated in the requirements, you need to use Python 3.6.
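For context on why the interpreter version matters here: the traceback shows the line `value = (value - vmin) / (vmax - vmin)` being dispatched through `_div_python2` to a `FloorDiv` op on int32 values. Below is a plain-Python sketch of the two division behaviours. It is not the TensorFlow code, and the function names are illustrative only.

```python
def normalize_true(value, vmin, vmax):
    """Python 3 `/` on ints: true division, so the result is a float."""
    return (value - vmin) / (vmax - vmin)


def normalize_floor(value, vmin, vmax):
    """`//` mimics what Python 2 `/` does on ints: floor division."""
    return (value - vmin) // (vmax - vmin)


print(normalize_true(3, 0, 4))   # 0.75
print(normalize_floor(3, 0, 4))  # 0
```

Under Python 3, `/` no longer means floor division on integers, which is presumably why the "Integer division by zero" error goes away after switching interpreters. That is an inference from the traceback rather than something confirmed in this thread.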

luke2997 commented on August 20, 2024

Right, thanks a lot for the help, I appreciate it. I've successfully trained on the data after changing the Python version!
