ogroth commented on June 15, 2024

Hey Yangshell, there are no official instructions for running the code yet. We're still experimenting with the architecture to find a training setup that replicates the paper's results on the rooms_ring_camera dataset. Once we have a stable version and working model snapshots, we will merge the dev branch into master and write a detailed Readme.
In the meantime, you can check https://github.com/ogroth/tf-gqn/blob/rooms_ring_camera_training/train_gqn_draw.py
This is the script that trains the GQN (provided you have downloaded the training data). However, we haven't managed to produce great visual results yet.

from tf-gqn.

ogroth commented on June 15, 2024

We have released a stable version of GQN which trains on the rooms_ring_camera dataset with the default parameters we provide. The training script is: https://github.com/ogroth/tf-gqn/blob/master/train_gqn_draw.py
Please see the Readme for detailed instructions on how to set up and run the code.

wlred commented on June 15, 2024

I downloaded the rooms_ring_camera dataset, but after I run the script, the system kills it. My computer has one 1080 Ti GPU. Is that not enough? Do I need a better one?

ogroth commented on June 15, 2024

Hi wlred, could you please describe the error you get in more detail? Which script did you run (with which parameters), and what happened after it was launched? Did it run into an out-of-memory error? A GTX 1080 Ti is definitely sufficient to train the model.
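On Linux, one way to answer the out-of-memory question is to look for OOM-killer messages in the kernel log. The following is a minimal sketch under stated assumptions. The helper names are hypothetical, reading `dmesg` may require sudo, and the message wording varies between kernel versions. The regular expression is an assumption that covers the "Kill process" and "Killed process" variants.

```python
import re
import shutil
import subprocess

# Assumed message formats: older kernels log "Out of memory: Kill process 1234
# (python3) score ... or sacrifice child", and kernels also log "Killed process
# 1234 (python3) total-vm:...". The pattern accepts both wordings.
OOM_PATTERN = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log):
    """Return unique (pid, process_name) pairs named in OOM-killer messages."""
    matches = [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log)]
    return list(dict.fromkeys(matches))  # de-duplicate, keep first-seen order


def read_kernel_log():
    """Return `dmesg` output, or "" if dmesg is missing or not permitted."""
    if shutil.which("dmesg") is None:
        return ""
    result = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,
                            universal_newlines=True, errors="replace")
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    kills = find_oom_kills(read_kernel_log())
    for pid, name in kills:
        print("OOM killer terminated PID {} ({})".format(pid, name))
    if not kills:
        print("No OOM-killer messages found (or dmesg needs sudo).")
```

If a `python3` entry shows up right after a training run dies, the process ran out of host RAM rather than GPU memory.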

wlred commented on June 15, 2024

Hi ogroth, my steps:
(1) Downloaded the rooms_ring_camera dataset.
(2) Ran the commands:
source venv/bin/activate
python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera --model_dir /tmp/models/gqn

The output log:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill

(3) About 20 seconds after I start the train_gqn_draw.py script, the system kills the process, and my computer becomes slow.
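The `kill` line at the end of the log is most likely the shell reporting that the process received SIGKILL. That is the signal the Linux OOM killer sends. The sketch below uses a hypothetical helper, `run_and_report`, and assumes a POSIX system. It runs a command through Python's `subprocess` and reports how the command terminated. For the demonstration, a child process kills itself, so no training run is needed.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run `cmd` and describe how it terminated.

    On POSIX, subprocess reports death by signal N as returncode -N, whereas a
    shell reports it as exit status 128 + N (137 for SIGKILL) and bash prints
    "Killed".
    """
    returncode = subprocess.run(cmd).returncode
    if returncode == -signal.SIGKILL:
        return "killed by SIGKILL (commonly the kernel OOM killer)"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = str(-returncode)
        return "terminated by signal " + name
    return "exited with status {}".format(returncode)


if __name__ == "__main__":
    # Demonstration only: a child process that sends itself SIGKILL,
    # which is how an OOM kill looks from the parent's side.
    demo = [sys.executable, "-c",
            "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(demo))
```

To check a real run, replace `demo` with the training command line as a list of strings. In a plain shell, `echo $?` right after the kill should print 137 if the cause was SIGKILL.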

ogroth commented on June 15, 2024

You seem to have a typo in your CLI parameters when calling the script:

UNPARSED_ARGV: ['--mode_dir', '/tmp/models/gqn']

That should read: --model_dir /tmp/models/gqn
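The `FLAGS: Namespace(...)` and `UNPARSED_ARGV` lines suggest that the script parses its flags with Python's `argparse.parse_known_args`; this is an assumption, not confirmed from the code. That function does not fail on unknown options. It returns them separately, which is why the typo was silently ignored. The parser below is a hypothetical replica of three flags, inferred from the log, and is not the script's actual parser.

```python
import argparse

# Hypothetical replica of three flags, inferred from the FLAGS and
# UNPARSED_ARGV log lines; this is not the script's actual parser.
parser = argparse.ArgumentParser()
parser.add_argument("--data_dir")
parser.add_argument("--dataset")
parser.add_argument("--model_dir", default="/tmp/models/gqn")

# The arguments from the first run, including the --mode_dir typo.
argv = ["--data_dir", "/tmp/data/gqn-dataset", "--dataset", "rooms_ring_camera",
        "--mode_dir", "/tmp/models/gqn"]

# parse_known_args() returns unknown tokens instead of failing, so the typo is
# ignored and model_dir keeps its default value.
flags, unparsed = parser.parse_known_args(argv)
print(flags.model_dir)  # /tmp/models/gqn
print(unparsed)         # ['--mode_dir', '/tmp/models/gqn']

# A simple guard that surfaces typos instead of silently ignoring them.
if any(arg.startswith("--") for arg in unparsed):
    print("warning: unrecognized arguments:", " ".join(unparsed))
```

Calling `parser.parse_args(argv)` instead would stop with an "unrecognized arguments" error. That choice trades flexibility for safety against typos like this one.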

wlred commented on June 15, 2024

I ran the command: python3 train_gqn_draw.py --data_dir /tmp/data/gqn-dataset --dataset rooms_ring_camera
It still gets killed.

wlred commented on June 15, 2024

The output log:
Training a GQN.
FLAGS: Namespace(batch_size=36, chkpt_steps=10000, data_dir='/tmp/data/gqn-dataset', dataset='rooms_ring_camera', debug=False, initial_eval=False, log_steps=100, memcap=1.0, model_dir='/tmp/models/gqn', queue_buffer=64, queue_threads=4, train_epochs=40)
UNPARSED_ARGV: []
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_global_id_in_cluster': 0, '_task_id': 0, '_service': None, '_session_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
allow_growth: true
}
, '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_master': '', '_task_type': 'worker', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fec2a484f28>, '_model_dir': '/tmp/models/gqn', '_save_checkpoints_steps': 10000, '_keep_checkpoint_every_n_hours': 10000, '_num_worker_replicas': 1, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_train_distribute': None}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
kill

ogroth commented on June 15, 2024

Have you tried monitoring your system with htop and nvidia-smi to check for any unusual behaviour in CPU/GPU usage or memory allocation? That's the only cause I can think of off the top of my head that would make the OS kill the process. Which OS are you using?
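As a scripted alternative to watching htop and nvidia-smi by hand, the sketch below samples host RAM from `/proc/meminfo` (Linux only) and GPU memory through nvidia-smi's `--query-gpu` interface. Treat it as a sketch: the function names and the sampling defaults are hypothetical choices, and GPU readings are skipped when nvidia-smi is not available.

```python
import os
import shutil
import subprocess
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}; sizes are in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def gpu_memory_mib():
    """Return [(used, total)] in MiB per GPU via nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, universal_newlines=True)
    if result.returncode != 0:
        return []
    try:
        return [tuple(int(v) for v in line.split(","))
                for line in result.stdout.splitlines() if line.strip()]
    except ValueError:
        return []


def monitor(samples=5, interval_s=1.0):
    """Print host RAM, swap and GPU memory once per interval."""
    kib_per_gib = 2 ** 20
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        available = mem.get("MemAvailable", mem.get("MemFree", 0))
        print("RAM available {:.1f}/{:.1f} GiB, swap free {:.1f} GiB, "
              "GPU used/total MiB {}".format(
                  available / kib_per_gib, mem["MemTotal"] / kib_per_gib,
                  mem.get("SwapFree", 0) / kib_per_gib, gpu_memory_mib()))
        time.sleep(interval_s)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    monitor()
```

Run it in a second terminal while training starts, with a larger `samples` value. If available RAM drops towards zero just before the kill, host memory is the bottleneck, not the GPU.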

wlred commented on June 15, 2024

Ubuntu 16.04

wlred commented on June 15, 2024

Hi ogroth, how much RAM does your computer have?

ogroth commented on June 15, 2024

We've trained on machines with 32GB of RAM, but training never occupied more than 8GB at any time.
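For a rough sense of how the input pipeline settings affect host RAM, here is a back-of-envelope estimate using the defaults from the FLAGS log line (`batch_size=36`, `queue_buffer=64`). Several assumptions go into it, and none are confirmed from the repo:

* each rooms_ring_camera scene holds 10 views of 64x64 RGB frames;
* frames are decoded to float32;
* `queue_buffer` counts whole batches. If it counts single scenes instead, the buffered total is far smaller.

The function names are hypothetical.

```python
# Back-of-envelope estimate of host RAM held by buffered input batches.
# Assumptions (not confirmed by the repo): frames decoded to float32,
# 10 views of 64x64 RGB per scene, and queue_buffer counting whole batches.
BYTES_PER_FLOAT32 = 4
MIB = 2 ** 20


def batch_bytes(batch_size=36, views=10, height=64, width=64, channels=3):
    """Bytes for one batch of decoded float32 frames (camera poses ignored)."""
    return batch_size * views * height * width * channels * BYTES_PER_FLOAT32


def buffered_bytes(queue_buffer=64, **batch_kwargs):
    """Bytes for `queue_buffer` prefetched batches."""
    return queue_buffer * batch_bytes(**batch_kwargs)


print(batch_bytes() / MIB)                                   # 16.875
print(buffered_bytes() / MIB)                                # 1080.0
print(buffered_bytes(queue_buffer=8, batch_size=18) / MIB)   # 67.5
```

Under these assumptions, the buffered images stay around 1 GiB, well below the 8 GB peak mentioned above. Lowering `queue_buffer` or `batch_size` scales that share down linearly.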

wlred commented on June 15, 2024

Interesting. I have run your code on three computers, and on all of them the process was killed. Maybe a lot of people have the same problem.
