google / dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Home Page: https://github.com/google/dopamine
License: Apache License 2.0
I tried to run Dopamine on my GPU machine with Ubuntu 16.04.4 and CUDA 9.0. I followed the testing and training instructions in the provided README inside a virtualenv. Testing and training ran fine, but entirely on the CPU (high CPU utilization and zero GPU utilization, even after one iteration finished). I'm running with "dopamine/agents/dqn/configs/dqn.gin", and the configuration uses GPU:0 as tf_device by default. Does anybody have any pointers for this kind of situation?
Hi everyone, I am reading the great IQN paper and following the implementation, but I find that the definition of the loss function is slightly different from the one described in the IQN paper and in the earlier QR-DQN paper:
https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/implicit_quantile_agent.py#L348
Why is the final quantile Huber loss divided by kappa?
quantile_huber_loss = (
    tf.abs(replay_quantiles - tf.stop_gradient(
        tf.to_float(bellman_errors < 0))) * huber_loss) / self.kappa
Since kappa equals 1.0, the division makes no difference, but I'm confused here. Is it a typo?
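To make the question concrete, here is a minimal pure-Python sketch of a scalar quantile Huber loss with the division by kappa as an optional switch. The function names and the divide_by_kappa flag are hypothetical and exist only for illustration; this is not the Dopamine implementation, which operates on batched tensors.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond that."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber(u, tau, kappa=1.0, divide_by_kappa=True):
    """Weights the Huber loss by |tau - 1{u < 0}| (hypothetical helper)."""
    indicator = 1.0 if u < 0 else 0.0
    loss = abs(tau - indicator) * huber(u, kappa)
    return loss / kappa if divide_by_kappa else loss


# With kappa = 1.0 the two variants give the same value.
same_at_kappa_one = (
    quantile_huber(0.5, 0.25, 1.0, True) == quantile_huber(0.5, 0.25, 1.0, False))

# With kappa = 2.0 they differ by exactly a factor of kappa.
ratio_at_kappa_two = (
    quantile_huber(3.0, 0.25, 2.0, False) / quantile_huber(3.0, 0.25, 2.0, True))
```

So the division only rescales the loss by a constant, and it starts to matter as soon as kappa is set to something other than 1.0.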
observation_dtype is initialized at the beginning, but the _build_replay_buffer method only uses observation_shape and does not pass observation_dtype in its arguments.
When I run:
python tests/replay_memory/circular_replay_buffer_test.py
I get the following errors:
python tests/replay_memory/circular_replay_buffer_test.py
.......................2018-11-05 09:56:04.804020: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
..2018-11-05 09:56:04.895912: W tensorflow/core/framework/op_kernel.cc:1263] Unknown: exceptions.RuntimeError: Cannot sample a batch with fewer than stack size (4) + update_horizon (1) transitions.
Traceback (most recent call last):
  File "/home/clock/dopamine-env/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "/home/clock/dopamine-env/lib/python2.7/site-packages/dopamine/replay_memory/circular_replay_buffer.py", line 467, in sample_transition_batch
    indices = self.sample_index_batch(batch_size)
  File "/home/clock/dopamine-env/lib/python2.7/site-packages/dopamine/replay_memory/circular_replay_buffer.py", line 417, in sample_index_batch
    format(self._stack_size, self._update_horizon))
RuntimeError: Cannot sample a batch with fewer than stack size (4) + update_horizon (1) transitions.
Ran 32 tests in 6.195s
OK
All other tests in the tests directory work fine; circular_replay_buffer_test.py is the only exception.
I just skimmed through the documentation and couldn't find any information about installing on Windows 10. Will this framework work in a Windows environment?
In considering this framework for a practical project, it was not clear to me how to scale it up.
I think it would need a distributed prioritised replay/Ape-X at minimum.
I was considering using VecEnv from OpenAI, but even then we would need a thread-safe prioritised replay.
I looked for Ape-X implementations on GitHub but didn't see one I liked enough to try to integrate.
I bet this library would see a lot more use if it had a nice scale up story.
Can it work with TensorFlow version <= 1.4.1 and CUDA version <= 8.0?
May I ask a quick question: how is this set of Markdown API docs generated?
Thanks
I've been trying to download the various checkpoint files as per the instructions at the bottom of: https://github.com/google/dopamine/tree/master/docs.
I've tried numerous variations of agents, games, runs and suffixes but every time I receive an error stating there is no object by that name.
I am typing the URLs directly into Chrome.
Here's an example that I've tried:
https://storage.cloud.google.com/download-dopamine-rl/lucid/dqn/qbert/1/tf_ckpt-199.index
I assumed you used the same naming for the games as the names given in the dropdown on https://google.github.io/dopamine/baselines/plots.html.
Question as in the title. I have installed all of the required packages, but I can't fix the error. When I execute 'python tests/atari_init_test.py', the error is 'No module named dopamine.atari'.
Thanks!
Thank you for open-sourcing such great code!
I have questions about your IQN implementation, especially on how it can reproduce the scores reported in the paper.
First, your config file https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/configs/implicit_quantile_icml.gin specifies N=N'=64. How did you choose these values?
Second, can the IQN implementation reproduce the scores reported in the paper? I ran it myself on six games, but the results do not match the paper.
I used this command:
python3 -um dopamine.atari.train '--agent_name=implicit_quantile' '--base_dir=results' '--gin_files=dopamine/agents/implicit_quantile/configs/implicit_quantile_icml.gin' '--gin_bindings=AtariPreprocessing.terminal_on_life_loss=True' "--gin_bindings=Runner.game_name='Breakout'"
Here is the TensorBoard plot I got:
Figure 7 of the IQN paper reports raw scores of 342,016 for Asterix, 42,776 for Beam Rider, 734 for Breakout, 25,750 for Q*Bert, 30,140 for Seaquest, and 28,888 for Space Invaders. Have you successfully reproduced scores at the same level? If yes, how? If no, are you aware of any differences in implementation or settings from DeepMind's?
Is this extendable for policy gradient or actor-critic architectures? Or would one have to do major re-workings? I'm trying to decide whether to use this framework for a project or implement from scratch. I will be using A2C. Any advice would be appreciated!
Could you possibly comment on the difficulty of getting Dopamine to work with continuous action spaces? Is this something that could be done with a bit of effort, or are the agents completely incompatible with this type of action space? I wanted to have a go at getting Dopamine to work with the OpenAI Gym BipedalWalker environment. I managed to get all of the initialization and setup working, then realized that an assumption of discrete action spaces is hard-coded into Dopamine.
Has anyone thought about or implemented a multithreaded, parallel, or distributed version of Dopamine? If not, I will start the process of doing so. If so, please let me know. Thanks!
Just a feature request. I think this would be much more easily adaptable to non-Atari environments if we could configure the neural network via the agent's gin file, or something similar. I think it would also be beneficial to use Keras instead of Slim, if possible, for those of us who want to easily build more complex networks.
Hi there,
When I try to run "python tests/atari_init_test.py", the following error message appears:
ImportError: No module named atari_py
However, atari_py was installed using pip install atari-py, and when I try installing atari-py again, it reports: Requirement already satisfied: atari_py in /usr/local/lib/python2.7/dist-packages
My system environment is:
Ubuntu 16.04;
Python: 2.7.15 | Anaconda, Inc. | default, May 1 2018
Is there any way to help Python locate atari_py?
Cheers,
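One possible cause, assuming pip and python belong to different installations (pip reports /usr/local/lib/python2.7/dist-packages while the interpreter is an Anaconda build), is that the package sits in a location this interpreter never searches. A small diagnostic sketch follows; the helper name locate is hypothetical.

```python
import sys


def locate(module_name):
    """Returns the file a module is imported from, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


# Which interpreter is running, and where does it look for packages?
print(sys.executable)
for entry in sys.path:
    print(entry)

# None here means this interpreter cannot import atari_py, wherever pip put it.
print(locate("atari_py"))
```

If the dist-packages directory named by pip is missing from the printed sys.path, installing with the same interpreter (python -m pip install atari-py) is one way to keep the two in sync.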
Hi, I checked the plots on the page you provide, and I find that the scores are quite different from those reported in the papers. Take the game Asterix, for example. The Rainbow paper reports that Rainbow reaches 428,200.3 after 200 iterations, and the IQN paper reports that IQN reaches 342,016 after 200 iterations. But in the baselines, the scores are below 20,000. So I wonder what the implementation differences between the papers and the baselines are, and how I can use the baseline scores when I conduct new research.
BTW, I am running a Rainbow experiment with the rainbow_aaai.gin configuration, and the score is better than the baseline. I haven't finished the experiment yet. Can this configuration reproduce the scores reported in the papers?
Having these implementations of baselines and their training curves is a fantastic help -- thanks! Is there any intention of releasing an implementation of DQN-CTS or DQN-PixelCNN at any point?
Is it possible to apply this framework to two-player zero-sum game environments? Any tips for doing so would be appreciated.
The compiled pickle files are available here
We make use of these compiled pickle files in both agents and the statistics colabs.
I downloaded these files, but I don't know how to use them. Can anyone help me?
When I run 'python tests/atari_init_test.py', it shows the following:
Traceback (most recent call last):
  File "tests/atari_init_test.py", line 47, in test_atari_init
    train.main([])
  File "/Users/kudou/Documents/codes/dopamine/dopamine/atari/train.py", line 128, in main
    launch_experiment(create_runner, create_agent)
  File "/Users/kudou/Documents/codes/dopamine/dopamine/atari/train.py", line 116, in launch_experiment
    run_experiment.load_gin_configs(FLAGS.gin_files, FLAGS.gin_bindings)
  File "/Users/kudou/anaconda2/envs/py36/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 488, in __getattr__
    raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --gin_files before flags were parsed.
Ran 2 tests in 0.001s
FAILED (errors=1)
Thanks for this great project!
There have been quite a few issues (e.g. #3, #36) regarding customising Dopamine to work on new environments. It also seems like there are quite a few people who have made or are making forks that allow for this.
So I was wondering:
python tests/atari_init_test.py
Should be:
python tests/dopamine/atari_init_test.py
Is it possible to somehow increase the number of environments that the agents interact with in parallel? I have very low CPU usage and would like to increase the throughput, but I can't find any parameter controlling this.
Hi guys,
I think this is a basic question, but I'm still posting it here.
If I want to use this framework for my own problem, e.g. to solve 2048, where do I start?
Yes, I am looking for a beginner's guide or something similar.
When I follow these steps to set up Dopamine, everything seems OK until I test "dopamine/atari/train.py". The problem is:
MemoryError: In call to configurable 'WrappedReplayBuffer' (<unbound method WrappedReplayBuffer.__init__>) In call to configurable 'DQNAgent' (<unbound method DQNAgent.__init__>) In call to configurable 'Runner' (<unbound method Runner.__init__>)
When loading content using experimental_data = colab_utils.load_baselines('/content'), I got this error: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat.
I'm not sure what's wrong. I ran this on my local computer. I have 'c51, dqn, implicit_quantile, quantile, rainbow' in the 'content' folder. Is this a pandas-related problem?
dopamine/dopamine/agents/dqn/dqn_agent.py
Lines 430 to 432 in bc66570
The documentation in AtariPreprocessing.step is incorrect. It says the method returns "is_episode_over: bool, reports whether the episode is actually over", but it does not.
I'm trying to run https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/agents.ipynb on my local machine.
I got this error for both examples:
/home/lukas/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
[2018-10-14 13:18:26,861] Making new env: AsterixNoFrameskip-v0
Traceback (most recent call last):
  File "/home/lukas/dopamine/dopamine/agents/luska_1/luska_1.py", line 51, in <module>
    max_steps_per_episode=100)
  File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1032, in wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/utils.py", line 48, in augment_exception_message_and_reraise
    six.raise_from(proxy.with_traceback(exception.__traceback__), None)
  File "<string>", line 2, in raise_from
  File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1009, in wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/lukas/dopamine/dopamine/atari/run_experiment.py", line 157, in __init__
    self._agent = create_agent_fn(self._sess, self._environment, summary_writer=self._summary_writer)
TypeError: create_random_dqn_agent() got an unexpected keyword argument 'summary_writer'
In call to configurable 'Runner' (<function Runner.__init__ at 0x7f9b5b24b6a8>)
Thanks!
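Going by the traceback, the Runner calls create_agent_fn with a summary_writer keyword argument, so a creation function written against an older signature fails. A minimal sketch of a compatible signature is below. RandomAgent and FakeEnvironment are hypothetical stand-ins so the example runs on its own, and the assumption that the environment exposes action_space.n in the Gym style is mine, not something taken from the Dopamine source.

```python
class RandomAgent(object):
    """Hypothetical stand-in for an agent; it only stores its arguments."""

    def __init__(self, sess, num_actions, summary_writer=None):
        self.sess = sess
        self.num_actions = num_actions
        self.summary_writer = summary_writer


def create_random_dqn_agent(sess, environment, summary_writer=None):
    """Accepts the summary_writer keyword that the Runner passes in."""
    return RandomAgent(sess,
                       num_actions=environment.action_space.n,
                       summary_writer=summary_writer)


class FakeActionSpace(object):
    n = 9  # Arbitrary number of actions for the demonstration.


class FakeEnvironment(object):
    action_space = FakeActionSpace()


# Mirrors the call shape shown in the traceback.
agent = create_random_dqn_agent(None, FakeEnvironment(), summary_writer=None)
```

Defaulting summary_writer to None keeps the function callable both with and without the extra keyword.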
macOS High Sierra 10.13.6
error message:
OSError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so, 6): no suitable image found. Did find:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so: mach-o, but built for simulator (not macOS)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so: mach-o, but built for simulator (not macOS)
It seems that this framework only supports discrete action spaces. Do you have any plans for supporting continuous action spaces?
Will this eventually support custom Gym environments? Or is it purpose-built for Atari baselines?
I'm trying to adapt this to a non-atari environment that has simple, flat 1d observations, however the code, while commented to suggest you can use a tuple OBSERVATION_SHAPE, breaks down when you try to use anything other than a square observation space due to the following lines:
state_shape = [1, OBSERVATION_SHAPE, OBSERVATION_SHAPE, STACK_SIZE]
self.state = np.zeros(state_shape)
You get a "TypeError: 'tuple' object cannot be interpreted as an integer" if OBSERVATION_SHAPE is anything other than an integer. However, the docs for the replay buffer indicate we should be able to use a tuple for observation_shape instead:
"observation_shape: tuple or int. If int, the observation is assumed to be a 2D square."
I should be able to work through this fairly quickly, but you may want to address this so users can more easily use this in experiments that don't involve playing atari games.
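A minimal sketch of how the state shape could be built so that both forms described in the replay buffer docs work. The helper name build_state_shape is hypothetical and this is not the actual Dopamine fix.

```python
def build_state_shape(observation_shape, stack_size):
    """Builds (1, *observation_shape, stack_size) from an int or a tuple.

    An int is treated as the side of a 2D square, matching the replay
    buffer docstring quoted above.
    """
    if isinstance(observation_shape, int):
        observation_shape = (observation_shape, observation_shape)
    return (1,) + tuple(observation_shape) + (stack_size,)


square_shape = build_state_shape(84, 4)    # Atari-style square frames.
flat_shape = build_state_shape((10,), 4)   # Flat 1D observations.
```

The resulting tuple can be passed straight to np.zeros, so the rest of the agent would not need to know which form the user supplied.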
Hello there,
First and foremost, thank you for open-sourcing these agents!
I would like to try using them in a context where the inputs are structured/tabular data.
What do you think would be the best (and/or quickest) approach between:
And lastly, is there already something in the project that lets us process tabular/structured data, or do you plan to add it in future releases?
Thank you in advance for your answers.
Nicolas.
Can you pause and resume training sessions using the rainbow agent?
Hi,
I want to use Dopamine with my own game environment.
I don't use Asterix, Pong, or any other Atari environment.
Does Dopamine allow that?
Thanks
First, thanks for sharing dopamine with all of us. It definitely improves productivity!
In the Mnih et al. (2015) Nature paper on DQN, Extended Data Table 1 lists the gradient momentum as 0.95. However, in dqn_nature.gin the value seems to be set to 0.0:
tf.train.RMSPropOptimizer.learning_rate = 0.00025
tf.train.RMSPropOptimizer.decay = 0.95
tf.train.RMSPropOptimizer.momentum = 0.0
tf.train.RMSPropOptimizer.epsilon = 0.00001
tf.train.RMSPropOptimizer.centered = True
I believe this particular configuration file is supposed to offer the same hyperparameter settings as the Nature paper. Or perhaps I'm not interpreting the parameters correctly. Anyway, should it be 0.0 or 0.95?
Thanks much!
--Ted
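To see where each of these gin parameters enters, here is a pure-Python sketch of a single scalar update, following the centered RMSProp rule as documented for tf.train.RMSPropOptimizer. The function name and the explicit state tuple are my own simplification. It does not settle which value the paper intended; it only shows that decay and momentum drive different accumulators.

```python
import math


def rmsprop_step(var, grad, state, lr=0.00025, decay=0.95,
                 momentum=0.0, epsilon=0.00001):
    """One centered RMSProp update on a scalar (illustrative sketch)."""
    mean_grad, mean_square, mom = state
    # decay smooths the running mean of the gradient and of its square.
    mean_grad = decay * mean_grad + (1.0 - decay) * grad
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    # centered=True subtracts the squared mean, giving a variance estimate.
    denominator = math.sqrt(mean_square - mean_grad ** 2 + epsilon)
    # momentum acts on a separate accumulator of past updates.
    mom = momentum * mom + lr * grad / denominator
    return var - mom, (mean_grad, mean_square, mom)


# One step from a zero state with the values from dqn_nature.gin.
new_var, new_state = rmsprop_step(1.0, 1.0, (0.0, 0.0, 0.0))
```

With momentum = 0.0 the update is just the normalized gradient step, while decay = 0.95 still controls how quickly both running averages forget old gradients.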
As above?
Would it be too hard to add a working example of the cool IMPALA Trainer?
Also, how do you hook IMPALA to custom Envs and Actors?
Thanks in advance!
Hi,
I am facing a problem installing and testing Dopamine. I am using Ubuntu 18 and followed the exact instructions on the main page for "install via source" without getting any errors. However, when I execute the commands in the "Running tests" section, I get errors.
Command:
python tests/atari_init_test.py
Output:
ImportError: No module named dopamine.atari
Command:
python -um dopamine.atari.train \
  --agent_name=dqn \
  --base_dir=/tmp/dopamine \
  --gin_files='dopamine/agents/dqn/configs/dqn.gin'
Output:
Illegal instruction (core dumped)
I appreciate any suggestion on how I can solve this issue.
I want to use DQNAgent to train my network with state shape [1, 14, 14, 5], but state_shape is not a parameter of DQNAgent, so I need to modify the code in dqn_agent.py before I can use it. Making the state shape a parameter of DQNAgent might be better.
After I train my network, I cannot change the eval mode of DQNAgent, so making eval_mode a parameter of DQNAgent might be better too.
I don't get why you keep using Python 2?
Checkpointing exists to avoid repeating time-consuming training, so it should happen at regular time intervals. Logs can blow up in size if each iteration is short.
I recommend logging every 0.5 seconds and checkpointing every 5 minutes.
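A minimal sketch of what a time-based trigger could look like. The class name IntervalTrigger and the injectable clock are hypothetical, and nothing like this is claimed to exist in Dopamine.

```python
import time


class IntervalTrigger(object):
    """Fires at most once per interval_seconds of wall-clock time."""

    def __init__(self, interval_seconds, clock=time.time):
        self._interval = interval_seconds
        self._clock = clock  # Injectable so the trigger can be tested.
        self._last = clock()

    def ready(self):
        """Returns True and resets the timer once the interval has elapsed."""
        now = self._clock()
        if now - self._last >= self._interval:
            self._last = now
            return True
        return False


# Suggested cadence: log every 0.5 seconds, checkpoint every 5 minutes.
log_trigger = IntervalTrigger(0.5)
checkpoint_trigger = IntervalTrigger(5 * 60)
```

A training loop would call log_trigger.ready() and checkpoint_trigger.ready() once per iteration and act only when they return True, which decouples both from how long an iteration takes.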
For a given game and run (say Qbert/1/) the following files are included:
However, the function dopamine.common.checkpointer.get_latest_checkpoint_number looks for files matching sentinel_checkpoint_complete.* to determine the latest checkpoint to load.
def get_latest_checkpoint_number(base_directory):
  """Returns the version number of the latest completed checkpoint.

  Args:
    base_directory: str, directory in which to look for checkpoint files.

  Returns:
    int, the iteration number of the latest checkpoint, or -1 if none was found.
  """
  glob = os.path.join(base_directory, 'sentinel_checkpoint_complete.*')
  def extract_iteration(x):
    return int(x[x.rfind('.') + 1:])
  try:
    checkpoint_files = tf.gfile.Glob(glob)
  except tf.errors.NotFoundError:
    return -1
  try:
    latest_iteration = max(extract_iteration(x) for x in checkpoint_files)
    return latest_iteration
  except ValueError:
    return -1
The list checkpoint_files is empty.
As a result, the check at dopamine/atari/run_experiment.py:204 fails, and unbundle, which would read in the provided checkpoint files, never gets called on the agent.
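The behaviour is easy to reproduce without TensorFlow. The sketch below reimplements the same lookup with the standard library (glob.glob in place of tf.gfile.Glob) and runs it on a temporary directory. The file name tf_ckpt-199.index is taken from the download URL pattern mentioned earlier in this list of issues; the function name is my own.

```python
import glob
import os
import tempfile


def latest_checkpoint_number(base_directory):
    """Standard-library version of the sentinel lookup shown above."""
    pattern = os.path.join(base_directory, 'sentinel_checkpoint_complete.*')
    iterations = [int(path[path.rfind('.') + 1:]) for path in glob.glob(pattern)]
    return max(iterations) if iterations else -1


directory = tempfile.mkdtemp()

# Only a TensorFlow checkpoint file is present, as in the downloaded runs.
open(os.path.join(directory, 'tf_ckpt-199.index'), 'w').close()
without_sentinel = latest_checkpoint_number(directory)

# Once a sentinel file exists, the lookup finds iteration 199.
open(os.path.join(directory, 'sentinel_checkpoint_complete.199'), 'w').close()
with_sentinel = latest_checkpoint_number(directory)
```

I have not verified whether creating an empty sentinel file is enough for Dopamine to load the downloaded checkpoints; the Runner may expect further files for that iteration.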