google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Home Page: https://github.com/google/dopamine

License: Apache License 2.0

Python 15.26% Jupyter Notebook 84.63% HTML 0.08% Dockerfile 0.03% Shell 0.01%

ai google ml rl tensorflow

dopamine's Issues

What is the CUDA version supported?

I tried to run dopamine on my GPU machine with Ubuntu 16.04.4 and CUDA 9.0, following the testing and training instructions in the provided README under virtualenv. Testing and training ran fine, but on the CPU only (high CPU utilization and zero GPU utilization throughout, even after one iteration finished). I am using "dopamine/agents/dqn/configs/dqn.gin", and the configuration uses GPU:0 as tf_device by default. Does anybody have any pointers for this kind of situation?

About the IQN loss

Hi everyone, I am reading the great IQN paper and following the implementation, but I find that the definition of the loss function is slightly different from the one described in IQN and the earlier QR-DQN:
https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/implicit_quantile_agent.py#L348

Why is the final quantile Huber loss divided by kappa?

quantile_huber_loss = (
tf.abs(replay_quantiles - tf.stop_gradient(
tf.to_float(bellman_errors < 0))) * huber_loss) / self.kappa

Since kappa equals 1.0, this makes no difference, but I'm confused: is it a typo?
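
To make the question concrete, here is a minimal scalar sketch of the quoted expression in plain Python rather than TensorFlow. The function names and the divide_by_kappa switch are hypothetical and exist only for this illustration; the sketch handles a single error value, not the batched tensors the agent uses.

```python
def huber_loss(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    abs_u = abs(u)
    if abs_u <= kappa:
        return 0.5 * u * u
    return kappa * (abs_u - 0.5 * kappa)


def quantile_huber_loss(tau, bellman_error, kappa=1.0, divide_by_kappa=True):
    """Asymmetric quantile weight times the Huber loss of one error value.

    `divide_by_kappa` is a hypothetical switch added for this comparison only.
    """
    indicator = 1.0 if bellman_error < 0 else 0.0
    loss = abs(tau - indicator) * huber_loss(bellman_error, kappa)
    return loss / kappa if divide_by_kappa else loss


if __name__ == "__main__":
    for kappa in (1.0, 2.0):
        with_division = quantile_huber_loss(0.25, -3.0, kappa, True)
        without_division = quantile_huber_loss(0.25, -3.0, kappa, False)
        print(kappa, with_division, without_division)
```

With kappa = 1.0 the two variants coincide, as the question notes. For other values, the division rescales the loss so that the slope of the linear region no longer grows with kappa.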

[Bug] Rainbow is not using observation_dtype to build replay buffer

observation_dtype is initialized at the beginning, but the _build_replay_buffer method passes only observation_shape and does not include observation_dtype in its arguments.

test error in circular_replay_buffer_test.py

when I run:
python tests/replay_memory/circular_replay_buffer_test.py
I get the following errors:

python tests/replay_memory/circular_replay_buffer_test.py
.......................2018-11-05 09:56:04.804020: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
..2018-11-05 09:56:04.895912: W tensorflow/core/framework/op_kernel.cc:1263] Unknown: exceptions.RuntimeError: Cannot sample a batch with fewer than stack size (4) + update_horizon (1) transitions.
Traceback (most recent call last):

File "/home/clock/dopamine-env/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
ret = func(*args)

File "/home/clock/dopamine-env/lib/python2.7/site-packages/dopamine/replay_memory/circular_replay_buffer.py", line 467, in sample_transition_batch
indices = self.sample_index_batch(batch_size)

File "/home/clock/dopamine-env/lib/python2.7/site-packages/dopamine/replay_memory/circular_replay_buffer.py", line 417, in sample_index_batch
format(self._stack_size, self._update_horizon))

RuntimeError: Cannot sample a batch with fewer than stack size (4) + update_horizon (1) transitions.

.......

Ran 32 tests in 6.195s

All other tests in the tests directory work fine. circular_replay_buffer_test.py is the only exception.

Will this framework work on Windows 10?

I just skimmed through the documentation and couldn't find any information about installation on Windows 10. Will this framework work in a Windows environment?

Scalability

In considering this framework for a practical project, it was not clear to me how to scale it up.

I think it would need a distributed prioritised replay/Ape-X at minimum.

I was considering using VecEnv from OpenAI, but even then we would need a thread-safe prioritised replay.

I looked for Ape-X implementations on GitHub but didn't see one I liked enough to try to integrate.

I bet this library would see a lot more use if it had a nice scale up story.

Which TensorFlow and CUDA versions are required?

Can it work with TensorFlow version <= 1.4.1 and CUDA version <= 8.0?

API doc generation

May I ask a quick question: how is this set of markdown API docs generated?

Thanks

Unable to download individual baseline checkpoint files

I've been trying to download the various checkpoint files as per the instructions at the bottom of: https://github.com/google/dopamine/tree/master/docs.

I've tried numerous variations of agents, games, runs and suffixes but every time I receive an error stating there is no object by that name.

I am typing the urls directly into Chrome.

Here's an example that I've tried:
https://storage.cloud.google.com/download-dopamine-rl/lucid/dqn/qbert/1/tf_ckpt-199.index

I assumed you used the same naming for the games as the names given in the dropdown on https://google.github.io/dopamine/baselines/plots.html.

What is the error 'Failed building wheel for atari-py'?

The question is as in the title. I have installed all of the required packages, but I can't fix this error. Also, when I execute 'python tests/atari_init_test.py', the error is 'No module named dopamine.atari'.
Thanks!

Reproducing the scores reported by the IQN paper

Thank you for open-sourcing such great code!

I have questions about your IQN implementation, especially on how it can reproduce the scores reported by the paper.

First, your config file https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/configs/implicit_quantile_icml.gin specifies N=N'=64. How did you choose these values?

Second, can the IQN implementation reproduce the scores reported by the paper? I ran it by myself against six games, but the results do not match the paper.

I used this command:

python3 -um dopamine.atari.train '--agent_name=implicit_quantile' '--base_dir=results' '--gin_files=dopamine/agents/implicit_quantile/configs/implicit_quantile_icml.gin' '--gin_bindings=AtariPreprocessing.terminal_on_life_loss=True' "--gin_bindings=Runner.game_name='Breakout'"

Here is the tensorboard plot I got:

In Figure 7 of the IQN paper, they report raw scores of 342,016 for Asterix, 42,776 for Beam Rider, 734 for Breakout, 25,750 for Q*Bert, 30,140 for Seaquest, and 28,888 for Space Invaders. Have you successfully reproduced scores at the same level? If yes, how? If no, are you aware of any differences in implementation or settings from DeepMind's?

Extendability For Policy Gradients?

Is this extendable for policy gradient or actor-critic architectures? Or would one have to do major re-workings? I'm trying to decide whether to use this framework for a project or implement from scratch. I will be using A2C. Any advice would be appreciated!

Modifying dopamine to accept continuous action spaces

Could you possibly comment on the difficulty of getting dopamine to work with continuous action spaces? Is this something that could be done with a bit of effort, or are the agents completely incompatible with this type of action space? I wanted to have a go at getting dopamine to work with the OpenAI Gym BipedalWalker environment. I managed to get all of the initialization and setup working, then realized that an assumption of discrete action spaces is hard-coded into dopamine.

Has anyone thought about or implemented a multithreaded, parallel, or distributed version of dopamine?

Has anyone thought about or implemented a multithreaded, parallel, or distributed version of dopamine? If not, I will start the process of doing so. If so, please let me know. Thanks!

feature request: allow configuration of neural network via gin

Just a feature request. I think this would be much more easily adaptable to non-atari environments if we could configure the neural network via the agent's gin file, or something similar. I think it would also be beneficial to use keras instead of slim, if possible, for those of us who want to easily build more complex networks.

i

ImportError: No module named atari_py

Hi there,

When I try to run "python tests/atari_init_test.py", the following error message appears:
ImportError: No module named atari_py

However, atari_py was installed using pip install atari-py, and when I try installing atari-py again, it reports: requirement already satisfied: atari_py in /usr/local/lib/python2.7/dist-packages

System environment information:
Ubuntu 16.04;
Python: 2.7.15 | Anaconda, Inc. | default, May 1 2018

Any way to help locate atari_py?

Cheers,
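
In the report above, pip mentions /usr/local/lib/python2.7/dist-packages while the interpreter banner mentions Anaconda, so one thing worth checking is whether the python that runs the test is the same one pip installed into. The following is a small diagnostic sketch under that assumption; it is written for Python 3 (importlib.util is not available on Python 2.7), and describe_module is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Returns where `name` would be imported from, or None if not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `dopamine`) cannot be imported.
        return None
    if spec is None:
        return None
    return spec.origin or "(namespace or built-in module)"


if __name__ == "__main__":
    # The interpreter actually running this script.
    print("interpreter:", sys.executable)
    for name in ("atari_py", "dopamine.atari"):
        print(name, "->", describe_module(name))
```

If the interpreter path and the directory reported by pip belong to different Python installations, installing with "python -m pip install atari-py" from the same interpreter is one way to keep them aligned.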

[Question] How to utilize baseline score

Hi, I checked the plots on the page you provide, and I find the scores are quite different from those reported in the papers. Take the game Asterix, for example. The Rainbow paper reports that Rainbow reaches 428,200.3 after 200 iterations, and the IQN paper reports that IQN reaches 342,016 after 200 iterations. But in the baselines, the scores are less than 20,000. So I wonder what the differences in implementation are between the papers and the baselines, and how I can use the baseline scores when I conduct new research.

BTW, I am running a Rainbow experiment with the rainbow_aaai.gin configuration, and the score is better than the baseline. I haven't finished the experiment yet. Can this configuration reproduce the scores reported in the papers?

DQN-CTS / DQN-PixelCNN

Having these implementations of baselines and their training curves is a fantastic help -- thanks! Is there any intention of releasing an implementation of DQN-CTS or DQN-PixelCNN at any point?

Two player games

Is it possible to apply this framework for two player zero sum game environments? Any tips for doing so would be appreciated.

How to use the compiled pickle files?

The compiled pickle files are available here
We make use of these compiled pickle files in both agents and the statistics colabs.

I downloaded these files, but I don't know how to use them. Can anyone help me?

ERROR: test_atari_init (__main__.AtariInitTest)

When I run 'python tests/atari_init_test.py', it shows below:

======================================================================
ERROR: test_atari_init (__main__.AtariInitTest)
Tests that a DQN agent is initialized.

Traceback (most recent call last):
File "tests/atari_init_test.py", line 47, in test_atari_init
train.main([])
File "/Users/kudou/Documents/codes/dopamine/dopamine/atari/train.py", line 128, in main
launch_experiment(create_runner, create_agent)
File "/Users/kudou/Documents/codes/dopamine/dopamine/atari/train.py", line 116, in launch_experiment
run_experiment.load_gin_configs(FLAGS.gin_files, FLAGS.gin_bindings)
File "/Users/kudou/anaconda2/envs/py36/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 488, in __getattr__
raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --gin_files before flags were parsed.

Ran 2 tests in 0.001s

FAILED (errors=1)

[Question] Timeline for generalizing Dopamine and policy for contributions towards this.

Thanks for this great project!

There have been quite a few issues (e.g. #3, #36) regarding customising Dopamine to work on new environments. It also seems like quite a few people have made or are making forks that allow for this.

So I was wondering:

Is there any current work/timeline for generalising Dopamine to new environments/network structures and so on?
What is the policy on accepting contributions towards achieving the above?

Documentation error

python tests/atari_init_test.py

Should be:

python tests/dopamine/atari_init_test.py

Increase number of workers

Is it possible to somehow increase the number of environments that the agents interact with in parallel? I have very low CPU usage and would like to increase the throughput, but I can't find any parameter controlling this.

How do I customize this for my application?

Hi guys,

I think this is a basic question, but I am still posting it here.

If I want to use this framework for my own problem, e.g. to solve 2048, where do I start?

Yes, I am looking for a beginner's guide or something similar.

MemoryError:

When I followed these steps to set up dopamine, everything seemed fine until I tested "dopamine/atari/train.py". The error is:
MemoryError: In call to configurable 'WrappedReplayBuffer' (<unbound method WrappedReplayBuffer.__init__>) In call to configurable 'DQNAgent' (<unbound method DQNAgent.__init__>) In call to configurable 'Runner' (<unbound method Runner.__init__>)

question: trouble loading 'content'

When loading content using experimental_data = colab_utils.load_baselines('/content')
I got this error: You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat.
I'm not sure what's wrong. I ran this on my local computer. I have 'c51, dqn, implicit_quantile, quantile, rainbow' in the 'content' folder. Is this a pandas-related problem?

[Bug] Dummy code in _record_observation

dopamine/dopamine/agents/dqn/dqn_agent.py

Lines 430 to 432 in bc66570

 observation = np.reshape(observation, self.observation_shape) 

 self._observation = observation[..., 0] 

 self._observation = np.reshape(observation, self.observation_shape)

Stale Documentation

Documentation in AtariPreprocessing.step is incorrect. The documentation says it returns is_episode_over: bool, reports whether the episode is actually over, but it does not.

Issue when creating example from colab.

I'm trying to run https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/agents.ipynb
on my local machine.

I got this error for both examples:

/home/lukas/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
[2018-10-14 13:18:26,861] Making new env: AsterixNoFrameskip-v0
Traceback (most recent call last):
File "/home/lukas/dopamine/dopamine/agents/luska_1/luska_1.py", line 51, in <module>
max_steps_per_episode=100)
File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1032, in wrapper
utils.augment_exception_message_and_reraise(e, err_str)
File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/utils.py", line 48, in augment_exception_message_and_reraise
six.raise_from(proxy.with_traceback(exception.__traceback__), None)
File "<string>", line 2, in raise_from
File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1009, in wrapper
return fn(*new_args, **new_kwargs)
File "/home/lukas/dopamine/dopamine/atari/run_experiment.py", line 157, in __init__
self._agent = create_agent_fn(self._sess, self._environment, summary_writer=self._summary_writer)
TypeError: create_random_dqn_agent() got an unexpected keyword argument 'summary_writer'
In call to configurable 'Runner' (<function Runner.__init__ at 0x7f9b5b24b6a8>)

Thanks!
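
The traceback shows the runner calling create_agent_fn with a summary_writer keyword, while the user-defined create_random_dqn_agent apparently accepts only two arguments. A minimal sketch of that mismatch follows, with no Dopamine imports. FakeRunner, create_agent_old, and create_agent_new are hypothetical stand-ins that mirror only the call shape visible in the traceback.

```python
class FakeRunner(object):
    """Hypothetical stand-in that mirrors only the call shown in the traceback."""

    def __init__(self, create_agent_fn, sess=None, environment=None):
        self._sess = sess
        self._environment = environment
        self._summary_writer = "summary-writer-placeholder"
        # Same call shape as in the traceback: two positionals plus one keyword.
        self._agent = create_agent_fn(
            self._sess, self._environment, summary_writer=self._summary_writer)


def create_agent_old(sess, environment):
    """Two-argument signature; raises TypeError when summary_writer is passed."""
    return ("agent", sess, environment)


def create_agent_new(sess, environment, summary_writer=None):
    """Accepts the extra keyword; this sketch simply ignores it."""
    del summary_writer  # Unused here.
    return ("agent", sess, environment)


if __name__ == "__main__":
    print(FakeRunner(create_agent_new)._agent)
    try:
        FakeRunner(create_agent_old)
    except TypeError as error:
        print("TypeError:", error)
```

Under this reading, adding a summary_writer=None parameter to the agent-creation function in the local copy of the colab code would let the call go through.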

mach-o, but built for simulator (not macOS)

mac os High Sierra 10.13.6
error message:
OSError: dlopen(/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so, 6): no suitable image found. Did find:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so: mach-o, but built for simulator (not macOS)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/atari_py/ale_interface/build/libale_c.so: mach-o, but built for simulator (not macOS)

Continuous Problems Support

It seems that this framework only supports discrete action spaces. Do you have any plans for supporting continuous action spaces?

Custom environments?

Will this eventually allow supporting custom gym environments? Or is it purpose built for atari baselines?

ImportError: No module named gin.tf

Won't work without a square observation shape

I'm trying to adapt this to a non-Atari environment that has simple, flat 1D observations. However, the code, while commented to suggest you can use a tuple for OBSERVATION_SHAPE, breaks down when you try to use anything other than a square observation space, due to the following lines:

state_shape = [1, OBSERVATION_SHAPE, OBSERVATION_SHAPE, STACK_SIZE]
self.state = np.zeros(state_shape)

You get a "TypeError: 'tuple' object cannot be interpreted as an integer" whenever OBSERVATION_SHAPE is anything other than an integer. However, the docs for the replay buffer indicate we should be able to use a tuple for observation_shape instead:

"observation_shape: tuple or int. If int, the observation is assumed to be a 2D square."

I should be able to work through this fairly quickly, but you may want to address this so users can more easily use this in experiments that don't involve playing atari games.
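
One possible way to reconcile the two behaviours described above is to normalize the shape before building state_shape. This is a sketch under that assumption, not the change that was made in the repository. normalize_observation_shape and build_state_shape are hypothetical helper names, and the resulting tuple could be passed to np.zeros in place of the list in the quoted lines.

```python
def normalize_observation_shape(observation_shape):
    """Returns a tuple; an int is treated as a 2D square, as in the quoted docs."""
    if isinstance(observation_shape, int):
        return (observation_shape, observation_shape)
    return tuple(observation_shape)


def build_state_shape(observation_shape, stack_size):
    """Batch dimension, then observation dimensions, then the frame stack."""
    return (1,) + normalize_observation_shape(observation_shape) + (stack_size,)


if __name__ == "__main__":
    print(build_state_shape(84, 4))      # square image observation
    print(build_state_shape((8,), 4))    # flat 1D observation
    print(build_state_shape((3, 5), 4))  # non-square 2D observation
```

Note that a network built from convolutional layers would still expect image-like input, so a flat observation would also need a different network definition.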

[question] Use of Dopamine with Tabular/Structured Data

Hello there,

First and foremost, thank you for open-sourcing these agents!
I would like to try using them in a context where the inputs are structured/tabular data.
Which do you think would be the best (and/or quickest) approach:

fork the project to allow custom network templates and inputs (as convolutional layers may not be needed in the first layers)
put all our structured data in matrix form so the conv layers can process it (that doesn't seem right to me, as applying the same kernel across heterogeneous inputs seems weird)

And lastly, is there already something in the project that allows us to process tabular/structured data, or do you plan to add it in future releases?

Thank you in advance for your answers.

Nicolas.

[question] Pausing and Resuming Training Session with Rainbow Agent

Can you pause and resume training sessions using the rainbow agent?

Can I use my own game environment?

Hi,

I want to use Dopamine with my own game environment.
I don't use Asterix, Pong, or other Atari environments.

Does Dopamine allow that?

Thanks

[Question] Value of tf.train.RMSPropOptimizer.momentum in dqn_nature.gin

First, thanks for sharing dopamine with all of us. It definitely improves productivity!

In the Mnih et al. (2015) Nature paper on DQN, Extended Data Table 1 lists the gradient momentum as 0.95. However, in dqn_nature.gin the value seems to be set to 0.0:

tf.train.RMSPropOptimizer.learning_rate = 0.00025
tf.train.RMSPropOptimizer.decay = 0.95
tf.train.RMSPropOptimizer.momentum = 0.0
tf.train.RMSPropOptimizer.epsilon = 0.00001
tf.train.RMSPropOptimizer.centered = True

I believe this particular configuration file is supposed to offer the same hyperparameter settings as the Nature paper. Or perhaps I'm not interpreting the parameters correctly. Anyway, should it be 0.0 or 0.95?

Thanks much!

--Ted
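
To show which role each of the quoted parameters plays, here is a scalar sketch of the centered RMSProp update as described in the TensorFlow documentation for tf.train.RMSPropOptimizer. The function name rmsprop_step and the dictionary-based state are hypothetical, and the zero-initialized state in the usage example is an assumption of this sketch. One possible reading, not confirmed in this thread, is that with centered = True the decay parameter governs the moving averages of both the gradient and the squared gradient, while momentum is a separate velocity term.

```python
import math


def rmsprop_step(param, grad, state, learning_rate=0.00025, decay=0.95,
                 momentum=0.0, epsilon=0.00001, centered=True):
    """Applies one RMSProp update to a scalar and returns the new value.

    `state` holds the running averages: mean_square, mean_grad, and mom.
    """
    # Moving average of the squared gradient, controlled by `decay`.
    state["mean_square"] = (
        decay * state["mean_square"] + (1.0 - decay) * grad * grad)
    if centered:
        # Moving average of the gradient itself, also controlled by `decay`.
        state["mean_grad"] = decay * state["mean_grad"] + (1.0 - decay) * grad
        denominator = state["mean_square"] - state["mean_grad"] ** 2 + epsilon
    else:
        denominator = state["mean_square"] + epsilon
    # `momentum` only accumulates the already-normalized step.
    state["mom"] = (
        momentum * state["mom"] + learning_rate * grad / math.sqrt(denominator))
    return param - state["mom"]


if __name__ == "__main__":
    state = {"mean_square": 0.0, "mean_grad": 0.0, "mom": 0.0}
    value = 1.0
    for _ in range(3):
        value = rmsprop_step(value, grad=1.0, state=state)
    print(value, state)
```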

Can Dopamine support the distributed DDPG algorithm?

As above?

IMPALA Example

Would it be too hard to add a working example of the cool IMPALA Trainer?

Also, how do you hook IMPALA to custom Envs and Actors?

Thanks in advance!

Installation problem: No module named dopamine.atari

Hi,

I am having trouble installing and testing Dopamine. I am using Ubuntu 18 and followed the exact instructions on the main page for "install via source" without any errors. However, when I execute the commands in the "Running tests" section, I get errors.

Command:
python tests/atari_init_test.py
Output:
ImportError: No module named dopamine.atari

Command:
python -um dopamine.atari.train \
--agent_name=dqn \
--base_dir=/tmp/dopamine \
--gin_files='dopamine/agents/dqn/configs/dqn.gin'
Output:
Illegal instruction (core dumped)

I appreciate any suggestion on how I can solve this issue.

Why use np.array to create add_count?

I notice here that the replay buffer initializes add_count as a numpy array, but it is used as an int.
And the test script here and other agent test scripts like this initialize add_count as an int.
So why use np.array in the first place?

Maybe state_shape and eval_mode should be pulled up into DQNAgent

I want to use DQNAgent to train my network with state shape [1, 14, 14, 5]. state_shape is not a parameter of DQNAgent, so I need to modify the code in dqn_agent.py before I can use it. Making state_shape a parameter of DQNAgent might be better.

After I train my network, I cannot change the eval mode of DQNAgent, so making eval_mode a parameter of DQNAgent might be better too.

Python 3

I don't get why you keep using Python 2?

https://pythonclock.org/

Logging and checkpointing are better done on a time-interval basis instead of a per-iteration basis

Checkpointing exists to avoid repeating time-consuming training, so it should happen at regular time intervals. Logs can blow up in size if each iteration takes little time.
I recommend logging every 0.5 seconds and checkpointing every 5 minutes.
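
The proposal above could be sketched as a small trigger object that a training loop polls once per iteration. IntervalTrigger is a hypothetical name, nothing here is part of Dopamine, and the intervals are the ones suggested in this issue.

```python
import time


class IntervalTrigger(object):
    """Reports True at most once per `interval_seconds` of elapsed time."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock  # Injectable so the trigger can be tested.
        self._last = clock()

    def ready(self):
        """Returns True and resets the timer if the interval has elapsed."""
        now = self._clock()
        if now - self._last >= self._interval:
            self._last = now
            return True
        return False


if __name__ == "__main__":
    log_trigger = IntervalTrigger(0.5)
    checkpoint_trigger = IntervalTrigger(5 * 60)
    for iteration in range(3):
        time.sleep(0.3)  # Stand-in for one training iteration.
        if log_trigger.ready():
            print("log at iteration", iteration)
        if checkpoint_trigger.ready():
            print("checkpoint at iteration", iteration)
```

A trade-off worth noting is that time-based checkpoints no longer fall on fixed iteration numbers, which can make runs harder to compare.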

Provided checkpoint files not sufficient to restore agent?

For a given game and run (say Qbert/1/) the following files are included:

tf_ckpt-199.data-00000-of-00001
tf_ckpt-199.index
tf_ckpt-199.meta

However, the function dopamine.common.checkpointer.get_latest_checkpoint_number looks for files with sentinel_checkpoint_complete.* to determine the largest checkpoint file to load.

def get_latest_checkpoint_number(base_directory):
  """Returns the version number of the latest completed checkpoint.

  Args:
    base_directory: str, directory in which to look for checkpoint files.

  Returns:
    int, the iteration number of the latest checkpoint, or -1 if none was found.
  """
  glob = os.path.join(base_directory, 'sentinel_checkpoint_complete.*')
  def extract_iteration(x):
    return int(x[x.rfind('.') + 1:])
  try:
    checkpoint_files = tf.gfile.Glob(glob)
  except tf.errors.NotFoundError:
    return -1
  try:
    latest_iteration = max(extract_iteration(x) for x in checkpoint_files)
    return latest_iteration
  except ValueError:
    return -1

The list checkpoint_files is empty.

As a result, the check at dopamine/atari/run_experiment.py:204 fails, and unbundle, which would read in the provided checkpoint files, never gets called on the agent.
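
The lookup behaviour described above can be reproduced without TensorFlow. The sketch below uses the standard-library glob module in place of tf.gfile.Glob, and latest_checkpoint_number and demo are hypothetical names. It only illustrates why the lookup returns -1 for the provided files; whether adding a sentinel file alone is enough to restore an agent is not confirmed here.

```python
import glob
import os
import tempfile


def latest_checkpoint_number(base_directory):
    """Stdlib-only analogue of the quoted lookup; returns -1 if none is found."""
    pattern = os.path.join(base_directory, 'sentinel_checkpoint_complete.*')
    iterations = [int(path[path.rfind('.') + 1:]) for path in glob.glob(pattern)]
    return max(iterations) if iterations else -1


def touch(directory, filename):
    """Creates an empty file; the contents do not matter for the lookup."""
    with open(os.path.join(directory, filename), 'w'):
        pass


def demo():
    """Returns the lookup result before and after a sentinel file exists."""
    with tempfile.TemporaryDirectory() as directory:
        for name in ('tf_ckpt-199.data-00000-of-00001',
                     'tf_ckpt-199.index',
                     'tf_ckpt-199.meta'):
            touch(directory, name)
        before = latest_checkpoint_number(directory)
        touch(directory, 'sentinel_checkpoint_complete.199')
        after = latest_checkpoint_number(directory)
    return before, after


if __name__ == '__main__':
    print(demo())
```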
