openai / baselines Goto Github PK

View Code? Open in Web Editor NEW

15.4K 15.4K 4.8K 6.46 MB

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

License: MIT License

Python 51.51% Dockerfile 0.04% HTML 48.45%

baselines's Introduction

Status: Maintenance (expect bug fixes and minor updates)

Baselines

OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.

These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.

Prerequisites

Baselines requires python3 (>=3.5) with the development headers. You'll also need system packages CMake, OpenMPI and zlib. Those can be installed as follows

Ubuntu

sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev

Mac OS X

Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew install cmake openmpi

Virtual environment

From the general python package sanity perspective, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via

pip install virtualenv

Virtualenvs are essentially folders that have copies of python executable and all python packages. To create a virtualenv called venv with python3, one runs

virtualenv /path/to/venv --python=python3

To activate a virtualenv:

. /path/to/venv/bin/activate

More thorough tutorial on virtualenvs and options can be found here

Tensorflow versions

The master branch supports Tensorflow from version 1.4 to 1.14. For Tensorflow 2.0 support, please use tf2 branch.

Installation

Clone the repo and cd into it:

git clone https://github.com/openai/baselines.git
cd baselines

If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases, you may use
```
pip install tensorflow-gpu==1.14 # if you have a CUDA-compatible gpu and proper drivers
```
or
```
pip install tensorflow==1.14
```
to install Tensorflow 1.14, which is the latest version of Tensorflow supported by the master branch. Refer to TensorFlow installation guide for more details.
Install baselines package
```
pip install -e .
```

MuJoCo

Some of the baselines examples use MuJoCo (multi-joint dynamics in contact) physics simulator, which is proprietary and requires binaries and a license (temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here

Testing the installation

All unit tests in baselines can be run using pytest runner:

pip install pytest
pytest

Training models

Most of the algorithms in baselines repo are used as follows:

python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]

Example 1. PPO with MuJoCo Humanoid

For instance, to train a fully-connected network controlling MuJoCo humanoid using PPO2 for 20M timesteps

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7

Note that for mujoco environments fully-connected network is default, so we can omit --network=mlp The hyperparameters for both network and the learning algorithm can be controlled via the command line, for instance:

python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy

will set entropy coefficient to 0.1, and construct fully connected network with 3 layers with 32 hidden units in each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same)

See docstrings in common/models.py for description of network parameters for each type of model, and docstring for baselines/ppo2/ppo2.py/learn() for the description of the ppo2 hyperparameters.

Example 2. DQN on Atari

DQN with Atari is at this point a classics of benchmarks. To run the baselines implementation of DQN on Atari Pong:

python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6

Saving, loading and visualizing models

Saving and loading the model

The algorithms serialization API is not properly unified yet; however, there is a simple method to save / restore trained models. --save_path and --load_path command-line option loads the tensorflow state from a given path before training, and saves it after the training, respectively. Let's imagine you'd like to train ppo2 on Atari Pong, save the model and then later visualize what has it learnt.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2

This should get to the mean reward per episode about 20. To load and visualize the model, we'll do the following - load the model, train it for 0 steps, and then visualize:

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play

NOTE: Mujoco environments require normalization to work properly, so we wrap them with VecNormalize wrapper. Currently, to ensure the models are saved with normalization (so that trained models can be restored and run without further training) the normalization coefficients are saved as tensorflow variables. This can decrease the performance somewhat, so if you require high-throughput steps with Mujoco and do not need saving/restoring the models, it may make sense to use numpy normalization instead. To do that, set 'use_tf=False` in baselines/run.py.

Logging and vizualizing learning curves and other training metrics

By default, all summary data, including progress, standard output, is saved to a unique directory in a temp folder, specified by a call to Python's tempfile.gettempdir(). The directory can be changed with the --log_path command-line option.

python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2 --log_path=~/logs/Pong/

NOTE: Please be aware that the logger will overwrite files of the same name in an existing directory, thus it's recommended that folder names be given a unique timestamp to prevent overwritten logs.

Another way the temp directory can be changed is through the use of the $OPENAI_LOGDIR environment variable.

For examples on how to load and display the training data, see here.

Subpackages

A2C
ACER
ACKTR
DDPG
DQN
GAIL
HER
PPO1 (obsolete version, left here temporarily)
PPO2
TRPO

Benchmarks

Results of benchmarks on Mujoco (1M timesteps) and Atari (10M timesteps) are available here for Mujoco and here for Atari respectively. Note that these results may be not on the latest version of the code, particular commit hash with which results were obtained is specified on the benchmarks page.

To cite this repository in publications:

@misc{baselines,
  author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter},
  title = {OpenAI Baselines},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/openai/baselines}},
}

baselines's People

Contributors

Stargazers

Watchers

Forkers

yenchenlin nagyist kracwarlock omoindrot albertbuchard phvu stites ml-lab jdc08161063 tigerneil oppa3109 little1tow adolfoeliazat xaveng shmuma ajay-wong bityangke chingyaoc nkcr7 filwaline karansaxena rockt collawolley 0xsuu permanz pushpen skogs riashat soroushmehr nerdoid yigitunallar ameyc adit-chandra vangao yifenzhong1920 wangboyunze x-hexy xhuvom matthewwilfred alokranjan1234 mediaeater prakritidev kastureranjit blackcat30stm chagge pcmoritz ligua arnocandel jiecui lulzzz amano-ginji hhy5277 shuidong jayjinseokkim mylearning2017 19ai deepx-top empia artmario youngdev tjacobs jmassapina taurus3g smasoudn charlontank tdavchev zgsxwsdxg oztc dwqy11 tiagosgc codeaudit zzz622848 snowfeet alexxnica kryndex labbros syzer ngc92 orirmi wmlabs niumeng07 linkpassion walkerrsmith renfeier larenzhang aaronzhangl alakia zach-nervana nottombrown zhudejun1985 vbmgk gwding 0ad4ai puneethapai linzichuan kongmo b0xtch cpehle hedgefair adrianp-

baselines's Issues

PPO OOM

When I ran with CPU it worked fine, but after install tensorflow-gpu, I got the error below. Perhaps need to share sessions across MPI processes? When I set num_cpu to 1, it worked fine.

2017-07-25 21:11:16.630413: E tensorflow/core/common_runtime/direct_session.cc:138] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimary
CtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 11711807488
Traceback (most recent call last):
  File "run_atari.py", line 54, in <module>
    main()
  File "run_atari.py", line 51, in main
    train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
  File "run_atari.py", line 23, in train
    sess = U.single_threaded_session()
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 233, in single_threaded_session
    return make_session(1)
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 228, in make_session
    return tf.Session(config=tf_config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session
.py", line 1292, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 562, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/framework/erro
rs_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))

Blas GEMM launch failed

After upgrade to the TensorFlow 1.1 an example python -m baselines.deepq.experiments.train_cartpole stopped working for me. How it can be fixed?

2017-06-01 17:37:06.830729: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,224] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
WARNING:tensorflow:VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
[2017-06-01 17:37:07,262] VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02.
2017-06-01 17:37:08.309557: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-06-01 17:37:08.309714: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1550] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
    return fn(*args)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
    status, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
    main()
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
    callback=callback
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 216, in learn
    action = act(np.array(obs)[None], update_eps=exploration.value(t))[0]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 402, in <lambda>
    return lambda *args, **kwargs: f(*args, **kwargs)[0]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\common\tf_util.py", line 445, in __call__
    results = get_session().run(self.outputs_update, feed_dict=feed_dict)[:-1]
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'deepq/q_func/fully_connected/MatMul', defined at:
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\Viktor\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 31, in <module>
    main()
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\experiments\train_cartpole.py", line 24, in main
    callback=callback
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\simple.py", line 178, in learn
    grad_norm_clipping=10
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 178, in build_train
    act_f = build_act(make_obs_ph, q_func, num_actions, scope=scope, reuse=reuse)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\build_graph.py", line 111, in build_act
    q_values = q_func(observations_ph.get(), num_actions, scope="q_func")
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 27, in <lambda>
    return lambda *args, **kwargs: _mlp(hiddens, *args, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\baselines\deepq\models.py", line 9, in _mlp
    out = layers.fully_connected(out, num_outputs=hidden, activation_fn=tf.nn.relu)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1433, in fully_connected
    outputs = layer.apply(inputs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 320, in apply
    return self.__call__(inputs, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 290, in __call__
    outputs = self.call(inputs, **kwargs)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\layers\core.py", line 144, in call
    outputs = standard_ops.matmul(inputs, self.kernel)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1801, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1263, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "C:\Users\Viktor\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 4), b.shape=(4, 64), m=1, n=64, k=4
         [[Node: deepq/q_func/fully_connected/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_deepq/observation_0/_11, deepq/q_func/fully_connected/weights/read)]]
         [[Node: deepq/cond/Merge/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_42_deepq/cond/Merge", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

How to set the randomness of the act during enjoy?

In the file enjoy_cartpole.py, the action is provided by
obs, rew, done, _ = env.step(act(obs[None])[0])

When the outputs of act(obs[None])[0] were printed out, it seemed that it did not print out the same action and changed somewhat randomly even though the input sequence was the same.

How can it be set to work as the simple greedy action?
How can the rate of randomness can be controlled?

cheers,

MaxAndSkipEnv is calculating the max over the last two time steps only

The MaxAndSkipEnv as it is right now is skipping over the skip observations and calculating the total reward properly over all the time steps, but the max over the observations is only calculated over the last 2 observations regardless of skip size. Is this intentional?

One could move the max_frame line into the loop and then it could use the deque of size 2 to keep track of the max over all the skip time step.

Benchmarking for PPO and TRPO

Thanks to the OpenAI team for the latest release!

Are there any benchmark results (like Atari score) on PPO and TRPO? DQN has a report here: https://github.com/openai/baselines-results. It's super useful. Thanks again!

Atari wrappers deprecated?

Quick question:

Why are the atari wrappers deprecated? Do you plan to add the non-deprecated version of wrappers soon?

Load act and continue learning

I am trying to save the Q network and reload it and continue improving it.

This is how I save act every few episodes:

ActWrapper(act, act_params).save("myfile.pkl")

However, when I load it, I get an error saying that some variables are exist. This is how I load a saved act:

act, train, updated_target, debug = deepq.builld_train(....)
act = ActWrapper.load("myfile.pkl")

Any idea would be appreciated.

Save and load of the trained TRPO and PPO agents

Hi,

How save, load and visualization of the trained agents with TRPO or PPO algorithms can be done?

Integration with Google Cloud ML Engine

We are using the deepq.mlp class to implement reinforcement learning and would like to host it on Google Cloud ML engine which requires the model to be exported into SavedModel format. My understanding of it is at a beginner level but I believe that requires us to pass the tf Session and input and output tensors to SavedModel builder.

I am not sure exactly how to get those from the deepq.mlp class or if there is maybe a much better way to do all this. Any help would be apreciated!

Baseline network does not learn Pong

If I run train_pong.py, I get a final score of -20.1, which is close to a random control and far away from the results of the original publication (21.0). Do I have to tweak parameters?

Running into issues on example execution

Get this error when I run the first example python3 -m baselines.deepq.experiments.train_cartpole:

/usr/bin/python3: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')

I have both Python 2 and 3 installed. Thus I installed baselines with pip3.
Any suggestions?

Crash on import

I'm trying to run some example code in pybullet, see bulletphysics/bullet3#1234 (comment), that is using baselines but I'm getting an error on import, and they mentioned this is most likely an upstream issue.

I'm on python 2.7.13 on OS X. Perhaps this is a problem with baselines?

athundt at Andrews-2013-MacBook-Pro-2 in ~/src/bullet3/examples/pybullet/gym on master!
± python train_pybullet_racecar.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
  File "train_pybullet_racecar.py", line 4, in <module>
    from baselines import deepq
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
    from baselines.deepq.build_graph import build_act, build_train  # noqa
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
    import baselines.common.tf_util as U
  File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
    import builtins
ImportError: No module named builtins

train_kuka_grasping.py

± python train_kuka_grasping.py
pybullet build time: Jul 17 2017 18:59:54
Couldn't import dot_parser, loading of dot files will not be possible.
Traceback (most recent call last):
  File "train_kuka_grasping.py", line 4, in <module>
    from baselines import deepq
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/__init__.py", line 2, in <module>
    from baselines.deepq.build_graph import build_act, build_train  # noqa
  File "/usr/local/lib/python2.7/site-packages/baselines/deepq/build_graph.py", line 71, in <module>
    import baselines.common.tf_util as U
  File "/usr/local/lib/python2.7/site-packages/baselines/common/tf_util.py", line 3, in <module>
    import builtins
ImportError: No module named builtins

tf version:

± python -c 'import tensorflow as tf; print(tf.__version__)'
1.2.0

I installed by running pip install baselines, and a full list of installed packages is at bulletphysics/bullet3#1234 (comment)

Poor PPO 1 CPU performance on the Pong task

Hi,

I've started the default Pong training with run_atari.py on my laptop. The only change to the start parameters was a num_cpu =1. After more than a 2 days training the reward was still around -20.4. It started from -20.6 after a day of training temporary improved to the -20.2 and then dropped again to -20.4 without any changes for a quite long time. On the same laptop it took near the same time to train baselines vanilla DQN agent to the maximal reward 20+.

Is it an expected result for a single CPU PPO training?

no BreakoutNoFrameskip-v3 env

I test python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

Came across the following error:

[2017-05-25 09:54:25,435] Making new env: BreakoutNoFrameskip-v3

.....
DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v0', 'BreakoutNoFrameskip-v4'])

It seems error of version of our dependencies. Easy to make it runnable: gym<=0.8.2 and atari-py<=0.0.21

Finally I degrade gym and atari-py to successfully run the enjoy for v3.

Really enjoy~

Multi thread to run on Mujoco?

Hi, I found that the current implementation of pposgd/run_mujoco.py using only single thread. Is it possible to modify it into multi-thread like this? Not sure if it will arouse bugs 😢

LSTM Model

Are there any thoughts how an LSTM model could be used with Baselines? I have some time series data and would love to use an RNN of sorts. I might be able to work on this project but would appreciate a point in the right direction to properly integrate the "LSTM STATE" and data series.

It seems like there are 2 complexities:

LSTM STATE needs to be captured in replace for random playback (might not need playback with LSTM)
Time horizon data for BP over time needs to be handled as well

cannot import name 'deepq'

On a fresh Debian GNU/Linux 3.16.0-4-amd64 install I tried:

wget https://bootstrap.pypa.io/get-pip.py
sudo python3.4 get-pip.py
sudo pip install baselines
python3.4 -m baselines.deepq.experiments.train_cartpole

Error: "/usr/bin/python3.4: Error while finding spec for 'baselines.deepq.experiments.train_cartpole' (<class 'ImportError'>: cannot import name 'deepq')"

I couldn't find any mention of deepq in the feedback from the installation.
baselines_install_log.txt

Please expose size, number and activation function parameters for DDPG actor and critic

Great work with adding DDPG with parameter space noise, thank!

Can you also expose additional command line parameters in main.py like sizes of the layers, their numbers and activation functions for DDPG actor and critic. Currently they can only be set in models.py

Is a plan to replicate other RL algorithms like A3C, DPG, etc ?

Requirements not clearly stated

The document doesn't state upfront that the code requires Python 3 to run; I only realized this when I got an error about no module named builtins.

In addition, the requires.txt file doesn't state gym as a requirement. I realise most people who install this will probably have that module anyway, but in cases where they don't, the earliest they'll realize something is wrong is when they try to execute the python -m baselines.deepq.experiments.train_cartpole example and fail.

recording activations

May I kindly ask for some help / hint regarding the following question / problem:
https://stackoverflow.com/questions/44813861/record-activations-of-openai-baselines-implementation

Fail to load pretrained checkpoints

Hi,
I try to load a part of variables from download models. But it turns out Key Not Found Error, even though the variables names are same. The only difference is the ':0' in tails, but I think it does not matter since this is added automatically by tensorflow op.

Here is what I read from checkpoints and the Key not found error log.

Pop-Art implementation ?

Looks like we've done a clip wrapper for reward, which might be not very good:

class ClippedRewardsWrapper(gym.RewardWrapper):
    def _reward(self, reward):
        """Change all the positive rewards to 1, negative to -1 and keep zero."""
        return np.sign(reward)

I found this article has done a DDQN without clip operation:
https://arxiv.org/pdf/1602.07714.pdf
Do we have any plan to implement DDQN based on this article?

Unable to download all pretrained models

Are all models listed in "python -m baselines.deepq.experiments.atari.download_model" available for download? I am unable to download anything without dueling nets.

Baseline code for policy gradient methods?

This repo is awesome! It saves me a lot of time implementing DQN myself. It's a real lifesaver. Many thanks to OpenAI! 👍

When do you plan to release Baseline code for policy gradient methods, like TRPO, A3C, and ACER? It's been almost 2 months since the DQN release. I look forward to the next announcement!

Program Fails in def log.

Hi, I am using baselines by installing with pip, and run python -m baselines.deepq.experiments.train_cartpole, but I encountered with:

Traceback (most recent call last):
  File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/Users/swacg/anaconda2/lib/python2.7/runpy.py", line 102, in _get_module_details
    loader = get_loader(mod_name)
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/Users/swacg/anaconda2/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "baselines/deepq/__init__.py", line 4, in <module>
    from baselines.deepq.simple import learn, load  # noqa
  File "baselines/deepq/simple.py", line 10, in <module>
    from baselines import logger
  File "baselines/logger.py", line 139
    def log(*args, level=INFO):
                       ^
SyntaxError: invalid syntax

How can I solve that?

Confusion between `done` and `info` in `env.step`, and the correct way we need to detect for when episodes complete.

I ran into an interesting problem today and, while I understand the solution, I'd like to explain it here and inquire about how OpenAI gym and OpenAI baselines are going to handle this going forward. I'm running gym version 0.9.2 and AtariPy 0.0.20, which is outdated but that's the version where the models were pre-trained for baselines here.

According to the docs, the env.step condition returns a done parameter which tells us that:

done (boolean): whether it's time to reset the environment again. Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)

Note the emphasis on "you lost your last life". This is true, for instance when I run Breakout:

import gym 
import numpy as np

env = gym.make('Breakout-v0')
obs = env.reset()
done = False
steps = 0 

while not done:
    obs, rew, done, info = env.step(np.random.randint(env.action_space.n))
    steps += 1
    if done:
        print("done == True")
        print("info: {}".format(info))
print("steps: {}".format(steps))

The outcome is:

[2017-06-29 10:26:00,612] Making new env: Breakout-v0
done == True
info: {'ale.lives': 0}
steps: 271

However, the baselines code wraps several monitors around the environment, which results in different semantics of the method. To test, I downloaded the pre-trained Breakout-1 model for Prioritized, Dueling DQN. Then I ran the following command:

python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1/ --env Breakout --dueling

This runs the enjoy script. The only things I changed from the current master branch (version 0778e9f) are some print statements and removing the render since I was running ssh. You can see the git diff here:

git diff
diff --git a/baselines/deepq/experiments/atari/enjoy.py b/baselines/deepq/experiments/atari/enjoy.py
index fe482ca..ec5e78e 100644
--- a/baselines/deepq/experiments/atari/enjoy.py
+++ b/baselines/deepq/experiments/atari/enjoy.py
@@ -42,11 +42,13 @@ def play(env, act, stochastic, video_path):
         env, video_path, enabled=video_path is not None)
     obs = env.reset()
     while True:
-        env.unwrapped.render()
+        #env.unwrapped.render()
         video_recorder.capture_frame()
         action = act(np.array(obs)[None], stochastic=stochastic)[0]
         obs, rew, done, info = env.step(action)
         if done:
+            print("done == True")
+            print("info: {}".format(info))
             obs = env.reset()
         if len(info["rewards"]) > num_episodes:
             if len(info["rewards"]) == 1 and video_recorder.enabled:
@@ -56,6 +58,7 @@ def play(env, act, stochastic, video_path):
                 video_recorder.enabled = False
             print(info["rewards"][-1])
             num_episodes = len(info["rewards"])
+            print("we must have finished an episode here now\n")

I ran this, but then I saw this output:

[2017-06-29 10:22:35,014] Making new env: BreakoutNoFrameskip-v4
done == True
info: {'rewards': [], 'steps': 6845, 'ale.lives': 4}
done == True
info: {'rewards': [], 'steps': 9703, 'ale.lives': 3}
done == True
info: {'rewards': [], 'steps': 10228, 'ale.lives': 2}
done == True
info: {'rewards': [], 'steps': 16350, 'ale.lives': 1}
done == True
info: {'rewards': [], 'steps': 22194, 'ale.lives': 0}
846.0
we must have finished an episode here now

done == True
info: {'rewards': [846.0], 'steps': 24488, 'ale.lives': 4}
done == True
info: {'rewards': [846.0], 'steps': 33442, 'ale.lives': 3}
done == True
info: {'rewards': [846.0], 'steps': 35160, 'ale.lives': 2}
done == True
info: {'rewards': [846.0], 'steps': 37665, 'ale.lives': 1}
done == True
info: {'rewards': [846.0], 'steps': 38732, 'ale.lives': 0}
438.0
we must have finished an episode here now

I terminated the run after this, but what happens now is that the done semantics have changed and break from the docs. Instead, to detect when an episode finishes, I have to detect when the "rewards" list has increased in size, or when ale.lives is zero. This doesn't seem as elegant as the previous way of just detecting a single done==True condition.

In conclusion:

Detect when an episode finishes with the "rewards" list from info, NOT the done condition, despite what the documentation says.
More generally, if the documentation is going to break from default gym, then I think it should clarified somewhere.
In addition, is there any other set of formal documentation other than the website I linked to earlier, which hasn't changed (as far as I can tell) in about a year and a half.

Fails to import a module, from itself

Traceback (most recent call last):
File "pole_train.py", line 3, in
from baselines import deepq
File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/init.py", line 4, in
from baselines.deepq.simple import learn, load # noqa
File "/usr/local/lib/python3.4/dist-packages/baselines/deepq/simple.py", line 12, in
from baselines import deepq
ImportError: cannot import name 'deepq'

Could you explain how to execute PPO and TRPO?

There is readme explaining all the process to execute deepqn algorithm.

However, there is no such thing for PPO and TRPO....

Could you please explain how to execute PPO and TRPO?

ImportError: cannot import name 'weakref'

any comment?

➜  baselines git:(master) python baselines/pposgd/run_atari.py
Traceback (most recent call last):
  File "baselines/pposgd/run_atari.py", line 54, in <module>
    main()
  File "baselines/pposgd/run_atari.py", line 51, in main
    train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
  File "baselines/pposgd/run_atari.py", line 18, in train
    from baselines.pposgd import pposgd_simple, cnn_policy
  File "/Users/Tiger/projects/baselines/baselines/pposgd/pposgd_simple.py", line 3, in <module>
    import baselines.common.tf_util as U
  File "/Users/Tiger/projects/baselines/baselines/common/tf_util.py", line 2, in <module>
    import tensorflow as tf  # pylint: ignore-module
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 63, in <module>
    from tensorflow.python.framework.framework_lib import *
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/framework_lib.py", line 100, in <module>
    from tensorflow.python.framework.subscribe import subscribe
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/subscribe.py", line 26, in <module>
    from tensorflow.python.ops import variables
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 26, in <module>
    from tensorflow.python.ops import control_flow_ops
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 70, in <module>
    from tensorflow.python.ops import tensor_array_ops
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 33, in <module>
    from tensorflow.python.util import tf_should_use
  File "/Users/Tiger/anaconda/lib/python3.5/site-packages/tensorflow/python/util/tf_should_use.py", line 28, in <module>
    from backports import weakref  # pylint: disable=g-bad-import-order
ImportError: cannot import name 'weakref'

what's the detail means of the available models name?

Greetings all! I have run "python -m baselines.deepq.experiments.atari.download_model". It listed some available models's name. But I'm in a puzzle about the detail means of model's name. For example, what the differences between "model-atari-alien-1","model-atari-alien-2",and "model-atari-alien-3", are they trained by dqn or double dqn? "model-atari-duel-alien-1" was trained with dueling double dqn or dueling dqn? what the detail about "model-atari-rb100000-test-seaquest-1", and the meaning of rb100000? What's more,how can I know the detail params were used to trained these models? Thanks!

Last command line fails

Running the visualise command line fails with version incompatibility:
raise error.DeprecatedEnv('Env {} not found (valid versions include {})'.format(id, matching_envs)) gym.error.DeprecatedEnv: Env BreakoutNoFrameskip-v3 not found (valid versions include ['BreakoutNoFrameskip-v4', 'BreakoutNoFrameskip-v0'])

Commandline used:
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

Support for continuous action spaces

Would love to see DQN for continuous action spaces implemented (https://arxiv.org/pdf/1509.02971.pdf)

Integration with Rllab

One thing that seems a bit redundant is the fact that there is openai/rllab and now openai/baselines implementing RL algorithms. It seems like it may be a worthwhile endeavor to merge the two in some way rather than have two parallel repositories that are supposed to have baseline RL implementations. Are there any plans to do so or any thoughts on this from the openai team?

Thanks.

sum-tree unit test

According to the doc, find_prefixsum_idx method should return the highest index i in the array such that sum(arr[0] ... arr[i-1]) <= prefixsum.

If this is true, shouldn't the test return 4 instead of 3?
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L44

and here, the test should also return 4 instead of 3
https://github.com/openai/baselines/blob/master/baselines/common/tests/test_segment_tree.py#L60

When i == 4, sum(arr[0]+arr[1]+arr[2]+arr[3] <= 4.0 holds.

Low GPU usage

When I tried to run train.py in the atari folder, I found the ETA reached 16 days after a few minutes and the usage of GPU was quite low.

Pretrained Breakout model error

I'm able to run most other pretrained models except Breakout. Pong and BeamRider have no problem. Breakout has tensorflow mismatch error when loading the model parameters. The error happens to all the Breakout models: vanilla, prior, duel, and prior-duel.

My command for vanilla breakout-1 model:
python -m baselines.deepq.experiments.atari.enjoy --model-dir ~/Temp/models/model-atari-breakout-1 --env Breakout

Error message:

Caused by op 'save/Assign_3', defined at:
  File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/miria/anaconda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/miria/baselines/baselines/deepq/experiments/atari/enjoy.py", line 69, in <module>
    U.load_state(os.path.join(args.model_dir, "saved"))
  File "/Users/miria/baselines/baselines/common/tf_util.py", line 272, in load_state
    saver = tf.train.Saver()
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 271, in assign
    validate_shape=validate_shape)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 45, in assign
    use_locking=use_locking, name=name)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/miria/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
	 [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3)]]

failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

I got the following error while training the cart pole example:

failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
screen.txt

There is no issue with my cuda driver as I have verified the installation by running some for the cuda sample codes

I guess the issue can be resolved by making the following changes in the code:
https://stackoverflow.com/questions/41117740/tensorflow-crashes-with-cublas-status-alloc-failed

A little bug in line 111 in replay_buffer.py??

mass = random.random() * self._it_sum.sum(0, len(self._storage) - 1)

seems should be:

mass = random.random() * self._it_sum.sum(0, len(self._storage) ) ??

Small possible divergence with original DQN paper

Hi,

I noticed that there might be a slight difference between this implementation of the network and the original one by DeepMind. Maybe this is a known fact, but I didn't see it mentionned anywhere, and as this implementation seems to try to be as close as possible to the original one, I thought it'd be worth it to point it out.

It boils down to the fact that this implementation uses the default padding from TensorFlow, which is 'VALID', whereas DeepMind didn't document any padding on their convolutional layers.
If we refer to the Torch implementation they released, we can conclude that they used the default padding of Torch (which is 0 in the SpatialConvolution module), except for the first layer, where they used padding=1

After the convolutions, the image sizes are quite different: 7x7 for (py)Torch and 11x11 for TensorFlow (40% difference). As such, the input size of the first linear layer diverge (3136 vs 7744)

I'm not sure that makes a huge difference (be it positive or negative) in the outcome, but experience has proved that devil's in the details when it comes to deep architectures.

What do you guys think ?

a

sorry, accidentally opened an issue

Gym and ALE

Hi, the environments here use Atari of Gym. Are they totally same as Atari of ALE?

Not able to run pre-trained model

Both python2 and python3 were not working:
yhu@yhu-Aspire-M3920:$ python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/usr/lib/python2.7/runpy.py", line 102, in _get_module_details
loader = get_loader(mod_name)
File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
return find_loader(fullname)
File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
for importer in iter_importers(fullname):
File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
import(pkg)
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/init.py", line 4, in
from baselines.deepq.simple import learn, load # noqa
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/deepq/simple.py", line 10, in
from baselines import logger
File "/home/yhu/.local/lib/python2.7/site-packages/baselines/logger.py", line 139
def log(*args, level=INFO):
^
SyntaxError: invalid syntax
yhu@yhu-Aspire-M3920:$ python3 -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yhu/.local/lib/python3.5/site-packages/baselines/deepq/experiments/atari/enjoy.py", line 15, in
from baselines.common.atari_wrappers_deprecated import wrap_dqn
File "/home/yhu/.local/lib/python3.5/site-packages/baselines/common/atari_wrappers_deprecated.py", line 1, in
import cv2
ImportError: No module named 'cv2'
yhu@yhu-Aspire-M3920:~$

Unable to download all pretrained models

When I try downloading any model with the dueling architecture, it downloads fine.
However, when I try downloading a model that does not use dueling, the download does not start and gets stuck as N/A.
The command I use is:
python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-breakout-1 --model-dir /tmp/models
I have tried it on a couple of computers, and I get the same issue every time.

warnings in cartpole example

The train_cartpole example generates the following warnings:
VARIABLES collection name is deprecated, please use GLOBAL_VARIABLES instead; VARIABLES will be removed after 2017-03-02 ~/.local/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice. out=out, **kwargs) ~/.local/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount)

Error when restoring model to run enjoy.py

Hi,

I was running these two commands:

python -m baselines.deepq.experiments.atari.download_model --blob model-atari-prior-duel-breakout-1 --model-dir /tmp/models
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-prior-duel-breakout-1 --env Breakout --dueling

in the bottom of README.

However, I got the following error:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [4] rhs shape= [6]
         [[Node: save/Assign_3 = Assign[T=DT_FLOAT, _class=["loc:@deepq/q_func/action_value/fully_connected_1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](deepq/q_func/action_value/fully_connected_1/biases, save/RestoreV2_3/_1)]]

Roadmap of DDPG, TRPO and Q-prop updates

Hi,

Quality of your DQN implementations is impressive. Looking forward for continuous control algorithms. Do you have, at least very approximate schedule when implementations of the DDPG, TRPO and Q-prop algorithms will be added?

Best regards,
Viktor

Example fails

Following example fails with error:

$ python -m baselines.deepq.experiments.train_cartpole

Error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/usr/lib/python2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/usr/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/usr/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/usr/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "build/bdist.linux-x86_64/egg/baselines/deepq/__init__.py", line 4, in <module>
  File "build/bdist.linux-x86_64/egg/baselines/deepq/simple.py", line 10, in <module>
  File "/usr/local/lib/python2.7/dist-packages/baselines-0.1.0-py2.7.egg/baselines/logger.py", line 139
    def log(*args, level=INFO):
                       ^
SyntaxError: invalid syntax

Pip install fails on OS-X

On macOS Sierra (10.12.5) attempting to run pip install baselines results in the following error message:

  Using cached baselines-0.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/setup.py", line 8, in <module>
        with open(os.path.join(repo_dir, "README.md")) as f:
    IOError: [Errno 2] No such file or directory: '/private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/README.md'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/nq/0xqx24zn7dn5qmzxkl4dyp9r0000gn/T/pip-build-nBjPkA/baselines/```

Can not execute PPO and TRPO

Can PPO or TRPO execute?
I tried my best but failed.
Thank you~