openai / coinrun
Code for the paper "Quantifying Transfer in Reinforcement Learning"
Home Page: https://blog.openai.com/quantifying-generalization-in-reinforcement-learning/
License: MIT License
I have been trying to collect demonstrations from a trained PPO using a fixed coinrun environment (assume that the level_seed is set to the same value whenever done is True). However, it seems that the reset state of the environment depends on the actions executed before it. Specifically, if the rep value is set to 3, then the two resets after the first require the exact sequence of actions to be repeated from a newly instantiated env in order to reach the exact same reset state. Due to this behavior, a long sequence of actions leading to multiple resets and rewards (whenever we solve the env) can't be split up into multiple demonstrations for that env and has to be used as a single stream of experience.
Is there any change that I can make to coinrunenv.py to get around this problem?
Basically I need a standard scalar version of the CoinRun env (like a gym env) so I can apply various Q-learning algorithms (I will be using PyTorch, not that it matters).
From one of the resolved issues I got my hands on the Scalarize class, but the level is not changing; the console says "CoinRun ignores resets", probably because of env.reset().
Example script (I placed it in train_agent.py):
import numpy as np
from coinrun import setup_utils, make
from q_learning.utils import Scalarize  # Scalarize taken from the resolved issue mentioned above

def testing():
    setup_utils.setup_and_load()  # coinrun requires this before make()
    episodes = 10
    env = Scalarize(make('standard', num_envs=1))
    for i in range(episodes):
        start_state = env.reset()
        while True:
            env.render()
            action = np.random.randint(0, env.action_space.n)
            next_state, reward, done, info = env.step(action)
            if done or reward > 0:
                break
Hi, I was running the code on my Mac and it's taking around 2 minutes for a single parameter update, which means the entire training would take around a month. What hardware did you train on, and how long did the training take for you?
Hi there! Thanks for this interesting environment. I am wondering how many environments you use during evaluation, because I couldn't find it in the paper or the code. It seems the number of environments (num_eval) is set to 20, but I am a little confused.
Could you please clarify how many distinct environments are used during evaluation? Thanks!
Hi openai team,
I am trying to set up CoinRun.
I have followed the installation instructions in the README.md and had no problems.
However, when I try to run training, e.g.
python -m coinrun.train_agent --run-id myrun --num-levels 500
I get this error:
AttributeError: 'CoinRunVecEnv' object has no attribute 'spec'
==========================================================
Logging to /tmp/openai-2019-05-02-15-39-56-889062
make: Entering directory '/home/yuanl/workspace/coinrun/coinrun'
make: Nothing to be done for 'all'.
make: Leaving directory '/home/yuanl/workspace/coinrun/coinrun'
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
2019-05-02 15:40:00.281504: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-02 15:40:00.293015: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-05-02 15:40:00.295117: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7fffd83b4fb0 executing computations on platform Host. Devices:
2019-05-02 15:40:00.295164: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #213: KMP_AFFINITY: x2APIC ids not unique - decoding legacy APIC ids.
OMP: Info #232: KMP_AFFINITY: legacy APIC ids not unique - parsing /proc/cpuinfo.
OMP: Info #148: KMP_AFFINITY: Affinity capable, using cpuinfo file
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-11
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 2 threads/core (6 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
OMP: Info #250: KMP_AFFINITY: pid 172 tid 172 thread 0 bound to OS proc set 0
2019-05-02 15:40:00.296139: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yuanl/workspace/coinrun/coinrun/train_agent.py", line 53, in <module>
main()
File "/home/yuanl/workspace/coinrun/coinrun/train_agent.py", line 34, in main
env = wrappers.add_final_wrappers(env)
File "/home/yuanl/workspace/coinrun/coinrun/wrappers.py", line 99, in add_final_wrappers
env = EpisodeRewardWrapper(env)
File "/home/yuanl/workspace/coinrun/coinrun/wrappers.py", line 30, in __init__
super(EpisodeRewardWrapper, self).__init__(env)
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/site-packages/gym/core.py", line 217, in __init__
self.spec = self.env.spec
AttributeError: 'CoinRunVecEnv' object has no attribute 'spec'
Hi there,
I proceeded to install the mentioned prerequisites, but when I try using CoinRun in interactive mode, I get:
from mpi4py import MPI
ImportError: dlopen(/anaconda/envs/coin/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/gcc/lib/gcc/7/libgfortran.4.dylib
Referenced from: /usr/local/opt/mpich/lib/libmpi.12.dylib
Reason: image not found
Thanks
Hello,
I have two questions regarding batch normalization. In the policy, when applying batch norm, the is_training parameter is always set to True.
First, why is batch norm in training mode for both act_model and train_model in PPO? More precisely, why not set batch norm to test mode when collecting data (with the act model)?
Second, how is the batch norm layer applied at test time? Is it still in training mode?
Thank you in advance !
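For context on what is_training changes, here is a minimal numpy sketch of the two batch-norm modes. This is illustrative only, not the repo's TensorFlow code; the function and the running-statistic values are made up:

```python
import numpy as np

def batch_norm(x, training, running_mean, running_var, eps=1e-5):
    # training mode: normalize with the current batch's statistics;
    # inference mode: normalize with the accumulated running statistics
    if training:
        mean, var = x.mean(), x.var()
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

x = np.array([1.0, 3.0])
running_mean, running_var = 0.0, 1.0  # illustrative running stats

train_out = batch_norm(x, True, running_mean, running_var)   # zero-mean output
test_out = batch_norm(x, False, running_mean, running_var)   # shifted output
```

In training mode the output is always normalized to the batch's own statistics; in inference mode the output depends on what the running statistics happened to accumulate, which is exactly the distinction the question is about.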
Hi,
I was trying to build the environment and kept running into this error:
coinrun.cpp:10:10: fatal error: 'QtCore/QMutexLocker' file not found
#include <QtCore/QMutexLocker>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [.build-release/coinrun.o] Error 1
coinrun: make failed
The solution (with help from here) is to
brew install pkg-config
and then
export PKG_CONFIG_PATH=/usr/local/opt/qt5/lib:/usr/local/opt/qt5/lib/QtWidgets.framework:/usr/local/opt/qt5/lib/pkgconfig
Afterwards the environment compiles successfully, on my system at least.
Hi,
I get this issue when trying to run on MacOS (High Sierra). Here's the stacktrace:
Traceback (most recent call last):
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/ankesh/workspace/coinrun/coinrun/__init__.py", line 1, in <module>
from .coinrunenv import init_args_and_threads
File "/Users/ankesh/workspace/coinrun/coinrun/coinrunenv.py", line 58, in <module>
lib = npct.load_library(lib_path, os.path.dirname(__file__))
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/site-packages/numpy/ctypeslib.py", line 150, in load_library
return ctypes.cdll[libpath]
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 423, in __getitem__
return getattr(self, name)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 418, in __getattr__
dll = self._dlltype(name)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib, 6): Symbol not found: __ZNSt12bad_weak_ptrD1Ev
Referenced from: /Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib (which was built for Mac OS X 10.13)
Expected in: /usr/lib/libstdc++.6.0.9.dylib
in /Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib
I am guessing the issue is with C++ libraries on MacOS, but I have been unable to debug.
Can someone help me with how to access the log files in TensorBoard for this project? I am very confused about where they are being saved.
Hi Team,
I am trying to read the code of this project, but when reading ppo2.py I have a small doubt about the Runner class: I can't tell when the environments reset inside run(). And if one of the parallel environments finishes (done), what happens to the others?
thanks
Could you add an explicit license for the code? For example MIT or Apache? like https://github.com/openai/baselines/blob/master/LICENSE
Hello everybody,
running python -m coinrun.interactive
on Windows outputs this error:
'QT_SELECT' is not recognized as an internal or external command,
operable program or batch file.
coinrun: make failed
This is how I installed Qt:
Download qt-opensource-windows-x86-5.13.1
https://www.qt.io/offline-installers
Install Qt
Also, I installed GnuWin32 using the complete package setup
http://gnuwin32.sourceforge.net/packages/make.htm
Has anybody gotten CoinRun working on Windows yet?
I am trying to recreate Figure 6 in the paper, which shows the effect of all the known techniques on CoinRun performance. I need help with the following, if possible:
1. Starting a new training run just clears the directory and deletes all previous logs!
2. Training with --test seems not to produce enough information in TensorBoard. rew_mean_<run_id> is the mean reward for testing on random levels, I suppose; are the training mean-reward results missing?
3. Resuming training restarts the episode and total-timestep counters from 0; shouldn't they continue from the previous numbers? (I can see the model is loaded and improving.) This ends up producing wrong graphs in TensorBoard.
4. What changes come to mind if I try to apply DQN here?
Thanks in advance
Thanks for this code.
I found the variance in mean_score is quite high when I run the test code multiple times (using the same trained model) with the same set of parameters (num_eval=20 and rep=5). For example, I got mean_score = 3.8, 5.2, and 4.6 for three runs, and sometimes mean_score > 6.0 for the same model. Is this normal?
In addition, what values of num_eval and rep would you suggest in order to obtain a fair comparison between methods?
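One generic way to judge whether spread like this is acceptable is to compute the standard error of the mean over repeated evaluation runs. This is plain statistics on the three scores quoted above, not repo code:

```python
import math

scores = [3.8, 5.2, 4.6]  # the three mean_score values quoted above
n = len(scores)
mean = sum(scores) / n
var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
sem = math.sqrt(var) / math.sqrt(n)                   # standard error of the mean
print(f"mean = {mean:.2f} +/- {sem:.2f}")
```

With only three runs the standard error here is roughly 0.4 reward, so scores in the 3.8-5.2 range are consistent with noise; increasing num_eval and rep (or simply running more seeds) shrinks this error roughly with the square root of the sample count.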
I do not see the "Levels Solved" variable, which can be obtained in the train_agent and test_agent files and which is the key indicator of generalization ability. Can you show me how to get it during training and testing? Thank you very much.
Hi,
I am currently working on research in reinforcement learning, and I have been trying to benchmark my work against other papers on generalization using some well-known environments.
I have tried several times to install/compile CoinRun but was never successful; now I am blocked on installing baselines (required for CoinRun), also from OpenAI, which refuses to compile.
Are you planning to release an M1-compatible version soon?
Thanks a lot in advance.
Thanks for sharing the code. I'd like to ask if it is possible to train or test an agent on specific levels.
Thanks a lot!
Hello, nice project!
Not an issue, but you should add coinrun.__version__.
Why did this depend on Qt? For such a simple game you could have gone for something more lightweight and with a better license, no?
From the README saying "Status: archive", it almost seems like this project was DOA and dropped on GitHub. What's the story?
Is this a bug? I cannot get the same environment behavior on different computers when I set the same seed.
In ppo2.py there is tf.get_collection(tf.GraphKeys.UPDATE_OPS), but it is not used in the usual pattern:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
Also, in the step and value methods of CnnPolicy, why not set batch_norm(is_training=False)?
Hi,
I need to initialize the coinrun library multiple times in the same file (separately for train / test), and I was able to do so without any visible issues. I did notice, though, that the docstring of init_args_and_threads says not to do this.
Lines 95 to 97 in 601de52
envsperbatch = nenvs // nminibatches
...
envsperbatch = nbatch_train // nsteps
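For readers puzzling over these two lines: when minibatching a recurrent policy, whole environments (not individual timesteps) are assigned to each minibatch, so the rollout is split along the environment axis. A plain-numpy sketch of that idea; the variable names mirror the snippet above, but the values and data are fabricated:

```python
import numpy as np

nenvs, nsteps, nminibatches = 8, 4, 4
envsperbatch = nenvs // nminibatches  # environments assigned to each minibatch

# fake rollout data: one row of nsteps observations per environment
obs = np.arange(nenvs * nsteps).reshape(nenvs, nsteps)

envinds = np.arange(nenvs)
minibatches = [obs[envinds[i:i + envsperbatch]]
               for i in range(0, nenvs, envsperbatch)]
# each minibatch keeps every environment's steps contiguous, which is what
# lets a recurrent policy carry hidden state across the nsteps dimension
```

Splitting by environment rather than by step is the reason the divisor differs between the two lines: both compute "environments per batch", just from different totals.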
I have an agent that I'd like to test on this environment by running train and test sessions to determine its generalizability. I was hoping there would be a Gym-like interface that would let me do this, but I am confused by the interface that is available. Could a simple example be provided for how I could achieve this with my agent? I'm very familiar with running standard Gym environments, so that is my main frame of reference for understanding things like this. I apologize if the documentation is clear about how to do this and it just went over my head. Thanks so much for releasing this great resource!
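The general shape of the answer is: make a vectorized env with num_envs=1 and strip the batch dimension from every array, which is what the Scalarize wrapper mentioned in other issues does. Below is a self-contained sketch of that pattern with a dummy stand-in for the real env. DummyVecEnv and this Scalarize are illustrative, not coinrun's actual classes:

```python
import numpy as np

class DummyVecEnv:
    """Stand-in for a vectorized env; returns batched (num_envs-first) arrays."""
    num_envs = 1
    def reset(self):
        return np.zeros((1, 4))            # batch of one observation
    def step(self, actions):
        obs = np.ones((1, 4))
        return obs, np.array([1.0]), np.array([False]), [{}]

class Scalarize:
    """Present a num_envs == 1 vectorized env through the single-env Gym API."""
    def __init__(self, venv):
        assert venv.num_envs == 1
        self.venv = venv
    def reset(self):
        return self.venv.reset()[0]        # strip the batch dimension
    def step(self, action):
        obs, rews, dones, infos = self.venv.step(np.array([action]))
        return obs[0], float(rews[0]), bool(dones[0]), infos[0]

env = Scalarize(DummyVecEnv())
obs = env.reset()
next_obs, rew, done, info = env.step(0)
```

With the real env you would replace DummyVecEnv with something like make('standard', num_envs=1) after the library's setup call, and the wrapped env then steps exactly like a standard Gym environment.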
Hi Team,
I am getting this error when running the environment; any advice?
(base) romtein@romtein-Predator-G3-571:~/coinrun$ python -m coinrun.train_agent --run-id myrun --save-interval 1
Logging to /tmp/openai-2019-05-03-10-51-13-054392
make: Entering directory '/home/romtein/coinrun/coinrun'
gcc -shared -o .build-release/coinrun_cpp.so .build-release/coinrun.o -L/usr/lib64 -lm -lGL -lGLU -lstdc++ `pkg-config --libs Qt5Widgets`
/usr/bin/ld: cannot find -lGL
collect2: error: ld returned 1 exit status
Makefile:60: recipe for target '.build-release/coinrun_cpp.so' failed
make: *** [.build-release/coinrun_cpp.so] Error 1
make: Leaving directory '/home/romtein/coinrun/coinrun'
coinrun: make failed
Thank you for your help
Hello!
I am trying to modify the environment where I can pass in a string similar to char* test and have it build a level. However, I can't seem to get the environment to render char* test as a level.
I've modified state_reset to call generate_test_level after initial_floor_and_walls, but it seems to result in the agent suspended in a wall with no ability to move. Furthermore, editing char* test doesn't seem to have an effect.
Do you have any suggestions for generating a specific layout based on a char*?
Thank you!
How can I select a specific level when defining the environment (for coinrun)?
Whenever I try to run the game using: python -m coinrun.interactive
I get the following error:
Logging to /tmp/openai-2018-12-15-15-58-13-612227
make: Entering directory '/home/shani/coinrun/coinrun'
mkdir -p .generated
mkdir -p .build-release
mkdir -p .build-debug
moc -o .generated/coinrun.moc coinrun.cpp
gcc -std=c++11 -Wall -Wno-unused-variable -Wno-unused-function -Wno-deprecated-register -fPIC -g -O3 -march=native -I/usr/include `pkg-config --cflags Qt5Widgets` -c coinrun.cpp -o.build-release/coinrun.o -MMD -MF .build-release/coinrun.o.dep
In file included from coinrun.cpp:2234:0:
.generated/coinrun.moc:21:1: error: ‘QT_WARNING_DISABLE_DEPRECATED’ does not name a type
QT_WARNING_DISABLE_DEPRECATED
^
.generated/coinrun.moc:31:14: error: ‘qt_meta_stringdata_TestWindow_t’ does not name a type
static const qt_meta_stringdata_TestWindow_t qt_meta_stringdata_TestWindow = {
^
.generated/coinrun.moc:65:35: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
{ &QWidget::staticMetaObject, qt_meta_stringdata_TestWindow.data,
^
.generated/coinrun.moc: In member function ‘virtual void* TestWindow::qt_metacast(const char*)’:
.generated/coinrun.moc:78:26: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
if (!strcmp(_clname, qt_meta_stringdata_TestWindow.stringdata0))
^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-register’
Makefile:66: recipe for target '.build-release/coinrun.o' failed
make: *** [.build-release/coinrun.o] Error 1
make: Leaving directory '/home/shani/coinrun/coinrun'
coinrun: make failed
Python=3.6
Ubuntu=16.04
QMake version 3.1
Using Qt version 5.9.7 in /home/shani/anaconda2/envs/coinrun/lib
I created a new environment from scratch and installed all the dependencies according to your instructions.
I am trying to recreate the performance figure in the paper. How do I get the number of levels solved per timestep, as shown in the graph?
Can you please tell me this for both training and testing, and also how do I get the average reward per timestep?
Is there any easy way to control parts of the procedural generation, i.e. to force it to generate environments that have (or don't have) certain properties?
Hey,
I am having some issues.
I created a new conda env with Python 3.6 and installed the dependencies as advised:
apt-get install qtbase5-dev mpich
pip install -r requirements.txt
On Ubuntu 16.04 with gcc 5.4, running python -m coinrun.interactive returns the following:
Logging to /tmp/openai-2018-12-09-12-30-34-701961
make: Entering directory '/home/act65/repos/coinrun/coinrun'
gcc -std=c++11 -Wall -Wno-unused-variable -Wno-unused-function -Wno-deprecated-register -fPIC -g -O3 -march=native -I/usr/include `pkg-config --cflags Qt5Widgets` -c coinrun.cpp -o.build-release/coinrun.o -MMD -MF .build-release/coinrun.o.dep
In file included from coinrun.cpp:2234:0:
.generated/coinrun.moc:21:1: error: ‘QT_WARNING_DISABLE_DEPRECATED’ does not name a type
QT_WARNING_DISABLE_DEPRECATED
^
.generated/coinrun.moc:31:14: error: ‘qt_meta_stringdata_TestWindow_t’ does not name a type
static const qt_meta_stringdata_TestWindow_t qt_meta_stringdata_TestWindow = {
^
.generated/coinrun.moc:65:35: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
{ &QWidget::staticMetaObject, qt_meta_stringdata_TestWindow.data,
^
.generated/coinrun.moc: In member function ‘virtual void* TestWindow::qt_metacast(const char*)’:
.generated/coinrun.moc:78:26: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
if (!strcmp(_clname, qt_meta_stringdata_TestWindow.stringdata0))
^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-register’
Makefile:66: recipe for target '.build-release/coinrun.o' failed
make: *** [.build-release/coinrun.o] Error 1
make: Leaving directory '/home/act65/repos/coinrun/coinrun'
coinrun: make failed
Not sure how to debug this. Any hints are welcome
Please and thank you
Alex
Please help me understand why the previous state is always equal to the next state. If that's the case, how will any NN work on the state?
import numpy as np
from q_learning.utils import Scalarize
from coinrun import make, setup_utils

def testing():
    setup_utils.setup_and_load()
    episodes = 10
    env = Scalarize(make('standard', num_envs=1))
    for i in range(episodes):
        previous_state = env.reset()
        while True:
            env.render()
            action = np.random.randint(0, env.action_space.n)
            next_state, reward, done, info = env.step(action)
            print("current state is equal to previous state : ", np.array_equal(next_state, previous_state))
            previous_state = next_state
            if done or reward > 0:
                break

def main():
    testing()

if __name__ == '__main__':
    main()
Output:
....
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
...
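One plausible cause, stated as an assumption rather than a confirmed diagnosis of coinrun's internals: if the vectorized env writes each new observation into a single reused buffer and returns that same buffer every step, then previous_state and next_state alias the same array, and np.array_equal is always True. A minimal numpy reproduction of that aliasing, with a toy step function standing in for the env:

```python
import numpy as np

buf = np.zeros(3)  # toy stand-in for an env's reused observation buffer

def step(t):
    buf[:] = t     # in-place write into the shared buffer
    return buf     # the same object is returned every step

previous_state = step(1)
next_state = step(2)
aliased = np.array_equal(previous_state, next_state)   # both point at buf

previous_state = step(1).copy()   # a defensive copy breaks the aliasing
next_state = step(2)
copied_equal = np.array_equal(previous_state, next_state)
```

If this is indeed what is happening, changing the loop above to previous_state = next_state.copy() would make the comparison meaningful; a NN training pipeline would similarly want to copy (or immediately consume) each observation before the next step overwrites it.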