openai / coinrun
Code for the paper "Quantifying Transfer in Reinforcement Learning"
Home Page: https://blog.openai.com/quantifying-generalization-in-reinforcement-learning/
License: MIT License
I have been trying to collect demonstrations from a trained PPO using a fixed coinrun environment (assume that the level_seed is set to the same value whenever done is True). However, it seems that the reset state of the environment depends on the actions executed before it. Specifically, if the rep value is set to 3, then the two resets after the first require the exact sequence of actions to be repeated from a newly instantiated env in order to reach the exact same reset state. Due to this behavior, a long sequence of actions leading to multiple resets and rewards (whenever we solve the env) can't be split up into multiple demonstrations for that env and has to be used as a single stream of experience.
Is there any change that I can make to coinrunenv.py to get around this problem?
Basically I need a standard scalar version of the CoinRun env (like a gym env) so I can apply various Q-learning algorithms (I will be using PyTorch, not that it matters).
From one of the resolved issues I got my hands on the Scalarize class, but the level is not changing; the console says "CoinRun ignores resets", probably because of env.reset().
Example script (I placed it in train_agent.py):
import numpy as np
from coinrun import setup_utils, make
from q_learning.utils import Scalarize  # Scalarize taken from the resolved issue mentioned above

def testing():
    setup_utils.setup_and_load()  # coinrun requires this before make()
    episodes = 10
    env = Scalarize(make('standard', num_envs=1))
    for i in range(episodes):
        start_state = env.reset()
        while True:
            env.render()
            action = np.random.randint(0, env.action_space.n)
            next_state, reward, done, info = env.step(action)
            if done or reward > 0:
                break
Hi, I was running the code on my Mac and it's taking around 2 minutes for a single parameter update, which means the entire training would take around a month. What hardware did you train on, and how long did the training take for you?
Hi there! Thanks for this interesting environment. I am wondering how many environments you use during evaluation, because I couldn't find it in the paper or the code. It seems the number of environments (num_eval) is set to 20, but I am a little confused.
Could you please clarify how many distinct environments are used during evaluation? Thanks!
Hi openai team,
I am trying to set up CoinRun.
I have followed the installation instructions in the README.md and had no problems.
However, when I try to run training, e.g.
python -m coinrun.train_agent --run-id myrun --num-levels 500
I get this error:
AttributeError: 'CoinRunVecEnv' object has no attribute 'spec'
==========================================================
Logging to /tmp/openai-2019-05-02-15-39-56-889062
make: Entering directory '/home/yuanl/workspace/coinrun/coinrun'
make: Nothing to be done for 'all'.
make: Leaving directory '/home/yuanl/workspace/coinrun/coinrun'
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
2019-05-02 15:40:00.281504: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-02 15:40:00.293015: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-05-02 15:40:00.295117: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7fffd83b4fb0 executing computations on platform Host. Devices:
2019-05-02 15:40:00.295164: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #213: KMP_AFFINITY: x2APIC ids not unique - decoding legacy APIC ids.
OMP: Info #232: KMP_AFFINITY: legacy APIC ids not unique - parsing /proc/cpuinfo.
OMP: Info #148: KMP_AFFINITY: Affinity capable, using cpuinfo file
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-11
OMP: Info #156: KMP_AFFINITY: 12 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 2 threads/core (6 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 4 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 4 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 5 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 5 thread 1
OMP: Info #250: KMP_AFFINITY: pid 172 tid 172 thread 0 bound to OS proc set 0
2019-05-02 15:40:00.296139: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/yuanl/workspace/coinrun/coinrun/train_agent.py", line 53, in <module>
main()
File "/home/yuanl/workspace/coinrun/coinrun/train_agent.py", line 34, in main
env = wrappers.add_final_wrappers(env)
File "/home/yuanl/workspace/coinrun/coinrun/wrappers.py", line 99, in add_final_wrappers
env = EpisodeRewardWrapper(env)
File "/home/yuanl/workspace/coinrun/coinrun/wrappers.py", line 30, in __init__
super(EpisodeRewardWrapper, self).__init__(env)
File "/home/yuanl/anaconda3/envs/ML/lib/python3.6/site-packages/gym/core.py", line 217, in __init__
self.spec = self.env.spec
AttributeError: 'CoinRunVecEnv' object has no attribute 'spec'
Hi there,
I proceeded to install the mentioned prerequisites, but when I try using CoinRun in interactive mode, I get:
from mpi4py import MPI
ImportError: dlopen(/anaconda/envs/coin/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/gcc/lib/gcc/7/libgfortran.4.dylib
Referenced from: /usr/local/opt/mpich/lib/libmpi.12.dylib
Reason: image not found
Thanks
Hello,
I have two questions regarding batch normalization. In the policy, when applying batch norm, the is_training parameter is always set to True.
First, why is batch norm in training mode for both act_model and train_model in PPO? More precisely, why not set batch norm to test mode when collecting data (with the act model)?
Second, how is the batch norm layer applied at test time? Is it still in training mode?
Thank you in advance !
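For context on what is_training changes, here is a minimal numpy sketch of the two batch-norm modes. This is illustrative only, not the repo's TensorFlow code; the function and the running-statistic values are made up:

```python
import numpy as np

def batch_norm(x, training, running_mean, running_var, eps=1e-5):
    # training mode: normalize with the current batch's statistics;
    # inference mode: normalize with the accumulated running statistics
    if training:
        mean, var = x.mean(), x.var()
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)

x = np.array([1.0, 3.0])
running_mean, running_var = 0.0, 1.0  # illustrative running stats

train_out = batch_norm(x, True, running_mean, running_var)   # zero-mean output
test_out = batch_norm(x, False, running_mean, running_var)   # shifted output
```

In training mode the output is always normalized to the batch's own statistics; in inference mode the output depends on what the running statistics happened to accumulate, which is exactly the distinction the question is about.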
Hi,
I was trying to build the environment and kept running into this error:
coinrun.cpp:10:10: fatal error: 'QtCore/QMutexLocker' file not found
#include <QtCore/QMutexLocker>
         ^~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [.build-release/coinrun.o] Error 1
coinrun: make failed
The solution (with help from here) is to
brew install pkg-config
and then
export PKG_CONFIG_PATH=/usr/local/opt/qt5/lib:/usr/local/opt/qt5/lib/QtWidgets.framework:/usr/local/opt/qt5/lib/pkgconfig
Afterwards the environment compiles successfully, on my system at least.
Hi,
I get this issue when trying to run on MacOS (High Sierra). Here's the stacktrace:
Traceback (most recent call last):
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/Users/ankesh/workspace/coinrun/coinrun/__init__.py", line 1, in <module>
from .coinrunenv import init_args_and_threads
File "/Users/ankesh/workspace/coinrun/coinrun/coinrunenv.py", line 58, in <module>
lib = npct.load_library(lib_path, os.path.dirname(__file__))
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/site-packages/numpy/ctypeslib.py", line 150, in load_library
return ctypes.cdll[libpath]
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 423, in __getitem__
return getattr(self, name)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 418, in __getattr__
dll = self._dlltype(name)
File "/Users/ankesh/miniconda3/envs/coinrun/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib, 6): Symbol not found: __ZNSt12bad_weak_ptrD1Ev
Referenced from: /Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib (which was built for Mac OS X 10.13)
Expected in: /usr/lib/libstdc++.6.0.9.dylib
in /Users/ankesh/workspace/coinrun/coinrun/.build-debug/coinrun_cpp_d.dylib
I am guessing the issue is with C++ libraries on MacOS, but I have been unable to debug.
Can someone help me with how to access the log files in TensorBoard for this project? I am very confused about where they are being saved.
Hi Team,
I am trying to read the code of this project, but when reading ppo2.py I have a small doubt about the Runner class: I can't tell when the environments reset inside run(). And if one of the parallel environments finishes (done), what happens to the others?
thanks
Could you add an explicit license for the code? For example MIT or Apache? like https://github.com/openai/baselines/blob/master/LICENSE
Hello everybody,
running python -m coinrun.interactive
on Windows outputs this error:
'QT_SELECT' is not recognized as an internal or external command,
operable program or batch file.
coinrun: make failed
This is how I installed Qt:
Download qt-opensource-windows-x86-5.13.1
https://www.qt.io/offline-installers
Install Qt
Also, I installed GnuWin32 using the complete package setup
http://gnuwin32.sourceforge.net/packages/make.htm
Has anybody gotten CoinRun working on Windows yet?
I am trying to recreate Figure 6 in the paper, which shows the effect of all the known techniques on CoinRun performance. I need help with the following, if possible:
1. Starting a new training run just clears the directory and deletes all previous logs!
2. Training with --test seems not to produce enough information in TensorBoard. rew_mean_<run_id> is the mean reward for testing on random levels, I suppose; are the training mean-reward results missing?
3. Resuming training restarts the episode and total-timestep counters from 0; shouldn't they continue from the previous numbers? (I can see the model is loaded and improving.) This ends up producing wrong graphs in TensorBoard.
4. What changes come to mind if I try to apply DQN here?
Thanks in advance
Thanks for this code.
I found the variance in mean_score is quite high when I run the test code multiple times (using the same trained model) with the same set of parameters (num_eval=20 and rep=5). For example, I got mean_score = 3.8, 5.2, and 4.6 for three runs, and sometimes mean_score > 6.0 for the same model. Is this normal?
In addition, what values of num_eval and rep would you suggest in order to obtain a fair comparison between methods?
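One generic way to judge whether spread like this is acceptable is to compute the standard error of the mean over repeated evaluation runs. This is plain statistics on the three scores quoted above, not repo code:

```python
import math

scores = [3.8, 5.2, 4.6]  # the three mean_score values quoted above
n = len(scores)
mean = sum(scores) / n
var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
sem = math.sqrt(var) / math.sqrt(n)                   # standard error of the mean
print(f"mean = {mean:.2f} +/- {sem:.2f}")
```

With only three runs the standard error here is roughly 0.4 reward, so scores in the 3.8-5.2 range are consistent with noise; increasing num_eval and rep (or simply running more seeds) shrinks this error roughly with the square root of the sample count.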
I do not see the "Levels Solved" variable, which can be obtained in the train_agent and test_agent files and which is the key indicator of generalization ability. Can you show me how to get it during training and testing? Thank you very much.
Hi,
I am currently working on research in reinforcement learning, and I have been trying to benchmark my work against other papers on generalization using some well-known environments.
I have tried several times to install/compile CoinRun but was never successful; now I am blocked on installing baselines (required for CoinRun), also from OpenAI, which refuses to compile.
Are you planning to release an M1-compatible version soon?
Thanks a lot in advance.
Thanks for sharing the code. I'd like to ask if it is possible to train or test an agent on specific levels.
Thanks a lot!
Hello, nice project!
Not an issue, but you should add coinrun.__version__.
Why did this depend on Qt? For such a simple game you could have gone for something more lightweight and with a better license, no?
From the README saying "Status: archive", it almost seems like this project was DOA and dropped on GitHub. What's the story?
Is this a bug? I cannot get the same environment behavior on different computers when I set the same seed.
In ppo2.py there is tf.get_collection(tf.GraphKeys.UPDATE_OPS), but it is not used in the usual pattern:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
Also, in the step and value methods of CnnPolicy, why not set batch_norm(is_training=False)?
Hi,
I need to initialize the coinrun library multiple times in the same file (separately for train / test), and I was able to do so without any visible issues. I did notice, though, that the docstring of init_args_and_threads says not to do this.
Lines 95 to 97 in 601de52
envsperbatch = nenvs // nminibatches
...
envsperbatch = nbatch_train // nsteps
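For readers puzzling over these two lines: when minibatching a recurrent policy, whole environments (not individual timesteps) are assigned to each minibatch, so the rollout is split along the environment axis. A plain-numpy sketch of that idea; the variable names mirror the snippet above, but the values and data are fabricated:

```python
import numpy as np

nenvs, nsteps, nminibatches = 8, 4, 4
envsperbatch = nenvs // nminibatches  # environments assigned to each minibatch

# fake rollout data: one row of nsteps observations per environment
obs = np.arange(nenvs * nsteps).reshape(nenvs, nsteps)

envinds = np.arange(nenvs)
minibatches = [obs[envinds[i:i + envsperbatch]]
               for i in range(0, nenvs, envsperbatch)]
# each minibatch keeps every environment's steps contiguous, which is what
# lets a recurrent policy carry hidden state across the nsteps dimension
```

Splitting by environment rather than by step is the reason the divisor differs between the two lines: both compute "environments per batch", just from different totals.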
I have an agent that I'd like to test on this environment by running train and test sessions to determine its generalizability. I was hoping there would be a Gym-like interface that would let me do this, but I am confused by the interface that is available. Could a simple example be provided for how I could achieve this with my agent? I'm very familiar with running standard Gym environments, so that is my main frame of reference for understanding things like this. I apologize if the documentation is clear about how to do this and it just went over my head. Thanks so much for releasing this great resource!
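The general shape of the answer is: make a vectorized env with num_envs=1 and strip the batch dimension from every array, which is what the Scalarize wrapper mentioned in other issues does. Below is a self-contained sketch of that pattern with a dummy stand-in for the real env. DummyVecEnv and this Scalarize are illustrative, not coinrun's actual classes:

```python
import numpy as np

class DummyVecEnv:
    """Stand-in for a vectorized env; returns batched (num_envs-first) arrays."""
    num_envs = 1
    def reset(self):
        return np.zeros((1, 4))            # batch of one observation
    def step(self, actions):
        obs = np.ones((1, 4))
        return obs, np.array([1.0]), np.array([False]), [{}]

class Scalarize:
    """Present a num_envs == 1 vectorized env through the single-env Gym API."""
    def __init__(self, venv):
        assert venv.num_envs == 1
        self.venv = venv
    def reset(self):
        return self.venv.reset()[0]        # strip the batch dimension
    def step(self, action):
        obs, rews, dones, infos = self.venv.step(np.array([action]))
        return obs[0], float(rews[0]), bool(dones[0]), infos[0]

env = Scalarize(DummyVecEnv())
obs = env.reset()
next_obs, rew, done, info = env.step(0)
```

With the real env you would replace DummyVecEnv with something like make('standard', num_envs=1) after the library's setup call, and the wrapped env then steps exactly like a standard Gym environment.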
Hi Team,
I am getting this error when running the environment; any advice?
(base) romtein@romtein-Predator-G3-571:~/coinrun$ python -m coinrun.train_agent --run-id myrun --save-interval 1
Logging to /tmp/openai-2019-05-03-10-51-13-054392
make: Entering directory '/home/romtein/coinrun/coinrun'
gcc -shared -o .build-release/coinrun_cpp.so .build-release/coinrun.o -L/usr/lib64 -lm -lGL -lGLU -lstdc++ `pkg-config --libs Qt5Widgets`
/usr/bin/ld: cannot find -lGL
collect2: error: ld returned 1 exit status
Makefile:60: recipe for target '.build-release/coinrun_cpp.so' failed
make: *** [.build-release/coinrun_cpp.so] Error 1
make: Leaving directory '/home/romtein/coinrun/coinrun'
coinrun: make failed
Thank you for your help
Hello!
I am trying to modify the environment where I can pass in a string similar to char* test and have it build a level. However, I can't seem to get the environment to render char* test as a level.
I've modified state_reset to call generate_test_level after initial_floor_and_walls, but it seems to result in the agent suspended in a wall with no ability to move. Furthermore, editing char* test doesn't seem to have an effect.
Do you have any suggestions for generating a specific layout based on a char*?
Thank you!
How can I select a specific level when defining the environment (for coinrun)?
Whenever I try to run the game using: python -m coinrun.interactive
I get the following error:
Logging to /tmp/openai-2018-12-15-15-58-13-612227
make: Entering directory '/home/shani/coinrun/coinrun'
mkdir -p .generated
mkdir -p .build-release
mkdir -p .build-debug
moc -o .generated/coinrun.moc coinrun.cpp
gcc -std=c++11 -Wall -Wno-unused-variable -Wno-unused-function -Wno-deprecated-register -fPIC -g -O3 -march=native -I/usr/include `pkg-config --cflags Qt5Widgets` -c coinrun.cpp -o.build-release/coinrun.o -MMD -MF .build-release/coinrun.o.dep
In file included from coinrun.cpp:2234:0:
.generated/coinrun.moc:21:1: error: ‘QT_WARNING_DISABLE_DEPRECATED’ does not name a type
QT_WARNING_DISABLE_DEPRECATED
^
.generated/coinrun.moc:31:14: error: ‘qt_meta_stringdata_TestWindow_t’ does not name a type
static const qt_meta_stringdata_TestWindow_t qt_meta_stringdata_TestWindow = {
^
.generated/coinrun.moc:65:35: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
{ &QWidget::staticMetaObject, qt_meta_stringdata_TestWindow.data,
^
.generated/coinrun.moc: In member function ‘virtual void* TestWindow::qt_metacast(const char*)’:
.generated/coinrun.moc:78:26: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
if (!strcmp(_clname, qt_meta_stringdata_TestWindow.stringdata0))
^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-register’
Makefile:66: recipe for target '.build-release/coinrun.o' failed
make: *** [.build-release/coinrun.o] Error 1
make: Leaving directory '/home/shani/coinrun/coinrun'
coinrun: make failed
Python=3.6
Ubuntu=16.04
QMake version 3.1
Using Qt version 5.9.7 in /home/shani/anaconda2/envs/coinrun/lib
I created a new environment from scratch and installed all the dependencies according to your instructions.
I am trying to recreate the performance figure in the paper. How do I get the number of levels solved per timestep, as shown in the graph?
Can you please tell me this for both training and testing, and also how do I get the average reward per timestep?
Is there any easy way to control parts of the procedural generation, i.e. to force it to generate environments that have (or don't have) certain properties?
Hey,
I am having some issues.
I created a new conda env with Python 3.6 and installed the dependencies as advised:
apt-get install qtbase5-dev mpich
pip install -r requirements.txt
On Ubuntu 16.04 with gcc 5.4, running python -m coinrun.interactive returns the following:
Logging to /tmp/openai-2018-12-09-12-30-34-701961
make: Entering directory '/home/act65/repos/coinrun/coinrun'
gcc -std=c++11 -Wall -Wno-unused-variable -Wno-unused-function -Wno-deprecated-register -fPIC -g -O3 -march=native -I/usr/include `pkg-config --cflags Qt5Widgets` -c coinrun.cpp -o.build-release/coinrun.o -MMD -MF .build-release/coinrun.o.dep
In file included from coinrun.cpp:2234:0:
.generated/coinrun.moc:21:1: error: ‘QT_WARNING_DISABLE_DEPRECATED’ does not name a type
QT_WARNING_DISABLE_DEPRECATED
^
.generated/coinrun.moc:31:14: error: ‘qt_meta_stringdata_TestWindow_t’ does not name a type
static const qt_meta_stringdata_TestWindow_t qt_meta_stringdata_TestWindow = {
^
.generated/coinrun.moc:65:35: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
{ &QWidget::staticMetaObject, qt_meta_stringdata_TestWindow.data,
^
.generated/coinrun.moc: In member function ‘virtual void* TestWindow::qt_metacast(const char*)’:
.generated/coinrun.moc:78:26: error: ‘qt_meta_stringdata_TestWindow’ was not declared in this scope
if (!strcmp(_clname, qt_meta_stringdata_TestWindow.stringdata0))
^
At global scope:
cc1plus: warning: unrecognized command line option ‘-Wno-deprecated-register’
Makefile:66: recipe for target '.build-release/coinrun.o' failed
make: *** [.build-release/coinrun.o] Error 1
make: Leaving directory '/home/act65/repos/coinrun/coinrun'
coinrun: make failed
Not sure how to debug this. Any hints are welcome
Please and thank you
Alex
Please help me understand why the previous state is always equal to the next state. If that's the case, how will any NN work on the state?
import numpy as np
from q_learning.utils import Scalarize
from coinrun import make, setup_utils

def testing():
    setup_utils.setup_and_load()
    episodes = 10
    env = Scalarize(make('standard', num_envs=1))
    for i in range(episodes):
        previous_state = env.reset()
        while True:
            env.render()
            action = np.random.randint(0, env.action_space.n)
            next_state, reward, done, info = env.step(action)
            print("current state is equal to previous state : ", np.array_equal(next_state, previous_state))
            previous_state = next_state
            if done or reward > 0:
                break

def main():
    testing()

if __name__ == '__main__':
    main()
Output:
....
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
current state is equal to previous state : True
...
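One plausible cause, stated as an assumption rather than a confirmed diagnosis of coinrun's internals: if the vectorized env writes each new observation into a single reused buffer and returns that same buffer every step, then previous_state and next_state alias the same array, and np.array_equal is always True. A minimal numpy reproduction of that aliasing, with a toy step function standing in for the env:

```python
import numpy as np

buf = np.zeros(3)  # toy stand-in for an env's reused observation buffer

def step(t):
    buf[:] = t     # in-place write into the shared buffer
    return buf     # the same object is returned every step

previous_state = step(1)
next_state = step(2)
aliased = np.array_equal(previous_state, next_state)   # both point at buf

previous_state = step(1).copy()   # a defensive copy breaks the aliasing
next_state = step(2)
copied_equal = np.array_equal(previous_state, next_state)
```

If this is indeed what is happening, changing the loop above to previous_state = next_state.copy() would make the comparison meaningful; a NN training pipeline would similarly want to copy (or immediately consume) each observation before the next step overwrites it.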