chainerrl's Introduction

ChainerRL and PFRL

ChainerRL (this repository) is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using Chainer, a flexible deep learning framework. PFRL is the PyTorch analog of ChainerRL.

[Demo animations: Breakout, Humanoid, Grasping, Atlas]

Installation

ChainerRL is tested with Python 3.6. For other requirements, see requirements.txt.

ChainerRL can be installed via PyPI:

pip install chainerrl

It can also be installed from the source code:

python setup.py install

Refer to Installation for more information.

Getting started

You can try the ChainerRL Quickstart Guide first, or check the examples prepared for Atari 2600 and OpenAI Gym.

For more information, you can refer to ChainerRL's documentation.
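
To give an idea of the shape of the API, here is a minimal sketch that trains a DQN agent on CartPole-v0 with Gym. It closely follows the Quickstart Guide, but exact argument names and module paths may differ slightly across ChainerRL versions, so treat it as illustrative rather than definitive.

import chainer
import chainerrl
import gym
import numpy as np

env = gym.make('CartPole-v0')
obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n

# A small fully connected Q-function for discrete actions.
q_func = chainerrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_layers=2, n_hidden_channels=50)

optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

# Epsilon-greedy exploration and a standard replay buffer.
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)

agent = chainerrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.95, explorer=explorer,
    replay_start_size=500, update_interval=1, target_update_interval=100,
    phi=lambda x: x.astype(np.float32, copy=False))

# A bare-bones training loop; chainerrl.experiments offers richer ones.
for episode in range(200):
    obs = env.reset()
    reward = 0.0
    done = False
    while not done:
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
    agent.stop_episode_and_train(obs, reward, done)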

Algorithms

| Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training |
|:---|:---:|:---:|:---:|:---:|:---:|
| DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x |
| Categorical DQN | ✓ | x | ✓ | ✓ | x |
| Rainbow | ✓ | x | ✓ | ✓ | x |
| IQN | ✓ | x | ✓ | ✓ | x |
| DDPG | x | ✓ | ✓ | ✓ | x |
| A3C | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ |
| ACER | ✓ | ✓ | ✓ | x | ✓ |
| NSQ (N-step Q-learning) | ✓ | ✓ (NAF) | ✓ | x | ✓ |
| PCL (Path Consistency Learning) | ✓ | ✓ | ✓ | x | ✓ |
| PPO | ✓ | ✓ | ✓ | ✓ | x |
| TRPO | ✓ | ✓ | ✓ | ✓ | x |
| TD3 | x | ✓ | x | ✓ | x |
| SAC | x | ✓ | x | ✓ | x |

The algorithms in the table above have been implemented in ChainerRL, along with a number of useful techniques for training them; see the documentation for the full lists.

Visualization

ChainerRL comes with a set of visualization tools to help developers understand and debug their RL agents. With these tools, the behavior of ChainerRL agents can be easily inspected from a browser UI.

Environments

Environments that support a subset of the OpenAI Gym interface (the reset and step methods) can be used.
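
As an illustration, here is a minimal sketch of that interface. The environment itself (CountingEnv, its horizon, and its reward rule) is entirely hypothetical; only the reset/step signatures matter.

import numpy as np

class CountingEnv(object):
    """A toy environment exposing only the subset of the Gym interface
    (reset and step) that ChainerRL relies on."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        # Return the initial observation.
        self.t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        # Return (observation, reward, done, info), as in Gym.
        self.t += 1
        obs = np.array([self.t], dtype=np.float32)
        reward = 1.0 if action == self.t % 2 else 0.0
        done = self.t >= self.horizon
        return obs, reward, done, {}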

Contributing

Any kind of contribution to ChainerRL would be highly appreciated! If you are interested in contributing to ChainerRL, please read CONTRIBUTING.md.

License

MIT License.

Citations

To cite ChainerRL in publications, please cite our JMLR paper:

@article{JMLR:v22:20-376,
  author  = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},
  title   = {ChainerRL: A Deep Reinforcement Learning Library},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {77},
  pages   = {1-14},
  url     = {http://jmlr.org/papers/v22/20-376.html}
}

chainerrl's Issues

env.monitor has been deprecated as of 12/23/2016

gym.error.Error: env.monitor has been deprecated as of 12/23/2016. Remove your call to env.monitor.start(directory) and instead wrap your env with env = gym.wrappers.Monitor(env, directory) to record data.
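
A minimal sketch of the change the error message asks for (the directory name here is arbitrary):

import gym

env = gym.make('CartPole-v0')
# Old, deprecated:
#   env.monitor.start('results')
# New, as suggested by the error message:
env = gym.wrappers.Monitor(env, 'results')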

average_loss always 0 when using episodic_replay=True (DQN)

I am trying these two different q_functions:

(non recurrent)

import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
from chainerrl.links import MLP
from chainerrl.q_function import StateQFunction  # import paths assumed; they may differ across versions


class QFunction(chainer.Chain, StateQFunction):

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )
        lin_layer = L.Linear(128, 128)
        # Dueling-style advantage and value streams.
        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])
        super().__init__(conv_layers=conv_layers, lin_layer=lin_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lin_layer(h)

        batch_size = x.shape[0]
        # Center the advantage stream so that its mean over actions is zero.
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        ys = self.v_stream(h, test=test)
        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)


(recurrent)

class QFunctionRecurrent(chainer.Chain, StateQFunction):
    # NB: as described in the recurrent-model documentation issue below,
    # a chain whose recurrence comes from L.LSTM must also inherit
    # chainerrl.recurrent.RecurrentChainMixin to be treated as a recurrent model.

    def __init__(self, n_input_channels=3, n_actions=4, bias=0.1):
        self.n_actions = n_actions
        self.n_input_channels = n_input_channels
        conv_layers = chainer.ChainList(
            L.Convolution2D(n_input_channels, 32, 8, stride=4, bias=bias),
            L.Convolution2D(32, 64, 4, stride=2, bias=bias),
            L.Convolution2D(64, 64, 3, stride=1, bias=bias),
            L.Convolution2D(64, 128, 7, stride=1, bias=bias),
        )
        lstm_layer = L.LSTM(128, 128)
        a_stream = MLP(128, n_actions, [2])
        v_stream = MLP(128, 1, [2])
        super().__init__(conv_layers=conv_layers, lstm_layer=lstm_layer,
                         a_stream=a_stream, v_stream=v_stream)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = x
        for l in self.conv_layers:
            h = F.relu(l(h))
        h = self.lstm_layer(h)

        batch_size = x.shape[0]
        ya = self.a_stream(h, test=test)
        mean = F.reshape(F.sum(ya, axis=1) / self.n_actions, (batch_size, 1))
        ya, mean = F.broadcast(ya, mean)
        ya -= mean

        ys = self.v_stream(h, test=test)
        ya, ys = F.broadcast(ya, ys)
        q = ya + ys
        return chainerrl.action_value.DiscreteActionValue(q)

I found that for the non-recurrent version the loss is not zero and the agent will eventually master the gym environment provided.

However, after changing nothing other than adding an LSTM layer and setting episodic_replay to True, the average_loss becomes 0 all the time and the agent is not able to learn to interact better with its environment.

At first I thought this was due to some kind of rounding issue, so I set minibatch_size=1 and episodic_update_len=1 (assuming that one episodic replay would then contain only one time step), but still nothing changed.

I wonder if this is some kind of bug or (which I think is more likely) an error on my side.

Any help is very much appreciated!

Extend gym.Wrapper instead of env_modifiers

Since gym has introduced its own interface to modify envs using gym.Wrapper, I think it is better to use it in ChainerRL instead of directly modifying methods as in env_modifiers.
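
For reference, here is a minimal sketch of what a wrapper-based modification could look like. The reward scaling is just a hypothetical example of the kind of change env_modifiers applies today, and depending on the gym version the methods to override may be _step/_reset rather than step/reset.

import gym


class ScaleReward(gym.Wrapper):
    """Hypothetical example of expressing an env modification as a
    gym.Wrapper instead of patching the env's methods directly."""

    def __init__(self, env, scale=1e-2):
        super(ScaleReward, self).__init__(env)
        self.scale = scale

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward * self.scale, done, info


env = ScaleReward(gym.make('Pendulum-v0'), scale=1e-2)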

PyTorch as an additional backend

I'm curious about whether ChainerRL can support PyTorch as an additional NN backend. Its interface is similar to Chainer's, but I'm not sure how easy it would be to support both. Any suggestions and opinions are welcome.

MuJoCo-ACER Examples

Are there any examples of ACER in continuous action spaces, using the MuJoCo environments?

Add suppression option for print messages during training loop?

In chainerrl.experiments.train_agent, statistical information is printed once per episode during the training loop. However, this is sometimes too verbose and I want to suppress these messages, but currently there is no good way to do so. Adding an option that enables/disables these prints would be beneficial.
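
Until such an option exists, one possible stop-gap, sketched here under the assumption that the messages go to standard output and that Python 3 is used, is to redirect stdout around the training call:

import contextlib
import io


def run_quietly(train_fn, *args, **kwargs):
    """Swallow anything the training loop prints to stdout."""
    with contextlib.redirect_stdout(io.StringIO()):
        return train_fn(*args, **kwargs)

# e.g. run_quietly(chainerrl.experiments.train_agent, ...)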

env.spec.timestep_limit has been deprecated

gym now complains:

DEPRECATION WARNING: env.spec.timestep_limit has been deprecated. Replace your call to env.spec.timestep_limit with env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps'). This change was made 12/28/2016 and is included in version 0.7.0
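
The replacement the warning asks for, as a one-line sketch:

# gym >= 0.7.0
timestep_limit = env.spec.tags.get('wrapper_config.TimeLimit.max_episode_steps')
# instead of the deprecated
# timestep_limit = env.spec.timestep_limit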

Documentation on usage of recurrent models

In ChainerRL, to use user-defined recurrent models, you need to make sure they implement the chainerrl.recurrent.Recurrent interface; otherwise they won't be treated as recurrent models.

When your model's recurrent-ness comes from chainer.links.LSTM, all you have to do is inherit chainerrl.recurrent.RecurrentChainMixin.

This kind of information is missing from the documentation.
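
Until the documentation covers this, here is a minimal sketch of a recurrent Q-function whose recurrence comes from chainer.links.LSTM and which therefore only needs to inherit RecurrentChainMixin. The class name and layer sizes are arbitrary.

import chainer
import chainer.functions as F
import chainer.links as L
from chainerrl.action_value import DiscreteActionValue
from chainerrl.recurrent import RecurrentChainMixin


class MyRecurrentQFunction(chainer.Chain, RecurrentChainMixin):
    """RecurrentChainMixin lets ChainerRL find and manage the LSTM's state."""

    def __init__(self, obs_size, n_actions):
        super(MyRecurrentQFunction, self).__init__(
            l1=L.Linear(obs_size, 64),
            lstm=L.LSTM(64, 64),
            out=L.Linear(64, n_actions),
        )

    def __call__(self, x):
        h = F.relu(self.l1(x))
        h = self.lstm(h)
        return DiscreteActionValue(self.out(h))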

The tutorial code causes TypeError on python 3.4

On Python 3.4, random.sample doesn't accept a collections.deque, so I got the following error.

Traceback (most recent call last):
  File "quickstart.py", line 111, in <module>
    action = agent.act_and_train(obs, reward)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/agents/dqn.py", line 340, in act_and_train
    self.replay_updator.update_if_necessary(self.t)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/replay_buffer.py", line 194, in update_if_necessary
    transitions = self.replay_buffer.sample(self.batchsize)
  File "/opt/rl/lib/python3.4/site-packages/chainerrl/replay_buffer.py", line 42, in sample
    return random.sample(self.memory, n)
  File "/opt/rl/lib/python3.4/random.py", line 311, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).

Python 2.7 works fine, and maybe 3.5+ as well.
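
A minimal workaround sketch, if you are stuck on Python 3.4, is to convert the deque to a list before sampling (at the cost of a copy):

import collections
import random

memory = collections.deque(maxlen=10 ** 6)
memory.extend(range(1000))  # stand-in for stored transitions

# random.sample() rejects a deque on Python 3.4, so copy it to a list first.
batch = random.sample(list(memory), 32)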

ValueError: On entry to SGEMV parameter number 8 had an illegal value

Travis CI failed on examples/gym/train_ddpg_gym.py:

Traceback (most recent call last):
  File "examples/gym/train_ddpg_gym.py", line 173, in <module>
    main()
  File "examples/gym/train_ddpg_gym.py", line 170, in main
    max_episode_len=timestep_limit)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/experiments/train_agent.py", line 144, in train_agent_with_evaluation
    logger=logger)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/experiments/train_agent.py", line 52, in train_agent
    action = agent.act_and_train(obs, r)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/agents/ddpg.py", line 314, in act_and_train
    self.replay_updater.update_if_necessary(self.t)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/replay_buffer.py", line 327, in update_if_necessary
    self.update_func(transitions)
  File "/home/travis/build/pfnet/chainerrl/chainerrl/agents/ddpg.py", line 246, in update
    self.actor_optimizer.update(lambda: self.compute_actor_loss(batch))
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/optimizer.py", line 416, in update
    loss.backward()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/variable.py", line 398, in backward
    gxs = func.backward(in_data, out_grad)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/chainer/functions/connection/linear.py", line 59, in backward
    gW = gy.T.dot(x).astype(W.dtype, copy=False)
ValueError: On entry to SGEMV parameter number 8 had an illegal value

This may be the same issue as chainer/chainer#2744

Date and time format for experiments

The human-readability of the name of an experiment's output subdirectory could be improved. The current implementation (time_str = datetime.datetime.now().strftime('%Y%m%d%H%M%S%f') in chainerrl/experiments/prepare_output_dir.py) produces e.g. 21120903182945898662. How about

  • strftime('%Y%m%d-%H%M%S-%f') (e.g. 21120903-182945-898662), or
  • the basic format in ISO 8601 (e.g. 21120903T182945.898662+0900)?
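
For comparison, a quick sketch of the three formats (a naive datetime is used here, so the +0900 offset in the ISO 8601 example would additionally require a timezone-aware datetime):

import datetime

now = datetime.datetime.now()
current  = now.strftime('%Y%m%d%H%M%S%f')    # e.g. 21120903182945898662
readable = now.strftime('%Y%m%d-%H%M%S-%f')  # e.g. 21120903-182945-898662
iso_like = now.strftime('%Y%m%dT%H%M%S.%f')  # e.g. 21120903T182945.898662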

Question on gym action space

Hi, I've defined my own OpenAI Gym environment and specified my actions as follows:

self.actions = ["NOOP", "LEFT", "RIGHT", "FIRE", "CLOAK"]
self.action_space = spaces.Discrete(len(self.actions))

When I try my environment with the 'train_dqn_gym.py' example, I can see from my debug output that training correctly tries a variety of different actions.

However, with both 'train_a3c_gym.py' and 'train_acer_gym.py', the action passed to my step method is always 0 (NOOP); it never tries any other action.

Have I coded something wrong in my environment? I would appreciate any tips on how to investigate my issue further.

Specify successful configurations for examples

The current examples don't specify in what configurations they work well, except for the newer ones (train_pcl_gym.py and train_reinforce_gym.py). Such instructions are important so that users can easily confirm that the implementations actually work.

  • ale/train_a3c_ale.py
  • ale/train_acer_ale.py
  • ale/train_dqn_ale.py
  • ale/train_nsq_ale.py
  • gym/train_a3c_gym.py
  • gym/train_acer_gym.py
  • gym/train_ddpg_gym.py
  • gym/train_dqn_gym.py
  • gym/train_pcl_gym.py
  • gym/train_reinforce_gym.py

REINFORCE

A simple REINFORCE implementation that doesn't require a value function would be helpful.

Type of observation and action space

As far as I can tell from the gym examples, the agent and q_functions expect the observation space to be a Box and the action space to be a Discrete. Is that correct?

If so, how should I handle observations and actions of other types, especially a Tuple? Do I need to modify the environment so that it returns a Box, or is there another option (for example, a wrapper that flattens the Tuple, as sketched below)?

Thanks,
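
One possible direction, sketched below under the assumption that every element of the Tuple is a Box, is an observation wrapper that flattens the Tuple into a single Box. The wrapper name is hypothetical, and older gym versions override _observation instead of observation.

import gym
import numpy as np


class FlattenTupleObservation(gym.ObservationWrapper):
    """Hypothetical wrapper: concatenate a Tuple of Box observations into
    one flat Box so that the standard q_functions can consume it."""

    def __init__(self, env):
        super(FlattenTupleObservation, self).__init__(env)
        spaces = env.observation_space.spaces
        low = np.concatenate([s.low.flatten() for s in spaces])
        high = np.concatenate([s.high.flatten() for s in spaces])
        self.observation_space = gym.spaces.Box(low=low, high=high)

    def observation(self, observation):
        return np.concatenate(
            [np.asarray(o, dtype=np.float32).flatten() for o in observation])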

Unknown CUDA error when running ChainerRL under Bash on Windows

I recently got a Windows 10 computer and successfully installed Bash on Ubuntu on Windows, CUDA, cuDNN, Chainer, and ChainerRL. But when I run the example, I get the following error. Any suggestions?

(py2env) neil@DESKTOP-C22605O:~/chainerrl$ xvfb-run -s "-screen 0 1400x900x24" python examples/gym/train_dqn_gym.py
Output files are saved in dqn_out/20170324141722891586
INFO:gym.envs.registration:Making new env: Pendulum-v0
Traceback (most recent call last):
  File "examples/gym/train_dqn_gym.py", line 179, in <module>
    main()
  File "examples/gym/train_dqn_gym.py", line 154, in main
    episodic_update=args.episodic_replay, episodic_update_len=16)
  File "/home/neil/py2env/local/lib/python2.7/site-packages/chainerrl/agents/dqn.py", line 115, in __init__
    cuda.get_device(gpu).use()
  File "cupy/cuda/device.pyx", line 75, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2083)
  File "cupy/cuda/device.pyx", line 81, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:2035)
  File "cupy/cuda/runtime.pyx", line 178, in cupy.cuda.runtime.setDevice (cupy/cuda/runtime.cpp:2915)
  File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2241)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorUnknown: unknown error
