datawhalechina / joyrl

An easier PyTorch deep reinforcement learning library.

Home Page: https://datawhalechina.github.io/joyrl/

License: MIT License

Python 96.86% Jupyter Notebook 3.04% Shell 0.08% Batchfile 0.02%

joyrl's Introduction

JoyRL

JoyRL is a parallel reinforcement learning library based on PyTorch and Ray. Unlike existing RL libraries, JoyRL relieves users of the burden of implementing algorithms with intricate details, unfriendly APIs, and so on. It is designed so that users can train and test RL algorithms by configuring hyperparameters alone, which makes it much easier for beginners to learn and use. JoyRL also supports many state-of-the-art RL algorithms, including RLHF (the core of ChatGPT); see the algorithms below. In addition, JoyRL provides a modular framework for users to customize their own algorithms and environments.

Install

⚠️ Note: do not install JoyRL through any mirror!

# you need to install Anaconda first
conda create -n joyrl python=3.10
conda activate joyrl
pip install -U joyrl

Torch install:

# CPU
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1
# CUDA 11.8
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

Usage

Quick Start

The following demo shows how to use joyrl. First create a yaml file to configure the hyperparameters, then run the command below in your terminal. That is all you need to do to train a DQN agent on the CartPole-v1 environment.

joyrl --yaml ./presets/ClassControl/CartPole-v1/CartPole-v1_DQN.yaml

Alternatively, you can run the following code in a Python file.

import joyrl
if __name__ == "__main__":
    print(joyrl.__version__)
    yaml_path = "./presets/ClassControl/CartPole-v1/CartPole-v1_DQN.yaml"
    joyrl.run(yaml_path = yaml_path)

Documentation

More tutorials and API documentation are hosted on JoyRL docs or JoyRL 中文文档.

Algorithms

Name	Reference	Author
Q-learning	RL introduction	johnjim0816
Sarsa	RL introduction	johnjim0816
DQN	DQN Paper	johnjim0816
Double DQN	DoubleDQN Paper	johnjim0816
Dueling DQN	DuelingDQN Paper	johnjim0816
NoisyDQN	NoisyDQN Paper	johnjim0816
DDPG	DDPG Paper	johnjim0816
TD3	TD3 Paper	johnjim0816
A2C/A3C	A3C Paper	johnjim0816
PPO	PPO Paper	johnjim0816
SoftQ	SoftQ Paper	johnjim0816

Why JoyRL?

RL Platform	# of Alg. ⁽¹⁾	Custom Env	Async Training	RNN Support	Multi-Head Observation	Backend
Baselines	9	✔️ (gym)	❌	✔️	❌	TF1
Stable-Baselines	11	✔️ (gym)	❌	✔️	❌	TF1
Stable-Baselines3	7	✔️ (gym)	❌	❌	✔️	PyTorch
Ray/RLlib	16	✔️	✔️	✔️	✔️	TF/PyTorch
SpinningUp	6	✔️ (gym)	❌	❌	❌	PyTorch
Dopamine	7	❌	❌	❌	❌	TF/JAX
ACME	14	✔️ (dm_env)	❌	✔️	✔️	TF/JAX
keras-rl	7	✔️ (gym)	❌	❌	❌	Keras
cleanrl	9	✔️ (gym)	❌	❌	❌	poetry
rlpyt	11	❌	❌	✔️	✔️	PyTorch
ChainerRL	18	✔️ (gym)	❌	✔️	❌	Chainer
Tianshou	20	✔️ (Gymnasium)	❌	✔️	✔️	PyTorch
JoyRL	11	✔️ (Gymnasium)	✔️	✔️	✔️	PyTorch

Here are some other highlights of JoyRL:

Provides a series of Chinese courses, JoyRL Book (with the English version in progress), suitable for beginners to start with a combination of theory

Contributors

John Jim

Peking University

Qi Wang

Shanghai Jiao Tong University

Yiyuan Yang

University of Oxford

joyrl's Issues

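A beginner's questions about joyrl: a request for help from the author

Most tutorials available online teach with existing environments. In practice, though, for the many people who are not professional reinforcement learning researchers, the hardest part is how to quickly use an RL package to solve problems in their own domain. I therefore hope the author can publish a tutorial, or even record a video, that goes from downloading joyrl, to writing a custom environment, to calling the corresponding RL algorithm in joyrl, to finally solving the problem: one complete end-to-end tutorial. I think this would be a real breakthrough compared with other tutorials. I hope the author will consider this and address this beginner pain point soon. The postgraduate entrance exam has just finished, and many incoming graduate students are starting to look for a direction that interests them; a good tutorial released now would be a great help to those who want to learn RL.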

The _check_obs_action_space_info method in offline_run.py is wrong

Running the offline test code raises the following error:

(RL) liber@DESKTOP-HJ34P4I D:\Projects\joyrl>python offline_run.py --yaml presets/ClassControl/CartPole-v1/CartPole-v1_DQN.yaml
2024-09-03 10:35:53,380 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
Traceback (most recent call last):
  File "D:\Projects\joyrl\offline_run.py", line 240, in <module>
    launcher.run()
  File "D:\Projects\joyrl\offline_run.py", line 229, in run
    self._check_obs_action_space_info(env)
  File "D:\Projects\joyrl\offline_run.py", line 217, in _check_obs_action_space_info
    action_type_list, action_size_list = self._check_obs_action_space_info(env)
  File "D:\Projects\joyrl\offline_run.py", line 217, in _check_obs_action_space_info
    action_type_list, action_size_list = self._check_obs_action_space_info(env)
  File "D:\Projects\joyrl\offline_run.py", line 217, in _check_obs_action_space_info
    action_type_list, action_size_list = self._check_obs_action_space_info(env)
  [Previous line repeated 991 more times]
  File "D:\Projects\joyrl\offline_run.py", line 216, in _check_obs_action_space_info
    self.cfg.obs_space_info = ObsSpaceInfo(size = state_size_list, type = state_type_list)
  File "D:\Projects\joyrl\joyrl\framework\core_types.py", line 78, in __init__
    self._check_type_size()
  File "D:\Projects\joyrl\joyrl\framework\core_types.py", line 81, in _check_type_size
    assert len(self.type) == len(self.size), 'obs type and size must have the same length'
RecursionError: maximum recursion depth exceeded while calling a Python object

On inspection, two methods named _check_obs_action_space_info are defined. Renaming the second one to _check_obs_state_space_info, and changing self._check_obs_action_space_info(env) in the run method to self._check_obs_state_space_info(env), solved the problem.
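The failure mode described in this issue can be reproduced in a few lines. The following is a minimal sketch, not the actual offline_run.py: the class names, the `run_check` helper, and the returned values are hypothetical stand-ins, and only the two method names come from the issue. In Python, a second `def` with the same name inside a class body silently replaces the first, so the call inside the second method ends up calling itself.

```python
class BrokenLauncher:
    """Hypothetical stand-in: two methods share one name."""

    def _check_obs_action_space_info(self, env):
        # Intended helper: returns action types and sizes (dummy values here).
        return ["DISCRETE"], [2]

    def _check_obs_action_space_info(self, env):  # silently replaces the one above
        # This now resolves to *this* method, so it recurses without end.
        action_type_list, action_size_list = self._check_obs_action_space_info(env)
        return action_type_list, action_size_list


class FixedLauncher:
    """Same logic with the second method renamed, as the issue suggests."""

    def _check_obs_action_space_info(self, env):
        return ["DISCRETE"], [2]

    def _check_obs_state_space_info(self, env):
        action_type_list, action_size_list = self._check_obs_action_space_info(env)
        return action_type_list, action_size_list


def run_check(launcher, method_name, env=None):
    """Call the named method and report a RecursionError instead of crashing."""
    try:
        return getattr(launcher, method_name)(env)
    except RecursionError:
        return "RecursionError"


broken_result = run_check(BrokenLauncher(), "_check_obs_action_space_info")
fixed_result = run_check(FixedLauncher(), "_check_obs_state_space_info")
print(broken_result)
print(fixed_result)
```

Because Python raises no warning when a class body rebinds a name, a linter that flags redefined methods is one way to catch this kind of bug before running the code.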

Error with pip install torch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0

When setting up the environment on Windows, pip install torch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 failed. Changing it to pip install torch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0 solved the problem. A mirror source was used to speed up the download.

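Many other .yaml files cannot be run

Hello, I would like to know why, after switching the .yaml file to other files in the presets, the run keeps failing. Can only the example you provided be run? Many of the other .yaml files do not run.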

Issues with `observation_space` and `action_space` of a custom environment

Python version: 3.10.14
joyrl version: 0.6.5.1
Pytorch version: torch 2.2.1+cu121
torchaudio 2.2.1+cu121
torchvision 0.17.1+cu121

I intend to define observation space as follows:
[Image_of_one_channel, 1d-vector, 1d-vector]
And output space as follows:
[1d-vector]

I defined the observation and action spaces as follows:

self.observation_space = spaces.Tuple(spaces=[
    spaces.Box(low=0, high=float("inf"), shape=(1, N, N)),
    spaces.Box(low=0, high=float("inf"), shape=(A,)),
    spaces.Box(low=0, high=float("inf"), shape=(B, ))
])
self.action_space = spaces.Box(low=0, high=float("inf"), shape=(B, 1))

in which capitalized letters represent numbers. But I noticed that handling for spaces.Tuple is not yet implemented, as shown below:

# run.py, class Launcher
...
    def _check_obs_action_space_info(self, env):
        obs_space = env.observation_space
        if isinstance(obs_space, Box):
            if len(obs_space.shape) == 3:
                state_type_list = [ObsType.IMAGE]
                state_size_list = [[obs_space.shape[0], obs_space.shape[1], obs_space.shape[2]]]
            else:
                state_type_list = [ObsType.VECTOR]
                state_size_list = [[obs_space.shape[0]]]
        elif isinstance(obs_space, Discrete):
            state_type_list = [ObsType.VECTOR]
            state_size_list = [[obs_space.n]]
        else:
            raise ValueError('obs_space type error')
...
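One way the missing branch could look is sketched below. This is only an illustration of the idea, not JoyRL's code: `ObsType`, `Box`, `Discrete`, and `TupleSpace` are minimal stand-ins defined here so the sketch runs without Gymnasium or JoyRL, and `describe_obs_space` is a hypothetical helper name. Whether the rest of the library accepts type and size lists with more than one entry is not confirmed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class ObsType(Enum):  # stand-in for the library's ObsType
    VECTOR = 1
    IMAGE = 2


@dataclass
class Box:  # stand-in for a Box space, shape only
    shape: tuple


@dataclass
class Discrete:  # stand-in for a Discrete space
    n: int


@dataclass
class TupleSpace:  # stand-in for spaces.Tuple
    spaces: Sequence[object]


def describe_obs_space(obs_space):
    """Return (type_list, size_list), one entry per sub-space."""
    if isinstance(obs_space, TupleSpace):
        type_list, size_list = [], []
        for sub_space in obs_space.spaces:
            # Reuse the single-space rules for every member of the tuple.
            sub_types, sub_sizes = describe_obs_space(sub_space)
            type_list.extend(sub_types)
            size_list.extend(sub_sizes)
        return type_list, size_list
    if isinstance(obs_space, Box):
        if len(obs_space.shape) == 3:
            return [ObsType.IMAGE], [list(obs_space.shape)]
        return [ObsType.VECTOR], [[obs_space.shape[0]]]
    if isinstance(obs_space, Discrete):
        return [ObsType.VECTOR], [[obs_space.n]]
    raise ValueError('obs_space type error')


# Example with made-up numbers for N, A and B.
example_space = TupleSpace([Box((1, 8, 8)), Box((3,)), Box((5,))])
types, sizes = describe_obs_space(example_space)
print(types, sizes)
```

The Box and Discrete branches mirror the rules in the quoted method; the only new part is the loop that flattens a tuple of spaces into parallel lists.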

The second problem arises when I try to set the action space to a continuous vector in $\mathbf{R}^k$; unfortunately, I did not save the error log. In general, the action layers are parsed so that the last layer has output shape 0, which raises an error. When I tried to modify the source code to force it to be nonzero, other errors occurred.

Finally, the corresponding network architecture field in the .yaml file is

algo_cfg:
  branch_layers:
    - name: view
      layers:
      - layer_type: conv2d
        in_channel: 1
        out_channel: 16 
        kernel_size: 4
        stride: 2
        activation: relu
      - layer_type: pooling
        pooling_type: max2d
        kernel_size: 2
        stride: 2
        padding: 0
      - layer_type: flatten
      - layer_type: norm
        norm_type: LayerNorm
        normalized_shape: 512
      - layer_type: linear
        layer_size: [128]
        activation: relu
    - name: lwh
      layers:
      - layer_type: linear
        layer_size: [32]
        activation: relu
      - layer_type: linear
        layer_size: [32]
        activation: relu
    - name: parts
      layers:
      - layer_type: linear
        layer_size: [256]
        activation: relu
      - layer_type: linear
        layer_size: [256]
        activation: relu
  merge_layers:
    - layer_type: linear
      layer_size: [256]
      activation: relu
    - layer_type: linear
      layer_size: [256]
      activation: relu
...
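When debugging a config like this, it can help to check that `normalized_shape` matches the flattened size of the `view` branch. The sketch below computes that size for a square N x N single-channel input, using the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1. It assumes padding 0 and dilation 1 for the conv2d layer, which the config does not state, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def view_branch_flat_size(n, out_channel=16):
    """Flattened feature count of the 'view' branch for an n x n input."""
    after_conv = conv_out(n, kernel=4, stride=2)  # conv2d, padding 0 assumed
    after_pool = conv_out(after_conv, kernel=2, stride=2, padding=0)  # max2d pooling
    return out_channel * after_pool * after_pool


# Search for square input sizes whose flattened size equals normalized_shape: 512.
matches = [n for n in range(6, 257) if view_branch_flat_size(n) == 512]
print(view_branch_flat_size(20), view_branch_flat_size(28), matches)
```

Under these assumptions the search finds no square input size that flattens to 512, because 512 / 16 = 32 is not a perfect square. If the layer defaults differ from the ones assumed here, the result would change accordingly.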
