
safety-starter-agents's Introduction

Status: Archive (code is provided as-is, no updates expected)

Safety Starter Agents

A companion repo to the paper "Benchmarking Safe Exploration in Deep Reinforcement Learning," containing a variety of unconstrained and constrained RL algorithms.

This repo contains the implementations of PPO, TRPO, PPO-Lagrangian, TRPO-Lagrangian, and CPO used to obtain the results in the "Benchmarking Safe Exploration" paper, as well as experimental implementations of SAC and SAC-Lagrangian not used in the paper.

Note that the PPO implementations here follow the conventions of Spinning Up rather than Baselines: they use the early stopping trick, omit observation and reward normalization, and do not use the clipped value loss, among other differences. As a result, while this PPO can be fairly compared to this TRPO, it is not the strongest PPO implementation in terms of sample efficiency and can be improved on substantially.
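As a concrete illustration of the early stopping trick (a minimal sketch in the Spinning Up style, with hypothetical names and values, not this repo's actual code): the policy update loop for an epoch breaks as soon as the approximate KL divergence from the old policy exceeds a threshold.

import numpy as np

target_kl = 0.01         # hypothetical target KL, matching Spinning Up's default
train_pi_iters = 80      # max policy gradient steps per epoch

def policy_update_step():
    """Stand-in for one gradient step; returns an approximate KL estimate."""
    return np.random.uniform(0, 0.02)

for i in range(train_pi_iters):
    approx_kl = policy_update_step()
    if approx_kl > 1.5 * target_kl:
        # Stop updating: the new policy has drifted too far from the old one.
        print('Early stopping at step %d due to reaching max KL.' % i)
        break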

Supported Platforms

This package has been tested on Mac OS Mojave and Ubuntu 16.04 LTS, and is probably fine for most recent Mac and Linux operating systems.

Requires Python 3.6 or greater.

Installation

To install this package:

git clone https://github.com/openai/safety-starter-agents.git

cd safety-starter-agents

pip install -e .

Warning: Installing this package does not install Safety Gym. If you want to use the algorithms in this package to train agents on constrained RL environments, make sure to install Safety Gym according to the instructions on the Safety Gym repo.

Getting Started

Example Script: To run PPO-Lagrangian on the Safexp-PointGoal1-v0 environment from Safety Gym, using neural networks of size (64,64):

from safe_rl import ppo_lagrangian
import gym, safety_gym

ppo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)
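The other algorithms exported by safe_rl can be called the same way. A sketch, assuming trpo_lagrangian shares the same signature (worth verifying against the source):

from safe_rl import trpo_lagrangian
import gym, safety_gym

trpo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)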

Reproduce Experiments from Paper: To reproduce an experiment from the paper, run:

cd /path/to/safety-starter-agents/scripts
python experiment.py --algo ALGO --task TASK --robot ROBOT --seed SEED \
    --exp_name EXP_NAME --cpu CPU

where

  • ALGO is in ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo'].
  • TASK is in ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2'].
  • ROBOT is in ['point', 'car', 'doggo'].
  • SEED is an integer. In the paper experiments, we used seeds of 0, 10, and 20, but results may not reproduce perfectly deterministically across machines.
  • CPU is an integer for how many CPUs to parallelize across.

EXP_NAME is an optional argument for the name of the folder where results will be saved. The save folder will be placed in /path/to/safety-starter-agents/data.
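For example, a run of CPO on PointGoal1 with the Point robot might look like this (the experiment name here is arbitrary):

python experiment.py --algo cpo --task goal1 --robot point --seed 0 \
    --exp_name cpo-pointgoal1 --cpu 4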

Plot Results: Plot results with:

cd /path/to/safety-starter-agents/scripts
python plot.py data/path/to/experiment

Watch Trained Policies: Test policies with:

cd /path/to/safety-starter-agents/scripts
python test_policy.py data/path/to/experiment

Cite the Paper

If you use Safety Starter Agents code in your paper, please cite:

@article{Ray2019,
    author = {Ray, Alex and Achiam, Joshua and Amodei, Dario},
    title = {{Benchmarking Safe Exploration in Deep Reinforcement Learning}},
    year = {2019}
}


safety-starter-agents's Issues

placeholder_from_space only accepts Box and Discrete spaces

Hello!

I just started playing around with the code, and I am trying to run a ppo_lagrangian agent on a custom environment. The issue is that my observation space is a dictionary containing a variety of spaces, specifically 5 Box spaces and a MultiDiscrete space. I have changed run_agent.run_polopt_agent to accept my observation and action spaces as arguments, and now I am getting a NotImplementedError from network.placeholder_from_space.

I was wondering whether trying to find a workaround is worth pursuing, or whether these techniques are only meant to be run on simple Box and Discrete spaces.
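One possible workaround (not part of this repo, and assuming a gym version that ships the FlattenObservation wrapper) is to flatten the Dict observation into a single Box before handing the environment to the agent; note that gym.spaces.flatten one-hot encodes discrete components:

import gym
from gym.wrappers import FlattenObservation

def make_flat_env():
    env = gym.make('YourCustomEnv-v0')  # hypothetical env id with a Dict observation space
    # Flattens the Dict of Box/MultiDiscrete spaces into one Box vector.
    return FlattenObservation(env)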

sac-lagrangian shows poor performance on PointGoal1?

On running the Lagrangian version of SAC, I get the following curve for costs. I tried changing the constraint limit to a range of values and didn't get much benefit:

[Figure: cost curve for SAC-Lagrangian on PointGoal1]

Am I doing something wrong, or is this expected for off-policy algorithms?

Please provide conda environment.yml file

While I'm able to install and run safety-gym, I am unable to install safety-starter-agents. It seems like there might be some conflicts due to the older version of TensorFlow.
Could you please provide a Conda environment.yml file with all the necessary dependencies?

toy benchmarking experiment takes up too much GPU memory

I installed Safety Gym and this repository, and ran the experiment with the following command:

python experiment.py --algo cpo --task goal1 --robot point --seed 0 --exp_name pointgoal1-cposeed0 --cpu 1

But this command takes up too much GPU memory. My GPU is a Tesla P40, and this simple experiment uses almost 23 GB, which is quite strange.

Could you please help me?
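For context (an assumption about the cause, not an official answer): TensorFlow 1.x sessions reserve nearly all available GPU memory by default, which would explain the ~23 GB footprint regardless of how small the model is. If you can modify where the session is created, a common mitigation is to enable memory growth:

import tensorflow as tf

# Hypothetical patch: allocate GPU memory on demand instead of grabbing
# (almost) the whole card at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Alternatively, setting the environment variable CUDA_VISIBLE_DEVICES="" forces TensorFlow onto the CPU, which is usually sufficient for networks this small.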

same random seed for train_env and test_env

Hi there,

I have two questions regarding the test_env:

  1. Why do you only have a test_env for SAC, and not for PPO and TRPO?
  2. In safe_rl/sac/sac.py, line 273, you seed env and test_env with the same seed, so test_env would be identical to the training env, right? Is the purpose of test_env only to test deterministic actions, and not at all the generalization of the policy?
# Setting seeds
tf.set_random_seed(seed)
np.random.seed(seed)
env.seed(seed)
test_env.seed(seed)

Thank you very much in advance.
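As an aside on the second question, a common pattern elsewhere (not taken from this repo) is to seed the evaluation environment with an offset, so evaluation episodes are reproducible but not identical to training episodes:

# Hypothetical variant of the seeding block quoted above.
env.seed(seed)
test_env.seed(seed + 10000)  # offset keeps eval reproducible but distinct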

[Discussion] Alternative code base for safe reinforcement learning research: OmniSafe

The safety-starter-agents codebase has been a valuable resource for early-stage research in safe reinforcement learning. However, the library is no longer maintained, which has caused some frustration due to the absence of the latest algorithms and the lack of support for model-based and offline safe reinforcement learning algorithms.

In response, and inspired by the streamlined design philosophy of safety-starter-agents, we have developed an infrastructural framework, OmniSafe, aimed at accelerating safe reinforcement learning research. Our framework supports a range of algorithms, including on-policy, off-policy, model-based, offline, and control-based approaches, with continuous updates for the latest algorithms.

Thanks to safety-starter-agents, a superb codebase, we were able to build on the achievements of our predecessors, and we hope that OmniSafe can in turn support further research in safe reinforcement learning for everyone.

The OmniSafe git repository: https://github.com/OmniSafeAI/omnisafe

Hyperparameters for each environment-agent combination

Hello

In the paper, you mention that results are presented with hand-tuned hyperparameters for each algorithm class (Sec. 5.2). Could you also share those hyperparameters? This would save the computational cost of the grid search and add to the reproducibility value.
