
safety-starter-agents's Introduction

Status: Archive (code is provided as-is, no updates expected)

Safety Starter Agents

A companion repo to the paper "Benchmarking Safe Exploration in Deep Reinforcement Learning," containing a variety of unconstrained and constrained RL algorithms.

This repo contains the implementations of PPO, TRPO, PPO-Lagrangian, TRPO-Lagrangian, and CPO used to obtain the results in the "Benchmarking Safe Exploration" paper, as well as experimental implementations of SAC and SAC-Lagrangian not used in the paper.

Note that the PPO implementations here follow the conventions of Spinning Up rather than Baselines: they use the early stopping trick, omit observation and reward normalization, and do not use the clipped value loss, among other differences. As a result, while this PPO can be fairly compared to this TRPO, it is not the strongest PPO implementation in terms of sample efficiency and can be improved on substantially.
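As a concrete illustration of the early stopping trick (a minimal sketch in the Spinning Up style, with hypothetical names and values, not this repo's actual code): the policy update loop for an epoch breaks as soon as the approximate KL divergence from the old policy exceeds a threshold.

import numpy as np

target_kl = 0.01         # hypothetical target KL, matching Spinning Up's default
train_pi_iters = 80      # max policy gradient steps per epoch

def policy_update_step():
    """Stand-in for one gradient step; returns an approximate KL estimate."""
    return np.random.uniform(0, 0.02)

for i in range(train_pi_iters):
    approx_kl = policy_update_step()
    if approx_kl > 1.5 * target_kl:
        # Stop updating: the new policy has drifted too far from the old one.
        print('Early stopping at step %d due to reaching max KL.' % i)
        break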

Supported Platforms

This package has been tested on Mac OS Mojave and Ubuntu 16.04 LTS, and is probably fine for most recent Mac and Linux operating systems.

Requires Python 3.6 or greater.

Installation

To install this package:

git clone https://github.com/openai/safety-starter-agents.git

cd safety-starter-agents

pip install -e .

Warning: Installing this package does not install Safety Gym. If you want to use the algorithms in this package to train agents on constrained RL environments, make sure to install Safety Gym according to the instructions on the Safety Gym repo.

Getting Started

Example Script: To run PPO-Lagrangian on the Safexp-PointGoal1-v0 environment from Safety Gym, using neural networks of size (64,64):

from safe_rl import ppo_lagrangian
import gym, safety_gym

ppo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)
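The other algorithms exported by safe_rl can be called the same way. A sketch, assuming trpo_lagrangian shares the same signature (worth verifying against the source):

from safe_rl import trpo_lagrangian
import gym, safety_gym

trpo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)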

Reproduce Experiments from Paper: To reproduce an experiment from the paper, run:

cd /path/to/safety-starter-agents/scripts
python experiment.py --algo ALGO --task TASK --robot ROBOT --seed SEED \
    --exp_name EXP_NAME --cpu CPU

where

  • ALGO is in ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo'].
  • TASK is in ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2'].
  • ROBOT is in ['point', 'car', 'doggo'].
  • SEED is an integer. In the paper experiments, we used seeds of 0, 10, and 20, but results may not reproduce perfectly deterministically across machines.
  • CPU is an integer for how many CPUs to parallelize across.

EXP_NAME is an optional argument for the name of the folder where results will be saved. The save folder will be placed in /path/to/safety-starter-agents/data.
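For example, a run of CPO on PointGoal1 with the Point robot might look like this (the experiment name here is arbitrary):

python experiment.py --algo cpo --task goal1 --robot point --seed 0 \
    --exp_name cpo-pointgoal1 --cpu 4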

Plot Results: Plot results with:

cd /path/to/safety-starter-agents/scripts
python plot.py data/path/to/experiment

Watch Trained Policies: Test policies with:

cd /path/to/safety-starter-agents/scripts
python test_policy.py data/path/to/experiment

Cite the Paper

If you use Safety Starter Agents code in your paper, please cite:

@article{Ray2019,
    author = {Ray, Alex and Achiam, Joshua and Amodei, Dario},
    title = {{Benchmarking Safe Exploration in Deep Reinforcement Learning}},
    year = {2019}
}


safety-starter-agents's Issues

placeholder_from_space only accepts Box and Discrete spaces

Hello!

I just started playing around with the code, and I am trying to run a ppo_lagrangian agent on a custom environment. The issue is that my observation space is a dictionary containing a variety of spaces, specifically 5 Box spaces and a MultiDiscrete space. I have changed run_agent.run_polopt_agent to accept my observation and action spaces as arguments, and now I am getting a NotImplementedError from network.placeholder_from_space.

I was wondering whether trying to find a workaround is worth pursuing, or whether these techniques are only meant to be run on simple Box and Discrete spaces.
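One possible workaround (not part of this repo, and assuming a gym version that ships the FlattenObservation wrapper) is to flatten the Dict observation into a single Box before handing the environment to the agent; note that gym.spaces.flatten one-hot encodes discrete components:

import gym
from gym.wrappers import FlattenObservation

def make_flat_env():
    env = gym.make('YourCustomEnv-v0')  # hypothetical env id with a Dict observation space
    # Flattens the Dict of Box/MultiDiscrete spaces into one Box vector.
    return FlattenObservation(env)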

sac-lagrangian shows poor performance on PointGoal1?

On running the Lagrangian version of SAC, I get the following curve for costs. I tried changing the constraint limit to a range of values and didn't get much benefit:

[Figure: cost curve for SAC-Lagrangian on PointGoal1]

Am I doing something wrong, or is this expected for off-policy algorithms?

Please provide conda environment.yml file

While I'm able to install and run safety-gym, I am unable to install safety-starter-agents. It seems like there might be some conflicts due to the older version of TensorFlow.
Could you please provide a Conda environment.yml file with all the necessary dependencies?

toy benchmarking experiment takes up too much GPU memory

I installed Safety Gym and this repository, and ran the experiment with the following command:

python experiment.py --algo cpo --task goal1 --robot point --seed 0 --exp_name pointgoal1-cposeed0 --cpu 1

But this command takes up too much GPU memory. My GPU is a Tesla P40, and this simple experiment uses almost 23 GB, which is quite strange.

Could you please help me?
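For context (an assumption about the cause, not an official answer): TensorFlow 1.x sessions reserve nearly all available GPU memory by default, which would explain the ~23 GB footprint regardless of how small the model is. If you can modify where the session is created, a common mitigation is to enable memory growth:

import tensorflow as tf

# Hypothetical patch: allocate GPU memory on demand instead of grabbing
# (almost) the whole card at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Alternatively, setting the environment variable CUDA_VISIBLE_DEVICES="" forces TensorFlow onto the CPU, which is usually sufficient for networks this small.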

same random seed for train_env and test_env

Hi there,

I have two questions regarding the test_env:

  1. Why do you only have a test_env for SAC, and not for PPO and TRPO?
  2. In safe_rl/sac/sac.py, line 273, you seed env and test_env with the same seed, so test_env would be identical to the training env, right? Is the purpose of test_env only to test deterministic actions, and not at all the generalization of the policy?
# Setting seeds
tf.set_random_seed(seed)
np.random.seed(seed)
env.seed(seed)
test_env.seed(seed)

Thank you very much in advance.
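As an aside on the second question, a common pattern elsewhere (not taken from this repo) is to seed the evaluation environment with an offset, so evaluation episodes are reproducible but not identical to training episodes:

# Hypothetical variant of the seeding block quoted above.
env.seed(seed)
test_env.seed(seed + 10000)  # offset keeps eval reproducible but distinct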

[Discussion] Alternative code base for safe reinforcement learning research: OmniSafe

The safety-starter-agents codebase has been a valuable resource for early-stage research in safe reinforcement learning. However, the library is no longer maintained, which has caused some frustration due to the absence of the latest algorithms and the lack of support for model-based and offline safe reinforcement learning algorithms.

In response, and inspired by the streamlined design philosophy of safety-starter-agents, we have developed an infrastructural framework, OmniSafe, aimed at accelerating safe reinforcement learning research. Our framework supports a range of algorithms, including on-policy, off-policy, model-based, offline, and control-based approaches, with continuous updates for the latest algorithms.

Thanks to safety-starter-agents, a superb codebase, we were able to build on the achievements of our predecessors, and we hope that OmniSafe can in turn support further research in safe reinforcement learning for everyone.

The OmniSafe git repository: https://github.com/OmniSafeAI/omnisafe

Hyperparameters for each environment-agent combination

Hello

In the paper, you mention that results are presented with hand-tuned hyperparameters for each algorithm class (Sec. 5.2). Could you also share those hyperparameters? This would save the computational cost of the grid search and add to the reproducibility value.
