SAC-RCBF

This repository contains the code for the paper "Safe Reinforcement Learning using Robust Control Barrier Functions": an implementation of SAC combined with Robust Control Barrier Functions (RCBFs) for safe reinforcement learning in several custom environments.

While exploring, an RL agent can take actions that drive the system into unsafe states. To prevent this, we use a differentiable RCBF safety layer that minimally alters (in the least-squares sense) the actions taken by the RL agent so that the agent remains safe.

Robust Control Barrier Functions (RCBFs)

In this work, we focus on RCBFs that are formulated with respect to differential inclusions of the following form:

$$\dot{x} \in f(x) + g(x)u + D(x)$$

Here D(x) is a disturbance set unknown a priori to the robot, which we learn online during training via Gaussian Processes (GPs). The underlying GP library is GPyTorch.
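A minimal sketch of the idea, written in plain Python rather than GPyTorch: disturbance samples are taken as residuals between the measured state derivative and the nominal model f(x) + g(x)u, and a GP posterior over those residuals gives a mean and standard deviation at new states. The scalar state, the RBF kernel with fixed hyperparameters, and the names `disturbance_residual` and `ScalarGP` are illustrative assumptions, not this repository's code; GPyTorch would additionally fit the kernel hyperparameters, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple


def disturbance_residual(x_dot: float, f_x: float, g_x: float, u: float) -> float:
    """Hypothetical helper: residual d = x_dot - f(x) - g(x) u, used as a GP target."""
    return x_dot - f_x - g_x * u


def rbf(a: float, b: float, length: float, var: float) -> float:
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def backward_sub_transposed(L: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class ScalarGP:
    """Exact GP regression with zero prior mean and fixed RBF hyperparameters."""

    def __init__(self, xs: Sequence[float], ys: Sequence[float],
                 noise: float = 1e-2, length: float = 1.0, var: float = 1.0):
        self.xs = list(xs)
        self.length, self.var = length, var
        K = [[rbf(a, b, length, var) + (noise if i == j else 0.0)
              for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        self.L = cholesky(K)
        # alpha = (K + noise * I)^{-1} y, computed with two triangular solves
        self.alpha = backward_sub_transposed(self.L, forward_sub(self.L, list(ys)))

    def predict(self, x: float) -> Tuple[float, float]:
        """Posterior mean and standard deviation of the disturbance at state x."""
        k = [rbf(x, a, self.length, self.var) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = forward_sub(self.L, k)
        variance = self.var - sum(vi * vi for vi in v)  # k(x, x) = var for the RBF kernel
        return mean, math.sqrt(max(variance, 0.0))


# Usage: scalar system x_dot = u + d with a constant true disturbance d = 0.5
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
residuals = [disturbance_residual(1.5, 0.0, 1.0, 1.0) for _ in states]
gp = ScalarGP(states, residuals)
d_mean, d_std = gp.predict(0.25)
```

The posterior standard deviation shrinks near visited states and grows away from them, which is what lets a confidence interval such as [mean - k*std, mean + k*std] serve as an estimate of D(x).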

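The safety layer solves a quadratic program (QP) that finds the smallest correction u_RCBF (in the least-squares sense) for which the RCBF condition holds under every disturbance in D(x). Here h(x) is the RCBF and u_RL is the action outputted by the RL policy. As such, the final (safe) action taken in the environment is u = u_RL + u_RCBF, as shown in the accompanying diagram.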
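For reference, a common way to write a QP of this kind, consistent with the differential inclusion above, is sketched below. It assumes a linear class-K term γh(x) and a compact disturbance set D(x); the exact formulation used in the paper (for example, additional actuation limits or slack variables) may differ.

$$
u_{RCBF}(x) = \arg\min_{u \in \mathbb{R}^m} \ \tfrac{1}{2}\|u\|^2
\quad \text{s.t.} \quad
\nabla h(x)^\top \big(f(x) + g(x)(u_{RL} + u)\big) + \min_{d \in D(x)} \nabla h(x)^\top d \;\ge\; -\gamma\, h(x)
$$

Taking the minimum over d ∈ D(x) is what makes the constraint robust: if it holds for the worst-case disturbance, it holds for every element of the differential inclusion.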

Coupling RL & RCBFs to Improve Training Performance

The above is sufficient to ensure the safety of the system. However, we would also like to improve learning performance by letting the RCBF layer guide training. This is achieved by:

  • Using a differentiable version of the safety layer, which allows us to backpropagate through the RCBF-based Quadratic Program (QP) and results in an end-to-end policy.
  • Using the GPs and the dynamics prior to generate synthetic data (model-based RL).
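To illustrate why the safety layer in the first bullet can be differentiated through: with a single RCBF constraint, the least-squares QP has a closed-form solution that is piecewise smooth in its inputs. The sketch below computes it in plain Python, assuming a linear class-K term γh(x) and a box-shaped disturbance set [d_mean - k*d_std, d_mean + k*d_std] built from a GP posterior. The function names and the box assumption are hypothetical; the repository's layer may handle several constraints and input limits, which in general calls for a differentiable QP solver rather than this closed form.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def worst_case_disturbance_term(grad_h: Sequence[float], d_mean: Sequence[float],
                                d_std: Sequence[float], k: float = 2.0) -> float:
    """min over the box [d_mean - k*d_std, d_mean + k*d_std] of grad_h^T d.

    Each coordinate is minimised independently: the worst case moves d_i by
    k*d_std_i in the direction that decreases grad_h_i * d_i.
    """
    return sum(gi * mi - k * abs(gi) * si for gi, mi, si in zip(grad_h, d_mean, d_std))


def rcbf_safety_layer(u_rl: Sequence[float], grad_h: Sequence[float], f_x: Sequence[float],
                      g_x: Sequence[Sequence[float]], h_x: float,
                      d_mean: Sequence[float], d_std: Sequence[float],
                      gamma: float = 1.0, k: float = 2.0) -> List[float]:
    """Closed-form solution of the single-constraint problem
        min_u 0.5*||u||^2
        s.t.  grad_h^T (f + g (u_rl + u)) + min_d grad_h^T d >= -gamma * h.
    Returns the correction u_rcbf."""
    m = len(u_rl)
    lf_h = dot(grad_h, f_x)  # L_f h(x)
    lg_h = [sum(grad_h[i] * g_x[i][j] for i in range(len(grad_h))) for j in range(m)]  # L_g h(x)
    d_term = worst_case_disturbance_term(grad_h, d_mean, d_std, k)
    # Rearranged constraint: lg_h^T u >= b
    b = -gamma * h_x - lf_h - dot(lg_h, u_rl) - d_term
    if b <= 0.0:
        return [0.0] * m  # u_rl already satisfies the constraint, no correction needed
    norm_sq = dot(lg_h, lg_h)
    if norm_sq == 0.0:
        raise ValueError("L_g h(x) = 0: the control input cannot affect this constraint")
    # KKT conditions: the constraint is active, so u = b * lg_h / ||lg_h||^2
    return [b / norm_sq * a for a in lg_h]


# Usage: single integrator x_dot = u + d with safe set h(x) = 1 - x (i.e. x <= 1)
u_rcbf = rcbf_safety_layer(u_rl=[1.0], grad_h=[-1.0], f_x=[0.0], g_x=[[1.0]],
                           h_x=0.1, d_mean=[0.0], d_std=[0.1])
u_safe = [a + b for a, b in zip([1.0], u_rcbf)]  # approximately [-0.1]
```

In the usage example the agent at x = 0.9 asks for u_RL = 1, which would leave the safe set; the layer returns a correction of about -1.1, so the applied action u = u_RL + u_RCBF is about -0.1, exactly enough to satisfy the constraint under the worst-case disturbance.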

Other Approaches

In addition, the approach is compared against two other frameworks (implemented here) in the experiments:

Running the Experiments

There are two sets of experiments in the paper. The first set evaluates the sample efficiency of SAC-RCBF in two custom environments. The second set evaluates how well the proposed Modular SAC-RCBF approach learns the reward-driven task independently of the safety constraints, which results in better transfer performance.

Experiment 1.1 (Sample Efficiency - Unicycle Env)

  • Baseline:
python main.py --cuda --env Unicycle --cbf_mode baseline --max_episodes 200 --seed 12345
  • Baseline w/ comp:
python main.py --env Unicycle --cuda --cbf_mode baseline --use_comp True --max_episodes 200 --seed 12345
  • MF SAC-RCBF:
python main.py --cuda --env Unicycle --cbf_mode full --max_episodes 200 --seed 12345
  • MB SAC-RCBF:
python main.py --cuda --env Unicycle --model_based --updates_per_step 2 --batch_size 512 --rollout_batch_size 5 --real_ratio 0.3 --gp_max_episodes 70 --cbf_mode full --max_episodes 200 --seed 12345
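The MB SAC-RCBF runs use the GPs and the dynamics prior to generate synthetic transitions. The sketch below shows one plausible version of that step: Euler rollouts of x_dot = f(x) + g(x)u + d with d replaced by the GP mean, plus a training batch that mixes real and synthetic transitions. The names `batch_size` and `real_ratio` mirror the flags above, but reading `--real_ratio` as "fraction of each batch drawn from real data" is an assumption (as is the Euler integrator and every function name here); check `main.py` for the actual semantics.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Transition = Tuple[Vector, Vector, Vector]  # (state, action, next_state)


def synthetic_step(x: Vector, u: Vector, f: Callable[[Vector], Vector],
                   g: Callable[[Vector], List[Vector]], d_mean: Callable[[Vector], Vector],
                   dt: float) -> Vector:
    """One Euler step of x_dot = f(x) + g(x) u + d, with d replaced by the GP mean."""
    fx, gx, d = f(x), g(x), d_mean(x)
    return [x[i] + dt * (fx[i] + sum(gx[i][j] * u[j] for j in range(len(u))) + d[i])
            for i in range(len(x))]


def synthetic_rollouts(start_states: Sequence[Vector], policy: Callable[[Vector], Vector],
                       f: Callable[[Vector], Vector], g: Callable[[Vector], List[Vector]],
                       d_mean: Callable[[Vector], Vector], dt: float,
                       horizon: int) -> List[Transition]:
    """Roll the learned model forward from each start state for `horizon` steps."""
    transitions: List[Transition] = []
    for x0 in start_states:
        x = list(x0)
        for _ in range(horizon):
            u = policy(x)
            x_next = synthetic_step(x, u, f, g, d_mean, dt)
            transitions.append((x, u, x_next))
            x = x_next
    return transitions


def mixed_batch(real: Sequence, synthetic: Sequence, batch_size: int,
                real_ratio: float, rng: random.Random) -> list:
    """Sample (with replacement) a batch whose real-data share is `real_ratio`."""
    n_real = int(round(batch_size * real_ratio))
    return rng.choices(real, k=n_real) + rng.choices(synthetic, k=batch_size - n_real)


# Usage: scalar system x_dot = u + d with GP mean d = 0.5 and a constant policy
model_data = synthetic_rollouts(
    start_states=[[0.0]], policy=lambda x: [1.0],
    f=lambda x: [0.0], g=lambda x: [[1.0]], d_mean=lambda x: [0.5],
    dt=0.1, horizon=2)
```

With `--batch_size 512` and `--real_ratio 0.3`, this interpretation would give about 154 real and 358 synthetic transitions per batch.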

Experiment 1.2 (Sample Efficiency - Simulated Cars Env)

  • Baseline:
python main.py --cuda --env SimulatedCars --max_episodes 300 --cbf_mode baseline --seed 12345
  • Baseline w/ comp:
python main.py --env SimulatedCars --cuda --cbf_mode baseline --use_comp True --max_episodes 300 --seed 12345
  • MF SAC-RCBF:
python main.py --cuda --env SimulatedCars --max_episodes 300 --cbf_mode full --seed 12345
  • MB SAC-RCBF:
python main.py --cuda --env SimulatedCars --model_based --updates_per_step 2 --batch_size 512 --rollout_batch_size 5 --real_ratio 0.3 --max_episodes 300 --cbf_mode full --gp_max_episodes 70 --seed 12345
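To repeat Experiment 1 over several seeds, a small driver can generate the commands listed above. This is a hypothetical helper, not part of the repository: the flag sets are copied from the Experiment 1.1 and 1.2 commands, and any seed values other than 12345 are arbitrary choices by the user.

```python
import shlex
import subprocess
from typing import Dict, List, Sequence

# Flag sets copied from the Experiment 1 commands above
CONFIGS: Dict[str, List[str]] = {
    "baseline": ["--cbf_mode", "baseline"],
    "baseline_comp": ["--cbf_mode", "baseline", "--use_comp", "True"],
    "mf_sac_rcbf": ["--cbf_mode", "full"],
    "mb_sac_rcbf": ["--model_based", "--updates_per_step", "2", "--batch_size", "512",
                    "--rollout_batch_size", "5", "--real_ratio", "0.3",
                    "--gp_max_episodes", "70", "--cbf_mode", "full"],
}


def build_commands(env: str, max_episodes: int, seeds: Sequence[int],
                   use_cuda: bool = True) -> List[List[str]]:
    """One argument list per (configuration, seed) pair."""
    cmds = []
    for flags in CONFIGS.values():
        for seed in seeds:
            cmds.append(["python", "main.py"] + (["--cuda"] if use_cuda else [])
                        + ["--env", env] + flags
                        + ["--max_episodes", str(max_episodes), "--seed", str(seed)])
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)


# Usage: Experiment 1.1 uses 200 episodes, Experiment 1.2 uses 300
unicycle_cmds = build_commands("Unicycle", 200, seeds=[12345])
cars_cmds = build_commands("SimulatedCars", 300, seeds=[12345])
```

Keeping `dry_run=True` as the default makes it easy to inspect the generated commands before launching long training runs.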

Experiment 2.1 (Modular Learning - Unicycle)

  • SAC w/o obstacles (performance upper bound):
python main.py --cuda --env Unicycle --cbf_mode off --rand_init True --obs_config none --seed 12345
  • Modular SAC-RCBF:
python main.py --cuda --env Unicycle --cbf_mode mod --rand_init True --seed 12345
  • SAC-RCBF:
python main.py --cuda --env Unicycle --cbf_mode full --rand_init True --seed 12345
  • Baseline:
python main.py --cuda --env Unicycle --cbf_mode baseline --rand_init True --seed 12345
  • Test zero-shot transfer:
python main.py --mode test --validate_episodes 200 --resume [run #] --cbf_mode baseline --env Unicycle --obs_config random --seed 12345

Experiment 2.2 (Modular Learning - Pvtol)

  • SAC w/o obstacles/safety operator (performance upper bound):
python main.py --cuda --env Pvtol --rand_init True --cbf_mode baseline --obs_config none --seed 12345
  • Modular SAC-RCBF:
python main.py --cuda --env Pvtol --rand_init True --cbf_mode mod --seed 12345
  • SAC-RCBF:
python main.py --cuda --env Pvtol --rand_init True --cbf_mode full --seed 12345
  • Baseline:
python main.py --cuda --env Pvtol --rand_init True --cbf_mode baseline --seed 12345
  • Test zero-shot transfer:
python main.py --mode test --validate_episodes 200 --resume [run #] --cbf_mode baseline --env Pvtol --obs_config random --seed 12345

Contributors

yemam3