spibb's Introduction

Implementation of Safe Policy Improvement with Baseline Bootstrapping and Safe Policy Improvement with Soft Baseline Bootstrapping

This project can be used to reproduce the finite-MDP experiments presented in:

  • the ICML2019 paper: Safe Policy Improvement with Baseline Bootstrapping, by Romain Laroche, Paul Trichelair, and Rémi Tachet des Combes. (SPIBB)
  • the ECML-PKDD2019 paper: Safe Policy Improvement with Soft Baseline Bootstrapping, by Kimia Nadjahi, Romain Laroche, and Rémi Tachet des Combes. (Soft-SPIBB)

For the DQN implementation of SPIBB and Soft-SPIBB, please refer to the git repository at this address.

Prerequisites

The project is implemented in Python 3.5 and requires numpy and scipy.

Usage

We include the following:

  • Libraries of the following algorithms:

    • basic RL,
    • SPIBB:
      • Pi_b-SPIBB,
      • Pi_{\leq b}-SPIBB,
    • Soft-SPIBB:
      • Exact-Soft-SPIBB (1-step or not),
      • Approx-Soft-SPIBB (1-step or not),
    • HCPI:
      • doubly-robust,
      • importance_sampling,
      • weighted_importance_sampling,
      • weighted_per_decision_IS,
      • per_decision_IS,
    • Robust MDP,
    • and Reward-adjusted MDP.
  • Environments:

    • Gridworld environment,
    • Random MDPs environment.
  • SPIBB experiments:

    • Gridworld experiment of Section 3.1. Run:

      python gridworld_main.py #name_of_experiment# #random_seed#

    • Gridworld experiment with random behavioural policy of Section 3.2. Run:

      python gridworld_random_behavioural_main.py #name_of_experiment# #random_seed#

    • Random MDPs experiment of Section 3.3. Run:

      python randomMDPs_main.py #name_of_experiment# #random_seed#

  • Soft-SPIBB Random MDPs experiment of Section 4.1. Run:

    python soft_randomMDPs_main.py #name_of_experiment# #random_seed#
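
To make the Pi_b-SPIBB entry in the list above more concrete, here is a minimal sketch of one greedy policy-improvement step as described in the SPIBB paper. It is not taken from this repository: the function name, the argument names, and the use of a count threshold `n_wedge` to define the bootstrapped set are assumptions made for illustration. On state-action pairs observed fewer than `n_wedge` times the new policy copies the baseline, and the remaining probability mass in each state goes to the best sufficiently observed action.

```python
import numpy as np


def pi_b_spibb_greedy_step(q, pi_b, counts, n_wedge):
    """Illustrative Pi_b-SPIBB greedy step (hypothetical helper, not the repo's API).

    q, pi_b, counts: arrays of shape (n_states, n_actions).
    n_wedge: count threshold below which a state-action pair is bootstrapped.
    """
    q = np.asarray(q, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    # Pairs seen fewer than n_wedge times form the bootstrapped set.
    bootstrapped = np.asarray(counts) < n_wedge

    # Copy the baseline probabilities on bootstrapped pairs, zero elsewhere.
    pi = np.where(bootstrapped, pi_b, 0.0)
    # Mass left in each state to distribute over non-bootstrapped actions.
    free_mass = 1.0 - pi.sum(axis=1)

    # Hide bootstrapped actions so argmax only sees well-observed ones.
    masked_q = np.where(bootstrapped, -np.inf, q)
    best = masked_q.argmax(axis=1)

    # States where every action is bootstrapped keep the baseline unchanged.
    rows = np.nonzero(~bootstrapped.all(axis=1))[0]
    pi[rows, best[rows]] += free_mass[rows]
    return pi
```

In a full algorithm this step would alternate with policy evaluation on the estimated MDP until convergence; the sketch only covers the projection step.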

We DO NOT include the following:

  • The hyper-parameter search (Appendix C.2 of the SPIBB paper): it should be easy to re-implement.
  • The figure generator: it currently has too many special cases to be understandable for a user, and writing one's own visualization tools is not hard.
  • The multi-CPU implementation: its structure depends too heavily on the cluster tools.
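
Since the multi-CPU implementation is not included, one possible way to run several seeds side by side on a single machine is sketched below. It is only an assumption-laden illustration, not the authors' cluster setup: the helper names `build_commands` and `run_all` are hypothetical, and the sketch relies solely on the command-line form shown above (script, experiment name, random seed).

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_commands(script, experiment_name, seeds):
    """Build one command per seed: python <script> <experiment_name> <seed>."""
    return [[sys.executable, script, experiment_name, str(seed)] for seed in seeds]


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    # Threads are enough here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For example, `run_all(build_commands("gridworld_main.py", "my_experiment", range(4)))` would launch seeds 0 to 3 of the Gridworld experiment, where `my_experiment` is a placeholder name.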

License

This project is BSD-licensed.

Reference

Please use the following bibtex entry if you use this code for SPIBB:

@inproceedings{Laroche2019,
    title={Safe Policy Improvement with Baseline Bootstrapping},
    author={Laroche, Romain and Trichelair, Paul and Tachet des Combes, R\'emi},
    booktitle={Proceedings of the 36th International Conference on Machine Learning (ICML)},
    year={2019}
}

Please use the following bibtex entry if you use this code for Soft-SPIBB:

@inproceedings{NadjahiLarocheTachet2019,
    title={Safe Policy Improvement with Soft Baseline Bootstrapping},
    author={Nadjahi, Kimia and Laroche, Romain and Tachet des Combes, R\'emi},
    booktitle={Proceedings of the 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
    year={2019}
}

spibb's Issues

Inquiry about how to generate "npy_filename" for Q_baseline

We notice that the process of generating the file entitled "state_action_val_used_size_5_env_type_0.npy" is missing. Could you please leave us some hints about how to generate the baseline policy?

Your kind help is very much appreciated.
