rl-perturbed-reward's Introduction

RL with Perturbed Rewards

This is the TensorFlow implementation of Reinforcement Learning with Perturbed Rewards, as described in the following AAAI 2020 paper (Spotlight):

@inproceedings{wang2020rlnoisy,
  title={Reinforcement Learning with Perturbed Rewards},
  author={Wang, Jingkang and Liu, Yang and Li, Bo},
  booktitle={AAAI},
  year={2020}
}

The implementation is based on the keras-rl and OpenAI Baselines frameworks. Thanks to the original authors!

  • gym-control: Classic control games
  • gym-atari: Atari-2600 games

Dependencies

  • python 3.5
  • tensorflow 1.10.0, keras 2.1.0
  • gym, scipy, joblib, keras
  • progressbar2, mpi4py, cloudpickle, opencv-python, h5py, pandas

Note: make sure that the baselines package and the other dependencies are installed as follows (using virtualenvwrapper to create a virtual environment):

mkvirtualenv rl-noisy --python=/usr/bin/python3
pip install -r requirements.txt
cd gym-atari/baselines
pip install -e .

Examples

  • Classic control (DQN on Cartpole)
cd gym-control
python cem_cartpole.py                                           # true reward
python dqn_cartpole.py --error_positive 0.1 --reward noisy       # perturbed reward
python dqn_cartpole.py --error_positive 0.1 --reward surrogate   # surrogate reward (estimated)
  • Atari-2600 (PPO on Phoenix)
cd gym-atari/baselines
# true reward
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \
       --num_timesteps=5e7 --normal=True
# noisy reward
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \
       --num_timesteps=5e7 --save_path=logs-phoenix/phoenix/ppo2_50M_noisy_0.2 \
       --weight=0.2 --normal=False --surrogate=False --noise_type=anti_iden
# surrogate reward (estimated)
python -m baselines.run --alg=ppo2 --env=PhoenixNoFrameskip-v4 \
       --num_timesteps=5e7 --save_path=logs-phoenix/phoenix/ppo2_50M_noisy_0.2 \
       --weight=0.2 --normal=False --surrogate=True --noise_type=anti_iden
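
The difference between the noisy and surrogate settings can be illustrated with a small sketch of the binary-reward case. It is not taken from this repository's code: the function names, the reward values (+1 and -1), and the flip rates are assumptions chosen for illustration, and the flip rates are treated as known here, whereas the surrogate runs above estimate them. Under those assumptions, the surrogate reward is built so that its expectation over the noise equals the true reward.

```python
import random


def perturb(reward, r_pos, r_neg, e_pos, e_neg, rng=random):
    """Flip a binary reward: r_pos -> r_neg with prob. e_pos, r_neg -> r_pos with prob. e_neg."""
    if reward == r_pos:
        return r_neg if rng.random() < e_pos else r_pos
    return r_pos if rng.random() < e_neg else r_neg


def surrogate(noisy_reward, r_pos, r_neg, e_pos, e_neg):
    """Surrogate reward for an observed binary reward, given (assumed known) flip rates."""
    denom = 1.0 - e_pos - e_neg
    if abs(denom) < 1e-12:
        raise ValueError("e_pos + e_neg must not equal 1")
    if noisy_reward == r_pos:
        return ((1.0 - e_neg) * r_pos - e_pos * r_neg) / denom
    return ((1.0 - e_pos) * r_neg - e_neg * r_pos) / denom


def expected_surrogate(true_reward, r_pos, r_neg, e_pos, e_neg):
    """Exact expectation of the surrogate reward over the noise, for a given true reward."""
    s_pos = surrogate(r_pos, r_pos, r_neg, e_pos, e_neg)
    s_neg = surrogate(r_neg, r_pos, r_neg, e_pos, e_neg)
    if true_reward == r_pos:
        return (1.0 - e_pos) * s_pos + e_pos * s_neg
    return (1.0 - e_neg) * s_neg + e_neg * s_pos


if __name__ == "__main__":
    # Hypothetical setting: rewards in {+1, -1}, both flip rates 0.1.
    noisy = perturb(1.0, 1.0, -1.0, 0.1, 0.1)
    print("observed:", noisy, "surrogate:", surrogate(noisy, 1.0, -1.0, 0.1, 0.1))
    print("expected surrogate for true +1:", expected_surrogate(1.0, 1.0, -1.0, 0.1, 0.1))
```

A single surrogate value is larger in magnitude than the true reward (the noise correction rescales it), but averaging over many perturbed observations recovers the true reward, which is what makes it usable as a training signal.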

Reproduce the Results

To reproduce all the results reported in the paper, refer to the scripts/ folders in gym-control and gym-atari:

  • gym-control/scripts
    • Cartpole
      • train-cem.sh (CEM)
      • train-dqn.sh (DQN)
      • train-duel-dqn.sh (Dueling-DQN)
      • train-qlearn.sh (Q-Learning)
      • train-sarsa.sh (Deep SARSA)
    • Pendulum
      • train-ddpg.sh (DDPG)
      • train-naf.sh (NAF)
  • gym-atari/scripts
    • train-alien.sh (Alien)
    • train-carnival.sh (Carnival)
    • train-mspacman.sh (MsPacman)
    • train-phoenix.sh (Phoenix)
    • train-pong.sh (Pong)
    • train-seaquest.sh (Seaquest)
    • train-normal.sh (Training with true rewards)

If you have eight GPUs available (each with more than 8 GB of memory), you can run the *.sh scripts directly, one at a time. Otherwise, follow the instructions inside the scripts to run the experiments individually. Training a policy usually takes one to two days on a GTX 1080 Ti.

cd gym-atari/baselines
sh scripts/train-alien.sh

The logs and models will be saved automatically. We provide results_single.py for getting the averaged scores:

python -m baselines.results_single --log_dir logs-alien
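
For a quick sanity check outside of results_single.py, a rough equivalent can be sketched as below. This is not the repository's script, and it may compute something different: it assumes the log directory contains OpenAI Baselines Monitor files (names ending in monitor.csv, a JSON header line starting with #, then CSV columns r, l, t for episode reward, length, and time). The function name mean_episode_reward is hypothetical.

```python
import csv
import glob
import os


def mean_episode_reward(log_dir):
    """Average the per-episode reward column `r` over all monitor files under log_dir."""
    rewards = []
    pattern = os.path.join(log_dir, "**", "*monitor.csv")
    for path in glob.glob(pattern, recursive=True):
        with open(path, newline="") as f:
            # Skip the JSON metadata line(s) that start with '#'.
            rows = (line for line in f if not line.startswith("#"))
            for row in csv.DictReader(rows):
                rewards.append(float(row["r"]))
    if not rewards:
        raise ValueError("no episode rewards found under " + log_dir)
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    # "logs-alien" matches the directory used in the command above.
    if os.path.isdir("logs-alien"):
        print(mean_episode_reward("logs-alien"))
```

The sketch averages over every recorded episode; averaging only the final episodes of training would need an extra slicing step.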

Citation

Please cite our paper if you use this code in your research work.

Questions/Bugs

Please submit a GitHub issue or contact the authors if you have any questions or find any bugs.

rl-perturbed-reward's Issues

tabular data/ noisy instances/ new datasets

Hi,
thanks for sharing your implementation. I have some questions about it:

  1. Does it also work on tabular data?
  2. Is the code tailored to the datasets used in the paper or can one apply it to any data?
  3. Is it possible to identify the noisy instances (return the noisy IDs or the clean set)?

Thanks!
