train-procgen-pytorch's Introduction

Training on Procgen environments with PyTorch

🆕✅🎉 Updated code (10th September 2020): bug fixes and support for recurrent policies.

Introduction

This repository contains code to train a baseline PPO agent on Procgen, implemented in PyTorch.

This implementation is intended to accelerate research on the Procgen environments. It aims to reproduce the results in the Procgen paper. The code is designed for both readability and productivity. I tried to match the code as closely as possible to OpenAI baselines while following ikostrikov's coding style.

Several key points need attention in Procgen because they differ from general RL implementations:

  • Use Xavier uniform initialization for conv layers rather than orthogonal initialization.
  • Do not use observation normalization.
  • Use gradient accumulation to handle the large mini-batch size.
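The gradient accumulation point can be illustrated without any framework. The sketch below is not the repository's code; it is a minimal example on a hypothetical one-parameter linear model, and every name in it is an assumption made for illustration. It shows that summing per-chunk gradients, each weighted by its share of the mini-batch, reproduces the full mini-batch gradient while only one chunk needs to be processed at a time. In PyTorch the same pattern usually means calling backward() on each chunk's scaled loss and calling optimizer.step() once per mini-batch.

```python
# Framework-free sketch of gradient accumulation for the model y = w * x
# trained with mean squared error. All names are illustrative assumptions.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, chunk_size):
    """Accumulate per-chunk gradients so they equal the full mini-batch gradient."""
    n = len(xs)
    grad = 0.0
    for start in range(0, n, chunk_size):
        cx = xs[start:start + chunk_size]
        cy = ys[start:start + chunk_size]
        # Weight each chunk by its share of the mini-batch, so the total is the
        # mean over all samples even when the last chunk is smaller.
        grad += mse_grad(w, cx, cy) * (len(cx) / n)
    return grad


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
w = 0.0
lr = 0.01

for _ in range(200):
    # One parameter update per mini-batch, computed from chunks of 3 samples.
    w -= lr * accumulated_grad(w, xs, ys, chunk_size=3)

full = mse_grad(w, xs, ys)
chunked = accumulated_grad(w, xs, ys, chunk_size=3)
print(f"w = {w:.3f}, full grad = {full:.2e}, chunked grad = {chunked:.2e}")
```

The chunk size only changes peak memory use, not the update itself, which is why it can be tuned to the available GPU memory.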

Training logs for starpilot can be found in logs/procgen/starpilot.

Requirements

  • python>=3.6
  • torch 1.3
  • procgen
  • pyyaml

Train

Use train.py to train the agent in a Procgen environment. It has the following arguments:

  • --exp_name: ID to designate your experiment.
  • --env_name: Name of the Procgen environment.
  • --start_level: Start level for the environment.
  • --num_levels: Number of training levels for the environment.
  • --distribution_mode: Mode of your environment.
  • --param_name: Configuration name for your training. By default, the training loads hyperparameters from config.yml/procgen/param_name.
  • --num_timesteps: Number of total timesteps to train your agent.
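The argument list above could be declared with argparse roughly as follows. This is a sketch, not the actual contents of train.py: the flag names come from the list, but the types and default values are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags; types and defaults are assumed."""
    parser = argparse.ArgumentParser(description="Sketch of the documented train.py flags")
    parser.add_argument("--exp_name", type=str, default="test")
    parser.add_argument("--env_name", type=str, default="starpilot")
    parser.add_argument("--start_level", type=int, default=0)
    parser.add_argument("--num_levels", type=int, default=0)
    parser.add_argument("--distribution_mode", type=str, default="easy")
    parser.add_argument("--param_name", type=str, default="easy")
    parser.add_argument("--num_timesteps", type=int, default=25000000)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--exp_name", "easy-run-200",
    "--env_name", "starpilot",
    "--param_name", "easy-200",
    "--num_levels", "200",
])
print(args)
```

Flags that are not passed fall back to the assumed defaults, so a short command line still yields a complete configuration.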

After you start training your agent, logs and parameters are automatically stored in logs/procgen/env-name/exp-name/.

Try it out

Sample efficiency on easy environments

python train.py --exp_name easy-run-all --env_name ENV_NAME --param_name easy --num_levels 0 --distribution_mode easy --num_timesteps 25000000

Sample efficiency on hard environments

python train.py --exp_name hard-run-all --env_name ENV_NAME --param_name hard --num_levels 0 --distribution_mode hard --num_timesteps 200000000

Generalization on easy environments

python train.py --exp_name easy-run-200 --env_name ENV_NAME --param_name easy-200 --num_levels 200 --distribution_mode easy --num_timesteps 25000000

Generalization on hard environments

python train.py --exp_name hard-run-500 --env_name ENV_NAME --param_name hard-500 --num_levels 500 --distribution_mode hard --num_timesteps 200000000

If your GPU has more than 5GB of memory, increase the mini-batch size to speed up training.
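To run one of the four settings above on several environments, the command lines can be generated instead of typed by hand. The helper below is a hypothetical convenience script, not part of the repository. The preset keys and the function name are assumptions; the flag values are copied from the commands above, and the environment names are examples of Procgen environments.

```python
import shlex

# (exp_name, param_name, num_levels, distribution_mode, num_timesteps),
# copied from the four commands above. The dictionary keys are made up here.
PRESETS = {
    "sample-efficiency-easy": ("easy-run-all", "easy", 0, "easy", 25000000),
    "sample-efficiency-hard": ("hard-run-all", "hard", 0, "hard", 200000000),
    "generalization-easy": ("easy-run-200", "easy-200", 200, "easy", 25000000),
    "generalization-hard": ("hard-run-500", "hard-500", 500, "hard", 200000000),
}


def build_command(preset, env_name):
    """Return the train.py command line for one preset and one environment."""
    exp_name, param_name, num_levels, mode, num_timesteps = PRESETS[preset]
    argv = [
        "python", "train.py",
        "--exp_name", exp_name,
        "--env_name", env_name,
        "--param_name", param_name,
        "--num_levels", str(num_levels),
        "--distribution_mode", mode,
        "--num_timesteps", str(num_timesteps),
    ]
    # Quote each token so the line can be pasted into a shell safely.
    return " ".join(shlex.quote(token) for token in argv)


commands = [build_command("generalization-easy", env) for env in ("starpilot", "coinrun")]
for command in commands:
    print(command)
```

Note that every environment shares the preset's exp_name here, as in the original commands; this relies on the logs being separated by environment name in the log directory.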

TODO

  • Implement Data Augmentation from RAD.
  • Create evaluation code to measure the test performance.

References

[1] PPO: Proximal Policy Optimization Algorithms
[2] GAE: High-Dimensional Continuous Control Using Generalized Advantage Estimation
[3] IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
[4] Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
[5] Leveraging Procedural Generation to Benchmark Reinforcement Learning
