Giter VIP home page Giter VIP logo

gdrl's Introduction

Grokking Deep Reinforcement Learning

Note: At the moment, only running the code from the docker container (below) is supported. Docker allows for creating a single environment that is more likely to work on all systems. Basically, I install and configure all packages for you, except docker itself, and you just run the code on a tested environment.

To install docker, I recommend a web search for "installing docker on <your os here>". For running the code on a GPU, you have to additionally install nvidia-docker. NVIDIA Docker allows for using a host's GPUs inside docker containers. After you have docker (and nvidia-docker if using a GPU) installed, follow the three steps below.

Running the code

  1. Clone this repo: git clone --depth 1 https://github.com/mimoralea/gdrl.git && cd gdrl
  2. Pull the gdrl image with: docker pull mimoralea/gdrl:v0.12
  3. Spin up a container: docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.12 (remember to use nvidia-docker if you are using a GPU.)
  4. Open a browser and go to the URL shown in the terminal (likely to be: http://localhost:8888). The password is: gdrl

About the book

Book's website

https://www.manning.com/books/grokking-deep-reinforcement-learning

Table of content

  1. Introduction to deep reinforcement learning
  2. Mathematical foundations of reinforcement learning
  3. Balancing immediate and long-term goals
  4. Balancing the gathering and utilization of information
  5. Evaluating agents' behaviors
  6. Improving agents' behaviors
  7. Achieving goals more effectively and efficiently
  8. Introduction to value-based deep reinforcement learning
  9. More stable value-based methods
  10. Sample-efficient value-based methods
  11. Introduction to policy-based deep reinforcement learning
  12. Parallelizing policy-based methods
  13. Deterministic policy gradient methods
  14. Conservative policy optimization methods
  15. Towards artificial general intelligence

Detailed table of content

1. Introduction to deep reinforcement learning

2. Mathematical foundations of reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementations of several MDPs:
      • Bandit Walk
      • Bandit Slippery Walk
      • Slippery Walk Three
      • Random Walk
      • Russell and Norvig's Gridworld from AIMA
      • FrozenLake
      • FrozenLake8x8

3. Balancing immediate and long-term goals

  • (Livebook)
  • (Notebook)
    • Implementations of methods for finding optimal policies:
      • Policy Evaluation
      • Policy Improvement
      • Policy Iteration
      • Value Iteration

4. Balancing the gathering and utilization of information

  • (Livebook)
  • (Notebook)
    • Implementations of exploration strategies for bandit problems:
      • Random
      • Greedy
      • E-greedy
      • E-greedy with linearly decaying epsilon
      • E-greedy with exponentially decaying epsilon
      • Optimistic initialization
      • SoftMax
      • Upper Confidence Bound
      • Bayesian

5. Evaluating agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the prediction problem (policy estimation):
      • On-policy first-visit Monte-Carlo prediction
      • On-policy every-visit Monte-Carlo prediction
      • Temporal-Difference prediction (TD)
      • n-step Temporal-Difference prediction (n-step TD)
      • TD(λ)

6. Improving agents' behaviors

  • (Livebook)
  • (Notebook)
    • Implementation of algorithms that solve the control problem (policy improvement):
      • On-policy first-visit Monte-Carlo control
      • On-policy every-visit Monte-Carlo control
      • On-policy TD control: SARSA
      • Off-policy TD control: Q-Learning
      • Double Q-Learning

7. Achieving goals more effectively and efficiently

  • (Livebook)
  • (Notebook)
    • Implementation of more effective and efficient reinforcement learning algorithms:
      • SARSA(λ) with replacing traces
      • SARSA(λ) with accumulating traces
      • Q(λ) with replacing traces
      • Q(λ) with accumulating traces
      • Dyna-Q
      • Trajectory Sampling

8. Introduction to value-based deep reinforcement learning

  • (Livebook)
  • (Notebook)
    • Implementation of a value-based deep reinforcement learning baseline:
      • Neural Fitted Q-iteration (NFQ)

9. More stable value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of "classic" value-based deep reinforcement learning methods:
      • Deep Q-Networks (DQN)
      • Double Deep Q-Networks (DDQN)

10. Sample-efficient value-based methods

  • (Livebook)
  • (Notebook)
    • Implementation of main improvements for value-based deep reinforcement learning methods:
      • Dueling Deep Q-Networks (Dueling DQN)
      • Prioritized Experience Replay (PER)

11. Introduction to policy-based deep reinforcement learning

  • (No Livebook yet)
  • (No Notebook yet)
    • Implementation of classic policy-based deep reinforcement learning methods:
      • Policy Gradients without value function and Monte-Carlo returns (REINFORCE)
      • Policy Gradients with value function baseline trained with Monte-Carlo returns (VPG)

12. Parallelizing policy-based methods

  • (No Livebook yet)
  • (No Notebook yet)
    • Implementation of main improvements to policy-based deep reinforcement learning methods:
      • Asynchronous Advantage Actor-Critic (A3C)
      • Generalized Advantage Estimation (GAE)
      • [Synchronous] Advantage Actor-Critic (A2C)

13. Deterministic policy gradient methods

  • (No Livebook yet)
  • (No Notebook yet)
    • Implementation of deterministic policy gradient deep reinforcement learning methods:
      • Deep Deterministic Policy Gradient (DDPG)
      • Twin Delayed Deep Deterministic Policy Gradient (TD3)

14. Conservative policy optimization methods

  • (No Livebook yet)
  • (No Notebook yet)
    • Implementation of conservative policy gradient deep reinforcement learning methods:
      • Trust Region Policy Optimization (TRPO)
      • Proximal Policy Optimization (PPO)

15. Towards artificial general intelligence

  • (No Livebook yet)
  • (No Notebook)

gdrl's People

Contributors

mimoralea avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.