vikrams169 / using-rl-in-cartpole-v0

This repository demonstrates the use of reinforcement learning, specifically Deep Q-Learning, REINFORCE, and Actor-Critic (A2C) methods, to play CartPole-v0 from OpenAI Gym.

Python 100.00%

using-rl-in-cartpole-v0's Introduction

Using-RL-in-CartPole-v1

This repository demonstrates the use of reinforcement learning, specifically Deep Q-Learning, REINFORCE, and Advantage Actor-Critic (A2C) methods, to play CartPole-v1 from OpenAI Gym.

The CartPole environment is explained in more detail in its source code here.
In this environment, a pole is attached to a cart that moves along a frictionless track, and the goal is to keep the pole upright for as long as possible. The reward is +1 for every timestep the pole stays up, so there are no negative rewards. The episode ends if the pole tilts more than about 12 degrees from the vertical (the threshold in Gym's CartPole source) or the cart moves more than 2.4 units from the center. There are only two possible actions at every timestep: push the cart left or push it right.
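The reward and termination rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Gym's code: the function names are hypothetical, and the default limits (12 degrees, 2.4 units) are assumptions based on Gym's CartPole source that should be checked against the installed version.

```python
import math

def is_terminal(cart_position, pole_angle_rad,
                angle_limit_deg=12.0, position_limit=2.4):
    """Return True when the pole has tilted too far or the cart left the track.

    The default limits are assumptions taken from Gym's CartPole source.
    """
    angle_limit_rad = math.radians(angle_limit_deg)
    return (abs(cart_position) > position_limit
            or abs(pole_angle_rad) > angle_limit_rad)

def episode_return(states):
    """Sum +1 per timestep, stopping at the first terminal state.

    `states` is a sequence of (cart_position, pole_angle_rad) pairs,
    one pair observed after each action.
    """
    total = 0
    for cart_position, pole_angle_rad in states:
        total += 1  # the step that ends the episode is also rewarded
        if is_terminal(cart_position, pole_angle_rad):
            break
    return total
```

Because every surviving timestep is worth +1, the total reward of an episode equals its length, which is why reward and episode-length curves track each other in this environment.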
The environment is solved with the objective of maximizing total reward, using three deep reinforcement learning techniques. All three use a neural network function approximator with the same architecture, mapping from state to action values or policy, and each is trained for 5,000 episodes.
The three techniques, compared below, are:

1. Deep QLearning Method

Uses experience replay, with a bootstrapped update at every timestep.
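As a rough illustration of experience replay with bootstrapping, the sketch below stores transitions in a fixed-size buffer and computes a one-step bootstrapped target. It is not taken from the repository: `ReplayBuffer`, `td_target`, the capacity, and the discount factor `gamma` are all hypothetical choices, and the Q-values are passed in as plain lists rather than produced by a network.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10000):
        # Oldest transitions are discarded once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks the correlation
        # between consecutive timesteps.
        k = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), k)

    def __len__(self):
        return len(self.memory)

def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step bootstrapped target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_values)
```

In a training loop, one transition would be pushed per timestep and a minibatch sampled for a gradient step that moves Q(s, a) toward `td_target`.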
(Plot: average total rewards and episode lengths for Deep Q-Learning.)

2. REINFORCE Method

Uses a policy gradient technique with every-visit Monte Carlo sampling, updating at the end of each episode.
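The Monte Carlo part of REINFORCE can be sketched as follows: once an episode ends, the discounted return from every timestep is computed and used to weight that step's log-probability. The function names and the discount factor `gamma` are assumptions for illustration, and the log-probabilities are plain floats here rather than outputs of the policy network.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every timestep of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def reinforce_loss(log_probs, returns):
    """Policy gradient loss: minus the sum over t of log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

With an automatic differentiation library, minimizing this loss raises the probability of actions that were followed by high returns.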
(Plot: average rewards and episode lengths for REINFORCE.)

3. Advantage Actor-Critic (A2C) Method

A single network architecture maps the state to both a value and a policy. The resulting advantages are used in place of returns in the policy gradient and Q-learning-style updates.
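One common way to form the advantage is the one-step temporal-difference error, sketched below. This is an assumption about the general technique, not necessarily the exact estimator used in this repository; the function names, `gamma`, and `value_coef` are hypothetical.

```python
def one_step_advantage(reward, value, next_value, done, gamma=0.99):
    """Advantage estimate A = r + gamma * V(s') - V(s)."""
    # Do not bootstrap from the value of a terminal state.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value

def actor_critic_loss(log_prob, advantage, value_coef=0.5):
    """Combined loss for a network with a policy head and a value head."""
    # Policy term: the advantage replaces the Monte Carlo return.
    # In an autodiff framework it would be treated as a constant here.
    policy_loss = -log_prob * advantage
    # Value term: squared error between the bootstrapped target and V(s).
    value_loss = advantage ** 2
    return policy_loss + value_coef * value_loss
```

Subtracting the value estimate acts as a baseline, which lowers the variance of the policy gradient compared with using raw returns.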
(Plot: average rewards and episode lengths for A2C.)
As can be seen, the Deep Q-Learning and REINFORCE methods give similar results (not always the case, but true here), while the actor-critic method does much better, in this case reaching almost twice the reward, as reported in the A3C paper by Google DeepMind!
