

Practical_RL

An open course on reinforcement learning in the wild. Taught on-campus at HSE and YSDA and maintained to be friendly to online students (both English and Russian).

Manifesto:

  • Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
  • Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you “feel” it on a practical problem.
  • Git-course. Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for alternative framework? You're awesome! Pull-request it!


Course info

Additional materials

Syllabus

The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

  • week01_intro Introduction

    • Lecture: RL problems around us. Decision processes. Stochastic optimization, Crossentropy method. Parameter space search vs action space search.
    • Seminar: Welcome to OpenAI Gym. Tabular CEM for Taxi-v0, deep CEM for Box2D environments.
    • Homework description - see week1/README.md.
  • week02_value_based Value-based methods

    • Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
    • Seminar: Value iteration.
    • Homework description - see week2/README.md.
  • week03_model_free Model-free reinforcement learning

    • Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(Lambda).
    • Seminar: Q-learning vs SARSA vs Expected Value SARSA
    • Homework description - see week3/README.md.
  • recap_deep_learning - deep learning recap

    • Lecture: Deep learning 101
    • Seminar: Intro to pytorch/tensorflow, simple image classification with convnets
  • week04_approx_rl Approximate (deep) RL

    • Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
    • Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
  • week05_explore Exploration

    • Lecture: Contextual bandits. Thompson Sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
    • Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
  • week06_policy_based Policy Gradient methods

    • Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/crossentropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
    • Seminar: REINFORCE, advantage actor-critic
  • week07_seq2seq Reinforcement Learning for Sequence Models

    • Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping
    • Seminar: character-level RNN language model
  • week08_pomdp Partially Observed MDP

    • Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
    • Seminar: Deep kung-fu & doom with recurrent A3C and DRQN
  • week09_policy_II Advanced policy-based methods

    • Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG
    • Seminar: Approximate TRPO for simple robot control.
  • week10_planning Model-based RL & Co

    • Lecture: Model-Based RL, Planning in General, Imitation Learning and Inverse Reinforcement Learning
    • Seminar: MCTS for toy tasks
  • yet_another_week Inverse RL and Imitation Learning

    • All that cool RL stuff that you won't learn from this course :)

Course staff

Course materials and teaching by: [unordered]

Contributions

practical_rl's People

Contributors

ai-ahmed, alexeyhorkin, alien-kz, anton-br, arogozhnikov, dmittov, dniku, fritz449, guitaricet, hikjik, jheuristic, justheuristic, kharitonov-ivan, kirili4ik, kventinel, laktionov, mknbv, nickveld, nkdhny, omrigan, q0o0p, qwasser, razoralm, re9ulus, recycletechno, scitator, tigerneil, vovcick, yhn112, zshrav


practical_rl's Issues

Many videos are removed

Hi,

Thank you for the awesome course!!!! Unfortunately many of the videos show the following:

Nothing found
The owner either removed the files or restricted access, or there's a typo in the link.

Thank you for fixing it,
-Simon

Bugs in Coursera Grading

There are various bugs / problems in the grading mechanics. Individual descriptions of the problems can be found in the course forums.

I believe it would be very beneficial to give a quick response in the forums, acknowledging the bugs, and giving people a rough idea when they can expect a fix. Otherwise students will constantly be wondering if it's a problem in the grading or a problem in their solution.

Thanks a lot : )

Docker notebook raises PermissionError

I used Docker to set up the environment; when I open the notebook it raises the error below:

Server error: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/handlers.py", line 112, in get
    path=path, type=type, format=format, content=content,
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/filemanager.py", line 431, in get
    model = self._dir_model(path, content=content)
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/filemanager.py", line 313, in _dir_model
    for name in os.listdir(os_dir):
PermissionError: [Errno 13] Permission denied: '/notebooks'

`week3_model_free/submit.py` has bugged `submit_qlearning1` and `submit_qlearning2`

The functions submit_qlearning1 and submit_qlearning2 are broken: they keep returning an error on submission. The workaround is the following:

  • After the first part of the assignment substitute
    from submit import submit_qlearning1
    submit_qlearning1(rewards, <YOUR-EMAIL>, <TOKEN>)
    
    with
    rewards1 = rewards.copy()
    
  • After the second part of the assignment, substitute
    from submit import submit_qlearning2
    submit_qlearning2(rewards,  <YOUR-EMAIL>, <TOKEN>)
    
    with
    rewards2 = rewards.copy()
    
  • In the end, run
    from submit import submit_qlearning_all
    submit_qlearning_all(rewards1, rewards2, <YOUR-EMAIL>, <TOKEN>)
    
    because submit_qlearning_all does not seem to be affected by the bug.

This workaround was suggested to me by Jay Glascoe on the Coursera forum.

Installing dependencies

Any issues concerning installation can just as well be sent here.

We assume that you have the basic data science toolkit (sklearn, numpy/scipy/pandas): basically, whatever comes with the default Anaconda distribution.

The majority of course assignments use OpenAI Gym.

If you don't or can't install it (e.g. you use Windows and installation is tricky), there's a Docker container contributed to the course.
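
As a quick sanity check that the toolkit and gym are installed correctly, a short snippet along these lines should run without errors (CartPole-v0 is just an example environment; the 4-tuple step API shown is the classic one used throughout the course):

import numpy as np
import gym

# Make any registered environment and take one random action.
env = gym.make("CartPole-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("observation shape:", np.shape(obs), "| reward:", reward, "| done:", done)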

Deep learning

You will also need one of the following three deep learning stacks: TensorFlow, PyTorch, or Theano with Lasagne (the assignments ship in all three flavors).

The frameworks are easy to install on macOS and Linux. Windows installation is a bit tougher, so if you don't feel like dealing with it, try using Docker (e.g. the Kitematic GUI or the console on Windows).
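
A quick way to see which of these stacks is available in your environment (a small sketch; the module names are simply the standard import names of each framework):

# Print which deep learning frameworks can be imported, and their versions.
for module_name in ("torch", "tensorflow", "theano"):
    try:
        module = __import__(module_name)
        print(module_name, getattr(module, "__version__", "unknown version"))
    except ImportError:
        print(module_name, "is not installed")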

Install docker

Pull the Docker image: https://hub.docker.com/r/justheuristic/practical_rl
(or just run docker pull justheuristic/practical_rl if you have a Docker shell)

If you want to build it yourself, use these instructions.

If you run into any trouble, feel free to post here, even if it's along the lines of "I don't know what the hell all these letters mean!!!".

Week2 MCTS Speed-Up Question

I was looking at the Gitter channel, which doesn't seem to be monitored anymore, so I hope asking a question here is OK. Otherwise please feel free to close this issue.

I'm currently working on Week 2 with MCTS and have been able to get it running with basic UCB1. However, as expected, on most games the rollouts take too long to be meaningful, so MCTS won't really make progress without a very large time budget.

In the assignment it was suggested to use a classifier for action selection, but doesn't that defeat the point of using MCTS in the first place, since MCTS relies on playing out random rollouts?

Seq2seq

Using int8 in as_matrix breaks everything if the number of distinct tokens exceeds what int8 can represent!
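
A minimal illustration of the problem (the token ids below are hypothetical; the actual as_matrix code lives in the course notebook): once the vocabulary has ids above 127, casting to int8 silently wraps them around.

import numpy as np

token_ids = [3, 120, 200, 1000]                # hypothetical ids from a vocabulary larger than 127 tokens
wrapped = np.array(token_ids).astype(np.int8)   # ids above 127 wrap around to wrong (negative) values
safe = np.array(token_ids, dtype=np.int32)      # int32 comfortably covers realistic vocabulary sizes
print(wrapped)  # [  3 120 -56 -24] -- corrupted ids
print(safe)     # [   3  120  200 1000]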

Broken links

I used the following two commands to identify broken links. markdown-link-check is https://github.com/tcort/markdown-link-check

find ./Practical_RL/ -type f -name '*.ipynb' -exec jupyter nbconvert --to markdown {} \;
find ./Practical_RL/ -name \*.md -exec markdown-link-check -q {} \; > link_check.txt

This is the list of broken links:

FILE: ./Practical_RL//week4_approx_rl/seminar_pytorch.md

FILE: ./Practical_RL//week4_approx_rl/homework_lasagne.md

FILE: ./Practical_RL//week4_approx_rl/README.md

FILE: ./Practical_RL//week4_approx_rl/seminar_tf.md

FILE: ./Practical_RL//week7_pomdp/practice_tensorflow.md

FILE: ./Practical_RL//week7_pomdp/homework_common_part2.md

FILE: ./Practical_RL//week8_scst/bonus.md

FILE: ./Practical_RL//week8_scst/README.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_pytorch.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_tensorflow.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_theano.md

FILE: ./Practical_RL//week2_value_based/README.md

FILE: ./Practical_RL//week2_value_based/seminar2_MCTS.md

FILE: ./Practical_RL//week2_value_based/seminar1_VI.md

FILE: ./Practical_RL//yet_another_week/README.md

FILE: ./Practical_RL//week6_policy_based/homework_tensorflow.md

FILE: ./Practical_RL//week6_policy_based/README.md

FILE: ./Practical_RL//week3_model_free/README.md

FILE: ./Practical_RL//week3_model_free/homework.md

Related issues: #78.

Week 4 Seminar: Does not improve

Sorry for creating another issue, but I'm having trouble with the basic problem in the Week 4 Seminar, specifically the TensorFlow version. It's a nicely made notebook and relatively straightforward to fill out, yet when I actually try to train, the mean reward hovers around 15 no matter how long I train. I tried increasing epsilon and adding more nodes/layers to the network, to no avail. Judging from the videos, it seems like the network isn't learning anything. Are there any additional hints someone could give? I definitely appreciate the note in the notebook about deep RL being f***ed up, lol.

Issue with overflow in practice_tensorflow.ipynb

I don't have the best grasp of this stuff so forgive me if this is just a misunderstanding.

So in this notebook (https://github.com/yandexdataschool/Practical_RL/blob/master/week4_%5Brecap%5D_deep_learning/practice_tensorflow.ipynb), in the very first few code cells we write a function and then use TF to speed it up. However, as written, the cumulative sum of squares up to 10**8 results in an integer quite a bit larger than can be encoded in int64. As such, both TF and naive numpy give the wrong answer, and the speed comparison is misleading.

Maybe the operation should be changed to something that produces smaller numbers. That would retain the large number of iterations (10**8, for example) to show the dramatic speedup. Alternatively, 10**6 iterations is fine with the sum of squares.
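
To back this up with quick arithmetic (plain Python, so the reference value itself cannot overflow): the exact sum of squares below 10**8 is about 3.3e23, far past the int64 maximum of roughly 9.2e18, so any int64 accumulator has to overflow; with 10**6 terms the sum is about 3.3e17 and fits.

import numpy as np

n = 10**8
exact = (n - 1) * n * (2 * n - 1) // 6    # closed form for sum(i**2 for i in range(n)), exact in Python ints
print(exact)                               # ~3.33e23
print(np.iinfo(np.int64).max)              # ~9.22e18
print(exact > np.iinfo(np.int64).max)      # True: an int64 accumulator (numpy or TF) overflows here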

week6.5 review & fix

Could you please

  • Look through the edited week6.5 notebooks and see if there's anything missing
  • Assign points to homework assignment parts in the readme

dns_server_failure when trying to open jupyter URL in browser

I run Docker on my Ubuntu machine like this:
$ sudo docker run -it -v $(pwd):/notebooks -p 8888:8888 justheuristic/practical_rl sh ../run_jupyter.sh

In the output I saw this suggestion:
Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://9cd070fb79bf:8888/?token=ad1a5a0aab43efb47a9a805388fcf508d0b5f84a16e4542b&token=ad1a5a0aab43efb47a9a805388fcf508d0b5f84a16e4542b

My browser reacts:

Network Error (dns_server_failure)
Your request could not be processed because an error occurred contacting the DNS server.
The DNS server may be temporarily unavailable, or there could be a network problem.

Did anyone face this problem?

week1 gym leaderboard - wait for deadline and remove

As reported by many HSE students, this line in the week01 homework is no longer valid:

Please upload the results to openai gym and send links to all submissions in the e-mail

We can't change it right away because it would cause lots of merge conflicts. The solution is to wait until the 6th of February (after the deadline) and fix it then.

Emails vs anytask?

There are several mentions of emails on the course page. I think those should be removed, since anytask invitations are available now.

Do you provide solutions?

Hi, thanks for developing such a wonderful course for beginners. I have finished some of the tasks; do you provide solutions to these tasks so that I can check my homework? Thank you!

CartPole-v0

I suggest adding a note to the homework that there is sometimes a limit of max reward = 200.
To remove that limit, instead of
env = gym.make("CartPole-v0")
it should be
env = gym.make("CartPole-v0").env

I guess this has been said orally somewhere, but without this info it took an extra hour to find out what was wrong.
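
For context, the 200-step cap comes from the TimeLimit wrapper that gym registers for CartPole-v0, and .env strips that wrapper. A minimal check (assuming a gym version from the course era):

import gym

wrapped = gym.make("CartPole-v0")   # TimeLimit wrapper: episodes (and thus reward) are capped at 200 steps
unwrapped = wrapped.env             # the raw CartPoleEnv without the 200-step cap
print(type(wrapped).__name__, "->", type(unwrapped).__name__)   # e.g. TimeLimit -> CartPoleEnv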

week4 dqn overhaul

I'm taking this class on Coursera and am stuck on the DQN assignment for Breakout. I was able to implement the code (though I don't know whether it's correct or not), but training takes a very long time and does not seem to converge. According to the description it should reach a mean reward over 10 at around 10k steps; however, even after 100k steps the mean reward still fluctuates a lot, sometimes around 10, sometimes around 0.

Here is a screenshot of my training curves. [screenshot omitted]

Could you provide a rough sense of what these two figures should look like?

We also raised this question in the class forum:
https://www.coursera.org/learn/practical-rl/discussions/weeks/4/threads/yzC8W14LEei7pAoHCSt0dA/replies/78XCWF4REeiosBJ671zJCg/comments/-qc_Xl9_EeiYTgr_SihX-A

It would be great if someone could help us out. Thanks

issues with gym

If there's something wrong with openai gym and chat didn't resolve it in 10 minutes, feel free to complain here.

joblib+gym

joblib generates identical sessions if games are run in multiple threads. A simple notebook that reproduces the problem on my machine is attached.

Notebook on gist
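
One possible workaround (my assumption, not taken from the attached notebook): explicitly re-seed both numpy and the environment inside each worker, since fork-based joblib backends copy the parent's RNG state into every worker and otherwise produce identical rollouts.

import numpy as np
import gym
from joblib import Parallel, delayed

def play_session(seed):
    # Re-seed inside the worker so parallel sessions diverge instead of repeating each other.
    np.random.seed(seed)
    env = gym.make("CartPole-v0")
    env.seed(seed)                                      # seeds the environment's own RNG
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = np.random.randint(env.action_space.n)  # random policy driven by the re-seeded RNG
        _, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

# Distinct seeds per session -> distinct rollouts.
print(Parallel(n_jobs=2)(delayed(play_session)(i) for i in range(4)))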

Multiple Dockerfiles

There is a Dockerfile in the root of the repo.

There is also a separate Dockerfile in the docker/ folder.

Which one is correct? Is it possible to run the repo in mybinder and everware with the correct Dockerfile?

Got a token but cannot open the URL in my browser

PS C:\Users\Nathaniel> docker run 491528bcff41
Unable to find image '491528bcff41:latest' locally
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from daemon: pull access denied for 491528bcff41, repository does not exist or may require 'docker login'.
See 'C:\Program Files\Docker\Docker\Resources\bin\docker.exe run --help'.
PS C:\Users\Nathaniel> docker run justheuristic/practical_rl:latest
[I 13:27:41.201 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:27:41.379 NotebookApp] Serving notebooks from local directory: /notebooks
[I 13:27:41.379 NotebookApp] 0 active kernels
[I 13:27:41.379 NotebookApp] The Jupyter Notebook is running at:
[I 13:27:41.379 NotebookApp] http://6084fd93b11e:8888/?token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972
[I 13:27:41.379 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:27:41.380 NotebookApp]

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://6084fd93b11e:8888/?token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972&token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972

week1 - MountainCar-v0

I started doing this class this week and I really like how well everything is thought out for self-learning; the notebooks are great. I feel like I got a good grasp of the crossentropy method after completing the notebook. However, when I then tried to apply what I learned to MountainCar-v0, I got quite stuck. It seems like the initial policy initialization never lets the agent get the car onto the mountain even once, so our elite states never contain any meaningful progress we can learn from.

Are there any hints on how to overcome this challenge?

Kernel silent death on remote machine (xvfb?)

On a remote machine with no display, the following actions cause problems in the Python output: the kernel dies every time matplotlib.pyplot or tqdm is called. env.render() from gym also causes kernel death.

The algorithm (copied from the week4 seminar):

import os
# Start a virtual display (xvfb) if the machine has no real display.
if type(os.environ.get("DISPLAY")) is not str or len(os.environ.get("DISPLAY")) == 0:
    !bash ../xvfb start
    %env DISPLAY=:1

import gym
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# for now kernel is still alive. plt. commands work fine

env = gym.make("CartPole-v0")
env.reset()
n_actions = env.action_space.n
state_dim = env.observation_space.shape

plt.imshow(env.render("rgb_array", close=True))

The kernel dies on the last line. After a restart, matplotlib also causes kernel death. Restarting the whole Jupyter server doesn't help either; neither does creating a new conda env.
The only fix I have found so far is uninstalling Anaconda and installing all the needed packages again.

That is definitely not something that should happen on any remote machine.

Week 3 homework, Part I: On-policy learning and SARSA

I face an error from pandas whenever I run the cell containing:
from IPython.display import clear_output
from pandas import ewma, Series

moving_average = lambda ts, span=100: ewma(Series(ts), min_periods=span//10, span=span).values

rewards_sarsa, rewards_ql = [], []

for i in range(5000):
    rewards_sarsa.append(play_and_train(env, agent_sarsa))
    rewards_ql.append(play_and_train(env, agent_ql))
    # Note: agent.epsilon stays constant

    if i % 100 == 0:
        clear_output(True)
        print('EVSARSA mean reward =', np.mean(rewards_sarsa[-100:]))
        print('QLEARNING mean reward =', np.mean(rewards_ql[-100:]))
        plt.title("epsilon = %s" % agent_ql.epsilon)
        plt.plot(moving_average(rewards_sarsa), label='ev_sarsa')
        plt.plot(moving_average(rewards_ql), label='qlearning')
        plt.grid()
        plt.legend()
        plt.ylim(-500, 0)
        plt.show()

The error message is:

ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
      1 from IPython.display import clear_output
----> 2 from pandas import ewma, Series
      3 # from pandas import Dataframe.ewm as ewm
      4 moving_average = lambda ts, span=100: ewma(Series(ts), min_periods=span//10, span=span).values
      5

ImportError: cannot import name 'ewma'

I tried googling it but I can't find a fix for it.

Using Python 3.5 and pandas 0.23.0 on Ubuntu 16.04.
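
pandas.ewma was deprecated and later removed (it is gone in 0.23, exactly the version above), hence the ImportError. A drop-in replacement for the notebook's moving_average helper using the Series.ewm accessor (a sketch, matching the arguments used in the failing cell):

from pandas import Series

# Same smoothing as the old ewma-based helper, written against the modern pandas API.
moving_average = lambda ts, span=100: Series(ts).ewm(min_periods=span // 10, span=span).mean().values

print(moving_average(list(range(200)))[-3:])   # quick check on dummy data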
