

Practical_RL

An open course on reinforcement learning in the wild. Taught on-campus at HSE and YSDA and maintained to be friendly to online students (both English and Russian).

Manifesto:

  • Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
  • Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you “feel” it on a practical problem.
  • Git-course. Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for alternative framework? You're awesome! Pull-request it!


Course info

Additional materials

Syllabus

The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

  • week01_intro Introduction

    • Lecture: RL problems around us. Decision processes. Stochastic optimization, Crossentropy method. Parameter space search vs action space search.
    • Seminar: Welcome to OpenAI Gym. Tabular CEM for Taxi-v0, deep CEM for Box2D environments.
    • Homework description - see week1/README.md.
  • week02_value_based Value-based methods

    • Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
    • Seminar: Value iteration.
    • Homework description - see week2/README.md.
  • week03_model_free Model-free reinforcement learning

    • Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(Lambda).
    • Seminar: Q-learning vs SARSA vs Expected Value SARSA
    • Homework description - see week3/README.md.
  • recap_deep_learning - deep learning recap

    • Lecture: Deep learning 101
    • Seminar: Intro to pytorch/tensorflow, simple image classification with convnets
  • week04_approx_rl Approximate (deep) RL

    • Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
    • Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
  • week05_explore Exploration

    • Lecture: Contextual bandits. Thompson Sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
    • Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
  • week06_policy_based Policy Gradient methods

    • Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/crossentropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
    • Seminar: REINFORCE, advantage actor-critic
  • week07_seq2seq Reinforcement Learning for Sequence Models

    • Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping
    • Seminar: character-level RNN language model
  • week08_pomdp Partially Observed MDP

    • Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
    • Seminar: Deep kung-fu & doom with recurrent A3C and DRQN
  • week09_policy_II Advanced policy-based methods

    • Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG
    • Seminar: Approximate TRPO for simple robot control.
  • week10_planning Model-based RL & Co

    • Lecture: Model-Based RL, Planning in General, Imitation Learning and Inverse Reinforcement Learning
    • Seminar: MCTS for toy tasks
  • yet_another_week Inverse RL and Imitation Learning

    • All that cool RL stuff that you won't learn from this course :)

Course staff

Course materials and teaching by: [unordered]

Contributions

practical_rl's People

Contributors

ai-ahmed, alexeyhorkin, alien-kz, anton-br, arogozhnikov, dmittov, dniku, fritz449, guitaricet, hikjik, jheuristic, justheuristic, kharitonov-ivan, kirili4ik, kventinel, laktionov, mknbv, nickveld, nkdhny, omrigan, q0o0p, qwasser, razoralm, re9ulus, recycletechno, scitator, tigerneil, vovcick, yhn112, zshrav


practical_rl's Issues

Many videos are removed

Hi,

Thank you for the awesome course!!!! Unfortunately many of the videos show the following:

Nothing found
The owner either removed the files or restricted access, or there's a typo in the link.

Thank you for fixing it,
-Simon

Bugs in Coursera Grading

There are various bugs / problems in the grading mechanics. Individual descriptions of the problems can be found in the course forums.

I believe it would be very beneficial to give a quick response in the forums, acknowledging the bugs, and giving people a rough idea when they can expect a fix. Otherwise students will constantly be wondering if it's a problem in the grading or a problem in their solution.

Thanks a lot : )

Docker notebook raises PermissionError

I used Docker to set up the environment; when I open the notebook it raises the error below:

Server error: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/local/lib/python3.5/dist-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/handlers.py", line 112, in get
    path=path, type=type, format=format, content=content,
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/filemanager.py", line 431, in get
    model = self._dir_model(path, content=content)
  File "/usr/local/lib/python3.5/dist-packages/notebook/services/contents/filemanager.py", line 313, in _dir_model
    for name in os.listdir(os_dir):
PermissionError: [Errno 13] Permission denied: '/notebooks'

`week3_model_free/submit.py` has bugged `submit_qlearning1` and `submit_qlearning2`

The functions submit_qlearning1 and submit_qlearning2 are broken: they keep returning an error on submission. The workaround is the following:

  • After the first part of the assignment substitute
    from submit import submit_qlearning1
    submit_qlearning1(rewards, <YOUR-EMAIL>, <TOKEN>)
    
    with
    rewards1 = rewards.copy()
    
  • After the second part of the assignment, substitute
    from submit import submit_qlearning2
    submit_qlearning2(rewards,  <YOUR-EMAIL>, <TOKEN>)
    
    with
    rewards2 = rewards.copy()
    
  • In the end, run
    from submit import submit_qlearning_all
    submit_qlearning_all(rewards1, rewards2, <YOUR-EMAIL>, <TOKEN>)
    
    because submit_qlearning_all does not seem to be affected by the bug.

This workaround was suggested to me by Jay Glascoe on the Coursera forum.

Installing dependencies

Any issues concerning installation can just as well be sent here.

We assume that you have the basic data science toolkit (sklearn, numpy/scipy/pandas): basically, whatever comes with the default Anaconda distribution.

The majority of course assignments use OpenAI Gym.

If you don't or can't install it (e.g. you use Windows and installation is tricky), there's a Docker container contributed to the course.
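
As a quick sanity check that the toolkit and gym are installed correctly, a short snippet along these lines should run without errors (CartPole-v0 is just an example environment; the 4-tuple step API shown is the classic one used throughout the course):

import numpy as np
import gym

# Make any registered environment and take one random action.
env = gym.make("CartPole-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("observation shape:", np.shape(obs), "| reward:", reward, "| done:", done)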

Deep learning

You will also need one of the following three deep learning stacks: TensorFlow, PyTorch, or Theano with Lasagne (the assignments ship in all three flavors).

The frameworks are easy to install on macOS and Linux. Windows installation is a bit tougher, so if you don't feel like dealing with it, try using Docker (e.g. the Kitematic GUI or the console on Windows).
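
A quick way to see which of these stacks is available in your environment (a small sketch; the module names are simply the standard import names of each framework):

# Print which deep learning frameworks can be imported, and their versions.
for module_name in ("torch", "tensorflow", "theano"):
    try:
        module = __import__(module_name)
        print(module_name, getattr(module, "__version__", "unknown version"))
    except ImportError:
        print(module_name, "is not installed")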

Install docker

Pull the Docker image: https://hub.docker.com/r/justheuristic/practical_rl
(or just run docker pull justheuristic/practical_rl if you have a Docker shell)

If you want to build it yourself, use these instructions.

If you run into any trouble, feel free to post here, even if it's along the lines of "I don't know what the hell all these letters mean!!!".

Week2 MCTS Speed-Up Question

I was looking at the Gitter channel, which doesn't seem to be monitored anymore, so I hope asking a question here is OK. Otherwise please feel free to close this issue.

I'm currently working on Week 2 with MCTS and have been able to get it running with basic UCB1. However, as expected, on most games the rollouts take too long to be meaningful, so MCTS won't really make progress without a very large time budget.

In the assignment it was suggested to use a classifier for action selection, but doesn't that defeat the point of using MCTS in the first place, since MCTS relies on playing out random rollouts?

Seq2seq

Using int8 in as_matrix breaks everything if the number of distinct tokens exceeds what int8 can represent!
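
A minimal illustration of the problem (the token ids below are hypothetical; the actual as_matrix code lives in the course notebook): once the vocabulary has ids above 127, casting to int8 silently wraps them around.

import numpy as np

token_ids = [3, 120, 200, 1000]                # hypothetical ids from a vocabulary larger than 127 tokens
wrapped = np.array(token_ids).astype(np.int8)   # ids above 127 wrap around to wrong (negative) values
safe = np.array(token_ids, dtype=np.int32)      # int32 comfortably covers realistic vocabulary sizes
print(wrapped)  # [  3 120 -56 -24] -- corrupted ids
print(safe)     # [   3  120  200 1000]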

Broken links

I used the following two commands to identify broken links. markdown-link-check is https://github.com/tcort/markdown-link-check

find ./Practical_RL/ -type f -name '*.ipynb' -exec jupyter nbconvert --to markdown {} \;
find ./Practical_RL/ -name \*.md -exec markdown-link-check -q {} \; > link_check.txt

This is the list of broken links:

FILE: ./Practical_RL//week4_approx_rl/seminar_pytorch.md

FILE: ./Practical_RL//week4_approx_rl/homework_lasagne.md

FILE: ./Practical_RL//week4_approx_rl/README.md

FILE: ./Practical_RL//week4_approx_rl/seminar_tf.md

FILE: ./Practical_RL//week7_pomdp/practice_tensorflow.md

FILE: ./Practical_RL//week7_pomdp/homework_common_part2.md

FILE: ./Practical_RL//week8_scst/bonus.md

FILE: ./Practical_RL//week8_scst/README.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_pytorch.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_tensorflow.md

FILE: ./Practical_RL//week9_policy_II/seminar_TRPO_theano.md

FILE: ./Practical_RL//week2_value_based/README.md

FILE: ./Practical_RL//week2_value_based/seminar2_MCTS.md

FILE: ./Practical_RL//week2_value_based/seminar1_VI.md

FILE: ./Practical_RL//yet_another_week/README.md

FILE: ./Practical_RL//week6_policy_based/homework_tensorflow.md

FILE: ./Practical_RL//week6_policy_based/README.md

FILE: ./Practical_RL//week3_model_free/README.md

FILE: ./Practical_RL//week3_model_free/homework.md

Related issues: #78.

Week 4 Seminar: Does not improve

Sorry for creating another issue, but I'm having trouble with the basic problem in the Week 4 Seminar, specifically the TensorFlow version. It's a nicely made notebook and relatively straightforward to fill out, yet when I actually try to train, the mean reward hovers around 15 no matter how long I train. I tried increasing epsilon and adding more nodes/layers to the network, to no avail. Judging from the videos, it seems like the network isn't learning anything. Are there any additional hints someone could give? I definitely appreciate the note in the notebook about deep RL being f***ed up, lol.

Issue with overflow in practice_tensorflow.ipynb

I don't have the best grasp of this stuff so forgive me if this is just a misunderstanding.

So in this notebook (https://github.com/yandexdataschool/Practical_RL/blob/master/week4_%5Brecap%5D_deep_learning/practice_tensorflow.ipynb), in the very first few code cells we write a function and then use TF to speed it up. However, as written, the cumulative sum of squares up to 10**8 results in an integer quite a bit larger than can be encoded in int64. As such, both TF and naive numpy give the wrong answer, and the speed comparison is misleading.

Maybe the operation should be changed to something that produces smaller numbers. That would retain the large number of iterations (10**8, for example) to show the dramatic speedup. Alternatively, 10**6 iterations is fine with the sum of squares.
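
To back this up with quick arithmetic (plain Python, so the reference value itself cannot overflow): the exact sum of squares below 10**8 is about 3.3e23, far past the int64 maximum of roughly 9.2e18, so any int64 accumulator has to overflow; with 10**6 terms the sum is about 3.3e17 and fits.

import numpy as np

n = 10**8
exact = (n - 1) * n * (2 * n - 1) // 6    # closed form for sum(i**2 for i in range(n)), exact in Python ints
print(exact)                               # ~3.33e23
print(np.iinfo(np.int64).max)              # ~9.22e18
print(exact > np.iinfo(np.int64).max)      # True: an int64 accumulator (numpy or TF) overflows here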

week6.5 review & fix

Could you please

  • Look through the edited week6.5 notebooks and see if there's anything missing
  • Assign points to homework assignment parts in the readme

dns_server_failure when trying to open jupyter URL in browser

I run Docker on my Ubuntu machine like this:
$ sudo docker run -it -v $(pwd):/notebooks -p 8888:8888 justheuristic/practical_rl sh ../run_jupyter.sh

In the output I saw this suggestion:
Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://9cd070fb79bf:8888/?token=ad1a5a0aab43efb47a9a805388fcf508d0b5f84a16e4542b&token=ad1a5a0aab43efb47a9a805388fcf508d0b5f84a16e4542b

My browser reacts:

Network Error (dns_server_failure)
Your request could not be processed because an error occurred contacting the DNS server.
The DNS server may be temporarily unavailable, or there could be a network problem.

Did anyone face this problem?

week1 gym leaderboard - wait for deadline and remove

As reported by many HSE students, this line in the week01 homework is no longer valid:

Please upload the results to openai gym and send links to all submissions in the e-mail

We can't change it right away because it would cause lots of merge conflicts. The solution is to wait until the 6th of February (after the deadline) and fix it then.

Emails vs anytask?

There are several mentions of emails on the course page. I think those should be removed, since anytask invitations are available now.

Do you provide solutions?

Hi, thanks for developing such a wonderful course for beginners. I have finished some of the tasks; do you provide solutions to these tasks so that I can check my homework? Thank you!

CartPole-v0

I suggest adding a note to the homework that there is sometimes a limit of max reward = 200.
To remove that limit, instead of
env = gym.make("CartPole-v0")
it should be
env = gym.make("CartPole-v0").env

I guess this has been said orally somewhere, but without this info it took an extra hour to find out what was wrong.
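
For context, the 200-step cap comes from the TimeLimit wrapper that gym registers for CartPole-v0, and .env strips that wrapper. A minimal check (assuming a gym version from the course era):

import gym

wrapped = gym.make("CartPole-v0")   # TimeLimit wrapper: episodes (and thus reward) are capped at 200 steps
unwrapped = wrapped.env             # the raw CartPoleEnv without the 200-step cap
print(type(wrapped).__name__, "->", type(unwrapped).__name__)   # e.g. TimeLimit -> CartPoleEnv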

week4 dqn overhaul

I'm taking this class on Coursera and am stuck on the DQN assignment for Breakout. I was able to implement the code (though I don't know whether it's correct or not), but training takes a very long time and does not seem to converge. According to the description it should reach a mean reward over 10 at around 10k steps; however, even after 100k steps the mean reward still fluctuates a lot, sometimes around 10, sometimes around 0.

Here is a screenshot of my training curves. [screenshot omitted]

Could you provide a rough sense of what these two figures should look like?

We also raised this question in the class forum:
https://www.coursera.org/learn/practical-rl/discussions/weeks/4/threads/yzC8W14LEei7pAoHCSt0dA/replies/78XCWF4REeiosBJ671zJCg/comments/-qc_Xl9_EeiYTgr_SihX-A

It would be great if someone could help us out. Thanks

issues with gym

If there's something wrong with openai gym and chat didn't resolve it in 10 minutes, feel free to complain here.

joblib+gym

joblib generates identical sessions if games are run in multiple threads. A simple notebook that reproduces the problem on my machine is attached.

Notebook on gist
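
One possible workaround (my assumption, not taken from the attached notebook): explicitly re-seed both numpy and the environment inside each worker, since fork-based joblib backends copy the parent's RNG state into every worker and otherwise produce identical rollouts.

import numpy as np
import gym
from joblib import Parallel, delayed

def play_session(seed):
    # Re-seed inside the worker so parallel sessions diverge instead of repeating each other.
    np.random.seed(seed)
    env = gym.make("CartPole-v0")
    env.seed(seed)                                      # seeds the environment's own RNG
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = np.random.randint(env.action_space.n)  # random policy driven by the re-seeded RNG
        _, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward

# Distinct seeds per session -> distinct rollouts.
print(Parallel(n_jobs=2)(delayed(play_session)(i) for i in range(4)))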

Multiple Dockerfiles

There is a Dockerfile in the root of the repo.

There is also a separate Dockerfile in the docker/ folder.

Which one is correct? Is it possible to run the repo in mybinder and everware with the correct Dockerfile?

Got a token but cannot open the URL in my browser

PS C:\Users\Nathaniel> docker run 491528bcff41
Unable to find image '491528bcff41:latest' locally
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from daemon: pull access denied for 491528bcff41, repository does not exist or may require 'docker login'.
See 'C:\Program Files\Docker\Docker\Resources\bin\docker.exe run --help'.
PS C:\Users\Nathaniel> docker run justheuristic/practical_rl:latest
[I 13:27:41.201 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 13:27:41.379 NotebookApp] Serving notebooks from local directory: /notebooks
[I 13:27:41.379 NotebookApp] 0 active kernels
[I 13:27:41.379 NotebookApp] The Jupyter Notebook is running at:
[I 13:27:41.379 NotebookApp] http://6084fd93b11e:8888/?token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972
[I 13:27:41.379 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:27:41.380 NotebookApp]

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://6084fd93b11e:8888/?token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972&token=81a441a15748a2f85280d14a39bcd237b1e16c6ad0ff6972

week1 - MountainCar-v0

I started doing this class this week and I really like how well everything is thought out for self-learning; the notebooks are great. I feel like I got a good grasp of the crossentropy method after completing the notebook. However, when I then tried to apply what I learned to MountainCar-v0, I got quite stuck. It seems like the initial policy initialization never lets the agent get the car onto the mountain even once, so our elite states never contain any meaningful progress we can learn from.

Are there any hints on how to overcome this challenge?

Kernel silent death on remote machine (xvfb?)

On a remote machine with no display, the following actions cause problems in the Python output: the kernel dies every time matplotlib.pyplot or tqdm is called. env.render() from gym also causes kernel death.

The algorithm (copied from the week4 seminar):

import os
# Start a virtual display (xvfb) if the machine has no real display.
if type(os.environ.get("DISPLAY")) is not str or len(os.environ.get("DISPLAY")) == 0:
    !bash ../xvfb start
    %env DISPLAY=:1

import gym
import numpy as np, pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# for now kernel is still alive. plt. commands work fine

env = gym.make("CartPole-v0")
env.reset()
n_actions = env.action_space.n
state_dim = env.observation_space.shape

plt.imshow(env.render("rgb_array", close=True))

The kernel dies on the last line. After a restart, matplotlib also causes kernel death. Restarting the whole Jupyter server doesn't help either; neither does creating a new conda env.
The only fix I have found so far is uninstalling Anaconda and installing all the needed packages again.

That is definitely not something that should happen on any remote machine.

Week 3 homework, Part I: On-policy learning and SARSA

I face an error from pandas whenever I run the cell containing:
from IPython.display import clear_output
from pandas import ewma, Series

moving_average = lambda ts, span=100: ewma(Series(ts), min_periods=span//10, span=span).values

rewards_sarsa, rewards_ql = [], []

for i in range(5000):
    rewards_sarsa.append(play_and_train(env, agent_sarsa))
    rewards_ql.append(play_and_train(env, agent_ql))
    # Note: agent.epsilon stays constant

    if i % 100 == 0:
        clear_output(True)
        print('EVSARSA mean reward =', np.mean(rewards_sarsa[-100:]))
        print('QLEARNING mean reward =', np.mean(rewards_ql[-100:]))
        plt.title("epsilon = %s" % agent_ql.epsilon)
        plt.plot(moving_average(rewards_sarsa), label='ev_sarsa')
        plt.plot(moving_average(rewards_ql), label='qlearning')
        plt.grid()
        plt.legend()
        plt.ylim(-500, 0)
        plt.show()

The error message is:

ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
      1 from IPython.display import clear_output
----> 2 from pandas import ewma, Series
      3 # from pandas import Dataframe.ewm as ewm
      4 moving_average = lambda ts, span=100: ewma(Series(ts), min_periods=span//10, span=span).values
      5

ImportError: cannot import name 'ewma'

I tried googling it but I can't find a fix for it.

Using Python 3.5 and pandas 0.23.0 on Ubuntu 16.04.
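
pandas.ewma was deprecated and later removed (it is gone in 0.23, exactly the version above), hence the ImportError. A drop-in replacement for the notebook's moving_average helper using the Series.ewm accessor (a sketch, matching the arguments used in the failing cell):

from pandas import Series

# Same smoothing as the old ewma-based helper, written against the modern pandas API.
moving_average = lambda ts, span=100: Series(ts).ewm(min_periods=span // 10, span=span).mean().values

print(moving_average(list(range(200)))[-3:])   # quick check on dummy data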
