
deep-rl-class's Introduction


If you like the course, don't hesitate to ⭐ star this repository. This helps us 🤗.

This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. The website is here: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt

Citing the project

To cite this repository in publications:

@misc{deep-rl-course,
  author = {Simonini, Thomas and Sanseviero, Omar},
  title = {The Hugging Face Deep Reinforcement Learning Class},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/deep-rl-class}},
}

deep-rl-class's People

Contributors

alexpalms, artachtron, avoroshilov, dario248, dcarpintero, dennissoemers, diskshima, dylwil3, e-dong, fzyzcjy, guiyrt, heispv, imflash217, joeadsp, jonathansum, josejuanmartinez, lucifermorningstar1305, lunarflu, medvedev, mike-wazowsk1, mishig25, osanseviero, pierrecounathe, robertoschiavone, s-n-o-r-l-a-x, simoninithomas, sryu1, supersecurehuman, theicfire, yayab


deep-rl-class's Issues

Unit 1 Tutorial: The package_to_hub function throws an error (NoSuchDisplayException: Cannot connect to "None")

I followed the hands-on tutorial in Unit 1, and the package_to_hub function throws an error when trying to push the model to the Hugging Face Hub. Below is the error message.

NoSuchDisplayException Traceback (most recent call last)
in
32 eval_env=eval_env,
33 repo_id=repo_id,
---> 34 commit_message=commit_message)
35
36 # Note: if after running the package_to_hub function and it gives an issue of rebasing, please run the following code
/usr/local/lib/python3.7/dist-packages/pyglet/canvas/xlib.py in init(self, name, x_screen)
121 self._display = xlib.XOpenDisplay(name)
122 if not self._display:
--> 123 raise NoSuchDisplayException('Cannot connect to "%s"' % name)
124
125 screen_count = xlib.XScreenCount(self._display)
NoSuchDisplayException: Cannot connect to "None"

The corresponding code is shown below

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

env_id = "LunarLander-v2"
model_architecture = "PPO"
repo_id = "JJRohan/ppo-LunarLander-v2"
commit_message = "Second commit, revise the process"

package_to_hub(model=model, 
               model_name=model_name,  
               model_architecture=model_architecture, 
               env_id=env_id, 
               eval_env=eval_env, 
               repo_id=repo_id,
               commit_message=commit_message)
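
A workaround commonly needed in headless Colab sessions (an assumption on my part, not confirmed in this report) is to start a virtual display before calling package_to_hub, since recording the replay video requires an X display. A minimal sketch, assuming pyvirtualdisplay and xvfb are installed:

# Start a virtual display so pyglet has something to render to (sketch only)
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()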

Variable values forgotten by package_to_hub script

Relevant cell:
https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit1/unit1.ipynb#scrollTo=JPG7ofdGIHN8&line=3&uniqifier=1

I kept getting errors when trying to run package_to_hub() because the variables set in prior cells weren't being inherited, so the execution would fail with something like "Cannot access property of: None" (I'm paraphrasing, apologies).

I was able to successfully execute the script in the cell by adding these lines to the beginning of the script in the cell:

from stable_baselines3 import PPO
model_name = "myModelName"
model = PPO.load(model_name)

Other cells remember variables set in prior cells, so I don't know why this cell forgets them and sets them to None.

Show numerical rank in Leaderboard.

One of the leaderboards is a huggingface space here. This is what it currently looks like:

[screenshot of the current leaderboard]

It does not show any numerical rank beside the entries. I think it would be very useful to have a numerical rank beside each entry, so one can see clearly where one stands relative to the others.

It would also be useful to have a Kaggle-like percentage ranking that shows each entry's percentile.

Please consider implementing this (a rough sketch follows the list below). It would be very useful if one could say:

  • "My model is currently the $n$ -th best model in the leaderboard."
  • "My model is in the top $k$ % in the leaderboard."

Error on package_to_hub

CalledProcessError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1073 encoding="utf-8",
-> 1074 cwd=self.local_dir,
1075 )

3 frames
CalledProcessError: Command '['git', 'pull']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1076 logger.info(result.stdout)
1077 except subprocess.CalledProcessError as exc:
-> 1078 raise EnvironmentError(exc.stderr)
1079
1080 def git_add(

OSError: warning: no common commits
From https://huggingface.co/ChechkovEugene/ppo-LunarLander-v2

  • b4bcf4f...6e99bcb main -> origin/main (forced update)
    fatal: refusing to merge unrelated histories
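
A workaround that has helped with similar "unrelated histories" reports (an assumption, not confirmed in this thread) is to delete the half-initialized Hub repository and let package_to_hub recreate it from a clean history:

# Hypothetical cleanup step before rerunning package_to_hub
from huggingface_hub import delete_repo

delete_repo(repo_id="ChechkovEugene/ppo-LunarLander-v2")  # then run package_to_hub again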

OS Error when attempting to push trained agent using package_to_hub() function

The following error is shown when attempting to push the trained agent using the package_to_hub() function:

For reference I am using Google Colab and Mac OS Big Sur 11.5.2

CalledProcessError: Command '['git', 'pull']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1076 logger.info(result.stdout)
1077 except subprocess.CalledProcessError as exc:
-> 1078 raise EnvironmentError(exc.stderr)
1079
1080 def git_add(

OSError: warning: no common commits
From https://huggingface.co/Sicko-Code/PPO-lunarlander-v2

  • 3823c08...4b3486e main -> origin/main (forced update)
    fatal: refusing to merge unrelated histories

Unit 1 :: State vs Observation

The distinction between state and observation taught here appears somewhat non-standard, in that an observation is required to be a partial description of the state, which need not be the case in general. In a fully observable scenario, state and observation are identical.

This view is not accepted here (cf. Quiz 1).


  • We receive an observation when we play with chess environment
    Incorrect. Since we have access to the whole checkboard information.

BTW, the expected answer is incorrect even with the course's
non-standard definition requiring observations to be partial,
since chess is not fully observable given the current position.
See #122

Suggested update example Unit 1: discounting

I have a suggestion to update an example in the blog post for Unit 1. I'm probably being somewhat nitpicky, and the example probably already works well to get the general point across to most beginners, but technically I think it could be more precise (without making it harder to understand).

The issue is with the example of the cat near the bigger piles of cheese, and the explanation of why discounting is used in RL. See picture below:

[image: the mouse, cheese piles, and cat example from Unit 1]

The explanation sort of implies that in this case the discounting would be related to the spatial proximity of the potential hazard (cat in this case) to the bigger piles of cheese. But discounting is of course solely about temporal aspects. In fact, in this example, if the initial position of the cat had been in the bottom of the screen, we would still be discounting the larger piles of cheese at the top due to their (temporal) distance. And if, for whatever reason, our mouse happens to have already moved closer towards the top of the screen, then from that point onwards it will be the single-cheese cells at the bottom that get discounted.

As a potential solution, I would suggest removing the cat altogether and providing examples of "invisible" hazards instead. For example, maybe we prefer to eat cheese fast because its taste gets worse over time. Or maybe we simply have a random stopping time for the episode, so if we run for the bigger piles of cheese there is a (randomised) risk that we might not arrive in time.
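
To make the temporal point concrete, here is a small sketch (illustrative numbers only) showing that the discount depends only on how many steps away a reward is, regardless of where the cat happens to be:

gamma = 0.95

def discounted_return(rewards, gamma):
    # G_t = sum_k gamma**k * r_{t+k}: rewards further in the future count less
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([0, 1], gamma))           # one small cheese, 1 step away -> ~0.95
print(discounted_return([0] * 10 + [10], gamma))  # big pile, 10 steps away       -> ~5.99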

Unit 2 Gym version 0.24.0 Warning

A quick small one:

This warning didn't cause any problems, but maybe it would be nice to have a note saying "ignore that error" or something lol

On Unit 2 Step 3 I get this warning (in Colab)...
"Warning: Gym version v0.24.0 has a number of critical issues with gym.make such that the reset and step functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1"

package_to_hub HTTPError

The package_to_hub function throws an error when trying to push the model to the Hugging Face Hub. Below is the error message.

HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/Alt41r/ppo-LunarLander-v2/commit/main (Request ID: xdvHxk6O2PH6-yVVnYbfd)

from cell:

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

env_id = "LunarLander-v2"

model_architecture = "PPO"

repo_id = "Alt41r/ppo-LunarLander-v2"

commit_message = "Upload PPO LunarLander-v2 trained agent"


eval_env = DummyVecEnv([lambda: gym.make(env_id)])


package_to_hub(model=model, 
               model_name=model_name, 
               model_architecture=model_architecture, 
               env_id=env_id, 
               eval_env=eval_env,
               repo_id=repo_id, 
               commit_message=commit_message)

Translation to Chinese

Hello there, thanks for the course.
The course is great, and I am sure that there are lots of Chinese students who want to study this course.
Is there any plan to translate this course to Chinese?
If not, I think I can work on that.

: )

Add some of the content in Discord as special unit content

E.g., from the #rl-announcements channel, we got this for Unit 2:

Hey everyone 👋.

If you want to go deeper into Stable Baselines3 before next week's unit about Deep Q-Learning, you can check these cool environments 🚀

Unit 2: bug in evaluate_agent function

episode_rewards doesn't get total_rewards_ep appended if the episode stops after 99 steps (for example, when the agent performs useless cyclic actions). This leads to wrong evaluation scores on the Taxi-v3 leaderboard.

To fix it, just add episode_rewards.append(total_rewards_ep) after the inner loop (line 29).

UPD: I'm wrong. To fix it, move episode_rewards.append(total_rewards_ep) out of the if done: block and the inner loop, as I did in the PR (see the sketch below).
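
A sketch of the fix, assuming the notebook's evaluate_agent looks roughly like this (greedy_policy and the env API are assumed to match the Unit 2 notebook):

import numpy as np

def evaluate_agent(env, max_steps, n_eval_episodes, Q, seed):
    episode_rewards = []
    for episode in range(n_eval_episodes):
        state = env.reset(seed=seed[episode]) if seed else env.reset()
        total_rewards_ep = 0
        for step in range(max_steps):
            action = greedy_policy(Q, state)  # helper assumed from the notebook
            new_state, reward, done, info = env.step(action)
            total_rewards_ep += reward
            if done:
                break
            state = new_state
        episode_rewards.append(total_rewards_ep)  # moved out of the inner loop / if done: block
    return np.mean(episode_rewards), np.std(episode_rewards)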

Unit 1 Notebook - Issue on package_to_hub cell

Hey,

I want to note that in the cell related to the package_to_hub function, these lines of code:

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

should be after:

# TODO: Define the name of the environment
env_id = "LunarLander-v2"

To avoid NameError issues.

Regards

Unit 1 Quiz Q3 wrong answer expected

Given only the current board position, chess is only partially observable,
since state that affects the set of possible next moves (namely, castling
rights and en passant) cannot always be determined from the board alone.


  • We receive an observation when we play with chess environment
    Incorrect. Since we have access to the whole checkboard information.

BTW, the difference between observation and state sought after here is questionable.
See #123

Unit 1 Pyglet version

This is a real quick one:

In unit 1 today when I did
pip install pyglet
it said that pyglet 2.0 is only compatible with python >3.8.3 (if I'm remembering correctly). So I tried
pip install pyglet==1.5.1
and I was able to complete the rest of Unit 1 without a hitch. 1.5.1 was suggested in the Discord to solve a different issue, but it seems to solve this one too.

Unit 2 Notebook epsilon_greedy_policy Missing Parameter

In the unit 2 notebook, epsilon_greedy_policy is missing a parameter.
In the solution, env.action_space.sample() is called in the else condition. However, env is not in epsilon_greedy_policy's scope.

def epsilon_greedy_policy(Qtable, state, epsilon):
  # Randomly generate a number between 0 and 1
  random_int = random.uniform(0,1)
  # if random_int is greater than epsilon --> exploitation
  if random_int > epsilon:
    # Take the action with the highest value given a state
    # np.argmax can be useful here
    action = np.argmax(Qtable[state])
  # else --> exploration
  else:
    action = env.action_space.sample()
  
  return action

Adding env to the function definition and function calls should fix this bug.
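
A sketch of the corrected version with env passed explicitly:

import random
import numpy as np

def epsilon_greedy_policy(Qtable, state, epsilon, env):
  # Randomly generate a number between 0 and 1
  random_int = random.uniform(0, 1)
  if random_int > epsilon:
    # Exploitation: take the action with the highest value for this state
    action = np.argmax(Qtable[state])
  else:
    # Exploration: sample a random action from the environment
    action = env.action_space.sample()
  return action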

Thanks!

Trying to push to hub results in AttributeError

I created a Docker container for a local environment, and trying to execute the last cell with the package_to_hub function results in the following error:
AttributeError: module 'stable_baselines3' has no attribute 'get_system_info'

  • Base docker image: nvcr.io/nvidia/pytorch:22.04-py3
  • apt-get dependencies installed:
    • git
    • python3-pip
    • build-essential
    • python-opengl
    • ffmpeg
    • xvfb
    • libosmesa6-dev
    • libgl1-mesa-dev
    • libglfw3
    • swig
  • Additionally installed git-lfs and mujoco 1.5.0
  • pip dependencies installed:
    • pyglet==1.5.1
    • lockfile
    • glfw
    • imageio
    • pyvirtualdisplay
    • gym[all]
    • stable-baselines3[extra]
    • huggingface_sb3
    • ale-py==0.7.4
    • ipywidgets
    • opencv-python-headless

Full log:

AttributeError                            Traceback (most recent call last)
Input In [22], in <cell line: 28>()
     25 eval_env = DummyVecEnv([lambda: gym.make(env_id)])
     27 # PLACE the package_to_hub function you've just filled here
---> 28 package_to_hub(model=model, # Our trained model
     29                model_name=model_name, # The name of our trained model 
     30                model_architecture=model_architecture, # The model architecture we used: in our case PPO
     31                env_id=env_id, # Name of the environment
     32                eval_env=eval_env, # Evaluation Environment
     33                repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
     34                commit_message=commit_message)

File ~/.local/lib/python3.8/site-packages/huggingface_sb3/push_to_hub.py:314, in package_to_hub(model, model_name, model_architecture, env_id, eval_env, repo_id, commit_message, is_deterministic, n_eval_episodes, token, local_repo_path, video_length)
    311     is_deterministic = not is_atari(env_id)
    313 # Step 2: Create a config file
--> 314 _generate_config(model_name, repo_local_path)
    316 # Step 3: Evaluate the agent
    317 mean_reward, std_reward = _evaluate_agent(
    318     model, eval_env, n_eval_episodes, is_deterministic, repo_local_path
    319 )

File ~/.local/lib/python3.8/site-packages/huggingface_sb3/push_to_hub.py:53, in _generate_config(model, repo_local_path)
     51     data = json.load(json_file)
     52     # Add system_info elements to our JSON
---> 53     data["system_info"] = stable_baselines3.get_system_info(print_info=False)[0]
     55 # Step 3: Write our config.json file
     56 with open(Path(repo_local_path) / "config.json", "w") as outfile:

AttributeError: module 'stable_baselines3' has no attribute 'get_system_info'
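
get_system_info is only exported by more recent stable-baselines3 releases, so this looks like a version mismatch between the container's stable-baselines3 and huggingface_sb3 (an assumption, not confirmed here). A quick check:

# Sketch: verify which stable-baselines3 version the container actually has
import stable_baselines3
print(stable_baselines3.__version__)
# If it is an old release, upgrading should restore the attribute:
#   pip install --upgrade "stable-baselines3[extra]"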

[lib][RL] Using pytorch-lightning, catalyst-rl, sample-factory

Could you please share how you view using the following libraries and ideas for training in the course, or for research, production, and general use in reinforcement learning?

  1. pytorch-lightning https://www.pytorchlightning.ai/
  2. PyTorch Lightning bolts https://github.com/PyTorchLightning/lightning-bolts
  3. catalyst-rl https://github.com/catalyst-team/catalyst-rl
  4. sample-factory https://github.com/alex-petrenko/sample-factory

What do you think about this?

Unit 5 tutorial errors with gym version 0.26.1. The version is not specified in the tutorial

  • When running the tutorial on Google Colab, gym version 0.25.2 currently gets installed and things work.

  • Installing on other systems, you get the most recent version by default, which is 0.26.1; it has breaking changes that make it incompatible with code designed to work with 0.25.x.

  • Suggestion: specify the versions of dependencies in this and other tutorials, or, as a smaller fix, just pin the gym version to 0.25.x.

Unit 1 Glossary Markov Property

The definition feels a bit backwards: the agent can always decide to act based only on the current state, but whether or not that can be optimal depends on whether the environment, or its state progression to be precise, satisfies a Markov property.
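
For reference, the property in question is usually stated on the environment's transition dynamics:

$P(S_{t+1} \mid S_{t}, A_{t}) = P(S_{t+1} \mid S_{1}, A_{1}, \ldots, S_{t}, A_{t})$

i.e., the next state depends only on the current state and action, not on the full history.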

Unit 1 crash when pushing to HF hub

I'm successfully creating, training and evaluating a model.
When invoking package_to_hub() I get the stack trace below.

The repository is created in my HF account (with a .gitattributes via an initial commit).

This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.
/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/evaluation.py:65: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
  warnings.warn(
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
[<ipython-input-41-aa4aae99d08a>](https://localhost:8080/#) in <module>
     25 
     26 # method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub
---> 27 package_to_hub(model=model, # Our trained model
     28                model_name=model_name, # The name of our trained model
     29                model_architecture=model_architecture, # The model architecture we used: in our case PPO

9 frames
[/usr/local/lib/python3.8/dist-packages/huggingface_sb3/push_to_hub.py](https://localhost:8080/#) in package_to_hub(model, model_name, model_architecture, env_id, eval_env, repo_id, commit_message, is_deterministic, n_eval_episodes, token, video_length, logs)
    373 
    374         # Step 4: Generate a video
--> 375         _generate_replay(model, replay_env, video_length, is_deterministic, tmpdirname)
    376 
    377         # Step 5: Generate the model card

[/usr/local/lib/python3.8/dist-packages/huggingface_sb3/push_to_hub.py](https://localhost:8080/#) in _generate_replay(model, eval_env, video_length, is_deterministic, local_path)
    134         )
    135 
--> 136         obs = env.reset()
    137         lstm_states = None
    138         episode_starts = np.ones((env.num_envs,), dtype=bool)

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py](https://localhost:8080/#) in reset(self)
     66     def reset(self) -> VecEnvObs:
     67         obs = self.venv.reset()
---> 68         self.start_video_recorder()
     69         return obs
     70 

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py](https://localhost:8080/#) in start_video_recorder(self)
     78         )
     79 
---> 80         self.video_recorder.capture_frame()
     81         self.recorded_frames = 1
     82         self.recording = True

[/usr/local/lib/python3.8/dist-packages/gym/wrappers/monitoring/video_recorder.py](https://localhost:8080/#) in capture_frame(self)
    130 
    131         render_mode = "ansi" if self.ansi_mode else "rgb_array"
--> 132         frame = self.env.render(mode=render_mode)
    133 
    134         if frame is None:

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py](https://localhost:8080/#) in render(self, mode)
     85         """
     86         if self.num_envs == 1:
---> 87             return self.envs[0].render(mode=mode)
     88         else:
     89             return super().render(mode=mode)

[/usr/local/lib/python3.8/dist-packages/gym/core.py](https://localhost:8080/#) in render(self, mode, **kwargs)
    293 
    294     def render(self, mode="human", **kwargs):
--> 295         return self.env.render(mode, **kwargs)
    296 
    297     def close(self):

[/usr/local/lib/python3.8/dist-packages/gym/envs/box2d/lunar_lander.py](https://localhost:8080/#) in render(self, mode)
    386 
    387     def render(self, mode="human"):
--> 388         from gym.envs.classic_control import rendering
    389 
    390         if self.viewer is None:

[/usr/local/lib/python3.8/dist-packages/gym/envs/classic_control/rendering.py](https://localhost:8080/#) in <module>
     25 
     26 try:
---> 27     from pyglet.gl import *
     28 except ImportError as e:
     29     raise ImportError(

[/usr/local/lib/python3.8/dist-packages/pyglet/gl/__init__.py](https://localhost:8080/#) in <module>
    233 elif compat_platform == 'darwin':
    234     from .cocoa import CocoaConfig as Config
--> 235 del base  # noqa: F821
    236 
    237 

NameError: name 'base' is not defined

Leaderboard - Had to refresh the dashboard

I waited about 6 hours to see if my Lunar Lander model from Unit 1 would show up on the leaderboard. I noticed that at the very bottom it has a 'refresh' button. Once I clicked refresh, my model showed on the leaderboard. If it is necessary to click that refresh button, it would be good to add a note in the Unit 1 Jupyter notebook, because other people will also likely wonder why their model is not showing.

Import in unit-1 causes using deprecated functionality of a dependency and leads to error

In the unit-1 notebook, in the section titled Step 9: Load a saved LunarLander model from the Hub 🤗, there is an external library that is used to render the steps of the lunar lander.

That library is colabgymrender.

It is first installed via:

!pip install colabgymrender==1.0.2

Then when it is imported, it throws a RuntimeError, like so:

RuntimeError: imageio.ffmpeg.download() has been deprecated. Use 'pip install imageio-ffmpeg' instead.'

It is a dependency error rather than an error with any of the tutorial content.

The full error message.
RuntimeError                              Traceback (most recent call last)

[<ipython-input-20-830a12517f0a>](https://localhost:8080/#) in <module>
----> 1 from colabgymrender.recorder import Recorder
      2 
      3 directory = './video'
      4 env = Recorder(eval_env, directory)
      5 

2 frames

[/usr/local/lib/python3.7/dist-packages/colabgymrender/recorder.py](https://localhost:8080/#) in <module>
      1 from pyvirtualdisplay import Display
----> 2 from moviepy.editor import *
      3 import time
      4 import gym
      5 import cv2

[/usr/local/lib/python3.7/dist-packages/moviepy/editor.py](https://localhost:8080/#) in <module>
     24 # Checks to see if the user has set a place for their own version of ffmpeg
     25 if os.getenv('FFMPEG_BINARY', 'ffmpeg-imageio') == 'ffmpeg-imageio':
---> 26     imageio.plugins.ffmpeg.download()
     27 
     28 # Clips

[/usr/local/lib/python3.7/dist-packages/imageio/plugins/ffmpeg.py](https://localhost:8080/#) in download(directory, force_download)
     36 def download(directory=None, force_download=False):  # pragma: no cover
     37     raise RuntimeError(
---> 38         "imageio.ffmpeg.download() has been deprecated. "
     39         "Use 'pip install imageio-ffmpeg' instead.'"
     40     )

RuntimeError: imageio.ffmpeg.download() has been deprecated. Use 'pip install imageio-ffmpeg' instead.'

Variable called before assignment in unit-1

In the section 'The package_to_hub function', there is a case of one variable being used before it is assigned.

This is what is given in the notebook:

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

# TODO: Define the name of the environment
env_id = 'LunarLander-v2'

Here env_id is used before it is assigned, which leads to a NameError.

This is what it should have been:

# TODO: Define the name of the environment
env_id = 'LunarLander-v2'

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

goal conditioned RL

Please tell me, will the course cover the topic of goal-conditioned RL?

That is, cases where, with the help of reward design, simple algorithms achieve high results comparable to the latest RL SoTA.

package_to_hub not working

At the end of Unit 1 I cannot execute the package_to_hub function.
I get the error "NameError: name 'base' is not defined" at line 235 of /usr/local/lib/python3.7/dist-packages/pyglet/gl/__init__.py

Unit 5 Tutorial errors and changes to the code

Hi, there are errors in unit5.ipynb. I tested with gym versions 0.24 and 0.26; the errors I got are the following:

  1. ValueError: too many values to unpack (expected 4) (in reinforce function)
  2. Expected nd_array and tuple ( in policy function)

I have made the following changes to the code; if somebody has this problem, try replacing those functions with the following code:

class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        if len(state) == 2:
            state = state[0]
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

For the reinforce function, replace state, reward, done, _ = env.step(action) with state, reward, done, _, _ = env.step(action).

Similarly, for the evaluation function, change new_state, reward, done, info = env.step(action) to new_state, reward, done, info, _ = env.step(action).

This additional value returned by env.step is reported by gym as "truncated", as can be seen in this line from the gym docs: observation, reward, terminated, truncated, info = env.step(action)

Error in Unit 5

In a Solution section it says

class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)
    
    def act(self, state):
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = Categorical(probs)
        action = np.argmax(m)
        return action.item(), m.log_prob(action)

And then

I make a mistake, can you guess where?

  • To find out let's make a forward pass:

But when I run the following code

debug_policy = Policy(s_size, a_size, 64)
debug_policy.act(env.reset())

The error is

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

This is different from the error described in the course.
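
A likely cause of the device error (an assumption, not confirmed in this thread) is that the debug policy's weights were never moved to the GPU, while act() sends the state tensor to device. Moving the model first should get past the device mismatch and surface the error the exercise actually intends:

# Hypothetical fix: put the policy on the same device act() moves the state to
debug_policy = Policy(s_size, a_size, 64).to(device)
debug_policy.act(env.reset())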

Logging with tensorboard and wandb

Hey there!

I would like to make a notebook which helps others get started with logging of their experiments with tensorboard and wandb, along with pushing the logs to hub.
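
A minimal sketch of what such logging could look like with Stable-Baselines3 (project and log-directory names here are placeholders, and the wandb SB3 integration is assumed to be installed):

# Sketch only: TensorBoard + Weights & Biases logging for a Unit 1 style run
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

run = wandb.init(project="deep-rl-course", sync_tensorboard=True)  # placeholder project name

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1,
            tensorboard_log="./ppo_lunarlander_tb/")  # placeholder log dir
model.learn(total_timesteps=10_000, callback=WandbCallback())
run.finish()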

Why doesn't Double DQN use the accumulated reward?

The main goal of deep reinforcement learning is to maximize the accumulated reward. In Q-learning we use the accumulated reward to update the Q-table. However, DDQN uses the instant reward instead of the accumulated reward to update the network.

Can anyone tell me, Why?

[image attached to the original issue]
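
For reference, the one-step TD target used by DQN and Double DQN is $y_{t} = R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a)$ (Double DQN selects the maximizing action with the online network and evaluates it with the target network); the remaining accumulated reward is bootstrapped through the next state's Q-value estimate rather than summed explicitly.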

Huggy bonus unit should use smaller checkpoint intervals

The oldest checkpoint I have already receives ~90% of the reward that the final trained one gets, making it hard to see any progression.
A smaller checkpoint interval would give users a 'bad' model to compare with the finally trained one.

Advantages/disadvantages between monte carlo and td learning

I have just read part 1 of the introduction to Q-learning, and although I believe the methods are very well described, I missed an explanation of the advantages/disadvantages of using either method. Maybe that will come in part 2, but just in case...

This link might be a good reference to explain the difference in variance and bias between the methods.

This other link is also very interesting regarding when it may make sense to use Monte Carlo over TD.
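
For reference, the two update rules being compared are:

Monte Carlo: $V(S_{t}) \leftarrow V(S_{t}) + \alpha \, [G_{t} - V(S_{t})]$

TD(0): $V(S_{t}) \leftarrow V(S_{t}) + \alpha \, [R_{t+1} + \gamma V(S_{t+1}) - V(S_{t})]$

Monte Carlo waits for the full return $G_{t}$ (unbiased but higher variance), while TD bootstraps from the next state's estimate (lower variance but biased).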

Thanks and congratulations for the course
Guillermo

model_name must not have an extension

If a user sets model_name with an extension, model.save(name) won't append .zip (the name already has a suffix), but the hub packaging tooling appends .zip to the name regardless. As a result, there will be an error that the file was not found.
Steps to reproduce:

model_name = 'my_model.ext'
model.save(model_name)

package_to_hub(model, model_name=model_name, ...)

This will return an error that my_model.ext.zip does not exist.
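
A defensive sketch (my own suggestion, using pathlib) that avoids the double extension, assuming model is the trained agent from earlier cells:

# Strip any extension up front; model.save() and the hub tooling then add ".zip" themselves
from pathlib import Path

model_name = Path("my_model.ext").stem  # -> "my_model"
model.save(model_name)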

Unit II - Part II Update Rule for Q-values

Hey guys,

I think there are two typos in the step 4 update rule. At the moment, it is written as:

$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{a}Q(S_{t+1}, a) - Q(S_{t}, A_{t})]$

instead, I think it should be:
$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_{t}, A_{t})]$

where $A_{t+1}$ is the best action of the next state and $R_{t}$ refers to the immediate reward at step $t$.

Regards,
Vangelis
