
deep-rl-class's Introduction


If you like the course, don't hesitate to ⭐ star this repository. This helps us 🤗.

This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. The website is here: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt

Citing the project

To cite this repository in publications:

@misc{deep-rl-course,
  author = {Simonini, Thomas and Sanseviero, Omar},
  title = {The Hugging Face Deep Reinforcement Learning Class},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/deep-rl-class}},
}

deep-rl-class's People

Contributors

alexpalms, artachtron, avoroshilov, dario248, dcarpintero, dennissoemers, diskshima, dylwil3, e-dong, fzyzcjy, guiyrt, heispv, imflash217, joeadsp, jonathansum, josejuanmartinez, lucifermorningstar1305, lunarflu, medvedev, mike-wazowsk1, mishig25, osanseviero, pierrecounathe, robertoschiavone, s-n-o-r-l-a-x, simoninithomas, sryu1, supersecurehuman, theicfire, yayab


deep-rl-class's Issues

Unit 1 Tutorial: The package_to_hub function throws an error (NoSuchDisplayException: Cannot connect to "None")

I followed the hands-on tutorial in Unit 1, and the package_to_hub function throws an error when trying to push the model to the Hugging Face Hub. Below is the error message.

NoSuchDisplayException Traceback (most recent call last)
in
32 eval_env=eval_env,
33 repo_id=repo_id,
---> 34 commit_message=commit_message)
35
36 # Note: if after running the package_to_hub function and it gives an issue of rebasing, please run the following code
/usr/local/lib/python3.7/dist-packages/pyglet/canvas/xlib.py in init(self, name, x_screen)
121 self._display = xlib.XOpenDisplay(name)
122 if not self._display:
--> 123 raise NoSuchDisplayException('Cannot connect to "%s"' % name)
124
125 screen_count = xlib.XScreenCount(self._display)
NoSuchDisplayException: Cannot connect to "None"

The corresponding code is shown below

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

env_id = "LunarLander-v2"
model_architecture = "PPO"
repo_id = "JJRohan/ppo-LunarLander-v2"
commit_message = "Second commit, revise the process"

package_to_hub(model=model, 
               model_name=model_name,  
               model_architecture=model_architecture, 
               env_id=env_id, 
               eval_env=eval_env, 
               repo_id=repo_id,
               commit_message=commit_message)
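
A workaround commonly needed in headless Colab sessions (an assumption on my part, not confirmed in this report) is to start a virtual display before calling package_to_hub, since recording the replay video requires an X display. A minimal sketch, assuming pyvirtualdisplay and xvfb are installed:

# Start a virtual display so pyglet has something to render to (sketch only)
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()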

Variable values forgotten by package_to_hub script

Relevant cell:
https://colab.research.google.com/github/huggingface/deep-rl-class/blob/master/notebooks/unit1/unit1.ipynb#scrollTo=JPG7ofdGIHN8&line=3&uniqifier=1

I kept getting errors when trying to run package_to_hub() because the variables set in prior cells weren't being inherited, so the execution would fail with something like "Cannot access property of: None" (I'm paraphrasing, apologies).

I was able to successfully execute the script in the cell by adding these lines to the beginning of the script in the cell:

from stable_baselines3 import PPO
model_name = "myModelName"
model = PPO.load(model_name)

Other cells remember variables set in prior cells, so I don't know why this cell forgets them and sets them to None.

Show numerical rank in Leaderboard.

One of the leaderboards is a huggingface space here. This is what it currently looks like:

[screenshot of the current leaderboard]

It does not show any numerical rank beside the entries. I think it would be very useful to have a numerical rank beside each entry, so one can see clearly where one stands relative to the others.

It would also be useful to have a Kaggle-like percentage ranking that shows each entry's percentile.

Please consider implementing this (a rough sketch follows the list below). It would be very useful if one could say:

  • "My model is currently the $n$ -th best model in the leaderboard."
  • "My model is in the top $k$ % in the leaderboard."

Error on package_to_hub

CalledProcessError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1073 encoding="utf-8",
-> 1074 cwd=self.local_dir,
1075 )

3 frames
CalledProcessError: Command '['git', 'pull']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1076 logger.info(result.stdout)
1077 except subprocess.CalledProcessError as exc:
-> 1078 raise EnvironmentError(exc.stderr)
1079
1080 def git_add(

OSError: warning: no common commits
From https://huggingface.co/ChechkovEugene/ppo-LunarLander-v2

  • b4bcf4f...6e99bcb main -> origin/main (forced update)
    fatal: refusing to merge unrelated histories
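
A workaround that has helped with similar "unrelated histories" reports (an assumption, not confirmed in this thread) is to delete the half-initialized Hub repository and let package_to_hub recreate it from a clean history:

# Hypothetical cleanup step before rerunning package_to_hub
from huggingface_hub import delete_repo

delete_repo(repo_id="ChechkovEugene/ppo-LunarLander-v2")  # then run package_to_hub again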

OS Error when attempting to push trained agent using package_to_hub() function

The following error is shown when attempting to push the trained agent using the package_to_hub() function:

For reference I am using Google Colab and Mac OS Big Sur 11.5.2

CalledProcessError: Command '['git', 'pull']' returned non-zero exit status 128.

During handling of the above exception, another exception occurred:

OSError Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/huggingface_hub/repository.py in git_pull(self, rebase, lfs)
1076 logger.info(result.stdout)
1077 except subprocess.CalledProcessError as exc:
-> 1078 raise EnvironmentError(exc.stderr)
1079
1080 def git_add(

OSError: warning: no common commits
From https://huggingface.co/Sicko-Code/PPO-lunarlander-v2

  • 3823c08...4b3486e main -> origin/main (forced update)
    fatal: refusing to merge unrelated histories

Unit 1 :: State vs Observation

The distinction between state and observation taught here appears somewhat non-standard, in that an observation is required to be a partial description of the state, which need not be the case in general. In a fully observable scenario, state and observation are identical.

This view is not accepted here (cf. Quiz 1).


  • We receive an observation when we play with chess environment
    Incorrect. Since we have access to the whole checkboard information.

BTW, the expected answer is incorrect even with the course's
non-standard definition requiring observations to be partial,
since chess is not fully observable given the current position.
See #122

Suggested update example Unit 1: discounting

I have a suggestion to update an example in the blog post for Unit 1. I'm probably being somewhat nitpicky, and the example probably already works well to get the general point across to most beginners, but technically I think it could be more precise (without making it harder to understand).

The issue is with the example of the cat near the bigger piles of cheese, and the explanation of why discounting is used in RL. See picture below:

[image: the mouse, cheese piles, and cat example from Unit 1]

The explanation sort of implies that in this case the discounting would be related to the spatial proximity of the potential hazard (cat in this case) to the bigger piles of cheese. But discounting is of course solely about temporal aspects. In fact, in this example, if the initial position of the cat had been in the bottom of the screen, we would still be discounting the larger piles of cheese at the top due to their (temporal) distance. And if, for whatever reason, our mouse happens to have already moved closer towards the top of the screen, then from that point onwards it will be the single-cheese cells at the bottom that get discounted.

As a potential solution, I would suggest removing the cat altogether and providing examples of "invisible" hazards instead. For example, maybe we prefer to eat cheese fast because its taste gets worse over time. Or maybe we simply have a random stopping time for the episode, so if we run for the bigger piles of cheese there is a (randomised) risk that we might not arrive in time.
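
To make the temporal point concrete, here is a small sketch (illustrative numbers only) showing that the discount depends only on how many steps away a reward is, regardless of where the cat happens to be:

gamma = 0.95

def discounted_return(rewards, gamma):
    # G_t = sum_k gamma**k * r_{t+k}: rewards further in the future count less
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([0, 1], gamma))           # one small cheese, 1 step away -> ~0.95
print(discounted_return([0] * 10 + [10], gamma))  # big pile, 10 steps away       -> ~5.99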

Unit 2 Gym version 0.24.0 Warning

A quick small one:

This warning didn't cause any problems, but maybe it would be nice to have a note saying "ignore that error" or something lol

On Unit 2 Step 3 I get this warning (in Colab)...
"Warning: Gym version v0.24.0 has a number of critical issues with gym.make such that the reset and step functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1"

package_to_hub HTTPError

The package_to_hub function throws an error when trying to push the model to the Hugging Face Hub. Below is the error message.

HTTPError: 400 Client Error: Bad Request for url: https://huggingface.co/api/models/Alt41r/ppo-LunarLander-v2/commit/main (Request ID: xdvHxk6O2PH6-yVVnYbfd)

from cell:

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

env_id = "LunarLander-v2"

model_architecture = "PPO"

repo_id = "Alt41r/ppo-LunarLander-v2"

commit_message = "Upload PPO LunarLander-v2 trained agent"


eval_env = DummyVecEnv([lambda: gym.make(env_id)])


package_to_hub(model=model, 
               model_name=model_name, 
               model_architecture=model_architecture, 
               env_id=env_id, 
               eval_env=eval_env,
               repo_id=repo_id, 
               commit_message=commit_message)

Translation to Chinese

Hello there, thanks for the course.
The course is great, and I am sure that there are lots of Chinese students who want to study this course.
Is there any plan to translate this course to Chinese?
If not, I think I can work on that.

: )

Add some of the content in Discord as special unit content

E.g., from the #rl-announcements channel, we got this for Unit 2:

Hey everyone 👋.

If you want to go deeper into Stable Baselines3 before next week's unit about Deep Q-Learning, you can check these cool environments 🚀

Unit 2: bug in evaluate_agent function

episode_rewards doesn't get total_rewards_ep appended if the episode stops after 99 steps (for example, when the agent performs useless cyclic actions). This leads to wrong evaluation scores on the Taxi-v3 leaderboard.

To fix it, just add episode_rewards.append(total_rewards_ep) after the inner loop (line 29).

UPD: I'm wrong. To fix it, move episode_rewards.append(total_rewards_ep) out of the if done: block and the inner loop, as I did in the PR (see the sketch below).
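
A sketch of the fix, assuming the notebook's evaluate_agent looks roughly like this (greedy_policy and the env API are assumed to match the Unit 2 notebook):

import numpy as np

def evaluate_agent(env, max_steps, n_eval_episodes, Q, seed):
    episode_rewards = []
    for episode in range(n_eval_episodes):
        state = env.reset(seed=seed[episode]) if seed else env.reset()
        total_rewards_ep = 0
        for step in range(max_steps):
            action = greedy_policy(Q, state)  # helper assumed from the notebook
            new_state, reward, done, info = env.step(action)
            total_rewards_ep += reward
            if done:
                break
            state = new_state
        episode_rewards.append(total_rewards_ep)  # moved out of the inner loop / if done: block
    return np.mean(episode_rewards), np.std(episode_rewards)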

Unit 1 Notebook - Issue on package_to_hub cell

Hey,

I want to note that in the cell related to the package_to_hub function, these lines of code:

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

should be after:

# TODO: Define the name of the environment
env_id = "LunarLander-v2"

To avoid NameError issues.

Regards

Unit 1 Quiz Q3 wrong answer expected

Given only the current board position, chess is only partially observable,
since state that affects the set of possible next moves (namely, castling
rights and en passant) cannot always be determined from the board alone.


  • We receive an observation when we play with chess environment
    Incorrect. Since we have access to the whole checkboard information.

BTW, the difference between observation and state sought after here is questionable.
See #123

Unit 1 Pyglet version

This is a real quick one:

In unit 1 today when I did
pip install pyglet
it said that pyglet 2.0 is only compatible with python >3.8.3 (if I'm remembering correctly). So I tried
pip install pyglet==1.5.1
and I was able to complete the rest of Unit 1 without a hitch. 1.5.1 was suggested in the Discord to solve a different issue, but it seems to solve this one too.

Unit 2 Notebook epsilon_greedy_policy Missing Parameter

In the unit 2 notebook, epsilon_greedy_policy is missing a parameter.
In the solution, env.action_space.sample() is called in the else condition. However, env is not in epsilon_greedy_policy's scope.

def epsilon_greedy_policy(Qtable, state, epsilon):
  # Randomly generate a number between 0 and 1
  random_int = random.uniform(0,1)
  # if random_int is greater than epsilon --> exploitation
  if random_int > epsilon:
    # Take the action with the highest value given a state
    # np.argmax can be useful here
    action = np.argmax(Qtable[state])
  # else --> exploration
  else:
    action = env.action_space.sample()
  
  return action

Adding env to the function definition and function calls should fix this bug.
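
A sketch of the corrected version with env passed explicitly:

import random
import numpy as np

def epsilon_greedy_policy(Qtable, state, epsilon, env):
  # Randomly generate a number between 0 and 1
  random_int = random.uniform(0, 1)
  if random_int > epsilon:
    # Exploitation: take the action with the highest value for this state
    action = np.argmax(Qtable[state])
  else:
    # Exploration: sample a random action from the environment
    action = env.action_space.sample()
  return action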

Thanks!

Trying to push to hub results in AttributeError

I created a Docker container for a local environment, and trying to execute the last cell with the package_to_hub function results in the following error:
AttributeError: module 'stable_baselines3' has no attribute 'get_system_info'

  • Base docker image: nvcr.io/nvidia/pytorch:22.04-py3
  • apt-get dependencies installed:
    • git
    • python3-pip
    • build-essential
    • python-opengl
    • ffmpeg
    • xvfb
    • libosmesa6-dev
    • libgl1-mesa-dev
    • libglfw3
    • swig
  • Additionally installed git-lfs and mujoco 1.5.0
  • pip dependencies installed:
    • pyglet==1.5.1
    • lockfile
    • glfw
    • imageio
    • pyvirtualdisplay
    • gym[all]
    • stable-baselines3[extra]
    • huggingface_sb3
    • ale-py==0.7.4
    • ipywidgets
    • opencv-python-headless

Full log:

AttributeError                            Traceback (most recent call last)
Input In [22], in <cell line: 28>()
     25 eval_env = DummyVecEnv([lambda: gym.make(env_id)])
     27 # PLACE the package_to_hub function you've just filled here
---> 28 package_to_hub(model=model, # Our trained model
     29                model_name=model_name, # The name of our trained model 
     30                model_architecture=model_architecture, # The model architecture we used: in our case PPO
     31                env_id=env_id, # Name of the environment
     32                eval_env=eval_env, # Evaluation Environment
     33                repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2
     34                commit_message=commit_message)

File ~/.local/lib/python3.8/site-packages/huggingface_sb3/push_to_hub.py:314, in package_to_hub(model, model_name, model_architecture, env_id, eval_env, repo_id, commit_message, is_deterministic, n_eval_episodes, token, local_repo_path, video_length)
    311     is_deterministic = not is_atari(env_id)
    313 # Step 2: Create a config file
--> 314 _generate_config(model_name, repo_local_path)
    316 # Step 3: Evaluate the agent
    317 mean_reward, std_reward = _evaluate_agent(
    318     model, eval_env, n_eval_episodes, is_deterministic, repo_local_path
    319 )

File ~/.local/lib/python3.8/site-packages/huggingface_sb3/push_to_hub.py:53, in _generate_config(model, repo_local_path)
     51     data = json.load(json_file)
     52     # Add system_info elements to our JSON
---> 53     data["system_info"] = stable_baselines3.get_system_info(print_info=False)[0]
     55 # Step 3: Write our config.json file
     56 with open(Path(repo_local_path) / "config.json", "w") as outfile:

AttributeError: module 'stable_baselines3' has no attribute 'get_system_info'
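
get_system_info is only exported by more recent stable-baselines3 releases, so this looks like a version mismatch between the container's stable-baselines3 and huggingface_sb3 (an assumption, not confirmed here). A quick check:

# Sketch: verify which stable-baselines3 version the container actually has
import stable_baselines3
print(stable_baselines3.__version__)
# If it is an old release, upgrading should restore the attribute:
#   pip install --upgrade "stable-baselines3[extra]"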

[lib][RL] Using pytorch-lightning, catalyst-rl, sample-factory

Could you please share how you view using the following libraries and ideas for training in the course, or for research, production, and general use in reinforcement learning?

  1. pytorch-lightning https://www.pytorchlightning.ai/
  2. PyTorch Lightning bolts https://github.com/PyTorchLightning/lightning-bolts
  3. catalyst-rl https://github.com/catalyst-team/catalyst-rl
  4. sample-factory https://github.com/alex-petrenko/sample-factory

What do you think about this?

Unit 5 tutorial errors with gym version 0.26.1. The version is not specified in the tutorial

  • When running the tutorial on Google Colab, gym version 0.25.2 currently gets installed and things work.

  • Installing on other systems, you get the most recent version by default, which is 0.26.1; it has breaking changes that make it incompatible with code designed to work with 0.25.x.

  • Suggestion: specify the versions of dependencies in this and other tutorials, or, as a smaller fix, just pin the gym version to 0.25.x.

Unit 1 Glossary Markov Property

The definition feels a bit backwards: the agent can always decide to act based only on the current state, but whether or not that can be optimal depends on whether the environment, or its state progression to be precise, satisfies a Markov property.
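
For reference, the property in question is usually stated on the environment's transition dynamics:

$P(S_{t+1} \mid S_{t}, A_{t}) = P(S_{t+1} \mid S_{1}, A_{1}, \ldots, S_{t}, A_{t})$

i.e., the next state depends only on the current state and action, not on the full history.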

Unit 1 crash when pushing to HF hub

I'm successfully creating, training and evaluating a model.
When invoking package_to_hub() I get the stack trace below.

The repository is created in my HF account (with a .gitattributes via an initial commit).

This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.
/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/evaluation.py:65: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.
  warnings.warn(
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
[<ipython-input-41-aa4aae99d08a>](https://localhost:8080/#) in <module>
     25 
     26 # method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub
---> 27 package_to_hub(model=model, # Our trained model
     28                model_name=model_name, # The name of our trained model
     29                model_architecture=model_architecture, # The model architecture we used: in our case PPO

9 frames
[/usr/local/lib/python3.8/dist-packages/huggingface_sb3/push_to_hub.py](https://localhost:8080/#) in package_to_hub(model, model_name, model_architecture, env_id, eval_env, repo_id, commit_message, is_deterministic, n_eval_episodes, token, video_length, logs)
    373 
    374         # Step 4: Generate a video
--> 375         _generate_replay(model, replay_env, video_length, is_deterministic, tmpdirname)
    376 
    377         # Step 5: Generate the model card

[/usr/local/lib/python3.8/dist-packages/huggingface_sb3/push_to_hub.py](https://localhost:8080/#) in _generate_replay(model, eval_env, video_length, is_deterministic, local_path)
    134         )
    135 
--> 136         obs = env.reset()
    137         lstm_states = None
    138         episode_starts = np.ones((env.num_envs,), dtype=bool)

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py](https://localhost:8080/#) in reset(self)
     66     def reset(self) -> VecEnvObs:
     67         obs = self.venv.reset()
---> 68         self.start_video_recorder()
     69         return obs
     70 

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py](https://localhost:8080/#) in start_video_recorder(self)
     78         )
     79 
---> 80         self.video_recorder.capture_frame()
     81         self.recorded_frames = 1
     82         self.recording = True

[/usr/local/lib/python3.8/dist-packages/gym/wrappers/monitoring/video_recorder.py](https://localhost:8080/#) in capture_frame(self)
    130 
    131         render_mode = "ansi" if self.ansi_mode else "rgb_array"
--> 132         frame = self.env.render(mode=render_mode)
    133 
    134         if frame is None:

[/usr/local/lib/python3.8/dist-packages/stable_baselines3/common/vec_env/dummy_vec_env.py](https://localhost:8080/#) in render(self, mode)
     85         """
     86         if self.num_envs == 1:
---> 87             return self.envs[0].render(mode=mode)
     88         else:
     89             return super().render(mode=mode)

[/usr/local/lib/python3.8/dist-packages/gym/core.py](https://localhost:8080/#) in render(self, mode, **kwargs)
    293 
    294     def render(self, mode="human", **kwargs):
--> 295         return self.env.render(mode, **kwargs)
    296 
    297     def close(self):

[/usr/local/lib/python3.8/dist-packages/gym/envs/box2d/lunar_lander.py](https://localhost:8080/#) in render(self, mode)
    386 
    387     def render(self, mode="human"):
--> 388         from gym.envs.classic_control import rendering
    389 
    390         if self.viewer is None:

[/usr/local/lib/python3.8/dist-packages/gym/envs/classic_control/rendering.py](https://localhost:8080/#) in <module>
     25 
     26 try:
---> 27     from pyglet.gl import *
     28 except ImportError as e:
     29     raise ImportError(

[/usr/local/lib/python3.8/dist-packages/pyglet/gl/__init__.py](https://localhost:8080/#) in <module>
    233 elif compat_platform == 'darwin':
    234     from .cocoa import CocoaConfig as Config
--> 235 del base  # noqa: F821
    236 
    237 

NameError: name 'base' is not defined

Leaderboard - Had to refresh the dashboard

I waited about 6 hours to see if my Lunar Lander model from Unit 1 would show up on the leaderboard. I noticed that at the very bottom it has a 'refresh' button. Once I clicked refresh, my model showed on the leaderboard. If it is necessary to click that refresh button, it would be good to add a note in the Unit 1 Jupyter notebook, because other people will also likely wonder why their model is not showing.

Import in unit-1 causes using deprecated functionality of a dependency and leads to error

In the unit-1 notebook, in the section titled Step 9: Load a saved LunarLander model from the Hub 🤗, there is an external library that is used to render the steps of the lunar lander.

That library is colabgymrender.

It is first installed via:

!pip install colabgymrender==1.0.2

Then when it is imported, it throws a RuntimeError, like so:

RuntimeError: imageio.ffmpeg.download() has been deprecated. Use 'pip install imageio-ffmpeg' instead.'

It is a dependency error rather than an error with any of the tutorial content.

The full error message.
RuntimeError                              Traceback (most recent call last)

[<ipython-input-20-830a12517f0a>](https://localhost:8080/#) in <module>
----> 1 from colabgymrender.recorder import Recorder
      2 
      3 directory = './video'
      4 env = Recorder(eval_env, directory)
      5 

2 frames

[/usr/local/lib/python3.7/dist-packages/colabgymrender/recorder.py](https://localhost:8080/#) in <module>
      1 from pyvirtualdisplay import Display
----> 2 from moviepy.editor import *
      3 import time
      4 import gym
      5 import cv2

[/usr/local/lib/python3.7/dist-packages/moviepy/editor.py](https://localhost:8080/#) in <module>
     24 # Checks to see if the user has set a place for their own version of ffmpeg
     25 if os.getenv('FFMPEG_BINARY', 'ffmpeg-imageio') == 'ffmpeg-imageio':
---> 26     imageio.plugins.ffmpeg.download()
     27 
     28 # Clips

[/usr/local/lib/python3.7/dist-packages/imageio/plugins/ffmpeg.py](https://localhost:8080/#) in download(directory, force_download)
     36 def download(directory=None, force_download=False):  # pragma: no cover
     37     raise RuntimeError(
---> 38         "imageio.ffmpeg.download() has been deprecated. "
     39         "Use 'pip install imageio-ffmpeg' instead.'"
     40     )

RuntimeError: imageio.ffmpeg.download() has been deprecated. Use 'pip install imageio-ffmpeg' instead.'

Variable called before assignment in unit-1

In the section 'The package_to_hub function', there is a case of one variable being used before it is assigned.

This is what is given in the notebook:

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

# TODO: Define the name of the environment
env_id = 'LunarLander-v2'

Here env_id is used before it is assigned, which leads to a NameError.

This is what it should have been:

# TODO: Define the name of the environment
env_id = 'LunarLander-v2'

# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])

goal conditioned RL

Please tell me, will the course cover the topic of goal-conditioned RL?

That is, cases where, with the help of reward design, simple algorithms achieve high results comparable to the latest RL SoTA.

package_to_hub not working

At the end of Unit 1 I cannot execute the package_to_hub function.
I get the error "NameError: name 'base' is not defined" at line 235 of /usr/local/lib/python3.7/dist-packages/pyglet/gl/__init__.py

Unit 5 Tutorial errors and changes to the code

Hi, there are errors in unit5.ipynb. I tested with gym versions 0.24 and 0.26; the errors I got are the following:

  1. ValueError: too many values to unpack (expected 4) (in reinforce function)
  2. Expected nd_array and tuple ( in policy function)

I have made the following changes to the code; if somebody has this problem, try replacing those functions with the following code:

class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        if len(state) == 2:
            state = state[0]
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

For the reinforce function, replace state, reward, done, _ = env.step(action) with state, reward, done, _, _ = env.step(action).

Similarly, for the evaluation function, change new_state, reward, done, info = env.step(action) to new_state, reward, done, info, _ = env.step(action).

This additional value returned by env.step is reported by gym as "truncated", as can be seen in this line from the gym docs: observation, reward, terminated, truncated, info = env.step(action)

Error in Unit 5

In a Solution section it says

class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)
    
    def act(self, state):
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = Categorical(probs)
        action = np.argmax(m)
        return action.item(), m.log_prob(action)

And then

I make a mistake, can you guess where?

  • To find out let's make a forward pass:

But when I run the following code

debug_policy = Policy(s_size, a_size, 64)
debug_policy.act(env.reset())

The error is

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

This is different from the error described in the course.
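
A likely cause of the device error (an assumption, not confirmed in this thread) is that the debug policy's weights were never moved to the GPU, while act() sends the state tensor to device. Moving the model first should get past the device mismatch and surface the error the exercise actually intends:

# Hypothetical fix: put the policy on the same device act() moves the state to
debug_policy = Policy(s_size, a_size, 64).to(device)
debug_policy.act(env.reset())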

Logging with tensorboard and wandb

Hey there!

I would like to make a notebook which helps others get started with logging of their experiments with tensorboard and wandb, along with pushing the logs to hub.
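
A minimal sketch of what such logging could look like with Stable-Baselines3 (project and log-directory names here are placeholders, and the wandb SB3 integration is assumed to be installed):

# Sketch only: TensorBoard + Weights & Biases logging for a Unit 1 style run
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

run = wandb.init(project="deep-rl-course", sync_tensorboard=True)  # placeholder project name

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1,
            tensorboard_log="./ppo_lunarlander_tb/")  # placeholder log dir
model.learn(total_timesteps=10_000, callback=WandbCallback())
run.finish()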

Why doesn't Double DQN use the accumulated reward?

The main goal of deep reinforcement learning is to maximize the accumulated reward. In Q-learning we use the accumulated reward to update the Q-table. However, DDQN uses the instant reward instead of the accumulated reward to update the network.

Can anyone tell me, Why?

[image attached to the original issue]
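
For reference, the one-step TD target used by DQN and Double DQN is $y_{t} = R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a)$ (Double DQN selects the maximizing action with the online network and evaluates it with the target network); the remaining accumulated reward is bootstrapped through the next state's Q-value estimate rather than summed explicitly.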

Huggy bonus unit should use smaller checkpoint intervals

The oldest checkpoint I have already receives ~90% of the reward that the final trained one gets, making it hard to see any progression.
A smaller checkpoint interval would give users a 'bad' model to compare with the finally trained one.

Advantages/disadvantages between monte carlo and td learning

I have just read part 1 of the introduction to Q-learning, and although I believe the methods are very well described, I missed an explanation of the advantages/disadvantages of using either method. Maybe that will come in part 2, but just in case...

This link might be a good reference to explain the difference in variance and bias between the methods.

This other link is also very interesting regarding when it may make sense to use Monte Carlo over TD.
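
For reference, the two update rules being compared are:

Monte Carlo: $V(S_{t}) \leftarrow V(S_{t}) + \alpha \, [G_{t} - V(S_{t})]$

TD(0): $V(S_{t}) \leftarrow V(S_{t}) + \alpha \, [R_{t+1} + \gamma V(S_{t+1}) - V(S_{t})]$

Monte Carlo waits for the full return $G_{t}$ (unbiased but higher variance), while TD bootstraps from the next state's estimate (lower variance but biased).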

Thanks and congratulations for the course
Guillermo

model_name must not have an extension

If a user sets model_name with an extension, model.save(name) won't append .zip (the name already has a suffix), but the hub packaging tooling appends .zip to the name regardless. As a result, there will be an error that the file was not found.
Steps to reproduce:

model_name = 'my_model.ext'
model.save(model_name)

package_to_hub(model, model_name=model_name, ...)

This will return an error that my_model.ext.zip does not exist.
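
A defensive sketch (my own suggestion, using pathlib) that avoids the double extension, assuming model is the trained agent from earlier cells:

# Strip any extension up front; model.save() and the hub tooling then add ".zip" themselves
from pathlib import Path

model_name = Path("my_model.ext").stem  # -> "my_model"
model.save(model_name)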

Unit II - Part II Update Rule for Q-values

Hey guys,

I think there are two typos in the step 4 update rule. At the moment, it is written as:

$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t+1}+\gamma \max_{a}Q(S_{t+1}, a) - Q(S_{t}, A_{t})]$

instead, I think it should be:
$Q(S_{t}, A_{t})\leftarrow Q(S_{t}, A_{t}) + \alpha [R_{t}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_{t}, A_{t})]$

where $A_{t+1}$ is the best action of the next state and $R_{t}$ refers to the immediate reward at step $t$.

Regards,
Vangelis
