shmuma / ptan Goto Github PK

View Code? Open in Web Editor NEW

524.0 22.0 162.0 631 KB

PyTorch Agent Net: reinforcement learning toolkit for pytorch

License: MIT License

Python 100.00%

ptan's Introduction

PTAN

PTAN stands for PyTorch AgentNet -- reimplementation of AgentNet library for PyTorch

This library was used in "Deep Reinforcement Learning Hands-On" book, here you can find sample sources.

Code branches

The repository is maintained to keep dependency versions up-to-date. This requires efforts and time to test all the examples on new versions, so, be patient.

The logic is following: there are several branches of the code, corresponding to major pytorch version code was tested. Due to incompatibilities in pytorch and other components, code in the printed book might differ from the code in the repo.

At the moment, there are the following branches available:

master: contains the code with the latest pytorch which was tested. At the moment, it is pytorch 1.7.
torch-1.3-book-ed2: code printed in the book (second edition) with minor bug fixes. Uses pytorch=1.3 which is available only on conda repos.
torch-1.7: pytorch 1.7. Merged with master.

All the branches uses python 3.7, more recent versions weren't tested.

Installation

From sources:

python setup.py install

From pypi:

pip install ptan

From github:

pip install pip install git+https://github.com/Shmuma/ptan.git

Requirements

PyTorch: version 1.1.0 is required
PyTorch Ignite: provides extra bindings for ignite
OpenAI Gym: pip install gym gym[atari]
Python OpenCV: pip install opencv-python
TensorBoardX: pip install tensorboardX

Note for Anaconda Python users

To run some of the samples, you will need these modules:

conda install pytorch torchvision -c pytorch
pip install tensorboard-pytorch
pip install gym
pip install gym[atari]
pip install opencv-python

Documentation

Ptan introduction

Random pieces of information

ExperienceSource vs ExperienceSourceFirstLast: #17 (comment)

ptan's People

Contributors

Stargazers

Watchers

Forkers

codeaudit little1tow amoliu b02901017 meelement charonn haje01 daominglyu gdwei wecacuee mit-drl stefanzzz22 wwxfromtju bryonkucharski lizhangosu yitanjang dhanush-ai1990 nanite-bot johny-c lucasosouza holybayes godmoves alvincjin francohtlin ztx0728 landoufulxf megayeye stjordanis guyko81 hexi2015 vishalbelsare andreistirb djbyrne qucheng cpnota srzambito yadroz elitalobo tk1363704 villwill quantumiracle vlievin vlaskinvlad yhamidullah psyche-mia yulkang kerns-ai-lab hglun mishuk-015 jskdr carljohanrehn thswind alexbuce alleboudy guilinf makiskans shaswot-utokyo dennispiskovatskov vwxyzjn samirtouzani beoy diem389 volodymyrk alessandrotringali sailfish009 thomas-gale ericzhang2008 hermannliang bniss jeremylee268 zivzone cloudcell daggerfall-is-the-best-tes-game peterboost behrouz-psh zdv2622 wunder2dream antonreborn fulw pglah dsharpc xieyj17 sijeong jonndoe karost2790 shaobin ploxoy andrelip kiddxtrizz elephann demia814 snowlunar yang0110 domjrivera hansjoergw yukikitayama kunpengliu0827 fable67 lowelltech minoki

ptan's Issues

Unable to install ptan via pip (old versions of dependencies not available)

Hello,

first of all: thank you so much for your book! It was really a pleasure to read and had the right balance between theory and practical implementation.
I am currently reimplementing some of the chapters. Now I need to run your code (to compare the outcomes to my implementation, I do not use ptan).
PTAN currently needs pytorch==1.3 which is not available via pip. The requirements.txt of the book's repository needs pytorch 0.4.1 which is also not available via pip anymore.

Does anybody has suggestions how I can install ptan currently? Or even better how I am able to run the book's code that uses ptan. In the chapters without ptan it was easily possible to adapt the code to pytorch 1.5.

Thanks in advance!
Markus

At least some samples currently broken

Hi,

The DQN_Speedup samples no longer run since the upgrade of torch to 0.4.0. The ptan agent interface no longer accepts the cuda variable as a parameter to the init function

Populate the repo with README

Please, include installation guide!

Can not install ptan because of Pytorch version

Hello, first of all thank you very much for your deep rl book. When I try to install ptan I get the following error. I get the error that Pytorch's version does not fit. As far as I can see pytorch == 1.3.0 has been removed. What should I do?

My Pytorch and torchvison version are:

torch==1.6.0
torchvision==0.7.0

pip install ptan==0.6

Collecting ptan==0.6
Using cached ptan-0.6.tar.gz (19 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.3.0 (from ptan==0.6) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2, 1.4.0, 1.5.0, 1.5.1, 1.6.0)
ERROR: No matching distribution found for torch==1.3.0 (from ptan==0.6)
Note: you may need to restart the kernel to use updated packages.

Slower speed than book indicates

I'm using PyTorch 1.0.1 with Cuda 10.1 and have 8 cores. I have the same 1080 TI as you have.
However, I get only half the speed when running your examples (or less) than what you indicate in the book. Even with all other programs closed.

Do you have any idea what's going on? Any particular reason why the performance could be so much worse? I made sure it's definitely using CUDA in the examples (otherwise it would be 100x slower for some of the problems)

Hi, how do I get numpy array froma LazyFrame to simply play the Trained nets ?

I have into Chapter 7 of your book. Its really impressive, however many details are buried within this PTAN package.
I believe I have trained a number of nets against atari games in Chapt 7, but to replay them is causing me some frustration. I tried to modify the play game code from Chapter 6. But now state = env.reset() / step returns a ptan.common.wraper.LazyFrames object. It is not obvious how to convert this back into a simple numpy array to select a Single Best Action, for playing a trained game.
state_v = torch.tensor(np.array([state], copy=False))
returns a Typerror as numpy does not understand your LazyFrames object type. It is not obvious to simply convert a single obs(LazyFrame) into a numpy object, and hence into a torch tensor to feed into DQN network.
Hoping for some help, to continue

Re-open "ExperienceReplayBuffer stores second-last transition twice"

Re-open of #25 .

a2c.py and a2c_atari.py throw error with sample run files

For example when I run a2c.py -r "runs/a2c/a2c_cartpole.ini" tons of errors pop up.

Regardless I like that you've implemented a lot of algorithms and put them here. It's very useful for someone new to RL like me, I'm mainly just reading through the code to figure out what is going on.
It's just a shame the samples don't seem to be working as intended. :(

It requires pytorch 1.3.0, but I can't download it

I have 1.3.1 on my computer, ptan this does not work with it. I tried and failed to download 1.3.0 release. So what to do with my "Deep Reinforcement Learning Hands-on" book?

Examples in chapters 8 and 9 don't run

All of them produce the following error:
Traceback (most recent call last):
File "04_dqn_noisy_net.py", line 59, in
common.setup_ignite(engine, params, exp_source, NAME, extra_metrics=('snr_1', 'snr_2'))
File "/home/asterix/ML/Deep-Reinforcement-Learning-Hands-On-Second-Edition/Chapter08/lib/common.py", line 159, in setup_ignite
ptan_ignite.EpisodeFPSHandler().attach(engine)
File "/usr/local/lib/python3.8/dist-packages/ptan-0.6-py3.8.egg/ptan/ignite.py", line 80, in attach
File "/home/asterix/.local/lib/python3.8/site-packages/ignite/engine/engine.py", line 269, in add_event_handler
if event_name not in self._allowed_events:
File "/home/asterix/.local/lib/python3.8/site-packages/ignite/engine/events.py", line 124, in eq
raise NotImplementedError
NotImplementedError

Can not install ptan

Thank you for your book, the way you described let me understand easier on RL algorithm, However, I'm stucking on chapter 07, which using your library(ptan). I install every thing you recommended as below.

conda install pytorch torchvision -c pytorch
pip install tensorboard-pytorch
pip install gym
pip install gym[atari]
pip install opencv-python

and, pip install ptan. the error need torch 1.3.0 but I couldn't the way to install torch 1.3.0
ERROR: Could not find a version that satisfies the requirement torch==1.3.0 (from ptan) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)ERROR: No matching distribution found for torch==1.3.0 (from ptan)

Is it possible to work with torch 1.5 or other way to install torch 1.3.0?

Thank you

torch.autograd Variables are deprecated

ptan/samples/rainbow/lib/common.py

Line 88 in 84d3492

states_v = Variable(torch.from_numpy(states))

atari-py is no longer supported

If you go to the github page of Atari-py then you will see the following

Status: Deprecated (don't expect bug fixes or other updates)

Notice: atari-py is fully deprecated and no future updates, bug fixes or releases will be made. Please use the official Arcade Learning Environment Python package (ale-py) instead; it is fully backwards compatible with all atari-py code.

I believe that this library should upgrade.

Outdated Experience Source

Experience Source asserts when env == gym.Env
However, newer gymnasium Env is (same but) diferent from gym.Env.

the solution is just to delete this assertion and include gym.Env typing

class ExperienceSource:
    def __init__(self, env:gym.Env,   agent, ...): <------------------
             """
             ···
             """
             #assert isinstance(env, (gym.Env, tuple, list)) <-------------------
             assert isinstance(agent, BaseAgent)
             ...

unable to install ptan

Hi,
I have pytorch 1.3.1 installed on my pc. And I cant install PTAN as it cant satisfy tour check of pytorch 1.3.0

Any suggestions or way around for this problem

Not able to run ptan.ignite when custom events inherit from enum.Enum

Whenever I use ptan.ignite and call EpisodeFPSHandler().attach(engine), I get a NotImplementedError:

File "../common.py", line 159, in setup_ignite
ptan_ignite.EpisodeFPSHandler().attach(engine)
File "../venv/lib/python3.6/site-packages/ptan/ignite.py", line 80, in attach
engine.add_event_handler(EpisodeEvents.EPISODE_COMPLETED, self)
File "../venv/lib/python3.6/site-packages/ignite/engine/engine.py", line 269, in add_event_handler
if event_name not in self._allowed_events:
File "../venv/lib/python3.6/site-packages/ignite/engine/events.py", line 124, in __eq__
raise NotImplementedError

From what I see, ptan.ignite's custom event classes, EpisodeEvents and PeriodEvents, inherit from enum.Enum and they do get registered to the ignite Engine and they are in the ignite's Engine._allowed_events, however they don't become CallableEventWithFilter objects for some reason. I was able to fix the issue by making the custom event classes inherit from EventEnum in ignite.engine:

from ignite.engine import EventEnum
class EpisodeEvents(EventEnum):
class PeriodEvents(EventEnum):

I don't know why I had the issue, I was wondering if was because of some ignite update or something, but I thought you should know.

Thanks.

Weights should not affect probabilities in PrioReplayBuffer

In samples/rainbow/05_dqn_prio_replay.py, weights are propagated to batch_weights_v and multiplied by (state_action_values - expected_state_action_values) ** 2 to calculate losses_v.

(losses_v + 1e-5) is then used to calculate probabilites.

However, according to https://arxiv.org/pdf/1511.05952.pdf (Priority Experience Replay article, see Algorithm 1), TD-error is used as priority before it is multiplied to a weight.

Is it a mistake?

DQN Speedup Windows CUDA Error

First, thanks for the great work! I tried the DQN Speedup files and was able to get 01 and 02 to run (with about 50fps on an GTX 1070), but anthing >=3 gives a Pytorch 32 Error for me as soon as I use Cuda, because apparently Pytorch on Windows can't use multiprocessing! (CUDA IPC not supported) Is there any way around this, because the speedup would be really great for experiments, and multiprocessing seems to be the most important part!

Where to find and how to restore dqn_speedup model?

I want to train DQN agent and run Pong env to see how the agent plays.

After successful completion 05_new_wrappers.py I can find only event logs files: events.out.tfevents.1556915999.ip-172-31-42-166

Where to find and how to restore dqn_speedup model?

AttributeError: module 'ignite' has no attribute 'EndOfEpisodeHandler' - still exists in ptan version 0.7

Hi Shmuma,
Thanks for putting together this library.

I'm running
Python: 3.8
ptan: 0.7
pytorch-ignite: 0.4.2

The error I'm getting is listed below. In this issue (#41) you mentioned that the EndOfEpisodeHandler issue was fixed in ptan version 0.7 but I'm running it and I still have this issue.
Any advice?

Thanks!

Chris

Traceback (most recent call last):
File "train_model.py", line 114, in
tb = common.setup_ignite(engine, exp_source, f"simple-{args.run}",
File "C:\dev\trading\drl-stock-trading-1\lib\common.py", line 76, in setup_ignite
handler = ptan.ignite.EndOfEpisodeHandler(exp_source, subsample_end_of_episode=100)
AttributeError: module 'ignite' has no attribute 'EndOfEpisodeHandler'

Which PyTorch version?

I can't instal the package. There's info in requirements that PyTorch 1.0.0 is required. I instal it and get the info in anaconda propmpt that version 1.3.0 is required. Unfortunatelly I can't instal version 1.3.0 because of some conflicts but still not sure if that would resolve the issue.

there's a typo line 497 of experience.py

ptan/ptan/experience.py

Line 497 in 84d3492

 return QLearningPreprocessor(model, target_model, use_double_dqn=False, **kwards) 

Examples under ptan/samples are outdated

It seems that the examples under ptan/samples are outdated. For instance, the code for creating agent in dqn_expreplay.py does not match the current definition

agent = ptan.agent.DQNAgent(model, action_selector), cuda=cuda_enabled)

While in the current class definition, the creator arguments are mismatched:

class DQNAgent(BaseAgent):

def __init__(self, dqn_model, action_selector, device="cpu", preprocessor=default_states_preprocessor):
    self.dqn_model = dqn_model
    self.action_selector = action_selector
    self.preprocessor = preprocessor
    self.device = device

Can we update the samples as well?
Thanks.

Multi agent support

Hi, I like your library. Great job! However, I deep dived your code and it's hard to make adaptions for multi agent scenarios. Did you ever think about it?

Hi Any chance of updating to PyTorch 1.0 ?

Hello
I am finding it very difficult to install against the previous PyTorch 0.4.0, as it is now released at 1.0. Is it possible to support PyTorch 1.0 and update the code and samples ? Do we expect it to be very different or any errors ?

Observation>:
When I download and run setup.py, it seems to do an install. However When I then attempt a python import ptan, I do have noetd I get an error
file/home/jules/anaconda3/lib/python3.6/site-packages/ptan-0.3-py3.6.egg/ptan/common/wrapper.py
ImportError:Cannot import name 'spaces' in line 6 of wrapper.py

It seems to be a problem in my environment seeing spaces within gym. Although with Anaconda I always have to set sys.path.append("/home/jules/gym") to make the gym package visible, and usable I still cannot import gym.spaces under a bash python prompt.
However if I edit in VSCode, do the imports and run the python code it appears I can import ptan and run some cartpole example without errors.
=> So may not an Issue

Pytorch version 1.7.1

I noticed that the requirements in setup.py show that the version of PyTorch must be 1.7.0, I'm wondering whether 1.7.1v will be OK?

[FR] Possibility of rendering and testing the models after training.

Things PTAN and the book can improve on.

I really like your book. The knowledge with implementation helps me understand many concepts faster. However, I cannot make progress from Chapter 09 since the introduction of PTAN. PTAN is a great library and I intended to apply it to my game environment (which is not openAI Gym). However, I had hard time doing that for the following reasons.
One problem is that you are trying to generalize the library to fit many agents and use-cases while targeting for only Open AI gym environment, which leads to un-reusability for mainstream development. I believe this hindered the popularity of this library also.
For example, the experience source implementation is overcomplicated with many options, many if else conditions and untidy __iter function. It greatly reduces the readability and readers of the book because as a reader, we need an easy to read and reuse code for other practical experiments, not a fully-packed library.
I know PTAN helps abstract repeated implementation. However, the abstraction here hurt the understanding of readers when you try to put much stuff inside the library.

Random policy within intitial replay buffer

Right now there is no way to actually fill the intitial replay buffer with random actions

dqn uses huberloss instead of mseloss

https://blog.openai.com/openai-baselines-dqn/
... In the DQN Nature paper the authors write: “We also found it helpful to clip the error term from the update [...] to be between -1 and 1.“. There are two ways to interpret this statement — clip the objective, or clip the multiplicative term when computing gradient. The former seems more natural, but it causes the gradient to be zero on transitions with high error, which leads to suboptimal performance, as found in one DQN implementation. The latter is correct and has a simple mathematical interpretation — Huber Loss. You can spot bugs like these by checking that the gradients appear as you expect —...

i am reallly sorry for emitting so many issues but i really love the repo, thanks you

01_original.py not running

I tried to run the 1st code under dqn_speedup by running the command python 01_original.py and get the following error.
Traceback (most recent call last): File "01_original.py", line 32, in <module> exp_source = ptan.experience.ExperienceSourceFirstLast(env, agent, gamma=params['gamma'], steps_count=1) File "/home/daksh/anaconda2/envs/torch/lib/python2.7/site-packages/ptan/experience.py", line 168, in __init__ super(ExperienceSourceFirstLast, self).__init__(env, agent, steps_count+1, steps_delta, vectorized=vectorized) TypeError: super() argument 1 must be type, not classobj
Please look into it and let me know.

Is it possible to support PyTorch 1.5?

pollicy functions should use torch functions instead of numpy

https://pytorch.org/docs/stable/distributions.html score functions and categorical sampling is already implemented in pytorch, using numpy should be discouraged.
policy network should output a probability distribution

ExperienceReplayBuffer stores second-last transition twice

Hi Maxim,
first of all, thank you so much for the book! It helps me a lot for my thesis!

Second, I think that the ExperienceReplayBuffer stores the second-last transition twice, which could bias the training if an environment only has a few steps (like mine).
Maybe I have overlooked something, but this is my minimal example showing the described behaviour:

import ptan
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import gym

EPSILON_START = 1.0
GAMMA = 1.0
REWARD_STEPS = 1
LEARNING_RATE = 0.001
MAX_STEPS = 20
REPLAY_SIZE = 10
MAX_STEPS_PER_EPISODE = 3
device = torch.device("cpu")

class Environment(gym.Env):
    def __init__(self):
        self.state = 0
        self.observation_space = gym.spaces.Discrete(5)
        self.action_space = gym.spaces.Discrete(2)
    
    def reset(self):
        self.state = 0
        return self.state
    
    def step(self, action):
        self.state += 1
        if self.state == 4:
            done = True
        else:
            done = False
        reward = self.state
        return self.state, reward, done, None

class DQN(nn.Module):
    def __init__(self, input_shape, n_actions):
        super(DQN, self).__init__()
        self.n_actions = n_actions
        
    def forward(self, x):
        return torch.rand(1,self.n_actions)

env = Environment()
net = DQN(env.observation_space.shape, env.action_space.n).to(device)
selector = ptan.actions.EpsilonGreedyActionSelector(EPSILON_START)
agent = ptan.agent.DQNAgent(net, selector, device=device)
exp_source = ptan.experience.ExperienceSourceFirstLast(env, agent, GAMMA, steps_count=REWARD_STEPS)
buffer = ptan.experience.ExperienceReplayBuffer(exp_source, REPLAY_SIZE)

step_idx = 0

while step_idx < MAX_STEPS:
    step_idx += 1
    buffer.populate(1)
    new_rewards = exp_source.pop_rewards_steps()
    
    if new_rewards:
        print("episode over: step {}: (total_reward, steps) = {}".format(step_idx, new_rewards[0]))
        
print()
print(*buffer.buffer, sep='\n')

The output is:

episode over: step 6: (total_reward, steps) = (10.0, 4)
episode over: step 11: (total_reward, steps) = (10.0, 4)
episode over: step 16: (total_reward, steps) = (10.0, 4)

ExperienceFirstLast(state=0, action=0, reward=1.0, last_state=1)
ExperienceFirstLast(state=1, action=0, reward=2.0, last_state=2)
ExperienceFirstLast(state=2, action=0, reward=3.0, last_state=3)
ExperienceFirstLast(state=2, action=0, reward=3.0, last_state=3)
ExperienceFirstLast(state=3, action=0, reward=4.0, last_state=None)
ExperienceFirstLast(state=0, action=0, reward=1.0, last_state=1)
ExperienceFirstLast(state=1, action=0, reward=2.0, last_state=2)
ExperienceFirstLast(state=2, action=0, reward=3.0, last_state=3)
ExperienceFirstLast(state=2, action=0, reward=3.0, last_state=3)
ExperienceFirstLast(state=3, action=1, reward=4.0, last_state=None)

Notice how the transition from state 2 to 3 is stored twice each time. I used the ptan version that you can get via pip today. Could you have a look into this?
Best regards!

drop of f/s in dqn_speedup

Hi! I tried running the code and f/s dropped significantly after 10 episodes. Is it normal?

python3 05_new_wrappers.py
WARN: <class ‘lib.atari_wrappers.FrameStack’> doesn’t implement ‘reset’ method, but it implements deprecated ‘_reset’ method.
935: done 1 games, mean reward -19.000, speed 281.93 f/s, eps 0.99
1694: done 2 games, mean reward -20.000, speed 334.75 f/s, eps 0.98
2502: done 3 games, mean reward -20.333, speed 338.90 f/s, eps 0.97
3441: done 4 games, mean reward -20.500, speed 335.34 f/s, eps 0.97
4537: done 5 games, mean reward -20.200, speed 289.07 f/s, eps 0.95
5531: done 6 games, mean reward -19.833, speed 330.80 f/s, eps 0.94
6649: done 7 games, mean reward -19.714, speed 335.40 f/s, eps 0.93
7648: done 8 games, mean reward -19.625, speed 334.24 f/s, eps 0.92
8427: done 9 games, mean reward -19.778, speed 331.16 f/s, eps 0.92
9462: done 10 games, mean reward -19.700, speed 333.20 f/s, eps 0.91
10399: done 11 games, mean reward -19.818, speed 40.25 f/s, eps 0.90
11157: done 12 games, mean reward -19.917, speed 18.19 f/s, eps 0.89
12234: done 13 games, mean reward -19.769, speed 17.30 f/s, eps 0.88
13305: done 14 games, mean reward -19.714, speed 16.67 f/s, eps 0.87
14345: done 15 games, mean reward -19.733, speed 16.37 f/s, eps 0.86
15368: done 16 games, mean reward -19.688, speed 16.10 f/s, eps 0.85
16308: done 17 games, mean reward -19.706, speed 15.96 f/s, eps 0.84
17303: done 18 games, mean reward -19.667, speed 15.72 f/s, eps 0.83
18406: done 19 games, mean reward -19.632, speed 15.95 f/s, eps 0.82
19307: done 20 games, mean reward -19.700, speed 15.08 f/s, eps 0.81
20146: done 21 games, mean reward -19.714, speed 16.20 f/s, eps 0.80
21251: done 22 games, mean reward -19.727, speed 16.02 f/s, eps 0.79
22008: done 23 games, mean reward -19.783, speed 15.60 f/s, eps 0.78
22968: done 24 games, mean reward -19.750, speed 15.50 f/s, eps 0.77
23731: done 25 games, mean reward -19.800, speed 16.23 f/s, eps 0.76
24857: done 26 games, mean reward -19.769, speed 16.67 f/s, eps 0.75
25617: done 27 games, mean reward -19.815, speed 16.48 f/s, eps 0.74
26535: done 28 games, mean reward -19.857, speed 16.76 f/s, eps 0.73
27413: done 29 games, mean reward -19.897, speed 16.02 f/s, eps 0.73
28251: done 30 games, mean reward -19.900, speed 16.86 f/s, eps 0.72
29279: done 31 games, mean reward -19.871, speed 15.92 f/s, eps 0.71

ExperienceSourceFirstLast

Can someone explain the main difference between ExperienceSourceFirstLast and ExperienceSource? Are we still storing every incoming state?

Code run time

Is there a way to run the training and evaluation for a certain number of time-steps? Currently the PPO examples in both the first and second edition run forever.