kashif / firedup
Clone of OpenAI's Spinning Up in PyTorch
License: MIT License
Since I have to install this package with pip install -e . before it works, modifying the code is inconvenient. Can I use it without installing?
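For what it's worth, an editable install (pip install -e .) already picks up source edits without reinstalling. But if you want to skip installation entirely, prepending the checkout to sys.path is a common workaround (the path below is an assumption; adjust it to your clone):

```python
import os
import sys

# Hypothetical location of the cloned repo; adjust to your checkout.
repo_root = os.path.abspath("firedup")
sys.path.insert(0, repo_root)  # makes `import fireup` resolve against the checkout
```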
I cloned your repo, ran the VPG algorithm, and compared its performance with the TensorFlow version. I averaged over 5 runs to account for random seeds and saw some interesting results:
TensorFlow: average episode return 81
PyTorch: average episode return 31
Why do you think this might be the case?
Disclaimer: I haven't read your code thoroughly, so there might be some very small mistake. But is the performance difference between RL algorithms in TensorFlow and PyTorch really this substantial?
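Averaging over seeds like this can be sketched as follows (run_fn is a hypothetical wrapper that trains once and returns the final average episode return):

```python
import numpy as np

def average_over_seeds(run_fn, seeds=(0, 1, 2, 3, 4)):
    """Run one training job per seed and average the returned metric."""
    results = [run_fn(seed=s) for s in seeds]
    return float(np.mean(results))

# Example with a stand-in run function:
fake_run = lambda seed: 30.0 + seed  # placeholder for a real training run
avg = average_over_seeds(fake_run)   # mean of 30..34 -> 32.0
```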
How do I utilize a GPU for training?
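The repo doesn't seem to expose a device flag, but the standard PyTorch pattern is to move both the networks and the input tensors to the same device. A generic sketch, not firedup-specific:

```python
import torch
import torch.nn as nn

# Fall back to CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
obs = torch.randn(1, 4, device=device)  # inputs must live on the same device
out = model(obs)
```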
firedup/fireup/algos/sac/sac.py, line 280 (commit 7011b0c):
If you are trying to implement Spinning Up's version, you need to use q1_pi
(https://github.com/openai/spinningup/blob/master/spinup/algos/sac/sac.py#L176)
# Soft actor-critic losses
pi_loss = (alpha * logp_pi - q1_pi).mean()
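The practical difference between the two policy-loss variants can be sketched with toy tensors (the values below are arbitrary stand-ins for critic outputs and log-probs):

```python
import torch

# Toy stand-ins for the critic outputs and policy log-probs.
q1_pi = torch.tensor([1.0, 2.0])
q2_pi = torch.tensor([0.5, 3.0])
logp_pi = torch.tensor([-1.0, -0.5])
alpha = 0.2

# Spinning Up's SAC policy loss uses q1_pi directly:
pi_loss_q1 = (alpha * logp_pi - q1_pi).mean()

# Later SAC variants instead take the minimum over both critics:
min_q_pi = torch.min(q1_pi, q2_pi)
pi_loss_min = (alpha * logp_pi - min_q_pi).mean()
```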
When calculating the scaled log_std in the SAC policy, you rescale log_std + 1 into the range [LOG_STD_MIN, LOG_STD_MAX]. Is this because the range of the tanh function is [-1, 1]?
Is it really necessary? Wouldn't the scaling limit the output to [LOG_STD_MIN, LOG_STD_MAX] even without adding 1?
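For reference, the rescaling in question looks roughly like this (the constants match Spinning Up's defaults; the function name is mine):

```python
import torch

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def squash_log_std(raw):
    """Map an unbounded network output into [LOG_STD_MIN, LOG_STD_MAX].

    tanh lands in [-1, 1]; adding 1 shifts that to [0, 2], and the affine
    rescale below maps it onto the desired interval.
    """
    t = torch.tanh(raw)
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (t + 1)
```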
Hi, I ran this PyTorch version of SAC on MuJoCo, and it took almost three times longer than the original TF code. Why did this happen? Is there any way to improve the speed?
Hi @kashif, thanks for making this available!
The DDPG implementation currently gives me the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I am on torch version
torch 1.6.0
torchvision 0.5.0
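A common cause of this error on torch >= 1.5 is reusing a critic output in the policy loss after the critic optimizer has already stepped, since optimizer.step() mutates the critic's weights in place. A minimal sketch of the safe ordering, with hypothetical names standing in for the DDPG update:

```python
import torch
import torch.nn as nn

critic = nn.Linear(3, 1)
actor = nn.Linear(3, 3)
q_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(8, 3)

# Critic update first.
q_loss = critic(obs).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

# Policy update: recompute the critic's output *after* its weights changed,
# instead of reusing a tensor from the pre-step graph (reusing one raises the
# "modified by an inplace operation" RuntimeError on newer torch).
pi_loss = -critic(actor(obs)).mean()
pi_opt.zero_grad()
pi_loss.backward()
pi_opt.step()
```

torch.autograd.set_detect_anomaly(True), as the error hint suggests, will point at the exact operation if your case is different.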
I tried this repo with a simple env:

import random

import gym
from gym import spaces

class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = False if random.random() > 0.5 else True
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass
and use it with the policy gradient agent as:

env = SimpleEnv()
env.seed(0)
ac_kwargs = dict(hidden_sizes=(16,))
agent = vpg(env, ac_kwargs=ac_kwargs)

episode_count = 100
reward = 0
done = False

for i in range(episode_count):
    ob = env.reset()
    while True:
        action = agent.act(ob, reward, done)
        ob, reward, done, _ = env.step(action)
        if done:
            break
But when I run this I get:
RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940
This seems to be caused by a mismatch between the observation space and the actor-critic network, but it works fine with the envs provided by gym. Did I miss something here?
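One likely culprit (an assumption, since I haven't traced the MLP construction): a Box with shape (4, 4) yields 2-D observations of 16 values, while an MLP built from observation_space.shape[0] expects 4 inputs, which matches the m1: [1 x 16] vs m2: [4 x 16] shapes in the error. Flattening the observation (or declaring the space with shape=(16,)) makes the sizes agree:

```python
import numpy as np

# Stand-in for spaces.Box(low=0, high=2, shape=(4, 4)).sample()
ob = np.random.uniform(0, 2, size=(4, 4))

flat_ob = ob.reshape(-1)  # 16-dim vector a flat-input MLP can consume
```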
Hi, why does SAC use pi = policy.rsample() while VPG uses pi = policy.sample()? Thanks.
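The distinction is the reparameterization trick: rsample() produces a sample that gradients can flow through, while sample() is detached from the graph. A small torch.distributions illustration:

```python
import torch
from torch.distributions import Normal

mu = torch.zeros(3, requires_grad=True)
dist = Normal(mu, torch.ones(3))

a_r = dist.rsample()  # reparameterized draw: mu + sigma * eps, differentiable w.r.t. mu
a_s = dist.sample()   # plain draw: no gradient path back to mu

# SAC backpropagates through the sampled action into the policy (pathwise
# gradient), so it needs rsample(); VPG uses the score-function (REINFORCE)
# estimator via log_prob(), so a non-differentiable sample() is enough.
```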