kashif / firedup
Clone of OpenAI's Spinning Up in PyTorch
License: MIT License
Since I have to install this package with pip install -e . before it works, modifying the code is inconvenient. Can I use it without installing?
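For what it's worth, an editable install (pip install -e .) already picks up source edits without reinstalling. But if you want to skip installation entirely, prepending the checkout to sys.path is a common workaround (the path below is an assumption; adjust it to your clone):

```python
import os
import sys

# Hypothetical location of the cloned repo; adjust to your checkout.
repo_root = os.path.abspath("firedup")
sys.path.insert(0, repo_root)  # makes `import fireup` resolve against the checkout
```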
I cloned your repo, ran the VPG algorithm, and compared its performance with the TensorFlow version. I averaged over 5 runs to account for random seeds and saw some interesting results:
TensorFlow: average episode return 81
PyTorch: average episode return 31
Why do you think this might be the case?
Disclaimer: I haven't read your code thoroughly, so there might be some very small mistake. But is the performance difference between RL algorithms in TensorFlow and PyTorch really this substantial?
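Averaging over seeds like this can be sketched as follows (run_fn is a hypothetical wrapper that trains once and returns the final average episode return):

```python
import numpy as np

def average_over_seeds(run_fn, seeds=(0, 1, 2, 3, 4)):
    """Run one training job per seed and average the returned metric."""
    results = [run_fn(seed=s) for s in seeds]
    return float(np.mean(results))

# Example with a stand-in run function:
fake_run = lambda seed: 30.0 + seed  # placeholder for a real training run
avg = average_over_seeds(fake_run)   # mean of 30..34 -> 32.0
```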
How do I utilize a GPU for training?
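The repo doesn't seem to expose a device flag, but the standard PyTorch pattern is to move both the networks and the input tensors to the same device. A generic sketch, not firedup-specific:

```python
import torch
import torch.nn as nn

# Fall back to CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)).to(device)
obs = torch.randn(1, 4, device=device)  # inputs must live on the same device
out = model(obs)
```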
firedup/fireup/algos/sac/sac.py, line 280 (commit 7011b0c):
If you are trying to implement Spinning Up's version, you need to use q1_pi
(https://github.com/openai/spinningup/blob/master/spinup/algos/sac/sac.py#L176)
# Soft actor-critic losses
pi_loss = (alpha * logp_pi - q1_pi).mean()
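The practical difference between the two policy-loss variants can be sketched with toy tensors (the values below are arbitrary stand-ins for critic outputs and log-probs):

```python
import torch

# Toy stand-ins for the critic outputs and policy log-probs.
q1_pi = torch.tensor([1.0, 2.0])
q2_pi = torch.tensor([0.5, 3.0])
logp_pi = torch.tensor([-1.0, -0.5])
alpha = 0.2

# Spinning Up's SAC policy loss uses q1_pi directly:
pi_loss_q1 = (alpha * logp_pi - q1_pi).mean()

# Later SAC variants instead take the minimum over both critics:
min_q_pi = torch.min(q1_pi, q2_pi)
pi_loss_min = (alpha * logp_pi - min_q_pi).mean()
```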
When calculating the scaled log_std in the SAC policy, you rescale log_std + 1 into the range [LOG_STD_MIN, LOG_STD_MAX]. Is this because the range of the tanh function is [-1, 1]?
Is it really necessary? Wouldn't the scaling limit the output to [LOG_STD_MIN, LOG_STD_MAX] even without adding 1?
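For reference, the rescaling in question looks roughly like this (the constants match Spinning Up's defaults; the function name is mine):

```python
import torch

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0

def squash_log_std(raw):
    """Map an unbounded network output into [LOG_STD_MIN, LOG_STD_MAX].

    tanh lands in [-1, 1]; adding 1 shifts that to [0, 2], and the affine
    rescale below maps it onto the desired interval.
    """
    t = torch.tanh(raw)
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (t + 1)
```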
Hi, I ran this PyTorch version of SAC on MuJoCo, and it took almost three times longer than the original TF code. Why did this happen? Is there any way to improve the speed?
Hi @kashif, thanks for making this available!
The DDPG implementation currently gives me the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I am on torch version
torch 1.6.0
torchvision 0.5.0
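A common cause of this error on torch >= 1.5 is reusing a critic output in the policy loss after the critic optimizer has already stepped, since optimizer.step() mutates the critic's weights in place. A minimal sketch of the safe ordering, with hypothetical names standing in for the DDPG update:

```python
import torch
import torch.nn as nn

critic = nn.Linear(3, 1)
actor = nn.Linear(3, 3)
q_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
pi_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

obs = torch.randn(8, 3)

# Critic update first.
q_loss = critic(obs).pow(2).mean()
q_opt.zero_grad()
q_loss.backward()
q_opt.step()

# Policy update: recompute the critic's output *after* its weights changed,
# instead of reusing a tensor from the pre-step graph (reusing one raises the
# "modified by an inplace operation" RuntimeError on newer torch).
pi_loss = -critic(actor(obs)).mean()
pi_opt.zero_grad()
pi_loss.backward()
pi_opt.step()
```

torch.autograd.set_detect_anomaly(True), as the error hint suggests, will point at the exact operation if your case is different.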
I tried this repo with a simple env:

import random

import gym
from gym import spaces

class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = False if random.random() > 0.5 else True
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass
and use it with the policy gradient agent as:

env = SimpleEnv()
env.seed(0)
ac_kwargs = dict(hidden_sizes=(16,))
agent = vpg(env, ac_kwargs=ac_kwargs)

episode_count = 100
reward = 0
done = False

for i in range(episode_count):
    ob = env.reset()
    while True:
        action = agent.act(ob, reward, done)
        ob, reward, done, _ = env.step(action)
        if done:
            break
But when I run this I get:
RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940
This seems to be caused by a mismatch between the observation space and the actor-critic network, but it works fine with the envs provided by gym. Did I miss something here?
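One likely culprit (an assumption, since I haven't traced the MLP construction): a Box with shape (4, 4) yields 2-D observations of 16 values, while an MLP built from observation_space.shape[0] expects 4 inputs, which matches the m1: [1 x 16] vs m2: [4 x 16] shapes in the error. Flattening the observation (or declaring the space with shape=(16,)) makes the sizes agree:

```python
import numpy as np

# Stand-in for spaces.Box(low=0, high=2, shape=(4, 4)).sample()
ob = np.random.uniform(0, 2, size=(4, 4))

flat_ob = ob.reshape(-1)  # 16-dim vector a flat-input MLP can consume
```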
Hi, why does SAC use pi = policy.rsample() while VPG uses pi = policy.sample()? Thanks.
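The distinction is the reparameterization trick: rsample() produces a sample that gradients can flow through, while sample() is detached from the graph. A small torch.distributions illustration:

```python
import torch
from torch.distributions import Normal

mu = torch.zeros(3, requires_grad=True)
dist = Normal(mu, torch.ones(3))

a_r = dist.rsample()  # reparameterized draw: mu + sigma * eps, differentiable w.r.t. mu
a_s = dist.sample()   # plain draw: no gradient path back to mu

# SAC backpropagates through the sampled action into the policy (pathwise
# gradient), so it needs rsample(); VPG uses the score-function (REINFORCE)
# estimator via log_prob(), so a non-differentiable sample() is enough.
```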