
firedup's Issues

How can I easily modify the code?

Since I have to install this package with pip install -e . before it works, modifying the code is inconvenient. Can I use it without installing it?
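
For what it's worth, pip install -e . is an editable install, so edits to the cloned source are picked up the next time Python starts, without reinstalling. If you would rather skip installation entirely, one option is to put the clone on the import path yourself. A minimal sketch, assuming the package directory sits at the top level of the clone; the helper name add_repo_to_path and the "." path are placeholders, not part of this repo:

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put a local clone at the front of sys.path so its packages import
    straight from source, with no pip install step."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# "." is a placeholder; point this at wherever the clone actually lives.
repo = add_repo_to_path(".")
```

Setting the PYTHONPATH environment variable to the clone's directory has the same effect without touching any code.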

Performance Differences between TensorFlow and PyTorch

I cloned your repo, ran the VPG algorithm, and compared its performance with the TensorFlow version. I averaged 5 runs to account for the random seed and saw some interesting results:

TensorFlow: Avg Episode Return 81
PyTorch: Avg Episode Return 31
Why do you think this might be the case?

Disclaimer: I haven't read your code thoroughly, so there might be some very small mistake. But is the difference in performance of RL algorithms between TensorFlow and PyTorch really this substantial?
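
With only 5 runs per framework, it can help to report the spread across seeds alongside the mean before reading much into the gap. A minimal sketch using only the standard library; the function name summarize and the numbers below are hypothetical placeholders, not results from either implementation:

```python
from statistics import mean, stdev


def summarize(returns):
    """Return (mean, sample standard deviation) of per-seed average returns."""
    return mean(returns), stdev(returns)


# Placeholder values only; substitute the real per-seed returns from each run.
example_returns = [10.0, 12.0, 14.0]
avg, spread = summarize(example_returns)
print(f"mean={avg:.1f} std={spread:.1f}")
```

If the two frameworks' means differ by much more than the per-seed spread, the gap is less likely to be seed noise.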

ddpg torch error

Hi @kashif, thanks for making this available!

The DDPG implementation currently gives me the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am on torch version

torch                             1.6.0
torchvision                       0.5.0
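
A common way to hit this error on torch 1.5 and later is to build the graphs for both the policy loss and the Q loss first, step one optimizer (which updates weights in place), and only then call backward() on the other loss. Whether that is the cause here is an assumption; the sketch below reproduces the pattern with toy linear layers (hypothetical sizes, not this repo's networks) and shows an ordering that avoids it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for the actor and critic (hypothetical, not the repo's models).
actor = nn.Linear(3, 1)
critic = nn.Linear(3 + 1, 1)
pi_opt = torch.optim.SGD(actor.parameters(), lr=0.1)
q_opt = torch.optim.SGD(critic.parameters(), lr=0.1)

obs = torch.randn(8, 3)
act = torch.randn(8, 1)
target = torch.randn(8, 1)


def q_value(o, a):
    return critic(torch.cat([o, a], dim=1))


def failing_update():
    """Build both graphs, then step the critic before pi_loss.backward()."""
    pi_loss = -q_value(obs, actor(obs)).mean()
    q_loss = ((q_value(obs, act) - target) ** 2).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()        # in-place update of the critic weights
    pi_opt.zero_grad()
    pi_loss.backward()  # this graph saved the pre-step critic weights
    pi_opt.step()


def safe_update():
    """Build each loss right before its own backward() and step()."""
    q_loss = ((q_value(obs, act) - target) ** 2).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()
    pi_loss = -q_value(obs, actor(obs)).mean()
    pi_opt.zero_grad()
    pi_loss.backward()  # also fills critic grads; cleared by the next q_opt.zero_grad()
    pi_opt.step()
    return q_loss.item(), pi_loss.item()


try:
    failing_update()
    raised = False
except RuntimeError:
    raised = True

q_loss_value, pi_loss_value = safe_update()
```

As the error message suggests, torch.autograd.set_detect_anomaly(True) can help locate the forward operation whose saved tensor was modified.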

Are there any requirements for an env to work with this repo?

I tried this repo with a simple env:

import random

import gym
from gym import spaces

class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = random.random() <= 0.5  # end the episode with probability 0.5
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass

and used it with the policy gradient agent as follows:

    env = SimpleEnv
    env.seed(0)
    ac_kwargs = dict(hidden_sizes=(16,))
    agent = vpg(env, ac_kwargs=ac_kwargs)
    episode_count = 100
    reward = 0
    done = False

    for i in range(episode_count):
        ob = env.reset()
        while True:
            print(done)
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break

But when I run this I get:
RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940

This seems to be caused by a mismatch between the observation space and the Actor-Critic network. But it works well with the envs provided by gym. Did I miss something here?
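
The shapes in the error are consistent with a first layer sized from observation_space.shape[0] (4 inputs) receiving a flattened 4 x 4 observation (16 values). That the network is built this way is an assumption read off the error message, not confirmed from the code. A minimal sketch of the arithmetic, with hypothetical helper names flat_dim and flatten:

```python
import math


def flat_dim(shape):
    """Number of scalars in one observation of the given shape."""
    return math.prod(shape)


def flatten(nested):
    """Recursively flatten a nested list of numbers into one flat list."""
    if not isinstance(nested, (list, tuple)):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


obs_shape = (4, 4)            # shape used by SimpleEnv.observation_space
first_axis = obs_shape[0]     # 4: input size if the network only reads shape[0]
total = flat_dim(obs_shape)   # 16: number of values in a flattened observation

obs = [[0.0] * 4 for _ in range(4)]
flat_obs = flatten(obs)
print(first_axis, total, len(flat_obs))
```

Under that assumption, declaring the space as a 1-D Box with shape=(16,) and returning flat observations from reset() and step() would make the two sizes agree. The classic-control envs that ship with gym already use 1-D observation spaces, which would explain why they work.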
