random-network-distillation-pytorch's People

Contributors

jcwleo, kslazarev

random-network-distillation-pytorch's Issues

How long did it take you to get 6100?

Hello,

I also built an RND model, but I am stuck at around 2500... After how many total steps does the agent improve further? I am not sure whether this is due to a bug in my code, so I want to check with you. Thank you.

Generalized Advantage Estimator problem

Hello, I have a problem in utils.py.

In make_train_data:

if use_gae:
    gae = np.zeros_like([num_worker, ])
    for t in range(num_step - 1, -1, -1):
        delta = reward[:, t] + gamma * value[:, t + 1] * (1 - done[:, t]) - value[:, t]
        gae = delta + gamma * lam * (1 - done[:, t]) * gae

        discounted_return[:, t] = gae + value[:, t]

    # For Actor
    adv = discounted_return - value[:, :-1]

I am confused. As I understand it, GAE is

delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}

I have no idea why V(t) needs to be added:

discounted_return[:, t] = gae + value[:, t]

Can you explain what I am missing? Thanks.
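
For readers with the same question: adding V(s_t) back turns the advantage estimate into the lambda-return, which is the critic's regression target; the advantage used by the actor is recovered right afterwards by subtracting value again. A minimal numpy sketch of that relationship (toy shapes and numbers, not the repo's actual rollout buffers):

import numpy as np

# Toy rollout: 1 worker, 4 steps; numbers are illustrative only.
gamma, lam = 0.999, 0.95
reward = np.array([[0., 0., 1., 0.]])          # shape (num_worker, num_step)
done = np.zeros((1, 4))
value = np.array([[0.5, 0.6, 0.7, 0.4, 0.3]])  # num_step + 1 values (bootstrap appended)

num_worker, num_step = reward.shape
discounted_return = np.zeros((num_worker, num_step))

gae = np.zeros(num_worker)
for t in range(num_step - 1, -1, -1):
    delta = reward[:, t] + gamma * value[:, t + 1] * (1 - done[:, t]) - value[:, t]
    gae = delta + gamma * lam * (1 - done[:, t]) * gae
    # GAE advantage plus the baseline V(s_t) equals the lambda-return,
    # i.e. the target the critic is regressed towards.
    discounted_return[:, t] = gae + value[:, t]

# The actor's advantage subtracts V(s_t) again, so the "+ value[:, t]" above
# only affects the critic's target, not the policy gradient.
adv = discounted_return - value[:, :-1]
print(discounted_return)
print(adv)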

training error

I see that the code loads pretrained weights during training. I tried to train without the pretrained weights, but the result looks wrong. Here is my result (training-curve screenshot omitted).

Reward converges at 4600

Thanks for the code; this is much more understandable than the original one.

However, the agent I trained only reaches a maximum score of 4600 and stays at that level no matter how long I train.

Note that OpenAI reports a score of around 10,000.
Am I missing something?

input_size in model.py?

I'm working with a different env (other than atari or mario) and I want to change the input shape to the CNN. It seems like self.input_size is ignored? Do you have any explanation for the math going on when setting up the network (the parameters for each layer)? When I change the size from (84, 84) to anything else, I get size mismatch errors at the linear layer.

Thanks!
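
For anyone with the same question: the Linear layer's input size (7 * 7 * 64 = 3136) is simply whatever comes out of the three conv layers for an 84x84 frame, so changing the input resolution requires recomputing that flattened size. A small sketch of the arithmetic; the kernel/stride values mirror the ones used in model.py, but treat this as an illustration rather than a drop-in utility:

def conv2d_out(size, kernel_size, stride, padding=0):
    # Standard conv output-size formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

def flattened_feature_size(h, w):
    # Conv stack applied to the 84x84 Atari input: 8x8 stride 4 -> 4x4 stride 2 -> 3x3 stride 1
    for k, s in [(8, 4), (4, 2), (3, 1)]:
        h, w = conv2d_out(h, k, s), conv2d_out(w, k, s)
    return h * w * 64  # 64 channels after the last conv

print(flattened_feature_size(84, 84))  # 3136 == 7 * 7 * 64, what the Linear layer expects
print(flattened_feature_size(96, 96))  # a different input gives a different size -> size mismatch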

Mario eval is slow

Hi @jcwleo,

Your implementation is amazing. I was looking for a simple but powerful implementation; I modified it a bit and trained it with SuperMarioBros. However, when I evaluate my agent (using eval.py), the frame rate is very slow.

Do you know why?

Thanks,
Have a nice day,

Intrinsic reward calculation, sum or mean?

Hi!

I have a question related to how the intrinsic rewards are calculated.
Why do you use sum(1) instead of mean(1)?

intrinsic_reward = (target_next_feature - predict_next_feature).pow(2).sum(1) / 2

That would calculate the sum over the 512 output features, which is different from calculating the mean over those outputs.

In the original TensorFlow release they use reduce_mean, and I'm a little bit confused.
https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241

Hope you could clear me,
Thank you in advance
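
To make the difference concrete: for a fixed feature dimension the two reductions differ only by a constant factor, and since the intrinsic rewards are later divided by a running standard-deviation estimate, a constant scale largely cancels out. A toy comparison, not taken from either codebase:

import torch

torch.manual_seed(0)
target_next_feature = torch.randn(8, 512)   # stand-in for target-network features
predict_next_feature = torch.randn(8, 512)  # stand-in for predictor-network features

sq_err = (target_next_feature - predict_next_feature).pow(2)
reward_sum = sq_err.sum(1) / 2   # this repo's formulation
reward_mean = sq_err.mean(1)     # mean over features, as in the TF reduce_mean call

# Identical up to a constant factor: sum/2 == mean * 512 / 2 == mean * 256.
print(torch.allclose(reward_sum, reward_mean * 256))  # True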

Extrinsic reward clipping

In the RND paper (page 15) it is mentioned that extrinsic rewards are clipped to [-1, 1].
But the official RND code clips extrinsic rewards with the ClipRewardEnv wrapper in atari_wrappers.py, which does:

"""Bin reward to {+1, 0, -1} by its sign."""
        return float(np.sign(reward))

I believe the implementation and the explanation in the paper are slightly different.
In your implementation (jcwleo) you are clipping by doing:

        total_reward = total_reward.reshape([num_step, num_env_workers]).transpose().clip(-1, 1)

I believe this is different from the official implementation. Does anyone have an explanation for this discrepancy, and which version should be used?
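
For what it's worth, the two operations only disagree on rewards whose magnitude lies strictly between 0 and 1; on the integer rewards typical of Atari games they give the same result. A small standalone comparison (not taken from either repo):

import numpy as np

rewards = np.array([-3.0, -0.4, 0.0, 0.4, 1.0, 100.0])

sign_binned = np.sign(rewards)         # official ClipRewardEnv: bin to {-1, 0, +1}
clipped = np.clip(rewards, -1.0, 1.0)  # clipping as described in the paper / this repo

print(sign_binned)  # [-1. -1.  0.  1.  1.  1.]
print(clipped)      # [-1.  -0.4  0.   0.4  1.   1. ]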

Input shape is not correct in Linear-8 layer in CnnActorCriticNetwork feature model

Problem: the input shape is not correct in the Linear-8 layer of the CnnActorCriticNetwork feature model. Or maybe there is a typo in the Conv2d-5 layer's kernel_size: in the predictor and target models kernel_size == 3, but in the feature model kernel_size == 4.

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/model.py#L97-L106

Feature model summary: input_shape (4,84,84)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [4, 32, 20, 20]           8,224
              ReLU-2            [4, 32, 20, 20]               0
            Conv2d-3              [4, 64, 9, 9]          32,832
              ReLU-4              [4, 64, 9, 9]               0
            Conv2d-5              [4, 64, 7, 7]          36,928
              ReLU-6              [4, 64, 7, 7]               0
           Flatten-7                  [4, 3136]               0
            Linear-8                   [4, 256]         803,072
              ReLU-9                   [4, 256]               0
           Linear-10                   [4, 448]         115,136
             ReLU-11                   [4, 448]               0
================================================================
Total params: 996,192
Trainable params: 996,192
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.43
Forward/backward pass size (MB): 1.43
Params size (MB): 3.80
Estimated Total Size (MB): 5.66
----------------------------------------------------------------

Traceback:

Traceback (most recent call last):
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1689, in <module>
    main()
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1683, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1083, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "train.py", line 274, in <module>
    main()
  File "train.py", line 152, in main
    actions, value_ext, value_int, policy = agent.get_action(np.float32(states) / 255.)
  File "/Users/kslazarev/PycharmProjects/random-network-distillation-pytorch/agents.py", line 59, in get_action
    policy, value_ext, value_int = self.model(state)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/PycharmProjects/random-network-distillation-pytorch/model.py", line 158, in forward
    x = self.feature(state)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/functional.py", line 1352, in linear
    ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [16 x 2304], m2: [3136 x 256] at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/wheel_3.6/pytorch/aten/src/TH/generic/THTensorMath.cpp:940
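
The mismatch in the traceback (2304 vs. 3136) is consistent with the kernel_size discrepancy: with kernel_size=4 in the last conv layer the spatial output is 6x6 (64 * 6 * 6 = 2304), while the following Linear layer expects the 7x7 output (64 * 7 * 7 = 3136) that kernel_size=3 produces. A quick standalone check, independent of the repo's code:

import torch
import torch.nn as nn

x = torch.zeros(1, 4, 84, 84)  # stacked-frame Atari input

def flatten_dim(last_kernel):
    convs = nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4),
        nn.Conv2d(32, 64, kernel_size=4, stride=2),
        nn.Conv2d(64, 64, kernel_size=last_kernel, stride=1),
    )
    return convs(x).flatten(1).shape[1]

print(flatten_dim(3))  # 3136 -> matches Linear(3136, 256)
print(flatten_dim(4))  # 2304 -> produces the "m1: [... x 2304]" size mismatch above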

Use several GPUs if they exist

The nvidia-smi log below shows that only one of the two GPUs is being used during training:

timestamp, pci.bus_id, temperature.gpu, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2019/01/11 15:41:41.215, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:41.216, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:42.217, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:42.218, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:43.218, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:43.219, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:44.220, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:44.220, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:45.221, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:45.221, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:46.222, 00000000:01:00.0, 60, 79 %, 1 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:46.222, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:47.223, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:47.223, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:48.224, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:48.225, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:49.225, 00000000:01:00.0, 61, 50 %, 1 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:49.226, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:50.226, 00000000:01:00.0, 65, 97 %, 51 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:50.227, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:51.227, 00000000:01:00.0, 65, 96 %, 53 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:51.227, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:52.228, 00000000:01:00.0, 61, 3 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:52.229, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:53.229, 00000000:01:00.0, 61, 3 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
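
One possible direction for this request is to wrap the networks with torch.nn.DataParallel so that forward/backward passes are split across all visible GPUs. A minimal sketch; the attribute names in the comments (agent.model, agent.rnd) are assumptions about this repo's structure, not a tested patch:

import torch
import torch.nn as nn

def to_multi_gpu(module):
    # Wrap with DataParallel only when more than one GPU is visible.
    if torch.cuda.device_count() > 1:
        module = nn.DataParallel(module)
    return module.cuda() if torch.cuda.is_available() else module

# e.g. after the agent is constructed in train.py (hypothetical attribute names):
# agent.model = to_multi_gpu(agent.model)
# agent.rnd = to_multi_gpu(agent.rnd)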

I tried to train the system but get an error

When I try to train the system from scratch by removing the pretrained models, it complains that it cannot find the models.

python train.py
{'OPTIONS': {'envtype': '[atari, mario]', 'trainmethod': 'RND', 'envid': 'MontezumaRevengeNoFrameskip-v4', 'maxstepperepisode': '4500', 'extcoef': '2.', 'learningrate': '1e-4', 'numenv': '2', 'numstep': '128', 'gamma': '0.999', 'intgamma': '0.99', 'lambda': '0.95', 'stableeps': '1e-8', 'statestacksize': '4', 'preprocheight': '84', 'proprocwidth': '84', 'usegae': 'True', 'usegpu': 'True', 'usenorm': 'False', 'usenoisynet': 'False', 'clipgradnorm': '0.5', 'entropy': '0.001', 'epoch': '4', 'minibatch': '4', 'ppoeps': '0.1', 'intcoef': '1.', 'stickyaction': 'True', 'actionprob': '0.25', 'updateproportion': '0.25', 'lifedone': 'False', 'obsnormstep': '50'}}
load model...
Traceback (most recent call last):
  File "train.py", line 281, in <module>
    main()
  File "train.py", line 100, in main
    agent.model.load_state_dict(torch.load(model_path))
  File "/home/rjn/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 356, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'models/MontezumaRevengeNoFrameskip-v4.model'
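
A common workaround is to attempt the load only when a checkpoint file actually exists. A hedged sketch of such a guard; model_path and agent.model appear in the output above, but the helper itself is hypothetical and not part of the repo:

import os
import torch

def maybe_load(model, model_path):
    # Load a checkpoint only if it exists; otherwise train from scratch.
    if os.path.exists(model_path):
        print('load model...')
        model.load_state_dict(torch.load(model_path))
    else:
        print('no checkpoint at {}; training from scratch'.format(model_path))
    return model

# Usage in train.py would replace the unconditional load:
# agent.model = maybe_load(agent.model, model_path)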

About sticky action

Hi,

In your code (envs.py), I saw that you first wrap the environment with MaxAndSkipEnv() and then apply the sticky action.
However, in the RND authors' code, I found that they first wrap the env with StickyActionEnv() and then wrap it with MaxAndSkipEnv(). So it seems your agent will have "stickier" actions, and I think this makes things a little bit different.
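
To make the ordering difference explicit, here is a simplified, self-contained sketch; the wrapper classes below are minimal stand-ins for the ones in envs.py and the official atari_wrappers.py, not the actual implementations:

import numpy as np
import gym

class StickyActionEnv(gym.Wrapper):
    # With probability p, repeat the previous action instead of the chosen one.
    def __init__(self, env, p=0.25):
        super().__init__(env)
        self.p = p
        self.last_action = 0

    def reset(self, **kwargs):
        self.last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if np.random.rand() < self.p:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)

class MaxAndSkipEnv(gym.Wrapper):
    # Repeat the action for `skip` frames and max-pool the last two observations.
    def __init__(self, env, skip=4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, done, info, frames = 0.0, False, {}, []
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            frames.append(obs)
            total_reward += reward
            if done:
                break
        return np.max(np.stack(frames[-2:]), axis=0), total_reward, done, info

# Official RND ordering: the sticky decision is made on every raw frame, inside the skip.
env_official = MaxAndSkipEnv(StickyActionEnv(gym.make('MontezumaRevengeNoFrameskip-v4')))

# Ordering described in this issue: the sticky decision is made once per skipped step,
# so a repeated action persists for all 4 raw frames.
env_this_repo = StickyActionEnv(MaxAndSkipEnv(gym.make('MontezumaRevengeNoFrameskip-v4')))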

Action values are incremented by 1 for the Breakout game ?

Hi,
Is the reason for the following code, which modifies the actions for the Breakout game, to eliminate the NOOP action from the set of actions available to the agent?
envs.py:

if 'Breakout' in self.env_id: 
    action += 1

train.py:

if 'Breakout' in env_id:
    output_size -= 1
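
For context, Breakout's default action set is [NOOP, FIRE, RIGHT, LEFT], so shrinking the policy output by one and adding 1 to every sampled action does restrict the agent to {FIRE, RIGHT, LEFT}, i.e. it removes NOOP. A quick way to check the mapping yourself (an illustration, not code from the repo):

import gym

env = gym.make('BreakoutNoFrameskip-v4')
meanings = env.unwrapped.get_action_meanings()
print(meanings)  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

output_size = env.action_space.n - 1   # train.py: output_size -= 1 for Breakout
for a in range(output_size):
    print(a, '->', meanings[a + 1])    # envs.py: action += 1, so index 0 (NOOP) is never used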
