random-network-distillation-pytorch's People

Contributors

jcwleo, kslazarev

random-network-distillation-pytorch's Issues

How long did it take you to get 6100?

Hello,

I also built an RND model, but I am stuck at around 2500... After how many total steps does the agent improve further? I am not sure whether this is due to a bug in my code, so I want to check with you. Thank you.

Generalized Advantage Estimator problem

Hello, I have a problem in utils.py.

In make_train_data:

if use_gae:
    gae = np.zeros_like([num_worker, ])
    for t in range(num_step - 1, -1, -1):
        delta = reward[:, t] + gamma * value[:, t + 1] * (1 - done[:, t]) - value[:, t]
        gae = delta + gamma * lam * (1 - done[:, t]) * gae

        discounted_return[:, t] = gae + value[:, t]

    # For Actor
    adv = discounted_return - value[:, :-1]

I am confused. As I understand it, GAE is

delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}

I have no idea why V(t) needs to be added:

discounted_return[:, t] = gae + value[:, t]

Can you explain what I am missing? Thanks.
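
For readers with the same question: adding V(s_t) back turns the advantage estimate into the lambda-return, which is the critic's regression target; the advantage used by the actor is recovered right afterwards by subtracting value again. A minimal numpy sketch of that relationship (toy shapes and numbers, not the repo's actual rollout buffers):

import numpy as np

# Toy rollout: 1 worker, 4 steps; numbers are illustrative only.
gamma, lam = 0.999, 0.95
reward = np.array([[0., 0., 1., 0.]])          # shape (num_worker, num_step)
done = np.zeros((1, 4))
value = np.array([[0.5, 0.6, 0.7, 0.4, 0.3]])  # num_step + 1 values (bootstrap appended)

num_worker, num_step = reward.shape
discounted_return = np.zeros((num_worker, num_step))

gae = np.zeros(num_worker)
for t in range(num_step - 1, -1, -1):
    delta = reward[:, t] + gamma * value[:, t + 1] * (1 - done[:, t]) - value[:, t]
    gae = delta + gamma * lam * (1 - done[:, t]) * gae
    # GAE advantage plus the baseline V(s_t) equals the lambda-return,
    # i.e. the target the critic is regressed towards.
    discounted_return[:, t] = gae + value[:, t]

# The actor's advantage subtracts V(s_t) again, so the "+ value[:, t]" above
# only affects the critic's target, not the policy gradient.
adv = discounted_return - value[:, :-1]
print(discounted_return)
print(adv)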

training error

I see that the code loads pretrained weights during training. I tried to train without the pretrained weights, but the result looks wrong. Here is my result (training-curve screenshot omitted).

Reward converges at 4600

Thanks for the code; this is much more understandable than the original one.

However, the agent I trained only reaches a maximum score of 4600 and stays at that level no matter how long I train.

Note that OpenAI reports a score of around 10,000.
Am I missing something?

input_size in model.py?

I'm working with a different env (other than atari or mario) and I want to change the input shape to the CNN. It seems like self.input_size is ignored? Do you have any explanation for the math going on when setting up the network (the parameters for each layer)? When I change the size from (84, 84) to anything else, I get size mismatch errors at the linear layer.

Thanks!
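
For anyone with the same question: the Linear layer's input size (7 * 7 * 64 = 3136) is simply whatever comes out of the three conv layers for an 84x84 frame, so changing the input resolution requires recomputing that flattened size. A small sketch of the arithmetic; the kernel/stride values mirror the ones used in model.py, but treat this as an illustration rather than a drop-in utility:

def conv2d_out(size, kernel_size, stride, padding=0):
    # Standard conv output-size formula: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel_size) // stride + 1

def flattened_feature_size(h, w):
    # Conv stack applied to the 84x84 Atari input: 8x8 stride 4 -> 4x4 stride 2 -> 3x3 stride 1
    for k, s in [(8, 4), (4, 2), (3, 1)]:
        h, w = conv2d_out(h, k, s), conv2d_out(w, k, s)
    return h * w * 64  # 64 channels after the last conv

print(flattened_feature_size(84, 84))  # 3136 == 7 * 7 * 64, what the Linear layer expects
print(flattened_feature_size(96, 96))  # a different input gives a different size -> size mismatch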

Mario eval is slow

Hi @jcwleo,

Your implementation is amazing. I was looking for a simple but powerful implementation; I modified it a bit and trained it with SuperMarioBros. However, when I evaluate my agent (using eval.py), the frame rate is very slow.

Do you know why?

Thanks,
Have a nice day,

Intrinsic reward calculation, sum or mean?

Hi!

I have a question related to how the intrinsic rewards are calculated.
Why do you use sum(1) instead of mean(1)?

intrinsic_reward = (target_next_feature - predict_next_feature).pow(2).sum(1) / 2

That would calculate the sum over the 512 output features, which is different from calculating the mean over those outputs.

In the original TensorFlow release they use reduce_mean, and I'm a little bit confused.
https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/policies/cnn_gru_policy_dynamics.py#L241

Hope you could clear me,
Thank you in advance
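
To make the difference concrete: for a fixed feature dimension the two reductions differ only by a constant factor, and since the intrinsic rewards are later divided by a running standard-deviation estimate, a constant scale largely cancels out. A toy comparison, not taken from either codebase:

import torch

torch.manual_seed(0)
target_next_feature = torch.randn(8, 512)   # stand-in for target-network features
predict_next_feature = torch.randn(8, 512)  # stand-in for predictor-network features

sq_err = (target_next_feature - predict_next_feature).pow(2)
reward_sum = sq_err.sum(1) / 2   # this repo's formulation
reward_mean = sq_err.mean(1)     # mean over features, as in the TF reduce_mean call

# Identical up to a constant factor: sum/2 == mean * 512 / 2 == mean * 256.
print(torch.allclose(reward_sum, reward_mean * 256))  # True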

Extrinsic reward clipping

In the RND paper (page 15) it is mentioned that extrinsic rewards are clipped to [-1, 1].
But the official RND code clips extrinsic rewards with the ClipRewardEnv wrapper in atari_wrappers.py, which does:

"""Bin reward to {+1, 0, -1} by its sign."""
        return float(np.sign(reward))

I believe the implementation and the explanation in the paper are slightly different.
In your implementation (jcwleo) you are clipping by doing:

        total_reward = total_reward.reshape([num_step, num_env_workers]).transpose().clip(-1, 1)

I believe this is different from the official implementation. Does anyone have an explanation for this discrepancy, and which version should be used?
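
For what it's worth, the two operations only disagree on rewards whose magnitude lies strictly between 0 and 1; on the integer rewards typical of Atari games they give the same result. A small standalone comparison (not taken from either repo):

import numpy as np

rewards = np.array([-3.0, -0.4, 0.0, 0.4, 1.0, 100.0])

sign_binned = np.sign(rewards)         # official ClipRewardEnv: bin to {-1, 0, +1}
clipped = np.clip(rewards, -1.0, 1.0)  # clipping as described in the paper / this repo

print(sign_binned)  # [-1. -1.  0.  1.  1.  1.]
print(clipped)      # [-1.  -0.4  0.   0.4  1.   1. ]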

Input shape is not correct in Linear-8 layer in CnnActorCriticNetwork feature model

Problem: the input shape is not correct in the Linear-8 layer of the CnnActorCriticNetwork feature model. Or maybe there is a typo in the Conv2d-5 layer's kernel_size: in the predictor and target models kernel_size == 3, but in the feature model kernel_size == 4.

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/model.py#L97-L106

Feature model summary: input_shape (4,84,84)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [4, 32, 20, 20]           8,224
              ReLU-2            [4, 32, 20, 20]               0
            Conv2d-3              [4, 64, 9, 9]          32,832
              ReLU-4              [4, 64, 9, 9]               0
            Conv2d-5              [4, 64, 7, 7]          36,928
              ReLU-6              [4, 64, 7, 7]               0
           Flatten-7                  [4, 3136]               0
            Linear-8                   [4, 256]         803,072
              ReLU-9                   [4, 256]               0
           Linear-10                   [4, 448]         115,136
             ReLU-11                   [4, 448]               0
================================================================
Total params: 996,192
Trainable params: 996,192
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.43
Forward/backward pass size (MB): 1.43
Params size (MB): 3.80
Estimated Total Size (MB): 5.66
----------------------------------------------------------------

Traceback:

Traceback (most recent call last):
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1689, in <module>
    main()
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1683, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/pydevd.py", line 1083, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE 2018.3 EAP.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "train.py", line 274, in <module>
    main()
  File "train.py", line 152, in main
    actions, value_ext, value_int, policy = agent.get_action(np.float32(states) / 255.)
  File "/Users/kslazarev/PycharmProjects/random-network-distillation-pytorch/agents.py", line 59, in get_action
    policy, value_ext, value_int = self.model(state)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/PycharmProjects/random-network-distillation-pytorch/model.py", line 158, in forward
    x = self.feature(state)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 67, in forward
    return F.linear(input, self.weight, self.bias)
  File "/Users/kslazarev/.pyenv/versions/3.6.7/envs/env/lib/python3.6/site-packages/torch/nn/functional.py", line 1352, in linear
    ret = torch.addmm(torch.jit._unwrap_optional(bias), input, weight.t())
RuntimeError: size mismatch, m1: [16 x 2304], m2: [3136 x 256] at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/wheel_3.6/pytorch/aten/src/TH/generic/THTensorMath.cpp:940
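
The mismatch in the traceback (2304 vs. 3136) is consistent with the kernel_size discrepancy: with kernel_size=4 in the last conv layer the spatial output is 6x6 (64 * 6 * 6 = 2304), while the following Linear layer expects the 7x7 output (64 * 7 * 7 = 3136) that kernel_size=3 produces. A quick standalone check, independent of the repo's code:

import torch
import torch.nn as nn

x = torch.zeros(1, 4, 84, 84)  # stacked-frame Atari input

def flatten_dim(last_kernel):
    convs = nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4),
        nn.Conv2d(32, 64, kernel_size=4, stride=2),
        nn.Conv2d(64, 64, kernel_size=last_kernel, stride=1),
    )
    return convs(x).flatten(1).shape[1]

print(flatten_dim(3))  # 3136 -> matches Linear(3136, 256)
print(flatten_dim(4))  # 2304 -> produces the "m1: [... x 2304]" size mismatch above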

Use several GPUs if they exist

The nvidia-smi log below shows that only one of the two GPUs is being used during training:

timestamp, pci.bus_id, temperature.gpu, utilization.gpu [%], utilization.memory [%], memory.total [MiB], memory.free [MiB], memory.used [MiB]
2019/01/11 15:41:41.215, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:41.216, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:42.217, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:42.218, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:43.218, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:43.219, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:44.220, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:44.220, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:45.221, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:45.221, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:46.222, 00000000:01:00.0, 60, 79 %, 1 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:46.222, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:47.223, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:47.223, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:48.224, 00000000:01:00.0, 60, 0 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:48.225, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:49.225, 00000000:01:00.0, 61, 50 %, 1 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:49.226, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:50.226, 00000000:01:00.0, 65, 97 %, 51 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:50.227, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:51.227, 00000000:01:00.0, 65, 96 %, 53 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:51.227, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:52.228, 00000000:01:00.0, 61, 3 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
2019/01/11 15:41:52.229, 00000000:02:00.0, 33, 0 %, 0 %, 11178 MiB, 11168 MiB, 10 MiB
2019/01/11 15:41:53.229, 00000000:01:00.0, 61, 3 %, 0 %, 11177 MiB, 3366 MiB, 7811 MiB
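
One possible direction for this request is to wrap the networks with torch.nn.DataParallel so that forward/backward passes are split across all visible GPUs. A minimal sketch; the attribute names in the comments (agent.model, agent.rnd) are assumptions about this repo's structure, not a tested patch:

import torch
import torch.nn as nn

def to_multi_gpu(module):
    # Wrap with DataParallel only when more than one GPU is visible.
    if torch.cuda.device_count() > 1:
        module = nn.DataParallel(module)
    return module.cuda() if torch.cuda.is_available() else module

# e.g. after the agent is constructed in train.py (hypothetical attribute names):
# agent.model = to_multi_gpu(agent.model)
# agent.rnd = to_multi_gpu(agent.rnd)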

I tried to train the system but get an error

When I try to train the system from scratch by removing the pretrained models, it complains that it cannot find the models.

python train.py
{'OPTIONS': {'envtype': '[atari, mario]', 'trainmethod': 'RND', 'envid': 'MontezumaRevengeNoFrameskip-v4', 'maxstepperepisode': '4500', 'extcoef': '2.', 'learningrate': '1e-4', 'numenv': '2', 'numstep': '128', 'gamma': '0.999', 'intgamma': '0.99', 'lambda': '0.95', 'stableeps': '1e-8', 'statestacksize': '4', 'preprocheight': '84', 'proprocwidth': '84', 'usegae': 'True', 'usegpu': 'True', 'usenorm': 'False', 'usenoisynet': 'False', 'clipgradnorm': '0.5', 'entropy': '0.001', 'epoch': '4', 'minibatch': '4', 'ppoeps': '0.1', 'intcoef': '1.', 'stickyaction': 'True', 'actionprob': '0.25', 'updateproportion': '0.25', 'lifedone': 'False', 'obsnormstep': '50'}}
load model...
Traceback (most recent call last):
  File "train.py", line 281, in <module>
    main()
  File "train.py", line 100, in main
    agent.model.load_state_dict(torch.load(model_path))
  File "/home/rjn/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 356, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'models/MontezumaRevengeNoFrameskip-v4.model'
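
A common workaround is to attempt the load only when a checkpoint file actually exists. A hedged sketch of such a guard; model_path and agent.model appear in the output above, but the helper itself is hypothetical and not part of the repo:

import os
import torch

def maybe_load(model, model_path):
    # Load a checkpoint only if it exists; otherwise train from scratch.
    if os.path.exists(model_path):
        print('load model...')
        model.load_state_dict(torch.load(model_path))
    else:
        print('no checkpoint at {}; training from scratch'.format(model_path))
    return model

# Usage in train.py would replace the unconditional load:
# agent.model = maybe_load(agent.model, model_path)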

About sticky action

Hi,

In your code (envs.py), I saw that you first wrap the environment with MaxAndSkipEnv() and then apply the sticky action.
However, in the RND authors' code, I found that they first wrap the env with StickyActionEnv() and then wrap it with MaxAndSkipEnv(). So it seems your agent will have "stickier" actions, and I think this makes things a little bit different.
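
To make the ordering difference explicit, here is a simplified, self-contained sketch; the wrapper classes below are minimal stand-ins for the ones in envs.py and the official atari_wrappers.py, not the actual implementations:

import numpy as np
import gym

class StickyActionEnv(gym.Wrapper):
    # With probability p, repeat the previous action instead of the chosen one.
    def __init__(self, env, p=0.25):
        super().__init__(env)
        self.p = p
        self.last_action = 0

    def reset(self, **kwargs):
        self.last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if np.random.rand() < self.p:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)

class MaxAndSkipEnv(gym.Wrapper):
    # Repeat the action for `skip` frames and max-pool the last two observations.
    def __init__(self, env, skip=4):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        total_reward, done, info, frames = 0.0, False, {}, []
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            frames.append(obs)
            total_reward += reward
            if done:
                break
        return np.max(np.stack(frames[-2:]), axis=0), total_reward, done, info

# Official RND ordering: the sticky decision is made on every raw frame, inside the skip.
env_official = MaxAndSkipEnv(StickyActionEnv(gym.make('MontezumaRevengeNoFrameskip-v4')))

# Ordering described in this issue: the sticky decision is made once per skipped step,
# so a repeated action persists for all 4 raw frames.
env_this_repo = StickyActionEnv(MaxAndSkipEnv(gym.make('MontezumaRevengeNoFrameskip-v4')))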

Action values are incremented by 1 for the Breakout game ?

Hi,
Is the reason for the following code, which modifies the actions for the Breakout game, to eliminate the NOOP action from the set of actions available to the agent?
envs.py:

if 'Breakout' in self.env_id: 
    action += 1

train.py:

if 'Breakout' in env_id:
    output_size -= 1
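
For context, Breakout's default action set is [NOOP, FIRE, RIGHT, LEFT], so shrinking the policy output by one and adding 1 to every sampled action does restrict the agent to {FIRE, RIGHT, LEFT}, i.e. it removes NOOP. A quick way to check the mapping yourself (an illustration, not code from the repo):

import gym

env = gym.make('BreakoutNoFrameskip-v4')
meanings = env.unwrapped.get_action_meanings()
print(meanings)  # ['NOOP', 'FIRE', 'RIGHT', 'LEFT']

output_size = env.action_space.n - 1   # train.py: output_size -= 1 for Breakout
for a in range(output_size):
    print(a, '->', meanings[a + 1])    # envs.py: action += 1, so index 0 (NOOP) is never used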
