
rlpd's People

Contributors

ikostrikov2


rlpd's Issues

Reproducing normalized scores on door-binary

We are currently trying to reproduce the door-binary results, but it seems that either the performance falls short of what is reported or we are computing the normalized score incorrectly.

After running the example script for door-binary (the pen-binary script with the env name changed), the eval return only reaches around -33, which to my understanding corresponds to taking about 33 steps (each with reward -1) before reaching a success state with reward 0.

However, the paper shows a training curve where the normalized score reaches around 80, which should correspond to an eval return on door-binary of around -20, shouldn't it?

Any help understanding how the normalized scores are computed, or where this discrepancy comes from, would be much appreciated!
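
For reference, here is a minimal sketch of how I am currently computing the normalized score, assuming a D4RL-style rescaling between a random-policy reference return and an expert reference return. The reference values below are placeholders chosen for illustration, not the ones used in the paper:

# Minimal sketch of D4RL-style score normalization (assumed convention; the
# reference returns below are placeholders, not the paper's values).
REF_RANDOM = -100.0  # hypothetical return of a random policy on door-binary
REF_EXPERT = 0.0     # hypothetical return of an expert (immediate success)

def normalized_score(eval_return: float) -> float:
    """Map a raw eval return onto a 0-100 scale between the two references."""
    return 100.0 * (eval_return - REF_RANDOM) / (REF_EXPERT - REF_RANDOM)

print(normalized_score(-33.0))  # -> 67.0 with these placeholder references
print(normalized_score(-20.0))  # -> 80.0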

The effect of LayerNorm?

I have a question about LayerNorm. In the paper, you mention that if LayerNorm is used in the network, the Q-values are bounded by the norm of the weight layer. Even with the formula explained, I'm still puzzled as to why the inequality holds for the last and the second-to-last terms. For the inequality to hold, I think the norm of the LayerNorm output would have to be kept below 1, but this cannot be guaranteed. Could you please elaborate on this conclusion?
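To make the concern concrete, here is a small numerical sketch (my own illustration, not code from the paper or the repo) showing that the L2 norm of a LayerNorm output with unit scale and zero bias is roughly sqrt(d), so it is not bounded by 1:

import numpy as np

# LayerNorm standardizes features to zero mean and unit variance, so with
# gamma = 1 and beta = 0 the output's squared L2 norm is about d, not <= 1.
d = 256
x = 3.0 * np.random.randn(d) + 1.5             # arbitrary pre-activation vector
ln = (x - x.mean()) / np.sqrt(x.var() + 1e-6)  # LayerNorm with unit scale, zero bias
print(np.linalg.norm(ln), np.sqrt(d))          # both are about 16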

Question about offline ratio in offline/online data sampling

I'm currently attempting to reproduce the results from the paper. Examining the code, it appears that the offline ratio is effectively fixed at 0.5 in this implementation. This observation is based on the design of the data combination in the combine function, where v and other_dict[k] are expected to have the same size in order to be merged into tmp.

Could you provide some insight into how other offline ratios, such as 25% and 75% (as shown in Figure 12), were implemented in the paper? I have experimented with mixing the offline and online data in a randomized manner, but haven't had success. I'd be grateful for any guidance or clarification on this matter. Thank you!
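
For concreteness, this is roughly what I tried (a hypothetical helper of my own, not the repo's combine function): draw a fraction offline_ratio of each batch from the offline buffer and the rest from the online buffer, then concatenate field by field.

import numpy as np

# Hypothetical sketch of ratio-based mixing, not the repo's combine():
# sample n_off transitions from the offline dataset and n_on from the online
# buffer, then concatenate per key so the two parts can have different sizes.
def sample_mixed_batch(offline_ds, online_ds, batch_size, offline_ratio=0.25, seed=None):
    rng = np.random.default_rng(seed)
    n_off = int(round(batch_size * offline_ratio))
    n_on = batch_size - n_off
    off_idx = rng.integers(len(offline_ds["observations"]), size=n_off)
    on_idx = rng.integers(len(online_ds["observations"]), size=n_on)
    return {
        k: np.concatenate([offline_ds[k][off_idx], online_ds[k][on_idx]], axis=0)
        for k in offline_ds
    }

Training with this kind of mixing still did not reproduce the 25%/75% curves for me, so I suspect I am missing something about how those runs were set up.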

Details on the data used to train in Adroit Sparse environments

I was reading through the paper but couldn't find details on the exact data used for Adroit Sparse tests.

One question: for the reported results, how many transitions (or how many episodes/demos) are used for each of the three tasks as part of the offline buffer?

Another question: going through the code, I noticed that by default only 90% of the Adroit expert dataset seems to be used, and it also appears that behavior-cloning data is included. What is this BC data, and how was it collected?

Flax FrozenDict: dict.copy() takes no keyword arguments

Reproduce error

  • flax 0.7.5
  • jaxlib 0.4.21+cuda12.cudnn89
  • Ubuntu 22.04

Running

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_pixels_config.py \
                --project_name=rlpd_vd4rl

I am getting: TypeError: dict.copy() takes no keyword arguments.

Possible fix

In rlpd/rlpd/agents/drq/drq_learner.py, wrap the initialized parameters in a FrozenDict so that later calls to params.copy() with keyword arguments keep working:

import flax.core.frozen_dict as frozen_dict

# Newer flax versions return a plain dict from Module.init; wrapping the
# params in a FrozenDict restores copy() with keyword arguments.
actor_params = frozen_dict.FrozenDict(actor_def.init(actor_key, observations)["params"])  # line 121
critic_params = frozen_dict.FrozenDict(critic_def.init(critic_key, observations, actions)["params"])  # line 145
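
For reference, a quick illustration (my own, not repo code) of why the wrapping helps: FrozenDict.copy accepts an add_or_replace mapping, whereas a plain dict's copy() takes no arguments at all, which is exactly the TypeError above.

from flax.core.frozen_dict import FrozenDict

params = FrozenDict({"w": 1.0})
print(params.copy(add_or_replace={"b": 0.0}))  # FrozenDict({'w': 1.0, 'b': 0.0})

plain = {"w": 1.0}
plain.copy(add_or_replace={"b": 0.0})  # raises: dict.copy() takes no keyword arguments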
