
rlpd's People

Contributors

ikostrikov2


rlpd's Issues

Reproducing normalized scores on door-binary

We are currently trying to reproduce the door-binary results, but it seems that either the performance falls short of what is reported or we are computing the normalized score incorrectly.

After running the example script for door-binary (the pen-binary script with the env name changed), the eval return only reaches around -33, which to my understanding corresponds to taking about 33 steps (each with reward -1) before reaching a success state with reward 0.

However, the paper shows a training curve where the normalized score reaches around 80, which should correspond to an eval return on door-binary of around -20, shouldn't it?

Any help understanding how the normalized scores are computed, or where this discrepancy comes from, would be much appreciated!
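
For reference, here is a minimal sketch of how I am currently computing the normalized score, assuming a D4RL-style rescaling between a random-policy reference return and an expert reference return. The reference values below are placeholders chosen for illustration, not the ones used in the paper:

# Minimal sketch of D4RL-style score normalization (assumed convention; the
# reference returns below are placeholders, not the paper's values).
REF_RANDOM = -100.0  # hypothetical return of a random policy on door-binary
REF_EXPERT = 0.0     # hypothetical return of an expert (immediate success)

def normalized_score(eval_return: float) -> float:
    """Map a raw eval return onto a 0-100 scale between the two references."""
    return 100.0 * (eval_return - REF_RANDOM) / (REF_EXPERT - REF_RANDOM)

print(normalized_score(-33.0))  # -> 67.0 with these placeholder references
print(normalized_score(-20.0))  # -> 80.0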

The effect of LayerNorm?

I have a question about LayerNorm. In the paper, you mention that if LayerNorm is used in the network, the Q-values are bounded by the norm of the weight layer. Even with the formula explained, I'm still puzzled as to why the inequality holds for the last and the second-to-last terms. For the inequality to hold, I think the norm of the LayerNorm output would have to be kept below 1, but this cannot be guaranteed. Could you please elaborate on this conclusion?
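To make the concern concrete, here is a small numerical sketch (my own illustration, not code from the paper or the repo) showing that the L2 norm of a LayerNorm output with unit scale and zero bias is roughly sqrt(d), so it is not bounded by 1:

import numpy as np

# LayerNorm standardizes features to zero mean and unit variance, so with
# gamma = 1 and beta = 0 the output's squared L2 norm is about d, not <= 1.
d = 256
x = 3.0 * np.random.randn(d) + 1.5             # arbitrary pre-activation vector
ln = (x - x.mean()) / np.sqrt(x.var() + 1e-6)  # LayerNorm with unit scale, zero bias
print(np.linalg.norm(ln), np.sqrt(d))          # both are about 16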

Question about offline ratio in offline/online data sampling

I'm currently attempting to reproduce the results from the paper. Examining the code, it appears that the offline ratio is effectively fixed at 0.5 in this implementation. This observation is based on the design of the data combination in the combine function, where v and other_dict[k] are expected to have the same size in order to be merged into tmp.

Could you provide some insight into how other offline ratios, such as 25% and 75% (as shown in Figure 12), were implemented in the paper? I have experimented with mixing the offline and online data in a randomized manner, but haven't had success. I'd be grateful for any guidance or clarification on this matter. Thank you!
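
For concreteness, this is roughly what I tried (a hypothetical helper of my own, not the repo's combine function): draw a fraction offline_ratio of each batch from the offline buffer and the rest from the online buffer, then concatenate field by field.

import numpy as np

# Hypothetical sketch of ratio-based mixing, not the repo's combine():
# sample n_off transitions from the offline dataset and n_on from the online
# buffer, then concatenate per key so the two parts can have different sizes.
def sample_mixed_batch(offline_ds, online_ds, batch_size, offline_ratio=0.25, seed=None):
    rng = np.random.default_rng(seed)
    n_off = int(round(batch_size * offline_ratio))
    n_on = batch_size - n_off
    off_idx = rng.integers(len(offline_ds["observations"]), size=n_off)
    on_idx = rng.integers(len(online_ds["observations"]), size=n_on)
    return {
        k: np.concatenate([offline_ds[k][off_idx], online_ds[k][on_idx]], axis=0)
        for k in offline_ds
    }

Training with this kind of mixing still did not reproduce the 25%/75% curves for me, so I suspect I am missing something about how those runs were set up.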

Details on the data used to train in Adroit Sparse environments

I was reading through the paper but couldn't find details on the exact data used for Adroit Sparse tests.

One question: for the reported results, how many transitions (or how many episodes/demos) are used for each of the three tasks as part of the offline buffer?

Another question: going through the code, I noticed that by default only 90% of the Adroit expert dataset seems to be used, and it also appears that behavior-cloning data is included. What is this BC data, and how was it collected?

Flax FrozenDict: dict.copy() takes no keyword arguments

Reproduce error

  • flax 0.7.5
  • jaxlib 0.4.21+cuda12.cudnn89
  • Ubuntu 22.04

Running

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0 \
                --start_training 5000 \
                --max_steps 300000 \
                --config=configs/rlpd_pixels_config.py \
                --project_name=rlpd_vd4rl

I am getting: TypeError: dict.copy() takes no keyword arguments.

Possible fix

In rlpd/rlpd/agents/drq/drq_learner.py, wrap the initialized parameters in a FrozenDict so that later calls to params.copy() with keyword arguments keep working:

import flax.core.frozen_dict as frozen_dict

# Newer flax versions return a plain dict from Module.init; wrapping the
# params in a FrozenDict restores copy() with keyword arguments.
actor_params = frozen_dict.FrozenDict(actor_def.init(actor_key, observations)["params"])  # line 121
critic_params = frozen_dict.FrozenDict(critic_def.init(critic_key, observations, actions)["params"])  # line 145
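
For reference, a quick illustration (my own, not repo code) of why the wrapping helps: FrozenDict.copy accepts an add_or_replace mapping, whereas a plain dict's copy() takes no arguments at all, which is exactly the TypeError above.

from flax.core.frozen_dict import FrozenDict

params = FrozenDict({"w": 1.0})
print(params.copy(add_or_replace={"b": 0.0}))  # FrozenDict({'w': 1.0, 'b': 0.0})

plain = {"w": 1.0}
plain.copy(add_or_replace={"b": 0.0})  # raises: dict.copy() takes no keyword arguments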
