ikostrikov / rlpd
License: MIT License
Currently we are trying to recreate the door-binary results, but it seems that either the performance is not as good as reported or we are computing the normalized score incorrectly.
After running the example script for door-binary (the pen-binary script with the env name changed), the eval return only goes as high as around -33, which to my understanding means it takes about 33 steps to reach a success state and receive reward 0 instead of reward -1.
However, the paper shows a training curve where the normalized score reaches around 80, which should be equivalent to an eval return on door-binary of around -20?
Any help in understanding how the normalized scores are computed, or where this discrepancy comes from, is much appreciated!
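I can't speak for the authors, but D4RL-style normalized scores are usually computed as a linear rescaling between a random-policy reference return and an expert reference return. A minimal sketch, assuming that convention (the reference values below are purely hypothetical, not the actual door-binary ones):

```python
# D4RL-style score normalization (sketch; the repo may use different
# reference values or a different convention entirely).
def normalized_score(ret, ref_min, ref_max):
    """Map a raw return onto a 0-100 scale between two reference returns."""
    return 100.0 * (ret - ref_min) / (ref_max - ref_min)

# Hypothetical references for a 200-step sparse episode with -1 per step:
# ref_min = -200 (never succeed), ref_max = 0 (succeed immediately).
print(normalized_score(-33.0, -200.0, 0.0))  # 83.5
```

Under such a convention, a normalized score near 80 and an eval return near -33 need not contradict each other; everything hinges on the exact reference returns used, which is why knowing the repo's ref_min/ref_max would settle the question.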
I have a question about LayerNorm. In the paper, you mention that if we add LayerNorm to the network, the Q-values are bounded by the norm of the final weight layer. With the formula as explained, I'm still puzzled about why the inequality holds for the last and second-to-last terms. For the inequality to hold, I think the norm of the output of LayerNorm would have to be kept below 1, but this cannot be guaranteed. Could you please elaborate on this conclusion?
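For what it's worth, here is my reading of the bound (an interpretation, not the authors' derivation): the norm of the LayerNorm output need not be below 1; it is a fixed constant. Before the learnable scale and offset, LayerNorm outputs a zero-mean, unit-variance vector, so its Euclidean norm is exactly the square root of the width, and Cauchy-Schwarz then bounds the final layer's output independently of the input:

```latex
% LayerNorm (without the learnable scale \gamma and offset \beta) produces
% a zero-mean, unit-variance vector, so its Euclidean norm is fixed:
\left\| \mathrm{LN}(x) \right\|_2
  = \left\| \frac{x - \mu \mathbf{1}}{\sigma} \right\|_2
  = \sqrt{d},
\qquad \sigma^2 = \frac{1}{d}\,\| x - \mu \mathbf{1} \|_2^2 .

% Hence, by Cauchy-Schwarz, the last linear layer's output is bounded by a
% quantity that depends only on the weights w, not on the input x:
\left| w^\top \mathrm{LN}(x) \right|
  \le \| w \|_2 \, \| \mathrm{LN}(x) \|_2
  = \sqrt{d}\,\| w \|_2 .
```

With the learnable elementwise scale and offset, the bound picks up extra factors of their norms, but it still depends only on learned parameters, which is (as I understand it) all the paper's argument needs.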
I'm currently attempting to reproduce the results from the paper. Upon examining the code, it appears that the offline ratio is fixed at 0.5 in this implementation. This observation is based on the design of the data combination in the `combine` function, where `v` and `other_dict[k]` are expected to have the same size in order to be merged into `tmp`.
Could you provide some insights on how other offline ratios, such as 25% and 75% (as shown in Figure 12), were implemented in the paper? I have experimented with mixing the offline and online data in a randomized manner, but haven't had success. I'd be grateful for any guidance or clarification on this matter. Thank you!
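In case it helps while this is open: one way to support arbitrary ratios is to derive per-batch split sizes from the ratio and sample each buffer independently, instead of concatenating two equal halves. This is only a sketch of one possible approach, not the repo's actual code; `sample_mixed_batch` is my own name:

```python
import numpy as np

# Sketch: sample a batch with an arbitrary offline/online mixing ratio.
# offline_data / online_data are dicts of identically keyed numpy arrays.
def sample_mixed_batch(offline_data, online_data, batch_size, offline_ratio, rng):
    n_off = int(round(batch_size * offline_ratio))  # offline samples per batch
    n_on = batch_size - n_off                       # online samples per batch
    off_idx = rng.integers(len(next(iter(offline_data.values()))), size=n_off)
    on_idx = rng.integers(len(next(iter(online_data.values()))), size=n_on)
    return {
        k: np.concatenate([offline_data[k][off_idx], online_data[k][on_idx]])
        for k in offline_data
    }

rng = np.random.default_rng(0)
offline = {"obs": np.zeros((100, 3)), "act": np.zeros((100, 2))}
online = {"obs": np.ones((50, 3)), "act": np.ones((50, 2))}
batch = sample_mixed_batch(offline, online, 256, 0.25, rng)
print(batch["obs"].shape)  # (256, 3)
```

With `offline_ratio=0.25`, 64 of the 256 rows come from the offline buffer and 192 from the online buffer; whether the paper's Figure 12 runs did it this way I can't say.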
I was reading through the paper but couldn't find details on the exact data used for the sparse Adroit experiments.
One question: for all of the reported results, how many transitions (or how many episodes/demos) are used for each of the three tasks as part of the offline buffer?
Another question: going through the code, I noticed that by default only 90% of the expert dataset for Adroit seems to be used, but there also appears to be behavior cloning data included. What is this BC data, and how was it collected?
Running

XLA_PYTHON_CLIENT_PREALLOCATE=false python train_finetuning_pixels.py --env_name=cheetah-run-v0 \
    --start_training 5000 \
    --max_steps 300000 \
    --config=configs/rlpd_pixels_config.py \
    --project_name=rlpd_vd4rl

I am getting: TypeError: dict.copy() takes no keyword arguments.

In file rlpd/rlpd/agents/drq/drq_learner.py:

import flax.core.frozen_dict as frozen_dict
actor_params = frozen_dict.FrozenDict(actor_def.init(actor_key, observations)["params"])  # line 121
critic_params = frozen_dict.FrozenDict(critic_def.init(critic_key, observations, actions)["params"])  # line 145
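For context: a plain Python dict really does reject keyword arguments to `.copy()`, while flax's `FrozenDict.copy(add_or_replace=...)` accepts them. My assumption is that newer flax versions return a plain dict from `init`, which is why wrapping the params in a `FrozenDict` as above restores the `.copy(add_or_replace=...)` call the learner relies on. A stdlib-only reproduction of the error itself:

```python
# Minimal reproduction with a plain dict: dict.copy() accepts no keyword
# arguments, which is exactly the TypeError reported above.
params = {"kernel": [1.0]}
try:
    params.copy(add_or_replace={"bias": [0.0]})
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)  # True
```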
It seems the required gym version uses MuJoCo 1.50 or thereabouts, but the sparse Adroit envs use a later version when installed. Any idea which should be used?