Comments (6)
That's a good question, and it depends on the project. When developing a completely new RL algorithm, tuning it will likely require a good amount of time and compute. I also think that people often give up on ideas too early because of this. It helps to log a lot of metrics to see what might be going wrong, but it's still a lot of trial and error. For ideas that are compatible with previous algorithms, you can start from an existing implementation and hope that its hyperparameters will also work for your modification.
from dreamer.
Yes, deter_size should be larger because the model has more to keep track of, and kl_scale should be smaller to allow the model to incorporate more information from each image than in DMC tasks. I've actually run those experiments, so I know that it helps. I will update the repository here at some point, but it's not ready yet.
Hi, @danijar
In many cases, I know the basic meaning of the hyperparameters, but I have no clue when and how to tune them. As you mentioned in this issue, you suggested tuning kl_scale and deter_size for Atari games. But what makes you think so?
Another example involves some papers from DeepMind, which prefer RMSprop to Adam and use a different epsilon than the default setup. I know the underlying mechanism of these optimizers, but I have no idea in which situation one should be preferred over the other. Here are some resources I collected about optimizers, which also include some of my personal thoughts:
Most RL papers use either RMSprop or Adam as the optimizer. From this discussion, I summarize several cases where RMSprop may be preferable to Adam:
- The effect of momentum in RL is unclear, which makes RMSprop's momentum-free update attractive.
- RMSprop is more stable on non-stationary problems and with RNNs.
- RMSprop is more suitable for sparse problems.
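To make the momentum difference concrete, here is a minimal sketch of both update rules on a single scalar parameter. This is plain Python with names of my choosing, not the TensorFlow implementations:

```python
import math

def rmsprop_step(param, grad, v, lr=1e-3, rho=0.9, eps=1e-8):
    # RMSprop: scale the raw gradient by a running RMS of recent gradients.
    v = rho * v + (1 - rho) * grad ** 2
    return param - lr * grad / (math.sqrt(v) + eps), v

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: same second-moment scaling, but the gradient itself is also
    # replaced by a running average (momentum), with bias correction
    # for both moments (t is the 1-based step count).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```

The only structural difference is the first-moment estimate m. On a non-stationary objective, stale momentum can keep pointing in an outdated direction, which is one intuition behind the first bullet above.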
𝝐 is generally chosen from 1e-8 to 1e-4. 𝝐 affects the step size: a large 𝝐 corresponds to a small step size, stable training, and slow training progress. For small projects (e.g., MuJoCo environments), setting 𝝐 to 1e-8 can speed up training and help escape local optima. For large projects, 𝝐 is usually set between 1e-5 and 1 for stable training.
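The effect of 𝝐 on step size is easy to verify numerically: with the second-moment estimate v held fixed, the per-parameter step for a unit gradient is lr / (sqrt(v) + 𝝐), which shrinks as 𝝐 grows. A plain-Python sketch with purely illustrative numbers:

```python
import math

def effective_step(lr, v, eps):
    # Magnitude of an RMSprop/Adam-style update for a unit gradient.
    # Larger eps -> smaller step -> more stable but slower training.
    return lr / (math.sqrt(v) + eps)

# When second-moment estimates are tiny (v = 1e-6), eps dominates the
# denominator and throttles the step:
for eps in (1e-8, 1e-5, 1e-1):
    print(f"eps={eps:g}  step={effective_step(1e-3, 1e-6, eps):.6f}")
```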
Do these make sense to you?
There is barely any theory behind this. Most of the time, it's just that people have tuned many parameters and over time have found which of them are the most sensitive for a particular algorithm. I also think Adam tends to work better than RMSprop even for reinforcement learning, but again this is only from experience and from seeing what more recent papers are using.
Hi, @danijar
Thanks, I see. Then why would you suggest tuning kl_scale and deter_size for Atari games? Are you indicating that Atari games are more complicated to model, and therefore kl_scale and deter_size should be larger than they are for DeepMind Control?
Hi, @danijar
Thanks for your insights. It is unexpected to me that kl_scale should be smaller. I thought it was supposed to be larger, because the actor is trained on imagined features derived from the prior, so I assumed that the closer the prior is to the posterior, the better the actor would perform. What I overlooked is that, as you said, when kl_scale is larger, the posterior loses more information during encoding, which makes it harder for the actor to come up with the right actions. I think there is a tradeoff between these two effects.
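The tradeoff can be read off a schematic version of the world-model objective: a reconstruction term that wants the posterior to retain image information, and a KL term weighted by kl_scale that wants posterior and prior to agree. This is a simplified sketch with hypothetical names, not the Dreamer code:

```python
import math

def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
    # KL(q || p) between two univariate Gaussians.
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

def world_model_loss(recon_nll, kl, kl_scale):
    # Schematic objective: reconstruction pulls the posterior toward
    # keeping image information, while kl_scale * KL pulls posterior
    # and prior together. Raising kl_scale trades reconstruction
    # quality (informative features for the actor) for prior/posterior
    # agreement (more faithful imagined rollouts).
    return recon_nll + kl_scale * kl
```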