Comments (4)
Hi @xlnwel, thanks for your questions. Could you please move your question about the RSSM into a separate GitHub issue? That way, it'll be easier to find for others who might have similar questions.
Regarding the action distribution, the mean is bounded to [-5, +5] before the tanh transformation to avoid numerical instabilities. Computing log-probabilities requires inverting the tanh, and this becomes difficult in highly saturated regions. By the way, SAC also bounds the mean using tanh but doesn't scale up by 5 afterwards, which limits its action range a bit too much. Check out the link in the code to play around with the tanh normal distribution: https://www.desmos.com/calculator/rcmcf5jwe7
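To illustrate the saturation problem, here is a minimal pure-Python sketch. It assumes the bound is implemented as a scaled tanh, 5 * tanh(mean / 5); the names MEAN_SCALE, bound_mean, and tanh_normal_log_prob are made up for this example and are not taken from the repository.

```python
import math

MEAN_SCALE = 5.0  # assumption: matches the [-5, +5] bound described above


def bound_mean(raw_mean, scale=MEAN_SCALE):
    """Soft-clip raw_mean into (-scale, +scale) with a scaled tanh."""
    return scale * math.tanh(raw_mean / scale)


def tanh_normal_log_prob(action, mean, std):
    """Log-density of action = tanh(u) with u ~ Normal(mean, std)."""
    # Inverting the tanh; math.atanh raises ValueError when |action| >= 1.
    u = math.atanh(action)
    normal_log_prob = (
        -0.5 * ((u - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )
    # Change of variables: subtract log|d tanh(u)/du| = log(1 - action^2).
    return normal_log_prob - math.log(1.0 - action ** 2)


raw_mean = 30.0

# Without the bound, the squashed mean rounds to exactly 1.0 in float64,
# so the tanh can no longer be inverted.
saturated_action = math.tanh(raw_mean)
try:
    tanh_normal_log_prob(saturated_action, raw_mean, 1.0)
    unbounded_ok = True
except ValueError:
    unbounded_ok = False

# With the bound, the squashed mean stays strictly inside (-1, 1).
mean = bound_mean(raw_mean)
bounded_action = math.tanh(mean)
bounded_log_prob = tanh_normal_log_prob(bounded_action, mean, 1.0)
```

With the bounded mean, the action at the mean stays invertible and its log-probability is finite. The unbounded case fails as soon as the action rounds to 1.0.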
I'm closing the issue for now, but please follow up if this doesn't answer your question.
from dreamer.
Sorry if I am not understanding your earlier answer correctly, but why do we need SampleDist? I especially don't understand what the argmax in SampleDist.mode() is doing:
tf.gather(sample, tf.argmax(logprob))[0]
from dreamer.
Many tfd.distributions methods, such as entropy, become invalid after wrapping a distribution in tfd.TransformedDistribution, so SampleDist can be used to approximate those statistics from samples. The line you cited just picks the sample with the maximum logprob.
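As a rough illustration of that idea, the sketch below approximates the mode and entropy of a distribution using only a sampler and a log-density function. SampleStats is a hypothetical stand-in written in plain Python, not the repository's SampleDist, and a standard normal is used only so the estimates can be checked against known values.

```python
import math
import random


class SampleStats:
    """Hypothetical sample-based approximation of distribution statistics."""

    def __init__(self, sample_fn, log_prob_fn, num_samples=100):
        self._sample_fn = sample_fn
        self._log_prob_fn = log_prob_fn
        self._num_samples = num_samples

    def _draw(self):
        # Draw a fresh batch of samples and score each one.
        samples = [self._sample_fn() for _ in range(self._num_samples)]
        log_probs = [self._log_prob_fn(s) for s in samples]
        return samples, log_probs

    def mode(self):
        # Same idea as gathering the sample at argmax(logprob).
        samples, log_probs = self._draw()
        best = max(range(len(samples)), key=lambda i: log_probs[i])
        return samples[best]

    def entropy(self):
        # Monte Carlo estimate: H is approximately -mean(log p(x)).
        _, log_probs = self._draw()
        return -sum(log_probs) / len(log_probs)


rng = random.Random(0)  # fixed seed so the run is repeatable


def normal_log_prob(x):
    """Log-density of a standard normal."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


stats = SampleStats(lambda: rng.gauss(0.0, 1.0), normal_log_prob, num_samples=5000)
approx_mode = stats.mode()        # should land close to the true mode, 0
approx_entropy = stats.entropy()  # should land close to 0.5 * log(2 * pi * e)
```

The estimates get better with more samples. The mode estimate can only ever return one of the drawn samples, which is why it is an approximation and not an exact maximizer.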
from dreamer.
@xlnwel Thank you! I see.
from dreamer.