Comments (3)
Yes, you can think of the prior as a loose guess about the next state and of the posterior as a more precise guess, because it is based on additional evidence, namely the observed image.
During training, images are available, and the GRU takes a posterior sample as input. During imagination, the future images are not available, so the GRU takes a prior sample as input instead.
Intuitively, this works because the posterior and prior are trained to be close to each other using the KL regularizer. And because the random samples have some spread during training, the GRU becomes robust to seeing new samples during imagination.
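To make the role of the KL regularizer concrete, here is a minimal sketch of the closed-form KL divergence between two univariate Gaussians. It is an illustration only, not the Dreamer code: the function name `gaussian_kl` and the example numbers are assumptions, and the real model uses multivariate diagonal Gaussians.

```python
import math

def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) for two univariate Gaussians (hypothetical helper)."""
    return (
        math.log(std_p / std_q)
        + (std_q ** 2 + (mean_q - mean_p) ** 2) / (2.0 * std_p ** 2)
        - 0.5
    )

# Posterior (q) and prior (p) that agree exactly: the KL is zero.
same = gaussian_kl(0.0, 1.0, 0.0, 1.0)

# Posterior mean shifted away from the prior mean: the KL grows.
shifted = gaussian_kl(1.0, 1.0, 0.0, 1.0)
```

Minimizing this term pulls the prior toward the posterior, which is why a prior sample can stand in for a posterior sample during imagination.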
Mathematically, you are right that this simply follows from the evidence lower bound.
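For reference, the bound can be written roughly as follows. This is a sketch in PlaNet-style notation with only the stochastic state $s_t$ shown; the exact conditioning in the implementation also involves the deterministic state, which is omitted here as a simplifying assumption.

```latex
\ln p(o_{1:T} \mid a_{1:T}) \;\ge\; \sum_{t=1}^{T} \Big(
  \mathbb{E}_{q(s_t \mid o_{\le t}, a_{<t})}\big[\ln p(o_t \mid s_t)\big]
  \;-\;
  \mathbb{E}_{q(s_{t-1} \mid o_{\le t-1}, a_{<t-1})}\Big[
    \mathrm{KL}\big[\, q(s_t \mid o_{\le t}, a_{<t}) \,\big\|\, p(s_t \mid s_{t-1}, a_{t-1}) \,\big]
  \Big]
\Big)
```

The first term is the reconstruction term and the second is the KL regularizer between the posterior $q$ and the prior $p$.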
When interacting with the environment, the current image is available, and so we use the posterior.
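The switch between posterior and prior samples can be sketched with a toy scalar model. This is not the Dreamer implementation: the function names, the fixed weights, and the use of `math.tanh` as a stand-in for the GRU are all assumptions made to keep the example self-contained.

```python
import math
import random

def deterministic_step(h, s, a):
    # Stand-in for the GRU (hypothetical toy weights): combines the previous
    # deterministic state h, the previous stochastic sample s, and the action a.
    return math.tanh(0.5 * h + 0.3 * s + 0.2 * a)

def prior(h):
    # The prior sees only the deterministic state.
    return 0.8 * h, 1.0  # mean, std

def posterior(h, embed):
    # The posterior also sees the embedding of the current image.
    return 0.5 * h + 0.5 * embed, 0.3  # mean, std

def step(h, s, a, embed=None, rng=random):
    h_next = deterministic_step(h, s, a)
    if embed is None:
        mean, std = prior(h_next)             # imagination: no image available
    else:
        mean, std = posterior(h_next, embed)  # training or acting: image available
    s_next = rng.gauss(mean, std)
    return h_next, s_next

def rollout(actions, embeds=None, seed=0):
    # With embeds this plays the role of observing a sequence;
    # without embeds it plays the role of imagining ahead.
    rng = random.Random(seed)
    h, s = 0.0, 0.0
    states = []
    for t, a in enumerate(actions):
        embed = None if embeds is None else embeds[t]
        h, s = step(h, s, a, embed, rng)
        states.append((h, s))
    return states

observed = rollout([1.0, 0.0, -1.0], embeds=[0.5, 0.2, -0.1])
imagined = rollout([1.0, 0.0, -1.0])
```

In both calls the same `deterministic_step` consumes the previous sample; only the distribution that the sample is drawn from changes.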
from dreamer.
Check out the PlaNet paper where we introduced this model, especially Figure 2c. You are right that the prior and posterior transitions differ. Specifically, the prior does not have access to the embedding of the image, which allows predicting ahead without knowing the images. In that figure, the solid lines are always used, but the dashed lines are only used by the posterior. I don't think your diagrams are correct, or I don't quite understand them. I would draw them as:
Prior:                  Posterior:

a ---.                  a ---.
     |                       |
     |                       |
h ---+---> h'           h ---+---> h'
     |     |                 |     |
     |     v                 |     v
s ---'     s'           s ---'     s'
                                   ^
                                   |
                                   o'
Hi @danijar,
Sorry that I misplaced some of the notation in the diagrams and left the previous issue disorganized.
Here's a figure I drew of the RSSM.
The left part shows what the RSSM looks like in a single time step when we call RSSM.observe (this line), and the right part shows what it looks like when we call dreamer._imagine_ahead (this line). My point, if I understand them right, is that the GRU takes the posterior sample as input when we call RSSM.observe. In contrast, when we call dreamer._imagine_ahead, the GRU takes the prior sample.
I hope this helps you understand the figure I drew. Of course, if I made any mistakes, please let me know.