Comments (3)
Dear Shariq,
I have the same question as jeanibarz.
Also, in Figure 1 the 2nd MLP layer receives the output of the 1st MLP. In your code, however, I assume the 2nd MLP layer is the one called self.critics.
The input of self.critics is an encoding of the observations only (called s_encoding) plus other values, rather than the output of the 1st MLP layer.
Is my understanding correct?
Thanks!
from maac.
After analyzing the code, I came to the following conclusion.
Each agent has its own encoding function, state_encoder, which, contrary to what the paper says, computes the embedding e_i from the state only:
Lines 52 to 59 in 6174a01
However, as explained in the paper, we also need the embeddings of the other agents, e_j. These come from self.critics_encoder, which takes both the state and the action of each agent and is shared:
Lines 35 to 44 in 6174a01
Later, in the forward function, the key and value outputs of the current agent are discarded:
Lines 128 to 130 in 6174a01
Thus, judging from the code alone, I would say that Figure 1 is not completely correct, @jeanibarz.
Moreover, @yuchen-x, I think you are right: the 2nd MLP is called self.critics. The input of this second MLP is correct, as s_encodings refers to the e_i values and *other_all_values refers to the x_i:
Lines 111 to 112 in 6174a01
That is, state_encoder corresponds to the first MLP, and its output is e_i, which only takes the state into account.
self.critics_encoder takes both the state and the action into account, and is used to get the e_j.
This is not hard evidence, just my own conclusion after looking at the code and the explanation in the paper.
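To make the data flow above concrete, here is a minimal pure-Python sketch. It is not the repository's code: all function and parameter names are hypothetical, and single linear layers stand in for the MLPs. Nonlinearities, batching and multiple attention heads are left out. It only illustrates that e_i is built from the state alone, that e_j is built from state and action, and that the second MLP sees e_i together with the attended value x_i.

```python
import math
import random


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_params(n_agents, state_dim, n_actions, hidden, rng):
    """Hypothetical toy parameters; linear layers stand in for MLPs."""
    return {
        "state_enc": [random_matrix(hidden, state_dim, rng) for _ in range(n_agents)],
        "sa_enc": [random_matrix(hidden, state_dim + n_actions, rng)
                   for _ in range(n_agents)],
        "W_q": random_matrix(hidden, hidden, rng),
        "W_k": random_matrix(hidden, hidden, rng),
        "W_v": random_matrix(hidden, hidden, rng),
        "critic": [random_matrix(n_actions, 2 * hidden, rng) for _ in range(n_agents)],
    }


def q_values_for_agent(i, states, actions, params):
    """Return one Q-value per possible action of agent i."""
    # e_i: state-only encoding (the role played by state_encoder).
    e_i = linear(params["state_enc"][i], states[i])
    query = linear(params["W_q"], e_i)
    keys, values = [], []
    for j in range(len(states)):
        if j == i:
            continue  # the current agent's own key and value are not used
        # e_j: state-action encoding (the role played by self.critics_encoder).
        e_j = linear(params["sa_enc"][j], states[j] + actions[j])
        keys.append(linear(params["W_k"], e_j))
        values.append(linear(params["W_v"], e_j))
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    # x_i: attention-weighted sum of the other agents' values.
    x_i = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    # Second MLP (the role played by self.critics) sees [e_i, x_i], never a_i.
    return linear(params["critic"][i], e_i + x_i)


rng = random.Random(0)
params = make_params(n_agents=3, state_dim=4, n_actions=2, hidden=8, rng=rng)
states = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot actions

q_a = q_values_for_agent(0, states, actions, params)
# Flip agent 0's own action: its Q-values do not change,
# because a_0 never enters the computation for agent 0.
flipped = [[0.0, 1.0]] + actions[1:]
q_b = q_values_for_agent(0, states, flipped, params)
print(len(q_a), q_a == q_b)
```

The final check shows the point of the design: the critic for agent 0 returns one value per action, and that output is independent of the action agent 0 actually took.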
Hi all, sorry for the confusion. The deviation of the code from the figure is described in the section of the paper entitled "Multi-Agent Advantage Function." Essentially, we want to calculate a Q-value for each possible action such that we can compute an advantage function, so we remove actions from the input and feed the state alone to a separate encoder from which we compute queries. I guess my intention was that the simplified version (with no advantage function) was easier to understand visually in a figure, but I realize how that can be misleading. Hope this clears things up!
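A minimal sketch of why one Q-value per possible action is convenient, assuming the baseline is the policy-weighted average of those Q-values. The function name and the numbers are illustrative and not taken from the repository.

```python
def advantage(q_values, policy_probs, action):
    """Advantage of `action` for one agent.

    q_values[a]     : critic output for each possible action a of this agent
    policy_probs[a] : probability the agent's policy assigns to action a
    """
    # Baseline (assumed form): expected Q-value under the agent's own policy.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline


q = [1.0, 2.0, 4.0]      # one Q-value per action, from a single critic pass
pi = [0.5, 0.25, 0.25]   # the agent's current policy

advantages = [advantage(q, pi, a) for a in range(len(q))]
print(advantages)  # [-1.0, 0.0, 2.0]
```

Because the critic does not take the agent's own action as input, all the Q-values needed for the baseline come from one forward pass instead of one pass per action.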