
Critic encoders as shared modules? (maac, closed)

shariqiqbal2810 commented on July 28, 2024

Critic encoders as shared modules?

Comments (3)

yuchen-x commented on July 28, 2024

Dear Shariq,

I have the same question as jeanibarz.

Also, in Figure 1 the 2nd MLP layer receives the output of the 1st MLP. However, in your code, I assume the 2nd MLP layer is `self.critics`.

The input of `self.critics` is an encoding of the observations only (called `s_encodings`) plus other values, rather than the output of the 1st MLP layer.

Is my understanding correct?

Thanks!


aklein1995 commented on July 28, 2024

By analyzing the code, I intuitively came to this conclusion:

Each agent has its own encoding function, `state_encoder`, and, contrary to what the paper says, its embedding e_i is based only on the state:

MAAC/utils/critics.py, lines 52 to 59 in 6174a01:

```python
state_encoder = nn.Sequential()
if norm_in:
    state_encoder.add_module('s_enc_bn', nn.BatchNorm1d(
                                sdim, affine=False))
state_encoder.add_module('s_enc_fc1', nn.Linear(sdim,
                                                hidden_dim))
state_encoder.add_module('s_enc_nl', nn.LeakyReLU())
self.state_encoders.append(state_encoder)
```

However, as explained in the paper, we also need the embeddings of the other agents, e_j. These are computed by `self.critic_encoders`, which take both the state and the action of each agent as input and are shared:

MAAC/utils/critics.py, lines 35 to 44 in 6174a01:

```python
for sdim, adim in sa_sizes:
    idim = sdim + adim
    odim = adim
    encoder = nn.Sequential()
    if norm_in:
        encoder.add_module('enc_bn', nn.BatchNorm1d(idim,
                                                    affine=False))
    encoder.add_module('enc_fc1', nn.Linear(idim, hidden_dim))
    encoder.add_module('enc_nl', nn.LeakyReLU())
    self.critic_encoders.append(encoder)
```
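To make the distinction concrete, here is a minimal pure-Python sketch (not the repository's PyTorch code) of the two kinds of encoder described above: a state-only encoder producing e_i and a state-action encoder producing e_j. The helper names `make_encoder` and `encode` and the random weights are hypothetical, and the batch-norm layer is omitted for brevity.

```python
import random


def make_encoder(in_dim, hidden_dim, seed=0):
    """Hypothetical one-layer encoder: random weights for Linear(in_dim, hidden_dim)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(hidden_dim)]
    bias = [0.0] * hidden_dim
    return weights, bias


def encode(encoder, x, negative_slope=0.01):
    """Apply a linear layer followed by LeakyReLU (PyTorch's default slope is 0.01)."""
    weights, bias = encoder
    if len(x) != len(weights[0]):
        raise ValueError(f"expected input of size {len(weights[0])}, got {len(x)}")
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(z if z >= 0 else negative_slope * z)
    return out


sdim, adim, hidden_dim = 4, 3, 8
state_encoder = make_encoder(sdim, hidden_dim, seed=1)          # analogous to s_enc_fc1
critic_encoder = make_encoder(sdim + adim, hidden_dim, seed=2)  # analogous to enc_fc1

obs = [0.5, -1.0, 0.25, 2.0]
action = [0.0, 1.0, 0.0]  # one-hot action

e_i = encode(state_encoder, obs)            # sees the state only
e_j = encode(critic_encoder, obs + action)  # sees the state concatenated with the action
```

Both embeddings end up with size `hidden_dim`, so they can be compared in the same attention space even though only e_j has seen an action; feeding the state-action vector to the state-only encoder raises an error because the input sizes differ.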

Later, in the `forward` function, the keys and values of the current agent are discarded:

MAAC/utils/critics.py, lines 128 to 130 in 6174a01:

```python
for i, a_i, selector in zip(range(len(agents)), agents, curr_head_selectors):
    keys = [k for j, k in enumerate(curr_head_keys) if j != a_i]
    values = [v for j, v in enumerate(curr_head_values) if j != a_i]
```
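The exclusion step can be illustrated with a small pure-Python sketch of single-head scaled dot-product attention in which agent i's query attends only over the other agents' keys and values. The function name `attend_to_others` is hypothetical, and the query, keys and values are given directly here rather than produced by learned projections as in the repository.

```python
import math


def attend_to_others(query, keys, values, a_i):
    """Return (attended value x_i, attention weights) for agent a_i.

    keys/values hold one vector per agent; the entries at index a_i are
    skipped, mirroring the `if j != a_i` filter in the snippet above.
    """
    other_keys = [k for j, k in enumerate(keys) if j != a_i]
    other_values = [v for j, v in enumerate(values) if j != a_i]
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in other_keys]
    # Numerically stable softmax over the remaining agents
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(other_values[0])
    x_i = [sum(w * v[d] for w, v in zip(weights, other_values)) for d in range(dim)]
    return x_i, weights


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 10.0], [1.0, 2.0], [3.0, 4.0]]
x_0, w_0 = attend_to_others([1.0, 0.0], keys, values, a_i=0)
```

With three agents, agent 0 gets two attention weights (one per other agent), and its own large value `[10.0, 10.0]` never contributes to `x_0`.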

Thus, looking only at the code, I would say that Figure 1 is not completely correct, @jeanibarz.

Moreover, @yuchen-x, I think you are right: the 2nd MLP is `self.critics`. The input to this second MLP is correct, since `s_encodings` corresponds to the e_i values and `*other_all_values` corresponds to the x_i:

MAAC/utils/critics.py, lines 111 to 112 in 6174a01:

```python
# extract state encoding for each agent that we're returning Q for
s_encodings = [self.state_encoders[a_i](states[a_i]) for a_i in agents]
```
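Under this reading, the second MLP receives e_i concatenated with the attended values x_i and outputs one Q-value per action (the encoder loop above sets `odim = adim`). The following pure-Python sketch assumes a single hypothetical linear head with random weights standing in for `self.critics`; it only illustrates the input and output shapes, not the repository's actual layer stack.

```python
import random


def q_values_for_all_actions(e_i, x_i, weights, bias):
    """Concatenate e_i with x_i and map the result to one Q-value per action."""
    critic_in = e_i + x_i  # list concatenation, analogous to concatenating tensors
    return [sum(w * z for w, z in zip(row, critic_in)) + b
            for row, b in zip(weights, bias)]


hidden_dim, adim = 4, 3
rng = random.Random(0)
# Hypothetical head weights: 2 * hidden_dim inputs -> adim outputs
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * hidden_dim)] for _ in range(adim)]
bias = [0.0] * adim

e_i = [0.1, 0.2, 0.3, 0.4]   # state-only encoding of agent i
x_i = [1.0, 0.0, -1.0, 0.5]  # attended contribution from the other agents
all_q = q_values_for_all_actions(e_i, x_i, weights, bias)
```

Because agent i's own action is not part of the input, the head can return a Q-value for every action of agent i in a single pass.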

That is, `state_encoder` is the first MLP, and its output is e_i, which takes only the state into account. `self.critic_encoders` takes both state and action into account and is used to get the e_j.

This is not hard evidence, just my own conclusion after looking at the code and the paper's explanation.

shariqiqbal2810 commented on July 28, 2024

Hi all, sorry for the confusion. The deviation of the code from the figure is described in the section of the paper entitled "Multi-Agent Advantage Function." Essentially, we want to calculate a Q-value for each possible action such that we can compute an advantage function, so we remove actions from the input and feed the state alone to a separate encoder from which we compute queries. I guess my intention was that the simplified version (with no advantage function) was easier to understand visually in a figure, but I realize how that can be misleading. Hope this clears things up!
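The reason for computing a Q-value per action can be sketched as follows, assuming the counterfactual baseline takes the form described in that section of the paper: the baseline is the policy-weighted average of agent i's Q-values with the other agents' actions held fixed, A_i = Q_i(o, a) - sum over a'_i of pi_i(a'_i | o_i) * Q_i(o, (a'_i, a_\i)). The function name `advantage` is hypothetical and plain Python lists stand in for tensors.

```python
def advantage(all_q, probs, action):
    """Advantage of `action` given Q-values for every action and the policy.

    all_q[k]: Q-value if agent i takes action k (other agents' actions fixed)
    probs[k]: pi_i(k | o_i), the current policy's probability of action k
    """
    if len(all_q) != len(probs):
        raise ValueError("all_q and probs must have the same length")
    # Counterfactual baseline: expected Q-value under agent i's own policy
    baseline = sum(p * q for p, q in zip(probs, all_q))
    return all_q[action] - baseline


all_q = [1.0, 2.0, 3.0]
probs = [0.2, 0.3, 0.5]
adv = advantage(all_q, probs, action=2)  # 3.0 - (0.2 + 0.6 + 1.5), i.e. about 0.7
```

This only works if the critic can score every alternative action of agent i without re-encoding its own action, which is why the state alone goes to the separate encoder.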
