tinyzqh / light_mappo

Lightweight version of MAPPO to help you quickly migrate to your local environment.

Python 99.57% Shell 0.43%

light_mappo's Issues

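Cannot set upper and lower bounds on output actions when using a continuous action space

Hello, after changing line 30 of env_continuous.py to u_action_space = spaces.Box(low=0.0, high=90.0, shape=(self.signal_action_dim,), dtype=np.float32), the action values are still not restricted to the range 0 to 90. Why is that? Thank you.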
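A possible explanation, offered as a sketch rather than a confirmed diagnosis of this repository: a `spaces.Box` only declares its bounds and does not clamp values by itself, so if the policy samples from an unbounded distribution the raw action can fall outside [0, 90]. Two common workarounds are clipping and tanh squashing. The helper names below are hypothetical and not part of light_mappo.

```python
import numpy as np

def clip_to_bounds(action, low, high):
    """Clamp a raw policy output into [low, high]."""
    return np.clip(action, low, high)

def squash_to_bounds(action, low, high):
    """Map an unbounded value into [low, high] via tanh, then rescale."""
    return low + 0.5 * (np.tanh(action) + 1.0) * (high - low)

# Hypothetical unbounded samples from a policy head.
raw = np.array([-3.2, 0.4, 120.0], dtype=np.float32)

clipped = clip_to_bounds(raw, 0.0, 90.0)      # out-of-range values pinned to the bounds
squashed = squash_to_bounds(raw, 0.0, 90.0)   # every value lands inside the bounds
```

Either function could be applied inside the environment's step function before the action is used. Clipping is simpler; squashing avoids piling many samples onto the boundary values.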

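Viewing training results

After training finishes, how do I view and use the resulting logs and models? Are the logs viewed with TensorBoard, and how do I load a model and test it to see how it performs?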

ModuleNotFoundError: No module named 'envs.env_core'

from envs.env_core import EnvCore

How to render if using customized environment?

I am using my customized env and want to render in 'human' and 'rgb_array' mode. Could you please give some examples or implementation?

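Problem when adding my own environment and using env_continuous

After modifying the code, I chose the continuous env, with agents updating actions under the separated policy. However, the collect function in env_runner.py only handles MultiDiscrete and Discrete and has no Box branch. How should this case be handled? Thanks!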

Where is the env_continuous file?

Does simply setting self.discrete_action_input = False in the env_discrete file make it use continuous actions?

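Resetting the environment at the end of an episode changes obs

In env_wrappers.py, step_wait() checks whether the episode has ended and then runs "obs[i] = self.envs[i].reset()". This assigns the post-reset observation to obs[i], overwriting the observation from the moment the episode ended. Is this assignment inappropriate? Since the observation after a reset can be considered random, it arguably should not be assigned to obs[i]; should "self.envs[i].reset()" be called on its own instead?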
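One way to keep both observations, sketched under assumptions: before resetting, stash the final observation in the info dict, then return the reset observation so the next rollout step starts from a valid state. `ToyEnv`, `step_with_autoreset`, and the `terminal_observation` key are illustrative names, not taken from light_mappo, and the sketch assumes a single boolean done flag rather than a per-agent list.

```python
class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=2):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return [float(self.t)], 1.0, done, {}


def step_with_autoreset(env, action):
    """Step once; on episode end, keep the last obs in info and return the reset obs."""
    obs, reward, done, info = env.step(action)
    if done:
        # Copy info so the environment's own dict is not mutated.
        info = dict(info, terminal_observation=obs)
        obs = env.reset()
    return obs, reward, done, info


env = ToyEnv(horizon=2)
env.reset()
first = step_with_autoreset(env, 0)    # mid-episode step
second = step_with_autoreset(env, 0)   # final step, triggers the reset
```

With this pattern the learner can still read the true final observation from info (for example, to bootstrap a value estimate), while the buffer receives the first observation of the new episode.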

env

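The provided example only has sub_agent_obs. Does it not specifically distinguish between observation information and global state information? Is sub_agent_obs simply the list of each agent's partial observations? If so, how is global information handled: is the concatenation of the partial observations used as the global state?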
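One common choice for the centralized critic's input, shown here as a sketch and not as a statement of what this repository does, is to concatenate every agent's local observation and give each agent the same concatenated vector. The shapes and variable names below are assumptions for illustration.

```python
import numpy as np

# Hypothetical sizes: 2 parallel environments, 3 agents, 4 features per agent.
n_threads, n_agents, obs_dim = 2, 3, 4

# Per-agent local observations, shape (n_threads, n_agents, obs_dim).
obs = np.arange(n_threads * n_agents * obs_dim, dtype=np.float32)
obs = obs.reshape(n_threads, n_agents, obs_dim)

# Concatenate all agents' observations per environment: (n_threads, n_agents * obs_dim).
joint = obs.reshape(n_threads, -1)

# Give every agent the same joint vector: (n_threads, n_agents, n_agents * obs_dim).
share_obs = np.repeat(joint[:, None, :], n_agents, axis=1)
```

If the environment can expose a true global state that contains information no agent observes, returning that instead of the concatenation would be the alternative design.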

NotImplementedError when running with use_eval enabled

Is it that environments with a continuous action space cannot use eval?

Traceback (most recent call last):
  File "G:\lcz\mappo\train\train.py", line 149, in <module>
    main(sys.argv[1:])
  File "G:\lcz\mappo\train\train.py", line 137, in main
    runner.run()
  File "G:\lcz\mappo\runner\shared\env_runner.py", line 88, in run
    self.eval(total_num_steps)
  File "C:\Users\ljh99\anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\lcz\mappo\runner\shared\env_runner.py", line 183, in eval
    raise NotImplementedError
NotImplementedError

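Environment is not reset after an episode ends

As the title says: during training, if the environment returns done, the runner does not reset it, which may prevent convergence in some environments.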

Action mask?

Hello! When agents have different action dimensions, how does light-mappo apply an action mask?
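A generic way to handle unequal discrete action counts, sketched independently of this repository's code: pad every agent's logits to the largest action count and mask the padded entries before the softmax so they receive zero probability. `masked_softmax` and the sizes below are hypothetical.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over the last axis that assigns ~0 probability where mask == 0."""
    masked = np.where(mask.astype(bool), logits, -1e10)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical setup: agent 0 has 3 valid actions, agent 1 has 5.
action_dims = [3, 5]
max_dim = max(action_dims)

# mask[i, a] is 1.0 when action a exists for agent i, else 0.0.
mask = np.array([[1.0 if a < d else 0.0 for a in range(max_dim)]
                 for d in action_dims])

logits = np.zeros((len(action_dims), max_dim))  # uniform logits for illustration
probs = masked_softmax(logits, mask)
```

Sampling from `probs` then never selects a padded action, and the same mask can be reused when computing log-probabilities during the update.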

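How should I modify the code when my environment's observation space is a Box?

Each agent's observation space is a Box of shape 3 × width × height. How should self.obs_dim be set in env_core.py, and how should observation_space be set in env_discrete.py? Is there anything else that needs special modification?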
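If the networks expect a flat vector per agent (an assumption, not verified against this repository), one simple approach is to flatten the image-like observation and set the observation dimension to the product of its shape. The 8 × 8 size and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical image-like observation of shape (channels, width, height).
obs_shape = (3, 8, 8)

# Flat dimension a vector-based network would see: 3 * 8 * 8.
obs_dim = int(np.prod(obs_shape))

# Flatten one agent's observation before returning it from the environment.
image_obs = np.zeros(obs_shape, dtype=np.float32)
flat_obs = image_obs.reshape(-1)
```

Flattening discards spatial structure, so a convolutional encoder would be the alternative if that structure matters for the task.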

Training question

Do the trained models actually work?

Location of the warmup() function in env_runner

The warmup() call in env_runner sits outside the episode loop.

How can I reproduce the visualizations from the paper?

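Error when share_policy is set to False

The error is located in the collect function in runner/separated/env_runner.py, at
actions = np.array(actions).transpose(1, 0, 2)
Initial investigation shows that, in the loop above this line, the action generated when agent_id is 1 has a different shape from the one generated when agent_id is 0.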
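A small diagnostic sketch for this kind of failure: stacking per-agent arrays and transposing with (1, 0, 2) only works when every agent's action array has the same shape, so checking the shapes first gives a clearer error. `stack_agent_actions` is a hypothetical helper, and the (n_threads, act_dim) layout per agent is an assumption.

```python
import numpy as np

def stack_agent_actions(actions):
    """Stack a list of per-agent arrays (n_threads, act_dim) into (n_threads, n_agents, act_dim)."""
    shapes = [np.shape(a) for a in actions]
    if len(set(shapes)) != 1:
        raise ValueError(f"per-agent action shapes differ: {shapes}")
    return np.array(actions).transpose(1, 0, 2)

# Matching shapes stack cleanly: 2 agents, 4 threads, 5 action features.
stacked = stack_agent_actions([np.zeros((4, 5)), np.zeros((4, 5))])

# Mismatched shapes are reported explicitly instead of failing inside transpose.
try:
    stack_agent_actions([np.zeros((4, 5)), np.zeros((4, 3))])
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
```

When the shapes legitimately differ because agents have different action spaces, the per-agent actions would need padding to a common width or to be kept as a list instead of a single array.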

Using VecEnvWrapper

There is a mistake in env_wrappers.py: VecEnvWrapper is an unresolved reference.

After modifying the environment, how exactly do I run it?

If I want to switch to a different policy, how do I do that?

MAPPO-L

Thanks very much for your code.

Have you considered extending it to other variants of MAPPO, such as MAPPO-L?

How to set continuous action

I want to use continuous actions, but an error is reported after setting self.discrete_action_space to False in the environment.
