yangchen1997 / multi-agent-reinforcement-learning

PyTorch implementations of multi-agent reinforcement learning algorithms, including QMIX, Independent PPO, Centralized PPO, Grid Wise Control, Grid Wise Control+PPO, and Grid Wise Control+DDPG.

License: MIT License

Python 100.00%
centralized-ppo grid-wise-control independent-ppo multi-agent-reinforcement-learning pettingzoo pytorch qmix

multi-agent-reinforcement-learning's People

Contributors

detectivecode, tianyu-z, yangchen1997

multi-agent-reinforcement-learning's Issues

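Are there any run results available?

Thanks to the author for this implementation; it is very helpful!
I have also implemented the related algorithms on PettingZoo's simple_spread_v2, and training eventually converges to a score of about -800, with these settings:
max_cycle = 100
n_agents = 3
local_ratio = 0.5
Do you have results from your own runs? I would like to compare them with mine to check whether there is a problem with the algorithm implementation or the environment.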

Is the PettingZoo version specified incorrectly? There is no 1.12.0 release.

ERROR: Could not find a version that satisfies the requirement PettingZoo==1.12.0 (from versions: 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.1.10, 0.1.11, 0.1.12, 0.1.13, 0.1.14, 1.0.0, 1.0.1, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.1, 1.19.0, 1.20.1, 1.21.0, 1.22.0, 1.22.1, 1.22.2)
ERROR: No matching distribution found for PettingZoo==1.12.0

The installer reports that there is no 1.12.0 version of PettingZoo.
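
A quick way to confirm what the pip error says is to parse the version list it printed and look for the pinned release. The sketch below is only an illustration: the helper names `parse_version` and `nearest_available` are hypothetical, the version list is abbreviated from the error message above, and reporting the closest neighboring releases is an assumption about what is useful, not a recommendation from the repository author.

```python
# Hypothetical helpers; the list is copied (abbreviated) from the pip error above.
AVAILABLE = ["1.3.4", "1.3.5", "1.14.0", "1.15.0", "1.16.0", "1.17.0",
             "1.18.1", "1.19.0", "1.20.1", "1.21.0", "1.22.0", "1.22.1", "1.22.2"]


def parse_version(text):
    """Turn '1.12.0' into (1, 12, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def nearest_available(pinned, available):
    """Return (is_available, closest earlier release, closest later release)."""
    target = parse_version(pinned)
    ordered = sorted(available, key=parse_version)
    earlier = [v for v in ordered if parse_version(v) < target]
    later = [v for v in ordered if parse_version(v) > target]
    return (pinned in available,
            earlier[-1] if earlier else None,
            later[0] if later else None)


found, before, after = nearest_available("1.12.0", AVAILABLE)
print(found, before, after)
```

The thread does not say whether the code runs unchanged on either neighboring release, so any replacement pin would need to be tested.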

Visualization

Hello, how can I visualize the results after training a model, as shown in the README?

Hello

Is this the code for the paper "Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI"?

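Question about the Actor loss computation in Independent PPO

Hello, while reading the IPPO code I found a few parts I did not understand, and I have some questions.

1. When computing the Actor network loss in Independent PPO, my understanding is that the losses of all agents are summed and averaged. Why is it computed this way?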

import torch
from torch.distributions import MultivariateNormal

curr_log_probs = []
curr_state_values = []
# A single update computes the loss from the data sampled by every agent
for i in range(self.n_agents):
    # Actor forward pass for agent i; the returned hidden state is stored back
    one_action_mean, self.rnn_hidden[i] = self.ppo_actor(obs[:, i], self.rnn_hidden[i])
    curr_state_value = self.ppo_critic(obs[:, i])
    dist = MultivariateNormal(one_action_mean, self.cov_mat)
    curr_log_prob = dist.log_prob(actions[:, i])
    curr_log_probs.append(curr_log_prob)
    curr_state_values.append(curr_state_value)
curr_log_probs = torch.stack(curr_log_probs, dim=1)
curr_state_values = torch.stack(curr_state_values, dim=0)

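I have seen other IPPO implementations in which each agent's samples are stored in its own replay buffer, and each policy update uses only the data sampled by a single agent. Is there a difference between these two approaches?

2. Use of the RNN in the IPPO network architecture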
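
One way to see how the two approaches relate, assuming all agents share a single actor (as the single `self.ppo_actor` in the excerpt suggests) and contribute equally sized batches: the mean of the clipped PPO surrogate over the pooled samples equals the average of the per-agent means, so one pooled update follows the averaged loss, whereas updating on one agent's buffer at a time takes separate steps. The following is a toy sketch in plain Python with made-up ratios and advantages; `clipped_surrogate` and all numbers are illustrative and are not taken from the repository.

```python
def clipped_surrogate(ratio, advantage, clip=0.2):
    """PPO clipped objective for one sample, negated so it acts as a loss."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return -min(ratio * advantage, clipped_ratio * advantage)


def mean(values):
    return sum(values) / len(values)


# Made-up (probability ratio, advantage) pairs: 3 agents, 2 samples each.
samples = {
    0: [(1.10, 0.5), (0.70, -1.0)],
    1: [(1.50, 2.0), (0.95, 0.3)],
    2: [(1.00, -0.4), (1.30, 1.0)],
}

# (a) Pool every agent's samples and average once.
pooled_loss = mean([clipped_surrogate(r, a)
                    for batch in samples.values() for r, a in batch])

# (b) Compute one loss per agent, then average the per-agent losses.
per_agent_losses = [mean([clipped_surrogate(r, a) for r, a in batch])
                    for batch in samples.values()]
averaged_loss = mean(per_agent_losses)

# With equal batch sizes the two quantities agree up to rounding.
print(abs(pooled_loss - averaged_loss) < 1e-12)
```

If the agents had separate actor networks or unequal batch sizes, this equivalence would no longer hold as written.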

one_action_mean, self.rnn_hidden[i] = self.ppo_actor(obs[:, i], self.rnn_hidden[i])

In the code above, my understanding is that the vector self.rnn_hidden[i] is used only once per update. Would it therefore also work to write:

one_action_mean, _ = self.ppo_actor(obs[:, i], self.rnn_hidden[i])

I also noticed that self.rnn_hidden[i] is initialized to all zeros every time. Would it be enough to simply pass an all-zero vector to the actor network?
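
The situation described in this question can be illustrated with a toy recurrent cell, under the assumption (taken from the question, not verified against the repository) that the hidden state is reset to zeros before each call and the returned state is never read again. In that case storing or discarding the returned state gives the same output; the two only differ once the state is carried across steps. `toy_cell` and its weights are hypothetical stand-ins for `self.ppo_actor`.

```python
import math


def toy_cell(obs, hidden, w_obs=0.8, w_hid=0.5):
    """Scalar recurrent cell: returns (output, new_hidden)."""
    new_hidden = math.tanh(w_obs * obs + w_hid * hidden)
    return 2.0 * new_hidden, new_hidden


observations = [0.3, -1.2, 0.7]

# Hidden state reset to zero before every call: the stored state is never reused.
reset_outputs = []
for obs in observations:
    hidden = 0.0
    out, hidden = toy_cell(obs, hidden)  # stored, but overwritten on the next pass
    reset_outputs.append(out)

# Same calls with the returned state discarded.
discard_outputs = [toy_cell(obs, 0.0)[0] for obs in observations]

# Hidden state carried across steps: this is where storing it matters.
carried_outputs = []
hidden = 0.0
for obs in observations:
    out, hidden = toy_cell(obs, hidden)
    carried_outputs.append(out)

print(reset_outputs == discard_outputs)    # same results when the state is always reset
print(carried_outputs == discard_outputs)  # differs once the state is carried over
```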
