
off-policy's Introduction

Off-Policy Multi-Agent Reinforcement Learning (MARL) Algorithms

This repository contains implementations of various off-policy multi-agent reinforcement learning (MARL) algorithms.

Authors: Akash Velu and Chao Yu

Algorithms supported:

  • MADDPG (MLP and RNN)
  • MATD3 (MLP and RNN)
  • QMIX (MLP and RNN)
  • VDN (MLP and RNN)

Environments supported:

  • Multi-agent Particle Environments (MPE)
  • StarCraft Multi-Agent Challenge (SMAC)

1. Usage

WARNING #1: by default, all experiments assume a shared policy across agents, i.e. there is one neural network shared by all agents.

WARNING #2: only QMIX and MADDPG are thoroughly tested; however, our VDN and MATD3 implementations make only small modifications to QMIX and MADDPG, respectively. We display results obtained with our implementation here.

All core code is located within the offpolicy folder. The algorithms/ subfolder contains algorithm-specific code for all methods. RMADDPG and RMATD3 refer to the RNN implementations of MADDPG and MATD3, and mQMIX and mVDN refer to the MLP implementations of QMIX and VDN. We additionally support prioritized experience replay (PER).
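As a rough illustration of what PER changes relative to uniform sampling — transitions are drawn with probability proportional to their priority (typically the TD-error) — here is a minimal sketch; this is illustrative, not the repository's implementation:

```python
import random

def per_sample(priorities, batch_size, alpha=0.6):
    # Sample indices with probability proportional to priority**alpha;
    # alpha interpolates between uniform (0) and fully greedy (1) sampling.
    weights = [p ** alpha for p in priorities]
    return random.choices(range(len(priorities)), weights=weights, k=batch_size)

# Transitions with larger priority (e.g. index 2) are sampled more often.
idxes = per_sample([1.0, 0.5, 2.0, 0.1], batch_size=2)
```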

  • The envs/ subfolder contains environment wrapper implementations for the MPEs and SMAC.

  • Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.

  • Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named in the following manner: train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.

  • Python training scripts for each environment can be found in the scripts/train/ folder.

  • The config.py file contains relevant hyperparameter and env settings. Most hyperparameters are defaulted to the ones used in the paper; however, please refer to the appendix for a full list of hyperparameters used.
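As a sketch of how such defaults are typically exposed via config.py, here is a minimal argparse setup; the argument names and default values below are illustrative, not necessarily the repository's exact flags:

```python
import argparse

# Illustrative subset of hyperparameters; the real config.py defines many more.
parser = argparse.ArgumentParser(description="off-policy MARL config (sketch)")
parser.add_argument("--lr", type=float, default=5e-4, help="learning rate")
parser.add_argument("--batch_size", type=int, default=32, help="training batch size")
parser.add_argument("--buffer_size", type=int, default=5000, help="replay buffer capacity")
parser.add_argument("--use_per", action="store_true", help="enable prioritized replay")

# Flags not passed on the command line fall back to their defaults.
args = parser.parse_args(["--lr", "1e-3", "--use_per"])
```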

2. Installation

Here we give an example installation with CUDA == 10.1. For non-GPU and other CUDA version installations, please refer to the PyTorch website.

# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
# install off-policy package
cd off-policy
pip install -e .

Even though we provide requirements.txt, it may contain redundant packages. We recommend running the code and installing whichever required packages turn out to be missing.

2.1 Install StarCraftII 4.10

unzip SC2.4.10.zip
# password is iagreetotheeula
# use >> (append) rather than > (overwrite) so the rest of ~/.bashrc is preserved
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc

2.2 Install MPE

# install this package first
pip install seaborn

There are 3 Cooperative scenarios in MPE:

  • simple_spread
  • simple_speaker_listener, which is the 'Comm' scenario in the paper
  • simple_reference

3. Train

Here we use train_mpe_maddpg.sh as an example:

cd offpolicy/scripts
chmod +x ./train_mpe_maddpg.sh
./train_mpe_maddpg.sh

Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official documentation. Adding --use_wandb on the command line or in the .sh file will use TensorBoard instead of Weights & Biases.

4. Results

Results for the performance of RMADDPG and QMIX on the Particle Envs, and of QMIX on SMAC, are depicted here. These results were obtained using a normal (not prioritized) replay buffer.

off-policy's People

Contributors: akashvelu, zbzhu99, zoeyuchao

off-policy's Issues

Questions on the meaning of what wandb records

As I work with this code, I find that what wandb records is somewhat different from what I intuitively expect.

When I try to train mqmix with MPE environment, in the directory, 'off-policy/offpolicy/runner/mlp/base_runner.py', the function 'batch_train_q' has a loop to call 'self.trainer.train_policy_on_batch'. In this case, train_policy_on_batch in offpolicy/algorithms/mqmix/mqmix.py will be called, and the global and local q functions will be updated with returning a train_info.

It seems that the train_info labeled with policy ids doesn't represent differences between policies, but rather differences between successive training iterations in the loop. This can also be confirmed in wandb, as the figures show little difference.
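If the goal were one representative value per policy rather than the last loop iteration, one option would be to aggregate the per-iteration train_info dictionaries before logging. A hypothetical sketch (function and key names are illustrative, not from the repository):

```python
def aggregate(train_infos):
    # Average each metric over the gradient-step loop before logging,
    # instead of logging only the final iteration for a policy id.
    keys = train_infos[0].keys()
    return {k: sum(info[k] for info in train_infos) / len(train_infos) for k in keys}

# Two gradient steps for the same policy, averaged into one log entry.
infos = [{"loss": 1.0, "grad_norm": 2.0}, {"loss": 0.5, "grad_norm": 4.0}]
averaged = aggregate(infos)
```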

RuntimeError: CUDA error: an illegal memory access was encountered

When I run MADDPG, I encounter this CUDA error:

Traceback (most recent call last):
  File "train_mpe.py", line 157, in <module>
    main(sys.argv[1:])
  File "train_mpe.py", line 147, in main
    total_num_steps = runner.run()
  File "D:\off-policy-release\offpolicy\runner\mlp\base_runner.py", line 153, in run
    env_info = self.collecter(explore=True, training_episode=True, warmup=False)
  File "D:\off-policy-release\offpolicy\runner\mlp\mpe_runner.py", line 145, in shared_collect_rollout
    self.train()
  File "D:\off-policy-release\offpolicy\runner\mlp\base_runner.py", line 189, in batch_train
    train_info, new_priorities, idxes = update(p_id, sample)
  File "D:\off-policy-release\offpolicy\algorithms\maddpg\maddpg.py", line 117, in shared_train_policy_on_batch
    rewards = to_torch(rewards).to(**self.tpdv).view(-1, 1)
RuntimeError: CUDA error: an illegal memory access was encountered
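A common first step for debugging this class of error (general CUDA advice, not specific to this repository) is to force synchronous kernel launches, since CUDA reports errors asynchronously and the traceback often points at an unrelated op:

```shell
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the Python
# traceback points at the operation that actually triggered the error.
# (Invocation below is illustrative; use your usual training command.)
CUDA_LAUNCH_BLOCKING=1 python train_mpe.py
```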

Bug with idx_range, causing error with Prioritized ER

Describe the bug
When using PER with QMIX, an issue arises with the idx_range returned by the insert function of RecPolicyBuffer:

line 267, in insert
  for idx in range(idx_range[0], idx_range[1]):
IndexError: index 1 is out of bounds for axis 0 with size 1

The reason seems to be that the insert function takes as first parameter the number of episodes to insert, instead of the number of steps (as the function description explains it).

To try fixing the issue, I computed the number of steps to insert, from the number of episodes, as such:
from line 164:

episode_length = acts.shape[0]
assert episode_length == self.episode_length, ("different dimension!")
# my addition:
num_insert_steps = num_insert_episodes * episode_length

And then I replaced num_insert_episodes with num_insert_steps in the rest of the function.

This seems to work.
However, I am not completely sure that it was intended like that.

Tell me if I am wrong and the issue is related to something else.
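The proposed fix boils down to converting episodes to buffer steps before computing the index range. A condensed sketch (the helper name is hypothetical; the variable names follow the snippet in the issue):

```python
def insert_index_range(start, num_insert_episodes, episode_length):
    # PER priorities are stored per step, so the inserted index range must
    # span episodes * episode_length entries, not just the episode count.
    num_insert_steps = num_insert_episodes * episode_length
    return (start, start + num_insert_steps)

# Inserting 2 episodes of 25 steps covers 50 buffer slots.
idx_range = insert_index_range(0, num_insert_episodes=2, episode_length=25)
```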

Need help

Hello, I have encountered some problems; I wonder if you can help me. The output is:

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: zhouweiqing (use wandb login --relogin to force relogin)
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql

I cannot solve it.

Environment visualization issue

After running the project I only get the metrics on the Weights & Biases platform; why can't I get the simple_XXX.py environments to render?

Run time

How much time is usually needed to run QMIX on MPE?

Error with wandb


Originally posted by @zhouweiqing-star in #3 (comment)

mqmix hypernet b2

in mqmix mixer

self.hyper_b2 = nn.Sequential(
    init_(nn.Linear(self.cent_obs_dim, self.hypernet_hidden_dim)),
    nn.ReLU(),
    init_(nn.Linear(self.hypernet_hidden_dim, 1))
).to(self.device)

should be

self.hyper_b2 = nn.Sequential(
    init_(nn.Linear(self.cent_obs_dim, self.mixer_hidden_dim)),
    nn.ReLU(),
    init_(nn.Linear(self.mixer_hidden_dim, 1))
).to(self.device)

?
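For what it's worth, hyper_b2 produces the scalar bias of the final mixing layer, so both variants are shape-correct as long as the output dimension is 1; the hidden width only changes the hypernetwork's capacity. A quick shape check without torch (dimensions are illustrative):

```python
# Each hyper_b2 variant is Linear(cent_obs_dim, hidden) -> ReLU -> Linear(hidden, 1).
cent_obs_dim = 48
hypernet_hidden_dim = 64
mixer_hidden_dim = 32

def hyper_b2_shapes(hidden):
    # Returns the (in, out) shape of each linear layer in the sequential.
    return [(cent_obs_dim, hidden), (hidden, 1)]

# Both choices end in an output of size 1 (a scalar bias), so the question
# is about which hidden width was intended, not about correctness of shapes.
assert hyper_b2_shapes(hypernet_hidden_dim)[-1][1] == 1
assert hyper_b2_shapes(mixer_hidden_dim)[-1][1] == 1
```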

Can you open-source MASAC code base?

Hello,
Thanks for open-sourcing a really good piece of work. I was wondering if you could open-source the MASAC code base, as it would help in understanding how MASAC differs from MADDPG. Thanks in advance for the help.
