thisaul / maddpg_mpe

TensorFlow implementation of MADDPG in Multi-Agent Particle Environments

License: MIT License

Python 72.59% Jupyter Notebook 27.41%

MADDPG_MPE

TensorFlow implementation of the MADDPG algorithm from "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments" (Lowe et al., 2017).

Requirements

Multi-Agent Particle Environments (MPE)
Dependencies: Python (3.5.4), OpenAI Gym (0.10.5), NumPy (1.14.5)

Code structure

maddpg_mpe.ipynb: core code for the MADDPG algorithm and different training scenarios in MPE
make_env.py: imports a multi-agent environment as an OpenAI Gym-like object
replay_buffer.py: replay buffer code for MADDPG
distributions.py: useful probability distributions
utils.py: useful TensorFlow functions
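
For orientation, here is a minimal sketch of what a replay buffer for off-policy training typically looks like. It is not the contents of replay_buffer.py: the class name, method names, and the transition layout (obs, action, reward, next_obs, done) are assumptions chosen for illustration, and it uses only the standard library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of transitions (hypothetical sketch)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently drops the oldest transition once full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._storage)

    def add(self, obs, action, reward, next_obs, done):
        """Store one transition as a tuple."""
        self._storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        """Return a uniformly sampled batch as five parallel lists."""
        batch = self._rng.sample(list(self._storage), batch_size)
        obs, actions, rewards, next_obs, dones = map(list, zip(*batch))
        return obs, actions, rewards, next_obs, dones


# Usage: capacity 3, so only the last three of five transitions survive.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.add(obs=[t], action=[0.0], reward=-1.0, next_obs=[t + 1], done=False)

obs, actions, rewards, next_obs, dones = buffer.sample(batch_size=2)
```

In a multi-agent setting, one such buffer per agent (or one buffer of joint transitions) is a common design; which of the two this repository uses is not stated here.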

Training parameters

MAX_EPISODE_LEN: maximum length of each episode for the environment (default: 25)
NUM_EPISODES: total number of training episodes (default: 25000)
LR: learning rate (default: 1e-2)
BATCH_SIZE: batch size (default: 1024)
NUM_UNITS: number of units in the MLP (default: 64)
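
To get a feel for what the defaults imply, the sketch below restates them as constants and derives two quantities from them. The helper functions are hypothetical and not part of the notebook. The second one assumes that every episode runs for the full MAX_EPISODE_LEN steps and that each step adds exactly one transition to the replay buffer.

```python
# Defaults as listed above.
MAX_EPISODE_LEN = 25
NUM_EPISODES = 25000
LR = 1e-2
BATCH_SIZE = 1024
NUM_UNITS = 64


def max_env_steps(max_episode_len, num_episodes):
    """Upper bound on environment steps over the whole training run."""
    return max_episode_len * num_episodes


def min_episodes_to_fill_batch(batch_size, max_episode_len):
    """Full-length episodes needed before one batch can be sampled
    (assumes one stored transition per step)."""
    return -(-batch_size // max_episode_len)  # ceiling division


print(max_env_steps(MAX_EPISODE_LEN, NUM_EPISODES))             # 625000
print(min_episodes_to_fill_batch(BATCH_SIZE, MAX_EPISODE_LEN))  # 41
```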

Training scenarios

Name in paper	Env name	Notes
Cooperative Communication	`simple_speaker_listener.py`	One agent is the ‘speaker’ (gray), which does not move but observes the other agent's goal. The other agent is the listener, which cannot speak but must navigate to the correct landmark.
Predator-Prey	`simple_tag.py`	Good agents (green) are faster and want to avoid being hit by adversaries (red). Adversaries are slower and want to hit good agents. Obstacles (large black circles) block the way.
Cooperative Navigation	`simple_spread.py`	N agents, N landmarks. Agents are rewarded based on how far any agent is from each landmark. Agents are penalized if they collide with other agents. So, agents have to learn to cover all the landmarks while avoiding collisions.
Physical Deception	`simple_adversary.py`	1 adversary (red), N good agents (green), N landmarks (usually N=2). All agents observe the positions of the landmarks and of the other agents. One landmark is the ‘target landmark’ (colored green). Good agents are rewarded based on how close one of them is to the target landmark, but penalized if the adversary is close to it. The adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary.
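
As an illustration of the Cooperative Navigation description above, the shared reward could be sketched roughly as follows. This is a simplified reading of the notes, not the code in `simple_spread.py`: the function name, the agent radius of 0.15, the penalty of 1.0 per collision, and counting each colliding pair once are all assumptions.

```python
import math


def spread_reward(agent_positions, landmark_positions,
                  agent_radius=0.15, collision_penalty=1.0):
    """Shared reward sketch for N agents covering N landmarks in 2D."""
    reward = 0.0

    # For each landmark, penalize the distance to the closest agent.
    for lx, ly in landmark_positions:
        reward -= min(math.hypot(ax - lx, ay - ly)
                      for ax, ay in agent_positions)

    # Penalize every pair of agents whose bodies overlap.
    for i, (ax, ay) in enumerate(agent_positions):
        for bx, by in agent_positions[i + 1:]:
            if math.hypot(ax - bx, ay - by) < 2 * agent_radius:
                reward -= collision_penalty

    return reward


# One landmark is covered exactly; the other is 1.0 away from its nearest agent.
r = spread_reward([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)])
```

The reward is highest (closest to zero) when every landmark has an agent sitting on it and no two agents overlap, which is why the agents need to spread out rather than all head for the same landmark.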
