Multi-Agent Cooperation in Sequential Social Dilemmas

This project was completed in Spring 2019 as part of the senior requirement for Yale Computer Science.

This work implements and explores recent research in Multi-Agent Reinforcement Learning (MARL). It is highly recommended that you read the papers listed under Related Works before diving in.

Quick Start

Switch to your virtual env
pip install -r requirements.txt
python train.py
python test.py ~/ray_results/prison_A3C/[training_instance]/ [checkpoint_num]

Training results are usually saved in the ray_results directory in your home directory (~/ray_results).
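The [training_instance] and [checkpoint_num] placeholders in the test command can be filled in by inspecting the results directory. The helper below is a minimal sketch of one way to do that, not part of the repository. It assumes that each training run is a subdirectory of the experiment folder and that checkpoints are stored in subdirectories named checkpoint_<number>; the function name is hypothetical, and the checkpoint layout may differ between Ray versions.

```python
import os
import re


def latest_run_and_checkpoint(experiment_dir):
    """Return (run_path, checkpoint_num) for the most recently modified run.

    Assumes checkpoints live in subdirectories named checkpoint_<number>;
    this naming is an assumption and may not match every Ray version.
    Returns None if the experiment directory contains no runs.
    """
    experiment_dir = os.path.expanduser(experiment_dir)
    runs = [
        os.path.join(experiment_dir, name)
        for name in os.listdir(experiment_dir)
        if os.path.isdir(os.path.join(experiment_dir, name))
    ]
    if not runs:
        return None

    # Pick the run directory that was modified most recently.
    run = max(runs, key=os.path.getmtime)

    # Collect the numeric suffix of every checkpoint_<number> directory.
    numbers = []
    for name in os.listdir(run):
        match = re.fullmatch(r"checkpoint_(\d+)", name)
        if match and os.path.isdir(os.path.join(run, name)):
            numbers.append(int(match.group(1)))

    return run, (max(numbers) if numbers else None)


if __name__ == "__main__":
    default_dir = os.path.expanduser("~/ray_results/prison_A3C")
    if os.path.isdir(default_dir):
        print(latest_run_and_checkpoint(default_dir))
```

The returned pair can then be passed to test.py as its two arguments.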

Environments

Pycolab provides the abstraction for creating environments. Although this repository includes three environments, only the PrisonEnvironment has been fully developed and tested.

The PrisonEnvironment instantiates a gridworld variant of the classic Prisoner's Dilemma. At each step of the game, both agents independently choose to move left, move right, or stay still. The left side of the board represents full defection and the right side of the board represents full cooperation. Intermediate positions are a linear combination of the extremes. Rewards are distributed every 10 timesteps of the game. The figure below shows the corresponding rewards for four primary states of the game.
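To make the "linear combination of the extremes" concrete, here is a minimal sketch of one way such a reward could be computed. It is not the repository's implementation. The payoff values are the textbook Prisoner's Dilemma numbers (an assumption, since the actual values appear only in the figure), the bilinear interpolation is one plausible reading of the description, and the function names and the 1-based timestep convention are hypothetical.

```python
# Assumed textbook payoffs for the agent whose action is listed first.
# These values are illustrative and are not taken from the repository.
PAYOFF = {
    ("C", "C"): 3.0,  # mutual cooperation
    ("C", "D"): 0.0,  # cooperate while the other defects
    ("D", "C"): 5.0,  # defect while the other cooperates
    ("D", "D"): 1.0,  # mutual defection
}

REWARD_INTERVAL = 10  # rewards are distributed every 10 timesteps


def cooperation_level(column, width):
    """Map a board column to [0, 1]: 0 is far left (defect), 1 is far right (cooperate)."""
    if width < 2:
        raise ValueError("the board needs at least two columns")
    return column / (width - 1)


def reward(own_column, other_column, width):
    """Bilinearly interpolate between the four extreme payoffs."""
    a = cooperation_level(own_column, width)
    b = cooperation_level(other_column, width)
    return (
        a * b * PAYOFF[("C", "C")]
        + a * (1 - b) * PAYOFF[("C", "D")]
        + (1 - a) * b * PAYOFF[("D", "C")]
        + (1 - a) * (1 - b) * PAYOFF[("D", "D")]
    )


def is_payout_step(timestep):
    """True on timesteps 10, 20, 30, ... (assumes timesteps are counted from 1)."""
    return timestep > 0 and timestep % REWARD_INTERVAL == 0
```

At the four corners the function returns the extreme payoffs unchanged; for example, on a five-column board with both agents in the middle column it returns the average of the four payoffs.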

Running python play.py lets you play the game manually, which is especially helpful when debugging the environment in isolation.

Learning

Reinforcement learning is handled by RLlib. Currently, all training uses the A3C algorithm.
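For orientation, the sketch below shows roughly how an A3C run can be launched through Ray Tune. It is an illustration under stated assumptions, not a copy of train.py: it assumes an older Ray release that still ships A3C and accepts checkpoint_freq in tune.run, the environment name "prison" and the env_creator argument are hypothetical, and the hyperparameter values are placeholders. The experiment name prison_A3C matches the results path used in Quick Start.

```python
def build_a3c_config(num_workers=2, lr=1e-4, gamma=0.99):
    """Return a plain dict of common RLlib settings (values are illustrative)."""
    return {
        "env": "prison",             # hypothetical registered environment name
        "num_workers": num_workers,  # number of parallel rollout workers
        "lr": lr,                    # learning rate
        "gamma": gamma,              # discount factor
    }


def launch(env_creator, config, iterations=100):
    """Register the environment and start training with Tune.

    env_creator is a callable taking an env config dict and returning the
    environment instance; how the real environment is constructed is not
    shown here.
    """
    # Imported lazily so build_a3c_config works without Ray installed.
    import ray
    from ray import tune
    from ray.tune.registry import register_env

    ray.init()
    register_env(config["env"], env_creator)
    tune.run(
        "A3C",
        name="prison_A3C",  # results land in ~/ray_results/prison_A3C
        config=config,
        stop={"training_iteration": iterations},
        checkpoint_freq=10,  # save a checkpoint every 10 iterations
    )
```

Without a "multiagent" section in the config, RLlib trains a single policy shared by all agents; separate per-agent policies would need additional configuration.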

Unresolved Issues

When A3C agents are initialized in test.py, asynchronous changes to the environment interfere with the game visualization. One workaround is to wait until these initial interactions with the environment have finished before starting the game; this takes only a few seconds. The better solution would be to fix the issue and submit a PR!

Related Works

Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).
Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A. G., Dunning, I., Zhu, T., McKee, K., Koster, R., Roff, H., & Graepel, T. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in Neural Information Processing Systems (pp. 3330-3340).
Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P. A., Strouse, D. J., Leibo, J. Z. & de Freitas, N. (2018). Intrinsic Social Motivation via Causal Influence in Multi-Agent RL. arXiv preprint arXiv:1810.08647.
Credit to Sequential Social Dilemma Games for providing a useful example of RLlib usage.
