
mava's Introduction

Mava logo

Distributed Multi-Agent Reinforcement Learning in JAX

Welcome to Mava! 🦁

Mava provides simplified code for quickly iterating on ideas in multi-agent reinforcement learning (MARL), with useful implementations of MARL algorithms in JAX that allow for easy parallelisation across devices via JAX's pmap. Mava is a project originating in the Research Team at InstaDeep.

To join us in these efforts, please feel free to reach out, raise issues or read our contribution guidelines (or just star 🌟 to stay up to date with the latest developments)!

Overview 🦜

Mava currently offers the following building blocks for MARL research:

  • 🥑 Implementations of MARL algorithms: Implementations of multi-agent PPO systems that follow both the Centralised Training with Decentralised Execution (CTDE) and Decentralised Training with Decentralised Execution (DTDE) MARL paradigms.
  • 🍬 Environment Wrappers: Example wrappers for mapping Jumanji environments to an environment compatible with Mava. At the moment, we support Robotic Warehouse and Level-Based Foraging, with plans to support more environments soon. We have also recently added support for the SMAX environment from JaxMARL.
  • 🎓 Educational Material: A Quickstart notebook to demonstrate how Mava can be used and to highlight the added value of JAX-based MARL.
  • 🧪 Statistically robust evaluation: Mava natively supports logging to JSON files which adhere to the standard suggested by Gorsane et al. (2022). This enables easy downstream experiment plotting and aggregation using the tools found in the MARL-eval library (see the illustrative sketch below).
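
As a purely illustrative sketch of the kind of nested layout this standard suggests (environment → task → algorithm → run → logging step), hypothetical Python that writes such a file might look as follows; the exact key names should be checked against the MARL-eval documentation:

import json

# Hypothetical metrics dictionary following the nested layout used by
# MARL-eval: environment -> task -> algorithm -> run -> logging step.
logs = {
    "rware": {
        "tiny-2ag": {
            "ff_mappo": {
                "run_0": {
                    "step_0": {"step_count": 0, "episode_return": [0.0, 0.0]},
                    "step_1": {"step_count": 100_000, "episode_return": [1.5, 2.0]},
                }
            }
        }
    }
}

with open("metrics.json", "w") as f:
    json.dump(logs, f, indent=2)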

Performance and Speed 🚀

SMAX

To compare Mava's stability with that of other JAX-based baseline algorithms, we train Mava's recurrent IPPO and MAPPO systems on a broad range of SMAX tasks. In all cases we do not rerun the baselines, but instead take the final win rates from the JaxMARL technical report. For the full SMAX experiment results, please see the following page.

Mava recurrent IPPO and MAPPO performance on the 3s5z, 6h_vs_8z and 3s5z_vs_3s6z SMAX tasks.

Robotic Warehouse

All of the experiments below were performed using an NVIDIA Quadro RTX 4000 GPU with 8GB of memory.

In order to show the utility of end-to-end JAX-based MARL systems and JAX-based environments, we compare the speed of Mava against EPyMARL, measured in total training wallclock time on simple Robotic Warehouse (RWARE) tasks with 2 and 4 agents. Our aim is to illustrate the speed increases possible with end-to-end JAX-based systems; we do not necessarily make an effort to achieve optimal performance. For EPyMARL, we use the hyperparameters recommended by Papoudakis et al. (2020), and for Mava we performed a basic grid search. In both cases, systems were trained for up to 20 million total environment steps using 16 vectorised environments.

Mava feedforward MAPPO performance on the tiny-2ag, tiny-4ag and small-4ag RWARE tasks.

📌 An important note on the differences in converged performance

In order to benefit from the wallclock speed-ups afforded by JAX-based systems, the environments themselves must also be written in JAX. For this reason, Mava does not use the exact same version of the RWARE environment as EPyMARL, but instead uses the JAX-based implementation of RWARE found in Jumanji, under the name RobotWarehouse. One notable difference in the underlying environment logic is that RobotWarehouse does not attempt to resolve agent collisions, but instead terminates an episode when agents collide. In our experiments, this appeared to make the environment more challenging. For this reason, we show the performance of Mava on Jumanji both with and without termination upon collision, the latter indicated by w/o collision in the figure legends. For a more detailed discussion, please see the following page.

Level-Based Foraging

Mava also supports Jumanji's LBF. We evaluate Mava's recurrent MAPPO system on LBF against EPyMARL (which uses the original LBF environment) in 2- and 4-agent settings, for up to 20 million timesteps. Both systems were trained using 16 vectorised environments. For the EPyMARL systems we use an NVIDIA A100 GPU, and for the Mava systems we use a GeForce RTX 3050 laptop GPU with 4GB of memory. To show how Mava can generalise to different hardware, we also train the Mava systems on a TPU v3-8. We plan to publish comprehensive performance benchmarks for all of Mava's algorithms across various LBF scenarios soon.

Mava recurrent MAPPO performance on the 2s-8x8-2p-2f-coop and 15x15-4p-3f Level-Based Foraging tasks.

🧨 Steps per second experiments using vectorised environments

Furthermore, we illustrate the speed of Mava by showing how steps per second scale as the number of parallel environments is increased. These scaling plots were computed on a standard laptop GPU, specifically an RTX 3060 with 6GB of memory.

Mava steps per second scaling with increasing numbers of vectorised environments, and total training run time for 20M environment steps.

Code Philosophy 🧘

The current code in Mava is adapted from PureJaxRL, which provides high-quality single-file implementations with research-friendly features. In turn, PureJaxRL is inspired by the code philosophy of CleanRL. In this vein of easy-to-use and understandable RL codebases, Mava is not designed to be a modular library and is not meant to be imported. Our repository focuses on simplicity and clarity in its implementations while utilising the advantages offered by JAX, such as pmap and vmap, making it an excellent resource for researchers and practitioners to build upon.
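
To make the pmap and vmap point concrete, here is a minimal illustrative sketch (not Mava's actual training code) of vectorising a toy update over environments with vmap and then replicating it across devices with pmap:

import jax
import jax.numpy as jnp

# A toy per-environment "update"; in a real system this would be a full
# learner step. Shapes and logic here are purely illustrative.
def update(params, obs):
    return params + 0.01 * jnp.tanh(obs).mean()

# vmap vectorises the update over a batch of environments on one device...
batched_update = jax.vmap(update, in_axes=(None, 0))

# ...and pmap replicates that batched computation across all devices.
distributed_update = jax.pmap(batched_update, in_axes=(None, 0))

n_devices = jax.device_count()
params = jnp.zeros(())                        # shared parameters
obs = jnp.ones((n_devices, 16, 4))            # [devices, envs, obs_dim]
new_params = distributed_update(params, obs)  # shape: [devices, 16]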

Installation 🎬

At the moment Mava is not meant to be installed as a library, but rather to be used as a research tool.

You can use Mava by cloning the repo and pip installing as follows:

git clone https://github.com/instadeepai/mava.git
cd mava
pip install -e .

We have tested Mava on Python 3.9. Note that because the installation of JAX differs depending on your hardware accelerator, we advise users to explicitly install the correct JAX version (see the official installation guide and the example below). For more in-depth installation guides, including Docker builds and virtual environments, please see our detailed installation guide.
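
For example, at the time of writing, the official guide suggests commands along the following lines; these change between JAX releases, so always defer to the guide itself:

# CPU-only installation
pip install -U jax

# NVIDIA GPU (CUDA 12) installation
pip install -U "jax[cuda12]"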

Quickstart ⚡

To get started with training your first Mava system, simply run one of the system files, e.g.:

python mava/systems/ff_ippo.py

Mava makes use of Hydra for config management. Our default system configs can be found in the mava/configs/ directory. A benefit of Hydra is that configs can either be set in config yaml files or overridden from the terminal on the fly. For an example of running a system on the LBF environment, the above command can simply be adapted as follows:

python mava/systems/ff_ippo.py env=lbf

Different scenarios can also be run by making the following config updates from the terminal:

python mava/systems/ff_ippo.py env=rware env/scenario=tiny-4ag
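
Because this is standard Hydra syntax, Hydra's multirun flag can also be used to sweep over several scenarios in a single command. A hypothetical example, assuming the same config groups shown above:

python mava/systems/ff_ippo.py -m env=rware env/scenario=tiny-2ag,tiny-4ag,small-4ag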

We also have a Quickstart notebook that can be used to quickly create and train your first multi-agent system.

Advanced Usage 👽

Mava can be used in a wide array of advanced systems. As an example, we demonstrate recording experience data from one of our PPO systems into a Flashbax Vault. This vault can then easily be integrated into offline MARL systems, such as those found in OG-MARL. See the Advanced README for more information.
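
As a rough sketch of what recording into a vault might look like (the API names here are assumptions based on the Flashbax documentation at the time of writing, so please defer to the Advanced README and the Flashbax docs):

import jax.numpy as jnp
import flashbax as fbx
from flashbax.vault import Vault  # assumed import path

# A dummy transition pytree; a real system would store the full
# (observation, action, reward, done) structure for every agent.
transition = {"obs": jnp.zeros(4), "reward": jnp.zeros(())}

# A simple flat buffer to accumulate experience online.
buffer = fbx.make_flat_buffer(
    max_length=100_000, min_length=2, sample_batch_size=32
)
state = buffer.init(transition)
state = buffer.add(state, transition)

# The vault persists the buffer's experience structure to disk so that
# offline MARL systems (e.g. OG-MARL) can read it back later.
vault = Vault(
    vault_name="ff_ippo_experience",  # hypothetical name
    experience_structure=state.experience,
)
vault.write(state)  # periodically flush buffered experience to disk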

Contributing 🤝

Please read our contributing docs for details on how to submit pull requests, our Contributor License Agreement and community guidelines.

Roadmap 🛤️

We plan to iteratively expand Mava in the following increments:

  • 🌴 Support for more environments.
  • 🔁 More robust recurrent systems.
  • 🌳 Support for non-JAX-based environments.
  • 🦾 Support for off-policy algorithms.
  • 🎛️ Continuous action space environments and algorithms.

Please do follow along as we develop this next phase!

TensorFlow 2 Mava:

Originally, Mava was written in TensorFlow 2. Support for the TF2-based framework and systems has now been fully deprecated. If you would still like to use it, please install v0.1.3 of Mava (i.e. pip install id-mava==0.1.3).

See Also 🔎

The following projects form part of InstaDeep's MARL ecosystem in JAX. In particular, we suggest users check out the following sister repositories:

  • 🔌 OG-MARL: datasets with baselines for offline MARL in JAX.
  • 🌴 Jumanji: a diverse suite of scalable reinforcement learning environments in JAX.
  • 😎 Matrax: a collection of matrix games in JAX.
  • ⚡ Flashbax: accelerated replay buffers in JAX.
  • 📈 MARL-eval: standardised experiment data aggregation and visualisation for MARL.

Related: other libraries related to accelerated MARL in JAX.

  • 🦊 JaxMARL: accelerated MARL environments with baselines in JAX.
  • 🌀 DeepMind Anakin: the Anakin podracer architecture for training RL agents at scale.
  • ♟️ Pgx: JAX implementations of classic board games, such as Chess, Go and Shogi.
  • 🔼 Minimax: JAX implementations of autocurricula baselines for RL.

Citing Mava 📚

If you use Mava in your work, please cite the accompanying technical report:

@article{dekock2023mava,
    title={Mava: a research library for distributed multi-agent reinforcement learning in JAX},
    author={Ruan de Kock and Omayma Mahjoub and Sasha Abramowitz and Wiem Khlifi and Callum Rhys Tilbury
    and Claude Formanek and Andries P. Smit and Arnu Pretorius},
    year={2023},
    journal={arXiv preprint arXiv:2107.01460},
    url={https://arxiv.org/pdf/2107.01460.pdf},
}

Acknowledgements 🙏

We would like to thank all the authors who contributed to the previous TF version of Mava: Kale-ab Tessera, St John Grimbly, Kevin Eloff, Siphelele Danisa, Lawrence Francis, Jonathan Shock, Herman Kamper, Willie Brink, Herman Engelbrecht, Alexandre Laterre, Karim Beguir. Their contributions can be found in our TF technical report.

The development of Mava was supported with Cloud TPUs from Google's TPU Research Cloud (TRC) 🌤️.


mava's Issues

Implement checkpointing

This will allow for periodic saving of the system networks and loading them again to resume training.
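
A minimal sketch of how such periodic checkpointing could be done with Orbax (not an existing Mava feature; the path and parameter names are illustrative):

import jax.numpy as jnp
import orbax.checkpoint as ocp

# Toy network parameters; a real system would checkpoint the full
# learner state (params, optimiser state, step count, etc.).
params = {"actor": jnp.zeros(4), "critic": jnp.zeros(4)}

checkpointer = ocp.PyTreeCheckpointer()
checkpointer.save("/tmp/mava_checkpoint", params)        # periodic save
restored = checkpointer.restore("/tmp/mava_checkpoint")  # resume training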

Fix _transform_observations being called per agent

Try to resolve the problem of _transform_observations being called for each agent even though the calculation is identical for all of them; it currently has its own loop over all agents. Also, try to do a batched update of all networks instead of the current sequential updates. This mostly concerns the networks shared between agents, which are updated sequentially; sequential updates might introduce a problem where agent order determines each agent's effect on the shared network weights, which we do not want.

Implement additional logging metrics

Metrics to track during training:

mean/std/min/max for each of the following:

  • cumulative rewards
  • episode length
  • value function estimates
  • losses for the objectives
  • exploration parameters (such as mean entropy for stochastic policy optimisation, or the current epsilon for epsilon-greedy exploration as in DQN)

General MARL env loop

This is connected to implementing additional logging metrics (#27). If we have one general MARL environment loop, we will only have to implement the metric-logging function once, and all the other environment loops can inherit it. A similar argument applies to other functions associated with the environment loop that can be shared across different environments.

Fix training error

The agents are not learning anymore. Investigate why that is and fix it.

Fix memory leak issue

The RAM usage keeps increasing as training progresses. This might be due to a memory leak.

Implement observation and reward scaling wrappers

Best practice advice:

  • Make sure everything is reasonably scaled.

Rule of thumb:

  • Observations: Make everything mean 0, standard deviation 1.
  • Reward: If you control it, then scale it to a reasonable value.
  • Do this across ALL of your data so far.
  • Look at all observations and rewards and make sure there aren't any crazy outliers.
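
As a minimal sketch of the observation side of this rule of thumb (standalone, hypothetical code rather than an existing Mava wrapper), a running-statistics normaliser could look like this:

import numpy as np

class RunningObsNormalizer:
    """Normalises observations to roughly mean 0, std 1 using running
    statistics over all data seen so far (Welford's algorithm)."""

    def __init__(self, obs_dim: int, eps: float = 1e-8):
        self.count = 0
        self.mean = np.zeros(obs_dim)
        self.m2 = np.zeros(obs_dim)  # running sum of squared deviations
        self.eps = eps

    def update(self, obs: np.ndarray) -> None:
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs: np.ndarray) -> np.ndarray:
        var = self.m2 / max(self.count - 1, 1)
        return (obs - self.mean) / np.sqrt(var + self.eps)

# Usage: update with each observation seen so far, then normalise.
norm = RunningObsNormalizer(obs_dim=4)
for obs in np.random.randn(1000, 4) * 5.0 + 3.0:
    norm.update(obs)
print(norm.normalize(np.array([3.0, 3.0, 3.0, 3.0])))  # roughly zero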
