model-based-marl's Introduction

Official PyTorch implementation of the paper "Scalable Model-based Policy Optimization for Decentralized Networked Systems", accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).

Paper link: Scalable Model-based Policy Optimization for Decentralized Networked Systems

Algorithms

  1. DMPO (Decentralized Model-based PO, Our method)
  2. DPPO (Decentralized PPO)
  3. CPPO (Centralized PPO)
  4. IC3Net (Individualized Controlled Continuous Communication Model)
  5. IA2C (Independent Advantage Actor-Critic)

Key parameters for decentralized algorithms:

  1. radius_v: communication radius for the value function; takes values 1, 2, 3, ...
  2. radius_pi: communication radius for the policy, default 1
  3. radius_p: communication radius for the environment model, default 1
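
To make the radius parameters concrete, the sketch below assumes that a radius of k means an agent uses information from every agent within k hops on the agent graph. That reading, the helper name `k_hop_neighbors`, and the toy line graph are illustrative assumptions and are not taken from the repository code.

```python
from collections import deque


def k_hop_neighbors(adjacency, agent, radius):
    """Return the sorted agents within `radius` hops of `agent` (itself included).

    `adjacency` maps each agent index to the list of its direct neighbors.
    This is an illustrative helper, not a function from the repository.
    """
    visited = {agent: 0}  # agent -> hop distance from the source agent
    queue = deque([agent])
    while queue:
        current = queue.popleft()
        if visited[current] == radius:
            continue  # do not expand beyond the communication radius
        for neighbor in adjacency[current]:
            if neighbor not in visited:
                visited[neighbor] = visited[current] + 1
                queue.append(neighbor)
    return sorted(visited)


# Toy example: five agents on a line, 0 - 1 - 2 - 3 - 4.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_neighbors(line_graph, agent=2, radius=1))  # [1, 2, 3]
print(k_hop_neighbors(line_graph, agent=2, radius=2))  # [0, 1, 2, 3, 4]
```

Under this reading, a larger radius_v gives each value function a wider view of the network at the cost of more communication.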

Environments

  1. CACC Catchup
  2. CACC Slowdown
  3. Ring Attenuation
  4. Figure Eight
  5. ATSC Grid
  6. ATSC Monaco
  7. UAVFC (will be available soon)
  8. Custom Environments

Environment setup

CACC, Flow and ATSC Environments

The CACC, Flow and ATSC environments are built on SUMO, so you need to install the corresponding version of SUMO as follows:

  1. SUMO installation. Version 1.11.0

The results were produced with SUMO commit 2147d155b1 from https://github.com/eclipse/sumo. We recommend following https://sumo.dlr.de/docs/Installing/Linux_Build.html and installing this specific version via a repository checkout. Note that the latest version of SUMO is not compatible with the Flow environments. In brief, after checking out that commit, run the following commands to build the SUMO binaries.

sudo apt-get install cmake python g++ libxerces-c-dev libfox-1.6-dev libgdal-dev libproj-dev libgl2ps-dev swig
cd <sumo_dir> # please insert the correct directory name here
export SUMO_HOME="$PWD"
mkdir build/cmake-build && cd build/cmake-build
cmake ../..
make -j$(nproc)

After building, you need to manually add the bin folder to your path:

export PATH=$PATH:$SUMO_HOME/bin

  2. Setting up the environment.

It's recommended to set up the environment via Anaconda. The environment specification is in environment.yml. After installing the required packages, run

export PYTHONPATH="$SUMO_HOME/tools:$PYTHONPATH"

in a terminal to make the SUMO Python packages importable.
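
The exports above are easy to forget in a fresh shell. The following is a minimal sketch of a sanity check, assuming only the three variables mentioned in this section (SUMO_HOME, PATH, PYTHONPATH). The function name `check_sumo_setup` is hypothetical and not part of the repository, and the comparison is a plain string match, so a trailing slash in either path would be reported as a mismatch.

```python
import os
import shutil


def check_sumo_setup(env=None):
    """Return a list of problems found in the SUMO setup (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    sumo_home = env.get("SUMO_HOME")
    if not sumo_home:
        problems.append("SUMO_HOME is not set")
    else:
        # The SUMO Python packages live in $SUMO_HOME/tools.
        tools_dir = os.path.join(sumo_home, "tools")
        python_path = env.get("PYTHONPATH", "").split(os.pathsep)
        if tools_dir not in python_path:
            problems.append("SUMO tools directory is not on PYTHONPATH")
    # Look for the sumo binary using the PATH of the inspected environment.
    if shutil.which("sumo", path=env.get("PATH", "")) is None:
        problems.append("sumo binary not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_sumo_setup():
        print("WARNING:", problem)
```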

Custom Environments

We support both discrete and continuous action spaces. Similar to gym, you need to write reset and step functions. For more details please see algorithms/envs/Custom_Env.py

  1. reset():
Input: None
Output: State → np.array((number of agents, dimension of state))
  2. step(action):
Input: Action → np.array((number of agents, dimension of action))
Output: State → np.array((number of agents, dimension of state)), Reward → np.array((number of agents,)), Done → np.array((number of agents,))
  3. You need to create a parameter file such as Catchup_CPPO.py in algorithms/config
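
As a rough illustration of the reset/step contract, here is a minimal sketch of a toy environment. The class name `ToyCustomEnv`, its constructor arguments, and its dynamics and reward are all hypothetical placeholders. It also assumes that step returns a next state with the same shape as the state returned by reset. The actual interface expected by the training code is the one in algorithms/envs/Custom_Env.py.

```python
import numpy as np


class ToyCustomEnv:
    """Hypothetical multi-agent environment that follows the shapes listed above."""

    def __init__(self, n_agent=4, state_dim=3, action_dim=2, horizon=10, seed=0):
        self.n_agent = n_agent
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.state = np.zeros((n_agent, state_dim))

    def reset(self):
        """Return the initial state with shape (n_agent, state_dim)."""
        self.t = 0
        self.state = self.rng.normal(size=(self.n_agent, self.state_dim))
        return self.state.copy()

    def step(self, action):
        """Take actions of shape (n_agent, action_dim); return state, reward, done."""
        action = np.asarray(action, dtype=float).reshape(self.n_agent, self.action_dim)
        # Placeholder dynamics: each agent's state drifts by the mean of its action.
        self.state = self.state + action.mean(axis=1, keepdims=True)
        # Placeholder reward: agents are rewarded for staying near the origin.
        reward = -np.linalg.norm(self.state, axis=1)
        self.t += 1
        done = np.full(self.n_agent, self.t >= self.horizon)
        return self.state.copy(), reward, done


env = ToyCustomEnv()
state = env.reset()
next_state, reward, done = env.step(np.zeros((env.n_agent, env.action_dim)))
print(next_state.shape, reward.shape, done.shape)  # (4, 3) (4,) (4,)
```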

Logging data during training

We use WandB as the logger.

  1. Setting up WandB.

Before running our code, you should log in to WandB locally. Please refer to https://docs.wandb.ai/quickstart for more detail.

Usage

Train the agent by:

python launcher.py --env ENV --algo ALGO --device DEVICE

ENV specifies which environment to run in, including eight, ring, catchup, slowdown, Grid, Monaco, custom_env_name.

ALGO specifies the algorithm to use, including IC3Net, CPPO, DPPO, DMPO, IA2C.

DEVICE specifies the device to use, including cpu, cuda:0, cuda:1, cuda:2...

For example:

python launcher.py --env 'slowdown' --algo 'DMPO' --device 'cuda:0'
python launcher.py --env 'catchup' --algo 'DPPO' --device 'cuda:0'
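
To run several environment and algorithm combinations in a row, the command line above can be generated programmatically. This is a minimal sketch that relies only on the --env, --algo and --device flags shown above. The helper names `build_command` and `build_sweep` are hypothetical and not part of the repository.

```python
import shlex
import sys


def build_command(env, algo, device="cpu"):
    """Build the argument list for one launcher.py training run."""
    return [sys.executable, "launcher.py", "--env", env, "--algo", algo, "--device", device]


def build_sweep(envs, algos, device="cpu"):
    """Build one command per (environment, algorithm) pair."""
    return [build_command(env, algo, device) for env in envs for algo in algos]


if __name__ == "__main__":
    for command in build_sweep(["catchup", "slowdown"], ["DPPO", "DMPO"], device="cuda:0"):
        print(shlex.join(command))
        # To actually launch a run: subprocess.run(command, check=True)
```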

Test the agent by:

After training, the actor models are saved to checkpoints/standard _xxx/Models/xxxbest_actor.pt. To test them, add the following code to algorithms/algo/agent/DPPO.py (or DMPO.py, CPPO.py, ...):

self.actors.load_state_dict(torch.load(test_actors_model))

after initializing actors:

self.collect_pi, self.actors = self._init_actors()

where:

test_actors_model = 'checkpoints/standard _xxx/Models/xxxbest_actor.pt'
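
If the checkpoint was trained on a GPU and is tested on a different device, passing map_location to torch.load avoids device mismatch errors. The sketch below shows the save and load round trip on a stand-in module. The `nn.Linear` layer and the temporary file name are assumptions made for illustration, since the real actors come from self._init_actors().

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the actors; in the repository they come from self._init_actors().
actors = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "toy_best_actor.pt")  # hypothetical file name
    torch.save(actors.state_dict(), checkpoint_path)

    restored = nn.Linear(4, 2)
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    restored.load_state_dict(state_dict)

restored.eval()  # switch to evaluation mode before testing
with torch.no_grad():
    x = torch.ones(1, 4)
    same = torch.allclose(actors(x), restored(x))
print(same)  # True
```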

Results in video form

Description of the following videos in ATSC-Grid

ATSC-Grid Net

This is the 5×5 grid of intersections in ATSC-Grid. The areas highlighted by the red frame are shown in the following videos; they are also the key areas for comparing the execution results of DPPO and DMPO. The key indicator "Insertion-backlogged vehicles" in the numerical panel on the left first increases as the traffic load grows and then decreases as the traffic lights make effective decisions. Its maximum value is 1486 vehicles under DPPO and 1033 vehicles under DMPO, which indicates that DMPO reduces the backlog of vehicles at intersections and is therefore more effective than DPPO at relieving traffic jams. At the three intersections highlighted in the video, traffic jams also last for less time under DMPO.

Execution result of DPPO (Decentralized PPO) in ATSC-Grid

Execution.result.of.DPPO.in.ATSC-Grid.mp4

Execution result of DMPO (Our method) in ATSC-Grid

Execution.result.of.DMPO.in.ATSC-Grid.mp4

Description of the following videos in ATSC-Monaco

Real Net_point

This is a more challenging scenario with a heterogeneous network structure and diverse action and observation spaces: the ATSC-Monaco traffic network, which contains 28 intersections from the real traffic network of Monaco. The areas highlighted by the red frame are shown in the following videos; they are also the key areas for comparing the execution results of DPPO and DMPO. The key indicator "arrived vehicles" in the numerical panel on the left increases gradually with the traffic load. Its maximum value is 563 vehicles under DPPO and 752 vehicles under DMPO, which indicates that DMPO reduces the backlog of vehicles at intersections and allows more vehicles to reach their destinations. DMPO is therefore more effective than DPPO at relieving traffic jams. At the three intersections highlighted in the video, traffic jams also last for less time under DMPO.
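
For a rough sense of scale, the peak values quoted for ATSC-Grid and ATSC-Monaco can be turned into relative changes. This is simple arithmetic on the numbers read from the videos, not a statistical comparison, and the helper name `relative_change` is only illustrative.

```python
def relative_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# Peak values quoted in the video descriptions (vehicles).
grid_backlog = {"DPPO": 1486, "DMPO": 1033}   # ATSC-Grid, lower is better
monaco_arrived = {"DPPO": 563, "DMPO": 752}   # ATSC-Monaco, higher is better

grid_change = relative_change(grid_backlog["DPPO"], grid_backlog["DMPO"])
monaco_change = relative_change(monaco_arrived["DPPO"], monaco_arrived["DMPO"])
print(f"{grid_change:.1f}%")    # -30.5%
print(f"{monaco_change:.1f}%")  # 33.6%
```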

Execution result of DPPO (Decentralized PPO) in ATSC-Monaco

Execution.result.of.DPPO.in.ATSC-Monaco.mp4

Execution result of DMPO (Our method) in ATSC-Monaco

Execution.result.of.DMPO.in.ATSC-Monaco.mp4

Description of the following videos in UAVFC

UAVFC

This is a UAV swarm flying in a 5×5 formation grid. The objective in UAVFC is to reach the destination while avoiding the forest and maintaining the formation. We tested different algorithms under the same scenario. In the following video, all UAVs under DMPO approach their destination, avoid the trees, and maintain the formation, whereas under CPPO some UAVs fall out of formation.

Execution results of DMPO (Our method) and CPPO (Centralized PPO) in UAVFC

Execution.results.of.DMPO.CPPO.in.UAVFC.mp4

Cite

Please cite our paper if you use the code or datasets in your own work:

@article{du2022fully,
  title={Scalable Model-based Policy Optimization for Decentralized Networked Systems},
  author={Du, Yali and Ma, Chengdong and Liu, Yuchen and Lin, Runji and Dong, Hao and Wang, Jun and Yang, Yaodong},
  journal={arXiv preprint arXiv:2207.06559},
  year={2022}
}
