
MATE: the Multi-Agent Tracking Environment.

Home Page: https://mate-gym.readthedocs.io

License: MIT License

mate's Introduction

MATE: the Multi-Agent Tracking Environment

This repo contains the source code of MATE, the Multi-Agent Tracking Environment. The full documentation can be found at https://mate-gym.readthedocs.io. The full list of implemented agents can be found in the Implemented Algorithms section. For a detailed description, please check out our paper (PDF, bibtex).

This is an asymmetric two-team zero-sum stochastic game with partial observations, and each team has multiple agents (multiplayer). Intra-team communications are allowed, but inter-team communications are prohibited. It is cooperative among teammates, but it is competitive among teams (opponents).

Installation

git config --global core.symlinks true  # required on Windows
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate

NOTE: Python 3.7+ is required; Python versions lower than 3.7 are not supported.

It is highly recommended to create a new isolated virtual environment for MATE using conda:

git clone https://github.com/XuehaiPan/mate.git && cd mate
conda env create --no-default-packages --file conda-recipes/basic.yaml  # or full-cpu.yaml to install RLlib
conda activate mate

Getting Started

Make the MultiAgentTracking environment and play!

import mate

# Base environment for MultiAgentTracking
env = mate.make('MultiAgentTracking-v0')
env.seed(0)
done = False
camera_joint_observation, target_joint_observation = env.reset()
while not done:
    camera_joint_action, target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    (
        (camera_joint_observation, target_joint_observation),
        (camera_team_reward, target_team_reward),
        done,
        (camera_infos, target_infos)
    ) = env.step((camera_joint_action, target_joint_action))
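
The loop above discards the rewards it receives. A minimal sketch of accumulating per-team returns over one episode follows. It assumes only the step signature shown above (an observation pair, a reward pair, a done flag, and an info pair). `StubTwoTeamEnv`, `sample_actions`, and `rollout` are hypothetical names introduced here so the snippet runs without MATE installed; the stub's zero-sum rewards follow the game description but are otherwise made up.

```python
import random


class StubTwoTeamEnv:
    """Hypothetical stand-in that mimics the two-team step signature shown above."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], [0.0]  # (camera_joint_observation, target_joint_observation)

    def sample_actions(self):
        # Random placeholder actions, one per team.
        return self.rng.random(), self.rng.random()

    def step(self, joint_actions):
        self.t += 1
        camera_team_reward = self.rng.uniform(0.0, 1.0)
        target_team_reward = -camera_team_reward  # zero-sum between the two teams
        done = self.t >= self.episode_length
        return ([0.0], [0.0]), (camera_team_reward, target_team_reward), done, ([{}], [{}])


def rollout(env, policy):
    """Run one episode and return (camera_return, target_return, num_steps)."""
    camera_return = target_return = 0.0
    num_steps = 0
    observations = env.reset()
    done = False
    while not done:
        observations, (camera_reward, target_reward), done, _infos = env.step(policy(observations))
        camera_return += camera_reward
        target_return += target_reward
        num_steps += 1
    return camera_return, target_return, num_steps


env = StubTwoTeamEnv(episode_length=5, seed=0)
camera_return, target_return, num_steps = rollout(env, lambda obs: env.sample_actions())
print(num_steps, camera_return, target_return)
```

With a real environment, `policy` would map the joint observations to a `(camera_joint_action, target_joint_action)` pair instead of sampling at random.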

Another example with a built-in single-team wrapper (see also Built-in Wrappers):

import mate

env = mate.make('MultiAgentTracking-v0')
env = mate.MultiTarget(env, camera_agent=mate.GreedyCameraAgent(seed=0))
env.seed(0)
done = False
target_joint_observation = env.reset()
while not done:
    target_joint_action = env.action_space.sample()  # your agent here (this takes random actions)
    target_joint_observation, target_team_reward, done, target_infos = env.step(target_joint_action)

4 Cameras vs. 8 Targets (9 Obstacles)

Examples and Demos

mate/evaluate.py contains the example evaluation code for the MultiAgentTracking environment. Try out the following demos:

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v2-9.yaml

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-9.yaml

# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-8v8-9.yaml

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacle)
python3 -m mate.evaluate --episodes 1 --config MATE-4v8-0.yaml

# <MultiAgentTracking<MultiAgentTracking-v0>>(0 camera, 8 targets, 32 obstacles)
python3 -m mate.evaluate --episodes 1 --config MATE-Navigation.yaml

4 Cameras vs. 2 Targets (9 obstacles)	4 Cameras vs. 8 Targets (9 obstacles)	8 Cameras vs. 8 Targets (9 obstacles)	4 Cameras vs. 8 Targets (no obstacles)	8 Targets Navigation (no cameras)

You can specify the agent classes and arguments by:

python3 -m mate.evaluate --camera-agent module:class --camera-kwargs <JSON-STRING> --target-agent module:class --target-kwargs <JSON-STRING>
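
To make the `module:class` and JSON-string conventions concrete, here is a rough sketch of how such arguments could be resolved with the standard library. This is not taken from mate/evaluate.py; `load_agent_class` and `build_agent` are hypothetical helper names, and a standard-library class stands in for an agent class.

```python
import importlib
import json


def load_agent_class(spec):
    """Resolve a 'module:class' string to the class object it names."""
    module_name, sep, class_name = spec.partition(':')
    if not sep or not module_name or not class_name:
        raise ValueError(f'expected "module:class", got {spec!r}')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_agent(spec, kwargs_json='{}'):
    """Instantiate the class named by `spec` with keyword arguments from a JSON string."""
    kwargs = json.loads(kwargs_json)
    if not isinstance(kwargs, dict):
        raise TypeError('kwargs must decode to a JSON object')
    return load_agent_class(spec)(**kwargs)


# A standard-library class used as a stand-in for an agent class.
value = build_agent('fractions:Fraction', '{"numerator": 3, "denominator": 4}')
print(value)
```

The JSON string must decode to an object, because its keys are passed as keyword arguments to the class constructor.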

You can find the example code for agents in examples. The full list of implemented agents can be found in section Implemented Algorithms. For example:

# Example demos in examples
python3 -m examples.naive

# Use the evaluation script
python3 -m mate.evaluate --episodes 1 --render-communication \
    --camera-agent examples.greedy:GreedyCameraAgent --camera-kwargs '{"memory_period": 20}' \
    --target-agent examples.greedy:GreedyTargetAgent \
    --config MATE-4v8-9.yaml \
    --seed 0

You can implement your own custom agent classes to play around with. See Make Your Own Agents for more details.

Environment Configurations

The MultiAgentTracking environment accepts a Python dictionary mapping or a configuration file in JSON or YAML format. If you want to use customized environment configurations, you can copy the default configuration file:

cp "$(python3 -m mate.assets)"/MATE-4v8-9.yaml MyEnvCfg.yaml

Then modify the copy to suit your needs. Use the modified environment by:

env = mate.make('MultiAgentTracking-v0', config='/path/to/your/cfg/file')

There are several preset configuration files in the mate/assets directory:

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 2 targets, 9 obstacles)
env = mate.make('MATE-4v2-9-v0')

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-4v8-9-v0')

# <MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)
env = mate.make('MATE-8v8-9-v0')

# <MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 0 obstacles)
env = mate.make('MATE-4v8-0-v0')

# <MultiAgentTracking<MultiAgentTracking-v0>>(0 camera, 8 targets, 32 obstacles)
env = mate.make('MATE-Navigation-v0')

You can reinitialize the environment with a new configuration without creating a new instance:

>>> env = mate.make('MultiAgentTracking-v0', wrappers=[mate.MoreTrainingInformation])  # we support wrappers
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(4 cameras, 8 targets, 9 obstacles)>

>>> env.load_config('MATE-8v8-9.yaml')
>>> print(env)
<MoreTrainingInformation<MultiAgentTracking<MultiAgentTracking-v0>>(8 cameras, 8 targets, 9 obstacles)>

We also provide a script, mate/assets/generator.py, to generate a configuration file with reasonable camera placement:

python3 -m mate.assets.generator --path 24v48.yaml --num-cameras 24 --num-targets 48 --num-obstacles 20

See Environment Customization for more details.

Built-in Wrappers

MATE provides multiple wrappers for different settings, such as full observability, discrete action spaces, and single-team multi-agent environments. See Built-in Wrappers for more details.

Category	Wrapper	Description
observation	`EnhancedObservation`	Enhance the agent’s observation, which sets all observation masks to `True`.
observation	`SharedFieldOfView`	Share field of view among agents in the same team, which applies the `or` operator over the observation masks. The target agents share the empty status of warehouses.
observation	`MoreTrainingInformation`	Add more environment and agent information to the `info` field of `step()`, enabling full observability of the environment.
observation	`RescaledObservation`	Rescale all entity states in the observation to [-1, +1].
observation	`RelativeCoordinates`	Convert all locations of other entities in the observation to relative coordinates.
action	`DiscreteCamera`	Allow cameras to use discrete actions.
action	`DiscreteTarget`	Allow targets to use discrete actions.
reward	`AuxiliaryCameraRewards`	Add additional auxiliary rewards for each individual camera.
reward	`AuxiliaryTargetRewards`	Add additional auxiliary rewards for each individual target.
single-team	`MultiCamera`	Wrap into a single-team multi-agent environment.
single-team	`MultiTarget`	Wrap into a single-team multi-agent environment.
single-team	`SingleCamera`	Wrap into a single-team single-agent environment.
single-team	`SingleTarget`	Wrap into a single-team single-agent environment.
communication	`MessageFilter`	Filter messages from agents of intra-team communications.
communication	`RandomMessageDropout`	Randomly drop messages in communication channels.
communication	`RestrictedCommunicationRange`	Add a restricted communication range to channels.
communication	`NoCommunication`	Disable intra-team communications, i.e., filter out all messages.
communication	`ExtraCommunicationDelays`	Add extra message delays to communication channels.
miscellaneous	`RepeatedRewardIndividualDone`	Repeat the `reward` field and assign individual `done` field of `step()`, which is similar to MPE.
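
Two of the observation wrappers in the table describe simple element-wise operations. The sketch below illustrates the idea on plain Python lists: an `or` over teammates' observation masks, and a linear rescaling to [-1, +1]. It does not reproduce MATE's implementation; `share_field_of_view`, `rescale`, and the bounds of ±1000 are illustrative assumptions.

```python
def share_field_of_view(masks):
    """Element-wise logical OR over the observation masks of all teammates."""
    return [any(column) for column in zip(*masks)]


def rescale(value, low, high):
    """Linearly map `value` from [low, high] to [-1, +1]."""
    if high <= low:
        raise ValueError('high must be greater than low')
    return 2.0 * (value - low) / (high - low) - 1.0


# Two teammates, each seeing a different subset of three entities.
team_masks = [
    [True, False, False],
    [False, False, True],
]
shared = share_field_of_view(team_masks)
print(shared)

# Illustrative bounds only; the real bounds come from the environment.
print(rescale(500.0, -1000.0, 1000.0))
```

After sharing, an entity counts as observed by the team if at least one teammate observes it.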

You can create an environment with multiple wrappers at once. For example:

env = mate.make('MultiAgentTracking-v0',
                wrappers=[
                    mate.EnhancedObservation,
                    mate.MoreTrainingInformation,
                    mate.WrapperSpec(mate.DiscreteCamera, levels=5),
                    mate.WrapperSpec(mate.MultiCamera, target_agent=mate.GreedyTargetAgent(seed=0)),
                    mate.RepeatedRewardIndividualDone,
                    mate.WrapperSpec(mate.AuxiliaryCameraRewards,
                                     coefficients={'raw_reward': 1.0,
                                                   'coverage_rate': 1.0,
                                                   'soft_coverage_score': 1.0,
                                                   'baseline': -2.0}),
                ])
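
The traceback quoted in the issues further down this page shows two relevant lines from the library: `env = wrapper(env)` in the environment factory and `return self.wrapper(env, *self.args, **self.kwargs)` inside the wrapper specification. Based only on those two lines, a wrapper specification can be pictured as a deferred constructor call. The following is a sketch under that assumption; `WrapperSpecSketch`, `Discretize`, `AddInfo`, and `apply_wrappers` are hypothetical names, not MATE classes.

```python
class WrapperSpecSketch:
    """Hypothetical stand-in for a wrapper specification: a wrapper class plus its arguments."""

    def __init__(self, wrapper, *args, **kwargs):
        self.wrapper = wrapper
        self.args = args
        self.kwargs = kwargs

    def __call__(self, env):
        # Construction is deferred until an environment is available.
        return self.wrapper(env, *self.args, **self.kwargs)


class Discretize:
    """Toy wrapper that records the environment it wraps and a `levels` option."""

    def __init__(self, env, levels=3):
        self.env = env
        self.levels = levels


class AddInfo:
    """Toy wrapper without extra arguments."""

    def __init__(self, env):
        self.env = env


def apply_wrappers(env, wrappers):
    """Apply wrappers in list order, so the last entry ends up outermost."""
    for wrapper in wrappers:
        env = wrapper(env)
    return env


base_env = object()
wrapped = apply_wrappers(base_env, [AddInfo, WrapperSpecSketch(Discretize, levels=5)])
print(type(wrapped).__name__, wrapped.levels, type(wrapped.env).__name__)
```

In this sketch a bare wrapper class and a specification are interchangeable because both are callables that take an environment and return a wrapped one.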

Implemented Algorithms

The following algorithms are implemented in examples:

Rule-based:
1. Random (source: mate/agents/random.py)
2. Naive (source: mate/agents/naive.py)
3. Greedy (source: mate/agents/greedy.py)
4. Heuristic (source: mate/agents/heuristic.py)
Multi-Agent Reinforcement Learning Algorithms:
1. IQL (https://arxiv.org/abs/1511.08779)
2. QMIX (https://arxiv.org/abs/1803.11485)
3. MADDPG (MA-TD3) (https://arxiv.org/abs/1706.02275)
4. IPPO (https://arxiv.org/abs/2011.09533)
5. MAPPO (https://arxiv.org/abs/2103.01955)
Multi-Agent Reinforcement Learning Algorithms with Multi-Agent Communication:
1. TarMAC (base algorithm: IPPO) (https://arxiv.org/abs/1810.11187)
2. TarMAC (base algorithm: MAPPO)
3. I2C (base algorithm: MAPPO) (https://arxiv.org/abs/2006.06455)
Population Based Adversarial Policy Learning, available meta-solvers:
1. Self-Play (SP)
2. Fictitious Self-Play (FSP) (https://proceedings.mlr.press/v37/heinrich15.html)
3. PSRO-Nash (NE) (https://arxiv.org/abs/1711.00832)

NOTE: all learning-based algorithms are tested with Ray 1.12.0 on Ubuntu 20.04 LTS.

Citation

If you find MATE useful, please consider citing:

@inproceedings{pan2022mate,
  title     = {{MATE}: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control},
  author    = {Xuehai Pan and Mickel Liu and Fangwei Zhong and Yaodong Yang and Song-Chun Zhu and Yizhou Wang},
  booktitle = {Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year      = {2022},
  url       = {https://openreview.net/forum?id=SyoUVEyzJbE}
}

License

MIT License


mate's Issues

Error in evaluate

I have come across an error when running the evaluation script (evaluate.py), and I believe it may be a bug in the framework.

python3 -m evaluate --episodes 1 --config MATE-4v2-9.yaml

Error Description:
When executing the script, I receive the following error message:

(mate) admire@admire-System-Product-Name:~/mate/mate/mate$ python3 -m evaluate --episodes 1 --config MATE-4v2-9.yaml
/home/admire/anaconda3/envs/mate/lib/python3.8/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
  warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)
/home/admire/anaconda3/envs/mate/lib/python3.8/site-packages/gym/utils/seeding.py:47: DeprecationWarning: WARN: Function `rng.randint(low, [high, size, dtype])` is marked as deprecated and will be removed in the future. Please use `rng.integers(low, [high, size, dtype])` instead.
  deprecation(
/home/admire/mate/mate/mate/entities.py:585: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  self.empty_bits = np.zeros(consts.NUM_WAREHOUSES, dtype=np.bool8)
/home/admire/mate/mate/mate/environment.py:476: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
  (self.num_cameras, self.num_targets), dtype=np.bool8
Traceback (most recent call last):
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/admire/mate/mate/mate/evaluate.py", line 481, in <module>
    main()
  File "/home/admire/mate/mate/mate/evaluate.py", line 403, in main
    env = mate.make('MultiAgentTracking-v0', config=args.config, wrappers=wrappers)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/site-packages/gym/envs/registration.py", line 676, in make
    return registry.make(id, **kwargs)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/site-packages/gym/envs/registration.py", line 520, in make
    return spec.make(**kwargs)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/site-packages/gym/envs/registration.py", line 137, in make
    env = self.entry_point(**_kwargs)
  File "/home/admire/mate/mate/mate/__init__.py", line 41, in make_environment
    env = wrapper(env)
  File "/home/admire/mate/mate/mate/wrappers/typing.py", line 103, in __call__
    return self.wrapper(env, *self.args, **self.kwargs)
  File "/home/admire/mate/mate/mate/wrappers/single_team.py", line 307, in __init__
    super().__init__(env, team=Team.TARGET, opponent_agent=camera_agent)
  File "/home/admire/mate/mate/mate/wrappers/single_team.py", line 185, in __init__
    self.opponent_agents_ordered = opponent_agent.spawn(self.num_opponents)
  File "/home/admire/mate/mate/mate/agents/base.py", line 105, in spawn
    return [self.clone() for _ in range(num_agents)]
  File "/home/admire/mate/mate/mate/agents/base.py", line 105, in <listcomp>
    return [self.clone() for _ in range(num_agents)]
  File "/home/admire/mate/mate/mate/agents/base.py", line 98, in clone
    clone = copy.deepcopy(self)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 269, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 229, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/admire/anaconda3/envs/mate/lib/python3.8/copy.py", line 263, in _reconstruct
    y = func(*args)
TypeError: _generator_ctor() takes from 0 to 1 positional arguments but 2 were given

I have made sure that all the dependencies, including MATE, Gym, and NumPy, are up to date and compatible with each other. However, the error persists. I have also tried installing other versions of gym, but this did not resolve the error.

Steps to Reproduce:

Run the evaluate.py script with the provided command:

python3 -m evaluate --episodes 1 --config MATE-4v2-9.yaml

Environment Information:
python 3.8.0
gym 0.23.1
numpy 1.24.3

Please provide me with the versions of all the packages you have installed to ensure smooth execution of the entire framework.
Thank you for your attention to this matter, and I look forward to your response.

Best regards
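
When comparing environments for a report like this one, it can help to collect the installed versions programmatically rather than by hand. A small sketch using `importlib.metadata` (available in the standard library from Python 3.8) is shown below. The package list is an assumption about which distributions matter here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package list is a guess at the relevant dependencies.
report = installed_versions(['gym', 'numpy', 'ray', 'torch'])
for name, version in report.items():
    print(f'{name}: {version or "not installed"}')
```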

Bug: 'numpy.random._generator.Generator' object has no attribute 'randint'

Hello,

I encountered an issue while using your repository. When attempting to run the code, I encountered the following error:

AttributeError: 'numpy.random._generator.Generator' object has no attribute 'randint'
It seems that the error is caused by the missing randint attribute in the numpy.random.Generator object. According to my understanding, the randint method should be available in newer versions of NumPy.

I suspect that this issue may be related to the version of NumPy being used. Therefore, I have a few questions:

What version of NumPy did you use during the development and testing of this repository?
Could you provide a specific NumPy version number that I can try, either to downgrade or upgrade my own NumPy version to potentially resolve this issue?
If this issue is indeed related to the NumPy version, have you considered updating the code to accommodate a wider range of NumPy versions?
Thank you for your help and support! Please let me know if you need any further information.
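
For context on the error above: in NumPy, `numpy.random.Generator` provides `integers`, while `randint` belongs to the legacy `numpy.random.RandomState` API. One way code could tolerate both kinds of random number generator is a small dispatch helper like the sketch below. `random_integers` is a hypothetical name, and this is not a patch taken from the repository.

```python
def random_integers(rng, low, high):
    """Draw an integer from [low, high) using whichever NumPy random API `rng` offers.

    `numpy.random.Generator` exposes `integers`, while the legacy
    `numpy.random.RandomState` exposes `randint`.
    """
    if hasattr(rng, 'integers'):
        return int(rng.integers(low, high))
    return int(rng.randint(low, high))


try:
    import numpy as np
except ImportError:  # NumPy is optional for this sketch
    np = None

if np is not None:
    new_style = np.random.default_rng(0)   # numpy.random.Generator
    old_style = np.random.RandomState(0)   # legacy API
    for rng in (new_style, old_style):
        print(type(rng).__name__, random_integers(rng, 0, 10))
```

Both calls exclude `high` by default, so the helper behaves the same way for either generator type.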

Evaluate about tarmac

I want to use mate/evaluate.py to evaluate the TarMAC algorithm, but it raises an error.
When I run the following script:

python3 -m mate.evaluate --episodes 1 --render-communication --camera-agent examples.tarmac.camera.agent:TarMACCameraAgent --target-agent examples.tarmac.target.agent:TarMACTargetAgent --config MATE-4v2-9.yaml --seed 0

it reports:

  File "/home/anaconda3/envs/mate/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/mate/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mate-main/mate/evaluate.py", line 481, in <module>
    main()
  File "/home/mate-main/mate/evaluate.py", line 426, in main
    status = evaluate(env, target_agents, render=True, video_path=args.save_video)
  File "/home/mate-main/mate/evaluate.py", line 119, in evaluate
    env, target_agents, target_joint_observation, target_infos
  File "/home/mate-main/mate/wrappers/single_team.py", line 90, in group_step
    joint_action = group_act(agents, joint_observation, infos, deterministic=deterministic)
  File "/home/mate-main/mate/wrappers/single_team.py", line 75, in group_act
    for agent, observation, info in zip(agents, joint_observation, infos)
  File "/home/mate-main/mate/wrappers/single_team.py", line 75, in
    for agent, observation, info in zip(agents, joint_observation, infos)
  File "/home/mate-main/examples/tarmac/target/agent.py", line 59, in act
    observation, state=self.hidden_state, info=info, deterministic=deterministic
  File "/home/mate-main/examples/utils/rllib_policy.py", line 184, in compute_single_action
    preprocessed_observation = self.preprocess_observation(observation)
  File "/home/mate-main/examples/tarmac/target/agent.py", line 93, in preprocess_observation
    ] = preprocessed_observation.ravel()
ValueError: could not broadcast input array from shape (101,) into shape (61,)

When I change the environment config to MATE-4v4-9.yaml, it reports:

ValueError: could not broadcast input array from shape (111,) into shape (61,)

Do you have any suggestions? Looking forward to your reply, thanks!
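
The two error messages differ by ten values (101 versus 111) when two targets are added, which suggests the observation length depends on the environment configuration while the loaded policy expects a fixed length of 61. A check like the following sketch could surface that mismatch before the broadcast fails. `check_observation_size` is a hypothetical helper, and the sizes are copied from the messages above rather than derived from MATE's observation layout.

```python
def check_observation_size(observation, expected_size, config_name):
    """Fail early with a readable message when the observation length is unexpected."""
    actual_size = len(observation)
    if actual_size != expected_size:
        raise ValueError(
            f'{config_name}: observation has {actual_size} values, '
            f'but the policy expects {expected_size}'
        )
    return observation


# Sizes taken from the error messages above; the zero-filled list is a placeholder.
try:
    check_observation_size([0.0] * 101, expected_size=61, config_name='MATE-4v2-9.yaml')
except ValueError as error:
    message = str(error)
    print(message)
```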

The self.worker parameter obtained by loading the checkpoint file has a value of None and how to train on GPU

Hello, MATE has been extremely helpful for my research. I am currently studying your code, but I have encountered a bug that I have not been able to solve despite spending a lot of time on it. Could you please take a look and assist me?

Here is the process that triggered my issue:

When running train.py in the PSRO, it calls the train() function inside example/hrl/mappo/camera/train.py. It generates a checkpoint-1 at the following path:

mate/examples/psro/ray_results/debug/NE-camera.HRL-MAPPO-vs.-target.MAPPO/camera/00001/PSRO-camera.HRL-MAPPO/PSRO-camera.HRL-MAPPO-00001_0_2023-06-26_22-43-32/checkpoint_000001/checkpoint-1

Then, at line 91 of example/utils/rllib_policy.py:

self.checkpoint_path, self.worker, self.params = load_checkpoint(checkpoint_path)

By loading the aforementioned checkpoint-1, it retrieves self.worker. When setting the state of self.worker, the values of the parameters "fused" and "foreach" in self.worker['state']['shared_policy']['_optimizer_variables'][0]['param_groups'][0] are both None.

Subsequently, during the process of setting the state of self.worker, it eventually leads to line 161 in ray/rllib/utils/torch_utils.py:

tensor = torch.from_numpy(np.asarray(item))

When the values of fused and foreach are None and are being converted to a tensor as items, the following error occurs:

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

My approach has been:

1. Identify the location where checkpoint-1 is generated.
When loading checkpoint-1, the values of foreach and fused in self.worker are None, suggesting that either these parameters were not present during the generation of checkpoint-1 or their values were intentionally set to None. I have gone through the code inside tune.run() line by line but have been unable to find the location where checkpoint-1 is generated. Therefore, I cannot confirm how foreach and fused were set when generating checkpoint-1.

2. Add parameters in the config file.
Within example/hrl/mappo/camera/config.py, I added the parameters 'foreach': False and 'fused': True under config['model']['custom_model_config']. However, when loading checkpoint-1, the values of these two parameters remained as None.

These are the two approaches I have tried, but neither of them has resolved the issue. I would greatly appreciate any insights you can provide.
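
One more thing that could be tried is to strip the None-valued options from the loaded state before it is handed back to the worker, since the failure happens when those None values are converted to tensors. The sketch below works on a plain nested dictionary that mirrors only the keys named in this report. It is untested against Ray or PyTorch, and whether dropping these entries is safe depends on the installed versions. `drop_none_optimizer_options` is a hypothetical helper name.

```python
import copy


def drop_none_optimizer_options(worker_state, policy_id='shared_policy'):
    """Return a copy of the state with None-valued entries removed from each param group."""
    cleaned = copy.deepcopy(worker_state)
    for variables in cleaned['state'][policy_id]['_optimizer_variables']:
        for group in variables['param_groups']:
            for key in [k for k, v in group.items() if v is None]:
                del group[key]
    return cleaned


# Miniature state that mirrors only the keys named in the report; the values are made up.
worker = {
    'state': {
        'shared_policy': {
            '_optimizer_variables': [
                {'param_groups': [{'lr': 0.001, 'foreach': None, 'fused': None}]},
            ],
        },
    },
}
cleaned = drop_none_optimizer_options(worker)
print(cleaned['state']['shared_policy']['_optimizer_variables'][0]['param_groups'][0])
```

The helper returns a modified copy and leaves the loaded state untouched, so the original checkpoint contents remain available for comparison.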