
multi-agent-constrained-policy-optimisation's Introduction

Hi there πŸ‘‹

  • πŸ”­ I’m currently working on safe reinforcement learning theory and its applications in robotics.

  • 🌱 We organized a safe reinforcement learning workshop and a seminar series; researchers and students interested in safe RL are welcome to join us! The recorded videos are available on YouTube's Safe RL Channel; please see the YouTube Channel, the Safe RL Seminar Homepage, or the Safe RL Workshop Homepage.

multi-agent-constrained-policy-optimisation's People

Contributors

chauncygu


multi-agent-constrained-policy-optimisation's Issues

Problem with Discrete action space

I have applied MACPO to an environment I developed myself for resource allocation in 5G, but I am getting the following error.

    main(sys.argv[1:])
  File "/home/mzi/ran-slicing-simulation/scripts/train/train_macpo.py", line 163, in main
    runner.run()
  File "/home/mzi/ran-slicing-simulation/scripts/../runners/slicing_runner_macpo.py", line 75, in run
    train_infos = self.train()
  File "/home/mzi/ran-slicing-simulation/scripts/../runners/base_runner_macpo.py", line 171, in train
    train_info = self.trainer[agent_id].train(self.buffer[agent_id])
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 629, in train
    = self.trpo_update(sample, update_actor)
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 385, in trpo_update
    g_step_dir = self.conjugate_gradient(self.policy.actor,
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 227, in conjugate_gradient
    _Avp = self.fisher_vector_product(actor, obs, rnn_states, action, masks, available_actions, active_masks, p)
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 241, in fisher_vector_product
    kl = self.kl_divergence(obs, rnn_states, action, masks, available_actions, active_masks, new_actor=actor,
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 215, in kl_divergence
    return kl.sum(1, keepdim=True)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I tried to find the root cause of the error, and it seems something is wrong with the action_logits properties: the mean and standard deviation cannot be computed, although the action log probabilities are fine. See the screenshot attached below:

[screenshot: debugger view of action_logits, 2022-09-19]

I believe this goes back to my action space being Categorical, as std and mean do not make sense in such a scenario.
I was wondering how I can use MACPO for my case.
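From the traceback, kl_divergence in r_macpo.py appears to build a per-dimension Gaussian KL and then sum over dim 1, which has no counterpart when the policy head is Categorical. Below is a minimal sketch of a discrete-aware KL, assuming the actor exposes raw action logits of shape (batch, n_actions); the function and variable names here are illustrative, not the repo's API.

import torch
from torch.distributions import Categorical, kl_divergence

def categorical_policy_kl(old_logits, new_logits):
    """KL(old || new) for a Discrete/Categorical action space."""
    old_dist = Categorical(logits=old_logits.detach())  # freeze the old policy
    new_dist = Categorical(logits=new_logits)
    kl = kl_divergence(old_dist, new_dist)  # shape (batch,): already summed over actions
    return kl.unsqueeze(-1)                 # restore the (batch, 1) keepdim shape

# quick check
old = torch.randn(32, 5)                    # 32 samples, 5 discrete actions
print(categorical_policy_kl(old, old + 0.1).shape)  # torch.Size([32, 1])

The Gaussian branch can keep its kl.sum(1, keepdim=True); a discrete branch would simply skip that extra sum, because the Categorical KL is already a scalar per sample.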

Repeated calculation

It seems that the same quantity, r_coef, is recalculated here:

First time:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b

Second time:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b
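If the two lines are truly identical, the second evaluation is redundant and can be dropped; a minimal sketch, reusing the variable names from the snippets above:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True)  # r_coef = g^T H^{-1} b, computed once
# ... all later uses reference r_coef instead of re-evaluating the product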

Mismatch between code and paper

Hi, thanks for your excellent work. I found a mismatch between the code and the paper regarding the V-value network, as follows.
In paper:
[screenshot: V-value update as written in the paper]

In code:
[screenshot: V-value update as implemented in the code]

Will the different ways of using the V-value network have a large impact on the experimental results? Thank you!

Error when running manyagent_ant

I tried to run the code following the readme.md. I succeeded in running the Ant-v2 and HalfCheetah-v2 maps, but an error occurred when running manyagent_ant, as shown below:

[screenshot of the error]

I use mujoco_py==2.0.2.8, which is the same as the version in requirements.txt. If this is an issue with package versions, I hope you can update the requirements.txt so we can use the correct environment. Thank you!

Environment Error

When I try to run the code, there's a problem like this:
[screenshot of the error]

I want to know how to fix this problem. I don't know whether it is caused by the version of numpy. In my configuration, the conda venv uses python=3.7.0 and numpy=1.21.6. (I tried lowering the numpy version, but it did not work.)

To be mentioned, after following the readme.md I found that there are still quite a few libraries that need to be installed to run the code. I hope you can improve the readme instructions.

_use_naive_recurrent_generator

I tried to use naive_recurrent to train the model, but with the macpo algorithm, naive_recurrent_generator provides 17 values while 18 values are required to unpack the sample in the training class; it seems the missing one is "aver_episode_costs".

What is the problem, and how can I fix it?
Thanks!
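A hedged sketch of one possible fix, assuming the 18-value unpack in the training class differs from the generator's 17-value yield only by aver_episode_costs; base_generator and the attribute name are stand-ins, not the repo's actual code:

def with_aver_episode_costs(base_generator, aver_episode_costs):
    # wrap the 17-value generator so each sample carries the missing statistic
    for sample in base_generator:
        yield (*sample, aver_episode_costs)  # 17-tuple -> 18-tuple

Alternatively, the yield inside naive_recurrent_generator itself could append the buffer's average episode cost directly, matching whatever the default generator provides.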

grad norm explosion!

I noticed that when the critic learning rate in my application is rather large (lr=0.008 and critic_lr=0.5), the grad norm goes to infinity and the values turn to NaN.
Clipping the grad norm is suggested in such a scenario, but first I wanted to know your opinion on this matter. Is such a critic_lr way outside its meaningful range of values, so I should not explore it, or should I go with clipping the grad norm?
Thanks!
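Gradient clipping is cheap insurance either way; here is a minimal, self-contained PyTorch sketch (the tiny critic, the random batch, and the 10.0 bound are placeholders, not values from the repo):

import torch

# toy stand-ins for the trainer's critic and data
critic = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=0.5)  # the problematic lr
obs, returns = torch.randn(32, 8), torch.randn(32, 1)

value_loss = (critic(obs) - returns).pow(2).mean()
critic_optimizer.zero_grad()
value_loss.backward()
# rescale the whole gradient vector whenever its norm exceeds the bound
torch.nn.utils.clip_grad_norm_(critic.parameters(), max_norm=10.0)
critic_optimizer.step()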

Problems during installing

When I first followed the instructions to install mujoco200, everything seemed to be OK:

[screenshot: successful mujoco200 installation]

But when running the code, it says that gym is required. However, when I execute pip install gym[all], it seems that mujoco200 cannot be detected:

[screenshot: pip failing to detect mujoco200]

I want to know how to solve these problems. Thank you!

Render

Hello, can you release your render code?
I use the render function of the base class of AntEnv; however, I get this:

[screenshots of the broken render output]
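While waiting for an official render script, one hedged sanity check: the loop below renders plain gym Ant-v2 (which this repo's readme already uses) with random actions, just to confirm that mujoco_py rendering works at all on the machine; swapping in the repo's multi-agent env and a trained policy is the reader's assumption to fill in.

import gym

env = gym.make("Ant-v2")
obs = env.reset()
for _ in range(1000):
    action = env.action_space.sample()   # stand-in for a trained policy
    obs, reward, done, info = env.step(action)
    env.render()                         # opens a mujoco_py viewer window
    if done:
        obs = env.reset()
env.close()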
