
multi-agent-constrained-policy-optimisation's Introduction

Hi there πŸ‘‹

  • πŸ”­ I’m currently working on safe reinforcement learning theory and its applications in robotics.

  • 🌱 We organized a safe reinforcement learning workshop and a seminar series; researchers and students interested in safe RL are welcome to join us! The recorded videos are available on YouTube's Safe RL Channel; please see the YouTube Channel, the Safe RL Seminar Homepage, or the Safe RL Workshop Homepage.

multi-agent-constrained-policy-optimisation's People

Contributors

chauncygu


multi-agent-constrained-policy-optimisation's Issues

Problem with Discrete action space

I have applied MACPO to an environment I developed myself for resource allocation in 5G, but I am getting the following error.

    main(sys.argv[1:])
  File "/home/mzi/ran-slicing-simulation/scripts/train/train_macpo.py", line 163, in main
    runner.run()
  File "/home/mzi/ran-slicing-simulation/scripts/../runners/slicing_runner_macpo.py", line 75, in run
    train_infos = self.train()
  File "/home/mzi/ran-slicing-simulation/scripts/../runners/base_runner_macpo.py", line 171, in train
    train_info = self.trainer[agent_id].train(self.buffer[agent_id])
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 629, in train
    = self.trpo_update(sample, update_actor)
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 385, in trpo_update
    g_step_dir = self.conjugate_gradient(self.policy.actor,
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 227, in conjugate_gradient
    _Avp = self.fisher_vector_product(actor, obs, rnn_states, action, masks, available_actions, active_masks, p)
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 241, in fisher_vector_product
    kl = self.kl_divergence(obs, rnn_states, action, masks, available_actions, active_masks, new_actor=actor,
  File "/home/mzi/ran-slicing-simulation/scripts/../algorithms/r_macpo.py", line 215, in kl_divergence
    return kl.sum(1, keepdim=True)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I tried to find the root cause of the error, and it seems something is wrong with the action_logits properties: the mean and standard deviation cannot be computed, although the action log probabilities are fine. See the screenshot attached below:

[screenshot: debugger view of action_logits, 2022-09-19]

I believe this goes back to my action space being Categorical, as std and mean do not make sense in such a scenario.
I was wondering how I can use MACPO for my case.
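From the traceback, kl_divergence in r_macpo.py appears to build a per-dimension Gaussian KL and then sum over dim 1, which has no counterpart when the policy head is Categorical. Below is a minimal sketch of a discrete-aware KL, assuming the actor exposes raw action logits of shape (batch, n_actions); the function and variable names here are illustrative, not the repo's API.

import torch
from torch.distributions import Categorical, kl_divergence

def categorical_policy_kl(old_logits, new_logits):
    """KL(old || new) for a Discrete/Categorical action space."""
    old_dist = Categorical(logits=old_logits.detach())  # freeze the old policy
    new_dist = Categorical(logits=new_logits)
    kl = kl_divergence(old_dist, new_dist)  # shape (batch,): already summed over actions
    return kl.unsqueeze(-1)                 # restore the (batch, 1) keepdim shape

# quick check
old = torch.randn(32, 5)                    # 32 samples, 5 discrete actions
print(categorical_policy_kl(old, old + 0.1).shape)  # torch.Size([32, 1])

The Gaussian branch can keep its kl.sum(1, keepdim=True); a discrete branch would simply skip that extra sum, because the Categorical KL is already a scalar per sample.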

Repeated calculation

It seems that the same quantity, r_coef, is recalculated here:

First time:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b

Second time:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True) # todo: compute r_coef: = g^T H^{-1} b
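If the two lines are truly identical, the second evaluation is redundant and can be dropped; a minimal sketch, reusing the variable names from the snippets above:

r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True)  # r_coef = g^T H^{-1} b, computed once
# ... all later uses reference r_coef instead of re-evaluating the product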

Mismatch between code and paper

Hi, thanks for your excellent work. I found a mismatch between the code and the paper regarding the V-value network, as follows.
In paper:
[screenshot: V-value update as written in the paper]

In code:
[screenshot: V-value update as implemented in the code]

Will the different ways of using the V-value network have a large impact on the experimental results? Thank you!

Error when running manyagent_ant

I tried to run the code following the readme.md. I succeeded in running the Ant-v2 and HalfCheetah-v2 maps, but an error occurred when running manyagent_ant, as shown below:

[screenshot of the error]

I use mujoco_py==2.0.2.8, which is the same as the version in requirements.txt. If this is an issue with package versions, I hope you can update the requirements.txt so we can use the correct environment. Thank you!

Environment Error

When I try to run the code, there's a problem like this:
[screenshot of the error]

I want to know how to fix this problem. I don't know whether it is caused by the version of numpy. In my configuration, the conda venv uses python=3.7.0 and numpy=1.21.6. (I tried lowering the numpy version, but it did not work.)

To be mentioned, after following the readme.md I found that there are still quite a few libraries that need to be installed to run the code. I hope you can improve the readme instructions.

_use_naive_recurrent_generator

I tried to use naive_recurrent to train the model, but with the macpo algorithm, naive_recurrent_generator provides 17 values while 18 values are required to unpack the sample in the training class; it seems the missing one is "aver_episode_costs".

What is the problem, and how can I fix it?
Thanks!
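A hedged sketch of one possible fix, assuming the 18-value unpack in the training class differs from the generator's 17-value yield only by aver_episode_costs; base_generator and the attribute name are stand-ins, not the repo's actual code:

def with_aver_episode_costs(base_generator, aver_episode_costs):
    # wrap the 17-value generator so each sample carries the missing statistic
    for sample in base_generator:
        yield (*sample, aver_episode_costs)  # 17-tuple -> 18-tuple

Alternatively, the yield inside naive_recurrent_generator itself could append the buffer's average episode cost directly, matching whatever the default generator provides.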

grad norm explosion!

I noticed that when the critic learning rate in my application is rather large (lr=0.008 and critic_lr=0.5), the grad norm goes to infinity and the values turn to NaN.
Clipping the grad norm is suggested in such a scenario, but first I wanted to know your opinion on this matter. Is such a critic_lr way outside its meaningful range of values, so I should not explore it, or should I go with clipping the grad norm?
Thanks!
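Gradient clipping is cheap insurance either way; here is a minimal, self-contained PyTorch sketch (the tiny critic, the random batch, and the 10.0 bound are placeholders, not values from the repo):

import torch

# toy stand-ins for the trainer's critic and data
critic = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=0.5)  # the problematic lr
obs, returns = torch.randn(32, 8), torch.randn(32, 1)

value_loss = (critic(obs) - returns).pow(2).mean()
critic_optimizer.zero_grad()
value_loss.backward()
# rescale the whole gradient vector whenever its norm exceeds the bound
torch.nn.utils.clip_grad_norm_(critic.parameters(), max_norm=10.0)
critic_optimizer.step()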

Problems during installing

When I first followed the instructions to install mujoco200, everything seemed to be OK:

[screenshot: successful mujoco200 installation]

But when running the code, it says that gym is required. However, when I execute pip install gym[all], it seems that mujoco200 cannot be detected:

[screenshot: pip failing to detect mujoco200]

I want to know how to solve these problems. Thank you!

Render

Hello, can you release your render code?
I use the render function of the base class of AntEnv; however, I get this:

[screenshots of the broken render output]
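While waiting for an official render script, one hedged sanity check: the loop below renders plain gym Ant-v2 (which this repo's readme already uses) with random actions, just to confirm that mujoco_py rendering works at all on the machine; swapping in the repo's multi-agent env and a trained policy is the reader's assumption to fill in.

import gym

env = gym.make("Ant-v2")
obs = env.reset()
for _ in range(1000):
    action = env.action_space.sample()   # stand-in for a trained policy
    obs, reward, done, info = env.step(action)
    env.render()                         # opens a mujoco_py viewer window
    if done:
        obs = env.reset()
env.close()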
