mava's People

Contributors

alaterre, arnupretorius, asadjeewa, callumtilbury, cwichka, driessmit, edantoledo, eltociear, jcformanek, jemmaldaniel, kaleabtessera, kevineloff, lbeyers, ldfrancis, liamclarkza, louay-ben-nessir, mmorris44, mnguyen0226, nashlen, omaymamahjoub, ruanjohn, sash-a, sgrimbly, siddarthsingh1, simondutoit, sipheleledanisa, ulricharmel, wiemkhlifi

mava's Issues

Fix _transform_observations being called once per agent

_transform_observations is currently called once for each agent, even though every call performs the same calculation and the function already loops over all agents internally. Resolve this so the transformation runs only once. Also, try to update all networks in a single batch instead of the sequential updates done at present. This mostly concerns networks shared between agents, which are currently updated one agent at a time. Sequential updates can make the effect on the shared network weights depend on agent order, which we do not want.
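As a rough illustration of both points, here is a minimal sketch in plain Python. The names `update_all_agents`, `transform_all`, `update_agent` and `batched_shared_update` are hypothetical and are not part of mava; weights and gradients are plain lists of floats purely to keep the example self-contained.

```python
from typing import Any, Callable, Dict, List


def update_all_agents(
    observations: Dict[str, Any],
    transform_all: Callable[[Dict[str, Any]], Dict[str, Any]],
    update_agent: Callable[[str, Any], Any],
) -> Dict[str, Any]:
    """Transform the observations of all agents once, then reuse the result."""
    # Hypothetical stand-in for _transform_observations: one call covers every agent.
    transformed = transform_all(observations)
    return {agent: update_agent(agent, transformed[agent]) for agent in observations}


def batched_shared_update(
    weights: List[float],
    per_agent_grads: Dict[str, List[float]],
    learning_rate: float,
) -> List[float]:
    """Apply one update to shared weights using the mean gradient over agents."""
    num_agents = len(per_agent_grads)
    # Averaging first means the result does not depend on the order of the agents.
    mean_grad = [
        sum(grads[i] for grads in per_agent_grads.values()) / num_agents
        for i in range(len(weights))
    ]
    return [w - learning_rate * g for w, g in zip(weights, mean_grad)]
```

Averaging the gradients is only one possible way to combine per-agent contributions; summing them would also remove the order dependence.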

Implement additional logging metrics

Metrics to track during training:

mean/std/min/max for the following:

  • cumulative rewards
  • episode length
  • value function estimates
  • losses for the objectives
  • exploration parameters (like mean entropy for stochastic policy optimization, or current epsilon for epsilon-greedy as in DQN)
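A minimal sketch of how the four summary statistics could be computed for any of the quantities listed above. The function name `summarize` and the `name/stat` key format are assumptions made for illustration, not an existing mava API.

```python
import statistics
from typing import Dict, Sequence


def summarize(name: str, values: Sequence[float]) -> Dict[str, float]:
    """Return mean/std/min/max of `values`, keyed by a hypothetical `name/stat` label."""
    return {
        f"{name}/mean": statistics.fmean(values),
        # Population standard deviation; use statistics.stdev for the sample version.
        f"{name}/std": statistics.pstdev(values),
        f"{name}/min": min(values),
        f"{name}/max": max(values),
    }
```

For example, `summarize("episode_length", lengths)` could be called once per logging interval and the resulting dictionary passed to whatever logger is in use.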

Implement observation and reward scaling wrappers

Best practice advice:

  • Make sure everything is reasonably scaled.

Rule of thumb:

  • Observations: Make everything mean 0, standard deviation 1.
  • Reward: If you control it, then scale it to a reasonable value.
  • Compute the scaling statistics across ALL the data seen so far.
  • Look at all observations and rewards and make sure there aren't crazy outliers.
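One common way to keep mean and standard deviation "across all data so far" is a running estimate (Welford's algorithm). The sketch below handles a single scalar stream; the class name `RunningNormalizer` is hypothetical, and a real wrapper would presumably apply the same idea per observation dimension.

```python
import math


class RunningNormalizer:
    """Track a running mean and std of scalar values and normalize with them."""

    def __init__(self, epsilon: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.epsilon = epsilon  # avoids division by zero for constant inputs

    def update(self, value: float) -> None:
        # Welford's online update: numerically stable, no history is stored.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Fall back to 1.0 until at least two values have been seen.
        if self.count < 2:
            return 1.0
        return math.sqrt(self._m2 / self.count)

    def normalize(self, value: float) -> float:
        return (value - self.mean) / (self.std + self.epsilon)
```

Because only `count`, `mean` and the squared-deviation sum are stored, the statistics cover every value seen so far without retaining the data itself.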

Fix memory leak issue

RAM usage appears to keep increasing as training progresses. This might be due to a memory leak.
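One possible starting point for the investigation is Python's built-in `tracemalloc` module, which can compare allocations before and after a number of training steps. In the sketch below, `top_memory_growth` and `step_fn` are hypothetical names; `step_fn` stands in for whatever performs one training step. Note that `tracemalloc` only sees memory allocated through Python's allocator, so growth inside native libraries may not show up.

```python
import tracemalloc
from typing import Callable, List


def top_memory_growth(
    step_fn: Callable[[], None], num_steps: int, limit: int = 5
) -> List[tracemalloc.StatisticDiff]:
    """Run `step_fn` repeatedly and report the source lines whose memory changed most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(num_steps):
        step_fn()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences are grouped by source line, largest change first.
    return after.compare_to(before, "lineno")[:limit]
```

Each returned entry has a `size_diff` attribute (bytes gained or lost) and a `traceback` pointing at the allocating line.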

Implement checkpointing

This will allow the system networks to be saved periodically and loaded again to resume training.
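A minimal sketch of periodic save and resume using only the standard library. The function names and the idea of storing the state as a picklable dictionary are assumptions for illustration; a framework-specific checkpointing utility would likely be preferable for real network weights.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


def save_checkpoint(path: str, state: Any) -> None:
    """Write `state` to `path`, replacing any previous checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: Optional[Any] = None) -> Any:
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)
```

A training script could call `load_checkpoint` once at startup and `save_checkpoint` every fixed number of steps.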

Fix training error

The agents are not learning anymore. Investigate why that is and fix it.

General MARL env loop

This is connected to implementing the logging metrics in #27. If we have one general MARL env loop, we only have to implement the metric logging function once, and all the other env loops can inherit from it. The same argument applies to other functions associated with the env loop that can be shared across different envs.
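The inheritance idea might look roughly like the sketch below. All class and method names are hypothetical and do not reflect mava's actual interfaces; the toy `CountdownLoop` exists only to show that a subclass gets the episode bookkeeping for free.

```python
from typing import Any, Dict, Tuple


class MARLEnvironmentLoop:
    """General loop: subclasses supply env specifics, metrics live here once."""

    def reset(self) -> Dict[str, Any]:
        raise NotImplementedError

    def select_actions(self, observations: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError

    def step(self, actions: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, float], bool]:
        raise NotImplementedError

    def run_episode(self) -> Dict[str, Any]:
        observations = self.reset()
        returns = {agent: 0.0 for agent in observations}
        length = 0
        done = False
        while not done:
            actions = self.select_actions(observations)
            observations, rewards, done = self.step(actions)
            for agent, reward in rewards.items():
                returns[agent] += reward
            length += 1
        # Shared metric collection: written once, inherited by every env loop.
        return {"episode_length": length, "episode_return": returns}


class CountdownLoop(MARLEnvironmentLoop):
    """Toy two-agent env that ends after a fixed number of steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> Dict[str, Any]:
        self.t = 0
        return {"agent_0": 0, "agent_1": 0}

    def select_actions(self, observations: Dict[str, Any]) -> Dict[str, Any]:
        return {agent: 0 for agent in observations}

    def step(self, actions: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, float], bool]:
        self.t += 1
        observations = {agent: self.t for agent in actions}
        rewards = {agent: 1.0 for agent in actions}
        return observations, rewards, self.t >= self.horizon
```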
