ldm's Introduction

Latent Dynamics Mixture

PyTorch implementation of "Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture", NeurIPS 2021.

Requirements

Our code is based on the reference implementation of variBAD.

Refer to the requirements in https://github.com/lmzintgraf/varibad.

You don't need a MuJoCo license to run the gridworld experiment.

How to Run

python main.py --env-type gridworld_ldm

python main.py --env-type mujoco_ant_dir_ldm

python main.py --env-type mujoco_ant_goal_ldm

python main.py --env-type mujoco_cheetah_vel_ldm

To run RL2 or variBAD instead, replace (envname) with the environment name:

python main.py --env-type (envname)_rl2

python main.py --env-type (envname)_varibad
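The commands above all follow one naming pattern, `(envname)_(method)`. As a minimal sketch, the full set of command lines can be generated as shown here. The helper `build_command` and the two lists are hypothetical names introduced for illustration, and the sketch assumes that every environment listed above has a matching `_rl2` and `_varibad` configuration, which this README implies but does not state explicitly.

```python
# Environment names and method suffixes taken from the commands in this README.
ENV_NAMES = ["gridworld", "mujoco_ant_dir", "mujoco_ant_goal", "mujoco_cheetah_vel"]
METHODS = ["ldm", "rl2", "varibad"]


def build_command(env_name, method):
    """Return the argument list for one run (hypothetical helper, not part of the repo)."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return ["python", "main.py", "--env-type", f"{env_name}_{method}"]


# One command per (environment, method) pair.
commands = [build_command(env, method) for env in ENV_NAMES for method in METHODS]

for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run; the sketch only prints the commands so that nothing is started by accident.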

Evaluation results will be stored in the logs folder.

If you want to change the configurations of LDM, refer to the configurations in the config folder.

The major part of the algorithm is in metalearner_ldm.py.

ldm's People

Contributors

suyoung-lee

ldm's Issues

Which inference function is used for the experimental figures in the paper?

Both visualise_behaviour() and evaluate() are called in the log() function.
visualise_behaviour() uses test_env, but its results are only printed and are not written to the log.
evaluate() uses train_env, and its results are written to the log.
I plotted the results of both functions and found that the results of evaluate() are close to those in the paper, while the results of visualise_behaviour() are poor.
These are the results of three seeds in the half_cheetah_vel environment:
[image]
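When comparing curves like these, it can help to aggregate each one over seeds in exactly the same way. The following is a minimal sketch, assuming the per-seed returns have already been collected into plain lists of equal length. The function name `aggregate_over_seeds` is hypothetical, the numbers are placeholders rather than results from the paper or the repository, and how the returns are extracted from the logs or the printed output is not covered.

```python
from statistics import mean, stdev


def aggregate_over_seeds(returns_per_seed):
    """Return per-evaluation-point mean and sample std across seeds.

    returns_per_seed: list with one list of returns per seed, all of equal length.
    """
    lengths = {len(returns) for returns in returns_per_seed}
    if len(lengths) != 1:
        raise ValueError("all seeds must contain the same number of evaluation points")

    means, stds = [], []
    # zip(*...) groups the values of all seeds at the same evaluation point.
    for values in zip(*returns_per_seed):
        means.append(mean(values))
        stds.append(stdev(values) if len(values) > 1 else 0.0)
    return means, stds


# Placeholder numbers purely for illustration; NOT results from the paper or repo.
dummy_returns = [
    [1.0, 2.0, 3.0],  # seed 0
    [2.0, 3.0, 4.0],  # seed 1
    [3.0, 4.0, 5.0],  # seed 2
]
means, stds = aggregate_over_seeds(dummy_returns)
print(means, stds)
```

Applying the same aggregation to the returns from both functions makes any remaining gap attributable to the difference between train_env and test_env rather than to the plotting procedure.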

action limit

I'm so sorry to bother you again. In my own env, I find that the actions go far beyond the action limits. I checked the policy's act function, and there is no limit on the output action, which is sampled from the FixNormal. Should I add action.tanh() in the code?
Here is the policy action I recorded: [image]. My action limit is [-1, 1]. The actions in your env (half-cheetah vel) seem to work well, though they sometimes go beyond -1 or 1 too.
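For context, two common ways to keep Gaussian-sampled actions inside a bounded range are clipping and tanh squashing. The sketch below illustrates both in plain Python under the assumption of a symmetric or affine box limit. It is not taken from this repository and is not confirmed as the authors' recommended fix; all function names are hypothetical.

```python
import math
import random


def sample_gaussian_action(mean, std, rng):
    """Sample an unbounded action from independent Gaussians (one per dimension)."""
    return [rng.gauss(m, std) for m in mean]


def clip_action(action, low=-1.0, high=1.0):
    """Hard-clip each dimension to [low, high]."""
    return [min(max(a, low), high) for a in action]


def squash_action(action, low=-1.0, high=1.0):
    """Map each dimension smoothly into [low, high] with tanh."""
    half_range = (high - low) / 2.0
    centre = (high + low) / 2.0
    return [centre + half_range * math.tanh(a) for a in action]


rng = random.Random(0)
# Means far from zero make out-of-range samples likely.
raw = sample_gaussian_action(mean=[0.0, 2.5, -3.0], std=1.0, rng=rng)
clipped = clip_action(raw)
squashed = squash_action(raw)
print(raw, clipped, squashed)
```

Clipping leaves the policy's log-probabilities untouched but discards information about how far out of range the sample was. Tanh squashing changes the action density, so the log-probability used in the policy loss would need the corresponding change-of-variables correction.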

GPU Memory

I'm sorry to bother you. When I run your code, 48 GB of GPU memory is still not enough, even though I only use 2 workers and 1 mixture worker. How can I solve this problem?
