Code for the ICLR 2024 Spotlight paper “Learning to Act without Actions”

LAPO: Latent Action Policies

Paper (arXiv) | Dataset (Google Drive)

Overview

Pre-training large models on vast amounts of web data has proven to be an effective approach for obtaining powerful, general models in domains such as language and vision. However, this paradigm has not yet taken hold in reinforcement learning. This is because videos, the most abundant form of embodied behavioral data on the web, lack the action labels required by existing methods for imitating behavior from demonstrations. We introduce Latent Action Policies (LAPO), a method for recovering latent action information—and thereby latent-action policies, world models, and inverse dynamics models—purely from videos. LAPO is the first method able to recover the structure of the true action space just from observed dynamics, even in challenging procedurally-generated environments. LAPO enables training latent-action policies that can be rapidly fine-tuned into expert-level policies, either offline using a small action-labeled dataset, or online with rewards. LAPO takes a first step towards pre-training powerful, generalist policies and world models on the vast amounts of videos readily available on the web.

Stage 1: Training an inverse dynamics model (IDM) via a world model (WM)

In stage 1, LAPO trains a latent inverse dynamics model (IDM) to predict latent actions through a joint optimization process with a latent world model. As illustrated above, the world model must predict the future observation from the past observation and the latent action. Because the world model never sees the future observation, any information it needs beyond the past observation has to flow through the latent action. This results in a disentangled representation that captures state transition information in a highly compressed manner.
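To make the stage-1 objective concrete, here is a minimal NumPy sketch of one forward pass. It assumes flat vector observations, linear maps in place of the IDM and world-model networks, and a nearest-neighbour codebook lookup as the bottleneck on latent actions (a vector-quantization-style choice assumed here for illustration; the actual code uses neural networks on image frames). All names and sizes (`OBS_DIM`, `CODEBOOK_SIZE`, `W_idm`, `stage1_loss`, ...) are hypothetical and not taken from the LAPO code.

```python
import numpy as np

# Hypothetical sizes for illustration only.
OBS_DIM, LATENT_DIM, CODEBOOK_SIZE = 16, 4, 8

rng = np.random.default_rng(0)
W_idm = rng.normal(scale=0.1, size=(2 * OBS_DIM, LATENT_DIM))      # IDM weights
W_wm = rng.normal(scale=0.1, size=(OBS_DIM + LATENT_DIM, OBS_DIM))  # world-model weights
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))             # bottleneck codebook


def quantize(z):
    """Snap each latent action to its nearest codebook vector."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def idm(obs_t, obs_next):
    """Inverse dynamics: infer a latent action from (o_t, o_{t+1})."""
    z = np.concatenate([obs_t, obs_next], axis=1) @ W_idm
    return quantize(z)


def world_model(obs_t, z_q):
    """Forward dynamics: predict o_{t+1} from o_t and the latent action only."""
    return np.concatenate([obs_t, z_q], axis=1) @ W_wm


def stage1_loss(obs_t, obs_next):
    """Mean squared error of the world model's next-observation prediction."""
    z_q, _ = idm(obs_t, obs_next)
    pred = world_model(obs_t, z_q)
    return float(((pred - obs_next) ** 2).mean())


# Random stand-in data (not real Procgen frames).
obs_t = rng.normal(size=(32, OBS_DIM))
obs_next = rng.normal(size=(32, OBS_DIM))
demo_loss = stage1_loss(obs_t, obs_next)
print(demo_loss)
```

The asymmetry is the point of the design: the IDM sees both observations, while the world model sees only the past observation and the quantized latent action. Training both jointly to minimise this loss (in a VQ-VAE-style setup, typically with a straight-through estimator so gradients can pass the quantizer) pushes the latent action to encode what changed between frames.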

Stage 2: Behavior cloning a latent policy (π)

In stage 2, LAPO trains a latent action policy that imitates the latent actions predicted by the IDM. This results in an expert-level policy that produces actions in the latent space rather than the true action space.
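Stage 2 is ordinary behavior cloning with the IDM's latent actions as regression targets. The sketch below assumes a linear policy fit by least squares (standing in for a neural network trained with a mean-squared-error loss) and synthetic data in place of IDM-labelled frames; `fit_latent_policy` and `act` are hypothetical names. Unlike the IDM, the policy only sees the current observation.

```python
import numpy as np


def fit_latent_policy(obs, latent_actions):
    """Fit a linear policy pi(o) = o @ W by least squares (MSE behavior cloning)."""
    W, *_ = np.linalg.lstsq(obs, latent_actions, rcond=None)
    return W


def act(W, obs):
    """Predict latent actions from current observations only."""
    return obs @ W


rng = np.random.default_rng(1)
obs = rng.normal(size=(200, 16))
true_W = rng.normal(size=(16, 4))
latent_actions = obs @ true_W  # synthetic stand-in for IDM-labelled latent actions

W = fit_latent_policy(obs, latent_actions)
print(np.abs(act(W, obs) - latent_actions).max())
```

Because the targets here are an exact linear function of the observations, the fit recovers `true_W` up to floating-point error; with real IDM labels the policy can only approximate them, since it lacks access to the future frame.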

Stage 3: Decoding a latent policy (π)

In stage 3, the latent action policy is decoded to the true action space, either offline using a small action-labeled dataset, or online through interaction with the environment. The diagram above is simplified—please see the paper and code for details.
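For the offline variant of stage 3, one simple way to map latent actions back to true actions is to fit a classifier on the small action-labelled dataset. The sketch below swaps in a nearest-centroid classifier as an illustrative choice; it is not necessarily the decoder LAPO uses, and `fit_decoder` and `decode` are hypothetical names. It assumes discrete true actions (as in Procgen) and that every action appears at least once in the labelled data.

```python
import numpy as np


def fit_decoder(latents, actions, num_actions):
    """Compute the mean latent action for each true action label."""
    return np.stack([latents[actions == a].mean(axis=0) for a in range(num_actions)])


def decode(centroids, latents):
    """Map each latent action to the true action with the nearest centroid."""
    dists = ((latents[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Synthetic labelled data: three true actions whose latents form separate clusters.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
actions = np.arange(60) % 3
latents = centers[actions] + rng.normal(scale=0.1, size=(60, 2))

centroids = fit_decoder(latents, actions, num_actions=3)
predicted = decode(centroids, latents)
print((predicted == actions).mean())
```

This only works to the extent that stage 1 recovered latents that cluster by true action; the online variant instead adapts the policy with environment rewards, which this sketch does not cover.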

Setup instructions

We recommend using python==3.11. If you want to use an older Python version, you need to replace the procgen-mirror package in requirements.txt with procgen. To install the dependencies, run:

pip install -r requirements.txt

# for setting up the dataset
sudo apt install unzip
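If you are on an older Python version, the package swap described above can be scripted. This is a minimal sketch with a hypothetical helper, `adjust_requirements`; it assumes the requirement line begins with `procgen-mirror` (possibly followed by a version pin) and deliberately drops any pin, since the pin for procgen-mirror may not apply to procgen.

```python
import re
import sys


def adjust_requirements(lines, version_info=sys.version_info):
    """Replace procgen-mirror with procgen when running on Python older than 3.11."""
    if tuple(version_info[:2]) >= (3, 11):
        return list(lines)
    out = []
    for line in lines:
        if re.match(r"\s*procgen-mirror\b", line):
            # Keep the original line ending; any version pin is intentionally dropped.
            out.append("procgen\n" if line.endswith("\n") else "procgen")
        else:
            out.append(line)
    return out


demo = ["numpy\n", "procgen-mirror==0.10.7\n", "torch\n"]
print(adjust_requirements(demo, (3, 10)))
```

To apply it, read requirements.txt with `readlines()`, pass the result through `adjust_requirements`, and write the returned lines back before running `pip install -r requirements.txt`.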

Dataset setup (automatic)

There is a script provided that automatically downloads & unzips the data from Google Drive. Uncomment tasks in setup_data.sh to configure which datasets are downloaded.

bash setup_data.sh

Important

Note that bandwidth limits on Google Drive files may prevent this from working. In that case, please use the manual approach below.

Dataset setup (manual)

Download the expert data for at least one of the 16 Procgen tasks from the dataset link (Google Drive), unzip it, and place it in the expert_data directory. The expert_data directory should look like this:

expert_data
├── bigfish
│  ├── test
│  └── train
├── bossfight
│  ├── test
│  └── train
...
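Before launching experiments, it can help to confirm that every task directory has both splits. A minimal sketch using only the standard library; `check_expert_data` is a hypothetical helper, and it checks only for the `train` and `test` subdirectories shown in the tree above, not their contents.

```python
import tempfile
from pathlib import Path


def check_expert_data(root="expert_data"):
    """Return {task_name: [missing splits]} for each task directory under root."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist")
    problems = {}
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        missing = [s for s in ("train", "test") if not (task_dir / s).is_dir()]
        if missing:
            problems[task_dir.name] = missing
    return problems


# Demo on a throwaway directory tree (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        (Path(tmp) / "bigfish" / split).mkdir(parents=True)
    (Path(tmp) / "bossfight" / "train").mkdir(parents=True)
    demo_result = check_expert_data(tmp)
print(demo_result)  # {'bossfight': ['test']}
```

An empty dictionary means every task directory found has both splits; run `check_expert_data()` from the repository root to check the real `expert_data` directory.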

The data is provided as .npz files that contain chunks of trajectory data (observations, actions, logprobs, value estimates,...).
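Since the exact array names inside each chunk are not listed here, a quick way to find out is to open the files and list what they contain. A minimal sketch: `summarize_chunks` is a hypothetical helper, and the array names in the demo (`obs`, `actions`) are made up for illustration. `np.load` refuses pickled object arrays by default, so if a chunk contains any, reading them fails unless you pass `allow_pickle=True` for files you trust.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarize_chunks(split_dir):
    """Map each .npz chunk filename to {array_name: (shape, dtype)}."""
    summary = {}
    for path in sorted(Path(split_dir).glob("*.npz")):
        with np.load(path) as chunk:
            summary[path.name] = {
                name: (chunk[name].shape, str(chunk[name].dtype)) for name in chunk.files
            }
    return summary


# Demo on a fake chunk; the array names here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    np.savez(
        Path(tmp) / "chunk_0.npz",
        obs=np.zeros((4, 64, 64, 3), dtype=np.uint8),
        actions=np.zeros(4, dtype=np.int64),
    )
    demo_summary = summarize_chunks(tmp)
print(demo_summary)
```

For the real data, point it at a split directory such as `expert_data/bigfish/train`.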

Running experiments

To run stages 1-3 for all 16 Procgen games, run:

cd lapo
bash launch.sh

Tip

  • Hyperparameters: You can change hyperparameters by modifying the lapo/config.yaml file.
  • Logging: The easiest way to look at the results is via wandb; the code logs results to your lapo_stage1, lapo_stage2, and lapo_stage3 projects.
  • Memory usage: By default the code loads ~2.5M frames of expert data (80 chunks * 32k frames). This requires about 40GB of host memory. You can change MAX_DATA_CHUNKS in paths.py to configure this.
  • Runtime: The expected runtime per task on a GPU is roughly 1 hour per stage.
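To pick a value for MAX_DATA_CHUNKS, a back-of-the-envelope estimate of the observation memory is useful. This sketch assumes Procgen's standard 64×64 RGB uint8 frames and reads "32k" as 32,000 frames per chunk; `estimate_obs_memory_gb` is a hypothetical helper. Under these assumptions, the default 80 chunks come to roughly 31.5 GB for observations alone.

```python
import math


def estimate_obs_memory_gb(max_data_chunks=80, frames_per_chunk=32_000,
                           obs_shape=(64, 64, 3), bytes_per_value=1):
    """Approximate host memory (GB) needed just for the observation arrays."""
    frames = max_data_chunks * frames_per_chunk
    bytes_per_frame = math.prod(obs_shape) * bytes_per_value
    return frames * bytes_per_frame / 1e9


for chunks in (80, 40, 20):
    print(f"MAX_DATA_CHUNKS={chunks}: ~{estimate_obs_memory_gb(chunks):.1f} GB of observations")
```

Memory scales linearly with the number of chunks, so halving MAX_DATA_CHUNKS roughly halves the footprint. The remaining arrays (actions, logprobs, value estimates) and process overhead plausibly account for the gap between this estimate and the ~40GB quoted above.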

Citation

If you are using LAPO, please consider citing our paper:

Dominik Schmidt, Minqi Jiang.
Learning to Act without Actions
https://arxiv.org/abs/2312.10812

@inproceedings{lapo,
  title={Learning to Act without Actions},
  author={Schmidt, Dominik and Jiang, Minqi},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024}
}
