
Imitation-Learning-Paper-Lists

A collection of papers on imitation learning in RL, with brief introductions. This collection draws on Awesome-Imitation-Learning and also contains self-collected papers.

To be precise, "imitation learning" is the general problem of learning from expert demonstrations (LfD). Two names have grown out of this description for historical reasons: imitation learning and apprenticeship learning. Usually, apprenticeship learning is mentioned in the context of "apprenticeship learning via inverse reinforcement learning (IRL)", which recovers a reward function and then learns a policy from it, while imitation learning began with behavior cloning, which learns the policy directly (Morgan-Kaufmann, NIPS 1989). However, as the field has developed, "imitation learning" has come to denote the general LfD problem setting, which is also the view taken here.

Typically, different settings of imitation learning lead to different specific research areas. In the general setting, the learner (1) only has access to pre-collected trajectories ((s, a) pairs) from a non-interactive expert, (2) can interact with the environment (e.g., through a simulator), and (3) receives no reward signal. Some other settings are listed below:

  1. No actions, only states/observations -> Imitation Learning from Observations (ILFO).

  2. With reward signals -> Imitation Learning with Rewards.

  3. An interactive expert for corrections and data aggregation -> On-policy Imitation Learning (beginning with DAgger, Dataset Aggregation; see the sketch after this list).

  4. Cannot interact with the environment -> a special case of Batch RL (see a dedicated list here; data in Batch RL can contain more than expert demonstrations).
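As referenced in item 3 above, here is a minimal DAgger-style sketch (Python; `env`, `expert_action`, and `fit` are assumed, illustrative interfaces, with a gym-like `env.step`):

```python
# Minimal DAgger sketch: roll out the current policy, have the expert label
# every visited state, aggregate, and retrain with supervised learning.
def dagger(env, expert_action, fit, iterations=10, horizon=200):
    dataset = []   # aggregated (state, expert label) pairs
    policy = None  # the expert acts on the first iteration
    for _ in range(iterations):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert_action(state)))  # expert labels the visited state
            action = expert_action(state) if policy is None else policy(state)
            state, _, done, _ = env.step(action)  # gym-like interface (assumption)
            if done:
                break
        policy = fit(dataset)  # supervised learning on the aggregated dataset
    return policy
```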

What we want from imitation learning in different settings (for the real world):

  1. Interact less with real-world environments by using expert demonstrations, improving sample efficiency while still learning good policies. (Some works learn good policies from few demonstrations, but at a vast cost in environment interaction.)

  2. Handle cases where real-world actions are unavailable or hard to sample.

  3. Use expert data to improve sample efficiency and learn quickly with good exploration ability.

  4. Support online settings that humans can easily join, e.g., a human correcting the steering wheel of a self-driving car.

  5. Learn good policies in the real world, where interacting with the environment is difficult.

In this collection, we concentrate on the general setting and gather the other settings in the "Other Settings" section. However, settings such as "self-imitation learning", which imitates a policy from the agent's own historical data, are not regarded as imitation learning tasks here.

These papers are classified mainly by methodology rather than by specific task setting (except for the single-agent/multi-agent split). Since many papers cross domains, the classification is only for reference. As you can see, many works focus on robotics, especially papers from UC Berkeley.

Overview

Single-Agent

Reviews & Tutorials

Behavior Cloning

Behavior Cloning (BC) directly replicates the expert's behavior with supervised learning, and can be improved via data aggregation. One can view BC as the simplest case of direct policy learning.
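As a minimal sketch of this reduction to supervised learning (PyTorch; the network size, loss, and variable names are illustrative assumptions, for continuous actions):

```python
import torch
import torch.nn as nn

def behavior_cloning(expert_states, expert_actions, epochs=200, lr=1e-3):
    # expert_states: (N, state_dim) tensor; expert_actions: (N, action_dim) tensor.
    policy = nn.Sequential(
        nn.Linear(expert_states.shape[1], 64), nn.ReLU(),
        nn.Linear(64, expert_actions.shape[1]),
    )
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        # Plain supervised regression on expert (state, action) pairs.
        loss = nn.functional.mse_loss(policy(expert_states), expert_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy  # a deterministic policy: state -> action
```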

One-shot / Zero-shot

Model-based

Hierarchical RL

Multi-modal Behaviors

Learning with human preference

Inverse RL

Inverse Reinforcement Learning (IRL) recovers the hidden objective (reward function) underlying the expert's behavior, and then derives a policy from it.
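A hedged sketch of the classic alternating structure (apprenticeship-learning style, with a reward linear in state features, r(s) = w·φ(s); `rl_solver` and `feature_expectations` are assumed helper functions, not a real API):

```python
import numpy as np

def irl_loop(expert_mu, rl_solver, feature_expectations, n_iters=20):
    # expert_mu: the expert's feature expectations, shape (k,).
    # rl_solver(reward_fn): returns a policy (near-)optimal under reward_fn.
    # feature_expectations(policy): estimates that policy's feature expectations.
    w = np.random.randn(expert_mu.shape[0])      # weights of the linear reward
    policy = None
    for _ in range(n_iters):
        policy = rl_solver(lambda phi: w @ phi)  # inner RL under the current reward
        mu = feature_expectations(policy)
        w = expert_mu - mu                       # steer the reward toward the expert
        w /= max(np.linalg.norm(w), 1e-8)        # keep the weights bounded
    return w, policy
```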

Reviews & Tutorials

Papers

Bayesian Methods

Generative Adversarial Methods

Generative Adversarial Imitation Learning (GAIL) applies generative adversarial training to learning expert policies, and is derived from inverse RL.
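A minimal sketch of the adversarial objective (PyTorch; the dimensions, network, and the -log(1 - D) reward convention are illustrative assumptions):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2  # illustrative dimensions
disc = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))

def discriminator_loss(expert_sa, policy_sa):
    # Logistic loss: expert (s, a) pairs labeled 1, the policy's pairs labeled 0.
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(disc(expert_sa), torch.ones(len(expert_sa), 1)) +
            bce(disc(policy_sa), torch.zeros(len(policy_sa), 1)))

def imitation_reward(sa):
    # -log(1 - D(s, a)): rewards the policy for fooling the discriminator; this
    # reward is then fed to a standard RL optimizer (e.g., TRPO/PPO, not shown).
    return -nn.functional.logsigmoid(-disc(sa)).detach()
```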

Multi-modal Behaviors

Hierarchical RL

Task Transfer

Model-based

POMDP

Fixed Reward Methods

Recently, a paper proposed a new idea for imitation learning: learn a fixed reward signal, which obviates the need to dynamically update the reward function.
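If this refers to an SQIL-style scheme (our assumption), the idea can be sketched as follows: expert transitions are stored with a constant reward of 1 and the agent's own transitions with 0, and an ordinary off-policy RL algorithm trains on the mixed replay buffer, so no reward function is ever updated:

```python
# Hypothetical replay buffer mixing expert and agent transitions with constant rewards.
replay_buffer = []

def add_expert_transition(s, a, s_next, done):
    replay_buffer.append((s, a, 1.0, s_next, done))  # fixed reward 1 for expert data

def add_agent_transition(s, a, s_next, done):
    replay_buffer.append((s, a, 0.0, s_next, done))  # fixed reward 0 for agent data

# A standard off-policy learner (e.g., soft Q-learning) then trains on
# `replay_buffer` as if these constants were true environment rewards.
```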

Goal-based Methods

Bayesian Methods

Other Methods

Multi-Agent

MA Inverse RL

MA-GAIL

Other Settings

Imitation Learning from Observations

Review Papers

Regular Papers

Imitation Learning with Rewards

On-policy Imitation Learning

Batch RL

See a dedicated list here.

Applications
