
The Value Equivalence Principle for Model Based Reinforcement Learning

This repository contains the implementation of The Value Equivalence Principle for Model-Based Reinforcement Learning.

You can find our reproducibility report here.

Table of Contents

  • Requirements
  • Continuous State
  • Discrete State

Requirements

  • the dependencies are listed in the environment.yml file
    conda env create -f environment.yml
    

Continuous State

Introduction

This section contains the experiments in which we enforce the VE principle with respect to a set of neural networks. Training code for both VE and MLE models is available. Evaluation of both methods is done by training a Double DQN (DDQN) based policy.
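
To make the difference between the two objectives concrete, below is a minimal sketch of the two losses, assuming a model that maps a (state, action) batch to a predicted reward and next state, and a set of value-function networks; the names, shapes and sampling details are assumptions, not the exact code in train_VE.py or train_MLE.py.

import torch

def ve_loss(model, value_nets, batch, gamma=0.99):
    # Value-equivalence loss (sketch): the model's one-step Bellman backup should
    # match the real environment's backup for every value network in the set V.
    # `model(s, a)` is assumed to return (predicted_reward, predicted_next_state).
    s, a, r, s_next = batch                                       # real transitions
    r_hat, s_next_hat = model(s, a)
    loss = torch.zeros(())
    for v in value_nets:                                          # the set V of value networks
        env_backup = r + gamma * v(s_next).squeeze(-1)            # backup through the real env
        model_backup = r_hat + gamma * v(s_next_hat).squeeze(-1)  # backup through the model
        loss = loss + torch.mean((env_backup - model_backup) ** 2)
    return loss

def mle_loss(model, batch):
    # MLE baseline (sketch): regress the observed rewards and next states directly.
    s, a, r, s_next = batch
    r_hat, s_next_hat = model(s, a)
    return torch.mean((r_hat - r) ** 2) + torch.mean((s_next_hat - s_next) ** 2)

Either loss can be minimized with a standard optimizer; the resulting models are then compared by training a DDQN policy, as described below.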

Environments

A modified version of the classic CartPole environment is used; the code for it is available in our repository.

Training

Training arguments for train_MLE.py and train_VE.py:

  • value_width: number of nodes in the hidden layers of the value function neural network.
  • rank_model: fixed rank for the weight matrices of the value function neural network.

By default all models use cuda:0 as the device:

  • gpu(-g): pass 'cpu' to disable CUDA usage.

Additional arguments for train_DQN.py:

  • exp(-e): model to be used for training the DDQN, i.e. 'MLE' or 'VE' (see the argument-parsing sketch after this list).
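
As a rough illustration of the command-line interface documented above, the arguments could be wired up with argparse roughly as follows; this is only a sketch, and the actual scripts may name, order or parse them differently.

import argparse

parser = argparse.ArgumentParser(description="Illustrative argument sketch for the training scripts")
parser.add_argument("value_width", type=int,
                    help="number of nodes in the hidden layers of the value function network")
parser.add_argument("rank_model", type=int,
                    help="fixed rank for the weight matrices of the value function network")
parser.add_argument("-g", "--gpu", default="cuda:0",
                    help="device to use; pass 'cpu' to disable CUDA")
parser.add_argument("-e", "--exp", choices=["MLE", "VE"],
                    help="pretrained model to use when training the DDQN (train_DQN.py only)")
args = parser.parse_args()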

Train MLE model with rank 6 and width 128

python3 train_MLE.py 128 6

Train VE model with rank 6 and width 128 using cpu

python3 train_VE.py 128 6 -g 'cpu'

Train DDQN based policy using pretrained VE models of rank 6 and width 128

python3 train_DQN.py 128 6 VE

Evaluation

Many pretrained DDQN based policies are available in the repository. One can check their performance using

python3 eval.py 128 6 VE

Pretrained Models

You can check the '/continous_state/pretrained' directory for all available pretrained PyTorch models.

Discrete State

Value Function Polytope

Introduction

This section contains the experiments in which we enforce the VE principle with respect to a set of true value functions. Training code for both VE and MLE models is available. Evaluation of both methods is done by forming a greedy policy using value iteration.
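
In the tabular case the VE objective can be written directly in terms of reward vectors and transition matrices. The sketch below assumes rewards of shape (S, A), transition tensors of shape (S, A, S), row-stochastic policy matrices of shape (S, A) and value vectors of shape (S,); it illustrates the idea only and is not the code in train_polytype.py.

import torch

def tabular_ve_loss(r_hat, P_hat, r, P, policies, values, gamma=0.99):
    # VE loss (sketch): for every (policy, value function) pair, the learned model
    # (r_hat, P_hat) should reproduce the true Bellman backup T_pi v = r_pi + gamma * P_pi @ v.
    loss = torch.zeros(())
    for pi, v in zip(policies, values):
        true_backup = (pi * (r + gamma * P @ v)).sum(dim=1)          # shape (S,)
        model_backup = (pi * (r_hat + gamma * P_hat @ v)).sum(dim=1)
        loss = loss + torch.mean((true_backup - model_backup) ** 2)
    return loss

def greedy_policy_from_model(r_hat, P_hat, gamma=0.99, iters=1000):
    # Evaluation (sketch): run value iteration on the learned model and act greedily.
    v = torch.zeros(r_hat.shape[0])
    for _ in range(iters):
        v = (r_hat + gamma * P_hat @ v).max(dim=1).values
    return (r_hat + gamma * P_hat @ v).argmax(dim=1)                 # greedy action per state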

Environments

The Catch and FourRooms environments can be used for this set of experiments, although we only reproduced the results on Catch.

Training and Evaluation

Since these experiments take less than 10 minutes, we do not have a separate evaluation module. Required Arguments:

  • rank_model: rank of the transition probability matrix.
  • exp: name of the method to be used.
  • num_policies: number of policies whose value functions span the set V.

Optional Arguments:

  • e(-e): name of the environment, i.e. 'Catch' or 'FourRooms'.
  • r(-r): 1 to render the final policy on the respective environment, 0 otherwise.

By default all models use cuda:0 as the device:

  • gpu(-g): pass 'cpu' to disable CUDA usage.

Train and evaluate a model with rank 30 and 40 policies using the VE method

python3 train_polytype.py 30 VE 40

Linear Function Approximation

Introduction

This section contains the experiments in which we enforce the VE principle with respect to a set of linear function approximators. Training code for both VE and MLE models is available. Evaluation of both methods can be done in two ways: first, using approximate policy iteration with LSTD; second, using Double DQN with linear Q-value functions.
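
For reference, LSTD fits a linear value function in closed form from a batch of transitions collected under the policy being evaluated. The sketch below assumes precomputed feature matrices for visited states and their successors; the names and shapes are assumptions and the repository's notebook may organize this differently.

import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.99, reg=1e-6):
    # LSTD policy evaluation (sketch): solve A @ theta = b, where
    # A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T and b = sum_t phi(s_t) * r_t,
    # so that V(s) is approximated by phi(s) @ theta.
    A = phi.T @ (phi - gamma * phi_next)             # (d, d)
    b = phi.T @ rewards                              # (d,)
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)

Approximate policy iteration then alternates this evaluation step with greedy policy improvement.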

Environments

The Catch and FourRooms environments can be used for this set of experiments, although we only reproduced the results on Catch.

For simplicity, the training and evaluation for these experiments are done in a Jupyter notebook. The notebook is divided into sections for training the VE and MLE models and for evaluating them, using approximate policy iteration with LSTD and using Double DQN.
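
The second evaluation route, Double DQN with linear Q-value functions, only changes how the bootstrap targets are formed. A minimal sketch, assuming weight matrices of shape (num_actions, d) and precomputed next-state features; the notebook's actual implementation may differ.

import torch

def linear_ddqn_targets(W_online, W_target, phi_next, rewards, dones, gamma=0.99):
    # Double-DQN targets with a linear Q-function Q(s, a) = W[a] @ phi(s) (sketch).
    # rewards and dones are float tensors of shape (N,); dones is 1.0 at terminal steps.
    q_online_next = phi_next @ W_online.T                     # (N, num_actions)
    best_actions = q_online_next.argmax(dim=1, keepdim=True)  # select actions with online weights
    q_target_next = phi_next @ W_target.T                     # evaluate them with target weights
    q_eval = q_target_next.gather(1, best_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * q_eval           # bootstrapped targets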
