deep_ope's Introduction

In D4RL and RL Unplugged: Benchmarks for Offline Reinforcement Learning, we released suites of benchmarks for offline reinforcement learning. They are designed for ease of use: the datasets come with a unified API, so once a general pipeline has been established, a practitioner can work with all the data in a suite.

Here, we release policies which can be used in conjunction with the RL Unplugged and D4RL datasets to facilitate off-policy evaluation and offline model selection benchmarking.

In this release, we provide:

  • Policies for the tasks in the D4RL, DeepMind Locomotion and Control Suite datasets (described below).

  • Policies trained with the following algorithms (D4PG, ABM, CRR, SAC, DAPG and BC), with snapshots along the training trajectory. This facilitates benchmarking offline model selection.

The policies are available under gs://gresearch/deep-ope, with the RL Unplugged policies in the subdirectory gs://gresearch/deep-ope/rlunplugged and the D4RL policies in the subdirectory gs://gresearch/deep-ope/d4rl.

Task Descriptions

DeepMind Locomotion Dataset

These tasks are the corridor locomotion tasks involving the CMU Humanoid, for which prior efforts have either used motion capture data (Merel et al., 2019a, Merel et al., 2019b) or trained from scratch (Song et al., 2020). In addition, the DM Locomotion repository contains a set of tasks adapted to a virtual rodent (Merel et al., 2020). We emphasize that the DM Locomotion tasks combine challenging high-DoF continuous control with perception from rich egocentric observations. For details on how the dataset was generated, please refer to RL Unplugged: Benchmarks for Offline Reinforcement Learning.

DeepMind Control Suite Dataset

DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks provided in the suite that cover a wide range of difficulties.

Most of the datasets in this domain are generated using D4PG. For the environments Manipulator insert ball and Manipulator insert peg we use V-MPO (Song et al., 2020) to generate the data as D4PG is unable to solve these tasks. We release datasets for 9 control suite tasks. For details on how the dataset was generated, please refer to RL Unplugged: Benchmarks for Offline Reinforcement Learning.

D4RL Dataset

A subset of the tasks within the D4RL benchmark (Fu et al., 2020) for offline reinforcement learning is included. These tasks include maze navigation with different robot morphologies, hand manipulation tasks (Rajeswaran et al., 2017), and tasks from the OpenAI Gym benchmark (Brockman et al., 2016).

Each task includes a variety of datasets in order to study the interaction between dataset distributions and policies. For further information on what datasets are available, please refer to D4RL: Datasets for Deep Data-Driven Reinforcement Learning.

Using the policies

The rlunplugged_policies.json file provides metadata about the policies in this dataset. It is structured as a list of dictionaries, one for each policy, where the keys contain metadata including:

  • policy_path: The path to the policy on Google Cloud Storage.

  • task.task_name: The task that the policy is trained for.

  • agent_name: The training algorithm used to learn the policy.

  • snapshot_name: Contains the learning step for this policy snapshot.

  • return_mean: The mean return estimated with Monte Carlo rollouts.

  • return_std: The standard error of the mean estimate.
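As a minimal sketch of working with this metadata, the snippet below filters entries by task and training algorithm. It assumes the JSON file has already been downloaded to a local path, and that `task.task_name` denotes either a nested `task` dictionary or a flat dotted key (both are handled). The helper names `load_metadata`, `get_field` and `select_policies`, as well as the sample entries and their values, are made up for illustration and are not part of the release.

```python
import json


def load_metadata(json_path):
    """Read the list of policy dictionaries from a local copy of the JSON file."""
    with open(json_path) as f:
        return json.load(f)


def get_field(entry, dotted_key):
    """Look up a key such as 'task.task_name', stored either flat or nested."""
    if dotted_key in entry:
        return entry[dotted_key]
    value = entry
    for part in dotted_key.split("."):
        value = value[part]
    return value


def select_policies(entries, task_name, agent_name=None):
    """Return the entries for one task, optionally restricted to one algorithm."""
    selected = []
    for entry in entries:
        if get_field(entry, "task.task_name") != task_name:
            continue
        if agent_name is not None and entry["agent_name"] != agent_name:
            continue
        selected.append(entry)
    return selected


# Hypothetical entries, shaped like the fields listed above (values are made up).
sample_entries = [
    {"policy_path": "path/to/policy_a", "task": {"task_name": "task_a"},
     "agent_name": "D4PG", "snapshot_name": "step_1", "return_mean": 10.0, "return_std": 0.5},
    {"policy_path": "path/to/policy_b", "task": {"task_name": "task_a"},
     "agent_name": "CRR", "snapshot_name": "step_2", "return_mean": 20.0, "return_std": 0.4},
    {"policy_path": "path/to/policy_c", "task": {"task_name": "task_b"},
     "agent_name": "D4PG", "snapshot_name": "step_1", "return_mean": 5.0, "return_std": 0.2},
]

task_a_paths = [e["policy_path"] for e in select_policies(sample_entries, "task_a")]
```

With real data, `sample_entries` would be replaced by `load_metadata(...)` applied to a local copy of rlunplugged_policies.json.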

The d4rl_policies.json file contains metadata in a similar format:

  • policy_path: The path to the policy on Google Cloud Storage.

  • task.task_names: A list of tasks that the policy is trained for. (Each task represents a different dataset.)

  • agent_name: The training algorithm used to learn the policy.

  • return_mean: The mean return estimated with Monte Carlo rollouts.

  • return_std: The standard error of the mean estimate.
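Because each D4RL entry lists several tasks, it can be convenient to index policies by task. The sketch below builds such an index. It assumes `task.task_names` is either a nested `task` dictionary or a flat dotted key, and the function names, task names and entries are hypothetical, chosen only to illustrate the shape of the data.

```python
from collections import defaultdict


def task_names_of(entry):
    """Return the list of task names, whether stored flat or nested."""
    if "task.task_names" in entry:
        return entry["task.task_names"]
    return entry["task"]["task_names"]


def policies_by_task(entries):
    """Map each task name to the paths of the policies that list it."""
    index = defaultdict(list)
    for entry in entries:
        for name in task_names_of(entry):
            index[name].append(entry["policy_path"])
    return dict(index)


# Hypothetical entries (task names, paths and returns are made up).
d4rl_sample_entries = [
    {"policy_path": "path/to/policy_x.pkl", "task": {"task_names": ["env-v0", "env-medium-v0"]},
     "agent_name": "SAC", "return_mean": 100.0, "return_std": 3.0},
    {"policy_path": "path/to/policy_y.pkl", "task": {"task_names": ["env-medium-v0"]},
     "agent_name": "BC", "return_mean": 80.0, "return_std": 2.0},
]

d4rl_index = policies_by_task(d4rl_sample_entries)
```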

Requirements:

  • Install dependencies: pip install -r requirements.txt
  • (Optional) Set up a MuJoCo license key for DM Control and D4RL environments (instructions).

Policy loading example

RL Unplugged policies are stored as TensorFlow SavedModels. Calling the policy on an observation returns an action sample. See load_rlunplugged_policy_example.py for an example of loading a policy.
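A rough sketch of this calling pattern follows; load_rlunplugged_policy_example.py remains the authoritative example. `tf.saved_model.load` is the standard TensorFlow call for loading a SavedModel, but whether it can read a `gs://` path directly depends on the TensorFlow installation, and the observation structure and any batching the policy expects depend on the environment, so both are assumptions here. The helper names and the stand-in `constant_policy` are hypothetical and only show the call shape.

```python
def load_policy(policy_path):
    """Load a policy SavedModel from the given path.

    TensorFlow is imported lazily so the helper below can be used without it.
    """
    import tensorflow as tf
    return tf.saved_model.load(policy_path)


def sample_actions(policy, observation, num_samples=1):
    """Call the policy repeatedly; each call is assumed to return one action sample."""
    return [policy(observation) for _ in range(num_samples)]


# Stand-in policy for illustration only. A real policy would come from
# load_policy(entry["policy_path"]) and take environment observations.
def constant_policy(observation):
    return [0.0 for _ in observation]


actions = sample_actions(constant_policy, observation=[0.1, 0.2, 0.3], num_samples=2)
```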

D4RL policies are stored as pickle files containing weights. See load_d4rl_policy_example.py for an example of loading a policy.
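Before building a network from such a file, it can help to inspect what it contains. The sketch below assumes the pickle has been copied to a local path and that its top-level object may be a dictionary of named arrays; the actual layout is defined by load_d4rl_policy_example.py and is not reproduced here. The function name and the demo file contents are hypothetical. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def describe_weights(pickle_path):
    """Return a name -> shape (or type name) summary of a pickled weights object."""
    with open(pickle_path, "rb") as f:
        weights = pickle.load(f)
    if not isinstance(weights, dict):
        # Unknown layout: report only the top-level type.
        return {"<root>": type(weights).__name__}
    return {
        name: getattr(value, "shape", type(value).__name__)
        for name, value in weights.items()
    }


# Demo with a made-up file; real files come from the policy_path metadata field.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_policy.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"layer0": [[0.0, 1.0]], "bias0": [0.0]}, f)
    summary = describe_weights(demo_path)
```

For NumPy arrays the summary shows each array's shape; for other values it falls back to the type name.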

Compute evaluation metrics

TODO Fill in example computing groundtruth and evaluation metrics.
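Until that example is filled in, here is one possible sketch, not necessarily the one the authors intend. It takes the `return_mean` values from the metadata as ground truth and compares them with estimates from some off-policy evaluation method, using three metrics common in OPE benchmarking: absolute error, Spearman rank correlation, and regret@k. All function names and the numbers in the usage lines are illustrative assumptions.

```python
def absolute_error(true_value, estimate):
    """Absolute difference between the ground-truth and estimated value."""
    return abs(true_value - estimate)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def rank_correlation(true_values, estimates):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    true_ranks = average_ranks(true_values)
    est_ranks = average_ranks(estimates)
    n = len(true_ranks)
    mean_t = sum(true_ranks) / n
    mean_e = sum(est_ranks) / n
    cov = sum((a - mean_t) * (b - mean_e) for a, b in zip(true_ranks, est_ranks))
    var_t = sum((a - mean_t) ** 2 for a in true_ranks)
    var_e = sum((b - mean_e) ** 2 for b in est_ranks)
    return cov / (var_t * var_e) ** 0.5


def regret_at_k(true_values, estimates, k):
    """Best true value overall minus the best true value among the top-k estimates."""
    top_k = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)[:k]
    return max(true_values) - max(true_values[i] for i in top_k)


# Made-up numbers: ground-truth returns and estimates for four policies.
true_returns = [1.0, 2.0, 3.0, 4.0]
estimated_returns = [0.5, 0.9, 0.1, 0.2]
regret_top1 = regret_at_k(true_returns, estimated_returns, k=1)
```

Note that `rank_correlation` is undefined when either input is constant, because the rank variance is then zero.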

Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property       value
name           Benchmarks for Deep Off-Policy Evaluation
url
sameAs         https://github.com/google-research/deep_ope
description    Data accompanying Benchmarks for Deep Off-Policy Evaluation.
provider       name: Google
               sameAs: https://en.wikipedia.org/wiki/Google
