
barl-simpleoptions's Introduction

SimpleOptions

This Python package aims to provide a simple framework for implementing and using options in Hierarchical Reinforcement Learning (HRL) projects.

Key classes:

  • BaseOption: An abstract class representing an option with an initiation set, option policy, and termination condition.
  • BaseEnvironment: An abstract base class representing an agent's environment. The environment specification is based on the OpenAI Gym Env specification, but does not implement it directly. It supports both primitive actions and options, as well as functionality for constructing State-Transition Graphs (STGs) out-of-the-box using NetworkX.
  • OptionAgent: A class representing an HRL agent, which can interact with its environment and has access to a number of options. It includes implementations of Macro-Q Learning and Intra-Option Learning, with many customisable features.

This code was written with tabular, graph-based HRL methods in mind. It is less of a plug-and-play repository, and is intended as a basic framework for developing your own BaseOption and BaseEnvironment implementations.

How to Install

The easiest way to install this package is to simply run pip install simpleoptions.

Alternatively, you can install from source. Simply download this repository and, in the root directory, run the command pip install .

How to Use This Code

Below, you will find a step-by-step guide introducing the intended workflow for using this code.

Step 1: Implement an Environment

The first step to using this framework involves defining an environment for your agents to interact with. This can be done by subclassing the BaseEnvironment abstract class and filling in the abstract methods. If you have previously worked with OpenAI Gym/Farama Gymnasium, much of this will be familiar to you, although there are a few additional methods on top of the usual step and reset that you'll need to implement.
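
To make this concrete, here is a minimal sketch of an environment subclass for a toy corridor domain. It is illustrative only: step and reset mirror the familiar Gym-style interface, but the other method names and signatures shown here (get_available_actions, is_state_terminal, get_initial_states, get_successors) are assumptions, so check them against the abstract methods actually declared on BaseEnvironment.

from simpleoptions import BaseEnvironment

# Hypothetical corridor environment: a 1D line of cells, goal at the right end.
# Method names beyond step/reset are assumed -- verify against BaseEnvironment.
class CorridorEnv(BaseEnvironment):
    def __init__(self, length=10):
        super().__init__()
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Actions: 0 = move left, 1 = move right.
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + delta))
        done = self.is_state_terminal(self.state)
        reward = 1.0 if done else -0.01
        return self.state, reward, done, {}

    def get_available_actions(self, state=None):
        return [0, 1]

    def is_state_terminal(self, state=None):
        state = self.state if state is None else state
        return state == self.length - 1

    def get_initial_states(self):
        return [0]

    def get_successors(self, state=None):
        # Used when building a state-transition graph (e.g. with NetworkX).
        state = self.state if state is None else state
        return [max(0, state - 1), min(self.length - 1, state + 1)]

    def render(self, mode="human"):
        pass

    def close(self):
        pass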

Step 2: Define Your Options

You must now define/discover options for your agent to use when interacting with its environment. How you go about this is up to you. An ever-growing number of option discovery methods can be found in the hierarchical reinforcement learning literature. We include some option discovery method implementations in the implementations directory.

To define a new type of option, you need to subclass BaseOption and implement the following methods:

  • initiation - a method that takes a state as its input, and returns whether the option can be invoked in that state.
  • termination - a method that takes a state as its input, and returns the probability that the option terminates in that state.
  • policy - a method that takes a state as its input, and returns the action (either a primitive action or another option) that this option would select in this state.

This minimal framework gives you a lot of flexibility in defining your options. For example, your policy method could make use of a simple dictionary mapping states to actions, it could be based on some learned action-value function, or any other function of the state.

As an example, consider an option that takes the agent to a sub-goal state from any of the 50 nearest states. initiation would return True for the 50 states nearest the sub-goal, and False otherwise. termination would return 0.0 for states in the initiation set, and 1.0 otherwise. policy would return the primitive action that takes the agent one step along the shortest path to the sub-goal state.
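
A sketch of such an option is shown below, assuming hashable states and a precomputed dictionary mapping each state in the initiation set to the action that moves one step towards the sub-goal. The class and dictionary names are placeholders; only initiation, termination, and policy come from BaseOption.

from simpleoptions import BaseOption

class ShortestPathToSubgoalOption(BaseOption):
    # Takes the agent to a sub-goal from any state in its initiation set.
    def __init__(self, subgoal, shortest_path_actions):
        # shortest_path_actions: dict mapping each of the 50 nearest states to
        # the primitive action that moves one step towards the sub-goal.
        super().__init__()
        self.subgoal = subgoal
        self.shortest_path_actions = shortest_path_actions

    def initiation(self, state):
        # The option can only be invoked in states we precomputed actions for.
        return state in self.shortest_path_actions

    def termination(self, state):
        # Continue (0.0) inside the initiation set, terminate (1.0) elsewhere.
        return 0.0 if state in self.shortest_path_actions else 1.0

    def policy(self, state):
        # One step along the shortest path towards the sub-goal.
        return self.shortest_path_actions[state]

    def __str__(self):
        return f"ShortestPathToSubgoalOption({self.subgoal})"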

Finally, we also include a PrimitiveOption that can be used to represent the primitive actions made available by a given environment.
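
For instance, the primitive actions of the hypothetical CorridorEnv from Step 1 could be wrapped as options like this (the constructor arguments shown are an assumption; check PrimitiveOption's actual signature):

from simpleoptions import PrimitiveOption

# Wrap each primitive action as an option so the agent can treat primitive
# actions and temporally-extended options uniformly.
env = CorridorEnv()
primitive_options = [PrimitiveOption(action, env) for action in env.get_available_actions()]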

Step 3: Giving Options to an Agent and Running It in an Environment

This package also includes an OptionAgent, an implementation of an agent that learns using the Macro-Q Learning and Intra-Option Learning algorithms.

Once you have defined an environment and a set of options, you can instantiate an OptionAgent and use its run_agent method to train it.
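
A hypothetical end-to-end sketch follows, reusing the CorridorEnv and PrimitiveOption snippets above. How options are registered with the environment and the exact run_agent keyword arguments are assumptions; consult the OptionAgent implementation for the real names.

from simpleoptions import OptionAgent, PrimitiveOption

env = CorridorEnv()
options = [PrimitiveOption(action, env) for action in env.get_available_actions()]
env.set_options(options)  # assumed: the environment tracks which options are available

agent = OptionAgent(env)
returns = agent.run_agent(num_epochs=100, epoch_length=1000)  # argument names assumed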

Example Environments

A number of reinforcement learning environments implemented using our BaseEnvironment interface can be found here.

barl-simpleoptions's People

Contributors

akshilpatel, tmssmith, ueva


barl-simpleoptions's Issues

Add Example Implementation

Add example code which runs through the entire workflow of implementing a simple environment, creating a DiGraph representing its interaction graph, instantiating options, and running an agent.
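
Until such an example exists, the interaction-graph step might look something like the following, reusing the hypothetical CorridorEnv and its assumed get_initial_states/get_successors methods from above. BaseEnvironment advertises out-of-the-box STG construction, so prefer that if your version provides it.

import networkx as nx

# Hand-rolled construction of the state-transition graph via graph search.
env = CorridorEnv()
stg = nx.DiGraph()
visited = set()
frontier = list(env.get_initial_states())
while frontier:
    state = frontier.pop()
    if state in visited:
        continue
    visited.add(state)
    for successor in env.get_successors(state):
        stg.add_edge(state, successor)
        if successor not in visited:
            frontier.append(successor)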

Improved logging functionality

Describe Your Feature
Logging of additional data during policy evaluation. Suggested data to log at each decision step (a sketch of such a record follows the list):

  • State
  • Active options
  • Task reward
  • Next State
  • Terminal
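
One way this could look is a plain dataclass holding one record per decision step. The field names simply mirror the list above and are suggestions, not an existing API.

from dataclasses import dataclass
from typing import Any, Hashable, List, Optional

@dataclass
class DecisionStepLog:
    # One record per decision step during policy evaluation.
    state: Hashable
    active_options: List[Any]      # currently executing option stack, outermost first
    task_reward: float
    next_state: Hashable
    terminal: bool
    extras: Optional[dict] = None  # room for any additional diagnostics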

Optimise code

Optimise code in files where list comprehensions or vectorisation may be applicable.

Implement Successor Options

Describe Your Feature
Replicate the Successor Options method of Ramesh et al. (2019) (paper and code).

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
In order of priority:

  1. Base Successor Options
  2. Incremental Successor Options
  3. Non-uniform and Adaptive-Exploration Successor Options

Additional context
The exploration methods described in (3) above may not be relevant to epoch-based evaluation (with $\epsilon=0$). Further, exploration strategies may be better implemented at the agent level not the option level.

Handle option masking correctly when computing TD targets.

# Compute TD targets.
next_state_values = torch.zeros(self.batch_size, dtype=torch.float32)
with torch.no_grad():
next_state_values[non_terminal_mask] = self.target(non_terminal_next_states).max(1).values
targets = reward_batch + self.gamma * next_state_values

Currently, the value of masked options is not ignored when computing TD targets for Macro-DQN updates.

We need a function that produces an option mask for given states, which can be used here and elsewhere where option masking is needed. Ideally, this should be able to take batches of states and produce batches of option masks. This function will be called potentially many times per time step, so it should be performant.
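
A sketch of what such a helper and the corrected target computation might look like is given below. env.get_available_options(state) is an assumed method returning the indices of options whose initiation sets contain the state; the other variable names mirror the snippet above.

import torch

def batch_option_masks(states, env, num_options):
    # Boolean (batch, num_options) mask: True where the option can be invoked.
    # env.get_available_options(state) is an assumed helper returning the
    # indices of options available in `state`.
    mask = torch.zeros(len(states), num_options, dtype=torch.bool)
    for i, state in enumerate(states):
        mask[i, env.get_available_options(state)] = True
    return mask

def masked_td_targets(reward_batch, non_terminal_mask, next_q_values, option_masks, gamma):
    # TD targets where unavailable options are excluded from the max.
    # next_q_values and option_masks cover only the non-terminal transitions;
    # assumes at least one option (e.g. a PrimitiveOption) is always available.
    next_state_values = torch.zeros(reward_batch.shape[0], dtype=torch.float32)
    masked_q = next_q_values.masked_fill(~option_masks, float("-inf"))
    next_state_values[non_terminal_mask] = masked_q.max(1).values
    return reward_batch + gamma * next_state_values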

Episodic evaluation of agents

Describe Your Feature
Agent performance is currently evaluated as the total reward achieved over a fixed number of timesteps. An alternative approach would be to evaluate the agent's average return over a fixed number of episodes (with some cut-off length for an evaluation episode).

Is your feature request related to a problem? Please describe.
Reporting total reward over a fixed number of timesteps has some issues; in particular, a slightly suboptimal policy can receive a significantly lower total reward than an optimal one if the evaluation window ends shortly before the suboptimal agent would have reached the goal.

Describe the solution you'd like
An episodic evaluation of performance, but with a cut-off for situations where the agent never terminates the episode.
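
A sketch of such an evaluation loop, assuming the Gym-style step/reset interface used elsewhere on this page and a hypothetical agent.select_action(state, explore=False) method for greedy action selection:

def evaluate_episodic(agent, env, num_episodes=30, max_episode_length=1000):
    # Average undiscounted return over num_episodes greedy episodes,
    # truncating any episode that exceeds max_episode_length steps.
    returns = []
    for _ in range(num_episodes):
        state = env.reset()
        episode_return, done, steps = 0.0, False, 0
        while not done and steps < max_episode_length:
            action = agent.select_action(state, explore=False)  # assumed interface
            state, reward, done, _ = env.step(action)
            episode_return += reward
            steps += 1
        returns.append(episode_return)
    return sum(returns) / len(returns)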

Describe alternatives you've considered
See Empirical Design in Reinforcement Learning for a detailed discussion of alternative approaches.

Support Multi-Level Skill Hierarchies

Currently, only options whose policies are defined over primitive actions are supported. Add support for options whose policies are defined over other options, allowing multi-level skill hierarchies to be defined and used.
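
Once supported, a higher-level option might look something like the sketch below: its policy returns another BaseOption rather than a primitive action. All names here are hypothetical.

from simpleoptions import BaseOption

class NavigateBuildingOption(BaseOption):
    # Higher-level option whose policy selects lower-level room-to-room
    # options rather than primitive actions.
    def __init__(self, room_options, target_room):
        super().__init__()
        self.room_options = room_options  # dict: current room -> lower-level BaseOption
        self.target_room = target_room

    def initiation(self, state):
        return self._room_of(state) in self.room_options

    def termination(self, state):
        return 1.0 if self._room_of(state) == self.target_room else 0.0

    def policy(self, state):
        # Returns another option -- the key requirement for multi-level hierarchies.
        return self.room_options[self._room_of(state)]

    def _room_of(self, state):
        # Hypothetical helper mapping a state to a room identifier.
        return state[0]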

Add Tests for Evaluation Output

Add some test cases to ensure that all of the evaluation methods we support are actually giving us the outputs we’re expecting.

This could be done very simply, with a few short episodes of interaction simulated on a very simple MDP, both with and without skills. See the existing run_agent test cases for inspiration.
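
As a starting point, a test along these lines could reuse the hypothetical CorridorEnv and evaluate_episodic sketches from earlier on this page; every name here is a placeholder.

def test_episodic_evaluation_rewards_goal_reaching_agent():
    # An agent that always moves right on the corridor reaches the goal, so
    # its average episodic return should be positive.
    env = CorridorEnv(length=5)

    class AlwaysRightAgent:
        def select_action(self, state, explore=False):
            return 1

    result = evaluate_episodic(AlwaysRightAgent(), env, num_episodes=3, max_episode_length=50)
    assert result > 0.0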
