
pandemonium's Introduction

What is pandemonium?

pandemonium is a library that provides implementations for Reinforcement Learning agents that seek to learn about their environment by predicting multiple signals from a single stream of experience.

The project is inspired by the architecture originally developed by Oliver Selfridge in 1959. His computational model is composed of different groups of “demons” working independently to process the visual stimulus, hence the name – pandemonium.

The original pandemonium framework, in turn, inspired more recent work such as Horde by Sutton et al. (2011). The authors designed a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. Horde was later developed further and formalized in Adam White’s doctoral thesis, from which this library borrows most of its definitions and notation.

The goal of this project is to further develop the computational framework established by the creators of Horde and to express some of the latest RL algorithms in terms of a hierarchy of demons.

A research toolbox

A single demon can be seen as a sub-agent responsible for learning a piece of predictive knowledge about the agent’s interaction with its environment. Often this sub-agent is itself a reinforcement learner that learns a General Value Function, although self-supervised variants are also common. For example, if the agent implements an Actor Critic architecture, an Intrinsic Curiosity Module can be interpreted as a demon that learns to predict the features generated by the agent's encoder, thereby guiding the actor in the direction of novel states. In the same fashion, a Pixel Control demon learns to affect the visual observations in a way that improves the representation learning of the UNREAL actor.

Some demons can be completely independent of each other, allowing for parallel learning; others share replay buffers, feature extractors or even intermediate computations, enabling complex interactions.

Having multiple demons does not make pandemonium a "distributed RL" library: the agent does not interact with hundreds of environments in parallel. Instead, the focus is on a single stream of experience with multiple signals / targets / goals to predict / control / achieve.
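The idea of several demons learning from one stream of experience can be illustrated with a minimal sketch. It is not pandemonium's API: `Transition`, `LinearTDDemon`, `two_state_stream` and the `bumped` signal are hypothetical names made up for this example, and plain linear TD(0) stands in for whatever update rule a real demon would use. Each demon predicts its own cumulant with its own discount, yet all of them consume the same transitions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Sequence


@dataclass
class Transition:
    """One step of the single, shared stream of experience (hypothetical)."""
    x: Sequence[float]        # features of the current state
    x_next: Sequence[float]   # features of the next state
    reward: float             # environment reward
    bumped: bool              # an extra sensor signal, for illustration only


@dataclass
class LinearTDDemon:
    """Hypothetical demon that learns one General Value Function with TD(0)."""
    name: str
    cumulant: Callable[[Transition], float]  # the signal this demon predicts
    gamma: float                             # continuation (discount) factor
    n_features: int = 2
    alpha: float = 0.1                       # step size
    w: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.w:
            self.w = [0.0] * self.n_features

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def learn(self, t: Transition) -> float:
        # TD(0): move the prediction towards cumulant + gamma * next prediction.
        target = self.cumulant(t) + self.gamma * self.predict(t.x_next)
        delta = target - self.predict(t.x)
        self.w = [wi + self.alpha * delta * xi for wi, xi in zip(self.w, t.x)]
        return delta


def two_state_stream(n_steps: int) -> Iterator[Transition]:
    """Toy environment: two states visited in alternation, one-hot features."""
    one_hot = ([1.0, 0.0], [0.0, 1.0])
    state = 0
    for _ in range(n_steps):
        nxt = 1 - state
        yield Transition(
            x=one_hot[state],
            x_next=one_hot[nxt],
            reward=1.0 if state == 0 else 0.0,
            bumped=(state == 1),
        )
        state = nxt


# Two demons, two different questions, one stream of experience.
demons = [
    LinearTDDemon("discounted-reward", lambda t: t.reward, gamma=0.9),
    LinearTDDemon("next-step-bump", lambda t: float(t.bumped), gamma=0.0),
]

for transition in two_state_stream(5000):
    for demon in demons:
        demon.learn(transition)

for demon in demons:
    print(demon.name, [round(w, 3) for w in demon.w])
```

With one-hot features the weights are simply per-state predictions. In this toy setup the reward demon should settle near the solution of the two Bellman equations (about 5.26 and 4.74), while the bump demon, with a discount of zero, only predicts the signal observed on the very next transition.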

A piece of software

From a purely programming perspective, the library is designed so that the common building blocks of RL agents are abstracted into modular components that are easy to swap and configure. The structural elements (such as control vs prediction or online vs multistep) are meant to be inherited, while the algorithm-specific functionality (e.g. an update rule, a duelling network or an intrinsic reward source) can be mixed in.

[Figure: bird's-eye view of the building blocks that constitute pandemonium]

The auto-differentiation backend is provided by PyTorch, although a JAX version is being considered.

Experiment management and monitoring are done via Ray Tune and Ray Dashboard, respectively. See the experiments module for concrete examples of how to configure and run algorithms.
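The split between inherited structural elements and mixed-in algorithm-specific functionality can be sketched in plain Python. The class names below (`Demon`, `OnlineTD`, `ControlDemon`, `QLearningMixin`, `UniformExpectedSarsaMixin`) are hypothetical and chosen for illustration; they are not taken from the library, and a tabular value store replaces the PyTorch networks a real demon would use.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Demon(ABC):
    """Hypothetical base class that owns a table of action values."""

    def __init__(self, n_actions: int, gamma: float = 0.9, alpha: float = 0.5):
        self.n_actions = n_actions
        self.gamma = gamma
        self.alpha = alpha
        self.q: Dict[int, List[float]] = {}

    def values(self, state: int) -> List[float]:
        # Lazily create a zero-initialised row for unseen states.
        return self.q.setdefault(state, [0.0] * self.n_actions)

    @abstractmethod
    def bootstrap(self, next_state: int) -> float:
        """Algorithm-specific estimate of the next state's value."""


class OnlineTD(Demon):
    """Structural element: learn from every single transition."""

    def learn(self, s: int, a: int, r: float, s_next: int, done: bool) -> None:
        target = r if done else r + self.gamma * self.bootstrap(s_next)
        self.values(s)[a] += self.alpha * (target - self.values(s)[a])


class ControlDemon(Demon):
    """Structural element: a control demon also selects actions."""

    def act(self, state: int) -> int:
        v = self.values(state)
        return max(range(self.n_actions), key=v.__getitem__)


class QLearningMixin:
    """Update rule: bootstrap from the greedy action value."""

    def bootstrap(self, next_state: int) -> float:
        return max(self.values(next_state))


class UniformExpectedSarsaMixin:
    """Update rule: bootstrap from the mean value under a uniform policy."""

    def bootstrap(self, next_state: int) -> float:
        v = self.values(next_state)
        return sum(v) / len(v)


# Mixins go first so that their `bootstrap` overrides the abstract one.
class QLearner(QLearningMixin, OnlineTD, ControlDemon):
    pass


class UniformSarsa(UniformExpectedSarsaMixin, OnlineTD, ControlDemon):
    pass


q_learner = QLearner(n_actions=2)
sarsa = UniformSarsa(n_actions=2)
for learner in (q_learner, sarsa):
    learner.values(1)[:] = [0.0, 2.0]   # pretend state 1 is already known
    learner.learn(s=0, a=0, r=1.0, s_next=1, done=False)
    print(type(learner).__name__, learner.values(0), learner.act(1))
```

Swapping the update rule only requires swapping the mixin; the structural classes that decide when to learn and how to act stay untouched. One design consequence worth noting is that the order of base classes matters, because Python resolves `bootstrap` through the method resolution order.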

More info

Extended documentation

Integrated algorithms

  • Actor Critic family
    • Policy gradient with value baseline
    • Multiprocessing
    • Intrinsic Curiosity Module
    • Recurrence
  • DQN family
    • Q-learning
    • Target network
    • Double Q-learning
    • Duelling network
    • Prioritized Experience Replay
    • Noisy Networks
  • Distributional Value Learning
    • C51
    • QR-DQN
    • IQN
  • UNREAL
    • Value Replay
    • Reward Prediction
    • Pixel Control
  • Option Critic

Integrated environments

