
paintception / deep-quality-value-family


Official implementation of the paper "Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning Algorithms": https://arxiv.org/abs/1909.01779. To appear at the NeurIPS 2019 DRL Workshop.

Shell 3.96% Python 96.04%
deep-reinforcement-learning dqn-variants ddqn-framework atari-2600 dqv dqv-max keras-tensorflow

deep-quality-value-family's Introduction

A new family of Deep Reinforcement Learning algorithms: DQV, Dueling-DQV and DQV-Max Learning

This repo contains the code for a new family of Deep Reinforcement Learning (DRL) algorithms. These algorithms learn an approximation of the state-value function V(s) alongside an approximation of the state-action value function Q(s,a). Both approximations learn from each other's estimates, which yields faster and more robust training. This work is an in-depth extension of our original DQV-Learning paper and will be presented in December at the NeurIPS Deep Reinforcement Learning Workshop (DRLW) in Vancouver, Canada.

An in-depth presentation of the benefits that these algorithms provide is given in our new paper: 'Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning Algorithms'.

Be sure to check out arXiv for a pre-print of our work!

The main algorithms presented in this repo are:

  • Dueling Deep Quality-Value (Dueling-DQV) Learning: This Repo
  • Deep Quality-Value-Max (DQV-Max) Learning: This Repo
  • Deep Quality-Value (DQV) Learning: originally presented in 'DQV-Learning', now properly refactored.

while we also release implementations of:

  • Deep Q-Learning: DQN
  • Double Deep Q-Learning: DDQN

which have been used for some of the comparisons presented in our work.
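The way the two approximations learn from each other's estimates can be sketched through their temporal-difference targets. The snippet below is a minimal illustration in plain Python, based on our reading of the paper rather than on the code in this repo: the function names, the `gamma` default and the handling of terminal states are assumptions made for the example. In DQV both V and Q regress toward r + γV(s') computed with a target copy of the V network, while in DQV-Max V regresses toward r + γ max_a Q(s', a) computed with a target copy of the Q network and Q regresses toward r + γV(s').

```python
# Illustrative sketch only: hypothetical helper names, not the repository's API.


def dqv_targets(reward, next_v_target, gamma=0.99, done=False):
    """DQV: V and Q share the same target, r + gamma * V(s') from the target V network."""
    bootstrap = 0.0 if done else gamma * next_v_target
    target = reward + bootstrap
    return target, target  # (target for V, target for Q)


def dqv_max_targets(reward, next_q_target, next_v_online, gamma=0.99, done=False):
    """DQV-Max: V learns from max_a Q(s', a) (target Q network), Q learns from V(s')."""
    if done:
        # Assumption: no bootstrapping from terminal states.
        return reward, reward
    v_target = reward + gamma * max(next_q_target)
    q_target = reward + gamma * next_v_online
    return v_target, q_target


# Toy transition with reward 1, V(s') = 2 and Q(s', .) = [0.5, 3.0, 1.0].
print(dqv_targets(1.0, 2.0, gamma=0.5))                       # (2.0, 2.0)
print(dqv_max_targets(1.0, [0.5, 3.0, 1.0], 2.0, gamma=0.5))  # (2.5, 2.0)
```

In a full agent these scalars would be the regression labels for the squared-error losses of the two networks; the sketch leaves out replay memory, batching and the networks themselves.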


To train an agent from scratch on an Atari game of the Arcade Learning Environment (ALE) benchmark, run the training_job.sh script. It lets you choose which type of agent to train according to the type of policy learning it uses (online for DQV and Dueling-DQV, offline for all other algorithms). Note that, depending on the game you choose, some modifications to the code might be required.

In ./models we release trained models obtained on Pong for both DQV and DQV-Max.

You can use these models to explore the behavior of the learned value functions with the ./src/test_value_functions.py script. The script computes the averaged expected return over all visited states and shows that the algorithms of the DQV family suffer less from the overestimation bias of the Q function. It also shows that our algorithms do not simply overestimate the V function in place of the Q function.
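As a rough illustration of what checking for overestimation can look like, the sketch below compares the average value estimate over the visited states of one episode with the average discounted return actually observed from those states. It is not the logic of ./src/test_value_functions.py: the function names, the toy numbers and the comparison against observed returns are all assumptions made for this example.

```python
# Illustrative sketch only: hypothetical helpers, not the repository's test script.


def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return observed from each step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def mean(values):
    return sum(values) / len(values)


def overestimation_gap(value_estimates, rewards, gamma=0.99):
    """Mean estimated value minus mean observed return over the visited states."""
    if len(value_estimates) != len(rewards):
        raise ValueError("need one value estimate per visited state")
    return mean(value_estimates) - mean(discounted_returns(rewards, gamma))


# Toy episode: three visited states, a single reward at the end.
rewards = [0.0, 0.0, 1.0]
estimates = [0.5, 0.75, 1.25]  # e.g. max_a Q(s, a) or V(s) at each visited state
print(discounted_returns(rewards, gamma=0.5))  # [0.25, 0.5, 1.0]
print(overestimation_gap(estimates, rewards, gamma=0.5))
```

A positive gap means the estimates are, on average, higher than the returns the agent actually collected; computing it once for the Q estimates and once for the V estimates is one way to see whether the bias has merely moved from one function to the other.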


deep-quality-value-family's Issues

Why is DQV on-policy?

I saw that DQV uses samples collected with the behavior policy (the epsilon-greedy policy), not the current policy (the greedy policy). Why do you classify DQV as an on-policy method?
