MuZero.jl

This package provides the core MuZero algorithm in the Julia language.

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero but requires no knowledge of the environment's underlying dynamics. Instead, MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions.
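To make the idea of a learned model concrete, here is a minimal sketch of the three functions MuZero learns: a representation function (observation to internal state), a dynamics function (internal state and action to next internal state and reward), and a prediction function (internal state to policy and value). It uses random linear maps in plain Julia instead of trained Flux networks, and every name and size in it (`ToyModel`, `unroll`, `OBS_DIM`, ...) is hypothetical and not part of MuZero.jl's API.

```julia
# Toy sketch of MuZero's three learned functions using random linear maps.
# All names and sizes are hypothetical illustrations, not MuZero.jl's API.
using Random

const OBS_DIM = 9      # e.g. a flattened 3x3 board
const HIDDEN_DIM = 4   # size of the internal (latent) state
const N_ACTIONS = 9

struct ToyModel
    Wh::Matrix{Float64}  # representation: observation -> hidden state
    Wg::Matrix{Float64}  # dynamics: (hidden state, action) -> next hidden state
    wr::Vector{Float64}  # dynamics: (hidden state, action) -> reward
    Wp::Matrix{Float64}  # prediction: hidden state -> policy logits
    wv::Vector{Float64}  # prediction: hidden state -> value
end

# Build a model with random (untrained) weights.
ToyModel(rng::AbstractRNG) = ToyModel(
    randn(rng, HIDDEN_DIM, OBS_DIM),
    randn(rng, HIDDEN_DIM, HIDDEN_DIM + N_ACTIONS),
    randn(rng, HIDDEN_DIM + N_ACTIONS),
    randn(rng, N_ACTIONS, HIDDEN_DIM),
    randn(rng, HIDDEN_DIM),
)

onehot(a::Int) = [i == a ? 1.0 : 0.0 for i in 1:N_ACTIONS]

function softmax(x)
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

representation(m::ToyModel, obs) = tanh.(m.Wh * obs)

function dynamics(m::ToyModel, s, a::Int)
    x = vcat(s, onehot(a))
    return tanh.(m.Wg * x), sum(m.wr .* x)  # (next hidden state, predicted reward)
end

prediction(m::ToyModel, s) = (softmax(m.Wp * s), tanh(sum(m.wv .* s)))

# Unroll the model over a sequence of actions without querying a real environment.
function unroll(m::ToyModel, obs, actions)
    s = representation(m, obs)
    rewards = Float64[]
    for a in actions
        s, r = dynamics(m, s, a)
        push!(rewards, r)
    end
    policy, value = prediction(m, s)
    return policy, value, rewards
end

model = ToyModel(MersenneTwister(0))
obs = [1.0, 0, 0, 0, -1.0, 0, 0, 0, 0]
policy, value, rewards = unroll(model, obs, [2, 6, 9])
println("policy sums to ", sum(policy), ", value = ", value)
```

The point of the sketch is that after the first call to `representation`, planning only ever touches the internal state: the real environment's rules are never consulted.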

Because MuZero is resource-hungry, the motivation for this project is to provide an implementation of MuZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources. I found the Julia language to be instrumental in achieving this goal.

Platform-agnostic instructions

  1. To install Julia on your platform, download it from the appropriate mirror and add it to your PATH. Instructions can be found here.

  2. To set up Git on your computer, follow the instructions here.

Training a TicTacToe Agent

To download MuZero.jl and start training a TicTacToe agent on multiple threads (the -t flag sets the number of Julia threads), just run:

git clone https://github.com/deveshjawla/MuZero.jl
cd MuZero.jl
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project -t 3 ./games/tictactoe/main.jl

Note that the MuZero agent is not exposed to the baselines during training and learns purely from self-play, without any form of supervision or prior knowledge.

Playing a game against the trained MuZero

julia --project ./games/tictactoe/play.jl

Features

  • Residual Network and Fully connected network in Flux
  • Reinforcement Learning environment and TicTacToe example adapted from ReinforcementLearning.jl
  • Parallel computing natively supported by Julia
  • Multi-GPU support for training and self-play
  • Model weights automatically saved at checkpoints
  • Single and two player mode
  • Easily adaptable for new games
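As an illustration of the native parallelism mentioned above, the following sketch spreads independent games over the threads Julia was started with (for example via `-t`). It relies only on `Base.Threads` from the standard library. `play_game` and `run_games` are hypothetical stand-ins and do not reflect how MuZero.jl organizes its self-play workers.

```julia
using Base.Threads

# Hypothetical stand-in for one self-play game; returns a fake outcome in {-1, 0, 1}.
play_game(seed::Int) = sum(abs2, 1:seed) % 3 - 1

function run_games(n::Int)
    results = Vector{Int}(undef, n)
    # Iterations are distributed over the available threads;
    # each iteration writes only to its own slot, so no locking is needed.
    @threads for i in 1:n
        results[i] = play_game(i)
    end
    return results
end

println("threads available: ", nthreads())
println("outcomes: ", run_games(8))
```

Started without `-t`, Julia typically runs this on a single thread, so the loop still works but gains no speed-up.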

Games implemented

  • Tic-tac-toe (Tested with the fully connected network)

Config

You can adapt the configuration of each game by editing the Config in the params.jl file in the games folder.

Contribution Guide

I would like to invite you to contribute to this project by addressing any of the following points:

  • User interface: session management, tracking learning performance with TensorBoard, and diagnostic tools to understand the learned model
  • Benchmarking: interface and tools for benchmarking against perfect solvers, MCTS-only or network-only players
  • Logging tools: to track code performance
  • Optimizing the code for performance
  • Support for more than 2 players
  • Hyperparameter search
  • Support for continuous action spaces
  • Support for new environments: zero-sum games, RL, control problems, etc.

My next aim is to implement an easy-to-use interface, which can be expected in v0.4.0. The user interface and benchmarking will most likely be adapted from Jonathan Laurent's AlphaZero.jl.

Acknowledgements and Citation

  1. David Foster for his excellent tutorial
  2. Werner Duvaud: the core algorithm of this Julia implementation is mostly based on his Python implementation. Some parts of this README are also adapted from his GitHub repository
  3. Julian Schrittwieser for his tutorial and the associated pseudocode
  4. Jonathan Laurent: some parts of this README are adapted from his GitHub repository

Authors and Contributors

Supporting and Citing

If you want to support this project and help it gain visibility, please consider starring the repository. Doing well on such metrics may also help us secure academic funding in the future. Also, if you use this software as part of your research, I would appreciate it if you included the following citation in your paper.

Contributors

apanchot, deveshjawla
