MuZero.jl

This package provides the core MuZero algorithm in the Julia language.

MuZero is a state-of-the-art reinforcement learning (RL) algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but unlike AlphaZero it requires no knowledge of the environment's underlying dynamics. Instead, MuZero learns a model of the environment and uses an internal representation that contains only the information needed to predict the reward, value, policy and transitions.
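To make the idea of a learned model more concrete, here is a minimal sketch of the three functions MuZero learns: a representation function (observation to hidden state), a dynamics function (hidden state and action to next hidden state and reward), and a prediction function (hidden state to policy and value). Everything in it is an assumption made for illustration: the ToyModel type, its field names, the layer sizes and the random linear maps are hypothetical and are not taken from MuZero.jl, which builds its networks with Flux.

```julia
using Random

# Toy stand-ins for MuZero's three learned functions.
# All names and sizes are hypothetical, not MuZero.jl's actual API.
struct ToyModel
    Wh::Matrix{Float64}   # representation weights: observation -> hidden state
    Wg::Matrix{Float64}   # dynamics weights: (hidden state, action) -> next hidden state
    wr::Vector{Float64}   # dynamics reward head
    Wp::Matrix{Float64}   # prediction policy head
    wv::Vector{Float64}   # prediction value head
end

# Build a model with random weights (untrained, for illustration only).
function ToyModel(rng; obs_dim = 9, hidden_dim = 4, n_actions = 9)
    return ToyModel(
        randn(rng, hidden_dim, obs_dim),
        randn(rng, hidden_dim, hidden_dim + n_actions),
        randn(rng, hidden_dim + n_actions),
        randn(rng, n_actions, hidden_dim),
        randn(rng, hidden_dim),
    )
end

# One-hot encoding of action `a` among `n` actions.
onehot(a, n) = [i == a ? 1.0 : 0.0 for i in 1:n]

# Numerically stable softmax.
function softmax(x)
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

# Representation: encode a raw observation into a hidden state.
representation(m::ToyModel, obs) = tanh.(m.Wh * obs)

# Dynamics: predict the next hidden state and the reward for an action,
# without ever querying the real environment.
function dynamics(m::ToyModel, state, action)
    n_actions = size(m.Wp, 1)
    x = vcat(state, onehot(action, n_actions))
    return tanh.(m.Wg * x), sum(m.wr .* x)
end

# Prediction: policy (a probability vector) and value in [-1, 1].
prediction(m::ToyModel, state) = (softmax(m.Wp * state), tanh(sum(m.wv .* state)))

# Usage: one imagined step from a TicTacToe-sized observation.
m = ToyModel(MersenneTwister(0))
obs = [1.0, 0, 0, 0, -1.0, 0, 0, 0, 0]
s = representation(m, obs)
policy, value = prediction(m, s)
s2, reward = dynamics(m, s, 5)
println("policy sums to ", sum(policy), ", value = ", value)
```

During tree search, only the dynamics and prediction functions are applied repeatedly to hidden states, which is why no simulator of the game rules is needed at planning time.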

Because MuZero is resource-hungry, the motivation for this project is to provide an implementation of MuZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources. I found the Julia language to be instrumental in achieving this goal.

Platform Agnostic instructions

To install Julia on your platform, download it from the appropriate mirror and add it to your PATH. Instructions can be found here.
To set up Git on your computer, follow the instructions here.

Training a TicTacToe Agent

To download MuZero.jl and start training a TicTacToe agent with 3 Julia threads (set by the -t flag), run:

git clone https://github.com/deveshjawla/MuZero.jl
cd MuZero.jl
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project -t 3 ./games/tictactoe/main.jl
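The -t flag sets the number of threads Julia starts with. As a quick sanity check that is independent of this package, the following sketch reports the thread count of the current session and runs a small threaded loop. The variable names are illustrative only.

```julia
# Report how many threads this Julia session was started with
# (controlled by `julia -t N` or the JULIA_NUM_THREADS environment variable).
n = Threads.nthreads()
println("Julia is running with $n thread(s)")

# A small threaded loop: each iteration writes to its own slot,
# so no locking is needed.
squares = zeros(Int, 8)
Threads.@threads for i in 1:8
    squares[i] = i^2
end
println(squares)
```

If the first line prints 1 when you expected more, the -t flag was probably omitted from the command line.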

Note that the MuZero agent is not exposed to the baselines during training and learns purely from self-play, without any form of supervision or prior knowledge.

Playing a game against the trained MuZero

julia --project ./games/tictactoe/play.jl

Features

Residual network and fully connected network in Flux
Reinforcement learning environment and TicTacToe example adapted from ReinforcementLearning.jl
Parallel computing natively supported by Julia
Multi-GPU support for training and self-play
Model weights automatically saved at checkpoints
Single-player and two-player modes
Easily adaptable for new games

Games implemented

Tic-tac-toe (Tested with the fully connected network)

Config

You can adapt the configuration of each game by editing the Config in the params.jl file in the games folder.
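As a rough illustration of what editing a configuration of this kind can look like, here is a sketch of a keyword-based config struct. The ExampleConfig type and every field in it are hypothetical and chosen only for illustration. They are not the actual contents of params.jl, so check that file for the real names and defaults.

```julia
# Hypothetical config type; field names and defaults are illustrative only
# and do not mirror MuZero.jl's actual Config.
Base.@kwdef struct ExampleConfig
    num_simulations::Int = 25       # MCTS simulations per move
    discount::Float64 = 1.0         # reward discount factor
    learning_rate::Float64 = 0.003  # optimizer step size
    training_steps::Int = 10_000    # total number of training steps
end

# Use all defaults.
default_cfg = ExampleConfig()

# Override only the fields you want to change, e.g. for a quick test run.
quick_cfg = ExampleConfig(num_simulations = 10, training_steps = 500)
println(quick_cfg)
```

Keyword defaults keep each experiment's overrides short and make it obvious which settings differ from the baseline.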

Contribution Guide

I would like to invite you to contribute to this project by addressing any of the following points:

User interface: session management, tracking learning performance with TensorBoard, and diagnostic tools to understand the learned model
Benchmarking: interface and tools for benchmarking against perfect solvers, MCTS-only or network-only players
Logging tools: to track code performance
Optimizing the code for performance
Support for more than 2 players
Hyperparameter search
Support for continuous action spaces
Support for new environments: zero-sum games, RL, control problems, etc.

My next aim is to implement an easy-to-use interface, which can be expected in v0.4.0. The user interface and benchmarking tools will most likely be adapted from Jonathan Laurent's AlphaZero.jl.

Acknowledgements and Citation

David Foster, for his excellent tutorial
Werner Duvaud: the core algorithm of this Julia implementation is mostly based on his Python implementation. Some parts of this README are also adapted from his GitHub repository
Julian Schrittwieser, for his tutorial and the associated pseudocode
Jonathan Laurent: some parts of this README are adapted from his GitHub repository

Authors and Contributors

Author: Devesh Jawla
Contributors

Supporting and Citing

If you want to support this project and help it gain visibility, please consider starring the repository. Doing well on such metrics may also help us secure academic funding in the future. Also, if you use this software as part of your research, I would appreciate it if you included the following citation in your paper.
