
alpha-zero-virus-war's Introduction

Alpha Zero General (any game, any framework!)

A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play reinforcement learning based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player turn-based adversarial game and any deep learning framework of your choice. Sample implementations are provided for the game of Othello in PyTorch and Keras. An accompanying tutorial can be found here. We also have implementations for many other games, such as GoBang and TicTacToe.

To use a game of your choice, subclass the classes in Game.py and NeuralNet.py and implement their functions. Example implementations for Othello can be found in othello/OthelloGame.py and othello/{pytorch,keras}/NNet.py.
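To give a feel for the interface, here is a minimal standalone sketch of the methods a Game subclass must provide, using tic-tac-toe as the example. The method names follow Game.py in this repository; this sketch does not inherit from the actual Game class (it omits a few methods, such as getSymmetries), so treat it as an illustration rather than a drop-in implementation.

```python
import numpy as np

class TicTacToeGame:
    """Illustrative sketch of the Game.py interface; in the repo this
    class would subclass Game and also implement getSymmetries."""

    n = 3

    def getInitBoard(self):
        # Empty board: 0 = empty, 1 = player 1, -1 = player 2.
        return np.zeros((self.n, self.n), dtype=int)

    def getBoardSize(self):
        return (self.n, self.n)

    def getActionSize(self):
        return self.n * self.n

    def getNextState(self, board, player, action):
        # Apply `action` for `player`; return the new board and next player.
        b = board.copy()
        b[action // self.n, action % self.n] = player
        return b, -player

    def getValidMoves(self, board, player):
        # Binary vector over the action space: 1 where the cell is empty.
        return (board.reshape(-1) == 0).astype(int)

    def getGameEnded(self, board, player):
        # 1 if `player` won, -1 if lost, a small value for a draw, 0 if ongoing.
        for p in (1, -1):
            won = ((board == p).all(axis=1).any()
                   or (board == p).all(axis=0).any()
                   or (np.diag(board) == p).all()
                   or (np.diag(np.fliplr(board)) == p).all())
            if won:
                return 1 if p == player else -1
        if (board != 0).all():
            return 1e-4  # draw
        return 0

    def getCanonicalForm(self, board, player):
        # The board from the perspective of `player`.
        return player * board

    def stringRepresentation(self, board):
        # Hashable key used by the MCTS dictionaries.
        return board.tobytes()
```

The key design point is that MCTS.py only ever sees boards through this interface, which is why swapping in a new game requires no changes to the search or training code.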

Coach.py contains the core training loop and MCTS.py performs the Monte Carlo Tree Search. The parameters for the self-play can be specified in main.py. Additional neural network parameters are in othello/{pytorch,keras}/NNet.py (cuda flag, batch size, epochs, learning rate etc.).
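For orientation, the self-play parameters in main.py are a plain dictionary along these lines. The parameter names below follow main.py in this repository, but the values are only examples, not recommended settings:

```python
# Illustrative self-play hyperparameters of the kind set in main.py
# (names follow main.py; the values here are examples only).
args = {
    'numIters': 1000,         # number of training iterations
    'numEps': 100,            # self-play games per iteration
    'tempThreshold': 15,      # move count after which temperature drops to 0
    'updateThreshold': 0.6,   # win fraction required to accept a new network
    'maxlenOfQueue': 200000,  # maximum training examples kept in memory
    'numMCTSSims': 25,        # MCTS simulations per move
    'arenaCompare': 40,       # games played to compare old vs. new network
    'cpuct': 1,               # exploration constant in the PUCT formula
}
```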

To start training a model for Othello:

python main.py

Choose your framework and game in main.py.

Docker Installation

For easy environment setup, you can use nvidia-docker. Once nvidia-docker is set up, simply run:

./setup_env.sh

to set up a (default: PyTorch) Jupyter docker container. You can then open a new terminal and enter:

docker exec -ti pytorch_notebook python main.py

Experiments

We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model (PyTorch) can be found in pretrained_models/othello/pytorch/. You can play a game against it using pit.py. Below is the performance of the model against a random and a greedy baseline as a function of the number of iterations.

A concise description of our algorithm can be found here.

Citation

If you found this work useful, feel free to cite it as

@misc{thakoor2016learning,
  title={Learning to play othello without human knowledge},
  author={Thakoor, Shantanu and Nair, Surag and Jhunjhunwala, Megha},
  year={2016},
  publisher={Stanford University, Final Project Report}
}

Contributing

While the current code is fairly functional, we could benefit from the following contributions:

  • Game logic files for more games that follow the specifications in Game.py, along with their neural networks
  • Neural networks in other frameworks
  • Pre-trained models for different game configurations
  • An asynchronous version of the code: parallel processes for self-play, neural net training and model comparison
  • Asynchronous MCTS as described in the paper

Some extensions have been implemented here.

Contributors and Credits

Note: Chainer and TensorFlow v1 versions have been removed but can be found prior to commit 2ad461c.

alpha-zero-virus-war's People

Contributors

suragnair, shantanuthakoor, jjw-megha, evg-tyurin, threedliteguy, rodneyodonnell, sourkream, corochann, lyphrowny, rlronan, goshawk22, brianprichardson, mikhail, nmo13, jernejhabjan, vochicong, yangboz, wang-zm18, sunfc, pavolkacej, edwardtau, dependabot[bot], brettkoonce, zhiqingxiao, zxkyjimmy, mwilliammyers, saravanan21, mlkorra, leviathan91, wasdee

alpha-zero-virus-war's Issues

Parallelize episode execution

Episode execution doesn't depend on any external state, so it can be safely parallelized, improving performance.
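Since episodes are independent, one straightforward approach is a process pool. The sketch below is a hypothetical illustration: execute_episode stands in for Coach.executeEpisode and just returns dummy training examples, so only the parallelization pattern is real.

```python
from multiprocessing import Pool
import random

def execute_episode(seed):
    # Stand-in for Coach.executeEpisode(): plays one self-play game and
    # returns its training examples. Here it returns dummy tuples.
    rng = random.Random(seed)
    return [(rng.random(), rng.choice([1, -1])) for _ in range(5)]

def run_episodes_parallel(num_eps, workers=2):
    # Each episode is independent of external state, so a process pool
    # can run them concurrently; the per-episode example lists are then
    # flattened into a single training set.
    with Pool(workers) as pool:
        per_episode = pool.map(execute_episode, range(num_eps))
    return [ex for episode in per_episode for ex in episode]
```

Note that in the real code each worker would also need its own MCTS instance, since the search tree dictionaries are not shared across processes.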

Read the original paper

The code is based on the paper but is not optimized; rereading the paper may suggest a better optimization route.

Memory consumption

Cannot pass the mark of "Self play 3/..." on 10x10 with 3 moves because the memory consumption becomes overwhelming: more than 12 GB of RAM. This is not an issue on 5x5 with 2 moves, though.

The cause may be the recursion in the MCTS search, or self.trainExamples growing to a monstrous size.
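If trainExamples growth is the culprit, one mitigation is to keep only the most recent iterations' examples in a bounded deque, so old data is dropped automatically. This is a sketch of the idea; the repository exposes a similar knob (numItersForTrainExamplesHistory) in main.py.

```python
from collections import deque

# Cap memory by keeping per-iteration example lists in a bounded deque;
# appending beyond maxlen silently evicts the oldest iteration.
history = deque(maxlen=3)  # keep the 3 most recent iterations

for iteration in range(5):
    # Dummy examples standing in for one iteration's self-play output.
    episode_examples = [(iteration, move) for move in range(2)]
    history.append(episode_examples)

# Flatten the retained iterations into one training set.
train_examples = [ex for iteration_examples in history
                  for ex in iteration_examples]
```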

Class for Board state

The string representation of the board is used across the project.

I propose to create a class for the board state, which would store all the needed information about the current position. It might reduce memory consumption, because only one copy of each board state would be kept.
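One way to guarantee "only one copy per position" is interning: the class keeps a registry keyed by the position's string representation and returns the existing instance when the same position is constructed again. The BoardState class below is hypothetical, a sketch of the proposal rather than code from this repository.

```python
class BoardState:
    """Hypothetical interned board-state wrapper: constructing the same
    position twice yields the same shared instance, not a copy."""

    _interned = {}
    __slots__ = ("key",)

    def __new__(cls, key):
        # `key` is the position's hashable representation (e.g. the
        # string produced by stringRepresentation). Reuse any cached
        # instance for this position before allocating a new one.
        cached = cls._interned.get(key)
        if cached is None:
            cached = super().__new__(cls)
            cached.key = key
            cls._interned[key] = cached
        return cached
```

A caveat: the registry itself grows with the number of distinct positions seen, so for long runs it would need the same bounding treatment as the caches discussed below.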

Cache growth

Caching is used in get_moves in the form of a dict, which grows after each call to getActionProb (the size was measured after each call). The growth in size is almost constant, which means the board states are mostly different.

This doesn't seem to be the right behaviour, because the search tree is reset after each episode, so all the MCTS probabilities are reset. But the next step is chosen randomly, which may explain why the board states differ.
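If the cache must stay but its growth is the problem, replacing the unbounded dict with an LRU cache bounds memory at the cost of recomputing evicted entries. The sketch below uses a stand-in for the real get_moves, with a tiny maxsize so the eviction behaviour is visible:

```python
from functools import lru_cache

# An LRU cache bounds memory: once `maxsize` distinct positions have
# been seen, the least recently used entry is evicted.
@lru_cache(maxsize=4)
def get_moves(board_key):
    # Stand-in for the real move generator; derives a dummy move count
    # from the key so the function has something to cache.
    return len(board_key) % 5

for key in ["a", "bb", "ccc", "dddd", "eeeee", "a"]:
    get_moves(key)

info = get_moves.cache_info()  # hits / misses / currsize statistics
```

With maxsize=4, inserting the fifth distinct key evicts "a", so the final call to get_moves("a") is a miss rather than a hit, and the cache never exceeds four entries.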
