

NuZero

AlphaZero + DeepThinking + WarGames

This system was developed as an attempt to tackle the enormous complexity of wargames, more specifically Standard Combat Series (SCS) games, by combining AlphaZero's learning capabilities with DeepThinking's extrapolation capacity. The general idea was to train recurrent networks on small/simple maps using AlphaZero and then use the techniques described in the DeepThinking papers to extrapolate the learned strategies to the very large maps of SCS games. The system ended up developing into a wider project that accommodates a larger set of games, network architectures and configurations. More information about the system and its conceptualization can be found in my Master's thesis.

Features

  • Options for both sequential and fully asynchronous self-play, training and testing using Ray.
  • Large selection of hyperparameters, going beyond those in the AlphaZero paper.
  • Allows defining custom games, network architectures and agents.
  • Ready-made network architectures for both hexagonal and orthogonal data.
  • Saves checkpoints during training so that training can be continued from any point.
  • Allows replay buffer saving/loading.
  • Creates graphs for loss, win rates and other metrics.
  • Definition of custom SCS games within the already-implemented rules.
  • Simple visualization interface for SCS games.

Getting started

Installation

git clone https://github.com/guilherme439/NuZero
cd NuZero

You might want to create a virtual environment using:

python -m venv venv

or

virtualenv venv

Then activate it

source venv/bin/activate

Install the requirements:

pip install -r requirements.txt

Training

To start training with a specific configuration, use a training preset. Training presets are defined inside Run.py.

python Run.py --training-preset 0 

As an example, training preset 0 trains a recurrent network for tic-tac-toe using an optimized configuration, while the remaining presets are defined for SCS games.
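
As a purely hypothetical illustration of what a preset entry inside Run.py could look like (the keys, paths and structure below are assumptions for illustration, not the repository's actual code):

```python
# Hypothetical sketch of a preset table inside Run.py -- the real
# structure in the repository may differ.
TRAINING_PRESETS = {
    0: {
        "game": "tic_tac_toe",
        "network": "recurrent",
        "search_config": "Configs/Search/ttt.ini",
        "training_config": "Configs/Training/ttt.ini",
    },
    # presets 1..n would target SCS games
}

def run_training_preset(index):
    preset = TRAINING_PRESETS[index]
    print(f"Training a {preset['network']} network for {preset['game']}")
```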

Testing

To test a trained network, just use/define a testing preset. Currently, preset 0 tests and provides statistics for a pretrained tic-tac-toe model, while the remaining presets are used for SCS games.

python Run.py --testing-preset 0

Interactive

(currently not working in the latest version)

A command-line interface is also available, although it does not support all functionality. The objective of this interface is to give new users an easy way to start using the system. To use it, simply run:

python Run.py --interactive

This will show you a series of prompts that will allow you to start training/testing by selecting from one of the available games, networks and configurations.

Configs

Training requires both a Training and a Search configuration, while testing presets need a Testing configuration. These configuration files are located in their respective directories inside the Configs/ folder.
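
As a rough illustration only, a config of this kind could be read with Python's standard configparser; the section and option names below are invented examples, not the project's actual schema:

```python
# Hypothetical sketch of reading an INI-style config with the standard
# library. Section and option names here are invented examples.
import configparser

example = """
[Optimization]
learning_rate = 0.001
batch_size = 256

[SelfPlay]
games_per_step = 100
"""

config = configparser.ConfigParser()
config.read_string(example)
print(config.getfloat("Optimization", "learning_rate"))  # 0.001
```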

Structure

The system can be run in a variety of ways that use different parts of the code; however, these are the general responsibilities of each class:

Coordinators:

  • AlphaZero - The main thread of the program. Responsible for training and plotting. Also launches the classes responsible for self-play.
  • TestManager - Launches and manages the classes responsible for tests.

Workers: (usually several of these will run in parallel)

  • Gamer - Plays individual self-play games.
  • Tester - Runs individual test games.

Others:

  • Explorer - Contains the methods necessary to run MCTS both in self-play and testing.

This diagram gives a general view of the code logic:

[Class diagram: not fully up to date, but it still gives a general idea of how the system works]

When running sequentially, the AlphaZero instance launches Gamers to play a certain number of games. When they finish playing those games, they terminate and AlphaZero executes a training step. At the end of this training step the Gamers are launched again and the cycle continues. On the other hand, when running fully asynchronously, the Gamers are launched only once and keep playing games indefinitely, filling the replay buffer, while the AlphaZero instance trains the network and stores it.
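
The following sketch illustrates the two control flows; apart from play_forever (mentioned in the issues), all class and method names are assumptions, not the actual API:

```python
# Illustrative pseudocode for the two modes; names are assumptions.

def sequential_mode(alphazero, gamers, steps, games_per_step):
    for _ in range(steps):
        # Gamers play a fixed batch of games, then terminate.
        for gamer in gamers:
            gamer.play_games(games_per_step // len(gamers))
        alphazero.training_step()   # train only after self-play finishes

def fully_async_mode(alphazero, gamers, steps):
    # Gamers are launched once and keep filling the replay buffer.
    for gamer in gamers:
        gamer.play_forever()        # each gamer runs as its own Ray actor
    for _ in range(steps):
        alphazero.training_step()   # trains concurrently with self-play
        alphazero.store_network()
```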

If the system is run with asynchronous testing, the Test Manager starts in a separate process and the AlphaZero instance checks which tests have concluded at the end of each training step. Otherwise, tests run sequentially, meaning that control switches to the Test Manager while tests run, and self-play and training only continue after the tests finish.

Sequential testing is only possible if the system is not running in fully asynchronous mode.

Notes

  • This AlphaZero implementation was developed for two-player games, although it also supports single-player games. In the version from the original AlphaZero paper, the state was always represented from the perspective of the current player, which meant that the value function expressed the advantage/disadvantage from that player's perspective: 1 was a victory for the current player, while -1 was a defeat. Since this implementation was developed with SCS games in mind, which display high objective variability and asymmetry, we decided to use a static representation of the state, meaning that both players' units are represented the same way, independently of which player is currently playing. This ultimately gives the value function a slightly different meaning: 1 always represents a victory for player one, and -1 always represents a victory for player two. This change also affects the way backpropagation is done at the end of each self-play game. A small sketch contrasting the two conventions is shown below.
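
A minimal sketch contrasting the two value conventions, with hypothetical helper names:

```python
# Minimal sketch of the two conventions described above.
# Function names are assumptions for illustration only.

def value_targets_static(states, winner):
    """Static convention used here: winner is +1 if player one won,
    -1 if player two won (0 for a draw). Every state in the game gets
    the same target, regardless of whose turn it was."""
    return [(state, winner) for state in states]

def value_targets_current_player(states, winner, players_to_move):
    """Original AlphaZero convention: the sign flips so that +1 always
    means a win for the player to move in that state. players_to_move
    holds +1 for player one and -1 for player two."""
    return [(s, winner * p) for s, p in zip(states, players_to_move)]
```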

Authors

Publicly available AlphaZero pseudocode, as well as the DeepThinking GitHub project, were used as a base for some of the code, and I also took ideas from other open-source AlphaGo/AlphaZero/MuZero implementations available on GitHub.

Issues

SCS game interface

Implement an interface where users can play SCS games using keyboard/mouse. This would be useful for testing the AIs against humans.

Difficulty -> hard
Priority -> low

Truncate graph data

When continuing training from a network checkpoint, use only the graph data up until that checkpoint. That is, if I trained a network until iteration 5000 and now want to continue training from the checkpoint at iteration 4200, only the graph data for the first 4200 iterations should be kept, not the entire 5000. A minimal sketch follows this issue.

Difficulty -> easy
Priority -> medium
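
A minimal sketch, assuming graph data is kept as per-metric lists with one data point per iteration (all names are illustrative):

```python
# Minimal sketch; assumes graph data is a dict of per-metric lists
# with one data point per iteration.

def truncate_graph_data(graph_data, checkpoint_iteration):
    # Keep only the points recorded up to the checkpoint being resumed.
    return {metric: points[:checkpoint_iteration]
            for metric, points in graph_data.items()}
```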

New cache update strategies

Allow three cache update strategies: local, universal, disabled.

This requires changing the network storage to a dict, so that each network can be identified easily.
Also requires #21 first.

local -> each actor maintains its own cache and clears it whenever the network changes.
universal -> each actor updates a universal cache when it finishes a game.
disabled -> a new cache is created for each game.

A sketch of the three strategies follows this issue.


Difficulty -> hard
Priority -> low
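
An illustrative sketch of the three strategies; the class and method names are assumptions, not existing code in the repository:

```python
# Illustrative sketch of the three proposed cache update strategies.

class CachingActor:
    def __init__(self, strategy):
        self.strategy = strategy  # "local", "universal" or "disabled"
        self.cache = {}

    def on_network_update(self):
        if self.strategy == "local":
            self.cache.clear()    # local: reset whenever the network changes

    def on_game_start(self):
        if self.strategy == "disabled":
            self.cache = {}       # disabled: fresh cache for every game

    def on_game_end(self, shared_cache):
        if self.strategy == "universal":
            shared_cache.update(self.cache)  # universal: merge into shared cache
```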

Overall System Interface

Improve the general interface for running the system.
Find a way to make presets easier to define and use. Ideally in a way where it can be easy to implement a new feature where users can create custom training/testing presets which can be saved for later use (probably using save files).

Difficulty -> hard
Priority -> low

Interactive

Redo and improve the interactive CLI.

Difficulty -> hard
Priority -> medium

New fully async Gamer logic

Instead of having each gamer play_forever... use the same loop that is now in alphazero.self_play, but with while(True).
This requires AlphaZero to be run with two threads: one to manage self-play and the other to manage training. This also means we need to be very careful with any shared objects; however, it should work as long as the network storage runs on a single thread sequentially. A sketch of this layout follows this issue.

This would allow a universal cache when running fully async.

Difficulty -> hard
Priority -> low
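
An illustrative sketch of the proposed two-thread layout; method names follow the issue text (self_play, training_step) but are otherwise assumptions:

```python
import threading

# Illustrative sketch of the proposed two-thread AlphaZero layout.

def run_fully_async(alphazero):
    def self_play_loop():
        while True:
            alphazero.self_play()      # same loop as the sequential mode

    threading.Thread(target=self_play_loop, daemon=True).start()
    while True:
        alphazero.training_step()      # training stays on the main thread
```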

Dual networks

Allow for different networks for player 1 and player 2.

Difficulty -> medium
Priority -> medium

INI -> YAML configs

Change search and training INI configs to YAML.

Difficulty -> easy
Priority -> low

Structure and Refactoring

Improve the overall folder structure and refactor the code so that, for example, all interface-related functions in "Run.py" can live in their own file and appropriate folder.

Difficulty -> medium
Priority -> high

SCS Marker Creator Interface

Create an interface for users to create their own marker images to use when playing SCS games (possibly integrated with the CLI).

Difficulty -> medium
Priority -> low

SCS Game features

Expand SCS game features so that they get progressively closer to the original games.

Difficulty -> hard
Priority -> low

Save and Load Optimizer/Scheduler state

Implement methods for saving and loading the optimizer state.
Add a training config option to either keep the previous learning rate/scheduler or use the new one defined in the config. A sketch follows this issue.

Difficulty -> medium
Priority -> high
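
A minimal sketch using PyTorch's standard state_dict mechanism; the function names and file layout are assumptions:

```python
import torch

# Minimal sketch: optimizers and LR schedulers in PyTorch expose
# state_dict()/load_state_dict(), so both can be checkpointed together.

def save_optimizer_state(optimizer, scheduler, path):
    torch.save({"optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict()}, path)

def load_optimizer_state(optimizer, scheduler, path):
    state = torch.load(path)
    optimizer.load_state_dict(state["optimizer"])
    scheduler.load_state_dict(state["scheduler"])
```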

Embeddings Cache

Make a cache using torch.nn.Embedding. It might achieve better memory efficiency and better hit rates, as cache entries would not need to be replaced so often.

Difficulty -> medium
Priority -> low

Renderer paths

SCS_Renderer: Change hardcoded filepaths to dynamic paths based on the current directory.

Difficulty -> easy
Priority -> low

Test suite

Create test suite with tests for:

  • Command line interface
  • Presets (end-to-end)
  • Specific modules like Explorer or Test Manager
  • SCS game engine

Difficulty -> hard
Priority -> medium

Better state set

Improve the state set implementation to allow any number of states (including graph colors).

Difficulty -> hard
Priority -> low

Store plotting data in files to save RAM

Write data points to a file instead of adding them to a vector, keeping only the vector for the current epoch's loss. This should reduce the amount of RAM used, at the cost of some extra time reading from the files when plotting. A sketch follows this issue.

Difficulty -> easy
Priority -> low
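
A minimal sketch of the proposed approach; the CSV format and function names are illustrative choices:

```python
import csv

# Minimal sketch: stream points to disk, read them back only for plotting.

def append_point(path, iteration, value):
    # Append one data point per line instead of keeping it in memory.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([iteration, value])

def read_points(path):
    # Read the points back only when a plot is actually requested.
    with open(path, newline="") as f:
        return [(int(i), float(v)) for i, v in csv.reader(f)]
```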

PonderNet

Find a way to integrate PonderNet into all this xD. It could bring amazing results. :)

Difficulty -> very hard
Priority -> low

Saving and Loading Replay Buffer

Implement methods to save/load the Replay Buffer to/from a file.
It must be possible to turn this on/off in the training config, since the files will likely take up a lot of disk space. A sketch follows this issue.

Difficulty -> medium
Priority -> high
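
A minimal sketch assuming a pickle-based format; the on/off config flag that would gate this is hypothetical:

```python
import pickle

# Minimal sketch: pickle the whole buffer. Large buffers produce large
# files, hence the proposed on/off switch in the training config.

def save_replay_buffer(buffer, path):
    with open(path, "wb") as f:
        pickle.dump(buffer, f)

def load_replay_buffer(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```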

Ray's Distributed Training

Integrate Ray's distributed training into the current network/training system.

Difficulty -> medium/hard
Priority -> low
