
tic_tac_toe

Tic-Tac-Toe played by Double Deep Q-Networks

This repository contains a (successful) attempt to train a Double Deep Q-Network (DDQN) agent to play Tic-Tac-Toe. It learned to:

  • Distinguish valid from invalid moves
  • Comprehend how to win a game
  • Block the opponent when it poses a threat

Key formulas of the algorithms used:

Double Deep Q-Networks:

Based on the DDQN algorithm by van Hasselt et al. [1]. The cost function used is:

$$\mathrm{cost} = \left[ Q(s_t, a_t; \theta) - \left( r_t + \gamma\, Q\!\left(s_{t+1}, \operatorname*{argmax}_a Q(s_{t+1}, a; \theta);\ \vartheta \right) \right) \right]^2$$

where θ represents the trained Q-Network and ϑ represents the semi-static Q-Target network.

The Q-Target update rule is based on the DDPG algorithm by Lillicrap et al. [2]:

$$\vartheta \leftarrow \tau\,\theta + (1 - \tau)\,\vartheta$$

for some 0 ≤ τ ≤ 1.
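
To make these two formulas concrete, here is a minimal NumPy sketch of a Double-DQN target computation and a soft Q-Target update. The function and variable names are illustrative assumptions, not the repository's actual code:

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double-DQN target: the trained network (theta) picks the best next
    action, while the Q-Target network (vartheta) evaluates it."""
    best_actions = np.argmax(q_online_next, axis=1)   # argmax_a Q(s', a; theta)
    rows = np.arange(len(best_actions))
    q_eval = q_target_next[rows, best_actions]        # Q(s', a*; vartheta)
    # Terminal transitions (done == 1) contribute only the reward.
    return rewards + gamma * (1.0 - dones) * q_eval

def soft_update(online_weights, target_weights, tau=0.001):
    """DDPG-style soft update: vartheta <- tau * theta + (1 - tau) * vartheta."""
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(online_weights, target_weights)]
```

The cost above is then the squared difference between these targets and Q(s_t, a_t; θ).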

Maximum Entropy Learning:

Based on a paper by Haarnoja et al. [3] and designed according to a blog post by BAIR [4]. Q-Values are computed using the Soft Bellman Equation:

$$Q(s_t, a_t) = r_t + \gamma\, \mathbb{E}_{s_{t+1}}\!\left[ V(s_{t+1}) \right], \qquad V(s) = \alpha \log \sum_{a} \exp\!\big( Q(s, a) / \alpha \big)$$

where α is the temperature controlling how stochastic the learned policy is.
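
As an illustration of the soft value term, the following sketch computes V(s) with a numerically stable log-sum-exp; the temperature value and the function itself are assumptions for illustration, not taken from the repository:

```python
import numpy as np

def soft_value(q_values, alpha=1.0):
    """V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed with the
    max-shift trick so that large Q-values do not overflow."""
    scaled = np.asarray(q_values, dtype=float) / alpha
    m = scaled.max()
    return alpha * (m + np.log(np.exp(scaled - m).sum()))

# One-step soft Bellman backup for a transition with reward r and next state s':
q_next = np.array([0.2, -0.5, 1.3])        # Q(s', a) per action (made-up numbers)
target = 1.0 + 0.99 * soft_value(q_next)   # r + gamma * V(s')
```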

Trained models:

Two types of agents were trained:

  • a regular DDQN agent
  • an agent which learns using maximum entropy

They are named 'Q' and 'E', respectively.

Both models use a cyclic memory buffer as their experience-replay memory.
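
A cyclic buffer of this kind can be sketched with collections.deque, which discards the oldest transition once capacity is reached (an illustrative implementation, not the repository's class):

```python
import random
from collections import deque

class CyclicMemory:
    """Experience-replay memory that overwrites its oldest entries when full."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def append(self, transition):
        self.buffer.append(transition)   # (state, action, reward, next_state, done)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```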

All pre-trained models are located under the models/ directory, with several trained models for each variant. Q files refer to DDQN models and E files refer to DDQN-Max-Entropy models.

Do it yourself:

main.py holds several useful functions; see the doc-strings for more details:

  • train will initiate a single training process. It will save the weights and plot graphs. Using the current settings, training took me around 70 minutes on a 2018 MacBook Pro
  • multi_train will train several DDQN and DDQN-Max-Entropy models
  • play allows a human player to play against a saved model
  • face_off can be used to compare models by letting them play against each other
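
For example, a session could look like the calls below; they assume each function runs with its defaults, so check the doc-strings for the exact arguments before running:

```python
from main import train, play, face_off

train()      # train a single model with the current settings and save its weights
play()       # play an interactive game against a saved model
face_off()   # let two saved models play against each other
```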

The DeepQNetworkModel class can be easily configured using these parameters (among others):

  • layers_size: set the number and size of the hidden layers of the model (only fully-connected layers are supported)
  • memory: set memory type (cyclic buffer or reservoir sampling)
  • double_dqn: set whether to use DDQN or a standard DQN
  • maximize_entropy: set whether to use maximum entropy learning or not

See the class doc-string for all possible parameters.
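
As an illustration, a configuration using the parameters above might look like this; everything except the four documented parameter names (layers_size, memory, double_dqn, maximize_entropy) is an assumption:

```python
# Assuming DeepQNetworkModel has been imported from this repository and
# CyclicMemory stands in for its cyclic-buffer memory type (both are
# assumptions; the real accepted types are documented in the class doc-string).
model = DeepQNetworkModel(
    layers_size=(64, 64),         # two fully-connected hidden layers of 64 units
    memory=CyclicMemory(100000),  # cyclic buffer; reservoir sampling also supported
    double_dqn=True,              # use DDQN rather than a standard DQN
    maximize_entropy=False,       # set True for the max-entropy ('E') variant
)
```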


References:

  1. van Hasselt et al., Deep Reinforcement Learning with Double Q-learning
  2. Lillicrap et al., Continuous control with deep reinforcement learning
  3. Haarnoja et al., Reinforcement Learning with Deep Energy-Based Policies
  4. Tang & Haarnoja, Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning (BAIR blog post)
