Giter VIP home page Giter VIP logo

learning-tetris-using-the-noisy-cross-entropy-method's Introduction

Learning Tetris using the noisy Cross Entropy method

Project conducted as part of the second year Monte Carlo course at ENSAE. Most of the explanations belows are extracts from preexisting articles (see bibliography), however, the code and the results are ours.

Main idea

The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning seems to be limited although it is fast, because it often converges to suboptimal policies. A standard technique for preventing early convergence is to introduce noise. We apply the noisy cross-entropy method to the game of Tetris to demonstrate its efficiency.

Code

  • Tetris.py implements the Tetris game and the state-value functions.
  • Tetris_tuned.py proposes two more features to integrate in the state-value function (depht of the holes & numbers of lines with holes) as it is proposed by [2].
  • CE_method.py implements the classical cross-entropy method optimizer.
  • CE_method_with_noise.py implements the Noisy cross-entropy method (constant & decreasing noise).
  • Simulated_annealing.py implements the simulated annealing optimizer.

Tetris controller

Following the approach of Thiery and Scherrer [2], we shall learn state-value functions that are linear combination of 21 basis functions.

Feature Id Description Comments
Column height $h_p$ Height of the $p$ th column of the board There are $P$ such features where $P$ is the board width
Column difference $\Delta h_p$ Absolute difference $\mid h_p − h_{p+1} \mid$ between adjacent columns There are $P − 1$ such features where $P$ is the board width
Maximum height $L$ Maximum pile height Prevents from having a big pile
Holes $H$ Number of empty cells covered by at least one full cell Prevents from making holes

The value function to optimise:

$$ V_w(s):=\sum_{i=1}^{10} w_i h_i(s) + \sum_{i=1}^{9} w_{10+i}\Delta h_i(s) + w_{20}L + w_{21}H $$

where $s$ denotes a Tetris state and $w=(w_1,w_2,...,w_{21})$ the weight under which optimize.

One of our Tetris simulation for an optimised weight vector (for gif generator see Tetris.py) :

Optimizer

Cross entropy

Presentation

An interesting illustration to understand this principle for $w=(w_1,w_2)$ is proposed in this repo :

  • The red crosses : the 10 best vectors we select and use to estimate next round $\mathcal{N}\left(\mu, \sigma^2\right)$
  • The black dots : the next round 100 vectors we generate and test

Results

Cross entropy with constant noise

Presentation

Results

Cross entropy with decreasing noise

Presentation

Results

Further improvements

  • Try a two pieces Tetris controller
  • Use new features in the controller
  • Optimize the hyperparameters
  • Simulated annealing optimizer

Bibliography

Contacts

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.