
gym-backgammon's Introduction

Backgammon OpenAI Gym



gym-backgammon

Backgammon is a two-player game that combines the movement of checkers with the roll of the dice. The goal of each player is to move all of their checkers off the board.

This repository contains a Backgammon game implementation in OpenAI Gym.
Given the current state of the board, a roll of the dice, and the current player, it computes (iteratively) all the legal actions/moves that the current player can execute. The legal actions are generated in such a way that they use the highest number of dice possible for that state and player.


Installation

git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .

Environment

The encoding used to represent the state is inspired by the one used by Gerald Tesauro[1].

Observation

Type: Box(198)

Num        Observation                           Min   Max
0          WHITE - 1st point, 1st component      0.0   1.0
1          WHITE - 1st point, 2nd component      0.0   1.0
2          WHITE - 1st point, 3rd component      0.0   1.0
3          WHITE - 1st point, 4th component      0.0   6.0
4          WHITE - 2nd point, 1st component      0.0   1.0
5          WHITE - 2nd point, 2nd component      0.0   1.0
6          WHITE - 2nd point, 3rd component      0.0   1.0
7          WHITE - 2nd point, 4th component      0.0   6.0
...
92         WHITE - 24th point, 1st component     0.0   1.0
93         WHITE - 24th point, 2nd component     0.0   1.0
94         WHITE - 24th point, 3rd component     0.0   1.0
95         WHITE - 24th point, 4th component     0.0   6.0
96         WHITE - BAR checkers                  0.0   7.5
97         WHITE - OFF bar checkers              0.0   1.0
98         BLACK - 1st point, 1st component      0.0   1.0
99         BLACK - 1st point, 2nd component      0.0   1.0
100        BLACK - 1st point, 3rd component      0.0   1.0
101        BLACK - 1st point, 4th component      0.0   6.0
...
190        BLACK - 24th point, 1st component     0.0   1.0
191        BLACK - 24th point, 2nd component     0.0   1.0
192        BLACK - 24th point, 3rd component     0.0   1.0
193        BLACK - 24th point, 4th component     0.0   6.0
194        BLACK - BAR checkers                  0.0   7.5
195        BLACK - OFF bar checkers              0.0   1.0
196 - 197  Current player                        0.0   1.0
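
The index layout in the table above follows a simple pattern: 4 features per point, 24 points, then one BAR and one OFF feature per player, and finally 2 features for the current player. A minimal sketch of that arithmetic is shown here. The helper names (`point_slice`, `bar_index`, `off_index`, `CURRENT_PLAYER_SLICE`) are hypothetical and are not part of the environment's API; the sketch also assumes points are counted from 1, as in the table.

```python
# Hypothetical helpers that mirror the observation table; not part of gym-backgammon.
WHITE, BLACK = 0, 1                      # player ids, as returned by reset()
FEATURES_PER_POINT = 4
POINTS = 24
# 96 point features + 1 BAR feature + 1 OFF feature = 98 features per player
PLAYER_BLOCK = POINTS * FEATURES_PER_POINT + 2


def point_slice(player, point):
    """Slice of the 198-vector holding the 4 features of `point` (1..24) for `player`."""
    if not 1 <= point <= POINTS:
        raise ValueError("point must be in 1..24")
    start = player * PLAYER_BLOCK + (point - 1) * FEATURES_PER_POINT
    return slice(start, start + FEATURES_PER_POINT)


def bar_index(player):
    """Index of the BAR feature (96 for WHITE, 194 for BLACK)."""
    return player * PLAYER_BLOCK + POINTS * FEATURES_PER_POINT


def off_index(player):
    """Index of the OFF feature (97 for WHITE, 195 for BLACK)."""
    return bar_index(player) + 1


# The last two entries (196 and 197) encode the current player.
CURRENT_PLAYER_SLICE = slice(2 * PLAYER_BLOCK, 2 * PLAYER_BLOCK + 2)
```

For example, `observation[point_slice(BLACK, 24)]` would select entries 190 to 193 of an observation vector.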

Encoding of a single point (it indicates the number of checkers in that point):

Checkers Encoding
0 [0.0, 0.0, 0.0, 0.0]
1 [1.0, 0.0, 0.0, 0.0]
2 [1.0, 1.0, 0.0, 0.0]
>= 3 [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0]
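
The point encoding above can be written as a small function. This is an illustrative sketch only: `encode_point` is a hypothetical name, not a function of the environment.

```python
def encode_point(checkers):
    """Encode the number of checkers on one point as four floats.

    Hypothetical helper that follows the encoding table above.
    """
    if checkers < 0:
        raise ValueError("the number of checkers cannot be negative")
    if checkers >= 3:
        # The first three components saturate; the fourth grows with the surplus.
        return [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0]
    return [
        1.0 if checkers >= 1 else 0.0,
        1.0 if checkers >= 2 else 0.0,
        0.0,
        0.0,
    ]
```

With all 15 checkers on one point, the fourth component is (15 - 3) / 2 = 6.0, which matches the maximum listed in the observation table.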

Encoding of BAR checkers:

Checkers Encoding
0 - 14 [bar_checkers / 2.0]

Encoding of OFF bar checkers:

Checkers Encoding
0 - 14 [off_checkers / 15.0]

Encoding of the current player:

Player Encoding
WHITE [1.0, 0.0]
BLACK [0.0, 1.0]

Actions

The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.

Reward

+1 if player WHITE wins, and 0 if player BLACK wins

Starting State

All the episodes/games start in the same starting position:

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Episode Termination

  1. One of the 2 players wins the game
  2. Episode length is greater than 10000
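
The reward and termination rules above can be summarized in a few lines. This is a sketch under stated assumptions, not the environment's code: the function names are hypothetical, `winner` is assumed to be `None` while the game is still in progress, and the reward for an episode that hits the length limit without a winner is assumed to be 0.

```python
WHITE, BLACK = 0, 1            # player ids, as returned by reset()
MAX_EPISODE_LENGTH = 10000     # limit stated in the termination list above


def reward_for(winner):
    """+1 if WHITE wins, 0 if BLACK wins (assumption: 0 when there is no winner)."""
    return 1 if winner == WHITE else 0


def episode_done(winner, episode_length):
    """True once a player has won or the episode length exceeds the limit."""
    return winner is not None or episode_length > MAX_EPISODE_LENGTH
```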

Reset

The method reset() returns:

  • the player that will move first (0 for the WHITE player, 1 for the BLACK player)
  • the first roll of the dice, a tuple with the dice rolled, e.g. (1, 3) for the BLACK player or (-1, -3) for the WHITE player
  • observation features from the starting position
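
The sign convention for the roll described above (negative values for WHITE, positive values for BLACK) can be illustrated with two small helpers. Both names are hypothetical and are not part of the environment; the sketch only converts between signed and unsigned dice and does not roll anything.

```python
WHITE, BLACK = 0, 1   # player ids, as returned by reset()


def to_signed_roll(player, dice):
    """Apply the sign convention: negative dice for WHITE, positive for BLACK."""
    sign = -1 if player == WHITE else 1
    return tuple(sign * abs(d) for d in dice)


def roll_owner(roll):
    """Infer which player a signed roll belongs to from the sign of the first die."""
    return WHITE if roll[0] < 0 else BLACK
```

For instance, `to_signed_roll(WHITE, (1, 3))` gives `(-1, -3)`, the form shown for the WHITE player in the list above.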

Rendering

If render(mode='rgb_array') or render(mode='state_pixels') is selected, this is the output obtained (over multiple steps):

[Image: Backgammon]


Example

Play Random Agents

To run a simple example (both agents, WHITE and BLACK, select an action randomly):

cd examples/
python3 play_random_agent.py

Valid actions

An internal variable, current player, keeps track of the player whose turn it is (it represents the color of the player).
To get all the valid actions:

actions = env.get_valid_actions(roll)

The legal actions are represented as a set of tuples.
Each action is a tuple of tuples, in the form ((source, target), (source, target)).
Each inner tuple represents a single move in the form (source, target).
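
Because the legal actions form a set, a random agent has to convert them to a sequence before sampling. A minimal sketch follows; `choose_random_action` is a hypothetical helper, and returning `None` when no action is available is an assumption about how the caller wants to handle that case.

```python
import random


def choose_random_action(actions, rng=random):
    """Pick one action uniformly at random from a set of legal actions.

    Hypothetical helper: returns None when the set is empty (an assumption).
    """
    if not actions:
        return None
    # Sets do not support indexing, so sample from a list copy.
    return rng.choice(list(actions))


# Two actions in the ((source, target), (source, target)) form described above.
example_actions = {((0, 1), (0, 3)), ((11, 14), (14, 15))}
picked = choose_random_action(example_actions, random.Random(0))
```

Passing a seeded `random.Random` instance, as in the usage lines, keeps the choice reproducible between runs.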

NOTE:

The actions of offering a double and accepting or rejecting a double are not available.

Given the following configuration (starting position, BLACK player in turn, roll = (1, 3)):

| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------|     |-------P=O Home Board--------|     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |  X |     |
|  X |    |    |    |  O |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|  X |    |    |    |    |    |     |  O |    |    |    |    |    |     |
|-----------------------------|     |-----------------------------|     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |    |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |    |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|  O |    |    |    |  X |    |     |  X |    |    |    |    |  O |     |
|--------Outer Board----------|     |-------P=X Home Board--------|     |
| 11 | 10 |  9 |  8 |  7 |  6 | BAR |  5 |  4 |  3 |  2 |  1 |  0 | OFF |

Current player=1 (O - Black) | Roll=(1, 3)

The legal actions are:

Legal Actions:
((11, 14), (14, 15))
((0, 1), (11, 14))
((18, 19), (18, 21))
((11, 14), (18, 19))
((0, 1), (0, 3))
((0, 1), (16, 19))
((16, 17), (16, 19))
((18, 19), (19, 22))
((0, 1), (18, 21))
((16, 17), (18, 21))
((0, 3), (18, 19))
((16, 19), (18, 19))
((16, 19), (19, 20))
((0, 1), (1, 4))
((16, 17), (17, 20))
((0, 3), (16, 17))
((18, 21), (21, 22))
((0, 3), (3, 4))
((11, 14), (16, 17))
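
As a sanity check on the list above, every action should use both dice of the roll (1, 3). The sketch below verifies this by comparing move distances with the dice values. The helper names are hypothetical, and the check only covers plain point-to-point moves with integer positions like the ones listed; it does not handle entering from the bar or bearing off.

```python
LEGAL_ACTIONS = [
    ((11, 14), (14, 15)), ((0, 1), (11, 14)), ((18, 19), (18, 21)),
    ((11, 14), (18, 19)), ((0, 1), (0, 3)), ((0, 1), (16, 19)),
    ((16, 17), (16, 19)), ((18, 19), (19, 22)), ((0, 1), (18, 21)),
    ((16, 17), (18, 21)), ((0, 3), (18, 19)), ((16, 19), (18, 19)),
    ((16, 19), (19, 20)), ((0, 1), (1, 4)), ((16, 17), (17, 20)),
    ((0, 3), (16, 17)), ((18, 21), (21, 22)), ((0, 3), (3, 4)),
    ((11, 14), (16, 17)),
]


def move_distances(action):
    """Sorted distances travelled by each (source, target) move of an action."""
    return sorted(abs(target - source) for source, target in action)


def uses_all_dice(action, roll):
    """True if the move distances match the dice values of the roll exactly."""
    return move_distances(action) == sorted(abs(die) for die in roll)


all_valid = all(uses_all_dice(action, (1, 3)) for action in LEGAL_ACTIONS)
```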

Backgammon Versions

backgammon-v0

The above description refers to backgammon-v0.

backgammon-pixel-v0

The state is represented by a feature array of shape (96, 96, 3).
This is the only difference with respect to backgammon-v0.

An example of the board representation:

[Image: raw_pixel]


Useful links and related works


License

MIT


gym-backgammon's Issues

Render as image

When trying to render as image (and not as text in console):

env = gym.make('gym_backgammon:backgammon-v0', disable_env_checker=True)
mode = 'rgb_array'

I get:

    arr = np.fromstring(image_data.data, dtype=np.uint8, sep='')
AttributeError: 'ImageData' object has no attribute 'data'

I solved the issue in: https://github.com/ndvbd/gym-backgammon

Saved Trained agents

Hi, I have trained for up to 1000000 games over 2 days, but even that agent is losing to a random agent.

Has anyone already made a trained agent that I can use?
Also, the trained tar file for 1000000 games is the same size as the one for 1000 games. Is there a wrong implementation on my part?
