jserv / ttt

License: The Unlicense

Makefile 2.61% C 94.09% Shell 3.30%

ttt

An implementation of tic-tac-toe in C, featuring AIs powered by the negamax algorithm, reinforcement learning (RL), and Monte Carlo tree search (MCTS). The RL agent supports both Monte Carlo learning and TD learning.

Build

negamax AI

To play against the negamax AI, build the game:

$ make

and run:

$ ./ttt
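
For readers unfamiliar with negamax, the idea can be sketched on a plain 3x3 board as follows. This is a minimal illustration, not the repository's implementation: the board layout, the `winner` helper, and the scoring of +1/0/-1 are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical 3x3 board: 'X', 'O', or ' ' in row-major order. */
static const int LINES[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, /* rows */
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8}, /* columns */
    {0, 4, 8}, {2, 4, 6}             /* diagonals */
};

/* Returns 'X' or 'O' if that player has three in a row, else ' '. */
static char winner(const char b[9])
{
    for (int i = 0; i < 8; i++) {
        char c = b[LINES[i][0]];
        if (c != ' ' && c == b[LINES[i][1]] && c == b[LINES[i][2]])
            return c;
    }
    return ' ';
}

/* Score of the position from the point of view of `player`, who is
 * about to move: +1 win, 0 draw, -1 loss. If `best` is non-NULL, the
 * index of the best move found is stored there. */
static int negamax(char b[9], char player, int *best)
{
    char opponent = (player == 'X') ? 'O' : 'X';
    char w = winner(b);
    if (w != ' ')
        return (w == player) ? 1 : -1;

    int best_score = -2, moves = 0;
    for (int i = 0; i < 9; i++) {
        if (b[i] != ' ')
            continue;
        moves++;
        b[i] = player;
        /* The opponent's best score is our worst, hence the negation. */
        int score = -negamax(b, opponent, NULL);
        b[i] = ' ';
        if (score > best_score) {
            best_score = score;
            if (best)
                *best = i;
        }
    }
    return moves ? best_score : 0; /* no moves left: draw */
}
```

Calling `negamax` on an empty board returns 0, reflecting that 3x3 tic-tac-toe is a draw under perfect play. A real engine would usually add alpha-beta pruning and a depth limit for larger boards.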

reinforcement learning AI

To play against the RL AI, first train the state value table. You can adjust the hyperparameters, which are macros in train.c:

MONTE_CARLO : Choose between the Monte Carlo method and the TD method.
REWARD_TRADEOFF : Set the reward of the Markov decision model, which balances the episode reward against the score from get_score. To fully customize the reward, modify it in the init_agent function.
INITIAL_MULTIPLIER : Set the initial state value as a multiple of get_score. To fully customize the initial state value, modify it in the init_agent function.
EPSILON_GREEDY : Decide whether or not to use epsilon-greedy exploration when training.
NUM_EPISODE : The number of game episodes in training.
LEARNING_RATE, GAMMA : $\alpha$ and $\gamma$ in training.
EPSILON_START, EPSILON_END : The initial and final values of $\epsilon$ in the epsilon-greedy algorithm; $\epsilon$ decays exponentially during training.
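
As a rough sketch of how these hyperparameters usually interact, the snippet below shows textbook Monte Carlo and TD(0) value updates together with a geometric epsilon schedule. The macro values are placeholders, `EPSILON_DECAY` is a hypothetical per-episode factor that does not appear in the README, and train.c may compute the decay and the updates differently.

```c
/* Placeholder values for illustration; the real defaults live in train.c. */
#define LEARNING_RATE 0.02
#define GAMMA 1.0
#define EPSILON_START 0.5
#define EPSILON_END 0.001
#define EPSILON_DECAY 0.999 /* hypothetical: not a macro from the README */

/* Exploration rate after `episode` episodes: multiply by EPSILON_DECAY
 * once per episode, never dropping below EPSILON_END. */
static double epsilon_at(int episode)
{
    double eps = EPSILON_START;
    for (int i = 0; i < episode && eps > EPSILON_END; i++)
        eps *= EPSILON_DECAY;
    return eps > EPSILON_END ? eps : EPSILON_END;
}

/* TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) */
static double td_update(double v, double reward, double v_next)
{
    return v + LEARNING_RATE * (reward + GAMMA * v_next - v);
}

/* Monte Carlo: V(s) <- V(s) + alpha * (G - V(s)),
 * where G is the return observed at the end of the episode. */
static double mc_update(double v, double episode_return)
{
    return v + LEARNING_RATE * (episode_return - v);
}
```

The difference between the two methods is the target: TD bootstraps from the current estimate of the next state, while Monte Carlo waits for the final outcome of the episode.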

Compile:

$ make train

and run:

$ ./train

Build the game that plays against the RL agent; it loads the pretrained model produced by train:

$ make rl

and run:

$ ./rl

MCTS AI

To play against the MCTS AI, you can first adjust several hyperparameters:

EXPLORATION_FACTOR in agents/mcts.h : The exploration parameter.
ITERATIONS in agents/mcts.h : Number of simulations in MCTS.
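
For context, an exploration parameter in MCTS typically enters through the UCT selection rule. The formula below is the standard textbook form; whether agents/mcts.h uses exactly this expression is an assumption.

$$
\mathrm{UCT}(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}
$$

Here $w_i$ is the accumulated reward of child $i$, $n_i$ is its visit count, $N$ is the visit count of the parent, and $c$ would correspond to EXPLORATION_FACTOR. A larger $c$ favors rarely visited children, and a smaller $c$ favors children with a high average reward.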

Build the game:

$ make mcts

and run:

$ ./mcts

ELO rating system

There are several hyperparameters you can modify:

N_GAMES in elo.c : The number of games played to calculate the ELO rating.
ELO_INIT in elo.c : The initial ELO rating assigned to a player before any games are played.
ELO_K in elo.c : The coefficient used in the ELO calculation formula to determine the impact of each game's outcome on the player's rating.
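
These parameters map onto the standard Elo update, shown here as a reference. That elo.c follows this exact form is an assumption.

$$
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A' = R_A + K \, (S_A - E_A)
$$

$R_A$ and $R_B$ are the current ratings (both starting at ELO_INIT), $S_A$ is the actual result for player A (1 for a win, 0.5 for a draw, 0 for a loss), and $K$ would correspond to ELO_K. For example, with equal ratings $E_A = 0.5$, so a win with an illustrative $K = 32$ raises the winner's rating by 16.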

Build the elo system:

$ make elo

and run:

$ ./elo

Run

These programs run entirely in the terminal. Below is how the game looks while it awaits your next move:

 1 |  ×
 2 |     ○
 3 |
---+----------
      A  B  C
>

To execute a move, enter [column][row]. For example:

> a3

Press Ctrl-C to exit.
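
The [column][row] input format can be parsed along these lines. This is a sketch under assumptions: the function name and the `board_size` parameter are hypothetical, accepting both upper and lower case is a guess, and only single-digit row numbers are handled.

```c
#include <ctype.h>

/* Parse a move such as "a3" into zero-based column and row indices.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * `board_size` is a hypothetical parameter for this example. */
static int parse_move(const char *s, int board_size, int *col, int *row)
{
    if (!s || !isalpha((unsigned char) s[0]) ||
        !isdigit((unsigned char) s[1]) || s[2] != '\0')
        return -1;

    int c = tolower((unsigned char) s[0]) - 'a'; /* 'a' -> column 0 */
    int r = s[1] - '1';                          /* '1' -> row 0 */
    if (c < 0 || c >= board_size || r < 0 || r >= board_size)
        return -1;

    *col = c;
    *row = r;
    return 0;
}
```

With a 3x3 board, "a3" maps to column 0 and row 2, which is the bottom-left cell in the display above.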

Game Rules

The winner is the first player to place GOAL of their marks in a row, whether vertically, horizontally, or diagonally, regardless of the board size.

Take the following 4x4 boards, with GOAL set to 3, as examples:

 1 |  ×  ×
 2 |     ○  ×
 3 |     ○
 4 |     ○
---+------------
      A  B  C  D
>

The player "○" wins the game since he placed his marks in a row vertically (B2-B3-B4).

 1 |  ×  ×  ○
 2 |  ×  ○  
 3 |  ○  
 4 |     
---+------------
      A  B  C  D
>

The player "○" wins the game since he placed his marks in a row diagonally (A3-B2-C1).

 1 |  ○  ×
 2 |  ○  ×
 3 |  ○     ×
 4 |  ○  ×
---+------------
      A  B  C  D
>

The player "○" wins the game if ALLOW_EXCEED is 1; otherwise, the game will continue because the number of "○"s in a row exceeds GOAL.
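
The three examples above can be reproduced with a small run-length check. This is a minimal sketch, not the repository's code: the board representation, the function names, and passing ALLOW_EXCEED as a runtime argument are assumptions made for illustration. It counts any diagonal run, matching the A3-B2-C1 example.

```c
#define BOARD_SIZE 4
#define GOAL 3

static int in_bounds(int r, int c)
{
    return r >= 0 && r < BOARD_SIZE && c >= 0 && c < BOARD_SIZE;
}

/* Length of the unbroken run of the mark at (r, c), measured in both
 * directions along (dr, dc). */
static int run_length(char b[BOARD_SIZE][BOARD_SIZE], int r, int c,
                      int dr, int dc)
{
    char p = b[r][c];
    int n = 1;
    for (int i = r + dr, j = c + dc; in_bounds(i, j) && b[i][j] == p;
         i += dr, j += dc)
        n++;
    for (int i = r - dr, j = c - dc; in_bounds(i, j) && b[i][j] == p;
         i -= dr, j -= dc)
        n++;
    return n;
}

/* Returns 'X' or 'O' for a winner, or ' ' if nobody has won yet.
 * With allow_exceed == 0, a run longer than GOAL does not count. */
static char check_win(char b[BOARD_SIZE][BOARD_SIZE], int allow_exceed)
{
    /* horizontal, vertical, diagonal, anti-diagonal */
    static const int dirs[4][2] = {{0, 1}, {1, 0}, {1, 1}, {1, -1}};
    for (int r = 0; r < BOARD_SIZE; r++) {
        for (int c = 0; c < BOARD_SIZE; c++) {
            if (b[r][c] == ' ')
                continue;
            for (int d = 0; d < 4; d++) {
                int n = run_length(b, r, c, dirs[d][0], dirs[d][1]);
                if (n == GOAL || (allow_exceed && n > GOAL))
                    return b[r][c];
            }
        }
    }
    return ' ';
}
```

Because `run_length` measures the full run rather than a window of GOAL cells, a column of four marks is reported as length 4 and is rejected when exceeding is not allowed.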

Reference

Mastering Tic-Tac-Toe with Minimax Algorithm in Python
tic-tac-toe: tic-tac-toe game for terminal I/O.

ttt's Issues

Clarification and Details Needed on Project Architecture

I came across a post in the System Software 2024 Facebook group discussing this project, but there are several aspects that require further clarification and details:

Linux Char Device Driver Location: The post mentions that this project includes a Linux character device driver. Could you please specify where exactly in the repository the driver code would be located?
User Space-Kernel Space API Design: What is the current design of the API that facilitates communication between the user space and kernel space? It would be helpful to have a detailed explanation or documentation about this.
Embedding Machine Learning Model in Kernel: The post also indicates that a machine-learning model is embedded into the kernel. Could you provide guidance or existing design principles that have been followed for embedding such a model? Additionally, it would be beneficial to understand the expected behavior and limitations of this integration.

Understanding these elements is crucial for comprehensively grasping the architecture of this project. Any detailed information or documentation regarding these aspects would be greatly appreciated.

Winning rule is not correct about 4x4 board

The winning rule is not correct for the 4x4 board

Improving performance and efficiency of PRNG

We've recently integrated the MT19937 algorithm into our project for pseudo-random number generation. However, it's crucial to assess its performance and efficiency.

Quoting from @jserv :
With full support from processor hardware, achieving around 60 Gbits/s with AES, supplemented by intermittent reseeds from RDRAND, is achievable. When implemented purely in software, we can approach similar performance levels for algorithms such as SplitMix64, Xoroshiro128+, and PCG64. Additionally, with the utilization of vector processing capabilities, ChaCha20 offers competitive speeds and outperforms algorithms like the Mersenne Twister.

Action items:

Compare the performance and efficiency of MT19937 against alternative algorithms like SplitMix64, Xoroshiro128+, PCG64, and ChaCha20.
Investigate potential optimizations or enhancements to MT19937's implementation to improve its performance and efficiency further.
Consider the implications of adopting MT19937 versus alternative algorithms in terms of speed, efficiency, and suitability for the project's requirements.
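
As a starting point for the comparison, here is SplitMix64, the simplest of the alternatives named above, with a crude timing helper. The generator follows the widely published public-domain algorithm; the `seconds_for` helper, its fixed seed, and the use of `clock()` are assumptions for illustration and not a rigorous benchmark.

```c
#include <stdint.h>
#include <time.h>

/* SplitMix64: a single 64-bit state word advanced by a fixed odd
 * constant, followed by a bit-mixing finalizer. */
static uint64_t splitmix64_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Hypothetical helper: CPU seconds spent generating n values.
 * The XOR of all outputs is written to *sink so the compiler cannot
 * optimize the loop away. */
static double seconds_for(uint64_t n, uint64_t *sink)
{
    uint64_t state = 42, acc = 0;
    clock_t t0 = clock();
    for (uint64_t i = 0; i < n; i++)
        acc ^= splitmix64_next(&state);
    clock_t t1 = clock();
    *sink = acc;
    return (double) (t1 - t0) / CLOCKS_PER_SEC;
}
```

Running the same loop over the MT19937 implementation and dividing the number of generated bits by the elapsed time gives a first throughput estimate; a fair comparison would also pin the CPU frequency and repeat the measurement several times.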

Refs:

See: #21

Winning rule about the diagonal line

Is it legal for an off-diagonal line to score on a 4x4 board, as in the figure above?

The rule about diagonal lines is not clear. In general, only the main diagonals (A1-B2-C3-D4 and A4-B3-C2-D1) can score, not the off-diagonals.

State based gameplay recording

I have some thoughts on recording gameplay. For wider usage, recording game states could be an option. Here are several reasons:

Gameplay can be replayed from any state, backward and forward. With a move sequence, reaching a given state requires replaying every move from the start. Viewing replays in Counter-Strike: Global Offensive suffers from long delays when jumping between rounds, because each jump replays from the start. In our case, the gameplay of an m,n,k-game could be long if k is large enough.
If errors occur while recording, the faults are confined to individual states rather than corrupting the whole gameplay, as they would with a move sequence.
A disadvantage is that states may take more space to store, compared to at most 4 bytes per move.

Some ideas on designing the encoding: since later states contain earlier states, the codes could be partially ordered, which may make states searchable. The Prüfer code, an encoding of trees originally devised to prove the count of all possible trees, can serve as a reference. Alternatively, simply compress the raw array data with a compression algorithm such as LZ4, which the kernel already provides. Perfect hash functions may also be considered if the number of possible game states is not too large.
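
One simple way to make a state compact and comparable is to treat the board as a base-3 number. The sketch below is only an illustration of that idea: the 16-cell (4x4) size, the cell symbols, and the function names are assumptions, and it is unrelated to any encoding the project may adopt.

```c
#include <stdint.h>

#define N_CELLS 16 /* assumed 4x4 board; 3^40 still fits in 64 bits */

/* Map a cell to a base-3 digit: empty = 0, 'X' = 1, 'O' = 2. */
static int cell_digit(char c)
{
    return c == 'X' ? 1 : c == 'O' ? 2 : 0;
}

/* Encode the board as a base-3 integer; cell 0 is the lowest digit. */
static uint64_t encode_state(const char cells[N_CELLS])
{
    uint64_t code = 0;
    for (int i = N_CELLS - 1; i >= 0; i--)
        code = code * 3 + (uint64_t) cell_digit(cells[i]);
    return code;
}

/* Inverse of encode_state. */
static void decode_state(uint64_t code, char cells[N_CELLS])
{
    static const char sym[3] = {' ', 'X', 'O'};
    for (int i = 0; i < N_CELLS; i++) {
        cells[i] = sym[code % 3];
        code /= 3;
    }
}
```

A 4x4 state then fits in 4 bytes, since 3^16 is below 2^32, which is the same cost as the per-move upper bound mentioned above. The code can also double as an index into a state value table.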
