Giter VIP home page Giter VIP logo

gptchess's Introduction

Debunking the Chessboard: Confronting GPTs Against Chess Engines to Estimate Elo Ratings and Assess Legal Move Abilities

Source code supporting experiments presented in https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/ I've started with the GPT-3 chess experiment from https://github.com/clevcode/skynet-dev and explore the variability space of experiments (GPT variants, number of games, Stockfish skills, temperature, prompt, etc.)

Install Python dependencies

pip install openai chess stockfish

for the analysis part, you'll also need libraries like pandas and matplotlib

Install Stockfish

See https://stockfishchess.org/download/ I'm using the Linux stockfish-ubuntu-x86-64-avx2 binary (version 16).

Run Chess experiment

export OPENAI_KEY=<your-openai-API-key>
python ./gptchess/gpt-experiments.py 

or if you want to run multiple games, you can use a loop like:

for i in {1..20}; do python gptchess/gpt-experiments.py; done

Edit the source code to change the GPT version, the number of games, the number of moves per game, etc. Something like this:

chess_config = ChessEngineConfig(
    skill_level=4,
    engine_depth=15,
    engine_time=None,
    random_engine=False
)

gpt_config = GPTConfig(
    model_gpt="gpt-3.5-turbo-instruct",
    temperature=0.0,
    max_tokens=5,
    chat_gpt=False,
    system_role_message=None  
)

play_game(chess_config, gpt_config, base_pgn=BASE_PGN, nmove=1, white_piece=False)

The outcome is located in output folder and is a subfolder, with the PGN file of the game, the log of the game, and the session with GPT. You can then analyze the data with the Jupyter notebook analysis.ipynb.

Data and analysis

For convenience, we have released the data used as part of the experiment documented in the blog post. It has been saved into the games.tar.gz of this repo. I'm usually using zenodo, but the size of the data is manageable.
Decompress the archive to get games folder.

> ls
game30da9c70-31ed-4489-9e93-9ef179661da4  game6a246b94-4511-4b34-908a-6f1fd5b838ac  game98a79424-b219-4764-a4d8-71f524454923  gamec9ba4fa3-b7a9-4953-b8d5-a25db0e10886  gameffa12232-8e93-4440-82d8-a0c8a6023c63
game3179c6e9-935e-4f59-b932-268fd10356fc  game6a34daf4-4820-4eb0-a7ad-6a472d6506a2  game99371010-b5af-48d4-adbc-4fcfda798304  gameca445cfd-0076-4146-ba1b-c263c8cd8707

with plenty of folders. Each folder contains:

  • game.pgn: the PGN file of the game
  • metainformation.txt about configuration of the experiments
  • log.txt the log of the game
  • session.txt the session with GPT

games_db.csv contains almost all information about all the games, in a structured way.

analysis.ipynb is a Jupyter notebook to analyze the data.

Update/Misc

  • new experiments based on Monsieur Phi suggestion/experiments (X/Twitter thread in french: https://twitter.com/MonsieurPhi/status/1781260337754366265), as a follow-up of his excellent video (in french again!) https://www.youtube.com/watch?v=6D1XIbkm4JE where I was interviewed. The basic idea is to study the impact of the prompt on the GPTs' playing skill, on the very specific position 1. e4 e5 2. Bc4 Nc6 3. Qh5. I have to wrap-up, but the tldr is that the prompt has indeed a significant impact on the GPTs' playing skill (at least on this position!), and that we can identify intuitive patterns of prompt leading to either g6 or Nf6. See gptchess/gpt-experiments-prompt-variations.py and analysis_prompt_variations.ipyng and prompt_variations_phi.csv.
  • In parallel, Yosha Iglesias has made a fantastic video: https://www.youtube.com/watch?v=FBx95bdJzh8 further exploring prompt sensitivity and surprising skills of GPT as well as many interesting ideas worth replicating in the large. Fascinating! analysis_yosha.ipynb is one ongoing/modest attempt to study the impact of prompt on the GPTs' playing skill (see also games_yosha.tar.gz and games_db_yosha.csv). Stay tuned for more experiments and analysis!

gptchess's People

Contributors

familiar-project avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

scchess

gptchess's Issues

Elo ratings calculation

Using the advanced Elo calculations implemented in the ordo program,
and using the provided nominal Stockfish strengths as anchors,
I determined the following ratings from the games in this repo (omitting the isolated text-davinci-003 group):

   # PLAYER                    :  RATING  POINTS  PLAYED   (%)
   1 Stockfish_Elo2035         :  2035.0    16.0      20    80
   2 Stockfish_Elo1954         :  1954.0    16.5      19    87
   3 Stockfish_Elo1871         :  1871.0   187.5     252    74
   4 Stockfish_Elo1785         :  1785.0   104.0     153    68
   5 Stockfish_Elo1694         :  1694.0    20.5      40    51
   6 gpt-3.5-turbo-instruct    :  1682.9   193.0     485    40
   7 Stockfish_Elo1597         :  1597.0     8.5      22    39
   8 gpt-3.5-turbo             :  1401.2     6.0       8    75
   9 gpt-4                     :  1377.5    63.0     123    51
  10 RANDOM chess engine       :   760.6     1.0     110     1

Note that accurately determining Elo for chess engine is a non-trivial task, as has long been recognized in the computer chess community. In particular, a simplistic tournament performance (FIDE lookup table) approach is not a good one.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.