
mastermind's Introduction

README

To improve the reproducibility of MasterMind, this repository provides part of the training data and the demo training code for the Doudizhu and Go tasks.

Summary

The dataset in ./data contains the expert data for the Doudizhu and Go tasks proposed in MasterMind. It uses a QA format: the question gives the current state of the game, and the answer gives the corresponding playing strategy together with the reasoning behind it. All of this information is encoded as strings, and all expert information is generated automatically by agents.
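To make the QA format concrete, here is a minimal sketch of how one question/answer pair could be turned into a prompt/completion pair for tuning. The `QARecord` class, the field names, and the prompt template are assumptions made for illustration only; the actual schema of the files in ./data and the template used by the training scripts may differ.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    """Hypothetical container for one dataset entry (not the repository's schema)."""
    question: str  # string encoding of the current game state
    answer: str    # string encoding of the strategy and its reasoning


# Assumed template; the demo training code may use a different one.
PROMPT_TEMPLATE = "### Question:\n{question}\n\n### Answer:\n"


def to_training_pair(record: QARecord) -> tuple[str, str]:
    """Return (prompt, completion) strings for supervised fine-tuning."""
    prompt = PROMPT_TEMPLATE.format(question=record.question.strip())
    completion = record.answer.strip()
    return prompt, completion


record = QARecord(
    question="  <game state string>  ",
    answer="<strategy and reasoning string>\n",
)
prompt, completion = to_training_pair(record)
print(prompt + completion)
```

Keeping the prompt and the completion separate makes it easy to compute the loss only on the answer tokens, which is a common choice for this kind of tuning.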

This repository also includes the code for fine-tuning the base model with instruction tuning. By default, we use LLaMA as the base model and apply LoRA for parameter-efficient fine-tuning.
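As a rough illustration of why LoRA is parameter-efficient: instead of updating a full weight matrix of shape d_out x d_in, LoRA freezes it and trains two low-rank factors of shapes d_out x r and r x d_in. The sketch below only counts parameters; the layer size and rank are illustrative assumptions, not the values configured in this repository.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters updated when the whole weight matrix is fine-tuned.
    lora: parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative numbers only (assumed, not taken from the training scripts).
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full} parameters")
print(f"LoRA (r=8):       {lora} parameters ({lora / full:.4%} of full)")
```

For a square 4096-wide layer at rank 8, the low-rank factors hold well under one percent of the parameters of the full matrix.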

Quick Start

Here is a simple tutorial for applying instruction tuning on this dataset:

  • Install torch==2.2.2. The following command installs it with cuda==11.8. For other versions, see the official PyTorch installation guide.
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu118
  • Install other requirements by:
pip3 install -r requirements.txt
  • Run our demo code for Doudizhu:
python train_dou.py

or for Go:

python train_go.py

Data Examples

For the Doudizhu data, each data point primarily includes: the current hand, possible card-playing strategies, the opponent's potential responses to each strategy, and the final decision on which card to play. An example is as follows:

[Image: image-20240530111216311]
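The components listed above can also be pictured as a small data structure that is serialized into the string-based QA format. This is a sketch under stated assumptions: the class names, field names, card notation, and sentence wording are all hypothetical and are not taken from the dataset files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One possible play and the opponent's anticipated reply (hypothetical fields)."""
    move: str
    opponent_response: str


@dataclass
class DoudizhuSample:
    """Hypothetical structure mirroring the components described above."""
    hand: str
    candidates: List[Candidate]
    decision: str

    def to_qa(self) -> Tuple[str, str]:
        """Serialize the sample into (question, answer) strings."""
        question = f"Current hand: {self.hand}. Which cards should be played?"
        lines = [
            f"If I play {c.move}, the opponent may respond with {c.opponent_response}."
            for c in self.candidates
        ]
        lines.append(f"Final decision: {self.decision}")
        return question, "\n".join(lines)


# Made-up example values, for illustration only.
sample = DoudizhuSample(
    hand="3 3 4 5 6 7 K",
    candidates=[Candidate("3 3", "5 5"), Candidate("3 4 5 6 7", "pass")],
    decision="3 4 5 6 7",
)
question, answer = sample.to_qa()
print(question)
print(answer)
```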

For the Go data, the dataset covers four prediction tasks: the next board state, the territory analysis provided by KataGo, the explanations given by human players, and the win rate. Some specific examples are as follows:

[Image: 0_board]

Explanation: When White 1 hangs high on the upper right, the preparations for Black 2's attack are taken into account beforehand. When White 3 comes over, Black 4's move to 12 is followed by White 13's move, and Black 14's break is related to the introduction.

[Image: image-20240515105949253] [Image: image-20240515110152381]
