
go_attack's Introduction

Go Attack

This repository contains code for studying the adversarial robustness of KataGo.

Read about our research here: https://arxiv.org/abs/2211.00241.

View our website here: https://goattack.far.ai/.

To run our adversary with Sabaki, see this guide.

Development / testing information

To clone this repository, run one of the following commands:

# Via HTTPS
git clone --recurse-submodules https://github.com/AlignmentResearch/go_attack.git

# Via SSH
git clone --recurse-submodules git@github.com:AlignmentResearch/go_attack.git

You can run pip install -e .[dev] inside the project root directory to install all necessary dependencies.

To run a pre-commit script before each commit, run pre-commit install (pre-commit should already have been installed in the previous step). You may also want to run pre-commit install from engines/KataGo-custom to install that repository's respective commit hook.
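Putting the clone and setup steps together, a typical first-time setup looks like this (assuming pip and pre-commit are on your PATH):

```shell
# Clone with submodules, install dev dependencies, and set up commit hooks.
git clone --recurse-submodules https://github.com/AlignmentResearch/go_attack.git
cd go_attack
pip install -e ".[dev]"
pre-commit install
# Optionally install the submodule's own hook as well.
(cd engines/KataGo-custom && pre-commit install)
```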

Git submodules

Modifications to KataGo are not tracked in this repository and should instead be made to the AlignmentResearch/KataGo-custom repository. We use code from KataGo-custom in this repository via a Git submodule.
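If you cloned without --recurse-submodules, the submodule can be fetched afterwards with standard Git commands:

```shell
# Initialize and fetch all submodules, including KataGo-custom.
git submodule update --init --recursive
```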

Individual containers

We run KataGo within Docker containers. More specifically:

  1. The C++ portion of KataGo runs in the container defined by compose/cpp/Dockerfile.
  2. The Python training portion of KataGo runs in the container defined at compose/python/Dockerfile.

Each Dockerfile contains instructions for building its image.

After building a container, you run it with a command like

docker run --gpus all -v ~/go_attack:/go_attack -v DATA_DIR:/shared -it humancompatibleai/goattack:cpp

where DATA_DIR is a directory, shared among all containers, in which to save the results of training runs.

A KataGo executable can be found in the /engines/KataGo-custom/cpp directory inside the C++ container.
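For example, you could start the engine in GTP mode from inside the container; the model and config file names below are placeholders, not files guaranteed to be present in the image:

```shell
cd /engines/KataGo-custom/cpp
# MODEL.bin.gz and gtp.cfg are placeholder names for your own files.
./katago gtp -model MODEL.bin.gz -config gtp.cfg
```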

Launching victim-play training runs

To launch a training run, run several containers simultaneously:

  • One or more 1-GPU C++ containers executing victim-play games to generate data. Example command to run in each container: /go_attack/kubernetes/victimplay.sh [--warmstart] EXPERIMENT-NAME /shared/, where the optional --warmstart flag should be set for warmstarted runs.
  • One 1-GPU Python container for training. Example command: /go_attack/kubernetes/train.sh [--initial-weights WARMSTART-MODEL-DIR] EXPERIMENT-NAME /shared/ 1.0 where the optional --initial-weights WARMSTART-MODEL-DIR flag should be set for warmstarted runs.
  • One Python container for shuffling data. Example command: /go_attack/kubernetes/shuffle-and-export.sh [--preseed WARMSTART-SELFPLAY-DIR] EXPERIMENT-NAME /shared where the optional --preseed flag should be set for warmstarted runs.
  • One Python container for running the curriculum. Example command: /go_attack/kubernetes/curriculum.sh EXPERIMENT-NAME /shared/ /go_attack/configs/examples/cyclic-adversary-curriculum.json -harden-below-visits 100.
    • The victims listed in the curriculum .json file are assumed to exist in /shared/victims. They can be symlinks.
  • Optionally, one 1-GPU C++ container for evaluating models. Example command: /go_attack/kubernetes/evaluate-loop.sh /shared/victimplay/EXPERIMENT-NAME/ /shared/victimplay/EXPERIMENT-NAME/eval.

See configs/examples for example experiment configurations and example values for the warmstart flags.

For these wrapper scripts in kubernetes/, optional flags for the wrapper come before any positional arguments, but optional flags for the underlying command the wrapper calls go after any positional arguments. For example, in the command /go_attack/kubernetes/shuffle-and-export.sh --preseed WARMSTART-SELFPLAY-DIR EXPERIMENT-NAME /shared -add-to-window 100000000, --preseed is a flag for the wrapper whereas -add-to-window is a flag to be passed to /engines/KataGo-tensorflow/python/selfplay/shuffle_and_export_loop.sh.

Docker compose

Within the compose directory of this repo are a few docker-compose .yml files that automate the process of spinning up the various components of training.

Each .yml file also has a corresponding .env file that configures more specific parameters of the run (e.g. which directory to write to, how many threads to use, the batch size, and where to look for other config files).

(Note: we stopped using these in October 2022, so they are no longer maintained.)
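Although no longer maintained, these files follow the standard Docker Compose invocation pattern; the file names below are illustrative placeholders, not actual files in compose/:

```shell
# EXAMPLE.yml and EXAMPLE.env stand in for a compose file and its
# matching environment file under compose/.
docker compose -f compose/EXAMPLE.yml --env-file compose/EXAMPLE.env up
```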

Website and analysis notebooks

See AlignmentResearch/KataGoVisualizer.

Baseline attacks

In addition to the learned attacks, we also implement five hardcoded baseline attacks:

  • Edge attack, which plays random vertices in the outermost available ring of the board
  • Random attack, which simply plays random legal moves
  • Pass attack, which always passes at every turn
  • Spiral attack, which deterministically plays the "largest" legal move in lexicographical order in polar coordinates (going counterclockwise starting from the outermost ring)
  • Mirror Go, which plays the opponent's last move reflected about the y = x diagonal, or the y = -x diagonal if they play on y = x. If the mirrored vertex is taken, then the policy plays the "closest" legal vertex by L1 distance.
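As an illustration, the Mirror Go reflection rule above can be sketched in Python. This is an illustrative sketch, not the repository's implementation; it assumes 0-indexed coordinates on an n x n board:

```python
def mirror_move(x, y, n):
    """Reflect (x, y) about the y = x diagonal, or about the y = -x
    anti-diagonal if the move already lies on y = x."""
    if x == y:
        return (n - 1 - y, n - 1 - x)  # anti-diagonal reflection
    return (y, x)  # main-diagonal reflection

def closest_legal(target, legal_moves):
    """If the mirrored vertex is taken, fall back to the closest
    legal vertex by L1 (Manhattan) distance."""
    return min(legal_moves,
               key=lambda v: abs(v[0] - target[0]) + abs(v[1] - target[1]))
```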

You can test these attacks by running baseline_attacks.py with the appropriate --strategy flag (edge, random, pass, spiral, or mirror). Run python scripts/baseline_attacks.py --help for more information about all the available flags.
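For instance, to play out the edge attack (only --strategy is documented above; consult --help for the remaining flags):

```shell
python scripts/baseline_attacks.py --strategy edge
```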


go_attack's Issues

KataGo misconfiguration invalidates the main result.

Your config file for KataGo does not set the friendlyPassOk: false option, so KataGo's rules are not set to Tromp-Taylor. KataGo will perform a "friendly early pass", which is what you report in your paper.

Tromp-Taylor configuration is prescribed here: https://github.com/lightvector/KataGo/blob/master/docs/GTP_Extensions.md

To summarize, your bot and your judging code work with Tromp-Taylor rules while KataGo does not.

This misconfiguration is the root cause of your network being able to exploit KataGo. I'm sorry that this invalidates the main result of your paper.

Notably, there was a case where a human player exploited the rules in a similar way.

Fix broken ci (invalid credentials)

See https://app.circleci.com/pipelines/github/HumanCompatibleAI/go_attack/755/workflows/a2effdee-b29a-49db-ae2a-9dd5e9aaad6f/jobs/2249 as an example.

Log:

Using SSH Config Dir '/home/circleci/.ssh'
git version 2.35.1
Cloning git repository
Cloning into '.'...
Warning: Permanently added the ECDSA host key for IP address '140.82.112.4' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

exit status 128

More in-depth training statistics

It would be helpful to get more informative statistics during selfplay/victimplay training.

Useful statistics to track:

  • Win rate
  • Fraction of games won by resignation
  • Win-margin
  • Number of moves taken

It would be nice to be able to plot all of the above as a function of training epoch, with the following filters:

  • Color
  • Board size
  • Whether game was won or lost

A checked item means we already support that feature in some capacity.


Can you reupload vit-victim-b16-s650m bin.gz file to the drive?

Hi, I have noticed that the vit-victim-b16-s650m model in your Google Drive is not a bin.gz file. I would like to use your ViT adversary to attack our Transformer-based models. However, the ViT adversary needs a victim model in bin.gz format, so I am wondering if you could upload your victim model in bin.gz format, or give instructions on how to convert the .pt file into the right bin.gz format.
Thank you very much.
@AdamGleave @tomtseng @ed1d1a8d

`nvidia-smi`/`gpustat` does not work in C++ container

gpustat used to work in humancompatibleai/goattack:cpp. I confirmed this by checking out a commit from the end of August, building the image, and running gpustat.
I think gpustat broke for reasons unrelated to our changes: when I re-built the same commit with --no-cache, gpustat no longer worked.

What estimator did you use to score the result?

What estimator did you use to score the result? Did you verify the scoring across different interfaces? I downloaded your .sgf files and loaded them in Sabaki, and the scoring result is completely different. From the results in your article, I believe you either used an outdated estimator in which dead stones have to be hand-picked, or, if you wrote it yourselves, completely misunderstood how the Go scoring system works.

How to set up the adversarial training?

Hi, I just tried to have the cyclic-adv-s545 model play against the latest 28b model. However, it doesn't seem to work very well, and I would like to do some fine-tuning of my own. I saw some scripts under the kubernetes folder, but I don't know how to run them locally, so are there any instructions for setting up the iterative adversarial training on a local machine? By the way, are there any more recent models? Thanks! @AdamGleave @tomtseng @ed1d1a8d
