
go_attack's Introduction

Go Attack

This repository contains code for studying the adversarial robustness of KataGo.

Read about our research here: https://arxiv.org/abs/2211.00241.

View our website here: https://goattack.far.ai/.

To run our adversary with Sabaki, see this guide.

Development / testing information

To clone this repository, run one of the following commands:

# Via HTTPS
git clone --recurse-submodules https://github.com/AlignmentResearch/go_attack.git

# Via SSH
git clone --recurse-submodules git@github.com:AlignmentResearch/go_attack.git

You can run pip install -e .[dev] inside the project root directory to install all necessary dependencies.

To run a pre-commit script before each commit, run pre-commit install (pre-commit should already have been installed in the previous step). You may also want to run pre-commit install from engines/KataGo-custom to install that repository's respective commit hook.
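Putting the clone and setup steps together, a typical first-time setup looks like this (assuming pip and pre-commit are on your PATH):

```shell
# Clone with submodules, install dev dependencies, and set up commit hooks.
git clone --recurse-submodules https://github.com/AlignmentResearch/go_attack.git
cd go_attack
pip install -e ".[dev]"
pre-commit install
# Optionally install the submodule's own hook as well.
(cd engines/KataGo-custom && pre-commit install)
```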

Git submodules

Modifications to KataGo are not tracked in this repository and should instead be made to the AlignmentResearch/KataGo-custom repository. We use code from KataGo-custom in this repository via a Git submodule.
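If you cloned without --recurse-submodules, the submodule can be fetched afterwards with standard Git commands:

```shell
# Initialize and fetch all submodules, including KataGo-custom.
git submodule update --init --recursive
```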

Individual containers

We run KataGo within Docker containers. More specifically:

  1. The C++ portion of KataGo runs in the container defined by compose/cpp/Dockerfile.
  2. The Python training portion of KataGo runs in the container defined at compose/python/Dockerfile.

Each Dockerfile contains instructions for building its image.

After building a container, you run it with a command like

docker run --gpus all -v ~/go_attack:/go_attack -v DATA_DIR:/shared -it humancompatibleai/goattack:cpp

where DATA_DIR is a directory, shared among all containers, in which to save the results of training runs.

A KataGo executable can be found in the /engines/KataGo-custom/cpp directory inside the C++ container.
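For example, you could start the engine in GTP mode from inside the container; the model and config file names below are placeholders, not files guaranteed to be present in the image:

```shell
cd /engines/KataGo-custom/cpp
# MODEL.bin.gz and gtp.cfg are placeholder names for your own files.
./katago gtp -model MODEL.bin.gz -config gtp.cfg
```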

Launching victim-play training runs

To launch a training run, run several containers simultaneously:

  • One or more 1-GPU C++ containers executing victim-play games to generate data. Example command to run in each container: /go_attack/kubernetes/victimplay.sh [--warmstart] EXPERIMENT-NAME /shared/, where the optional --warmstart flag should be set for warmstarted runs.
  • One 1-GPU Python container for training. Example command: /go_attack/kubernetes/train.sh [--initial-weights WARMSTART-MODEL-DIR] EXPERIMENT-NAME /shared/ 1.0 where the optional --initial-weights WARMSTART-MODEL-DIR flag should be set for warmstarted runs.
  • One Python container for shuffling data. Example command: /go_attack/kubernetes/shuffle-and-export.sh [--preseed WARMSTART-SELFPLAY-DIR] EXPERIMENT-NAME /shared where the optional --preseed flag should be set for warmstarted runs.
  • One Python container for running the curriculum. Example command: /go_attack/kubernetes/curriculum.sh EXPERIMENT-NAME /shared/ /go_attack/configs/examples/cyclic-adversary-curriculum.json -harden-below-visits 100.
    • The victims listed in the curriculum .json file are assumed to exist in /shared/victims. They can be symlinks.
  • Optionally, one 1-GPU C++ container for evaluating models. Example command: /go_attack/kubernetes/evaluate-loop.sh /shared/victimplay/EXPERIMENT-NAME/ /shared/victimplay/EXPERIMENT-NAME/eval.

See configs/examples for example experiment configurations and example values for the warmstart flags.

For these wrapper scripts in kubernetes/, optional flags for the wrapper come before any positional arguments, but optional flags for the underlying command the wrapper calls go after any positional arguments. For example, in the command /go_attack/kubernetes/shuffle-and-export.sh --preseed WARMSTART-SELFPLAY-DIR EXPERIMENT-NAME /shared -add-to-window 100000000, --preseed is a flag for the wrapper whereas -add-to-window is a flag to be passed to /engines/KataGo-tensorflow/python/selfplay/shuffle_and_export_loop.sh.

Docker compose

Within the compose directory of this repo are a few docker-compose .yml files that automate the process of spinning up the various components of training.

Each .yml file also has a corresponding .env file that configures more specific parameters of the run (e.g. which directory to write to, how many threads to use, the batch size, and where to look for other config files).

(Note: we stopped using these in October 2022, so they are no longer maintained.)
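Although no longer maintained, these files follow the standard Docker Compose invocation pattern; the file names below are illustrative placeholders, not actual files in compose/:

```shell
# EXAMPLE.yml and EXAMPLE.env stand in for a compose file and its
# matching environment file under compose/.
docker compose -f compose/EXAMPLE.yml --env-file compose/EXAMPLE.env up
```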

Website and analysis notebooks

See AlignmentResearch/KataGoVisualizer.

Baseline attacks

In addition to the learned attacks, we also implement five hardcoded baseline attacks:

  • Edge attack, which plays random vertices in the outermost available ring of the board
  • Random attack, which simply plays random legal moves
  • Pass attack, which always passes at every turn
  • Spiral attack, which deterministically plays the "largest" legal move in lexicographical order in polar coordinates (going counterclockwise starting from the outermost ring)
  • Mirror Go, which plays the opponent's last move reflected about the y = x diagonal, or the y = -x diagonal if they play on y = x. If the mirrored vertex is taken, then the policy plays the "closest" legal vertex by L1 distance.
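As an illustration, the Mirror Go reflection rule above can be sketched in Python. This is an illustrative sketch, not the repository's implementation; it assumes 0-indexed coordinates on an n x n board:

```python
def mirror_move(x, y, n):
    """Reflect (x, y) about the y = x diagonal, or about the y = -x
    anti-diagonal if the move already lies on y = x."""
    if x == y:
        return (n - 1 - y, n - 1 - x)  # anti-diagonal reflection
    return (y, x)  # main-diagonal reflection

def closest_legal(target, legal_moves):
    """If the mirrored vertex is taken, fall back to the closest
    legal vertex by L1 (Manhattan) distance."""
    return min(legal_moves,
               key=lambda v: abs(v[0] - target[0]) + abs(v[1] - target[1]))
```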

You can test these attacks by running baseline_attacks.py with the appropriate --strategy flag (edge, random, pass, spiral, or mirror). Run python scripts/baseline_attacks.py --help for more information about all the available flags.
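For instance, to play out the edge attack (only --strategy is documented above; consult --help for the remaining flags):

```shell
python scripts/baseline_attacks.py --strategy edge
```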


go_attack's Issues

KataGo misconfiguration invalidates the main result.

Your config file for KataGo does not set the friendlyPassOk: false option, so KataGo's rules are not set to Tromp-Taylor. KataGo will perform a "friendly early pass", which is what you report in your paper.

Tromp-Taylor configuration is prescribed here: https://github.com/lightvector/KataGo/blob/master/docs/GTP_Extensions.md

To summarize, your bot and your judging code work with Tromp-Taylor rules while KataGo does not.

This misconfiguration is the root cause of your network being able to exploit KataGo. I'm sorry that this invalidates the main result of your paper.

Notably, there was a case where a human player exploited the rules in a similar way.

Fix broken ci (invalid credentials)

See https://app.circleci.com/pipelines/github/HumanCompatibleAI/go_attack/755/workflows/a2effdee-b29a-49db-ae2a-9dd5e9aaad6f/jobs/2249 as an example.

Log:

Using SSH Config Dir '/home/circleci/.ssh'
git version 2.35.1
Cloning git repository
Cloning into '.'...
Warning: Permanently added the ECDSA host key for IP address '140.82.112.4' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

exit status 128

More in-depth training statistics

It would be helpful to get more informative statistics during selfplay/victimplay training.

Useful statistics to track:

  • Win rate
  • Fraction of games won by resignation
  • Win-margin
  • Number of moves taken

It would be nice to be able to plot all of the above as a function of training epoch, with the following filters:

  • Color
  • Board size
  • Whether game was won or lost

A checked item means we already support that feature in some capacity.


Can you reupload vit-victim-b16-s650m bin.gz file to the drive?

Hi, I have noticed that the vit-victim-b16-s650m model in your Google Drive is not a bin.gz file. I would like to use your ViT adversary to attack our Transformer-based models. However, the ViT adversary needs a victim model in bin.gz format, so I am wondering if you could upload your victim model in bin.gz format, or give instructions on how to convert the .pt file into the right bin.gz format.
Thank you very much.
@AdamGleave @tomtseng @ed1d1a8d

`nvidia-smi`/`gpustat` does not work in C++ container

gpustat used to work in humancompatibleai/goattack:cpp. I confirmed this by checking out a commit from the end of August, building the image, and running gpustat.
I think gpustat broke for reasons unrelated to our changes: when I re-built the same commit with --no-cache, gpustat no longer worked.

What estimator did you use to score the result?

What estimator did you use to score the result? Did you verify the scoring across different interfaces? I downloaded your .sgf files and loaded them in Sabaki, and the scoring result is completely different. From the results in your article, I believe you either used an outdated estimator in which dead stones have to be hand-picked, or, if you wrote it yourselves, completely misunderstood how the Go scoring system works.

How to set up the adversarial training?

Hi, I just tried to have the cyclic-adv-s545 model play against the latest 28b model. However, it doesn't seem to work very well, and I would like to do some fine-tuning of my own. I saw some scripts under the kubernetes folder, but I don't know how to run them locally, so are there any instructions for setting up the iterative adversarial training on a local machine? By the way, are there any more recent models? Thanks! @AdamGleave @tomtseng @ed1d1a8d
