
adversarial-policies's People

Contributors

adamgleave, decodyng, kantneel, madhuparna04, michaelddennis

adversarial-policies's Issues

About the results in the YouShallNotPass experiment

Thanks for your nice work!
I tried to reproduce your results by running "multi_train with paper", but the results I get are far below what the paper reports for the YouShallNotPass experiments. The algorithm seems very noisy and to depend heavily on the random seed: I ran 4 experiments with seed = [0, 1, 2, 3] and only one of them surpassed a 0.6 win rate after 20 million steps.
I think you should run more seeds when reporting your results.
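For anyone else sweeping seeds: Sacred experiments accept a seed override on the command line, so individual runs can be pinned per seed. A sketch, assuming the single-run aprl.train entry point honours Sacred's standard seed option and that the env name below is the right one for YouShallNotPass:

python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=0
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=1
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=2
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=3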

Issue running adversarial_policies repository

This is an issue I encountered while trying to run the adversarial-policies repository. Specifically, I ran into an error after executing the following three commands:

docker pull humancompatibleai/adversarial_policies:latest
docker run -it --env MUJOCO_KEY=URL_TO_YOUR_MUJOCO_KEY humancompatibleai/adversarial_policies:latest /bin/bash
python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper
After running the third command, I received the following error message:

/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
ERROR: Invalid length of encrypted section

I am unsure of how to proceed and was wondering if you could offer any guidance or assistance in resolving this issue.

Thank you for your time and I look forward to hearing from you soon.
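One hedged guess at the cause (not a confirmed diagnosis): "Invalid length of encrypted section" is the error MuJoCo raises when the key file it reads is malformed or truncated, which can happen if the MUJOCO_KEY URL serves an HTML error page instead of the raw licence text. Fetching the URL by hand shows what the container would actually download:

curl -sL "$MUJOCO_KEY" | head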

Fix SubprocVecEnv related hang

In some setups (notably not in the Docker container), if the parent process raises an exception after SubprocVecEnv has already been created, the process will hang indefinitely rather than exiting. All the child processes of SubprocVecEnv exit, but the semaphore tracker continues running (blocked on an FD read) and the parent process busy-waits. This seems to be related to the switch from fork to spawn in SubprocVecEnv's use of multiprocessing, which was needed to fix some other threading errors.

This seems quite nasty to track down and is not a show-stopper, so I'm deprioritizing it for now, but it would be good to fix.
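Not the actual fix, but a defensive pattern that sidesteps the hang in similar setups: make sure the vectorized environment is closed even when the parent raises, so the worker processes and their pipes get torn down. A minimal sketch, assuming a Stable Baselines version whose SubprocVecEnv accepts start_method:

import gym
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("CartPole-v1")

if __name__ == "__main__":  # required when the start method is "spawn"
    venv = SubprocVecEnv([make_env for _ in range(4)], start_method="spawn")
    try:
        venv.reset()
        raise RuntimeError("simulated failure in the parent process")
    except RuntimeError:
        pass
    finally:
        venv.close()  # joins the workers so the parent can exit instead of hanging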

Evaluating commands

Hello, thanks for your interesting paper and code. I am really enjoying your work and I have a few small questions.

Q1: Do you have any documentation explaining the main files, configurations and experiments (i.e. all the commands to run for an experiment: training, evaluation, visualization)?

Q2: I ran a few experiments but I get errors when evaluating and visualizing. For example, I run:

python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper, and the output is three files under data/baselines/20220215_124424-default.
Could you please tell me the right commands for evaluating and visualizing the above experiment, and also the right path for the victim?

Evaluation (example): python -m aprl.score_agent with path_of_trained_adversary (the path above) path_of_victim (this is not working)
thanks
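For reference, a hypothetical invocation (the config key names here are an assumption, not confirmed against the repo):

python -m aprl.score_agent with env_name=multicomp/SumoHumans-v0 agent_a_type=zoo agent_a_path=1 agent_b_type=ppo2 agent_b_path=data/baselines/20220215_124424-default/final_model

Sacred's built-in print_config command (python -m aprl.score_agent print_config) lists every configurable key, which is usually the quickest way to work out what score_agent actually expects.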

Checkpointing support with Ray Tune

It would be nice to make modelfree.hyperparams.train_rl a tune.Trainable rather than a function, adding checkpointing support. This would let us use the HyperBand and Population Based Training schedulers. Conceptually this is easy enough: we already support saving models via save_callbacks, and can restore using load_path. However, the interfaces don't quite line up: Ray expects _train to perform one small training step, with _save called in between. There's no good way to make Stable Baselines return part-way. We could call it repeatedly with small total_timesteps, but this would make the reported progress wrong, breaking annealers.
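A rough sketch of the shape this could take (not code from this repo; it assumes Ray's pre-1.0 Trainable API with _setup/_train/_save/_restore, and build_model/load_model are hypothetical helpers standing in for the existing save/restore plumbing):

import os
from ray import tune

class TrainRL(tune.Trainable):
    def _setup(self, config):
        self.model = build_model(config)          # hypothetical helper
        self.steps_per_iter = config["steps_per_iter"]

    def _train(self):
        # One small chunk of training per call, so HyperBand/PBT can act between iterations.
        self.model.learn(total_timesteps=self.steps_per_iter,
                         reset_num_timesteps=False)
        return {"timesteps_this_iter": self.steps_per_iter}

    def _save(self, checkpoint_dir):
        path = os.path.join(checkpoint_dir, "model.pkl")
        self.model.save(path)
        return path

    def _restore(self, checkpoint_path):
        self.model = load_model(checkpoint_path)  # hypothetical helper

As noted above, the awkward part is that repeatedly calling learn() with small total_timesteps makes the step counters, and hence any annealers, see the wrong progress.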

docker install failure

Hello,

We're trying to reproduce the paper and when running the command $ docker build ., this error message showed up:

COPY failed: stat /var/lib/docker/tmp/docker-builder836225598/adversarial-policies/ci/build_venv.sh: no such file or directory

We tried it on different operating systems (macOS and Linux) and we always get the same error. Could you help us with this issue?
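A hedged reading of the error, in case it helps: the COPY failed message means Docker could not find adversarial-policies/ci/build_venv.sh inside the build context it was given, so whatever directory the trailing . points at needs to contain that path. Checking from the directory you run the build in:

ls adversarial-policies/ci/build_venv.sh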

Policy Evaluation question

Both recommendations from the README,

experiments/modelfree/baselines.sh

and

experiments/modelfree/attack_transfer.sh data/aws-public/multi_train/paper/20190429_011349

(logs: https://hastebin.com/raw/nivikiwima and https://hastebin.com/raw/ojebamoqaq respectively),

seem to be attempting to evaluate, but then end in ValueError: Unrecognized config type '['.
They create folders in ~/adversarial-policies/data/aws/score_agents/adversary_masked_init but then don't fill them with anything.

Am I doing something wrong? I assumed Sacred is already configured by the given scripts, since I have not interacted with it directly. I have already imported aws-public.

Really dig the outcome of the paper, by the way. Using pure RL and MDPs to adversarially attack a black-box victim is very cool. YSNP ZooVD1 vs AdvD1 is an interesting case where Adv is still twitching to do the adversarial attack, but can't rely on it solely anymore. Great animations. I found the paper through Two Minute Papers on YouTube and decided to do a school project recreating the results. I got most of the way there, but I can't seem to get this final step of picking and choosing policies to put against each other.

Any help would be appreciated

Question about the victim

As mentioned in the paper, the victim is fixed during training, but I cannot find where the victim's checkpoint is. Could you please help me locate it?
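A hedged pointer rather than a definitive answer: the fixed victims are the pre-trained Zoo agents released with multiagent-competition, so their parameter files ship inside the installed gym_compete package rather than inside this repository. Locating that package shows where to look:

python -c "import os, gym_compete; print(os.path.dirname(gym_compete.__file__))"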

Make Docker image smaller

Currently the Docker image is ~1.8 GB. This is a combination of: a large base image (with X and other tools), and ~700 MB of installed packages (350MB per virtual environment).

Most of the pip packages are the same between virtual environments, so installing them system-wide can save some space. TensorFlow in particular is a space hog. However, this is a fragile setup: I tried, and then pytest (which was installed system-wide) couldn't find packages inside the virtual environment. Definitely fixable, but I'm not sure the risk of ongoing breakage is worth the time savings.

Alternatively, downloading 1.8 GB shouldn't actually be that slow -- perhaps it just needs to be hosted somewhere faster?
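If anyone wants to see where the space actually goes before optimizing, the per-layer breakdown is available without rebuilding:

docker history humancompatibleai/adversarial_policies:latest

That makes it easy to tell whether the base image or the pip-install layers dominate.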

sacred.utils.SacredError: The configuration is read-only in a captured function!

When I run python -m modelfree.multi.train with paper, I get this error:
ERROR - multi_train - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/duo/Lib/adversarial-policies/src/modelfree/multi/train.py", line 50, in multi_train
return run(base_config=train)
File "/home/duo/Lib/adversarial-policies/src/modelfree/multi/common.py", line 156, in run
spec['run'] = trainable_name
sacred.utils.SacredError: The configuration is read-only in a captured function!
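A generic sketch of the usual workaround (not necessarily the right fix for this repo): Sacred hands captured functions read-only containers, so copy the config before mutating it. Here spec stands in for the value being modified in common.py:

def run(base_config, spec):
    spec = dict(spec)               # shallow, mutable copy of the read-only Sacred config
    spec["run"] = "trainable_name"  # this assignment no longer raises SacredError
    return spec                     # note: nested values may still be read-only containers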

my experience of building this repo locally on Ubuntu 16.04

Hello there, thanks for doing this research project and open-sourcing your results. I found it super interesting and decided to try it on my own. Since you didn't say much about building this repo locally, I thought I'd share my experience here.

I checked the dependencies in the Dockerfile and sudo apt-get installed the packages listed on lines 14 to 34. Namely:

sudo apt-get install build-essential \
    curl \
    ffmpeg \
    git \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libosmesa6-dev \
    net-tools \
    parallel \
    python3.7 \
    python3.7-dev \
    python3-pip \
    rsync \
    software-properties-common \
    unzip \
    vim \
    virtualenv

It turns out that the package manager cannot find the python3.7-related packages (python3.7, python3.7-dev, python3-pip), so I installed them based on this link. Namely:

sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7
sudo apt install python3.7-dev
sudo apt install python3-pip

We'll also need cffi; install it with pip3 install cffi.
The next step is to install MuJoCo. I installed it based on this link.

  1. Get a valid license through this link.
  2. Download MuJoCo Pro from here. You'll need mjpro131 and mujoco200.
  3. Unzip mjpro131 and mujoco200 by:
unzip mjpro131_linux.zip -d ~/.mujoco
unzip mujoco200_linux.zip -d ~/.mujoco
  4. Move the mjkey.txt file that should have been emailed to you into the hidden folder ~/.mujoco with mv DOWNLOAD_LOCATION/mjkey.txt ~/.mujoco.
  5. Put the following line into your bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/(username)/.mujoco/mjpro131/bin

I suspect that I'll need to change the environment variable to point to .../mujoco200/bin when training with other Gym environments (see the export sketch after this list), but the above works for replicating one of the experiments in the paper (python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper).

  6. Execute steps 9 and 10 in the tutorial above.
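If you do hit environments that need the newer MuJoCo build, a guess at covering both versions at once (untested; assumes the archives were unpacked to ~/.mujoco/mjpro131 and ~/.mujoco/mujoco200):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro131/bin:$HOME/.mujoco/mujoco200/bin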

That's it for installing MuJoCo. The next step is just training the agent.

  1. As indicated in the README of this repo, after cloning it, run ci/build_venv.sh, activate the virtual environment with . ./venv/bin/activate, and run pip install -e .

  2. Then train with python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper

Docker build error

I'm trying to reproduce the results but got stuck at the first step. It seems like some dependency of stable-baselines3 is wrong? To reproduce this error, just clone this repo and run docker build .

How to modify the win condition?

Thanks for your nice work!
I am trying to reproduce this work by implementing it myself, but I have some questions about the win condition of Sumo Humans.
I noticed the win condition in the paper is modified: a player wins by remaining standing after their opponent has fallen.
So:
How can I modify the win condition? Can you please give more detailed instructions?
When should I modify the win condition? Does ZooON vs. ZooON use Bansal's win condition or the modified version? And the same for Rand/Zero vs. ZooON.

Which version of gym-compete should I use?

Hello Gleave,

We are trying to reproduce the paper but are running into a problem when evaluating: the results are not the same as in the paper. When pitting the baselines against a random policy in Sumo Humans, they usually tie rather than win.

I notice the winning condition was modified, so this may mean my gym-compete version is wrong.

I tried to clone that version with git clone https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc but get this error: fatal: unable to access 'https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc/': The requested URL returned error: 400

This URL cannot be accessed. How can I get the right gym-compete?

Thanks a lot!
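The @3a3f9dc suffix is pip's requirement-specifier syntax, not something git clone understands, which is why GitHub returns a 400 for that URL. Either of the following should fetch that revision (using the commit hash from your message):

pip install git+https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc

or

git clone https://github.com/HumanCompatibleAI/multiagent-competition.git
cd multiagent-competition
git checkout 3a3f9dc
pip install -e .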

Make Ray Tune work with Autoscaler

We're experiencing ray-project/ray#5189 with Ray 0.7.2. We are currently working around it by downgrading to 0.7.

It seems to be caused by total_available_capacity being absent from heartbeats when there are no available resources.

Tasks:

  • Check if the bug persists in the latest release, 0.7.6.
  • Come up with a small reproducible example.
  • See if it happens outside of Ray Tune, e.g. with standard ray.remote tasks when resources are saturated (see the sketch below).
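A sketch of the kind of minimal reproduction the last task describes: saturate the available CPUs with plain ray.remote tasks (no Tune involved) and check whether queued tasks still get scheduled once capacity frees up. This is a hypothetical repro, not a confirmed trigger for the bug:

import time
import ray

ray.init(num_cpus=2)

@ray.remote(num_cpus=1)
def busy(seconds):
    time.sleep(seconds)
    return seconds

# More tasks than CPUs, so the scheduler must hand out work as resources free up.
print(ray.get([busy.remote(1) for _ in range(8)]))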

Handle Preemption Gracefully

Have Ray Tune retry jobs that fail due to preemption.

Minimal viable product: have Ray retry from scratch. This will require changing https://github.com/ray-project/ray/blob/master/python/ray/tune/trial.py#L313 to remove the checkpoint_freq > 0 check.

For score, this is all we want to do, since those tasks should not be long running anyway and checkpointing would be painful.

For train, there is actually a benefit from checkpointing, and in fact we already store checkpoints: Ray just doesn't know about them. Making the Stable Baselines and Ray APIs play nicely together may be tricky. Provided we don't do annealing, though, I think we can just call learn() repeatedly with small timesteps (see the sketch below).
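A minimal sketch of that idea (not the repo's training loop; it assumes the TF1 stable-baselines API, a toy environment, and that annealing is disabled as discussed above):

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
model = PPO2("MlpPolicy", env, verbose=0)

for step in range(5):  # each iteration is roughly what one Ray _train call would cover
    model.learn(total_timesteps=2048, reset_num_timesteps=False)
    model.save("/tmp/checkpoint_%d" % step)  # what a _save hook would persist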
