
adversarial-policies's People

Contributors

adamgleave, decodyng, kantneel, madhuparna04, michaelddennis

adversarial-policies's Issues

About the results in the YouShallNotPass experiment

Thanks for your nice work!
I tried to reproduce your results by running "multi_train with paper", but the results I get are far below what the paper reports for the YouShallNotPass experiments. The algorithm seems very noisy and to depend heavily on the random seed: I ran 4 experiments with seed = [0, 1, 2, 3] and only one of them surpassed a 0.6 win rate after 20 million steps.
I think you should run more seeds when reporting your results.
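For anyone else sweeping seeds: Sacred experiments accept a seed override on the command line, so individual runs can be pinned per seed. A sketch, assuming the single-run aprl.train entry point honours Sacred's standard seed option and that the env name below is the right one for YouShallNotPass:

python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=0
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=1
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=2
python -m aprl.train with env_name=multicomp/YouShallNotPassHumans-v0 paper seed=3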

Issue running adversarial_policies repository

This is an issue I encountered while trying to run the adversarial-policies repository. Specifically, I ran into an error after executing the following three commands:

docker pull humancompatibleai/adversarial_policies:latest
docker run -it --env MUJOCO_KEY=URL_TO_YOUR_MUJOCO_KEY humancompatibleai/adversarial_policies:latest /bin/bash
python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper
After running the third command, I received the following error message:

/venv/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
ERROR: Invalid length of encrypted section

I am unsure of how to proceed and was wondering if you could offer any guidance or assistance in resolving this issue.

Thank you for your time and I look forward to hearing from you soon.
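One hedged guess at the cause (not a confirmed diagnosis): "Invalid length of encrypted section" is the error MuJoCo raises when the key file it reads is malformed or truncated, which can happen if the MUJOCO_KEY URL serves an HTML error page instead of the raw licence text. Fetching the URL by hand shows what the container would actually download:

curl -sL "$MUJOCO_KEY" | head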

Fix SubprocVecEnv related hang

In some setups (notably not in the Docker container), if the parent process raises an exception after SubprocVecEnv has already been created, the process will hang indefinitely rather than exiting. All the child processes of SubprocVecEnv exit, but the semaphore tracker continues running (blocked on an FD read) and the parent process busy-waits. This seems to be related to the switch from fork to spawn in SubprocVecEnv's use of multiprocessing, which was needed to fix some other threading errors.

This seems quite nasty to track down and is not a show-stopper, so I'm deprioritizing it for now, but it would be good to fix.
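Not the actual fix, but a defensive pattern that sidesteps the hang in similar setups: make sure the vectorized environment is closed even when the parent raises, so the worker processes and their pipes get torn down. A minimal sketch, assuming a Stable Baselines version whose SubprocVecEnv accepts start_method:

import gym
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("CartPole-v1")

if __name__ == "__main__":  # required when the start method is "spawn"
    venv = SubprocVecEnv([make_env for _ in range(4)], start_method="spawn")
    try:
        venv.reset()
        raise RuntimeError("simulated failure in the parent process")
    except RuntimeError:
        pass
    finally:
        venv.close()  # joins the workers so the parent can exit instead of hanging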

Evaluating commands

Hello, thanks for your interesting paper and code. I am really enjoying your work and I have a few small questions.

Q1: Do you have any documentation explaining the main files, configurations and experiments (i.e. all the commands to run for an experiment: training, evaluation, visualization)?

Q2: I ran a few experiments but I get errors when evaluating and visualizing. For example, I run:

python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper, and the output is three files under data/baselines/20220215_124424-default.
Could you please tell me the right commands for evaluating and visualizing the above experiment, and also the right path for the victim?

Evaluation (example): python -m aprl.score_agent with path_of_trained_adversary (the path above) path_of_victim (this is not working)
thanks
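For reference, a hypothetical invocation (the config key names here are an assumption, not confirmed against the repo):

python -m aprl.score_agent with env_name=multicomp/SumoHumans-v0 agent_a_type=zoo agent_a_path=1 agent_b_type=ppo2 agent_b_path=data/baselines/20220215_124424-default/final_model

Sacred's built-in print_config command (python -m aprl.score_agent print_config) lists every configurable key, which is usually the quickest way to work out what score_agent actually expects.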

Checkpointing support with Ray Tune

It would be nice to make modelfree.hyperparams.train_rl a tune.Trainable rather than a function, adding checkpointing support. This would let us use the HyperBand and Population Based Training schedulers. Conceptually this is easy enough: we already support saving models via save_callbacks, and can restore using load_path. However, the interfaces don't quite line up: Ray expects _train to perform one small training step, with _save called in between. There's no good way to make Stable Baselines return part-way. We could call it repeatedly with small total_timesteps, but this would make the reported progress wrong, breaking annealers.
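A rough sketch of the shape this could take (not code from this repo; it assumes Ray's pre-1.0 Trainable API with _setup/_train/_save/_restore, and build_model/load_model are hypothetical helpers standing in for the existing save/restore plumbing):

import os
from ray import tune

class TrainRL(tune.Trainable):
    def _setup(self, config):
        self.model = build_model(config)          # hypothetical helper
        self.steps_per_iter = config["steps_per_iter"]

    def _train(self):
        # One small chunk of training per call, so HyperBand/PBT can act between iterations.
        self.model.learn(total_timesteps=self.steps_per_iter,
                         reset_num_timesteps=False)
        return {"timesteps_this_iter": self.steps_per_iter}

    def _save(self, checkpoint_dir):
        path = os.path.join(checkpoint_dir, "model.pkl")
        self.model.save(path)
        return path

    def _restore(self, checkpoint_path):
        self.model = load_model(checkpoint_path)  # hypothetical helper

As noted above, the awkward part is that repeatedly calling learn() with small total_timesteps makes the step counters, and hence any annealers, see the wrong progress.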

docker install failure

Hello,

We're trying to reproduce the paper and when running the command $ docker build ., this error message showed up:

COPY failed: stat /var/lib/docker/tmp/docker-builder836225598/adversarial-policies/ci/build_venv.sh: no such file or directory

We tried it on different operating systems (macOS and Linux) and we always get the same error. Could you help us with this issue?
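A hedged reading of the error, in case it helps: the COPY failed message means Docker could not find adversarial-policies/ci/build_venv.sh inside the build context it was given, so whatever directory the trailing . points at needs to contain that path. Checking from the directory you run the build in:

ls adversarial-policies/ci/build_venv.sh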

Policy Evaluation question

Both recommendations from the README,

experiments/modelfree/baselines.sh

and

experiments/modelfree/attack_transfer.sh data/aws-public/multi_train/paper/20190429_011349

(logs: https://hastebin.com/raw/nivikiwima and https://hastebin.com/raw/ojebamoqaq respectively),

seem to be attempting to evaluate, but then end in ValueError: Unrecognized config type '['.
They create folders in ~/adversarial-policies/data/aws/score_agents/adversary_masked_init but then don't fill them with anything.

Am I doing something wrong? I assumed Sacred is already configured by the given scripts, since I have not interacted with it directly. I have already imported aws-public.

Really dig the outcome of the paper, by the way. Using pure RL and MDPs to adversarially attack a black-box victim is very cool. YSNP ZooVD1 vs AdvD1 is an interesting case where Adv is still twitching to do the adversarial attack, but can't rely on it solely anymore. Great animations. I found the paper through Two Minute Papers on YouTube and decided to do a school project recreating the results. I got most of the way there, but I can't seem to get this final step of picking and choosing policies to put against each other.

Any help would be appreciated

Question about the victim

As mentioned in the paper, the victim is fixed during training, but I cannot find where the victim's checkpoint is. Could you please help me locate it?
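A hedged pointer rather than a definitive answer: the fixed victims are the pre-trained Zoo agents released with multiagent-competition, so their parameter files ship inside the installed gym_compete package rather than inside this repository. Locating that package shows where to look:

python -c "import os, gym_compete; print(os.path.dirname(gym_compete.__file__))"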

Make Docker image smaller

Currently the Docker image is ~1.8 GB. This is a combination of: a large base image (with X and other tools), and ~700 MB of installed packages (350MB per virtual environment).

Most of the pip packages are the same between virtual environments, so installing them system-wide can save some space. TensorFlow in particular is a space hog. However, this is a fragile setup: I tried, and then pytest (which was installed system-wide) couldn't find packages inside the virtual environment. Definitely fixable, but I'm not sure the risk of ongoing breakage is worth the time savings.

Alternatively, downloading 1.8 GB shouldn't actually be that slow -- perhaps it just needs to be hosted somewhere faster?
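If anyone wants to see where the space actually goes before optimizing, the per-layer breakdown is available without rebuilding:

docker history humancompatibleai/adversarial_policies:latest

That makes it easy to tell whether the base image or the pip-install layers dominate.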

sacred.utils.SacredError: The configuration is read-only in a captured function!

When I run python -m modelfree.multi.train with paper, I get this error:
ERROR - multi_train - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/duo/Lib/adversarial-policies/src/modelfree/multi/train.py", line 50, in multi_train
return run(base_config=train)
File "/home/duo/Lib/adversarial-policies/src/modelfree/multi/common.py", line 156, in run
spec['run'] = trainable_name
sacred.utils.SacredError: The configuration is read-only in a captured function!
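A generic sketch of the usual workaround (not necessarily the right fix for this repo): Sacred hands captured functions read-only containers, so copy the config before mutating it. Here spec stands in for the value being modified in common.py:

def run(base_config, spec):
    spec = dict(spec)               # shallow, mutable copy of the read-only Sacred config
    spec["run"] = "trainable_name"  # this assignment no longer raises SacredError
    return spec                     # note: nested values may still be read-only containers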

my experience of building this repo locally on Ubuntu 16.04

Hello there, thanks for doing this research project and open-sourcing your results. I found it super interesting and decided to try it on my own. Since you didn't say much about building this repo locally, I thought I'd share my experience here.

I checked the dependencies in the Dockerfile and sudo apt-get installed the packages listed on lines 14 to 34. Namely:

sudo apt-get install build-essential \
    curl \
    ffmpeg \
    git \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libosmesa6-dev \
    net-tools \
    parallel \
    python3.7 \
    python3.7-dev \
    python3-pip \
    rsync \
    software-properties-common \
    unzip \
    vim \
    virtualenv

It turns out that the package manager cannot find the python3.7-related packages (python3.7, python3.7-dev, python3-pip), so I installed them based on this link. Namely:

sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7
sudo apt install python3.7-dev
sudo apt install python3-pip

We'll also need cffi; install it with pip3 install cffi.
The next step is to install MuJoCo. I installed it based on this link.

  1. Get a valid license through this link.
  2. Download MuJoCo Pro from here. You'll need mjpro131 and mujoco200.
  3. Unzip mjpro131 and mujoco200 by:
unzip mjpro131_linux.zip -d ~/.mujoco
unzip mujoco200_linux.zip -d ~/.mujoco
  4. Move the mjkey.txt file that should have been emailed to you into the hidden folder ~/.mujoco with mv DOWNLOAD_LOCATION/mjkey.txt ~/.mujoco.
  5. Put the following line into your bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/(username)/.mujoco/mjpro131/bin

I suspect that I'll need to change the environment variable to point to .../mujoco200/bin when training with other Gym environments (see the export sketch after this list), but the above works for replicating one of the experiments in the paper (python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper).

  6. Execute steps 9 and 10 in the tutorial above.
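If you do hit environments that need the newer MuJoCo build, a guess at covering both versions at once (untested; assumes the archives were unpacked to ~/.mujoco/mjpro131 and ~/.mujoco/mujoco200):

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro131/bin:$HOME/.mujoco/mujoco200/bin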

That's it for installing MuJoCo. The next step is just training the agent.

  1. As indicated in the README of this repo, after cloning it, run ci/build_venv.sh, activate the virtual environment with . ./venv/bin/activate, and run pip install -e .

  2. Then train with python -m aprl.train with env_name=multicomp/SumoHumans-v0 paper

Docker build error

I'm trying to reproduce the results but got stuck at the first step. It seems like some dependency of stable-baselines3 is wrong? To reproduce this error, just clone this repo and run docker build .

How to modify the win condition?

Thanks for your nice work!
I am trying to reproduce this work by implementing it myself, but I have some questions about the win condition of Sumo Humans.
I noticed the win condition in the paper is modified: a player wins by remaining standing after their opponent has fallen.
So:
How can I modify the win condition? Can you please give more detailed instructions?
When should I modify the win condition? Does ZooON vs. ZooON use Bansal's win condition or the modified version? And the same for Rand/Zero vs. ZooON.

Which version of gym-compete should I use?

Hello Gleave,

We are trying to reproduce the paper but are running into a problem when evaluating: the results are not the same as in the paper. When pitting the baselines against a random policy in Sumo Humans, they usually tie rather than win.

I notice the winning condition was modified, so this may mean my gym-compete version is wrong.

I tried to clone that version with git clone https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc but get this error: fatal: unable to access 'https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc/': The requested URL returned error: 400

This URL cannot be accessed. How can I get the right gym-compete?

Thanks a lot!
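The @3a3f9dc suffix is pip's requirement-specifier syntax, not something git clone understands, which is why GitHub returns a 400 for that URL. Either of the following should fetch that revision (using the commit hash from your message):

pip install git+https://github.com/HumanCompatibleAI/multiagent-competition.git@3a3f9dc

or

git clone https://github.com/HumanCompatibleAI/multiagent-competition.git
cd multiagent-competition
git checkout 3a3f9dc
pip install -e .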

Make Ray Tune work with Autoscaler

We're experiencing ray-project/ray#5189 with Ray 0.7.2. We are currently working around it by downgrading to 0.7.

It seems to be caused by total_available_capacity being absent from heartbeats when there are no available resources.

Tasks:

  • Check if the bug persists in the latest release, 0.7.6.
  • Come up with a small reproducible example.
  • See if it happens outside of Ray Tune, e.g. with standard ray.remote tasks when resources are saturated (see the sketch below).
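A sketch of the kind of minimal reproduction the last task describes: saturate the available CPUs with plain ray.remote tasks (no Tune involved) and check whether queued tasks still get scheduled once capacity frees up. This is a hypothetical repro, not a confirmed trigger for the bug:

import time
import ray

ray.init(num_cpus=2)

@ray.remote(num_cpus=1)
def busy(seconds):
    time.sleep(seconds)
    return seconds

# More tasks than CPUs, so the scheduler must hand out work as resources free up.
print(ray.get([busy.remote(1) for _ in range(8)]))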

Handle Preemption Gracefully

Have Ray Tune retry jobs that fail due to preemption.

Minimal viable product: have Ray retry from scratch. This will require changing https://github.com/ray-project/ray/blob/master/python/ray/tune/trial.py#L313 to remove the checkpoint_freq > 0 check.

For score, this is all we want to do, since those tasks should not be long running anyway and checkpointing would be painful.

For train, there is actually a benefit from checkpointing, and in fact we already store checkpoints: Ray just doesn't know about them. Making the Stable Baselines and Ray APIs play nicely together may be tricky. Provided we don't do annealing, though, I think we can just call learn() repeatedly with small timesteps (see the sketch below).
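A minimal sketch of that idea (not the repo's training loop; it assumes the TF1 stable-baselines API, a toy environment, and that annealing is disabled as discussed above):

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
model = PPO2("MlpPolicy", env, verbose=0)

for step in range(5):  # each iteration is roughly what one Ray _train call would cover
    model.learn(total_timesteps=2048, reset_num_timesteps=False)
    model.save("/tmp/checkpoint_%d" % step)  # what a _save hook would persist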
