agent's Introduction

Agent

A Tensorboard plugin to explore reinforcement learning models at the timestep level. A project by Andrew Schreiber and Fabian Steuer.

Saliency heatmap demo

Observations

The goal of the Atari game Enduro is to pass other cars without colliding. We trained two models, one for 3,000 episodes and the other for 10 episodes, and visualize both using Agent.

The perturbation saliency heatmap below is generated by measuring where blurring part of the Atari frame produces a large change in the model's estimate of the frame's expected reward. The blue overlay marks where the model is 'paying attention'. See this paper for more details.
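The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Agent's actual implementation: the frame is assumed to be a 2-D grayscale array, `value_fn` is a hypothetical stand-in for the model's expected-reward estimate, and the helper names, `stride`, and `sigma` are chosen here for illustration only.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Return a normalized 1-D Gaussian kernel covering about +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def blur(frame, sigma):
    """Blur a 2-D frame with a separable Gaussian filter (zero-padded edges).

    Assumes both frame dimensions are larger than the kernel length.
    """
    kernel = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, frame, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def gaussian_mask(shape, center, sigma):
    """Return a 2-D mask that is 1 at `center` and decays smoothly towards 0."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    dist2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def perturbation_saliency(frame, value_fn, stride=4, sigma=3.0):
    """Score each grid location by how much a local blur changes value_fn.

    `value_fn` is a hypothetical callable mapping a frame to a scalar
    expected-reward estimate; it stands in for the trained model.
    """
    frame = np.asarray(frame, dtype=np.float64)
    blurred = blur(frame, sigma)
    baseline = value_fn(frame)
    ys = range(0, frame.shape[0], stride)
    xs = range(0, frame.shape[1], stride)
    saliency = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            mask = gaussian_mask(frame.shape, (y, x), sigma)
            # Blend the blurred frame into the original around (y, x) only.
            perturbed = frame * (1.0 - mask) + blurred * mask
            saliency[i, j] = 0.5 * (baseline - value_fn(perturbed)) ** 2
    return saliency


# Toy usage: a "model" whose value depends only on one small patch.
toy_frame = np.zeros((32, 32))
toy_frame[8:12, 8:12] = 1.0
toy_saliency = perturbation_saliency(
    toy_frame, lambda f: f[8:12, 8:12].sum(), stride=4, sigma=2.0
)
```

In the toy usage, saliency is highest at grid locations near the patch the toy value function reads and close to zero elsewhere, which is the behaviour the blue overlay visualizes. Upsampling the coarse grid to frame resolution and overlaying it on the image is left out of the sketch.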

What do you notice from these 20 frames? (Advanced tip: download the gifs and step through each frame)

3,000 episodes of training

Expert

10 episodes of training

Noob

One observation is that the well-trained model focuses substantially on the cars, especially when the agent's car is close to passing another car. Meanwhile, the untrained model doesn't pay much attention to the cars specifically; rather, its attention meanders randomly across the screen.

Why is this interesting? Suppose it had turned out that the well-trained model was barely paying attention to the cars at all. That would mean the 'expert' had learned some trick in its environment that is indiscernible to humans, which may not generalize or may be otherwise problematic from a safety perspective. A graph of loss or averaged rewards would not give you this insight; your metrics would simply tell you the model had learned well.

Live example

Updated Nov 26, 2018

http://li592-70.members.linode.com:6006/#agent

Purpose / Musings

Today, it is surprisingly difficult to understand why a reinforcement learning or inverse reinforcement learning agent makes a decision.

At distill.pub we have seen impressive techniques and tooling emerge for interpreting supervised learning beyond summary statistics. Why do we find a void of usable, open-source interpretability techniques for reinforcement learning? Victoria Krakovna made a well-reasoned call for more research in deep RL interpretability for AI Safety at a NIPS workshop a year ago. It seems there is much to be explored about why an RL agent chooses actions moment by moment, and such work would be valuable for debugging and understanding, yet the subfield has published little since 2017. What is causing the paralysis?

We observe that a primary bottleneck is ill-fitting tooling. From experience, the current process to extract and save the relevant network activations and episode frames is laborious and complex. Even if you succeed, the techniques you build tend to be tightly coupled to your project (see this group who made a compelling deep RL interpretability tool, but to use it you have to be running their version of Lua and Windows 10).

We find the above state of affairs frustrating for a subfield of technical AI Safety potentially ripe with low-hanging fruit. We believe RL and IRL research would be safer if the field had a well-documented platform for interpreting agents using standard, popular tools (Unix, Python, Tensorflow, Tensorboard).

The purpose of Agent is to accelerate progress in deep RL/IRL interpretability. We are very interested in perspectives from people in the interpretability, deep RL/IRL, and AI Safety communities. Please share your feedback through GitHub issues.


Goals

Agent v0 targets Dec 1st with two deep learning interpretability techniques, t-SNE and saliency heatmaps, which we hope will prove immediately useful. v0 will include an API you can integrate into your new or existing RL model training code.

The scope of Agent v1 is still under development. For researchers with fresh insight into RL interpretability, Agent v1 aims to support custom visualizations, with the goal of reducing the overhead of developing new techniques by an order of magnitude. Furthermore, we aim for documentation and examples that make it straightforward to get started, plus test coverage and a basic style guide for maintainability.

Agent was built in Python within Tensorboard due to the visualization suite's robustness and popularity among researchers. We hope someday Agent could be merged into Tensorboard itself like the Beholder plugin.

Setup (Work in progress)

Note: Agent is currently built for demonstration purposes.

Packages required (recommended version):

Python virtual environment (v3.6)

Bazel build tool from Google. Install guide in link. (v0.21.0)

Tensorflow (v1.13.1)

Then:

git clone https://github.com/andrewschreiber/agent.git
cd agent

# Install API layer in your Python virtual environment
pip install .

# Build takes ~7m on a 2015 Macbook
bazel build tensorboard:tensorboard

# Use the custom tensorboard build by running
./bazel-bin/tensorboard/tensorboard --logdir tb/logdirectory/logs

Tensorboard

To visualize training, use the following command to set up Baselines to write Tensorboard log files.

export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' OPENAI_LOGDIR=logs

Return to the original terminal tab, at the root of rlmonitor, and run your training:

python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5

Go to the linked URL in the tensorboard tab to see your model train.

Run Cartpole with DQN

cd examples/baselines

Follow the instructions at https://github.com/andrewschreiber/baselines to install Gym. Then:

Train a model:

python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5

agent's People

Contributors

andrewharp, andrewschreiber, benoitsteiner, caisq, chihuahua, dongjoon-hyun, dsmilkov, erzel, francoisluus, gunan, jameswex, jart, keveman, martinwicke, nfelt, nsthorat, qiuminxu, renatoutsch, rmlarsen, rnabel, rohan100jain, sam-mccall, stephanwlee, tayo, teamdandelion, tensorflower-gardener, trisolaran, wchargin, yifeif, zheng-xq

agent's Issues

Images

Ticket for hosting images for readme.

expert
noob

Readme contains git conflicts

<<<<<<< HEAD
python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
=======

and

>>>>>>> 5fc3c8cea4b5f79c738345686a218f089b58ddba

No module named tensorflow

When installing on Ubuntu 18.04 LTS, I get this issue:

root@localhost:~/agent# ./bazel-bin/tensorboard/tensorboard --logdir logs
Traceback (most recent call last):
  File "/root/agent/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 39, in <module>
    from tensorboard import default
  File "/root/agent/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/default.py", line 34, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

Pip freeze:

root@localhost:~/agent# pip3 freeze
absl-py==0.6.1
agent==0.0.1
asn1crypto==0.24.0
astor==0.7.1
attrs==17.4.0
Automat==0.6.0
certifi==2018.1.18
chardet==3.0.4
click==6.7
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==2.1.4
distro-info==0.18
futures==3.1.1
gast==0.2.0
grpcio==1.16.1
h5py==2.8.0
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
iotop==0.6
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
Markdown==3.0.1
numpy==1.15.4
PAM==0.4.2
protobuf==3.6.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycrypto==2.6.1
pygobject==3.26.1
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.3
python-debian==0.1.32
pyxdg==0.25
PyYAML==3.12
requests==2.18.4
requests-unixsocket==0.1.5
SecretStorage==2.3.1
service-identity==16.0.0
six==1.11.0
ssh-import-id==5.7
systemd-python==234
tensorboard==1.12.0
tensorflow==1.12.0
termcolor==1.1.0
Twisted==17.9.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
Werkzeug==0.14.1
zope.interface==4.3.2
