
imitation's Introduction

Status: Archive (code is provided as-is, no updates expected)

Generative Adversarial Imitation Learning

Jonathan Ho and Stefano Ermon

Contains an implementation of Trust Region Policy Optimization (Schulman et al., 2015).

Dependencies:

  • OpenAI Gym >= 0.1.0, mujoco_py >= 0.4.0
  • numpy >= 1.10.4, scipy >= 0.17.0, theano >= 0.8.2
  • h5py, pytables, pandas, matplotlib

Provided files:

  • expert_policies/* are the expert policies, trained by TRPO (scripts/run_rl_mj.py) on the true costs
  • scripts/im_pipeline.py is the main training and evaluation pipeline. This script is responsible for sampling data from experts to generate training data, running the training code (scripts/imitate_mj.py), and evaluating the resulting policies.
  • pipelines/* are the experiment specifications provided to scripts/im_pipeline.py
  • results/* contain evaluation data for the learned policies

imitation's People

Contributors

christopherhesse, hojonathanho


imitation's Issues

why compute bernoulli entropy in this way?

In the code, the Bernoulli entropy is computed this way:

    def logit_bernoulli_entropy(logits_B):
        ent_B = (1.-tensor.nnet.sigmoid(logits_B))*logits_B - logsigmoid(logits_B)
        return ent_B

But this looks different from the usual binary entropy formula:
$-p\log p - (1-p)\log(1-p)$

Is there a relationship between these two expressions? Why does OpenAI compute the Bernoulli entropy this way, and is there a theoretical justification for it?
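The two expressions appear to be the same quantity written in terms of the logit rather than the probability. With $p = \sigma(x)$ we have $\log p = \log\sigma(x)$ and $1 - p = e^{-x}\sigma(x)$, so $\log(1-p) = \log\sigma(x) - x$. Substituting gives $-p\log p - (1-p)(\log p - x) = (1-\sigma(x))\,x - \log\sigma(x)$, which matches the code. The logit form avoids evaluating $\log(1-p)$ when $p$ is close to 1. A minimal numerical check in plain Python follows; the helper names `log_sigmoid`, `sigmoid`, `binary_entropy`, and `max_abs_gap` are illustrative and are not taken from the repository.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) for either sign of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def sigmoid(x):
    return math.exp(log_sigmoid(x))

def logit_bernoulli_entropy(x):
    # Same expression as in the issue, for a scalar logit x.
    return (1.0 - sigmoid(x)) * x - log_sigmoid(x)

def binary_entropy(p):
    # Textbook form: -p log p - (1 - p) log(1 - p), for 0 < p < 1.
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def max_abs_gap(logits):
    # Largest absolute difference between the two forms over the given logits.
    return max(
        abs(logit_bernoulli_entropy(x) - binary_entropy(sigmoid(x)))
        for x in logits
    )

if __name__ == "__main__":
    print(max_abs_gap([-5.0, -1.0, 0.0, 0.5, 3.0]))
```

For moderate logits the gap should be at the level of floating-point rounding. For very large positive logits the textbook form loses precision because $1 - p$ rounds toward zero, while the logit form does not need that subtraction.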

Running the code

For anyone struggling to run the code:

  1. Set floatX=float64 in your Theano flags.
  2. The code as it stands only works on a cluster. To run individual experiments instead of the full phase1_train, do something like:

         for x, y in zip(cmd_templates, argdicts):
             print(x.format(**y))

     then pick the commands you want and run them individually. In step2_eval, write something like:

         if not os.path.exists(checkptfile):
             nonexistent_checkptfiles.append(checkptfile)
         else:
             evals_to_do.append((task, alg, num_trajs, run, checkptfile))

     instead of the assertion that would otherwise prevent you from running only some of the experiments.
  3. I had to disable multithreading. If you run into threading issues, set the number of threads to 1 wherever it is configured.
  4. Some things are set up only for the Humanoid environment in MuJoCo. For instance, if you are using a different environment, track_body_name in environments/rlgymenv.py should be set to None, and there is no sim.env.viewer; use other methods such as sim.env.render() if you want to view things.
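For step 1, one way to set the flag is through the THEANO_FLAGS environment variable, which Theano reads at import time. The sketch below sets it from Python before Theano is imported; the helper `with_theano_flag` is a hypothetical convenience function, not part of this repository or of Theano.

```python
import os

def with_theano_flag(existing, key, value):
    """Return a THEANO_FLAGS string with key=value set, keeping other flags."""
    # Drop empty entries and any previous setting of the same key.
    flags = [f for f in existing.split(",") if f and not f.startswith(key + "=")]
    flags.append("{}={}".format(key, value))
    return ",".join(flags)

# Must run before the first `import theano` in the process.
os.environ["THEANO_FLAGS"] = with_theano_flag(
    os.environ.get("THEANO_FLAGS", ""), "floatX", "float64"
)

# import theano  # uncomment in a real script; left out so the sketch has no dependencies
```

Setting the variable in the shell that launches the script, or adding floatX to a .theanorc file, should have the same effect.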

Documentation and Getting Started Tutorial

Hi,
Could you please provide detailed documentation of how the code is organized and which algorithms are available? A "getting started" tutorial on running im_pipeline.py with standard/default command-line arguments would also be very helpful.
Thank you,
Anirban

Regarding MountainCar-v0 env

Dear authors,

Thanks for sharing this high-quality code. I created expert policies by running scripts/scripts/run_rl_mj.py on CartPole-v0 successfully, but when I run the same script on MountainCar-v0, the score stays at -200. Did you use any extra configuration for MountainCar-v0 in your experiments?

Thanks!

Question about this package

Hi, I am confused about how to use this package. Is there a detailed description of the API anywhere?

'File' object has no attribute 'getNode'

I'm getting this error when executing "log.write_snapshot(policy,i)" at line 393 of policyopt.nn.py (while running the main method in imitate_mj.py). Does anyone know how to deal with this issue?
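A likely cause, assuming the error comes from a PyTables file handle: PyTables 3.x renamed its camelCase methods to PEP 8 names, so `File.getNode` became `File.get_node`, and the old names were later removed. Installing an older PyTables or renaming the call in the source are two possible ways around it. The sketch below shows a small compatibility helper that works with either naming; the free function `get_node` is hypothetical and not part of the repository.

```python
def get_node(h5file, where, name=None):
    """Call get_node (new PyTables name) or fall back to getNode (old name)."""
    fn = getattr(h5file, "get_node", None) or getattr(h5file, "getNode")
    if name is None:
        return fn(where)
    return fn(where, name)
```

Call sites of the form `f.getNode(where, name)` would then become `get_node(f, where, name)`.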

Json example

Hi,
Great job! It would be helpful to see an example JSON file.
Thanks,
Nir

problems with parameter setting

We are trying to run your project, but we are not sure what 'dataset' in the pipelines refers to exactly.
I set the parameters following the pipeline specifications, like this:

    python imitate_mj.py
        --mode ga
        --env Humanoid-v1
        --data log_humanoid_1.h5
        --limit_trajs 160
        --data_subsamp_freq 20
        --favor_zero_expert_reward 0
        --min_total_sa 50000
        --max_iter 1501
        --reward_include_time 0
        --reward_lr .01
        -log {out}

Where is my mistake?
Thank you very much.
