openai / imitation

Code for the paper "Generative Adversarial Imitation Learning"

Home Page: https://arxiv.org/abs/1606.03476

License: MIT License

Python 100.00%

imitation's Introduction

Status: Archive (code is provided as-is, no updates expected)

Generative Adversarial Imitation Learning

Jonathan Ho and Stefano Ermon

Contains an implementation of Trust Region Policy Optimization (Schulman et al., 2015).

Dependencies:

OpenAI Gym >= 0.1.0, mujoco_py >= 0.4.0
numpy >= 1.10.4, scipy >= 0.17.0, theano >= 0.8.2
h5py, pytables, pandas, matplotlib

Provided files:

expert_policies/* are the expert policies, trained by TRPO (scripts/run_rl_mj.py) on the true costs
scripts/im_pipeline.py is the main training and evaluation pipeline. This script is responsible for sampling data from experts to generate training data, running the training code (scripts/imitate_mj.py), and evaluating the resulting policies.
pipelines/* are the experiment specifications provided to scripts/im_pipeline.py
results/* contain evaluation data for the learned policies

imitation's Issues

Cannot run im_pipeline.py

What are the arguments required by this script? Is there an example?

why compute bernoulli entropy in this way?

As written in the code here, the Bernoulli entropy is computed in this way:

imitation/policyopt/thutil.py

Lines 48 to 51 in 99fbccf

def logit_bernoulli_entropy(logits_B):
    ent_B = (1.-tensor.nnet.sigmoid(logits_B))*logits_B - logsigmoid(logits_B)
    return ent_B

But this differs from the usual equation for binary entropy:
$-p\log p - (1-p)\log(1-p)$

Is there any relationship between these two expressions? Why does OpenAI compute the Bernoulli entropy that way, and is there a theoretical justification for it?
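One way to see the relationship, offered as a sketch rather than the authors' own explanation: write p = sigmoid(x) for a logit x, so that x = log p - log(1-p) and logsigmoid(x) = log p. Substituting gives (1-p)x - log p = (1-p) log p - (1-p) log(1-p) - log p = -p log p - (1-p) log(1-p), which is the binary entropy. The check below confirms this numerically using only the standard library; the function names are illustrative and do not come from the repository.

```python
import math


def sigmoid(x):
    # Logistic function mapping a logit to a probability.
    return 1.0 / (1.0 + math.exp(-x))


def logit_form_entropy(x):
    # Same algebraic form as in thutil.py: (1 - sigmoid(x)) * x - log(sigmoid(x))
    p = sigmoid(x)
    return (1.0 - p) * x - math.log(p)


def binary_entropy(p):
    # Textbook form: -p log p - (1 - p) log(1 - p)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


# The two forms agree for a range of moderate logits.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(logit_form_entropy(x), binary_entropy(sigmoid(x)),
                        rel_tol=1e-9, abs_tol=1e-12)
```

A plausible motivation for the logit form is numerical: it never evaluates log(1-p) directly, which becomes unreliable when p is very close to 1.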

Running the code

For anyone struggling to run the code:

1) Set floatX=float64 in your Theano flags.
2) The code as it is right now only works on a cluster. If you want to run individual experiments instead of doing phase1_train, do something like:

for x, y in zip(cmd_templates, argdicts):
    print(x.format(**y))

and then pick the commands you want and run them individually. In step2_eval, write something like:

if not os.path.exists(checkptfile):
    nonexistent_checkptfiles.append(checkptfile)
else:
    evals_to_do.append((task, alg, num_trajs, run, checkptfile))

instead of the assertion that would prevent you from running only some of the experiments.
3) I had to disable multithreading. If you run into these issues you can set the number of threads to 1 where you find it.
4) Some things are set up only for the humanoid environment in mujoco. For instance, if you are using a different environment, in environments/rlgymenv.py, track_body_name should be set to None and there is no sim.env.viewer -- use other methods like sim.env.render() if you want to view things.
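The first two steps can be sketched as follows. This is only an illustration under stated assumptions: the template string and argument dictionary are hypothetical stand-ins for what the pipeline builds (the flag names and values are taken from commands quoted elsewhere on this page), and the real script may require further arguments. THEANO_FLAGS is Theano's standard environment variable and has to be set before theano is imported.

```python
import os

# Step 1: request float64 before any "import theano" happens in this process.
os.environ.setdefault("THEANO_FLAGS", "floatX=float64")

# Hypothetical stand-ins for the pipeline's cmd_templates and argdicts.
cmd_templates = [
    "python scripts/imitate_mj.py --mode {mode} --env {env} "
    "--data {data} --max_iter {max_iter}",
]
argdicts = [
    {"mode": "ga", "env": "Humanoid-v1", "data": "log_humanoid_1.h5", "max_iter": 1501},
]


def render_commands(templates, args):
    # Step 2: fill each template with its arguments instead of submitting it to a cluster.
    return [t.format(**a) for t, a in zip(templates, args)]


commands = render_commands(cmd_templates, argdicts)
for cmd in commands:
    print(cmd)
```

Each printed line can then be copied and run by hand, one experiment at a time.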

Step order difference between code and algorithm in paper

Hello @hojonathanho,

Does the code here swap the order of steps 4 and 5 of the algorithm presented in the paper?

Documentation and Getting Started Tutorial

Hi,
Could you please provide detailed documentation of how the code is organized and which algorithms are available? Also, a "getting started" tutorial on running im_pipeline.py with standard/default command line arguments would be very helpful.
Thank you,
Anirban

Regarding MountainCar-v0 env

Dear authors,

Thanks for sharing this high-quality code. I created expert policies by running scripts/scripts/run_rl_mj.py on CartPole-v0 successfully, but when I run MountainCar-v0 with the same script, the score remains at -200. Did you apply any extra configuration to MountainCar-v0 during your experiments?

Thanks!

Question about this package

Hi, I am confused about this file when using it. Is there a detailed description of the API anywhere?

'File' object has no attribute 'getNode'

I'm getting this error when executing "log.write_snapshot(policy,i)" in line 393 of policyopt.nn.py (while running the main method in imitate_mj.py). Does anyone know how to deal with this issue?
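A likely cause, stated as an assumption because the thread does not confirm it: PyTables 3.0 renamed its camelCase methods to PEP 8 names (getNode became get_node), and later releases dropped the old spellings, so code written against the old API fails on a newer install. The helper below is a hypothetical shim, not part of the repository; the two small classes only stand in for old and new File objects so the example runs without PyTables, and the "/snapshots" path is made up.

```python
def get_node_compat(h5file, *args, **kwargs):
    # Prefer the PEP 8 name used by PyTables >= 3.0, fall back to the old camelCase one.
    getter = getattr(h5file, "get_node", None)
    if getter is None:
        getter = getattr(h5file, "getNode")
    return getter(*args, **kwargs)


class _OldStyleFile(object):
    # Stand-in for a File object that only has the camelCase method.
    def getNode(self, where):
        return ("old", where)


class _NewStyleFile(object):
    # Stand-in for a File object that only has the PEP 8 method.
    def get_node(self, where):
        return ("new", where)


old_result = get_node_compat(_OldStyleFile(), "/snapshots")
new_result = get_node_compat(_NewStyleFile(), "/snapshots")
```

The alternatives are to rename the calls in the source to get_node, or to install an older PyTables release that still provides getNode.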

Json example

Hi,
Great job! It would be helpful to see an example JSON file.
Thanks,
Nir

problems with parameter setting

We tried to run your project, but we are not sure what exactly 'dataset' in pipelines refers to.
I set the parameters following the instructions in pipelines, like this:
python imitate_mj.py
--mode ga
--env Humanoid-v1
--data log_humanoid_1.h5
--limit_trajs 160
--data_subsamp_freq 20
--favor_zero_expert_reward 0
--min_total_sa 50000
--max_iter 1501
--reward_include_time 0
--reward_lr .01
-log {out}

What am I doing wrong?
Thank you very much.