PyTorch implementation of Dense-connection-based Off-policy adversarial Imitation Learning (DOIL).
DOIL trains the imitation policy with the TD3 algorithm and integrates dense connections into both the actor and critic networks. Both TD3 and the dense connections help improve the sample efficiency of GAIL.
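As a rough illustration of the dense-connection idea, here is a D2RL-style actor in which every hidden layer also receives the raw state as input. This is only a sketch: the layer widths, depth, and exact wiring are assumptions, not the repository's actual architecture.

```python
import torch
import torch.nn as nn

class DenseActor(nn.Module):
    """Sketch of a dense-connection (D2RL-style) actor network."""

    def __init__(self, state_dim, action_dim, max_action, hidden=256, depth=4):
        super().__init__()
        layers = [nn.Linear(state_dim, hidden)]
        for _ in range(depth - 1):
            # Dense connection: each hidden layer sees the previous
            # activation concatenated with the raw state.
            layers.append(nn.Linear(hidden + state_dim, hidden))
        self.hidden_layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, action_dim)
        self.max_action = max_action

    def forward(self, state):
        x = torch.relu(self.hidden_layers[0](state))
        for layer in self.hidden_layers[1:]:
            x = torch.relu(layer(torch.cat([x, state], dim=-1)))
        # TD3-style bounded action output.
        return self.max_action * torch.tanh(self.out(x))
```

The same concatenation pattern can be applied inside the twin critics, with the (state, action) pair fed into every hidden layer.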
The method is evaluated on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained with PyTorch 1.4 and Python 3.7.
We use the official TD3 code from D2RL to train the expert agent; the trained agent is then used to generate expert trajectories. Expert data for Ant-v2, BipedalWalker-v3, HalfCheetah-v2, Hopper-v2, Reacher-v2, and Walker2d-v2 is available at this Google Drive site.
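Generating expert data amounts to rolling out the trained agent and recording transitions. The helper below is hypothetical (the repository's actual script and data format may differ) and uses the pre-0.26 Gym step API that matches this project's dependencies.

```python
import numpy as np

def collect_trajectories(env, policy, num_episodes=10):
    """Roll out a trained policy and store (state, action, next_state) tuples.

    Hypothetical helper sketching how expert trajectories could be
    generated; `policy` maps a state to an action.
    """
    states, actions, next_states = [], [], []
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            # Old Gym API (pre-0.26): step returns (obs, reward, done, info).
            next_state, _, done, _ = env.step(action)
            states.append(state)
            actions.append(action)
            next_states.append(next_state)
            state = next_state
    return np.array(states), np.array(actions), np.array(next_states)
```

Storing next states alongside (state, action) pairs lets the same data serve both the standard and the state-only discriminator settings.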
The ablation experiments can be reproduced by running:
./run_ablation.sh
The main experiments for DOIL can be reproduced by running:
./run_experiments.sh
Experiments with different reward types, or with state-only transitions, can be run by changing the arguments reward_type and states_only, respectively.
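For illustration, an argparse setup for these two flags might look like the sketch below. The entry-point name, default value, and flag semantics are assumptions; only the argument names reward_type and states_only come from this README.

```python
import argparse

# Hypothetical parser fragment; the repository's actual entry point,
# defaults, and accepted values may differ.
parser = argparse.ArgumentParser(description="DOIL training (sketch)")
parser.add_argument("--reward_type", type=str, default="airl",
                    help="Form of the imitation reward derived from the discriminator.")
parser.add_argument("--states_only", action="store_true",
                    help="Train the discriminator on (s, s') pairs instead of (s, a).")

args = parser.parse_args(["--reward_type", "gail", "--states_only"])
```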
If the argument wdail is set to true, a WGAN objective is used to train the discriminator; give it a try!
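For reference, the WGAN-style discriminator (critic) objective can be sketched as below. This is a generic formulation of the WGAN loss, not the repository's exact implementation, and it omits the weight clipping or gradient penalty that stable WGAN training requires in practice.

```python
import torch

def wgan_discriminator_loss(d_expert, d_policy):
    """WGAN-style critic objective for the discriminator (a sketch).

    d_expert / d_policy are the discriminator's raw, unbounded outputs on
    expert and policy samples. The critic maximizes
    E[D(expert)] - E[D(policy)], so we minimize its negation.
    """
    return d_policy.mean() - d_expert.mean()
```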