Dual Policy Distillation
This repository contains the experiment code for Dual Policy Distillation (IJCAI'20). Please see our paper for more details.
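At a high level, dual policy distillation trains two peer policies that learn from the environment while also distilling knowledge from each other. The snippet below is only an illustrative sketch of that mutual-distillation idea on a toy bandit problem, written in plain NumPy; the `Policy` class, the REINFORCE and distillation steps, the 50-step reward window, and all learning rates are simplifications invented for this example and are not the code in `baselines/dpd_ddpg` or `baselines/dpd_ppo`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit; arm 2 has the highest mean reward.
TRUE_MEANS = np.array([0.1, 0.4, 0.9])

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class Policy:
    """A softmax policy over the three arms."""
    def __init__(self):
        self.logits = np.zeros(3)

    def act(self):
        p = softmax(self.logits)
        return rng.choice(3, p=p), p

def reinforce_step(policy, lr=0.3):
    """One REINFORCE update from the policy's own experience."""
    a, p = policy.act()
    r = TRUE_MEANS[a] + 0.1 * rng.standard_normal()
    grad = -p
    grad[a] += 1.0  # d log pi(a) / d logits for a softmax policy
    policy.logits += lr * r * grad
    return r

def distill_step(student, teacher, dis_lr=0.1):
    """Pull the student's action distribution toward the teacher's
    (gradient descent on the cross-entropy w.r.t. student logits)."""
    ps = softmax(student.logits)
    pt = softmax(teacher.logits)
    student.logits -= dis_lr * (ps - pt)

p1, p2 = Policy(), Policy()
r1_hist, r2_hist = [], []
for step in range(3000):
    r1_hist.append(reinforce_step(p1))
    r2_hist.append(reinforce_step(p2))
    # Each learner distills from its peer only when the peer has
    # recently performed better -- a crude stand-in for the
    # advantage-weighted distillation used in the paper.
    if step > 50:
        m1, m2 = np.mean(r1_hist[-50:]), np.mean(r2_hist[-50:])
        if m2 > m1:
            distill_step(p1, p2)
        elif m1 > m2:
            distill_step(p2, p1)

print(softmax(p1.logits), softmax(p2.logits))
```

With mutual distillation, whichever learner discovers the best arm first pulls its peer along, so both policies end up concentrated on it.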
Installation
- Please follow the instructions to install MuJoCo and mujoco-py.
- Make sure that you have Python 3.5+ and pip installed, then clone the repository and install the dependencies:

```
git clone https://github.com/datamllab/dual-policy-distillation.git
cd dual-policy-distillation
pip install -r requirements.txt
pip install -e .
```
Example
- DPD_DDPG

```
python baselines/dpd_ddpg/main.py --env-id HalfCheetah-v2 --num-timesteps 5000000 --nb-epochs 2500 --dis-batch-size 64 --actor-dis-lr 1e-4 --exp-scale 0.75
```
- DPD_PPO

```
python baselines/dpd_ppo/main.py --env HalfCheetah-v2 --num-timesteps 20000000 --exp-scale 0.5
```
Hyperparameters
- DPD_DDPG

```
usage: main.py [-h] [--env-id ENV_ID] [--render-eval] [--no-render-eval]
               [--layer-norm] [--no-layer-norm] [--render] [--no-render]
               [--normalize-returns] [--no-normalize-returns]
               [--normalize-observations] [--no-normalize-observations]
               [--seed SEED] [--critic-l2-reg CRITIC_L2_REG]
               [--batch-size BATCH_SIZE] [--dis-batch-size DIS_BATCH_SIZE]
               [--actor-lr ACTOR_LR] [--actor-dis-lr ACTOR_DIS_LR]
               [--critic-lr CRITIC_LR] [--exp-scale EXP_SCALE] [--popart]
               [--no-popart] [--gamma GAMMA] [--reward-scale REWARD_SCALE]
               [--clip-norm CLIP_NORM] [--nb-epochs NB_EPOCHS]
               [--nb-epoch-cycles NB_EPOCH_CYCLES]
               [--nb-train-steps NB_TRAIN_STEPS]
               [--nb-dis-train-steps NB_DIS_TRAIN_STEPS]
               [--nb-eval-steps NB_EVAL_STEPS]
               [--nb-rollout-steps NB_ROLLOUT_STEPS] [--noise-type NOISE_TYPE]
               [--num-timesteps NUM_TIMESTEPS] [--evaluation]
               [--no-evaluation] [--log_dir LOG_DIR]
```
- DPD_PPO

```
usage: main.py [-h] [--env-id ENV_ID] [--seed SEED]
               [--num-timesteps NUM_TIMESTEPS] [--play] [--log-dir LOG_DIR]
               [--exp-scale EXP_SCALE]
```
Citation
```
@article{lai2020dual,
  title={Dual Policy Distillation},
  author={Lai, Kwei-Herng and Zha, Daochen and Li, Yuening and Hu, Xia},
  journal={arXiv preprint arXiv:2006.04061},
  year={2020}
}
```