ibc's Introduction

Implicit Behavioral Cloning

This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper:

Implicit Behavioral Cloning (website link) (arXiv link)
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson
Conference on Robot Learning (CoRL) 2021

Abstract

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
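To make the implicit versus explicit distinction concrete: an explicit policy maps an observation directly to an action, while an implicit policy picks the action that minimizes an energy function E(o, a). The toy sketch below is not code from this repository; the demonstration data, the hand-written `energy` function (standing in for a learned model), and the candidate grid are all assumptions chosen to show how a multi-valued target affects each approach.

```python
# Toy demonstrations: one observation paired with two equally valid actions.
actions = [-1.0, 1.0, -1.0, 1.0]

# Explicit MSE regression: the constant minimizing mean squared error is the mean.
mse_prediction = sum(actions) / len(actions)


def energy(a):
    """Hypothetical energy, low near either demonstrated action.

    It stands in for a learned E(o, a) at a single fixed observation.
    """
    return min((a - target) ** 2 for target in set(actions))


# Implicit policy: choose the candidate action with the lowest energy.
candidates = [i / 10.0 for i in range(-20, 21)]
implicit_prediction = min(candidates, key=energy)

print("MSE prediction:", mse_prediction)
print("Implicit prediction:", implicit_prediction)
```

The MSE prediction lands between the two modes, on an action that never appears in the data, whereas the energy minimizer returns one of the demonstrated actions.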

Prerequisites

The code for this project requires Python 3.7+ and the following pip packages:

python3 -m pip install --upgrade pip
pip install \
  absl-py==0.12.0 \
  gin-config==0.4.0 \
  matplotlib==3.4.3 \
  mediapy==1.0.3 \
  opencv-python==4.5.3.56 \
  pybullet==3.1.6 \
  scipy==1.7.1 \
  tensorflow==2.6.0 \
  keras==2.6.0 \
  tf-agents==0.11.0rc0 \
  tqdm==4.62.2
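Version drift in these packages is a plausible source of test failures, so it can help to compare what is installed against the pins above. The following is a minimal sketch, not part of the repository; the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
"""Compare installed package versions against the pins listed above."""
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata

# Pins copied from the pip install command above.
PINNED = {
    "absl-py": "0.12.0",
    "gin-config": "0.4.0",
    "matplotlib": "3.4.3",
    "mediapy": "1.0.3",
    "opencv-python": "4.5.3.56",
    "pybullet": "3.1.6",
    "scipy": "1.7.1",
    "tensorflow": "2.6.0",
    "keras": "2.6.0",
    "tf-agents": "0.11.0rc0",
    "tqdm": "4.62.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.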

(Optional): For MuJoCo support, see docs/mujoco_setup.md. We recommend skipping it unless you specifically want to run the Adroit and Kitchen environments.

Quickstart: from 0 to a trained IBC policy in 10 minutes.

Step 1: Install the Python packages listed above in Prerequisites.

Step 2: Run the unit tests (this should take less than a minute) from the directory just above the top-level ibc directory:

./ibc/run_tests.sh

Step 3: Check that TensorFlow has GPU access:

python3 -c "import tensorflow as tf; print(bool(tf.config.list_physical_devices('GPU')))"

If the above prints False, see the following requirements, notably CUDA 11.2 and cuDNN 8.1.0: https://www.tensorflow.org/install/gpu#software_requirements.

Step 4: Let's try an example Block Pushing task. First, download the oracle data (or see Tasks for how to generate it):

cd ibc/data
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip block_push_states_location.zip && rm block_push_states_location.zip
cd ../..

Step 5: Set PYTHONPATH to include the directory just above top-level ibc, so if you've been following the commands above it is:

export PYTHONPATH=$PYTHONPATH:${PWD}
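To confirm that the path is set as intended, you can ask Python whether it can locate the ibc package. This is a small sketch under the assumption that the check is run from the same shell; the helper name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Expected to print True once PYTHONPATH includes the directory above ibc.
    print("ibc importable:", is_importable("ibc"))
```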

Step 6: On that example Block Pushing task, we'll next do a training + evaluation with Implicit BC:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

Some notes:

  • On an example single-GPU machine (RTX 2080 Ti), the above trains at about 18 steps/sec and should reach high success rates within 5,000 to 10,000 steps (roughly 5-10 minutes of training).
  • The mlp_ebm.gin is just one config, which is meant to be reasonably fast to train, with only 20 evals at each interval, and is not suitable for all tasks. See Tasks for more configs.
  • Due to the --video flag, you can watch a video of the learned policy in action under /tmp/ibc_logs/mlp_ebm/ibc_dfo/... Navigate to the videos/ttl=7d subfolder; by default, one example .mp4 video is saved at every evaluation interval.
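Because the exact run directory depends on the run, a recursive search can be easier than navigating by hand. The sketch below assumes the default /tmp/ibc_logs log root mentioned above; the helper name `find_videos` is hypothetical.

```python
from pathlib import Path


def find_videos(log_root="/tmp/ibc_logs"):
    """Return every .mp4 file below log_root, oldest first."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.mp4"), key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    for video in find_videos():
        print(video)
```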

(Optional) Step 7: For the pybullet-based tasks, we also have real-time interactive visualization set up through a visualization server, so in one terminal:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 -m pybullet_utils.runServer

And in a different terminal run the oracle a few times with the --shared_memory flag:

cd <path_to>/ibc/..
export PYTHONPATH=$PYTHONPATH:${PWD}
python3 ibc/data/policy_eval.py -- \
  --alsologtostderr \
  --shared_memory \
  --num_episodes=3 \
  --policy=oracle_push \
  --task=PUSH

You're done with Quickstart! See below for more Tasks, and also see docs/codebase_overview.md and docs/workflow.md for additional info.

Tasks

Task: Particle

In this task, the goal is for the agent (black dot) to first go to the green dot, then the blue dot.
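The ordering requirement is what makes this task more than simple goal reaching. As an illustration only, the following sketch checks whether a trajectory visits the two goals in the required order. It is not the environment's actual success criterion; the function name and the `radius` threshold are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def reached_goals_in_order(trajectory, green, blue, radius=0.05):
    """True if the path comes within radius of green and, afterwards, of blue."""
    reached_green = False
    for point in trajectory:
        if not reached_green:
            reached_green = distance(point, green) <= radius
        elif distance(point, blue) <= radius:
            return True
    return False


path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(reached_goals_in_order(path, green=(1.0, 0.0), blue=(1.0, 1.0)))
```

Because the helper works on coordinate tuples of any length, the same check applies to the higher-dimensional variants of the task.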

Example IBC policy Example MSE policy

Get Data

We can either generate data from scratch, for example for 2D (takes 15 seconds):

./ibc/ibc/configs/particle/collect_data.sh

Or just download all the data for all different dimensions:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/particle.zip
unzip particle.zip && rm particle.zip
cd ../..

Train and Evaluate

Let's start with some small networks, on just the 2D version since it's easiest to visualize, and compare MSE and IBC. Here's a small-network (256x2) IBC-with-Langevin config, where 2 is the argument for the environment dimensionality.

./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 2
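For readers unfamiliar with Langevin sampling, the idea is gradient descent on the energy with Gaussian noise injected at each step. Below is a minimal one-dimensional sketch on a hand-written quadratic energy. The function names, step size, noise scale, and clipping range are illustrative assumptions and are not taken from this codebase or its configs.

```python
import random


def energy_grad(a, target=0.5):
    """Gradient of the toy energy E(a) = (a - target)^2."""
    return 2.0 * (a - target)


def langevin_sample(a0, steps=200, step_size=0.05, noise_scale=0.01, seed=0):
    """Run a Langevin-style chain: noisy gradient steps on the energy."""
    rng = random.Random(seed)
    a = a0
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        a = a - step_size * energy_grad(a) + noise_scale * noise
        a = max(-1.0, min(1.0, a))  # keep the action inside assumed bounds
    return a


print(langevin_sample(-1.0))
```

Starting from the edge of the action range, the chain settles close to the energy minimum at 0.5, with small residual jitter from the noise term.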

And here's an identically sized network (256x2) but with an MSE config:

./ibc/ibc/configs/particle/run_mlp_mse.sh 2

For the above configurations, we suggest comparing the rollout videos, which you can find at /tmp/ibc_logs/...corresponding_directory../videos/. The comparison shown at the top of this section is for the two configs above at 10,000 training steps.

And here are the best configs for IBC (with Langevin) and MSE, respectively, in this case run on the 16-dimensional environment:

./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 16
./ibc/ibc/configs/particle/run_mlp_mse_best.sh 16

Note: the _best config is kind of slow for Langevin to train, but even just ./ibc/ibc/configs/particle/run_mlp_ebm_langevin.sh 16 (smaller network) seems to solve the 16-D environment pretty well, and is much faster to train.

Task: Block Pushing (from state observations)

Get Data

We can either generate data from scratch (~2 minutes for 2,000 episodes: 200 each across 10 replicas):

./ibc/ibc/configs/pushing_states/collect_data.sh

Or we can download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_states_location.zip
unzip 'block_push_states_location.zip' && rm block_push_states_location.zip
cd ../..

Train and Evaluate

Here's a reasonably fast-to-train config for IBC with DFO:

./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh
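DFO here stands for derivative-free optimization: instead of following energy gradients, candidate actions are sampled, weighted by their energy, resampled, and perturbed with shrinking noise. The sketch below shows that loop on a toy one-dimensional energy. It is an illustration under stated assumptions, not the repository's implementation; the sample count, iteration count, noise schedule, and all names are hypothetical.

```python
import math
import random


def toy_energy(a, target=0.3):
    """Toy energy with a single minimum at target."""
    return (a - target) ** 2


def dfo_minimize(energy, low=-1.0, high=1.0, num_samples=256, num_iters=3,
                 sigma=0.33, shrink=0.5, seed=0):
    """Sampling-based minimization: weight, resample, perturb, shrink noise."""
    rng = random.Random(seed)
    samples = [rng.uniform(low, high) for _ in range(num_samples)]
    for _ in range(num_iters):
        energies = [energy(a) for a in samples]
        # Softmax over negative energies: low energy means high weight.
        e_min = min(energies)
        weights = [math.exp(-(e - e_min)) for e in energies]
        # Resample with replacement in proportion to the weights.
        samples = rng.choices(samples, weights=weights, k=num_samples)
        # Perturb each sample and clip it back into the action bounds.
        samples = [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in samples]
        sigma *= shrink
    return min(samples, key=energy)


print(dfo_minimize(toy_energy))
```

No gradient of the energy is needed, which is the main practical difference from the Langevin variant.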

Or here's a config for IBC with Langevin:

./ibc/ibc/configs/pushing_states/run_mlp_ebm_langevin.sh

Or here's a comparable, reasonably fast-to-train config for MSE:

./ibc/ibc/configs/pushing_states/run_mlp_mse.sh

Or to run the best configs for IBC, MSE, and MDN, respectively (some of these might be slower to train than the above):

./ibc/ibc/configs/pushing_states/run_mlp_ebm_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mse_best.sh
./ibc/ibc/configs/pushing_states/run_mlp_mdn_best.sh

Task: Block Pushing (from image observations)

Get Data

Download data from the web:

cd ibc/data/
wget https://storage.googleapis.com/brain-reach-public/ibc_data/block_push_visual_location.zip
unzip 'block_push_visual_location.zip' && rm block_push_visual_location.zip
cd ../..

Train and Evaluate

Here is an IBC with Langevin configuration which should actually converge faster than the IBC-with-DFO that we reported in the paper:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_langevin.sh

And here are the best configs for IBC (with DFO), MSE, and MDN, respectively:

./ibc/ibc/configs/pushing_pixels/run_pixel_ebm_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mse_best.sh
./ibc/ibc/configs/pushing_pixels/run_pixel_mdn_best.sh

Task: D4RL Adroit and Kitchen

Get Data

The D4RL human demonstration training data used for the paper submission can be downloaded using the commands below. This data has been processed into a .tfrecord format from the original D4RL data format:

cd ibc/data && mkdir -p d4rl_trajectories && cd d4rl_trajectories
wget https://storage.googleapis.com/brain-reach-public/ibc_data/door-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/hammer-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-complete-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-mixed-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/kitchen-partial-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/pen-human-v0.zip \
     https://storage.googleapis.com/brain-reach-public/ibc_data/relocate-human-v0.zip
unzip '*.zip' && rm *.zip
cd ../../..

Train and Evaluate

In a test on a 2080 Ti GPU, the IBC config below trains at only 1.7 steps/sec, but it is about 10x faster on TPUv3. Here are the best configs for IBC (with Langevin) and MSE, respectively:

./ibc/ibc/configs/d4rl/run_mlp_ebm_langevin_best.sh pen-human-v0
./ibc/ibc/configs/d4rl/run_mlp_mse_best.sh pen-human-v0

The above commands will run on the pen-human-v0 environment, but you can swap this argument for any of the provided Adroit/Kitchen environments.
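To sweep over all of the environments, the commands can be generated programmatically. The sketch below assumes that the accepted environment arguments match the dataset names in the download URLs above, which holds for pen-human-v0 but is not confirmed here for the others. The helper name `build_commands` is hypothetical.

```python
# Environment names taken from the dataset zip files listed above.
ENVIRONMENTS = [
    "door-human-v0",
    "hammer-human-v0",
    "kitchen-complete-v0",
    "kitchen-mixed-v0",
    "kitchen-partial-v0",
    "pen-human-v0",
    "relocate-human-v0",
]
SCRIPT = "./ibc/ibc/configs/d4rl/run_mlp_ebm_langevin_best.sh"


def build_commands(script=SCRIPT, environments=ENVIRONMENTS):
    """Return one [script, environment] command per environment."""
    return [[script, env] for env in environments]


if __name__ == "__main__":
    for command in build_commands():
        # To launch instead of printing, pass the list to
        # subprocess.run(command, check=True).
        print(" ".join(command))
```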

Here also is an MDN config you can try. The network size is tiny but if you increase it heavily then it seems to get NaNs during training. In general MDNs can be finicky. A solution should be possible though.

./ibc/ibc/configs/d4rl/run_mlp_mdn.sh pen-human-v0
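The cause of the NaNs is not diagnosed here. One common source of non-finite values in mixture density losses in general is underflow when component densities are multiplied and summed directly. The sketch below shows the usual log-sum-exp formulation of a one-dimensional Gaussian mixture negative log-likelihood. It is a generic illustration, not this repository's loss, and whether it addresses the behavior described above is an open assumption.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) by factoring out the largest value."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def gaussian_log_pdf(x, mean, std):
    """Log density of a one-dimensional Gaussian."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def mixture_nll(x, weights, means, stds):
    """Negative log-likelihood of x under a Gaussian mixture, in log space."""
    terms = [math.log(w) + gaussian_log_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    return -log_sum_exp(terms)


# A point far from every component still yields a finite loss.
print(mixture_nll(1000.0, [0.5, 0.5], [0.0, 1.0], [0.1, 0.1]))
```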

Summary for Reproducing Results

For the tasks that we've been able to open-source, results from the paper should be reproducible by using the linked data and command-line args below.

| Task | Figure/Table in paper | Data | Train + Eval commands |
|---|---|---|---|
| Coordinate regression | Figure 4 | See colab | See colab |
| D4RL Adroit + Kitchen | Table 2 | Link | Link |
| N-D particle | Figure 6 | Link | Link |
| Simulated pushing, single target, states | Table 3 | Link | Link |
| Simulated pushing, single target, pixels | Table 3 | Link | Link |

Citation

If you found our paper/code useful in your research, please consider citing:

@article{florence2021implicit,
    title={Implicit Behavioral Cloning},
    author={Florence, Pete and Lynch, Corey and Zeng, Andy and Ramirez, Oscar and Wahid, Ayzaan and Downs, Laura and Wong, Adrian and Lee, Johnny and Mordatch, Igor and Tompson, Jonathan},
    journal={Conference on Robot Learning (CoRL)},
    month = {November},
    year={2021}
}

ibc's People

Contributors

ayzaan, ericcousineau-tri, jonathantompson, peteflorence, rchen152, ronshapiro, siegelordex, yilei

ibc's Issues

Cannot register 2 metrics with the same name: /tensorflow/api/keras/optimizers

Just tried running tests on Ubuntu 20.04, CPython 3.8.10, but get the following error:

$ cd ibc/..
$ ./ibc/run_tests.sh
...
PYTHONPATH=:{parent}/ibc/.. python3 {parent}/ibc/ibc/agents/mcmc_test.py --alsologtostderr
...
2021-11-08 16:03:33.113843: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-08 16:03:33.113875: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-08 16:03:34.161325: E tensorflow/core/lib/monitoring/collection_registry.cc:77] Cannot register 2 metrics with the same name: /tensorflow/api/keras/optimizers
...
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.
ERROR: 'PYTHONPATH=:{parent}/ibc/.. python3 {parent}/ibc/ibc/agents/mcmc_test.py --alsologtostderr' failed!

Full stack trace: https://gist.github.com/EricCousineau-TRI/ac6e9943606e6b9f7e335882f7caa350

Not sure if it's b/c of CUDA error.
I have stock Ubuntu CUDA 10.1 on my machine, so will try out NVidia-installed CUDA 11.0.
See below.

Also happens when trying to run training script, ./ibc/ibc/configs/pushing_states/run_mlp_ebm.sh

other tasks in D4RL

Hi,
Thanks for providing the implementation of your work! I want to validate IBC on the locomotion tasks in D4RL, such as hopper and halfcheetah, but it seems you haven't provided the relevant datasets. Are there any scripts for converting the D4RL datasets to tfrecords? Or dataset links for direct download, like the Adroit ones :)
Thanks

Support for Autoregressive DFO

I'm trying to implement IBC with autoregressive DFO. Is this supported? I see that it's mentioned in the appendix of the paper, but I'm unable to find it in the code; I found DFO and Langevin but not autoregressive DFO.

Maybe the version of gym needs to be written to the readme

Hi @peteflorence
In newer versions of Gym, 'done' has been removed from the return values of step, and 'terminated' and 'truncated' have been added, so the unit tests fail when run against a newer Gym. I think the version of Gym used for this project should be indicated in the readme.
Thanks,
Vinson
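The API change described in this issue can be bridged with a small shim that accepts either return shape. This is a sketch rather than a fix that exists in the repository; the helper name `unpack_step` is hypothetical.

```python
def unpack_step(result):
    """Normalize a Gym step() result to (observation, reward, done, info).

    Older Gym returns 4 values; newer Gym returns 5, splitting done into
    terminated and truncated.
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        return observation, reward, bool(terminated or truncated), info
    observation, reward, done, info = result
    return observation, reward, bool(done), info


# Both shapes collapse to the same 4-tuple.
print(unpack_step(("obs", 1.0, False, True, {})))
print(unpack_step(("obs", 1.0, True, {})))
```

Pinning the Gym version, as the issue suggests, avoids the need for such a shim altogether.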

How do you handle mode killing?

If I'm not mistaken, you sample multiple negative examples for each positive example in the dataset. As the negative examples converge to the positive mode, they will kill the mode. How do you handle this?
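For context, the setup the question refers to can be sketched as an InfoNCE-style loss, in which the positive action competes against sampled negatives through a softmax over negative energies. The code below is an illustration under that assumption, not the repository's loss code, and it does not by itself settle the question. The function name is hypothetical.

```python
import math


def info_nce_loss(positive_energy, negative_energies):
    """Cross-entropy of picking the positive among positive plus negatives."""
    logits = [-positive_energy] + [-e for e in negative_energies]
    largest = max(logits)
    log_partition = largest + math.log(sum(math.exp(l - largest) for l in logits))
    return log_partition - logits[0]


# A negative with the same energy as the positive: loss is log(2).
print(info_nce_loss(0.0, [0.0]))
# A negative with much higher energy: loss is close to zero.
print(info_nce_loss(0.0, [10.0]))
```

In this form, a negative that coincides with the positive receives the same energy and contributes a fixed log(2) term per such pair, while distant high-energy negatives contribute almost nothing.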

Unit tests fail

Hi, it seems that the unit tests do not work out of the box. I'm working in a clean conda environment with Python 3.7.13. All of the prerequisites are installed with the versions described in the readme, as well as CUDA and cuDNN (Tensorflow has GPU access).

Here's the complete output from the test script:

Test script output:

ibc-test ❯ ./ibc/run_tests.sh
bash: /home/arc/miniconda3/envs/ibc-test/lib/libtinfo.so.6: no version information available (required by bash)
Running run_tests.sh in directory /home/arc/noah
Running tests:
/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py
/home/arc/noah/ibc/environments/block_pushing/block_pushing_test.py
/home/arc/noah/ibc/environments/utils/utils_pybullet_test.py
/home/arc/noah/ibc/environments/utils/xarm_sim_robot_test.py
/home/arc/noah/ibc/environments/particle/particle_test.py
/home/arc/noah/ibc/ibc/agents/mcmc_test.py
/home/arc/noah/ibc/ibc/train/stats_test.py
/home/arc/noah/ibc/data/dataset_test.py
***********************************************************************
Running test /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py
***********************************************************************
PYTHONPATH=:/home/arc/noah:/home/arc/noah/ibc/.. python3 /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py --alsologtostderr
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/__init__.py:56: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if (distutils.version.LooseVersion(tf_version) <
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py:61: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if (distutils.version.LooseVersion(tf.__version__) <
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/utils/common.py:87: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  and (distutils.version.LooseVersion(tf.__version__) <=
pybullet build time: Jun 28 2022 14:19:23
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/envs/registration.py:416: UserWarning: WARN: The `registry.env_specs` property along with `EnvSpecTree` is deprecated. Please use `registry` directly as a dictionary instead.
  "The `registry.env_specs` property along with `EnvSpecTree` is deprecated. Please use `registry` directly as a dictionary instead."
2022-06-29 13:44:14.864729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-29 13:44:14.868489: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-29 13:44:14.868817: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Running tests under Python 3.7.13: /home/arc/miniconda3/envs/ibc-test/bin/python3
[ RUN ] Blocks2DTest.test_load_push_env
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/spaces/box.py:112: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
argv[0]=
I0629 13:44:14.880214 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf
I0629 13:44:14.885888 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf
I0629 13:44:14.886190 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf
I0629 13:44:14.908028 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf
I0629 13:44:14.911714 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf
I0629 13:44:14.912028 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf
I0629 13:44:14.912325 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf
I0629 13:44:14.912554 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf
INFO:tensorflow:time(__main__.Blocks2DTest.test_load_push_env): 0.09s
I0629 13:44:14.959522 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_load_push_env): 0.09s
[ OK ] Blocks2DTest.test_load_push_env
[ RUN ] Blocks2DTest.test_serialize_state_push
argv[0]=
I0629 13:44:14.963971 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf
I0629 13:44:14.969090 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf
I0629 13:44:14.969393 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf
I0629 13:44:14.987162 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf
I0629 13:44:14.991161 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf
I0629 13:44:14.991568 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf
I0629 13:44:14.991911 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf
I0629 13:44:14.992150 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf
INFO:tensorflow:time(__main__.Blocks2DTest.test_serialize_state_push): 0.13s
I0629 13:44:15.085130 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_serialize_state_push): 0.13s
[ OK ] Blocks2DTest.test_serialize_state_push
[ RUN ] Blocks2DTest.test_session
[ SKIPPED ] Blocks2DTest.test_session
[ RUN ] Blocks2DTest.test_validate_environment
argv[0]=
I0629 13:44:15.090482 140609892135296 utils_pybullet.py:85] Loading URDF plane.urdf
I0629 13:44:15.095655 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/workspace.urdf
I0629 13:44:15.095959 140609892135296 utils_pybullet.py:85] Loading URDF xarm/xarm6_robot.urdf
I0629 13:44:15.113764 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/suction/suction-head-long.urdf
I0629 13:44:15.117837 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone.urdf
I0629 13:44:15.118308 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/zone2.urdf
I0629 13:44:15.118738 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block.urdf
I0629 13:44:15.119005 140609892135296 utils_pybullet.py:85] Loading URDF ibc/environments/assets/block2.urdf
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:98: UserWarning: WARN: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
  "We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) "
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:217: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator.
  "Future gym versions will require that `Env.reset` can be passed a `seed` instead of using `Env.seed` for resetting the environment random number generator. "
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:229: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting.
  "Future gym versions will require that `Env.reset` can be passed `return_info` to return information from the environment resetting."
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py:234: UserWarning: WARN: Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information.
  "Future gym versions will require that `Env.reset` can be passed `options` to allow the environment initialisation to be passed additional information."
/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/spaces/box.py:197: UserWarning: WARN: Casting input x to numpy array.
  logger.warn("Casting input x to numpy array.")
INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.09s
I0629 13:44:15.179256 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.09s
[ FAILED ] Blocks2DTest.test_validate_environment
======================================================================
FAIL: test_validate_environment (__main__.Blocks2DTest)
Blocks2DTest.test_validate_environment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment
    utils.validate_py_environment(env)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 75, in validate_py_environment
    time_step = environment.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 111, in _reset
    return self._env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 193, in _reset
    observation = self._gym_env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 66, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 47, in reset
    return passive_env_reset_check(self.env, **kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 247, in passive_env_reset_check
    _check_obs(obs, env.observation_space, "reset")
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 115, in _check_obs
    ), f"{pre} is not contained with the observation space ({observation_space})"
AssertionError: The observation returned by the `reset()` method is not contained with the observation space (Dict(block_translation: Box(-5.0, 5.0, (2,), float32), block_orientation: Box(-6.2831855, 6.2831855, (1,), float32), block2_translation: Box(-5.0, 5.0, (2,), float32), block2_orientation: Box(-6.2831855, 6.2831855, (1,), float32), effector_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), effector_target_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), target_translation: Box(-5.0, 5.0, (2,), float32), target_orientation: Box(-6.2831855, 6.2831855, (1,), float32), target2_translation: Box(-5.0, 5.0, (2,), float32), target2_orientation: Box(-6.2831855, 6.2831855, (1,), float32)))
----------------------------------------------------------------------
Ran 4 tests in 0.310s
FAILED (failures=1, skipped=1)
ERROR: 'PYTHONPATH=:/home/arc/noah:/home/arc/noah/ibc/.. python3 /home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py --alsologtostderr' failed!

Here's the last part of that formatted a bit more nicely:

INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.09s
I0629 13:44:15.179256 140609892135296 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.09s
[  FAILED  ] Blocks2DTest.test_validate_environment
======================================================================
FAIL: test_validate_environment (__main__.Blocks2DTest)
Blocks2DTest.test_validate_environment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment
    utils.validate_py_environment(env)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 75, in validate_py_environment
    time_step = environment.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 111, in _reset
    return self._env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 196, in reset
    self._current_time_step = self._reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 193, in _reset
    observation = self._gym_env.reset()
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 66, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 42, in reset
    return self.env.reset(**kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 47, in reset
    return passive_env_reset_check(self.env, **kwargs)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 247, in passive_env_reset_check
    _check_obs(obs, env.observation_space, "reset")
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 115, in _check_obs
    ), f"{pre} is not contained with the observation space ({observation_space})"
AssertionError: The observation returned by the `reset()` method is not contained with the observation space (Dict(block_translation: Box(-5.0, 5.0, (2,), float32), block_orientation: Box(-6.2831855, 6.2831855, (1,), float32), block2_translation: Box(-5.0, 5.0, (2,), float32), block2_orientation: Box(-6.2831855, 6.2831855, (1,), float32), effector_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), effector_target_translation: Box([ 0.05 -0.6 ], [0.8 0.6], (2,), float32), target_translation: Box(-5.0, 5.0, (2,), float32), target_orientation: Box(-6.2831855, 6.2831855, (1,), float32), target2_translation: Box(-5.0, 5.0, (2,), float32), target2_orientation: Box(-6.2831855, 6.2831855, (1,), float32)))

----------------------------------------------------------------------

The issue seems to be that several fields of the observation returned by BlockPushMultimodal._compute_state() need to be converted to np arrays with dtype np.float32. After doing that and running the test again, I get the following error instead:

INFO:tensorflow:time(__main__.Blocks2DTest.test_validate_environment): 0.1s
I0629 14:04:36.733099 140095945187712 test_util.py:2189] time(__main__.Blocks2DTest.test_validate_environment): 0.1s
[  FAILED  ] Blocks2DTest.test_validate_environment
======================================================================
ERROR: test_validate_environment (__main__.Blocks2DTest)
Blocks2DTest.test_validate_environment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/arc/noah/ibc/environments/block_pushing/block_pushing_multimodal_test.py", line 34, in test_validate_environment
    utils.validate_py_environment(env)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/utils.py", line 84, in validate_py_environment
    time_step = environment.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 117, in _step
    time_step = self._env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 215, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/time_limit.py", line 49, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 37, in step
    return self.env.step(action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 39, in step
    return passive_env_step_check(self.env, action)
  File "/home/arc/miniconda3/envs/ibc-test/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 273, in passive_env_step_check
    if np.any(np.isnan(obs)):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

----------------------------------------------------------------------

This is the same error as in #14 so I'd guess these issues are related.

Any thoughts?
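
For reference, the dtype conversion described above can be sketched as follows. This is a minimal sketch, assuming the observation is a plain dict whose values are scalars, lists, or float64 arrays. The helper name `to_float32_observation` and the sample values are hypothetical; only the field names are taken from the observation space printed in the error message.

```python
import numpy as np


def to_float32_observation(obs):
    """Return a copy of `obs` with every field as a float32 array with at least 1 dimension."""
    return {
        key: np.atleast_1d(np.asarray(value, dtype=np.float32))
        for key, value in obs.items()
    }


# Hypothetical raw state: a float64 array and a bare Python float.
raw_obs = {
    "block_translation": np.array([0.3, -0.2]),  # float64 by default
    "block_orientation": 1.57,                   # scalar, not an array
}
obs = to_float32_observation(raw_obs)
```

This only normalizes the dtype and shape of each field; it does not address the later `isnan` failure in the step check.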

Colab to reproduce Fig. 4

Hi IBC Authors,
Thank you for your awesome work and the code release. Your README asks readers to look at the colab for reproducing Fig. 4. However, I do not know which colab you are referring to.

Would you mind sharing the link to the colab so that we can reproduce your results?

langevin formulation: Noise term isn't scaled correctly?

@thduynguyen and I are implementing some of this, and we found that the Langevin formulation may not be correct:

ibc/ibc/agents/mcmc.py

Lines 247 to 249 in 1073c37

gradient_scale = 0.5  # this is in the Langevin dynamics equation.
de_dact = (gradient_scale * l_lambda * de_dact +
           tf.random.normal(tf.shape(actions)) * l_lambda * noise_scale)

as opposed to
https://en.wikipedia.org/wiki/Metropolis-adjusted_Langevin_algorithm

We think it should be

de_dact = step_size * de_dact + sqrt(2 * step_size) * noise

vs

de_dact = step_size * (0.5 * de_dact + noise)

On our end, we implemented this in PyTorch and found that with the IBC formulation our distribution does not converge to the desired stddev; with sqrt(2 * step_size), it seems to converge to within ~12% error.

Here's a watered down version of the test we used to check (written in PyTorch):
https://github.com/EricCousineau-TRI/repro/blob/db3e329379f691706883d85e45bca9c63568c9d0/python/torch/langevin_step.ipynb

If you have time, please let me know if you think this is wrong or what not.
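
The comparison can also be sketched in NumPy on a toy problem. This is a minimal sketch, not the IBC implementation and not the linked notebook: it assumes a 1-D Gaussian energy E(x) = x^2 / (2 sigma^2), a noise scale of 1, an update of the form x = x - delta, and no Metropolis accept/reject step. The function name `run_chains` and all parameter values are made up for illustration.

```python
import numpy as np


def run_chains(step_size, use_sqrt_noise, sigma=2.0, n_chains=10000, n_steps=1000, seed=0):
    """Run independent Langevin chains on E(x) = x^2 / (2 sigma^2); return the sample stddev."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        grad = x / sigma**2  # dE/dx for the Gaussian energy
        noise = rng.standard_normal(n_chains)
        if use_sqrt_noise:
            # Noise scaled by sqrt(2 * step_size), as in the Wikipedia formulation.
            delta = step_size * grad + np.sqrt(2.0 * step_size) * noise
        else:
            # Noise scaled by step_size, as in the simplified IBC-style formula above.
            delta = step_size * (0.5 * grad + noise)
        x = x - delta
    return x.std()


std_sqrt = run_chains(step_size=0.1, use_sqrt_noise=True)
std_ibc = run_chains(step_size=0.1, use_sqrt_noise=False)
print(f"target stddev 2.0 | sqrt(2*step) noise: {std_sqrt:.3f} | step-scaled noise: {std_ibc:.3f}")
```

For this toy energy both updates are linear, so the stationary variance can be worked out by hand: the step-scaled variant settles near sqrt(step_size) * sigma rather than sigma, while the sqrt(2 * step_size) variant stays close to sigma for small step sizes.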

--

Iterations from (what we think is) IBC-style:

langevin_ibc_true.mp4

Iterations from formulation from Wikipedia:

langevin_ibc_false.mp4

And as always, thanks a ton for publishing this code!!!

start ibc

Hi, thank you for your work. When I run ./ibc/run_tests.sh, it gives me this error:

Traceback (most recent call last):
  File "/home/nmz/ibc/data/dataset_test.py", line 25, in <module>
    from ibc.environments.block_pushing import block_pushing # pylint: disable=unused-import
  File "/home/nmz/ibc/environments/block_pushing/block_pushing.py", line 935, in <module>
    if 'BlockPush-v0' in registration.registry.env_specs:
AttributeError: 'dict' object has no attribute 'env_specs'
ERROR: 'PYTHONPATH=/opt/ros/noetic/lib/python3/dist-packages:/home/nmz/ibc/.. python3 /home/nmz/ibc/data/dataset_test.py --alsologtostderr' failed!

I just followed your steps!
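
The AttributeError indicates that `registration.registry` is a plain dict in the installed gym, while the code expects an object with an `env_specs` attribute. A registration check that tolerates both layouts might look like the sketch below. The helper `is_env_registered` and the stand-in class `OldStyleRegistry` are hypothetical and not part of gym or IBC; in real use the helper would be passed `registration.registry`.

```python
def is_env_registered(env_id, registry):
    """Check for `env_id` in either registry layout.

    Older layout: an object exposing a dict at `registry.env_specs`.
    Newer layout (as in the traceback above): `registry` is itself a dict.
    """
    specs = getattr(registry, "env_specs", registry)
    return env_id in specs


class OldStyleRegistry:
    """Stand-in for the older registry object; hypothetical, for illustration only."""

    def __init__(self, env_specs):
        self.env_specs = env_specs


old_registry = OldStyleRegistry({"BlockPush-v0": object()})
new_registry = {"BlockPush-v0": object()}

print(is_env_registered("BlockPush-v0", old_registry))
print(is_env_registered("BlockPush-v0", new_registry))
```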

Support for Categorical Action Space

I am currently working on implementing implicit BC for a task whose action space consists of both keyboard and mouse inputs. Is there a straightforward way to make this action space suitable for the implicit regression task?

Goal Tolerance Are Different for Different Methods

Hi, I found that

train_eval.goal_tolerance = 0.02

is set in EBM's config but not in MSE's config.

This difference makes the evaluation stricter for MSE-based BC, since the default is goal_tolerance=0.01 (code).

Setting train_eval.goal_tolerance = 0.01 for the EBM agent decreases its success rate from 1.0 to [0.85, 0.95] after training for 10k steps.

Error running particle experiments

Hi, thanks for open sourcing this work! I tried running:

./ibc/ibc/configs/particle/run_mlp_ebm_langevin_best.sh 2

And got this error

  File "ibc/ibc/train_eval.py", line 397, in main
    strategy=strategy)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "ibc/ibc/train_eval.py", line 279, in train_eval
    name_scope_suffix=f'_{env_name}')
  File "ibc/ibc/train_eval.py", line 353, in evaluation_step
    eval_actor.run()
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/train/actor.py", line 149, in run
    self._time_step, self._policy_state)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/drivers/py_driver.py", line 112, in run
    next_time_step = self.env.step(action_step.action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/wrappers.py", line 1015, in _step
    time_step = self._env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/py_environment.py", line 233, in step
    self._current_time_step = self._step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/tf_agents/environments/gym_wrapper.py", line 215, in _step
    observation, reward, self._done, self._info = self._gym_env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/wrappers/order_enforcing.py", line 37, in step
    return self.env.step(action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/wrappers/env_checker.py", line 39, in step
    return passive_env_step_check(self.env, action)
  File "/iris/u/ayz/anaconda3/envs/ibc/lib/python3.7/site-packages/gym/utils/passive_env_checker.py", line 273, in passive_env_step_check
    if np.any(np.isnan(obs)):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I'm not super familiar with tf-agents, but from some debugging it looks like obs is a dictionary and np.isnan cannot handle it. Any thoughts on how one could fix this?

Thanks,
Allan
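
The failure mode described above can be reproduced without gym or tf-agents. The sketch below shows `np.isnan` raising `TypeError` when handed a dict, followed by a per-field NaN check that does work. It illustrates the cause only and is not a patch to gym's passive checker; the helper name `dict_obs_has_nan` and the observation fields are hypothetical.

```python
import numpy as np


def dict_obs_has_nan(obs):
    """Return True if any field of a dict observation contains a NaN."""
    return any(
        np.isnan(np.asarray(value, dtype=np.float64)).any()
        for value in obs.values()
    )


# Hypothetical dict observation with two array fields.
obs = {
    "pos_agent": np.array([0.1, 0.2], dtype=np.float32),
    "vel_agent": np.array([0.0, np.nan], dtype=np.float32),
}

try:
    np.isnan(obs)  # the dict is treated as a generic object, which isnan cannot handle
    raised = False
except TypeError:
    raised = True

has_nan = dict_obs_has_nan(obs)
print(raised, has_nan)
```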
