andrejorsula / drl_grasping

Deep Reinforcement Learning for Robotic Grasping from Octrees

Home Page: https://arxiv.org/pdf/2208.00818

License: BSD 3-Clause "New" or "Revised" License

CMake 0.27% Python 96.12% Dockerfile 1.21% Shell 2.41%
robotics grasping reinforcement-learning octree domain-randomization sim2real ros2 gym-ignition stable-baselines3 deep-reinforcement-learning

drl_grasping's Introduction

Deep Reinforcement Learning for Robotic Grasping from Octrees

This project focuses on applying deep reinforcement learning to acquire a robust policy that allows robots to grasp diverse objects from compact 3D observations in the form of octrees.

Evaluation of a trained policy on novel scenes (previously unseen camera poses, objects, terrain textures, ...).

Sim-to-Real transfer of a policy trained solely inside a simulation (zero-shot transfer). Credit: Aalborg University

Evaluation of a trained policy for grasping rocks on the Moon inside a simulation.

Sim-to-Real transfer in a Moon-analogue facility (zero-shot transfer). Credit: University of Luxembourg

Overview


This repository contains multiple RL environments for robotic manipulation, focusing on robotic grasping using continuous actions in Cartesian space. All environments have several observation variants that enable direct comparison (RGB images, depth maps, octrees, ...). Each task is coupled with a simulation environment that can be used to train RL agents. These agents can subsequently be evaluated on real robots that integrate ros2_control (or ros_control via ros1_bridge).

End-to-end model-free actor-critic algorithms have been tested on these environments (TD3, SAC and TQC | SB3 PyTorch implementation). A setup for experimenting with a model-based algorithm (DreamerV2 | original TensorFlow implementation) is also provided; however, it is currently limited to RGB image observations. Interoperability of the environments with most algorithms and their implementations should be possible thanks to compatibility with the Gym API.
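Because the environments follow the Gym API, a random-action rollout can be scripted in a few lines of Python. The snippet below is a minimal sketch: it assumes that importing the drl_grasping module registers the environments listed further down, and it uses the classic Gym step/reset convention, which may differ slightly depending on the installed Gym version.

import gym

import drl_grasping  # noqa: F401  (assumed to register the environments on import)

env = gym.make("Reach-Octree-Gazebo-v0")
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action from the defined action space
    observation, reward, done, info = env.step(action)
env.close()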

List of Environments

Below is the list of implemented environments. Each environment (observation variant) has two alternatives, Task-Obs-vX and Task-Obs-Gazebo-vX (omitted from the list below). Here, Task-Obs-vX implements the logic of the environment and can be used on real robots, whereas Task-Obs-Gazebo-vX combines this logic with the simulation environment inside Gazebo. Robots should be interchangeable for the most part, with some limitations (e.g. the GraspPlanetary task requires a mobile manipulator to fully randomize the environment).

If you are interested in configuring these environments, first take a look at the list of their parameters inside Gym registration and then at their individual source code.

  • Reach – Reach the end-effector goal: Reach-v0 (state obs), Reach-ColorImage-v0, Reach-DepthImage-v0, Reach-Octree-v0, Reach-OctreeWithIntensity-v0, Reach-OctreeWithColor-v0
  • Grasp – Grasp and lift a random object: Grasp-v0 (state obs), Grasp-Octree-v0, Grasp-OctreeWithIntensity-v0, Grasp-OctreeWithColor-v0
  • GraspPlanetary – Grasp and lift a Moon rock: GraspPlanetary-v0 (state obs), GraspPlanetary-MonoImage-v0, GraspPlanetary-ColorImage-v0, GraspPlanetary-DepthImage-v0, GraspPlanetary-DepthImageWithIntensity-v0, GraspPlanetary-DepthImageWithColor-v0, GraspPlanetary-Octree-v0, GraspPlanetary-OctreeWithIntensity-v0, GraspPlanetary-OctreeWithColor-v0

By default, the Grasp and GraspPlanetary tasks utilize GraspCurriculum, which shapes their reward function and environment difficulty.
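The exact behaviour of GraspCurriculum is defined in its source code. Purely as a conceptual illustration (not the repository's implementation), a success-rate-driven curriculum can be sketched as follows: the difficulty advances to the next stage once the recent success rate exceeds a threshold.

# Conceptual sketch of a success-rate-driven curriculum (illustrative only).
class SimpleCurriculum:
    def __init__(self, stages, success_threshold=0.6, window=50):
        self.stages = stages  # e.g. an increasing number/variety of objects per stage
        self.success_threshold = success_threshold
        self.window = window
        self.recent_successes = []
        self.stage = 0

    def update(self, episode_success: bool):
        self.recent_successes = (self.recent_successes + [episode_success])[-self.window:]
        success_rate = sum(self.recent_successes) / len(self.recent_successes)
        # Advance to a harder stage once the agent is consistently successful
        if success_rate >= self.success_threshold and self.stage < len(self.stages) - 1:
            self.stage += 1
            self.recent_successes = []
        return self.stages[self.stage]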

Domain Randomization

To facilitate the sim-to-real transfer of trained agents, the simulation environments introduce domain randomization with the aim of improving the generalization of learned policies. This randomization is accomplished via ManipulationGazeboEnvRandomizer, which populates the virtual world and enables randomization of several properties at each reset of the environment. As this randomizer is configurable with numerous parameters, please take a look at the source code to see what environments you can create.
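As a rough, generic illustration of what happens on each reset (this is not the actual interface of ManipulationGazeboEnvRandomizer, and the config keys are hypothetical), per-reset domain randomization boils down to re-sampling scene properties from configured ranges:

import random

# Generic illustration of per-reset domain randomization (hypothetical config keys).
def randomize_scene(rng: random.Random, config: dict) -> dict:
    return {
        "object_count": rng.randint(*config["object_count_range"]),
        "object_scale": rng.uniform(*config["object_scale_range"]),
        "surface_friction": rng.uniform(*config["friction_range"]),
        "ground_texture": rng.choice(config["ground_textures"]),
        "camera_position": [rng.uniform(lo, hi) for lo, hi in config["camera_position_bounds"]],
    }

config = {
    "object_count_range": (1, 4),
    "object_scale_range": (0.05, 0.25),
    "friction_range": (0.5, 1.5),
    "ground_textures": ["moon_regolith", "gravel", "sand"],
    "camera_position_bounds": [(-0.5, 0.5), (-0.5, 0.5), (0.3, 1.0)],
}
print(randomize_scene(random.Random(42), config))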

Examples of domain randomization for the Grasp task.

Examples of domain randomization for the GraspPlanetary task.

Model Datasets

Simulation environments in this repository can utilize datasets of any SDF models, e.g. models from Fuel. By default, the Grasp task uses the Google Scanned Objects collection together with a set of PBR textures pointed to by the TEXTURE_DIRS environment variable. In contrast, the GraspPlanetary task employs custom models that are procedurally generated via Blender. However, this can be adjusted if desired.

All external models can be automatically configured and randomized in several ways via ModelCollectionRandomizer before their insertion into the world, e.g. optimization of collision geometry, estimation of (randomized) inertial properties, and randomization of parameters such as geometry scale or surface friction. When processing large collections, model filtering can also be enabled based on several aspects, such as the complexity of the geometry or the existence of disconnected components. A few scripts for managing datasets can be found under the scripts/utils/ directory.
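As a standalone illustration of this kind of filtering (not the code of ModelCollectionRandomizer), a mesh could be rejected based on its triangle count or the number of disconnected components, e.g. with trimesh; the thresholds below are arbitrary placeholders.

import trimesh

# Illustrative mesh filter; thresholds and criteria are placeholders.
def is_mesh_acceptable(path: str, max_faces: int = 40000, max_components: int = 1) -> bool:
    mesh = trimesh.load(path, force="mesh")
    if len(mesh.faces) > max_faces:
        return False  # geometry too complex
    if len(mesh.split(only_watertight=False)) > max_components:
        return False  # disconnected components present
    return True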

End-to-End Learning from 3D Octree Observations

This project initially investigated how 3D visual observations can be leveraged to improve end-to-end learning of manipulation skills. Octrees were selected for this purpose due to their efficiently organized structure compared to other 3D representations.

To enable the extraction of abstract features from 3D octree observations, an octree-based 3D CNN is employed. The network module that accomplishes this feature extraction is implemented as OctreeCnnFeaturesExtractor (PyTorch). This feature extractor is part of the OctreeCnnPolicy policy implemented for the TD3, SAC and TQC algorithms. Internally, the feature extractor utilizes the O-CNN implementation to benefit from hardware acceleration on NVIDIA GPUs.
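In SB3, such a feature extractor plugs into the policy via policy_kwargs. The skeleton below shows only the standard SB3 mechanism with a placeholder MLP; the actual OctreeCnnFeaturesExtractor uses octree-specific convolutions from O-CNN instead of the layers shown here.

import torch
from torch import nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

# Skeleton of a custom SB3 features extractor (placeholder MLP, not the octree CNN).
class PlaceholderFeaturesExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space, features_dim: int = 256):
        super().__init__(observation_space, features_dim)
        n_input = int(torch.prod(torch.tensor(observation_space.shape)))
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(n_input, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        return self.net(observations)

# The extractor is then passed to an off-policy algorithm such as SAC/TD3/TQC, e.g.:
# model = SAC("MlpPolicy", env, policy_kwargs=dict(features_extractor_class=PlaceholderFeaturesExtractor))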

Illustration of the end-to-end actor-critic network architecture with octree-based 3D CNN feature extractor.

Limitations

The known limitations of this repository are listed below for your convenience.

  • No parallel environments – It is currently not possible to run multiple instances of the environment simultaneously.
  • Slow training – The simulation environments are computationally complex (physics, rendering, underlying low-level control, ...). This significantly impacts the ability to train agents with time and computational constraints. The performance of some of these aspects can be improved at the cost of accuracy and realism (e.g. physics_rate/step_size).
  • Suboptimal hyperparameters – Although a hyperparameter optimization framework was employed for some combinations of environments and algorithms, it is a prolonged process. This problem is exacerbated by the vast quantity of hyperparameters and their general brittleness. Therefore, the default hyperparameters provided in this repository might not be optimal.
  • Nondeterministic – Experiments are not fully repeatable, and even the same seed of the pseudorandom generator can lead to different results. This is caused by several aspects, such as the nondeterministic nature of network-based communication and non-determinism in the underlying deep learning frameworks and hardware.

Instructions

Setup-wise, there are two options for using this repository. Option A – Docker is recommended when trying out this repository due to its simplicity. Otherwise, Option B – Local Installation can be used if a local setup is preferred. Both options are equivalent in terms of usage; however, the pre-built Docker images already include all required datasets and provide isolation between runs.

Option A – Docker

Hardware Requirements

  • CUDA GPU – CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

Install Docker

First, ensure your system is set up to use Docker with NVIDIA GPUs. You can use the install_docker_with_nvidia.bash installation script for Debian-based distributions. Alternatively, consult the NVIDIA Container Toolkit Installation Guide for other Linux distributions.

# Execute script inside a cloned repository
.docker/host/install_docker_with_nvidia.bash
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/host/install_docker_with_nvidia.bash)"

Clone a Prebuilt Docker Image

Prebuilt Docker images of drl_grasping can be pulled directly from Docker Hub without needing to build them locally. You can use the following command to manually pull the latest image or one of the previously tagged Releases. The average image size is 25 GB (including datasets).

docker pull andrejorsula/drl_grasping:${TAG:-latest}

(Optional) Build a New Image

It is also possible to build the Docker image locally using the included Dockerfile. To do this, the build.bash script can be executed as shown below (arguments are optional). This script will always print the corresponding low-level docker build ... command for your reference.

.docker/build.bash ${TAG:-latest} ${BUILD_ARGS}

Run a Docker Container

For simplicity, please run drl_grasping Docker containers using the included run.bash script shown below (arguments are optional). It enables NVIDIA GPUs and the GUI interface while automatically mounting the necessary volumes (e.g. persistent logging) and setting environment variables (e.g. synchronization of middleware communication with the host). This script will always print the corresponding low-level docker run ... command for your reference.

# Execute script inside a cloned repository
.docker/run.bash ${TAG:-latest} ${CMD}
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/run.bash)" -- ${TAG:-latest} ${CMD}

The network communication of drl_grasping within this Docker container is configured based on the ROS 2 ROS_DOMAIN_ID environment variable, which can be set via ROS_DOMAIN_ID={0...101} .docker/run.bash ${TAG:-latest} ${CMD}. By default (ROS_DOMAIN_ID=0), external communication is restricted and multicast is disabled. With ROS_DOMAIN_ID=42, the communication remains restricted to localhost with multicast enabled, allowing communication to be monitored outside the container but within the same system. Using ROS_DOMAIN_ID=69 will use the default network interface and multicast settings, which can enable monitoring of communication within the same LAN. All other ROS_DOMAIN_IDs share the default behaviour and can be employed for communication partitioning when running multiple drl_grasping instances.

Option B – Local Installation

Hardware Requirements

  • CUDA GPU – CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

Dependencies

Ubuntu 20.04 (Focal Fossa) is the recommended OS for local installation. Other Linux distributions might work but require most dependencies to be built from source.

These are the primary dependencies required to use this project that must be installed on your system.

All additional dependencies are either pulled via vcstool (drl_grasping.repos) or installed via pip (python_requirements.txt) and rosdep during the building process below.

Building

Clone this repository recursively and import VCS dependencies. Then install dependencies and build with colcon.

# Clone this repository into your favourite ROS 2 workspace
git clone --recursive https://github.com/AndrejOrsula/drl_grasping.git
# Install Python requirements
pip3 install -r drl_grasping/python_requirements.txt
# Import dependencies
vcs import < drl_grasping/drl_grasping.repos
# Install dependencies
IGNITION_VERSION=fortress rosdep install -y -r -i --rosdistro ${ROS_DISTRO} --from-paths .
# Build
colcon build --merge-install --symlink-install --cmake-args "-DCMAKE_BUILD_TYPE=Release"

Sourcing

Before utilizing this project via local installation, remember to source the ROS 2 workspace.

source install/local_setup.bash

This enables:

  • Use of drl_grasping Python module
  • Execution of binaries, scripts and examples via ros2 run drl_grasping <executable>
  • Launching of setup scripts via ros2 launch drl_grasping <launch_script>
  • Discoverability of shared resources
Test Random Agents

A good starting point is to simulate some episodes using random agents where actions are sampled from the defined action space. This is also useful when modifying environments because it lets you analyze the consequences of actions and resulting observations without deep learning pipelines running in the background. To get started, run the following example. It should open RViz 2 and Gazebo client instances that provide you with visual feedback.

ros2 run drl_grasping ex_random_agent.bash

After running the example script, the underlying ros2 launch drl_grasping random_agent.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping random_agent.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-Octree-Gazebo-v0 check_env:=false render:=true enable_rviz:=true log_level:=warn
Train New Agents

You can also train your agents from scratch. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

ros2 run drl_grasping ex_train.bash

After running the example script, the underlying ros2 launch drl_grasping train.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping train.launch.py seed:=42 robot_model:=panda env:=Grasp-OctreeWithColor-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/tensorboard_logs save_freq:=10000 save_replay_buffer:=true log_interval:=-1 eval_freq:=10000 eval_episodes:=20 enable_rviz:=false log_level:=fatal

Remote Visualization

To visualize the agent while training, separate RViz 2 and Gazebo client instances can be opened. For the Docker setup, these commands can be executed in a new drl_grasping container with the same ROS_DOMAIN_ID.

# RViz 2 (Note: Visualization of robot model will not be loaded using this approach)
rviz2 -d $(ros2 pkg prefix --share drl_grasping)/rviz/drl_grasping.rviz
# Gazebo client
ign gazebo -g

TensorBoard

TensorBoard logs will be generated during training in a directory specified by the tensorboard_log:=${TENSORBOARD_LOG} argument. You can open them in your web browser using the following command.

tensorboard --logdir ${TENSORBOARD_LOG}

(Experimental) Train with Dreamer V2

You can also try to train some agents using the model-based Dreamer V2 algorithm. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

ros2 run drl_grasping ex_train_dreamerv2.bash

After running the example script, the underlying ros2 launch drl_grasping train_dreamerv2.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping train_dreamerv2.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-ColorImage-Gazebo-v0 log_folder:=/root/drl_grasping_training/train/GraspPlanetary-ColorImage-Gazebo-v0/logs eval_freq:=10000 enable_rviz:=false log_level:=fatal
Evaluate New Agents

Once you train your agents, you can evaluate them. Start by looking at ex_evaluate.bash, which can be modified to fit your trained agent. It should open RViz 2 and Gazebo client instances that provide you with visual feedback, while the agent's performance will be logged and printed to STDOUT.

ros2 run drl_grasping ex_evaluate.bash

After running the example script, the underlying ros2 launch drl_grasping evaluate.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments. For example, you can select a specific checkpoint with the load_checkpoint:=${LOAD_CHECKPOINT} argument instead of running the final model.

ros2 launch drl_grasping evaluate.launch.py seed:=77 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-Octree-Gazebo-v0/logs reward_log:=/root/drl_grasping_training/evaluate/Grasp-Octree-Gazebo-v0 stochastic:=false n_episodes:=200 load_best:=false enable_rviz:=true log_level:=warn
Optimize Hyperparameters

The default hyperparameters for training agents with TD3, SAC and TQC can be found under the hyperparams directory. Optuna can be employed to autotune some of these parameters. To get started, run the following example. By default, headless mode is used during hyperparameter optimization to reduce computational load.

ros2 run drl_grasping ex_optimize.bash

After running the example script, the underlying ros2 launch drl_grasping optimize.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping optimize.launch.py seed:=69 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/tensorboard_logs n_timesteps:=1000000 sampler:=tpe pruner:=median n_trials:=20 n_startup_trials:=5 n_evaluations:=4 eval_episodes:=20 log_interval:=-1 enable_rviz:=true log_level:=fatal

Citation

Please use the following citation if you use drl_grasping in your work.

@inproceedings{orsula_learning_2022,
  author    = {Andrej Orsula and Simon B{\o}gh and Miguel Olivares-Mendez and Carol Martinez},
  title     = {{Learning} to {Grasp} on the {Moon} from {3D} {Octree} {Observations} with {Deep} {Reinforcement} {Learning}},
  year      = {2022},
  booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages     = {4112--4119},
  doi       = {10.1109/IROS47612.2022.9981661}
}

Directory Structure

.
├── drl_grasping/        # [dir] Primary Python module of this project
│   ├── drl_octree/      # [dir] Submodule for end-to-end learning from 3D octree observations
│   ├── envs/            # [dir] Submodule for environments
│   │   ├── control/     # [dir] Interfaces for the control of agents
│   │   ├── models/      # [dir] Functional models for simulation environments
│   │   ├── perception/  # [dir] Interfaces for the perception of agents
│   │   ├── randomizers/ # [dir] Domain randomization of the simulated environments
│   │   ├── runtimes/    # [dir] Runtime implementations of the task (sim/real)
│   │   ├── tasks/       # [dir] Implementation of tasks
│   │   ├── utils/       # [dir] Environment-specific utilities used across the submodule
│   │   └── worlds/      # [dir] Minimal templates of worlds for simulation environments
│   └── utils/           # [dir] Submodule for training and evaluation scripts boilerplate (using SB3)
├── examples/            # [dir] Examples for training and evaluating RL agents
├── hyperparams/         # [dir] Default hyperparameters for training RL agents
├── launch/              # [dir] ROS 2 launch scripts that can be used to interact with this repository
├── pretrained_agents/   # [dir] Collection of pre-trained agents
├── rviz/                # [dir] RViz2 config for visualization
├── scripts/             # [dir] Helpful scripts for training, evaluation and other utilities
├── CMakeLists.txt       # Colcon-enabled CMake recipe
└── package.xml          # ROS 2 package metadata

drl_grasping's People

Contributors: andrejorsula
drl_grasping's Issues

Make sure random objects don't get spawned inside each other, while being spawned as close to the ground plane as possible

Not a critical problem as such, but it significantly slows down the simulation if an intersection between two collision geometries occurs.

Prior to anything else, try using https://github.com/dartsim/dart instead of the Ignition fork. Things might explode, but that might be okay. Otherwise, fix appropriately.

  • Try https://github.com/dartsim/dart
    • If not satisfactory, fix this issue correctly (there are currently only bad ways that I can think of, so investigate it more)

EPIC: Future works

A list of future improvements with short descriptions. These are currently not part of Design and decisions (#50). These issues might be created and worked on if time allows. However, their priority is lower than any open issues, as they are likely out of scope for this project.

Parallel environments

Make environment more scalable.

Data augmentation

Data augmentation can go a long way - https://arxiv.org/pdf/2004.14990.pdf

Adding noise to observations is simple. More complex augmentation might be a bit difficult with the octree structure – what data should be augmented, e.g. the RGB + depth image or the point cloud?

Control force of grasps

The gripper currently applies the same force to all objects. However, this force might be excessive for some objects and could break them in real life. For example, this problem could be formulated as an energy minimisation problem and incorporated into the reward function.

Random position of scene (ground plane + object)

An agent trained in an environment where the position of the ground plane is randomized might generalise better and allow Sim2Real transfer even if the real setup does not match the simulation environment. The use of octrees (transformed into the robot coordinate frame) allows this feature.

Extend octree to cover the entire reachable workspace of the robot

With this addition, no extra information about the origin of the octree would need to be included, as the pose of all observed surfaces would already have a spatial encoding. Therefore, the agent should be able to grasp objects located anywhere within its reach (if properly generalised). This would obviously require much more training, computation and memory (since the depth of the octree would need to be increased further).

Geometry of ground plane (heightmap)

Variety in the geometry of the ground plane further improves generalisation. Currently, only a horizontal flat plane is used.

Let agent terminate episode

Semantic grasping

A policy that grasps one specific object, e.g. based on an extra observation of the target object's position given as a goal (with a correspondingly modified reward). Grasping a specific object class from clutter would also be quite useful.

Directional (ray-based) grasping

A policy that grasps object(s) from one specific direction (or area) instead of a random one, e.g. grasping a tool by its handle. For example, a ray could be given as an extra observation to indicate the goal, providing more reward the closer the grasp matches it.

EPIC: Design and decisions

This is the current design that contains the major decisions for the project. Additional future work and improvements that are not part of this design are listed in #51 and might eventually be included in the implementation if time allows.


Setup

Reproducible setup in a simulation and test setup in real life. Ignition Gazebo was selected as the robotics simulator and is used for training the RL agent.

  • Robotic arm
  • RGB-D camera
  • Objects
    • Variety of shapes, sizes and materials
    • Placed randomly on top of horizontal surface
    • Ignition models - Google Scanned Objects collection
    • Real-life test objects - to be decided (something random)

Task

Grasping in its simplest form can be conceptually decoupled into the following sub-routines. The RL agent should aspire to learn steps 1-3, with the 4th step being determined by the surrounding application, e.g. success or a maximum number of steps. The agent can learn additional steps through exploration, e.g. pushing and pulling objects in order to create better grasping conditions.

  1. Move end effector (gripper) to pre-grasp pose
    • Pose must be determined from sensory observations
  2. Close gripper
  3. Lift object above the supporting surface
    • Make sure the grasp is secure
  4. Terminate and allow other tasks/processes to execute
    • Outside of scope for agent's policy, but it needs to be determined both in simulation and real-life

These stages loosely inspired the stages used in curriculum learning, see #62.


Control loop

  • The main control loop of the agent runs at a relatively low frequency (~2.5 Hz)
  • Low-level controllers, e.g. joint PIDs, and sensors, e.g. the RGB-D camera, run at a higher update rate than the agent (~200 Hz for control, >=15 Hz for sensing)
  1. Get observations
  2. Predict actions
  3. Execute actions (simultaneously)
    • Move arm to the new configuration
    • Execute gripper action
      • This action might be executed much faster than arm movement
  4. Repeat until termination (success or max steps)

Another approach would be to decompose the task into sensing, planning and execution (e.g. the robot action would consist only of a grasp pose and everything else would be performed outside the agent's policy), or to remove the gripper action from the control loop and perform the grasp once the episode is terminated or a certain Z position is reached, e.g. https://arxiv.org/pdf/1802.10264.pdf.
However, 'dynamic' closed-loop control was selected as it more closely resembles how humans grasp.
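A conceptual sketch of this closed-loop control (illustrative only, with a hypothetical policy callable and the classic Gym step convention) is shown below.

import time

CONTROL_PERIOD = 1.0 / 2.5  # the agent's control loop runs at ~2.5 Hz

def run_episode(env, policy, max_steps: int = 100):
    observation = env.reset()
    for _ in range(max_steps):
        start = time.time()
        action = policy(observation)  # 1.-2. get observations and predict actions
        observation, reward, done, info = env.step(action)  # 3. execute arm + gripper actions
        if done:  # 4. repeat until termination (success or max steps)
            break
        # Pace the loop on a real robot; in simulation, stepping is synchronous anyway
        time.sleep(max(0.0, CONTROL_PERIOD - (time.time() - start)))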


RL Algorithm


Actions

List of actions that the agent is allowed to take, which must provide the ability to accomplish the task successfully. All of these are part of a single action-space vector (a decoding sketch is given at the end of this section).

End effector pose

  • Position
    • Absolute/Relative
      • Absolute (direct target, in world frame)
      • Relative (relative target, in end effector frame)
        • Selected because it is much more popular in literature and can be specified with normalised limits
  • Orientation
    • Absolute/Relative
      • Absolute
      • Relative
    • Number of DOF - 1D (around Z) || full 3D
      1. Use orientation only around Z at first (more popular in literature)
      2. Then try to use full 3D (Note: Implemented, but not used yet... might try later)
    • Representation (3D)
      • Quaternion
      • Rotation matrix
      • "6D representation"
Position (Relative)
  • Dimension
    • (x, y, z)
  • Limits
    • Normalised [-1, 1]
    • Scaled into smaller metric steps before use, e.g. [-0.1 m, 0.1 m]
Orientation (Relative)
  • Dimension
    • z || (x, y, z, w) || R[3x3] || v1(3x1), v2(3x1)
  • Limits
    • Normalised [-1, 1]
    • Converted into normalised quaternion before use (no matter the original representation)

Gripper

  • Absolute/Relative
    • Absolute (desired gripper state)
      • This seems to be preferred in literature, i.e. a binary signal for open/close
    • Relative (change in gripper state)
Gripper (Absolute)
  • Signals
    • Action
      • Open/close
    • Force (optional, not needed in simplified case - use max, added to Future Works (FW) #51)
    • Width (optional, not needed in simplified case - use min/max)
      • Applicable only when closing
  • Dimension
  • Limits
    • Normalised [-1, 1]
    • Remapped to account for limits of the signals, e.g. max force
Gripper (Relative)
  • Signals
    • Width
    • Force (optional, not needed in simplified case - use max)
      • Applicable only when closing, i.e. width < 0
  • Dimension
    • Scalar for each signal
  • Limits
    • Normalised [-1, 1]
    • Remapped to account for limits of the signals, e.g. min/max width
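Purely as an illustration of how such a normalised action vector could be decoded (the actual mapping lives in the task implementations, and the vector layout below is an assumption), the relative position/orientation/gripper scheme might look like this:

import numpy as np

MAX_POSITION_STEP = 0.1  # metres per control step (example scaling of the normalised limits)

# Hypothetical layout: [dx, dy, dz, qx, qy, qz, qw, gripper], all in [-1, 1].
def decode_action(action: np.ndarray):
    position_delta = action[:3] * MAX_POSITION_STEP  # scaled relative translation
    quaternion = action[3:7]
    norm = np.linalg.norm(quaternion)
    quaternion = quaternion / norm if norm > 0.0 else np.array([0.0, 0.0, 0.0, 1.0])
    gripper_closed = action[7] < 0.0  # binary open/close from the normalised signal
    return position_delta, quaternion, gripper_closed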

Observations

Octree of the scene

  • Constructed from point cloud
    • This point cloud needs to be
      • Transformed into robot coordinate frame
      • Cropped to a bounding-box with consistent size (preferable aspect ratio of 1:1:1)
        • For now, restrict this bounding-box to a volume above ground plane
        • It should be possible to extend this to the entire reachable workspace of the robot (added to FW #51)
  • Features (more than one can be used)
    • Normals
      • (x, y, z)
    • Distance to average point position (w.r.t. octant centre)
    • Color
      • (r, g, b)
  • Extra?
    • Position of octree centre/corner w.r.t robot base frame (added to FW #51)
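A generic sketch of this pre-processing (transform the point cloud into the robot frame, then crop it to a fixed bounding box) could look as follows; it is not the repository's perception code.

import numpy as np

# Illustrative point-cloud pre-processing before octree construction.
def preprocess_point_cloud(points_camera: np.ndarray,   # Nx3 points in the camera frame
                           camera_to_robot: np.ndarray, # 4x4 homogeneous transform
                           bbox_min: np.ndarray,
                           bbox_max: np.ndarray) -> np.ndarray:
    # Transform into the robot coordinate frame
    points_h = np.hstack([points_camera, np.ones((points_camera.shape[0], 1))])
    points_robot = (camera_to_robot @ points_h.T).T[:, :3]
    # Crop to a consistent bounding box (1:1:1 aspect ratio) above the ground plane
    mask = np.all((points_robot >= bbox_min) & (points_robot <= bbox_max), axis=1)
    return points_robot[mask]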

End effector pose

  • Position
  • Orientation
    • Representation
      • Quaternion
      • Rotation matrix
      • "6D representation"
Position
  • Dimension
    • (x, y, z)
  • Not normalised
Orientation
  • Dimension
    • v1(3x1), v2(3x1)
  • Normalised

Gripper state

  • Width
    • Normalised
    • This will be identical to state (open/close) if binary actions are used

Reward function

Ongoing epic: #41

  • Dense/Sparse
    • Dense (shaped)
    • Sparse (shaped)

Sparse (shaped)

reward multiplier r (currently r = 4.0)

  • +r^0 for reaching object (within 10cm)
  • +r^1 for touching object
  • +r^2 for grasping object
  • +r^3 for lifting object (above 15cm, terminates episode)
  • -1.0 for touching ground (terminates episode)
  • 0.0 for pushing all objects outside of workspace (terminates episode)
  • -0.005 for all time steps
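With r = 4.0, the stages above are worth 1.0, 4.0, 16.0 and 64.0 respectively. A direct translation of this schedule into code (a sketch of the shaping only, not the GraspCurriculum source) is shown below.

R = 4.0  # reward multiplier

def shaped_reward(reached, touched, grasped, lifted, touched_ground, all_objects_outside_workspace):
    if touched_ground:
        return -1.0, True   # terminates episode
    if all_objects_outside_workspace:
        return 0.0, True    # terminates episode
    reward, done = -0.005, False  # small penalty for every time step
    if reached:
        reward += R ** 0    # +1.0 for reaching the object (within 10 cm)
    if touched:
        reward += R ** 1    # +4.0 for touching the object
    if grasped:
        reward += R ** 2    # +16.0 for grasping the object
    if lifted:
        reward += R ** 3    # +64.0 for lifting the object (above 15 cm)
        done = True         # terminates episode
    return reward, done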

Policy (network architecture)

Currently, using depth=4 and full_depth=2

Feature Extractor (shared between actor and critics):

  • Octree:
    • [Conv3D -> (BatchNorm) -> ReLU] * (depth-full_depth) -> Conv1D/Conv3D -> Octree2Voxel -> Flatten -> Linear
    • BatchNorm is optional
    • Conv1D is the default, but it can be replaced by Conv3D by setting the corresponding argument
    • See the source code of OctreeCnnFeaturesExtractor for more details, e.g. the number of channels.
  • Auxiliary (proprioceptive) observations:
    • Linear
  • Octree and auxiliary features are concatenated together

Actor and Critics:

  • [Linear -> ReLU] * 2
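The shared actor/critic head is thus a plain two-layer MLP; a generic PyTorch equivalent of the [Linear -> ReLU] * 2 block (the hidden width here is an assumption) is:

from torch import nn

# Generic [Linear -> ReLU] * 2 head; the hidden width is a placeholder.
def make_head(input_dim: int, hidden_dim: int = 256) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    )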

Domain randomisation

Currently, the following domain randomisation can be applied in the simulation:

  • Random objects from Google Scanned Objects collection (80 objects for training, 20 for testing)
    • Random mass (with limits)
    • Random surface friction (with limits)
    • Random pose (above ground plane)
  • Random texture of the ground plane (80 textures for training, 20 for testing)
  • Random initial configuration for the robot
  • Random perspective of the camera

Decide what kind of 3D data representation and corresponding network architecture to use for observations/policy

Considered options:

  • Voxel grid
  • Point cloud directly
  • Octree
    • https://github.com/microsoft/O-CNN
    • Construct the octree structure and use features of the finest leaf octants as input into the CNN (an averaged normal vector is used in the paper, but other features can be used, e.g. RGB). 3D convolution and pooling modules (which up-scale octants to a higher depth) can then be applied to these features only, improving performance and lowering memory requirements compared to processing the entire voxel grid volume.

EPIC: Reward function for grasping environment

From literature:

Sparse

Simple sparse reward
Sparse reward with extra guidance
  • https://arxiv.org/pdf/2007.04499.pdf
    • +10 for successful grasp (gripper holding object above certain height)
    • +1 (partial reward) if gripper comes in contact with the object
    • -1 if no action is taken
    • -0.025 for all time steps prior to termination
  • https://arxiv.org/pdf/1806.10293.pdf
    • +1 if the gripper contains an object and is above a certain height
    • −0.05 for all steps prior to termination (max 20 steps, but agent is allowed to terminate earlier)
    • At the beginning, this paper provides demonstrations of grasps from other algorithms, achieving a 15-30% success rate. Then they use only the learned policy + exploration once 50% success is reached.

Dense

These don't seem to be that common in the literature, and those that are present are usually quite complex.

  • https://arxiv.org/pdf/1910.09470.pdf (very complex reward)
    • Reach reward (upon being close to the object, e.g. 0.01 m) +[0.0, 0.01]
    • Touch reward +1 upon contact with the object
    • Lift reward +1 (when above a certain threshold)
    • Task-specific reward (stacking objects)
    • All rewards are summed (specific function is used)
Other dense reward formulations
  • Increase in the object's height (final goal)
    • The reward could increase linearly as the actor lifts an object relative to its original position. This could be further restricted by requiring contact between the robot fingers and the object (such that the actor holds the object rather than throwing it).
  • Touching an object (helping step)
    • A small reward could be added for each time step the robot touches an object with both fingers.

Guiding the agent

Demonstration of success

Add samples of success to the replay buffer at the beginning of training. These could come either from manual demonstrations or from another algorithm (could also be pre-planned).

Curriculum learning

Slowly increase the overall difficulty of the task, e.g. start with one object and a restricted operating area.

Eliminate the need to run unpaused steps during each reset

Unpaused steps are executed such that:

  • Camera gets the latest observation
    • This could be improved by waiting until the camera publishes a new message
  • Robot reaches its starting configuration
    • Using a controller to reach the pose because resetting joint positions causes a segfault
    • I currently have no suggestion/solution to this. The segfault must be fixed (#28)

Integrate MoveIt2 with Ignition

Selected MoveIt2 over MoveIt after the following considerations:

  • There is currently no existing integration of Ignition with either MoveIt or MoveIt2. Therefore, there is no direct benefit to MoveIt.
  • The migration to MoveIt2 has progressed quite far
    • No extensive examples/documentation yet though...
    • The major disadvantage is currently the missing integration with ros2_control. As far as I can see, there is only a fake controller + a simple controller manager. The new state of ros2_control also means that there is currently no interface for the real Panda yet, see #16.
  • Overall preference to use ROS2

Integration plan

The long-term solution seems to be the creation of ign_ros2_control (which does not exist yet), similar to gazebo_ros2_control. However, ros2_control is still under very active development, and examples/demos of using the interface (not just implementing the interface) are missing. This is currently far beyond my competences to look into.

Therefore, I am looking into the following alternative that conceptually seems simpler.

  • Create a ROS2 node with MoveIt2 backend that provides
    • JointTrajectory messages for each plan
    • C++ example/template
    • Python module providing MoveIt2 interface (substitute for moveit_commander)
    • Python example
  • Create Ignition controller plugin joint_trajectory_controller
    • This plugin should take the Ignition alternative of JointTrajectory (needs to be created) and follow it (look into similar projects for inspiration)
    • Create JointTrajectory protobuf for Ignition
  • Bridge JointTrajectory between ROS2 msg and Ignition protobuf

Additional links/info (might not be relevant):

Certain order of python imports causes segmentation fault (gym-ignition)

From what I have noticed so far:

  • open3d must be imported before gym-ignition
  • torch (tensorflow) must be imported before gym-ignition

It seems to be connected to the protobuf version (some import orders gave an error before the segfault, where protobuf 3.6 was found but 3.9 was required).

Reordering the imports solves the problem; therefore, this issue is low-priority (arctic icebox).
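In practice, the workaround simply means keeping a safe import order at the top of every entry point, e.g.:

# Safe import order observed so far (module names as typically imported).
import open3d        # noqa: F401  must come before gym_ignition
import torch         # noqa: F401  must come before gym_ignition
import gym_ignition  # noqa: F401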

Setup virtual RGB-D camera

Basic setup

  • Add to SDF

Communication with ROS2

  • RGB image in ROS2
  • Depth image in ROS2
  • PointCloud in ROS2

Others

  • Setup transformations within ROS2
  • Investigate camera params, e.g. noise

Mesh/texture (material) memory leak

Removing a model does not free all of its memory; there is a memory leak related to meshes/textures (material). This introduces a limitation to domain randomisation and eventually causes training to fail. The current workaround is to limit the frequency of adding/removing models in order to postpone the eventual crash due to running out of memory.

  • Encountered using ogre2
    • Meshes with textures
      • Noticed with obj+mtl
    • Primitives with textures
      • Noticed with metallic PBR pipeline


  • Investigate further (ign-rendering)
  • Try to find a solution (if time allows it)
