andrejorsula / drl_grasping

Deep Reinforcement Learning for Robotic Grasping from Octrees

Home Page: https://arxiv.org/pdf/2208.00818

License: BSD 3-Clause "New" or "Revised" License

CMake 0.27% Python 96.12% Dockerfile 1.21% Shell 2.41%
robotics grasping reinforcement-learning octree domain-randomization sim2real ros2 gym-ignition stable-baselines3 deep-reinforcement-learning

drl_grasping's Introduction

Deep Reinforcement Learning for Robotic Grasping from Octrees

This project focuses on applying deep reinforcement learning to acquire a robust policy that allows robots to grasp diverse objects from compact 3D observations in the form of octrees.

Evaluation of a trained policy on novel scenes (previously unseen camera poses, objects, terrain textures, ...).

Sim-to-Real transfer of a policy trained solely inside a simulation (zero-shot transfer). Credit: Aalborg University

Evaluation of a trained policy for grasping rocks on the Moon inside a simulation.

Sim-to-Real transfer in a Moon-analogue facility (zero-shot transfer). Credit: University of Luxembourg

Overview


This repository contains multiple RL environments for robotic manipulation, focusing on robotic grasping using continuous actions in Cartesian space. All environments have several observation variants that enable direct comparison (RGB images, depth maps, octrees, ...). Each task is coupled with a simulation environment that can be used to train RL agents. These agents can subsequently be evaluated on real robots that integrate ros2_control (or ros_control via ros1_bridge).

End-to-end model-free actor-critic algorithms have been tested on these environments (TD3, SAC and TQC | SB3 PyTorch implementation). A setup for experimenting with a model-based algorithm (DreamerV2 | original TensorFlow implementation) is also provided; however, it is currently limited to RGB image observations. Interoperability of the environments with most algorithms and their implementations should be possible thanks to compatibility with the Gym API.
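Because the environments follow the Gym API, a random-action rollout can be scripted in a few lines of Python. The snippet below is a minimal sketch: it assumes that importing the drl_grasping module registers the environments listed further down, and it uses the classic Gym step/reset convention, which may differ slightly depending on the installed Gym version.

import gym

import drl_grasping  # noqa: F401  (assumed to register the environments on import)

env = gym.make("Reach-Octree-Gazebo-v0")
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # random action from the defined action space
    observation, reward, done, info = env.step(action)
env.close()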

List of Environments

Below is the list of implemented environments. Each environment (observation variant) has two alternatives, Task-Obs-vX and Task-Obs-Gazebo-vX (omitted from the list below). Here, Task-Obs-vX implements the logic of the environment and can be used on real robots, whereas Task-Obs-Gazebo-vX combines this logic with the simulation environment inside Gazebo. Robots should be interchangeable for the most part, with some limitations (e.g. the GraspPlanetary task requires a mobile manipulator to fully randomize the environment).

If you are interested in configuring these environments, first take a look at the list of their parameters inside Gym registration and then at their individual source code.

  • Reach – Reach the end-effector goal: Reach-v0 (state obs), Reach-ColorImage-v0, Reach-DepthImage-v0, Reach-Octree-v0, Reach-OctreeWithIntensity-v0, Reach-OctreeWithColor-v0
  • Grasp – Grasp and lift a random object: Grasp-v0 (state obs), Grasp-Octree-v0, Grasp-OctreeWithIntensity-v0, Grasp-OctreeWithColor-v0
  • GraspPlanetary – Grasp and lift a Moon rock: GraspPlanetary-v0 (state obs), GraspPlanetary-MonoImage-v0, GraspPlanetary-ColorImage-v0, GraspPlanetary-DepthImage-v0, GraspPlanetary-DepthImageWithIntensity-v0, GraspPlanetary-DepthImageWithColor-v0, GraspPlanetary-Octree-v0, GraspPlanetary-OctreeWithIntensity-v0, GraspPlanetary-OctreeWithColor-v0

By default, the Grasp and GraspPlanetary tasks utilize GraspCurriculum, which shapes their reward function and environment difficulty.
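The exact behaviour of GraspCurriculum is defined in its source code. Purely as a conceptual illustration (not the repository's implementation), a success-rate-driven curriculum can be sketched as follows: the difficulty advances to the next stage once the recent success rate exceeds a threshold.

# Conceptual sketch of a success-rate-driven curriculum (illustrative only).
class SimpleCurriculum:
    def __init__(self, stages, success_threshold=0.6, window=50):
        self.stages = stages  # e.g. an increasing number/variety of objects per stage
        self.success_threshold = success_threshold
        self.window = window
        self.recent_successes = []
        self.stage = 0

    def update(self, episode_success: bool):
        self.recent_successes = (self.recent_successes + [episode_success])[-self.window:]
        success_rate = sum(self.recent_successes) / len(self.recent_successes)
        # Advance to a harder stage once the agent is consistently successful
        if success_rate >= self.success_threshold and self.stage < len(self.stages) - 1:
            self.stage += 1
            self.recent_successes = []
        return self.stages[self.stage]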

Domain Randomization

To facilitate the sim-to-real transfer of trained agents, the simulation environments introduce domain randomization with the aim of improving the generalization of learned policies. This randomization is accomplished via ManipulationGazeboEnvRandomizer, which populates the virtual world and enables randomization of several properties at each reset of the environment. As this randomizer is configurable with numerous parameters, please take a look at the source code to see what environments you can create.
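As a rough, generic illustration of what happens on each reset (this is not the actual interface of ManipulationGazeboEnvRandomizer, and the config keys are hypothetical), per-reset domain randomization boils down to re-sampling scene properties from configured ranges:

import random

# Generic illustration of per-reset domain randomization (hypothetical config keys).
def randomize_scene(rng: random.Random, config: dict) -> dict:
    return {
        "object_count": rng.randint(*config["object_count_range"]),
        "object_scale": rng.uniform(*config["object_scale_range"]),
        "surface_friction": rng.uniform(*config["friction_range"]),
        "ground_texture": rng.choice(config["ground_textures"]),
        "camera_position": [rng.uniform(lo, hi) for lo, hi in config["camera_position_bounds"]],
    }

config = {
    "object_count_range": (1, 4),
    "object_scale_range": (0.05, 0.25),
    "friction_range": (0.5, 1.5),
    "ground_textures": ["moon_regolith", "gravel", "sand"],
    "camera_position_bounds": [(-0.5, 0.5), (-0.5, 0.5), (0.3, 1.0)],
}
print(randomize_scene(random.Random(42), config))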

Examples of domain randomization for the Grasp task.

Examples of domain randomization for the GraspPlanetary task.

Model Datasets

Simulation environments in this repository can utilize datasets of any SDF models, e.g. models from Fuel. By default, the Grasp task uses the Google Scanned Objects collection together with a set of PBR textures pointed to by the TEXTURE_DIRS environment variable. In contrast, the GraspPlanetary task employs custom models that are procedurally generated via Blender. However, this can be adjusted if desired.

All external models can be automatically configured and randomized in several ways via ModelCollectionRandomizer before their insertion into the world, e.g. optimization of collision geometry, estimation of (randomized) inertial properties, and randomization of parameters such as geometry scale or surface friction. When processing large collections, model filtering can also be enabled based on several aspects, such as the complexity of the geometry or the existence of disconnected components. A few scripts for managing datasets can be found under the scripts/utils/ directory.
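As a standalone illustration of this kind of filtering (not the code of ModelCollectionRandomizer), a mesh could be rejected based on its triangle count or the number of disconnected components, e.g. with trimesh; the thresholds below are arbitrary placeholders.

import trimesh

# Illustrative mesh filter; thresholds and criteria are placeholders.
def is_mesh_acceptable(path: str, max_faces: int = 40000, max_components: int = 1) -> bool:
    mesh = trimesh.load(path, force="mesh")
    if len(mesh.faces) > max_faces:
        return False  # geometry too complex
    if len(mesh.split(only_watertight=False)) > max_components:
        return False  # disconnected components present
    return True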

End-to-End Learning from 3D Octree Observations

This project initially investigated how 3D visual observations can be leveraged to improve end-to-end learning of manipulation skills. Octrees were selected for this purpose due to their efficiently organized structure compared to other 3D representations.

To enable the extraction of abstract features from 3D octree observations, an octree-based 3D CNN is employed. The network module that accomplishes this feature extraction is implemented as OctreeCnnFeaturesExtractor (PyTorch). This feature extractor is part of the OctreeCnnPolicy policy implemented for the TD3, SAC and TQC algorithms. Internally, the feature extractor utilizes the O-CNN implementation to benefit from hardware acceleration on NVIDIA GPUs.
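In SB3, such a feature extractor plugs into the policy via policy_kwargs. The skeleton below shows only the standard SB3 mechanism with a placeholder MLP; the actual OctreeCnnFeaturesExtractor uses octree-specific convolutions from O-CNN instead of the layers shown here.

import torch
from torch import nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

# Skeleton of a custom SB3 features extractor (placeholder MLP, not the octree CNN).
class PlaceholderFeaturesExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space, features_dim: int = 256):
        super().__init__(observation_space, features_dim)
        n_input = int(torch.prod(torch.tensor(observation_space.shape)))
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(n_input, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        return self.net(observations)

# The extractor is then passed to an off-policy algorithm such as SAC/TD3/TQC, e.g.:
# model = SAC("MlpPolicy", env, policy_kwargs=dict(features_extractor_class=PlaceholderFeaturesExtractor))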

Illustration of the end-to-end actor-critic network architecture with octree-based 3D CNN feature extractor.

Limitations

The known limitations of this repository are listed below for your convenience.

  • No parallel environments – It is currently not possible to run multiple instances of the environment simultaneously.
  • Slow training – The simulation environments are computationally complex (physics, rendering, underlying low-level control, ...). This significantly impacts the ability to train agents with time and computational constraints. The performance of some of these aspects can be improved at the cost of accuracy and realism (e.g. physics_rate/step_size).
  • Suboptimal hyperparameters – Although a hyperparameter optimization framework was employed for some combinations of environments and algorithms, it is a prolonged process. This problem is exacerbated by the vast quantity of hyperparameters and their general brittleness. Therefore, the default hyperparameters provided in this repository might not be optimal.
  • Nondeterministic – Experiments are not fully repeatable, and even the same seed of the pseudorandom generator can lead to different results. This is caused by several aspects, such as the nondeterministic nature of network-based communication and non-determinism in the underlying deep learning frameworks and hardware.

Instructions

Setup-wise, there are two options for using this repository. Option A – Docker is recommended when trying out this repository due to its simplicity. Otherwise, Option B – Local Installation can be used if a local setup is preferred. Both options are equivalent in terms of usage; however, the pre-built Docker images already include all required datasets and provide isolation between runs.

Option A – Docker

Hardware Requirements

  • CUDA GPU – CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

Install Docker

First, ensure your system is set up to use Docker with NVIDIA GPUs. You can use the install_docker_with_nvidia.bash installation script for Debian-based distributions. Alternatively, consult the NVIDIA Container Toolkit Installation Guide for other Linux distributions.

# Execute script inside a cloned repository
.docker/host/install_docker_with_nvidia.bash
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/host/install_docker_with_nvidia.bash)"

Clone a Prebuilt Docker Image

Prebuilt Docker images of drl_grasping can be pulled directly from Docker Hub without needing to build them locally. You can use the following command to manually pull the latest image or one of the previously tagged Releases. The average image size is 25 GB (including datasets).

docker pull andrejorsula/drl_grasping:${TAG:-latest}

(Optional) Build a New Image

It is also possible to build the Docker image locally using the included Dockerfile. To do this, the build.bash script can be executed as shown below (arguments are optional). This script will always print the corresponding low-level docker build ... command for your reference.

.docker/build.bash ${TAG:-latest} ${BUILD_ARGS}

Run a Docker Container

For simplicity, please run drl_grasping Docker containers using the included run.bash script shown below (arguments are optional). It enables NVIDIA GPUs and the GUI interface while automatically mounting the necessary volumes (e.g. persistent logging) and setting environment variables (e.g. synchronization of middleware communication with the host). This script will always print the corresponding low-level docker run ... command for your reference.

# Execute script inside a cloned repository
.docker/run.bash ${TAG:-latest} ${CMD}
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/run.bash)" -- ${TAG:-latest} ${CMD}

The network communication of drl_grasping within this Docker container is configured based on the ROS 2 ROS_DOMAIN_ID environment variable, which can be set via ROS_DOMAIN_ID={0...101} .docker/run.bash ${TAG:-latest} ${CMD}. By default (ROS_DOMAIN_ID=0), external communication is restricted and multicast is disabled. With ROS_DOMAIN_ID=42, the communication remains restricted to localhost with multicast enabled, allowing communication to be monitored outside the container but within the same system. Using ROS_DOMAIN_ID=69 will use the default network interface and multicast settings, which can enable monitoring of communication within the same LAN. All other ROS_DOMAIN_IDs share the default behaviour and can be employed for communication partitioning when running multiple drl_grasping instances.

Option B – Local Installation

Hardware Requirements

  • CUDA GPU – CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

Dependencies

Ubuntu 20.04 (Focal Fossa) is the recommended OS for local installation. Other Linux distributions might work but require most dependencies to be built from source.

These are the primary dependencies required to use this project that must be installed on your system.

All additional dependencies are either pulled via vcstool (drl_grasping.repos) or installed via pip (python_requirements.txt) and rosdep during the building process below.

Building

Clone this repository recursively and import VCS dependencies. Then install dependencies and build with colcon.

# Clone this repository into your favourite ROS 2 workspace
git clone --recursive https://github.com/AndrejOrsula/drl_grasping.git
# Install Python requirements
pip3 install -r drl_grasping/python_requirements.txt
# Import dependencies
vcs import < drl_grasping/drl_grasping.repos
# Install dependencies
IGNITION_VERSION=fortress rosdep install -y -r -i --rosdistro ${ROS_DISTRO} --from-paths .
# Build
colcon build --merge-install --symlink-install --cmake-args "-DCMAKE_BUILD_TYPE=Release"

Sourcing

Before utilizing this project via local installation, remember to source the ROS 2 workspace.

source install/local_setup.bash

This enables:

  • Use of drl_grasping Python module
  • Execution of binaries, scripts and examples via ros2 run drl_grasping <executable>
  • Launching of setup scripts via ros2 launch drl_grasping <launch_script>
  • Discoverability of shared resources
Test Random Agents

A good starting point is to simulate some episodes using random agents where actions are sampled from the defined action space. This is also useful when modifying environments because it lets you analyze the consequences of actions and resulting observations without deep learning pipelines running in the background. To get started, run the following example. It should open RViz 2 and Gazebo client instances that provide you with visual feedback.

ros2 run drl_grasping ex_random_agent.bash

After running the example script, the underlying ros2 launch drl_grasping random_agent.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping random_agent.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-Octree-Gazebo-v0 check_env:=false render:=true enable_rviz:=true log_level:=warn
Train New Agents

You can also train your agents from scratch. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

ros2 run drl_grasping ex_train.bash

After running the example script, the underlying ros2 launch drl_grasping train.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping train.launch.py seed:=42 robot_model:=panda env:=Grasp-OctreeWithColor-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/tensorboard_logs save_freq:=10000 save_replay_buffer:=true log_interval:=-1 eval_freq:=10000 eval_episodes:=20 enable_rviz:=false log_level:=fatal

Remote Visualization

To visualize the agent while training, separate RViz 2 and Gazebo client instances can be opened. For the Docker setup, these commands can be executed in a new drl_grasping container with the same ROS_DOMAIN_ID.

# RViz 2 (Note: Visualization of robot model will not be loaded using this approach)
rviz2 -d $(ros2 pkg prefix --share drl_grasping)/rviz/drl_grasping.rviz
# Gazebo client
ign gazebo -g

TensorBoard

TensorBoard logs will be generated during training in a directory specified by the tensorboard_log:=${TENSORBOARD_LOG} argument. You can open them in your web browser using the following command.

tensorboard --logdir ${TENSORBOARD_LOG}

(Experimental) Train with Dreamer V2

You can also try to train some agents using the model-based Dreamer V2 algorithm. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

ros2 run drl_grasping ex_train_dreamerv2.bash

After running the example script, the underlying ros2 launch drl_grasping train_dreamerv2.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping train_dreamerv2.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-ColorImage-Gazebo-v0 log_folder:=/root/drl_grasping_training/train/GraspPlanetary-ColorImage-Gazebo-v0/logs eval_freq:=10000 enable_rviz:=false log_level:=fatal
Evaluate New Agents

Once you train your agents, you can evaluate them. Start by looking at ex_evaluate.bash, which can be modified to fit your trained agent. It should open RViz 2 and Gazebo client instances that provide you with visual feedback, while the agent's performance will be logged and printed to STDOUT.

ros2 run drl_grasping ex_evaluate.bash

After running the example script, the underlying ros2 launch drl_grasping evaluate.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments. For example, you can select a specific checkpoint with the load_checkpoint:=${LOAD_CHECKPOINT} argument instead of running the final model.

ros2 launch drl_grasping evaluate.launch.py seed:=77 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-Octree-Gazebo-v0/logs reward_log:=/root/drl_grasping_training/evaluate/Grasp-Octree-Gazebo-v0 stochastic:=false n_episodes:=200 load_best:=false enable_rviz:=true log_level:=warn
Optimize Hyperparameters

The default hyperparameters for training agents with TD3, SAC and TQC can be found under the hyperparams directory. Optuna can be employed to autotune some of these parameters. To get started, run the following example. By default, headless mode is used during hyperparameter optimization to reduce computational load.

ros2 run drl_grasping ex_optimize.bash

After running the example script, the underlying ros2 launch drl_grasping optimize.launch.py ... command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

ros2 launch drl_grasping optimize.launch.py seed:=69 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/tensorboard_logs n_timesteps:=1000000 sampler:=tpe pruner:=median n_trials:=20 n_startup_trials:=5 n_evaluations:=4 eval_episodes:=20 log_interval:=-1 enable_rviz:=true log_level:=fatal

Citation

Please use the following citation if you use drl_grasping in your work.

@inproceedings{orsula_learning_2022,
  author    = {Andrej Orsula and Simon B{\o}gh and Miguel Olivares-Mendez and Carol Martinez},
  title     = {{Learning} to {Grasp} on the {Moon} from {3D} {Octree} {Observations} with {Deep} {Reinforcement} {Learning}},
  year      = {2022},
  booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages     = {4112--4119},
  doi       = {10.1109/IROS47612.2022.9981661}
}

Directory Structure

.
├── drl_grasping/        # [dir] Primary Python module of this project
│   ├── drl_octree/      # [dir] Submodule for end-to-end learning from 3D octree observations
│   ├── envs/            # [dir] Submodule for environments
│   │   ├── control/     # [dir] Interfaces for the control of agents
│   │   ├── models/      # [dir] Functional models for simulation environments
│   │   ├── perception/  # [dir] Interfaces for the perception of agents
│   │   ├── randomizers/ # [dir] Domain randomization of the simulated environments
│   │   ├── runtimes/    # [dir] Runtime implementations of the task (sim/real)
│   │   ├── tasks/       # [dir] Implementation of tasks
│   │   ├── utils/       # [dir] Environment-specific utilities used across the submodule
│   │   └── worlds/      # [dir] Minimal templates of worlds for simulation environments
│   └── utils/           # [dir] Submodule for training and evaluation scripts boilerplate (using SB3)
├── examples/            # [dir] Examples for training and evaluating RL agents
├── hyperparams/         # [dir] Default hyperparameters for training RL agents
├── launch/              # [dir] ROS 2 launch scripts that can be used to interact with this repository
├── pretrained_agents/   # [dir] Collection of pre-trained agents
├── rviz/                # [dir] RViz2 config for visualization
├── scripts/             # [dir] Helpful scripts for training, evaluation and other utilities
├── CMakeLists.txt       # Colcon-enabled CMake recipe
└── package.xml          # ROS 2 package metadata

drl_grasping's People

Contributors: andrejorsula
drl_grasping's Issues

Make sure random objects don't get spawned inside each other, while being spawned as close to the ground plane as possible

Not a critical problem as such, but it significantly slows down the simulation if an intersection between two collision geometries occurs.

Prior to anything else, try using https://github.com/dartsim/dart instead of the Ignition fork. Things might explode, but that might be okay. Otherwise, fix appropriately.

  • Try https://github.com/dartsim/dart
    • If not satisfactory, fix this issue correctly (there are currently only bad ways that I can think of, so investigate it more)

EPIC: Future works

A list of future improvements with short descriptions. These are currently not part of Design and decisions (#50). These issues might be created and worked on if time allows. However, their priority is lower than any open issues, as they are likely out of scope for this project.

Parallel environments

Make environment more scalable.

Data augmentation

Data augmentation can go a long way - https://arxiv.org/pdf/2004.14990.pdf

Adding noise to observations is simple. More complex augmentation might be a bit difficult with the octree structure – what data should be augmented, e.g. the RGB + depth image or the point cloud?

Control force of grasps

The gripper currently applies the same force to all objects. However, this force might be excessive for some objects and could break them in real life. For example, this problem could be formulated as an energy minimisation problem and incorporated into the reward function.

Random position of scene (ground plane + object)

An agent trained in an environment where the position of the ground plane is randomized might generalise better and allow Sim2Real transfer even if the real setup does not match the simulation environment. The use of octrees (transformed into the robot coordinate frame) allows this feature.

Extend octree to cover the entire reachable workspace of the robot

With this addition, no extra information about the origin of the octree would need to be included, as the pose of all observed surfaces would already have a spatial encoding. Therefore, the agent should be able to grasp objects located anywhere within its reach (if properly generalised). This would obviously require much more training, computation and memory (since the depth of the octree would need to be increased further).

Geometry of ground plane (heightmap)

Variety in the geometry of the ground plane further improves generalisation. Currently, only a horizontal flat plane is used.

Let agent terminate episode

Semantic grasping

A policy that grasps one specific object, e.g. based on an extra observation of the target object's position given as a goal (with a correspondingly modified reward). Grasping a specific object class from clutter would also be quite useful.

Directional (ray-based) grasping

A policy that grasps object(s) from one specific direction (or area) instead of a random one, e.g. grasping a tool by its handle. For example, a ray could be given as an extra observation to indicate the goal, providing more reward the closer the grasp matches it.

EPIC: Design and decisions

This is the current design that contains the major decisions for the project. Additional future work and improvements that are not part of this design are listed in #51 and might eventually be included in the implementation if time allows.


Setup

Reproducible setup in a simulation and test setup in real life. Ignition Gazebo was selected as the robotics simulator and is used for training the RL agent.

  • Robotic arm
  • RGB-D camera
  • Objects
    • Variety of shapes, sizes and materials
    • Placed randomly on top of horizontal surface
    • Ignition models - Google Scanned Objects collection
    • Real-life test objects - to be decided (something random)

Task

Grasping in its simplest form can be conceptually decoupled into the following sub-routines. The RL agent should aspire to learn steps 1-3, with the 4th step being determined by the surrounding application, e.g. success or a maximum number of steps. The agent can learn additional steps through exploration, e.g. pushing and pulling objects in order to create better grasping conditions.

  1. Move end effector (gripper) to pre-grasp pose
    • Pose must be determined from sensory observations
  2. Close gripper
  3. Lift object above the supporting surface
    • Make sure the grasp is secure
  4. Terminate and allow other tasks/processes to execute
    • Outside of scope for agent's policy, but it needs to be determined both in simulation and real-life

These stages loosely inspired the stages used in curriculum learning, see #62.


Control loop

  • The main control loop of the agent runs at a relatively low frequency (~2.5 Hz)
  • Low-level controllers, e.g. joint PIDs, and sensors, e.g. the RGB-D camera, run at a higher update rate than the agent (~200 Hz for control, >=15 Hz for sensing)
  1. Get observations
  2. Predict actions
  3. Execute actions (simultaneously)
    • Move arm to the new configuration
    • Execute gripper action
      • This action might be executed much faster than arm movement
  4. Repeat until termination (success or max steps)

Another approach would be to decompose the task into sensing, planning and execution (e.g. the robot action would consist only of a grasp pose and everything else would be performed outside the agent's policy), or to remove the gripper action from the control loop and perform the grasp once the episode is terminated or a certain Z position is reached, e.g. https://arxiv.org/pdf/1802.10264.pdf.
However, 'dynamic' closed-loop control was selected as it more closely resembles how humans grasp.
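A conceptual sketch of this closed-loop control (illustrative only, with a hypothetical policy callable and the classic Gym step convention) is shown below.

import time

CONTROL_PERIOD = 1.0 / 2.5  # the agent's control loop runs at ~2.5 Hz

def run_episode(env, policy, max_steps: int = 100):
    observation = env.reset()
    for _ in range(max_steps):
        start = time.time()
        action = policy(observation)  # 1.-2. get observations and predict actions
        observation, reward, done, info = env.step(action)  # 3. execute arm + gripper actions
        if done:  # 4. repeat until termination (success or max steps)
            break
        # Pace the loop on a real robot; in simulation, stepping is synchronous anyway
        time.sleep(max(0.0, CONTROL_PERIOD - (time.time() - start)))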


RL Algorithm


Actions

List of actions that the agent is allowed to take, which must provide the ability to accomplish the task successfully. All of these are part of a single action-space vector (a decoding sketch is given at the end of this section).

End effector pose

  • Position
    • Absolute/Relative
      • Absolute (direct target, in world frame)
      • Relative (relative target, in end effector frame)
        • Selected because it is much more popular in literature and can be specified with normalised limits
  • Orientation
    • Absolute/Relative
      • Absolute
      • Relative
    • Number of DOF - 1D (around Z) || full 3D
      1. Use orientation only around Z at first (more popular in literature)
      2. Then try to use full 3D (Note: Implemented, but not used yet... might try later)
    • Representation (3D)
      • Quaternion
      • Rotation matrix
      • "6D representation"
Position (Relative)
  • Dimension
    • (x, y, z)
  • Limits
    • Normalised [-1, 1]
    • Scaled into smaller metric steps before use, e.g. [-0.1 m, 0.1 m]
Orientation (Relative)
  • Dimension
    • z || (x, y, z, w) || R[3x3] || v1(3x1), v2(3x1)
  • Limits
    • Normalised [-1, 1]
    • Converted into normalised quaternion before use (no matter the original representation)

Gripper

  • Absolute/Relative
    • Absolute (desired gripper state)
      • This seems to be preferred in literature, i.e. a binary signal for open/close
    • Relative (change in gripper state)
Gripper (Absolute)
  • Signals
    • Action
      • Open/close
    • Force (optional, not needed in simplified case - use max, added to Future Works (FW) #51)
    • Width (optional, not needed in simplified case - use min/max)
      • Applicable only when closing
  • Dimension
  • Limits
    • Normalised [-1, 1]
    • Remapped to account for limits of the signals, e.g. max force
Gripper (Relative)
  • Signals
    • Width
    • Force (optional, not needed in simplified case - use max)
      • Applicable only when closing, i.e. width < 0
  • Dimension
    • Scalar for each signal
  • Limits
    • Normalised [-1, 1]
    • Remapped to account for limits of the signals, e.g. min/max width
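Purely as an illustration of how such a normalised action vector could be decoded (the actual mapping lives in the task implementations, and the vector layout below is an assumption), the relative position/orientation/gripper scheme might look like this:

import numpy as np

MAX_POSITION_STEP = 0.1  # metres per control step (example scaling of the normalised limits)

# Hypothetical layout: [dx, dy, dz, qx, qy, qz, qw, gripper], all in [-1, 1].
def decode_action(action: np.ndarray):
    position_delta = action[:3] * MAX_POSITION_STEP  # scaled relative translation
    quaternion = action[3:7]
    norm = np.linalg.norm(quaternion)
    quaternion = quaternion / norm if norm > 0.0 else np.array([0.0, 0.0, 0.0, 1.0])
    gripper_closed = action[7] < 0.0  # binary open/close from the normalised signal
    return position_delta, quaternion, gripper_closed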

Observations

Octree of the scene

  • Constructed from point cloud
    • This point cloud needs to be
      • Transformed into robot coordinate frame
      • Cropped to a bounding-box with consistent size (preferable aspect ratio of 1:1:1)
        • For now, restrict this bounding-box to a volume above ground plane
        • It should be possible to extend this to the entire reachable workspace of the robot (added to FW #51)
  • Features (more than one can be used)
    • Normals
      • (x, y, z)
    • Distance to average point position (w.r.t. octant centre)
    • Color
      • (r, g, b)
  • Extra?
    • Position of octree centre/corner w.r.t robot base frame (added to FW #51)
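A generic sketch of this pre-processing (transform the point cloud into the robot frame, then crop it to a fixed bounding box) could look as follows; it is not the repository's perception code.

import numpy as np

# Illustrative point-cloud pre-processing before octree construction.
def preprocess_point_cloud(points_camera: np.ndarray,   # Nx3 points in the camera frame
                           camera_to_robot: np.ndarray, # 4x4 homogeneous transform
                           bbox_min: np.ndarray,
                           bbox_max: np.ndarray) -> np.ndarray:
    # Transform into the robot coordinate frame
    points_h = np.hstack([points_camera, np.ones((points_camera.shape[0], 1))])
    points_robot = (camera_to_robot @ points_h.T).T[:, :3]
    # Crop to a consistent bounding box (1:1:1 aspect ratio) above the ground plane
    mask = np.all((points_robot >= bbox_min) & (points_robot <= bbox_max), axis=1)
    return points_robot[mask]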

End effector pose

  • Position
  • Orientation
    • Representation
      • Quaternion
      • Rotation matrix
      • "6D representation"
Position
  • Dimension
    • (x, y, z)
  • Not normalised
Orientation
  • Dimension
    • v1(3x1), v2(3x1)
  • Normalised

Gripper state

  • Width
    • Normalised
    • This will be identical to state (open/close) if binary actions are used

Reward function

Ongoing epic: #41

  • Dense/Sparse
    • Dense (shaped)
    • Sparse (shaped)

Sparse (shaped)

reward multiplier r (currently r = 4.0)

  • +r^0 for reaching object (within 10cm)
  • +r^1 for touching object
  • +r^2 for grasping object
  • +r^3 for lifting object (above 15cm, terminates episode)
  • -1.0 for touching ground (terminates episode)
  • 0.0 for pushing all objects outside of workspace (terminates episode)
  • -0.005 for all time steps
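With r = 4.0, the stages above are worth 1.0, 4.0, 16.0 and 64.0 respectively. A direct translation of this schedule into code (a sketch of the shaping only, not the GraspCurriculum source) is shown below.

R = 4.0  # reward multiplier

def shaped_reward(reached, touched, grasped, lifted, touched_ground, all_objects_outside_workspace):
    if touched_ground:
        return -1.0, True   # terminates episode
    if all_objects_outside_workspace:
        return 0.0, True    # terminates episode
    reward, done = -0.005, False  # small penalty for every time step
    if reached:
        reward += R ** 0    # +1.0 for reaching the object (within 10 cm)
    if touched:
        reward += R ** 1    # +4.0 for touching the object
    if grasped:
        reward += R ** 2    # +16.0 for grasping the object
    if lifted:
        reward += R ** 3    # +64.0 for lifting the object (above 15 cm)
        done = True         # terminates episode
    return reward, done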

Policy (network architecture)

Currently, using depth=4 and full_depth=2

Feature Extractor (shared between actor and critics):

  • Octree:
    • [Conv3D -> (BatchNorm) -> ReLU] * (depth-full_depth) -> Conv1D/Conv3D -> Octree2Voxel -> Flatten -> Linear
    • BatchNorm is optional
    • Conv1D is the default, but it can be replaced by Conv3D by setting the corresponding argument
    • See the source code of OctreeCnnFeaturesExtractor for more details, e.g. the number of channels.
  • Auxiliary (proprioceptive) observations:
    • Linear
  • Octree and auxiliary features are concatenated together

Actor and Critics:

  • [Linear -> ReLU] * 2
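The shared actor/critic head is thus a plain two-layer MLP; a generic PyTorch equivalent of the [Linear -> ReLU] * 2 block (the hidden width here is an assumption) is:

from torch import nn

# Generic [Linear -> ReLU] * 2 head; the hidden width is a placeholder.
def make_head(input_dim: int, hidden_dim: int = 256) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
    )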

Domain randomisation

Currently, the following domain randomisation can be applied in the simulation:

  • Random objects from Google Scanned Objects collection (80 objects for training, 20 for testing)
    • Random mass (with limits)
    • Random surface friction (with limits)
    • Random pose (above ground plane)
  • Random texture of the ground plane (80 textures for training, 20 for testing)
  • Random initial configuration for the robot
  • Random perspective of the camera

Decide what kind of 3D data representation and corresponding network architecture to use for observations/policy

Considered options:

  • Voxel grid
  • Point cloud directly
  • Octree
    • https://github.com/microsoft/O-CNN
    • Construct the octree structure and use features of the finest leaf octants as input into the CNN (an averaged normal vector is used in the paper, but other features can be used, e.g. RGB). 3D convolution and pooling modules (which up-scale octants to a higher depth) can then be applied to these features only, improving performance and lowering memory requirements compared to processing the entire voxel grid volume.

EPIC: Reward function for grasping environment

From literature:

Sparse

Simple sparse reward
Sparse reward with extra guidance
  • https://arxiv.org/pdf/2007.04499.pdf
    • +10 for successful grasp (gripper holding object above certain height)
    • +1 (partial reward) if gripper comes in contact with the object
    • -1 if no action is taken
    • -0.025 for all time steps prior to termination
  • https://arxiv.org/pdf/1806.10293.pdf
    • +1 if the gripper contains an object and is above a certain height
    • −0.05 for all steps prior to termination (max 20 steps, but agent is allowed to terminate earlier)
    • At the beginning, this paper provides demonstrations of grasps from other algorithms, achieving a 15-30% success rate. Then they use only the learned policy + exploration once 50% success is reached.

Dense

These don't seem to be that common in the literature, and those that are present are usually quite complex.

  • https://arxiv.org/pdf/1910.09470.pdf (very complex reward)
    • Reach reward (upon being close to the object, e.g. 0.01 m) +[0.0, 0.01]
    • Touch reward +1 upon contact with the object
    • Lift reward +1 (when above a certain threshold)
    • Task-specific reward (stacking objects)
    • All rewards are summed (specific function is used)
Other dense reward formulations
  • Increase in the object's height (final goal)
    • The reward could increase linearly as the actor lifts an object relative to its original position. This could be further restricted by requiring contact between the robot fingers and the object (such that the actor holds the object rather than throwing it).
  • Touching an object (helping step)
    • A small reward could be added for each time step the robot touches an object with both fingers.

Guiding the agent

Demonstration of success

Add samples of success to the replay buffer at the beginning of training. These could come either from manual demonstrations or from another algorithm (could also be pre-planned).

Curriculum learning

Slowly increase the overall difficulty of the task, e.g. start with one object and a restricted operating area.

Eliminate the need to run unpaused steps during each reset

Unpaused steps are executed such that:

  • Camera gets the latest observation
    • This could be improved by waiting until the camera publishes a new message
  • Robot reaches its starting configuration
    • Using a controller to reach the pose because resetting joint positions causes a segfault
    • I currently have no suggestion/solution to this. The segfault must be fixed (#28)

Integrate MoveIt2 with Ignition

Selected MoveIt2 over MoveIt after the following considerations:

  • There is currently no existing integration of Ignition with either MoveIt or MoveIt2. Therefore, there is no direct benefit to MoveIt.
  • The migration to MoveIt2 has progressed quite far
    • No extensive examples/documentation yet though...
    • The major disadvantage is currently the missing integration with ros2_control. As far as I can see, there is only a fake controller + a simple controller manager. The new state of ros2_control also means that there is currently no interface for the real Panda yet, see #16.
  • Overall preference to use ROS2

Integration plan

The long-term solution seems to be the creation of ign_ros2_control (which does not exist yet), similar to gazebo_ros2_control. However, ros2_control is still under very active development, and examples/demos of using the interface (not just implementing the interface) are missing. This is currently far beyond my competences to look into.

Therefore, I am looking into the following alternative that conceptually seems simpler.

  • Create a ROS2 node with MoveIt2 backend that provides
    • JointTrajectory messages for each plan
    • C++ example/template
    • Python module providing MoveIt2 interface (substitute for moveit_commander)
    • Python example
  • Create Ignition controller plugin joint_trajectory_controller
    • This plugin should take the Ignition alternative of JointTrajectory (needs to be created) and follow it (look into similar projects for inspiration)
    • Create JointTrajectory protobuf for Ignition
  • Bridge JointTrajectory between ROS2 msg and Ignition protobuf

Additional links/info (might not be relevant):

Certain order of python imports causes segmentation fault (gym-ignition)

From what I have noticed so far:

  • open3d must be imported before gym-ignition
  • torch (tensorflow) must be imported before gym-ignition

It seems to be connected to the protobuf version (some import orders gave an error before the segfault, where protobuf 3.6 was found but 3.9 was required).

Reordering the imports solves the problem; therefore, this issue is low-priority (arctic icebox).
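In practice, the workaround simply means keeping a safe import order at the top of every entry point, e.g.:

# Safe import order observed so far (module names as typically imported).
import open3d        # noqa: F401  must come before gym_ignition
import torch         # noqa: F401  must come before gym_ignition
import gym_ignition  # noqa: F401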

Setup virtual RGB-D camera

Basic setup

  • Add to SDF

Communication with ROS2

  • RGB image in ROS2
  • Depth image in ROS2
  • PointCloud in ROS2

Others

  • Setup transformations within ROS2
  • Investigate camera params, e.g. noise

Mesh/texture (material) memory leak

Removing a model does not free all of its memory; there is a memory leak related to meshes/textures (material). This introduces a limitation to domain randomisation and eventually causes training to fail. The current workaround is to limit the frequency of adding/removing models in order to postpone the eventual crash due to running out of memory.

  • Encountered using ogre2
    • Meshes with textures
      • Noticed with obj+mtl
    • Primitives with textures
      • Noticed with metallic PBR pipeline


  • Investigate further (ign-rendering)
  • Try to find a solution (if time allows it)
