
Mobile Manipulation: Papers

Mobile Manipulation (MM) lies at the intersection of mobile robots (e.g., wheeled, legged, and aerial robots) and manipulation (e.g., single-arm and dual-arm systems). However, robotics researchers predominantly concentrate on wheeled or legged robots equipped with a single arm, primarily due to the control complexity of aerial robots and dual-arm systems.

The primary challenge in MM is to coordinate mobility and manipulation to address long-horizon and generalized tasks, which are inherently more challenging than stationary manipulation. Specifically, the objective is to enable robots to navigate within a cluttered environment while concurrently interacting with the surroundings using their manipulator. Currently, the overall strategies to achieve this can be categorized into Hierarchical (i.e., addressing the base and manipulator separately) and Whole-body (i.e., addressing the base and manipulator simultaneously) motion planning and control. Both approaches encounter distinct challenges, as well as some shared obstacles that need to be addressed.
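The hierarchical/whole-body distinction can be made concrete by looking at the action interface. The sketch below is a minimal illustration, not any paper's actual code; the DoF split, function names, and policy signatures are all hypothetical assumptions.

```python
import numpy as np

# Hypothetical DoF split: a 2-DoF base (linear/angular velocity) and a 6-DoF arm.
BASE_DOF, ARM_DOF = 2, 6

def hierarchical_step(base_policy, arm_policy, obs):
    """Hierarchical: separate policies command the base and the arm,
    typically staged (navigate first, then manipulate)."""
    base_cmd = base_policy(obs)   # shape (BASE_DOF,)
    arm_cmd = arm_policy(obs)     # shape (ARM_DOF,)
    return np.concatenate([base_cmd, arm_cmd])

def whole_body_step(unified_policy, obs):
    """Whole-body: a single policy outputs one command over all DoF,
    coordinating base and arm simultaneously."""
    return unified_policy(obs)    # shape (BASE_DOF + ARM_DOF,)
```

Either way the robot receives an 8-dimensional command; the difference is whether one policy or two produce it, which is what makes credit assignment and coordination harder or easier.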

Challenges include:

  1. Understanding the unstructured environment;
  2. Obstacle avoidance and self-collision avoidance;
  3. Ensuring graspability for the target object;
  4. Hand-off errors during skill chaining in long-horizon tasks (Hierarchical);
  5. Skill coordination in a complex task requiring multiple skills simultaneously (Hierarchical);
  6. High control complexity due to the large number of degrees of freedom (Whole-body);
  7. ...

Regardless of the chosen strategy, there are various methods to implement it: Classical methods (e.g., model predictive control), Learning-Based methods (e.g., reinforcement learning, imitation learning, large language models), and Learning + Classical methods. Learning-Based methods, in particular, stand out due to their generality and efficiency in tackling complex tasks that are challenging to model accurately. Consequently, the focus here will primarily be on learning-based methods. Additionally, to encourage the development of the MM community, researchers have proposed some Benchmarks and Challenges that offer convenient platforms for the development of custom algorithms and standard metrics for evaluation.

In this repository, I summarize all the aforementioned strategies, methods, benchmarks, and challenges, along with the papers I have read. For some papers related to Learning-Based MM, I provide brief introductions to their core ideas and methodologies. Links to these papers and their code (if open-sourced) are also included. The table of contents is outlined below.

0. Review

[Machines 2022] Motion Planning for Mobile Manipulators—A Systematic Review, [paper] [code]

1. Benchmarks & Challenges

[CVPR 2021] ManipulaTHOR: A Framework for Visual Object Manipulation, [paper] [code] [website]

[NeurIPS 2021] Habitat 2.0: Training Home Assistants to Rearrange their Habitat, [paper] [code]

[arXiv 2022] ProcTHOR: Large-Scale Embodied AI Using Procedural Generation, [paper] [code] [website]

[arXiv 2023] HomeRobot: Open Vocabulary Mobile Manipulation, [paper] [code] [website]

[arXiv 2023] Harmonic Mobile Manipulation, [paper] [website]

[NeurIPS 2023] UniTeam: Open Vocabulary Mobile Manipulation Challenge, [paper]

[ICLR 2023] ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills, [paper] [code] [website]

2. Classical

[ICRA 2013] Robot placement based on reachability inversion, [paper]

[IRC 2018] Reuleaux: Robot Base Placement by Reachability Analysis, [paper] [code]

[IROS 2018] Coupling Mobile Base and End-Effector Motion in Task Space, [paper]

[ICRA 2019] Whole-Body MPC for a Dynamically Stable Mobile Manipulator, [paper]

[ICRA 2020] Perceptive Model Predictive Control for Continuous Mobile Manipulation, [paper] [code]

[ICRA 2020] Planning an Efficient and Robust Base Sequence for a Mobile Manipulator Performing Multiple Pick-and-place Tasks, [paper]

[ICRA 2022] A Collision-Free MPC for Whole-Body Dynamic Locomotion and Manipulation, [paper]

[ICRA 2022] Combining Navigation and Manipulation Costs for Time-Efficient Robot Placement in Mobile Manipulation Tasks, [paper]

[RSS 2023] Demonstrating Mobile Manipulation in the Wild: A Metrics-Driven Approach, [paper]

3. Reinforcement Learning (RL)

3.1 Hierarchical RL

[CoRL 2019] HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators, [paper] [code] [website]

  • Leveraged RL for both the high-level policy (which embodiment to use: base, arm, or both) and the low-level policies (the corresponding control policy for the base or arm).
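The two-level control flow can be sketched as follows. This is an illustrative dispatch loop under assumed interfaces, not HRL4IN's actual implementation; the embodiment set and function names are hypothetical.

```python
import numpy as np

EMBODIMENTS = ["base", "arm", "both"]  # hypothetical embodiment choices

def hrl_step(high_level, low_level, obs):
    """The high-level policy picks an embodiment index; the matching
    low-level policy then produces the control command for it."""
    choice = EMBODIMENTS[high_level(obs)]
    return choice, low_level[choice](obs)
```

The appeal of this decomposition is that each low-level policy sees a smaller action space, at the cost of the hand-off errors noted in the challenges above.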

[ICRA 2021] Learning Kinematic Feasibility for Mobile Manipulation through Deep Reinforcement Learning, [paper] [code] [website]

  • First generated a rough end-effector trajectory (possibly kinematically infeasible), then used RL to train a base policy that keeps the trajectory kinematically feasible for the arm.

[ICRA 2021] ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, [paper] [website]

  • Similar to HRL4IN, but leveraged motion generation for the low-level policy.

[arXiv 2022] Multi-skill Mobile Manipulation for Object Rearrangement, [paper] [code] [website]

  • Based on Habitat 2.0, replaced the point-goal navigation skill with a region-goal navigation skill, and the stationary manipulation skill with a mobile manipulation skill (i.e., replaced the arm action space with an arm-base action space).

[ICRA 2022] Robot Learning of Mobile Manipulation with Reachability Behavior Priors, [paper] [code] [website]

[ICRA 2022] Combining Learning-based Locomotion Policy with Model-based Manipulation for Legged Mobile Manipulators, [paper]

  • Leveraged RL to produce robust locomotion policies for legged robots by predicting the external wrench, enabling zero-shot adaptation to manipulators unseen during training.

[arXiv 2023] ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation, [paper]

  • Leveraged RL to train a coordination policy that activates the appropriate (pre-trained and frozen) skills, depending on different observations.

3.2 Whole-body RL

[Sensors 2019] Learning Mobile Manipulation through Deep Reinforcement Learning, [paper]

[arXiv 2020] Whole-Body Control of a Mobile Manipulator using End-to-End Reinforcement Learning, [paper]

[RSS 2020] Spatial Action Maps for Mobile Manipulation, [paper] [code] [website]

  • Represented the set of possible actions as a pixel map (a spatial action map) aligned with the input image of the current state, helping the policy learn from dense action representations.
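The core idea of acting through a pixel map can be sketched in a few lines. This is a minimal illustration under assumed conventions (one value per pixel, greedy selection), not the paper's training code.

```python
import numpy as np

def select_action(q_map):
    """A spatial action map assigns one value per pixel of the state
    image; acting means steering toward the argmax pixel, so the action
    space is spatially aligned with the observation."""
    return np.unravel_index(np.argmax(q_map), q_map.shape)
```

Because every output pixel is an action candidate, the policy network can be fully convolutional, which is where the dense-representation benefit comes from.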

[CoRL 2021] Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation, [paper] [website]

  • Proposed a curricular training method with autonomous resetting in the real world, so that navigation and grasping skills can be trained without human intervention.

[CoRL 2022] Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion, [paper] [code] [website]

  • For a legged mobile manipulator, learned a unified policy for coordinated manipulation and locomotion and regularized online adaptation for sim-to-real transfer.

[arXiv 2023] Harmonic Mobile Manipulation, [paper] [website]

  • Proposed an end-to-end learning approach that jointly optimizes navigation and manipulation, supporting more complex tasks.

[WACV 2023] Towards Disturbance-Free Visual Mobile Manipulation, [paper] [code] [website]

  • Based on ManipulaTHOR; to address the disturbance problem (i.e., a subset of collisions in which an object is displaced by the collision), incorporated a disturbance penalty into the RL reward function and used supervised learning to train a disturbance predictor as an auxiliary task.
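The reward-shaping part of this idea reduces to subtracting a displacement penalty from the task reward. The sketch below is a hedged illustration; the penalty coefficient and signature are hypothetical, not the paper's values.

```python
def shaped_reward(task_reward, disturbance_distance, penalty_coef=1.0):
    """Task reward minus a penalty proportional to how far non-target
    objects were displaced by collisions this step.
    penalty_coef is a hypothetical hyperparameter."""
    return task_reward - penalty_coef * disturbance_distance
```

With the penalty term, a policy that succeeds while knocking objects around scores lower than one that succeeds cleanly, which is exactly the behavior the disturbance predictor is trained to anticipate.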

[ICRA 2023] Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization, [paper]

  • Focusing on the table-wiping task, described crumb and spill dynamics with a stochastic differential equation (SDE), solved the constrained Markov decision process (CMDP) via RL, and optimized the whole-body trajectory to avoid self-collisions and obstacles.
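For readers unfamiliar with SDE-based dynamics models, a generic simulation step looks like the following. This is a standard Euler-Maruyama integrator as an assumed illustration of the modeling idea, not the paper's specific crumb/spill model; the drift and diffusion callables are placeholders.

```python
import numpy as np

def sde_step(x, drift, diffusion, dt, rng):
    """One Euler-Maruyama step of dX = f(X) dt + g(X) dW, the kind of
    stochastic model used to describe how crumbs and spills evolve.
    drift/diffusion are placeholder callables; rng draws the Brownian
    increment dW ~ Normal(0, sqrt(dt))."""
    dw = rng.normal(0.0, np.sqrt(dt), size=np.shape(x))
    return x + drift(x) * dt + diffusion(x) * dw
```

Setting the diffusion term to zero recovers a deterministic Euler step, which is a quick way to sanity-check such an integrator.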

[RSS 2023] Causal Policy Gradient for Whole-Body Mobile Manipulation, [paper] [code] [website]

  • Explored the causal structure from the action space to the reward via RL, reducing gradient variance in policy learning and stabilizing training.

3.3 RL+Priors

[ICRA 2022] Robot Learning of Mobile Manipulation with Reachability Behavior Priors, [paper] [code] [website]

  • Reachability Priors.

[arXiv 2022] Mobile Manipulation Leveraging Multiple Views, [paper]

  • Visibility Priors.

[arXiv 2023] Active-Perceptive Motion Generation for Mobile Manipulation, [paper]

  • Visibility Priors.

[ICRA 2024] GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion, [paper]

  • Graspability Priors.

3.4 RL-Training

[ICML 2018] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, [paper] [code]

[NeurIPS 2019] Better Exploration with Optimistic Actor-Critic, [paper] [code]

[ICLR 2020] DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames, [paper] [code]

[ICLR 2022] Boosted Curriculum Reinforcement Learning, [paper] [code]

4. Imitation Learning

[RSS 2022] Human-to-Robot Imitation in the Wild, [paper] [website]

  • Proposed an efficient one-shot robot learning algorithm that enables a mobile manipulator to learn different manipulation tasks in the wild from a third-person perspective.
  • Designed a novel objective function for policy learning to align human and robot videos and boost sample efficiency.

5. Large Language Models (LLMs)

[arXiv 2022] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, [paper] [code] [website]

[CVPR Demo] LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place, [website]

  • An extension of ASC that leverages an LLM to specify goals, while the other components remain the same as in ASC.

mobile_manipulation_papers's People

Contributors

hang0610