
RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering (ICCV2021) [Link]

(Overview figure)

Abstract

We present RePOSE, a fast iterative refinement method for 6D object pose estimation. Prior methods perform refinement by feeding zoomed-in input and rendered RGB images into a CNN and directly regressing an update to the pose. Their runtime is slow due to the computational cost of the CNN, which is especially prominent in multiple-object pose refinement. To overcome this problem, RePOSE leverages image rendering for fast feature extraction using a 3D model with a learnable texture. We call this deep texture rendering: a shallow multi-layer perceptron directly regresses a view-invariant image representation of an object. Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the feature-metric error between the input and rendered image representations, without the need for zooming in. These image representations are trained such that the differentiable LM optimization converges within a few iterations. Consequently, RePOSE runs at 92 FPS and achieves a state-of-the-art accuracy of 51.6% on the Occlusion LineMOD dataset - a 4.1% absolute improvement over the prior art - and comparable results on the YCB-Video dataset with a much faster runtime.
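The refinement loop described above can be sketched as a damped Gauss-Newton (Levenberg-Marquardt) update on the 6-DoF pose. The snippet below is an illustrative sketch only, not the repository's implementation: `residual_fn` and `jac_fn` are hypothetical stand-ins for the feature-metric residual between the input and rendered feature maps and its Jacobian, which in RePOSE come from the differentiable renderer.

```python
import numpy as np

def lm_step(residual_fn, jac_fn, pose, lam=1e-2):
    """One Levenberg-Marquardt update on a 6-DoF pose vector.

    Minimizes 0.5 * ||r(pose)||^2, where r would be the feature-metric
    residual between input and rendered feature maps (abstracted here).
    """
    r = residual_fn(pose)    # (N,) residual vector
    J = jac_fn(pose)         # (N, 6) Jacobian w.r.t. the pose parameters
    H = J.T @ J              # Gauss-Newton approximation of the Hessian
    g = J.T @ r              # gradient of the 0.5 * ||r||^2 objective
    # Damped normal equations: (H + lam * diag(H)) * delta = -g
    A = H + lam * np.diag(np.diag(H))
    delta = np.linalg.solve(A, -g)
    return pose + delta
```

With a well-behaved residual this iteration converges geometrically; the point of training the image representations end-to-end is that only a few such iterations are needed at inference time.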

Prerequisites

  • Python >= 3.6
  • PyTorch == 1.9.0
  • torchvision == 0.10.0
  • CUDA == 10.1

Downloads

Installation

  1. Set up the python environment:
    $ pip install torch==1.9.0 torchvision==0.10.0
    $ pip install Cython==0.29.17
    $ sudo apt-get install libglfw3-dev libglfw3
    $ pip install -r requirements.txt
    
    # Install Differentiable Renderer
    $ cd renderer
    $ python3 setup.py install
    
  2. Compile cuda extensions under lib/csrc:
    ROOT=/path/to/RePOSE
    cd $ROOT/lib/csrc
    export CUDA_HOME="/usr/local/cuda-10.1"
    cd ../ransac_voting
    python setup.py build_ext --inplace
    cd ../camera_jacobian
    python setup.py build_ext --inplace
    cd ../nn
    python setup.py build_ext --inplace
    cd ../fps
    python setup.py build_ext --inplace
    
  3. Set up datasets:
    $ ROOT=/path/to/RePOSE
    $ cd $ROOT/data
    
    $ ln -s /path/to/linemod linemod
    $ ln -s /path/to/linemod_orig linemod_orig
    $ ln -s /path/to/occlusion_linemod occlusion_linemod
    
    $ cd $ROOT/data/model/
    $ unzip pretrained_models.zip
    
    $ cd $ROOT/cache/LinemodTest
    $ unzip ape.zip benchvise.zip .... phone.zip
    $ cd $ROOT/cache/LinemodOccTest
    $ unzip ape.zip can.zip .... holepuncher.zip
    

Testing

The LineMOD dataset has 13 categories (ape, benchvise, cam, can, cat, driller, duck, eggbox, glue, holepuncher, iron, lamp, phone) and the Occlusion LineMOD dataset has 8 categories (ape, can, cat, driller, duck, eggbox, glue, holepuncher). Choose one category and run testing on it, replacing ape in the commands below with another category as needed.
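The per-category evaluation command shown in the next subsection can also be driven by a small shell loop over the category list above. This is a sketch: it prints each command rather than running it; drop the leading `echo` to actually execute the evaluations (each category's annotation data must have been generated first).

```shell
# The 13 LineMOD categories listed above; loop the ADD(-S) evaluation over them.
CATEGORIES="ape benchvise cam can cat driller duck eggbox glue holepuncher iron lamp phone"
for cls in $CATEGORIES; do
    # Remove `echo` to execute the real command.
    echo python run.py --type evaluate --cfg_file configs/linemod.yaml cls_type "$cls" model "$cls"
done
```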

Evaluate the ADD(-S) score

  1. Generate the annotation data:
    $ python run.py --type linemod cls_type ape model ape
    
  2. Test:
    # Test on the LineMOD dataset
    $ python run.py --type evaluate --cfg_file configs/linemod.yaml cls_type ape model ape
    
    # Test on the Occlusion LineMOD dataset
    $ python run.py --type evaluate --cfg_file configs/linemod.yaml test.dataset LinemodOccTest cls_type ape model ape
    

Visualization

  1. Generate the annotation data:
    $ python run.py --type linemod cls_type ape model ape
    
  2. Visualize:
    # Visualize the results of the LineMOD dataset
    $ python run.py --type visualize --cfg_file configs/linemod.yaml cls_type ape model ape
    
    # Visualize the results of the Occlusion LineMOD dataset
    $ python run.py --type visualize --cfg_file configs/linemod.yaml test.dataset LinemodOccTest cls_type ape model ape
    

Citation

@InProceedings{Iwase_2021_ICCV,
    author    = {Iwase, Shun and Liu, Xingyu and Khirodkar, Rawal and Yokota, Rio and Kitani, Kris M.},
    title     = {RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {3303--3312}
}

Acknowledgement

Our code is largely based on clean-pvnet, and our rendering code is based on neural_renderer. Thank you for making this code publicly available!

Contact

If you have any questions about the paper and implementation, please feel free to email me ([email protected])! Thank you!
