
AffordanceNet

By Thanh-Toan Do*, Anh Nguyen*, Ian Reid, Darwin G. Caldwell, Nikos G. Tsagarakis (* equal contribution)


Contents

  1. Requirements
  2. Installation
  3. Demo
  4. Training
  5. Notes

Requirements

  1. Caffe

  2. Hardware

    • To train the full AffordanceNet, you'll need a GPU with ~11GB of memory (e.g. Titan, K20, K40, Tesla, ...).
    • To test the full AffordanceNet, you'll need a GPU with ~6GB of memory.
    • A smaller net will be available soon.
  3. [Optional] For the robotic demo: a depth camera (e.g. Asus Xtion) driven by ROS, OpenNI, etc.

Installation

  1. Clone the AffordanceNet repository into your $AffordanceNet_ROOT folder.

  2. Build Caffe and pycaffe:

    • cd $AffordanceNet_ROOT/caffe-affordance-net
    • # Now follow the Caffe installation instructions: http://caffe.berkeleyvision.org/installation.html
    • # If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do:
    • make -j8 && make pycaffe
  3. Build the Cython modules:

    • cd $AffordanceNet_ROOT/lib
    • make
  4. Download the pretrained weights. These weights were trained on the training set of the IIT-AFF dataset:

    • Extract the file you downloaded to $AffordanceNet_ROOT
    • Make sure the caffemodel file ends up at '$AffordanceNet_ROOT/pretrained/AffordanceNet_200K.caffemodel'
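Before running the demo, you can verify the layout from the step above with a quick check; the helper name is just for illustration, but the path is the one the steps above describe:

```python
import os

def check_pretrained(root):
    """Return True if the pretrained caffemodel sits where the demo expects it."""
    path = os.path.join(root, "pretrained", "AffordanceNet_200K.caffemodel")
    return os.path.isfile(path)
```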

Demo

After successfully completing installation, you'll be ready to run the demo.

  1. Export pycaffe path:

    • export PYTHONPATH=$AffordanceNet_ROOT/caffe-affordance-net/python:$PYTHONPATH
  2. Demo on static images:

    • cd $AffordanceNet_ROOT/tools
    • python demo_img.py
    • You should see the detected objects and their affordances.
  3. (Optional) Demo on depth camera (such as Asus Xtion):

    • With AffordanceNet and a depth camera, you can select the object of interest and its affordances for robotic applications such as grasping, pouring, etc.
    • First, launch your depth camera with ROS, OpenNI, etc.
    • cd $AffordanceNet_ROOT/tools
    • python demo_asus.py
    • You may want to change the object id and/or affordance id (lines 380-381 in demo_asus.py). By default, we select the bottle and its grasp affordance.
    • The 3D grasp pose can be visualized with rviz.
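The id-based selection on lines 380-381 can be sketched as follows; the detection layout and the helper name are assumptions for illustration, not the actual demo_asus.py code:

```python
def select_target(detections, object_id, affordance_id):
    """Return the first detection that matches the requested object id and
    carries the requested affordance id (data layout is hypothetical)."""
    for det in detections:
        if det["object_id"] == object_id and affordance_id in det["affordance_ids"]:
            return det
    return None

# Example: pick the bottle together with its grasp affordance (ids are made up).
detections = [
    {"object_id": 1, "affordance_ids": [2, 3]},   # e.g. a cup
    {"object_id": 5, "affordance_ids": [1, 4]},   # e.g. a bottle
]
target = select_target(detections, object_id=5, affordance_id=1)
```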

Training

  1. We train AffordanceNet on the IIT-AFF dataset:

    • The IIT-AFF dataset must be formatted like the Pascal VOC dataset for training.
    • For your convenience, we did this for you. Just download this file and extract it into your $AffordanceNet_ROOT folder.
    • The extracted folder should contain three sub-folders: $AffordanceNet_ROOT/data/cache, $AffordanceNet_ROOT/data/imagenet_models, and $AffordanceNet_ROOT/data/VOCdevkit2012.
  2. Train AffordanceNet:

    • cd $AffordanceNet_ROOT
    • ./experiments/scripts/faster_rcnn_end2end.sh [GPU_ID] [NET] [--set ...]
    • e.g.: ./experiments/scripts/faster_rcnn_end2end.sh 0 VGG16 pascal_voc
    • We keep the pascal_voc alias even though we're training on the IIT-AFF dataset.
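Before launching the training script, it can help to confirm that the three data folders from step 1 are in place; a minimal sketch (the helper name is illustrative):

```python
import os

def missing_training_folders(root):
    """Return the names of the required data sub-folders that are absent."""
    required = ("cache", "imagenet_models", "VOCdevkit2012")
    data = os.path.join(root, "data")
    return [name for name in required if not os.path.isdir(os.path.join(data, name))]
```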

Notes

  1. AffordanceNet vs. Mask-RCNN: AffordanceNet can be seen as a generalized Mask-RCNN in which each instance can contain multiple classes.
  2. The current network architecture is slightly different from the paper, but it achieves the same accuracy.
  3. Train AffordanceNet on your data:
    • Format your images as in Pascal-VOC dataset (as in $AffordanceNet_ROOT/data/VOCdevkit2012 folder).
    • Prepare the affordance masks (as in the $AffordanceNet_ROOT/data/cache folder): for each object in the image, create a mask and save it as a .sm file. See $AffordanceNet_ROOT/utils for details.
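The exact .sm layout is defined by the scripts in $AffordanceNet_ROOT/utils; roughly, a per-object mask is a 2-D array of per-pixel affordance labels that gets serialized to disk. The pickle-based round-trip below is an assumption for illustration only, not the repo's actual writer:

```python
import pickle

def save_mask(mask, path):
    """Serialize one object's affordance mask (a 2-D array of per-pixel
    affordance labels). Pickle is an assumed stand-in for the real .sm
    writer in utils/."""
    with open(path, "wb") as f:
        pickle.dump(mask, f)

def load_mask(path):
    """Read a mask back from disk (same assumption as save_mask)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```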

Citing AffordanceNet

If you find AffordanceNet useful in your research, please consider citing:

@article{AffordanceNet17,
  title={AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection},
  author={Do, Thanh-Toan and Nguyen, Anh and Reid, Ian and Caldwell, Darwin G and Tsagarakis, Nikos G},
  journal={arXiv:1709.07326},
  year={2017}
}

If you use IIT-AFF dataset, please consider citing:

@inproceedings{Nguyen17,
  title={Object-Based Affordances Detection with Convolutional Neural Networks and Dense Conditional Random Fields},
  author={Nguyen, Anh and Kanoulas, Dimitrios and Caldwell, Darwin G and Tsagarakis, Nikos G},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2017},
}

License

MIT License

Acknowledgement

This repo reuses a significant amount of source code from Faster R-CNN.

Contact

If you have any questions or comments, please send us an email: [email protected] and [email protected]
