SeGAN: Segmenting and Generating the Invisible

This project was presented as a spotlight at CVPR 2018.

Abstract

Humans have a strong ability to infer the appearance of the invisible and occluded parts of scenes. For example, when we look at the scene on the left, we can predict what is behind the coffee table, and can even complete the sofa based on its visible parts, the coffee table, and what we know in general about sofas and coffee tables and how they occlude each other.

SeGAN can learn to:

  1. Generate the appearance of the occluded parts of objects,
  2. Segment the invisible parts of objects,
  3. Reliably segment natural images, although it is trained on synthetic photo-realistic images,
  4. Infer depth layering by reasoning about occluder-occludee relations.

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{ehsani2018segan,
  title={Segan: Segmenting and generating the invisible},
  author={Ehsani, Kiana and Mottaghi, Roozbeh and Farhadi, Ali},
  booktitle={CVPR},
  year={2018}
}

Prerequisites

  • Torch 7 and the dependencies from this repository
  • Linux OS
  • NVIDIA GPU + CUDA + cuDNN

Installation

  1. Clone the repository using the command:

     git clone https://github.com/ehsanik/SeGAN
     cd SeGAN
    
  2. Download the dataset from here and extract it.

  3. Make a link to the dataset.

     ln -s /PATH/TO/DATASET dyce_data
    
  4. Download the pretrained weights from here and extract them.

  5. Make a link to the weights' folder.

     ln -s /PATH/TO/WEIGHTS weights
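Before training or testing, it can help to confirm that both links from the steps above resolve. The following is a minimal sketch, not part of the SeGAN code: the helper name `missing_links` is hypothetical, and it assumes only that `dyce_data` and `weights` should resolve to directories inside the repository root.

```python
import os


def missing_links(repo_root, names=("dyce_data", "weights")):
    """Return the expected link names that do not resolve to a directory.

    os.path.isdir follows symbolic links, so a dangling link is reported
    as missing as well.
    """
    return [n for n in names if not os.path.isdir(os.path.join(repo_root, n))]


if __name__ == "__main__":
    # Run from the repository root created by `git clone`.
    missing = missing_links(".")
    if missing:
        print("Missing or broken links:", ", ".join(missing))
    else:
        print("Dataset and weights links look fine.")
```

An empty list means both links are in place.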
    

Dataset

We introduce DYCE, a synthetic dataset of occluded objects with photo-realistic images and natural configurations of objects in scenes. All images in the dataset show indoor scenes. The annotations for each image contain the segmentation masks for the visible and invisible regions of objects. The images are obtained by taking snapshots from our 3D synthetic scenes.
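Since each object is annotated with a visible and an invisible region, the full object mask is the union of the two. The sketch below illustrates this with plain nested lists of 0/1 values; that in-memory layout and the function name `full_mask` are assumptions made for illustration, not the dataset's actual file format.

```python
def full_mask(visible, invisible):
    """Combine visible and invisible masks into one full object mask.

    Both inputs are 2D lists of 0/1 values with the same shape
    (an assumed layout, chosen only for this example).
    """
    return [
        [1 if (v or i) else 0 for v, i in zip(v_row, i_row)]
        for v_row, i_row in zip(visible, invisible)
    ]


if __name__ == "__main__":
    visible = [[1, 1, 0, 0]]    # part of the object seen in the image
    invisible = [[0, 0, 1, 0]]  # part hidden behind an occluder
    print(full_mask(visible, invisible))
```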

Statistics

We use 11 synthetic scenes: 7 for training and validation, and 4 for testing. Overall there are 5 living rooms and 6 kitchens, of which 2 living rooms and 2 kitchens are used for testing. On average, each scene contains 60 objects, and the number of visible objects per image is 17.5 (an object counts as visible if it has at least 10 visible pixels). No object instance appears in both the training and test scenes.
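The visibility criterion above (at least 10 visible pixels) can be sketched as follows. This is an illustration under assumptions: the per-object masks are taken to be 2D lists of 0/1 values keyed by a hypothetical object id, which is not necessarily how the DYCE annotations are stored.

```python
MIN_VISIBLE_PIXELS = 10  # threshold stated in the statistics above


def count_visible_objects(visible_masks, min_pixels=MIN_VISIBLE_PIXELS):
    """Count objects whose visible mask covers at least `min_pixels` pixels.

    visible_masks: dict mapping an object id to a 2D list of 0/1 values
    (an assumed layout, used only for this example).
    """
    count = 0
    for mask in visible_masks.values():
        if sum(sum(row) for row in mask) >= min_pixels:
            count += 1
    return count


if __name__ == "__main__":
    masks = {
        "sofa": [[1] * 5, [1] * 5],   # 10 visible pixels: counted
        "cup": [[1] * 3, [1] * 3],    # 6 visible pixels: not counted
    }
    print(count_visible_objects(masks))
```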

The dataset can be downloaded from here.

Train

To train your own model:

th main.lua -baseLR 1e-3 -end2end -istrain "train"

See data_settings.lua for additional command-line options.

Test

To test using the pretrained model and reproduce the results in the paper:

Model                          Segmentation                                Texture
                               Visible ∪ Invisible   Visible   Invisible   L1      L2
Multipath                      47.51                 48.58     6.01        -       -
SeGAN (ours) w/ SV_predicted   68.78                 64.76     15.59       0.070   0.023
SeGAN (ours) w/ SV_gt          75.71                 68.05     23.26       0.026   0.008

th main.lua -weights_segmentation "weights/segment" -end2end -weights_texture "weights/texture" -istrain "test" -predictedSV

To test using the ground-truth visible mask as input instead of the predicted mask:

th main.lua -weights_segmentation "weights/segment_gt_sv" -end2end -weights_texture "weights/texture_gt_sv" -istrain "test"
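This README does not define the metrics behind the results table. As a rough guide only, the sketch below assumes that the segmentation columns are intersection-over-union scores (reported as percentages) and that the texture columns are mean absolute (L1) and mean squared (L2) pixel errors. These definitions, the function names, and the list-based inputs are assumptions, not the evaluation code of this repository.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks given as 2D lists of 0/1."""
    inter = 0
    union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Convention chosen for this sketch: two empty masks agree perfectly.
    return inter / union if union else 1.0


def mean_l1(pred, target):
    """Mean absolute error over two equal-length lists of pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_l2(pred, target):
    """Mean squared error over two equal-length lists of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    print(100 * iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))  # as a percentage
    print(mean_l1([0.0, 1.0], [0.5, 0.5]))
    print(mean_l2([0.0, 1.0], [0.5, 0.5]))
```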

Acknowledgments

The code for the GAN network borrows heavily from pix2pix.
