semantic-segmentation's Introduction

Deep convolutional networks for semantic segmentation

Parts of the code are based on this project.

Code and data organization (outdated)

semseg
├── storage  % data (not code): datasets, trained models, log-files, ...
├── data  % data loading and preparation: Dataset, MiniBatchReader
|   ├── preparers  % Iccv09Preparer
|   |   ├── abstract_preparer.py
|   |   └── iccv09_preparer.py 
|   ├── dataset_dir.py
|   └── dataset.py
├── models  % AbstractModel, BaselineA
|   ├── preprocessing.py
|   ├── tf_utils
|   |   ├── blocks.py  % higher-level operations: ResNet block, LinkNet encoder/decoder blocks
|   |   ├── layers.py  % elementary operations: conv, max_pool, resize
|   |   └── variables.py 
|   ├── abstract_model.py  % AbstractModel
|   └── baseline_a.py  % BaselineA
├── processing  % image and label processing
|   ├── image_format.py
|   ├── labels.py
|   ├── shape.py  % TODO: test resize
|   └── transform.py  % TODO: use skimage.transform and test
├── test  % unit tests
|   :.
├── util  % helper functions and classes
|   ├── visualizer.py  % Visualizer
|   :.
├── evaluation.py
:.

Tasks

High priority

  • implement evaluation measures used in LinkNet and modify evaluation.py so that it makes use of numpy/scipy
  • evaluate our LinkNet
  • implement inference time measurement depending on mini-batch size (to compare with the results in the LinkNet paper)
  • write the report
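
As a starting point for a numpy-based evaluation.py, the sketch below computes pixel accuracy and mIoU from a confusion matrix. It is only an illustration, not the measure definition used in the LinkNet paper: the function names are hypothetical, and it assumes integer label maps in which class 0 is the "unknown" class that is excluded from both measures.

```python
import numpy as np


def confusion_matrix(labels, predictions, num_classes):
    """Count pixels by (true class, predicted class)."""
    labels = np.asarray(labels).ravel()
    predictions = np.asarray(predictions).ravel()
    # Encode each (true, predicted) pair as one integer and count occurrences.
    pair_ids = labels * num_classes + predictions
    counts = np.bincount(pair_ids, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy_and_miou(labels, predictions, num_classes, ignore_class=0):
    """Return (pixel accuracy, mean IoU), skipping pixels labeled ignore_class."""
    labels = np.asarray(labels).ravel()
    predictions = np.asarray(predictions).ravel()
    keep = labels != ignore_class  # assumption: class 0 means "unknown"
    cm = confusion_matrix(labels[keep], predictions[keep], num_classes)

    true_pos = np.diag(cm)
    pixel_acc = true_pos.sum() / max(cm.sum(), 1)

    # IoU_c = TP / (TP + FP + FN); rows are true classes, columns are predictions.
    union = cm.sum(axis=0) + cm.sum(axis=1) - true_pos
    present = union > 0
    present[ignore_class] = False  # the ignored class does not enter the mean
    miou = (true_pos[present] / union[present]).mean() if present.any() else 0.0
    return float(pixel_acc), float(miou)
```

Accumulating the confusion matrix over all validation mini-batches and computing the measures once at the end avoids averaging per-batch IoU values, which would weight small batches incorrectly.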

Medium priority

  • nothing

Low priority

  • use tf.nn.sparse_softmax_cross_entropy_with_logits for more efficient training
  • make a baseline similar to BaselineA that uses strided convolutions instead of pooling layers (use 3x3 conv with stride 2 instead of pool->conv)
  • try IoU loss (like here)
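
To illustrate the first item: the sparse variant of softmax cross-entropy takes integer class indices instead of one-hot labels, so the one-hot tensor never has to be built. The numpy sketch below (hypothetical function names, not TensorFlow code) shows that both formulations give the same per-pixel loss.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (class) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def dense_cross_entropy(logits, one_hot_labels):
    """Per-pixel loss from one-hot labels of shape [..., num_classes]."""
    return -(one_hot_labels * log_softmax(logits)).sum(axis=-1)


def sparse_cross_entropy(logits, class_indices):
    """Per-pixel loss from integer labels of shape [...]."""
    log_probs = log_softmax(logits)
    # Pick the log-probability of the true class for every pixel.
    picked = np.take_along_axis(log_probs, class_indices[..., None], axis=-1)
    return -picked[..., 0]
```

The sparse form reads one value per pixel instead of multiplying and summing over all classes, which is where the expected training speed-up comes from.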

Work in progress

  • finish saving and loading of trained models ~ Ivan
  • implement LinkNet ~ Josip

Completed

  • make data loading work (data.preparers.Iccv09Preparer, data.Dataset)
  • make a simple baseline
  • complete abstract_model
  • make the cost function (as well as other used evaluation measures) in BaselineA ignore "unknown" class (class 0)
  • implement util.Visualizer
  • improve the colors in util.visualizer.Visualizer
  • enable usage of util.Visualizer while training (by pressing d followed by ENTER in the console)
  • add stride:int and dilation:int parameters to tf_utils.layers.conv (use tf.nn.convolution)
  • add batch normalization to tf_utils.layers, use tf.layers.batch_normalization(input_layer, fused=True, data_format='NCHW')
  • improve random seeding in Dataset for better reproducibility
  • add ResNet layers and encoder/decoder blocks used in LinkNet to tf_utils.blocks
  • implement a textual options menu that can be opened while training is paused, enabling network output visualization, saving/loading of weights, stopping training (after the current epoch) and other actions
  • add transposed convolution to tf_utils.layers
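
For the stride and dilation parameters mentioned above, the following sketch shows the usual output-size arithmetic for one spatial dimension under "SAME" and "VALID" padding. The helper conv_output_size is hypothetical and not part of tf_utils.layers; it only mirrors the standard formulas.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, dilation=1, padding="SAME"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "SAME":
        # With SAME padding only the stride changes the resolution.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # A dilated kernel covers (kernel_size - 1) * dilation + 1 input positions.
        effective_kernel = (kernel_size - 1) * dilation + 1
        return math.ceil((input_size - effective_kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")
```

For example, conv_output_size(240, 3, stride=2) gives 120, the same halving that a 2x2 pooling layer followed by a stride-1 convolution with "SAME" padding would produce.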

Current validation results on Stanford Background Dataset

mini-batch size = 16, Pentium 2020M

Model      mIoU  Pixel acc.  #epochs  Inference time [s]  Hardware
BaselineA  -     0.630       150      0.100*              Pentium 2020M

"Inference time" - on what hardware?

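One possible way to measure inference time as a function of mini-batch size is sketched below. It assumes two hypothetical callables that are not part of this repository's documented API: predict(batch), which runs a forward pass, and make_batch(batch_size), which builds an input batch. The warm-up runs and the use of the median are design choices of this sketch.

```python
import statistics
import time


def measure_inference_time(predict, make_batch, batch_sizes, warmup=2, repeats=10):
    """Return {batch_size: (median seconds per batch, median seconds per image)}."""
    results = {}
    for batch_size in batch_sizes:
        batch = make_batch(batch_size)
        # Warm-up runs keep one-time setup costs out of the measurement.
        for _ in range(warmup):
            predict(batch)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            predict(batch)
            timings.append(time.perf_counter() - start)
        per_batch = statistics.median(timings)
        results[batch_size] = (per_batch, per_batch / batch_size)
    return results
```

Reporting the hardware together with each measurement, as the table above does, keeps the numbers comparable with those from other sources.
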