dataflow

A data feeder built on tensorpack.dataflow.

Introduction

Dataflow in Tensorpack

  1. It's easy: write everything in pure Python.
  2. It's fast: see Efficient DataFlow on how to build a fast DataFlow with parallelism.

-- http://tensorpack.readthedocs.io/en/latest/tutorial/dataflow.html
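To illustrate the "pure Python" point, the sketch below shows roughly what a DataFlow-style object looks like: something that yields datapoints (lists of components) from a generator. It deliberately does not import tensorpack. The class name `SquaresDataFlow` and its contents are hypothetical; only the `get_data()` convention is taken from the examples in this README.

```python
import random


class SquaresDataFlow(object):
    """Hypothetical stand-in for a DataFlow: yields [value, label] datapoints."""

    def __init__(self, size, shuffle=False, seed=0):
        self._size = size
        self._shuffle = shuffle
        self._rng = random.Random(seed)

    def size(self):
        # Number of datapoints produced by one pass of get_data().
        return self._size

    def get_data(self):
        indices = list(range(self._size))
        if self._shuffle:
            self._rng.shuffle(indices)
        for i in indices:
            # A datapoint is a list of components, here [value, label].
            yield [i * i, i % 2]


datapoints = list(SquaresDataFlow(4).get_data())
```

Because the object is an ordinary Python generator source, it can be iterated with a plain `for` loop, as the examples in this README do.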

Examples

  • General network images
from dataflow.dataset import NetworkImages

class NetworkImagesImple(NetworkImages):
    def __init__(self, shuffle=False):
        super(NetworkImagesImple, self).__init__(shuffle)
        self.datapoints = [
            ['http://t1.daumcdn.net/news/201511/20/sportskhan/20151120010041631lkva.jpg', 0],
            ['http://t1.daumcdn.net/news/201511/03/SpoChosun/20151103111905902jtmo.jpg',  1],
            ['http://t1.daumcdn.net/news/201712/26/ked/20171226081404015hktd.jpg',        2],
            ['http://t1.daumcdn.net/news/201511/05/10asia/20151105173913995tqqc.jpg',     3],
            ['http://t1.daumcdn.net/news/201607/20/etimesi/20160720112503626xuwr.jpg',    4],
        ]

ds = NetworkImagesImple()

for datapoint in ds.get_data():
    pass
  • ILSVRC12: multi-threaded downloading with multi-process preprocessing
import tensorpack.dataflow as df
from dataflow.dataset import ILSVRC12

service_code = 'CONTACT_ME'
ds = ILSVRC12(service_code, 'train', shuffle=True).parallel(num_threads=16)
ds = df.PrefetchDataZMQ(ds, nr_proc=8)

for datapoint in ds.get_data():
    pass
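The ILSVRC12 example combines threads for downloading with processes for preprocessing. As a general illustration of the thread half of that idea (not the implementation behind `.parallel(num_threads=16)` in this repository), the sketch below fans a list of URLs out to a thread pool. The names `parallel_fetch` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for a real download so the snippet runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_fetch(urls, fetch, num_threads=16):
    """Apply `fetch` to every URL using a thread pool, preserving input order.

    `fetch` is supplied by the caller, e.g. a function that downloads the
    URL and returns the raw image bytes.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in the same order as `urls`.
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real download (assumption): returns the URL length.
    return len(url)


results = parallel_fetch(['http://a/1.jpg', 'http://bb/2.jpg'], fake_fetch, num_threads=2)
```

Threads tend to suit I/O-bound work such as downloads, while separate processes tend to suit CPU-bound work such as decoding and augmentation, which is the split the example above appears to follow.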

Original Dataflow Examples

Basic Dataflow

Distributed Dataflow

TensorFlow with Dataflow

PyTorch with Dataflow

Install

Prerequisites

  • Ubuntu
apt install -y libsm6 libxext-dev cmake
  • macOS
brew install cmake
  • Common
pip install -r requirements.txt
export TENSORPACK_DATASET=/data/private/storage/tensorpack_data
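A script that depends on this variable may want to fail early when it is missing. The helper below is a minimal sketch under that assumption; `dataset_dir` is a hypothetical name and is not part of this repository or of tensorpack.

```python
import os


def dataset_dir(default=None):
    """Return the directory named by TENSORPACK_DATASET, or `default`."""
    path = os.environ.get('TENSORPACK_DATASET', default)
    if path is None:
        # No environment variable and no fallback supplied by the caller.
        raise RuntimeError('TENSORPACK_DATASET is not set')
    return path
```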

Benchmark

ILSVRC12

Parallel download and image decoding only

  • without image augmentation (parallel download and decode only)
  • resources: 4 GPUs, 8 CPUs, 48 GB in kakaobrain braincloud
unit: duration (mm:ss) for 5000 images; rows are threads, columns are processes

threads \ processes       1       2       4       8      16      32
1                     05:17   02:37   01:25   00:36   00:18   00:08
2                     02:39   01:23   00:35   00:17   00:08   00:05
4                     01:10   00:35   00:17   00:08   00:06   00:05
8                     00:35   00:17   00:08   00:05   00:06   00:08
16                    00:25   00:13   00:06   00:06   00:07   00:09
32                    00:26   00:13   00:06   00:06   00:08   00:09

unit: images per second; rows are threads, columns are processes

threads \ processes       1       2       4       8      16      32
1                     15.76   31.74   58.66  135.97  269.16  556.83
2                     31.42   59.81  141.52  282.79  556.39  865.39
4                     71.11  140.83  283.55  575.78  820.73  861.46
8                    141.12  286.69  555.56  912.18  722.68  561.70
16                   196.69  374.15  723.51  794.82  649.93  525.28
32                   188.49  360.05  728.51  818.10  610.04  548.91
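The two tables above are related by simple arithmetic: throughput is the image count divided by the duration in seconds. The sketch below shows that conversion; `images_per_sec` is a hypothetical helper. It assumes the durations are in mm:ss, which matches the reported throughput figures.

```python
def images_per_sec(duration, num_images=5000):
    """Convert an 'mm:ss' duration into images per second."""
    minutes, seconds = duration.split(':')
    total_seconds = int(minutes) * 60 + int(seconds)
    return num_images / float(total_seconds)


# 1 thread, 1 process: 05:17 is 317 seconds.
rate = images_per_sec('05:17')
```

For the 1 thread, 1 process cell this gives about 15.77 images per second against the reported 15.76. The small gap is presumably because the listed durations are rounded to whole seconds.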

Parallel download and augmentation for ResNet

  • resources: 4 GPUs, 8 CPUs, 48 GB in kakaobrain braincloud
unit: duration (mm:ss) for 5000 images; rows are threads, columns are processes

threads \ processes       2       4       8      16      32
2                     01:11   00:33   00:16   00:10   00:12
4                     00:33   00:16   00:08   00:12   00:12
8                     00:28   00:14   00:10   00:12   00:15
16                    00:28   00:14   00:10   00:12   00:16
32                    00:28   00:14   00:10   00:12   00:15

unit: images per second; rows are threads, columns are processes

threads \ processes       2       4       8      16      32
2                     70.33  147.21  294.56  495.01  403.18
4                    149.60  303.54  539.99  397.04  318.66
8                    176.30  350.24  487.05  385.01  315.29
16                   172.77  343.41  485.52  393.25  308.80
32                   175.74  347.32  489.17  387.76  312.62
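To read off the best setting from the ResNet throughput figures, the table can be scanned programmatically. The snippet below is a small sketch that copies the images-per-second values from this section; `best_setting` is a hypothetical helper name.

```python
# Columns of the throughput table (number of processes).
PROCESSES = [2, 4, 8, 16, 32]

# Rows keyed by number of threads; values are images per second.
THROUGHPUT = {
    2: [70.33, 147.21, 294.56, 495.01, 403.18],
    4: [149.60, 303.54, 539.99, 397.04, 318.66],
    8: [176.30, 350.24, 487.05, 385.01, 315.29],
    16: [172.77, 343.41, 485.52, 393.25, 308.80],
    32: [175.74, 347.32, 489.17, 387.76, 312.62],
}


def best_setting(table, processes):
    """Return (threads, processes, rate) for the highest throughput."""
    rate, threads, procs = max(
        (rate, threads, procs)
        for threads, row in table.items()
        for procs, rate in zip(processes, row)
    )
    return threads, procs, rate


best = best_setting(THROUGHPUT, PROCESSES)
```

On these numbers the peak is at 4 threads and 8 processes. Throughput drops once the process count exceeds 8, which is the number of CPUs on the benchmark machine.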

Contributors

  • wbaek
