
Comments (10)

JanuszL commented on May 14, 2024

Hi,
That is a good question. In this example we load all the data at the beginning because the input set is small. In a real use case you should put your loading code inside iter_setup and load a batch of images at each iteration. Labels can be treated like any other data and loaded through an ExternalSource as well; in that case you will have one source for the images and a second one for the labels.

"At the same time, I am puzzled that we can achieve the multi-threaded processing by rewriting iter_setup(self) to send external data?"

If I understand your question correctly, you can do multi-threaded processing inside iter_setup, but that code needs to feed data into all ExternalSources via feed_input before iter_setup returns. You can, however, continue your background processing outside iter_setup.
@ptrendx I hope I haven't missed anything.
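The pattern described above (one source for images, a second one for labels, materialized one batch per iteration from iter_setup) can be sketched framework-free in plain Python. This is a stand-in only: the file names are made up, and in DALI the two return values would go to two separate ExternalSource operators via feed_input().

```python
# Sketch of the per-iteration loading pattern: images and labels are two
# parallel sources, and only one batch is materialized per call.
# Pure Python stand-in; in DALI, each list returned here would be fed to
# its own ExternalSource inside iter_setup().

def load_batch(paths, labels, batch_size, iteration):
    """Return (image_bytes, label_batch) for batch number `iteration`."""
    start = iteration * batch_size
    end = start + batch_size
    # A real pipeline would read each file from disk here; we fake the
    # bytes so the sketch stays self-contained.
    image_bytes = [("raw-bytes-of-%s" % p).encode() for p in paths[start:end]]
    return image_bytes, labels[start:end]

paths = ["img_%03d.jpg" % i for i in range(10)]   # hypothetical file list
labels = list(range(10))

imgs, labs = load_batch(paths, labels, batch_size=4, iteration=1)
```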

from dali.

wangguangyuan commented on May 14, 2024

@JanuszL
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import numpy as np
import tensorflow as tf
import nvidia.dali.plugin.tf as dali_tf

def read_image_path(file_path):
    image_paths, labels, ids = [], [], []
    i = 0
    with open(file_path, 'r') as f:
        for line in f:
            i += 1
            image_path, label = line.strip().split()
            image_paths.append(image_path)
            labels.append(np.array([int(label)], dtype=np.int32))
            ids.append(np.array([i], dtype=np.int32))
    return image_paths, labels, ids

def make_batch(size, iter, images_path, labels, ids):
    # Batch number `iter` covers indices [iter*size, (iter+1)*size).
    # np.frombuffer replaces the deprecated np.fromstring.
    data = [np.frombuffer(open(path, 'rb').read(), dtype=np.uint8)
            for path in images_path[iter * size:(iter + 1) * size]]
    return data, labels[iter * size:(iter + 1) * size], ids[iter * size:(iter + 1) * size]
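A side note on the slicing in make_batch: batch i of size n is meant to cover indices [i*n, (i+1)*n), and np.frombuffer is the non-deprecated way to turn raw file bytes into a uint8 array. A quick self-contained check with toy data and no file I/O:

```python
import numpy as np

# Batch i of size n covers indices [i*n, (i+1)*n); the last batch may be short.
n = 4
items = list(range(10))
batches = [items[i * n:(i + 1) * n] for i in range((len(items) + n - 1) // n)]

# np.frombuffer views raw bytes as a uint8 array (np.fromstring is deprecated).
raw = b"\x00\x01\x02"
arr = np.frombuffer(raw, dtype=np.uint8)
```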

class C2Pipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, file_path,
                 pipelined=True, async_exec=True):
        # 'async' is a reserved word in Python 3.5+, so the flag is renamed here.
        super(C2Pipe, self).__init__(batch_size,
                                     num_threads,
                                     device_id,
                                     exec_pipelined=pipelined,
                                     exec_async=async_exec)

        self.images_path, self.labels, self.ids = read_image_path(file_path)
        self.input = ops.ExternalSource()
        self.label_input = ops.ExternalSource()
        self.id_input = ops.ExternalSource()

        self.decode = ops.HostDecoder(output_type=types.RGB)

        self.rcm = ops.FastResizeCropMirror(crop=[224, 224])

        self.np = ops.NormalizePermute(device="gpu",
                                       output_dtype=types.FLOAT,
                                       mean=[128., 128., 128.],
                                       std=[1., 1., 1.],
                                       height=224,
                                       width=224,
                                       image_type=types.RGB)

        self.uniform = ops.Uniform(range=(0., 1.))
        self.resize_uniform = ops.Uniform(range=(256., 480.))
        self.mirror = ops.CoinFlip(probability=0.5)
        self.cast = ops.Cast(device="cpu",
                             dtype=types.INT32)
        self.iter = 0

    def define_graph(self):
        self.jpegs = self.input(name='image')
        self.label = self.label_input(name='label')
        self.id = self.id_input(name='id')

        images = self.decode(self.jpegs)
        resized = self.rcm(images,
                           crop_pos_x=self.uniform(),
                           crop_pos_y=self.uniform(),
                           mirror=self.mirror(),
                           resize_shorter=self.resize_uniform())

        output = self.np(resized.gpu())
        return output, self.label, self.id

    def iter_setup(self):
        raw_data, raw_label, raw_id = make_batch(self.batch_size, self.iter,
                                                 self.images_path, self.labels, self.ids)
        self.feed_input(self.jpegs, raw_data)
        self.feed_input(self.label, raw_label)
        self.feed_input(self.id, raw_id)
        self.iter += 1

device_id = 0  # was undefined in the original snippet
pipe = C2Pipe(batch_size=32, num_threads=2, device_id=device_id, file_path='./test.txt')
serialized_pipes = pipe.serialize()
daliop_t = dali_tf.DALIIterator()
with tf.device('/gpu:%i' % device_id):
    image, label = daliop_t(serialized_pipeline=serialized_pipes,
                            shape=[32, 224, 224, 3],
                            image_type=tf.float32,
                            label_type=tf.int32,
                            num_threads=2,
                            device_id=device_id)

Traceback (most recent call last):
  File "/share5/public/guangyuan/workplace/horovod_project/dali_from_raw_data_test.py", line 135, in <module>
    serialized_pipes = pipe.serialize()
  File "/home/guangyuan/.local/lib/python2.7/site-packages/nvidia/dali/pipeline.py", line 271, in serialize
    return self._pipe.SerializeToProtobuf()
RuntimeError: CHECK failed: IsInitialized(): Can't serialize message of type "dali_proto.PipelineDef" because it is missing required fields: op[0].name, op[7].name, op[8].name


JanuszL commented on May 14, 2024

Registered at DALI-207


wangguangyuan commented on May 14, 2024

@JanuszL
Must the functions used in the DALI serialization process be provided by DALI itself?
Is it not possible to serialize an external function?
If so, is there any other solution?


JanuszL commented on May 14, 2024

Hi,
We haven't tested this scenario. We will check whether this is a DALI bug or a limitation of how DALI and TF are integrated.
We are putting it on our ToDo list.


JanuszL commented on May 14, 2024

Hi,
Currently ExternalSource is not serialized and doesn't work with TF. The real reason is that DALI cannot call Python code from the inside: with the current design iter_setup is not callable from TF, so there is no easy way to feed data into an ExternalSource. It should work with PyTorch as well as MXNet, though.
We plan to develop such functionality in the future, but it is hard to tell now when.


wangguangyuan commented on May 14, 2024

Hi, thank you very much. Is there a document about the DALI architecture? Some of the features I need may have to be added at the source level.


JanuszL commented on May 14, 2024

Hi,
Please look into https://github.com/NVIDIA/DALI#additional-resources first. As for detailed documentation describing the design of each class, we don't have such a document and it is probably not going to be written anytime soon. You can still use the Doxygen file from the DALI source to generate development docs.


klecki commented on May 14, 2024

Hi,
support for External Source callbacks/iterators in TensorFlow via the tf.data.Dataset-compatible API was merged and will be present in DALI 1.5.

DALI 1.4 already supports inputs from other tf.data.Datasets in the experimental.DALIDatasetWithInputs.

You can see more in the documentation: https://docs.nvidia.com/deeplearning/dali/main-user-guide/docs/plugins/tensorflow_plugin_api.html#experimental

A tutorial is under review in #3212.
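For readers unfamiliar with the callback style this enables: conceptually, the pipeline pulls each batch from a user callback instead of having data pushed in via feed_input. A minimal framework-free sketch of that inversion (illustrative names only, not the actual DALI or TF API):

```python
# Illustrative sketch only: an iterable that pulls batches from a user
# callback, mimicking the External Source callback style. This is not
# the real DALI/TF API; CallbackSource and my_batch are made-up names.
class CallbackSource:
    def __init__(self, callback, num_batches):
        self.callback = callback
        self.num_batches = num_batches

    def __iter__(self):
        # The consumer drives the loop; the callback produces batch i on demand.
        for i in range(self.num_batches):
            yield self.callback(i)

def my_batch(i):
    # Stand-in for loading/decoding the i-th batch of images.
    return [i * 10 + j for j in range(3)]

batches = list(CallbackSource(my_batch, num_batches=2))
```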


JanuszL commented on May 14, 2024

Hi,
DALI 1.5 is out and can be used to test the new functionality.
