Comments (10)
Hi,
That is a good question. In this example, we load all the data at the beginning because the input set is small. In a real use case, you should put your loading code inside iter_setup and load one batch of images at each iteration. As for the labels, you can treat them like any other data and load them through ExternalSource as well. In that case you will have one source for the images and a second one for the labels.
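A minimal sketch of the per-iteration loading described above, using only the standard library. The helper name load_batch and the fake ".bin" files are assumptions made for illustration and are not part of DALI; in a real pipeline each returned byte string would typically be wrapped as a uint8 NumPy array and passed to feed_input, with the labels going to a second ExternalSource.

```python
import os
import tempfile


def load_batch(image_paths, labels, batch_size, iteration):
    """Read from disk only the samples that belong to one iteration."""
    start = iteration * batch_size
    stop = start + batch_size
    encoded = []
    for path in image_paths[start:stop]:
        with open(path, "rb") as f:
            encoded.append(f.read())  # raw, still-encoded file contents
    # Labels are sliced with the same indices so they stay aligned with the images.
    return encoded, labels[start:stop]


# Tiny stand-in dataset (hypothetical): four fake "image" files with integer labels.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmp_dir, "img_%d.bin" % i)
    with open(path, "wb") as f:
        f.write(bytes([i]) * 3)
    paths.append(path)
all_labels = [10, 11, 12, 13]

# What iter_setup would request on its first and second call.
first_images, first_labels = load_batch(paths, all_labels, batch_size=2, iteration=0)
second_images, second_labels = load_batch(paths, all_labels, batch_size=2, iteration=1)
```

Only the files for the current batch are opened on each call, so memory use stays bounded by the batch size rather than by the size of the dataset.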
At the same time, I am puzzled: can we achieve multi-threaded processing by overriding iter_setup(self) to send external data?
If I understand your question correctly, you can do multi-threaded processing inside iter_setup, but that code must feed data to every ExternalSource via feed_input before iter_setup returns. You can, however, continue your background processing outside iter_setup.
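One way to picture this is a worker thread that prepares batches in the background while iter_setup only takes a finished batch and feeds it. The sketch below is an illustration under stated assumptions: PrefetchingFeeder, FakePipeline and make_batch are hypothetical names, and FakePipeline is a stand-in that merely records feed_input calls, not the real DALI Pipeline class.

```python
import queue
import threading


class PrefetchingFeeder:
    """Hypothetical helper: builds batches on a worker thread."""

    def __init__(self, make_batch, num_batches, depth=2):
        self._queue = queue.Queue(maxsize=depth)
        self._worker = threading.Thread(
            target=self._run, args=(make_batch, num_batches), daemon=True)
        self._worker.start()

    def _run(self, make_batch, num_batches):
        for i in range(num_batches):
            # put() blocks while the queue is full, which bounds memory use.
            self._queue.put(make_batch(i))

    def next_batch(self):
        # get() blocks until the worker has a batch ready.
        return self._queue.get()


class FakePipeline:
    """Stand-in for a pipeline (assumption): records what would be fed."""

    def __init__(self, feeder):
        self.feeder = feeder
        self.fed = []

    def feed_input(self, name, data):
        self.fed.append((name, data))

    def iter_setup(self):
        # Every source is fed before iter_setup returns.
        images, labels = self.feeder.next_batch()
        self.feed_input("image", images)
        self.feed_input("label", labels)


def make_batch(i):
    # Placeholder work; real code would read and prepare files here.
    return ["img%d" % (2 * i), "img%d" % (2 * i + 1)], [2 * i, 2 * i + 1]


feeder = PrefetchingFeeder(make_batch, num_batches=3)
pipe = FakePipeline(feeder)
for _ in range(3):
    pipe.iter_setup()
```

The worker keeps running between calls, so loading overlaps with whatever happens outside iter_setup, while a single worker and a FIFO queue keep the batches in order.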
@ptrendx I hope I haven't missed anything.
from dali.
@JanuszL
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import numpy as np
import tensorflow as tf
import nvidia.dali.plugin.tf as dali_tf

def read_image_path(file_path):
    resluts, labels, images_path = [], [], []
    i = 0
    with open(file_path, 'r') as f:
        for line in f:
            i = i + 1
            image_path, label = line.strip().split()
            resluts.append(image_path)
            labels.append(np.array([int(label)], dtype=np.int32))
            # labels.append(np.array([i], dtype=np.int32))
            images_path.append(np.array([i], dtype=np.int32))
            # labels.append([int(label)])
    # return resluts, labels
    return resluts, labels, images_path

def make_batch(size, iter, images_path, labels, ids):
    # images_path, labels, ids = read_image_path(file_path)
    data = [np.fromstring(open(path, 'rb').read(), dtype=np.uint8) for path in images_path[iter*size:size*(iter+1)]]
    return data, labels[iter*size:size*(iter+1)], ids[iter*size:size*(iter+1)]

class C2Pipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, file_path, pipelined=True, async=True):
        super(C2Pipe, self).__init__(batch_size,
                                     num_threads,
                                     device_id,
                                     exec_pipelined=pipelined,
                                     exec_async=async)
        # self.file_path = file_path
        self.images_path, self.labels, self.ids = read_image_path(file_path)
        self.input = ops.ExternalSource()
        self.label_input = ops.ExternalSource()
        self.id_input = ops.ExternalSource()
        self.decode = ops.HostDecoder(output_type=types.RGB)
        self.rcm = ops.FastResizeCropMirror(crop=[224, 224])
        self.np = ops.NormalizePermute(device="gpu",
                                       output_dtype=types.FLOAT,
                                       mean=[128., 128., 128.],
                                       std=[1., 1., 1.],
                                       height=224,
                                       width=224,
                                       image_type=types.RGB)
        self.uniform = ops.Uniform(range=(0., 1.))
        self.resize_uniform = ops.Uniform(range=(256., 480.))
        self.mirror = ops.CoinFlip(probability=0.5)
        # self.ct = ops.Cast(d)
        self.cast = ops.Cast(device="cpu",
                             dtype=types.INT32)
        self.iter = 0

    def define_graph(self):
        self.jpegs = self.input(name='image')
        self.label = self.label_input(name='label')
        self.id = self.id_input(name='id')
        images = self.decode(self.jpegs)
        resized = self.rcm(images, crop_pos_x=self.uniform(),
                           crop_pos_y=self.uniform(),
                           mirror=self.mirror(),
                           resize_shorter=self.resize_uniform())
        output = self.np(resized.gpu())
        return output, self.label, self.id

    def iter_setup(self):
        raw_data, raw_label, raw_id = make_batch(self.batch_size, self.iter, self.images_path, self.labels, self.ids)
        self.feed_input(self.jpegs, raw_data)
        self.feed_input(self.label, raw_label)
        self.feed_input(self.id, raw_id)
        self.iter += 1

pipe = C2Pipe(batch_size=32, num_threads=2, device_id=0, file_path='./test.txt')
serialized_pipes = pipe.serialize()
daliop_t = dali_tf.DALIIterator()
with tf.device('/gpu:%i' % device_id):
    image, label = daliop_t(serialized_pipeline=serialized_pipes,
                            shape=[32, 224, 224, 3],
                            image_type=tf.float32,
                            label_type=tf.int32,
                            num_threads=2,
                            device_id=device_id)
Traceback (most recent call last):
  File "/share5/public/guangyuan/workplace/horovod_project/dali_from_raw_data_test.py", line 135, in <module>
    serialized_pipes = pipe.serialize()
  File "/home/guangyuan/.local/lib/python2.7/site-packages/nvidia/dali/pipeline.py", line 271, in serialize
    return self._pipe.SerializeToProtobuf()
RuntimeError: CHECK failed: IsInitialized(): Can't serialize message of type "dali_proto.PipelineDef" because it is missing required fields: op[0].name, op[7].name, op[8].name
Registered at DALI-207
@JanuszL
Does every function used in the DALI serialization process have to be provided by DALI itself?
Is it not possible to serialize an external function?
If that is the case, is there any other solution?
Hi,
We haven't tested this scenario. We will check whether this is a DALI bug or a limitation of how DALI and TF are integrated.
I am putting it on our to-do list.
Hi,
Currently, ExternalSource is not serialized and doesn't work with TF. The real reason is that DALI cannot call Python code from the inside: in this case iter_setup is not callable (with the current design), and there is no easy way to feed data into ExternalSource. It should work with PyTorch as well as MXNet.
We plan to develop such functionality in the future, but it is hard to say when.
Hi, thank you very much. Is there a document describing the DALI architecture? Some of the features I need may have to be added from source.
Hi,
Please look at https://github.com/NVIDIA/DALI#additional-resources first. As for detailed documentation describing the design of each class, we don't have such a document, and it is probably not going to be written anytime soon. You can still use the Doxygen file from the DALI source to generate development docs.
Hi,
Support for External Source callbacks/iterators in TensorFlow via the tf.data.Dataset-compatible API has been merged and will be present in DALI 1.5.
DALI 1.4 already supports inputs from other tf.data.Datasets in the experimental.DALIDatasetWithInputs.
You can see more in the documentation: https://docs.nvidia.com/deeplearning/dali/main-user-guide/docs/plugins/tensorflow_plugin_api.html#experimental
A tutorial is under review in #3212.
Hi,
DALI 1.5 is out and can be used to test the new functionality.