A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Home Page: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

License: Apache License 2.0


dali's Introduction

NVIDIA DALI

The NVIDIA Data Loading Library (DALI) is a GPU-accelerated library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks.

Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline. Features such as prefetching, parallel execution, and batch processing are handled transparently for the user.

In addition, the deep learning frameworks have multiple data pre-processing implementations, resulting in challenges such as portability of training and inference workflows, and code maintainability. Data processing pipelines implemented using DALI are portable because they can easily be retargeted to TensorFlow, PyTorch, MXNet and PaddlePaddle.

Highlights

  • Easy-to-use functional style Python API.
  • Supports multiple data formats: LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV, FLAC, OGG, H.264, VP9 and HEVC.
  • Portable across popular deep learning frameworks: TensorFlow, PyTorch, MXNet, PaddlePaddle, JAX.
  • Supports CPU and GPU execution.
  • Scalable across multiple GPUs.
  • Flexible graphs let developers create custom pipelines.
  • Extensible for user-specific needs with custom operators.
  • Accelerates image classification (ResNet-50) and object detection (SSD) workloads, as well as ASR models (Jasper, RNN-T).
  • Allows a direct data path between storage and GPU memory with GPUDirect Storage.
  • Integrates easily with NVIDIA Triton Inference Server through the DALI Triton Backend.
  • Open source.

DALI Roadmap

The following issue represents a high-level overview of our 2023 plan. You should be aware that this roadmap may change at any time and the order below does not reflect any type of priority.

We strongly encourage you to comment on our roadmap and provide us feedback on the mentioned GitHub issue.


Installing DALI

To install the latest DALI release for the latest CUDA version (12.x):

pip install nvidia-dali-cuda120
# or
pip install --extra-index-url https://pypi.nvidia.com  --upgrade nvidia-dali-cuda120

DALI requires an NVIDIA driver that supports the appropriate CUDA version. DALI builds based on CUDA 12 also require the CUDA Toolkit to be installed.
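
After installing, a quick way to confirm that the wheel is visible to the interpreter is to try importing it. The sketch below is illustrative and not part of the official instructions: the helper name `installed_dali_version` is made up for this example, and it assumes the package exposes a `__version__` attribute (it falls back to `None` if not).

```python
import importlib


def installed_dali_version():
    """Return the DALI version string, or None if DALI cannot be imported.

    Hypothetical helper for illustration only.
    """
    try:
        dali = importlib.import_module("nvidia.dali")
    except ImportError:
        # Covers both a missing `nvidia` namespace and a missing `dali` package.
        return None
    # Assumption: the package exposes __version__; return None if it does not.
    return getattr(dali, "__version__", None)


if __name__ == "__main__":
    version = installed_dali_version()
    if version is None:
        print("DALI is not importable in this environment")
    else:
        print("DALI version:", version)
```

A `None` result usually means the wheel was installed into a different environment than the one running the script.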

DALI comes preinstalled in the TensorFlow, PyTorch, NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet, and PaddlePaddle containers on NVIDIA GPU Cloud.

For other installation paths (TensorFlow plugin, older CUDA versions, nightly and weekly builds, etc.) and specific requirements, please refer to the Installation Guide.

To build DALI from source, please refer to the Compilation Guide.


Examples and Tutorials

An introduction to DALI can be found in the Getting Started page.

More advanced examples can be found in the Examples and Tutorials page.

For an interactive version (Jupyter notebook) of the examples, go to the docs/examples directory.

Note: Select the Latest Release Documentation or the Nightly Release Documentation, which stays in sync with the main branch, depending on your version.


Additional Resources

  • GPU Technology Conference 2023; Developer Breakout: Accelerating Enterprise Workflows With Triton Server and DALI; Brandon Tuttle: event.
  • GPU Technology Conference 2023; GPU-Accelerating End-to-End Geospatial Workflows; Kevin Green: event.
  • GPU Technology Conference 2022; Effective NVIDIA DALI: Accelerating Real-life Deep-learning Applications; Rafał Banaś: event.
  • GPU Technology Conference 2022; Introduction to NVIDIA DALI: GPU-accelerated Data Preprocessing; Joaquin Anton Guirao: event.
  • GPU Technology Conference 2021; NVIDIA DALI: GPU-Powered Data Preprocessing by Krzysztof Łęcki and Michał Szołucha: event.
  • GPU Technology Conference 2020; Fast Data Pre-Processing with NVIDIA Data Loading Library (DALI); Albert Wolant, Joaquin Anton Guirao: recording.
  • GPU Technology Conference 2019; Fast AI data pre-preprocessing with DALI; Janusz Lisiecki, Michał Zientkiewicz: slides, recording.
  • GPU Technology Conference 2019; Integration of DALI with TensorRT on Xavier; Josh Park and Anurag Dixit: slides, recording.
  • GPU Technology Conference 2018; Fast data pipeline for deep learning training, T. Gale, S. Layton and P. Trędak: slides, recording.
  • Developer Page.
  • Blog Posts.

Contributing to DALI

We welcome contributions to DALI. To contribute to DALI and make pull requests, follow the guidelines outlined in the Contributing document.

If you are looking for a good first task, check the issues tagged with the external contribution welcome label.

Reporting Problems, Asking Questions

We appreciate feedback, questions or bug reports. When you need help with the code, follow the process outlined in the Stack Overflow document. Ensure that the posted examples are:

  • minimal: Use as little code as possible that still produces the same problem.
  • complete: Provide all parts needed to reproduce the problem. Check if you can strip external dependency and still show the problem. The less time we spend on reproducing the problems, the more time we can dedicate to the fixes.
  • verifiable: Test the code you are about to provide, to make sure that it reproduces the problem. Remove all other problems that are not related to your request.

Acknowledgements

DALI was originally built with major contributions from Trevor Gale, Przemek Tredak, Simon Layton, Andrei Ivanov and Serge Panev.

dali's People

Contributors

5had3z, a-sansanwal, aderylo, alexbula, awolant, azrael417, banasraf, barci2, cclauss, cliffwoolley, cyyever, drivanov, havaker, jantonguirao, januszl, joehandzik, kh4l, klecki, ksztenderski, kychennv, matthew-frank, mzient, prak-nv, pribalta, ptrendx, staniewzki, stiepan, szalpal, szkarpinski, willthefrog

dali's Issues

Add example with CAFFE2

Would it be possible to add an example of piping data to CAFFE2?
So far I have only found such an example in an NVIDIA presentation. The LMDB example only covers reading CAFFE2 data, not piping it to CAFFE2.

It would be great to have a complete, official example.

Thank you

NVJPEG error "3" Current pipeline object is no longer valid

I use MXNet 1.3.0 to train ResNeXt-50 in fp16 on 4 Titan V GPUs. To keep up with the speed of the Titan V, I replaced the original data iterator, ImageRecordIter, with DALI. At first the program runs normally, but after a while it fails with this error:

root INFO Epoch[0] Rank[0] Batch[20] TotalIter[20] Train:0.002(0.003)	kv_sync:0.072(0.673)	data:0.209(0.881)	iter_total_time:0.226(0.936)	Speed: 1366.24 samples/sec	accuracy=0.000744	top_k_accuracy_5=0.005580
root INFO Epoch[0] Rank[0] Batch[40] TotalIter[40] Train:0.003(0.003)	kv_sync:0.091(0.392)	data:0.113(0.524)	iter_total_time:0.234(0.607)	Speed: 1953.21 samples/sec	accuracy=0.001270	top_k_accuracy_5=0.004883
......
......
root INFO Epoch[0] Rank[0] Batch[920] TotalIter[920] Train:0.003(0.003)	kv_sync:0.094(0.119)	data:0.097(0.153)	iter_total_time:0.214(0.234)	Speed: 2371.57 samples/sec	accuracy=0.001465	top_k_accuracy_5=0.004687
root INFO Epoch[0] Rank[0] Batch[940] TotalIter[940] Train:0.002(0.003)	kv_sync:0.114(0.119)	data:0.117(0.153)	iter_total_time:0.224(0.233)	Speed: 2390.58 samples/sec	accuracy=0.000879	top_k_accuracy_5=0.004297
Traceback (most recent call last):
  File "/mnt/truenas/scratch/xiaotao.chen/softwares/resnet.mxnet/train.py", line 156, in <module>
    main(config)
  File "/mnt/truenas/scratch/xiaotao.chen/softwares/resnet.mxnet/train.py", line 151, in main
    kvstore=kv)
  File "/mnt/truenas/scratch/xiaotao.chen/softwares/resnet.mxnet/core/solver.py", line 113, in fit
    next_data_batch = next(data_iter)
  File "/mnt/truenas/scratch/xiaotao.chen/py-envs/py27_mxnet0722/local/lib/python2.7/site-packages/nvidia/dali/plugin/mxnet.py", line 169, in next
    return self.__next__();
  File "/mnt/truenas/scratch/xiaotao.chen/py-envs/py27_mxnet0722/local/lib/python2.7/site-packages/nvidia/dali/plugin/mxnet.py", line 126, in __next__
    outputs.append(p.outputs())
  File "/mnt/truenas/scratch/xiaotao.chen/py-envs/py27_mxnet0722/local/lib/python2.7/site-packages/nvidia/dali/pipeline.py", line 239, in outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline: [/mnt/truenas/scratch/xiaotao.chen/DALI/dali/pipeline/operators/decoder/nvjpeg_decoder.h:167] NVJPEG error "3"
Current pipeline object is no longer valid.
./resnet.train.sh: line 13: 21009 Segmentation fault      (core dumped) python /mnt/truenas/scratch/xiaotao.chen/softwares/resnet.mxnet/train.py

According to the log, this looks like an NVJPEG bug.

My environment is as follows.
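
One thing worth ruling out before blaming the decoder: if status code 3 corresponds to NVJPEG_STATUS_BAD_JPEG, as the status enum in the nvJPEG header suggests, the failure may come from an input that is not a well-formed JPEG stream. The sketch below is only a hedged diagnostic aid, not DALI or nvJPEG code; `classify_image_bytes` is a hypothetical helper that inspects the leading magic bytes of an encoded image.

```python
JPEG_SOI = b"\xff\xd8"                # JPEG start-of-image marker
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def classify_image_bytes(data):
    """Guess the container format from the leading bytes of an encoded image.

    Hypothetical helper: it only checks magic numbers and does not validate
    the rest of the stream.
    """
    if data.startswith(JPEG_SOI):
        return "jpeg"
    if data.startswith(PNG_SIGNATURE):
        return "png"
    return "unknown"


if __name__ == "__main__":
    print(classify_image_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
    print(classify_image_bytes(PNG_SIGNATURE + b"\x00" * 16))
```

Running such a check over the payloads stored in the record file would show whether any sample is something other than JPEG. A file that passes this check can still be rejected for other reasons, such as an unsupported color space.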

Package                       Version   Location                                                                            
----------------------------- --------- ------------------------------------------------------------------------------------
backports.functools-lru-cache 1.5       
certifi                       2018.8.13 
chardet                       3.0.4     
cycler                        0.10.0    
Cython                        0.28.5    
easydict                      1.7       
future                        0.16.0    
futures                       3.2.0     
graphviz                      0.8.4     
idna                          2.6       
kiwisolver                    1.0.1     
matplotlib                    2.2.3     
mxnet                         1.3.0     /mnt/truenas/scratch/xiaotao.chen/softwares/mxnet0722/python                        
numpy                         1.15.0    
nvidia-dali                   0.3.0     /mnt/truenas/scratch/xiaotao.chen/py-envs/py27_mxnet0722/lib/python2.7/site-packages
opencv-python                 3.4.2.17  
pip                           18.0      
pkg-resources                 0.0.0     
pyarrow                       0.10.0    
pyparsing                     2.2.0     
python-dateutil               2.7.3     
pytz                          2018.5    
pyzmq                         17.1.2    
requests                      2.18.4    
setuptools                    40.0.0    
six                           1.11.0    
subprocess32                  3.5.2     
urllib3                       1.22      
wheel                         0.31.1    
zmq                           0.0.0 

The main configuration is below:

 'batch_size': 512,
 'data_dir': '/mnt/truenas/scratch/algorithm/datasets/imagenet/imagenet_data_new/',
 'data_nthreads': 16,
 'data_type': 'float16',
 'dataset': 'imagenet',
 'depth': 50,
 'gpu_list': [0, 1, 2, 3],
 'grad_scale': 128.0,
 'image_shape': [3, 224, 224],
 'kv_store': 'device',
 'multi_precision': True,
 'network': 'resnet'

The images in train.rec and val.rec keep their original shape; I did not apply any processing when using im2rec.py to convert ILSVRC2012 to rec files.

My DALI iterator implementation is here.

import os

import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.mxnet import DALIClassificationIterator

class HybridTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus, db_folder):
        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=12 + device_id)
        self.input = ops.MXNetReader(path=[os.path.join(db_folder, "train.rec")], index_path=[os.path.join(db_folder, "train.idx")],
                                     random_shuffle=True, shard_id=device_id, num_shards=num_gpus)
        self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        self.rrc = ops.RandomResizedCrop(device="gpu", size=(224, 224))
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            crop=(224, 224),
                                            image_type=types.RGB,
                                            mean=[0.485 * 255,0.456 * 255,0.406 * 255],
                                            std=[0.229 * 255,0.224 * 255,0.225 * 255])
        self.coin = ops.CoinFlip(probability=0.5)

    def define_graph(self):
        rng = self.coin()
        self.jpegs, self.labels = self.input(name = "Reader")
        images = self.decode(self.jpegs)
        images = self.rrc(images)
        output = self.cmnp(images, mirror = rng)
        return [output, self.labels]

class HybridValPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus, db_folder):
        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=12 + device_id)
        self.input = ops.MXNetReader(path=[os.path.join(db_folder, "val.rec")], index_path=[os.path.join(db_folder, "val.idx")],
                                     random_shuffle=False, shard_id=device_id, num_shards=num_gpus)
        self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        self.rs = ops.Resize(device="gpu", resize_x=256, resize_y=256)
        self.cmnp = ops.CropMirrorNormalize(device="gpu",
                                            output_dtype=types.FLOAT,
                                            output_layout=types.NCHW,
                                            crop=(224, 224),
                                            image_type=types.RGB,
                                            mean=[0.485 * 255,0.456 * 255,0.406 * 255],
                                            std=[0.229 * 255,0.224 * 255,0.225 * 255])

    def define_graph(self):
        self.jpegs, self.labels = self.input(name="Reader")
        images = self.decode(self.jpegs)
        images = self.rs(images)
        output = self.cmnp(images)
        return [output, self.labels]


def get_dali_iter(data_dir, batch_size, kv, image_shape, num_gpus):
    num_examples = 1281167
    trainpipes = [HybridTrainPipe(batch_size=batch_size//num_gpus, num_threads=2, device_id=i, num_gpus=num_gpus, db_folder=data_dir) for i in range(num_gpus)]
    valpipes = [HybridValPipe(batch_size=batch_size//num_gpus, num_threads=2, device_id=i, num_gpus=num_gpus, db_folder=data_dir) for i in range(num_gpus)]

    trainpipes[0].build()
    valpipes[0].build()

    print("Training pipeline epoch size: {}".format(trainpipes[0].epoch_size("Reader")))
    print("Validation pipeline epoch size: {}".format(valpipes[0].epoch_size("Reader")))

    dali_train_iter = DALIClassificationIterator(trainpipes, trainpipes[0].epoch_size("Reader"))
    dali_val_iter = DALIClassificationIterator(valpipes, valpipes[0].epoch_size("Reader"))

    return dali_train_iter, dali_val_iter, num_examples
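
The setup above divides the global batch across GPUs and asks each reader for one shard of the dataset. The arithmetic can be sketched as follows, using the numbers from this report (batch size 512, 4 GPUs, 1281167 training samples). The helper names are hypothetical, and the contiguous, near-equal split is an assumption for illustration; the exact shard boundaries DALI uses may differ between versions.

```python
def per_gpu_batch_size(global_batch_size, num_gpus):
    """Batch size handed to each pipeline, as in batch_size // num_gpus above."""
    return global_batch_size // num_gpus


def shard_bounds(dataset_size, shard_id, num_shards):
    """Half-open [start, end) sample range for one shard.

    Assumption: shards are contiguous and differ in size by at most one sample.
    """
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


if __name__ == "__main__":
    print("per-GPU batch:", per_gpu_batch_size(512, 4))
    for shard_id in range(4):
        start, end = shard_bounds(1281167, shard_id, 4)
        print("shard", shard_id, "covers", end - start, "samples")
```

Under this assumption each GPU sees roughly a quarter of the dataset per epoch, so an iterator size based on the full dataset and one based on a single shard lead to very different epoch lengths.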

Benchmark results against native framework's augmentations?

Hi,

Are there any published benchmark results comparing DALI against the native augmentations in TensorFlow, PyTorch, and MXNet?
Ideally they would use the same set of data augmentations on the same hardware and report either full training speed (images/sec) or data augmentation throughput alone.

Can't use tensorflow example with TFRecord file or FileReader

The directory "example/tensorflow" shows how to use DALI with TensorFlow, but it reads images and labels with ops.MXNetReader. When I use ops.FileReader or ops.TFRecordReader to read the files instead, this error occurs:

DALI data_tensor_shape = ShapeAt(&pipe_handle_, 0) failed: [/opt/dali/dali/pipeline/data/tensor.h:188] Assert on "tl->IsDenseTensor()" failed: All tensors in the input TensorList must have the same shape and be densely packed.

I have no idea what the error means. Each pipeline class returns the same data structure, so why does MXNetReader work while the others do not?

My code (mostly copied from the example):
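
Read literally, the assert message says that every sample in the output batch must have the same shape so the batch can be viewed as one dense tensor. The sketch below illustrates only that condition; it is not DALI's implementation, and `is_dense_batch` is a hypothetical name.

```python
def is_dense_batch(shapes):
    """Return True when every sample shape in the batch is identical.

    Illustrates the condition named in the assert message; an empty batch
    is treated as dense.
    """
    shapes = [tuple(shape) for shape in shapes]
    return all(shape == shapes[0] for shape in shapes)


if __name__ == "__main__":
    # Two crops of the same size can be stacked into a single tensor.
    print(is_dense_batch([(3, 227, 227), (3, 227, 227)]))
    # Samples of different sizes cannot.
    print(is_dense_batch([(3, 227, 227), (3, 256, 480)]))
```

When debugging, printing the per-sample shapes of each pipeline output from pipe.run() is one way to see which output violates this condition.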

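# Imports assumed from the example this code was copied from.
import tensorflow as tf
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import nvidia.dali.tfrecord as tfrec
import nvidia.dali.plugin.tf as dali_tf
from nvidia.dali.pipeline import Pipeline

class RN50Pipeline(Pipeline):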
    def __init__(self, batch_size, num_threads, device_id, num_gpus):
        super(RN50Pipeline, self).__init__(batch_size,
                                         num_threads,
                                         device_id)
        self.input = ops.MXNetReader(path = rec_files, index_path = idx_files,
                                     shard_id = device_id, num_shards = num_gpus)

        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu", random_resize = True,
                                 resize_a = 256, resize_b = 480,
                                 image_type = types.RGB,
                                 interp_type = types.INTERP_LINEAR)
        self.cmn = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (227, 227),
                                            image_type = types.RGB,
                                            mean = [128., 128., 128.],
                                            std = [1., 1., 1.])
        self.uniform = ops.Uniform(range = (0.0, 1.0))

    def define_graph(self):
        inputs, labels = self.input(name="Reader")
        images = self.decode(inputs)
        images = self.resize(images)
        output = self.cmn(images, crop_pos_x = self.uniform(),
                          crop_pos_y = self.uniform())
        return (output, labels.gpu())

class FileReadPipeline(Pipeline):
    def __init__(self,batch_size, num_threads, device_id):
        super(FileReadPipeline, self).__init__(batch_size, num_threads, device_id, seed = 12)
        self.input = ops.FileReader(file_root = image_dir, random_shuffle = True, initial_fill = 21)
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu", random_resize = True, 
                                 resize_a = 256, resize_b = 480,
                                 image_type = types.RGB,
                                 interp_type = types.INTERP_LINEAR)
        self.cmn = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (227, 227),
                                            image_type = types.RGB,
                                            mean = [128., 128., 128.],
                                            std = [1., 1., 1.])
        self.uniform = ops.Uniform(range = (0.0, 1.0))

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        resized_images = self.resize(images)
        output = self.cmn(resized_images, crop_pos_x = self.uniform(),
                           crop_pos_y = self.uniform())
        # images are on the GPU
        return (output, labels.gpu())

class TFRecordPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(TFRecordPipeline, self).__init__(batch_size,
                                         num_threads,
                                         device_id)
        self.input = ops.TFRecordReader(path = tfrecord, 
                                        index_path = tfrecord_idx,
                                        features = {"image/encoded" : tfrec.FixedLenFeature((), tfrec.string, ""),
                                         "image/class/text":          tfrec.FixedLenFeature((), tfrec.string, "")})
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu", random_resize = True,
                                 resize_a = 256, resize_b = 480,
                                 image_type = types.RGB,
                                 interp_type = types.INTERP_LINEAR)
        self.cmn = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (227, 227),
                                            image_type = types.RGB,
                                            mean = [128., 128., 128.],
                                            std = [1., 1., 1.])
        self.uniform = ops.Uniform(range = (0.0, 1.0))

    def define_graph(self):
        inputs = self.input()
        images = self.decode(inputs["image/encoded"])
        resized_images = self.resize(images)
        output = self.cmn(resized_images, crop_pos_x = self.uniform(),
                           crop_pos_y = self.uniform())
        return (output, inputs["image/class/text"].gpu())

def get_batch_test_dali(batch_size):
  
    global DEVICES

    pipes = [FileReadPipeline(batch_size=BATCH_SIZE, num_threads=2, device_id = device_id) for device_id in range(DEVICES)]#not work
    # pipes = [RN50Pipeline(batch_size=BATCH_SIZE, num_threads=2, device_id = device_id,num_gpus = DEVICES) for device_id in range(DEVICES)]#work
    # pipes = [TFRecordPipeline(batch_size=batch_size, num_threads=2, device_id = 0) for device_id in range(DEVICES)]#not work

    serialized_pipes = [pipe.serialize() for pipe in pipes]
    del pipes
    daliop = dali_tf.DALIIterator()
    images = []
    labels = []
    for d in range(DEVICES):
        with tf.device('/gpu:%i' % d):
            image, label = daliop(serialized_pipeline = serialized_pipes[d],
                batch_size = BATCH_SIZE,
                height = 227,
                width = 227,
                device_id = d)
            images.append(image)
            labels.append(label)

    return [images, labels]

def main_run():
 
    test_batch = get_batch_test_dali( BATCH_SIZE)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        step = 0
        while step < NUM_DATA / BATCH_SIZE + 1:
            print('step', step)
            imgs = []
            get_batch = sess.run(test_batch)  #error occurs
            for i in range(len(images)):
                img = images[0][0][i].transpose((1,2,0)) + 128
                imgs.append(img)
            maxx = sess.run(softmax, feed_dict={x: imgs})
            step = step + 1
    sess.close()

nvidia.dali.plugin.tf makes my label incorrect

When I use pipe.build() and pipe.run() to get the result, the label output is correct, but when I use nvidia.dali.plugin.tf and then call sess.run() to output the result, the labels are all zero.
My test file format: (image_path, label)
/n01704323/n01704323_1569.JPEG 0
/n01704323/n01704323_5833.JPEG 0
/n01704323/n01704323_2956.JPEG 0
/n01704323/n01704323_4976.JPEG 0
/n01704323/n01704323_8911.JPEG 0
/n01704323/n01704323_9292.JPEG 0
/n01704323/n01704323_1379.JPEG 0
/n01704323/n01704323_8648.JPEG 0
/n01704323/n01704323_8497.JPEG 0
/n01704323/n01704323_2732.JPEG 0
/n02017213/n02017213_3476.JPEG 1
/n02017213/n02017213_6601.JPEG 1
/n02017213/n02017213_3582.JPEG 1
/n02017213/n02017213_765.JPEG 1
/n02017213/n02017213_3461.JPEG 1
/n02017213/n02017213_3894.JPEG 1
/n02017213/n02017213_513.JPEG 1
/n02017213/n02017213_6029.JPEG 1
/n02017213/n02017213_43.JPEG 1
/n02017213/n02017213_5432.JPEG 1
/n02017213/n02017213_7325.JPEG 1
/n02017213/n02017213_7622.JPEG 1
/n02017213/n02017213_7330.JPEG 1
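
For reference, the file list format above (one path and one integer label per line, separated by a space) can be parsed with a few lines of Python. This is a sketch for checking the list itself, not the reader DALI uses; `parse_file_list` is a hypothetical helper.

```python
def parse_file_list(text):
    """Parse lines of the form '<image_path> <label>' into (path, label) tuples.

    Hypothetical helper: splits on the last space so paths containing spaces
    still parse, and skips blank lines.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, label = line.rsplit(" ", 1)
        entries.append((path, int(label)))
    return entries


if __name__ == "__main__":
    sample = (
        "/n01704323/n01704323_1569.JPEG 0\n"
        "/n02017213/n02017213_3476.JPEG 1\n"
    )
    for path, label in parse_file_list(sample):
        print(label, path)
```

Comparing the labels parsed this way with the labels returned by the pipeline helps separate problems in the list from problems in how the labels are transferred.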

my test code:
`
#First test procedure:
pipe = FileReadPipeline(batch_size=32, num_threads=1, device_id=0, random_shuffle=True,
                        file_list=save_file_path, initial_fill=4096)
pipe.build()
for _ in range(1):
    pipe_out = pipe.run()
    images, labels = pipe_out
    labels_tensor = labels.asCPU().as_tensor()
    images = images.asCPU()
    print(np.array(labels_tensor))
    # print(images.at(9))

output:
[[1]
[0]
[1]
[0]
[1]
[1]
[1]
[0]
[1]
[1]
[0]
[1]
[1]
[0]
[0]
[0]
[1]
[0]
[0]
[0]
[1]
[1]
[0]
[0]
[0]
[1]
[0]
[1]
[0]
[1]
[0]
[0]]
#--------------------------------------#
#Second test procedure:
device_id = 0
pipe = FileReadPipeline(batch_size=32, num_threads=1, device_id=0, random_shuffle=True,
                        file_list=save_file_path, initial_fill=4096)
seri_pipe = pipe.serialize()
daliop = dali_tf.DALIIterator()
with tf.device('/gpu:%i' % device_id):
    image, label = daliop(serialized_pipeline=seri_pipe,
                          batch_size=32,
                          height=224,
                          width=224,
                          device_id=device_id)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)
with tf.Session(config=config) as sess:
    ims, labels = sess.run([image, label])
    print(labels)
`
output:
[[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 0.00000000e+00]]
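
The nonzero values in this output, 1.40129846e-45, are exactly what the 32-bit integer 1 looks like when its bytes are read as a float32. That suggests, though nothing in this report confirms it, that the labels arrive as integers and are interpreted as floats on the TensorFlow side instead of being converted. The sketch below uses only the standard library to show the reinterpretation; the function name is made up for this example.

```python
import struct


def int32_bits_as_float32(value):
    """Reinterpret the four bytes of a 32-bit integer as a float32.

    Hypothetical helper: this is a bit-level reinterpretation, not a
    numeric conversion.
    """
    return struct.unpack("<f", struct.pack("<i", value))[0]


if __name__ == "__main__":
    for label in (0, 1):
        print(label, "->", "%.8e" % int32_bits_as_float32(label))
```

Label 0 maps to 0.0 and label 1 maps to the smallest positive subnormal float32, which matches the two values printed above.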

`
class CommonPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, size=(224, 224), crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu'):
        super(CommonPipeline, self).__init__(batch_size, num_threads, device_id)
        assert device == 'gpu' and decode_method == 'in_gpu'
        if decode_method == 'in_gpu':
            self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        else:
            self.decode = ops.HostDecoder(output_type=types.RGB)

        self.rrc = ops.RandomResizedCrop(device=device, size=size)
        self.cmnp = ops.CropMirrorNormalize(device=device,
                                            output_dtype=types.FLOAT,
                                            output_layout=channel_format,
                                            image_type=types.RGB,
                                            crop=crop_size,
                                            mean=mean,
                                            std=std)
        self.coin = ops.CoinFlip(probability=probability)
        # self.cast = ops.Cast(device=device, dtype=types.FLOAT16)

    def base_define_graph(self, inputs, labels):
        rng = self.coin()
        images = self.decode(inputs)
        images = self.rrc(images)
        images = self.cmnp(images, mirror=rng)
        # images = self.cast(images)
        return images, labels.gpu()

class FileReadPipeline(CommonPipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 random_shuffle,
                 file_root='',
                 file_list='',
                 size=(224, 224),
                 crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu',
                 initial_fill=4096):
        super(FileReadPipeline, self).__init__(batch_size=batch_size,
                                               num_threads=num_threads,
                                               device_id=device_id,
                                               size=size,
                                               crop_size=crop_size,
                                               mean=mean,
                                               std=std,
                                               channel_format=channel_format,
                                               probability=probability,
                                               device=device,
                                               decode_method=decode_method)
        self.input = ops.FileReader(file_root=file_root, file_list=file_list,
                                    random_shuffle=random_shuffle, initial_fill=initial_fill)

    def define_graph(self):
        images, labels = self.input()
        return self.base_define_graph(images, labels)`

nvidia.dali.plugin.tf makes my label incorrect

When I use pipe.build() and pipe.run() to enter the result, the label output is correct, but when I use nvidia.dali.plugin.tf and then use sess.run() to output the result, label is all zero.
my test file format: (image_path, label)
/n01704323/n01704323_1569.JPEG 0
/n01704323/n01704323_5833.JPEG 0
/n01704323/n01704323_2956.JPEG 0
/n01704323/n01704323_4976.JPEG 0
/n01704323/n01704323_8911.JPEG 0
/n01704323/n01704323_9292.JPEG 0
/n01704323/n01704323_1379.JPEG 0
/n01704323/n01704323_8648.JPEG 0
/n01704323/n01704323_8497.JPEG 0
/n01704323/n01704323_2732.JPEG 0
/n02017213/n02017213_3476.JPEG 1
/n02017213/n02017213_6601.JPEG 1
/n02017213/n02017213_3582.JPEG 1
/n02017213/n02017213_765.JPEG 1
/n02017213/n02017213_3461.JPEG 1
/n02017213/n02017213_3894.JPEG 1
/n02017213/n02017213_513.JPEG 1
/n02017213/n02017213_6029.JPEG 1
/n02017213/n02017213_43.JPEG 1
/n02017213/n02017213_5432.JPEG 1
/n02017213/n02017213_7325.JPEG 1
/n02017213/n02017213_7622.JPEG 1
/n02017213/n02017213_7330.JPEG 1

my test code:
`
#First test procedure:
pipe = FileReadPipeline(batch_size=32, num_threads=1, device_id=0, random_shuffle=True,
file_list=save_file_path, initial_fill=4096)
pipe.build()
for _ in range(1):
pipe_out = pipe.run()
images, labels = pipe_out
labels_tensor = labels.asCPU().as_tensor()
images = images.asCPU()
print(np.array(labels_tensor))

print(images.at(9))

output:
[[1]
[0]
[1]
[0]
[1]
[1]
[1]
[0]
[1]
[1]
[0]
[1]
[1]
[0]
[0]
[0]
[1]
[0]
[0]
[0]
[1]
[1]
[0]
[0]
[0]
[1]
[0]
[1]
[0]
[1]
[0]
[0]]
#--------------------------------------#
# Second test procedure:
device_id = 0
pipe = FileReadPipeline(batch_size=32, num_threads=1, device_id=0, random_shuffle=True,
                        file_list=save_file_path, initial_fill=4096)
seri_pipe = pipe.serialize()
daliop = dali_tf.DALIIterator()
with tf.device('/gpu:%i' % device_id):
    image, label = daliop(serialized_pipeline=seri_pipe,
                          batch_size=32,
                          height=224,
                          width=224,
                          device_id=device_id)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
config = tf.ConfigProto(gpu_options=gpu_options)
with tf.Session(config=config) as sess:
    ims, labels = sess.run([image, label])
    print(labels)
`
output:
[[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 1.40129846e-45]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 0.00000000e+00]
[ 0.00000000e+00]]
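
The values 0 and 1.40129846e-45 in the second output are exactly what the integers 0 and 1 look like when their 32-bit patterns are read as float32. This suggests (as an assumption, not something confirmed in this report) that integer labels are being handed back through a float32 tensor without conversion. A minimal standard-library sketch of that reinterpretation; the helper names are hypothetical:

```python
import struct


def int32_bits_as_float32(value):
    """Reinterpret the 4 bytes of a little-endian int32 as a float32."""
    return struct.unpack('<f', struct.pack('<i', value))[0]


def float32_bits_as_int32(value):
    """Inverse reinterpretation: read the float32 bytes back as an int32."""
    return struct.unpack('<i', struct.pack('<f', value))[0]


print(int32_bits_as_float32(0))  # label 0 seen as a float
print(int32_bits_as_float32(1))  # label 1 seen as a float (a tiny denormal)
print(float32_bits_as_int32(int32_bits_as_float32(1)))  # round trip back to 1
```

If this is what is happening, the labels are not lost: reading the same bytes as int32 would recover them.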

`
class CommonPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, size=(224, 224), crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu'):
        super(CommonPipeline, self).__init__(batch_size, num_threads, device_id)
        assert device == 'gpu' and decode_method == 'in_gpu'
        if decode_method == 'in_gpu':
            self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        else:
            self.decode = ops.HostDecoder(output_type=types.RGB)

        self.rrc = ops.RandomResizedCrop(device=device, size=size)
        self.cmnp = ops.CropMirrorNormalize(device=device,
                                            output_dtype=types.FLOAT,
                                            output_layout=channel_format,
                                            image_type=types.RGB,
                                            crop=crop_size,
                                            mean=mean,
                                            std=std)
        self.coin = ops.CoinFlip(probability=probability)
        # self.cast = ops.Cast(device=device, dtype=types.FLOAT16)

    def base_define_graph(self, inputs, labels):
        rng = self.coin()
        images = self.decode(inputs)
        images = self.rrc(images)
        images = self.cmnp(images, mirror=rng)
        # images = self.cast(images)
        return images, labels.gpu()


class FileReadPipeline(CommonPipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 random_shuffle,
                 file_root='',
                 file_list='',
                 size=(224, 224),
                 crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu',
                 initial_fill=4096):
        super(FileReadPipeline, self).__init__(batch_size=batch_size,
                                               num_threads=num_threads,
                                               device_id=device_id,
                                               size=size,
                                               crop_size=crop_size,
                                               mean=mean,
                                               std=std,
                                               channel_format=channel_format,
                                               probability=probability,
                                               device=device,
                                               decode_method=decode_method)
        self.input = ops.FileReader(file_root=file_root, file_list=file_list,
                                    random_shuffle=random_shuffle, initial_fill=initial_fill)

    def define_graph(self):
        images, labels = self.input()
        return self.base_define_graph(images, labels)`

Using DALI to process images on multiple GPUs with multi-GPU inference

I am trying to use DALI to process images on multiple GPUs and to run multi-GPU inference.
Code:

    class FileReadPipeline(Pipeline):
        def __init__(self, batch_size, num_threads, device_id, path, image_size):
            super(FileReadPipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
            self.input = ops.FileReader(file_root=path, random_shuffle=True, initial_fill=21)
            self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
            self.resize = ops.Resize(device="gpu", random_resize=True,
                                     resize_a=image_size, resize_b=image_size,
                                     image_type=types.RGB,
                                     interp_type=types.INTERP_LINEAR)
            self.cmn = ops.CropMirrorNormalize(device="gpu",
                                               output_dtype=types.FLOAT,
                                               crop=(227, 227),
                                               image_type=types.RGB,
                                               mean=[128., 128., 128.],
                                               std=[1., 1., 1.])
            self.uniform = ops.Uniform(range=(0.0, 1.0))

        def define_graph(self):
            jpegs, labels = self.input()
            images = self.decode(jpegs)
            resized_images = self.resize(images)
            output = self.cmn(resized_images, crop_pos_x=self.uniform(),
                              crop_pos_y=self.uniform())
            return (output, labels.gpu())


    def get_batch_dali(batch_size, image_size, path, gpu_count):
        pipes = [FileReadPipeline(batch_size=batch_size, num_threads=24,
                                  device_id=device_id, path=path, image_size=image_size)
                 for device_id in range(gpu_count)]
        serialized_pipes = [pipe.serialize() for pipe in pipes]
        del pipes
        daliop = dali_tf.DALIIterator()
        images = []
        labels = []
        for d in range(gpu_count):
            with tf.device('/gpu:%i' % d):
                image, label = daliop(serialized_pipeline=serialized_pipes[d],
                                      batch_size=batch_size,
                                      height=image_size,
                                      width=image_size,
                                      device_id=d)
                images.append(image)
                labels.append(label)
        return [images, labels]


    def main_run():
        test_batch = get_batch_test_dali(BATCH_SIZE, IMAGE_SIZE, PATH, GPU_COUNT)
        for i in range(GPU_COUNT):
            with tf.device('/gpu:%d' % i):
                x = tf.placeholder(tf.float32, shape=[BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3], name='x')
                model = alexnet.alexNet(x, str(i), dropoutPro, classNum, skip)
                softmax_arr.append(model)
                placeholder_arr.append(x)
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.5
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            threads = []
            for i in range(len(softmax_arr)):
                threads.append(threading.Thread(target=run_sess, args=(str(i), sess, test_batch, softmax_arr[i], placeholder_arr[i])))
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            sess.close()

    main_run()

###################################
And I got the following error:

InternalError: DALI label_tensor_shape = daliShapeAt(&pipe_handle_, 1) failed: [/opt/dali/dali/pipeline/workspace/workspace.h:123] Index 1 out of range [0, 1).
[[Node: Dali_1 = Dalibatch_size=16, device_id=1, height=227, num_threads=2, serialized_pipeline="\010\030\0...0\000@\001", width=227, _device="/job:localhost/replica:0/task:0/device:GPU:1"]]
[[Node: Dali_1/_27 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_20_Dali_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

#############
If I set the number of GPUs to 1, it works normally.

Error when using a RecordIO file with non-resized images

I'm testing the mxnet-resnet50.ipynb example by converting it into a Python script.

I use a RecordIO file created without resizing the images, to preserve their original quality.

Then I got the following error:

$ python mxnet-resnet50.py
Training pipeline epoch size: 1281167
Validation pipeline epoch size: 50000
Traceback (most recent call last):
  File "mxnet-resnet50.py", line 127, in <module>
    dali_val_iter = DALIClassificationIterator(valpipes, valpipes[0].epoch_size("Reader"))
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 151, in __init__
    data_layout)
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 70, in __init__
    self._first_batch = self.next()
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 92, in __next__
    outputs.append(p.run())
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/pipeline.py", line 164, in run
    return self.outputs()
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/pipeline.py", line 153, in outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline: [/opt/dali/dali/pipeline/operators/fused/crop_mirror_normalize.cu:346] Assert on "H >= crop_h_" failed

My guess is that some images originally have an edge shorter than 224 px, which breaks the check. In MXNet we upscale the image before cropping if necessary: https://github.com/apache/incubator-mxnet/blob/master/src/io/image_aug_default.cc#L436
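
The upscale-before-crop idea mentioned above can be sketched in plain Python. This is not the MXNet or DALI implementation; size_for_crop is a hypothetical helper that only computes the target size an image would need so that a crop of the requested size always fits, keeping the aspect ratio:

```python
def size_for_crop(height, width, crop_h, crop_w):
    """Return (new_h, new_w) such that new_h >= crop_h and new_w >= crop_w.

    Images that are already large enough are returned unchanged; smaller
    ones are scaled up uniformly by the smallest factor that makes both
    edges fit the crop.
    """
    if height >= crop_h and width >= crop_w:
        return height, width
    # Pick the larger of the two factors crop_h/height and crop_w/width,
    # compared by cross-multiplication to stay in integer arithmetic.
    if crop_h * width >= crop_w * height:
        num, den = crop_h, height
    else:
        num, den = crop_w, width
    # Ceiling division, so a scaled edge never ends up below the crop size.
    new_h = -(-height * num // den)
    new_w = -(-width * num // den)
    return new_h, new_w


print(size_for_crop(200, 300, 224, 224))  # shorter edge is scaled up to 224
print(size_for_crop(500, 375, 224, 224))  # already large enough, unchanged
```

A resize to the computed size before the 224x224 crop would keep the "H >= crop_h_" check satisfied for small images.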

Can DALI 0.2 be installed easily with pip?

I can't compile DALI from source inside Docker. Can I install DALI 0.2 easily with pip?
The command "pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali" only installs DALI 0.1.

"
DALI and NGC
DALI is preinstalled in the NVIDIA GPU Cloud TensorFlow, PyTorch, and MXNet containers in versions 18.07 and later.
"
By the way, I pulled the TensorFlow 18.08 Docker image from NVIDIA GPU Cloud and found that DALI is not preinstalled in it.

root@beaff37c96d6:/workspace# python
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from dali.pipeline import Pipeline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'dali'
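
Note that the other snippets on this page import the package as nvidia.dali (for example "from nvidia.dali.pipeline import Pipeline"), not as dali, so the ImportError above may come from the module name rather than from a missing installation. A small sketch for checking which name is importable in a given environment; module_available is a hypothetical helper built on the standard importlib.util.find_spec:

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError is raised when a parent package (e.g. "nvidia") is
        # itself missing; ValueError covers modules without a usable spec.
        return False


for candidate in ("dali", "nvidia.dali"):
    print(candidate, module_available(candidate))
```

Running this inside the container shows whether DALI is absent or merely installed under the nvidia.dali name.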

[Question on Design of DALI] Meaning of Tensor/TensorList and Different types of Workspace

Hello!

I've noticed the comments in the headers of the Workspace classes (SampleWorkspace, DeviceWorkspace, etc.) and the corresponding OpType (CPU, GPU, etc.).

My questions are:
a) Does a TensorList always represent a "Batch", while a Tensor represents a "Sample"?
b) Is it on purpose that an Op running on the CPU processes ONLY one Sample per "Run", while an Op running on the GPU processes a whole Batch per "Run"?
What about a "Mixed" Op?
c) Is the following true for the pipeline?
* At the beginning of the pipeline, only one Sample is processed at a time.
* At some point in the pipeline, Samples are assembled into a Batch (so this is what a Mixed Op is for).
* Once assembled, a Batch can no longer be split into isolated Samples.
d) Finally, could you please provide a brief overview of DALI's architecture and explain some of the important classes in more detail?

DALI and Uber Horovod incompatibility issues

`
import tensorflow as tf
import horovod.tensorflow as hvd
from models import imagenet_resnet_v2
from models import resnet_v2
from dataloader import get_batch_images, FileReadPipeline

# loss function
def _loss(logits, labels, weight_decay=1e-4):
    labels = tf.cast(labels, tf.int32)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels,
                                                                   name='cross_entropy_per_example')
    loss = tf.reduce_mean(cross_entropy, name='cross_entropy')

    l2_losses = [weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables()
                 if 'beta' not in v.name and 'gamma' not in v.name]

    top_1 = tf.nn.in_top_k(logits, labels, 1)
    top1_acc = tf.reduce_mean(tf.cast(top_1, tf.float32), name='top_1')
    total_loss = loss + tf.add_n(l2_losses)
    return total_loss, top1_acc

def main(_):
    save_path = '/share5/public/classification_data/imagenet1k/meta/absolute_train.txt'

    # Horovod: initialize Horovod.
    hvd.init()

    # Get the data format required by TensorFlow through DALI
    image, label = get_batch_images(pipe_name=FileReadPipeline, batch_size=32, num_threads=4,
                                    device_id=hvd.local_rank(),
                                    random_shuffle=True,
                                    file_list=save_path,
                                    initial_fill=4096)

    label = tf.squeeze(label)

    net = imagenet_resnet_v2(50, num_classes=1000, data_format='channels_last')
    train_output = net(image, is_training=True)
    train_loss, train_top1_acc = _loss(train_output, label)

    boundaries = [int(epoch) for epoch in [30, 60, 80, 90]]
    values = [0.1 * decay for decay in [1, 0.1, 0.01, 1e-3, 1e-4]]
    global_step = tf.contrib.framework.get_or_create_global_step()
    learning_rate = tf.train.piecewise_constant(tf.cast(global_step, tf.int32), boundaries, values)

    opt = tf.train.RMSPropOptimizer(learning_rate)

    # Horovod: add Horovod Distributed Optimizer.
    opt = hvd.DistributedOptimizer(opt)
    train_op = opt.minimize(train_loss, global_step=global_step)

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
    config = tf.ConfigProto(gpu_options=gpu_options)
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())

        # Works with the two commented-out lines disabled; fails when they are enabled.
        if hvd.local_rank() == 0:
            print('local_rank: {}'.format(hvd.local_rank()))
            # sync_op = hvd.broadcast_global_variables(0)
            # sess.run(sync_op)

        for _ in range(50):
            ims, labs = sess.run([train_op, label])
            print(labs.shape)

`

The script works properly when I leave sync_op = hvd.broadcast_global_variables(0) and sess.run(sync_op) disabled; it fails when I enable them.
Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.
(https://github.com/uber/horovod)

[guangyuan@horovod_project]$ python test_speed.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
<dataloader.FileReadPipeline object at 0x7facc76c1650>
/gpu:0
(<tf.Tensor 'Dali:0' shape=(32, 224, 224, 3) dtype=float32>, <tf.Tensor 'Squeeze:0' shape= dtype=int32>)
WARNING:tensorflow:From test_speed.py:82: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Hvd.size: 1
INFO:tensorflow:Hvd.local_rank: 0
INFO:tensorflow:Hvd.rand: 0
2018-07-30 14:54:15.793309: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-30 14:54:15.799875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Graphics Device major: 7 minor: 0 memoryClockRate(GHz): 1.912
pciBusID: 0000:02:00.0
totalMemory: 11.78GiB freeMemory: 10.86GiB
2018-07-30 14:54:16.316063: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties:
name: Graphics Device major: 7 minor: 0 memoryClockRate(GHz): 1.912
pciBusID: 0000:83:00.0
totalMemory: 11.78GiB freeMemory: 10.87GiB
2018-07-30 14:54:16.316167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-07-30 14:54:17.029607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-30 14:54:17.029664: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 1
2018-07-30 14:54:17.029676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N N
2018-07-30 14:54:17.029683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1: N N
2018-07-30 14:54:17.030162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6029 MB memory) -> physical GPU (device: 0, name: Graphics Device, pci bus id: 0000:02:00.0, compute capability: 7.0)
2018-07-30 14:54:17.103240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 6029 MB memory) -> physical GPU (device: 1, name: Graphics Device, pci bus id: 0000:83:00.0, compute capability: 7.0)
local_rank: 0
[root] *** Process received signal ***
[root] Signal: Segmentation fault (11)
[root] Signal code: Address not mapped (1)
[root] Failing at address: 0x10
Segmentation fault (core dumped)

How can one use DALI's C API, e.g. to interface with Caffe?

For now, image operations such as resizing and cropping (e.g. using OpenCV) are done on the host CPU before the data is fed to the Caffe model. Can we use DALI to load, decode, and resize images on the GPU and then feed the GPU data directly to Caffe?
It seems that DALI's examples all interface with Python-based frameworks like TF, and that DALI does not contain a high-level interface for C/C++.
c_api.h is rather simple: it only includes methods on the pipeline, so one cannot use it to apply specific operations to images.

Error while make install

[ 77%] Linking CXX shared library libdali.so
/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libturbojpeg.a(libturbojpeg_la-turbojpeg.o): relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/lib/x86_64-linux-gnu/libturbojpeg.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
dali/CMakeFiles/dali.dir/build.make:2394: recipe for target 'dali/libdali.so' failed
make[2]: *** [dali/libdali.so] Error 1
CMakeFiles/Makefile2:340: recipe for target 'dali/CMakeFiles/dali.dir/all' failed
make[1]: *** [dali/CMakeFiles/dali.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

The error occurs when I run make -j4 install.
What does this mean and how could I fix this?

Request for CUDA 8.0 support and for environments without libjpeg-turbo or nvJPEG

There are 2 reasons.

  1. Our training environments still use CUDA 8.0, and upgrading all training tools to CUDA 9.0 may take too much effort until new machines are deployed.

  2. Image decoding is rarely the bottleneck of the whole training pipeline, so decoding via OpenCV is sufficient for most scenarios. Fewer dependencies would make DALI easier to use in existing training environments.

Make error with cuda 9.0 and cmake 3.8

DALI/dali/image/jpeg.cc:47:8: error: ‘TJSAMP_411’ was not declared in this scope
case TJSAMP_411:
DALI/dali/image/jpeg.cc:142:40: error: invalid conversion from ‘const uint8* {aka const unsigned char*}’ to ‘unsigned char*’ [-fpermissive]
w, 0, h, pixel_format, 0);
/usr/include/turbojpeg.h:729:23: error: initializing argument 2 of ‘int tjDecompress2(tjhandle, unsigned char*, long unsigned int, unsigned char*, int, int, int, int, int)’ [-fpermissive]
DLLEXPORT int DLLCALL tjDecompress2(tjhandle handle,

Maybe the installed version of libjpeg-turbo is not compatible with this branch?

file_list - a parameter of ops.FileReader

| - file_list : string
| Path to the file with a list of pairs file label
| (leave empty to traverse the file_root directory to obtain files and labels) (default value: ``)

self.input = ops.FileReader(file_root='images/kitten', file_list='file_list.txt') fails.
The content of 'file_list.txt' is:

cat_9.jpg kitten
cat_1.jpg kitten

RuntimeError: [XXXXX/DALI/dali/pipeline/operators/reader/loader/file_loader.h:104] Assert on "Size() > 0" failed: No files found.
Could you give me an example of a file_list?
Thanks!
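
For comparison, the file list shown in the first report on this page pairs each image path with an integer label (for example "/n01704323/n01704323_1569.JPEG 0"), whereas the list above uses the string "kitten". The sketch below generates a list in that path-plus-integer form from a directory tree. It rests on two assumptions not confirmed here: that labels must be integers, and that paths are given relative to file_root. build_file_list is a hypothetical helper, and the demo files are empty placeholders:

```python
import os
import tempfile


def build_file_list(file_root, extensions=(".jpg", ".jpeg")):
    """Return lines of the form '<relative path> <integer label>'.

    Each immediate subdirectory of file_root is treated as one class, and
    classes are numbered in sorted order starting from 0.
    """
    lines = []
    classes = sorted(d for d in os.listdir(file_root)
                     if os.path.isdir(os.path.join(file_root, d)))
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(file_root, class_name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(extensions):
                lines.append("%s %d" % (os.path.join(class_name, fname), label))
    return lines


# Demo on a throwaway directory with empty placeholder files.
demo_root = tempfile.mkdtemp()
for rel in ("kitten/cat_1.jpg", "kitten/cat_9.jpg", "dog/dog_1.jpg"):
    path = os.path.join(demo_root, *rel.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

demo_lines = build_file_list(demo_root)
print("\n".join(demo_lines))
```

The resulting lines can be written to a text file and passed as file_list, with file_root pointing at the directory that contains the class folders.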

How to convert images/labels imported by FileReader to a normal numpy array

Python 2.7
Tensorflow 1.7
CentOS 7
CUDA 9.2
cuDNN 7.0.5
Installed DALI from pip

I'm trying to use DALI for image preprocessing and then use TensorFlow for prediction.
As I understand it, DALI should be able to return images and labels as numpy arrays.
After checking the Getting Started and TensorFlow-ResNet50 tutorials, I'm a little confused about the format.

This passage mentions:

# As it turns out, `TensorList` containing labels can be represented by a tensor, while the `TensorList` containing images cannot.
#
# Let us see, what is the shape and contents of returned labels.

# In[7]:


import numpy as np

labels_tensor = labels.as_tensor()

print (labels_tensor.shape())
print (np.array(labels_tensor))

I tried this way, here is the code and print out.

test_pipe = IncepV3Pipeline(batch_size=BATCH_SIZE, num_threads=2, device_id = 0, num_gpus = DEVICES)
test_pipe.build()
test_pipe_out = test_pipe.run()
print('<>',test_pipe_out)
ims, las = test_pipe_out
print("Images is_dense_tensor: " + str(ims.is_dense_tensor()))
print("Labels is_dense_tensor: " + str(las.is_dense_tensor()))
las_tensor = las.as_tensor()
print('<>',las_tensor.shape())
print('<>',np.array(las_tensor))
('<>', [<nvidia.dali.backend_impl.TensorListGPU object at 0x7f6d350f8768>, <nvidia.dali.backend_impl.TensorListGPU object at 0x7f6d350f87a0>])
Images is_dense_tensor: True
Labels is_dense_tensor: True
('<>', [16L, 1L])
('<>', array(<nvidia.dali.backend_impl.TensorGPU object at 0x7f6d350f87d8>,
      dtype=object))

Since I want to use a TF frozen graph for inference, simply getting the images and labels from DALI pipelines as numpy arrays would be very useful.

Any ideas are welcome.
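
The printed objects above are TensorListGPU instances, and np.array() applied to a GPU-side tensor only wraps the object (hence dtype=object). The first report on this page converts its labels with labels.asCPU().as_tensor() before calling np.array(). A minimal sketch of that idea, assuming the GPU-side list exposes asCPU() as in that earlier snippet; to_host, FakeCPUList, and FakeGPUList are hypothetical names, and the two fake classes exist only so the sketch runs without DALI:

```python
def to_host(tensor_list):
    """Return a host (CPU) copy when the object offers asCPU(); otherwise
    return the object unchanged."""
    as_cpu = getattr(tensor_list, "asCPU", None)
    return as_cpu() if callable(as_cpu) else tensor_list


# Stand-ins so the sketch runs without DALI installed (hypothetical classes).
class FakeCPUList(object):
    def __init__(self, data):
        self.data = data


class FakeGPUList(object):
    def __init__(self, data):
        self.data = data

    def asCPU(self):
        return FakeCPUList(self.data)


host = to_host(FakeGPUList([[0], [1]]))
print(type(host).__name__, host.data)

# With a real pipeline the same helper would be used roughly as:
#   ims, las = test_pipe.run()
#   labels = np.array(to_host(las).as_tensor())
```

The helper leaves CPU-side lists untouched, so the same call works whether the pipeline output lives on the GPU or on the host.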

Segmentation fault

python-2.7.5
tensorflow-gpu-1.8/1.7
cuda-9.0
cudnn-7.0.5
Titan V
my code:

`class CommonPipeline(Pipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 size=(224, 224),
                 crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu'):

        super(CommonPipeline, self).__init__(batch_size, num_threads, device_id)

        if decode_method == 'in_gpu':
            self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        else:
            self.decode = ops.HostDecoder(output_type=types.RGB)

        self.rrc = ops.RandomResizedCrop(device=device, size=size)
        self.cmnp = ops.CropMirrorNormalize(device=device,
                                            output_dtype=types.FLOAT,
                                            output_layout=channel_format,
                                            image_type=types.RGB,
                                            crop=crop_size,
                                            mean=mean,
                                            std=std)
        self.coin = ops.CoinFlip(probability=probability)

    def base_define_graph(self, inputs, labels):
        rng = self.coin()
        images = self.decode(inputs)
        images = self.rrc(images)
        images = self.cmnp(images, mirror=rng)
        return images, labels

class FileReadPipeline(CommonPipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 file_root='',
                 file_list='',
                 size=(224, 224),
                 crop_size=(224, 224),
                 mean=IMAGE_MEAN,
                 std=IMAGE_STD,
                 channel_format=types.NHWC,
                 probability=0.5,
                 device='gpu',
                 decode_method='in_gpu'):
        super(FileReadPipeline, self).__init__(batch_size=batch_size,
                                               num_threads=num_threads,
                                               device_id=device_id,
                                               size=size,
                                               crop_size=crop_size,
                                               mean=mean,
                                               std=std,
                                               channel_format=channel_format,
                                               probability=probability,
                                               device=device,
                                               decode_method=decode_method)
        self.input = ops.FileReader(file_root=file_root, file_list=file_list, random_shuffle=True, initial_fill=21)

    def define_graph(self):
        images, labels = self.input()
        return self.base_define_graph(images, labels)

class TFRecordPipeline(CommonPipeline):
    pass

def get_batch_images(pipe_name,
                     batch_size,
                     num_threads,
                     device_id,
                     file_root='',
                     file_list='',
                     size=(224, 224),
                     crop_size=(224, 224),
                     mean=IMAGE_MEAN,
                     std=IMAGE_STD,
                     channel_format=types.NHWC,
                     probability=0.5,
                     device='gpu',
                     decode_method='in_gpu'):

    pipes = [pipe_name(batch_size=batch_size,
                       num_threads=num_threads,
                       device_id=device_id,
                       file_root=file_root,
                       file_list=file_list,
                       size=size,
                       crop_size=crop_size,
                       mean=mean,
                       std=std,
                       channel_format=channel_format,
                       probability=probability,
                       device=device,
                       decode_method=decode_method)]

    serialized_pipes = [pipe.serialize() for pipe in pipes]
    del pipes
    daliop_t = dali_tf.DALIIterator()

    images = []
    labels = []

    print('/gpu:%i' % device_id)
    with tf.device('/gpu:%i' % device_id):
        image, label = daliop_t(serialized_pipeline=serialized_pipes[0],
                                batch_size=batch_size,
                                height=crop_size[0],
                                width=crop_size[1],
                                # num_threads=num_threads,
                                device_id=device_id)
        images.append(image)
        labels.append(label)

        return images, labels

if __name__ == '__main__':
    img_batch, label_batch = get_batch_images(FileReadPipeline, batch_size=1, num_threads=4,
                                              device_id=0, file_root='images')
    print(img_batch)
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
    config = tf.ConfigProto(gpu_options=gpu_options)
    with tf.Session(config=config) as sess:
        ims = sess.run(img_batch)
        print(ims.shape)`

error:

[root]$ python dataloader.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
read 21 files from 2 directories
/gpu:0
[<tf.Tensor 'Dali:0' shape=(1, 224, 224, 3) dtype=float32>]
2018-07-26 18:49:03.560989: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-26 18:49:03.564573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.911
pciBusID: 0000:02:00.0
totalMemory: 11.90GiB freeMemory: 11.73GiB
2018-07-26 18:49:03.825189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 1 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.911
pciBusID: 0000:83:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2018-07-26 18:49:03.825291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0, 1
2018-07-26 18:49:04.313040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 18:49:04.313090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 1
2018-07-26 18:49:04.313100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N N
2018-07-26 18:49:04.313107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 1: N N
2018-07-26 18:49:04.313616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3656 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2018-07-26 18:49:04.347755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3656 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
read 21 files from 2 directories
Segmentation fault

dataloading_tfrecord fails: Got an unexpected argument "resize_shorter"

Running the dataloading_tfrecord example with multiple GPUs in Docker:

`
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import nvidia.dali.tfrecord as tfrec
import numpy as np
import matplotlib.pyplot as plt

class TFRecordPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(TFRecordPipeline, self).__init__(batch_size,
                                               num_threads,
                                               device_id)
        self.input = ops.TFRecordReader(path = tfrecord,
                                        index_path = tfrecord_idx,
                                        features = {"image/encoded" : tfrec.FixedLenFeature((), tfrec.string, ""),
                                                    'image/class/label': tfrec.FixedLenFeature([1], tfrec.int64, -1),
                                                    'image/class/text': tfrec.FixedLenFeature([ ], tfrec.string, ''),
                                                    'image/object/bbox/xmin': tfrec.VarLenFeature(tfrec.float32, 0.0),
                                                    'image/object/bbox/ymin': tfrec.VarLenFeature(tfrec.float32, 0.0),
                                                    'image/object/bbox/xmax': tfrec.VarLenFeature(tfrec.float32, 0.0),
                                                    'image/object/bbox/ymax': tfrec.VarLenFeature(tfrec.float32, 0.0)})
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu", resize_shorter = 256.)
        self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (224, 224),
                                            image_type = types.RGB,
                                            mean = [0., 0., 0.],
                                            std = [1., 1., 1.])
        self.uniform = ops.Uniform(range = (0.0, 1.0))
        self.iter = 0

    def define_graph(self):
        inputs = self.input()
        images = self.decode(inputs["image/encoded"])
        resized_images = self.resize(images)
        output = self.cmnp(resized_images, crop_pos_x = self.uniform(),
                           crop_pos_y = self.uniform())
        return (output, inputs["image/class/text"])

    def iter_setup(self):
        pass

batch_size = 16

pipe = TFRecordPipeline(batch_size=batch_size, num_threads=4, device_id = 0)
pipe.build()
`


RuntimeError Traceback (most recent call last)
in ()
2
3 pipe = TFRecordPipeline(batch_size=batch_size, num_threads=4, device_id = 0)
----> 4 pipe.build()

/usr/local/lib/python3.5/dist-packages/nvidia/dali/pipeline.py in build(self)
122
123 if not self._prepared:
--> 124 self._prepare_graph()
125
126 self._pipe.Build(self._names_and_devices)

/usr/local/lib/python3.5/dist-packages/nvidia/dali/pipeline.py in _prepare_graph(self)
94 if source_op.id not in op_ids:
95 op_ids.add(source_op.id)
---> 96 source_op.check_args()
97 ops.append(source_op)
98 else:

/usr/local/lib/python3.5/dist-packages/nvidia/dali/ops.py in check_args(self)
96
97 def check_args(self):
---> 98 self._op.schema.CheckArgs(self._spec)
99
100 def generate_outputs(self):

RuntimeError: [/opt/dali/dali/pipeline/operators/op_schema.cc:51] Assert on "required_arguments.find(s) != required_arguments.end() || OptionalArgumentExists(s) || internal_arguments_.find(s) != internal_arguments_.end()" failed: Got an unexpected argument "resize_shorter"

cmake/modules/FindJpegTurbo.cmake:22 (pkg_check_modules)

1. I have compiled libjpeg-turbo-1.5.9 and installed it at /home/guangyuan/install/libjpeg_turbo_1.5.9, but CMake fails to find it.
2. If I debug libjpeg-turbo when compiling DALI from source and the build succeeds, does that affect the running speed? Will it still run normally?

[dali build]$ cmake -DNVJPEG_ROOT_DIR=/home/guangyuan/install/cuda-linux64-nvjpeg -DJPEG_TURBO_ROOT_DIR=/home/guangyuan/install/libjpeg_turbo_1.5.9 ..
-- DALI version: 0.3.0
-- git Version: v0.2.0-24297324-dirty
-- Version: 0.2.0
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX
-- Performing Test HAVE_STD_REGEX -- compiled but failed to run
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX
-- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX
-- Performing Test HAVE_POSIX_REGEX -- success
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- jpeg_root_dir:/home/guangyuan/install/libjpeg_turbo_1.5.9
-- pkgcong:
-- JpefTurbo:
CMake Error at /usr/local/cmake-3.8.2/share/cmake-3.8/Modules/FindPkgConfig.cmake:416 (message):
A required package was not found
Call Stack (most recent call first):
/usr/local/cmake-3.8.2/share/cmake-3.8/Modules/FindPkgConfig.cmake:589 (_pkg_check_modules_internal)
cmake/modules/FindJpegTurbo.cmake:22 (pkg_check_modules)
cmake/Dependencies.cmake:86 (find_package)
CMakeLists.txt:43 (include)

-- Configuring incomplete, errors occurred!
See also "/home/guangyuan/dali/build/CMakeFiles/CMakeOutput.log".
See also "/home/guangyuan/dali/build/CMakeFiles/CMakeError.log".

Can't process large images

Hi,
I want to use DALI to process images and run a TensorFlow model. I used a high-quality image data set (about 4 MB per picture), and then this error occurred:

2018-07-04 19:11:03.454375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-07-04 19:11:03.454419: E tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-07-04 19:11:03.454433: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

This was likely an out-of-memory error: the same code runs fine on a dataset of smaller images.
So I converted the dataset to lower quality (about 400 KB per picture) while keeping the large image dimensions (width and height are mostly 4000+ pixels).
That produced a new error:

InternalError (see above for traceback): DALI Output(&pipe_handle_) failed: Critical error in pipeline: Error in thread 1: [/opt/dali/dali/pipeline/operators/decoder/nvjpeg_decoder.h:324] NVJPEG error "8"

It's an nvJPEG error with error code "8", which the "nvJPEG Library Documentation" PDF describes as:

NVJPEG_STATUS_INTERNAL_ERROR (8) Error during the execution of the device tasks.

So, do DALI and nvJPEG have a limit on image size or image quality? If so, what is the limit?

The correct way of using "ops.FileReader"

Python 2.7
Tensorflow 1.7
CentOS 7
CUDA 9.2
cuDNN 7.0.5
Installed DALI from pip

I'm trying the Getting Started example, with the intent of traversing a directory containing many JPEG files.
Below are my script and its output; it seems that it can't find any JPEG images.

for root, dir, files in os.walk("img"):
        depth = root.count('/')
        ret = ""
        if depth > 0:
            ret += "  " * (depth - 1) + "|-"
        print ret + root
        for items in fnmatch.filter(files, "*"):
                print (" " * len(ret)) + "|-" + items

image_dir = './img'
batch_size = 8
class SimplePipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed = 12)
        self.input = ops.FileReader(file_root = image_dir)
        self.decode = ops.HostDecoder(output_type = types.RGB)

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        return (images, labels)
help(ops)

pipe = SimplePipeline(batch_size, 1, 0)
pipe.build()

pipe_out = pipe.run()
print(pipe_out)
images, labels = pipe_out
print("Images is_dense_tensor: " + str(images.is_dense_tensor()))
print("Labels is_dense_tensor: " + str(labels.is_dense_tensor()))

Output:

$ python trt_dali.py
/home/karafuto/lib/python2.7/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /home/karafuto/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
img
|-img.jpg

read 0 files from 1 directories
Segmentation fault (core dumped)

The first for loop printed that an image named img.jpg is located under the directory img/; however, ops.FileReader found 0 files.

Any ideas are welcome.
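
One possible explanation, offered as an assumption rather than a confirmed diagnosis: ops.FileReader may treat each subdirectory of file_root as one class and read images only from inside those subdirectories, so a JPEG placed directly in img/ would not be picked up. The sketch below uses only the standard library to move loose files into a single label subdirectory; the helper name `move_loose_files_into_label_dir` and the directory name `0` are hypothetical.

```python
import os
import shutil


def move_loose_files_into_label_dir(file_root, label_dir="0"):
    """Move files sitting directly in file_root into file_root/label_dir.

    Assumption: the reader expects file_root/<label>/<image>, that is,
    one subdirectory per class. Returns the list of new file paths.
    """
    target = os.path.join(file_root, label_dir)
    moved = []
    for name in sorted(os.listdir(file_root)):
        src = os.path.join(file_root, name)
        if os.path.isfile(src):
            # Create the label directory lazily, only if there is something to move.
            if not os.path.isdir(target):
                os.makedirs(target)
            dst = os.path.join(target, name)
            shutil.move(src, dst)
            moved.append(dst)
    return moved
```

After running it on ./img, the layout becomes img/0/img.jpg, and file_root = './img' can stay unchanged.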

Compatibility with Tensorflow slim

There seems to be a compatibility problem with functions used in TensorFlow-Slim.

(Example code)
[images, labels] = get_batch_test_dali(FLAGS.batch_size)
batch_queue = slim.prefetch_queue.prefetch_queue(
[images, labels], capacity=2 * deploy_config.num_clones)

(Output)
...
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data/prefetch_queue.py", line 78, in prefetch_queue
dtypes = [t.dtype for t in tensor_list]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/data/prefetch_queue.py", line 78, in <listcomp>
dtypes = [t.dtype for t in tensor_list]
AttributeError: 'list' object has no attribute 'dtype'

Did you go over this part?
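
The traceback suggests that at least one of images and labels is itself a Python list (for example, one tensor per GPU), so [images, labels] becomes a list of lists and t.dtype fails. A minimal sketch of flattening before the call, using a hypothetical stand-in class `FakeTensor` instead of real TensorFlow tensors:

```python
class FakeTensor(object):
    """Hypothetical stand-in for a tensor; it only carries a dtype attribute."""

    def __init__(self, dtype):
        self.dtype = dtype


def flatten_tensor_lists(*groups):
    """Flatten per-device lists into one flat list of tensors.

    Each group may be a single tensor or a list/tuple of tensors.
    """
    flat = []
    for group in groups:
        if isinstance(group, (list, tuple)):
            flat.extend(group)
        else:
            flat.append(group)
    return flat


# Two devices: images and labels are lists with one entry per device.
images = [FakeTensor("float32"), FakeTensor("float32")]
labels = [FakeTensor("int64"), FakeTensor("int64")]

tensor_list = flatten_tensor_lists(images, labels)
dtypes = [t.dtype for t in tensor_list]  # the expression that failed at line 78
print(dtypes)
```

With real tensors, the flattened list (or [images[0], labels[0]] for a single device) would be passed to slim.prefetch_queue.prefetch_queue in place of [images, labels].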

Apply DALI on TensorRT

Hi,
I am curious whether DALI can be used together with the TensorRT library. I have not tried it yet, but I wonder whether it would be possible.

Use dali.plugin.tf error

  • CentOS Linux release 7.5.1804
  • Python 2.7.5
  • tensorflow-gpu 1.9
  • GeForce GTX 1080 * 2

I used the following command to install nvidia-dali:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali

But when I use dali.plugin.tf, I get this error:

>>> import nvidia.dali.plugin.tf as dali_tf
>>> daliop=dali_tf.DALIIterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/nvidia/dali/plugin/tf.py", line 20, in DALIIterator
    dali_tf_module = tf.load_op_library(libdali_tf)
  File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib64/python2.7/site-packages/nvidia/dali/plugin/libdali_tf.so: undefined symbol: _ZN9perftools8gputools6Stream18BlockHostUntilDoneEv

Error at compile with CUDA 9.2 Ubuntu 18.04

(pyenv) usuario@pc:~/dali/build$ make -j"$(nproc)" VERBOSE=1 install
/usr/bin/cmake -H/home/usuario/dali -B/home/usuario/dali/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/usuario/dali/build/CMakeFiles /home/usuario/dali/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/usuario/dali/build'
make -f dali/pipeline/CMakeFiles/DALI_PROTO.dir/build.make dali/pipeline/CMakeFiles/DALI_PROTO.dir/depend
make -f dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE2_PROTO.dir/build.make dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE2_PROTO.dir/depend
make -f dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE_PROTO.dir/build.make dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE_PROTO.dir/depend
make[2]: Entering directory '/home/usuario/dali/build'
dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE2_PROTO.dir/build.make:61: *** target pattern contains no '%'. Stop.
make[2]: Leaving directory '/home/usuario/dali/build'
make[2]: Entering directory '/home/usuario/dali/build'
dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE_PROTO.dir/build.make:61: *** target pattern contains no '%'. Stop.
make[2]: Leaving directory '/home/usuario/dali/build'
CMakeFiles/Makefile2:506: recipe for target 'dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE2_PROTO.dir/all' failed
make[1]: *** [dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE2_PROTO.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: Entering directory '/home/usuario/dali/build'
make[2]: *** No rule to make target '../dali/pipeline/protobuf', needed by 'dali/pipeline/dali.pb.cc'. Stop.
make[2]: Leaving directory '/home/usuario/dali/build'
CMakeFiles/Makefile2:543: recipe for target 'dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE_PROTO.dir/all' failed
make[1]: *** [dali/pipeline/operators/reader/parser/CMakeFiles/CAFFE_PROTO.dir/all] Error 2
CMakeFiles/Makefile2:281: recipe for target 'dali/pipeline/CMakeFiles/DALI_PROTO.dir/all' failed
make[1]: *** [dali/pipeline/CMakeFiles/DALI_PROTO.dir/all] Error 2
make[1]: Leaving directory '/home/usuario/dali/build'
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Compile error

cmake ..
make -j"$(nproc)" install

When I compile DALI with the commands above, I get the following error:

/home/youshihao/py-R-FCN-master-preprocess/DALI/dali/image/jpeg.cc: In function ‘dali::DALIError_t dali::DecodeJPEGHost(const uint8*, int, dali::DALIImageType, dali::Tensor<dali::CPUBackend>*)’:
/home/youshihao/py-R-FCN-master-preprocess/DALI/dali/image/jpeg.cc:147:40: error: invalid conversion from ‘const uint8* {aka const unsigned char*}’ to ‘unsigned char*’ [-fpermissive]
                w, 0, h, pixel_format, 0);
                                        ^
In file included from /home/youshihao/py-R-FCN-master-preprocess/DALI/dali/image/jpeg.cc:17:0:
/usr/include/turbojpeg.h:1134:23: note:   initializing argument 2 of ‘int tjDecompress2(tjhandle, unsigned char*, long unsigned int, unsigned char*, int, int, int, int, int)’
 DLLEXPORT int DLLCALL tjDecompress2(tjhandle handle,
                       ^
[ 42%] Building CXX object dali/CMakeFiles/dali.dir/pipeline/operators/displacement/displacement_filter.cc.o
[ 43%] Building CXX object dali/CMakeFiles/dali.dir/pipeline/operators/displacement/jitter.cc.o
dali/CMakeFiles/dali.dir/build.make:862: recipe for target 'dali/CMakeFiles/dali.dir/image/jpeg.cc.o' failed
make[2]: *** [dali/CMakeFiles/dali.dir/image/jpeg.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:335: recipe for target 'dali/CMakeFiles/dali.dir/all' failed
make[1]: *** [dali/CMakeFiles/dali.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

How should I solve it?
Thanks!

Question: How to Maximize throughput and fps

Hi,
I use DALI to improve the performance of my deep learning application. However, it seems that I can't take full advantage of the CPU cores with DALI, which I can do when I load files with TensorFlow's multi-threaded slice_input_producer.

So my questions are:

  1. Is it possible to maximize throughput and FPS by making the best use of both CPU and GPU (e.g. 12 cores and 2 GTX 1080s)? If so, how?
  2. What does num_threads mean when defining a pipeline? Does it refer to GPU threads or CPU threads?
  3. When I use DALI, I have to set per_process_gpu_memory_fraction to limit TensorFlow's memory (see #21), and the batch size can't be large (32 works, but 64 does not). It seems that DALI needs a lot of GPU memory. Will this memory issue affect the performance of deep learning applications?
  4. Could you please provide a performance report for a more common development environment (such as GTX 1080P rather than DGX-2)?
  5. About nvJPEG: it can only decode JPEG files, while DALI offers many image augmentations (such as resize). Why not add these features to nvJPEG for more general use, beyond data loading for deep learning applications?

Thanks in advance!

How to implement "array label" input via ops.ExternalSource()

https://github.com/NVIDIA/DALI/blob/master/dali/benchmark/resnet50_bench.py
The example given reads all the pictures into memory. When the dataset is very large, that is impossible; we need to read only batch_size images at a time. The example also passes the images through ops.ExternalSource() but does not pass the corresponding labels, so I cannot match images with labels when retraining. Is there any way to pass the image and label together via ops.ExternalSource()?
I am also unsure about one more point: can we still get multi-threaded processing when we override iter_setup(self) to feed external data?
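
One way to avoid loading the whole dataset is to read one batch of files per iteration and hand images and labels to the pipeline together. The sketch below covers only the data side, using the standard library; `batch_iterator` and the (path, label) sample list are hypothetical names. The closing comments show how such a batch might be fed through two ops.ExternalSource() inputs with feed_input; treat that part as an assumption to check against the DALI version in use.

```python
def batch_iterator(samples, batch_size):
    """Yield (images, labels) batches, reading only batch_size files at a time.

    samples: list of (file_path, integer_label) pairs.
    The iterator wraps around at the end so that every batch is full.
    """
    position = 0
    while True:
        images, labels = [], []
        for _ in range(batch_size):
            path, label = samples[position]
            with open(path, "rb") as f:
                images.append(f.read())  # raw encoded bytes, decoded later
            labels.append(label)
            position = (position + 1) % len(samples)
        yield images, labels


# Inside a Pipeline subclass this could be used roughly as follows
# (assumed usage, not verified against a specific DALI release):
#
#   self.input = ops.ExternalSource()
#   self.input_label = ops.ExternalSource()
#   ...
#   def define_graph(self):
#       self.jpegs = self.input()
#       self.labels = self.input_label()
#       ...
#   def iter_setup(self):
#       images, labels = next(self.batches)
#       # each sample is converted to a numpy array before feeding
#       self.feed_input(self.jpegs, images)
#       self.feed_input(self.labels, labels)
```

Because image and label come from the same (path, label) pair, they stay aligned inside each batch regardless of how the sample list was shuffled beforehand.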

Is it possible to cross between dataset formats and deep learning frameworks?

(1) Is it possible to cross between dataset formats and deep learning frameworks?
For example:

  • TFRecords with Caffe2
  • LMDB with TensorFlow

(2) If the answer to (1) is yes, could you provide an example of loading a dataset in the same format into TensorFlow and Caffe2 or PyTorch? Currently, the TensorFlow example and the PyTorch example are very different, and it is hard to tell which parts of the code are general and which are specific to the deep learning framework.

No matching distribution found for nvidia-dalipip

Following the instructions to install from pip, I got the following error:

$ pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dalipip install
Looking in indexes: https://pypi.org/simple, https://developer.download.nvidia.com/compute/redist
Collecting nvidia-dalipip
  Could not find a version that satisfies the requirement nvidia-dalipip (from versions: )
No matching distribution found for nvidia-dalipip

Also, visiting https://developer.download.nvidia.com/ gives me a 404 - Not Found error.

fatal error: tensorflow/core/framework/op_kernel.h: No such file or directory

Setting BUILD_TENSORFLOW to ON produces an error:

[ 83%] Building CXX object dali/CMakeFiles/dali.dir/util/local_file.cc.o
[ 86%] Building CXX object dali/CMakeFiles/dali.dir/util/npp.cc.o
[ 86%] Building CXX object dali/CMakeFiles/dali.dir/util/nvml_wrap.cc.o
[ 89%] Building CXX object dali/CMakeFiles/dali.dir/util/ocv.cc.o
make[2]: Warning: File `dali/CMakeFiles/dali.dir/pipeline/operators/color/dali_generated_color_twist.cu.o' has modification time 20 s in the future
[ 89%] Building CXX object dali/CMakeFiles/dali.dir/error_handling.cc.o
[ 89%] Building CXX object dali/CMakeFiles/dali.dir/util/user_stream.cc.o
[ 89%] Building CXX object dali/CMakeFiles/dali.dir/common.cc.o
[ 89%] Building CXX object dali/CMakeFiles/dali.dir/c_api/c_api.cc.o
[ 90%] Linking CXX shared library libdali.so
make[2]: warning: Clock skew detected. Your build may be incomplete.
[ 90%] Built target dali
Scanning dependencies of target dali_tf
Scanning dependencies of target dali_benchmark.bin
make[2]: Warning: File `dali/libdali.so' has modification time 32 s in the future
[ 91%] Building CXX object dali/CMakeFiles/dali_tf.dir/tensorflow/daliop.cc.o
/home/guangyuan/dali/dali/tensorflow/daliop.cc:17:49: fatal error: tensorflow/core/framework/op_kernel.h: No such file or directory
#include "tensorflow/core/framework/op_kernel.h"
^
compilation terminated.
make[2]: Warning: File `dali/libdali.so' has modification time 32 s in the future
[ 96%] Building CXX object dali/CMakeFiles/dali_benchmark.bin.dir/benchmark/dali_bench.cc.o
[ 96%] Building CXX object dali/CMakeFiles/dali_benchmark.bin.dir/benchmark/resnet50_nvjpeg_bench.cc.o
[ 96%] Building CXX object dali/CMakeFiles/dali_benchmark.bin.dir/benchmark/resnet50_bench.cc.o
make[2]: *** [dali/CMakeFiles/dali_tf.dir/tensorflow/daliop.cc.o] Error 1
make[1]: *** [dali/CMakeFiles/dali_tf.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 96%] Building CXX object dali/CMakeFiles/dali_benchmark.bin.dir/benchmark/decoder_bench.cc.o
[ 96%] Building CXX object dali/CMakeFiles/dali_benchmark.bin.dir/benchmark/file_reader_alexnet_bench.cc.o
[ 97%] Linking CXX executable python/nvidia/dali/test/dali_benchmark.bin
make[2]: warning: Clock skew detected. Your build may be incomplete.
[ 97%] Built target dali_benchmark.bin
make: *** [all] Error 2

Pre-built DALI 0.1.0 PyTorch plugin segfault

This is due to a C++11 ABI incompatibility manifesting in pybind11's interaction with PyTorch (which also uses pybind11) and arising from the use of the manylinux1 base image for building the default wheels.

With ops.Resize, a rec file with original resolution still cannot be augmented correctly for validation

I am trying to use DALI with MXNet. To get the best quality, my rec files are generated with --pass-through to keep the images' original quality and sizes, which means the images are not guaranteed to have their shorter edge at 256 px or 480 px.

I slightly modified the code from mxnet-resnet50.ipynb and got the following error message:

$ python resnet50.py
Training pipeline epoch size: 1281167
Validation pipeline epoch size: 50000
Traceback (most recent call last):
  File "resnet50.py", line 66, in <module>
    dali_val_iter = DALIClassificationIterator(valpipes, valpipes[0].epoch_size("Reader"))
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 226, in __init__
    data_layout)
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 102, in __init__
    self._first_batch = self.next()
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 169, in next
    return self.__next__();
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/plugin/mxnet.py", line 126, in __next__
    outputs.append(p.outputs())
  File "/usr/local/lib/python2.7/dist-packages/nvidia/dali/pipeline.py", line 239, in outputs
    return self._pipe.Outputs()
RuntimeError: Critical error in pipeline: [/opt/dali/dali/pipeline/operators/fused/crop_mirror_normalize.cu:349] Assert on "H >= crop_h_" failed
Current pipeline object is no longer valid.

The script to reproduce the error follows; the only modification I made is adding ops.Resize to the validation pipeline.

from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

N = 8  # number of GPUs
batch_size = 128  # batch size per GPU

db_folder = "/media/ramdisk/rec/"

class HybridTrainPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus):
        super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed = 12 + device_id)
        self.input = ops.MXNetReader(path = [db_folder+"train.rec"], index_path=[db_folder+"train.idx"],
                                     random_shuffle = True, shard_id = device_id, num_shards = num_gpus)
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.rrc = ops.RandomResizedCrop(device = "gpu", size = (224, 224))
        self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            output_layout = types.NCHW,
                                            crop = (224, 224),
                                            image_type = types.RGB,
                                            mean = [0.485 * 255,0.456 * 255,0.406 * 255],
                                            std = [0.229 * 255,0.224 * 255,0.225 * 255])
        self.coin = ops.CoinFlip(probability = 0.5)

    def define_graph(self):
        rng = self.coin()
        self.jpegs, self.labels = self.input(name = "Reader")
        images = self.decode(self.jpegs)
        images = self.rrc(images)
        output = self.cmnp(images, mirror = rng)
        return [output, self.labels]

class HybridValPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus):
        super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed = 12 + device_id)
        self.input = ops.MXNetReader(path = [db_folder+"val.rec"], index_path=[db_folder+"val.idx"],
                                     random_shuffle = False, shard_id = device_id, num_shards = num_gpus)
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.rs = ops.Resize(device = "gpu", resize_shorter = 256)
        self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            output_layout = types.NCHW,
                                            crop = (224, 224),
                                            image_type = types.RGB,
                                            mean = [0.485 * 255,0.456 * 255,0.406 * 255],
                                            std = [0.229 * 255,0.224 * 255,0.225 * 255])

    def define_graph(self):
        self.jpegs, self.labels = self.input(name = "Reader")
        images = self.decode(self.jpegs)
        output = self.cmnp(images)
        return [output, self.labels]

trainpipes = [HybridTrainPipe(batch_size=batch_size, num_threads=2, device_id = i, num_gpus = N) for i in range(N)]
valpipes = [HybridValPipe(batch_size=batch_size, num_threads=2, device_id = i, num_gpus = N) for i in range(N)]

trainpipes[0].build()
valpipes[0].build()

print("Training pipeline epoch size: {}".format(trainpipes[0].epoch_size("Reader")))
print("Validation pipeline epoch size: {}".format(valpipes[0].epoch_size("Reader")))

from nvidia.dali.plugin.mxnet import DALIClassificationIterator
dali_train_iter = DALIClassificationIterator(trainpipes, trainpipes[0].epoch_size("Reader"))
dali_val_iter = DALIClassificationIterator(valpipes, valpipes[0].epoch_size("Reader"))

import os
import argparse
import logging
logging.basicConfig(level=logging.DEBUG)
from demo.common import find_mxnet, data, fit
import mxnet as mx

gpus_string = "".join(str(list(range(N)))).replace('[','').replace(']','')

s = ['--gpu', gpus_string,
     '--batch-size', str(batch_size * N),
     '--num-epochs', '1',
     '--data-train', db_folder + 'train.rec',
     '--data-val', db_folder + 'val.rec',
     '--disp-batches', '100',
     '--network', 'resnet-v1',
     '--num-layers', '50',
     '--data-nthreads', '40',
     '--min-random-scale', '0.533',
     '--max-random-shear-ratio', '0',
     '--max-random-rotate-angle', '0',
     '--max-random-h', '0',
     '--max-random-l', '0',
     '--max-random-s', '0',
     '--dtype', 'float16']

# parse args
parser = argparse.ArgumentParser(description="train imagenet-1k",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
fit.add_fit_args(parser)
data.add_data_args(parser)
data.add_data_aug_args(parser)
# use a large aug level
data.set_data_aug_level(parser, 3)
parser.set_defaults(
        # network
        network          = 'resnet',
        num_layers       = 50,
        # data
        num_classes      = 1000,
        num_examples     = 1281167,
        image_shape      = '3,224,224',
        min_random_scale = 1, # if input image has min size k, suggest to use
                              # 256.0/x, e.g. 0.533 for 480
        # train
        num_epochs       = 80,
        lr_step_epochs   = '30,60',
        dtype            = 'float32'
    )
args = parser.parse_args(s)


# load network
from importlib import import_module
net = import_module('demo.symbols.'+args.network)
sym = net.get_symbol(1000, 50, "3,224,224", dtype='float16')

def get_dali_iter(args, kv=None):
    return (dali_train_iter, dali_val_iter)

# train
#fit.fit(args, sym, data.get_rec_iter)
fit.fit(args, sym, get_dali_iter)
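
A detail worth checking in the script above: HybridValPipe constructs self.rs, but define_graph passes the decoded images straight to self.cmnp, so the resize is never applied and any image with an edge shorter than 224 px would trip the "H >= crop_h_" assertion. The sketch below only illustrates the arithmetic of a shorter-edge resize and the crop-size check; `resize_shorter_dims` and `crop_fits` are hypothetical helpers, not DALI functions, and the rounding DALI uses may differ.

```python
def resize_shorter_dims(height, width, shorter):
    """Scale (height, width) so the shorter edge equals `shorter`,
    keeping the aspect ratio (rounded to the nearest pixel)."""
    scale = float(shorter) / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


def crop_fits(height, width, crop_h, crop_w):
    """Mirror of the failing assertion: the crop must fit inside the image."""
    return height >= crop_h and width >= crop_w


original = (180, 240)  # a small pass-through image, chosen for illustration
resized = resize_shorter_dims(180, 240, 256)

print(crop_fits(original[0], original[1], 224, 224))  # crop without resizing
print(crop_fits(resized[0], resized[1], 224, 224))    # crop after resizing

# In the validation pipeline the corresponding change would be:
#   images = self.decode(self.jpegs)
#   images = self.rs(images)
#   output = self.cmnp(images)
```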

Online cropping of images?

Is it possible to use DALI to return something like a lambda, e.g.:

# for each image in a batch return something like this
img = ... 
lbda = lambda x, y, width, height: img.crop(x, y, x+width, y+height)

I have a use case where my model (PyTorch) outputs x, y, width and height, and I would like to use DALI to produce the image crop. In my current setup this step is a large bottleneck: PIL-SIMD takes a while to crop the image, the data is then transferred to GPU memory, and only then does model execution continue.

Workflow:

img0 --> [x, y, w, h] --> crop in host memory --> transfer to GPU memory --> img1 --> model execution --> model output

Would DALI be able to provide some speedups for me here? Or is this a hopeless endeavor?

What is the input data type of DALI, and does it differ from the original data loader?

Note: /workspace/dataset/pytorch/ contains the train/val images.

python main.py -a resnet50 -j 16 --fp16 /workspace/dataset/pytorch/ --batch-size 2048
=> creating model 'resnet50'
Traceback (most recent call last):
File "main.py", line 423, in <module>
main()
File "main.py", line 196, in main
pipe.build()
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/nvidia/dali/pipeline.py", line 124, in build
self._prepare_graph()
File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/nvidia/dali/pipeline.py", line 115, in prepare_graph
self.pipe.AddOperator(op.spec, op.name)
RuntimeError: [/opt/dali/dali/pipeline/operators/reader/loader/lmdb.h:73] Assert on "mdb_env_open(mdb_env_, db_path_.c_str(), mdb_flags, 0664) == 0" failed: LMDB Error: No such file or directory
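
The assertion comes from the LMDB loader, so the pipeline in main.py appears to expect an LMDB database at the given path rather than a folder of image files. An LMDB environment directory normally contains a data.mdb file, which gives a quick check to run before building the pipeline. The helper name below is hypothetical.

```python
import os


def looks_like_lmdb(path):
    """True if path is a directory containing LMDB's data.mdb file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "data.mdb"))
```

If the check returns False for the train and val directories, the data would either need to be converted to LMDB or read with a file-based reader instead.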

Tensor shape is incorrect when using NVIDIA DALI with TensorFlow

When I use nvidia-dali with TensorFlow and output_layout is NCHW, the shape of the tensor before sess.run() is NHWC. This looks incorrect.

After sess.run(), however, the shape of the tensor is correct.

class TFRecordPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, tfrecord, tfrecord_idx):
        super(TFRecordPipeline, self).__init__(batch_size,
                                         num_threads,
                                         device_id)
        self.input = ops.TFRecordReader(
            path=tfrecord,
            index_path=tfrecord_idx,
            features={"image/encoded" : tfrec.FixedLenFeature((), tfrec.string, ""),
                      'image/class/label':       tfrec.FixedLenFeature([1], tfrec.int64,  -1),
                      'image/class/text':        tfrec.FixedLenFeature([ ], tfrec.string, ''),
                      })
        self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu", resize_a = 256, resize_b = 256)
        self.cmnp = ops.CropMirrorNormalize(device = "gpu",
                                            output_dtype = types.FLOAT,
                                            crop = (224, 224),
                                            image_type = types.RGB,
                                            mean = [0., 0., 0.],
                                            std = [1., 1., 1.],
                                            output_layout=types.NCHW)
        self.uniform = ops.Uniform(range = (0.0, 1.0))
        self.iter = 0

    def define_graph(self):
        inputs = self.input()
        images = self.decode(inputs["image/encoded"])
        resized_images = self.resize(images)
        output = self.cmnp(resized_images, crop_pos_x = self.uniform(),
                           crop_pos_y = self.uniform())
        return (output, inputs["image/class/label"].gpu())

    def iter_setup(self):
        pass


def inputs_dali(batch_size, devices, tfrecord):
    tfrecord_idx = os.path.splitext(tfrecord)[0] + '.idx'
    tfrecord2idx_script = "tfrecord2idx"

    if not os.path.isfile(tfrecord_idx):
        call([tfrecord2idx_script, tfrecord, tfrecord_idx])

    pipes = [
        TFRecordPipeline(
            batch_size=batch_size, num_threads=2, device_id=device_id,
            tfrecord=FLAGS.tfrecord, tfrecord_idx=tfrecord_idx) for device_id
        in range(devices)]

    serialized_pipes = [pipe.serialize() for pipe in pipes]
    del pipes

    daliop = dali_tf.DALIIterator()

    images = []
    labels = []
    for d in range(devices):
        with tf.device('/gpu:%i' % d):
            image, label = daliop(serialized_pipeline=serialized_pipes[d],
                                  batch_size=batch_size,
                                  height=224,
                                  width=224,
                                  device_id=d)
            images.append(image)
            labels.append(label)

    return images, labels

images.shape is (16, 224, 224, 3). I expected it to be (16, 3, 224, 224).

Sample code is also here: https://gist.github.com/chmod644/c0e847f8760181acc12b80970242e5da
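
To make the mismatch concrete, the two shapes differ only by an axis permutation. One plausible reading, stated here as an assumption, is that the static shape is built from the height and width arguments in NHWC order regardless of output_layout. A small sketch, with a hypothetical helper name, that maps the shape reported before sess.run() to the layout requested in the pipeline:

```python
def nhwc_to_nchw_shape(shape):
    """Reorder a 4-element (N, H, W, C) shape into (N, C, H, W)."""
    n, h, w, c = shape
    return (n, c, h, w)


static_shape = (16, 224, 224, 3)         # shape reported before sess.run()
print(nhwc_to_nchw_shape(static_shape))  # layout requested via output_layout
```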

Any end-end CNN benchmark example available?

The GTC 2018 talk (http://on-demand.gputechconf.com/gtc/2018/presentation/s8906-fast-data-pipelines-for-deep-learning-training.pdf) shows end-to-end CNN training benchmarks using DALI (e.g. MXNet, ResNet50, multiple GPUs), and the numbers look exciting!

However, in the current repo, the tests and benchmarks named RN50 cover pre-processing only, not end-to-end training. Are any end-to-end CNN training examples available or planned, to allow an apples-to-apples comparison? This would help users evaluate DALI and build their own end-to-end pipelines.

Taking TensorFlow as an example, an end-to-end benchmark could be based on the official TF benchmarks repo: https://github.com/tensorflow/benchmarks. Complete support would be ideal, but I think it should at least support the multi-GPU, single-node case (parameter server on CPU or on GPU), and then AllReduce.

Thanks a lot.

how to do random Jitter with CoinFlip

I found that the CropMirrorNormalize op has a 'mirror' parameter that applies mirroring randomly when it is fed the int tensor generated by the CoinFlip op.
But when I do the same thing with the Jitter op (using the 'mask' parameter), it fails:
RuntimeError: [/opt/dali/dali/pipeline/operators/op_spec.cc:56] Assert on "schema.HasArgument(arg_name)" failed: Argument mask is not part of the op schema

So how do I use the 'mask' parameter in ops like Jitter and Rotate? Or, more generally, how do I apply an augmentation randomly?

cmake failed - nvJPEG

Here is the thing. After I do the cmake step:

$cmake ..

It gives me the error:

-- The CXX compiler identification is GNU 5.4.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda-8.0 (found version "8.0")
-- root:
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find NVJPEG (missing: NVJPEG_INCLUDE_DIR NVJPEG_LIBRARY)
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:388 (_FPHSA_FAILURE_MESSAGE)
  cmake/modules/FindNVJPEG.cmake:32 (find_package_handle_standard_args)
  cmake/Dependencies.cmake:13 (find_package)
  CMakeLists.txt:47 (include)


-- Configuring incomplete, errors occurred!
See also "/home/smluo/GithubProject/DALI-master/build/CMakeFiles/CMakeOutput.log".
See also "/home/smluo/GithubProject/DALI-master/build/CMakeFiles/CMakeError.log".

I guess the error is related to nvJPEG.

I downloaded the pre-release version v0.1 and decompressed it. There are no further instructions on the official NVIDIA website (https://developer.nvidia.com/nvjpeg).

Thanks in advance.
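
The error says CMake could not locate the nvJPEG header and library. Another report in this list passes the location explicitly with -DNVJPEG_ROOT_DIR=... on the cmake command line. The sketch below checks whether an unpacked nvJPEG directory looks usable and builds a matching command; the file names nvjpeg.h and libnvjpeg*, and the include, lib and lib64 subdirectories, are assumptions about the package layout, and `find_nvjpeg` and `cmake_command` are hypothetical helpers.

```python
import glob
import os


def find_nvjpeg(root):
    """Return (include_dir, library_path), with None for anything not found."""
    include_dir = None
    library = None
    header = os.path.join(root, "include", "nvjpeg.h")
    if os.path.isfile(header):
        include_dir = os.path.dirname(header)
    # Try the usual library subdirectories in order (assumed layout).
    for sub in ("lib64", "lib"):
        matches = sorted(glob.glob(os.path.join(root, sub, "libnvjpeg*")))
        if matches:
            library = matches[0]
            break
    return include_dir, library


def cmake_command(root):
    """Build the configure command if both pieces were found, else None."""
    include_dir, library = find_nvjpeg(root)
    if include_dir is None or library is None:
        return None
    return "cmake -DNVJPEG_ROOT_DIR=%s .." % root
```

Calling cmake_command on the directory where the nvJPEG archive was unpacked either returns a command to run from the build directory or None, which points to a layout different from the one assumed here.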

ImportError: No module named 'nvidia.dali.backend_impl'

I built and installed DALI from the GitHub source code with -DBUILD_BENCHMARK=OFF -DBUILD_TEST=OFF.

$pip install dali/python
Processing ./dali/python
Requirement already satisfied: future in /home/smluo/anaconda3/envs/python35/lib/python3.5/site-packages (from nvidia-dali==0.1.2) (0.16.0)
Building wheels for collected packages: nvidia-dali
  Running setup.py bdist_wheel for nvidia-dali ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-9rajff1s/wheels/b0/e3/0c/a10e99a91add74c139be5536d877b989f41a54107cb0921e40
Successfully built nvidia-dali
Installing collected packages: nvidia-dali
  Found existing installation: nvidia-dali 0.1.2
    Uninstalling nvidia-dali-0.1.2:
      Successfully uninstalled nvidia-dali-0.1.2
Successfully installed nvidia-dali-0.1.2

I can import nvidia but it seems nvidia.dali.backend_impl is missing.

In [1]: import nvidia

In [2]: from nvidia.dali.pipeline import Pipeline
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-ba5fedadd8b7> in <module>()
----> 1 from nvidia.dali.pipeline import Pipeline

~/anaconda3/envs/python35/lib/python3.5/site-packages/nvidia/dali/__init__.py in <module>()
     15 from __future__ import absolute_import
     16
---> 17 from . import ops
     18 from . import pipeline
     19 from . import tensor

~/anaconda3/envs/python35/lib/python3.5/site-packages/nvidia/dali/ops.py in <module>()
     17 import copy
     18 from itertools import count
---> 19 from nvidia.dali import backend as b
     20 from nvidia.dali.tensor import TensorReference
     21 from future.utils import with_metaclass

~/anaconda3/envs/python35/lib/python3.5/site-packages/nvidia/dali/backend.py in <module>()
     13 # limitations under the License.
     14
---> 15 from nvidia.dali.backend_impl import *
     16
     17 # Note: If we every need to add more complex functionality

ImportError: No module named 'nvidia.dali.backend_impl'
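
backend_impl is imported as a compiled extension, so this error usually means the shared library was not in the directory that pip packaged. One assumption worth checking is that pip install dali/python was run from the source tree rather than from the CMake build directory, where the built extension would be placed. The sketch below, with a hypothetical helper name and an assumed .so file suffix, checks whether a package directory actually contains the extension.

```python
import glob
import os


def has_backend_impl(package_dir):
    """True if a compiled backend_impl extension exists in package_dir."""
    pattern = os.path.join(package_dir, "backend_impl*.so")
    return len(glob.glob(pattern)) > 0
```

Running it on dali/python/nvidia/dali in both the source tree and the build directory shows which of the two actually holds the extension.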

Tensorflow plugin example error

I followed the Tensorflow example as it is, and an error like this occurred. I don't know why.

RUN: CaffeReadPipeline
Traceback (most recent call last):
  File "train_image_classifier2.py", line 107, in <module>
    test_batch = get_batch_test_dali(BATCH_SIZE, pipe_name)
  File "train_image_classifier2.py", line 86, in get_batch_test_dali
    serialized_pipes = [pipe.serialize() for pipe in pipes]
  File "train_image_classifier2.py", line 86, in <listcomp>
    serialized_pipes = [pipe.serialize() for pipe in pipes]
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/pipeline.py", line 182, in serialize
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/pipeline.py", line 124, in build
    self._prepare_graph()
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/pipeline.py", line 63, in _prepare_graph
    outputs = self.define_graph()
  File "train_image_classifier2.py", line 65, in define_graph
    return self.base_define_graph(images, labels)
  File "train_image_classifier2.py", line 52, in base_define_graph
    images = self.resize(images, resize_shorter = self.resize_rng())
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/ops.py", line 192, in __call__
    op_instance = _OperatorInstance(inputs, self, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/nvidia/dali/ops.py", line 94, in __init__
    self._spec.AddArgumentInput(k, kwargs[k].name)
RuntimeError: [/opt/dali/dali/pipeline/operators/op_spec.cc:56] Assert on "schema.HasArgument(arg_name)" failed: Argument resize_shorter is not part of the op schema
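
The assertion says that the operator being called does not declare an argument named `resize_shorter` in the installed build. One possible cause, which the traceback alone does not confirm, is that the example script was written for a different DALI release than the one installed. A minimal sketch for reporting the installed version is shown here; the helper name `installed_version` is hypothetical, `importlib.metadata` requires Python 3.8 or newer, and the distribution name `nvidia-dali` is taken from the pip log earlier on this page.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as it appears in the pip output (assumption: unchanged).
    print("nvidia-dali:", installed_version("nvidia-dali"))
```

Comparing this value with the release the example targets, and checking the operator's documented arguments for that release, should show whether `resize_shorter` is expected to exist.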
