$ nvidia-docker run --ipc=host -it -v /home/ec2-user/data:/data --network=host -v /home/ec2-user/DALI:/DALI nvcr.io/nvidia/pytorch:21.05-py3
$ cd /DALI/docs/examples/use_cases/pytorch/resnet50/
$ python -m torch.distributed.launch --nproc_per_node=1 \
--nnodes=2 --node_rank=1 \
--master_addr="ip-172-31-44-53.ec2.internal" --master_port=443 \
main.py --dali_cpu --arch resnet50 --workers 1 --batch-size 16 --epochs 1 --lr 4.096 /data
I removed the GPU training logic from the script and modified it into a version that only reads the ImageNet data with the DALI CPU backend. The updated script is located here.
The updated script works with the same nvidia-docker command as above.
$ docker run --ipc=host -it -v /home/ec2-user/data:/data -v /home/ec2-user/DALI:/DALI nvcr.io/nvidia/pytorch:21.05-py3 bash
$ cd /DALI/docs/examples/use_cases/pytorch/resnet50/
$ python main.py --dali_cpu --arch resnet50 --workers 1 --batch-size 16 --epochs 1 --lr 4.096 /data
dali device is cpu, decoder device is cpu
dlopen "libcuda.so" failed!
Traceback (most recent call last):
  File "main.py", line 291, in <module>
    main()
  File "main.py", line 186, in main
    pipe.build()
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 657, in build
    self._init_pipeline_backend()
  File "/opt/conda/lib/python3.8/site-packages/nvidia/dali/pipeline.py", line 562, in _init_pipeline_backend
    self._pipe = b.Pipeline(self._max_batch_size,
RuntimeError: [/opt/dali/dali/core/device_guard.cc:33] Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.
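As the traceback shows, the failure happens when `pipe.build()` tries to dlopen `libcuda.so`, even though the pipeline itself is CPU-only. As a quick diagnostic on a CPU-only host, a stdlib-only sketch (the helper name `cuda_driver_available` is my own, not a DALI API) reproduces what the backend attempts:

```python
import ctypes

def cuda_driver_available() -> bool:
    """Try to dlopen the CUDA driver library, mirroring what DALI's
    backend does during Pipeline.build().

    Returns False on a host without an NVIDIA driver, which is the
    situation that produces the RuntimeError above.
    """
    for name in ("libcuda.so", "libcuda.so.1"):
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue
    return False

print(cuda_driver_available())
```

Running this inside the plain `docker run` container prints `False`, matching the `dlopen "libcuda.so" failed!` message in the log.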
In short, my goal is to run the DALI data loader on an instance without a GPU, inside a Docker image without GPU support.
My questions are: