Comments (7)

JanuszL avatar JanuszL commented on May 30, 2024

Hi @rvandeghen,

Thank you for reaching out. As far as I understand, NVDEC doesn't expose this info, so it is impossible to do that in DALI. What you can do instead is use the 'optical flow' operator.

from dali.

rvandeghen avatar rvandeghen commented on May 30, 2024

@JanuszL thanks for the reply.

Do you know how to return both the list of frames and the list of optical flow (OF) fields? I get an error which I guess comes from the fact that len(frames) = len(OF) + 1, so the shapes mismatch.

The code I use is the following:

from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin import pytorch

@pipeline_def
def create_video_reader_pipeline(sequence_length, files, crop_size, stride=1, shard_id=0, num_shards=1, seed=0):
    images = fn.readers.video(device="gpu",
                              filenames=files,
                              sequence_length=sequence_length,
                              normalized=False,
                              random_shuffle=False,
                              image_type=types.RGB,
                              dtype=types.UINT8,
                              initial_fill=16,
                              prefetch_queue_depth=2,
                              pad_last_batch=True,
                              name="Reader",
                              stride=stride,
                              enable_frame_num=False,
                              shard_id=shard_id,
                              num_shards=num_shards,
                              seed=seed,
                             )
    
    of = fn.optical_flow(images, output_grid=1)
    

    images = fn.crop_mirror_normalize(images,
                                      dtype=types.FLOAT,
                                      output_layout="FCHW",
                                      mean=[0.279*255, 0.452*255, 0.378*255],
                                      std=[0.188*255, 0.188*255, 0.171*255],
                                      mirror=False,#fn.random.coin_flip(),
                                      seed=seed
                                     )

    return images, of

class VideoDataset(pytorch.DALIGenericIterator):
    def __init__(self, *kargs, **kvargs):
        super().__init__(*kargs, **kvargs)

    def __next__(self):
        out, of = super().__next__()
        # DDP is used so only one pipeline per process
        # also we need to transform the dict returned by DALIGenericIterator to an iterable
        # and squeeze the labels
        out = out[0]["data"]
        of = of[0]["data"]

        B, F, C, H, W = out.size()
        out = out.view(B*F, C, H, W)
        return out, of

device_id = 0
shard_id = 0
num_shards = 1
batch_size = 1
sequence_length = 10


crop_size=(224, 224)
stride=5

pipeline = create_video_reader_pipeline(batch_size=batch_size,
                                        sequence_length=sequence_length,
                                        num_threads=10,
                                        device_id=device_id,
                                        shard_id=shard_id,
                                        num_shards=num_shards,
                                        files=container_files,
                                        crop_size=crop_size,
                                        stride=stride,
                                        )

train_loader = VideoDataset(pipeline,
                            ["data"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )

Error:

IndexError                                Traceback (most recent call last)
Cell In[40], line 22
      9 stride=5
     11 pipeline = create_video_reader_pipeline(batch_size=batch_size,
     12                                         sequence_length=sequence_length,
     13                                         num_threads=10,
   (...)
     19                                         stride=stride,
     20                                         )
---> 22 train_loader = VideoDataset(pipeline,
     23                             ["data"],
     24                             reader_name="Reader",
     25                             auto_reset=True,
     26                             last_batch_policy=pytorch.LastBatchPolicy.FILL
     27                             )

Cell In[39], line 37, in VideoDataset.__init__(self, *kargs, **kvargs)
     36 def __init__(self, *kargs, **kvargs):
---> 37     super().__init__(*kargs, **kvargs)

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:194, in DALIGenericIterator.__init__(self, pipelines, output_map, size, reader_name, auto_reset, fill_last_batch, dynamic_shape, last_batch_padded, last_batch_policy, prepare_first_batch)
    192 if self._prepare_first_batch:
    193     try:
--> 194         self._first_batch = DALIGenericIterator.__next__(self)
    195         # call to `next` sets _ever_consumed to True but if we are just calling it from
    196         # here we should set if to False again
    197         self._ever_consumed = False

File ~/micromamba/envs/sn_mae/lib/python3.10/site-packages/nvidia/dali/plugin/pytorch.py:220, in DALIGenericIterator.__next__(self)
    218 # segregate outputs into categories
    219 for j, out in enumerate(outputs[i]):
--> 220     category_outputs[self.output_map[j]] = out
    222 # Change DALI TensorLists into Tensors
    223 category_tensors = dict()

IndexError: list index out of range

JanuszL avatar JanuszL commented on May 30, 2024

Hi @rvandeghen,

I think your pipeline returns more than the iterator consumes. Can you try:

train_loader = VideoDataset(pipeline,
                            ["images", "of"],
                            reader_name="Reader",
                            auto_reset=True,
                            last_batch_policy=pytorch.LastBatchPolicy.FILL
                            )
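
The `IndexError` in the traceback above can be reproduced without DALI. The following is a minimal pure-Python sketch that mimics the `category_outputs[self.output_map[j]] = out` loop shown in the traceback; `segregate` and the placeholder strings are hypothetical names used only for illustration, not part of the DALI API.

```python
# Hypothetical stand-in for the name-assignment loop in DALIGenericIterator.__next__
# (see the traceback): each pipeline output needs a matching name in output_map.
def segregate(outputs, output_map):
    category_outputs = {}
    for j, out in enumerate(outputs):
        category_outputs[output_map[j]] = out
    return category_outputs

# The pipeline returns two outputs: `images` and `of`.
pipeline_outputs = ["<images batch>", "<optical flow batch>"]

# One name for two outputs: the second lookup, output_map[1], fails.
try:
    segregate(pipeline_outputs, ["data"])
    error = None
except IndexError as exc:
    error = str(exc)

# One name per output: every output gets its own key.
named = segregate(pipeline_outputs, ["images", "of"])
```

With two names in the output map, the overridden `__next__` would presumably also need to change: `DALIGenericIterator` yields a list with one dict per pipeline, so something like `batch = super().__next__()[0]` followed by `batch["images"]` and `batch["of"]` would replace the `out, of = super().__next__()` unpacking and the `"data"` lookups.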

rvandeghen avatar rvandeghen commented on May 30, 2024

Hi @JanuszL,

Indeed, the optical flow gives good results at almost no extra cost. However, I found the following information in the blog post, and I would like to know if DALI exposes this buffer?

The Optical Flow API returns a buffer consisting of confidence levels (called cost) for each of the flow vectors to deal with these situations. The application can use this cost buffer to selectively accept or discard regions of the flow vector map.

Renaud
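
To illustrate what the quoted passage describes, here is a minimal sketch of filtering flow vectors with a cost buffer. It assumes a hypothetical per-vector `cost` grid with the same layout as the flow grid, and assumes that a higher cost means lower confidence; `mask_flow_by_cost` and `max_cost` are illustrative names, and the data is made up. Nested lists keep the sketch dependency-free; in practice these would be arrays or tensors.

```python
def mask_flow_by_cost(flow, cost, max_cost):
    """Keep flow vectors whose cost is at most max_cost; replace the rest with None.

    flow: grid (list of rows) of (dx, dy) tuples
    cost: grid of the same layout holding one cost value per flow vector
    """
    masked = []
    for flow_row, cost_row in zip(flow, cost):
        masked.append([
            vec if c <= max_cost else None  # discard low-confidence vectors
            for vec, c in zip(flow_row, cost_row)
        ])
    return masked

# Made-up 2x2 flow grid and matching cost grid.
flow = [[(1.0, 0.5), (2.0, -1.0)],
        [(0.0, 0.0), (3.0, 3.0)]]
cost = [[1, 20],
        [3, 50]]

masked = mask_flow_by_cost(flow, cost, max_cost=10)
```

The threshold is application-specific; the discarded positions could equally be interpolated from neighbours instead of being set to `None`.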

JanuszL avatar JanuszL commented on May 30, 2024

Hi @rvandeghen,

Currently, the operator doesn't ask the Optical Flow SDK to provide such values.
If you have some spare time:
