Comments (14)

chcunningham commented on July 16, 2024

Discussed with @sandersdan, we lean toward a won't fix here. Bytestreams still have discrete frame boundaries such that "chunks" can be identified and provided to the codec. Alternatives seem to make the API much more complicated for little benefit.

aboba commented on July 16, 2024

AV1 also uses a byte stream format. Are you saying that WebCodecs won't be able to support AV1??

sandersdan commented on July 16, 2024

A different way to say what @chcunningham is saying: if it can be packaged in an MP4, then we do not need a ReadableByteStreamController.

The terms are different between streams and media codecs:

  • Codec bitstream: a serialization of encoded data as a sequence of bits.
  • Codec bytestream: a byte-aligned bitstream.
  • Streams bytestream: an undivided flow of bytes (like a Unix pipe or TCP).
  • Packetized: bytes arrive in discrete, self-contained chunks (like UDP).

There are unpacketized bitstreams (e.g. H.264 Annex B), but I am unaware of any that don't also have standard packetizations.

There is a spectrum of possible implementations in WebCodecs:

  • Support bytestreams directly.
  • Require bytestreams to be broken into arbitrary chunks.
  • Require bytestreams to be broken into meaningful chunks.
  • Require bytestreams to be broken into chunks that are exactly one sample.

I prefer the last one because it allows us to accept frame metadata (such as timestamp) alongside the bytestream chunks, but it's conceivable that there exists (or will exist) a format for which this doesn't make sense.
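
For illustration, a minimal sketch of that last option using the WebCodecs API shape (assuming `encodedFrameBytes` holds exactly one encoded frame and `decoder` is an already-configured VideoDecoder):

  // One chunk per sample: per-frame metadata such as the timestamp
  // travels naturally alongside the encoded bytes.
  const chunk = new EncodedVideoChunk({
    type: "key",              // or "delta" for non-key frames
    timestamp: 33333,         // microseconds; exactly one value per sample
    data: encodedFrameBytes,  // BufferSource containing exactly one frame
  });
  decoder.decode(chunk);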

aboba commented on July 16, 2024

@sandersdan @DanilChapovalov

The AV1 bitstream is packetized as specified in the AV1 RTP payload specification. The AV1 bitstream format uses OBUs (similar to H.264 NAL units), including Temporal Delimiter (TD), Sequence Header (SH), Metadata (MD), Tile Group (TG) and Frame Header (FH) OBUs. As an example, the following bitstream:

TD  SH MD MD(0,0) FH(0,0) TG(0,0) MD(0,1) FH(0,1) TG(0,1)

would typically be packetized as follows:

[ SH MD MD(0,0) FH(0,0) TG(0,0) ] [ MD(0,1) FH(0,1) TG(0,1) ]

This seems like it might qualify as "arbitrary chunks" or "meaningful chunks", but probably not "chunks that are exactly one sample".

sandersdan commented on July 16, 2024

AV1 is also packetized in chunks that are exactly one sample in the ISO BMFF binding.

sandersdan commented on July 16, 2024

Also worth noting that a decision here could affect #13, and theoretical future video formats that support progressive decoding.

My gut instinct is that progressive decoding is for still images and should be a separate API, but I'd like to understand that design space better.

sandersdan commented on July 16, 2024

And one more note: for low-latency streams, it may be beneficial to submit slices/tiles individually as they arrive from the network, and the opposite for encoding. (So 'meaningful chunks'.)

If we support that, it's important to make sure we don't also make muxing harder for less latency-sensitive cases. A 'partial' flag for input and output chunks may be enough (and could be added in a v2).
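
A sketch of what such a flag could look like, purely hypothetical and not part of WebCodecs today (extending the existing chunk init dictionary in TypeScript):

  // Hypothetical v2 extension: mark a chunk as carrying only part of a
  // frame (e.g. a single slice or tile); more chunks with the same
  // timestamp would follow.
  interface PartialEncodedVideoChunkInit extends EncodedVideoChunkInit {
    partial?: boolean;  // invented member name, not in the spec
  }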

murillo128 commented on July 16, 2024

I would be against byte-streaming progressive decoding, that is, feeding the decoder a byte stream without explicit boundaries (the boundaries may be inline, as in H.264 with the NAL start code sequence 0x000001) and letting the decoder decide where the relevant start/end bytes of each decodable chunk are.

I think the real question is whether we serialize the encoding units that the encoder produces (the group of NALs in H.264, OBUs in AV1, or partitions in VP8) into a byte array (i.e. the byte stream format), or whether we just output an array of chunks so that the app can packetize it at will.

Note that what encoders typically provide is the latter; for example, in vp8 you encode the frame and the encoder then returns each partition:
https://github.com/webmproject/libvpx/blob/master/examples/simple_encoder.c#L124

  // Encode one raw frame (img) with the given flags and deadline.
  const vpx_codec_err_t res =
      vpx_codec_encode(codec, img, frame_index, 1, flags, VPX_DL_GOOD_QUALITY);
  if (res != VPX_CODEC_OK) die_codec(codec, "Failed to encode frame");

  // Drain the encoder: one encode call can yield multiple output packets.
  while ((pkt = vpx_codec_get_cx_data(codec, &iter)) != NULL) {
    got_pkts = 1;

    if (pkt->kind == VPX_CODEC_CX_FRAME_PKT) {
      // Each compressed-frame packet carries its own buffer, size and pts.
      const int keyframe = (pkt->data.frame.flags & VPX_FRAME_IS_KEY) != 0;
      if (!vpx_video_writer_write_frame(writer, pkt->data.frame.buf,
                                        pkt->data.frame.sz,
                                        pkt->data.frame.pts)) {
        die_codec(codec, "Failed to write compressed frame");
      }
      printf(keyframe ? "K" : ".");
      fflush(stdout);
    }
  }

x264 is the same, providing the array of NALs as the output of x264_encoder_encode.

There are pros and cons to doing it this way (which would also affect what we accept as input in the decoder).

The good part is that providing the individual encoding units (NALs/OBUs/partitions) makes it easier to convert them to any frame-based stream format (for example the H.264 Annex B format) and easier to do RTP packetization (otherwise you would typically have to parse the byte stream to find the NALs/OBUs and apply the packetization afterward).

The bad part is that it requires the serialization to be done on the app side before passing the data to the appropriate transport (WebRTC could be different, as the packetization should be done inside of it).
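
To make the serialization step concrete, here is a sketch of an assumed helper (not part of any API) that turns an array of H.264 NAL units into the Annex B byte-stream format by prefixing each unit with a start code:

  // Serialize NAL units to Annex B using the 4-byte start code 0x00000001.
  function toAnnexB(nals: Uint8Array[]): Uint8Array {
    const startCode = new Uint8Array([0, 0, 0, 1]);
    const total = nals.reduce((sum, nal) => sum + startCode.length + nal.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const nal of nals) {
      out.set(startCode, offset);
      offset += startCode.length;
      out.set(nal, offset);
      offset += nal.length;
    }
    return out;
  }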

murillo128 commented on July 16, 2024

Also, as a side note, SVC codecs (like VP9) produce several "frames" per input video frame, so it would not be easy to produce a single chunk from the encoder.

Tauka commented on July 16, 2024

Hello! For H.264, does it mean we have to group NAL units by ourselves before creating an EncodedVideoChunk? Currently I am receiving individual NAL units from an RTSP stream and trying to figure out the correct way to decode them via the WebCodecs API. I would appreciate any help.

sandersdan commented on July 16, 2024

That's correct.

If your source is not framed then you will need to identify access unit boundaries. If your source includes AUD (Access Unit Delimiter) units then that's quite easy (break right before each AUD). It's also relatively easy if you know there is only one slice per frame and no redundant or auxiliary slices (break after each slice). Beyond that you'll probably want to read the H.264 spec.
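
A minimal sketch of the easy AUD case, assuming each incoming Uint8Array is a single NAL unit without a start code (as RTP/RTSP depacketization typically delivers them):

  // Group NAL units into access units, breaking right before each AUD
  // (nal_unit_type 9, the low 5 bits of the first NAL header byte).
  function* accessUnits(nals: Iterable<Uint8Array>): Generator<Uint8Array[]> {
    let current: Uint8Array[] = [];
    for (const nal of nals) {
      const nalType = nal[0] & 0x1f;
      if (nalType === 9 && current.length > 0) {
        yield current;  // an AUD starts the next access unit
        current = [];
      }
      current.push(nal);
    }
    if (current.length > 0) yield current;
  }

Each yielded group can then be serialized (for example to Annex B) and passed to the decoder as a single EncodedVideoChunk.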

sandersdan commented on July 16, 2024

Note for VP9 spatial SVC: My current understanding is that the several frames should in fact be separate frames, but they have the same timestamp. There is an asymmetry here; for encoding you should only be passing in the highest-resolution version of each frame.

I expect our encoders will output multiple chunks (one for each resolution) but they will have the same timestamp.
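
If that's the shape, a muxer could group the encoder's output by the shared timestamp; a sketch, assuming the multiple-chunks-per-frame behavior described above:

  // Collect the spatial-layer chunks that belong to one input frame.
  const layersByTimestamp = new Map<number, EncodedVideoChunk[]>();
  function onOutputChunk(chunk: EncodedVideoChunk) {
    const group = layersByTimestamp.get(chunk.timestamp) ?? [];
    group.push(chunk);
    layersByTimestamp.set(chunk.timestamp, group);
  }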

I still need to do some research to figure out if it's technically valid to bundle them into a single chunk. (Presumably libvpx would already be bundling them like that if it's valid.)

chcunningham commented on July 16, 2024

To the core issue of slices/tiles vs 'meaningful' chunks: Chrome's longstanding behavior has been 'meaningful' chunks, and this has been demonstrated to work great for a variety of use cases (RTC, low-latency streaming, video editing, etc.). If slices/tiles is later desired, we should do this without breaking the API (e.g. specified as an option in VideoDecoderConfig, for which the default is 'meaningful' chunks). Hence I've marked the issue as 'extension'.
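
A sketch of what such an option could look like, entirely hypothetical (the member name is invented and does not exist in the spec):

  // Hypothetical opt-in granularity on the decoder config; the default
  // would preserve today's 'meaningful chunks' behavior.
  interface VideoDecoderConfigWithGranularity extends VideoDecoderConfig {
    chunkGranularity?: "frame" | "slice";  // invented; default "frame"
  }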

Having said that, we've had no real demand for this from users and I vote to just close the issue until demand arrives. @sandersdan WDYT?

> If your source is not framed then you will need to identify access unit boundaries. If your source includes AUD (Access Unit Delimiter) units then that's quite easy (break right before each AUD). It's also relatively easy if you know there is only one slice per frame and no redundant or auxiliary slices (break after each slice). Beyond that you'll probably want to read the H.264 spec.

The codec registry should document this. Work tracked in #155.

> I still need to do some research to figure out if it's technically valid to bundle them into a single chunk. (Presumably libvpx would already be bundling them like that if it's valid.)

We discussed this more with SVC folks and learned that separate chunks is how it's done.

sandersdan commented on July 16, 2024

Closing is acceptable to me. Even if there is demand, breaking a stream into chunks may fit better in a containers API anyway.
