Comments (14)
Discussed with @sandersdan; we lean toward won't-fix here. Byte streams still have discrete frame boundaries, so "chunks" can be identified and provided to the codec. The alternatives seem to make the API much more complicated for little benefit.
from webcodecs.
AV1 also uses a byte stream format. Are you saying that WebCodecs won't be able to support AV1??
from webcodecs.
A different way to say what @chcunningham is saying: if it can be packaged in an MP4, then we do not need a ReadableByteStreamController.
The terms are different between streams and media codecs:
- Codec bitstream: a serialization of encoded data as a sequence of bits.
- Codec bytestream: a byte-aligned bitstream.
- Streams bytestream: An undivided flow of bytes. (like a Unix pipe, TCP)
- Packetized: Bytes arrive in discrete, self-contained chunks. (like UDP)
There are unpacketized bitstreams (e.g. H.264 Annex B), but I am unaware of any that don't also have standard packetizations.
There is a spectrum of possible implementations in WebCodecs:
- Support bytestreams directly.
- Require bytestreams to be broken into arbitrary chunks.
- Require bytestreams to be broken into meaningful chunks.
- Require bytestreams to be broken into chunks that are exactly one sample.
I prefer the last one because it allows us to accept frame metadata (such as timestamp) alongside the bytestream chunks, but it's conceivable that there exists (or will exist) a format for which this doesn't make sense.
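For illustration, a minimal sketch of the chunk-per-sample shape. Plain JavaScript objects stand in for WebCodecs' EncodedVideoChunkInit dictionary; the field names mirror it, but nothing here touches a real decoder:

```javascript
// Sketch: one chunk per sample, carrying its own metadata. Plain objects
// stand in for EncodedVideoChunkInit; no real decoder is involved.
function makeChunk(type, timestampUs, durationUs, bytes) {
  return {
    type,                   // 'key' or 'delta'
    timestamp: timestampUs, // microseconds; travels with the sample
    duration: durationUs,
    data: Uint8Array.from(bytes),
  };
}

// A demuxer would hand the decoder exactly one access unit at a time:
const chunks = [
  makeChunk('key', 0, 33333, [0x00, 0x01]),
  makeChunk('delta', 33333, 33333, [0x02, 0x03]),
];
```

The point is that timestamp and duration ride along with each sample, which an undivided byte stream has no way to express.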
from webcodecs.
The AV1 bitstream is packetized as specified in the AV1 RTP payload specification. The AV1 bitstream format uses OBUs (similar to H.264 NAL units), including Temporal Delimiter (TD), Sequence Header (SH), Metadata (MD), Tile Group (TG), and Frame Header (FH) OBUs. As an example, the following bitstream:
TD SH MD MD(0,0) FH(0,0) TG(0,0) MD(0,1) FH(0,1) TG(0,1)
would typically be packetized as follows:
[ SH MD MD(0,0) FH(0,0) TG(0,0) ]
[ MD(0,1) FH(0,1) TG(0,1) ]
This seems like it might qualify as "arbitrary chunks" or "meaningful chunks", but probably not "chunks that are exactly one sample".
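To make "meaningful chunks" concrete for AV1, here is a sketch that groups a flat OBU sequence into temporal units, splitting at each Temporal Delimiter. OBUs are represented by their type names only; a real implementation would parse obu_type from each OBU header:

```javascript
// Sketch: split a flat sequence of OBUs into temporal units at each
// Temporal Delimiter (TD). The TD itself carries no payload and is dropped.
function splitTemporalUnits(obus) {
  const units = [];
  for (const obu of obus) {
    if (obu === 'TD' || units.length === 0) {
      units.push([]); // a TD (or the first OBU) starts a new temporal unit
    }
    if (obu !== 'TD') units[units.length - 1].push(obu);
  }
  return units;
}

splitTemporalUnits(['TD', 'SH', 'MD', 'FH', 'TG', 'TD', 'FH', 'TG']);
// → [['SH', 'MD', 'FH', 'TG'], ['FH', 'TG']]
```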
from webcodecs.
AV1 is also packetized in chunks that are exactly one sample in the ISO BMFF binding.
from webcodecs.
Also worth noting that a decision here could affect #13, and theoretical future video formats that support progressive decoding.
My gut instinct is that progressive decoding is for still images and should be a separate API, but I'd like to understand that design space better.
from webcodecs.
And one more note: for low-latency streams, it may be beneficial to submit slices/tiles individually as they arrive from the network, and the opposite for encoding. (So 'meaningful chunks'.)
If we support that, it's important to make sure we don't also make muxing harder for less latency-sensitive cases. A 'partial' flag for input and output chunks may be enough (and could be added in a v2).
from webcodecs.
I would be against byte-stream-style progressive decoding, that is, feeding the decoder a byte stream without explicit boundaries (the boundaries may be inline, as with the H.264 NAL start code sequence "001") and letting the decoder decide where the relevant start/end bytes of each decodable chunk are.
I think the real question is whether we serialize the encoding units that the encoder produces (the group of NALs in H.264, OBUs in AV1, or partitions in VP8) into a byte array (i.e., the byte stream format), or whether we just output an array of chunks so the app can packetize it at will.
Note that the latter is what encoders typically provide; for example, in VP8 you encode the frame and then retrieve each partition:
https://github.com/webmproject/libvpx/blob/master/examples/simple_encoder.c#L124
const vpx_codec_err_t res =
    vpx_codec_encode(codec, img, frame_index, 1, flags, VPX_DL_GOOD_QUALITY);
if (res != VPX_CODEC_OK) die_codec(codec, "Failed to encode frame");
while ((pkt = vpx_codec_get_cx_data(codec, &iter)) != NULL) {
  got_pkts = 1;
  if (pkt->kind == VPX_CODEC_CX_FRAME_PKT) {
    const int keyframe = (pkt->data.frame.flags & VPX_FRAME_IS_KEY) != 0;
    if (!vpx_video_writer_write_frame(writer, pkt->data.frame.buf,
                                      pkt->data.frame.sz,
                                      pkt->data.frame.pts)) {
      die_codec(codec, "Failed to write compressed frame");
    }
    printf(keyframe ? "K" : ".");
    fflush(stdout);
  }
}
x264 does the same, providing an array of NALs as the output of x264_encoder_encode.
There are pros and cons about doing it this way (which would also affect as what we accept as input in the decoder).
The good part is that providing the individual encoding units (NALs/OBUs/partitions) makes it easier to convert to any frame-based stream format (for example, H.264 Annex B) and easier to do RTP packetization (otherwise you would typically have to parse the byte stream to find the NALs/OBUs and apply the packetization afterward).
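As an illustration of the parsing that the byte-stream alternative forces on the app, here is a sketch that splits an H.264 Annex B stream back into NAL units by scanning for 00 00 01 / 00 00 00 01 start codes (a simplified scan; it does not handle emulation prevention bytes or validate NAL contents):

```javascript
// Sketch: recover individual NAL units from an H.264 Annex B byte stream
// by locating start codes. Simplified; real parsers must also deal with
// emulation prevention and malformed input.
function splitAnnexB(buf) {
  const nals = [];
  let start = -1;
  for (let i = 0; i + 2 < buf.length; i++) {
    if (buf[i] === 0 && buf[i + 1] === 0 && buf[i + 2] === 1) {
      if (start >= 0) {
        // Trim a trailing zero that belongs to a 4-byte start code.
        let end = i;
        if (end > start && buf[end - 1] === 0) end--;
        nals.push(buf.subarray(start, end));
      }
      start = i + 3; // payload begins after the 00 00 01
      i += 2;
    }
  }
  if (start >= 0) nals.push(buf.subarray(start));
  return nals;
}

const stream = Uint8Array.of(0, 0, 0, 1, 0x67, 0x42, 0, 0, 1, 0x65);
const nals = splitAnnexB(stream); // two NAL units: [0x67, 0x42] and [0x65]
```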
The bad part is that this requires the serialization to be done on the app side before passing the data to the appropriate transport (WebRTC could be different, as the packetization would be done inside it).
from webcodecs.
Also, as a side note, SVC codecs (like VP9) produce several "frames" per input video frame, so it would not be easy to produce a single chunk from the encoder.
from webcodecs.
Hello! For H.264, does this mean we have to group NAL units ourselves before creating an EncodedVideoChunk? Currently I am receiving individual NAL units from an RTSP stream and am trying to figure out the correct way to decode them via the WebCodecs API. I would appreciate any help.
from webcodecs.
That's correct.
If your source is not framed then you will need to identify access unit boundaries. If your source includes AUD (Access Unit Delimiter) units then that's quite easy (break right before each AUD). It's also relatively easy if you know there is only one slice per frame and no redundant or auxiliary slices (break after each slice). Beyond that you'll probably want to read the H.264 spec.
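A sketch of the "break right before each AUD" rule, assuming NAL units arrive one per element with the H.264 header byte first (nal_unit_type is the low 5 bits of that byte; AUD is type 9). Real streams also need the single-slice heuristics and spec reading mentioned above:

```javascript
// Sketch: group individually received H.264 NAL units into access units,
// starting a new access unit right before each AUD (nal_unit_type 9).
const AUD = 9;
function groupByAud(nals) {
  const accessUnits = [];
  for (const nal of nals) {
    const type = nal[0] & 0x1f; // nal_unit_type: low 5 bits of header byte
    if (type === AUD || accessUnits.length === 0) accessUnits.push([]);
    accessUnits[accessUnits.length - 1].push(nal);
  }
  return accessUnits;
}

const aus = groupByAud([
  Uint8Array.of(0x09, 0x10), // AUD
  Uint8Array.of(0x65),       // IDR slice
  Uint8Array.of(0x09, 0x30), // AUD
  Uint8Array.of(0x41),       // non-IDR slice
]);
// → two access units of two NAL units each
```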
from webcodecs.
Note for VP9 spatial SVC: My current understanding is that the several frames should in fact be separate frames, but they have the same timestamp. There is an asymmetry here; for encoding you should only be passing in the highest-resolution version of each frame.
I expect our encoders will output multiple chunks (one for each resolution) but they will have the same timestamp.
I still need to do some research to figure out if it's technically valid to bundle them into a single chunk. (Presumably libvpx is/would already be bundling them like that if it's valid.)
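On the consuming side, a sketch of regrouping per-layer output chunks by their shared timestamp (plain objects stand in for EncodedVideoChunk; the `layer` field is hypothetical, added only to make the example readable):

```javascript
// Sketch: an SVC encoder emits one chunk per spatial layer, all sharing a
// timestamp; a muxer or packetizer can regroup them per source frame.
function groupByTimestamp(chunks) {
  const byTs = new Map(); // Map preserves insertion order
  for (const c of chunks) {
    if (!byTs.has(c.timestamp)) byTs.set(c.timestamp, []);
    byTs.get(c.timestamp).push(c);
  }
  return [...byTs.values()];
}

const layers = groupByTimestamp([
  { timestamp: 0, layer: 0 }, { timestamp: 0, layer: 1 },
  { timestamp: 33333, layer: 0 }, { timestamp: 33333, layer: 1 },
]);
// layers[0] holds both spatial layers of the first frame
```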
from webcodecs.
To the core issue of slices/tiles vs. 'meaningful' chunks: Chrome's longstanding behavior has been 'meaningful' chunks, and this has been demonstrated to work well for a variety of use cases (RTC, low-latency streaming, video editing, etc.). If slices/tiles are later desired, we should add this without breaking the API (e.g., specified as an option in VideoDecoderConfig, with 'meaningful' chunks as the default). Hence I've marked the issue as 'extension'.
Having said that, we've had no real demand for this from users, and I vote to just close the issue until demand arrives. @sandersdan WDYT?
> If your source is not framed then you will need to identify access unit boundaries. If your source includes AUD (Access Unit Delimiter) units then that's quite easy (break right before each AUD). It's also relatively easy if you know there is only one slice per frame and no redundant or auxiliary slices (break after each slice). Beyond that you'll probably want to read the H.264 spec.
The codec registry should document this. Work tracked in #155.
> I still need to do some research to figure out if it's technically valid to bundle them into a single chunk. (Presumably libvpx is/would already be bundling them like that if it's valid.)

We discussed this more with SVC folks and learned that separate chunks is how it's done.
from webcodecs.
Closing is acceptable to me. Even if there is demand, breaking a stream into chunks may fit better in a containers API anyway.
from webcodecs.