Giter VIP home page Giter VIP logo

Comments (6)

dholroyd avatar dholroyd commented on July 29, 2024

Seems I've also left some dbg!() in my haste to push code I worked on a few months back. 😞

My preference is to have ParamSetIter yield data which is still encoded, and have the caller do the extra steps.

It's a shame that the RbspDecoder interface makes simple cases like this really complicated! I can't see an easy way to produce Cow<[u8]> via this interface either, although that would be lovely.

from h264-reader.

dholroyd avatar dholroyd commented on July 29, 2024

Commit 7ef5c74 is a bit unsatisfactory, but gets this case working; I'll probably not have time in the short term to work out how to refactor RbspDecoder to support this better. Sorry, it's a bit hairy!

I'd be happy to re-do these interfaces -- any suggestions would be welcome!

from h264-reader.

scottlamb avatar scottlamb commented on July 29, 2024

Thank you!

Seems I've also left some dbg!() in my haste to push code I worked on a few months back. 😞

Did you mean to fix that in 7ef5c74? It's still there.

I'd be happy to re-do these interfaces -- any suggestions would be welcome!

fwiw, the "simple cases" are all my software sees right now. It currently gets all its H.264 data from ffmpeg, which always gives it a complete "extra data" (sps + pps) or frame. So my own, not-terribly-robust NAL splitting code is super easy to use—give it some data, it calls a closure once per NALU, no dealing with state between calls. So I haven't thought through the other cases nearly as much as you.

I want to replace the usage of ffmpeg here with a (to-be-determined/written) Rust rtsp crate, which may behave differently. So I may end up caring about the complicated cases as well. But I haven't given it much thought yet. Here are some ideas off the top of my head:

I don't think it's a problem RAM-wise to buffer a whole NALU at once. I assume the concern is CPU?

How much does the naive implementation that always copies to accumulate a whole NALU in a (reused) Vec cost? Maybe you already started with this and discarded it, but I'm catching up.

If that's too expensive, how often does a single NALU get split into two push calls in practice? Maybe the mux/"outer" parsers (RTP / .mp4 / MPEG-TS / Annex-B) could have a single method that pushes some bytes with an end flag. If the whole thing is in one call, it can pass the buffer through as-is; otherwise it can copy to an internal buffer.

The RBSP layer could always expect a complete NAL. Perhaps (if the outer parser's buffer stores only one NALU), it could take a mutable buffer and shift over all the non-"emulation prevention" bytes, sort of like Vec::retain does.

I also wonder if when doing this with network protocols and tokio and such, it might be better to use the bytes crate rather than plain &[u8] / Cow<[u8]> / Vec<u8>. But I haven't actually worked with tokio yet, so I don't really know.

Speaking of the bytes crate, one of its selling points is to "facilitate zero-copy network programming by allowing multiple Bytes objects to point to the same underlying memory." So maybe my "if the outer parser's buffer stores only one NALU" caveat could be removed by using it as a convenient way of starting with some subset of the buffer, then shrinking it a bit as the emulation prevention bytes are removed.

from h264-reader.

dholroyd avatar dholroyd commented on July 29, 2024

Now removed the dbg!() in a follow-up commit 😅

moonfire-nvr looks really cool, I will be taking a look!

My aim with the design (maybe not 100% realised yet!) was to avoid another portion of memory write+read workload at more-or-less the source video bitrate. It's true that, depending on the size of the 'lump' that need to be accumulated, the writes+reads might not get all the way through the cache hierarchy down to RAM, but wanting to avoid cpu 'back-end stalls' (I think that's the right term?) was the motivator. As you already know, those costs aren't a deal-breaker for lots of workloads.

I confess I've not really measured the result yet for this crate, but this is mirroring the approach in mpeg2ts-reader, which avoids accumulating the full PES-packets for the caller, and instead hands on the little fragments of PES packet from each TS-packet payload. Benchmark results vs ffmpeg appear good (but you should never trust benchmarks!).

It all pales into insignificance if the video actually has to be decoded of course! 😱

I sometimes use this code with higher-bitrate streams (40-50Mbps is common; 100Mbps is sometimes suggested). My hope is that when the app is just proxying / extracting metadata, the network will be the limiting factor, rather than CPU / cache / RAM.

So in the end, the complexity arises from my desire to parse the header from slice_*_rbsp() without having to accumulate those whole NALUs. For SPS/ PPS etc my code already accumulates the complete NALU -- I've no idea how I'd parse a partial one :)

It would be nice for this crate to support both a simple interface and a complex 'partial-input' one, with a common implementation. Not sure how yet.

I have looked briefly at bytes a while back but couldn't quite work out if it would solve my problems. I'll have to look again! I took the impression that bytes adds the cost of an Rc on taking sub-slices, so I'd want to measure the cost to my precious inner-loops 😄

from h264-reader.

dholroyd avatar dholroyd commented on July 29, 2024

I've raised #4 to track the desire to support better APIs for some of these things.

from h264-reader.

dholroyd avatar dholroyd commented on July 29, 2024

FWIW, I revised the fix for this problem in 2ce960b, since there was actually already a way mechanism to perform the required decode, but I'd forgotten about it!

from h264-reader.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.