dholroyd / h264-reader
Rust reader for H264 bitstream syntax
License: Apache License 2.0
When I try to use code similar to this, I get the following error on every stream I tried it on.
Lines 71 to 105 in 945b404
[..] on an `Err` value: ScalingMatrix(ReaderError(ReaderErrorFor("delta_scale", Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })))', [...]
I think this is caused by wrong constants here.
Lines 158 to 162 in 945b404
The specification I have says:
for( i = 0; i < 6 +
( ( chroma_format_idc != 3 ) ? 2 : 6 ) * transform_8x8_mode_flag;
i++ ) {
Error goes away with these values.
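The loop bound quoted above can be written out as a small helper. This is only a sketch of the arithmetic; the function name is made up for illustration and is not part of h264-reader.

```rust
/// Number of scaling lists signalled in a PPS, following the loop bound
/// quoted from the specification:
/// 6 + ((chroma_format_idc != 3) ? 2 : 6) * transform_8x8_mode_flag
/// (hypothetical helper, not an h264-reader API).
fn pps_scaling_list_count(chroma_format_idc: u8, transform_8x8_mode_flag: bool) -> usize {
    // Extra 8x8 lists are only present when transform_8x8_mode_flag is set.
    let extra_8x8_lists = if chroma_format_idc != 3 { 2 } else { 6 };
    6 + extra_8x8_lists * usize::from(transform_8x8_mode_flag)
}
```

With 4:2:0 chroma (chroma_format_idc 1) this gives 6 lists without the 8x8 transform and 8 with it; with 4:4:4 (chroma_format_idc 3) and the 8x8 transform it gives 12.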
Hi there!
Is there any reason for most of the fields of the SliceHeader struct being private?
I am trying to use this crate to do some picture order calculations based on the 14496-10 spec.
In order to do that I need to access fields in the SliceHeader struct, so I was wondering if they could be made pub for access outside the crate? Or is there a reason why they are private at the moment?
Firstly, thanks for putting together a tool that can dig deep into H.264 SEI data. I'm excited to have a potential tool that can view at least the headers in the SEI ITU-T-T35 data, since tools such as fq, as great as they are, don't navigate the ITU-T-T35 headers, whereas h264-reader looks like it can dig in a little deeper. h264bitstream does not seem to dig down to the SEI.
However, do you have a quick couple-of-liner quickstart for h264-reader for those less familiar with rust packages vs binaries?
$ brew install rust # Using macOS brew to install rust/cargo
$ cargo list # Test cargo
$ cargo version # Test cargo
$ cargo install h264-reader # This seems to download the crate
error: there is nothing to install in `h264-reader v0.6.0`, because it has no binaries
`cargo install` is only for installing programs, and can't be used with libraries.
To use a library crate, add it as a dependency in a Cargo project instead.
I'm guessing that in order to use h264-reader, I need to start a rust project and use the library etc. This is all fine for developers, but rust is a little out of my skillset.
My goal is to pass infile.264 to h264-reader by issuing commands at a terminal:
$ h264-reader-app infile.264 -stdout
Or ideally pipe the raw stream out of FFmpeg to h264-reader:
$ ffmpeg -i infile.ts -map 0:v:0 -codec:v copy -f h264 pipe:1 | h264-reader-app - -stdout
How would a user (not developer) go about using the tool in a terminal? Is it necessary to start a project and wrap that up into an app, or is there a simple way for a user to expose the h264-reader library as a command-line tool or library?
Once again, thanks for putting together the SEI-parser side of things. fq is pretty good, but it doesn't get down into the ITU-T-35. h264-reader looks promising.
I appreciate that this is more of a rust question rather than a h264-reader question, but a quickstart in the README.md would be super helpful for non-rust-types.
What are your feelings on additional convenience methods / methods that interpret the raw fields? I find the H.264 spec rather dense and want to get a few "simple" things without studying it in detail. I imagine other folks will be in the same boat.
Here are some I'd like to have and why I think it's better for the crate to compute them than for callers to do so (badly):
- rfc6381_codec. I see that your lowly crate calculates this here. My moonfire-nvr calculates it here. Seems a little obscure/annoying.
- SeqParameterSet::pixel_dimensions(): Looks like it's specified in terms of macroblocks and map units. I'm unsure if you're supposed to apply cropping parameters to support frames that aren't a multiple of those things.
- TimingInfo: max frame rate (or maybe min time scale units per frame?) I'd have expected this to be super straightforward but I got confused about mentions of interlaced video and frame doubling/tripling, references to a variety of equations with abbreviated names, etc.
In the new AVCC module (thank you! super helpful), an SPS with an emulation prevention byte doesn't parse. e.g., this panics:
#[test]
fn hikvision() {
    // From a Hikvision 2CD2032-I.
    let avcc_data = hex!("014d401e ffe10017 674d401e 9a660a0f
                          ff350101 01400000 fa000003 01f40101
                          000468ee 3c80");
    let avcc = AvcDecoderConfigurationRecord::try_from(&avcc_data[..]).unwrap();
    let sps_data = avcc.sequence_parameter_sets().next().unwrap().unwrap();
    dbg!(sps_data);
    let ctx = avcc.create_context(()).unwrap();
    let sps = ctx.sps_by_id(ParamSetId::from_u32(0).unwrap())
        .expect("missing sps");
}
I'm not sure how you'd like this to work. Is ParamSetIter supposed to yield an encoded NAL unit or RBSP? If the former, Avcc::create_context should pass it through the RBSP decoder before calling SeqParameterSet::from_bytes. If the latter, ParamSetIter should pass it through the RBSP decoder (and I suppose needs to return a Vec or Cow<[u8]> or some such).
It seems like a lot has changed since 0.5.0. Is there any chance of getting a new release? (If there are well-defined tasks that need to be done first, I'd be happy to help.)
Currently, a slice_group_map_type of 6 calls read_group_ids() which will use pic_size_in_map_units_minus1 as the bit-length to read each run_length_minus1 value, up to num_slice_groups times.
The pic_size_in_map_units_minus1 and num_slice_groups should be swapped to match the spec:
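As I read the slice_group_map_type 6 branch of the PPS syntax in H.264 (7.3.2.2 and 7.4.2.2), there is one slice_group_id per map unit, and each one is Ceil(Log2(num_slice_groups_minus1 + 1)) bits long. A minimal sketch of that reading follows; the function names and the read_bits callback are assumptions for illustration and do not mirror the crate's read_group_ids().

```rust
/// Bit length of each slice_group_id[i]:
/// Ceil(Log2(num_slice_groups_minus1 + 1)).
fn slice_group_id_bits(num_slice_groups_minus1: u32) -> u32 {
    // For m >= 0, ceil(log2(m + 1)) equals the number of significant bits in m.
    32 - num_slice_groups_minus1.leading_zeros()
}

/// Reads pic_size_in_map_units_minus1 + 1 ids, each `slice_group_id_bits` wide.
/// `read_bits` is a hypothetical stand-in for a bit reader's "read n bits" call.
/// A real parser should bound pic_size_in_map_units_minus1 before collecting.
fn read_slice_group_ids<E>(
    pic_size_in_map_units_minus1: u32,
    num_slice_groups_minus1: u32,
    mut read_bits: impl FnMut(u32) -> Result<u32, E>,
) -> Result<Vec<u32>, E> {
    let bits = slice_group_id_bits(num_slice_groups_minus1);
    (0..=pic_size_in_map_units_minus1)
        .map(|_| read_bits(bits))
        .collect()
}
```

In other words, the picture size controls how many values are read and the slice group count controls how wide each value is.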
I saw you had a fuzz directory, which is pretty neat! I've been meaning to try fuzzing but never have.
I ran it, and it crashed almost immediately with this input: [0, 0, 1, 221]. Given how quickly it crashed, I suspect you've already seen this, but I thought I'd file the bug anyway.
NalSwitch's NalReader::push implementation calls NalHeader::new. The latter is fallible (and failed). The former is infallible, so it calls unwrap, which of course crashes.
Should NalReader::push be fallible to handle this properly?
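The byte 221 (0xDD) has its top bit set, which H.264 defines as forbidden_zero_bit, so that is presumably why header parsing failed. The sketch below shows the general shape of propagating such an error instead of unwrapping. All types here are hypothetical stand-ins, not the crate's NalHeader or NalReader.

```rust
#[derive(Debug, PartialEq)]
enum NalHeaderError {
    ForbiddenZeroBit,
}

/// Stand-in for a one-byte NAL header (hypothetical, not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct NalHeader(u8);

impl NalHeader {
    fn new(byte: u8) -> Result<Self, NalHeaderError> {
        // forbidden_zero_bit is the most significant bit and must be 0.
        if byte & 0x80 != 0 {
            Err(NalHeaderError::ForbiddenZeroBit)
        } else {
            Ok(NalHeader(byte))
        }
    }

    fn nal_ref_idc(self) -> u8 {
        (self.0 >> 5) & 0x3
    }

    fn nal_unit_type(self) -> u8 {
        self.0 & 0x1f
    }
}

/// A push-like entry point that reports a bad header to the caller
/// rather than calling unwrap. An empty NAL yields Ok(None).
fn handle_nal(nal: &[u8]) -> Result<Option<NalHeader>, NalHeaderError> {
    match nal.first() {
        None => Ok(None),
        Some(&byte) => NalHeader::new(byte).map(Some),
    }
}
```

Whether the caller should drop the bad NAL or abort the whole stream is a separate design choice that this sketch leaves open.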
We don't assert that cpb_cnt_minus1 is within reasonable limits, which means that HrdParameters::read_cpb_specs() may attempt to allocate a buffer of unlimited size, and run out of memory if the bitstream contains unreasonable values.
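A minimal sketch of validating the count before allocating, assuming the limit of 31 that Annex E gives for cpb_cnt_minus1. The error type and function name are invented for illustration and are not the crate's API.

```rust
/// Upper bound for cpb_cnt_minus1 (range 0 to 31 inclusive per Annex E,
/// as I read it).
const MAX_CPB_CNT_MINUS1: u32 = 31;

#[derive(Debug, PartialEq)]
enum HrdError {
    CpbCntOutOfRange(u32),
}

/// Returns how many CPB specs to read, or an error if the bitstream value
/// is out of range. Checking first keeps the later allocation bounded.
fn cpb_spec_count(cpb_cnt_minus1: u32) -> Result<usize, HrdError> {
    if cpb_cnt_minus1 > MAX_CPB_CNT_MINUS1 {
        return Err(HrdError::CpbCntOutOfRange(cpb_cnt_minus1));
    }
    Ok(cpb_cnt_minus1 as usize + 1)
}
```

A reader would then size its buffer from the checked value, for example with Vec::with_capacity(cpb_spec_count(value)?), so that a hostile value can request at most 32 entries.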
Found via fuzz testing. Failing input,
0x12,0x51,0x9,0x9c,0x69,0x2,0x0,0x0,0x0,0xf9,0xdf,0xf7,0x8,0x0,0x34,0x34,0x40,0x0,0x3b,0xff,0xff,0x0,0x0,0x1,0x47,0xff,0xff,0x8c,0xd8,0xd8,0x34,0x34,0x74,0x80,0x0,0xb0,0x15,0x0,0xb2,0x0,0x92,0xd9,0xd8,0xd8,0x34,0x34,0x74,0x0,0x0,0x0,0x1,0x28,0x34,0x15,0x15,0x0,0x0,0xd8,0x7,0x0,0x1,0x1,0x7a,0x6a,0x14,0x0,0x1,0x0,0x0,0x40,0x0,0x3b,0xff,0xff,0x0,0x0,0x1,0x47,0xff,0xff,0xff,0xff,0xff,0x8c,0xd8,0xd8,0x34,0x34,0x74,0x0,0x0,0x0,0x15,0x0,0x32,0x0,0x92,0xd9,0xd8,0xd8,0x34,0x34,0x74,0x0,0x0,0x7,0x0,0x1,0x1,0x7a,0x6a,0x14,0x0,0x1,0x0,0x0,0x1,0xff,0x0,0x6,0x0,
The problem
The call to PicParameterSetExtra::read is commented-out/broken.
Line 252 in aa5bb36
Lines 173 to 176 in aa5bb36
I think I understand the problem now. Before I changed it, has_more_rbsp_data did the following:
Lines 287 to 289 in c96301c
(position and total_size are in bits. The current version is harder to read but similar in effect.)
This isn't right: it reports whether there are more bits left in the RBSP, which sounds reasonable, but the spec says it should mean there are more non-trailing bits (Rec. ITU-T H.264 (03/2010) section 7.2):
more_rbsp_data( ) is specified as follows.
- If there is no more data in the RBSP, the return value of more_rbsp_data( ) is equal to FALSE.
- Otherwise, the RBSP data is searched for the last (least significant, right-most) bit equal to 1 that is present in the RBSP. Given the position of this bit, which is the first bit (rbsp_stop_one_bit) of the rbsp_trailing_bits( ) syntax structure, the following applies.
  - If there is more data in an RBSP before the rbsp_trailing_bits( ) syntax structure, the return value of more_rbsp_data( ) is equal to TRUE.
  - Otherwise, the return value of more_rbsp_data( ) is equal to FALSE.
The method for enabling determination of whether there is more data in the RBSP is specified by the application (or in Annex B for applications that use the byte stream format).
For most NALs, the trailing bits are guaranteed to be only in the last byte in the RBSP. For slice NALs, there are sometimes cabac_zero_words afterward. (See rbsp_slice_trailing_bits, section 7.3.2.10.) In either case, it should stop at the last one bit of the RBSP.
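For a completely buffered RBSP (emulation prevention bytes already removed), the rule quoted above can be sketched by searching backward for the last one bit. This is an illustration on plain byte slices with made-up function names; it is not how the crate's has_more_rbsp_data is implemented.

```rust
/// Bit index (counted from the start of `rbsp`, most significant bit first)
/// of the last bit equal to 1, i.e. the rbsp_stop_one_bit.
/// Returns None if the slice is empty or all zero.
fn stop_bit_position(rbsp: &[u8]) -> Option<usize> {
    // Skip trailing zero bytes (including any cabac_zero_words).
    let (idx, byte) = rbsp.iter().enumerate().rev().find(|(_, b)| **b != 0)?;
    // trailing_zeros counts the zero bits below the lowest set bit.
    Some(idx * 8 + 7 - byte.trailing_zeros() as usize)
}

/// True if there is data before the rbsp_trailing_bits at `bit_pos`.
fn more_rbsp_data(rbsp: &[u8], bit_pos: usize) -> bool {
    match stop_bit_position(rbsp) {
        Some(stop) => bit_pos < stop,
        None => false,
    }
}
```

For the bytes [0xA0, 0x80, 0x00, 0x00] the stop bit is at bit index 8, so positions 0 through 7 report more data and position 8 does not, regardless of the trailing zero bytes.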
Better behavior
- has_more_rbsp_data should return the correct result on a completely-buffered non-slice NAL. Ideally it always returns the correct result, including WouldBlock in ambiguous cases on partially-buffered NALs.
- BitStreamReader functions could be aware of the trailing bits and return EOF on hitting them, or (when parsing a partially-buffered NAL) WouldBlock on the last one bit we know of. This might help catch bugs in parsing at least. But it doesn't seem straightforward to implement (see below).
- A finish method which errors if it's not positioned exactly at the trailing bits.
Implementation ideas
Unfortunately bitstream_io::BitRead doesn't have a way to peek at the queued unaligned bits non-destructively or, when unaligned, to access the underlying reader (to check its position). Those might be reasonable feature requests, or we could use bitstream_io::BitQueue directly.
I don't see a straightforward, efficient way to just check there's still a one bit left in the stream via readahead. bitstream_io::BitQueue only supports storing 32 bits, and I think BitReader might be using all of that already. edit: no, I got confused. It supports any numeric type (up to u128). We could probably do this by using it directly but it might require re-implementing more of BitReader than I was hoping for. edit 2: no, but anyway this would require unbounded lookahead, as the RBSP can have several zero bytes in a row. That's no good.
We could look backward in the RBSP to find the trailing bits position (as the spec mentions in 7.4.1.1: "identification of the end of the SODB within the NAL unit by searching the RBSP for the rbsp_stop_one_bit starting at the end of the RBSP"). It'd mean adjusting my interface proposal in #4 because getting the NAL bytes via a BufStream doesn't allow this. And looking backward is slightly messy on slice NALs because there can be an emulation prevention three byte in the CABAC words (see the note on H.264 7.4.2.10) and those don't count. (But we only really care about the headers of slice NALs anyway, not consuming those NALs fully, so maybe it doesn't matter.) But it's at least possible to find the right bit in the NAL bit stream. And I think we could map that into rbsp::RbspBitReader calls doing the right thing if bitstream_io would just let us access the reader (to check if we've hit that NAL byte position in question) and count the number of queued bits.
The finish method seems relatively straightforward to do without even any changes to bitstream_io::BitRead's interface. There's a BitRead::byte_aligned; we can get the bits until then and make sure the first is one and the rest are zero. Then we can use BitReader::into_reader to ensure the remaining RBSP bytes (if any) are zero.
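The finish check described above can be sketched without bitstream_io at all, working on an RBSP byte slice and a current bit position. The function and error names are assumptions for illustration; a real implementation would sit on top of the crate's bit reader.

```rust
#[derive(Debug, PartialEq)]
enum FinishError {
    /// The reader is already past the end of the RBSP.
    UnexpectedEnd,
    /// The next bit is not the rbsp_stop_one_bit.
    MissingStopBit,
    /// Something other than zero bits follows the stop bit.
    NonZeroTrailingBits,
}

/// Succeeds only if `bit_pos` points exactly at the rbsp_stop_one_bit:
/// a one bit, then zeros up to the byte boundary, then only zero bytes.
fn finish(rbsp: &[u8], bit_pos: usize) -> Result<(), FinishError> {
    let byte_idx = bit_pos / 8;
    let bit_in_byte = bit_pos % 8;
    let current = *rbsp.get(byte_idx).ok_or(FinishError::UnexpectedEnd)?;
    // Discard the bits already consumed so the next unread bit is the MSB.
    let remaining = current << bit_in_byte;
    if remaining & 0x80 == 0 {
        return Err(FinishError::MissingStopBit);
    }
    if remaining & 0x7f != 0 {
        return Err(FinishError::NonZeroTrailingBits);
    }
    // Any bytes after the stop bit's byte must be zero.
    if rbsp[byte_idx + 1..].iter().any(|&b| b != 0) {
        return Err(FinishError::NonZeroTrailingBits);
    }
    Ok(())
}
```

For the single byte 0xA0 (bits 1010 0000), finishing at bit 2 succeeds, while finishing at bit 0 or bit 1 reports that the parser stopped too early.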
The VUI bitstream restriction syntax elements are not bounds checked: https://github.com/dholroyd/h264-reader/blob/master/src/nal/sps.rs#L803-L812
Expected bounds, according to Annex E.2.1 of the spec:
- max_bytes_per_pic_denom: [0, 16]
- max_bits_per_mb_denom: [0, 16]
- log2_max_mv_length_horizontal: [0, 15]
- log2_max_mv_length_vertical: [0, 15]
- max_num_reorder_frames: [0, max_dec_frame_buffering]
- max_dec_frame_buffering: [max_num_reorder_frames, MaxDpbFrames] where MaxDpbFrames is set by the Level
The SPS data is extracted from a real h264 file.
#[test]
fn test_parse_sps() {
    let raw_sps = vec![
        // 0x67_u8,
        0x64, 0x00, 0x1e, 0xac, 0xd9, 0x40, 0xa0, 0x2f,
        0xf9, 0x70, 0x11, 0x00, 0x00, 0x03, 0x00, 0x01,
        0x00, 0x00, 0x03, 0x00, 0x32, 0x0f, 0x16, 0x2d,
        0x96,
    ];
    let breader: rbsp::BitReader<&[u8]> = rbsp::BitReader::new(&raw_sps[..]);
    SeqParameterSet::from_bits(breader).unwrap();
}
I'm not sure if I'm doing something wrong or not (not a video expert). However, when trying to extract the h264 stream from the 1MB 1080p mp4 from here and loading it up it seems to give different results from the h264-bitstream-viewer. Specifically in the SPS VUI section.
The h264 bitstream viewer tool gives this output:
While my Rust test app seems to give this output:
[src\main.rs:18] &sps.len() = 27
[src\main.rs:23] sps = Ok(
    SeqParameterSet {
        profile_idc: ProfileIdc(
            100,
        ),
        constraint_flags: ConstraintFlags {
            flag0: false,
            flag1: false,
            flag2: false,
            flag3: false,
            flag4: false,
            flag5: false,
            reserved_zero_two_bits: 0,
        },
        level_idc: 42,
        seq_parameter_set_id: ParamSetId(
            0,
        ),
        chroma_info: ChromaInfo {
            chroma_format: YUV420,
            bit_depth_luma_minus8: 0,
            bit_depth_chroma_minus8: 0,
            qpprime_y_zero_transform_bypass_flag: false,
            scaling_matrix: SeqScalingMatrix,
        },
        log2_max_frame_num_minus4: 0,
        pic_order_cnt: TypeZero {
            log2_max_pic_order_cnt_lsb_minus4: 3,
        },
        max_num_ref_frames: 4,
        gaps_in_frame_num_value_allowed_flag: false,
        pic_width_in_mbs_minus1: 119,
        pic_height_in_map_units_minus1: 67,
        frame_mbs_flags: Frames,
        direct_8x8_inference_flag: true,
        frame_cropping: Some(
            FrameCropping {
                left_offset: 0,
                right_offset: 0,
                top_offset: 0,
                bottom_offset: 4,
            },
        ),
        vui_parameters: Some(
            VuiParameters {
                aspect_ratio_info: Some(
                    Ratio1_1,
                ),
                overscan_appropriate: Unspecified,
                video_signal_type: None,
                chroma_loc_info: None,
                timing_info: Some(
                    TimingInfo {
                        num_units_in_tick: 768,
                        time_scale: 16777219,
                        fixed_frame_rate_flag: false,
                    },
                ),
                nal_hrd_parameters: None,
                vcl_hrd_parameters: None,
                low_delay_hrd_flag: None,
                pic_struct_present_flag: false,
                bitstream_restrictions: None,
            },
        ),
    },
)
Excuse the large screenshots and data dump, but notice how num_units_in_tick is 768 in the h264-reader case while it is expected to be 1.
I've attached the extracted h264 bitstream to this issue as a zip file (Big_Buck_Bunny_1080_10s_1MB.zip), though it could also be extracted using:
ffmpeg.exe -i C:\Users\Jasper\Downloads\Big_Buck_Bunny_1080_10s_1MB.mp4 -vcodec copy -vbsf h264_mp4toannexb -an the_bitstream.h264
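One plausible explanation, not confirmed in this thread, is that the SPS bytes were parsed without first removing emulation prevention bytes. The values fit: 768 is 0x00000300 and 16777219 is 0x01000003, which is what a 32-bit read produces when an escaped 00 00 03 00 01 ... sequence is taken literally. The sketch below uses only the standard library, and the byte values are illustrative rather than taken from the attached file (in particular, the time_scale of 60 is made up).

```rust
/// Removes emulation prevention bytes: every 0x03 that follows two 0x00 bytes.
/// (Hypothetical helper; the crate has its own RBSP decoding.)
fn strip_emulation_prevention(nal_payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(nal_payload.len());
    let mut zeros = 0usize;
    for &b in nal_payload {
        if zeros >= 2 && b == 0x03 {
            // Drop the escape byte and restart the zero run.
            zeros = 0;
            continue;
        }
        zeros = if b == 0 { zeros + 1 } else { 0 };
        out.push(b);
    }
    out
}

/// Big-endian u32 from the first four bytes of `bytes`.
fn be_u32(bytes: &[u8]) -> u32 {
    u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Illustrative escaped encoding of num_units_in_tick = 1 followed by
/// time_scale = 60, assuming both fields happen to be byte-aligned.
const ESCAPED: [u8; 10] = [0x00, 0x00, 0x03, 0x00, 0x01, 0x00, 0x00, 0x03, 0x00, 0x3c];
```

Reading ESCAPED directly gives 768 and 16777219 for the two fields, matching the dump above; reading strip_emulation_prevention(&ESCAPED) gives 1 and 60.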
Many pieces of functionality currently exposed by this crate for decoding layers of the H264 spec expose an API which...
...all with the goal of allowing the implementation to avoid copying data.
This is useful for performance reasons in some workloads, but the API is really complicated and a bit hard to use.
It would be nice to provide alternative APIs that work much more simply when,
These simpler APIs would exist in addition to the performant-but-inconvenient interfaces. The implementations should share as much code as possible (very likely, the 'simple' API could be implemented as a higher-level abstraction on top of the complicated one).
Lots of h264 syntax, as supported by this crate right now, is the same as h265. i.e. overall structure using Network Abstraction Layer Units.
Work out a good way of sharing code between h264 and h265 parsing, either by making this crate itself more generic, or by breaking common code into a separate crate.
(I don't yet know the differences between h264 and h265 well enough to know exactly where the line between generic and specific is drawn.)
A missing bounds check in pps.pic_init_qs_minus26 can cause an overflow when calculating qs_y.
Here is an input with pps.pic_init_qs_minus26 set to -285 and slice_qs_delta set to -2147483645:
vec![0x00, 0x00, 0x00, 0x01, 0x67, 0x64, 0x00, 0x0B, 0xAC,
0xD9, 0x42, 0x4D, 0xF8, 0x84, 0x00, 0x00, 0x00, 0x01,
0x68, 0xEB, 0xE8, 0x02, 0x3B, 0x2C, 0x8B, 0x00, 0x00,
0x01, 0x65, 0x96, 0x10, 0x00, 0x64, 0x00, 0x00, 0x03,
0x00, 0x03, 0xFF, 0xFF, 0xFF, 0xEF, 0xFF, 0xD2, 0x88,
0x4D, 0x64, 0x00, 0x23, 0xA0, 0x2B, 0xF7, 0xE3, 0x9A,
0x89, 0xE0, 0x00, 0x00, 0x00, 0x01, 0x41, 0x9A, 0x21,
0x6C, 0x41, 0x97, 0x2E, 0xB0];
The syntax element pps.pic_init_qs_minus26 should be in the range [-26, 25].
Found with H26Forge.
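A minimal sketch of the range check described above, combined with checked arithmetic for the sum. It assumes the spec's formula QSY = 26 + pic_init_qs_minus26 + slice_qs_delta; the error type and function names are invented for illustration and are not the crate's API.

```rust
#[derive(Debug, PartialEq)]
enum QsError {
    PicInitQsOutOfRange(i32),
    QsYOverflow,
}

/// Rejects pic_init_qs_minus26 values outside [-26, 25].
fn check_pic_init_qs_minus26(value: i32) -> Result<i32, QsError> {
    if (-26..=25).contains(&value) {
        Ok(value)
    } else {
        Err(QsError::PicInitQsOutOfRange(value))
    }
}

/// Computes 26 + pic_init_qs_minus26 + slice_qs_delta without wrapping.
fn qs_y(pic_init_qs_minus26: i32, slice_qs_delta: i32) -> Result<i32, QsError> {
    let init = check_pic_init_qs_minus26(pic_init_qs_minus26)?;
    26i32
        .checked_add(init)
        .and_then(|sum| sum.checked_add(slice_qs_delta))
        .ok_or(QsError::QsYOverflow)
}
```

With the values from the failing input, qs_y(-285, -2147483645) is rejected at the range check; even with an in-range PPS value, an extreme slice_qs_delta is reported as an overflow instead of panicking or wrapping.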
Please add an example for the reader.
Could you please prepare a new release? I'd like to publish an initial version of my RTSP crate to crates.io, and it depends on various things added since h264-reader 0.4.0 a year ago. Thanks!
The pic_parameter_set_id is a ParamSetId struct, which checks if the ID is within [0, 31]: https://github.com/dholroyd/h264-reader/blob/master/src/nal/pps.rs#L218
Valid bitstreams can have pic_parameter_set_ids in the range [0, 255].
I tried using your lib to parse some H.264 stream. Let's say I have these NALs:
And I have a function which splits them into NAL units like so
let h264_data = include_bytes!("videos/multi_512x512.h264");
// each `nal` will be [ 00 00 01 xx xx xx ... ]
for nal in nal_units(h264_data) { }
I'd now like to get all encoded SPS / PPS / IDR / ... information bits, for example like so:
let sps = SeqParameterSet::from_bits(bits)?;
dbg!(sps.log2_max_frame_num_minus4);
Would it be possible to add some absolute MVP examples to the project (and / or post some code here) that outline what the preferred way of getting this data is? I made a few attempts with AnnexBReader::accumulate, but was struggling to get an example to run without it panicking. For example, depending on what code I actually run I would get
called `Result::unwrap()` on an `Err` value: UnknownSeqParamSetId(ParamSetId(0))
thread 'f' panicked at 'called `Result::unwrap()` on an `Err` value: UnknownSeqParamSetId(ParamSetId(0))', tests\parse_nal.rs:32:93
or
called `Result::unwrap()` on an `Err` value: RbspReaderError(ReaderErrorFor("finish", Custom { kind: WouldBlock, error: "reached end of partially-buffered NAL" }))
thread 'f' panicked at 'called `Result::unwrap()` on an `Err` value: RbspReaderError(ReaderErrorFor("finish", Custom { kind: WouldBlock, error: "reached end of partially-buffered NAL" }))', tests\parse_nal.rs:19:60
stack backtrace:
but don't quite understand why. For reference, here's an example I used:
#[test]
fn f() {
    let h264_data = include_bytes!("videos/multi_512x512.h264");
    let mut reader = AnnexBReader::accumulate(|nal: RefNal<'_>| {
        let context = Context::new();
        let nal_unit_type = nal.header().unwrap().nal_unit_type();
        let bits = nal.rbsp_bits();
        match nal_unit_type {
            UnitType::SeqParameterSet => {
                let sps = SeqParameterSet::from_bits(bits).unwrap();
                dbg!(sps.log2_max_frame_num_minus4);
            }
            _ => {} // _ => NalInterest::Ignore,
        }
        NalInterest::Ignore
    });
    for nal in nal_units(h264_data) {
        reader.push(nal);
    }
}