Can anyone provide a code sample to extract all the frames from a video? I'm not


Comments (7)

J7a4s0m5ine commented on June 10, 2024 2

Thanks for replying.
Looks like I'm not smart enough ☺️ to understand this fully

Sorry, I didn't mean it in that manner. When I first approached video streams, and encoding/decoding, it all looked alien. My point was more that if you understand the underlying streams you'll be able to understand what's going on with this mess of media coding and containerization. It's a fairly complex topic, and as our needs of bandwidth savings increase, it will continue to become more complex. But let's forget about that for now and I'll get back on topic:

Finding frames - your original question

Finding the H.264 NALUs (loosely, the frames) is essentially a search through an array buffer for a pattern, in this case the three-byte or four-byte start code. Strictly speaking a NALU is not always a whole frame, since one frame can span several NAL units, but the splitting works the same way. If you don't find the pattern in the current chunk of stream data, you append that chunk to an intermediary buffer and continue to read the source stream. This is one pattern to achieve frame parsing. The separator can also differ depending on the H.264 stream format (Annex B uses start codes, AVCC uses length prefixes); see the links and information below.

https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L291
this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers

@soliton4 's code linked is a great example parsing a stream for NALus

1. Loop the incoming data stream to find a start sequence (0x00, 0x00, 0x00, 0x01)

To extrapolate their code against what I said above, they are getting the data from the media source and looping through that array (https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L305) to find the start sequence. When no start code is found, they append the data to the temp buffer and continue to read from the stream.

I'm oversimplifying here a bit. Their code is multithreaded, possibly using web workers, and there's a little more going on than what I alluded to. There's also some logic to handle the case where nothing currently exists in the temp buffer and we find a NALu, which would mean it's either the first frame, or we already sent the previous frame to the decoder by the time the start sequence was found. But for all intents and purposes I can simplify the explanation a bit.

I've added some comments for clarity and explanation

  // foundHit is shared with hit() (shown below), which sets it to true
  var b = 0;
  var l = data.length;     // get length of the incoming data
  var zeroCnt = 0;         // number of contiguous zero bytes seen so far
  for (b; b < l; ++b) {
    if (data[b] === 0) {
      zeroCnt++;
    } else {
      if (data[b] === 1 && zeroCnt >= 3) {   // at least 3 contiguous zeros followed by a 1: start code found!
        hit(b - 3);     // we send the offset of the start code to the "hit" function so it can process the current temp buffer and combine the frame data
        break;
      }
      zeroCnt = 0;
    }
  }
  if (!foundHit) {
    this.bufferAr.push(data);    // No start code was found, continue pushing data to temp buffer
  }
}   // closes the enclosing function, which is not shown here

2. So a start code was found while we were looping

When a start code is found, we note the exact position where it occurs (the offset in the data buffer) and create a subarray with everything leading up to that offset. Everything before the offset is part of the previous frame, so it is concatenated with the existing temp buffer (bufferAr) and sent to the decoder as a whole frame. The temp buffer is then cleared, and everything from the offset onward in the original stream buffer is pushed to the temp buffer to start the loop over again.

var hit = function(offset){
  foundHit = true;

  // pass subarray at the offset where the start code was found
  self.bufferAr.push(data.subarray(0, offset));
  // concat the two arrays  and push to the decoder
  self.decode( concatUint8(self.bufferAr) );
  // clear the temp buffer
  self.bufferAr = [];                                            
  // Push the second portion of the sliced array to the temp buffer
  self.bufferAr.push(data.subarray(offset));          
};
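
To tie the two snippets together, here is a minimal self-contained sketch of the same idea. It is not soliton4's actual code: NalSplitter, indexOfStartCode and flush are hypothetical names, concatUint8 is my own stand-in for the helper referenced above, and instead of keeping an array of chunks it re-joins the leftover bytes with each new chunk so that a start code split across two chunks is still found. It only looks for the four-byte start code.

```javascript
// Hypothetical stand-in for the concatUint8 helper referenced above:
// joins an array of Uint8Array chunks into a single Uint8Array.
function concatUint8(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const c of chunks) {
    out.set(c, pos);
    pos += c.length;
  }
  return out;
}

// Offset of the first 00 00 00 01 at or after `from`, or -1 if none.
function indexOfStartCode(data, from) {
  for (let i = from; i + 3 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 0 && data[i + 3] === 1) {
      return i;
    }
  }
  return -1;
}

class NalSplitter {
  constructor(onNal) {
    this.onNal = onNal;                // called with each complete NAL unit
    this.pending = new Uint8Array(0);  // bytes received but not yet emitted
  }

  // Emit `piece` only if it really begins with a start code.
  emit(piece) {
    if (indexOfStartCode(piece, 0) === 0) this.onNal(piece);
  }

  push(chunk) {
    let data = concatUint8([this.pending, chunk]);
    // Search from 1 so the start code at the front of `data` is skipped.
    let next = indexOfStartCode(data, 1);
    while (next !== -1) {
      this.emit(data.subarray(0, next)); // everything before the next start code
      data = data.subarray(next);
      next = indexOfStartCode(data, 1);
    }
    this.pending = data;                 // incomplete tail, wait for more data
  }

  // Call at end of stream to emit whatever is left.
  flush() {
    this.emit(this.pending);
    this.pending = new Uint8Array(0);
  }
}

const nals = [];
const splitter = new NalSplitter((nal) => nals.push(nal));
splitter.push(new Uint8Array([0, 0, 0, 1, 0x67, 0x42, 0, 0])); // ends in the middle of a start code
splitter.push(new Uint8Array([0, 1, 0x68, 0xce]));
splitter.flush();
console.log(nals.map((n) => Array.from(n)));
```

Re-joining and rescanning the leftover bytes on every push is simple but wasteful for large NAL units; remembering the last scanned position would avoid that.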

Other implementations that might be helpful to see

@OllieJones has a library (https://github.com/OllieJones/h264-interp-utils#nalustream) that has a bunch of H264 functionality, including searching arrays/streams for frames. Take a look at that repo as a whole, and definitely read the README. I linked to a specific portion of the README because it explains the nuances of H264 streams and how they sometimes use different formats for frame separators.

In Ollie's repo they are converting from one media format (webm) to another media container format (mp4) by extracting raw NALus (videoframes + extra data + stream info) from webm and "boxing" those NALUs in mp4's container format.

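Another implementation in Java (https://github.com/twilightdema/h264j/blob/3dd2cc2e65e653ecbba247ed95a0bff901c98007/h264j/src/main/java/com/twilight/h264/player/H264Player.java)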

This is a port of the original FFMPEG code back in 2012

I'm using this example because it's a completely different thought pattern on how a frame parser could be architected. They're heavily using bit shifting while looking for the start sequence.

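The frame parsing in this example starts at this try block (https://github.com/twilightdema/h264j/blob/3dd2cc2e65e653ecbba247ed95a0bff901c98007/h264j/src/main/java/com/twilight/h264/player/H264Player.java#L137-244). You can see they start reading in the file on L139 and walk back and forth through several while loops to do the decoding, using isEndOfFrame as the decision point and bit shifting to find the sequence.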

private boolean isEndOfFrame(int code) {
	int nal = code & 0x1F;

	if (nal == NAL_AUD) {
		foundFrameStart = false;
		return true;
	}

	boolean foundFrame = foundFrameStart;
	if (nal == NAL_SLICE || nal == NAL_IDR_SLICE) {
		if (foundFrameStart) {
			return true;
		}
		foundFrameStart = true;
	} else {
		foundFrameStart = false;
	}

	return foundFrame;
}
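
For comparison with the JavaScript above, here is a rough sketch of how that decision function could be used to group already-split NAL units into frames. groupIntoFrames is a hypothetical name of mine, not something from the Java project; the NAL type values (1, 5, 9) come from the H.264 spec, and the units are assumed to have their start codes stripped so that the first byte is the NAL header.

```javascript
// NAL unit types (low five bits of the NAL header byte).
const NAL_SLICE = 1;      // non-IDR slice
const NAL_IDR_SLICE = 5;  // IDR (keyframe) slice
const NAL_AUD = 9;        // access unit delimiter

// Groups NAL units (start codes already stripped) into frames.
function groupIntoFrames(nalUnits) {
  const frames = [];
  let current = [];
  let foundFrameStart = false;

  // Same decision logic as the Java isEndOfFrame above:
  // returns true when this NAL unit begins a new frame.
  function isEndOfFrame(headerByte) {
    const nal = headerByte & 0x1f;
    if (nal === NAL_AUD) {
      foundFrameStart = false;
      return true;
    }
    const foundFrame = foundFrameStart;
    if (nal === NAL_SLICE || nal === NAL_IDR_SLICE) {
      if (foundFrameStart) return true;
      foundFrameStart = true;
    } else {
      foundFrameStart = false;
    }
    return foundFrame;
  }

  for (const unit of nalUnits) {
    // isEndOfFrame is evaluated first, so its state is updated for every unit.
    if (isEndOfFrame(unit[0]) && current.length > 0) {
      frames.push(current);
      current = [];
    }
    current.push(unit);
  }
  if (current.length > 0) frames.push(current);
  return frames;
}

// Header bytes: SPS, PPS, IDR slice, slice, slice, AUD, slice
const headers = [0x67, 0x68, 0x65, 0x41, 0x41, 0x09, 0x41];
const frames = groupIntoFrames(headers.map((h) => new Uint8Array([h, 0x00])));
console.log(frames.map((f) => f.length)); // [3, 1, 1, 2]
```

Like the Java logic it mirrors, this treats every slice as its own frame, so a picture encoded as several slices would be split apart.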

Lastly a project that was inspired by Broadway (https://github.com/oneam/h264bsd)

It has wasm, ios, c++ and java h264 decoder variants plus some extra goodies

Decent Wikipedia/articles/documentation

I have to cut this short for now and step away from the computer for a bit. If you have more questions, feel free to ask. Here's some reading to catch you up on H264 formats and the like. There's more to decoding than identifying the frames. For example, depending upon the decoder, you may need to "prime" the input buffer with a sequence of SPS + PPS + I-frame so the decoder can initialize itself and determine the video size.

@soliton4 I'm not sure if this is true for this decoder. I used it a long time ago and heavily modified it for a specific purpose. I don't even have that code anymore to reference.
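
As a rough illustration of the "priming" idea, the sketch below drops NAL units until it reaches an SPS that is followed by a PPS and an IDR slice, so the decoder's first input is something it can initialize from. Whether Broadway needs this is not confirmed here; dropUntilDecodable and nalType are hypothetical helper names, the type values (1, 5, 7, 8) are from the H.264 spec, and the units are assumed to have their start codes stripped.

```javascript
// NAL unit types (low five bits of the NAL header byte).
const NAL_SLICE = 1;  // non-IDR slice
const NAL_IDR = 5;    // IDR (keyframe) slice
const NAL_SPS = 7;    // sequence parameter set
const NAL_PPS = 8;    // picture parameter set

function nalType(unit) {
  return unit[0] & 0x1f;
}

// Returns the units starting at the first SPS that is followed by a PPS
// and then an IDR slice, or an empty array if no such point exists yet.
function dropUntilDecodable(nalUnits) {
  for (let i = 0; i < nalUnits.length; i++) {
    if (nalType(nalUnits[i]) !== NAL_SPS) continue;
    let sawPps = false;
    for (let j = i + 1; j < nalUnits.length; j++) {
      const t = nalType(nalUnits[j]);
      if (t === NAL_PPS) {
        sawPps = true;
      } else if (t === NAL_IDR && sawPps) {
        return nalUnits.slice(i);
      } else if (t === NAL_SLICE) {
        break; // a non-IDR slice came first, so try the next SPS
      }
    }
  }
  return [];
}

// Two stray slices arrive before the SPS/PPS/IDR sequence.
const received = [0x41, 0x41, 0x67, 0x68, 0x65, 0x41].map(
  (h) => new Uint8Array([h, 0x00])
);
const decodable = dropUntilDecodable(received);
console.log(decodable.map(nalType)); // [7, 8, 5, 1]
```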

Anyways here are some resources on H264 video codec and MP4 containers:

https://stackoverflow.com/a/24890903 - This is an amazing write up on the H264 formats (Annex B vs AVCC), how they store information, and how they differ.

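Bitmovin's ultimate guide to container formats (https://3411032.fs1.hubspotusercontent-na1.net/hubfs/3411032/Bitmovin_UltimateGuidetoContainerFormats_Whitepaper.pdf) - not specifically about H264 but there's great info about media container formats
ITU-T H264 Profile Spec PDF (https://www.itu.int/rec/T-REC-H.264-202108-I/en/wp_h264_31669_en_0803_lo.pdf)
H264 profile list (https://en.wikipedia.org/wiki/Advanced_Video_Coding#Profiles)
Video Codecs - Mozilla MDN (https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Video_codecs)
Media Container Formats - Mozilla MDN (https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Containers)
Network Abstraction Layer (NALu) (https://en.wikipedia.org/wiki/)
Parameter Sets (SPS/PPS) (https://en.wikipedia.org/wiki/Network_Abstraction_Layer#Parameter_Sets)
Getting frames from an RTSP source explained (https://stackoverflow.com/a/7668578)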

Very very simple frame parser implementation (https://stackoverflow.com/a/74040912)

const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]);
function findStartFrame(buffer, i = -1) {
    while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
        if ((buffer[i + 4] & 0x1F) === 7) return i
    }
    return -1
}

from broadway.

soliton4 commented on June 10, 2024 1

https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L291

this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers
it is best to feed complete nals to the decoder, however i believe the latest version is doing nal splitting internaly


J7a4s0m5ine commented on June 10, 2024 1

thats the most detailed answer ever. is there an oscaars of the thread replies? cause u r nominated
…

Haha, this is one of those fields that's difficult to understand. If I can help some poor soul along I will.

@AnilSonix No problem, and good luck!


J7a4s0m5ine commented on June 10, 2024

This would be fairly easy if you understand what an H.264 stream "looks like" and its format/structure.

See this SO question.

0x000001 or 0x00000001, is placed at the beginning of each NAL unit.

To extract the frames you would read the stream until you find the beginning of the next NAL unit. So you have a start byte where you identified the end or start of a frame, let's say it's byte 256. You then continue reading the stream until you find the next 0x000001 or 0x00000001, which signifies the beginning of the next frame. Let's say this header is found at byte 512. You now know there is a fully encapsulated frame between bytes 256 and 512 in the stream, and the next frame starts at byte 512.

From this point it's all data and memory management on where and how you want to save the extracted frames.
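
Here is a minimal sketch of that idea for a buffer that is already fully in memory. findStartCodes and splitNalUnits are hypothetical names, and the sketch assumes an Annex B style stream where NAL units are separated by 0x000001 or 0x00000001.

```javascript
// Find every start code (00 00 01 or 00 00 00 01) in a Uint8Array.
// Returns the offset of each start code and its length in bytes.
function findStartCodes(data) {
  const hits = [];
  let i = 0;
  while (i + 2 < data.length) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      // A zero byte right before 00 00 01 makes it the four-byte form.
      const fourByte = i > 0 && data[i - 1] === 0;
      hits.push(fourByte ? { offset: i - 1, length: 4 } : { offset: i, length: 3 });
      i += 3;
    } else {
      i += 1;
    }
  }
  return hits;
}

// Split a complete buffer into NAL units, with the start codes stripped.
function splitNalUnits(data) {
  const codes = findStartCodes(data);
  return codes.map((code, n) => {
    const begin = code.offset + code.length;
    // A unit ends where the next start code begins, or at the end of the data.
    const end = n + 1 < codes.length ? codes[n + 1].offset : data.length;
    return data.subarray(begin, end);
  });
}

const stream = new Uint8Array([
  0, 0, 0, 1, 0x67, 0xaa,     // four-byte start code + 2 payload bytes
  0, 0, 1, 0x68, 0xbb, 0xcc,  // three-byte start code + 3 payload bytes
]);
const units = splitNalUnits(stream);
console.log(units.map((u) => u.length)); // [2, 3]
```

subarray returns views into the original buffer rather than copies, so copy a unit (for example with slice) if the source buffer will be reused.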


AnilSonix commented on June 10, 2024

Thanks for replying.
Looks like I'm not smart enough ☺️ to understand this fully. Could you point me to where I can get started with video processing, codecs, etc. in general? This is very new to me.


soliton4 commented on June 10, 2024

thats the most detailed answer ever. is there an oscaars of the thread replies? cause u r nominated


AnilSonix commented on June 10, 2024

Thanks for detailed answer.
I will check this out to learn and understand better.


How do I extract frames from a video about broadway HOT 7 CLOSED
