
Comments (7)

J7a4s0m5ine commented on June 10, 2024

@AnilSonix

Thanks for replying.
Looks like I'm not smart enough ☺️ to understand this fully

Sorry, I didn't mean it in that manner. When I first approached video streams, and encoding/decoding, it all looked alien. My point was more that if you understand the underlying streams you'll be able to understand what's going on with this mess of media coding and containerization. It's a fairly complex topic, and as our needs of bandwidth savings increase, it will continue to become more complex. But let's forget about that for now and I'll get back on topic:

Finding frames - your original question

Finding the h264 NALUs (frames) is essentially searching through an array buffer for a pattern; in this case the pattern is the three-byte or four-byte start code. If you don't find the pattern in the current chunk of stream data, you append that chunk to an intermediary buffer and continue to read the source stream. This is one pattern to achieve frame parsing. There are times when the frame separator is different depending upon the h264 stream format as well (Annex B start codes vs. AVCC length prefixes); see the links and information below.
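To make the pattern search concrete, here is a minimal sketch that reports every start code in a single chunk. It assumes the chunk is a Uint8Array in Annex B format; `findStartCodes` is a name I made up for illustration, not something from Broadway or the linked repos.

```javascript
// Hypothetical helper: returns the offset and length of every
// Annex B start code (00 00 01 or 00 00 00 01) found in `data`.
function findStartCodes(data) {
  const hits = [];
  let i = 0;
  while (i + 2 < data.length) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      // A zero byte directly before 00 00 01 makes it the four-byte form.
      const fourByte = i > 0 && data[i - 1] === 0;
      hits.push({ offset: fourByte ? i - 1 : i, length: fourByte ? 4 : 3 });
      i += 3; // continue after the start code
    } else {
      i += 1;
    }
  }
  return hits;
}

const sample = new Uint8Array([0, 0, 0, 1, 0x67, 0xaa, 0, 0, 1, 0x68, 0xbb]);
// Two hits: offset 0 (length 4) and offset 6 (length 3).
console.log(findStartCodes(sample));
```

Note that this only looks inside one chunk. A start code that straddles two chunks would be missed unless you also keep the last few bytes of the previous chunk around.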

https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L291
this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers

@soliton4's linked code is a great example of parsing a stream for NALUs

1. Loop over the incoming data stream to find a start sequence (0x00, 0x00, 0x00, 0x01)

To map their code onto what I said above: they get the data from the media source and loop through that array to find the start sequence. When no start code is found, they append the data to the temp buffer and continue to read from the stream.

I'm oversimplifying here a bit; their code is multithreaded, possibly using web workers, and there's a little more going on than what I alluded to. There's also some logic to handle the case where nothing currently exists in the temp buffer and we find a NALU, which would mean it's either the first frame, or we already sent the previous frame to the decoder by the time the start sequence was found. But for all intents and purposes I can simplify the explanation a bit.

I've added some comments for clarity and explanation

  // `data` is the incoming chunk; `hit` and `foundHit` come from the enclosing function (see step 2)
  var l = data.length;     // get length of the incoming data
  var zeroCnt = 0;         // number of contiguous zero bytes seen so far
  for (var b = 0; b < l; ++b) {
    if (data[b] === 0) {
      zeroCnt++;
    } else {
      if (data[b] === 1 && zeroCnt >= 3) {   // at least 3 contiguous zeros followed by a 1 were found!
        hit(b - 3);     // we send the offset of the start code to the "hit" function so it can process the current temp buffer and combine the frame data
        break;
      }
      zeroCnt = 0;
    }
  }
  if (!foundHit) {
    this.bufferAr.push(data);    // No start code was found, continue pushing data to temp buffer
  }

2. A start code was found while we were looping

In the case a start code is found, we note the exact position where it occurs (the offset in the data buffer) and create a subarray with everything leading up to the offset. Everything before the offset is a part of the previous frame; it is concatenated with the existing temp buffer (bufferAr) and sent to the decoder as a whole frame. The temp buffer is then cleared, and everything from the offset onward in the original stream buffer is pushed to the temp buffer to start the loop process over again.

var hit = function(offset){
  foundHit = true;

  // push everything before the start code (the tail of the previous frame) to the temp buffer
  self.bufferAr.push(data.subarray(0, offset));
  // concat the buffered chunks and push the whole frame to the decoder
  self.decode( concatUint8(self.bufferAr) );
  // clear the temp buffer
  self.bufferAr = [];
  // push the rest of the data, beginning at the start code, to the temp buffer
  self.bufferAr.push(data.subarray(offset));
};
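The `concatUint8` helper called in `hit` isn't shown in the excerpt. I don't know exactly how it is written in that repo, but assuming it just joins a list of Uint8Array chunks into one contiguous array, a stand-in could look like this:

```javascript
// Hypothetical stand-in for `concatUint8` (the real helper is not shown above):
// joins a list of Uint8Array chunks into one contiguous Uint8Array.
function concatUint8(chunks) {
  let total = 0;
  for (const chunk of chunks) total += chunk.length;

  const out = new Uint8Array(total);
  let pos = 0;
  for (const chunk of chunks) {
    out.set(chunk, pos); // copy this chunk at the current write position
    pos += chunk.length;
  }
  return out;
}

const frame = concatUint8([
  new Uint8Array([0, 0, 0, 1]),
  new Uint8Array([0x65, 0x88]),
]);
console.log(frame.length); // 6
```

Allocating once and copying with `set` avoids building intermediate arrays for every buffered chunk.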

Other implementations that might be helpful to see

@OllieJones has a library that has a bunch of H264 functionality including searching arrays/streams for frames. Take a look at that repo as a whole, definitely read the README. I linked to a specific portion of their README because it explains the nuances with H264 streams and how they sometimes have different formats for frame separators.

In Ollie's repo they are converting from one media container format (webm) to another (mp4) by extracting raw NALUs (video frames + extra data + stream info) from webm and "boxing" those NALUs in mp4's container format.

Another implementation in Java

This is a port of the original FFmpeg code back in 2012

I'm using this example because it's a completely different thought pattern on how a frame parser could be architected. They're heavily using bit shifting while looking for the start sequence.

The frame parsing in this example starts at this try block. You can see they start reading in the file on L139 and walk back and forth through several while loops to do the decoding while using isEndOfFrame as a decision making point and bit shifting to find the sequence.

private boolean isEndOfFrame(int code) {
	int nal = code & 0x1F;

	if (nal == NAL_AUD) {
		foundFrameStart = false;
		return true;
	}

	boolean foundFrame = foundFrameStart;
	if (nal == NAL_SLICE || nal == NAL_IDR_SLICE) {
		if (foundFrameStart) {
			return true;
		}
		foundFrameStart = true;
	} else {
		foundFrameStart = false;
	}

	return foundFrame;
}
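The `code & 0x1F` in that method is pulling the NAL unit type out of the first byte after the start code. Here is a small JavaScript sketch of the same idea. The type numbers come from the H.264 spec; I'm assuming the Java constants (NAL_SLICE, NAL_IDR_SLICE, NAL_AUD) follow that numbering, and the function names are mine.

```javascript
// NAL unit types that come up in this thread (numbering from the H.264 spec).
const NAL_TYPES = {
  1: "non-IDR slice",
  5: "IDR slice",
  7: "SPS",
  8: "PPS",
  9: "access unit delimiter",
};

// The NAL unit type is the low five bits of the NAL header byte.
function nalType(headerByte) {
  return headerByte & 0x1f;
}

// Hypothetical helper that turns a header byte into a readable label.
function describeNal(headerByte) {
  const type = nalType(headerByte);
  return NAL_TYPES[type] || "other (" + type + ")";
}

console.log(describeNal(0x67)); // SPS
console.log(describeNal(0x65)); // IDR slice
```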

Lastly, a project that was inspired by Broadway

It has WASM, iOS, C++, and Java H264 decoder variants plus some extra goodies.

Decent Wikipedia/articles/documentation

I have to cut this short for now and step away from the computer for a bit; if you have more questions, feel free to ask. Here's some reading to catch you up on H264 formats and the like. There's more to decoding than identifying the frames. For example, depending upon the decoder, you need to "prime" the input buffer with a sequence of SPS + PPS + I-frame in order to initialize it so it can determine the video size.
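As a rough illustration of the priming idea, this sketch picks the first SPS, PPS and IDR slice out of a list of already split NAL units. It assumes each unit is a Uint8Array with the start code stripped, so the first byte is the NAL header. `pickPrimingUnits` is a made-up name, and whether a particular decoder really needs this is something you'd have to check for that decoder.

```javascript
// Hypothetical helper: returns [SPS, PPS, IDR] once all three have been seen
// in `nalUnits`, or null if any of them is still missing.
// Assumes each unit starts with its NAL header byte (no start code).
function pickPrimingUnits(nalUnits) {
  const SPS = 7, PPS = 8, IDR = 5;
  const found = {};
  for (const unit of nalUnits) {
    const type = unit[0] & 0x1f; // low five bits = NAL unit type
    if ((type === SPS || type === PPS || type === IDR) && !(type in found)) {
      found[type] = unit; // keep only the first unit of each type
    }
  }
  if (!(SPS in found && PPS in found && IDR in found)) return null;
  return [found[SPS], found[PPS], found[IDR]];
}

const units = [
  new Uint8Array([0x41, 0x01]), // non-IDR slice, ignored
  new Uint8Array([0x67, 0x02]), // SPS
  new Uint8Array([0x68, 0x03]), // PPS
  new Uint8Array([0x65, 0x04]), // IDR slice
];
console.log(pickPrimingUnits(units).map((u) => u[0] & 0x1f)); // [ 7, 8, 5 ]
```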

@soliton4 I'm not sure if this is true for this decoder. I used it a long time ago and heavily modified it for a specific purpose. I don't even have that code anymore to reference.


Anyway, here are some resources on the H264 video codec and MP4 containers:

https://stackoverflow.com/a/24890903 - This is an amazing write up on the H264 formats (Annex B vs AVCC), how they store information, and how they differ.

Very very simple frame parser implementation

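// Four-byte start code (the name `soi` is kept from the original snippet)
const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]);

// Returns the index of the first start code after position `i` that is followed by an SPS (NAL type 7), or -1 if there is none
function findStartFrame(buffer, i = -1) {
    while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
        if ((buffer[i + 4] & 0x1F) === 7) return i;
    }
    return -1;
}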

from broadway.

soliton4 commented on June 10, 2024

https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L291

this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers
it is best to feed complete nals to the decoder, however i believe the latest version is doing nal splitting internally

J7a4s0m5ine commented on June 10, 2024

thats the most detailed answer ever. is there an oscaars of the thread replies? cause u r nominated

Haha, this is one of those fields that's difficult to understand. If I can help some poor soul along I will.

@AnilSonix No problem, and good luck!

J7a4s0m5ine commented on June 10, 2024

This would be fairly easy if you understand what an h264 stream "looks like," and its format/structure.

See this SO question.

A start code, 0x000001 or 0x00000001, is placed at the beginning of each NAL unit.

To extract the frames you would read the stream until you find the beginning of the next NAL unit. So you have a start byte where you identified the end or start of a frame, let's say it's byte 256; you then continue reading the stream until you find the next 0x000001 or 0x00000001, which signifies the beginning of the next frame. Let's say this header is found at byte 512. You now know there is a fully encapsulated frame between bytes 256 and 512 in the stream, and the next frame starts at byte 512.

From this point it's all data and memory management on where and how you want to save the extracted frames.
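The "frame between bytes 256 and 512" idea can be sketched like this for a buffer you already hold completely in memory. It assumes Annex B data in a Uint8Array; `splitNalUnits` is an illustrative name and not an API from Broadway.

```javascript
// Hypothetical helper: splits a complete Annex B buffer into NAL units,
// with the start codes removed. Each unit is a view into `data` (no copy).
function splitNalUnits(data) {
  const codes = [];  // index where each start code begins
  const starts = []; // index of the first payload byte after each start code
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      // Count a directly preceding zero as part of a four-byte start code.
      codes.push(i > 0 && data[i - 1] === 0 ? i - 1 : i);
      starts.push(i + 3);
      i += 2; // together with i++ this moves past the start code
    }
  }
  const units = [];
  for (let n = 0; n < starts.length; n++) {
    // A unit ends where the next start code begins, or at the end of the data.
    const end = n + 1 < codes.length ? codes[n + 1] : data.length;
    units.push(data.subarray(starts[n], end));
  }
  return units;
}

const stream = new Uint8Array([0, 0, 0, 1, 0x67, 0xaa, 0, 0, 1, 0x68, 0xbb, 0, 0, 0, 1, 0x65]);
console.log(splitNalUnits(stream).map((u) => Array.from(u)));
// [ [ 103, 170 ], [ 104, 187 ], [ 101 ] ]
```

Because `subarray` returns views, copy the units (for example with `slice`) if the underlying buffer is going to be reused.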

AnilSonix commented on June 10, 2024

This would be fairly easy if you understand what an h264 stream "looks like," and its format/structure.

See this SO question.

A start code, 0x000001 or 0x00000001, is placed at the beginning of each NAL unit.

To extract the frames you would read the stream until you find the beginning of the next NAL unit. So you have a start byte where you identified the end or start of a frame, let's say it's byte 256; you then continue reading the stream until you find the next 0x000001 or 0x00000001, which signifies the beginning of the next frame. Let's say this header is found at byte 512. You now know there is a fully encapsulated frame between bytes 256 and 512 in the stream, and the next frame starts at byte 512.

From this point it's all data and memory management on where and how you want to save the extracted frames.

Thanks for replying.
Looks like I'm not smart enough ☺️ to understand this fully. Could you point me to where I can get started with video processing, codecs, etc. in general? This is very new to me.

soliton4 commented on June 10, 2024

AnilSonix commented on June 10, 2024

Thanks for the detailed answer.
I will check this out to learn and understand better.

