Comments (7)
Thanks for replying.
Looks like I'm not smart enough☺️ to understand this fully
Sorry, I didn't mean it in that manner. When I first approached video streams and encoding/decoding, it all looked alien. My point was more that if you understand the underlying streams, you'll be able to understand what's going on with this mess of media coding and containerization. It's a fairly complex topic, and as our need for bandwidth savings increases, it will continue to become more complex. But let's forget about that for now and I'll get back on topic:
Finding frames - your original question
Finding the h264 NALUs (frames) is essentially searching through an array buffer for a pattern; in this case the pattern is the three-byte or four-byte start code. If you don't find the pattern in the current stream data, you append that data to an intermediary buffer and continue reading the source stream. This is one pattern to achieve frame parsing. The frame separator can also differ depending on the h264 stream format (Annex B vs AVCC), so see the links and information below.
https://github.com/soliton4/nodeMirror/blob/cf7db884e61919f1efec92fb3601585c8e3c8f12/src/avc/Wgt.js#L291
this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers
@soliton4's linked code is a great example of parsing a stream for NALUs
1. Loop the incoming data stream to find a start sequence (0x00, 0x00, 0x00, 0x01)
To map their code onto what I said above: they get the data from the media source and loop through that array to find the start sequence. When no start code is found, they append the data to the temp buffer and continue to read from the stream.
I'm oversimplifying here a bit; their code is multithreaded, possibly using web workers, and there's a little more going on than what I alluded to. There's also some logic for the case where nothing currently exists in the temp buffer and we find a NALU, which would mean it's either the first frame, or we already sent the previous frame to the decoder by the time the start sequence was found. But for all intents and purposes I can simplify the explanation a bit.
I've added some comments for clarity and explanation
// Excerpt: foundHit, hit() and this.bufferAr come from the enclosing function (hit() is shown further down)
var b = 0;
var l = data.length; // length of the incoming data
var zeroCnt = 0;     // number of contiguous zero bytes seen so far
for (b; b < l; ++b){
  if (data[b] === 0){
    zeroCnt++;
  }else{
    if (data[b] === 1){
      if (zeroCnt >= 3){ // at least 3 contiguous zeros followed by a 1: a start code
        hit(b - 3); // pass the offset where the start code begins to "hit" so it can process the current temp buffer and combine the frame data
        break;
      };
    };
    zeroCnt = 0;
  };
};
if (!foundHit){
  this.bufferAr.push(data); // no start code was found, keep pushing data to the temp buffer
};
2. So a start code was found while we were looping
In the case a start code is found, we note the exact position where it occurs (the offset position in the data buffer) and create a subarray with everything leading up to the offset. Everything before the offset position is a part of the previous frame; it is concatenated with the existing temp buffer (bufferAr) and sent to the decoder as a whole frame. The temp buffer is then cleared, and everything from the offset onward in the original stream buffer is pushed to the temp buffer to start the loop process over again.
var hit = function(offset){
  foundHit = true;
  // pass subarray up to the offset where the start code was found
  self.bufferAr.push(data.subarray(0, offset));
  // concat the buffered arrays and push to the decoder
  self.decode( concatUint8(self.bufferAr) );
  // clear the temp buffer
  self.bufferAr = [];
  // push the second portion of the sliced array to the temp buffer
  self.bufferAr.push(data.subarray(offset));
};
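To tie the two steps together, here is a minimal standalone sketch of the same buffer-and-split idea. This is not @soliton4's actual code: `NalSplitter`, `push`, `flush` and this version of `concatUint8` are names I made up for illustration, and it assumes the stream only uses the 4-byte start code. Unlike the excerpt above, it re-scans the leftover bytes together with each new chunk, so it can handle more than one start code per chunk and a start code that straddles two chunks.

```javascript
// Hypothetical helper: join a list of Uint8Array chunks into one Uint8Array.
function concatUint8(chunks) {
  let total = 0;
  for (const c of chunks) total += c.length;
  const out = new Uint8Array(total);
  let pos = 0;
  for (const c of chunks) {
    out.set(c, pos);
    pos += c.length;
  }
  return out;
}

// Hypothetical splitter: feed it chunks with push(); onNal receives each
// completed unit, still prefixed with its 4-byte start code.
class NalSplitter {
  constructor(onNal) {
    this.onNal = onNal;
    this.pending = new Uint8Array(0); // bytes not yet emitted
  }

  push(chunk) {
    // Scan leftover + new bytes together so a start code that is split
    // across two chunks is still found.
    const data = concatUint8([this.pending, chunk]);
    let unitStart = 0;
    for (let i = 0; i + 3 < data.length; i++) {
      const isStartCode =
        data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 0 && data[i + 3] === 1;
      if (!isStartCode) continue;
      // Everything before this start code belongs to the previous unit.
      // Note: bytes that precede the very first start code are emitted too.
      if (i > unitStart) this.onNal(data.subarray(unitStart, i));
      unitStart = i;
      i += 3; // skip the rest of the start code
    }
    // Keep the unfinished tail for the next push().
    this.pending = data.slice(unitStart);
  }

  // Call once at end of stream to emit the last unit.
  flush() {
    if (this.pending.length > 0) this.onNal(this.pending);
    this.pending = new Uint8Array(0);
  }
}
```

You would call `push()` for every chunk that arrives and `flush()` when the stream ends. Re-concatenating the leftover bytes on every push is simple but not the most efficient choice for large units, so treat it as a starting point only.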
Other implementations that might be helpful to see
@OllieJones has a library with a bunch of H264 functionality, including searching arrays/streams for frames. Take a look at that repo as a whole, and definitely read the README. I linked to a specific portion of their README because it explains the nuances of H264 streams and how they sometimes have different formats for frame separators.
In Ollie's repo they are converting from one media format (webm) to another media container format (mp4) by extracting raw NALus (videoframes + extra data + stream info) from webm and "boxing" those NALUs in mp4's container format.
Another implementation in Java
This is a port of the original FFMPEG code back in 2012
I'm using this example because it's a completely different thought pattern on how a frame parser could be architected. They're heavily using bit shifting while looking for the start sequence.
The frame parsing in this example starts at this try block. You can see they start reading in the file on L139 and walk back and forth through several while loops to do the decoding, using isEndOfFrame as a decision-making point and bit shifting to find the sequence.
private boolean isEndOfFrame(int code) {
    int nal = code & 0x1F;
    if (nal == NAL_AUD) {
        foundFrameStart = false;
        return true;
    }
    boolean foundFrame = foundFrameStart;
    if (nal == NAL_SLICE || nal == NAL_IDR_SLICE) {
        if (foundFrameStart) {
            return true;
        }
        foundFrameStart = true;
    } else {
        foundFrameStart = false;
    }
    return foundFrame;
}
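The `code & 0x1F` line is the important part: the low 5 bits of the first byte after the start code hold the NAL unit type. Here is a small JavaScript sketch of just that step. The `nalType` helper is a name I made up, and the constants mirror the names in the Java snippet using the type numbers from the H.264 specification (1, 5, 7, 8, 9); the values in the Java project itself are not shown above, so treat the mapping as my assumption about that code.

```javascript
// NAL unit type numbers as defined by the H.264 specification.
const NAL_SLICE = 1;     // coded slice of a non-IDR picture
const NAL_IDR_SLICE = 5; // coded slice of an IDR (key frame) picture
const NAL_SPS = 7;       // sequence parameter set
const NAL_PPS = 8;       // picture parameter set
const NAL_AUD = 9;       // access unit delimiter

// Hypothetical helper: return the NAL unit type of a unit that begins with a
// 3-byte or 4-byte start code, or -1 if no start code/header byte is present.
function nalType(unit) {
  let i = 0;
  while (i < unit.length && unit[i] === 0) i++; // skip the leading zero bytes
  // A start code needs at least two zeros, then 0x01, then a header byte.
  if (i < 2 || unit[i] !== 1 || i + 1 >= unit.length) return -1;
  return unit[i + 1] & 0x1f; // low 5 bits of the NAL header byte
}
```

For example, a unit whose header byte is 0x67 is an SPS (0x67 & 0x1F is 7), and one whose header byte is 0x65 is an IDR slice.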
Lastly a project that was inspired by Broadway
It has wasm, ios, c++ and java h264 decoder variants plus some extra goodies
Decent Wikipedia/articles/documentation
I have to cut this short for now and step away from the computer for a bit. If you have more questions, feel free to ask. Here's some reading to catch you up on H264 formats and the like. There's more to decoding than identifying the frames. For example, depending upon the decoder, you need to "prime" the input buffer with a sequence of SPS+PPS+IFrame to initialize it so it can determine the video size.
@soliton4 I'm not sure if this is true for this decoder. I used it a long time ago and heavily modified it for a specific purpose. I don't even have that code anymore to reference.
Anyways here are some resources on H264 video codec and MP4 containers:
- https://stackoverflow.com/a/24890903 - This is an amazing write up on the H264 formats (Annex B vs AVCC), how they store information, and how they differ.
- Bitmovin's ultimate guide to container formats - not specifically about H264 but there's great info about media container formats
Very very simple frame parser implementation
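// 4-byte start code
const soi = Buffer.from([0x00, 0x00, 0x00, 0x01]);
// Returns the offset of the next start code after index i that is followed by
// a NAL unit of type 7 (SPS), or -1 if there is none
function findStartFrame(buffer, i = -1) {
  while ((i = buffer.indexOf(soi, i + 1)) !== -1) {
    // low 5 bits of the byte after the start code are the NAL unit type
    if ((buffer[i + 4] & 0x1F) === 7) return i
  }
  return -1
}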
from broadway.
this is an example where i decode a raw h264 stream and split it on the 0x00000001 markers
it is best to feed complete NALs to the decoder, however i believe the latest version is doing NAL splitting internally
that's the most detailed answer ever. is there an Oscars for thread replies? cause u r nominated
Haha, this is one of those fields that's difficult to understand. If I can help some poor soul along I will.
@AnilSonix No problem, and good luck!
This would be fairly easy if you understand what a h264 stream "looks like," and its format/structure.
A start code, 0x000001 or 0x00000001, is placed at the beginning of each NAL unit.
To extract the frames you would read the stream until you find the beginning of the next NAL unit. So you have a start byte where you identified the end or start of a frame, let's say it's byte 256. You then continue reading the stream until you find the next 0x000001 or 0x00000001, which signifies the beginning of the next frame. Let's say this header is found at byte 512. You now know there is a fully encapsulated frame between bytes 256 and 512 in the stream, and the next frame starts at byte 512.
From this point it's all data and memory management on where and how you want to save the extracted frames.
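As a rough sketch of the description above, assuming the whole stream is already in memory as a Uint8Array: `findStartCodes` and `splitNalUnits` are hypothetical names of mine, not functions from Broadway. The first records where each 3-byte or 4-byte start code begins, and the second cuts out the bytes between one start code and the next.

```javascript
// Hypothetical helper: list every start code in the buffer as
// { offset, length }, where length is 3 (0x000001) or 4 (0x00000001).
function findStartCodes(data) {
  const hits = [];
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      // A zero byte directly in front makes this the 4-byte form.
      const fourByte = i > 0 && data[i - 1] === 0;
      hits.push({ offset: fourByte ? i - 1 : i, length: fourByte ? 4 : 3 });
      i += 2; // continue after the 0x01 byte
    }
  }
  return hits;
}

// Hypothetical helper: return each NAL unit's bytes without its start code.
function splitNalUnits(data) {
  const hits = findStartCodes(data);
  return hits.map((hit, n) => {
    // A unit ends where the next start code begins, or at the end of the data.
    const end = n + 1 < hits.length ? hits[n + 1].offset : data.length;
    return data.subarray(hit.offset + hit.length, end);
  });
}
```

With the numbers from the example, a start code found at byte 256 and the next one at byte 512 would show up as two entries with `offset` 256 and 512, and the first unit would be the bytes in between. `subarray` returns views into the original buffer rather than copies, which is one possible answer to the memory management question; copy the bytes if the source buffer is going to be reused.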
Thanks for detailed answer.
I will check this out to learn and understand better.