fzwoch commented on May 31, 2024

Audio/video sync is always done via timestamps. It is quite hard to find decent documentation on what OBS expects for timestamp handling. It could be that it just expects a stream starting from 0.

I actually only ever tested its encoding functionality - never investigated the sync. The code block in question is here:

	packet->pts = GST_BUFFER_PTS(buffer);
	packet->dts = GST_BUFFER_DTS(buffer);

	// this is a bit wonky?
	packet->pts /=
		GST_SECOND / (packet->timebase_den / packet->timebase_num);
	packet->dts /=
		GST_SECOND / (packet->timebase_den / packet->timebase_num);

As you can see.. I almost anticipated issues here :-)
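
To make the concern concrete, here is a minimal standalone sketch (plain C, no GStreamer or OBS headers) that compares the arithmetic above with a rounded multiply-before-divide conversion. The function names and NS_PER_SECOND are made up for illustration; NS_PER_SECOND only mirrors the value of GST_SECOND. It assumes non-negative timestamps and positive timebase values.

```c
#include <stdint.h>

/* Nanoseconds per second; same value as GST_SECOND. */
#define NS_PER_SECOND INT64_C(1000000000)

/* Mirrors the arithmetic of the snippet above: the inner integer
 * division den / num truncates, e.g. 30000 / 1001 becomes 29. */
static int64_t ns_to_ticks_truncating(int64_t ns, int32_t tb_num, int32_t tb_den)
{
	return ns / (NS_PER_SECOND / (tb_den / tb_num));
}

/* Computes round(ns * den / (num * 1e9)). The quotient and remainder
 * are scaled separately so the intermediate products stay small for
 * typical video timebases. Assumes ns >= 0. */
static int64_t ns_to_ticks_rounded(int64_t ns, int32_t tb_num, int32_t tb_den)
{
	uint64_t a = (uint64_t)ns;
	uint64_t b = (uint64_t)tb_den;
	uint64_t c = (uint64_t)tb_num * (uint64_t)NS_PER_SECOND;

	return (int64_t)((a / c) * b + ((a % c) * b + c / 2) / c);
}
```

With a 1/30 timebase both functions map 1999999980 ns to 60. With a 1001/30000 timebase, frame 60 sits at 2002000000 ns; the truncating version yields 58 while the rounded one yields 60. Inside the plugin, GStreamer's gst_util_uint64_scale_round() would probably be the more natural way to do the same scaling.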

So one would have to experiment a bit with these. Like:

  • What is the first timestamp?
  • What happens if it is hard-coded to 0?
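
Both experiments could be done with a small helper along these lines. This is only a sketch with hypothetical names (nothing here is part of obs-gstreamer or OBS): it remembers the first timestamp it sees, so it can be logged, and rebases every timestamp so the stream starts at 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state; zero-initialize before use. */
struct ts_rebase {
	bool have_first;
	int64_t first; /* first timestamp seen, in the caller's unit */
};

/* Returns ts relative to the first timestamp seen, so the first
 * call always returns 0. */
static int64_t ts_rebase_apply(struct ts_rebase *r, int64_t ts)
{
	if (!r->have_first) {
		r->first = ts;
		r->have_first = true;
	}
	return ts - r->first;
}
```

If something like this were tried, PTS and DTS would presumably need the same offset, so one shared state for both (seeded from the first DTS) seems safer than two independent ones.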

from obs-gstreamer.

pgwipeout commented on May 31, 2024

So this doesn't appear to be the issue.
I've confirmed the issue is exactly 1 second with an audio sync test video.

I added prints for both PTS and DTS during encoding, and they always match.

H264: Profile = 66, Level = 40 
info: Encoder PTS '1999999980'
info: Encoder DTS '1999999980'
info: Encoder STEP
info: Encoder PTS '60'
info: Encoder DTS '60'
info: Encoder CYCLE

Messing with these values does little to help the audio latency before the video becomes unusable.
I've added a buffer into the stream and all it does is delay the whole package.

I think the issue is actually the latency it takes for audio conversion to happen.

info: libfdk_aac encoder created
info: libfdk_aac bitrate: 128, channels: 2
info: [rtmp stream: 'adv_stream'] Connecting to RTMP URL1
info: [rtmp stream: 'adv_stream'] Connection to rtmps:
info: ==== Streaming Start ===============================================
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
H264: Profile = 66, Level = 40 
info: Encoder PTS '1999999980'

I think we need to expose gstreamer as an audio encoder as well, similar to how ffmpeg does.
That way we can control the audio and video streams directly.

fzwoch commented on May 31, 2024

Not exactly sure what you are printing there. At least PTS and DTS are expected to be the same if you are not using B-frames.

I would be worried about the timestamps not being monotonically increasing:

info: Encoder PTS '1999999980'
info: Encoder DTS '1999999980'
info: Encoder STEP
info: Encoder PTS '60'
info: Encoder DTS '60'

I would double check which timestamps go into the encoder, which ones come out, and which are written as timestamps into the encoder_packet.
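
A check like that could look roughly as follows. It is a sketch with a made-up function name; the idea is to collect the timestamps of one side (input or output) into an array and look for the first value that does not increase.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first timestamp that is not greater than
 * its predecessor, or -1 if the sequence is strictly increasing. */
static ptrdiff_t first_non_increasing(const int64_t *ts, size_t count)
{
	for (size_t i = 1; i < count; i++) {
		if (ts[i] <= ts[i - 1])
			return (ptrdiff_t)i;
	}
	return -1;
}
```

Read as one sequence, the values printed above (1999999980 followed by 60) would be flagged at index 1. If they come from two different sides of the encoder, each side should be checked on its own.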

E.g. on the input side it is merely a frame counter, and on the output side as well (at least for my 30 fps test case). For GStreamer they just get converted to the GStreamer timescale.

So on the input side I have something like:

1
2
3

and on the output side:

108000001
108000002
108000003

The output side carries an offset (1000 hours) due to frame ordering, which is an internal detail. I briefly looked into how OBS handles the encoded data: it seems to save the first received video timestamp and use it as an offset. That seems to be fine with my current implementation. I guess the same is done for audio. So the most likely cause of sync problems is that one of the channels gets more or older data handed to either encoder.

I think the issue is actually the latency it takes for audio conversion to happen.

That does not make much sense. Audio and video processing times are very different when compared to each other. Video should always have a lot higher latency than audio. Also, audio and video are always completely independent from each other. Bringing them into relation and keeping them in sync always involves timestamps.

I tried a test with https://www.youtube.com/watch?v=TjAa0wOe5k4 and the Linux VA-API encoder on a Radeon GPU. The result was perfectly in sync.

pgwipeout commented on May 31, 2024

They are monotonically increasing; it simply starts at 60 for some reason and increases from there.
I had both the PTS and DTS printed before and after the conversion in gstreamer-encoder.c.
Since it started at 60, I tried subtracting 60, and also adding 60, to both.
Either one adjusted the audio by about 1/10th of a second; beyond that the video fails to decode.
I've also tried modifying only one, but that causes a rapid degradation of video quality.

info: Encoder PTS '51499999485'
info: Encoder DTS '51499999485'
info: Encoder STEP
info: Encoder PTS '1545'
info: Encoder DTS '1545'
info: Encoder CYCLE
info: Encoder PTS '51533332818'
info: Encoder DTS '51533332818'
info: Encoder STEP
info: Encoder PTS '1546'
info: Encoder DTS '1546'
info: Encoder CYCLE

That does not make much sense. Audio and video processing times are very different when compared to each other. Video should always have a lot higher latency than audio. Also, audio and video are always completely independent from each other. Bringing them into relation and keeping them in sync always involves timestamps.

We are doing video encoding in hardware while software handles audio encoding.
On x86, where the horsepower is enough to handle this seamlessly, this isn't an issue.
Unfortunately we are running on arm64, so by handling the video in hardware and the audio in software we have created a situation where video is actually faster than audio.
This is especially true since the built-in audio encoder heavily uses x86 intrinsics, which are handled on arm64 by an x86->arm64 intrinsics shim.

The issue does not manifest in software encoding for video and audio.

I've also dug into the handoff from obs to an encoder, and I don't see an easy solution without either modifying obs directly or handing audio control off to us as well.

fzwoch commented on May 31, 2024

They are monotonically increasing,

I see, so the small ones are on the input side and the bigger ones on the output side, similar to me? May be interesting to see what the time base in your case is (although I would assume it is also 1/30, in case you are recording at 30 fps).

we have created a situation where video is actually faster than audio.

That really should not make a difference, as long as there is enough CPU power to encode in "real-time". The latencies are what can make a difference. But latency differences (in either direction) of something like 4 seconds should be easily handled by the muxer downstream.

The encoder API is rather "dumb". It gets data with timestamps.. and returns data with timestamps. All other logic must be handled by OBS. But see below..

Since it started at 60

Okay.. this is suspicious. I would expect it to start at 1. That could mean that the first 60 rendered frames went somewhere, but not into the encoder. Since OBS looks at the first timestamp and treats it as timestamp 0 (so the first audio and video data are aligned), that could mean video data is being lost. Since they get aligned at the start, this would make the audio appear late, while actually the video is just early.

At 30 fps and first timestamp of 60 I would expect the latency to be 2 seconds though..
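
The arithmetic behind that expectation, as a tiny sketch (the function name is made up, and it assumes the 1/30 timebase guessed above):

```c
#include <stdint.h>

/* Converts a tick count in a num/den timebase to seconds. */
static double ticks_to_seconds(int64_t ticks, int32_t tb_num, int32_t tb_den)
{
	return (double)ticks * (double)tb_num / (double)tb_den;
}
```

ticks_to_seconds(60, 1, 30) gives 2.0. By the same arithmetic, a 1 second offset would correspond to 30 ticks at this timebase.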

[EDIT: Or do you print the incoming PTS only when data gets pulled from the encoder? It should be printed before it is pushed.]

Silly illustration: this may be how the audio and video data are currently coming in:

v ----------~~~~~~~~~~~~~~~~~~~~
a ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is how OBS treats it after encoding:

v ~~~~~~~~~~~~~~~~~~~~
a ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since audio and video get aligned, the gap (of video) at the beginning is closed. The correct behavior would be to keep this gap in the stream, which most recording formats can do, but I guess for simplicity and for streaming protocols this is the simpler approach.
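
The same illustration as a small numeric model. This is not OBS code, just a sketch with made-up names, and it assumes both streams are stamped on one common clock and each stream is rebased to its own first timestamp.

```c
#include <stdint.h>

/* Position of an event inside a stream once the stream's first
 * timestamp is treated as 0. */
static int64_t rebased_position(int64_t event_ts, int64_t stream_first_ts)
{
	return event_ts - stream_first_ts;
}

/* How much later the audio of an event plays than its video after
 * both streams were rebased independently. Positive: audio is late. */
static int64_t apparent_audio_lag(int64_t event_ts, int64_t first_video_ts,
				  int64_t first_audio_ts)
{
	return rebased_position(event_ts, first_audio_ts) -
	       rebased_position(event_ts, first_video_ts);
}
```

In this model the lag does not depend on the event itself, only on the difference between the two first timestamps: if video starts 2 seconds after audio, every sound lands 2 seconds after its picture.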

So the main question is: Why is the first timestamp 60? What happened with the missing data? Or did OBS set this internally due to some other latency compensation?

pgwipeout commented on May 31, 2024

Okay, this all makes sense.

I see, so the small ones are on the input side and the bigger ones on the output side, similar to me? May be interesting to see what the time base in your case is (although I would assume it is also 1/30, in case you are recording at 30 fps).

Correct, and correct, with the step and cycle being my personal syncpoints so I knew where it was in the code.

Is it possible that gstreamer-encoder is telling OBS it is ready to receive data before it actually is?

I'll run through the OBS code and see if I can figure out where those 60 frames went.

fzwoch commented on May 31, 2024

Is it possible that gstreamer-encoder is telling OBS it is ready to receive data before it actually is?

I don't think so. I would assume (but don't know for sure) that gstreamer_encoder_create() is called synchronously. When that is run, the encoder is ready. I guess only after that the actual encoding process is started by OBS (also not confirmed).

Double check the PTS and DTS you are printing. Note that there is a delay in calls between something being pushed into the encoder and actually being pulled out (the data being pulled out has been pushed a couple of calls earlier, in most cases).
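
A toy model of that push/pull delay, to show why a PTS printed at pull time belongs to an earlier frame. Everything here is hypothetical: the names are made up and the fixed delay of 2 frames is an arbitrary assumption, as real encoders hold back a varying number of frames.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DELAY 2 /* assumed number of frames held back */

/* Zero-initialize before use. */
struct delay_model {
	int64_t queue[MODEL_DELAY];
	int count; /* frames currently held */
	int head;  /* index of the oldest held frame */
};

/* Pushes one input PTS. While the model is still filling it returns
 * false; afterwards it returns true and stores the oldest held PTS
 * in *out, i.e. the one pushed MODEL_DELAY calls earlier. */
static bool delay_model_push(struct delay_model *m, int64_t pts_in, int64_t *out)
{
	if (m->count < MODEL_DELAY) {
		m->queue[(m->head + m->count) % MODEL_DELAY] = pts_in;
		m->count++;
		return false;
	}
	*out = m->queue[m->head];
	m->queue[m->head] = pts_in;
	m->head = (m->head + 1) % MODEL_DELAY;
	return true;
}
```

Pushing 1, 2, 3, 4 into this model produces no output for the first two calls, then 1 and 2. A log line written at pull time that prints the just-pushed input PTS next to the output PTS would therefore pair frame 3 with packet 1.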

Also double check the test case. Do it like I did: a media source and the sync video file, just to rule out possible capture buffer issues. E.g. if you capture some game or video running somewhere else, it may be that the audio capture buffer still has a second of data stored. That could lead to the same observed behavior. Although if you say software encoding is fine, this is probably not the issue you are observing.

fzwoch commented on May 31, 2024

Closing unless we have some deeper idea if something is really wrong.
