Comments (11)

alebzk commented on August 22, 2024

While waiting for an answer, I've looked at the code more closely and managed to answer some of my own questions. I also have some new ones.

This is what I've learned:

  • Using a [0.25, 0.5, 0.25] window for downsampling from 48 kHz to 24 kHz is indeed cheap, but it yields a low-pass filter without a steep roll-off; however, this should be OK for pitch estimation.
  • The autocorrelation with a maximum lag of 4 is computed since the pitch is estimated not on the audio frames directly, but on their LP residuals.
  • The estimated inverse filter coefficients used to compute the LP residuals are corrected assuming -40 dBFS noise floor; ac[0] *= 1.0001f accounts for the reduction done in ac[i] -= ac[i]*(.008f*i)*(.008f*i); the correction factors are modeled as such assuming white noise, the correlation of which follows the (0.008/i)^2 law.
  • pitch_downsample() performs both the 2x decimation and the LP residual extraction, whereas pitch_search() finds the best and the second best pitch candidates - which are further refined in remove_doubling().
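
As a rough illustration of the first bullet, the 2x decimation with the [0.25, 0.5, 0.25] window could be sketched as below. This is a minimal sketch and not the rnnoise code itself: the function name downsample_by_2 is made up for this example, and the sample just before the start of the buffer is assumed to be zero.

```c
#include <stddef.h>

/* Halve the sample rate using a [0.25, 0.5, 0.25] smoothing window.
 * `in` holds 2*out_len samples and `out` receives out_len samples.
 * Assumption of this sketch: the sample before in[0] is zero. */
void downsample_by_2(const float *in, float *out, size_t out_len)
{
    for (size_t i = 0; i < out_len; i++) {
        float prev = (i == 0) ? 0.0f : in[2 * i - 1];
        out[i] = 0.25f * prev + 0.5f * in[2 * i] + 0.25f * in[2 * i + 1];
    }
}
```

A signal alternating between +1 and -1 (the Nyquist frequency of the input) is cancelled by this window away from the buffer edge, while lower frequencies are only gently attenuated, which is the "no steep decay" trade-off mentioned above.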

My remaining questions are the following:

  • The pitch estimation is performed at 12 kHz and 24 kHz; is it truly needed? Could one use a lower sample rate?
  • Why is pitch_search() called passing PITCH_MAX_PERIOD-3*PITCH_MIN_PERIOD instead of PITCH_MAX_PERIOD for the max_pitch argument?
  • In find_best_pitch() the numerators are xcorr[i]^2, why not just xcorr[i]?
  • remove_doubling() uses second_check[16] = {0, 0, 3, 2, 3, 2, 5, 2, 3, 2, 3, 2, 5, 2, 3, 2}, it looks like these are multipliers used to look for specific (sub)harmonics; where do those values come from?
  • remove_doubling() looks at higher harmonics of the initial estimated pitch period, hence removing pitch period doubling errors; why is there no removal of halving errors?

Alessio

from rnnoise.

jmvalin commented on August 22, 2024

@alebzk the pitch estimation code is mostly copied from Opus, so it's quite possible that not everything in there is optimal for rnnoise. The idea, as you seem to have guessed, is to compute the auto-correlation on the residual. The (.008f*i)*(.008f*i) term here is meant to approximate a Gaussian for lag windowing (stabilizes the LPC analysis). As for the c1 term, it's meant to convolve the LPC filter with a slight low-pass filter to make the analysis better. I'm sure it would be possible to run the pitch even lower than 12 kHz, but that seemed fine for the use I had.
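
For readers following along, the two autocorrelation adjustments quoted in this thread (ac[0] *= 1.0001f and ac[i] -= ac[i]*(.008f*i)*(.008f*i)) can be put together in a small sketch. The function name condition_autocorr and its signature are hypothetical; only the two update expressions come from the discussion above.

```c
#include <stddef.h>

/* Condition autocorrelation values ac[0..order] before LPC analysis. */
void condition_autocorr(float *ac, size_t order)
{
    /* Noise floor: scaling ac[0] by 1.0001 behaves like adding white
     * noise roughly 40 dB below the signal power (1e-4 in power). */
    ac[0] *= 1.0001f;

    /* Lag window: 1 - (0.008*i)^2 is close to the Gaussian
     * exp(-(0.008*i)^2) for the small lags used here. */
    for (size_t i = 1; i <= order; i++) {
        float w = 0.008f * (float)i;
        ac[i] -= ac[i] * w * w;
    }
}
```

With a maximum lag of 4, the largest correction is at i = 4, where the value is scaled by 1 - 0.032^2, so the window only nudges the higher lags down slightly.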

Why is pitch_search() called passing PITCH_MAX_PERIOD-3*PITCH_MIN_PERIOD instead of PITCH_MAX_PERIOD for the max_pitch argument?

This avoids searching for very short periods, which can sometimes cause false detections due to formants. The very short periods are instead searched in remove_doubling().

In find_best_pitch() the numerators are xcorr[i]^2, why not just xcorr[i]?

Yes, because we want to maximize xy/sqrt(xx*yy). Rather than compute a sqrt(), we just square everything, and since the xx term is constant, we only need to maximize xy^2/yy.
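
A minimal sketch of that idea, assuming per-lag arrays xcorr[] (the xy terms) and energy[] (the yy terms, taken to be positive). The function best_lag is hypothetical and keeps only a single best candidate, whereas the thread mentions that the real search also tracks a second best. Comparing scores by cross-multiplication is one way to avoid the division as well as the sqrt().

```c
#include <stddef.h>

/* Return the lag with the largest xy/sqrt(xx*yy), using only xy^2/yy.
 * Negative correlations are skipped, because squaring would otherwise
 * hide their sign. */
size_t best_lag(const float *xcorr, const float *energy, size_t n_lags)
{
    size_t best = 0;
    float best_num = -1.0f; /* numerator of the best score so far */
    float best_den = 1.0f;  /* denominator of the best score so far */

    for (size_t i = 0; i < n_lags; i++) {
        if (xcorr[i] > 0.0f) {
            float num = xcorr[i] * xcorr[i];
            /* num/energy[i] > best_num/best_den, without dividing */
            if (num * best_den > best_num * energy[i]) {
                best = i;
                best_num = num;
                best_den = energy[i];
            }
        }
    }
    return best;
}
```

If no lag has a positive correlation, this sketch simply returns 0; a caller would need to treat that case separately.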

remove_doubling() uses second_check[16] = {0, 0, 3, 2, 3, 2, 5, 2, 3, 2, 3, 2, 5, 2, 3, 2}, it looks like these are multipliers used to look for specific (sub)harmonics; where do those values come from?

They were hand-tuned to look for expected peaks that wouldn't be expected for a different pitch period. For example, if there's a peak at T/6, then we can expect one at 5*T/6, which is a position you wouldn't expect to find a peak if the period was T/3.

remove_doubling() looks at higher harmonics of the initial estimated pitch period, hence removing pitch period doubling errors; why is there no removal of halving errors?

Period doubling (or tripling) is a common error for auto-correlation-based pitch estimators: if there's a periodicity at T, then there's also going to be a periodicity at 2T, 3T, and so on. On the other hand, there won't be a periodicity at T/2.
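
This can be checked with a toy example. The helper below is a hypothetical sketch (not taken from rnnoise) that computes the unnormalized autocorrelation at one lag.

```c
#include <stddef.h>

/* Unnormalized autocorrelation of x[0..n-1] at the given lag. */
float autocorr_at(const float *x, size_t n, size_t lag)
{
    float sum = 0.0f;
    for (size_t i = 0; i + lag < n; i++)
        sum += x[i] * x[i + lag];
    return sum;
}
```

For an impulse train with one unit pulse every T = 8 samples, the autocorrelation is large at lags 8 and 16 but zero at lag 4, so an estimator can mistake 2T for the period but has no peak to mistake at T/2.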

alebzk commented on August 22, 2024

Hi again,

It looks like the pitch estimation method you implemented is SIFT [1]. Right?
If there are other relevant details on the exact method, could you explain and/or share a reference?

Alessio

[1] Markel, J. (1972). The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5), 367-377.

alebzk commented on August 22, 2024

Thanks so much for the detailed answer!

zhly0 commented on August 22, 2024

Hi @alebzk,
I learned a lot from your issue; I am new to pitch detection. If my training samples' sample rate is 16000 Hz, does the PITCH_BUF_SIZE macro need to change? What is the relation between the sample rate and PITCH_BUF_SIZE?
Looking forward to your reply. Thanks!

alebzk commented on August 22, 2024

Hi @zhly0,

@jmvalin can surely say more, but I'm happy to give a first answer.

Yes. PITCH_BUF_SIZE is expressed in samples at 48 kHz; hence, it has to be adapted if the sample rate changes. However, note that the whole pitch estimation algorithm is fairly specific to the 48 kHz case (there are 2 downsampling steps, and the estimated pitch is used to compute the spectral cross-correlation features). If you want to avoid resampling from 16 kHz to 48 kHz, then part of the code must be adapted.
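
To illustrate just the unit conversion, here is a minimal sketch. The helper scale_from_48k is hypothetical and not part of rnnoise, and it covers only the rescaling of sample counts, not the other changes mentioned above.

```c
/* Convert a length expressed in samples at 48 kHz to another sample rate.
 * Hypothetical helper for illustration; rounds down. */
int scale_from_48k(int samples_48k, int sample_rate_hz)
{
    return (int)((long)samples_48k * sample_rate_hz / 48000L);
}
```

For example, if a maximum pitch period were 768 samples at 48 kHz, the same duration would be 256 samples at 16 kHz. The actual values of the PITCH_* macros should be checked in the source before scaling them.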

Alessio

teslam commented on August 22, 2024

@alebzk thanks for your write-up, very helpful (although I'm just halfway through). And thanks to jmvalin, of course!

I don't understand your comment: "The autocorrelation with a maximum lag of 4 is computed since the pitch is estimated not on the audio frames directly, but on their LP residuals." I don't see where the residual is calculated (residual as in the difference between x_lp and the LPC estimate).
Are you referring to:
opus_val32 sum = SHL32(EXTEND32(x[i]), SIG_SHIFT)?

Thanks again

jmvalin commented on August 22, 2024

See this call:
celt_fir5(x_lp, lpc2, x_lp, len>>1, mem);
in pitch_downsample() (pitch.c)
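
As a rough sketch of what such an FIR call does (this is not the celt_fir5 code; the function name, the generic order, and the zero initial history are assumptions of this example), the LP residual is the input filtered by A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..., so the coefficients are added to the current sample rather than subtracted:

```c
#include <stddef.h>

/* LP residual e[n] = x[n] + a[0]*x[n-1] + ... + a[order-1]*x[n-order].
 * Samples before x[0] are taken as zero. x and e must not overlap
 * (the real code filters in place with an explicit memory array). */
void lp_residual(const float *x, float *e, size_t n,
                 const float *a, size_t order)
{
    for (size_t i = 0; i < n; i++) {
        float sum = x[i];
        for (size_t k = 1; k <= order && k <= i; k++)
            sum += a[k - 1] * x[i - k];
        e[i] = sum;
    }
}
```

With this sign convention, a signal that halves at every sample (1, 0.5, 0.25, ...) is whitened by a[0] = -0.5: the residual is 1 at the first sample and 0 afterwards.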

teslam commented on August 22, 2024

@jmvalin thanks! I had gone through that function plenty of times, but I double-checked it after your reply, and I think my problem comes from the fact that the LPC coefficients are defined with the opposite sign compared to what I expected.

I now assume that you define them as e.g. MATLAB does: https://se.mathworks.com/help/signal/ref/lpc.html.

Thanks.

shakingWaves commented on August 22, 2024

@jmvalin @alebzk I do not understand the pitch estimation method, especially the pitch_search() function in pitch.c. Could you share some details?

zuowanbushiwo commented on August 22, 2024

@shakingWaves @zhly0 @jmvalin @alebzk
I also do not understand the pitch estimation method. If my training samples' sample rate is 16000 Hz, how should I change the pitch_downsample(), pitch_search(), and remove_doubling() functions?
Thanks!
