Giter VIP home page Giter VIP logo

Comments (9)

sevagh avatar sevagh commented on May 26, 2024

First of all, thank you very much for your code, and for licensing it MIT :)

No prob. Actually somebody e-mailed me to ask for that (unless that was you) so thanks to him as well 👍

Most FFT libraries, including fftw3, are optimized for FFT sizes that are powers of 2.

That's very interesting, I didn't know that. Do you want to make a PR? I also wonder if I should add assert (N % 2 == 0) to libxcorr to enforce performant calls to FFTW3.

n is the size of the transform. It can be any positive integer. FFTW is best at handling sizes of the form 2a 3b 5c 7d 11e 13f, where e+f is either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose routine (which nevertheless retains O(n lg n) performance, even for prime sizes). (It is possible to customize FFTW for different array sizes. See Section Installation and Customization, for more information.) Transforms whose sizes are powers of 2 are especially fast.
http://www.fftw.org/fftw2_doc/fftw_3.html

from pitch-detection.

CrushedPixel avatar CrushedPixel commented on May 26, 2024

I've written my own codebase based on your implementation, so I haven't actually modified your code, which is why I didn't create a PR in the first place. While the assertion you're suggesting makes sense, it's obviously only effective if the actual size of the FFT, N2, which is based on N, is even as well.

Regarding the License: I'm not sure whether you are actually allowed to license your code under MIT, given that it uses FFTW3, which is licensed under GPL, which spreads like cancer and requires you to license your work under GPL as well. Therefore, I propose switching to a different FFT library with a license compatible to MIT, for example Kiss FFT.
Note that Kiss FFT is less performant than FFTW3, however it's still suitable for real-time pitch estimation of audio when working with even FFT sizes.

Here's my autocorrelation function based on xcorr, which you're free to use however you wish. Note that I changed the input to only a single vector of samples, as we're only calculating autocorrelation for MPM anyway. If that doesn't suit your use case, you can easily revert that change. Note that I also changed the calculation of N2, which is why you need to change your size2 variable inside the normalized_square_difference function to 2 * size as well. Again, I'm not sure whether the modification of N2 leads to mathematically equivalent results, however my tests did not show a noticeable difference in the pitch estimation.
Also, I changed the FFT library to Kiss FFT, so it is suitable for use with the MIT license. However, Kiss FFT uses float values instead of doubles by default, although according to their docs, it should be possible to change that. I just haven't been able to do so since I'm pretty new to C++ and its non-existent build and dependency system.

#include <kiss_fft.h> // in the header file

void autocorrelation(const std::vector<float> &data, std::complex<float> *result) {
	size_t n = data.size();
	size_t n2 = 2 * n;

	std::complex<float> signala_ext[n2];
	std::complex<float> signalb_ext[n2];

	// zero padding of input data
	for (int i = 0; i < n; i++) {
		signala_ext[n + i] = data[i];
		signalb_ext[i] = data[i];
	}

	std::complex<float> outa[n2];
	std::complex<float> outb[n2];
	std::complex<float> out[n2];

	auto fftCfgForward = kiss_fft_alloc(n2, false, nullptr, nullptr);
	auto fftCfgBackward = kiss_fft_alloc(n2, true, nullptr, nullptr);
	if (!fftCfgForward || !fftCfgBackward) {
		throw std::bad_alloc();
	}

	kiss_fft(fftCfgForward, (kiss_fft_cpx *) signala_ext, (kiss_fft_cpx *) outa);
	kiss_fft(fftCfgForward, (kiss_fft_cpx *) signalb_ext, (kiss_fft_cpx *) outb);

	for (int i = 0; i < n2; ++i) {
		out[i] = outa[i] * conj(outb[i]);
	}

	kiss_fft(fftCfgBackward, (kiss_fft_cpx *) out, (kiss_fft_cpx *) result);

	free(fftCfgForward);
	free(fftCfgBackward);
}

You could optimize this even more by making a class for correlation and frequency calculation, and creating the fftCfgForward and fftCfgBackward in the constructor and freeing it in the destructor, however the overhead of the allocation should be minimal enough so it doesn't matter.

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

Lots of good info here. I will explore well-licensed, simpler, more concise correlation and/or FFT libraries and stop depending on libxcorr in this project.

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

FYI if you're interested, my Python code for MPM is much more solid (using scipy, numpy, and numba) and with those well-optimized libraries it's probably pretty dang performant: https://github.com/sevagh/transcribe/blob/master/transcribe/pitch.py#L9

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

I finally got some numbers to prove it. I simplified things by taking the xcorr code from libxcorr and embedding it directly inside this repo.

When N2 = 2 * N - 1, which is the minimum for auto-correlation to work correctly, but not necessarily a power of 2:

sevagh:pitch-detection $ hyperfine --min-runs 10 './bin/sinewave --freq 1337 --algo mpm --size 64000'
Benchmark #1: ./bin/sinewave --freq 1337 --algo mpm --size 64000

  Time (mean ± σ):      66.8 ms ±  11.3 ms    [User: 57.8 ms, System: 8.1 ms]

  Range (min … max):    50.3 ms …  85.2 ms

With this bit of code to find the next power of 2:

	/* https://stackoverflow.com/questions/466204/rounding-up-to-next-power-of-2 */
	if (N2 & (N2 - 1)) {
		N2--;
		N2 |= N2 >> 1;
		N2 |= N2 >> 2;
		N2 |= N2 >> 4;
		N2 |= N2 >> 8;
		N2 |= N2 >> 16;
		N2++;
	}

	assert(!(N2 & (N2 - 1)));

results are faster:

sevagh:pitch-detection $ hyperfine --min-runs 10 './bin/sinewave --freq 1337 --
algo mpm --size 64000'
Benchmark #1: ./bin/sinewave --freq 1337 --algo mpm --size 64000

  Time (mean ± σ):      24.7 ms ±   2.6 ms    [User: 18.0 ms, System: 6.4 ms]

  Range (min … max):    21.0 ms …  34.9 ms

sevagh:pitch-detection $

from pitch-detection.

CrushedPixel avatar CrushedPixel commented on May 26, 2024

Yes they are faster, but do they also result in the same pitch estimation for the same input files?

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

Power of 2:

sevagh:pitch-detection $ for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Size: 1025      freq: 1337      pitch: 1337.11
Size: 4195      freq: 1337      pitch: 1337.3
Size: 8192      freq: 1337      pitch: 1337.06
Size: 16783     freq: 1337      pitch: 1336.99
Size: 1025      freq: 24134     pitch: 23996.5
Size: 4195      freq: 24134     pitch: 23997.5
Size: 8192      freq: 24134     pitch: 23997.9
Size: 16783     freq: 24134     pitch: 23998
Size: 1025      freq: 95.5      pitch: 95.6076
Size: 4195      freq: 95.5      pitch: 95.7174
Size: 8192      freq: 95.5      pitch: 95.6404
Size: 16783     freq: 95.5      pitch: 95.529
Size: 1025      freq: 179.3     pitch: 180.621
Size: 4195      freq: 179.3     pitch: 179.549
Size: 8192      freq: 179.3     pitch: 179.356
Size: 16783     freq: 179.3     pitch: 179.367
Size: 1025      freq: 354       pitch: 354.2
Size: 4195      freq: 354       pitch: 354.06
Size: 8192      freq: 354       pitch: 354.034
Size: 16783     freq: 354       pitch: 354.078
Size: 1025      freq: 666       pitch: 667.004
Size: 4195      freq: 666       pitch: 666.217
Size: 8192      freq: 666       pitch: 666.125
Size: 16783     freq: 666       pitch: 666.055
sevagh:pitch-detection $
sevagh:pitch-detection $
sevagh:pitch-detection $ hyperfine --min-runs 10 'for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000;
done; done'
Benchmark #1: for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done

  Time (mean ± σ):     309.0 ms ±   9.8 ms    [User: 241.0 ms, System: 61.8 ms]

  Range (min … max):   295.2 ms … 325.0 ms

No power-of-2:

sevagh:pitch-detection $ hyperfine --min-runs 10 'for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done'
Benchmark #1: for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done

  Time (mean ± σ):     487.6 ms ±   8.3 ms    [User: 420.2 ms, System: 60.4 ms]

  Range (min … max):   468.6 ms … 495.3 ms

sevagh:pitch-detection $ for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Size: 1025      freq: 1337      pitch: 1337.11
Size: 4195      freq: 1337      pitch: 1337.3
Size: 8192      freq: 1337      pitch: 1337.06
Size: 16783     freq: 1337      pitch: 1336.99
Size: 1025      freq: 24134     pitch: 23996.5
Size: 4195      freq: 24134     pitch: 23997.5
Size: 8192      freq: 24134     pitch: 23997.9
Size: 16783     freq: 24134     pitch: 23998
Size: 1025      freq: 95.5      pitch: 95.6076
Size: 4195      freq: 95.5      pitch: 95.7174
Size: 8192      freq: 95.5      pitch: 95.6404
Size: 16783     freq: 95.5      pitch: 95.529
Size: 1025      freq: 179.3     pitch: 180.621
Size: 4195      freq: 179.3     pitch: 179.549
Size: 8192      freq: 179.3     pitch: 179.356
Size: 16783     freq: 179.3     pitch: 179.367
Size: 1025      freq: 354       pitch: 354.2
Size: 4195      freq: 354       pitch: 354.06
Size: 8192      freq: 354       pitch: 354.034
Size: 16783     freq: 354       pitch: 354.078
Size: 1025      freq: 666       pitch: 667.004
Size: 4195      freq: 666       pitch: 666.217
Size: 8192      freq: 666       pitch: 666.125
Size: 16783     freq: 666       pitch: 666.055

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

The nsdf is usually the same, just the FFT creation with 0-padding changes lengths:

std::vector<double> normalized_result(N, 0.0);
        for (int i = 0; i < N; ++i)
                normalized_result[i] = result[i + (N2 - N)] / result[N2 - N];
        return normalized_result;

from pitch-detection.

sevagh avatar sevagh commented on May 26, 2024

This has been addressed in a few ways.

  • FFTW has been switched to FFTS to fix the license issue
  • FFTS performs better

Since it's not really my job to make the input arrays perform better, I decided against doing power-of-2 handling. It should be a pleasant surprise to the user.

I'll put a bit in the README.

from pitch-detection.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.