Comments (9)
First of all, thank you very much for your code, and for licensing it MIT :)
No prob. Actually somebody e-mailed me to ask for that (unless that was you) so thanks to him as well 👍
Most FFT libraries, including fftw3, are optimized for FFT sizes that are powers of 2.
That's very interesting, I didn't know that. Do you want to make a PR? I also wonder if I should add assert (N % 2 == 0)
to libxcorr to enforce performant calls to FFTW3.
n is the size of the transform. It can be any positive integer. FFTW is best at handling sizes of the form 2a 3b 5c 7d 11e 13f, where e+f is either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose routine (which nevertheless retains O(n lg n) performance, even for prime sizes). (It is possible to customize FFTW for different array sizes. See Section Installation and Customization, for more information.) Transforms whose sizes are powers of 2 are especially fast.
http://www.fftw.org/fftw2_doc/fftw_3.html
from pitch-detection.
I've written my own codebase based on your implementation, so I haven't actually modified your code, which is why I didn't create a PR in the first place. While the assertion you're suggesting makes sense, it's obviously only effective if the actual size of the FFT, N2
, which is based on N
, is even as well.
Regarding the License: I'm not sure whether you are actually allowed to license your code under MIT, given that it uses FFTW3, which is licensed under GPL, which spreads like cancer and requires you to license your work under GPL as well. Therefore, I propose switching to a different FFT library with a license compatible to MIT, for example Kiss FFT.
Note that Kiss FFT is less performant than FFTW3, however it's still suitable for real-time pitch estimation of audio when working with even FFT sizes.
Here's my autocorrelation
function based on xcorr
, which you're free to use however you wish. Note that I changed the input to only a single vector of samples, as we're only calculating autocorrelation for MPM anyway. If that doesn't suit your use case, you can easily revert that change. Note that I also changed the calculation of N2
, which is why you need to change your size2
variable inside the normalized_square_difference
function to 2 * size
as well. Again, I'm not sure whether the modification of N2 leads to mathematically equivalent results, however my tests did not show a noticeable difference in the pitch estimation.
Also, I changed the FFT library to Kiss FFT, so it is suitable for use with the MIT license. However, Kiss FFT uses float
values instead of doubles
by default, although according to their docs, it should be possible to change that. I just haven't been able to do so since I'm pretty new to C++ and its non-existent build and dependency system.
#include <kiss_fft.h> // in the header file
void autocorrelation(const std::vector<float> &data, std::complex<float> *result) {
size_t n = data.size();
size_t n2 = 2 * n;
std::complex<float> signala_ext[n2];
std::complex<float> signalb_ext[n2];
// zero padding of input data
for (int i = 0; i < n; i++) {
signala_ext[n + i] = data[i];
signalb_ext[i] = data[i];
}
std::complex<float> outa[n2];
std::complex<float> outb[n2];
std::complex<float> out[n2];
auto fftCfgForward = kiss_fft_alloc(n2, false, nullptr, nullptr);
auto fftCfgBackward = kiss_fft_alloc(n2, true, nullptr, nullptr);
if (!fftCfgForward || !fftCfgBackward) {
throw std::bad_alloc();
}
kiss_fft(fftCfgForward, (kiss_fft_cpx *) signala_ext, (kiss_fft_cpx *) outa);
kiss_fft(fftCfgForward, (kiss_fft_cpx *) signalb_ext, (kiss_fft_cpx *) outb);
for (int i = 0; i < n2; ++i) {
out[i] = outa[i] * conj(outb[i]);
}
kiss_fft(fftCfgBackward, (kiss_fft_cpx *) out, (kiss_fft_cpx *) result);
free(fftCfgForward);
free(fftCfgBackward);
}
You could optimize this even more by making a class for correlation and frequency calculation, and creating the fftCfgForward
and fftCfgBackward
in the constructor and freeing it in the destructor, however the overhead of the allocation should be minimal enough so it doesn't matter.
from pitch-detection.
Lots of good info here. I will explore well-licensed, simpler, more concise correlation and/or FFT libraries and stop depending on libxcorr in this project.
from pitch-detection.
FYI if you're interested, my Python code for MPM is much more solid (using scipy, numpy, and numba) and with those well-optimized libraries it's probably pretty dang performant: https://github.com/sevagh/transcribe/blob/master/transcribe/pitch.py#L9
from pitch-detection.
I finally got some numbers to prove it. I simplified things by taking the xcorr
code from libxcorr and embedding it directly inside this repo.
When N2 = 2 * N - 1, which is the minimum for auto-correlation to work correctly, but not necessarily a power of 2:
sevagh:pitch-detection $ hyperfine --min-runs 10 './bin/sinewave --freq 1337 --algo mpm --size 64000'
Benchmark #1: ./bin/sinewave --freq 1337 --algo mpm --size 64000
Time (mean ± σ): 66.8 ms ± 11.3 ms [User: 57.8 ms, System: 8.1 ms]
Range (min … max): 50.3 ms … 85.2 ms
With this bit of code to find the next power of 2:
/* https://stackoverflow.com/questions/466204/rounding-up-to-next-power-of-2 */
if (N2 & (N2 - 1)) {
N2--;
N2 |= N2 >> 1;
N2 |= N2 >> 2;
N2 |= N2 >> 4;
N2 |= N2 >> 8;
N2 |= N2 >> 16;
N2++;
}
assert(!(N2 & (N2 - 1)));
results are faster:
sevagh:pitch-detection $ hyperfine --min-runs 10 './bin/sinewave --freq 1337 --
algo mpm --size 64000'
Benchmark #1: ./bin/sinewave --freq 1337 --algo mpm --size 64000
Time (mean ± σ): 24.7 ms ± 2.6 ms [User: 18.0 ms, System: 6.4 ms]
Range (min … max): 21.0 ms … 34.9 ms
sevagh:pitch-detection $
from pitch-detection.
Yes they are faster, but do they also result in the same pitch estimation for the same input files?
from pitch-detection.
Power of 2:
sevagh:pitch-detection $ for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Size: 1025 freq: 1337 pitch: 1337.11
Size: 4195 freq: 1337 pitch: 1337.3
Size: 8192 freq: 1337 pitch: 1337.06
Size: 16783 freq: 1337 pitch: 1336.99
Size: 1025 freq: 24134 pitch: 23996.5
Size: 4195 freq: 24134 pitch: 23997.5
Size: 8192 freq: 24134 pitch: 23997.9
Size: 16783 freq: 24134 pitch: 23998
Size: 1025 freq: 95.5 pitch: 95.6076
Size: 4195 freq: 95.5 pitch: 95.7174
Size: 8192 freq: 95.5 pitch: 95.6404
Size: 16783 freq: 95.5 pitch: 95.529
Size: 1025 freq: 179.3 pitch: 180.621
Size: 4195 freq: 179.3 pitch: 179.549
Size: 8192 freq: 179.3 pitch: 179.356
Size: 16783 freq: 179.3 pitch: 179.367
Size: 1025 freq: 354 pitch: 354.2
Size: 4195 freq: 354 pitch: 354.06
Size: 8192 freq: 354 pitch: 354.034
Size: 16783 freq: 354 pitch: 354.078
Size: 1025 freq: 666 pitch: 667.004
Size: 4195 freq: 666 pitch: 666.217
Size: 8192 freq: 666 pitch: 666.125
Size: 16783 freq: 666 pitch: 666.055
sevagh:pitch-detection $
sevagh:pitch-detection $
sevagh:pitch-detection $ hyperfine --min-runs 10 'for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000;
done; done'
Benchmark #1: for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Time (mean ± σ): 309.0 ms ± 9.8 ms [User: 241.0 ms, System: 61.8 ms]
Range (min … max): 295.2 ms … 325.0 ms
No power-of-2:
sevagh:pitch-detection $ hyperfine --min-runs 10 'for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done'
Benchmark #1: for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Time (mean ± σ): 487.6 ms ± 8.3 ms [User: 420.2 ms, System: 60.4 ms]
Range (min … max): 468.6 ms … 495.3 ms
sevagh:pitch-detection $ for x in 1337 24134 95.5 179.3 354 666; do for y in 1025 4195 8192 16783; do ./bin/sinewave $x mpm $y 48000; done; done
Size: 1025 freq: 1337 pitch: 1337.11
Size: 4195 freq: 1337 pitch: 1337.3
Size: 8192 freq: 1337 pitch: 1337.06
Size: 16783 freq: 1337 pitch: 1336.99
Size: 1025 freq: 24134 pitch: 23996.5
Size: 4195 freq: 24134 pitch: 23997.5
Size: 8192 freq: 24134 pitch: 23997.9
Size: 16783 freq: 24134 pitch: 23998
Size: 1025 freq: 95.5 pitch: 95.6076
Size: 4195 freq: 95.5 pitch: 95.7174
Size: 8192 freq: 95.5 pitch: 95.6404
Size: 16783 freq: 95.5 pitch: 95.529
Size: 1025 freq: 179.3 pitch: 180.621
Size: 4195 freq: 179.3 pitch: 179.549
Size: 8192 freq: 179.3 pitch: 179.356
Size: 16783 freq: 179.3 pitch: 179.367
Size: 1025 freq: 354 pitch: 354.2
Size: 4195 freq: 354 pitch: 354.06
Size: 8192 freq: 354 pitch: 354.034
Size: 16783 freq: 354 pitch: 354.078
Size: 1025 freq: 666 pitch: 667.004
Size: 4195 freq: 666 pitch: 666.217
Size: 8192 freq: 666 pitch: 666.125
Size: 16783 freq: 666 pitch: 666.055
from pitch-detection.
The nsdf
is usually the same, just the FFT creation with 0-padding changes lengths:
std::vector<double> normalized_result(N, 0.0);
for (int i = 0; i < N; ++i)
normalized_result[i] = result[i + (N2 - N)] / result[N2 - N];
return normalized_result;
from pitch-detection.
This has been addressed in a few ways.
- FFTW has been switched to FFTS to fix the license issue
- FFTS performs better
Since it's not really my job to make the input arrays perform better, I decided against doing power-of-2 handling. It should be a pleasant surprise to the user.
I'll put a bit in the README.
from pitch-detection.
Related Issues (20)
- Transcriber example HOT 2
- Absorb yin and mpm repos here
- Guitar tuner example HOT 3
- Docs HOT 1
- Could you give an example code to detect pitch? HOT 6
- ma'c
- How can I use it in mac os? HOT 1
- YIN vs McLeod pitch detection accuracy HOT 7
- Implement pYIN (probabilistic YIN) HOT 6
- Implement SWIPE HOT 6
- Implement CREPE HOT 2
- C API
- wrong results with different buffer sizes in the sinewave example HOT 4
- Add Aud-SWIPE-P (parallel SWIPE)
- Android/Oboe tutorial/doc
- the error of the project HOT 6
- Additional compiler flags maybe needed HOT 2
- Using autocorrelation instead of calculating NSDF in MPM HOT 5
- Create a GitHub Action that builds and runs unit tests HOT 5
- systematic comparison of methods HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pitch-detection.