
crepe's Introduction

CREPE Pitch Tracker


CREPE is a monophonic pitch tracker based on a deep convolutional neural network operating directly on the time-domain waveform input. CREPE is state-of-the-art (as of 2018), outperforming popular pitch trackers such as pYIN and SWIPE.

Further details are provided in the following paper:

CREPE: A Convolutional Representation for Pitch Estimation
Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello.
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

We kindly request that academic publications making use of CREPE cite the aforementioned paper.

Installing CREPE

CREPE is hosted on PyPI. To install, run the following command in your Python environment:

$ pip install --upgrade tensorflow  # if you don't already have tensorflow >= 2.0.0
$ pip install crepe

To install the latest version from source, clone the repository and, from the top-level crepe folder, run:

$ python setup.py install

Using CREPE

Using CREPE from the command line

This package includes a command line utility crepe and a pre-trained version of the CREPE model for easy use. To estimate the pitch of audio_file.wav, run:

$ crepe audio_file.wav

or

$ python -m crepe audio_file.wav

The resulting audio_file.f0.csv contains 3 columns: the first contains timestamps (a 10 ms hop size is used by default), the second contains the predicted fundamental frequency in Hz, and the third contains the voicing confidence, i.e. the confidence that a pitch is present:

time,frequency,confidence
0.00,185.616,0.907112
0.01,186.764,0.844488
0.02,188.356,0.798015
0.03,190.610,0.746729
0.04,192.952,0.771268
0.05,195.191,0.859440
0.06,196.541,0.864447
0.07,197.809,0.827441
0.08,199.678,0.775208
...

Timestamps

CREPE uses 10-millisecond time steps by default; this can be adjusted with the --step-size option, which takes the size of the time step in milliseconds. For example, --step-size 50 will calculate pitch every 50 milliseconds.
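
For example, the full command for a 50 ms hop would be:

$ crepe audio_file.wav --step-size 50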

Following the convention adopted by popular audio processing libraries such as Essentia and Librosa, from v0.0.5 onwards CREPE will pad the input signal such that the first frame is zero-centered (the center of the frame corresponds to time 0) and generally all frames are centered around their corresponding timestamp, i.e. frame D[:, t] is centered at audio[t * hop_length]. This behavior can be changed by specifying the optional --no-centering flag, in which case the first frame will start at time zero and generally frame D[:, t] will begin at audio[t * hop_length]. Sticking to the default behavior (centered frames) is strongly recommended to avoid misalignment with features and annotations produced by other common audio processing tools.
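
As a rough illustration of the centered convention (a sketch, not the actual implementation; the 1024-sample analysis window and 16 kHz model sample rate are mentioned elsewhere in this document):

import numpy as np

sr = 16000
frame_length = 1024
step_size_ms = 10
hop_length = int(sr * step_size_ms / 1000)       # 160 samples

audio = np.random.randn(sr)                      # one second of dummy audio
padded = np.pad(audio, frame_length // 2, mode='constant')

t = 3                                            # an arbitrary frame index
frame = padded[t * hop_length : t * hop_length + frame_length]
timestamp = t * step_size_ms / 1000              # 0.03 s, the center of this frame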

Model Capacity

By default CREPE uses the model size that was reported in the paper, but it can optionally use a smaller model for faster computation, at the cost of slightly lower accuracy. You can specify --model-capacity {tiny|small|medium|large|full} on the command line to select a model with the desired capacity.
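
For example, to trade some accuracy for speed with the smallest model:

$ crepe audio_file.wav --model-capacity tiny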

Temporal smoothing

By default CREPE does not apply temporal smoothing to the pitch curve, but Viterbi smoothing is supported via the optional --viterbi command line argument.
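
For example, to enable Viterbi smoothing from the command line:

$ crepe audio_file.wav --viterbi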

Saving the activation matrix

The script can also optionally save the model's output activation matrix to an npy file (--save-activation). The matrix has shape (n_frames, 360): one row per frame (every 10 ms with the default hop size) and 360 pitch bins, each covering 20 cents.
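
To inspect the saved activations in Python, something like the following should work (the .activation.npy file name is an assumption, mirroring the audio_file.f0.csv and audio_file.activation.png naming used elsewhere in this README):

import numpy as np

# Load the saved activation matrix; the exact output file name is assumed here.
activation = np.load('audio_file.activation.npy')
print(activation.shape)  # expected: (n_frames, 360)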

The script can also output a plot of the activation matrix (--save-plot), saved to audio_file.activation.png, optionally including a visual representation of the model's voicing detection (--plot-voicing). Here's an example plot of the activation matrix (without the voicing overlay) for an excerpt of male singing voice:

[salience plot]

Batch processing

For batch processing of files, you can provide a folder path instead of a file path:

$ crepe audio_folder

The script will process all WAV files found inside the folder.

Additional usage information

For more information on the usage, please refer to the help message:

$ crepe --help

Using CREPE inside Python

CREPE can be imported as a module and used directly in Python. Here's a minimal example:

import crepe
from scipy.io import wavfile

sr, audio = wavfile.read('/path/to/audiofile.wav')
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)
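
The optional keyword arguments below sketch how the CLI options map onto the Python API; the parameter names are assumed to mirror the command line flags, so check crepe.predict's docstring if your installed version differs:

# Hedged sketch: keyword names assumed to mirror the CLI options.
time, frequency, confidence, activation = crepe.predict(
    audio, sr,
    model_capacity='full',   # 'tiny', 'small', 'medium', 'large', or 'full'
    viterbi=True,            # apply Viterbi smoothing to the pitch curve
    step_size=10,            # hop size in milliseconds
)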

Argmax-local Weighted Averaging

This release of CREPE uses a weighted averaging formula that is slightly different from the one in the paper: it focuses only on the neighborhood around the maximum activation, which has been shown to further improve pitch accuracy.
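
Reconstructed from the reference implementation in core.py (a hedged sketch; the notation may differ slightly from the paper), the pitch estimate in cents is a local weighted average over the nine bins around the peak activation, which is then converted to Hz:

$$\hat{c} = \frac{\sum_{i=m-4}^{m+4} \hat{y}_i \, c_i}{\sum_{i=m-4}^{m+4} \hat{y}_i}, \qquad m = \operatorname{argmax}_i \hat{y}_i, \qquad \hat{f} = 10 \cdot 2^{\hat{c}/1200}~\text{Hz}$$

where $\hat{y}_i$ is the activation of the $i$-th of the 360 pitch bins and $c_i$ is the center of that bin in cents.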

Please Note

  • The current version only supports WAV files as input.
  • The model is trained on 16 kHz audio, so if the input audio has a different sample rate, it will first be resampled to 16 kHz using resampy.
  • Due to subtle numerical differences between frameworks, Keras should be configured to use the TensorFlow backend for best performance. The model was trained using Keras 2.1.5 and TensorFlow 1.6.0, and newer versions of TensorFlow seem to work as well.
  • Prediction is significantly faster if Keras (and the corresponding backend) is configured to run on GPU.
  • The provided model is trained using the following datasets, composed of vocal and instrumental audio, and is therefore expected to work best on this type of audio signal.
    • MIR-1K [1]
    • Bach10 [2]
    • RWC-Synth [3]
    • MedleyDB [4]
    • MDB-STEM-Synth [5]
    • NSynth [6]

References

[1] C.-L. Hsu et al. "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset", IEEE Transactions on Audio, Speech, and Language Processing. 2009.

[2] Z. Duan et al. "Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions", IEEE Transactions on Audio, Speech, and Language Processing. 2010.

[3] M. Mauch et al. "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2014.

[4] R. M. Bittner et al. "MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research", Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference. 2014.

[5] J. Salamon et al. "An Analysis/Synthesis Framework for Automatic F0 Annotation of Multitrack Datasets", Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference. 2017.

[6] J. Engel et al. "Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders", arXiv preprint: 1704.01279. 2017.

crepe's Issues

Add option to select GPU.

Hi, when I use CREPE, it automatically selects a GPU that is currently being used by other jobs. As a result, it returns an out-of-memory error even when another GPU is free.

If the user could select which GPU to use, this kind of error could be avoided.

About training and prediction

Hello,
First of all thanks for the amazing paper and the repo !!
I have a basic question: the RWC dataset says that the annotated data is at semitone intervals, that is, 50 cents.
How is CREPE able to predict with 10 or 20 cent intervals?

Does the training data in the paper differ from the provided models on Github?

Hi,

thanks for sharing your work! 🚀

I'm a bit confused which dataset and instruments were actually used for the provided models on Github.

In the paper, only RWC-synth and MDB-stem-synth are specified as training datasets. In the conclusion, training on NSynth is mentioned as future work.

In the README on Github:

The provided model is trained using the following datasets, composed of vocal and instrumental audio, and is therefore expected to work best on this type of audio signals.
* MIR-1K [1]
* Bach10 [2]
* RWC-Synth [3]
* MedleyDB [4]
* MDB-STEM-Synth [5]
* NSynth [6]

How did you determine the ground-truth f0 for the datasets that have only an annotated midi pitch, e.g. NSynth?

Any training code?

Hi, I am wondering if there is any well-organized training code from the authors that is open-sourced? I think there is only prediction code in the GitHub repository.

f0 estimation in silent period

Hi, I have a question about using CREPE to estimate f0 on a human voice dataset. I found that the f0 fluctuates even during the audio's silent periods. Is this normal? I am not sure whether f0 should be constant during silence. Is it because CREPE is not suitable for estimating human voice?
I also found in the literature that female f0 should be around 186 Hz, but here I got around 220 Hz, and sometimes even higher (300 Hz).
ๆˆชๅฑ2020-06-18 23 24 12

fix requirements.txt

Hi,
When I try to install crepe v0.0.9 I get the following error:

Could not find a version that satisfies the requirement tensorflow==2.0 (from versions: 0.12.0rc0, 0.12.0rc1, 0.12.0, 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0, 1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0, 1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
I have all the latest versions of tensorflow, python and pip installed and am running on Ubuntu 18.04.02

Maybe the tensorflow==2.0 line in the requirements file should read tensorflow>=2.0.0a0? I installed version 7 though and everything worked just fine.

Viterbi algorithm does not apply to activation probabilities

I would like to use the output of CREPE to determine whether the singer is active or silent at the perceptual level. That should change on the scale of seconds, not milliseconds. Setting a hard threshold on the confidence, though, results in rapid alternation between the two states. The alternation shows up as the thick vertical lines in the plots below.

Viterbi would be a straightforward approach to smoothing this out. The current version, though, only applies smoothing to the pitch. I wrote an extension and added it to a pull request in case it would be useful for others: #26.

[two screenshots of the voicing plots]

Code for this plot:

import csv
import matplotlib.pyplot as plt
import numpy as np

f0 = []
conf = []
thresh = 0.5

with open('MUSDB18HQ/train/Music Delta - Hendrix/vocals.f0.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            f0.append(float(row[1]))
            conf.append(float(row[2]))
            line_count += 1
    print(f'Processed {line_count} lines.')

voiced = [1 if c > thresh else 0 for c in conf]
# plt.plot(np.array(f0) * np.array(voiced))
plt.plot(np.array(voiced))
plt.show()

ERROR: Command errored out with exit status 1:

It's throwing an error. How do I deal with it?


pip install crepe

Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting crepe
Downloading http://mirrors.aliyun.com/pypi/packages/c8/74/1677b9369f233745b3dedf707ce26fb935c5c400379c45400df818f3a805/crepe-0.0.11.tar.gz (15 kB)

ERROR: Command errored out with exit status 1:
 command: 'd:\program files\python\python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = 
 '"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"'; __file__='"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 
'C:\Users\mac\AppData\Local\Temp\pip-pip-egg-info-ky0m23t0'

     cwd: C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\
Complete output (57 lines):
C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "d:\program files\python\python38\lib\urllib\request.py", line 1350, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "d:\program files\python\python38\lib\http\client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "d:\program files\python\python38\lib\http\client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "d:\program files\python\python38\lib\http\client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "d:\program files\python\python38\lib\http\client.py", line 1006, in _send_output
    self.send(msg)
  File "d:\program files\python\python38\lib\http\client.py", line 946, in send
    self.connect()
  File "d:\program files\python\python38\lib\http\client.py", line 1402, in connect
    super().connect()
  File "d:\program files\python\python38\lib\http\client.py", line 917, in connect
    self.sock = self._create_connection(
  File "d:\program files\python\python38\lib\socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "d:\program files\python\python38\lib\socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py", line 30, in <module>
    urlretrieve(base_url + compressed_file, compressed_path)
  File "d:\program files\python\python38\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "d:\program files\python\python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "d:\program files\python\python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "d:\program files\python\python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "d:\program files\python\python38\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "d:\program files\python\python38\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "d:\program files\python\python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "d:\program files\python\python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "d:\program files\python\python38\lib\urllib\request.py", line 1393, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "d:\program files\python\python38\lib\urllib\request.py", line 1353, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Downloading weight file model-tiny.h5.bz2 ...
----------------------------------------

ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Improperly cached cents conversion matrix

Hi,
In these lines of crepe/core.py (lines 100 to 103 at commit 7f0bf05):

if not hasattr(to_local_average_cents, 'cents_mapping'):
    # the bin number-to-cents mapping
    to_local_average_cents.mapping = (
        np.linspace(0, 7180, 360) + 1997.3794084376191)

the attribute used to check for the cached conversion matrix, cents_mapping, is not the attribute that is actually used for caching (mapping is used).
This makes the if statement always pass, so the value is recomputed and re-cached every time the function is called.
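
A minimal sketch of one possible fix, making the hasattr guard and the cached attribute use the same name (either name works, as long as the two match and the rest of the function reads the same attribute):

if not hasattr(to_local_average_cents, 'cents_mapping'):
    # the bin number-to-cents mapping, computed once and cached on the function
    to_local_average_cents.cents_mapping = (
        np.linspace(0, 7180, 360) + 1997.3794084376191)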

Model not loaded when crepe used directly in python

When using crepe directly in python (via import crepe), calling crepe.predict() crashes because the keras model is never loaded (build_and_load_model() is never called). The result is the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-38145eddd20a> in <module>()
----> 1 (time, frequency, confidence, activation) = crepe.predict(audio, sr, viterbi=True)

~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in predict(audio, sr, viterbi)
    192             The raw activation matrix
    193     """
--> 194     activation = get_activation(audio, sr)
    195     confidence = activation.max(axis=1)
    196 

~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in get_activation(audio, sr)
    162 
    163     # run prediction and convert the frequency bin weights to Hz
--> 164     return model.predict(frames, verbose=1)
    165 
    166 

AttributeError: 'NoneType' object has no attribute 'predict'

The fix is to add a check in get_activation() and load the model if it's None. Imminent fix coming via bz2model branch.
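
For reference, the guard could look roughly like this (a sketch using the names mentioned in this issue, not the actual patch):

def get_activation(audio, sr):
    # Lazily build and load the Keras model on first use.
    if model is None:
        build_and_load_model()
    # ... frame the audio and call model.predict(frames) as before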

Timestamps not zero-centered?

The first timestamp returned by CREPE is 0.0, but the first frame (I think) covers samples 0-1023, meaning the first timestamp should actually be 512/16000 = 0.032 s = 32 ms.

Or, a better solution: let's zero-pad the input signal by 512 samples, so that the first frame is indeed centered at time 0.
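
A sketch of the suggested padding (512 samples is half of the 1024-sample analysis window; the shipped fix may differ in detail):

import numpy as np

# Prepend half a window of zeros so the first 1024-sample frame is centered at time 0.
audio = np.concatenate([np.zeros(512, dtype=audio.dtype), audio])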

Incorrect output shape on some file sizes

Running crepe on a 16kHz mono wav file with 102400 samples and a step size of 10ms produces 641 pitch estimates instead of 640 (both via command line and Python interface). We'd expect a hop size of 16000 / 1000 * 10 == 160 samples for a step size of 10ms. An audio clip with 102400 samples should have 102400 / 160 == 640 estimates.

You can create a synthetic audio clip to reproduce:

import crepe
import numpy as np

x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)

time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True)
# observe len(frequency) == 641
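
A plausible accounting of the extra frame (a sketch, assuming the default centering pads 512 samples on each side and the frame-count formula quoted in the "Bug in wave normalization" issue below):

hop_length = 160                                     # 10 ms at 16 kHz
padded_len = 102400 + 2 * 512                        # centering pads half a window per side
n_frames = 1 + int((padded_len - 1024) / hop_length)
print(n_frames)                                      # 641 -- the leading "1 +" yields an
                                                     # extra frame when the division is exact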

imshow is deprecated, replace with matplotlib

imshow is deprecated, we should use matplotlib for saving the figure to disk (we're already importing it, so might as well).

An aside is that I don't know if it's a good idea to make matplotlib a dependency... it's a pretty huge lib.

Add tensorflow to requirements.txt?

The model was trained using TF, and we don't really know if/how it'd work with a different backend. Perhaps we should add tensorflow as a requirement, and set as a minimum version the TF version used to train the model?

Option to use a smaller model for faster computation

Currently the model capacity multiplier is fixed to 32, but this can be adjusted as a trade-off between computation time and accuracy. Roughly speaking, the number of parameters is quadratic in this multiplier.

The one that is deployed on https://marl.github.io/crepe uses model multiplier 4, and still achieves quite comparable performance:

multiplier   #params    RPA
        32    22.24M    93.75%
        16    5.879M    93.22%
         8    1.629M    92.47%
         4      486k    91.52%

(note that these numbers are on MedleyDB v1 and not comparable to what's reported in the paper)

We'd like to have an option to select a smaller model, for faster computation at the cost of slightly lower accuracy.

I'd suggest the following options for specifying the model capacity.

CLI Option                 multiplier   # of params   Model file size
--model-capacity full              32           22M             88 MB
--model-capacity large             24           12M             48 MB
--model-capacity medium            16          5.9M             24 MB
--model-capacity small              8          1.6M            6.4 MB
--model-capacity tiny               4          486k            1.9 MB

Currently the size of the PyPI archive is 57.7 MB, very close to the 60 MB limit, so it is only possible to add the tiny model for immediate upload to PyPI. Requesting a quota increase on PyPI is possible, but it seems quite difficult and it is not certain they will grant it.

We can alternatively put the models on a separate branch in this repo, and have the code download the models during the installation or first use.

Let me do the former first (adding tiny), and figure out how to add the other three later.

Viterbi tracking raises ValueError in hmmlearn

Hello Maintainers,

Calling crepe arctic_a0001.wav -V on the famous "Author of the danger trail" test sentence from the Arctic database resulted in the following error. Calling it without -V runs fine.

This is CREPE 0.0.4 installed from pip.

329/329 [==============================] - 7s 20ms/step
Traceback (most recent call last):
  File "/home/sleepwalking/contrib/miniconda2/bin/crepe", line 11, in <module>
    sys.exit(main())
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 115, in main
    args.save_activation, args.save_plot, args.plot_voicing)
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 63, in run
    save_activation, save_plot, plot_voicing)
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 252, in process_file
    time, frequency, confidence, activation = predict(audio, sr, viterbi)
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 202, in predict
    cents = to_viterbi_cents(activation)
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 122, in to_viterbi_cents
    path = model.predict(observations.reshape(-1, 1), [len(observations)])
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 334, in predict
    _, state_sequence = self.decode(X, lengths)
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 294, in decode
    self._check()
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/hmm.py", line 394, in _check
    super(MultinomialHMM, self)._check()
  File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 524, in _check
    .format(self.transmat_.sum(axis=1)))
ValueError: rows of transmat_ must sum to 1.0 (got [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0])

imageio uninstall issue while installing CREPE via pip

At least on Ubuntu bionic (18.04) with pip 9.x, I see

DEPRECATION: Uninstalling a distutils installed project (imageio) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.

and in pip 10, the installation fails because of this.

We need a way to not trigger the uninstallation of imageio

Add option to suppress TF printouts

Depending on your setup TF can get quite verbose, e.g.:

Using TensorFlow backend.
/Users/justin/Documents/dev/miniconda3/envs/py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
  return f(*args, **kwds)
2018-07-13 11:30:52.866593: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

While I think we should keep these messages by default, it might be nice to add an optional argument to suppress them. Thoughts?

Segfault if I don't call predict early in my script

So I was trying to test the network on some streaming data, but I encountered a funny bug. If I call predict once before doing anything else in the script, it works fine. If I don't, it segfaults (at least on my machine) with:

Using TensorFlow backend.
Aborted (core dumped)
import numpy as np
import sounddevice as sd
import queue
import sys
import crepe

if __name__ == "__main__":
    fs = 16000
    frameSize = 2048

    # ------- IF I REMOVE THIS LINE IT SEGFAULTS ---------
    time, frequency, confidence, activation = crepe.predict(np.zeros(2048), fs, viterbi=False)

    q = queue.Queue()


    def audio_callback(indata, frames, time, status):
        if status:
            print(status, file=sys.stderr)
        q.put(indata)

    stream = sd.InputStream(device=None, channels=1, samplerate=fs, callback=audio_callback)
    recdata = np.zeros(frameSize, np.float64)


    with stream:
        while True:
            try:
                data = q.get_nowait()
            except queue.Empty:
                continue
            shift = len(data)
            recdata = np.roll(recdata, -shift, axis=0)
            recdata[-shift:] = data[:, 0]
            time, frequency, confidence, activation = crepe.predict(recdata, fs, viterbi=False)

I have no clue what might cause this... Maybe some conflict with sounddevice initialization?

Any API in C++?

Dear maintainers,
Is there any API for C++?
I want to use C++ in my project.
Could you help me, please?
Thanks

make it into a module

  • move the content of crepe.py into the subdirectory crepe
  • only expose the public API crepe.predict and crepe.get_activation
  • write setup.py so that setuptools can install the module
  • include the entry point in the module and in setup.py so that the CLI still works
  • upload crepe to PyPI as version 0.0.x

Classification vs regression

Thanks for the great repo!
I have a typical question: have you tried solving this as a regression problem, e.g., predicting the pitch index (or cents, or Hz) directly? I'd appreciate it if you could share your research experience around this.

Optimize for CPU inference

Inference on CPU is very slow right now (often too slow for practical application).

I think TensorFlow already uses as many CPU cores as it has access to when running in CPU mode (?), so I'm not sure whether e.g. splitting the audio track and parallelizing inference via multiprocessing or joblib would make any difference.

But it might be worth checking out the TensorFlow documentation on performance, such as the performance guide or the info on model quantization.

about dataset

When training, should the audio clips be padded to the same length, or are the generated clips already all the same length?

Computing Target from Frequency

Hi there!

CREPE is great! I am doing my own experiments, trying to learn more.

I wonder, how exactly do you compute the 360-bin target vector given a frequency?

Best,
Tristan

Performance on Instrumental Audio Tracks

Is CREPE basically trained on vocals (speech)? I tried tracking pitches on the separated vocal and instrumental tracks of the same song, and it resulted in considerably different pitches at the same time points for the vocal and instrumental tracks.

Step_size is not an int, also why not give access to hop_length?

Step_size does not need to be an int; floats work just fine and may be necessary if you are targeting a certain number of resulting samples. I'm currently trying to use this to replace another scheme that uses the hop length directly, so for me it would have been useful to set the value directly instead of back-calculating the step_size to match the needed hop_length.

Otherwise, very impressive; it works very well.
Thanks

when pitch is 0?

When the pitch in the audio is zero, the output is the most likely pitch rather than zero. Why does the model not take silence into consideration?

Frozen Model Conversion Failed

I'm trying to convert the keras models you've provided to frozen graphs so they can be used in a C# application. When I do this it throws an error when I try to load the model during conversion:

raise ValueError('Cannot create group in read-only mode.')
ValueError: Cannot create group in read-only mode.

I've found some discussions suggesting this means only the model's weights are included, not its architecture and dimensions.
Could you include a JSON file with the model architecture and dimensions in the models branch, or include frozen models?

Polyphonic transcription

I was wondering whether the CREPE model would allow transcription of instruments playing polyphonic notes (for example, chords played on a violin). Maybe changing the model somehow and retraining it would allow it to handle polyphony?

Instructions for real-time pitch detection

I'm interested in using crepe in an online app. I'd like to use crepe in a very similar way to how it is used on the example website: marl.github.io/crepe/. The README shows how to use it on WAV files; is there somewhere I can find instructions on how to use it on real-time microphone input?

Speech

Since this model is largely trained on songs and instrumentals, will it work reliably for speech analysis? Why not train the model on regular speech?

Larger values for step_size

Hi, I noticed in this package, as well as in some similar ones, that the step_size is very small, for example 10 ms. What would happen if the step_size were set to, for example, 1000 ms or even 2500 ms? Would this give similar results to aggregating all of the 10 ms estimates within each 1000 ms window? The reason is that I am interested in an average pitch every second or every few seconds and don't need to know the pitch for every 10 ms. As far as I can see, this would also reduce the computation time.
But is there something logically flawed in taking larger values for the step_size?
Thank you and kind regards!

An option to specify the timestep (hop size) for prediction

At the moment the predictions are calculated for every 10 milliseconds.

It'd be good to have an option to specify this interval, to trade off between time precision and the time it takes to run the prediction.

Since we use 16kHz, 1 millisecond corresponds to 16 samples, and we'd get integer hop sizes for time steps given in integer multiples of milliseconds.

Bug in wave normalization.

Hi, thank you for sharing this awesome work.

I've encountered the code below, and I don't think it is correct.

crepe/core.py

    frames = as_strided(audio, shape=(1024, n_frames),
                        strides=(audio.itemsize, hop_length * audio.itemsize))
    frames = frames.transpose()

    # normalize each frame -- this is expected by the model
    frames -= np.mean(frames, axis=1)[:, np.newaxis]
    frames /= np.std(frames, axis=1)[:, np.newaxis]

This code is meant to normalize the waveform frame by frame, but numpy's as_strided function does not allocate new memory; it just creates a new view of the array (for example, the memory of the first frame and the memory of the second frame is not separate).

See numpy documentation for more details.

So when an "in-place" calculation is done on that array, unintended results may occur. Here is an example.

np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())
146.7856165123937

Since each frame is normalized, the sum should be about zero, but it's not.

To avoid this behavior, the target array should be copied so that the operation works as intended. Here is an example; only one line is changed (.copy()).

np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )

# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
                    strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose().copy()

# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]

print(frames.sum())
-2.384759056894836e-13
  • This code is tested with numpy version 1.17.4

Regression tests

As we update dependencies (e.g. TF), there's a real risk of changed behavior going unnoticed (e.g. performance drop). We should add regression tests to ensure the output remains consistent.
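
As a starting point, here is a sketch of one such test; the tolerance and test name are illustrative, not part of the project's actual test suite:

import numpy as np
import crepe

def test_sine_440hz_regression():
    # One second of a 440 Hz sine at 16 kHz; the predicted pitch in the
    # middle (clearly voiced) frames should stay close to 440 Hz.
    sr = 16000
    t = np.arange(sr) / sr
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)

    time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=False)

    mid = frequency[10:-10]                           # ignore edge frames
    assert abs(float(np.median(mid)) - 440.0) < 5.0   # within ~20 cents of 440 Hz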

sklearn issued warning message for passing attributes to `check_is_fitted`

sklearn issued a warning message about passing attributes to check_is_fitted, as follows:

FutureWarning: Passing attributes to check_is_fitted is deprecated and will be removed in 0.23. The attributes argument is ignored.
"argument is ignored.", FutureWarning)

The sklearn version that issued this warning: 0.22

Ways to reproduce:

  1. Use the package version above
  2. Run f0 estimation
import scipy.io.wavfile
import crepe

rate, audio = scipy.io.wavfile.read('/your wav file.wav')
crepe.predict(audio, rate, viterbi=True)

This should not affect the prediction output, am I right?

How to train the model by myself with the method

Dear maintainers,
When I use
crepe audio_file.wav
to track piano pitches,
the results are not good at low frequencies, so I would like to train the model myself using the proposed DCNN.
Please help me!
Thanks

WavFileWarning: Chunk (non-data) not understood, skipping it.

When trying to load a file to analyse:
env DEBUGPY_LAUNCHER_PORT=44929 /usr/bin/python3 /home/hatmore/.vscode/extensions/ms-python.python-2020.3.71659/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /home/hatmore/Desktop/pyCodes/VocalRange/test.py
/home/hatmore/Desktop/pyCodes/VocalRange/test.py:4: WavFileWarning: Chunk (non-data) not understood, skipping it.
sr, audio = wavfile.read('test.wav')

Add more unit tests

Currently only process_file is unit tested. We need to add tests for all other functions, including those exposed by the API (predict and get_activation) and those that aren't.
