marl / crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Home Page: https://marl.github.io/crepe/
License: MIT License
Running crepe on a 16 kHz mono wav file with 102400 samples and a step size of 10 ms produces 641 pitch estimates instead of 640 (both via the command line and the Python interface). We'd expect a hop size of 16000 / 1000 * 10 == 160 samples for a step size of 10 ms, so an audio clip with 102400 samples should yield 102400 / 160 == 640 estimates.
You can create a synthetic audio clip to reproduce:
import crepe
import numpy as np
x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)
time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True)
# observe len(frequency) == 641
When using crepe directly in Python (via import crepe), calling crepe.predict() crashes because the Keras model is never loaded (build_and_load_model() is never called). The result is the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-38145eddd20a> in <module>()
----> 1 (time, frequency, confidence, activation) = crepe.predict(audio, sr, viterbi=True)
~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in predict(audio, sr, viterbi)
192 The raw activation matrix
193 """
--> 194 activation = get_activation(audio, sr)
195 confidence = activation.max(axis=1)
196
~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in get_activation(audio, sr)
162
163 # run prediction and convert the frequency bin weights to Hz
--> 164 return model.predict(frames, verbose=1)
165
166
AttributeError: 'NoneType' object has no attribute 'predict'
The fix is to add a check in get_activation() and load the model if it's None. Imminent fix coming via the bz2model branch.
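For reference, a minimal sketch of that kind of guard, assuming the module-level model variable and build_and_load_model() from crepe/core.py (this is an illustration, not the actual patch):

# Hypothetical sketch of the lazy-loading guard, not the fix from the bz2model branch.
model = None  # module-level cache, as in crepe/core.py

def build_and_load_model():
    """Stand-in for crepe.core.build_and_load_model(); assumed to populate `model`."""
    global model
    model = object()  # placeholder for the loaded Keras model

def get_activation(audio, sr):
    global model
    if model is None:  # the missing check: load the model on first use
        build_and_load_model()
    # ... frame the audio and return model.predict(frames) as before ...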
The model was trained using TF, and we don't really know if or how it would work with a different backend. Perhaps we should add tensorflow as a requirement, and set the minimum version to the TF version used to train the model?
Hi,
When I try to install crepe v0.0.9 I get the following error:
Could not find a version that satisfies the requirement tensorflow==2.0 (from versions: 0.12.0rc0, 0.12.0rc1, 0.12.0, 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0, 1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0, 1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
I have all the latest versions of tensorflow, python and pip installed and am running on Ubuntu 18.04.2.
Maybe the tensorflow==2.0 line in the requirements file should read tensorflow>=2.0.0a0? I installed version 0.0.7, though, and everything worked just fine.
I was wondering whether the CREPE model would allow transcription of instruments playing polyphonic notes (for example, chords played by a violin). Maybe changing the model somehow and retraining it would allow it to handle polyphony?
At the moment the predictions are calculated every 10 milliseconds.
It'd be good to have an option to specify this interval, to trade off time precision against the time it takes to run the prediction.
Since we use 16 kHz, 1 millisecond corresponds to 16 samples, so we'd get integer hop sizes for step sizes given in integer multiples of milliseconds.
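For illustration, a small sketch of how a configurable step size in milliseconds could map to an integer hop size at 16 kHz (the function name is just an assumption):

MODEL_SRATE = 16000  # CREPE's model sample rate

def hop_length_for(step_size_ms):
    """Convert a step size in milliseconds to a hop size in samples at 16 kHz."""
    return int(MODEL_SRATE * step_size_ms / 1000)

assert hop_length_for(10) == 160   # the current default
assert hop_length_for(25) == 400   # 25 ms -> 400 samples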
So I was trying to test the network on some streaming data, but I encountered a funny bug. If I call predict once before doing anything else in the script, it works fine. If I don't, it segfaults (at least on my machine) with:
Using TensorFlow backend.
Aborted (core dumped)
import numpy as np
import sounddevice as sd
import queue
import sys
import crepe
if __name__ == "__main__":
    fs = 16000
    frameSize = 2048

    # ------- IF I REMOVE THIS LINE IT SEGFAULTS ---------
    time, frequency, confidence, activation = crepe.predict(np.zeros(2048), fs, viterbi=False)

    q = queue.Queue()

    def audio_callback(indata, frames, time, status):
        if status:
            print(status, file=sys.stderr)
        q.put(indata)

    stream = sd.InputStream(device=None, channels=1, samplerate=fs, callback=audio_callback)
    recdata = np.zeros(frameSize, np.float64)

    with stream:
        while True:
            try:
                data = q.get_nowait()
            except queue.Empty:
                continue
            shift = len(data)
            recdata = np.roll(recdata, -shift, axis=0)
            recdata[-shift:] = data[:, 0]
            time, frequency, confidence, activation = crepe.predict(recdata, fs, viterbi=False)
I have no clue what might cause this... Maybe some conflict with sounddevice initialization?
When training, should the audio clips be padded to the same length, or should the generated clips already be the same length?
I would like to use the output of Crepe to determine whether a singer is active versus silent at the perceptual level. That should change at the level of seconds, not milliseconds. Setting a hard threshold based on confidence, though, results in a quick alternation between the two states. The alternation shows as the thick vertical lines in the plots below.
Viterbi would be a straightforward approach to smoothing this out. The current version, though, only applies smoothing to the pitch. I wrote an extension and added it to a pull request in case it would be useful for others: #26.
Code for this plot:
import csv
import matplotlib.pyplot as plt
import numpy as np
f0 = []
conf = []
thresh = 0.5
with open('MUSDB18HQ/train/Music Delta - Hendrix/vocals.f0.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            f0.append(float(row[1]))
            conf.append(float(row[2]))
            line_count += 1
    print(f'Processed {line_count} lines.')

voiced = [1 if c > thresh else 0 for c in conf]
# plt.plot(np.array(f0) * np.array(voiced))
plt.plot(np.array(voiced))
plt.show()
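As a rougher alternative to the Viterbi extension in #26, the thresholded confidence could also be smoothed directly, for instance with a median filter spanning roughly a second. A sketch, not part of crepe:

import numpy as np
from scipy.signal import medfilt

def smooth_voicing(confidence, thresh=0.5, kernel_frames=101):
    """Median-filter the thresholded confidence; 101 frames is about 1 s at a 10 ms step."""
    voiced = (np.asarray(confidence) > thresh).astype(float)
    return medfilt(voiced, kernel_size=kernel_frames)

# e.g. voiced_smooth = smooth_voicing(conf) with the `conf` list read above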
As we update dependencies (e.g. TF), there's a real risk of changed behavior going unnoticed (e.g. performance drop). We should add regression tests to ensure the output remains consistent.
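A minimal sketch of what such a regression test could look like; the reference file path and the tolerance are assumptions:

import numpy as np
import crepe

def test_predict_matches_reference():
    # Deterministic synthetic input so the expected output can be generated once and stored.
    np.random.seed(0)
    audio = np.random.uniform(-1, 1, size=2 * 16000).astype(np.float32)
    _, frequency, _, _ = crepe.predict(audio, 16000, viterbi=False)
    reference = np.load('tests/reference_frequency.npy')  # hypothetical stored output
    np.testing.assert_allclose(frequency, reference, rtol=1e-3)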
How can I implement a real-time tracker using crepe, like https://marl.github.io/crepe/ ? As described in the README, crepe only takes a wav file to do pitch tracking.
Hi, I am wondering if there is any well-organized training code by the authors that is open-source? I think there is only prediction code in this GitHub repo.
sklearn issued a warning message for passing attributes to check_is_fitted, as follows:
FutureWarning: Passing attributes to check_is_fitted is deprecated and will be removed in 0.23. The attributes argument is ignored.
"argument is ignored.", FutureWarning)
The sklearn version that issued this warning: 0.22
Way to reproduce:
import scipy.io.wavfile
import crepe

rate, audio = scipy.io.wavfile.read('/your wav file.wav')
crepe.predict(audio, rate, viterbi=True)
This should not affect the prediction output, am I right?
Hi, I have a question about using CREPE to estimate f0 on a human-voice dataset. I found that the f0 fluctuates even during the audio's silent periods. Is this normal? I am not sure whether f0 should be constant during silence. Is it because CREPE is not suitable for estimating the human voice?
I also found in this literature that female f0 should be around 186, but I got around 220 here, and sometimes even higher (300).
At least in Ubuntu bionic (18.04) and pip 9.x, I see
DEPRECATION: Uninstalling a distutils installed project (imageio) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
and in pip 10, the installation fails because of this.
We need a way to avoid triggering the uninstallation of imageio.
When trying to load a file to analyse:
env DEBUGPY_LAUNCHER_PORT=44929 /usr/bin/python3 /home/hatmore/.vscode/extensions/ms-python.python-2020.3.71659/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /home/hatmore/Desktop/pyCodes/VocalRange/test.py
/home/hatmore/Desktop/pyCodes/VocalRange/test.py:4: WavFileWarning: Chunk (non-data) not understood, skipping it.
sr, audio = wavfile.read('test.wav')
Is it okay to use the stripped down model (from https://marl.github.io/crepe/) in other applications (with attribution)?
... by directly calling the crepe command
Dear maintainers,
Is there any API for C++? I just want to use C++ in my project. Could you help me, please?
Thanks
Hello Maintainers,
Calling crepe arctic_a0001.wav -V on the famous "Author of the danger trail" test sentence from the Arctic database resulted in the following error. Calling without -V runs fine.
This is CREPE 0.0.4 installed from pip.
329/329 [==============================] - 7s 20ms/step
Traceback (most recent call last):
File "/home/sleepwalking/contrib/miniconda2/bin/crepe", line 11, in <module>
sys.exit(main())
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 115, in main
args.save_activation, args.save_plot, args.plot_voicing)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 63, in run
save_activation, save_plot, plot_voicing)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 252, in process_file
time, frequency, confidence, activation = predict(audio, sr, viterbi)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 202, in predict
cents = to_viterbi_cents(activation)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 122, in to_viterbi_cents
path = model.predict(observations.reshape(-1, 1), [len(observations)])
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 334, in predict
_, state_sequence = self.decode(X, lengths)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 294, in decode
self._check()
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/hmm.py", line 394, in _check
super(MultinomialHMM, self)._check()
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 524, in _check
.format(self.transmat_.sum(axis=1)))
ValueError: rows of transmat_ must sum to 1.0 (got [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0])
Inference on CPU is very slow right now (often too slow for practical application).
I think TensorFlow already uses as many CPU cores as it has access to when running in CPU mode (?), so I'm not sure whether e.g. splitting the audio track and parallelizing inference via multiprocessing or joblib would make any difference.
But it might be worth checking out the TensorFlow performance guide or the info on model quantization.
I'm trying to convert the keras models you've provided to frozen graphs so they can be used in a C# application. When I do this it throws an error when I try to load the model during conversion:
raise ValueError('Cannot create group in read-only mode.')
ValueError: Cannot create group in read-only mode.
I've found some discussions suggesting this means only the model's weights are included, not its architecture and dimensions.
Could you include a JSON file with the model architecture and dimensions in the models branch, or include frozen models?
Hello,
First of all thanks for the amazing paper and the repo !!
I have a basic doubt: the RWC dataset says that its annotated data is at semitone intervals, that is, 50 cents.
How is CREPE able to predict at 10 or 20 cent intervals?
I got an error. How should I handle it?
pip install crepe
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting crepe
Downloading http://mirrors.aliyun.com/pypi/packages/c8/74/1677b9369f233745b3dedf707ce26fb935c5c400379c45400df818f3a805/crepe-0.0.11.tar.gz (15 kB)
ERROR: Command errored out with exit status 1:
command: 'd:\program files\python\python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] =
'"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"'; __file__='"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base
'C:\Users\mac\AppData\Local\Temp\pip-pip-egg-info-ky0m23t0'
cwd: C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\
Complete output (57 lines):
C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "d:\program files\python\python38\lib\urllib\request.py", line 1350, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "d:\program files\python\python38\lib\http\client.py", line 1240, in request
self._send_request(method, url, body, headers, encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1286, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1235, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1006, in _send_output
self.send(msg)
File "d:\program files\python\python38\lib\http\client.py", line 946, in send
self.connect()
File "d:\program files\python\python38\lib\http\client.py", line 1402, in connect
super().connect()
File "d:\program files\python\python38\lib\http\client.py", line 917, in connect
self.sock = self._create_connection(
File "d:\program files\python\python38\lib\socket.py", line 787, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "d:\program files\python\python38\lib\socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py", line 30, in <module>
urlretrieve(base_url + compressed_file, compressed_path)
File "d:\program files\python\python38\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "d:\program files\python\python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "d:\program files\python\python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "d:\program files\python\python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "d:\program files\python\python38\lib\urllib\request.py", line 563, in error
result = self._call_chain(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "d:\program files\python\python38\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "d:\program files\python\python38\lib\urllib\request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 1393, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "d:\program files\python\python38\lib\urllib\request.py", line 1353, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Downloading weight file model-tiny.h5.bz2 ...
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Hi, I noticed in this package, as well as in some similar ones, that the step_size is very small, for example 10 ms. What would happen if the step_size were set to, for example, 1000 ms or even 2500 ms? Would this give similar results to aggregating all the 10 ms estimates within each 1000 ms window? The reason is that I am interested in an average pitch every second or every few seconds and don't need to know the pitch for every 10 ms. This would also reduce the calculation time, as far as I can see.
But is there something logically flawed in taking larger values for the step_size?
Thank you and kind regards!
We should add guidelines + templates for posting issues and contributing code.
There are examples for both in Librosa we could work off of:
https://github.com/librosa/librosa/blob/master/CONTRIBUTING.md
@jongwook I'm happy to put this together if it sounds good to you?
On core.py#L149 we're passing the argmax of the pitch salience matrix as observations to the decoder, but shouldn't we be passing the salience matrix directly? <-- @jongwook
Thanks for the great repo!
I have this typical question: have you tried to solve it as a regression problem, e.g., predicting the pitch directly (as a bin index, in cents, in Hz, or whatever)? I'd appreciate it if you could share your research experience around it.
Hi there!
CREPE is great! I am doing my own experiments, trying to learn more.
I wonder, how exactly do you compute the target vector (360-cents) given a frequency?
Best,
Tristan
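Not an authoritative answer, but the paper describes the target as a Gaussian blur (sigma = 25 cents) around the true pitch over the 360 bins spaced 20 cents apart. A sketch of that computation, treating the exact bin layout (following the cents mapping in crepe/core.py) as an assumption:

import numpy as np

# 360 pitch bins in 20-cent steps, as in the cents mapping in crepe/core.py
cents_bins = np.linspace(0, 7180, 360) + 1997.3792084381023

def target_vector(frequency_hz, sigma_cents=25.0):
    """Gaussian-blurred target over the 360 bins for a given true frequency."""
    true_cents = 1200.0 * np.log2(frequency_hz / 10.0)  # cents relative to 10 Hz
    return np.exp(-(cents_bins - true_cents) ** 2 / (2.0 * sigma_cents ** 2))

y = target_vector(440.0)  # A4; the peak lands at the bin closest to 440 Hz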
Since this model is largely trained on songs and instrumentals, will it work reliably for speech analysis? Why not train the model on regular speech?
When the pitch in the audio is zero (silence), the output is the most likely pitch rather than zero. Why does the model not take silence into consideration?
Hi, when I use CREPE, it automatically selects a GPU that is currently used by other jobs. Therefore, it returns an out-of-memory error even if there is another GPU that is not in use.
If the user could select which GPU to use, this kind of error could be avoided.
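Until such an option exists, a common workaround is to restrict which GPUs TensorFlow can see before it is initialized, via CUDA_VISIBLE_DEVICES (the device index "1" below is just an example):

import os

# Must be set before TensorFlow is initialized, i.e. before importing crepe.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # example: expose only the second GPU

import crepe  # noqa: E402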
Step_size does not need to be an int; floats work just fine and may be necessary if you are targeting a certain number of resulting samples. I'm currently trying to use this to replace another scheme that uses the hop length directly, so for me it would have been useful to set that value directly instead of back-calculating the step_size to match the needed hop_length.
Otherwise: very impressive, works very well.
Thanks
Currently only process_file is unit tested. We need to add tests for all other functions, including those exposed by the API (predict and get_activation) and those that aren't.
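For example, a basic test of predict on a synthetic tone might look like this (a sketch; the thresholds are arbitrary):

import numpy as np
import crepe

def test_predict_sine():
    sr = 16000
    t = np.arange(sr) / sr
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of A4
    time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=False)
    # Most frames should land close to 440 Hz with reasonable confidence.
    assert np.median(np.abs(frequency - 440.0)) < 10.0
    assert confidence.mean() > 0.5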
imshow is deprecated; we should use matplotlib for saving the figure to disk (we're already importing it, so we might as well).
An aside: I don't know if it's a good idea to make matplotlib a dependency... it's a pretty huge library.
Currently the model capacity multiplier is fixed to 32, but this can be adjusted as a trade-off between the computation time and accuracy. Roughly speaking, the number of parameters is quadratic to this multiplier.
The one that is deployed on https://marl.github.io/crepe uses model multiplier 4, and still achieves quite comparable performance:
multiplier | #params | RPA
---|---|---
32 | 22.24M | 93.75%
16 | 5.879M | 93.22%
8 | 1.629M | 92.47%
4 | 486k | 91.52%
(note that these numbers are on MedleyDB v1 and not comparable to what's reported in the paper)
We'd like to have an option to select a smaller model, for faster calculation at the cost of slightly lower accuracy.
I'd suggest the following options for specifying the model capacity.
CLI Option | multiplier | # of params | Model file size
---|---|---|---
--model-capacity full | 32 | 22M | 88 MB
--model-capacity large | 24 | 12M | 48 MB
--model-capacity medium | 16 | 5.9M | 24 MB
--model-capacity small | 8 | 1.6M | 6.4 MB
--model-capacity tiny | 4 | 486k | 1.9 MB
Currently the size of the PyPI archive is 57.7 MB, very close to the 60 MB limit, so only the tiny model can be added for immediate upload to PyPI. Requesting a quota increase on PyPI is possible, but it seems quite difficult and it's not clear whether they will allow the increase.
We can alternatively put the models on a separate branch in this repo, and have the code download them during installation or on first use.
Let me do the former first (adding tiny), and figure out how to add the other three later.
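Assuming the option is also exposed through the Python API as a model_capacity argument (hypothetical until implemented), selecting a smaller model might look like:

import numpy as np
import crepe

audio = np.zeros(16000)  # placeholder: one second of silence at 16 kHz
# model_capacity would be the Python counterpart of the proposed --model-capacity flag
time, frequency, confidence, activation = crepe.predict(
    audio, 16000, viterbi=False, model_capacity='tiny')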
Dear maintainers,
When I use
crepe audio_file.wav
to track piano pitches, the results are not good in the low-frequency range, so I want to train the model myself using the proposed DCNN.
Please help me.
Thanks
* Move crepe.py into the subdirectory crepe
* Expose crepe.predict and crepe.get_activation
* Update setup.py so that setuptools can install the module
* Update setup.py so that the CLI still works
* Upload crepe to PyPI as version 0.0.x
* Add unit test coverage testing via coveralls
Hi, thank you for sharing this awesome work.
I've encountered the code below, and I don't think it is correct.
crepe/core.py
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
This code is meant to normalize the waveform frame by frame, but numpy's as_strided function does not allocate new memory; it just creates a new view of the array (for example, the memory of the first frame and the memory of the second frame is not separated).
See the numpy documentation for more details.
So when doing in-place calculations on that array, unintended results may arise. Here is an example.
np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )
# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
print(frames.sum())
146.7856165123937
Since each frame is normalized, the sum should be about zero, but it's not.
To avoid this behavior, the target array should be copied so the operation works as intended. Here is an example; only one line is changed (adding .copy()).
np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )
# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose().copy()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
print(frames.sum())
-2.384759056894836e-13
Dear maintainers,
Where is audio_file.wav? Help, please!
I'm interested in using crepe in an online app. I'd like to use crepe in a very similar way as it is used on the example website: marl.github.io/crepe/. The readme shows how to use it for wav files; is there somewhere I can find instructions on how to use it on real-time microphone input data?
Is CREPE basically trained on vocals (speech)? I tried to track pitches on the separate vocal and instrumental tracks of the same song, and it resulted in considerably large pitch differences at the same time points between the vocals and the instrumental.
The first timestamp returned by CREPE is 0.0, but the first frame (I think) goes from samples 0-1023, meaning the first timestamp should actually be 512 / 16000 = 0.032 s = 32 ms.
Or, a better solution: let's zero-pad the input signal by 512 samples, so that the first frame is indeed centered at time 0.
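A sketch of the suggested centering, padding by half a frame (512 samples) on each side before framing; the names and details here are just illustrative:

import numpy as np

FRAME_LENGTH = 1024  # CREPE's input frame size at 16 kHz

def center_pad(audio):
    """Zero-pad by half a frame on each side so frame i is centered at i * hop_length."""
    return np.pad(audio, FRAME_LENGTH // 2, mode='constant')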
Hi,
thanks for sharing your work! 🚀
I'm a bit confused which dataset and instruments were actually used for the provided models on Github.
In the paper, only RWC-synth and MDB-stem-synth are specified as datasets. In the conclusions, training on NSynth is mentioned as future work.
In the README on Github:
The provided model is trained using the following datasets, composed of vocal and instrumental audio, and is therefore expected to work best on this type of audio signals.
* MIR-1K [1]
* Bach10 [2]
* RWC-Synth [3]
* MedleyDB [4]
* MDB-STEM-Synth [5]
* NSynth [6]
How did you determine the ground-truth f0 for the datasets that have only an annotated midi pitch, e.g. NSynth?
Hi,
In these lines (Lines 100 to 103 in 7f0bf05), cents_mapping is not the actual attribute subsequently used for caching (mapping is used).

Depending on your setup TF can get quite verbose, e.g.:
Using TensorFlow backend.
/Users/justin/Documents/dev/miniconda3/envs/py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
return f(*args, **kwds)
2018-07-13 11:30:52.866593: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
While I think we should keep these messages by default, it might be nice to add an optional argument to suppress them. Thoughts?
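One existing knob worth noting: the TF_CPP_MIN_LOG_LEVEL environment variable already silences the C++-side messages (like the cpu_feature_guard line above) when set before TensorFlow is loaded; an argument on our side could complement it. For example:

import os

# "2" hides INFO and WARNING messages from TensorFlow's C++ core; set before importing crepe.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import crepe  # noqa: E402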