marl / crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
Home Page: https://marl.github.io/crepe/
License: MIT License
Running crepe on a 16 kHz mono wav file with 102400 samples and a step size of 10 ms produces 641 pitch estimates instead of 640 (both via the command line and the Python interface). We'd expect a hop size of 16000 / 1000 * 10 == 160 samples for a step size of 10 ms, so an audio clip with 102400 samples should yield 102400 / 160 == 640 estimates.
You can create a synthetic audio clip to reproduce:
import crepe
import numpy as np
x = np.random.normal(size=[102400])
x = np.clip(x, -1.0, 1.0)
time, frequency, confidence, activation = crepe.predict(x, 16000, viterbi=True)
# observe len(frequency) == 641
When using crepe directly in Python (via import crepe), calling crepe.predict() crashes because the Keras model is never loaded (build_and_load_model() is never called). The result is the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-38145eddd20a> in <module>()
----> 1 (time, frequency, confidence, activation) = crepe.predict(audio, sr, viterbi=True)
~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in predict(audio, sr, viterbi)
192 The raw activation matrix
193 """
--> 194 activation = get_activation(audio, sr)
195 confidence = activation.max(axis=1)
196
~/Documents/dev/miniconda3/envs/py35/lib/python3.5/site-packages/crepe/core.py in get_activation(audio, sr)
162
163 # run prediction and convert the frequency bin weights to Hz
--> 164 return model.predict(frames, verbose=1)
165
166
AttributeError: 'NoneType' object has no attribute 'predict'
The fix is to add a check in get_activation() and load the model if it's None. Imminent fix coming via the bz2model branch.
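For reference, a minimal sketch of that kind of guard, assuming the module-level model variable and build_and_load_model() from crepe/core.py (this is an illustration, not the actual patch):

# Hypothetical sketch of the lazy-loading guard, not the fix from the bz2model branch.
model = None  # module-level cache, as in crepe/core.py

def build_and_load_model():
    """Stand-in for crepe.core.build_and_load_model(); assumed to populate `model`."""
    global model
    model = object()  # placeholder for the loaded Keras model

def get_activation(audio, sr):
    global model
    if model is None:  # the missing check: load the model on first use
        build_and_load_model()
    # ... frame the audio and return model.predict(frames) as before ...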
The model was trained using TF, and we don't really know if or how it would work with a different backend. Perhaps we should add tensorflow as a requirement, and set the minimum version to the TF version used to train the model?
Hi,
When I try to install crepe v0.0.9 I get the following error:
Could not find a version that satisfies the requirement tensorflow==2.0 (from versions: 0.12.0rc0, 0.12.0rc1, 0.12.0, 0.12.1, 1.0.0, 1.0.1, 1.1.0rc0, 1.1.0rc1, 1.1.0rc2, 1.1.0, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.3.0rc0, 1.3.0rc1, 1.3.0rc2, 1.3.0, 1.4.0rc0, 1.4.0rc1, 1.4.0, 1.4.1, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.8.0rc0, 1.8.0rc1, 1.8.0, 1.9.0rc0, 1.9.0rc1, 1.9.0rc2, 1.9.0, 1.10.0rc0, 1.10.0rc1, 1.10.0, 1.10.1, 1.11.0rc0, 1.11.0rc1, 1.11.0rc2, 1.11.0, 1.12.0rc0, 1.12.0rc1, 1.12.0rc2, 1.12.0, 1.12.2, 1.12.3, 1.13.0rc0, 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
I have all the latest versions of tensorflow, python and pip installed and am running on Ubuntu 18.04.2.
Maybe the tensorflow==2.0 line in the requirements file should read tensorflow>=2.0.0a0? I installed version 0.0.7, though, and everything worked just fine.
I was wondering whether the CREPE model would allow transcription of instruments playing polyphonic notes (for example, chords played by a violin). Maybe changing the model somehow and retraining it would allow it to handle polyphony?
At the moment the predictions are calculated every 10 milliseconds.
It'd be good to have an option to specify this interval, to trade off time precision against the time it takes to run the prediction.
Since we use 16 kHz, 1 millisecond corresponds to 16 samples, so we'd get integer hop sizes for step sizes given in integer multiples of milliseconds.
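For illustration, a small sketch of how a configurable step size in milliseconds could map to an integer hop size at 16 kHz (the function name is just an assumption):

MODEL_SRATE = 16000  # CREPE's model sample rate

def hop_length_for(step_size_ms):
    """Convert a step size in milliseconds to a hop size in samples at 16 kHz."""
    return int(MODEL_SRATE * step_size_ms / 1000)

assert hop_length_for(10) == 160   # the current default
assert hop_length_for(25) == 400   # 25 ms -> 400 samples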
So I was trying to test the network on some streaming data, but I encountered a funny bug. If I call predict once before doing anything else in the script, it works fine. If I don't, it segfaults (at least on my machine) with:
Using TensorFlow backend.
Aborted (core dumped)
import numpy as np
import sounddevice as sd
import queue
import sys
import crepe
if __name__ == "__main__":
    fs = 16000
    frameSize = 2048

    # ------- IF I REMOVE THIS LINE IT SEGFAULTS ---------
    time, frequency, confidence, activation = crepe.predict(np.zeros(2048), fs, viterbi=False)

    q = queue.Queue()

    def audio_callback(indata, frames, time, status):
        if status:
            print(status, file=sys.stderr)
        q.put(indata)

    stream = sd.InputStream(device=None, channels=1, samplerate=fs, callback=audio_callback)
    recdata = np.zeros(frameSize, np.float64)

    with stream:
        while True:
            try:
                data = q.get_nowait()
            except queue.Empty:
                continue
            shift = len(data)
            recdata = np.roll(recdata, -shift, axis=0)
            recdata[-shift:] = data[:, 0]
            time, frequency, confidence, activation = crepe.predict(recdata, fs, viterbi=False)
I have no clue what might cause this... Maybe some conflict with sounddevice initialization?
When training, should the audio clips be padded to the same length, or should the generated clips already be the same length?
I would like to use the output of Crepe to determine whether a singer is active versus silent at the perceptual level. That should change at the level of seconds, not milliseconds. Setting a hard threshold based on confidence, though, results in a quick alternation between the two states. The alternation shows as the thick vertical lines in the plots below.
Viterbi would be a straightforward approach to smoothing this out. The current version, though, only applies smoothing to the pitch. I wrote an extension and added it to a pull request in case it would be useful for others: #26.
Code for this plot:
import csv
import matplotlib.pyplot as plt
import numpy as np
f0 = []
conf = []
thresh = 0.5
with open('MUSDB18HQ/train/Music Delta - Hendrix/vocals.f0.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            f0.append(float(row[1]))
            conf.append(float(row[2]))
            line_count += 1
    print(f'Processed {line_count} lines.')

voiced = [1 if c > thresh else 0 for c in conf]
# plt.plot(np.array(f0) * np.array(voiced))
plt.plot(np.array(voiced))
plt.show()
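As a rougher alternative to the Viterbi extension in #26, the thresholded confidence could also be smoothed directly, for instance with a median filter spanning roughly a second. A sketch, not part of crepe:

import numpy as np
from scipy.signal import medfilt

def smooth_voicing(confidence, thresh=0.5, kernel_frames=101):
    """Median-filter the thresholded confidence; 101 frames is about 1 s at a 10 ms step."""
    voiced = (np.asarray(confidence) > thresh).astype(float)
    return medfilt(voiced, kernel_size=kernel_frames)

# e.g. voiced_smooth = smooth_voicing(conf) with the `conf` list read above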
As we update dependencies (e.g. TF), there's a real risk of changed behavior going unnoticed (e.g. performance drop). We should add regression tests to ensure the output remains consistent.
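A minimal sketch of what such a regression test could look like; the reference file path and the tolerance are assumptions:

import numpy as np
import crepe

def test_predict_matches_reference():
    # Deterministic synthetic input so the expected output can be generated once and stored.
    np.random.seed(0)
    audio = np.random.uniform(-1, 1, size=2 * 16000).astype(np.float32)
    _, frequency, _, _ = crepe.predict(audio, 16000, viterbi=False)
    reference = np.load('tests/reference_frequency.npy')  # hypothetical stored output
    np.testing.assert_allclose(frequency, reference, rtol=1e-3)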
How can I implement a real-time tracker using crepe, like https://marl.github.io/crepe/ ? As described in the README, crepe only takes a wav file to do pitch tracking.
Hi, I am wondering if there is any well-organized training code by the authors that is open-source? I think there is only prediction code in this GitHub repo.
sklearn issued a warning message for passing attributes to check_is_fitted, as follows:
FutureWarning: Passing attributes to check_is_fitted is deprecated and will be removed in 0.23. The attributes argument is ignored.
"argument is ignored.", FutureWarning)
The sklearn version that issued this warning: 0.22
Way to reproduce:
import scipy.io.wavfile
import crepe

rate, audio = scipy.io.wavfile.read('/your wav file.wav')
crepe.predict(audio, rate, viterbi=True)
This should not affect the prediction output, am I right?
Hi, I have a question about using CREPE to estimate f0 on a human-voice dataset. I found that the f0 fluctuates even during the audio's silent periods. Is this normal? I am not sure whether f0 should be constant during silence. Is it because CREPE is not suitable for estimating the human voice?
I also found in this literature that female f0 should be around 186, but I got around 220 here, and sometimes even higher (300).
At least in Ubuntu bionic (18.04) and pip 9.x, I see
DEPRECATION: Uninstalling a distutils installed project (imageio) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
and in pip 10, the installation fails because of this.
We need a way to avoid triggering the uninstallation of imageio.
When trying to load a file to analyse:
env DEBUGPY_LAUNCHER_PORT=44929 /usr/bin/python3 /home/hatmore/.vscode/extensions/ms-python.python-2020.3.71659/pythonFiles/lib/python/debugpy/wheels/debugpy/launcher /home/hatmore/Desktop/pyCodes/VocalRange/test.py
/home/hatmore/Desktop/pyCodes/VocalRange/test.py:4: WavFileWarning: Chunk (non-data) not understood, skipping it.
sr, audio = wavfile.read('test.wav')
Is it okay to use the stripped down model (from https://marl.github.io/crepe/) in other applications (with attribution)?
... by directly calling the crepe command
Dear maintainers,
Is there any API for C++? I just want to use C++ in my project. Could you help me, please?
Thanks
Hello Maintainers,
Calling crepe arctic_a0001.wav -V on the famous "Author of the danger trail" test sentence from the Arctic database resulted in the following error. Calling without -V runs fine.
This is CREPE 0.0.4 installed from pip.
329/329 [==============================] - 7s 20ms/step
Traceback (most recent call last):
File "/home/sleepwalking/contrib/miniconda2/bin/crepe", line 11, in <module>
sys.exit(main())
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 115, in main
args.save_activation, args.save_plot, args.plot_voicing)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/cli.py", line 63, in run
save_activation, save_plot, plot_voicing)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 252, in process_file
time, frequency, confidence, activation = predict(audio, sr, viterbi)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 202, in predict
cents = to_viterbi_cents(activation)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/crepe/core.py", line 122, in to_viterbi_cents
path = model.predict(observations.reshape(-1, 1), [len(observations)])
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 334, in predict
_, state_sequence = self.decode(X, lengths)
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 294, in decode
self._check()
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/hmm.py", line 394, in _check
super(MultinomialHMM, self)._check()
File "/home/sleepwalking/contrib/miniconda2/lib/python2.7/site-packages/hmmlearn/base.py", line 524, in _check
.format(self.transmat_.sum(axis=1)))
ValueError: rows of transmat_ must sum to 1.0 (got [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0])
Inference on CPU is very slow right now (often too slow for practical application).
I think TensorFlow already uses as many CPU cores as it has access to when running in CPU mode (?), so I'm not sure whether e.g. splitting the audio track and parallelizing inference via multiprocessing or joblib would make any difference.
But it might be worth checking out the TensorFlow performance guide or the info on model quantization.
I'm trying to convert the keras models you've provided to frozen graphs so they can be used in a C# application. When I do this it throws an error when I try to load the model during conversion:
raise ValueError('Cannot create group in read-only mode.')
ValueError: Cannot create group in read-only mode.
I've found some discussions suggesting this means only the model's weights are included, not its architecture and dimensions.
Could you include a JSON file with the model architecture and dimensions in the models branch, or include frozen models?
Hello,
First of all thanks for the amazing paper and the repo !!
I have a basic doubt: the RWC dataset says that its annotated data is at semitone intervals, that is, 50 cents.
How is CREPE able to predict at 10 or 20 cent intervals?
I got an error. How should I handle it?
pip install crepe
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting crepe
Downloading http://mirrors.aliyun.com/pypi/packages/c8/74/1677b9369f233745b3dedf707ce26fb935c5c400379c45400df818f3a805/crepe-0.0.11.tar.gz (15 kB)
ERROR: Command errored out with exit status 1:
command: 'd:\program files\python\python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] =
'"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"'; __file__='"'"'C:\\Users\\mac\\AppData\\Local\\Temp\\pip-install-frjzwa1k\\crepe\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base
'C:\Users\mac\AppData\Local\Temp\pip-pip-egg-info-ky0m23t0'
cwd: C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\
Complete output (57 lines):
C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
Traceback (most recent call last):
File "d:\program files\python\python38\lib\urllib\request.py", line 1350, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "d:\program files\python\python38\lib\http\client.py", line 1240, in request
self._send_request(method, url, body, headers, encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1286, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1235, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "d:\program files\python\python38\lib\http\client.py", line 1006, in _send_output
self.send(msg)
File "d:\program files\python\python38\lib\http\client.py", line 946, in send
self.connect()
File "d:\program files\python\python38\lib\http\client.py", line 1402, in connect
super().connect()
File "d:\program files\python\python38\lib\http\client.py", line 917, in connect
self.sock = self._create_connection(
File "d:\program files\python\python38\lib\socket.py", line 787, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "d:\program files\python\python38\lib\socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\mac\AppData\Local\Temp\pip-install-frjzwa1k\crepe\setup.py", line 30, in <module>
urlretrieve(base_url + compressed_file, compressed_path)
File "d:\program files\python\python38\lib\urllib\request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "d:\program files\python\python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "d:\program files\python\python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "d:\program files\python\python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "d:\program files\python\python38\lib\urllib\request.py", line 563, in error
result = self._call_chain(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "d:\program files\python\python38\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "d:\program files\python\python38\lib\urllib\request.py", line 542, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "d:\program files\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "d:\program files\python\python38\lib\urllib\request.py", line 1393, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "d:\program files\python\python38\lib\urllib\request.py", line 1353, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
Downloading weight file model-tiny.h5.bz2 ...
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
Hi, I noticed in this package, as well as in some similar ones, that the step_size is very small, for example 10 ms. What would happen if the step_size were set to, for example, 1000 ms or even 2500 ms? Would this give similar results to aggregating all the 10 ms estimates within each 1000 ms window? The reason is that I am interested in an average pitch every second or every few seconds and don't need to know the pitch for every 10 ms. This would also reduce the calculation time, as far as I can see.
But is there something logically flawed in taking larger values for the step_size?
Thank you and kind regards!
We should add guidelines + templates for posting issues and contributing code.
There are examples for both in Librosa we could work off of:
https://github.com/librosa/librosa/blob/master/CONTRIBUTING.md
@jongwook I'm happy to put this together if it sounds good to you?
On core.py#L149 we're passing the argmax of the pitch salience matrix as observations to the decoder, but shouldn't we be passing the salience matrix directly? <-- @jongwook
Thanks for the great repo!
I have this typical question: have you tried to solve it as a regression problem, e.g., predicting the pitch directly (as a bin index, in cents, in Hz, or whatever)? I'd appreciate it if you could share your research experience around it.
Hi there!
CREPE is great! I am doing my own experiments, trying to learn more.
I wonder, how exactly do you compute the target vector (360-cents) given a frequency?
Best,
Tristan
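Not an authoritative answer, but the paper describes the target as a Gaussian blur (sigma = 25 cents) around the true pitch over the 360 bins spaced 20 cents apart. A sketch of that computation, treating the exact bin layout (following the cents mapping in crepe/core.py) as an assumption:

import numpy as np

# 360 pitch bins in 20-cent steps, as in the cents mapping in crepe/core.py
cents_bins = np.linspace(0, 7180, 360) + 1997.3792084381023

def target_vector(frequency_hz, sigma_cents=25.0):
    """Gaussian-blurred target over the 360 bins for a given true frequency."""
    true_cents = 1200.0 * np.log2(frequency_hz / 10.0)  # cents relative to 10 Hz
    return np.exp(-(cents_bins - true_cents) ** 2 / (2.0 * sigma_cents ** 2))

y = target_vector(440.0)  # A4; the peak lands at the bin closest to 440 Hz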
Since this model is largely trained on songs and instrumentals, will it work reliably for speech analysis? Why not train the model on regular speech?
When the pitch in the audio is zero (silence), the output is the most likely pitch rather than zero. Why does the model not take silence into consideration?
Hi, when I use CREPE, it automatically selects a GPU that is currently used by other jobs. Therefore, it returns an out-of-memory error even if there is another GPU that is not in use.
If the user could select which GPU to use, this kind of error could be avoided.
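Until such an option exists, a common workaround is to restrict which GPUs TensorFlow can see before it is initialized, via CUDA_VISIBLE_DEVICES (the device index "1" below is just an example):

import os

# Must be set before TensorFlow is initialized, i.e. before importing crepe.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # example: expose only the second GPU

import crepe  # noqa: E402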
Step_size does not need to be an int; floats work just fine and may be necessary if you are targeting a certain number of resulting samples. I'm currently trying to use this to replace another scheme that uses the hop length directly, so for me it would have been useful to set that value directly instead of back-calculating the step_size to match the needed hop_length.
Otherwise: very impressive, works very well.
Thanks
Currently only process_file is unit tested. We need to add tests for all other functions, including those exposed by the API (predict and get_activation) and those that aren't.
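For example, a basic test of predict on a synthetic tone might look like this (a sketch; the thresholds are arbitrary):

import numpy as np
import crepe

def test_predict_sine():
    sr = 16000
    t = np.arange(sr) / sr
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of A4
    time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=False)
    # Most frames should land close to 440 Hz with reasonable confidence.
    assert np.median(np.abs(frequency - 440.0)) < 10.0
    assert confidence.mean() > 0.5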
imshow is deprecated; we should use matplotlib for saving the figure to disk (we're already importing it, so we might as well).
An aside: I don't know if it's a good idea to make matplotlib a dependency... it's a pretty huge library.
Currently the model capacity multiplier is fixed to 32, but this can be adjusted as a trade-off between the computation time and accuracy. Roughly speaking, the number of parameters is quadratic to this multiplier.
The one that is deployed on https://marl.github.io/crepe uses model multiplier 4, and still achieves quite comparable performance:
multiplier | #params | RPA
---|---|---
32 | 22.24M | 93.75%
16 | 5.879M | 93.22%
8 | 1.629M | 92.47%
4 | 486k | 91.52%
(note that these numbers are on MedleyDB v1 and not comparable to what's reported in the paper)
We'd like to have an option to select a smaller model, for faster calculation at the cost of slightly lower accuracy.
I'd suggest the following options for specifying the model capacity.
CLI Option | multiplier | # of params | Model file size
---|---|---|---
--model-capacity full | 32 | 22M | 88 MB
--model-capacity large | 24 | 12M | 48 MB
--model-capacity medium | 16 | 5.9M | 24 MB
--model-capacity small | 8 | 1.6M | 6.4 MB
--model-capacity tiny | 4 | 486k | 1.9 MB
Currently the size of the PyPI archive is 57.7 MB, very close to the 60 MB limit, so only the tiny model can be added for immediate upload to PyPI. Requesting a quota increase on PyPI is possible, but it seems quite difficult and it's not clear whether they will allow the increase.
We can alternatively put the models on a separate branch in this repo, and have the code download them during installation or on first use.
Let me do the former first (adding tiny), and figure out how to add the other three later.
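Assuming the option is also exposed through the Python API as a model_capacity argument (hypothetical until implemented), selecting a smaller model might look like:

import numpy as np
import crepe

audio = np.zeros(16000)  # placeholder: one second of silence at 16 kHz
# model_capacity would be the Python counterpart of the proposed --model-capacity flag
time, frequency, confidence, activation = crepe.predict(
    audio, 16000, viterbi=False, model_capacity='tiny')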
Dear maintainers,
When I use
crepe audio_file.wav
to track piano pitches, the results are not good in the low-frequency range, so I want to train the model myself using the proposed DCNN.
Please help me.
Thanks
* Move crepe.py into the subdirectory crepe
* Expose crepe.predict and crepe.get_activation
* Update setup.py so that setuptools can install the module
* Update setup.py so that the CLI still works
* Upload crepe to PyPI as version 0.0.x
* Add unit test coverage testing via coveralls
Hi, thank you for sharing this awesome work.
I've encountered the code below, and I don't think it is correct.
crepe/core.py
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
This code is meant to normalize the waveform frame by frame, but numpy's as_strided function does not allocate new memory; it just creates a new view of the array (for example, the memory of the first frame and the memory of the second frame is not separated).
See the numpy documentation for more details.
So when doing in-place calculations on that array, unintended results may arise. Here is an example.
np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )
# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
print(frames.sum())
146.7856165123937
Since each frame is normalized, the sum should be about zero, but it's not.
To avoid this behavior, the target array should be copied so the operation works as intended. Here is an example; only one line is changed (adding .copy()).
np.random.seed(42)
model_srate,step_size = 16000, 10
audio = np.random.randn(16000) # (16000, )
# below code is from crepe/core.py#get_activation
hop_length = int(model_srate * step_size / 1000)
n_frames = 1 + int((len(audio) - 1024) / hop_length)
frames = as_strided(audio, shape=(1024, n_frames),
strides=(audio.itemsize, hop_length * audio.itemsize))
frames = frames.transpose().copy()
# normalize each frame -- this is expected by the model
frames -= np.mean(frames, axis=1)[:, np.newaxis]
frames /= np.std(frames, axis=1)[:, np.newaxis]
print(frames.sum())
-2.384759056894836e-13
Dear maintainers,
Where is audio_file.wav? Help, please!
I'm interested in using crepe in an online app. I'd like to use crepe in a very similar way as it is used on the example website: marl.github.io/crepe/. The readme shows how to use it for wav files; is there somewhere I can find instructions on how to use it on real-time microphone input data?
Is CREPE basically trained on vocals (speech)? I tried to track pitches on the separate vocal and instrumental tracks of the same song, and it resulted in considerably large pitch differences at the same time points between the vocals and the instrumental.
The first timestamp returned by CREPE is 0.0, but the first frame (I think) goes from samples 0-1023, meaning the first timestamp should actually be 512 / 16000 = 0.032 s = 32 ms.
Or, a better solution: let's zero-pad the input signal by 512 samples, so that the first frame is indeed centered at time 0.
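A sketch of the suggested centering, padding by half a frame (512 samples) on each side before framing; the names and details here are just illustrative:

import numpy as np

FRAME_LENGTH = 1024  # CREPE's input frame size at 16 kHz

def center_pad(audio):
    """Zero-pad by half a frame on each side so frame i is centered at i * hop_length."""
    return np.pad(audio, FRAME_LENGTH // 2, mode='constant')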
Hi,
thanks for sharing your work! 🚀
I'm a bit confused which dataset and instruments were actually used for the provided models on Github.
In the paper, only RWC-synth and MDB-stem-synth are specified as datasets. In the conclusions, training on NSynth is mentioned as future work.
In the README on Github:
The provided model is trained using the following datasets, composed of vocal and instrumental audio, and is therefore expected to work best on this type of audio signals.
* MIR-1K [1]
* Bach10 [2]
* RWC-Synth [3]
* MedleyDB [4]
* MDB-STEM-Synth [5]
* NSynth [6]
How did you determine the ground-truth f0 for the datasets that have only an annotated midi pitch, e.g. NSynth?
Hi,
In these lines (Lines 100 to 103 in 7f0bf05), cents_mapping is not the actual attribute subsequently used for caching (mapping is used).

Depending on your setup TF can get quite verbose, e.g.:
Using TensorFlow backend.
/Users/justin/Documents/dev/miniconda3/envs/py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
return f(*args, **kwds)
2018-07-13 11:30:52.866593: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
While I think we should keep these messages by default, it might be nice to add an optional argument to suppress them. Thoughts?
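One existing knob worth noting: the TF_CPP_MIN_LOG_LEVEL environment variable already silences the C++-side messages (like the cpu_feature_guard line above) when set before TensorFlow is loaded; an argument on our side could complement it. For example:

import os

# "2" hides INFO and WARNING messages from TensorFlow's C++ core; set before importing crepe.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import crepe  # noqa: E402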