w-hc / torch_audioset
PyTorch transcribed AudioSet classifier, including VGGish and YAMNet, along with utils to manipulate the AudioSet category ontology.
License: MIT License
In the file 'torch_audioset/data/torch_input_processing.py', I can only find a function called 'VGGishLogMelSpectrogram', but I can't find how to process audio features for YAMNet. Can you give me an example of using the PyTorch YAMNet to classify audio?
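For reference, here is a minimal sketch of such a pipeline built directly on torchaudio, assuming the published YAMNet front-end parameters (16 kHz mono, 64 mel bins, 25 ms window, 10 ms hop, 0.96 s patches). The yamnet import mirrors what visualization.py appears to use; the exact input layout and whether the model returns logits are assumptions to check against torch_audioset/yamnet/model.py:
######## SKETCH ################
import torch
import torchaudio
from torch_audioset.yamnet.model import yamnet as torch_yamnet  # as used by visualization.py

SR = 16000  # YAMNet operates on 16 kHz mono audio

# Published YAMNet front end: 25 ms window (400 samples), 10 ms hop (160 samples), 64 mels
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SR, n_fft=400, hop_length=160,
    f_min=125.0, f_max=7500.0, n_mels=64)

def yamnet_patches(wav_path):
    wav, sr = torchaudio.load(wav_path)
    wav = wav.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != SR:
        wav = torchaudio.transforms.Resample(sr, SR)(wav)
    log_mel = torch.log(mel(wav) + 1e-3).squeeze(0).T  # (frames, 64), log offset as in YAMNet
    n = log_mel.shape[0] // 96  # 0.96 s patches = 96 frames (non-overlapping, for simplicity)
    return log_mel[:n * 96].reshape(n, 1, 96, 64)  # assumed (N, C, H, W) input layout

pt_model = torch_yamnet(pretrained=True).eval()
with torch.no_grad():
    scores = torch.sigmoid(pt_model(yamnet_patches("sound.wav")))  # (N, 521), if the model outputs logits
##################################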
With the latest PyTorch version:
######## CONFIG ################
cffi==1.14.6
cycler==0.10.0
kiwisolver==1.3.1
matplotlib==3.4.2
numpy==1.21.1
Pillow==8.3.1
pycparser==2.20
pyparsing==2.4.7
python-dateutil==2.8.2
PyYAML==5.4.1
six==1.16.0
SoundFile==0.10.3.post1
torch==1.9.0
torchaudio==0.9.0
typing-extensions==3.10.0.0
##################################
Running:
python visualization.py /path/to/a/sound.wav
It throws:
#####################################
Traceback (most recent call last):
File "visualization.py", line 34, in
pt_model = torch_yamnet(pretrained=False)
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 167, in yamnet
model = YAMNet()
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 139, in init
self.add_module(name, layer_mod(kernel, stride, input_dim, output_dim))
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 79, in init
Conv2d_tf(
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 19, in init
super().init(*args, **kwargs)
File "/home/guillaume/torch_audioset/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 430, in init
super(Conv2d, self).init(
File "/home/guillaume/torch_audioset/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 90, in init
raise ValueError(
ValueError: Invalid padding string 'SAME', should be one of {'valid', 'same'}
#######################################
Then, when I replace "SAME" with "same" in torch_audioset/yamnet/model.py,
It throws:
##################################
Traceback (most recent call last):
File "visualization.py", line 34, in
pt_model = torch_yamnet(pretrained=False)
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 167, in yamnet
model = YAMNet()
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 139, in init
self.add_module(name, layer_mod(kernel, stride, input_dim, output_dim))
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 79, in init
Conv2d_tf(
File "/home/guillaume/torch_audioset/torch_audioset/yamnet/model.py", line 19, in init
super().init(*args, **kwargs)
File "/home/guillaume/torch_audioset/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 430, in init
super(Conv2d, self).init(
File "/home/guillaume/torch_audioset/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 94, in init
raise ValueError("padding='same' is not supported for strided convolutions")
ValueError: padding='same' is not supported for strided convolutions
###################################
Then, I don't know what to do :/
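One possible workaround (a sketch, not the repo's official fix): newer PyTorch rejects the string padding='same' for strided convolutions, so the TF-style conv can stop forwarding the 'SAME' string to nn.Conv2d and instead compute the asymmetric padding by hand in forward(), e.g.:
######## SKETCH ################
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv2dSame(nn.Conv2d):
    """TF-style 'SAME' padding that also works with stride > 1.
    Construct with padding=0 (the default); assumes dilation == 1."""
    def forward(self, x):
        ih, iw = x.shape[-2:]
        kh, kw = self.kernel_size
        sh, sw = self.stride
        # total padding so that out = ceil(in / stride), as TF 'SAME' defines it
        pad_h = max((-(-ih // sh) - 1) * sh + kh - ih, 0)
        pad_w = max((-(-iw // sw) - 1) * sw + kw - iw, 0)
        # TF puts the extra pixel on the bottom/right
        x = F.pad(x, (pad_w // 2, pad_w - pad_w // 2,
                      pad_h // 2, pad_h - pad_h // 2))
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        0, self.dilation, self.groups)
##################################
Applied to the repo, that would mean changing Conv2d_tf in torch_audioset/yamnet/model.py so it no longer passes padding='SAME' to super().__init__() and pads in forward() instead.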
Environment:
CUDA_VISIBLE_DEVICES= python convert_yamnet.py
2021-03-26 17:33:08.960500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-26 17:33:10.344642: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-26 17:33:10.345393: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-03-26 17:33:10.376690: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-03-26 17:33:10.376728: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: f89b1701d83f
2021-03-26 17:33:10.376735: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: f89b1701d83f
2021-03-26 17:33:10.376815: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.39.0
2021-03-26 17:33:10.376837: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.39.0
2021-03-26 17:33:10.376844: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.39.0
2021-03-26 17:33:10.377093: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-26 17:33:10.377559: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-26 17:33:10.380729: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-03-26 17:33:10.383577: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3499910000 Hz
WARNING:tensorflow:When passing input data as arrays, do not specify `steps_per_epoch`/`steps` argument. Please use `batch_size` instead.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
matching layer1.fused.conv.weight <---> layer1/conv/layer1/conv/kernel:0
matching layer1.fused.bn.bias <---> layer1/conv/bn/layer1/conv/bn/beta:0
matching layer1.fused.bn.running_mean <---> layer1/conv/bn/layer1/conv/bn/moving_mean:0
matching layer1.fused.bn.running_var <---> layer1/conv/bn/layer1/conv/bn/moving_variance:0
matching layer2.depthwise_conv.conv.weight <---> layer2/depthwise_conv/layer2/depthwise_conv/depthwise_kernel:0
matching layer2.depthwise_conv.bn.bias <---> layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/beta:0
matching layer2.depthwise_conv.bn.running_mean <---> layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/moving_mean:0
matching layer2.depthwise_conv.bn.running_var <---> layer2/depthwise_conv/bn/layer2/depthwise_conv/bn/moving_variance:0
matching layer2.pointwise_conv.conv.weight <---> layer2/pointwise_conv/layer2/pointwise_conv/kernel:0
matching layer2.pointwise_conv.bn.bias <---> layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/beta:0
matching layer2.pointwise_conv.bn.running_mean <---> layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/moving_mean:0
matching layer2.pointwise_conv.bn.running_var <---> layer2/pointwise_conv/bn/layer2/pointwise_conv/bn/moving_variance:0
matching layer3.depthwise_conv.conv.weight <---> layer3/depthwise_conv/layer3/depthwise_conv/depthwise_kernel:0
matching layer3.depthwise_conv.bn.bias <---> layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/beta:0
matching layer3.depthwise_conv.bn.running_mean <---> layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/moving_mean:0
matching layer3.depthwise_conv.bn.running_var <---> layer3/depthwise_conv/bn/layer3/depthwise_conv/bn/moving_variance:0
matching layer3.pointwise_conv.conv.weight <---> layer3/pointwise_conv/layer3/pointwise_conv/kernel:0
matching layer3.pointwise_conv.bn.bias <---> layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/beta:0
matching layer3.pointwise_conv.bn.running_mean <---> layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/moving_mean:0
matching layer3.pointwise_conv.bn.running_var <---> layer3/pointwise_conv/bn/layer3/pointwise_conv/bn/moving_variance:0
matching layer4.depthwise_conv.conv.weight <---> layer4/depthwise_conv/layer4/depthwise_conv/depthwise_kernel:0
matching layer4.depthwise_conv.bn.bias <---> layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/beta:0
matching layer4.depthwise_conv.bn.running_mean <---> layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/moving_mean:0
matching layer4.depthwise_conv.bn.running_var <---> layer4/depthwise_conv/bn/layer4/depthwise_conv/bn/moving_variance:0
matching layer4.pointwise_conv.conv.weight <---> layer4/pointwise_conv/layer4/pointwise_conv/kernel:0
matching layer4.pointwise_conv.bn.bias <---> layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/beta:0
matching layer4.pointwise_conv.bn.running_mean <---> layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/moving_mean:0
matching layer4.pointwise_conv.bn.running_var <---> layer4/pointwise_conv/bn/layer4/pointwise_conv/bn/moving_variance:0
matching layer5.depthwise_conv.conv.weight <---> layer5/depthwise_conv/layer5/depthwise_conv/depthwise_kernel:0
matching layer5.depthwise_conv.bn.bias <---> layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/beta:0
matching layer5.depthwise_conv.bn.running_mean <---> layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/moving_mean:0
matching layer5.depthwise_conv.bn.running_var <---> layer5/depthwise_conv/bn/layer5/depthwise_conv/bn/moving_variance:0
matching layer5.pointwise_conv.conv.weight <---> layer5/pointwise_conv/layer5/pointwise_conv/kernel:0
matching layer5.pointwise_conv.bn.bias <---> layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/beta:0
matching layer5.pointwise_conv.bn.running_mean <---> layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/moving_mean:0
matching layer5.pointwise_conv.bn.running_var <---> layer5/pointwise_conv/bn/layer5/pointwise_conv/bn/moving_variance:0
matching layer6.depthwise_conv.conv.weight <---> layer6/depthwise_conv/layer6/depthwise_conv/depthwise_kernel:0
matching layer6.depthwise_conv.bn.bias <---> layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/beta:0
matching layer6.depthwise_conv.bn.running_mean <---> layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/moving_mean:0
matching layer6.depthwise_conv.bn.running_var <---> layer6/depthwise_conv/bn/layer6/depthwise_conv/bn/moving_variance:0
matching layer6.pointwise_conv.conv.weight <---> layer6/pointwise_conv/layer6/pointwise_conv/kernel:0
matching layer6.pointwise_conv.bn.bias <---> layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/beta:0
matching layer6.pointwise_conv.bn.running_mean <---> layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/moving_mean:0
matching layer6.pointwise_conv.bn.running_var <---> layer6/pointwise_conv/bn/layer6/pointwise_conv/bn/moving_variance:0
matching layer7.depthwise_conv.conv.weight <---> layer7/depthwise_conv/layer7/depthwise_conv/depthwise_kernel:0
matching layer7.depthwise_conv.bn.bias <---> layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/beta:0
matching layer7.depthwise_conv.bn.running_mean <---> layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/moving_mean:0
matching layer7.depthwise_conv.bn.running_var <---> layer7/depthwise_conv/bn/layer7/depthwise_conv/bn/moving_variance:0
matching layer7.pointwise_conv.conv.weight <---> layer7/pointwise_conv/layer7/pointwise_conv/kernel:0
matching layer7.pointwise_conv.bn.bias <---> layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/beta:0
matching layer7.pointwise_conv.bn.running_mean <---> layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/moving_mean:0
matching layer7.pointwise_conv.bn.running_var <---> layer7/pointwise_conv/bn/layer7/pointwise_conv/bn/moving_variance:0
matching layer8.depthwise_conv.conv.weight <---> layer8/depthwise_conv/layer8/depthwise_conv/depthwise_kernel:0
matching layer8.depthwise_conv.bn.bias <---> layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/beta:0
matching layer8.depthwise_conv.bn.running_mean <---> layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/moving_mean:0
matching layer8.depthwise_conv.bn.running_var <---> layer8/depthwise_conv/bn/layer8/depthwise_conv/bn/moving_variance:0
matching layer8.pointwise_conv.conv.weight <---> layer8/pointwise_conv/layer8/pointwise_conv/kernel:0
matching layer8.pointwise_conv.bn.bias <---> layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/beta:0
matching layer8.pointwise_conv.bn.running_mean <---> layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/moving_mean:0
matching layer8.pointwise_conv.bn.running_var <---> layer8/pointwise_conv/bn/layer8/pointwise_conv/bn/moving_variance:0
matching layer9.depthwise_conv.conv.weight <---> layer9/depthwise_conv/layer9/depthwise_conv/depthwise_kernel:0
matching layer9.depthwise_conv.bn.bias <---> layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/beta:0
matching layer9.depthwise_conv.bn.running_mean <---> layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/moving_mean:0
matching layer9.depthwise_conv.bn.running_var <---> layer9/depthwise_conv/bn/layer9/depthwise_conv/bn/moving_variance:0
matching layer9.pointwise_conv.conv.weight <---> layer9/pointwise_conv/layer9/pointwise_conv/kernel:0
matching layer9.pointwise_conv.bn.bias <---> layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/beta:0
matching layer9.pointwise_conv.bn.running_mean <---> layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/moving_mean:0
matching layer9.pointwise_conv.bn.running_var <---> layer9/pointwise_conv/bn/layer9/pointwise_conv/bn/moving_variance:0
matching layer10.depthwise_conv.conv.weight <---> layer10/depthwise_conv/layer10/depthwise_conv/depthwise_kernel:0
matching layer10.depthwise_conv.bn.bias <---> layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/beta:0
matching layer10.depthwise_conv.bn.running_mean <---> layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/moving_mean:0
matching layer10.depthwise_conv.bn.running_var <---> layer10/depthwise_conv/bn/layer10/depthwise_conv/bn/moving_variance:0
matching layer10.pointwise_conv.conv.weight <---> layer10/pointwise_conv/layer10/pointwise_conv/kernel:0
matching layer10.pointwise_conv.bn.bias <---> layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/beta:0
matching layer10.pointwise_conv.bn.running_mean <---> layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/moving_mean:0
matching layer10.pointwise_conv.bn.running_var <---> layer10/pointwise_conv/bn/layer10/pointwise_conv/bn/moving_variance:0
matching layer11.depthwise_conv.conv.weight <---> layer11/depthwise_conv/layer11/depthwise_conv/depthwise_kernel:0
matching layer11.depthwise_conv.bn.bias <---> layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/beta:0
matching layer11.depthwise_conv.bn.running_mean <---> layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/moving_mean:0
matching layer11.depthwise_conv.bn.running_var <---> layer11/depthwise_conv/bn/layer11/depthwise_conv/bn/moving_variance:0
matching layer11.pointwise_conv.conv.weight <---> layer11/pointwise_conv/layer11/pointwise_conv/kernel:0
matching layer11.pointwise_conv.bn.bias <---> layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/beta:0
matching layer11.pointwise_conv.bn.running_mean <---> layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/moving_mean:0
matching layer11.pointwise_conv.bn.running_var <---> layer11/pointwise_conv/bn/layer11/pointwise_conv/bn/moving_variance:0
matching layer12.depthwise_conv.conv.weight <---> layer12/depthwise_conv/layer12/depthwise_conv/depthwise_kernel:0
matching layer12.depthwise_conv.bn.bias <---> layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/beta:0
matching layer12.depthwise_conv.bn.running_mean <---> layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/moving_mean:0
matching layer12.depthwise_conv.bn.running_var <---> layer12/depthwise_conv/bn/layer12/depthwise_conv/bn/moving_variance:0
matching layer12.pointwise_conv.conv.weight <---> layer12/pointwise_conv/layer12/pointwise_conv/kernel:0
matching layer12.pointwise_conv.bn.bias <---> layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/beta:0
matching layer12.pointwise_conv.bn.running_mean <---> layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/moving_mean:0
matching layer12.pointwise_conv.bn.running_var <---> layer12/pointwise_conv/bn/layer12/pointwise_conv/bn/moving_variance:0
matching layer13.depthwise_conv.conv.weight <---> layer13/depthwise_conv/layer13/depthwise_conv/depthwise_kernel:0
matching layer13.depthwise_conv.bn.bias <---> layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/beta:0
matching layer13.depthwise_conv.bn.running_mean <---> layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/moving_mean:0
matching layer13.depthwise_conv.bn.running_var <---> layer13/depthwise_conv/bn/layer13/depthwise_conv/bn/moving_variance:0
matching layer13.pointwise_conv.conv.weight <---> layer13/pointwise_conv/layer13/pointwise_conv/kernel:0
matching layer13.pointwise_conv.bn.bias <---> layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/beta:0
matching layer13.pointwise_conv.bn.running_mean <---> layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/moving_mean:0
matching layer13.pointwise_conv.bn.running_var <---> layer13/pointwise_conv/bn/layer13/pointwise_conv/bn/moving_variance:0
matching layer14.depthwise_conv.conv.weight <---> layer14/depthwise_conv/layer14/depthwise_conv/depthwise_kernel:0
matching layer14.depthwise_conv.bn.bias <---> layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/beta:0
matching layer14.depthwise_conv.bn.running_mean <---> layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/moving_mean:0
matching layer14.depthwise_conv.bn.running_var <---> layer14/depthwise_conv/bn/layer14/depthwise_conv/bn/moving_variance:0
matching layer14.pointwise_conv.conv.weight <---> layer14/pointwise_conv/layer14/pointwise_conv/kernel:0
matching layer14.pointwise_conv.bn.bias <---> layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/beta:0
matching layer14.pointwise_conv.bn.running_mean <---> layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/moving_mean:0
matching layer14.pointwise_conv.bn.running_var <---> layer14/pointwise_conv/bn/layer14/pointwise_conv/bn/moving_variance:0
matching classifier.weight <---> logits/logits/kernel:0
matching classifier.bias <---> logits/logits/bias:0
(5, 521)
(5, 521)
-42.8931 20.909971
0.0077692876 0.08557794
Traceback (most recent call last):
File "convert_yamnet.py", line 156, in <module>
main()
File "convert_yamnet.py", line 150, in main
assert np.allclose(pt_pred, tf_pred, atol=1e-6)
AssertionError
As the TensorFlow model has 3 outputs (spectrogram, patches, and classification), I was expecting the torch implementation to also return 3 outputs.
As I see it, the spectrogram is not necessary, but the patches should be an output of the torch model. Can you give me some insight into how to obtain them?
Thanks for the amazing work!
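Since the patches are just the framed log-mel input to the conv stack, one way to get them without modifying the model is to return them from the feature pipeline alongside the scores (a sketch, reusing the hypothetical yamnet_patches helper from the earlier feature-pipeline sketch on this page):
######## SKETCH ################
import torch

def classify_with_intermediates(model, wav_path):
    patches = yamnet_patches(wav_path)                 # (N, 1, 96, 64) framed log-mel
    spectrogram = patches.squeeze(1).reshape(-1, 64)   # unframed log-mel view (trailing frames dropped)
    with torch.no_grad():
        scores = torch.sigmoid(model(patches))         # assuming the model returns logits
    return spectrogram, patches, scores                # mirrors the three TF outputs
##################################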
There are quite a number of .py files; if I want to construct a pre-trained YAMNet model to extract audio features, which one should I run?
Thanks for your reply.
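If the goal is embeddings rather than class scores, one generic option is a forward hook on the last conv block (a sketch; the attribute name layer14 is taken from the weight-matching log above and should be verified against torch_audioset/yamnet/model.py):
######## SKETCH ################
import torch

feats = {}
def grab(module, inputs, output):
    feats["embedding"] = output.detach()

pt_model = torch_yamnet(pretrained=True).eval()
handle = pt_model.layer14.register_forward_hook(grab)  # last conv block per the conversion log
with torch.no_grad():
    pt_model(yamnet_patches("sound.wav"))
handle.remove()
embedding = feats["embedding"].mean(dim=(-2, -1))  # global average pool: one vector per patch
##################################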
ImportError: cannot import name 'layers' from 'yamnet' (./../yamnet/__init__.py)
tf.__version__: 2.0 (gpu)
torch.__version__: 1.2.0
python setup.py install
convert_yamnet.py: sys.path.append(path_yamnet)
python convert_yamnet.py
(5, 521)
(5, 521)
'''
print(pt_pred.mean(), pt_pred.std())
print(tf_pred.mean(), tf_pred.std())
'''
-42.964317 20.943628
0.007766822 0.08558724
'''
assert np.allclose(pt_pred, tf_pred, atol=1e-6)
'''
Traceback (most recent call last):
File "convert_yamnet.py", line 159, in
main()
File "convert_yamnet.py", line 153, in main
assert np.allclose(pt_pred, tf_pred, atol=1e-6)
AssertionError
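For what it's worth, the printed statistics (mean -42.96 vs mean 0.0078) suggest the torch output may be raw logits while the TF output is post-sigmoid scores; if so, a like-for-like comparison (a sketch) would look like:
######## SKETCH ################
import numpy as np
import torch

pt_scores = torch.sigmoid(torch.from_numpy(pt_pred)).numpy()  # only if pt_pred are logits
print(np.abs(pt_scores - tf_pred).max())                      # inspect the worst-case gap first
assert np.allclose(pt_scores, tf_pred, atol=1e-4)  # float32 conv stacks rarely match to 1e-6
##################################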
Hello,
I am trying to convert the YAMNet TF weights to PyTorch format... However, I'm getting this error. I am new to PyTorch.
----> 9 import params as tf_params
ModuleNotFoundError: No module named 'params'
Thanks in advance
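The missing params module is part of the TensorFlow YAMNet reference code (the yamnet directory of the tensorflow/models repo, which contains params.py, yamnet.py, and features.py), so the converter needs a local clone of it on sys.path; that is what the sys.path.append(path_yamnet) line in convert_yamnet.py mentioned above is for. A sketch, with a hypothetical clone path:
######## SKETCH ################
import sys
sys.path.append("/path/to/models/research/audioset/yamnet")  # local clone of tensorflow/models

import params as tf_params  # now resolvable
##################################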