jungjee / rawnet
Official repository for RawNet, RawNet2, and RawNet3
License: MIT License
The variable out computed after line 72 will always be overwritten:
https://github.com/Jungjee/RawNet/blob/master/model_RawNet2.py#L66
if not self.first:
    out = self.bn1(x)
    out = self.lrelu_keras(out)
else:
    out = x
In RawNet2, the speaker embedding file for VoxCeleb1 was deleted.
Please re-upload this file.
Thanks.
Hi,
Can you kindly share the weights of RawNet2_modified (trained on VoxCeleb2)?
Thank you very much
Is there a function for extracting an embedding from just one wav file, without using batches or a data loader?
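A minimal sketch of embedding a single wav file without batches or a DataLoader, assuming a 16-bit mono PCM file and a trained network wrapped as a callable. load_wav_mono_16bit and embed_single_wav are hypothetical helpers, not functions from this repository; the essential step is adding a batch axis of size 1 before the forward pass.

```python
import wave

import numpy as np


def load_wav_mono_16bit(path):
    """Read a 16-bit mono PCM wav file into float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    return audio, sample_rate


def embed_single_wav(path, embed_fn):
    """Embed one utterance with no DataLoader involved.

    embed_fn is a hypothetical stand-in for the trained network: any callable
    mapping a (1, num_samples) float32 batch to a (1, dim) embedding.
    """
    audio, _ = load_wav_mono_16bit(path)
    batch = audio[np.newaxis, :]                  # add the batch axis: (1, T)
    embedding = np.asarray(embed_fn(batch))[0]    # drop the batch axis again
    return embedding / np.linalg.norm(embedding)  # L2-normalise for cosine scoring
```

With a PyTorch model, embed_fn could be something like lambda b: model(torch.from_numpy(b)).detach().cpu().numpy(), called after model.eval() and inside torch.no_grad(). Whether the model's forward returns the embedding or classification logits depends on how it is written, so check that first.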
For people who don't want to use VoxCeleb + VoxCeleb2, it is hard to figure out what the directory structure should be for DB. Could you please document it?
Or even nicer, if there were a simple-to-download audio dataset (e.g. from torchaudio) that the script would lay out in the right way, people could immediately try your repo and see if it works on their GPU.
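The trial lists in this repository use relative paths such as id10270/x6uYqmx31kE/00001.wav, which suggests a <speaker_id>/<video_id>/<utterance>.wav layout under the DB root. Assuming that layout (an inference from the trial lists, not a documented requirement), a custom dataset could be organised the same way and indexed with a hypothetical helper like this:

```python
from collections import defaultdict
from pathlib import Path


def index_speakers(db_root):
    """Map speaker id -> sorted list of wav paths relative to db_root.

    Assumes (hypothetically) a VoxCeleb-style layout
    <db_root>/<speaker_id>/<video_id>/<utterance>.wav.
    Files at any other depth are ignored.
    """
    root = Path(db_root)
    speakers = defaultdict(list)
    for wav in root.glob("*/*/*.wav"):
        rel = wav.relative_to(root)
        speakers[rel.parts[0]].append(rel.as_posix())  # first component = speaker id
    return {spk: sorted(paths) for spk, paths in sorted(speakers.items())}
```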
Is the generalization of RawNet2 poor?
I trained RawNet2 on the AISHELL dataset with 340 speakers and tested it on trail.txt with 80,000 pairs built from another 40 AISHELL speakers, and the final EER is 3.46%. But when tested on 40 speakers of the VCTK dataset with 80,000 pairs, the EER was 32.71%. Do you know why? Thanks.
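When comparing EERs across trial lists (for example an in-domain AISHELL list versus a VCTK list), it helps to compute the metric identically. A minimal NumPy sketch of EER from trial scores and 0/1 labels, taking the operating point where the false-acceptance and false-rejection rates are closest. Tied scores are not treated specially, and this is not necessarily the exact routine used in this repository.

```python
import numpy as np


def compute_eer(scores, labels):
    """Equal error rate; scores: higher = more likely same speaker, labels: 1/0."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # Sort trials by descending score; accepting the top-k trials is one threshold.
    order = np.argsort(-scores, kind="mergesort")
    sorted_labels = labels[order]
    true_accepts = np.concatenate([[0], np.cumsum(sorted_labels)])
    false_accepts = np.concatenate([[0], np.cumsum(1 - sorted_labels)])
    far = false_accepts / n_nontarget      # false-acceptance rate per threshold
    frr = 1.0 - true_accepts / n_target    # false-rejection rate per threshold
    idx = np.argmin(np.abs(far - frr))     # point where the two rates meet
    return (far[idx] + frr[idx]) / 2.0
```

compute_eer(scores, labels) * 100 gives a percentage comparable to the figures quoted above. The vectorised cumulative-sum form scales to lists of tens of thousands of trials.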
Where is the center loss component? I can't find it.
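For reference, center loss (Wen et al., 2016) penalises the distance between each embedding and a learnable center for its class, and is usually added to a classification loss with a weight (called c_lambda later in this thread). A minimal NumPy sketch of a batch-averaged variant and the center update rule; it is a generic formulation, not a pointer to where this repository implements it.

```python
import numpy as np


def center_loss(embeddings, labels, centers):
    """0.5 * batch mean of ||x_i - c_{y_i}||^2."""
    diffs = embeddings - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))


def update_centers(embeddings, labels, centers, alpha=0.5):
    """Move each class center toward its batch embeddings.

    delta_j = sum_i (c_j - x_i) / (1 + n_j) over samples of class j,
    c_j <- c_j - alpha * delta_j. Returns a new array; input is not modified.
    """
    new_centers = centers.copy()
    for j in np.unique(labels):
        mask = labels == j
        delta = np.sum(centers[j] - embeddings[mask], axis=0) / (1 + mask.sum())
        new_centers[j] = centers[j] - alpha * delta
    return new_centers
```

In training, the total objective would then look like classification_loss + c_lambda * center_loss(embeddings, labels, centers).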
Could you please show, in the README, the directory tree of the VoxCeleb1 files used in your experiments? I'm a bit confused when reading 00-pre_process_waveforms.py.
Hi,
Could you share trained models for RawNet2 and RawNet2_modified for quick testing? Do you have a script for extracting speaker embeddings from a wav file?
I am facing the following error while trying to run the 01-trn_RawNet.py script:
Traceback (most recent call last):
File "01-trn_RawNet.py", line 279, in <module>
loss, loss1, loss2, acc1, acc2 = model.train_on_batch([x, y], [dummy_y, dummy_y])
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 918, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
return self._call_flat(args)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
outputs = self._inference_function.call(ctx, args)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 445, in call
ctx=ctx)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot update variable with shape [] using a Tensor with shape [4], shapes must be equal.
[[node metrics/s_bs_loss_accuracy/AssignAddVariableOp (defined at 01-trn_RawNet.py:279) ]] [Op:__inference_keras_scratch_graph_16089]
Function call stack:
keras_scratch_graph
Can you please help me out? I am using TensorFlow 2.0.0-beta1 with a batch size of 4.
Please provide a script to extract an embedding from an audio file. Thanks.
Any timeline on when the scripts will be uploaded?
Thank you!
It would be of great help if you could provide the config YAML file that you used to train the PyTorch model.
I am having difficulties loading the pretrained model. I tried both model_RawNet2_original_code and model_RawNet2. Code with model_RawNet2:
from model_RawNet2 import RawNet2
from parser import get_args
import sys
import torch
sys.argv = ['RawNet-Pytorch.ipynb'] + ['-name'] + ['Rawnet']
args = get_args()
args.model['nb_classes'] = 6112
model = RawNet2(args.model)
model.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model.eval()
RuntimeError: Error(s) in loading state_dict for RawNet2:
Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias".
Unexpected key(s) in state_dict: "fc_attention0.0.weight", "fc_attention0.0.bias", "fc_attention1.0.weight", "fc_attention1.0.bias", "fc_attention2.0.weight", "fc_attention2.0.bias", "fc_attention3.0.weight", "fc_attention3.0.bias", "fc_attention4.0.weight", "fc_attention4.0.bias", "fc_attention5.0.weight", "fc_attention5.0.bias".
Code with model_RawNet2_original_code:
from model_RawNet2_original_code import RawNet
model2 = RawNet(args.model, 'gpu')
model2.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model2.eval()
RuntimeError: Error(s) in loading state_dict for RawNet:
Unexpected key(s) in state_dict: "block2.0.conv_downsample.weight", "block2.0.conv_downsample.bias".
size mismatch for block2.0.bn1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for block2.0.bn1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for block2.0.bn1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for block2.0.bn1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
size mismatch for block2.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3]).
Could you please share reproducible code for loading the model?
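One way to narrow down such errors before calling load_state_dict is to diff the checkpoint against the freshly built model. A minimal sketch (diff_state_dicts is a hypothetical helper) that reports missing keys, unexpected keys, and shape mismatches, the same three categories shown in the error messages above. The missing blockN.0.frm.fc.* keys and the unexpected fc_attentionN.0.* keys come in matching pairs, which hints at a naming difference between the code that saved the checkpoint and the current code; that reading is an assumption to verify, not a confirmed fix.

```python
def diff_state_dicts(checkpoint, model_state):
    """Compare checkpoint entries with the parameters a model expects.

    Both arguments map parameter names to objects with a .shape attribute
    (torch tensors and NumPy arrays both qualify).
    """
    ckpt_keys, model_keys = set(checkpoint), set(model_state)
    missing = sorted(model_keys - ckpt_keys)      # model expects, file lacks
    unexpected = sorted(ckpt_keys - model_keys)   # file has, model lacks
    mismatched = {
        k: (tuple(checkpoint[k].shape), tuple(model_state[k].shape))
        for k in sorted(ckpt_keys & model_keys)
        if tuple(checkpoint[k].shape) != tuple(model_state[k].shape)
    }
    return missing, unexpected, mismatched
```

Usage, assuming PyTorch: missing, unexpected, mismatched = diff_state_dicts(torch.load(path, map_location="cpu"), model.state_dict()). Shape mismatches such as 128 vs 256 channels in block2 suggest the model was built with a different configuration from the one used for training.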
Hi there,
I tried to download the pre-trained weights from the provided link (https://huggingface.co/jungjee/RawNet3), but the link seems to be invalid. I would appreciate a correct link for downloading the pre-trained weights.
Hi Jungjee,
Thanks for sharing your great work!
Could you please share the code you used to create the test_list for a new test dataset? For example, if I want to test using the TIMIT corpus.
head -5 vox1_veri_test2.txt
1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav
0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/GWXujl-xAVM/00017.wav
0 id10270/x6uYqmx31kE/00001.wav id10273/0OCW1HUxZyg/00001.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00022.wav
Here is what I understand: column 1 is 1 if column 2 and column 3 are from the same speaker, and 0 otherwise.
For a large corpus like VoxCeleb, how should column 2 and column 3 be selected?
Your help will be greatly appreciated!
Regards,
Willy
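A minimal sketch of one simple way to build such a list, assuming at least two speakers and at least two utterances per speaker: every enrollment utterance is paired with one random utterance of the same speaker (label 1) and one random utterance of a different speaker (label 0). This balanced random scheme is an assumption; the official VoxCeleb lists may have been sampled differently.

```python
import random


def make_trial_list(speaker_to_utts, seed=0):
    """Return lines of the form 'label enroll_path test_path'.

    speaker_to_utts maps a speaker id to its utterance paths.
    Label 1 = same speaker (target), 0 = different speaker (non-target).
    """
    rng = random.Random(seed)  # fixed seed keeps the list reproducible
    speakers = sorted(speaker_to_utts)
    lines = []
    for spk in speakers:
        utts = sorted(speaker_to_utts[spk])
        others = [s for s in speakers if s != spk]
        for enroll in utts:
            target = rng.choice([u for u in utts if u != enroll])
            impostor = rng.choice(others)
            nontarget = rng.choice(sorted(speaker_to_utts[impostor]))
            lines.append(f"1 {enroll} {target}")
            lines.append(f"0 {enroll} {nontarget}")
    return lines
```

For TIMIT or any other corpus, speaker_to_utts can be built from the directory layout, and the lines written to a text file in the same three-column format as vox1_veri_test2.txt.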
Hi
I want to use the RawNet3 model in my project to compute the speaker similarity of a pair of wavs. All the audio in my dataset is 22050 Hz; if needed, I could downsample it to 16000 Hz. I wonder whether the pretrained model is suitable for 22050 Hz audio.
Thanks
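VoxCeleb1 is distributed as 16 kHz audio, so resampling 22050 Hz audio to 16 kHz before feeding a VoxCeleb-trained model seems the safer choice. A minimal sketch using SciPy's polyphase resampler, which applies an anti-aliasing low-pass filter; the helper name resample_to_16k is hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to_16k(audio, orig_sr=22050, target_sr=16000):
    """Resample a 1-D signal with polyphase filtering (includes anti-aliasing)."""
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g  # 22050 -> 16000 gives up=320, down=441
    return resample_poly(np.asarray(audio, dtype=np.float64), up, down)
```

One second of 22050 Hz audio (22050 samples) comes out as 16000 samples.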
Hi, when I test the model using the pretrained model rawnet2_best_weights.pt, it throws an error:
Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias"
After checking the model with Netron, these parameters really do seem to be missing.
Many thanks in advance for your reply.
I have a couple of questions regarding RawNet2 usage for inference:
There are two RawNet2 model definitions, one in the model_RawNet2.py file and one in model_RawNet2_original_code.py. The "pre-trained model"'s weights are for the latter. What are the differences between them, and which one should I use for inference (on other datasets, not necessarily VoxCeleb1)?
RawNet/RawNet1/Keras/01-trn_RawNet.py
Lines 19 to 20 in e767f17
Hello.
I've been trying to run the reproduction of the system from the original RawNet paper and found that these two modules are missing. Could you please upload them?
Hi!
I'm trying to evaluate your implementation on a different dataset (and therefore with a different test_path and test_list). What is the correct way to do so without modifying the default paths in trainSpeakerNet.py?
In other words, what should the command to evaluate your implementation look like? Something like this?
python ./trainSpeakerNet.py --test_path /path_to_test --test_list /path_to_the_list
Sorry for my ignorance; I'm new to programming in Python and to neural networks.
Looking forward to your response,
Thank you!
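Assuming trainSpeakerNet.py parses its options with argparse (an assumption; check its parser definition for the exact flag names and whether a config file can override them), values given on the command line replace the defaults without editing the file. A minimal sketch of that behaviour, using the two flag names from the command above with hypothetical default paths:

```python
import argparse


def build_parser():
    # Hypothetical defaults; the real script's defaults and flags may differ.
    parser = argparse.ArgumentParser(description="evaluation path overrides (sketch)")
    parser.add_argument("--test_path", default="data/voxceleb1", help="root of test wavs")
    parser.add_argument("--test_list", default="data/veri_test.txt", help="trial list file")
    return parser
```

Running the command shown above would then set args.test_path and args.test_list to the given paths, while omitting a flag keeps its default.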
It is not clear from the paper (https://arxiv.org/pdf/2004.00526.pdf) what it means when TTA is not used.
Any response will be appreciated.
P.S. I think in object recognition research, it means literally augmenting the test inputs.
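In speaker verification, TTA usually stands for test-time augmentation: instead of embedding the whole test utterance once, several fixed-length crops are embedded and their embeddings averaged. Whether the paper uses exactly this scheme is an assumption here, and the crop length (3 s at 16 kHz), number of crops, and tta_embedding itself are hypothetical. Under that reading, "TTA not used" would presumably mean a single forward pass over the full utterance.

```python
import numpy as np


def tta_embedding(audio, embed_fn, crop_len=48000, num_crops=5):
    """Average L2-normalised embeddings of evenly spaced fixed-length crops.

    embed_fn is a hypothetical callable mapping a (num_crops, crop_len)
    batch to (num_crops, dim) embeddings.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) < crop_len:
        # Repeat short utterances until they fill one crop.
        reps = int(np.ceil(crop_len / len(audio)))
        audio = np.tile(audio, reps)[:crop_len]
    starts = np.linspace(0, len(audio) - crop_len, num=num_crops).astype(int)
    crops = np.stack([audio[s:s + crop_len] for s in starts])
    embeddings = np.asarray(embed_fn(crops), dtype=np.float64)
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings.mean(axis=0)
```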
Hi JungJee,
After training the models, I want to use them for speaker verification. I have a test set of, say, N speakers, each with M enrollment utterances and L test utterances.
Should I just enroll the N speakers using their M utterances each, then, for each of the L test utterances of every speaker, calculate the scores against all N enrolled speakers and select the speaker whose score is highest?
Thanks,
Willy
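What is described above (scoring a test utterance against all N enrolled speakers and picking the highest) is closed-set speaker identification. Verification instead scores a test utterance against one claimed speaker and accepts it if the score reaches a threshold, typically tuned on a development set (for example at the EER point). A minimal NumPy sketch of both, assuming averaged L2-normalised embeddings as the speaker model and cosine similarity as the score; these are common choices, not necessarily what this repository does.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def enroll(utterance_embeddings):
    """Speaker model: mean of the normalised embeddings of M enrollment utterances."""
    return l2_normalize(l2_normalize(utterance_embeddings).mean(axis=0))


def score(speaker_model, test_embedding):
    """Cosine similarity between an enrolled model and one test embedding."""
    return float(np.dot(speaker_model, l2_normalize(test_embedding)))


def identify(models, test_embedding):
    """Closed-set identification: the enrolled speaker with the highest score."""
    return max(models, key=lambda spk: score(models[spk], test_embedding))


def verify(speaker_model, test_embedding, threshold):
    """Verification: accept the claimed identity if the score reaches the threshold."""
    return score(speaker_model, test_embedding) >= threshold
```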
I tried to train RawNet2 on the VoxCeleb2 dataset with default settings, but I found that one epoch takes about 2.5 hours on average. By observing GPU activity, I found that most of the time the GPU is waiting for data I/O.
To reduce GPU waiting time, I tried increasing the "num_workers" and "prefetch_factor" arguments of the PyTorch DataLoader, but the data loading time did not decrease.
My hardware: 3× RTX 3090, 128 GB memory, HDD.
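On a spinning disk, reading many small wav files in random order is usually seek-bound, so adding DataLoader workers may not help. One common workaround (an assumption here, not something this repository provides) is to pack the waveforms once into a single contiguous int16 file and read utterances through a NumPy memory map; moving the data to an SSD is the simpler alternative. A minimal sketch with hypothetical names:

```python
import numpy as np


def pack_waveforms(waveforms, data_path, index_path):
    """Write int16 waveforms into one .npy file plus a .npy offsets index."""
    lengths = np.array([len(w) for w in waveforms], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    packed = np.lib.format.open_memmap(
        data_path, mode="w+", dtype=np.int16, shape=(int(offsets[-1]),))
    for w, start, end in zip(waveforms, offsets[:-1], offsets[1:]):
        packed[start:end] = w   # fill the file without holding it all in RAM
    packed.flush()
    del packed
    np.save(index_path, offsets)


class PackedWaveforms:
    """Map-style access to utterance i through a read-only memory map."""

    def __init__(self, data_path, index_path):
        self.data = np.load(data_path, mmap_mode="r")
        self.offsets = np.load(index_path)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        start, end = self.offsets[i], self.offsets[i + 1]
        return np.asarray(self.data[start:end], dtype=np.float32) / 32768.0
```

Because PackedWaveforms implements __len__ and __getitem__, it can serve as a map-style dataset for a PyTorch DataLoader. Random access into the packed file still seeks, so the gain comes mainly from avoiding per-file open and header overhead; whether that is enough on this hardware would need measuring.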
Hi.
I tried the pre-trained RawNet on VoxCeleb, and it works better than an ECAPA-TDNN pre-trained on VoxCeleb. But when I switch to another English dataset (without training) or to other languages, it performs worse than ECAPA-TDNN, usually with a 2 to 4 times higher EER. Do you have any idea why this happens? Thanks!
I'm wondering what network RawNet_weights.h5 provides weights for. Would you be able to provide the best model weights for both nets?
Hi, I would like to use your RawNet code for training and am testing it on the VoxCeleb dataset first. However, for VoxCeleb1 I do not see the veri_test.txt file. For VoxCeleb2, I only see list_test_all_cleaned.txt and list_test_hard_cleaned.txt instead of the 6 txt files in the file tree.
Is this expected, or are some files missing?
Hey @Jungjee, I was trying to fine-tune RawNet1 and observed that the center-loss does not decrease as training proceeds, although the total-loss does go down. I tried increasing the weight of the center-loss but still see similar patterns. If the weight is too high, the spk-basis-loss starts to increase, so these two losses seem to be somewhat inversely related. Did you observe similar trends? I have attached plots of the losses for different values of the c_lambda parameter (the weight of the center-loss). Is there something that could be going wrong here? In all of the cases above, the learning-rate was set to 0.001.
[Attached loss plots: c_lambda = 0.001 (default), c_lambda = 0.1, c_lambda = 0.5, c_lambda = 5]
I am getting the following error in the PreEmphasis class:
RawNet3/RawNetBasicBlock.py", line 20, in forward
len(input.size()) == 2
AttributeError: 'builtin_function_or_method' object has no attribute 'size'
Could you please help?
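The message suggests that inside forward the name input resolved to Python's built-in input() function rather than a tensor, which can happen if the tensor argument has a different name or is not passed through. Naming the argument explicitly (e.g. x) avoids shadowing the built-in. For checking outputs, here is a minimal NumPy sketch of a pre-emphasis filter y[t] = x[t] - coef * x[t-1] on a (batch, samples) array with reflect padding at the start; the coefficient 0.97 and the padding choice are assumptions, not read from RawNetBasicBlock.py.

```python
import numpy as np


def pre_emphasis(batch, coef=0.97):
    """Apply y[t] = x[t] - coef * x[t-1] along the last axis of a 2-D array.

    Reflect padding supplies the sample before t=0, so y[0] = x[0] - coef * x[1].
    """
    x = np.asarray(batch, dtype=np.float32)
    if x.ndim != 2:
        raise ValueError(f"expected a 2-D (batch, samples) array, got shape {x.shape}")
    padded = np.pad(x, ((0, 0), (1, 0)), mode="reflect")
    return padded[:, 1:] - coef * padded[:, :-1]
```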
First of all, thanks for the amazing work on RawNet and its variants!
I just want to know about the hardware requirements for training RawNet2 and modified RawNet2. I have the following two questions: