jungjee / rawnet

Official repository for RawNet, RawNet2, and RawNet3

License: MIT License

Python 99.51% Shell 0.49%

speaker-embeddings speaker-verification pytorch voxceleb2 rawnet extracted-speaker-embeddings spk-embd

rawnet's Issues

Why do we need this block of code?

out after line 72 will always be overwritten
https://github.com/Jungjee/RawNet/blob/master/model_RawNet2.py#L66

if not self.first:
    out = self.bn1(x)
    out = self.lrelu_keras(out)
else:
    out = x

The speaker embedding for the VoxCeleb1 was deleted

In RawNet2, the speaker embedding file for VoxCeleb1 has been deleted.
Please re-upload this file.
Thanks.

Weights of RawNet2_modified trained on VoxCeleb2

Hi,

Can you kindly share the weights of RawNet2_modified (trained on VoxCeleb2)?

Thank you very much

How do I make an embedding for a single wav file?

Is there a function for this? I don't want to use batches or a data loader, just one wav file.
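
A minimal sketch of the single-file case, using only the Python standard library to read the audio. The names `load_wav_mono16` and `embed_single_wav` are hypothetical helpers, not functions from this repository, and `model` stands for any callable that maps a batch of waveforms to embeddings. The 16000 Hz default is an assumption as well.

```python
import array
import sys
import wave


def load_wav_mono16(path):
    """Read a 16-bit mono PCM wav file; return (samples in [-1, 1], sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # wav sample data is little-endian
    return [s / 32768.0 for s in pcm], rate


def embed_single_wav(model, path, expected_rate=16000):
    """Run a (hypothetical) embedding model on one utterance, without a DataLoader."""
    samples, rate = load_wav_mono16(path)
    if rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
    # A batch of size 1: the model sees one waveform of shape (1, num_samples).
    return model([samples])
```

With a PyTorch model, the batch would typically be built with `torch.tensor(samples).unsqueeze(0)` and passed through the model after `model.eval()` and inside `torch.no_grad()`. What the repository's own `forward` returns is not confirmed here.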

Document the DB directory structure

For people who don't want to use VoxCeleb + VoxCeleb2, it is hard to figure out what the directory structure should be for DB. Could you please document it?

Or even nicer, if there were a simple to download audio dataset (e.g. from torchaudio) that the script would lay out in the right way, people could immediately try your repo and see if it works on their GPU.

The generalization ability

Is the generalization of RawNet2 poor?
I trained RawNet2 on the AISHELL dataset with 340 speakers and tested it on trail.txt, which has 80,000 pairs built from another 40 AISHELL speakers; the final EER is 3.46%. But when tested on 40 speakers of the VCTK dataset with 80,000 pairs, the EER is 32.71%. Do you know why? Thanks.
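
When comparing EER across datasets, it helps to compute it the same way on both trial lists. The following is a minimal pure-Python sketch of an EER computation, not the evaluation code of this repository. `compute_eer` is a hypothetical name, and it assumes scores where higher means "more likely the same speaker" and labels of 1 for target trials and 0 for non-target trials.

```python
def compute_eer(scores, labels):
    """Return (eer, threshold) for similarity scores and 0/1 labels.

    The threshold is swept over the observed scores; ties between scores
    are processed one at a time, so distinct scores are assumed.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")

    # Sort by descending score; a trial is accepted if its score >= threshold.
    ordered = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    accepted_targets = 0
    false_accepts = 0
    best_gap, eer, threshold = float("inf"), 1.0, None
    for score, label in ordered:
        if label == 1:
            accepted_targets += 1
        else:
            false_accepts += 1
        far = false_accepts / n_nontarget                 # false acceptance rate
        frr = (n_target - accepted_targets) / n_target    # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
            threshold = score
    return eer, threshold
```

Comparing the threshold found on each dataset can also be informative: a large shift suggests the score distribution moved with the domain.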

Center loss

Where is the center loss component? I can't find it.

Directory tree of files

Can you please show in the README the directory tree of the VoxCeleb1 files that were used in your experiment? I'm a bit confused when looking at 00-pre_process_waveforms.py.

trained model for RawNet2_modified and RawNet2

Hi,
Can you share trained models for RawNet2 and RawNet2_modified for quick testing? Do you have a script for extracting speaker embeddings from a wav file?

What is the meaning of voxceleb1_test.txt and voxceleb1_val.txt ?

Unable to train RawNet1 using Keras

I am facing this error while trying to run the 01-trn_RawNet.py script.

Traceback (most recent call last):
  File "01-trn_RawNet.py", line 279, in <module>
    loss, loss1, loss2, acc1, acc2 = model.train_on_batch([x, y], [dummy_y, dummy_y])
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 918, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 445, in call
    ctx=ctx)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Cannot update variable with shape [] using a Tensor with shape [4], shapes must be equal.
         [[node metrics/s_bs_loss_accuracy/AssignAddVariableOp (defined at 01-trn_RawNet.py:279) ]] [Op:__inference_keras_scratch_graph_16089]

Function call stack:
keras_scratch_graph

Can you please help me out? I am using tensorflow 2.0.0-beta1. I am using a batch size of 4.

Script to generate embeddings

Please provide a script to extract an embedding from an audio file. Thanks

Pytorch scripts?

Any timeline on when the scripts will be uploaded?

Thank you!

Config file

It would be of great help if you could provide the config YAML file that you used to train the PyTorch model.

can you share code to load the pretrained model?

I am having difficulties loading the pretrained model. I tried both model_RawNet2_original_code and model_RawNet2.

from model_RawNet2 import RawNet2
from parser import get_args
import sys
import torch

sys.argv = ['RawNet-Pytorch.ipynb'] + ['-name'] + ['Rawnet']
args = get_args()
args.model['nb_classes'] = 6112

model = RawNet2(args.model)

model.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model.eval()

RuntimeError: Error(s) in loading state_dict for RawNet2:
	Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias". 
	Unexpected key(s) in state_dict: "fc_attention0.0.weight", "fc_attention0.0.bias", "fc_attention1.0.weight", "fc_attention1.0.bias", "fc_attention2.0.weight", "fc_attention2.0.bias", "fc_attention3.0.weight", "fc_attention3.0.bias", "fc_attention4.0.weight", "fc_attention4.0.bias", "fc_attention5.0.weight", "fc_attention5.0.bias".

code with model_RawNet2_original_code

from model_RawNet2_original_code import RawNet
model2 = RawNet(args.model, 'gpu')
model2.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model2.eval()


RuntimeError: Error(s) in loading state_dict for RawNet:
	Unexpected key(s) in state_dict: "block2.0.conv_downsample.weight", "block2.0.conv_downsample.bias". 
	size mismatch for block2.0.bn1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3]).

Can you please share reproducible code for loading the model?
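
The "Missing key(s)" and "Unexpected key(s)" lists above can be compared mechanically before calling `load_state_dict`. This is a diagnostic sketch only, with hypothetical helper names (`diff_state_dict_keys`, `group_by_module`); it does not fix the mismatch, and it operates on plain key names so that it runs without PyTorch.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return the keys the model expects but the checkpoint lacks, and vice versa."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


def group_by_module(keys):
    """Group parameter names by module path (everything before the last dot)."""
    groups = {}
    for key in keys:
        module, _, param = key.rpartition(".")
        groups.setdefault(module, []).append(param)
    return groups
```

With PyTorch, the inputs would be `model.state_dict().keys()` and `torch.load(path, map_location="cpu").keys()`. Note that `load_state_dict(..., strict=False)` would silence the key errors but leave the unmatched layers at their random initialization, so it is not a substitute for a model definition that matches the checkpoint.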

The link for Pre-trained weight parameters for Rawnet 3 is not available.

Hi there,
I tried to download the pre-trained weight parameters from the provided link (https://huggingface.co/jungjee/RawNet3), but the link seems to be invalid. I would appreciate a correct link for downloading the pre-trained weights.

how to create the test_list for a new test dataset

Hi Jungjee,

Thanks for sharing your great work!
Could you please share the code you used to create the test_list for a new test dataset? For example, if I want to test using the TIMIT corpus.

head -5 vox1_veri_test2.txt

1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav
0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/GWXujl-xAVM/00017.wav
0 id10270/x6uYqmx31kE/00001.wav id10273/0OCW1HUxZyg/00001.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00022.wav

Here is what I understand: column-1 is 1 if column-2 and column-3 are from the same speaker, and 0 otherwise.
For a large corpus like VoxCeleb, how should column-2 and column-3 be selected?

Your help will be greatly appreciated!

Regards,
Willy
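
One way to build a list in the three-column format shown above is balanced random sampling. This is a sketch under stated assumptions, not the procedure used to create the official VoxCeleb list: `build_trials` is a hypothetical helper, and it assumes that the first path component is the speaker ID, as in `id10270/x6uYqmx31kE/00001.wav`. For another corpus such as TIMIT, the paths would have to be arranged or parsed so that this holds.

```python
import random
from collections import defaultdict


def build_trials(utterances, n_per_utt=2, seed=0):
    """Return trial lines 'label enroll_path test_path' (label 1 = same speaker)."""
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    by_speaker = defaultdict(list)
    for path in utterances:
        by_speaker[path.split("/")[0]].append(path)
    speakers = sorted(by_speaker)

    trials = []
    for path in sorted(utterances):
        speaker = path.split("/")[0]
        same = [p for p in by_speaker[speaker] if p != path]
        others = [s for s in speakers if s != speaker]
        if not same or not others:
            continue  # cannot form both trial types for this utterance
        for _ in range(n_per_utt):
            # One target and one non-target trial per draw keeps the list balanced.
            trials.append(f"1 {path} {rng.choice(same)}")
            other = rng.choice(others)
            trials.append(f"0 {path} {rng.choice(by_speaker[other])}")
    # Drop exact duplicates while preserving order.
    return list(dict.fromkeys(trials))
```

Removing duplicates can leave the list slightly unbalanced on small corpora; on a corpus the size of VoxCeleb the effect should be negligible.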

Can I feed 22050 Hz wav files to the pre-trained RawNet3 model?

Hi
I want to use the RawNet3 model in my project to compute the speaker similarity of a pair of wavs. All the audio in my dataset is sampled at 22050 Hz; for some reason I could downsample it to 16000 Hz. I wonder if the pretrained model is suitable for 22050 Hz audio.
Thanks
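
Assuming the pretrained model was trained on 16000 Hz audio, which is an assumption here, feeding 22050 Hz samples directly would present every frequency to the network at a shifted position, so resampling first is the safer route. The sketch below only computes the integer factors needed for rational resampling; `resample_factors` and `resampled_length` are hypothetical helper names.

```python
from math import gcd


def resample_factors(src_rate, dst_rate):
    """Return (up, down) such that dst_rate / src_rate == up / down in lowest terms."""
    g = gcd(src_rate, dst_rate)
    return dst_rate // g, src_rate // g


def resampled_length(num_samples, src_rate, dst_rate):
    """Number of output samples after resampling, rounded up."""
    up, down = resample_factors(src_rate, dst_rate)
    return -(-num_samples * up // down)  # ceiling division on integers


up, down = resample_factors(22050, 16000)  # 320 and 441
```

These two integers are what a polyphase resampler such as SciPy's `scipy.signal.resample_poly(x, up, down)` takes as arguments.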

About the pretrained model

Hi, when I test the model using the pretrained weights rawnet2_best_weights.pt, it throws an error:
Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias"
After checking the model with Netron, these parameters really do seem to be missing.
Many thanks for your reply in advance.

RawNet2 PyTorch weights & inference

I have a couple of questions regarding RawNet2 usage for inference:

By "pre-trained model" in the README, did you mean a model fully trained on VoxCeleb2? Or just the model after the pre-training phase on the speaker identification task?
There are two implementations of RawNet2: one in the model_RawNet2.py file and one in model_RawNet2_original_code.py. The "pre-trained model" weights are for the latter. What are the differences between them, and which one should I use for inference (on other datasets, not necessarily VoxCeleb1)?
Are you planning on releasing the weights of the model fully trained on VoxCeleb2? I would like to experiment with it on other datasets but, unfortunately, don't have the ability to train it myself.

Missing models for RawNet

RawNet/RawNet1/Keras/01-trn_RawNet.py

Lines 19 to 20 in e767f17

 from model_RawNet_pre_train import get_model as get_model_pretrn 

 from model_RawNet import get_model

Hello.

I've been trying to run the reproduction of the system from the original RawNet paper and found that these two modules are missing. Could you please upload them?

how to evaluate your implementation with a different dataset

Hi!

I'm trying to evaluate your implementation with a different dataset (and therefore with a different test_path and test_list). What is the correct way to do so without modifying the default paths in trainSpeakerNet.py?
I mean, what should the command to evaluate your implementation look like? Something like this?
python ./trainSpeakerNet.py --test_path /path_to_test --test_list /path_to_the_list

Sorry for my ignorance, I'm new at programming in python and neural networks.
Looking forward to your response,
Thank you!

Test-Time Augmentation (TTA) not clear

It is not clear from the paper
https://arxiv.org/pdf/2004.00526.pdf
what it means when TTA is not used.

Does it mean just extracting one embedding from a random crop of the audio,
or TTA with 0% overlap,
or part 2 of TTA in this work: https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf ?

Any response will be appreciated.
P.S. I think in object recognition research, it means literal augmentation.
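
To make the alternatives in the question concrete, here is a sketch of one common reading of test-time augmentation for speaker embeddings: cut the utterance into fixed-length windows with a chosen overlap, embed each window, and average. This is an illustration of the options only and does not settle what the paper means by "without TTA". `window_starts` and `average_embeddings` are hypothetical helper names.

```python
def window_starts(num_samples, window, overlap=0.0):
    """Start indices of fixed-length windows covering an utterance.

    overlap is a fraction in [0, 1); overlap=0.0 gives back-to-back windows.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if num_samples <= window:
        return [0]  # a single (possibly short) segment
    hop = max(1, int(window * (1.0 - overlap)))
    starts = list(range(0, num_samples - window + 1, hop))
    if starts[-1] != num_samples - window:
        starts.append(num_samples - window)  # extra window so the tail is covered
    return starts


def average_embeddings(embeddings):
    """Element-wise mean of equally sized embedding vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]
```

Under this reading, "no TTA" could mean a single window (a random crop, or the whole utterance), while "TTA with 0% overlap" corresponds to `overlap=0.0`.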

how to use it for speaker verification

Hi JungJee,

After training the models, I want to use them for speaker verification. I have a test set of N speakers, each with M enrollment utterances and L test utterances.
Should I just enroll the N speakers using their M utterances, and then, for each test utterance of each speaker, calculate the scores against all N speakers and select the speaker whose score is the highest?

Thanks,
Willy
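
The procedure described above (score against all N enrolled speakers and pick the highest) is closed-set speaker identification; verification instead compares the score against one claimed speaker with a threshold. A minimal sketch of both, assuming embeddings are plain lists of floats and cosine similarity is the score. All function names are hypothetical, and averaging the enrollment embeddings is one simple choice among several.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(embeddings):
    """Build a speaker model as the mean of the enrollment embeddings."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def identify(test_embedding, speaker_models):
    """Identification: return the best-scoring enrolled speaker and all scores."""
    scores = {spk: cosine(test_embedding, model) for spk, model in speaker_models.items()}
    return max(scores, key=scores.get), scores


def verify(test_embedding, claimed_model, threshold):
    """Verification: accept or reject one claimed identity."""
    return cosine(test_embedding, claimed_model) >= threshold
```

For verification, the threshold would normally be chosen on a held-out trial list, for example at the equal error rate operating point.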

Too long IO time

I tried to train RawNet2 on VoxCeleb2 dataset with default settings. But I found that one epoch on average takes about 2.5 hours. By observing GPU activity, I found that most of the time GPU is waiting for data IO.
To reduce GPU waiting time, I tried to increase the "num_workers" and "prefetch_factor" arguments of the PyTorch DataLoader, but the data loading time did not decrease.
My hardware: 3*RTX3090, 128 GB memory, HDD.

I wonder if you used a SSD when training the network? And how long one epoch takes when you train RawNet2?
Do you have some advice on decreasing IO time?
Thanks.
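
Before changing hardware or loader settings, it can help to measure how much of an epoch is spent waiting for data. The sketch below wraps any iterable of batches and separates waiting time from step time. `profile_loop` is a hypothetical helper, not part of this repository or of PyTorch.

```python
import time


def profile_loop(batches, step_fn):
    """Return (seconds spent waiting for batches, seconds spent in step_fn)."""
    wait = 0.0
    compute = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # blocks while the loader reads from disk
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute
```

On a GPU, CUDA kernels run asynchronously, so `step_fn` should end with `torch.cuda.synchronize()` for the compute time to be meaningful. If waiting dominates even with many workers, the disk itself is the likely bottleneck, since random reads of many small files are slow on an HDD.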

Overfitting on VoxCeleb

Hi.
I tried the pre-trained RawNet on VoxCeleb and it works better than ECAPA-TDNN pre-trained on VoxCeleb. But when I switch to another English dataset (without training) or to other languages, it works worse than ECAPA-TDNN, usually by a factor of 2-4 (EER). Do you have any idea why this is happening? Thanks!

RawNet_weights.h5

I'm wondering what network RawNet_weights.h5 provides weights for. Would you be able to provide the best model weights for both nets?

Missing .txt trial files

Hi, I would like to use your rawNet code for training and am testing it out on the voxCeleb dataset first. However, for VoxCeleb1 I do not see the veri_test.txt file. For VoxCeleb2, I only see list_test_all_cleaned.txt and list_test_hard_cleaned.txt instead of the 6 txt files in the filetree.

Could I check if this is supposed to be the case or are there missing files?

Misbehaving losses while training RawNet1

Hey, @Jungjee I was trying to fine-tune RawNet1 and observed that the center-loss does not decrease as the training proceeds, although the total-loss does go down. I tried to increase the weight of the center-loss but still see similar patterns. If the weight is too high, the spk-basis-loss starts to increase. So, it seems that both these losses are somewhat inversely related. Did you observe similar trends? I have attached the plots for the losses for different values of the c_lambda parameter (which is the weight of the center-loss). Is there something that could be going wrong here? In all the cases above the learning-rate has been set to 0.001.

[Loss plots for c_lambda=0.001 (default), c_lambda=0.1, c_lambda=0.5, and c_lambda=5]

Error in PreEmphasis Class

I am getting the following error in the PreEmphasis class:

RawNet3/RawNetBasicBlock.py", line 20, in forward
len(input.size()) == 2
AttributeError: 'builtin_function_or_method' object has no attribute 'size'

Could you please help?
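
The calling code is not shown, so the cause can only be guessed. One situation that produces exactly this message is passing a built-in function or method object instead of a tensor, for example by leaving out the parentheses of a call (something like `x.float` instead of `x.float()`). The sketch below reproduces the message with a plain list and adds a hypothetical guard, `require_tensor_like`, that fails earlier with a clearer hint.

```python
def require_tensor_like(x):
    """Raise a descriptive error if x looks like an uncalled function or method."""
    if callable(x) and not hasattr(x, "size"):
        name = getattr(x, "__name__", repr(x))
        raise TypeError(
            f"expected a tensor-like object, got callable {name!r}; "
            "were the parentheses of a call left out?"
        )
    return x


samples = [0.1, 0.2, 0.3]
forgot_call = samples.copy  # a method object, not a list: the '()' is missing

try:
    forgot_call.size  # same kind of access as input.size() in the traceback
except AttributeError as err:
    message = str(err)  # mentions 'builtin_function_or_method'
```

Printing `type(x)` just before the model call is usually enough to confirm or rule out this cause.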

What is the requirement in terms of hardware?

First of all thanks for such amazing work of RawNet and its variants!

I just want to know about the hardware requirements for training RawNet2 and the modified RawNet2. I have the following two questions:

I tried to run it with a single GPU with 12GB memory and failed. What was your experimental hardware setting?
Is it recommended to reduce number of mini batches for better memory handling?
