
rawnet's People

Contributors

jungjee, kimho1wq, polestvr


rawnet's Issues

Document the DB directory structure

For people who don't want to use VoxCeleb + VoxCeleb2, it is hard to figure out what the directory structure for DB should be. Could you please document it?

Or, even nicer: if there were a simple-to-download audio dataset (e.g. from torchaudio) that the script would lay out in the right way, people could immediately try your repo and see whether it works on their GPU.
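
As a rough sketch of such a helper (not part of the repo), the snippet below downloads the small LibriSpeech dev-clean subset via torchaudio and lays it out in a VoxCeleb-style speaker/session/utterance tree; whether this matches the layout 00-pre_process_waveforms.py actually expects is precisely what this issue asks to have documented.

# Hypothetical helper: download a small torchaudio dataset and arrange it in a
# VoxCeleb-style DB tree (speaker_id/session_id/utterance.wav). The exact layout
# expected by the repo's scripts is what this issue asks to document.
import os
import torchaudio

DB_ROOT = "./DB/toy_dataset"     # assumed target root, adjust as needed
os.makedirs("./data", exist_ok=True)
dataset = torchaudio.datasets.LIBRISPEECH("./data", url="dev-clean", download=True)

for waveform, sample_rate, _, speaker_id, chapter_id, utterance_id in dataset:
    # One directory per speaker, one sub-directory per recording session (chapter),
    # mimicking the VoxCeleb id/session/utterance layout.
    out_dir = os.path.join(DB_ROOT, f"id{speaker_id:05d}", str(chapter_id))
    os.makedirs(out_dir, exist_ok=True)
    torchaudio.save(os.path.join(out_dir, f"{utterance_id:04d}.wav"), waveform, sample_rate)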

The generalization ability

Is the generalization of RawNet2 poor?
I trained RawNet2 on the AISHELL dataset with 340 speakers and tested on a trial.txt of 80k pairs built from another 40 AISHELL speakers; the final EER is 3.46%. But when tested on 40 speakers of the VCTK dataset with 80k pairs, the EER was 32.71%. Do you know why? Thanks.

centre loss

Where is the center-loss component? I can't find it.
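
For reference, a generic center-loss sketch (not the repository's implementation), assuming embeddings of size emb_dim and integer class labels:

import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Generic center loss: pulls each embedding towards a learnable
    per-class center. An illustration, not the repo's code."""
    def __init__(self, nb_classes, emb_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(nb_classes, emb_dim))

    def forward(self, embeddings, labels):
        # Select the center of each sample's class and penalise the squared distance.
        batch_centers = self.centers[labels]        # (batch, emb_dim)
        return ((embeddings - batch_centers) ** 2).sum(dim=1).mean()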

Directory tree of files

Could you please show in the README the directory tree of the VoxCeleb1 files that were used in your experiment? I'm a bit confused when looking at 00-pre_process_waveforms.py.

Unable to train RawNet1 using Keras

I am facing this error while trying to run the 01-trn_RawNet.py script.

Traceback (most recent call last):
  File "01-trn_RawNet.py", line 279, in <module>
    loss, loss1, loss2, acc1, acc2 = model.train_on_batch([x, y], [dummy_y, dummy_y])
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 918, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 445, in call
    ctx=ctx)
  File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Cannot update variable with shape [] using a Tensor with shape [4], shapes must be equal.
         [[node metrics/s_bs_loss_accuracy/AssignAddVariableOp (defined at 01-trn_RawNet.py:279) ]] [Op:__inference_keras_scratch_graph_16089]

Function call stack:
keras_scratch_graph

Can you please help me out? I am using TensorFlow 2.0.0-beta1 with a batch size of 4.

PyTorch scripts?

Any timeline on when the scripts will be uploaded?

Thank you!

Config file

It would be of great help if you could provide the config YAML file that you used to train the PyTorch model.

can you share code to load the pretrained model?

I am having difficulties loading the pretrained model. I tried both model_RawNet2_original_code and model_RawNet2.

from model_RawNet2 import RawNet2
from parser import get_args
import sys
import torch

sys.argv = ['RawNet-Pytorch.ipynb'] + ['-name'] + ['Rawnet']
args = get_args()
args.model['nb_classes'] = 6112

model = RawNet2(args.model)
model.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model.eval()
RuntimeError: Error(s) in loading state_dict for RawNet2:
	Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias". 
	Unexpected key(s) in state_dict: "fc_attention0.0.weight", "fc_attention0.0.bias", "fc_attention1.0.weight", "fc_attention1.0.bias", "fc_attention2.0.weight", "fc_attention2.0.bias", "fc_attention3.0.weight", "fc_attention3.0.bias", "fc_attention4.0.weight", "fc_attention4.0.bias", "fc_attention5.0.weight", "fc_attention5.0.bias". 

code with model_RawNet2_original_code

from model_RawNet2_original_code import RawNet
model2 = RawNet(args.model, 'gpu')
model2.load_state_dict(torch.load('./Pre-trained_model/rawnet2_best_weights.pt'))
model2.eval()

RuntimeError: Error(s) in loading state_dict for RawNet:
	Unexpected key(s) in state_dict: "block2.0.conv_downsample.weight", "block2.0.conv_downsample.bias". 
	size mismatch for block2.0.bn1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.running_mean: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.bn1.running_var: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for block2.0.conv1.weight: copying a param with shape torch.Size([256, 128, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3]).

Can you please share a reproducible code for loading the model?
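
Until the matching model definition is clarified, here is a hedged diagnostic sketch (reusing model2 and the checkpoint path from the snippet above) that lists the mismatching keys and loads only the parameters whose names and shapes agree. Note that strict=False does not fix shape mismatches, so the skipped block2 parameters remain randomly initialised.

import torch

ckpt = torch.load('./Pre-trained_model/rawnet2_best_weights.pt', map_location='cpu')
model_state = model2.state_dict()

# Compare parameter names to see which model definition the checkpoint matches.
print('missing from checkpoint  :', sorted(set(model_state) - set(ckpt)))
print('unexpected in checkpoint :', sorted(set(ckpt) - set(model_state)))

# Load only entries whose name and shape both match; anything skipped here
# (e.g. the mismatching block2 parameters above) stays randomly initialised,
# so this is a diagnostic aid, not a substitute for the correct model class.
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}
model2.load_state_dict(filtered, strict=False)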

how to create the test_list for a new test dataset

Hi Jungjee,

Thanks for sharing your great work!
Could you please share the code you used to create the test_list for a new test dataset? For example, if I want to test using the TIMIT corpus.

head -5 vox1_veri_test2.txt

1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00008.wav
0 id10270/x6uYqmx31kE/00001.wav id10300/ize_eiCFEg0/00003.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/GWXujl-xAVM/00017.wav
0 id10270/x6uYqmx31kE/00001.wav id10273/0OCW1HUxZyg/00001.wav
1 id10270/x6uYqmx31kE/00001.wav id10270/8jEAjG6SegY/00022.wav

Here is what I understand: column-1 is 1 if column-2 and column-3 are from the same speaker, and 0 otherwise.
For a large corpus like VoxCeleb, how do you select column-2 and column-3?
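
For illustration only (not the authors' script), a trial list in this format can be generated by randomly sampling same-speaker and different-speaker pairs, assuming a speaker_id/.../utterance.wav layout and a hypothetical db_root such as ./DB/TIMIT:

# Hypothetical trial-list generator for a new corpus such as TIMIT; assumes a
# speaker_id/.../utterance.wav directory layout. Not the authors' script.
import glob
import os
import random

def build_trials(db_root, nb_pairs, seed=1234):
    random.seed(seed)
    utts = {}   # speaker_id -> list of wav paths relative to db_root
    for path in glob.glob(os.path.join(db_root, '**', '*.wav'), recursive=True):
        rel = os.path.relpath(path, db_root).replace(os.sep, '/')
        utts.setdefault(rel.split('/')[0], []).append(rel)
    speakers = [s for s, lst in utts.items() if len(lst) >= 2]

    lines = []
    for _ in range(nb_pairs // 2):
        # Target trial: two different utterances from the same speaker.
        spk = random.choice(speakers)
        a, b = random.sample(utts[spk], 2)
        lines.append(f'1 {a} {b}')
        # Non-target trial: one utterance each from two different speakers.
        s1, s2 = random.sample(speakers, 2)
        lines.append(f'0 {random.choice(utts[s1])} {random.choice(utts[s2])}')
    return lines

with open('timit_trials.txt', 'w') as f:
    f.write('\n'.join(build_trials('./DB/TIMIT', 8000)) + '\n')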

Your help will be greatly appreciated!

Regards,
Willy

Can I feed 22050 Hz wavs to the pre-trained RawNet3 model?

Hi
I want to use the RawNet3 model in my project to compute the speaker similarity of pairs of wavs. All the audio in my dataset is 22050 Hz; for some reason I could downsample it to 16 kHz. I wonder whether the pretrained model is suitable for 22050 Hz audio.
Thanks
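
For reference, downsampling to 16 kHz before extracting embeddings is straightforward with torchaudio (a sketch, assuming the pre-trained RawNet3 weights expect 16 kHz input, which is the VoxCeleb sampling rate):

import torchaudio

waveform, sr = torchaudio.load('example_22050hz.wav')   # hypothetical file name
if sr != 16000:
    # Resample to the 16 kHz rate the pre-trained model was trained on.
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)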

About the pretrained model

Hi, when I test the model using the pretrained weights rawnet2_best_weights.pt, it throws an error:
Missing key(s) in state_dict: "block0.0.frm.fc.weight", "block0.0.frm.fc.bias", "block1.0.frm.fc.weight", "block1.0.frm.fc.bias", "block2.0.frm.fc.weight", "block2.0.frm.fc.bias", "block3.0.frm.fc.weight", "block3.0.frm.fc.bias", "block4.0.frm.fc.weight", "block4.0.frm.fc.bias", "block5.0.frm.fc.weight", "block5.0.frm.fc.bias"
These parameters really do seem to be missing; I checked the checkpoint with Netron.
Many thanks in advance for your reply.

RawNet2 PyTorch weights & inference

I have a couple of questions regarding RawNet2 usage for inference:

  1. By "pre-trained model" in README you meant a model fully trained on VoxCeleb2? Or just the model after the pre-training phase on speaker identification task?
  2. There are two implementations of RawNet2 - one in model_RawNet2.py file, one in model_RawNet2_original_code.py. The "pre-trained model"'s weights are for the latter. What are the differences between them and which one should I use for inference (on other datasets, not necesserily VoxCeleb1)
  3. Are you planning on releasing the weights of the model fully trained on VoxCeleb2? I would like to experiment with it on other datasets, but, unfortunatel, don't have the ability to train it myself.

Missing models for RawNet

from model_RawNet_pre_train import get_model as get_model_pretrn
from model_RawNet import get_model

Hello.

I've been trying to run the reproduction of the system from the original RawNet paper and found that these two modules are missing. Could you please upload them?

how to evaluate your implementation with a different dataset

Hi!

I'm trying to evaluate your implementation on a different dataset (therefore with a different test_path and test_list). What is the correct way to do so without modifying the default paths in trainSpeakerNet.py?
I mean, what should the command to evaluate your implementation look like? Something like this?
python ./trainSpeakerNet.py --test_path /path_to_test --test_list /path_to_the_list

Sorry for my ignorance, I'm new to programming in Python and to neural networks.
Looking forward to your response,
Thank you!

how to use it for speaker verification

Hi JungJee,

After training the models, I want to use them for speaker verification. I have a test set of, say, N speakers, each with M enrolment utterances and L test utterances.
Should I just enroll the N speakers using their M utterances, then, for each speaker's L test utterances, calculate the scores against the N speakers and select the speaker whose score is the highest?
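
For context, the procedure described above is the standard closed-set identification recipe. A minimal cosine-scoring sketch, assuming a hypothetical model_embed(utterance) helper that returns a 1-D embedding tensor from the trained model:

import torch
import torch.nn.functional as F

def enroll(model_embed, enroll_utts):
    """Average a speaker's M enrolment embeddings (L2-normalised)."""
    embs = torch.stack([F.normalize(model_embed(u), dim=-1) for u in enroll_utts])
    return F.normalize(embs.mean(dim=0), dim=-1)

def identify(model_embed, speaker_models, test_utt):
    """Score one test utterance against all N speaker models; return the best match."""
    test_emb = F.normalize(model_embed(test_utt), dim=-1)
    scores = {spk: torch.dot(emb, test_emb).item()
              for spk, emb in speaker_models.items()}
    return max(scores, key=scores.get), scores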

Thanks,
Willy

Too long IO time

I tried to train RawNet2 on the VoxCeleb2 dataset with the default settings, but I found that one epoch takes about 2.5 hours on average. By observing GPU activity, I found that most of the time the GPU is waiting for data IO.
To reduce the GPU waiting time, I tried increasing the "num_workers" and "prefetch_factor" arguments of the PyTorch DataLoader, but the data loading time did not decrease.
My hardware: 3x RTX 3090, 128 GB memory, HDD.

  1. I wonder whether you used an SSD when training the network? And how long does one epoch take when you train RawNet2?
  2. Do you have any advice on decreasing the IO time?
    Thanks.
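
For reference, these are the usual DataLoader knobs for hiding disk latency (illustrative values, not the repo's configuration; train_set stands for the existing Dataset object). If a spinning HDD is the bottleneck, moving the data to an SSD or caching it in RAM typically matters more.

from torch.utils.data import DataLoader

loader = DataLoader(
    train_set,                  # the existing Dataset instance
    batch_size=120,             # illustrative value
    shuffle=True,
    num_workers=8,              # parallel wav decoding
    prefetch_factor=4,          # batches pre-fetched per worker
    pin_memory=True,            # faster host-to-GPU copies
    persistent_workers=True,    # keep workers alive between epochs
)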

Overfitting on VoxCeleb

Hi.
I tried the pre-trained RawNet on VoxCeleb and it works better than ECAPA-TDNN pre-trained on VoxCeleb. But when I switch to another English dataset (without training) or to other languages, it works worse than ECAPA-TDNN, usually by a factor of 2-4 in EER. Do you have any idea why this is happening? Thanks!

RawNet_weights.h5

I'm wondering which network RawNet_weights.h5 provides the weights for. Would you be able to provide the best model weights for both networks?

Missing .txt trial files

Hi, I would like to use your RawNet code for training and am testing it out on the VoxCeleb dataset first. However, for VoxCeleb1 I do not see the veri_test.txt file. For VoxCeleb2, I only see list_test_all_cleaned.txt and list_test_hard_cleaned.txt instead of the six txt files in the file tree.

Could I check whether this is supposed to be the case, or whether there are missing files?

Misbehaving losses while training RawNet1

Hey @Jungjee, I was trying to fine-tune RawNet1 and observed that the center loss does not decrease as training proceeds, although the total loss does go down. I tried increasing the weight of the center loss but still see similar patterns. If the weight is too high, the spk-basis loss starts to increase, so it seems that these two losses are somewhat inversely related. Did you observe similar trends? I have attached plots of the losses for different values of the c_lambda parameter (the weight of the center loss). Is there something that could be going wrong here? In all the cases, the learning rate was set to 0.001.

(Attached: loss plots for c_lambda = 0.001 (default), 0.1, 0.5, and 5.)

Error in PreEmphasis Class

I am getting the following error in the PreEmphasis class:

RawNet3/RawNetBasicBlock.py", line 20, in forward
len(input.size()) == 2
AttributeError: 'builtin_function_or_method' object has no attribute 'size'

Could you please help?
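
For context, the assertion in RawNetBasicBlock.py expects a 2-D (batch, num_samples) float tensor, and the error message suggests that the Python built-in input function, rather than a waveform tensor, reached forward(). A hedged usage sketch (import path assumed from the traceback):

import torchaudio
from RawNet3.RawNetBasicBlock import PreEmphasis   # import path assumed from the traceback

waveform, sr = torchaudio.load('example.wav')      # (channels, num_samples) float tensor
waveform = waveform[:1]                            # keep a single channel -> shape (1, T)

pre_emphasis = PreEmphasis()
emphasized = pre_emphasis(waveform)                # a 2-D tensor satisfies the assertion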

What is the requirement in terms of hardware?

First of all, thanks for such amazing work on RawNet and its variants!

I just want to know about the hardware requirements for training RawNet2 and the modified RawNet2. I have the following two questions:

  1. I tried to run it on a single GPU with 12 GB of memory and failed. What was your experimental hardware setup?
  2. Is it recommended to reduce the mini-batch size for better memory handling?
