
vita-group / autospeech

206 stars · 42 forks · 198 KB

[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Home Page: https://arxiv.org/abs/2005.03215

License: MIT License

Languages: Python 99.02%, Shell 0.98%
Topics: automl, autospeech, neural-architecture-search, pytorch, speaker-recognition

autospeech's People

Contributors

shaojinding, soellingeraj, tianlong-chen


autospeech's Issues

Training in the search stage is very slow

While following the paper and the code, I ran the command CUDA_VISIBLE_DEVICES=0 python search.py --cfg exps/search.yaml for about 8 days on a single Quadro RTX 8000 (45 GB of CUDA memory), but it only completed 4 epochs. I don't understand why, because the paper says the search stage takes about 5 days on a single NVIDIA TITAN RTX GPU.

Here is a summary of the running log:

2021-01-09 13:48:30,025 Namespace(cfg='exps/search.yaml', load_path=None, opts=[], path_helper={'prefix': 'logs_search/search_2021_01_09_13_48_29', 'ckpt_path': 'logs_search/search_2021_01_09_13_48_29/Model', 'log_path': 'logs_search/search_2021_01_09_13_48_29/Log', 'sample_path': 'logs_search/search_2021_01_09_13_48_29/Samples'})
2021-01-09 13:48:30,026 CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATA_DIR: data/VoxCeleb1
  NUM_WORKERS: 0
  PARTIAL_N_FRAMES: 300
  SUB_DIR: merged
  TEST_DATASET: 
  TEST_DATA_DIR: 
MODEL:
  DROP_PATH_PROB: 0.2
  INIT_CHANNELS: 64
  LAYERS: 8
  NAME: model_search
  NUM_CLASSES: 1251
  PRETRAINED: False
PRINT_FREQ: 200
SEED: 3
TRAIN:
  ARCH_BETA1: 0.9
  ARCH_BETA2: 0.999
  ARCH_LR: 0.001
  ARCH_WD: 0.001
  BATCH_SIZE: 2
  BEGIN_EPOCH: 0
  BETA1: 0.9
  BETA2: 0.999
  DROPPATH_PROB: 0.2
  END_EPOCH: 50
  LR: 0.01
  LR_MIN: 0.001
  WD: 0.0003
VAL_FREQ: 5
2021-01-09 13:48:32,472 genotype = Genotype(normal=[('sep_conv_5x5', 1), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 2), ('skip_connect', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 3)], normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 2), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 4)], reduce_concat=range(2, 6))
2021-01-09 13:48:47,236 Epoch: [0][    0/69180]	Time 14.752 (14.752)	Data  0.011 ( 0.011)	Loss 7.0746e+00 (7.0746e+00)	Acc@1   0.00 (  0.00)	Acc@5   0.00 (  0.00)	Entropy 2.0794e+00 (2.0794e+00)
2021-01-09 14:03:16,840 Epoch: [0][  200/69180]	Time  4.416 ( 4.400)	Data  0.004 ( 0.005)	Loss 2.0358e+01 (1.0089e+01)	Acc@1   0.00 (  0.25)	Acc@5   0.00 (  0.50)	Entropy 2.0788e+00 (2.0792e+00)
2021-01-09 14:15:37,689 Epoch: [0][  400/69180]	Time  1.702 ( 4.053)	Data  0.004 ( 0.005)	Loss 1.6950e+01 (1.1480e+01)	Acc@1   0.00 (  0.12)	Acc@5   0.00 (  0.50)	Entropy 2.0772e+00 (2.0786e+00)
2021-01-09 14:21:15,859 Epoch: [0][  600/69180]	Time  1.688 ( 3.267)	Data  0.004 ( 0.005)	Loss 1.6810e+01 (1.2325e+01)	Acc@1   0.00 (  0.08)	Acc@5   0.00 (  0.33)	Entropy 2.0757e+00 (2.0779e+00)
2021-01-09 14:26:53,783 Epoch: [0][  800/69180]	Time  1.687 ( 2.873)	Data  0.004 ( 0.005)	Loss 1.0024e+01 (1.2812e+01)	Acc@1   0.00 (  0.06)	Acc@5   0.00 (  0.31)	Entropy 2.0742e+00 (2.0772e+00)
.........

2021-01-17 15:30:43,634 Epoch: [4][43800/69180]	Time  3.355 ( 2.443)	Data  0.083 ( 0.073)	Loss 6.8538e+00 (6.9083e+00)	Acc@1   0.00 (  0.91)	Acc@5   0.00 (  3.33)	Entropy 1.7882e+00 (1.8038e+00)
2021-01-17 15:40:39,570 Epoch: [4][44000/69180]	Time  3.173 ( 2.445)	Data  0.082 ( 0.073)	Loss 6.8132e+00 (6.9083e+00)	Acc@1   0.00 (  0.91)	Acc@5   0.00 (  3.32)	Entropy 1.7889e+00 (1.8037e+00)
2021-01-17 15:51:22,701 Epoch: [4][44200/69180]	Time  3.590 ( 2.449)	Data  0.441 ( 0.073)	Loss 5.9202e+00 (6.9080e+00)	Acc@1  50.00 (  0.92)	Acc@5  50.00 (  3.33)	Entropy 1.7913e+00 (1.8037e+00)
2021-01-17 16:02:04,050 Epoch: [4][44400/69180]	Time  3.137 ( 2.452)	Data  0.067 ( 0.073)	Loss 6.2018e+00 (6.9080e+00)	Acc@1   0.00 (  0.92)	Acc@5   0.00 (  3.34)	Entropy 1.7874e+00 (1.8036e+00)
2021-01-17 16:12:47,046 Epoch: [4][44600/69180]	Time  3.181 ( 2.455)	Data  0.081 ( 0.073)	Loss 7.0442e+00 (6.9075e+00)	Acc@1   0.00 (  0.93)	Acc@5   0.00 (  3.34)	Entropy 1.7865e+00 (1.8035e+00)
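For what it's worth, the averaged iteration times printed in these logs already account for the wall-clock: at batch size 2 there are 69180 iterations per epoch, which works out to roughly two days per epoch. A back-of-the-envelope check (my own arithmetic based on the log above, not from the authors):

```python
# Rough wall-clock estimate from the running averages in the log above.
iters_per_epoch = 69180     # iterations per epoch, from the progress lines
avg_iter_time_s = 2.455     # running-average "Time (...)" value at epoch 4

hours_per_epoch = iters_per_epoch * avg_iter_time_s / 3600
print(f"~{hours_per_epoch:.1f} h/epoch")  # ~47.2 h, i.e. ~2 days/epoch, so 4 epochs ~ 8 days
```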

Training data of speaker verification

To my knowledge, in speaker verification the speakers in the test data should not appear in the training data. For the VoxCeleb1 dataset, only the dev set (1,211 speakers) should be used for training, and the test set (40 speakers) should be used for evaluation.

How to get speaker embedding?

I want to get speaker embeddings for my own wav files so that I can distinguish speakers. Could you please give me some suggestions?
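For reference, the usual pattern is to take the network's pooled output before the classifier, L2-normalize it, and compare utterances by cosine similarity. A minimal, self-contained sketch with a toy stand-in backbone (this is not AutoSpeech's actual API):

```python
import torch
import torch.nn.functional as F

class ToyBackbone(torch.nn.Module):
    """Stand-in for the real network; only the embed-before-classifier pattern matters."""
    def __init__(self, emb_dim=256, num_classes=1251):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Conv1d(257, emb_dim, kernel_size=3, padding=1),
            torch.nn.AdaptiveAvgPool1d(1),      # temporal pooling -> fixed-size vector
        )
        self.classifier = torch.nn.Linear(emb_dim, num_classes)

    def embed(self, x):                         # x: (batch, freq_bins, frames)
        return self.encoder(x).squeeze(-1)      # (batch, emb_dim), classifier skipped

model = ToyBackbone().eval()
with torch.no_grad():
    a = F.normalize(model.embed(torch.randn(1, 257, 300)), dim=-1)
    b = F.normalize(model.embed(torch.randn(1, 257, 300)), dim=-1)
print(F.cosine_similarity(a, b).item())         # higher score -> more likely same speaker
```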

Trained Model?

Any chance you guys could upload the model weights for this?

Question about a mismatch between the code and the paper

While using the code I noticed a problem: the output of your Proposed 1 after temporal pooling is 1024-dim, but the paper says 256-dim. Is the code missing a dense layer?
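If a 256-dim embedding is what the paper intends, one possible fix (an assumption on my part, not confirmed by the authors) is a dense projection after the temporal pooling:

```python
import torch

proj = torch.nn.Linear(1024, 256)   # project the 1024-dim pooled feature to a 256-dim embedding
pooled = torch.randn(8, 1024)       # stand-in for the temporal-pooling output
embedding = proj(pooled)
print(embedding.shape)              # torch.Size([8, 256])
```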

problem

When I run the code, this error happens:
RuntimeError: Invalid index in scatter at /tmp/pip-req-build-808afw3c/aten/src/TH/generic/THTensorEvenMoreMath.cpp:151
Do you know why?
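A common cause of this scatter error is a label index falling outside [0, NUM_CLASSES); the search config sets MODEL.NUM_CLASSES to 1251 for VoxCeleb1. A quick diagnostic sketch (the labels tensor below is a placeholder for your dataloader's output):

```python
import torch

num_classes = 1251                    # MODEL.NUM_CLASSES from the config
labels = torch.tensor([0, 57, 1250])  # replace with the labels from your dataloader

lo, hi = int(labels.min()), int(labels.max())
assert 0 <= lo and hi < num_classes, \
    f"labels must lie in [0, {num_classes}), got [{lo}, {hi}]"
```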

Fine tuning on a subset of data

Hello,

I need a clarification.

I trained the identification model on the VoxCeleb1 data from scratch with the given genotype.

I now need to fine-tune that model on a subset of the VoxCeleb2 dataset. In 'train_identification.py', will loading the trained model and training for a few epochs work for fine-tuning, or do I need to write my own script?

Regards,
Sreeni...
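In general, loading the trained weights, swapping the classifier head for the new speaker count, and training briefly at a low learning rate should do it. A rough pattern sketch (Net, the checkpoint layout, and the 118-speaker count are all hypothetical, not train_identification.py's actual structure):

```python
import torch

class Net(torch.nn.Module):
    """Placeholder network; only the head-swap pattern matters."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = torch.nn.Linear(512, 256)   # stand-in for the searched cells
        self.classifier = torch.nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(torch.relu(self.backbone(x)))

model = Net(num_classes=1251)                       # VoxCeleb1 identification head
# model.load_state_dict(torch.load('checkpoint_best.pth')['state_dict'])  # hypothetical layout
model.classifier = torch.nn.Linear(256, 118)        # e.g. a 118-speaker VoxCeleb2 subset
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)    # small LR for fine-tuning
```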

TypeError: type object argument after * must be an iterable, not property

Thanks for sharing the repository. I tried following the Quick Start guide. When I ran the command for training from scratch for verification, python train_verification.py --cfg exps/scratch/scratch.yaml --text_arch Genotype, I got the error message TypeError: type object argument after * must be an iterable, not property, raised at line 21 of models/model.py: op_names, indices = zip(*genotype.normal). How can I resolve this? Thanks.
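A likely cause, judging from the error message (my reading, not a confirmed answer): --text_arch received the literal class name Genotype, so genotype.normal resolved to the namedtuple class's property descriptor instead of a list of (op, index) pairs. Passing a concrete genotype string, such as the one printed at the end of the search, unpacks fine:

```python
from collections import namedtuple

Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

# A concrete genotype string (shortened here) evaluates to an instance whose
# .normal is a real list, so the unpacking at models/model.py line 21 succeeds.
text_arch = ("Genotype(normal=[('sep_conv_5x5', 1), ('max_pool_3x3', 0)], "
             "normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1)], "
             "reduce_concat=range(2, 6))")
genotype = eval(text_arch)
op_names, indices = zip(*genotype.normal)
print(op_names, indices)   # ('sep_conv_5x5', 'max_pool_3x3') (1, 0)
```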

Issue with README

I found that the directory structure documented in master/README.md does not match what the code expects.

Where it says:

The data should be organized as:

VoxCeleb1
 - wav
 - vox1_meta.csv
 - iden_split.txt
 - veri_test.txt

It should say:

The data should be organized as:

VoxCeleb1
 - dev/wav/...
 - test/wav/...
 - vox1_meta.csv
 - iden_split.txt
 - veri_test.txt

Reducing the batch size leads to overfitting?

Hello,

With a batch size of 256, training the ResNet18 model, I got a CUDA out-of-memory error, so I had to reduce the batch size to 128. With batch size 128, the logs are below:

Epoch: [49][ 200/1080] Time 2.723 ( 2.664) Data 2.575 ( 2.515) Loss 1.3415e+00 (1.3413e+00) Acc@1 100.00 ( 99.33) Acc@5 100.00 ( 99.96)
Epoch: [49][ 400/1080] Time 3.064 ( 2.778) Data 2.915 ( 2.630) Loss 1.3431e+00 (1.3420e+00) Acc@1 99.22 ( 99.31) Acc@5 100.00 ( 99.96)
Epoch: [49][ 600/1080] Time 2.882 ( 2.845) Data 2.729 ( 2.696) Loss 1.3618e+00 (1.3444e+00) Acc@1 100.00 ( 99.27) Acc@5 100.00 ( 99.95)

I am not sure whether the model is overfitting; could you please help me interpret these logs?

I am using the VoxCeleb1 data from the 2022 challenge.
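Training accuracy alone cannot show overfitting; near-100% top-1 by epoch 49 is expected, and the question is how the model does on held-out data. For identification, compare against the validation accuracy logged every VAL_FREQ epochs; for verification, the usual held-out check is the EER on trial pairs. A minimal EER computation (the scores and labels below are made-up stand-ins):

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 0, 0, 1, 0])           # 1 = same speaker, 0 = different
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.5])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]      # operating point where FPR ~= FNR
print(f"EER ~ {eer:.2%}")
```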

Pre-trained Models

Hello, thanks for sharing the code. Would you consider releasing the pre-trained models? I noticed a closed issue about this, but I cannot find the pre-trained models anywhere. Thanks a lot.

Alpha parameters are getting turned into 'nan' values

I have filtered the VoxCeleb1 dataset to keep only Indian nationalities, but I don't know why, after a few batches of the training loop, the alpha parameters turn into NaN values. I then get this error from the forward function of MixedOp:
OverflowError: weight: tensor([nan, nan, nan, nan, nan, nan, nan, nan], device='cuda:0'.

Can anyone help with this?
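NaN architecture weights in DARTS-style search often trace back to an exploding architecture gradient, especially on a much smaller filtered dataset. Two common mitigations (assumptions on my part, not the authors' recipe) are lowering ARCH_LR from its 0.001 default and clipping the alpha gradients before the optimizer step:

```python
import torch

alphas = torch.randn(14, 8, requires_grad=True)        # stand-in for the alpha parameters
arch_optimizer = torch.optim.Adam([alphas], lr=3e-4,   # lowered from the config's 1e-3
                                  betas=(0.9, 0.999), weight_decay=1e-3)

loss = (alphas ** 2).sum()                             # stand-in for the validation loss
loss.backward()
torch.nn.utils.clip_grad_norm_([alphas], max_norm=5.0) # keep the alpha gradients finite
arch_optimizer.step()
```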
