
vita-group / autospeech

206 stars · 42 forks · 198 KB

[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Home Page: https://arxiv.org/abs/2005.03215

License: MIT License

Languages: Python 99.02%, Shell 0.98%
Topics: automl, autospeech, neural-architecture-search, pytorch, speaker-recognition

autospeech's People

Contributors

shaojinding, soellingeraj, tianlong-chen


autospeech's Issues

Training in the search stage is very slow

While following the paper and the code, I ran the command CUDA_VISIBLE_DEVICES=0 python search.py --cfg exps/search.yaml for about 8 days on a single Quadro RTX 8000 (45 GB of CUDA memory), but it only completed 4 epochs. I don't understand why, because the paper says the search stage takes about 5 days on a single NVIDIA TITAN RTX GPU.

Here is a summary of the running log:

2021-01-09 13:48:30,025 Namespace(cfg='exps/search.yaml', load_path=None, opts=[], path_helper={'prefix': 'logs_search/search_2021_01_09_13_48_29', 'ckpt_path': 'logs_search/search_2021_01_09_13_48_29/Model', 'log_path': 'logs_search/search_2021_01_09_13_48_29/Log', 'sample_path': 'logs_search/search_2021_01_09_13_48_29/Samples'})
2021-01-09 13:48:30,026 CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  DATA_DIR: data/VoxCeleb1
  NUM_WORKERS: 0
  PARTIAL_N_FRAMES: 300
  SUB_DIR: merged
  TEST_DATASET: 
  TEST_DATA_DIR: 
MODEL:
  DROP_PATH_PROB: 0.2
  INIT_CHANNELS: 64
  LAYERS: 8
  NAME: model_search
  NUM_CLASSES: 1251
  PRETRAINED: False
PRINT_FREQ: 200
SEED: 3
TRAIN:
  ARCH_BETA1: 0.9
  ARCH_BETA2: 0.999
  ARCH_LR: 0.001
  ARCH_WD: 0.001
  BATCH_SIZE: 2
  BEGIN_EPOCH: 0
  BETA1: 0.9
  BETA2: 0.999
  DROPPATH_PROB: 0.2
  END_EPOCH: 50
  LR: 0.01
  LR_MIN: 0.001
  WD: 0.0003
VAL_FREQ: 5
2021-01-09 13:48:32,472 genotype = Genotype(normal=[('sep_conv_5x5', 1), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 2), ('skip_connect', 0), ('avg_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 3)], normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 2), ('max_pool_3x3', 0), ('dil_conv_5x5', 0), ('dil_conv_3x3', 1), ('sep_conv_3x3', 1), ('sep_conv_3x3', 4)], reduce_concat=range(2, 6))
2021-01-09 13:48:47,236 Epoch: [0][    0/69180]	Time 14.752 (14.752)	Data  0.011 ( 0.011)	Loss 7.0746e+00 (7.0746e+00)	Acc@1   0.00 (  0.00)	Acc@5   0.00 (  0.00)	Entropy 2.0794e+00 (2.0794e+00)
2021-01-09 14:03:16,840 Epoch: [0][  200/69180]	Time  4.416 ( 4.400)	Data  0.004 ( 0.005)	Loss 2.0358e+01 (1.0089e+01)	Acc@1   0.00 (  0.25)	Acc@5   0.00 (  0.50)	Entropy 2.0788e+00 (2.0792e+00)
2021-01-09 14:15:37,689 Epoch: [0][  400/69180]	Time  1.702 ( 4.053)	Data  0.004 ( 0.005)	Loss 1.6950e+01 (1.1480e+01)	Acc@1   0.00 (  0.12)	Acc@5   0.00 (  0.50)	Entropy 2.0772e+00 (2.0786e+00)
2021-01-09 14:21:15,859 Epoch: [0][  600/69180]	Time  1.688 ( 3.267)	Data  0.004 ( 0.005)	Loss 1.6810e+01 (1.2325e+01)	Acc@1   0.00 (  0.08)	Acc@5   0.00 (  0.33)	Entropy 2.0757e+00 (2.0779e+00)
2021-01-09 14:26:53,783 Epoch: [0][  800/69180]	Time  1.687 ( 2.873)	Data  0.004 ( 0.005)	Loss 1.0024e+01 (1.2812e+01)	Acc@1   0.00 (  0.06)	Acc@5   0.00 (  0.31)	Entropy 2.0742e+00 (2.0772e+00)
.........

2021-01-17 15:30:43,634 Epoch: [4][43800/69180]	Time  3.355 ( 2.443)	Data  0.083 ( 0.073)	Loss 6.8538e+00 (6.9083e+00)	Acc@1   0.00 (  0.91)	Acc@5   0.00 (  3.33)	Entropy 1.7882e+00 (1.8038e+00)
2021-01-17 15:40:39,570 Epoch: [4][44000/69180]	Time  3.173 ( 2.445)	Data  0.082 ( 0.073)	Loss 6.8132e+00 (6.9083e+00)	Acc@1   0.00 (  0.91)	Acc@5   0.00 (  3.32)	Entropy 1.7889e+00 (1.8037e+00)
2021-01-17 15:51:22,701 Epoch: [4][44200/69180]	Time  3.590 ( 2.449)	Data  0.441 ( 0.073)	Loss 5.9202e+00 (6.9080e+00)	Acc@1  50.00 (  0.92)	Acc@5  50.00 (  3.33)	Entropy 1.7913e+00 (1.8037e+00)
2021-01-17 16:02:04,050 Epoch: [4][44400/69180]	Time  3.137 ( 2.452)	Data  0.067 ( 0.073)	Loss 6.2018e+00 (6.9080e+00)	Acc@1   0.00 (  0.92)	Acc@5   0.00 (  3.34)	Entropy 1.7874e+00 (1.8036e+00)
2021-01-17 16:12:47,046 Epoch: [4][44600/69180]	Time  3.181 ( 2.455)	Data  0.081 ( 0.073)	Loss 7.0442e+00 (6.9075e+00)	Acc@1   0.00 (  0.93)	Acc@5   0.00 (  3.34)	Entropy 1.7865e+00 (1.8035e+00)
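For what it's worth, the averaged iteration times printed in these logs already account for the wall-clock: at batch size 2 there are 69180 iterations per epoch, which works out to roughly two days per epoch. A back-of-the-envelope check (my own arithmetic based on the log above, not from the authors):

```python
# Rough wall-clock estimate from the running averages in the log above.
iters_per_epoch = 69180     # iterations per epoch, from the progress lines
avg_iter_time_s = 2.455     # running-average "Time (...)" value at epoch 4

hours_per_epoch = iters_per_epoch * avg_iter_time_s / 3600
print(f"~{hours_per_epoch:.1f} h/epoch")  # ~47.2 h, i.e. ~2 days/epoch, so 4 epochs ~ 8 days
```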

Training data of speaker verification

To my knowledge, in speaker verification the speakers in the test data should not appear in the training data. For the VoxCeleb1 dataset, only the dev set (1,211 speakers) should be used for training, and the test set (40 speakers) should be used for evaluation.

How to get speaker embedding?

I want to get speaker embeddings for my own wav files so that I can distinguish speakers. Could you please give me some suggestions?
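For reference, the usual pattern is to take the network's pooled output before the classifier, L2-normalize it, and compare utterances by cosine similarity. A minimal, self-contained sketch with a toy stand-in backbone (this is not AutoSpeech's actual API):

```python
import torch
import torch.nn.functional as F

class ToyBackbone(torch.nn.Module):
    """Stand-in for the real network; only the embed-before-classifier pattern matters."""
    def __init__(self, emb_dim=256, num_classes=1251):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Conv1d(257, emb_dim, kernel_size=3, padding=1),
            torch.nn.AdaptiveAvgPool1d(1),      # temporal pooling -> fixed-size vector
        )
        self.classifier = torch.nn.Linear(emb_dim, num_classes)

    def embed(self, x):                         # x: (batch, freq_bins, frames)
        return self.encoder(x).squeeze(-1)      # (batch, emb_dim), classifier skipped

model = ToyBackbone().eval()
with torch.no_grad():
    a = F.normalize(model.embed(torch.randn(1, 257, 300)), dim=-1)
    b = F.normalize(model.embed(torch.randn(1, 257, 300)), dim=-1)
print(F.cosine_similarity(a, b).item())         # higher score -> more likely same speaker
```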

Trained Model?

Any chance you guys could upload the model weights for this?

Question about a mismatch between the code and the paper

While using the code I noticed a problem: the output of your Proposed 1 after temporal pooling is 1024-dim, but the paper says 256-dim. Is the code missing a dense layer?
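If a 256-dim embedding is what the paper intends, one possible fix (an assumption on my part, not confirmed by the authors) is a dense projection after the temporal pooling:

```python
import torch

proj = torch.nn.Linear(1024, 256)   # project the 1024-dim pooled feature to a 256-dim embedding
pooled = torch.randn(8, 1024)       # stand-in for the temporal-pooling output
embedding = proj(pooled)
print(embedding.shape)              # torch.Size([8, 256])
```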

problem

When I run the code, this error happens:
RuntimeError: Invalid index in scatter at /tmp/pip-req-build-808afw3c/aten/src/TH/generic/THTensorEvenMoreMath.cpp:151
Do you know why?
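A common cause of this scatter error is a label index falling outside [0, NUM_CLASSES); the search config sets MODEL.NUM_CLASSES to 1251 for VoxCeleb1. A quick diagnostic sketch (the labels tensor below is a placeholder for your dataloader's output):

```python
import torch

num_classes = 1251                    # MODEL.NUM_CLASSES from the config
labels = torch.tensor([0, 57, 1250])  # replace with the labels from your dataloader

lo, hi = int(labels.min()), int(labels.max())
assert 0 <= lo and hi < num_classes, \
    f"labels must lie in [0, {num_classes}), got [{lo}, {hi}]"
```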

Fine tuning on a subset of data

Hello,

I need a clarification.

I trained the identification model on the VoxCeleb1 data from scratch with the given genotype.

I now need to fine-tune that model on a subset of the VoxCeleb2 dataset. In 'train_identification.py', will loading the trained model and training for a few epochs work for fine-tuning, or do I need to write my own script?

Regards,
Sreeni...
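In general, loading the trained weights, swapping the classifier head for the new speaker count, and training briefly at a low learning rate should do it. A rough pattern sketch (Net, the checkpoint layout, and the 118-speaker count are all hypothetical, not train_identification.py's actual structure):

```python
import torch

class Net(torch.nn.Module):
    """Placeholder network; only the head-swap pattern matters."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = torch.nn.Linear(512, 256)   # stand-in for the searched cells
        self.classifier = torch.nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(torch.relu(self.backbone(x)))

model = Net(num_classes=1251)                       # VoxCeleb1 identification head
# model.load_state_dict(torch.load('checkpoint_best.pth')['state_dict'])  # hypothetical layout
model.classifier = torch.nn.Linear(256, 118)        # e.g. a 118-speaker VoxCeleb2 subset
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)    # small LR for fine-tuning
```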

TypeError: type object argument after * must be an iterable, not property

Thanks for sharing the repository. I tried following the Quick Start guide. When I ran the command for training from scratch for verification, python train_verification.py --cfg exps/scratch/scratch.yaml --text_arch Genotype, I got the error message TypeError: type object argument after * must be an iterable, not property, raised at line 21 of models/model.py: op_names, indices = zip(*genotype.normal). How can I resolve this? Thanks.
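A likely cause, judging from the error message (my reading, not a confirmed answer): --text_arch received the literal class name Genotype, so genotype.normal resolved to the namedtuple class's property descriptor instead of a list of (op, index) pairs. Passing a concrete genotype string, such as the one printed at the end of the search, unpacks fine:

```python
from collections import namedtuple

Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

# A concrete genotype string (shortened here) evaluates to an instance whose
# .normal is a real list, so the unpacking at models/model.py line 21 succeeds.
text_arch = ("Genotype(normal=[('sep_conv_5x5', 1), ('max_pool_3x3', 0)], "
             "normal_concat=range(2, 6), reduce=[('max_pool_3x3', 1)], "
             "reduce_concat=range(2, 6))")
genotype = eval(text_arch)
op_names, indices = zip(*genotype.normal)
print(op_names, indices)   # ('sep_conv_5x5', 'max_pool_3x3') (1, 0)
```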

Issue with README

I found that the directory structure documented in master/README.md does not match what the code expects.

Where it says:

The data should be organized as:

VoxCeleb1
 - wav
 - vox1_meta.csv
 - iden_split.txt
 - veri_test.txt

It should say:

The data should be organized as:

VoxCeleb1
 - dev/wav/...
 - test/wav/...
 - vox1_meta.csv
 - iden_split.txt
 - veri_test.txt

Reducing the batch size leads to overfitting?

Hello,

With a batch size of 256, training the ResNet18 model, I got a CUDA out-of-memory error, so I had to reduce the batch size to 128. With batch size 128, the logs are below:

Epoch: [49][ 200/1080] Time 2.723 ( 2.664) Data 2.575 ( 2.515) Loss 1.3415e+00 (1.3413e+00) Acc@1 100.00 ( 99.33) Acc@5 100.00 ( 99.96)
Epoch: [49][ 400/1080] Time 3.064 ( 2.778) Data 2.915 ( 2.630) Loss 1.3431e+00 (1.3420e+00) Acc@1 99.22 ( 99.31) Acc@5 100.00 ( 99.96)
Epoch: [49][ 600/1080] Time 2.882 ( 2.845) Data 2.729 ( 2.696) Loss 1.3618e+00 (1.3444e+00) Acc@1 100.00 ( 99.27) Acc@5 100.00 ( 99.95)

I am not sure whether the model is overfitting; could you please help me interpret these logs?

I am using the VoxCeleb1 data from the 2022 challenge.
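Training accuracy alone cannot show overfitting; near-100% top-1 by epoch 49 is expected, and the question is how the model does on held-out data. For identification, compare against the validation accuracy logged every VAL_FREQ epochs; for verification, the usual held-out check is the EER on trial pairs. A minimal EER computation (the scores and labels below are made-up stand-ins):

```python
import numpy as np
from sklearn.metrics import roc_curve

labels = np.array([1, 1, 0, 0, 1, 0])           # 1 = same speaker, 0 = different
scores = np.array([0.9, 0.7, 0.4, 0.2, 0.6, 0.5])

fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]      # operating point where FPR ~= FNR
print(f"EER ~ {eer:.2%}")
```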

Pre-trained Models

Hello, thanks for sharing the code. Would you consider releasing the pre-trained models? I noticed a closed issue about this, but I cannot find the pre-trained models anywhere. Thanks a lot.

Alpha parameters are getting turned into 'nan' values

I have filtered the VoxCeleb1 dataset to keep only Indian nationalities, but I don't know why, after a few batches of the training loop, the alpha parameters turn into NaN values. I then get this error from the forward function of MixedOp:
OverflowError: weight: tensor([nan, nan, nan, nan, nan, nan, nan, nan], device='cuda:0'.

Can anyone help with this?
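NaN architecture weights in DARTS-style search often trace back to an exploding architecture gradient, especially on a much smaller filtered dataset. Two common mitigations (assumptions on my part, not the authors' recipe) are lowering ARCH_LR from its 0.001 default and clipping the alpha gradients before the optimizer step:

```python
import torch

alphas = torch.randn(14, 8, requires_grad=True)        # stand-in for the alpha parameters
arch_optimizer = torch.optim.Adam([alphas], lr=3e-4,   # lowered from the config's 1e-3
                                  betas=(0.9, 0.999), weight_decay=1e-3)

loss = (alphas ** 2).sum()                             # stand-in for the validation loss
loss.backward()
torch.nn.utils.clip_grad_norm_([alphas], max_norm=5.0) # keep the alpha gradients finite
arch_optimizer.step()
```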
