Few-shot learning experiments

Dumping ground for miscellaneous ML experiments with focus on FSL.

Dependencies

Dependencies are managed with conda. The full list is in environment.yml and requirements.txt.

Eduskunta dataset

EDUSKUNTA.md contains a guide for compiling a speaker recognition dataset of Finnish speech from the Plenary Sessions of the Parliament of Finland dataset.

Experiments

snn/librispeech/ contains multiple speaker recognition networks for one-shot learning on the LibriSpeech dataset.

Useful command line options

General model options

  • --model snn | snn-capsnet | snn-angularproto | snn-softmaxproto: The network to train; each one is described under Networks.
  • --signal_transform melspectrogram | spectrogram | mfcc: The signal representation to feed to the ResNet, defaults to 'melspectrogram'.
  • --n_mels n: Number of Mels to use for the Mel spectrogram or MFCC, defaults to 40.
  • --n_fft n: The value of n_fft to use when constructing the spectrogram.
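
As a rough illustration of how the general model options fit together, here is a minimal argparse sketch. It is not the repository's actual parser: the function name build_parser is hypothetical, and only the defaults stated above (melspectrogram, 40 Mels) are filled in. Options whose defaults are not documented are left unset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the general model options listed above."""
    parser = argparse.ArgumentParser(description="General model options (sketch)")
    parser.add_argument(
        "--model",
        choices=["snn", "snn-capsnet", "snn-angularproto", "snn-softmaxproto"],
        help="Network to train.",
    )
    parser.add_argument(
        "--signal_transform",
        choices=["melspectrogram", "spectrogram", "mfcc"],
        default="melspectrogram",
        help="Signal representation fed to the ResNet.",
    )
    parser.add_argument(
        "--n_mels",
        type=int,
        default=40,
        help="Number of Mels for the Mel spectrogram or MFCC.",
    )
    # The README does not state a default for --n_fft, so none is assumed here.
    parser.add_argument("--n_fft", type=int, help="n_fft for the spectrogram.")
    return parser


args = build_parser().parse_args(
    ["--model", "snn-angularproto", "--signal_transform", "mfcc"]
)
print(args.model, args.signal_transform, args.n_mels, args.n_fft)
```

Restricting --model and --signal_transform with choices makes a mistyped value fail at parse time rather than partway into training.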

ResNet options

  • --resnet_type thin | fast: Choose either thin-ResNet34 or fast-ResNet34, defaults to thin.
  • --resnet_aggregation_type SAP | ASP | NetVLAD | GhostVLAD: Choose the type of aggregation (or pooling) to use for the ResNet output, defaults to SAP.
  • --resnet_n_out n: Adjust the size of the ResNet output tensor, 512 by default.

Data augmentation options

  • --augment: Enable augmentation using audiomentations.
  • --torch_augment: Enable augmentation by torch-audiomentations.
  • --specaugment: Enable spectrogram frequency and time masking as per SpecAugment[6].
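
To make the frequency and time masking concrete, the following is a minimal pure-Python sketch of the idea. It is not the code behind --specaugment: the function name mask_spectrogram, the mask widths, and the choice of zero as the fill value are all assumptions made for illustration, and only one mask of each kind is applied.

```python
import random


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Return a masked copy of spec plus the (start, width) of each mask.

    spec is a list of frequency rows, each a list of time frames.
    Assumes max_freq_width <= number of rows and
    max_time_width <= number of frames.
    """
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: zero f consecutive frequency rows starting at f0.
    f = rng.randint(0, max_freq_width)
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: zero t consecutive frames starting at t0 in every row.
    t = rng.randint(0, max_time_width)
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0

    return out, (f0, f), (t0, t)


rng = random.Random(0)
spec = [[1.0] * 100 for _ in range(40)]  # 40 Mel bins x 100 frames, dummy data
masked, (f0, f), (t0, t) = mask_spectrogram(spec, 8, 20, rng)
print("frequency mask:", f0, f, "time mask:", t0, t)
```

Because the mask positions are drawn fresh on each call, the same utterance is seen with different bands hidden across epochs.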

Training options

  • --num_speakers n: Number of speakers to include in the training set, defaults to 0 which selects all available.
  • --num_train n: Number of random samples to take from the training set, defaults to the training set size but can be set higher.
  • --train_batch_size n: Batch size used only for training.

Networks

snn

Simple end-to-end Siamese neural network trained with binary cross-entropy loss and a basic learned distance measure.

snn-angularproto

Neural network using metric learning with angular prototypical loss function[7].
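
For readers unfamiliar with the loss, here is a minimal pure-Python sketch of the angular prototypical objective as formulated in [7]: the last utterance of each speaker is the query, the remaining utterances are averaged into a centroid, and scaled cosine similarities go through a softmax cross-entropy. It is not necessarily how this repository implements it. In [7] the scale w and bias b are learnable; the fixed values below are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def angular_prototypical_loss(embeddings, w=10.0, b=-5.0):
    """embeddings[j][m] is the embedding of utterance m of speaker j.

    Each speaker needs at least two utterances: one query plus support.
    """
    centroids, queries = [], []
    for utterances in embeddings:
        support, query = utterances[:-1], utterances[-1]
        dim = len(query)
        # Centroid (prototype) = mean of the support embeddings.
        centroids.append(
            [sum(u[d] for u in support) / len(support) for d in range(dim)]
        )
        queries.append(query)

    total = 0.0
    for j, query in enumerate(queries):
        scores = [w * cosine(query, c) + b for c in centroids]
        # Cross-entropy with the query's own speaker as the target,
        # using a max-shifted log-sum-exp for numerical stability.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[j]
    return total / len(queries)


# Two speakers, three utterances each, 2-D dummy embeddings.
separated = [
    [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]],
    [[0.0, 1.0], [0.1, 1.0], [-0.1, 1.0]],
]
print(angular_prototypical_loss(separated))
```

The loss is small when each query lies close in angle to its own speaker's centroid and far from the others, which is what pushes the network toward speaker-discriminative embeddings.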

Options:

  • --num_ways k: Number of speakers (or classes) to include in each training step.
  • --num_shots n: Number of samples to use per speaker.
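
The two options above describe an episode: k speakers with n samples each per training step. A minimal sketch of such sampling follows. The function name sample_episode, the speaker IDs, and the file names are hypothetical, and the repository's data loader may work differently.

```python
import random


def sample_episode(utterances_by_speaker, num_ways, num_shots, rng):
    """Pick num_ways speakers, then num_shots distinct utterances from each."""
    # Only speakers with enough utterances can fill a full set of shots.
    eligible = sorted(
        speaker
        for speaker, utterances in utterances_by_speaker.items()
        if len(utterances) >= num_shots
    )
    if len(eligible) < num_ways:
        raise ValueError("not enough speakers with num_shots utterances")
    speakers = rng.sample(eligible, num_ways)
    return {s: rng.sample(utterances_by_speaker[s], num_shots) for s in speakers}


# Dummy dataset: 5 hypothetical speakers with 4 utterance files each.
dataset = {
    f"spk{i}": [f"spk{i}_utt{j}.flac" for j in range(4)] for i in range(5)
}
episode = sample_episode(dataset, num_ways=3, num_shots=2, rng=random.Random(0))
for speaker, utterances in episode.items():
    print(speaker, utterances)
```

Each training step then sees num_ways × num_shots utterances in total.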

snn-softmaxproto

Like snn-angularproto, but using softmax prototypical loss[8].

snn-capsnet

Experimental network based on ideas from the paper by Hajavi and Etemad [5].

Extra

snn/omniglot/: Convolutional SNN for one-shot learning on the Omniglot dataset[1].

Usage

python -m <model>.<dataset>.train --help

Example: train model snn/omniglot/ using 1 GPU:

python -O -m snn.omniglot.train --gpus 1 --num_workers 4 --batch_size 128 --max_epochs 50

References

  1. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." In ICML deep learning workshop, vol. 2. 2015.
  2. Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." arXiv preprint arXiv:1711.05101 (2017). https://arxiv.org/abs/1711.05101.
  3. Smith, Leslie N., and Nicholay Topin. "Super-convergence: Very fast training of neural networks using large learning rates." In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications. Vol. 11006. International Society for Optics and Photonics, 2019. https://arxiv.org/abs/1708.07120.
  4. https://sgugger.github.io/the-1cycle-policy.html
  5. Hajavi, Amirhossein, and Ali Etemad. "Siamese Capsule Network for End-to-End Speaker Recognition In The Wild." arXiv preprint arXiv:2009.13480 (2020). https://arxiv.org/abs/2009.13480.
  6. Park, Daniel S., Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, and Yonghui Wu. "Specaugment on large scale datasets." In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6879-6883. IEEE, 2020. https://arxiv.org/abs/1904.08779.
  7. Chung, Joon Son, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, and Icksang Han. "In defence of metric learning for speaker recognition." In Interspeech, 2020. https://arxiv.org/abs/2003.11982.
  8. Heo, Hee Soo, Bong-Jin Lee, Jaesung Huh, and Joon Son Chung. "Clova baseline system for the VoxCeleb Speaker Recognition Challenge 2020." arXiv preprint arXiv:2009.14153 (2020). https://arxiv.org/abs/2009.14153.
