
TitaNet

(Figure: TitaNet architecture)

This repository contains a small-scale implementation of the following paper:

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context,
Nithin Rao Koluguri, Taejin Park, Boris Ginsburg,
https://arxiv.org/abs/2110.04410.

It is "small-scale" because we rely only on the LibriSpeech dataset, instead of the combination of VoxCeleb1, VoxCeleb2, SRE, Fisher, Switchboard and LibriSpeech used in the original work. The main reason for this choice is resources: the combined dataset contains 3373 hours of speech from 16681 speakers, with 4890K utterances, which is too large to train on Google Colab. The LibriSpeech subset we consider instead has about 100 hours of speech from 251 speakers, with 28.5K utterances, which is sufficient to test the capabilities of the model. Moreover, we only evaluate TitaNet on the speaker identification and verification tasks, rather than also on speaker diarization.

Installation

To install the project's dependencies, make sure you have Python 3.9 installed on your system. Then, run the following commands to create a virtual environment and install the required libraries.

python3 -m venv venv
source venv/bin/activate
pip install -r init/requirements.txt

Execution

Both the training and testing parts of the project are managed through a Jupyter notebook (titanet.ipynb). The notebook contains a broad analysis of the dataset in use, an explanation of all the data augmentation techniques reported in the paper, a description of the baseline and TitaNet models, and a way to train and test them. Hyper-parameters are handled via the parameters.yml file. To run the Jupyter notebook, execute the following command:

jupyter notebook titanet.ipynb

If you just want to train a model from scratch, you can rely directly on the train.py module, which can be called in the following way:

python3 src/train.py -p "./parameters.yml"

Training and evaluation metrics, along with model checkpoints and results, are logged directly to a W&B project, which is openly accessible here. If you want to perform a custom training run, you have to either disable W&B (see parameters.yml) or provide your own entity (your username), project and API key file location in the parameters.yml file. The W&B API key file is a plain text file containing a single line with your W&B API key, which you can obtain from here.
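As a minimal sketch of the key-file setup (the file name and placeholder key below are illustrative; parameters.yml determines the actual path), the file can be created and read like this:

```python
from pathlib import Path

# Illustrative path: use whatever location parameters.yml points at.
key_file = Path("wandb_api_key.txt")

# The file holds a single line with your W&B API key (placeholder shown here).
key_file.write_text("0123456789abcdef0123456789abcdef01234567\n")

# Read it back, stripping the trailing newline.
api_key = key_file.read_text().strip()

# The training code can then authenticate with something like:
#   import wandb
#   wandb.login(key=api_key)
print(api_key[:8])
```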

Training & validation

This section shows training and validation metrics observed for around 75 epochs. In case you want to see more metrics, please head over to the W&B project.

Baseline CE vs TitaNet CE

This experiment compares training and validation loss and accuracy of the baseline and TitaNet models trained with cross-entropy loss. As we can see, training metrics reach similar values, while validation metrics are much better with TitaNet. Moreover, plots suggest that the baseline model had a slight overfitting problem.

(Figures: training and validation loss and accuracy, baseline vs TitaNet)

TitaNet CE vs TitaNet ArcFace

This experiment compares training and validation loss and accuracy of two TitaNet models (model size "s"), trained with cross-entropy and ArcFace loss. The ArcFace parameters are those specified in the original paper (scale 30, margin 0.2). As we can see, the metrics are quite similar and no major differences can be observed.

(Figures: training and validation loss and accuracy, TitaNet CE vs TitaNet ArcFace)
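As a rough sketch of what the ArcFace margin does (NumPy, not the repo's actual implementation; function name, shapes and variable names are illustrative), the true-class logit is computed as s·cos(θ + m) instead of s·cos(θ), which makes training harder for the target class and encourages larger angular separation between speakers:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, scale=30.0, margin=0.2):
    """ArcFace: add an angular margin to the target-class angle, then scale.

    embeddings: (N, D) utterance embeddings; weights: (C, D) class centres;
    labels: (N,) integer speaker ids.
    """
    # L2-normalise both sides so dot products become cosines of angles
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)   # (N, C) cosine similarities
    theta = np.arccos(cos)              # angles between embeddings and centres
    rows = np.arange(len(labels))
    theta[rows, labels] += margin       # penalise only the true class
    return scale * np.cos(theta)        # scaled logits, fed to cross-entropy

# Tiny usage example with random data
rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
w = rng.normal(size=(4, 16))
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3])
logits = arcface_logits(emb, w, labels)
print(logits.shape)  # (8, 4)
```

With margin = 0 this reduces to plain scaled cosine similarity, which is why the two losses behave so similarly when the model already separates speakers well.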

Visualizations

This section shows some visual results obtained after training each embedding model for around 75 epochs. Please note that all figures represent the same set of utterances, even though different figures use different colours for the same speaker.

Baseline vs TitaNet on LibriSpeech

This test compares the baseline and TitaNet models on the LibriSpeech dataset used for training. Both models were trained with cross-entropy loss and 2D projections were performed with UMAP. As we can see, the good training and validation metrics of the baseline model are not mirrored in this empirical test. Instead, TitaNet is able to form compact clusters of utterances, thus reflecting the high performance metrics obtained during training.

(Figures: baseline vs TitaNet embeddings on LibriSpeech)

Baseline vs TitaNet on VCTK

This test compares the baseline and TitaNet models on the VCTK dataset, unseen during training. Both models were trained with cross-entropy loss and 2D projections were performed with UMAP. As above, TitaNet beats the baseline model by a large margin.

(Figures: baseline vs TitaNet embeddings on VCTK)

SVD vs UMAP reduction

This test compares two 2D reduction methods, namely SVD and UMAP. Both figures rely on the TitaNet model trained with cross-entropy loss. As we can see, the choice of reduction method strongly influences our subjective evaluation, with UMAP giving much better separation in the latent space.

(Figures: SVD vs UMAP projections)
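As an illustrative sketch of the two reductions (synthetic clustered vectors stand in for real TitaNet embeddings; umap-learn is shown only in a comment since it is an extra dependency), both methods expose the same fit_transform interface:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in for TitaNet embeddings: 4 speakers x 50 utterances, 192-dim
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 192))
emb = np.vstack([c + 0.1 * rng.normal(size=(50, 192)) for c in centers])

# Linear 2D projection via truncated SVD
svd_2d = TruncatedSVD(n_components=2).fit_transform(emb)
print(svd_2d.shape)  # (200, 2)

# UMAP (from the umap-learn package) exposes the same interface:
#   import umap
#   umap_2d = umap.UMAP(n_components=2).fit_transform(emb)
```

SVD is a linear projection, so it preserves global variance directions; UMAP is non-linear and optimises local neighbourhood structure, which tends to pull same-speaker utterances into tighter, better-separated clusters.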

Cross-entropy vs ArcFace loss

This test compares two TitaNet models, one trained with cross-entropy loss and the other one trained with ArcFace loss. Both figures rely on UMAP as their 2D reduction method. As we can see, there doesn't seem to be a winner in this example, as both models are able to obtain good clustering properties.

(Figures: cross-entropy vs ArcFace embeddings)

titanet's People

Contributors

wadaboa

titanet's Issues

weights

Hello, can you please provide the pretrained TitaNet and d-vector models?

voxceleb dataset

Hi, thank you so much for sharing this code.
Recently I have been reproducing and learning from your project. Regarding custom datasets: both datasets in your project use torchaudio's built-in data. I have tried custom data over the past few days, but ran into many problems and failed. I hope you can guide me on non-torchaudio data such as the VoxCeleb dataset. Thank you very much!

training

Hello, thank you very much for providing such a good project; I am currently trying to reproduce it.
I am encountering a problem during the training phase: a file is missing because google.colab cannot be used. Could you please provide the W&B API key file? Thank you very much!

data set

Hello, thank you for providing such a good project.
Your document mentions the two datasets LibriSpeech and VCTK; how can I create and load personal datasets?

titanet.ipynb: session crashed for an unknown reason

Hi, thank you for sharing this project.
I can run this project successfully on my own computer, but when I try to train it on Google Colab, the session always crashes.
(screenshot omitted)

Could you give me some suggestions? Your help will be much appreciated!

How to train on a custom dataset?

  1. How can I train the model on a custom dataset? I have seen the commit you pushed to train with the VoxCeleb2 dataset, which downloads the data from the internet. But how can I train the model with a completely new dataset stored on my local PC?

  2. This may be a dumb question, but is there any cap on the maximum number of different classes (in this case, people) the model can identify? Is it in the range of hundreds or thousands?

ValueError During Training Process

Dear sir,

First, I have to say that I really appreciate that you published the code in this repo. What an amazing implementation!
However, I tried to train on the given dataset following your tutorial, and it seemed to work well until I saw the error:
(screenshot omitted)

I realized that the final batch is really important for tasks related to the custom dataset used! I think the error comes from the processing of the final batch.

Please give me feedback on this, and hopefully you will update the code 😄
