Giter VIP home page Giter VIP logo

Dipjyoti Paul's Projects

club icon club

Code for ICML2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

expressive-fastspeech2 icon expressive-fastspeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS (text to speech, speech synthesis) based on FastSpeech2, supporting English and Korean

expressivetacotron icon expressivetacotron

This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN, Non-attentive Tacotron, GST, VAE, GMVAE, and X-vectors for building prosody encoder.

fairseq icon fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

fastspeech2 icon fastspeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

melgan icon melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)

meta-tts icon meta-tts

Official repository of https://doi.org/10.1109/TASLP.2022.3167258

nemo icon nemo

NeMo: a toolkit for conversational AI

nv-wavenet icon nv-wavenet

Reference implementation of real-time autoregressive wavenet inference

sc-wavernn icon sc-wavernn

Official PyTorch implementation of Speaker Conditional WaveRNN

speechsplit icon speechsplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck

styler icon styler

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, Interspeech 2021

svoice icon svoice

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers In which, we present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

transformertts icon transformertts

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

tts icon tts

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.