
Audio AI Timeline

Here we will keep track of the latest AI models for waveform-based audio generation, starting in 2023!

2023

| Date (DD.MM) | Release [Samples] | Paper | Code | Trained Model |
|---|---|---|---|---|
| 14.11 | Mustango: Toward Controllable Text-to-Music Generation | arXiv | GitHub | Hugging Face |
| 13.11 | Music ControlNet: Multiple Time-varying Controls for Music Generation | arXiv | - | - |
| 02.11 | E3 TTS: Easy End-to-End Diffusion-based Text to Speech | arXiv | - | - |
| 01.10 | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
| 24.09 | VoiceLDM: Text-to-Speech with Environmental Context | arXiv | GitHub | - |
| 05.09 | PromptTTS 2: Describing and Generating Voices with Text Prompt | arXiv | - | - |
| 14.08 | SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | arXiv | - | - |
| 10.08 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | arXiv | GitHub | Hugging Face |
| 09.08 | JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models | arXiv | - | - |
| 03.08 | MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies | arXiv | GitHub | - |
| 14.07 | Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts | arXiv | - | - |
| 10.07 | VampNet: Music Generation via Masked Acoustic Token Modeling | arXiv | GitHub | - |
| 22.06 | AudioPaLM: A Large Language Model That Can Speak and Listen | arXiv | - | - |
| 19.06 | Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | PDF | GitHub | - |
| 08.06 | MusicGen: Simple and Controllable Music Generation | arXiv | GitHub | Hugging Face, Colab |
| 06.06 | Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias | arXiv | - | - |
| 01.06 | Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | arXiv | GitHub | - |
| 29.05 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | arXiv | - | - |
| 25.05 | MeLoDy: Efficient Neural Music Generation | arXiv | - | - |
| 18.05 | CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training | arXiv | - | - |
| 18.05 | SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities | arXiv | GitHub | - |
| 16.05 | SoundStorm: Efficient Parallel Audio Generation | arXiv | GitHub (unofficial) | - |
| 03.05 | Diverse and Vivid Sound Generation from Text Descriptions | arXiv | - | - |
| 02.05 | Long-Term Rhythmic Video Soundtracker | arXiv | GitHub | - |
| 24.04 | TANGO: Text-to-Audio generation using instruction tuned LLM and Latent Diffusion Model | PDF | GitHub | Hugging Face |
| 18.04 | NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | arXiv | GitHub (unofficial) | - |
| 10.04 | Bark: Text-Prompted Generative Audio Model | - | GitHub | Hugging Face, Colab |
| 03.04 | AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models | arXiv | - | - |
| 08.03 | VALL-E X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | arXiv | - | - |
| 27.02 | I Hear Your True Colors: Image Guided Audio Generation | arXiv | GitHub | - |
| 08.02 | Noise2Music: Text-conditioned Music Generation with Diffusion Models | arXiv | - | - |
| 04.02 | Multi-Source Diffusion Models for Simultaneous Music Generation and Separation | arXiv | GitHub | - |
| 30.01 | SingSong: Generating musical accompaniments from singing | arXiv | - | - |
| 30.01 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | arXiv | GitHub | Hugging Face |
| 30.01 | Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | arXiv | GitHub | - |
| 29.01 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | PDF | - | - |
| 28.01 | Noise2Music | - | - | - |
| 27.01 | RAVE2 [Samples RAVE1] | arXiv | GitHub | - |
| 26.01 | MusicLM: Generating Music From Text | arXiv | GitHub (unofficial) | - |
| 18.01 | Msanii: High Fidelity Music Synthesis on a Shoestring Budget | arXiv | GitHub | Hugging Face, Colab |
| 16.01 | ArchiSound: Audio Generation with Diffusion | arXiv | GitHub | - |
| 05.01 | VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | arXiv | GitHub (unofficial) (demo) | - |

Contributors

flavioschneider, haoheliu, justinyuu, lifeiteng, yuan-manx