Wenzhe Liu (刘文哲)'s Projects
Audio Captioning datasets for PyTorch.
Advanced Signal Processing Notebooks and Tutorials
Audio Super Resolution in the Spectral Domain
AI Audio Datasets 🎵. A list of datasets consisting of speech, music, and sound effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
Audio Coding Notebooks and Tutorials
Code for "End-to-End Optimized Speech Coding with Deep Neural Networks" (ICASSP 2018)
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
A list of papers and projects on cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related work (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, and SSL-based ASR).
Speech enhancement, speech separation, and sound source localization
Simulations of different beamforming algorithms, more than 30 in total
1st Clarity Enhancement Challenge
Complex-valued Spatial Autoencoders for Multichannel Speech Enhancement
A simple and fast tool for word segmentation and named entity recognition
Wenzhe Liu Notes: deep filter reproduction, see: 23_3090_speakerfilter_new_deepfilter_final_1024_new/networks/speakerfilter.py i.e. https://github.com/heshulin/23_3090_speakerfilter_new_deepfilter_final_1024_new/blob/86dd75cb9f7858b11e8adc0097da372f706c23a1/networks/speakerfilter.py#L103
A framework for large scale recommendation algorithms.
ERB representation of an audio file implemented in Python
The official PyTorch implementation of Google's Gemma models
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
MATLAB script of Independent Low-Rank Matrix Analysis (ILRMA)
FSA/FST algorithms, differentiable, with PyTorch compatibility.
C++ Audio and Music DSP Library
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
An investigation of microphone array generalization, based on previous Narrow Band Deep Filtering methods.
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra