(Self-use codebase) A diffusion-based waveform synthesizer: a collection of (re)implementations of interesting audio-related diffusion models. The training framework is based on the DiffWave repo.
Training:
```
python main.py --model_dir /path/to/ckpt --data_dirs /path/to/wavs
```
Inference:
```
python inference.py --model_dir /path/to/ckpt --output /path/to/audio --num_wavs 512
```
- Unconditional diffusion
- Unconditional diffusion with and without guidance
- Conditional diffusion: Encoder (Conditioner) + Decoder (Vocoder)
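A minimal sketch of how guided sampling typically works, i.e. classifier-free guidance blending the conditional and unconditional noise predictions inside an ancestral-sampling (AS) reverse step. The function names and signatures here are illustrative, not this repo's API:

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: blend the two noise predictions.

    w = 0 recovers the unconditional model, w = 1 the conditional one,
    and w > 1 extrapolates, strengthening the conditioning signal.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddpm_step(x_t, eps, alpha_t, alpha_bar_t, sigma_t, rng):
    """One ancestral-sampling reverse step of a DDPM, given a noise estimate."""
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_t)
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

In a guided sampler, `guided_eps` is applied at every step before calling `ddpm_step`; the unconditional model sees a null (dropped-out) condition.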
- Notes
We compare the frameworks on the SC09 dataset using the unconditional audio generation benchmark repo.
| System | Backbone | Sampler | FID (↓) | Inception (↑) | mInception (↑) | AM (↓) |
|---|---|---|---|---|---|---|
| diffwave | WaveNet | AS | 1.80 | 5.70 | 51.88 | 0.65 |
| audiodiff | UNet1d | AS | 1.51 | 7.07 | 105.8 | 0.471 |
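For reference, FID is the Fréchet distance between Gaussians fitted to feature embeddings of real and generated audio. A minimal NumPy sketch of the formula (the benchmark repo's actual embedding network and implementation may differ; the eigendecomposition-based matrix square root assumes PSD covariances):

```python
import numpy as np

def _psd_sqrt(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def fid(mu1, cov1, mu2, cov2):
    """FID = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2)).

    Tr((cov1 cov2)^(1/2)) is computed via the symmetric form
    sqrt(cov1) @ cov2 @ sqrt(cov1), which shares its eigenvalues.
    """
    s1 = _psd_sqrt(cov1)
    cross = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(s1 @ cov2 @ s1), 0.0, None)))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1) + np.trace(cov2) - 2.0 * cross)
```

Identical distributions give FID 0; lower is better.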