
encodec-pytorch's People

Contributors

leoauri, zhikangniu


encodec-pytorch's Issues

Release discriminator checkpoint

The pre-trained model on LibriTTS 960h works very well. I wonder if you could release the corresponding discriminator checkpoint so I can continue training on my custom dataset instead of training from scratch.
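
For reference, resuming would presumably mean restoring both the codec and the discriminator weights before continuing the GAN training loop. A minimal sketch with hypothetical checkpoint file names and variable names (none of them come from the repo's actual release format):

    import torch

    # Hypothetical sketch: resume GAN training from released checkpoints.
    # File names and the model/disc objects are placeholders, not the repo's real ones.
    gen_ckpt = torch.load("encodec_generator.pt", map_location="cpu")
    disc_ckpt = torch.load("encodec_discriminator.pt", map_location="cpu")
    model.load_state_dict(gen_ckpt)   # codec (generator) weights
    disc.load_state_dict(disc_ckpt)   # discriminator weights
    # ...then continue the usual training loop on the custom dataset.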

[REQ] add license file

Hi there, first of all, thanks for your awesome work!

Since we've "doxed" it in our HyMPS project (under the AUDIO \ AI-based projects page \ Codecs subsection), can you please add a "GH-standardized" license file?

Making licensing terms explicit is extremely important so that other devs (and not only devs) understand how to reuse, adapt, or modify your code in other open projects, and vice versa.

Although it may sound like a minor aspect, omitting the license file also causes the corresponding badge to be generated inconsistently:


(generative URL: https://flat.badgen.net/github/license/NoFish-528/encodec-pytorch/?label=LICENSE)

Anyway, you can easily set a "compliant" one through GitHub's license wizard tool.

Last but not least, let us know how, in your opinion, we could improve the categorization/sorting of collected projects in order to push their evolution by encouraging collaboration between developers (and not only developers).

Thanks in advance.

Some questions about EnCodec

Hello, I am interested in EnCodec. After reviewing the paper, it seems to have some similarities to self-supervised learning. I want to use it for a classification task.

Is there any sample?

Thank you for your hard work.
I would like to customize and use this code, but before that, I'd like to try out a test sample.
Could I listen to a test sample from the model you trained?
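
Not part of the original question, but for anyone who just wants to hear a reconstruction, a rough sketch using the upstream encodec package (pip install encodec) works as a stand-in; checkpoints trained with this repo would instead be loaded through its own model class, so treat the names below as illustrative only:

    import torch
    import torchaudio
    from encodec import EncodecModel
    from encodec.utils import convert_audio

    # Sketch: round-trip a file through the pretrained 24 kHz EnCodec model
    # so you can listen to the reconstruction. "sample.wav" is a placeholder.
    model = EncodecModel.encodec_model_24khz()
    model.set_target_bandwidth(6.0)                      # kbps
    wav, sr = torchaudio.load("sample.wav")
    wav = convert_audio(wav, sr, model.sample_rate, model.channels)
    with torch.no_grad():
        frames = model.encode(wav.unsqueeze(0))          # list of (codes, scale)
        recon = model.decode(frames)[0]
    torchaudio.save("sample_reconstructed.wav", recon, model.sample_rate)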

Training speed

I am training EnCodec on 8 GPUs. The current training speed is exceptionally slow, about 10 minutes per 100 iterations, and I don't know whether the GPUs are fully utilized. Could you give a reference figure for normal training speed? Also, do you know which part of the model could be the bottleneck?
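
Not from the original post, but a quick way to tell whether the data loader or the forward/backward pass dominates is to time the two separately. A rough sketch with placeholder names (train_loader and train_one_step stand in for the repo's objects):

    import time
    import torch

    # Time data loading and compute separately over 100 iterations to locate the
    # bottleneck; torch.cuda.synchronize() makes the GPU-side timing meaningful.
    data_time, step_time, t0 = 0.0, 0.0, time.time()
    for i, batch in enumerate(train_loader):
        data_time += time.time() - t0
        t0 = time.time()
        train_one_step(batch)
        torch.cuda.synchronize()
        step_time += time.time() - t0
        if i == 99:
            print(f"100 iters: data {data_time:.1f}s, compute {step_time:.1f}s")
            break
        t0 = time.time()

If most of the 10 minutes turns out to be data loading, raising num_workers on the DataLoader is usually the first thing to try.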

Additional config details from the hugging face checkpoints

Hi there,

Thanks a bunch for your effort on that. This is fantastic work.

I was wondering if you could provide a bit more detail about the configuration you used to train the checkpoints you've provided on Hugging Face? They sound great and I'd like to re-train them for my own purposes. From their file names, I can infer the following: batch_size=12, tensor_cut=100000, and lr=0.0001; is this right? What about warmup_epoch, for example? Additionally, did you use only a subset of LibriTTS or the full 960 hours?

Thanks again !

No buffer broadcast in DDP training?

When using multiple GPUs, why is buffer broadcasting disabled, as in this line:
https://github.com/NoFish-528/encodec-pytorch/blob/bd734c5dd2327456cc4b230ed6b3af9afd3d3145/train_multi_gpu.py#L269

In the EuclideanCodebook module, the codebook is stored as a buffer:
https://github.com/NoFish-528/encodec-pytorch/blob/bd734c5dd2327456cc4b230ed6b3af9afd3d3145/quantization/core_vq.py#L143

In my opinion, the buffer should be synchronized across all devices, so the broadcast_buffers flag should be set to True.
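
For reference, this is roughly what enabling buffer broadcasting looks like when wrapping the model. The keyword arguments are standard torch.nn.parallel.DistributedDataParallel options, while build_model and local_rank are placeholders; whether broadcasting is actually desirable depends on how the EMA codebook updates are meant to interact across ranks, which is the open question of this issue:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Sketch only: with broadcast_buffers=True (the PyTorch default), registered
    # buffers such as the EuclideanCodebook codebook are broadcast from rank 0 to
    # all other ranks at the start of every forward pass.
    model = build_model().cuda(local_rank)   # placeholder for the repo's model setup
    model = DDP(
        model,
        device_ids=[local_rank],
        broadcast_buffers=True,              # the linked line sets this to False
        find_unused_parameters=True,
    )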

Use language model in my_encodec?

Hello, I'm training on my own dataset now, and I wonder if I can use --lm to further compress the stream with entropy coding on the my_encodec model. How should I modify the code?

Problems with model performance

I am attempting to train the EnCodec model on a 16 kHz dataset with about 50,000 waveforms. I am training on 8 GPUs across 2 machines. I use tensor_cut = 65536, batch_size = 32 (per GPU), ratios = [8, 5, 4, 4], and lr = 5e-5 (other configs left at their defaults). The model loss converges to roughly the following values:

2024-07-13 17:12:10,985: INFO: [train_with_torchrun.py: 146]: Epoch 100 120/120	Avg loss_G: 8.3933	Avg losses_G: l_t: 0.0886	l_f: 5.9251	l_g: 0.5467	l_feat: 0.2731	Avg loss_W: 0.1236	lr_G: 5.306871e-06	lr_D: 5.306871e-06	loss_disc: 1.8721
2024-07-13 17:12:39,658: INFO: [train_with_torchrun.py: 165]: | TEST | epoch: 100 | loss_g: 6.761796489357948 | loss_disc: 1.8528

But the reconstructed audio is horrible. Do these loss values indicate that the model is fitting the dataset properly? Also, are there any issues with my config, or other considerations I should keep in mind when setting it? Thank you!
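
As a side note (not from the original report), it can help to sanity-check the bitrate implied by this config. Assuming 1024-entry codebooks, i.e. 10 bits per codebook per frame, which matches the EnCodec defaults but is an assumption about this setup:

    # With ratios [8, 5, 4, 4] the total hop is 640 samples, so a 16 kHz signal is
    # reduced to 25 frames per second; the bitrate then scales with the number of
    # codebooks (n_q) actually used.
    sample_rate = 16_000
    ratios = [8, 5, 4, 4]
    hop = 1
    for r in ratios:
        hop *= r                      # 640
    frame_rate = sample_rate / hop    # 25.0 frames per second
    bits_per_codebook = 10            # assumes 1024-entry codebooks
    for n_q in (2, 4, 8, 16, 32):
        print(f"n_q={n_q:2d}: {frame_rate * bits_per_codebook * n_q / 1000:.1f} kbps")

At 25 frames per second, each codebook contributes only 0.25 kbps, a third of what the 24 kHz model gets at its 75 Hz frame rate, so with few codebooks the achievable reconstruction quality may simply be capped by this configuration.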

encodec_ASR

Hi! I have one question about EnCodec: I have noticed some research focusing on TTS with EnCodec. What about ASR with EnCodec? I wonder if you have tried it!

RuntimeError: one of the variables needed for gradient computation has been modified

Hi,

I am writing to seek your advice on an issue I am experiencing during backpropagation of my model. Specifically, I am encountering an error during the backward pass after the warmup stage and am unsure how to proceed. It seems to happen once training enters the branch guarded by "if config.model.train_discriminator and epoch > config.lr_scheduler.warmup_epoch:".

I would greatly appreciate any guidance or suggestions you may have to help me address this problem.

log:
Error executing job with overrides: ['distributed.torch_distributed_debug=False', 'distributed.find_unused_parameters=True', 'distributed.world_size=2', 'common.max_epoch=15', 'datasets.tensor_cut=8000', 'datasets.batch_size=40', 'datasets.train_csv_path=/home/anna.peng/PycharmProjects/encodec-pytorch-main/librispeech_train100h_anna.csv', 'lr_scheduler.warmup_epoch=2', 'optimization.lr=1e-4', 'optimization.disc_lr=1e-4']
Traceback (most recent call last):
  File "train_multi_gpu.py", line 258, in main
    join=True
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 209, in train
    scheduler,disc_scheduler)
  File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 59, in train_one_step
    loss.backward()
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
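
Not part of the original report, but PyTorch's anomaly detection usually pinpoints which forward-pass operation produced the tensor that was later modified. A minimal, self-contained illustration of the same class of error (not the repo's code):

    import torch

    # exp() saves its output for the backward pass, so modifying that output
    # in place bumps its version counter and backward() raises the same
    # "modified by an inplace operation" RuntimeError. detect_anomaly() adds a
    # second traceback pointing at the forward op that saved the tensor.
    a = torch.randn(3, requires_grad=True)
    with torch.autograd.detect_anomaly():
        b = a.exp()          # b is saved for backward
        b.add_(1.0)          # in-place edit of a saved tensor (version 1 -> 2)
        b.sum().backward()   # raises the RuntimeError shown above

In GAN-style training loops like this one, a common culprit is an optimizer step (often the discriminator's) executed between a forward pass and the corresponding loss.backward(), since the step modifies parameters in place that the pending backward still needs.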

48 kHz mono training?

Hi! I think the code works mechanically to train the 24 kHz model. Do you know if one can train 48 kHz mono?

How to train on LibriSpeech

Hello, roughly how many epochs of training on LibriTTS 960 are needed before the model is usable? Can you share some config or training information?

Training convergence

Multi-GPU training does not converge as well as single-GPU training, and it feels like multi-GPU training overfits quickly.

Test routine only logs metrics of the last dataloader item?

writer.add_scalar('Test/Loss_G', loss_g.item(), epoch)
writer.add_scalar('Test/Loss_Disc',loss_disc.item(), epoch)

Hey there! Thanks again for your effort on this repo. Very minor detail: it looks like the test function currently only logs the metrics for the last dataloader item (maybe I misunderstood how this plays out with DataParallel; if so, my apologies). Any reason behind that?
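
A common fix, sketched here with hypothetical names (test_loader and evaluate_batch stand in for the repo's test loop and per-batch losses), is to accumulate the losses over the whole loader and log the mean once per epoch:

    # Sketch only: average the test losses over all batches before logging,
    # instead of logging the values from the last batch alone.
    total_g, total_disc, n_batches = 0.0, 0.0, 0
    for batch in test_loader:
        loss_g, loss_disc = evaluate_batch(batch)
        total_g += loss_g.item()
        total_disc += loss_disc.item()
        n_batches += 1
    writer.add_scalar('Test/Loss_G', total_g / n_batches, epoch)
    writer.add_scalar('Test/Loss_Disc', total_disc / n_batches, epoch)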

Able to reproduce Meta's quality?

Did you try to replicate Meta's training?
Just curious whether it is at all possible to reproduce their results using the code that Meta shared.

I would be very curious to hear your opinions.

Thanks!
