
encodec-pytorch's People

Contributors

leoauri, zhikangniu


encodec-pytorch's Issues

Release discriminator checkpoint

The pre-trained model on LibriTTS 960h works very well. I wonder if you could release the corresponding discriminator checkpoint so I can continue training on my custom dataset instead of training from scratch.
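
For reference, resuming would presumably mean restoring both the codec and the discriminator weights before continuing the GAN training loop. A minimal sketch with hypothetical checkpoint file names and variable names (none of them come from the repo's actual release format):

    import torch

    # Hypothetical sketch: resume GAN training from released checkpoints.
    # File names and the model/disc objects are placeholders, not the repo's real ones.
    gen_ckpt = torch.load("encodec_generator.pt", map_location="cpu")
    disc_ckpt = torch.load("encodec_discriminator.pt", map_location="cpu")
    model.load_state_dict(gen_ckpt)   # codec (generator) weights
    disc.load_state_dict(disc_ckpt)   # discriminator weights
    # ...then continue the usual training loop on the custom dataset.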

[REQ] add license file

Hi there, first of all, thanks for your awesome work!

Since we've "doxed" it in our HyMPS project (under the AUDIO \ AI-based projects page \ Codecs subsection), can you please add a "GH-standardized" license file?

Making licensing terms explicit is extremely important so that other devs (and not only devs) understand how to reuse, adapt, or modify your code in other open projects, and vice versa.

Although it may sound like a minor aspect, omitting the license file also causes the corresponding badge to be generated inconsistently:


(generative URL: https://flat.badgen.net/github/license/NoFish-528/encodec-pytorch/?label=LICENSE)

Anyway, you can easily set a "compliant" one through GitHub's license wizard tool.

Last but not least, let us know how, in your opinion, we could improve the categorization/sorting of collected projects in order to push their evolution by encouraging collaboration between developers (and not only developers).

Thanks in advance.

Some questions about EnCodec

Hello, I am interested in EnCodec. After reviewing the paper, it seems to have some similarities to self-supervised learning. I want to use it for a classification task.

Is there any sample?

Thank you for your hard work.
I would like to customize and use this code, but before that, I'd like to try out a test sample.
Could I listen to a test sample from the model you trained?
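
Not part of the original question, but for anyone who just wants to hear a reconstruction, a rough sketch using the upstream encodec package (pip install encodec) works as a stand-in; checkpoints trained with this repo would instead be loaded through its own model class, so treat the names below as illustrative only:

    import torch
    import torchaudio
    from encodec import EncodecModel
    from encodec.utils import convert_audio

    # Sketch: round-trip a file through the pretrained 24 kHz EnCodec model
    # so you can listen to the reconstruction. "sample.wav" is a placeholder.
    model = EncodecModel.encodec_model_24khz()
    model.set_target_bandwidth(6.0)                      # kbps
    wav, sr = torchaudio.load("sample.wav")
    wav = convert_audio(wav, sr, model.sample_rate, model.channels)
    with torch.no_grad():
        frames = model.encode(wav.unsqueeze(0))          # list of (codes, scale)
        recon = model.decode(frames)[0]
    torchaudio.save("sample_reconstructed.wav", recon, model.sample_rate)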

Training speed

I am training EnCodec on 8 GPUs. The current training speed is exceptionally slow, about 10 minutes per 100 iterations, and I don't know whether the GPUs are fully utilized. Could you give a reference figure for normal training speed? Also, do you know which part of the model could be the bottleneck?
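
Not from the original post, but a quick way to tell whether the data loader or the forward/backward pass dominates is to time the two separately. A rough sketch with placeholder names (train_loader and train_one_step stand in for the repo's objects):

    import time
    import torch

    # Time data loading and compute separately over 100 iterations to locate the
    # bottleneck; torch.cuda.synchronize() makes the GPU-side timing meaningful.
    data_time, step_time, t0 = 0.0, 0.0, time.time()
    for i, batch in enumerate(train_loader):
        data_time += time.time() - t0
        t0 = time.time()
        train_one_step(batch)
        torch.cuda.synchronize()
        step_time += time.time() - t0
        if i == 99:
            print(f"100 iters: data {data_time:.1f}s, compute {step_time:.1f}s")
            break
        t0 = time.time()

If most of the 10 minutes turns out to be data loading, raising num_workers on the DataLoader is usually the first thing to try.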

Additional config details from the hugging face checkpoints

Hi there,

Thanks a bunch for your effort on that. This is fantastic work.

I was wondering if you could provide a bit more detail about the configuration you used to train the checkpoints you've provided on Hugging Face? They sound great and I'd like to re-train them for my own purposes. From their file names, I can infer the following: batch_size=12, tensor_cut=100000, and lr=0.0001; is this right? What about warmup_epoch, for example? Additionally, did you use only a subset of LibriTTS or the full 960 hours?

Thanks again !

No buffer broadcast in DDP training?

When using multiple GPUs, why is buffer broadcasting disabled, as in this line:
https://github.com/NoFish-528/encodec-pytorch/blob/bd734c5dd2327456cc4b230ed6b3af9afd3d3145/train_multi_gpu.py#L269

In the EuclideanCodebook module, the codebook is stored as a buffer:
https://github.com/NoFish-528/encodec-pytorch/blob/bd734c5dd2327456cc4b230ed6b3af9afd3d3145/quantization/core_vq.py#L143

In my opinion, the buffer should be synchronized across all devices, so the broadcast_buffers flag should be set to True.
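
For reference, this is roughly what enabling buffer broadcasting looks like when wrapping the model. The keyword arguments are standard torch.nn.parallel.DistributedDataParallel options, while build_model and local_rank are placeholders; whether broadcasting is actually desirable depends on how the EMA codebook updates are meant to interact across ranks, which is the open question of this issue:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Sketch only: with broadcast_buffers=True (the PyTorch default), registered
    # buffers such as the EuclideanCodebook codebook are broadcast from rank 0 to
    # all other ranks at the start of every forward pass.
    model = build_model().cuda(local_rank)   # placeholder for the repo's model setup
    model = DDP(
        model,
        device_ids=[local_rank],
        broadcast_buffers=True,              # the linked line sets this to False
        find_unused_parameters=True,
    )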

Use language model in my_encodec?

Hello, I'm training on my own dataset now, and I wonder if I can use --lm to further compress the stream with entropy coding on the my_encodec model. How should I modify the code?

Problems with model performance

I am attempting to train the EnCodec model on a 16 kHz dataset with about 50,000 waveforms. I am training on 8 GPUs across 2 machines. I use tensor_cut = 65536, batch_size = 32 (per GPU), ratios = [8, 5, 4, 4], and lr = 5e-5 (other configs left at their defaults). The model loss converges to roughly the following values:

2024-07-13 17:12:10,985: INFO: [train_with_torchrun.py: 146]: Epoch 100 120/120	Avg loss_G: 8.3933	Avg losses_G: l_t: 0.0886	l_f: 5.9251	l_g: 0.5467	l_feat: 0.2731	Avg loss_W: 0.1236	lr_G: 5.306871e-06	lr_D: 5.306871e-06	loss_disc: 1.8721
2024-07-13 17:12:39,658: INFO: [train_with_torchrun.py: 165]: | TEST | epoch: 100 | loss_g: 6.761796489357948 | loss_disc: 1.8528

But the reconstructed audio is horrible. Do these loss values indicate that the model is fitting the dataset properly? Also, are there any issues with my config, or other considerations I should keep in mind when setting it? Thank you!
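
As a side note (not from the original report), it can help to sanity-check the bitrate implied by this config. Assuming 1024-entry codebooks, i.e. 10 bits per codebook per frame, which matches the EnCodec defaults but is an assumption about this setup:

    # With ratios [8, 5, 4, 4] the total hop is 640 samples, so a 16 kHz signal is
    # reduced to 25 frames per second; the bitrate then scales with the number of
    # codebooks (n_q) actually used.
    sample_rate = 16_000
    ratios = [8, 5, 4, 4]
    hop = 1
    for r in ratios:
        hop *= r                      # 640
    frame_rate = sample_rate / hop    # 25.0 frames per second
    bits_per_codebook = 10            # assumes 1024-entry codebooks
    for n_q in (2, 4, 8, 16, 32):
        print(f"n_q={n_q:2d}: {frame_rate * bits_per_codebook * n_q / 1000:.1f} kbps")

At 25 frames per second, each codebook contributes only 0.25 kbps, a third of what the 24 kHz model gets at its 75 Hz frame rate, so with few codebooks the achievable reconstruction quality may simply be capped by this configuration.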

encodec_ASR

Hi! I have one question about EnCodec: I have noticed some research focusing on TTS with EnCodec. What about ASR with EnCodec? I wonder if you have tried it!

RuntimeError: one of the variables needed for gradient computation has been modified

Hi,

I am writing to seek your advice on an issue I am experiencing during backpropagation of my model. Specifically, I am encountering an error during the backward pass after the warmup stage and am unsure how to proceed. It seems to happen once training enters the branch guarded by "if config.model.train_discriminator and epoch > config.lr_scheduler.warmup_epoch:".

I would greatly appreciate any guidance or suggestions you may have to help me address this problem.

log:
Error executing job with overrides: ['distributed.torch_distributed_debug=False', 'distributed.find_unused_parameters=True', 'distributed.world_size=2', 'common.max_epoch=15', 'datasets.tensor_cut=8000', 'datasets.batch_size=40', 'datasets.train_csv_path=/home/anna.peng/PycharmProjects/encodec-pytorch-main/librispeech_train100h_anna.csv', 'lr_scheduler.warmup_epoch=2', 'optimization.lr=1e-4', 'optimization.disc_lr=1e-4']
Traceback (most recent call last):
  File "train_multi_gpu.py", line 258, in main
    join=True
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 209, in train
    scheduler,disc_scheduler)
  File "/home/anna.peng/PycharmProjects/encodec-pytorch-main/train_multi_gpu.py", line 59, in train_one_step
    loss.backward()
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/home/anna.peng/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
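
Not part of the original report, but PyTorch's anomaly detection usually pinpoints which forward-pass operation produced the tensor that was later modified. A minimal, self-contained illustration of the same class of error (not the repo's code):

    import torch

    # exp() saves its output for the backward pass, so modifying that output
    # in place bumps its version counter and backward() raises the same
    # "modified by an inplace operation" RuntimeError. detect_anomaly() adds a
    # second traceback pointing at the forward op that saved the tensor.
    a = torch.randn(3, requires_grad=True)
    with torch.autograd.detect_anomaly():
        b = a.exp()          # b is saved for backward
        b.add_(1.0)          # in-place edit of a saved tensor (version 1 -> 2)
        b.sum().backward()   # raises the RuntimeError shown above

In GAN-style training loops like this one, a common culprit is an optimizer step (often the discriminator's) executed between a forward pass and the corresponding loss.backward(), since the step modifies parameters in place that the pending backward still needs.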

48 kHz mono training?

Hi! I think the code works mechanically to train the 24 kHz model. Do you know if one can train 48 kHz mono?

How to train on LibriSpeech

Hello, roughly how many epochs of training on LibriTTS 960 are needed before the model is usable? Can you share some config or training information?

Training convergence

Multi-GPU training does not converge as well as single-GPU training, and it feels like multi-GPU training overfits quickly.

Test routine only logs metrics of the last dataloader item?

writer.add_scalar('Test/Loss_G', loss_g.item(), epoch)
writer.add_scalar('Test/Loss_Disc',loss_disc.item(), epoch)

Hey there! Thanks again for your effort on this repo. Very minor detail: it looks like the test function currently only logs the metrics for the last dataloader item (maybe I misunderstood how this plays out with DataParallel; if so, my apologies). Any reason behind that?
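
A common fix, sketched here with hypothetical names (test_loader and evaluate_batch stand in for the repo's test loop and per-batch losses), is to accumulate the losses over the whole loader and log the mean once per epoch:

    # Sketch only: average the test losses over all batches before logging,
    # instead of logging the values from the last batch alone.
    total_g, total_disc, n_batches = 0.0, 0.0, 0
    for batch in test_loader:
        loss_g, loss_disc = evaluate_batch(batch)
        total_g += loss_g.item()
        total_disc += loss_disc.item()
        n_batches += 1
    writer.add_scalar('Test/Loss_G', total_g / n_batches, epoch)
    writer.add_scalar('Test/Loss_Disc', total_disc / n_batches, epoch)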

Able to reproduce Meta's quality?

Did you try to replicate Meta's training?
Just curious whether it is at all possible to reproduce their results using the code that Meta shared.

I would be very curious to hear your opinions.

Thanks!
