
Comments (47)

enk100 avatar enk100 commented on July 17, 2024 2

try both:
checkpoint/(name of expName)/bestmodel.pth
checkpoint/(name of expName)/lastmodel.pth

from loop.

enk100 avatar enk100 commented on July 17, 2024 1
  1. What is the total duration of these 140 files? I think you should train with more data; in the vctk experiments each speaker has 20-25 minutes. Alternatively, you can fit the model to the new speaker as we wrote in the new version of the paper (just note that you need to use a model trained on a large number of speakers).
  2. Good.
  3. You can check it on the logger (see the plotting sketch below).
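
In case it helps with point 3, here is a rough sketch of plotting a learning curve from a training log. The log path and the "loss: <number>" line format are assumptions about what the logger writes, so adjust both to your setup.

```python
import re
import matplotlib.pyplot as plt

# Hedged sketch: pull loss values out of a training log and plot them.
# Both the log path and the regex are assumptions -- adapt them to
# whatever your logger actually writes per step/epoch.
losses = []
with open('checkpoints/blizzard_init/main.log') as f:   # hypothetical path
    for line in f:
        m = re.search(r'loss[:=]\s*([0-9.]+)', line, re.IGNORECASE)
        if m:
            losses.append(float(m.group(1)))

plt.plot(losses)
plt.xlabel('logged step')
plt.ylabel('training loss')
plt.title('learning curve')
plt.savefig('learning_curve.png')
```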

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

The output is very different from my orig.wav file.
output.zip

from loop.

enk100 avatar enk100 commented on July 17, 2024

Did you use the Blizzard 2011 dataset?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

@enk100 no, I used my own dataset.

from loop.

enk100 avatar enk100 commented on July 17, 2024

sj_017.gen_0.wav is Blizzard.
Are you sure you trained it on your data?
Did you change the data path to your own dataset?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Did you change the data path to your own dataset?

Which part do you mean? During training?

from loop.

enk100 avatar enk100 commented on July 17, 2024

yes, on train.py

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Here's what I did, @enk100:

  1. Extracted my own dataset using extract_feats.py
  2. Overwrote data/blizzard/* with the dataset extracted in step 1
  3. Ran the first training stage: python train.py --noise 1 --expName blizzard_init --seq-len 1600 --max-seq-len 1600 --data data/blizzard --nspk 1 --lr 1e-5 --epochs 10
  4. Ran the second training stage: python train.py --noise 1 --expName blizzard --seq-len 1600 --max-seq-len 1600 --data data/blizzard --nspk 1 --lr 1e-4 --checkpoint checkpoints/blizzard_init/bestmodel.pth --epochs 90
  5. Generated with: python generate.py --npz data/blizzard/numpy_features_valid/sj_017.npz --checkpoint models/blizzard/bestmodel.pth

from loop.

enk100 avatar enk100 commented on July 17, 2024

Are you sure you didn't mix your dataset with Blizzard?
Can you look into data/blizzard/ and check that it contains only your dataset?

It is very odd that you hear Blizzard when you didn't train on Blizzard... maybe you started from a checkpoint of the Blizzard model?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

data/blizzard only contains my dataset. I use model/blizzard for training. Is that okay, or do I need to create a model from my dataset?

from loop.

enk100 avatar enk100 commented on July 17, 2024

You need to train the model from scratch.
Did the '--checkpoint' argument in train.py stay an empty string, or did you insert the Blizzard model checkpoint?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

In the first stage of training, --checkpoint is empty. In the second stage of training, the --checkpoint I use is checkpoints/blizzard_init/bestmodel.pth.

from loop.

enk100 avatar enk100 commented on July 17, 2024

Please check the '--checkpoint' argument in train.py. If it contains a Blizzard checkpoint, then the first stage trains on a pretrained Blizzard model.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

I'm sorry, I'm confused by this statement:

If it contains a Blizzard checkpoint, then the first stage trains on a pretrained Blizzard model.

from loop.

enk100 avatar enk100 commented on July 17, 2024

For example, if the 'default' of the argument in train.py is set like this:
parser.add_argument('--checkpoint', default='checkpoints/blizzard_init/bestmodel.pth', metavar='C', type=str, help='Checkpoint path')
then your training is initialized with the Blizzard model.

If the 'default' argument is empty, then it is OK:
parser.add_argument('--checkpoint', default='', metavar='C', type=str, help='Checkpoint path')
and you start training your model from scratch.

Somehow your model got Blizzard samples, so you should search for a Blizzard data leak (see the check below).
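
A minimal sketch of such a check, assuming .npz feature files live under data/blizzard as in the commands in this thread (the numpy_features training folder name is an assumption alongside the numpy_features_valid folder you already use): it counts the file-name prefixes in the training folders, so anything besides your own recordings stands out.

```python
from collections import Counter
from pathlib import Path

# Hedged sketch: count which file-name prefixes (speaker/utterance ids) are
# present in the feature folders used for training.  If prefixes other than
# your own recordings appear, Blizzard data is still leaking in.
for split in ('numpy_features', 'numpy_features_valid'):
    files = list(Path('data/blizzard', split).glob('*.npz'))
    prefixes = Counter(f.name.split('_')[0] for f in files)
    print(split, dict(prefixes))
```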

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

OK, I've done that. But what about the 2nd stage of training? Do I need to execute it?

python train.py --noise 1 --expName blizzard --seq-len 1600 --max-seq-len 1600 --data data/blizzard --nspk 1 --lr 1e-4 --checkpoint checkpoints/blizzard_init/bestmodel.pth --epochs 90

from loop.

enk100 avatar enk100 commented on July 17, 2024

Yes, you should execute it with the checkpoint argument:
--checkpoint checkpoints/blizzard_init/bestmodel.pth

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

So it should give me a generated file with the same voice as my dataset, right?

from loop.

enk100 avatar enk100 commented on July 17, 2024

yes, of course

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Thank you so much for the clarification @enk100

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Hi @enk100, I trained on the data and generated an output, but the generated wav file has no sound. See the attachment below.
output2.zip

from loop.

enk100 avatar enk100 commented on July 17, 2024

1/ How many files do you have in your dataset for each speaker?
2/ Are you sure you extracted the features correctly? You can check it by generating from the npz files.
3/ How long did you train? Did you see convergence? Can you share the learning curve?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024
  1. How many files do you have in your dataset for each speaker?
    A: I have 140 wav files of 1 speaker in my dataset, and 140 txt files.

  2. Are you sure you extracted the features correctly? You can check it by generating from the npz files.
    A: Yes, I extracted them correctly. You can hear the generated npz in the zip file I attached (the file ending with 'orig.wav'). See also the sanity check below.

  3. How long did you train? Did you see convergence? Can you share the learning curve?
    A: First stage of training, 10 epochs; second stage of training, 90 epochs. Where can I see the convergence and the learning curve?
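
Regarding the sanity check in point 2, one quick way (a sketch, not the repo's own tooling) is to open an extracted .npz directly and print the array names and shapes it contains; empty or zero-length arrays would explain silent output.

```python
import numpy as np

# Hedged sketch: list the arrays stored in one extracted feature file.
# The key names inside the archive depend on extract_feats.py, so the
# snippet just prints whatever is there.
feats = np.load('data/blizzard/numpy_features_valid/sj_017.npz')
for key in feats.files:
    print(key, feats[key].shape, feats[key].dtype)
```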

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024
  1. The total duration is 23 mins. What do you mean by "just note that you need to use a model trained on a large number of speakers"? Does that mean I don't have to train from scratch and can just use the model from your paper instead?

Thanks.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Hi @enk100, I used the data from the vctk corpus for a single speaker. After generation there is no sound.

from loop.

enk100 avatar enk100 commented on July 17, 2024

Hi, you can choose one of these:

  1. Combine your data with the vctk data and train the model from scratch
  2. Take the vctk model and fine-tune it to your new identity - add an embedding vector for your new speaker (see the sketch below)

Good luck.
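
For option 2, here is a minimal sketch of growing the speaker-embedding table of a trained checkpoint by one row before fine-tuning. The checkpoint layout and the parameter name 'encoder.lut_s.weight' are assumptions; inspect your checkpoint's state_dict to find the real name of the speaker-embedding weight.

```python
import torch

# Hedged sketch (not the repo's official procedure): append one row to the
# speaker-embedding table of a trained multi-speaker checkpoint so a new
# speaker id can be fine-tuned on top of it.
ckpt = torch.load('models/vctk/bestmodel.pth', map_location='cpu')
state = ckpt if isinstance(ckpt, dict) else ckpt.state_dict()  # adapt to your checkpoint layout

key = 'encoder.lut_s.weight'                  # ASSUMED name of the speaker-embedding weight
old_emb = state[key]                          # shape: [n_speakers, embedding_dim]
new_row = old_emb.mean(dim=0, keepdim=True)   # start the new voice near the "average" speaker
state[key] = torch.cat([old_emb, new_row], dim=0)

torch.save(state, 'models/vctk/bestmodel_plus_new_spkr.pth')
# Fine-tune from this file with --nspk increased by one, ideally updating
# mainly the new embedding row at first.
```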

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

You mean train it as multi-speaker?

from loop.

enk100 avatar enk100 commented on July 17, 2024

Yes, train it on vctk with the 22 speakers + your data.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

So I have to run extract_feats.py on the 22 speakers + my data, right?

from loop.

lvenoxi avatar lvenoxi commented on July 17, 2024

@enk100 if I train it on vctk with the 22 speakers + my data, should I set --nspk to 23 in train.py?

from loop.

enk100 avatar enk100 commented on July 17, 2024

@lvenoxi - yes.
@jaxlinksync - no, run extract_feats.py only on your data and then combine the vctk22 features with your data (see the sketch below).
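
A small sketch of the combine step, assuming the repo's layout of .npz feature folders (numpy_features / numpy_features_valid); the source folder data/sj for the new speaker's extracted features is hypothetical.

```python
import shutil
from pathlib import Path

# Hedged sketch: copy the new speaker's extracted .npz files into the vctk
# feature folders so train.py sees all 23 speakers in one dataset.
src = Path('data/sj')    # hypothetical output folder of extract_feats.py for the new speaker
dst = Path('data/vctk')

for split in ('numpy_features', 'numpy_features_valid'):
    (dst / split).mkdir(parents=True, exist_ok=True)
    for npz in (src / split).glob('*.npz'):
        shutil.copy(npz, dst / split / npz.name)
```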

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

What about the norm.dat of the extracted data? Do I have to add it to the norm_info directory as well, and can I name it anything? In my case I named it sj_norm.dat.
So inside my norm_info directory there is:

  1. norm.dat (included when downloading the data in voiceloop)
  2. sj_norm.dat (the norm file generated after extracting my dataset)

from loop.

enk100 avatar enk100 commented on July 17, 2024

It is only relevant when you are going to generate samples. So when you generate vctk, use the vctk norm.dat; when you generate sj, use sj_norm.dat.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Hi @enk100, thank you so much for your help. One last thing.
By generating samples, do you mean this command?
python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth
How can I pass sj_norm.dat as a parameter?

from loop.

enk100 avatar enk100 commented on July 17, 2024

https://github.com/facebookresearch/loop/blob/c866e8df9b7afdc58460bcae060a3bc0e11a8987/generate.py#L86

Modify this line or add a new argument to the function (see the sketch below).
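
A hedged sketch of that change, adding a new flag instead of editing the hard-coded path; the flag name --norm-path and its default are illustrative, not existing options of generate.py.

```python
import argparse

# Hedged sketch of the extra argument (names and default path are assumptions):
parser = argparse.ArgumentParser(description='generate.py (excerpt)')
parser.add_argument('--npz', type=str, default='', help='Input npz features')
parser.add_argument('--spkr', type=int, default=0, help='Speaker id')
parser.add_argument('--checkpoint', type=str, default='', help='Model checkpoint')
parser.add_argument('--norm-path', type=str, default='norm_info/norm.dat',
                    help='norm.dat matching the dataset you generate from')
args = parser.parse_args()

# In generate.py itself, the value hard-coded around line 86 would then come
# from args.norm_path, e.g. (paths below are illustrative):
#   python generate.py --npz data/sj/numpy_features_valid/sj_017.npz \
#       --spkr 21 --checkpoint checkpoints/vctk/bestmodel.pth \
#       --norm-path norm_info/sj_norm.dat
norm_path = args.norm_path
```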

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Thank you so much @enk100

from loop.

enk100 avatar enk100 commented on July 17, 2024

You're welcome!

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

By the way @enk100, how do I know which speaker ID my new speaker is?

from loop.

enk100 avatar enk100 commented on July 17, 2024

print self.speakers
in https://github.com/facebookresearch/loop/blob/c866e8df9b7afdc58460bcae060a3bc0e11a8987/data.py#L94

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Hi @enk100, you're awesome 😄 thanks.

One last thing: when I generate the voice, which checkpoint should I use?
a. models/vctk/bestmodel.pth
b. checkpoint/(name of expName)/bestmodel.pth

Thank you so much for your help.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Thank you so much @enk100

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

After I generate the data, this is what I get:
output.zip
The generated output does not match the original wav file.
Here's the command I used to generate:
sudo python generate.py --npz data/vctk/numpy_features_valid/sj_014.npz --spkr 21 --checkpoint checkpoints/vctk_noise_2/bestmodel.pth
The same goes for lastmodel.pth.

Did I miss something? The other speakers are OK, but not ours.

from loop.

enk100 avatar enk100 commented on July 17, 2024

Are you sure your speaker is 21? I guess it should be 22, as vctk has 22 speakers.
Can you get more data for your speaker?

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

Hi @enk100, I tried spkr 22 but it said that speaker did not exist. So I printed the list of speakers as per your suggestion above and got this:

(screenshot of the printed speaker list)

As you can see, the sj speaker is 21.

from loop.

jaxlinksync avatar jaxlinksync commented on July 17, 2024

@enk100 can you please confirm whether our dataset is valid? Please PM me at [email protected] so that I can send you a link to our corpus, if that's OK with you.

from loop.

melaanya avatar melaanya commented on July 17, 2024

Hi @jaxlinksync! Can you please give me some advice: did you succeed in fine-tuning an existing vector to your new identity?

from loop.
