s2vc's People

Contributors

howard1337

s2vc's Issues

Could you provide ppg-extracting code?

Dear author,

In your paper, you mention that you extracted PPG and SSL features with the s3prl toolkit. However, I cannot find how to extract PPGs in s3prl. Could you provide the code, or some guidelines, for extracting PPGs? Thanks a lot!

What are vocoder-ckpt-*.pt?

You release the following vocoder checkpoints:

vocoder-ckpt-apc.pt
vocoder-ckpt-cpc.pt
vocoder-ckpt-wav2vec2.pt

What are they?

Are they vocoders fine-tuned on the output of a particular model? I didn't see that described in the paper. Why is this needed if the S2VC output is a mel spectrogram? If it's because different models produce different mels, do you use vocoder-ckpt-cpc.pt when the target model is cpc? And if so, how did you do the fine-tuning?

Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl

Running convert_batch.py raises ValueError: Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl. The error originates from this call:

torch.hub.load("s3prl/s3prl:f2114342ff9e813e18a580fa41418aee9925414e", feature_name, refresh=True).eval().to(device)

  File "convert_batch.py", line 61, in main
    src_feat_model = FeatureExtractor(src_feat_name, wav2vec_path, device)
  File "/deepmind/experiments/howard1337/s2vc/data/feature_extract.py", line 18, in __init__
    torch.hub.load("s3prl/s3prl:f2114342ff9e813e18a580fa41418aee9925414e", feature_name, refresh=True).eval().to(device)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 402, in load
    repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 190, in _get_cache_or_reload
    _validate_not_a_forked_repo(repo_owner, repo_name, branch)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 160, in _validate_not_a_forked_repo
    raise ValueError(f'Cannot find {branch} in https://github.com/{repo_owner}/{repo_name}. '
ValueError: Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl. If it's a commit from a forked repo, please call hub.load() with forked repo directly.

Any idea on how to solve this?
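
The traceback shows the failure being raised by torch.hub's fork check (_validate_not_a_forked_repo) rather than by the download itself. One thing that may be worth trying is to skip that check. This assumes the installed torch version accepts the skip_validation argument of torch.hub.load, which the traceback suggests because it is passed through to _get_cache_or_reload. The sketch below is not the repository's code: the helper name load_feature_model and its hub_load parameter are hypothetical and exist only to make the call easy to try in isolation. It will not help if the pinned commit is no longer reachable on GitHub.

```python
# Pinned repo spec copied from the traceback in this issue.
S3PRL_SPEC = "s3prl/s3prl:f2114342ff9e813e18a580fa41418aee9925414e"


def load_feature_model(feature_name, device="cpu", spec=S3PRL_SPEC, hub_load=None):
    """Load an s3prl feature extractor while skipping torch.hub's fork check.

    `hub_load` is a hypothetical injection point (defaults to torch.hub.load)
    so the call can be exercised without network access.
    """
    if hub_load is None:
        import torch  # imported lazily so the helper can be defined without torch

        hub_load = torch.hub.load
    # skip_validation=True asks torch.hub not to verify that the ref after ':'
    # is a branch or tag of the repo owner (assumption: this torch version
    # supports the argument). refresh=True is forwarded to the s3prl entry
    # point, exactly as in the original call.
    model = hub_load(spec, feature_name, refresh=True, skip_validation=True)
    return model.eval().to(device)
```

In feature_extract.py the equivalent change would be adding skip_validation=True to the existing torch.hub.load(...) call. If the commit itself is gone, pointing the spec at a branch or tag that still exists is the other obvious option, at the cost of no longer matching the version the checkpoints were produced with.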

Is SourceEncoder dead code?

I don't see any code using SourceEncoder.

Is this dead code? Or is it part of the paper's model, with more code to be released?
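
One quick way to check a question like this locally is a static scan for references to the name. The sketch below is a generic helper, not something taken from the repository: count_name_uses is a hypothetical name, and it only counts plain name and attribute references, so import statements and string-based lookups are not covered.

```python
import ast


def count_name_uses(source, name):
    """Count references to `name` in Python source text.

    The `class name:` or `def name():` statement itself is not counted,
    because a definition stores its name as a string, not as a Name node.
    """
    uses = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            uses += 1  # e.g. SourceEncoder(...)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            uses += 1  # e.g. models.SourceEncoder(...)
    return uses
```

Running it over every .py file in the checkout (for example with pathlib.Path(".").rglob("*.py")) and summing the counts gives a rough answer; a total of zero suggests the class is never instantiated in the released code.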

Trying your pre-trained model to convert a wav file to another voice

Hi,
I am trying to use your pre-trained model to convert one voice to another.
The parts of convert_batch.py that I changed are shown below (I changed the paths ...):

def parse_args():
    """Parse command-line arguments."""
    parser = ArgumentParser()
    parser.add_argument("info_path", type=str)
    parser.add_argument("output_dir", type=str, default=".")
    parser.add_argument("-c", "/content/S2VC/chckpt",
                        default="checkpoints/cpc-cpc.pt")
    parser.add_argument("-s", "src_feat_name", default="cpc")
    parser.add_argument("-r", "ref_feat_name", default="cpc")
    parser.add_argument("-w", "/content/S2VC/wav2vec_small.pt",
                        default="checkpoints/wav2vec_small.pt")
    parser.add_argument("-v", "/content/S2VC/wav2vec_small.pt",
                        default="checkpoints/vocoder.pt")

    parser.add_argument("--sample_rate", type=int, default=16000)

    return vars(parser.parse_args())

The error is shown below:
  File "", line 1
    python /content/S2VC/convert_batch.py
    ^
SyntaxError: incomplete input

What should I do to fix it?
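
Two things stand out in the snippet above. First, argparse expects every name after a short flag such as "-c" to be an option string starting with "-", so a file path cannot take the place of the long option name; paths belong either in default= or on the command line. Second, the SyntaxError looks like the shell command being parsed as Python, which is what happens when it is typed into a Python prompt or notebook cell without a leading "!". The sketch below shows one way the parser could look. The long option names --ckpt_path and --vocoder_path are assumptions and should be checked against the original convert_batch.py; src_feat_name, ref_feat_name and wav2vec_path follow names visible elsewhere on this page. The argv parameter is added only so the function can be called directly.

```python
from argparse import ArgumentParser


def parse_args(argv=None):
    """Parse command-line arguments; paths are passed as values, not as flag names."""
    parser = ArgumentParser()
    parser.add_argument("info_path", type=str)
    parser.add_argument("output_dir", type=str, default=".")
    # Long option names below are assumptions; only the defaults are paths.
    parser.add_argument("-c", "--ckpt_path", default="checkpoints/cpc-cpc.pt")
    parser.add_argument("-s", "--src_feat_name", default="cpc")
    parser.add_argument("-r", "--ref_feat_name", default="cpc")
    parser.add_argument("-w", "--wav2vec_path", default="checkpoints/wav2vec_small.pt")
    parser.add_argument("-v", "--vocoder_path", default="checkpoints/vocoder.pt")
    parser.add_argument("--sample_rate", type=int, default=16000)
    # argv=None makes argparse read sys.argv, as the original script does.
    return vars(parser.parse_args(argv))
```

With a parser like this, the Colab paths would be supplied when invoking the script, for example by passing -c /content/S2VC/chckpt and -w /content/S2VC/wav2vec_small.pt after the two positional arguments, and the command would be run from a shell or with a leading "!" in a notebook cell.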

Training with other features (apc, timit_posteriorgram, etc.) does not work

I tried training with features other than cpc on my own corpus.
However, the training script fails when the loss function is computed (train.py, line 69).
I found that the size of the output vector out is hard-coded, so it does not match the size of the target mel spectrogram for other features.

The relevant vector sizes in the model are:

  • apc case: Input dim: 512, Reference dim: 512, Target dim: 240
  • cpc case: Input dim: 256, Reference dim: 256, Target dim: 80

I prepared the input feature vectors by using preprocess.py, e.g. python .\preprocess.py (my own corpus) apc .\checkpoints\wav2vec_small.pt processed/apc.

I modified the model by changing the size of these vectors, and train.py now runs.
In model.py, in __init__() of the S2VC class, I replaced the hard-coded 80 with a function argument and pass in the size of the mel vector.
However, I cannot tell whether this modification is appropriate, as I am not familiar with NLP.
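
The change described above can be sketched roughly as follows. The dimensions are the ones reported in this issue; the keyword names input_dim, ref_dim and out_dim and both helper functions are hypothetical and do not come from model.py. The idea is to look up the sizes per feature type and to fail with a readable message before the loss is computed when the output and target sizes disagree.

```python
# (input_dim, ref_dim, target_dim) per feature type, as reported in this issue.
REPORTED_DIMS = {
    "apc": (512, 512, 240),
    "cpc": (256, 256, 80),
}


def model_kwargs(feat_name, dims=REPORTED_DIMS):
    """Return hypothetical constructor arguments for one feature type."""
    input_dim, ref_dim, target_dim = dims[feat_name]
    # out_dim replaces the hard-coded 80 mentioned in the issue.
    return {"input_dim": input_dim, "ref_dim": ref_dim, "out_dim": target_dim}


def check_out_dim(out_dim, target_dim):
    """Raise a clear error when the model output size differs from the target size."""
    if out_dim != target_dim:
        raise ValueError(
            f"model output dim {out_dim} does not match target dim {target_dim}"
        )
```

Whether a vocoder trained on 80-dimensional mels can consume a 240-dimensional output is a separate question that this sketch does not address.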

convert_batch.py with pre-trained models works well as you described in README.md.

Other details of my situation are:

  • Windows 10, PowerShell
  • pytorch 1.7.1 + cu110
  • torchaudio 0.7.1
  • sox 1.4.1
  • tqdm 4.42.0
  • librosa 0.8.1
