howard1337 / s2vc

Python 100.00%

s2vc's Issues

Could you provide ppg-extracting code?

Dear author,

In your paper, you mention that you extracted PPG and SSL features with the s3prl toolkit. However, I cannot find how to extract PPGs in s3prl. Could you provide the code or guidelines for extracting PPGs? Thanks a lot!

What are vocoder-ckpt-*.pt?

You release the following vocoder checkpoints:

vocoder-ckpt-apc.pt
vocoder-ckpt-cpc.pt
vocoder-ckpt-wav2vec2.pt

What are they?

Are they vocoders fine-tuned on the output of a particular model? I didn't see that described in the paper. Why is this needed if the S2VC output is a mel? If it's because different models produce different mels, do you use vocoder-ckpt-cpc.pt when the target model is cpc? And if so, how did you do the fine-tuning?

Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl

Running convert_batch.py throws ValueError: Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl that originates from

S2VC/data/feature_extract.py

Line 18 in 8a6dceb

 torch.hub.load("s3prl/s3prl:f2114342ff9e813e18a580fa41418aee9925414e", feature_name, refresh=True).eval().to(device) 

  File "convert_batch.py", line 61, in main
    src_feat_model = FeatureExtractor(src_feat_name, wav2vec_path, device)
  File "/deepmind/experiments/howard1337/s2vc/data/feature_extract.py", line 18, in __init__
    torch.hub.load("s3prl/s3prl:f2114342ff9e813e18a580fa41418aee9925414e", feature_name, refresh=True).eval().to(device)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 402, in load
    repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 190, in _get_cache_or_reload
    _validate_not_a_forked_repo(repo_owner, repo_name, branch)
  File "/storage/usr/conda/envs/s2vc/lib/python3.8/site-packages/torch/hub.py", line 160, in _validate_not_a_forked_repo
    raise ValueError(f'Cannot find {branch} in https://github.com/{repo_owner}/{repo_name}. '
ValueError: Cannot find f2114342ff9e813e18a580fa41418aee9925414e in https://github.com/s3prl/s3prl. If it's a commit from a forked repo, please call hub.load() with forked repo directly.

Any idea on how to solve this?
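
One possible workaround, sketched here under assumptions, is to bypass the fork check that raises this error. The traceback shows that the installed torch.hub passes a skip_validation argument through to _get_cache_or_reload, so calling torch.hub.load with skip_validation=True may avoid _validate_not_a_forked_repo. Whether the pinned commit can still be downloaded afterwards is not confirmed here. The helper name load_s3prl_feature and its loader parameter are hypothetical; loader exists only so the call can be exercised without network access.

```python
from typing import Any, Callable, Optional

S3PRL_REPO = "s3prl/s3prl"
PINNED_COMMIT = "f2114342ff9e813e18a580fa41418aee9925414e"


def load_s3prl_feature(
    feature_name: str,
    commit: Optional[str] = PINNED_COMMIT,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load an s3prl upstream through torch.hub, skipping the fork check.

    Hypothetical helper: `loader` defaults to torch.hub.load and is a
    parameter only so a stand-in can be used for testing.
    """
    if loader is None:
        import torch  # imported lazily so this module loads without torch

        loader = torch.hub.load

    # "owner/repo:ref" pins a ref; plain "owner/repo" leaves the ref unpinned.
    repo = f"{S3PRL_REPO}:{commit}" if commit else S3PRL_REPO

    # skip_validation=True is consumed by torch.hub.load itself;
    # refresh=True is forwarded to the s3prl entrypoint, as in feature_extract.py.
    return loader(repo, feature_name, refresh=True, skip_validation=True)
```

As in feature_extract.py, the returned model would still need .eval().to(device). If the pinned commit remains unreachable, passing commit=None drops the pin, at the cost of no longer matching the revision the checkpoints were trained against.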

Can you provide a pre-trained model?

Checkpoints for cpc-mel and mel-cpc?

Do you mind providing checkpoints for cpc-mel and mel-cpc, and describing how to use them?

Is SourceEncoder dead code?

I don't see any code using SourceEncoder.

Is this dead code? Or is it part of the paper's model, with more code to be released?

Trying your pre-trained model to convert a wav file to another's voice

Hi,
I am trying to use your pre-trained model to convert one voice to another.
The parts of convert_batch.py that I changed are below (I changed the paths ...):

def parse_args():
    """Parse command-line arguments."""
    parser = ArgumentParser()
    parser.add_argument("info_path", type=str)
    parser.add_argument("output_dir", type=str, default=".")
    parser.add_argument("-c", "/content/S2VC/chckpt",
                        default="checkpoints/cpc-cpc.pt")
    parser.add_argument("-s", "src_feat_name", default="cpc")
    parser.add_argument("-r", "ref_feat_name", default="cpc")
    parser.add_argument("-w", "/content/S2VC/wav2vec_small.pt",
                        default="checkpoints/wav2vec_small.pt")
    parser.add_argument("-v", "/content/S2VC/wav2vec_small.pt",
                        default="checkpoints/vocoder.pt")

    parser.add_argument("--sample_rate", type=int, default=16000)

    return vars(parser.parse_args())

The error is shown below:
File "", line 1
python /content/S2VC/convert_batch.py
^
SyntaxError: incomplete input

What should I do to fix it?
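
The listing in this issue puts file paths where argparse expects option names. A minimal sketch of the usual pattern follows: option names stay fixed in the script and the paths are supplied as values on the command line. The long option names (--ckpt_path, --src_feat_name, --ref_feat_name, --wav2vec_path, --vocoder_path) are assumptions, since the original names are not shown here, and the argv parameter is added only so the parser can be called directly.

```python
from argparse import ArgumentParser


def parse_args(argv=None):
    """Parse command-line arguments (long option names here are assumed)."""
    parser = ArgumentParser()
    parser.add_argument("info_path", type=str)
    # nargs="?" makes the positional optional so its default can apply.
    parser.add_argument("output_dir", type=str, nargs="?", default=".")
    parser.add_argument("-c", "--ckpt_path", default="checkpoints/cpc-cpc.pt")
    parser.add_argument("-s", "--src_feat_name", default="cpc")
    parser.add_argument("-r", "--ref_feat_name", default="cpc")
    parser.add_argument("-w", "--wav2vec_path", default="checkpoints/wav2vec_small.pt")
    parser.add_argument("-v", "--vocoder_path", default="checkpoints/vocoder.pt")
    parser.add_argument("--sample_rate", type=int, default=16000)
    # argv=None falls back to sys.argv[1:], the normal command-line case.
    return vars(parser.parse_args(argv))
```

With this layout, the paths would be passed as values, for example -c /content/S2VC/chckpt -w /content/S2VC/wav2vec_small.pt. The SyntaxError itself looks like what Python reports when a shell command is entered at a Python prompt or in a notebook cell, so the script presumably has to be started from a shell (in a notebook, by prefixing the command with !).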

Training with other features (apc, timit_posteriorgram, etc.) does not work

I tried training with features other than cpc on my own corpus.
However, the training script fails at the loss function (train.py, line 69).
I found that the size of the output vector (out) is hard-coded, which makes it inconsistent with the size of the target mel spectrogram for other features.

The sizes of some of the model's vectors are:

apc case: Input dim: 512, Reference dim: 512, Target dim: 240
cpc case: Input dim: 256, Reference dim: 256, Target dim: 80

I prepared the input feature vectors by using preprocess.py, e.g. python .\preprocess.py (my own corpus) apc .\checkpoints\wav2vec_small.pt processed/apc.

I modified the model by changing the vector sizes, and train.py now runs.
In model.py, in __init__() of the S2VC class, I replaced the hard-coded 80 with a function argument and pass in the size of the mel vector.
However, I cannot tell whether this modification is appropriate, as I am not familiar with NLP.

convert_batch.py with pre-trained models works well as you described in README.md.

Other details of my situation are:

Windows 10, PowerShell
pytorch 1.7.1 + cu110
torchaudio 0.7.1
sox 1.4.1
tqdm 4.42.0
librosa 0.8.1
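
The dimension mismatch reported in this issue can be caught before training starts. The sketch below is not taken from the repository: it reads the feature sizes from array shapes and compares the model's output size with the target size. The names FeatureDims, infer_dims and check_output_dim are hypothetical, shapes are assumed to be (frames, dim), and the frame counts are arbitrary.

```python
from typing import NamedTuple, Sequence


class FeatureDims(NamedTuple):
    input_dim: int
    ref_dim: int
    target_dim: int


def infer_dims(src_shape: Sequence[int], ref_shape: Sequence[int],
               tgt_shape: Sequence[int]) -> FeatureDims:
    """Take the last axis of each (frames, dim) shape as the feature size."""
    return FeatureDims(src_shape[-1], ref_shape[-1], tgt_shape[-1])


def check_output_dim(model_out_dim: int, dims: FeatureDims) -> None:
    """Raise early if the model's output size differs from the target size."""
    if model_out_dim != dims.target_dim:
        raise ValueError(
            f"model outputs {model_out_dim} dims but target has {dims.target_dim}"
        )


# Feature sizes reported in this issue; frame counts are made up.
cpc = infer_dims((120, 256), (90, 256), (120, 80))
apc = infer_dims((120, 512), (90, 512), (120, 240))

check_output_dim(80, cpc)       # passes: a fixed 80 matches the cpc target
try:
    check_output_dim(80, apc)   # raises: a fixed 80 does not match 240
except ValueError as err:
    mismatch = str(err)
```

Passing dims.target_dim into the model constructor instead of a fixed 80 corresponds to the change described in this issue; whether that change is correct for apc features is the open question here, and the sketch does not settle it.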
