How to fine-tune SID on the pretrained model? (speecht5 issue, closed)

microsoft commented on May 17, 2024

How to fine-tune SID on the pretrained model?

from speecht5.

Comments (11)

mechanicalsea commented on May 17, 2024

It seems USER_DIR was not given.
Set USER_DIR to the speecht5 code directory so that fairseq adds the 'speecht5' task to its task list.
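
Before launching, it can help to sanity-check the directory passed as `--user-dir`. The sketch below assumes that fairseq imports this directory as a Python package, so it should exist and contain an `__init__.py`. `check_user_dir` is a hypothetical helper written for illustration; it is not part of fairseq or SpeechT5.

```python
import os


def check_user_dir(user_dir):
    """Return a list of human-readable problems found with `user_dir`."""
    problems = []
    if not user_dir:
        problems.append("USER_DIR is empty or unset")
        return problems
    path = os.path.abspath(user_dir)
    if not os.path.isdir(path):
        problems.append(f"{path} is not a directory")
    elif not os.path.isfile(os.path.join(path, "__init__.py")):
        # Assumption: the task is registered when the package's __init__.py runs.
        problems.append(f"{path} has no __init__.py")
    return problems


if __name__ == "__main__":
    # Reads the same USER_DIR environment variable the shell script would use.
    for problem in check_user_dir(os.environ.get("USER_DIR", "")):
        print("warning:", problem)
```

An empty result means the directory at least looks importable; it does not prove that the 'speecht5' task registers correctly.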

haha010508 commented on May 17, 2024

This is my code:

```bash
# Paths to the fine-tuned SID checkpoint, the manifests, and the speecht5 code.
CHECKPOINT_PATH=/project/SpeechT5/SpeechT5/pretrained_models/speecht5_sid.pt
DATA_ROOT=/project/SpeechT5/SpeechT5/manifest
SUBSET=test
USER_DIR=/project/SpeechT5/SpeechT5/speecht5
RESULTS_PATH=/project/SpeechT5/SpeechT5/experimental/s2c/results

mkdir -p ${RESULTS_PATH}

# Each continued line needs a trailing backslash.
python scripts/generate_class.py ${DATA_ROOT} \
  --gen-subset ${SUBSET} \
  --user-dir ${USER_DIR} \
  --log-format json \
  --task speecht5 \
  --t5-task s2c \
  --path ${CHECKPOINT_PATH} \
  --results-path ${RESULTS_PATH} \
  --batch-size 1 \
  --max-speech-positions 8000 \
  --sample-rate 16000 | tee -a ${RESULTS_PATH}/generate-class.txt
```

If I debug the code, I get this error:
python -m ipdb scripts/generate_class.py ...
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)

If I run the code normally, I get this error:
soundfile.LibsndfileError: <exception str() failed>
Does this mean the wav file was not found? If so, how do I specify the file path? I have already downloaded VoxCeleb1.

Why can't I debug the code, and why do debugging and running give different errors?
Thanks very much!
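
One thing worth checking for the `soundfile` error is whether the audio paths listed in the manifest actually exist on the local machine. The sketch below is only a guess at the layout: it assumes a fairseq-style `.tsv` manifest whose first line is the audio root directory and whose later lines start with a tab-separated relative path. `find_missing_audio` is a hypothetical helper, and the real SpeechT5 manifest format should be confirmed before relying on it.

```python
import os


def find_missing_audio(manifest_path):
    """Return manifest entries whose audio file does not exist on disk."""
    missing = []
    with open(manifest_path, encoding="utf-8") as f:
        # Assumption: the first line holds the root directory of the audio files.
        root = f.readline().strip()
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Assumption: the first tab-separated column is the relative path.
            rel_path = line.split("\t")[0]
            if not os.path.isfile(os.path.join(root, rel_path)):
                missing.append(rel_path)
    return missing
```

If every entry is reported missing, the root line in the manifest probably points at a directory from another machine and needs to be changed to the local VoxCeleb1 location.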

mechanicalsea commented on May 17, 2024

The import error looks like the fairseq issue "Multi-GPU training doesn't work when --user-dir specified" (facebookresearch/fairseq#4875). Move or link the USER_DIR directory into fairseq/examples and use that location as USER_DIR.

haha010508 commented on May 17, 2024

> The import error looks like the fairseq issue "Multi-GPU training doesn't work when --user-dir specified" (facebookresearch/fairseq#4875). Move or link the USER_DIR directory into fairseq/examples and use that location as USER_DIR.

Thanks for your reply. I tried it, but got the same error.

haha010508 commented on May 17, 2024

I think this is a bug. The line
from fairseq import metrics, search, tokenizer, utils
raises this error:
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
but the metrics file is in fairseq/logging.

mechanicalsea commented on May 17, 2024

> I think this is a bug. The line `from fairseq import metrics, search, tokenizer, utils` raises `ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)`, but the metrics file is in fairseq/logging.

It seems to be an issue caused by the torch version. I ran into the same issue when I reimplemented SpeechT5 in a new environment.
Could you provide some details about your environment?
By the way, I usually run SpeechT5 with torch 1.10.x.

haha010508 commented on May 17, 2024

The issue is caused by fairseq: you need to move metrics.py and meters.py from fairseq/logging to the fairseq folder, and then the error disappears. My torch version is 2.0.0, but I did not install it myself; it was installed by fairseq or espnet.
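
Moving files inside an installed package is fragile, so a less invasive workaround may be to try both import locations from the calling script. The sketch below assumes the module lives at `fairseq/logging/metrics.py`, as reported above. `import_first_available` and `load_fairseq_metrics` are hypothetical helpers, not fairseq APIs, and this has not been verified against the environment in this thread.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


def load_fairseq_metrics():
    """Try the top-level location first, then the fairseq/logging location."""
    return import_first_available(["fairseq.metrics", "fairseq.logging.metrics"])
```

A script would then call `metrics = load_fairseq_metrics()` instead of `from fairseq import metrics`.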

haha010508 commented on May 17, 2024

By the way, can VoxCeleb1-O be used to evaluate the speaker classification performance? What is the EER? Usually we enroll some speakers from the dataset; at test time we extract an embedding, compute its cosine similarity with the enrolled speakers' embeddings, and decide whether they are the same speaker. Speaker verification does not use a speaker classification method. So, compared with ECAPA-TDNN, how does the speaker_sid model perform?

mechanicalsea commented on May 17, 2024

> By the way, can VoxCeleb1-O be used to evaluate the speaker classification performance? What is the EER? Usually we enroll some speakers from the dataset; at test time we extract an embedding, compute its cosine similarity with the enrolled speakers' embeddings, and decide whether they are the same speaker. Speaker verification does not use a speaker classification method. So, compared with ECAPA-TDNN, how does the speaker_sid model perform?

For SID, the fine-tuned SpeechT5 achieves 96.46% accuracy. The ECAPA-TDNN paper did not report VoxCeleb1 SID results, which makes it difficult to compare with SpeechT5.
For ASV (which reports EER), the fine-tuned SpeechT5 was not evaluated on this task. To compare SpeechT5 with ECAPA-TDNN, we would first extract speaker embeddings from SpeechT5. Generally speaking, we can treat the hidden state just before the input of the decoder's classifier as the speaker embedding, which makes a comparison with ECAPA-TDNN possible. Alternatively, we could build a speaker model like Transformer variant (a) to obtain speaker embeddings.
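
The verification scoring described in the question can be sketched as follows: cosine similarity between two embeddings, then an approximate EER found by sweeping a threshold over the observed scores. This is a minimal pure-Python illustration on made-up numbers, not the evaluation code used for SpeechT5 or ECAPA-TDNN; the function names and the toy embeddings are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate EER; labels are 1 (same speaker) or 0 (different speaker)."""
    targets = [s for s, l in zip(scores, labels) if l == 1]
    impostors = [s for s, l in zip(scores, labels) if l == 0]
    if not targets or not impostors:
        raise ValueError("need at least one target and one impostor trial")
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score reaches the threshold.
        far = sum(s >= threshold for s in impostors) / len(impostors)
        frr = sum(s < threshold for s in targets) / len(targets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy embeddings: one enrolled speaker and two test utterances.
    enrolled = [0.9, 0.1, 0.3]
    same_speaker = [0.8, 0.2, 0.4]
    other_speaker = [-0.2, 0.9, 0.1]
    scores = [cosine_similarity(enrolled, same_speaker),
              cosine_similarity(enrolled, other_speaker)]
    print("EER on toy trials:", equal_error_rate(scores, [1, 0]))
```

With real data, the scores would come from the VoxCeleb1-O trial list, one score per enrollment/test pair.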

haha010508 commented on May 17, 2024

So we can get the speaker embedding from this line:

SpeechT5/SpeechT5/speecht5/models/speecht5.py

Line 1183 in 7134e96

decoder_out, embed = self.speaker_decoder_postnet(decoder_output.mean(1))

Is that right? Is it a 768-dimensional vector?

mechanicalsea commented on May 17, 2024

> So we can get the speaker embedding from this line:
>
> SpeechT5/SpeechT5/speecht5/models/speecht5.py
>
> Line 1183 in 7134e96
>
> decoder_out, embed = self.speaker_decoder_postnet(decoder_output.mean(1))
>
> Is that right? Is it a 768-dimensional vector?

yes
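
For readers unfamiliar with the `.mean(1)` call in that line, the sketch below shows the pooling step in plain Python. It assumes `decoder_output` has shape (batch, time, hidden), so averaging over axis 1 gives one fixed-size vector per utterance. `mean_pool_time` is a hypothetical stand-in for the tensor operation, not code from SpeechT5.

```python
def mean_pool_time(batch):
    """Average a (batch, time, dim) nested list over the time axis."""
    pooled = []
    for utterance in batch:
        num_frames = len(utterance)
        dim = len(utterance[0])
        # One value per hidden dimension, averaged across all frames.
        pooled.append([sum(frame[d] for frame in utterance) / num_frames
                       for d in range(dim)])
    return pooled


if __name__ == "__main__":
    # One utterance with two frames of size 2 (the real hidden size is 768).
    print(mean_pool_time([[[1.0, 2.0], [3.0, 4.0]]]))
```

The pooled vector is what gets passed to `speaker_decoder_postnet`, which returns both the class output and `embed`.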
