Comments (11)

mechanicalsea commented on May 17, 2024

It seems USER_DIR was not given.
Set USER_DIR to the speecht5 code directory so that fairseq adds the 'speecht5' task to its task list.

from speecht5.

haha010508 commented on May 17, 2024

This is my code:
`CHECKPOINT_PATH=/project/SpeechT5/SpeechT5/pretrained_models/speecht5_sid.pt
DATA_ROOT=/project/SpeechT5/SpeechT5/manifest
SUBSET=test
USER_DIR=/project/SpeechT5/SpeechT5/speecht5
RESULTS_PATH=/project/SpeechT5/SpeechT5/experimental/s2c/results

mkdir -p ${RESULTS_PATH}

python scripts/generate_class.py ${DATA_ROOT} \
  --gen-subset ${SUBSET} \
  --user-dir ${USER_DIR} \
  --log-format json \
  --task speecht5 \
  --t5-task s2c \
  --path ${CHECKPOINT_PATH} \
  --results-path ${RESULTS_PATH} \
  --batch-size 1 \
  --max-speech-positions 8000 \
  --sample-rate 16000 | tee -a ${RESULTS_PATH}/generate-class.txt`

If I debug the code, I get this error:
python -m ipdb scripts/generate_class.py ...
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)

If I run the code normally, I get this error:
soundfile.LibsndfileError: <exception str() failed>
Does this mean the wav file was not found? If so, how do I specify the file path? I have already downloaded VoxCeleb1 (vox1).

Why can't I debug the code, and why do debugging and running the code give different errors?
Thanks very much!
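
One way to investigate why the two invocations fail differently is to check which `fairseq` Python resolves in each case. The "(unknown location)" suffix is printed when the package that was found has no file path, which can happen, for example, when a plain directory named fairseq without an `__init__.py` is picked up as a namespace package. Whether that is the cause here is an assumption. A minimal sketch using only the standard library (`describe_module` is a hypothetical helper name, not part of fairseq or SpeechT5):

```python
import importlib.util


def describe_module(name):
    """Report where Python would load the top-level module `name` from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "search_paths": []}
    return {
        "found": True,
        # Typically None for a namespace package (a directory without __init__.py).
        "origin": spec.origin,
        "search_paths": list(spec.submodule_search_locations or []),
    }


if __name__ == "__main__":
    # Run this once normally and once under the debugger, from the same directory.
    print(describe_module("fairseq"))
```

If the two runs report different origins or search paths, the interpreter is not seeing the same fairseq package in both cases.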

mechanicalsea commented on May 17, 2024

The import error looks like the known problem that multi-GPU training does not work when --user-dir is specified. Move or link USER_DIR into the fairseq/examples directory and pass that location as USER_DIR. The issue is reported at facebookresearch/fairseq#4875.

haha010508 commented on May 17, 2024

Thanks for your reply. I tried it, but got the same error.

haha010508 commented on May 17, 2024

I think this is a bug. This import:
from fairseq import metrics, search, tokenizer, utils
raises this error:
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
However, the metrics file is located in fairseq/logging.

mechanicalsea commented on May 17, 2024

It seems to be an issue caused by the torch version. The same issue occurred when I reimplemented SpeechT5 in a new environment.
Could you provide some details of your environment?
By the way, I usually run SpeechT5 with torch 1.10.x.

haha010508 commented on May 17, 2024

The issue is caused by fairseq: you need to move metrics.py and meters.py from fairseq/logging into the fairseq folder, and then the error disappears. My torch version is 2.0.0, but I did not install this version myself; it was installed by fairseq or espnet.
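
A less invasive alternative to moving files inside the installed package might be to fall back to the module's location under fairseq/logging at import time. The sketch below is untested against this particular setup; `import_first` is a hypothetical helper, and the module paths in the commented usage line are taken from the file locations mentioned in this thread:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Assumed usage in the failing script (requires fairseq to be installed):
# metrics = import_first("fairseq.metrics", "fairseq.logging.metrics")
```

This leaves the installed fairseq untouched, so upgrading or reinstalling it does not silently undo the workaround.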

haha010508 commented on May 17, 2024

By the way, can VoxCeleb1-O be used to evaluate the speaker classification performance? What is the EER? Usually we enroll some speakers from the dataset, and at test time we extract an embedding and compute its cosine similarity with the enrolled speakers' embeddings to decide whether they are the same speaker. Speaker verification does not use a speaker classification method. So, compared with ECAPA-TDNN, how does the speaker_sid model perform?

mechanicalsea commented on May 17, 2024

For SID, the fine-tuned SpeechT5 achieves 96.46% accuracy. The ECAPA-TDNN paper did not report VoxCeleb1 SID results, which makes a direct comparison with SpeechT5 difficult.
For ASV (which reports EER), the fine-tuned SpeechT5 was not evaluated on this task. To compare SpeechT5 with ECAPA-TDNN, we would first need to extract speaker embeddings from SpeechT5. Generally speaking, the hidden state fed into the decoder's classifier can be treated as the speaker embedding, which makes a comparison with ECAPA-TDNN possible. Alternatively, we could build a speaker model as Transformer variant (a) to obtain speaker embeddings.
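
Once embeddings are available, the verification scoring described in the question can be sketched as follows. This is an illustrative, standard-library-only sketch and not the SpeechT5 or ECAPA-TDNN evaluation code: `cosine_similarity` and `equal_error_rate` are hypothetical helper names, trial labels are assumed to be 1 for same-speaker and 0 for different-speaker pairs, and the EER is approximated by sweeping the observed scores as thresholds.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(scores, labels):
    """Approximate EER from trial scores and labels (1 = same speaker, 0 = different).

    A trial is accepted when score >= threshold. Every observed score is tried
    as a threshold, and the mean of the false-accept and false-reject rates is
    returned at the threshold where the two rates are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        false_accepts = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

The threshold sweep is quadratic in the number of trials, which is fine for a small check but would need a sorted, cumulative implementation for a full trial list.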

haha010508 commented on May 17, 2024

So we can get the speaker embedding from this line:

decoder_out, embed = self.speaker_decoder_postnet(decoder_output.mean(1))

Is that right? Is it a 768-dimensional vector?

mechanicalsea commented on May 17, 2024

yes
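
For readers unfamiliar with the `.mean(1)` call in the quoted line: assuming `decoder_output` has shape (batch, time, hidden), which is not stated in this thread, it averages the hidden states over the time axis so that each utterance yields a single fixed-size vector. A plain-Python sketch of that pooling step for one utterance (`mean_pool` is a hypothetical helper, not SpeechT5 code):

```python
def mean_pool(frames):
    """Average a sequence of equal-length vectors over the time axis."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]
```

With 768-dimensional hidden states, the pooled result is a single 768-dimensional vector regardless of the utterance length.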
