Comments (11)
It seems USER_DIR was not given. Set USER_DIR to the speecht5 code directory so that fairseq adds the 'speecht5' task to its task list.
from speecht5.
This is my code:
```
CHECKPOINT_PATH=/project/SpeechT5/SpeechT5/pretrained_models/speecht5_sid.pt
DATA_ROOT=/project/SpeechT5/SpeechT5/manifest
SUBSET=test
USER_DIR=/project/SpeechT5/SpeechT5/speecht5
RESULTS_PATH=/project/SpeechT5/SpeechT5/experimental/s2c/results
mkdir -p ${RESULTS_PATH}
python scripts/generate_class.py ${DATA_ROOT} \
  --gen-subset ${SUBSET} \
  --user-dir ${USER_DIR} \
  --log-format json \
  --task speecht5 \
  --t5-task s2c \
  --path ${CHECKPOINT_PATH} \
  --results-path ${RESULTS_PATH} \
  --batch-size 1 \
  --max-speech-positions 8000 \
  --sample-rate 16000 | tee -a ${RESULTS_PATH}/generate-class.txt
```
When I debug the code, I get this error:
python -m ipdb scripts/generate_class.py ...
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
When I run the code normally, I get a different error:
soundfile.LibsndfileError: <exception str() failed>
Does this mean the wav files cannot be found? If so, how do I specify the file path? I have already downloaded VoxCeleb1.
I don't understand why I cannot debug the code, or why debugging and running it produce different errors.
Thanks very much!
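One plausible cause of the soundfile error is a manifest entry that does not resolve to a file on disk. The following is a minimal sketch for checking that, not something taken from the SpeechT5 scripts. It assumes a fairseq-style manifest where the first line is the audio root directory and each later line is tab-separated with a relative path in the first column; the function name `find_missing_audio` is hypothetical.

```python
from pathlib import Path


def find_missing_audio(manifest_path):
    """Return manifest entries whose audio file does not exist on disk.

    Assumed layout (not confirmed by the thread): line 1 holds the audio
    root directory; every later line is tab-separated with a path relative
    to that root in the first column.
    """
    lines = Path(manifest_path).read_text(encoding="utf-8").splitlines()
    if not lines:
        return []
    root = Path(lines[0].strip())
    missing = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        rel_path = line.split("\t")[0].strip()
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

If the returned list is non-empty, the root directory on the first manifest line probably does not match where the VoxCeleb1 audio was extracted.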
The import error looks like the fairseq issue "Multi-GPU training doesn't work when --user-dir specified". Move or link the USER_DIR into the fairseq/examples directory and use that location as USER_DIR. The issue is tracked at facebookresearch/fairseq#4875.
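The "link" option can be sketched as follows. This is an illustration under assumptions, not part of fairseq or SpeechT5: the helper name `link_user_dir` is hypothetical, and `fairseq_root` is assumed to be a fairseq source checkout that already contains an `examples` directory.

```python
import os
from pathlib import Path


def link_user_dir(user_dir, fairseq_root):
    """Symlink user_dir into <fairseq_root>/examples and return the link path."""
    user_dir = Path(user_dir).resolve()
    examples_dir = Path(fairseq_root) / "examples"
    if not examples_dir.is_dir():
        raise FileNotFoundError(f"no examples directory under {fairseq_root}")
    link = examples_dir / user_dir.name
    if link.is_symlink() or link.exists():
        return link  # leave whatever is already there untouched
    os.symlink(user_dir, link, target_is_directory=True)
    return link
```

The returned path is then the value to pass as `--user-dir`.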
Thanks for your reply. I tried it, but I got the same error.
I think this is a bug. The line
from fairseq import metrics, search, tokenizer, utils
raises this error:
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
and the metrics file is in fairseq/logging.
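Python prints "(unknown location)" when the imported package has no `__file__`, which typically happens when the name resolves to a namespace package (a bare directory) rather than an installed library. A small diagnostic along these lines, with the hypothetical helper name `describe_module`, shows where a name such as `fairseq` actually resolves from; running it under both launch methods may explain why they behave differently.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module name resolves from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "search_locations": []}
    return {
        "found": True,
        # origin is None for namespace packages on current Python versions
        "origin": spec.origin,
        "search_locations": list(spec.submodule_search_locations or []),
    }


if __name__ == "__main__":
    print(describe_module("fairseq"))
```

An `origin` of `None` together with a search location that is not the installed package directory would suggest that a local `fairseq` folder is shadowing the library.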
This seems to be an issue caused by the torch version. I ran into it when I re-implemented SpeechT5 in a new environment.
Could you provide some details about your environment?
By the way, I usually run SpeechT5 with torch 1.10.x.
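The environment details requested here can be collected with a short script. This is a sketch only; the function name `environment_report` and the default package list are assumptions, and `importlib.metadata` requires Python 3.8 or later.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("torch", "fairseq", "soundfile")):
    """Collect the Python, platform, and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```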
The issue is caused by fairseq. You need to move metrics.py and meters.py from fairseq/logging to the fairseq folder, and then the error disappears. My torch version is 2.0.0, but I did not install this version myself; it was installed by fairseq or espnet.
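The workaround above can be scripted roughly as follows. Note one deliberate difference: this sketch copies the two files instead of moving them, on the assumption that other code may still import them from `fairseq/logging`. The helper name `copy_logging_modules` is hypothetical, and `fairseq_pkg_dir` is assumed to be the directory that contains the `logging` subfolder.

```python
import shutil
from pathlib import Path


def copy_logging_modules(fairseq_pkg_dir, names=("metrics.py", "meters.py")):
    """Copy the named files from <pkg>/logging into <pkg>; return what was copied."""
    pkg = Path(fairseq_pkg_dir)
    copied = []
    for name in names:
        src = pkg / "logging" / name
        dst = pkg / name
        # Never overwrite an existing file in the package root.
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```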
By the way, have you used VoxCeleb1-O to evaluate speaker classification performance, and what is the EER? Usually we enroll some speakers from the dataset; at test time we extract an embedding, compute its cosine similarity with the enrolled speakers' embeddings, and decide whether they are the same speaker. Speaker verification does not use the speaker classification method. So how does the SID model perform compared with ECAPA-TDNN?
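The enroll-then-score procedure described in this comment can be sketched in plain Python as follows. The function names and the threshold value of 0.5 are placeholders for illustration; in practice the threshold would be tuned on development trials, and the embeddings would come from a speaker model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def verify(enrolled_embedding, test_embedding, threshold=0.5):
    """Return (same_speaker, score) for one verification trial."""
    score = cosine_similarity(enrolled_embedding, test_embedding)
    return score >= threshold, score
```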
For SID, the fine-tuned SpeechT5 achieves 96.46% accuracy. The ECAPA-TDNN paper did not report VoxCeleb1 SID results, which makes a direct comparison with SpeechT5 difficult.
For ASV (reported as EER), we did not evaluate the fine-tuned SpeechT5 on this task. To compare SpeechT5 with ECAPA-TDNN, we would first extract speaker embeddings from SpeechT5. Generally speaking, the hidden state fed into the decoder's classifier can be treated as the speaker embedding, which makes a comparison with ECAPA-TDNN possible. Alternatively, we could build a speaker model as Transformer variant (a) to obtain speaker embeddings.
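For reference, EER is the operating point where the false acceptance rate equals the false rejection rate over a list of scored trials. A simple threshold sweep that approximates it is sketched below; it is a quadratic-time illustration with a hypothetical function name, not the evaluation code used for SpeechT5 or ECAPA-TDNN.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for a same-speaker trial, 0 for a different-speaker trial.
    Returns (eer, threshold), where a trial is accepted if score >= threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")
    best = None
    for threshold in sorted(set(scores)):
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]
```

The scores here would be cosine similarities between embedding pairs from the VoxCeleb1-O trial list.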
So we can get the speaker embedding from this line:
SpeechT5/SpeechT5/speecht5/models/speecht5.py
Line 1183 in 7134e96
Is that right? Is it a 768-dimensional vector?
yes