Hi Willy,
I think the task you are mentioning would be open-set speaker identification.
Assuming you're using a neural network to do it, there are two approaches.
- Train a neural network with N speakers as output classes. At test time, feed the input to the same network and select the class with the highest posterior probability. This is closed-set speaker identification, because the network cannot classify newly emerging speakers.
- Use a pre-trained speaker embedding extractor to enroll N speakers. Then, for each input, compare the input utterance's speaker embedding with the N speakers' embeddings and select the speaker with the highest similarity. This is open-set speaker identification; I believe it matches your description.
In speaker verification, on the other hand, you normally have a claimed (or target) speaker and an input utterance. You compare the two and make a binary decision about whether they come from the same speaker.
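To make the two decisions concrete, here is a minimal pure-Python sketch, assuming the embeddings have already been extracted by a pre-trained extractor. Cosine scoring and the 0.5 threshold are illustrative choices, not RawNet specifics:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def identify(test_emb, enrolled, threshold=0.5):
    """Open-set identification: pick the enrolled speaker with the
    highest similarity, or None when no score clears the threshold
    (i.e. the utterance likely comes from an unseen speaker)."""
    best_spk = max(enrolled, key=lambda s: cosine(test_emb, enrolled[s]))
    if cosine(test_emb, enrolled[best_spk]) < threshold:
        return None
    return best_spk

def verify(claimed_emb, test_emb, threshold=0.5):
    """Verification: binary accept/reject of the claim that the test
    utterance comes from the claimed (target) speaker."""
    return cosine(claimed_emb, test_emb) >= threshold
```

For example, with `enrolled = {"A": [1.0, 0.0], "B": [0.0, 1.0]}`, `identify([0.9, 0.1], enrolled)` returns `"A"`, while `verify(enrolled["B"], [0.9, 0.1])` rejects the claim.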
from rawnet.
Thanks, Jungjee!
You are right: my task is open-set speaker verification, as in item 2 of your description.
A few more questions:
When training the speaker embedding extractor, I used X speakers.
In the verification case, we have N speakers (including the target speaker T), each with M enrollment utterances, so I have N*M embeddings. For a new input utterance from speaker T (not included in T's enrollment set), I calculate its embedding as well, compare it with the N*M embeddings, and make a binary decision.
- I think X >>> N. For a given N, say N=10, what value of X is suitable for acceptable performance? Does the size of X strongly influence the model size?
- None of the N enrollment speakers is included in X. To get better performance, should the N speakers' utterances also be used to train the model, e.g. by fine-tuning?
- Can M=1? I think a bigger M will give better accuracy.
Could you please share your opinion on these questions?
Thanks,
Willy
Maybe I'm the one who's confused. In my understanding, if what you want is not open-set speaker identification but speaker verification, you can simply compare the speaker embeddings of the enrollment utterances and the test utterance. If there are multiple enrollment utterances (from your explanation, I think there are M of them), you can average their speaker embeddings to derive one speaker embedding representing the speaker.
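As a sketch, assuming the M per-utterance embeddings are already extracted as plain K-dimensional vectors, the averaging step is just a per-dimension mean:

```python
def average_embedding(embeddings):
    """Collapse M per-utterance embeddings (each of dimension K) into a
    single speaker embedding by averaging each dimension."""
    m = len(embeddings)
    return [sum(dim) / m for dim in zip(*embeddings)]

# e.g. two 3-dimensional enrollment embeddings for one speaker
speaker_emb = average_embedding([[1.0, 2.0, 0.0], [3.0, 4.0, 1.0]])
# speaker_emb == [2.0, 3.0, 0.5]
```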
Hi Jungjee,
Thanks for your reply. In my understanding, "open-set" means the model can be used to recognize any speaker, including speakers that were never included in the model's training set.
I am doing speaker verification. In the system, there are N speakers in total. For speaker n, there are M utterances used to enroll, so I get M embeddings for speaker n. Can I average the M embeddings into one? For example, if the embeddings are K-dimensional, can I just average the M vectors and save a single K-dimensional vector?
Alternatively, can I save all M embeddings and use them all? When testing an utterance, I would get a score against each of the M embeddings, average the scores, and use that as the final score to decide whether to accept or reject the test utterance. The problem is that scoring against M embeddings is much slower than scoring against only one.
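For what it's worth, with cosine scoring the two options coincide: by linearity of the dot product, averaging the M per-embedding scores equals a single dot product between the normalized test embedding and the (precomputable) mean of the normalized enrollment embeddings. A toy sketch with made-up vectors:

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# M enrollment embeddings for one speaker (toy 3-dim example)
enroll = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0], [0.9, -0.1, 0.1]]
test = [0.7, 0.05, 0.1]

# Strategy 1: score the test embedding against each enrollment
# embedding, then average the M scores.
t_hat = normalize(test)
scores = [dot(t_hat, normalize(e)) for e in enroll]
avg_score = sum(scores) / len(scores)

# Strategy 2: precompute the mean of the normalized enrollment
# embeddings once, then take a single dot product per test utterance.
mean_hat = [sum(col) / len(enroll) for col in zip(*map(normalize, enroll))]
single = dot(t_hat, mean_hat)

# By linearity of the dot product, the two results are identical.
assert abs(avg_score - single) < 1e-9
```

So the mean of the normalized enrollment embeddings can be saved once per speaker, and test-time cost stays at one comparison regardless of M.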
Please let me know your opinion.
Thanks,
Willy