Comments (9)
Thank you very much, the problem seems to be gone!
from links_clustering.
Hi gaushh,
Thank you for opening an issue. I just pushed an update to the master branch that I hope fixes the bug that you ran into. Please try it again and let me know how it goes. If you continue to have problems, it is helpful to have a script that only depends on static data. In this case, it depends on data streaming from a microphone, which I can't reproduce exactly.
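One way to turn a streaming session into static data is to write each embedding to a file as it arrives and replay the file later. The following is only a minimal sketch under assumptions: `record_vectors`, `replay_vectors`, the file name, and the sample vectors are hypothetical and are not part of this repository.

```python
import json
import tempfile
from pathlib import Path


def record_vectors(vectors, path):
    """Write each vector as one JSON line so the stream can be replayed later."""
    with open(path, "w", encoding="utf-8") as f:
        for vec in vectors:
            f.write(json.dumps([float(x) for x in vec]) + "\n")


def replay_vectors(path):
    """Yield the recorded vectors in their original order."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


# Hypothetical stand-in for embeddings captured during a microphone session.
captured = [[0.1, 0.9], [0.2, 0.8], [0.95, 0.05]]

recording = Path(tempfile.gettempdir()) / "embeddings.jsonl"
record_vectors(captured, recording)

# Feeding the replayed vectors to the clustering code in the same order
# should make a failure repeatable.
replayed = list(replay_vectors(recording))
```

A recording like this can be attached to an issue together with a short script that feeds the vectors to the clusterer one at a time.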
For your questions:
- I think this was caused by one of two bugs that I just fixed. Please pull from the master branch and try again.
- Tuning hyperparameters is difficult in general. One way is to examine your data, understand what each hyperparameter means, and make careful choices based on theory. But, another method that might work is choosing randomly and measuring the outcomes to find the best configuration. The former is generally better, but don't be ashamed to try the latter if it gets you where you need to be.
- I think you are doing online clustering of speaker embeddings. This is related to a task called 'diarization', i.e. 'who spoke when'. Are you familiar with the pyannote library? It has some easy-to-use building blocks for diarization models. Some of their tools might be helpful.
https://github.com/pyannote/pyannote-audio
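As a rough illustration of the "choose randomly and measure" approach from the second point, here is a minimal random search sketch. Everything in it is an assumption made for the example: the parameter names, the ranges, and `toy_score` are hypothetical, and in practice the score would come from running the clusterer on labeled data.

```python
import random


def random_search(score_fn, search_space, n_trials=50, seed=0):
    """Sample configurations uniformly at random and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is repeatable
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical objective with its maximum at (0.7, 0.8). Replace it with
    # a real clustering quality measure computed on your own data.
    return -(
        (params["cluster_threshold"] - 0.7) ** 2
        + (params["subcluster_threshold"] - 0.8) ** 2
    )


# Hypothetical hyperparameter names and ranges.
space = {
    "cluster_threshold": (0.0, 1.0),
    "subcluster_threshold": (0.0, 1.0),
}
best_params, best_score = random_search(toy_score, space, n_trials=200)
```

Random search is easy to run, but it says nothing about why a configuration works, so it complements rather than replaces understanding the hyperparameters.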
from links_clustering.
Thanks for the prompt response @QEDan
- The same issue still persists. (i.e. I'm still getting the same error)
- I will try to understand the algorithm again by going through the paper.
- To my knowledge, pyannote doesn't provide online speaker diarization (since they use t-SNE for clustering), which is why I planned on using links clustering.
from links_clustering.
Hi @QEDan
I am facing the same issue while trying to run the links clustering code you shared. Of course, I tried to understand the code while referring to the paper. I have a few doubts:
- At line 175 you wrote code that raises an error, but according to the paper (as far as I understood it) this shouldn't happen, since every cluster should contain all of its subclusters. Maybe I don't understand the actual logic, so it would be great if you could explain it.
if connected_sc_idx is None:
    raise ValueError(f"Connected subcluster of {sc_idx} "
                     f"was not found in cluster list of {cl_idx}.")
It would be a great help if you could resolve this issue.
Thanks!
from links_clustering.
I've pushed another bug fix. In this case, during update_cluster(), two subclusters could be merged, and the deleted subcluster was then treated as a severed subcluster by the logic that follows. This allowed edges to exist that didn't make sense. I hope this was the cause of the problems. If not, I find this problem difficult to replicate reliably, so it would be helpful to have a test case that doesn't depend on streaming data.
@gaushh Online speaker diarization doesn't have too many tools available, unfortunately. pyannote only helps with the offline case. One thing that you might find helpful is this thesis on a new task called Low-Latency Speaker Spotting, identifying a target speaker in the lowest possible time: https://www.researchgate.net/publication/338935292_Efficient_speaker_diarization_and_low-latency_speaker_spotting.
The lack of online clustering algorithms is exactly what motivated me to code up this algorithm, and probably why the authors developed it in the first place.
@Suma3 The exception that you mention should never get raised when the algorithm is working correctly. It implies an edge between one subcluster and another subcluster that isn't in the same cluster as the first. So, any time it is raised, it means there is a bug. In the bug I just fixed, there was a way for that to happen because of improperly deleting a merged subcluster.
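The invariant described above (every edge must join two subclusters that belong to the same cluster) can be checked directly. The sketch below uses a simplified, hypothetical layout, with clusters as lists of subcluster labels and edges as pairs of labels. It is not the data structure used in this repository.

```python
def find_cross_cluster_edges(clusters, edges):
    """Return edges whose endpoints are missing or live in different clusters."""
    # Map each subcluster label to the index of the cluster that contains it.
    cluster_of = {
        sc: cl_idx
        for cl_idx, members in enumerate(clusters)
        for sc in members
    }
    violations = []
    for a, b in edges:
        cl_a = cluster_of.get(a)
        cl_b = cluster_of.get(b)
        # An endpoint that is in no cluster (a "ghost") is also a violation.
        if cl_a is None or cl_a != cl_b:
            violations.append((a, b))
    return violations


# Hypothetical state: two clusters, one holding subclusters "a" and "b".
clusters = [["a", "b"], ["c"]]

healthy = find_cross_cluster_edges(clusters, [("a", "b")])
broken = find_cross_cluster_edges(
    clusters, [("a", "b"), ("b", "c"), ("a", "ghost")]
)
```

Here `healthy` is empty, while `broken` holds the edge that crosses clusters and the edge that points at a subcluster that no longer exists. Running a check like this after each update is one way to locate the step that introduces a bad edge.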
from links_clustering.
I will close this for now. Hopefully the fixes have worked for both of you. Please comment if there are still problems.
from links_clustering.
Hi @QEDan
I ran into the same issue today when streaming data from a microphone. Can you help me? I am from Vietnam. Thank you!
Here is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.
from links_clustering.
I found something that looks wrong in this block of code (see the linked image).
After the method call:
self.update_cluster(cl_idx, sc_idx1)
the VS Code hint shows that the code:
self.clusters[cl_idx] = self.clusters[cl_idx][:sc_idx2] + self.clusters[cl_idx][sc_idx2 + 1:]
is blurred, which I take to mean that it will not run.
Can you explain it?
from links_clustering.
Hi @nguyenthienhy,
I think the line is blurred at the end of merge_subclusters() because the function does not make use of self.clusters after that assignment. This is usually helpful for IDEs to highlight in case you forgot to make use of a variable after assignment. But in this case it's okay because the assigned data is used elsewhere in the class.
I just pushed an update that I hope solves the problem. It was possible for a 'ghost' subcluster to remain in the connected subclusters list after it was merged into a different subcluster. Please try it again and let me know if you are still having problems.
It is very difficult for me to reproduce problems based on streaming microphone data since I can't reproduce the input data. If you continue to have problems and are able to record some input data that triggers the issue, that would be helpful.
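To illustrate why the final assignment still matters even though nothing reads it inside the same method, here is a small sketch. `ToyClusters` and `drop_subcluster` are hypothetical names made up for the example, not the repository's class or method.

```python
class ToyClusters:
    """Hypothetical stand-in that keeps clusters as lists of subcluster labels."""

    def __init__(self, clusters):
        self.clusters = clusters

    def drop_subcluster(self, cl_idx, sc_idx):
        # This assignment is the last statement of the method, so nothing
        # reads it here. It still updates the instance state that other
        # methods and callers will see.
        self.clusters[cl_idx] = (
            self.clusters[cl_idx][:sc_idx] + self.clusters[cl_idx][sc_idx + 1:]
        )

    def size(self, cl_idx):
        # Another method of the class that reads the updated state.
        return len(self.clusters[cl_idx])


toy = ToyClusters([["a", "b", "c"]])
toy.drop_subcluster(0, 1)
remaining = toy.clusters[0]
remaining_size = toy.size(0)
```

After the call, `remaining` no longer contains the dropped label, so the line does run and its effect is visible outside the method.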
from links_clustering.