Comments (9)

nguyenthienhy avatar nguyenthienhy commented on August 11, 2024 1

Thank you very much, the problem seems to be gone!

from links_clustering.

QEDan avatar QEDan commented on August 11, 2024

Hi gaushh,

Thank you for opening an issue. I just pushed an update to the master branch that I hope fixes the bug that you ran into. Please try it again and let me know how it goes. If you continue to have problems, it is helpful to have a script that only depends on static data. In this case, it depends on data streaming from a microphone, which I can't reproduce exactly.

For your questions:

  1. I think this was caused by one of two bugs that I just fixed. Please pull from the master branch and try again.
  2. Tuning hyperparameters is difficult in general. One way is to examine your data, understand what each hyperparameter means, and make careful choices based on theory. But, another method that might work is choosing randomly and measuring the outcomes to find the best configuration. The former is generally better, but don't be ashamed to try the latter if it gets you where you need to be.
  3. I think you are doing online clustering of speaker embeddings. This is related to a task called 'diarization', i.e. 'who spoke when'. Are you familiar with the pyannote library? It has some easy-to-use building blocks for diarization models. Some of their tools might be helpful.
    https://github.com/pyannote/pyannote-audio
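
As an illustration of the "choose randomly and measure" approach from point 2, here is a minimal random-search sketch. It is not part of links_clustering: the parameter names and ranges are hypothetical, and `toy_score` is a stand-in objective. In practice the scoring function would run the clusterer on held-out labelled data and return a quality metric.

```python
import random


def random_search(score, space, n_trials=50, seed=0):
    """Sample configurations uniformly from `space` and keep the best one.

    `space` maps a parameter name to a (low, high) range.
    `score` takes a dict of parameters and returns a number (higher is better).
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical parameter names and ranges, for illustration only.
space = {
    "cluster_similarity_threshold": (0.1, 0.9),
    "subcluster_similarity_threshold": (0.1, 0.9),
}


def toy_score(params):
    # Stand-in objective that peaks at (0.5, 0.7); replace with a real metric.
    return -(
        (params["cluster_similarity_threshold"] - 0.5) ** 2
        + (params["subcluster_similarity_threshold"] - 0.7) ** 2
    )


best_params, best_score = random_search(toy_score, space, n_trials=200)
```

Fixing the seed keeps the search reproducible, which makes it easier to compare runs while the rest of the pipeline changes.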

gaushh avatar gaushh commented on August 11, 2024

Thanks for the prompt response @QEDan

  1. The same issue still persists (i.e. I'm still getting the same error).
  2. I will try to understand the algorithm again by going through the paper.
  3. To my knowledge, pyannote doesn't provide online speaker diarization (since they use t-SNE for clustering), which is why I planned on using links clustering.

Suma3 avatar Suma3 commented on August 11, 2024

Hi @QEDan,
I am facing the same issue while trying to run the links clustering code you shared. Of course, I tried to understand the code while referring to the paper. I have a few doubts:

  1. On line 175 you raise an error, but according to the paper (as far as I understood it) this case shouldn't occur: every cluster should contain all of its own subclusters. Maybe I am not understanding the actual logic, so it would be great if you could explain.
    if connected_sc_idx is None:
        raise ValueError(f"Connected subcluster of {sc_idx} "
                         f"was not found in cluster list of {cl_idx}.")
    It would be a great help if you could help resolve this issue.
    Thanks!

QEDan avatar QEDan commented on August 11, 2024

I've pushed another bug fix. In this case, during update_cluster(), two subclusters could be merged, and the deleted subcluster was then treated as a severed subcluster by the logic that follows. This allowed edges to exist that didn't make sense. I hope this was the cause of the problems. If not, I find this problem difficult to replicate reliably, so it would be helpful to have a test case that doesn't depend on streaming data.

@gaushh Online speaker diarization doesn't have too many tools available, unfortunately. pyannote only helps with the offline case. One thing that you might find helpful is this thesis on a new task called Low-Latency Speaker Spotting, identifying a target speaker in the lowest possible time: https://www.researchgate.net/publication/338935292_Efficient_speaker_diarization_and_low-latency_speaker_spotting.

The lack of online clustering algorithms is exactly what motivated me to code up this algorithm, and probably why the authors developed it in the first place.

@Suma3 The exception that you mention should never get raised when the algorithm is working correctly. It implies an edge between one subcluster and another subcluster that isn't in the same cluster as the first. So, any time it is raised, it means there is a bug. In the bug I just fixed, there was a way for that to happen because of improperly deleting a merged subcluster.
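
The invariant described here (every edge joins two subclusters of the same cluster) can also be checked explicitly after each update, which may help narrow down where a stale edge appears. This is a minimal sketch under an assumed data layout, not the library's actual classes: `clusters` is a list of clusters, each a list of subcluster ids, and `edges` maps a subcluster id to the ids it is connected to. Both names and the helper are hypothetical.

```python
def find_invalid_edges(clusters, edges):
    """Return (src, dst) pairs whose endpoints are not in the same cluster.

    clusters: list of lists of subcluster ids (assumed layout).
    edges: dict mapping a subcluster id to an iterable of connected ids.
    """
    # Map each subcluster id to the index of the cluster that contains it.
    owner = {sc: cl_idx for cl_idx, members in enumerate(clusters) for sc in members}
    invalid = []
    for src, neighbours in edges.items():
        for dst in neighbours:
            # An endpoint missing from every cluster (for example a merged-away
            # "ghost" subcluster) is reported as invalid too.
            if src not in owner or dst not in owner or owner[src] != owner[dst]:
                invalid.append((src, dst))
    return invalid


clusters = [["a", "b"], ["c"]]
good_edges = {"a": ["b"], "b": ["a"]}
stale_edges = {"a": ["b", "ghost"], "b": ["a"], "c": ["a"]}

no_problems = find_invalid_edges(clusters, good_edges)
problems = find_invalid_edges(clusters, stale_edges)
```

Calling a check like this after every insertion turns a late ValueError into an earlier, more local failure.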

QEDan avatar QEDan commented on August 11, 2024

I will close this for now. Hopefully the fixes have worked for both of you. Please comment if there are still problems.

nguyenthienhy avatar nguyenthienhy commented on August 11, 2024

Hi @QEDan,
I am facing the same issue today with data streamed from a microphone. Can you help me? I am from Vietnam. Thank you!
Here is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.

nguyenthienhy avatar nguyenthienhy commented on August 11, 2024

I found something that looks unreasonable in this block of code (see the screenshot):
[Screenshot 2021-10-08 110644]
After the call:
self.update_cluster(cl_idx, sc_idx1)
VS Code shows the following code:
self.clusters[cl_idx] = (self.clusters[cl_idx][:sc_idx2]
                         + self.clusters[cl_idx][sc_idx2 + 1:])
as dimmed, which I take to mean that it will not be run.

Can you explain it?

QEDan avatar QEDan commented on August 11, 2024

Hi @nguyenthienhy,
I think the line is blurred at the end of merge_subclusters() because the function does not make use of self.clusters after that assignment. This is usually helpful for IDEs to highlight in case you forgot to make use of a variable after assignment. But, in this case it's okay because the assigned data is used elsewhere in the class.
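
To make that concrete, here is a small stand-alone sketch (a hypothetical `Demo` class, not the library's code) showing that the slice assignment does run and that its effect is visible from another method, even though the assigning method never reads the attribute again:

```python
class Demo:
    """Hypothetical stand-in for the clustering class; not the library's code."""

    def __init__(self):
        self.clusters = [["sc0", "sc1", "sc2"]]

    def remove_subcluster(self, cl_idx, sc_idx):
        # Rebuild the list without the element at sc_idx. This method never
        # reads self.clusters again afterwards, which is the situation in
        # which an editor may dim the assignment.
        self.clusters[cl_idx] = (
            self.clusters[cl_idx][:sc_idx] + self.clusters[cl_idx][sc_idx + 1:]
        )

    def cluster_size(self, cl_idx):
        # A different method observes the result of the assignment.
        return len(self.clusters[cl_idx])


demo = Demo()
demo.remove_subcluster(0, 1)
size_after = demo.cluster_size(0)
```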

I just pushed an update that I hope solves the problem. It was possible for a 'ghost' subcluster to remain in the connected subclusters list after it was merged into a different subcluster. Please try it again and let me know if you are still having problems.

It is very difficult for me to reproduce problems based on streaming microphone data since I can't reproduce the input data. If you continue to have problems and are able to record some input data that triggers the issue, that would be helpful.
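
One possible way to capture such input is to log each vector to a file as it arrives and replay the file later. This is only a sketch using the standard library: `record_vectors` and `replay_vectors` are hypothetical helpers, and the vectors in the demonstration are made up to stand in for live embeddings.

```python
import json
import os
import tempfile


def record_vectors(stream, path):
    """Write each incoming vector to `path` (one JSON list per line) and pass it on."""
    with open(path, "w", encoding="utf-8") as f:
        for vec in stream:
            f.write(json.dumps([float(x) for x in vec]) + "\n")
            f.flush()  # keep the file usable even if the run crashes later
            yield vec


def replay_vectors(path):
    """Yield the recorded vectors in their original order."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Small demonstration with made-up vectors standing in for live embeddings.
original = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
seen_live = list(record_vectors(iter(original), path))
replayed = list(replay_vectors(path))
```

Wrapping the live stream with the recorder leaves the clustering loop unchanged, and the resulting file can be attached to an issue so the failure can be replayed from static data.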
