Giter VIP home page Giter VIP logo

links_clustering's Issues

Raise ValueError(f"Connected subcluster of {sc_idx} "

Hi @QEDan
I am facing the same issue with streaming data through microphone today. Can you help me, I am from Vietnam. Thank you !
There is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.

save model

Hi, for online clustering is it possible to save model according to previous examples that it has seen, or for a new sample, it should see whole data again?
could you please give me an example for online clustering of new sample.
in this example bellow we have loop of for , it means we should cluster whole data all together.

links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum) for vector in data: predicted_cluster = links_cluster.predict(vector)

stream clustering

Hi,
can this method be used for stream clustering, I mean we save whole clustering information and then if by stream a data comes we can cluster that to the whole data clusters that were clustered before?

Error while clustering the streaming audio data at higher hyperparameter values i.e. cluster_similarity_threshold, subcluster_similarity_threshold and pair_similarity_maximum.

I am using Resemblyzer to encode the streaming input audio coming from the microphone and using links clustering to cluster the audio embedding. At low values of hyperparams, I am getting underwhelming results (new cluster not being created even with the change in speaker). When hyper params are set to high values (say (0.7, 0.7, 0.7)) I am getting the following error:

Traceback (most recent call last):
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 156, in
main()
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 130, in main
predicted_cluster = links_cluster.predict(vector)
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 96, in predict
self.update_cluster(best_subcluster_cluster_id, best_subcluster_id)
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 180, in update_cluster
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 0 was not found in cluster list of 0.

I have the following questions:

  1. How to resolve the aforementioned error.
  2. How to efficiently tune the hyperparameters. (I tried going through the paper but didn't understand much)
  3. Is there a better way to perform this whole operation.

Here is my code:

import re
import sys
import numpy as np
import pyaudio
from six.moves import queue
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from links_clustering.links_cluster import LinksCluster
import wave
CHUNK = 16000
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 16000
RECORD_SECONDS = 20
WAVE_OUTPUT_FILENAME = "voice.wav"
p = pyaudio.PyAudio()
# Audio recording parameters
RATE = 44100
CHUNK = int(RATE/10)  # 100ms
encoder = VoiceEncoder("cpu")
links_cluster = LinksCluster(0.7, 0.7, 0.7) #LinksCluster(0.8, 0.7, 0.85)
class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""
    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True
    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=2,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self
    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()
    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue
    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break
            yield b"".join(data)
def main():
    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        print(audio_generator)
        for content in audio_generator:
            write_frame('WAVE_OUTPUT_FILENAME_{}.wav'.format(i), content)
            numpy_array = np.frombuffer(content, dtype=np.int16)
            wav = preprocess_wav(numpy_array)
            _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
            for vector in cont_embeds:
                predicted_cluster = links_cluster.predict(vector)
                print(predicted_cluster)
if __name__ == "__main__":
    main()

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.