Issues from qedan's links_clustering repository

Raise ValueError(f"Connected subcluster of {sc_idx} "

Hi @QEDan,
I ran into the same issue today while streaming data from a microphone. Could you help me? I am from Vietnam. Thank you!
Here is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.

What is the time complexity? What is the space complexity?

save model

Hi, for online clustering, is it possible to save the model built from the examples it has already seen, or does it have to see the whole data set again for each new sample?
Could you please give me an example of online clustering for a new sample?
The example below uses a for loop, which suggests that the whole data set has to be clustered together.

links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum)
for vector in data:
    predicted_cluster = links_cluster.predict(vector)
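
A minimal sketch of one way to keep an online clusterer between sessions, using Python's standard `pickle` module. `TinyOnlineClusterer`, `save_model` and `load_model` are hypothetical names made up for this illustration; the stand-in class is not the Links algorithm, it only mimics the `predict(vector)` call shown above. Whether a real `LinksCluster` instance pickles cleanly has not been verified here and depends on its attributes being picklable.

```python
import pickle

import numpy as np


class TinyOnlineClusterer:
    """Hypothetical stand-in with a predict() method (NOT the Links algorithm).

    It keeps one running sum of unit vectors per cluster and assigns each new
    vector to the most similar cluster, or opens a new one.
    """

    def __init__(self, similarity_threshold):
        self.similarity_threshold = similarity_threshold
        self.centroids = []  # one summed vector per cluster

    def predict(self, vector):
        v = np.asarray(vector, dtype=float)
        v = v / np.linalg.norm(v)
        if self.centroids:
            # Cosine similarity between the new vector and each cluster sum.
            sims = [float(c @ v / np.linalg.norm(c)) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.similarity_threshold:
                self.centroids[best] = self.centroids[best] + v
                return best
        self.centroids.append(v)
        return len(self.centroids) - 1


def save_model(model, path):
    """Write the whole clusterer state to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Read a clusterer back; only load files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With this pattern, a later session would call `load_model(path)` and then `predict(new_vector)` on the restored object, so earlier samples do not need to be fed in again as long as the object carries all of its state.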

stream clustering

Hi,
Can this method be used for stream clustering? That is, can we save the complete clustering state and then, when a new data point arrives from the stream, assign it to the clusters that were formed before?

Error while clustering the streaming audio data at higher hyperparameter values i.e. cluster_similarity_threshold, subcluster_similarity_threshold and pair_similarity_maximum.

I am using Resemblyzer to encode streaming audio from the microphone and links clustering to cluster the resulting audio embeddings. With low hyperparameter values I get underwhelming results (no new cluster is created even when the speaker changes). With high values (say (0.7, 0.7, 0.7)) I get the following error:

Traceback (most recent call last):
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 156, in <module>
    main()
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 130, in main
    predicted_cluster = links_cluster.predict(vector)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 96, in predict
    self.update_cluster(best_subcluster_cluster_id, best_subcluster_id)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 180, in update_cluster
    raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 0 was not found in cluster list of 0.

I have the following questions:

1. How can I resolve the error above?
2. How can I tune the hyperparameters efficiently? (I tried going through the paper but didn't understand much.)
3. Is there a better way to perform this whole operation?
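
For the tuning question, one possible starting point is to look at how similar embeddings from the same speaker are compared with embeddings from different speakers. The sketch below assumes the thresholds are compared against cosine similarity, and that a small set of embeddings with known speaker labels is available. The function names are hypothetical and not part of links_clustering or Resemblyzer.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity for an (n, d) array of embeddings."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def similarity_summary(embeddings, labels):
    """Return (same_speaker_sims, different_speaker_sims) over all pairs."""
    sims = cosine_similarity_matrix(embeddings)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)  # each unordered pair once
    same = labels[i] == labels[j]
    return sims[i, j][same], sims[i, j][~same]


def suggest_threshold(same_sims, diff_sims):
    """Midpoint between the median same- and different-speaker similarity."""
    return float((np.median(same_sims) + np.median(diff_sims)) / 2.0)
```

If the two distributions overlap heavily, no single threshold will separate the speakers well, which would point at the embeddings (for example, windows that are too short) rather than at the clustering parameters. The midpoint is only a rough first guess; the three parameters play different roles in the algorithm and may need different values.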

Here is my code:

import queue

import numpy as np
import pyaudio
from resemblyzer import preprocess_wav, VoiceEncoder
from links_clustering.links_cluster import LinksCluster

# Audio recording parameters
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = int(RATE / 10)  # 100 ms of audio per buffer

encoder = VoiceEncoder("cpu")
# Arguments: cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum
links_cluster = LinksCluster(0.7, 0.7, 0.7)  # previously tried: LinksCluster(0.8, 0.7, 0.85)

class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""
    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True
    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # Number of input channels; samples arrive interleaved when > 1.
            channels=CHANNELS,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self
    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()
    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue
    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break
            yield b"".join(data)
def main():
    with MicrophoneStream(RATE, CHUNK) as stream:
        for content in stream.generator():
            # The write_frame(...) call from the original script is omitted:
            # the helper and its counter `i` are not defined in the code posted here.
            # Convert interleaved 16-bit PCM to mono float samples in [-1, 1).
            samples = np.frombuffer(content, dtype=np.int16).astype(np.float32) / 32768.0
            samples = samples.reshape(-1, CHANNELS).mean(axis=1)
            # The microphone records at RATE Hz; source_sr lets preprocess_wav resample.
            wav = preprocess_wav(samples, source_sr=RATE)
            _, cont_embeds, wav_splits = encoder.embed_utterance(
                wav, return_partials=True, rate=16
            )
            for vector in cont_embeds:
                predicted_cluster = links_cluster.predict(vector)
                print(predicted_cluster)


if __name__ == "__main__":
    main()
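
On the third question, one change that may be worth trying is to embed longer stretches of audio than a single 100 ms chunk. The sketch below buffers incoming chunks and emits fixed-length windows. `pcm16_to_mono_float` and `WindowBuffer` are hypothetical helpers written for this illustration, and the window length is an assumption to tune (something on the order of one to two seconds may be a reasonable start for speaker embeddings).

```python
import numpy as np


def pcm16_to_mono_float(raw_bytes, channels):
    """Convert interleaved 16-bit PCM bytes to mono float32 samples in [-1, 1)."""
    samples = np.frombuffer(raw_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    if channels > 1:
        # Average the channels of each frame to get a single mono track.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples


class WindowBuffer:
    """Accumulate samples and emit fixed-length, non-overlapping windows."""

    def __init__(self, window_samples):
        self.window_samples = window_samples
        self._pending = np.zeros(0, dtype=np.float32)

    def push(self, samples):
        """Add samples; return a list of complete windows (possibly empty)."""
        self._pending = np.concatenate([self._pending, samples])
        windows = []
        while len(self._pending) >= self.window_samples:
            windows.append(self._pending[: self.window_samples])
            self._pending = self._pending[self.window_samples :]
        return windows
```

In the loop over the audio generator, each chunk would go through `pcm16_to_mono_float(content, CHANNELS)` and then `buffer.push(...)`, and only the returned windows would be passed on to the encoder and the clusterer.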
