Giter VIP home page Giter VIP logo

links_clustering's People

Contributors

qedan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

links_clustering's Issues

save model

Hi, for online clustering is it possible to save model according to previous examples that it has seen, or for a new sample, it should see whole data again?
could you please give me an example for online clustering of new sample.
in this example bellow we have loop of for , it means we should cluster whole data all together.

links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum) for vector in data: predicted_cluster = links_cluster.predict(vector)

stream clustering

Hi,
can this method be used for stream clustering, I mean we save whole clustering information and then if by stream a data comes we can cluster that to the whole data clusters that were clustered before?

Error while clustering the streaming audio data at higher hyperparameter values i.e. cluster_similarity_threshold, subcluster_similarity_threshold and pair_similarity_maximum.

I am using Resemblyzer to encode the streaming input audio coming from the microphone and using links clustering to cluster the audio embedding. At low values of hyperparams, I am getting underwhelming results (new cluster not being created even with the change in speaker). When hyper params are set to high values (say (0.7, 0.7, 0.7)) I am getting the following error:

Traceback (most recent call last):
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 156, in
main()
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 130, in main
predicted_cluster = links_cluster.predict(vector)
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 96, in predict
self.update_cluster(best_subcluster_cluster_id, best_subcluster_id)
File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 180, in update_cluster
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 0 was not found in cluster list of 0.

I have the following questions:

  1. How to resolve the aforementioned error.
  2. How to efficiently tune the hyperparameters. (I tried going through the paper but didn't understand much)
  3. Is there a better way to perform this whole operation.

Here is my code:

import re
import sys
import numpy as np
import pyaudio
from six.moves import queue
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from links_clustering.links_cluster import LinksCluster
import wave
CHUNK = 16000
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 16000
RECORD_SECONDS = 20
WAVE_OUTPUT_FILENAME = "voice.wav"
p = pyaudio.PyAudio()
# Audio recording parameters
RATE = 44100
CHUNK = int(RATE/10)  # 100ms
encoder = VoiceEncoder("cpu")
links_cluster = LinksCluster(0.7, 0.7, 0.7) #LinksCluster(0.8, 0.7, 0.85)
class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""
    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True
    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=2,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self
    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()
    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue
    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break
            yield b"".join(data)
def main():
    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        print(audio_generator)
        for content in audio_generator:
            write_frame('WAVE_OUTPUT_FILENAME_{}.wav'.format(i), content)
            numpy_array = np.frombuffer(content, dtype=np.int16)
            wav = preprocess_wav(numpy_array)
            _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
            for vector in cont_embeds:
                predicted_cluster = links_cluster.predict(vector)
                print(predicted_cluster)
if __name__ == "__main__":
    main()

Raise ValueError(f"Connected subcluster of {sc_idx} "

Hi @QEDan
I am facing the same issue with streaming data through microphone today. Can you help me, I am from Vietnam. Thank you !
There is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.