qedan / links_clustering
Implementation of the Links Online Clustering algorithm: https://arxiv.org/abs/1801.10123
License: GNU General Public License v3.0
Hi, for online clustering, is it possible to save the model after it has seen previous examples, or does it have to see the whole dataset again for each new sample?
Could you please give me an example of online clustering for a new sample?
The example below uses a for loop, which suggests that the whole dataset has to be clustered together.
links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum)
for vector in data:
    predicted_cluster = links_cluster.predict(vector)
Hi,
Can this method be used for stream clustering? That is, can we save the full clustering state and then, when a new data point arrives from the stream, assign it to the clusters that were built before?
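Both questions above come down to whether the clusterer keeps its state between calls. In the snippet from the first question, `predict` is called once per vector on the same `links_cluster` object, so the loop is only one way of feeding it. A rough sketch of saving and restoring such an object with `pickle` follows. `ToyOnlineClusterer`, `save_model`, and `load_model` are hypothetical names made up for this illustration. The toy class is a stand-in and is not the Links algorithm. Whether a real `LinksCluster` instance pickles cleanly is an assumption that should be checked against the library.

```python
import math
import pickle


class ToyOnlineClusterer:
    """Hypothetical stand-in exposing predict(vector) like LinksCluster.

    This is NOT the Links algorithm. It assigns a vector to the first
    cluster whose running-sum centroid has cosine similarity at or above
    the threshold, and opens a new cluster otherwise.
    """

    def __init__(self, similarity_threshold):
        self.similarity_threshold = similarity_threshold
        self.centroids = []  # one running-sum vector per cluster

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def predict(self, vector):
        for cluster_id, centroid in enumerate(self.centroids):
            if self._cosine(vector, centroid) >= self.similarity_threshold:
                # Fold the new vector into the running sum for this cluster.
                self.centroids[cluster_id] = [c + v for c, v in zip(centroid, vector)]
                return cluster_id
        # No existing cluster is similar enough, so start a new one.
        self.centroids.append(list(vector))
        return len(self.centroids) - 1


def save_model(model, path):
    """Write the clusterer, including everything it has seen so far, to disk."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Restore a clusterer saved with save_model."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

After `load_model`, calling `predict` on a single new vector assigns it against the clusters built earlier, without replaying the old data. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.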
I am using Resemblyzer to encode streaming audio from the microphone and Links clustering to cluster the audio embeddings. With low hyperparameter values I get underwhelming results (no new cluster is created even when the speaker changes). With high hyperparameter values, such as (0.7, 0.7, 0.7), I get the following error:
Traceback (most recent call last):
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 156, in <module>
    main()
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\nono2.py", line 130, in main
    predicted_cluster = links_cluster.predict(vector)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 96, in predict
    self.update_cluster(best_subcluster_cluster_id, best_subcluster_id)
  File "C:\Users\e13356\Downloads\audio_analysis_online\audio_analysis_online_2\Resemblyzer\links_clustering\links_cluster.py", line 180, in update_cluster
    raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 0 was not found in cluster list of 0.
I have the following questions:
Here is my code:
import re
import sys
import numpy as np
import pyaudio
from six.moves import queue
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from links_clustering.links_cluster import LinksCluster
import wave
CHUNK = 16000
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 16000
RECORD_SECONDS = 20
WAVE_OUTPUT_FILENAME = "voice.wav"
p = pyaudio.PyAudio()
# Audio recording parameters (these override the RATE and CHUNK values set above)
RATE = 44100
CHUNK = int(RATE / 10)  # 100 ms
encoder = VoiceEncoder("cpu")
links_cluster = LinksCluster(0.7, 0.7, 0.7) #LinksCluster(0.8, 0.7, 0.85)
class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk
        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=2,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )
        self.closed = False
        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]
            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break
            yield b"".join(data)
def main():
    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        print(audio_generator)
        # enumerate supplies the index i, which the posted snippet used without defining it.
        for i, content in enumerate(audio_generator):
            # write_frame is not defined in the posted snippet.
            write_frame('WAVE_OUTPUT_FILENAME_{}.wav'.format(i), content)
            numpy_array = np.frombuffer(content, dtype=np.int16)
            wav = preprocess_wav(numpy_array)
            _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
            # Cluster each partial embedding as it arrives.
            for vector in cont_embeds:
                predicted_cluster = links_cluster.predict(vector)
                print(predicted_cluster)


if __name__ == "__main__":
    main()
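One thing worth checking in the code above, which may or may not be related to the ValueError: the stream is opened with `channels=2`, so the bytes read through `np.frombuffer` hold interleaved left and right samples as raw 16-bit integers, and they are passed on as if they were a mono waveform. A minimal standard-library sketch of downmixing such a buffer to mono floats is shown here. The helper name `stereo_int16_to_mono_float` is hypothetical, and the scaling to [-1.0, 1.0) is an assumption about what the encoder expects.

```python
import array


def stereo_int16_to_mono_float(raw):
    """Downmix interleaved 16-bit stereo PCM bytes to mono floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(raw)
    if len(samples) % 2:
        raise ValueError("expected an even number of samples (left/right pairs)")
    left = samples[0::2]   # even positions hold the left channel
    right = samples[1::2]  # odd positions hold the right channel
    # Average the two channels, then scale from the int16 range to floats.
    return [(l + r) / 2.0 / 32768.0 for l, r in zip(left, right)]
```

With NumPy, the same downmix is a reshape to `(-1, 2)` followed by a mean over axis 1. The stream is also recorded at 44100 Hz, so the audio likely needs resampling before embedding. I believe Resemblyzer's `preprocess_wav` takes a `source_sr` argument for that, but please confirm this against the Resemblyzer source.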
Hi @QEDan
I am facing the same issue today when streaming data from a microphone. Can you help me? I am from Vietnam. Thank you!
Here is my error:
raise ValueError(f"Connected subcluster of {sc_idx} "
ValueError: Connected subcluster of 1 was not found in cluster list of 0.