Gwaggli

Gwaggli is a simple voice processing tool. It consists of the following components:

gwaggli-events: An event system for reading and writing gwaggli events
gwaggli-insights: A web application to show the results of the processing
gwaggli-pipeline: A pipeline to process voice streams in real time and output useful information

Installation & Usage

Preparation

Set up an AWS account with permissions to run Amazon Polly and add the credentials to the following file:

./gwaggli-pipeline/.aws/config.json

{
  "accessKeyId": "<YOUR-ACCESS-KEY-ID>",
  "secretAccessKey": "<YOUR-SECRET-ACCESS-KEY>",
  "region": "eu-central-1"
}

Add the OpenAI API-Key to the following file:

./gwaggli-pipeline/.openai/config.json

{
  "apiKey": "<YOUR-API-KEY>"
}

Add the Replicate API-Key to the following file:

./gwaggli-pipeline/.replicate/config.json

{
  "apiKey": "<YOUR-API-KEY>"
}

Add the elevenlabs.io API-Key to the following file:

./gwaggli-pipeline/.elevenlabs/config.json

{
  "apiKey": "<YOUR-API-KEY>"
}
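Because a malformed config file (for example, a missing closing quote) only shows up when it is parsed, it can help to check the files before starting the application. The following is a minimal sketch, not gwaggli's own loading code: `parseApiKeyConfig` and `ApiKeyConfig` are hypothetical names, and the check assumes the single-field `apiKey` layout shown above.

```typescript
// Hypothetical helper for illustration; gwaggli's actual config loading may differ.
interface ApiKeyConfig {
  apiKey: string;
}

function parseApiKeyConfig(raw: string): ApiKeyConfig {
  // JSON.parse throws a SyntaxError on malformed input such as a missing quote.
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const apiKey = (parsed as Record<string, unknown>).apiKey;
  // Reject empty values and placeholders that were never filled in.
  if (typeof apiKey !== "string" || apiKey.length === 0 || apiKey.startsWith("<")) {
    throw new Error('config must contain a non-placeholder "apiKey" string');
  }
  return { apiKey };
}
```

The file contents can be read with `fs.readFileSync(path, "utf8")` from Node.js and passed to this function.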

Running the project

Run the following command to install all dependencies:

npm install

Run the following command to build all relevant components:

npm run build

Run the following command to start the application:

npm run start

If you want to start the application with live-reload, run the following command:

npm run dev

gwaggli's Issues

Add embeddings to be able to process user-specific data

Remove memory leak in audio-buffering.ts

audio-buffering.ts is using an in-memory buffer to store all AudioChunks.

This was done for simplicity but does not scale well: the buffer keeps growing in memory, and the whole buffer is converted to base64 every time a new AudioChunk is passed through the EventSystem, which makes processing increasingly expensive.

To improve the functionality, implement the following idea:

Remove audio-buffering.ts and integrate its functionality directly into voice-activation.ts
Implement a ring buffer with just enough space to hold the observed window size for voice activation while voice is not active
As soon as voice activation is active, fill a dynamically sized buffer with the active voice data
Pass the data along and clear this active buffer as soon as voice activation detects the end of the active voice data
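
The proposed buffering could look roughly like the sketch below. It is an illustration under stated assumptions, not the existing gwaggli code: `RingBuffer`, `VoiceActivationBuffer`, and the `voiceDetected` flag are hypothetical names, chunks are modelled as `Uint8Array`, and seeding the active buffer with the contents of the window when voice starts is an assumption the issue does not specify.

```typescript
/** Fixed-capacity ring buffer that overwrites the oldest entry when full. */
class RingBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private start = 0; // index of the oldest element
  private count = 0; // number of stored elements

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity).fill(undefined);
  }

  push(item: T): void {
    const end = (this.start + this.count) % this.capacity;
    this.slots[end] = item;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      // The buffer was full, so the oldest element was just overwritten.
      this.start = (this.start + 1) % this.capacity;
    }
  }

  /** Returns the stored elements from oldest to newest. */
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(this.start + i) % this.capacity] as T);
    }
    return out;
  }

  clear(): void {
    this.slots.fill(undefined);
    this.start = 0;
    this.count = 0;
  }
}

/** Hypothetical replacement for the unbounded buffer in audio-buffering.ts. */
class VoiceActivationBuffer {
  private readonly window: RingBuffer<Uint8Array>;
  private active: Uint8Array[] = [];
  private isActive = false;

  constructor(windowSize: number) {
    this.window = new RingBuffer<Uint8Array>(windowSize);
  }

  /**
   * Feeds one chunk. `voiceDetected` stands in for the detector's decision.
   * Returns the buffered voice data once voice ends, otherwise null.
   */
  push(chunk: Uint8Array, voiceDetected: boolean): Uint8Array[] | null {
    if (voiceDetected) {
      if (!this.isActive) {
        // Assumption: keep the observed window as lead-in to the voice data.
        this.active = this.window.toArray();
        this.window.clear();
        this.isActive = true;
      }
      this.active.push(chunk);
      return null;
    }
    this.window.push(chunk);
    if (this.isActive) {
      // End of voice: hand the data over and reset the active buffer.
      const finished = this.active;
      this.active = [];
      this.isActive = false;
      return finished;
    }
    return null;
  }
}
```

While no voice is detected, memory use is bounded by the window size. The active buffer grows only for the duration of one utterance, so any base64 conversion can happen once on the returned chunks instead of on every incoming AudioChunk.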