picovoice / leopard

On-device speech-to-text engine powered by deep learning

Home Page: https://picovoice.ai/

License: Apache License 2.0

leopard's Introduction

Picovoice

Made in Vancouver, Canada by Picovoice

Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate. Using Picovoice, one can infer a user’s intent from a naturally spoken utterance such as:

"Hey Edison, set the lights in the living room to blue"

Picovoice detects the occurrence of the custom wake word (Hey Edison), and then extracts the intent from the follow-on spoken command:

{
  "intent": "changeColor",
  "slots": {
    "location": "living room",
    "color": "blue"
  }
}
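
An inference like the one above maps naturally onto a dispatch table keyed by intent name. The following is a minimal Python sketch and not part of the Picovoice SDK: the `change_color` handler, the `HANDLERS` table, and the `dispatch` function are hypothetical names chosen for illustration, and the inference is assumed to arrive as the JSON text shown above.

```python
import json

# Hypothetical handler; a real application would drive actual hardware here.
def change_color(slots):
    return f"set {slots['location']} lights to {slots['color']}"

# Hypothetical dispatch table mapping intent names to handlers.
HANDLERS = {"changeColor": change_color}

def dispatch(inference_json):
    """Parse an inference and route it to the matching handler."""
    inference = json.loads(inference_json)
    handler = HANDLERS.get(inference["intent"])
    if handler is None:
        return None  # no handler registered for this intent
    return handler(inference.get("slots", {}))

example = '{"intent": "changeColor", "slots": {"location": "living room", "color": "blue"}}'
print(dispatch(example))
```

Keeping the handlers in a table means a new intent only needs a new entry, not another branch in the control flow.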

Why Picovoice

  • Private & Secure: Everything is processed offline. Intrinsically private; HIPAA and GDPR-compliant.
  • Accurate: Resilient to noise and reverberation. Outperforms cloud-based alternatives by wide margins.
  • Cross-Platform: Design once, deploy anywhere. Build using familiar languages and frameworks.
    • Arm Cortex-M, STM32, Arduino, and i.MX RT
    • Raspberry Pi, NVIDIA Jetson Nano, and BeagleBone
    • Android and iOS
    • Chrome, Safari, Firefox, and Edge
    • Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64)
  • Self-Service: Design, train, and test voice interfaces instantly in your browser, using Picovoice Console.
  • Reliable: Runs locally without needing continuous connectivity.
  • Zero Latency: Edge-first architecture eliminates unpredictable network delay.

Build with Picovoice

  1. Evaluate: The Picovoice SDK is a cross-platform library for adding voice to anything. It includes some pre-trained speech models. The SDK is licensed under Apache 2.0 and available on GitHub to encourage independent benchmarking and integration testing. You are empowered to make a data-driven decision.

  2. Design: Picovoice Console is a cloud-based platform for designing voice interfaces and training speech models, all within your web browser. No machine learning skills are required. Simply describe what you need with text and export trained models.

  3. Develop: Exported models can run on Picovoice SDK without requiring constant connectivity. The SDK runs on a wide range of platforms and supports a large number of frameworks. The Picovoice Console and Picovoice SDK enable you to design, build and iterate fast.

  4. Deploy: Deploy at scale without having to maintain complex cloud infrastructure. Avoid unbounded cloud fees, limitations, and control imposed by big tech.

Picovoice in Action

Platform Features

Custom Wake Words

Picovoice makes use of the Porcupine wake word engine to detect utterances of given wake phrases. You can train custom wake words using Picovoice Console and then run the exported wake word model on the Picovoice SDK.

Intent Inference

Picovoice relies on the Rhino Speech-to-Intent engine to directly infer user's intent from spoken commands within a given domain of interest (a "context"). You can design and train custom contexts for your product using Picovoice Console. The exported Rhino models then can run with the Picovoice SDK on any supported platform.

Language Support

  • English, German, French, Spanish, Italian, Japanese, Korean, and Portuguese.
  • Support for additional languages is available for commercial customers on a case-by-case basis.

Performance

Picovoice makes use of the Porcupine wake word engine to detect utterances of given wake phrases. An open-source benchmark of Porcupine is available here. In summary, compared to the best-performing alternative, Porcupine's standard model is 5.4 times more accurate.

Picovoice relies on the Rhino Speech-to-Intent engine to directly infer user's intent from spoken commands within a given domain of interest (a "context"). An open-source benchmark of Rhino is available here. Rhino outperforms all major cloud-based alternatives with wide margins.

Picovoice Console

Picovoice Console is a web-based platform for designing, testing, and training voice user interfaces. Using Picovoice Console, you can train custom wake word models and domain-specific NLU (Speech-to-Intent) models.

Demos

If using SSH, clone the repository with:

git clone --recurse-submodules git@github.com:Picovoice/picovoice.git

If using HTTPS, clone the repository with:

git clone --recurse-submodules https://github.com/Picovoice/picovoice.git

Python Demos

Install the demo package:

sudo pip3 install picovoicedemo

From the root of the repository run the following in the terminal:

picovoice_demo_mic \
--access_key ${ACCESS_KEY} \
--keyword_path resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
--context_path resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. raspberry-pi, beaglebone, linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints [Listening ...] to the console. Then say:

Porcupine, set the lights in the kitchen to purple.

Upon success, the demo prints the following into the terminal:

[wake word]

{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'purple'
  }
}

For more information regarding Python demos refer to their documentation.
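
The keyword and context paths in the command above follow a fixed pattern in which `${PLATFORM}` appears twice in each path. As a small convenience sketch (the helper name `demo_resource_paths` is hypothetical and not part of any Picovoice package), the two paths can be built for a given platform like this:

```python
from pathlib import PurePosixPath

def demo_resource_paths(platform):
    """Return (keyword_path, context_path) following the layout used in the command above."""
    root = PurePosixPath("resources")
    keyword = root / "porcupine/resources/keyword_files" / platform / f"porcupine_{platform}.ppn"
    context = root / "rhino/resources/contexts" / platform / f"smart_lighting_{platform}.rhn"
    return str(keyword), str(context)

keyword_path, context_path = demo_resource_paths("linux")
print(keyword_path)
print(context_path)
```

The sketch only assembles strings relative to the repository root; it does not check that the files exist.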

NodeJS Demos

Install the demo package:

npm install -g @picovoice/picovoice-node-demo

From the root of the repository run:

pv-mic-demo \
--access_key ${ACCESS_KEY} \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
-c resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. raspberry-pi, linux, or mac). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening for wake word 'porcupine' ... to the console. Then say:

Porcupine, turn on the lights.

Upon success, the demo prints the following into the terminal:

Inference:
{
    "isUnderstood": true,
    "intent": "changeLightState",
    "slots": {
        "state": "on"
    }
}

Please see the demo instructions for details.

.NET Demos

From the root of the repository run the following in the terminal:

dotnet run -p demo/dotnet/PicovoiceDemo/PicovoiceDemo.csproj -c MicDemo.Release -- \
--access_key ${ACCESS_KEY} \
--keyword_path resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
--context_path resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success, the following is printed to the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about .NET demos go to demo/dotnet.

Java Demos

Make sure there is a working microphone connected to your device. Then invoke the following commands from the terminal:

cd demo/java
./gradlew build
cd build/libs
java -jar picovoice-mic-demo.jar \
-a ${ACCESS_KEY} \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
-c resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening ... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success, the following is printed to the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about the Java demos go to demo/java.

Go Demos

The demos require cgo, which means that a GCC compiler is required (on Windows, for example, MinGW).

From demo/go run the following command from the terminal to build and run the mic demo:

go run micdemo/picovoice_mic_demo.go \
-access_key ${ACCESS_KEY} \
-keyword_path "../../resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn" \
-context_path "../../resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn"

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening ... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success, the following is printed to the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about the Go demos go to demo/go.

Unity Demos

To run the Picovoice Unity demo, import the latest Picovoice Unity package into your project, open the PicovoiceDemo scene and hit play. To run on other platforms or in the player, go to File > Build Settings, choose your platform and hit the Build and Run button.

To browse the demo source go to demo/unity.

Flutter Demos

To run the Picovoice demo on Android or iOS with Flutter, you must have the Flutter SDK installed on your system. Once installed, you can run flutter doctor to determine any other missing requirements for your relevant platform. Once your environment has been set up, launch a simulator or connect an Android/iOS device.

Run the prepare_demo script from demo/flutter with a language code to set up the demo in the language of your choice (e.g. de -> German, ko -> Korean). To see a list of available languages, run prepare_demo without a language code.

dart scripts/prepare_demo.dart ${LANGUAGE}

Set your AccessKey in the lib/main.dart file:

final String accessKey = "{YOUR_ACCESS_KEY_HERE}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

Run the following command from demo/flutter to build and deploy the demo to your device:

flutter run

Once the demo app has started, press the start button and utter a command to start inferring context. To see more details about the current context information, press the Context Info button on the top right corner in the app.

React Native Demos

To run the React Native Picovoice demo app you'll first need to install yarn and set up your React Native environment. For this, please refer to React Native's documentation. Once your environment has been set up, you can run the following commands:

Running On Android

cd demo/react-native
yarn android-install    # sets up environment
yarn android-run        # builds and deploys to Android

Running On iOS

cd demo/react-native
yarn ios-install        # sets up environment
yarn ios-run            # builds and deploys to iOS

Once the application has been deployed, press the start button and say:

Porcupine, turn off the lights in the kitchen.

For the full set of supported commands, refer to the demo's readme.

Android Demos

Using Android Studio, open demo/android/Activity as an Android project and then run the application. Press the start button and say:

Porcupine, turn off the lights in the kitchen.

For the full set of supported commands, refer to the demo's readme.

iOS Demos

The BackgroundService demo keeps recording audio in the background while the application is out of focus. The ForegroundApp demo runs only when the application is in focus.

BackgroundService Demo

To run the demo, go to demo/ios/BackgroundService and run:

pod install

Then, using Xcode, open the generated PicovoiceBackgroundServiceDemo.xcworkspace and paste your AccessKey into the ACCESS_KEY variable in ContentView.swift. Build and run the demo.

ForegroundApp Demo

To run the demo, go to demo/ios/ForegroundApp and run:

pod install

Then, using Xcode, open the generated PicovoiceForegroundAppDemo.xcworkspace and paste your AccessKey into the ACCESS_KEY variable in ContentView.swift. Build and run the demo.

Wake Word Detection and Context Inference

After running the demo, press the start button and try saying the following:

Picovoice, shut off the lights in the living room.

For more details about the iOS demos and the full set of supported commands, refer to the demo's readme.

Web Demos

Vanilla JavaScript and HTML

From demo/web, use yarn or npm to install the dependencies, then run the start script with a language code to start a local web server hosting the demo in the language of your choice (e.g. pl -> Polish, ko -> Korean). To see a list of available languages, run start without a language code.

yarn
yarn start ${LANGUAGE}

(or)

npm install
npm run start ${LANGUAGE}

Open http://localhost:5000 in your browser to try the demo.

Angular Demos

From demo/angular, use yarn or npm to install the dependencies, then run the start script with a language code to start a local web server hosting the demo in the language of your choice (e.g. pl -> Polish, ko -> Korean). To see a list of available languages, run start without a language code.

yarn
yarn start ${LANGUAGE}

(or)

npm install
npm run start ${LANGUAGE}

Open http://localhost:4200 in your browser to try the demo.

React Demos

From demo/react, use yarn or npm to install the dependencies, then run the start script with a language code to start a local web server hosting the demo in the language of your choice (e.g. pl -> Polish, ko -> Korean). To see a list of available languages, run start without a language code.

yarn
yarn start ${LANGUAGE}

(or)

npm install
npm run start ${LANGUAGE}

Open http://localhost:3000 in your browser to try the demo.

Vue Demos

From demo/vue, use yarn or npm to install the dependencies, then run the start script with a language code to start a local web server hosting the demo in the language of your choice (e.g. pl -> Polish, ko -> Korean). To see a list of available languages, run start without a language code.

yarn
yarn start ${LANGUAGE}

(or)

npm install
npm run start ${LANGUAGE}

The command-line output will provide you with a localhost link and port to open in your browser.

Rust Demos

From demo/rust/micdemo run the following command from the terminal to build and run the mic demo:

cargo run --release -- \
--keyword_path "../../../resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn" \
--context_path "../../../resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn"

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening ... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success, the following is printed to the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about the Rust demos go to demo/rust.

C Demos

The C demo requires CMake version 3.4 or higher.

The Microphone demo requires miniaudio for accessing microphone audio data.

Windows requires MinGW to build the demo.

Microphone Demo

At the root of the repository, build with:

cmake -S demo/c/. -B demo/c/build && cmake --build demo/c/build --target picovoice_demo_mic

Linux (x86_64), macOS (x86_64), Raspberry Pi, and BeagleBone

List input audio devices with:

./demo/c/build/picovoice_demo_mic --show_audio_devices

Run the demo using:

./demo/c/build/picovoice_demo_mic \
-a ${ACCESS_KEY} \
-l ${LIBRARY_PATH} \
-p resources/porcupine/lib/common/porcupine_params.pv \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/picovoice_${PLATFORM}.ppn \
-r resources/rhino/lib/common/rhino_params.pv \
-c resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn \
-i ${AUDIO_DEVICE_INDEX}

Replace ${LIBRARY_PATH} with the path to the appropriate library available under sdk/c/lib, ${PLATFORM} with the name of the platform you are running on (linux, raspberry-pi, mac, or beaglebone), and ${AUDIO_DEVICE_INDEX} with the index of your audio device.

Windows

List input audio devices with:

.\demo\c\build\picovoice_demo_mic.exe --show_audio_devices

Run the demo using:

.\demo\c\build\picovoice_demo_mic.exe -a ${ACCESS_KEY} -l sdk/c/lib/windows/amd64/libpicovoice.dll -p resources/porcupine/lib/common/porcupine_params.pv -k resources/porcupine/resources/keyword_files/windows/picovoice_windows.ppn -r resources/rhino/lib/common/rhino_params.pv -c resources/rhino/resources/contexts/windows/smart_lighting_windows.rhn -i ${AUDIO_DEVICE_INDEX}

Replace ${AUDIO_DEVICE_INDEX} with the index of your audio device.

The demo opens an audio stream and waits for the wake word "Picovoice" to be detected. Once it is detected, it infers your intent from spoken commands in the context of a smart lighting system. For example, you can say:

"Turn on the lights in the bedroom."

File Demo

At the root of the repository, build with:

cmake -S demo/c/. -B demo/c/build && cmake --build demo/c/build --target picovoice_demo_file

Linux (x86_64), macOS (x86_64), Raspberry Pi, and BeagleBone

Run the demo using:

./demo/c/build/picovoice_demo_file \
-a ${ACCESS_KEY} \
-l ${LIBRARY_PATH} \
-p resources/porcupine/lib/common/porcupine_params.pv \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/picovoice_${PLATFORM}.ppn \
-r resources/rhino/lib/common/rhino_params.pv \
-c resources/rhino/resources/contexts/${PLATFORM}/coffee_maker_${PLATFORM}.rhn \
-w resources/audio_samples/picovoice-coffee.wav

Replace ${LIBRARY_PATH} with the path to the appropriate library available under sdk/c/lib, and ${PLATFORM} with the name of the platform you are running on (linux, raspberry-pi, mac, or beaglebone).

Windows

Run the demo using:

.\demo\c\build\picovoice_demo_file.exe -a ${ACCESS_KEY} -l sdk/c/lib/windows/amd64/libpicovoice.dll -p resources/porcupine/lib/common/porcupine_params.pv -k resources/porcupine/resources/keyword_files/windows/picovoice_windows.ppn -r resources/rhino/lib/common/rhino_params.pv -c resources/rhino/resources/contexts/windows/coffee_maker_windows.rhn -w resources/audio_samples/picovoice-coffee.wav

The demo opens up the WAV file. It detects the wake word and infers the intent in the context of a coffee maker system.

For more information about C demos go to demo/c.

Microcontroller Demos

There are several projects for various development boards inside the mcu demo folder.

SDKs

Python

Install the package:

pip3 install picovoice

Create a new instance of Picovoice:

from picovoice import Picovoice

access_key = "${ACCESS_KEY}" # AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

keyword_path = ...

def wake_word_callback():
    pass

context_path = ...

def inference_callback(inference):
    print(inference.is_understood)
    print(inference.intent)
    print(inference.slots)

handle = Picovoice(
        access_key=access_key,
        keyword_path=keyword_path,
        wake_word_callback=wake_word_callback,
        context_path=context_path,
        inference_callback=inference_callback)

handle is an instance of the Picovoice runtime engine. It detects utterances of wake phrase defined in the file located at keyword_path. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined by the file located at context_path. keyword_path is the absolute path to the Porcupine wake word engine keyword file (with .ppn extension). context_path is the absolute path to the Rhino Speech-to-Intent engine context file (with .rhn extension). wake_word_callback is invoked upon the detection of wake phrase and inference_callback is invoked upon completion of follow-on voice command inference.

When instantiated, the required sample rate can be obtained via handle.sample_rate. The expected number of audio samples per frame is handle.frame_length. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio. The set of supported commands can be retrieved (in YAML format) via handle.context_info.

def get_next_audio_frame():
    pass

while True:
    handle.process(get_next_audio_frame())
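
One way to supply frames for the loop above is to read them from a WAV file with Python's standard `wave` module. This is a minimal sketch under stated assumptions: `read_frames` is a hypothetical helper, and the sample rate (16000 Hz) and frame length (512 samples) are placeholder values for illustration only. With a real engine instance, use `handle.sample_rate` and `handle.frame_length` instead.

```python
import io
import struct
import wave

def read_frames(wav_file, frame_length):
    """Yield lists of `frame_length` 16-bit samples from a mono WAV file.

    A trailing partial frame is dropped, since the engine expects
    exactly `frame_length` samples per call.
    """
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected single-channel, 16-bit PCM audio")
        while True:
            data = wav.readframes(frame_length)
            if len(data) < frame_length * 2:
                return
            yield list(struct.unpack(f"<{frame_length}h", data))

# Build one second of silence in memory so the sketch runs on its own.
# 16000 Hz and 512 samples are assumed values for illustration only.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

frames = list(read_frames(buffer, 512))
print(len(frames), len(frames[0]))
```

With an engine instance available, each yielded frame could be passed to `handle.process(frame)`. The sketch does not verify the file's sample rate; a real integration should compare it with `handle.sample_rate` before processing.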

When done, resources have to be released explicitly by calling handle.delete().
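
Because the release is explicit, it is easy to skip when an exception interrupts processing. A small context manager can guarantee the call. The sketch below is an illustration, not an SDK feature: `released_when_done` and the stand-in `FakeHandle` class are hypothetical, and the only assumption about the real object is that it exposes the `delete()` method mentioned above.

```python
from contextlib import contextmanager

@contextmanager
def released_when_done(handle):
    """Yield `handle` and call its delete() on exit, even if an error occurs."""
    try:
        yield handle
    finally:
        handle.delete()

# Stand-in object so the sketch runs without the SDK installed (hypothetical).
class FakeHandle:
    def __init__(self):
        self.deleted = False

    def delete(self):
        self.deleted = True

fake = FakeHandle()
with released_when_done(fake) as h:
    pass  # call h.process(...) here with a real handle
print(fake.deleted)
```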

NodeJS

The Picovoice SDK for NodeJS is available from NPM:

yarn add @picovoice/picovoice-node

(or)

npm install @picovoice/picovoice-node

The SDK provides the Picovoice class. Create an instance of this class using a Porcupine keyword (with .ppn extension) and Rhino context file (with .rhn extension), as well as callback functions that will be invoked on wake word detection and command inference completion events, respectively:

const Picovoice = require("@picovoice/picovoice-node");

const accessKey = "${ACCESS_KEY}"; // Obtained from the Picovoice Console (https://console.picovoice.ai/)

let keywordCallback = function (keyword) {
  console.log(`Wake word detected`);
};

let inferenceCallback = function (inference) {
  console.log("Inference:");
  console.log(JSON.stringify(inference, null, 4));
};

let handle = new Picovoice(
  accessKey,
  keywordArgument,
  keywordCallback,
  contextPath,
  inferenceCallback
);

The keywordArgument can either be a path to a Porcupine keyword file (.ppn), or one of the built-in keywords (integer enums). The contextPath is the path to the Rhino context file (.rhn).

Upon constructing the Picovoice class, send it frames of audio via its process method. Internally, Picovoice will switch between wake word detection and inference. The Picovoice class includes frameLength and sampleRate properties for the format of audio required.

// process audio frames that match the Picovoice requirements (16-bit linear pcm audio, single-channel)
while (true) {
  handle.process(frame);
}

As the audio is processed through the Picovoice engines, the callbacks will fire.
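
The switching between wake word detection and inference described above can be pictured as a two-state loop. The sketch below is Python and purely illustrative: `StubWakeWord`, `StubInference`, and `TwoStageProcessor` are hypothetical stand-ins, not the Picovoice implementation, and frames are modeled as plain strings rather than audio.

```python
class StubWakeWord:
    """Hypothetical detector: reports True when it sees the frame 'wake'."""
    def process(self, frame):
        return frame == "wake"

class StubInference:
    """Hypothetical engine: finalizes when it sees the frame 'end'."""
    def __init__(self):
        self.heard = []

    def process(self, frame):
        if frame == "end":
            result, self.heard = " ".join(self.heard), []
            return result  # a finished inference
        self.heard.append(frame)
        return None  # still listening

class TwoStageProcessor:
    def __init__(self, wake_word_callback, inference_callback):
        self._wake, self._infer = StubWakeWord(), StubInference()
        self._wake_cb, self._infer_cb = wake_word_callback, inference_callback
        self._awake = False

    def process(self, frame):
        if not self._awake:
            # Stage 1: only the wake word detector sees audio.
            if self._wake.process(frame):
                self._awake = True
                self._wake_cb()
        else:
            # Stage 2: the inference engine sees audio until it finalizes.
            result = self._infer.process(frame)
            if result is not None:
                self._awake = False
                self._infer_cb(result)

events = []
processor = TwoStageProcessor(lambda: events.append("wake word"), events.append)
for frame in ["noise", "wake", "lights", "on", "end", "noise"]:
    processor.process(frame)
print(events)
```

The point of the sketch is that the caller keeps feeding frames through a single `process` call and never tracks which stage is active.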

.NET

You can install the latest version of Picovoice by adding the latest Picovoice NuGet package in Visual Studio or using the .NET CLI.

dotnet add package Picovoice

To create an instance of Picovoice, do the following:

using Pv;

const string accessKey = "${ACCESS_KEY}"; // obtained from Picovoice Console (https://console.picovoice.ai/)

string keywordPath = "/absolute/path/to/keyword.ppn";
void wakeWordCallback() { /* .. */ }
string contextPath = "/absolute/path/to/context.rhn";
void inferenceCallback(Inference inference)
{
    // `inference` exposes three immutable properties:
    // (1) `IsUnderstood`
    // (2) `Intent`
    // (3) `Slots`
    // ..
}

Picovoice handle = Picovoice.Create(accessKey,
                                 keywordPath,
                                 wakeWordCallback,
                                 contextPath,
                                 inferenceCallback);

handle is an instance of Picovoice runtime engine that detects utterances of wake phrase defined in the file located at keywordPath. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined by the file located at contextPath. accessKey is your Picovoice AccessKey. keywordPath is the absolute path to Porcupine wake word engine keyword file (with .ppn extension). contextPath is the absolute path to Rhino Speech-to-Intent engine context file (with .rhn extension). wakeWordCallback is invoked upon the detection of wake phrase and inferenceCallback is invoked upon completion of follow-on voice command inference.

When instantiated, the required sample rate can be obtained via handle.SampleRate. The expected number of audio samples per frame is handle.FrameLength. The Picovoice engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

while(true)
{
    handle.Process(GetNextAudioFrame());
}

Picovoice will have its resources freed by the garbage collector, but to have resources freed immediately after use, wrap it in a using statement:

using(Picovoice handle = Picovoice.Create(accessKey, keywordPath, wakeWordCallback, contextPath, inferenceCallback))
{
    // .. Picovoice usage here
}

Java

The Picovoice Java library is available from Maven Central at ai.picovoice:picovoice-java:${version}.

The easiest way to create an instance of the engine is with the Picovoice Builder:

import ai.picovoice.picovoice.*;

String keywordPath = "/absolute/path/to/keyword.ppn";

final String accessKey = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

PicovoiceWakeWordCallback wakeWordCallback = () -> { /* .. */ };

String contextPath = "/absolute/path/to/context.rhn";

PicovoiceInferenceCallback inferenceCallback = inference -> {
    // `inference` exposes three getters:
    // (1) `getIsUnderstood()`
    // (2) `getIntent()`
    // (3) `getSlots()`
    // ..
};

try {
    Picovoice handle = new Picovoice.Builder()
                    .setAccessKey(accessKey)
                    .setKeywordPath(keywordPath)
                    .setWakeWordCallback(wakeWordCallback)
                    .setContextPath(contextPath)
                    .setInferenceCallback(inferenceCallback)
                    .build();
} catch (PicovoiceException e) { }

handle is an instance of the Picovoice runtime engine that detects utterances of wake phrase defined in the file located at keywordPath. Upon detection of wake word it starts inferring the user's intent from the follow-on voice command within the context defined by the file located at contextPath. keywordPath is the absolute path to Porcupine wake word engine keyword file (with .ppn extension). contextPath is the absolute path to Rhino Speech-to-Intent engine context file (with .rhn extension). wakeWordCallback is invoked upon the detection of wake phrase and inferenceCallback is invoked upon completion of follow-on voice command inference.

When instantiated, the required sample rate can be obtained via handle.getSampleRate(). The expected number of audio samples per frame is handle.getFrameLength(). The Picovoice engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] getNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

while(true)
{
    handle.process(getNextAudioFrame());
}

Once you're done with Picovoice, ensure you release its resources explicitly:

handle.delete();

Go

To install the Picovoice Go module to your project, use the command:

go get github.com/Picovoice/picovoice/sdk/go

To create an instance of the engine with default parameters, use the NewPicovoice function. You must provide a Porcupine keyword file, a wake word detection callback function, a Rhino context file and an inference callback function. You must then make a call to Init().

import (
    . "github.com/Picovoice/picovoice/sdk/go/v2"
    rhn "github.com/Picovoice/rhino/binding/go/v2"
)

const accessKey string = "${ACCESS_KEY}" // obtained from Picovoice Console (https://console.picovoice.ai/)

keywordPath := "/path/to/keyword/file.ppn"
wakeWordCallback := func() {
    // let user know wake word detected
}

contextPath := "/path/to/context/file.rhn"
inferenceCallback := func(inference rhn.RhinoInference) {
    if inference.IsUnderstood {
        intent := inference.Intent
        slots := inference.Slots
        // add code to take action based on inferred intent and slot values
    } else {
        // add code to handle unsupported commands
    }
}

picovoice := NewPicovoice(
    accessKey,
    keywordPath,
    wakeWordCallback,
    contextPath,
    inferenceCallback)

err := picovoice.Init()
if err != nil {
    // handle error
}

Upon detection of wake word defined by keywordPath it starts inferring user's intent from the follow-on voice command within the context defined by the file located at contextPath. accessKey is your Picovoice AccessKey. keywordPath is the absolute path to Porcupine wake word engine keyword file (with .ppn suffix). contextPath is the absolute path to Rhino Speech-to-Intent engine context file (with .rhn suffix). wakeWordCallback is invoked upon the detection of wake phrase and inferenceCallback is invoked upon completion of follow-on voice command inference.

When instantiated, valid sample rate can be obtained via SampleRate. Expected number of audio samples per frame is FrameLength. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

func getNextFrameAudio() []int16 {
    // get audio frame
}

for {
    err := picovoice.Process(getNextFrameAudio())
    if err != nil {
        // handle error
    }
}

When done, resources have to be released explicitly:

picovoice.Delete()

Unity

Import the latest Picovoice Unity Package into your Unity project.

The SDK provides two APIs:

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This is the quickest way to get started.

The constructor PicovoiceManager.Create will create an instance of the PicovoiceManager using the Porcupine keyword and Rhino context files that you pass to it.

using Pv.Unity;

PicovoiceManager _picovoiceManager = new PicovoiceManager(
                                "/path/to/keyword/file.ppn",
                                () => {},
                                "/path/to/context/file.rhn",
                                (inference) => {});

Once you have instantiated a PicovoiceManager, you can start/stop audio capture and processing by calling:

try
{
    _picovoiceManager.Start();
}
catch(Exception ex)
{
    Debug.LogError(ex.ToString());
}

// .. use picovoice

_picovoiceManager.Stop();

PicovoiceManager uses our unity-voice-processor Unity package to capture frames of audio and automatically pass it to the Picovoice platform.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into an already existing audio processing pipeline.

Picovoice is created by passing a Porcupine keyword file and Rhino context file to the Create static constructor.

using Pv.Unity;

try
{
    Picovoice _picovoice = Picovoice.Create(
                                "path/to/keyword/file.ppn",
                                OnWakeWordDetected,
                                "path/to/context/file.rhn",
                                OnInferenceResult);
}
catch (Exception ex)
{
    // handle Picovoice init error
}

To use Picovoice, you must pass frames of audio to the Process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

short[] buffer = GetNextAudioFrame();
try
{
    _picovoice.Process(buffer);
}
catch (Exception ex)
{
    Debug.LogError(ex.ToString());
}

For Process to work correctly, the provided audio must be single-channel and 16-bit linearly-encoded.
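To make the notion of "frames of audio" concrete, here is a minimal language-neutral sketch in Python that splits a buffer of single-channel 16-bit samples into fixed-length frames. The helper name `split_into_frames`, the frame length of 512 and the 16 kHz sample rate are assumptions for illustration only; in a real integration, query the engine for its required frame length and sample rate.

```python
from array import array


def split_into_frames(samples, frame_length=512):
    """Yield consecutive full frames of 16-bit mono samples.

    A trailing partial frame is dropped, because each call to the
    engine is expected to receive exactly `frame_length` samples.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        yield samples[start:start + frame_length]


# One second of silence at an assumed 16 kHz sample rate, stored as int16.
pcm = array("h", [0] * 16000)
frames = list(split_into_frames(pcm))
print(len(frames), len(frames[0]))
```

Each yielded frame would then be handed to Process in turn. Dropping the trailing partial frame is one design choice; buffering it until more audio arrives is another.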

Picovoice implements the IDisposable interface, so you can use Picovoice in a using block. If you don't use a using block, resources will be released by the garbage collector automatically, or you can explicitly release the resources like so:

_picovoice.Dispose();

Flutter

Add the Picovoice Flutter package to your pubspec.yaml.

dependencies:
  picovoice: ^<version>

The SDK provides two APIs:

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The static constructor PicovoiceManager.create will create an instance of a PicovoiceManager using a Porcupine keyword file and Rhino context file that you pass to it.

import 'package:picovoice/picovoice_manager.dart';
import 'package:picovoice/picovoice_error.dart';

final String accessKey = "{ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

void createPicovoiceManager() {
  _picovoiceManager = PicovoiceManager.create(
      accessKey,
      "/path/to/keyword/file.ppn",
      _wakeWordCallback,
      "/path/to/context/file.rhn",
      _inferenceCallback);
}

The wakeWordCallback and inferenceCallback parameters are functions that you want to execute when a wake word is detected and when an inference is made.

The inferenceCallback function receives a RhinoInference instance with the following properties:

  • isUnderstood - true if Rhino understood what it heard based on the context, false otherwise
  • intent - null if isUnderstood is false, otherwise the name of the intent that was inferred
  • slots - null if isUnderstood is false, otherwise the dictionary of slot keys and values that were inferred
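The three fields above suggest a simple handling pattern: check isUnderstood first, and only then read intent and slots. The Python sketch below illustrates that pattern with a hypothetical `Inference` stand-in class; it is not the SDK's RhinoInference type, and `describe` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Inference:
    """Hypothetical stand-in mirroring the three fields described above."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Optional[Dict[str, str]] = None


def describe(inference: Inference) -> str:
    """Turn an inference into a short human-readable summary."""
    if not inference.is_understood:
        # intent and slots are null in this case, so do not touch them.
        return "command not understood"
    slots = inference.slots or {}
    details = ", ".join(f"{key}={value}" for key, value in sorted(slots.items()))
    return f"{inference.intent}({details})"


understood = Inference(True, "changeColor", {"location": "living room", "color": "blue"})
print(describe(understood))
print(describe(Inference(False)))
```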

Once you have instantiated a PicovoiceManager, you can start/stop audio capture and processing by calling:

await _picovoiceManager.start();
// .. use for detecting wake words and commands
await _picovoiceManager.stop();

Our flutter_voice_processor Flutter plugin handles audio capture and passes frames to Picovoice for you.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into an already existing audio processing pipeline.

Picovoice is created by passing a Porcupine keyword file and Rhino context file to the create static constructor. Sensitivity, model files and requireEndpoint are optional.

import 'package:picovoice/picovoice_manager.dart';
import 'package:picovoice/picovoice_error.dart';

final String accessKey = "{ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)

void createPicovoice() async {
    double porcupineSensitivity = 0.7;
    double rhinoSensitivity = 0.6;
    bool requireEndpoint = false;
    try {
        _picovoice = await Picovoice.create(
            accessKey,
            "/path/to/keyword/file.ppn",
            wakeWordCallback,
            "/path/to/context/file.rhn",
            inferenceCallback,
            porcupineSensitivity,
            rhinoSensitivity,
            "/path/to/porcupine/model.pv",
            "/path/to/rhino/model.pv",
            requireEndpoint);
    } on PicovoiceException catch (err) {
        // handle picovoice init error
    }
}

To use Picovoice, just pass frames of audio to the process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

List<int> buffer = getAudioFrame();

try {
    _picovoice.process(buffer);
} on PicovoiceException catch (error) {
    // handle error
}

// once you are done using Picovoice
_picovoice.delete();

React Native

First add our React Native modules to your project via yarn or npm:

yarn add @picovoice/react-native-voice-processor
yarn add @picovoice/porcupine-react-native
yarn add @picovoice/rhino-react-native
yarn add @picovoice/picovoice-react-native

The @picovoice/picovoice-react-native package exposes a high-level and a low-level API for integrating Picovoice into your application.

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The static constructor PicovoiceManager.create will create an instance of a PicovoiceManager using a Porcupine keyword file and Rhino context file that you pass to it.

const accessKey = "${ACCESS_KEY}"; // obtained from Picovoice Console (https://console.picovoice.ai/)

this._picovoiceManager = PicovoiceManager.create(
    accessKey,
    '/path/to/keyword/file.ppn',
    wakeWordCallback,
    '/path/to/context/file.rhn',
    inferenceCallback);

The wakeWordCallback and inferenceCallback parameters are functions that you want to execute when a wake word is detected and when an inference is made.

Once you have instantiated a PicovoiceManager, you can start/stop audio capture and processing by calling:

try {
  let didStart = await this._picovoiceManager.start();
} catch(err) { }
// .. use for detecting wake words and commands
let didStop = await this._picovoiceManager.stop();

The @picovoice/react-native-voice-processor module handles audio capture and passes frames to Picovoice for you.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into an already existing audio processing pipeline.

Picovoice is created by passing a Porcupine keyword file and Rhino context file to the create static constructor. Sensitivity and model files are optional.

const accessKey = "${ACCESS_KEY}"; // obtained from Picovoice Console (https://console.picovoice.ai/)

async createPicovoice() {
    let porcupineSensitivity = 0.7;
    let rhinoSensitivity = 0.6;
    let requireEndpoint = false;

    try {
        this._picovoice = await Picovoice.create(
            accessKey,
            '/path/to/keyword/file.ppn',
            wakeWordCallback,
            '/path/to/context/file.rhn',
            inferenceCallback,
            processErrorCallback,
            porcupineSensitivity,
            rhinoSensitivity,
            "/path/to/porcupine/model.pv",
            "/path/to/rhino/model.pv",
            requireEndpoint);
    } catch (err) {
        // handle error
    }
}

To use Picovoice, just pass frames of audio to the process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

let buffer = getAudioFrame();

try {
    await this._picovoice.process(buffer);
} catch (e) {
    // handle error
}

// once you are done
this._picovoice.delete();

Android

Picovoice can be found on Maven Central. To include the package in your Android project, ensure you have included mavenCentral() in your top-level build.gradle file and then add the following to your app's build.gradle:

dependencies {
    // ...
    implementation 'ai.picovoice:picovoice-android:${LATEST_VERSION}'
}

There are two possibilities for integrating Picovoice into an Android application.

High-Level API

PicovoiceManager provides a high-level API for integrating Picovoice into Android applications. It manages all activities related to creating an input audio stream, feeding it into Picovoice engine, and invoking user-defined callbacks upon wake word detection and inference completion.

final String accessKey = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console (https://console.picovoice.ai/)
final String keywordPath = "/path/to/keyword.ppn"; // path relative to 'assets' folder
final String contextPath = "/path/to/context.rhn"; // path relative to 'assets' folder

PicovoiceManager manager = new PicovoiceManager.Builder()
    .setAccessKey(accessKey)
    .setKeywordPath(keywordPath)
    .setWakeWordCallback(new PicovoiceWakeWordCallback() {
        @Override
        public void invoke() {
            // logic to execute upon detection of wake word
        }
    })
    .setContextPath(contextPath)
    .setInferenceCallback(new PicovoiceInferenceCallback() {
        @Override
        public void invoke(final RhinoInference inference) {
            // logic to execute upon completion of intent inference
        }
    })
    .build(appContext);

Keyword (.ppn) and context (.rhn) files should be placed under the Android project assets folder (src/main/assets/).

The appContext parameter is the Android application context - this is used to extract Picovoice resources from the APK.

When initialized, input audio can be processed using:

manager.start();

Stop the manager with:

manager.stop();

Low-Level API

Picovoice.java provides a low-level binding for Android. It can be initialized as follows:

import ai.picovoice.picovoice.*;

try {
    Picovoice picovoice = new Picovoice.Builder()
        .setPorcupineModelPath("/path/to/porcupine/model.pv")
        .setKeywordPath("/path/to/keyword.ppn")
        .setPorcupineSensitivity(0.7f)
        .setWakeWordCallback(new PicovoiceWakeWordCallback() {
            @Override
            public void invoke() {
                // logic to execute upon detection of wake word
            }
        })
        .setRhinoModelPath("/path/to/rhino/model.pv")
        .setContextPath("/path/to/context.rhn")
        .setRhinoSensitivity(0.55f)
        .setInferenceCallback(new PicovoiceInferenceCallback() {
            @Override
            public void invoke(final RhinoInference inference) {
                // logic to execute upon completion of intent inference
            }
        })
        .build(appContext);
} catch(PicovoiceException ex) { }

Keyword (.ppn), context (.rhn) and model (.pv) files should be placed under the Android project assets folder (src/main/assets/).

Once initialized, picovoice can be used to process incoming audio.

private short[] getNextAudioFrame();

while (true) {
    try {
        picovoice.process(getNextAudioFrame());
    } catch (PicovoiceException e) {
        // error handling logic
    }
}

Finally, be sure to explicitly release resources acquired as the binding class does not rely on the garbage collector for releasing native resources:

picovoice.delete();

iOS

The Picovoice iOS SDK is available via CocoaPods. To import it into your iOS project, install CocoaPods and add the following line to your Podfile:

pod 'Picovoice-iOS'

There are two possibilities for integrating Picovoice into an iOS application.

High-Level API

PicovoiceManager class manages all activities related to creating an audio input stream, feeding it into Picovoice engine, and invoking user-defined callbacks upon wake word detection and completion of intent inference. The class can be initialized as below:

import Picovoice

let accessKey = "${ACCESS_KEY}" // obtained from Picovoice Console (https://console.picovoice.ai/)

let manager = PicovoiceManager(
    accessKey: accessKey,
    keywordPath: "/path/to/keyword.ppn",
    onWakeWordDetection: {
        // logic to execute upon detection of wake word
    },
    contextPath: "/path/to/context.rhn",
    onInference: { inference in
        // logic to execute upon completion of intent inference
    })

Once initialized, input audio can be processed using manager.start(). Processing can be stopped using manager.stop().

Low-Level API

Picovoice.swift provides an API for passing audio from your own audio pipeline into the Picovoice Platform for wake word detection and intent inference.

To construct an instance, you'll need to provide a Porcupine keyword file (.ppn), a Rhino context file (.rhn) and callbacks for when the wake word is detected and an inference is made. Sensitivity and model parameters are optional.

import Picovoice

let accessKey = "${ACCESS_KEY}" // obtained from Picovoice Console (https://console.picovoice.ai/)

do {
    let picovoice = try Picovoice(
        accessKey: accessKey,
        keywordPath: "/path/to/keyword.ppn",
        porcupineSensitivity: 0.4,
        porcupineModelPath: "/path/to/porcupine/model.pv",
        onWakeWordDetection: {
            // logic to execute upon detection of wake word
        },
        contextPath: "/path/to/context.rhn",
        rhinoSensitivity: 0.7,
        rhinoModelPath: "/path/to/rhino/model.pv",
        onInference: { inference in
            // logic to execute upon completion of intent inference
        })
} catch { }

Once initialized, picovoice can be used to process incoming audio. The underlying logic of the class will handle switching between wake word detection and intent inference, as well as invoking the associated events.

func getNextAudioFrame() -> [Int16] {
    // .. get audioFrame
    return audioFrame;
}

while (true) {
    do {
        try picovoice.process(getNextAudioFrame());
    } catch { }
}
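The switching between wake word detection and intent inference mentioned above can be pictured as a two-state machine. The Python sketch below is a conceptual model only: `TwoStageEngine` and the lambda detectors are hypothetical stand-ins, not the SDK's implementation, and real frames would be arrays of samples rather than strings.

```python
class TwoStageEngine:
    """Hypothetical model of the wake word -> inference hand-off."""

    def __init__(self, detect_wake_word, infer, on_wake_word, on_inference):
        self._detect_wake_word = detect_wake_word  # frame -> bool
        self._infer = infer                        # frame -> result or None
        self._on_wake_word = on_wake_word
        self._on_inference = on_inference
        self._awake = False

    def process(self, frame):
        if not self._awake:
            # Stage 1: only the wake word detector sees the audio.
            if self._detect_wake_word(frame):
                self._awake = True
                self._on_wake_word()
        else:
            # Stage 2: frames go to intent inference until it finalizes.
            result = self._infer(frame)
            if result is not None:
                self._awake = False
                self._on_inference(result)


events = []
engine = TwoStageEngine(
    detect_wake_word=lambda frame: frame == "wake",
    infer=lambda frame: "turnOn" if frame == "end" else None,
    on_wake_word=lambda: events.append("wake word"),
    on_inference=lambda result: events.append(result),
)
for frame in ["noise", "wake", "speech", "end", "noise"]:
    engine.process(frame)
print(events)
```

After an inference is delivered, the model returns to listening for the wake word, which is why the final "noise" frame triggers nothing.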

Once you're done with an instance of Picovoice, you can force it to release its native resources rather than waiting for the instance to be deallocated:

picovoice.delete();

Web

Install the Web SDK using yarn:

yarn add @picovoice/picovoice-web

or using npm:

npm install --save @picovoice/picovoice-web

Create an instance of the engine using PicovoiceWorker and run on an audio input stream:

import { PicovoiceWorker } from "@picovoice/picovoice-web";

function wakeWordCallback(detection: PorcupineDetection) {
  console.log(`Porcupine detected keyword: ${detection.label}`);
}

function inferenceCallback(inference: RhinoInference) {
  if (inference.isFinalized) {
    if (inference.isUnderstood) {
      console.log(inference.intent)
      console.log(inference.slots)
    }
  }
}

function getAudioData(): Int16Array {
  // ... function to get audio data
  return new Int16Array();
}

const picovoice = await PicovoiceWorker.create(
  "${ACCESS_KEY}",
  keyword,
  wakeWordCallback,
  porcupineModel,
  context,
  inferenceCallback,
  rhinoModel
);

for (; ;) {
  picovoice.process(getAudioData());
  // break on some condition
}

Replace ${ACCESS_KEY} with yours obtained from Picovoice Console.

When done, release the resources allocated to Picovoice using picovoice.release().

Angular

yarn add @picovoice/picovoice-angular @picovoice/web-voice-processor

(or)

npm install @picovoice/picovoice-angular @picovoice/web-voice-processor

import { Subscription } from "rxjs"
import { PicovoiceService } from "@picovoice/picovoice-angular"

...

constructor(private picovoiceService: PicovoiceService) {
  this.wakeWordDetectionSubscription = picovoiceService.wakeWordDetection$.subscribe(
          (wakeWordDetection: PorcupineDetection) => {
            this.inference = null;
            this.wakeWordDetection = wakeWordDetection;
          }
  );

  this.inferenceSubscription = picovoiceService.inference$.subscribe(
          (inference: RhinoInference) => {
            this.wakeWordDetection = null;
            this.inference = inference;
          }
  );

  this.contextInfoSubscription = picovoiceService.contextInfo$.subscribe(
          (contextInfo: string | null) => {
            this.contextInfo = contextInfo;
          }
  );

  this.isLoadedSubscription = picovoiceService.isLoaded$.subscribe(
          (isLoaded: boolean) => {
            this.isLoaded = isLoaded;
          }
  );
  this.isListeningSubscription = picovoiceService.isListening$.subscribe(
          (isListening: boolean) => {
            this.isListening = isListening;
          }
  );
  this.errorSubscription = picovoiceService.error$.subscribe(
          (error: string | null) => {
            this.error = error;
          }
  );
}

async ngOnInit() {
    try {
      await this.picovoiceService.init(
              accessKey,
              porcupineKeyword,
              porcupineModel,
              rhinoContext,
              rhinoModel
      );
    }
    catch (error) {
      console.error(error)
    }
}

ngOnDestroy() {
  this.wakeWordDetectionSubscription.unsubscribe();
  this.inferenceSubscription.unsubscribe();
  this.contextInfoSubscription.unsubscribe();
  this.isLoadedSubscription.unsubscribe();
  this.isListeningSubscription.unsubscribe();
  this.errorSubscription.unsubscribe();
  this.picovoiceService.release();
}

React

yarn add @picovoice/picovoice-react @picovoice/web-voice-processor

(or)

npm install @picovoice/picovoice-react @picovoice/web-voice-processor

import { useEffect } from 'react';
import { usePicovoice } from '@picovoice/picovoice-react';

function App(props) {
  const {
    wakeWordDetection,
    inference,
    contextInfo,
    isLoaded,
    isListening,
    error,
    init,
    start,
    stop,
    release,
  } = usePicovoice();

  const initEngine = async () => {
    await init(
            "${ACCESS_KEY}",
            porcupineKeyword,
            porcupineModel,
            rhinoContext,
            rhinoModel
    );
    await start();
  }

  useEffect(() => {
    if (wakeWordDetection !== null) {
      console.log(`Picovoice detected keyword: ${wakeWordDetection.label}`);
    }
  }, [wakeWordDetection])

  useEffect(() => {
    if (inference !== null) {
      if (inference.isUnderstood) {
        console.log(inference.intent)
        console.log(inference.slots)
      }
    }
  }, [inference])
}

Vue

yarn add @picovoice/picovoice-vue @picovoice/web-voice-processor

(or)

npm install @picovoice/picovoice-vue @picovoice/web-voice-processor

<script lang='ts'>
import { usePicovoice } from '@picovoice/picovoice-vue';

export default {
  data() {
    const {
      state,
      init,
      start,
      stop,
      release
    } = usePicovoice();

    init(
      "${ACCESS_KEY}",
      {
        label: "Picovoice",
        publicPath: "picovoice_wasm.ppn",
      },
      { publicPath: "porcupine_params.pv" },
      { publicPath: "clock_wasm.rhn" },
      { publicPath: "rhino_params.pv" },
    );

    return {
      state,
      start,
      stop,
      release
    }
  },
  watch: {
    "state.wakeWordDetection": function(wakeWord) {
      if (wakeWord !== null) {
        console.log(wakeWord)
      }
    },
    "state.inference": function(inference) {
      if (inference !== null) {
        console.log(inference)
      }
    },
    "state.contextInfo": function(contextInfo) {
      if (contextInfo !== null) {
        console.log(contextInfo)
      }
    },
    "state.isLoaded": function(isLoaded) {
      console.log(isLoaded)
    },
    "state.isListening": function(isListening) {
      console.log(isListening)
    },
    "state.error": function(error) {
      console.error(error)
    },
  },
  onBeforeDestroy() {
    this.release();
  },
};
</script>

Rust

To add the picovoice library into your app, add picovoice to your app's Cargo.toml manifest:

[dependencies]
picovoice = "*"

To create an instance of the engine with default parameters, use the PicovoiceBuilder function. You must provide a Porcupine keyword file, a wake word detection callback function, a Rhino context file and an inference callback function. You must then make a call to init():

use picovoice::{rhino::RhinoInference, PicovoiceBuilder};

let wake_word_callback = || {
    // let user know wake word detected
};
let inference_callback = |inference: RhinoInference| {
    if inference.is_understood {
        let intent = inference.intent.unwrap();
        let slots = inference.slots;
        // add code to take action based on inferred intent and slot values
    } else {
        // add code to handle unsupported commands
    }
};

let mut picovoice = PicovoiceBuilder::new(
    keyword_path,
    wake_word_callback,
    context_path,
    inference_callback,
).init().expect("Failed to create picovoice");

Upon detecting the wake word defined by keyword_path, the engine starts inferring the user's intent from the follow-on voice command within the context defined by the file located at context_path. keyword_path is the absolute path to the Porcupine wake word engine keyword file (with .ppn suffix). context_path is the absolute path to the Rhino Speech-to-Intent engine context file (with .rhn suffix). wake_word_callback is invoked upon detection of the wake phrase, and inference_callback is invoked upon completion of the follow-on voice command inference.

Once instantiated, the required sample rate can be obtained via sample_rate(), and the expected number of audio samples per frame via frame_length(). The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio:

fn next_audio_frame() -> Vec<i16> {
    // get audio frame
}

loop {
    picovoice.process(&next_audio_frame()).expect("Picovoice failed to process audio");
}
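Before feeding recorded audio to the engine, it can help to confirm that a file actually meets the format requirements stated above (single channel, 16-bit PCM, matching sample rate). The following Python sketch does this for WAV data with the standard-library `wave` module. The helper names and the default of 16000 Hz are assumptions for illustration; the authoritative value is whatever sample_rate() reports.

```python
import io
import wave


def check_wav_format(file_obj, expected_sample_rate=16000):
    """Return a list of problems that would make the audio unsuitable."""
    problems = []
    with wave.open(file_obj, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, got {wav.getnchannels()}")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != expected_sample_rate:
            problems.append(f"expected {expected_sample_rate} Hz, got {wav.getframerate()} Hz")
    return problems


def make_wav(channels, sample_width, sample_rate, num_frames=160):
    """Build a small silent WAV in memory for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (channels * sample_width * num_frames))
    buffer.seek(0)
    return buffer


print(check_wav_format(make_wav(1, 2, 16000)))
print(check_wav_format(make_wav(2, 2, 44100)))
```

The first call reports no problems; the second lists both the channel count and the sample rate mismatch.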

C

Picovoice is implemented in ANSI C and therefore can be directly linked to C applications. Its public header file (sdk/c/include/pv_picovoice.h) contains relevant information. An instance of the Picovoice object can be constructed as follows.

const char *access_key = "${ACCESS_KEY}"; // AccessKey string obtained from Picovoice Console (https://console.picovoice.ai/)

const char *porcupine_model_path = ... // Available at resources/porcupine/lib/common/porcupine_params.pv
const char *keyword_path = ...
const float porcupine_sensitivity = 0.5f;

const char *rhino_model_path = ... // Available at resources/rhino/lib/common/rhino_params.pv
const char *context_path = ...
const float rhino_sensitivity = 0.5f;
const bool require_endpoint = true;

static void wake_word_callback(void) {
    // take action upon detection of wake word
}

static void inference_callback(pv_inference_t *inference) {
    // `inference` exposes three immutable properties:
    // (1) `IsUnderstood`
    // (2) `Intent`
    // (3) `Slots`

    // take action based on inferred intent
    pv_inference_delete(inference);
}

pv_picovoice_t *handle = NULL;

pv_status_t status = pv_picovoice_init(
        access_key,
        porcupine_model_path,
        keyword_path,
        porcupine_sensitivity,
        wake_word_callback,
        rhino_model_path,
        context_path,
        rhino_sensitivity,
        require_endpoint,
        inference_callback,
        &handle);

if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating-point number within [0, 1]. A higher sensitivity reduces miss rate (false reject rate) at cost of increased false alarm rate.
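The trade-off can be illustrated with a toy detector in Python. This is a sketch under a loud assumption: it pretends the detector fires whenever a score is at least `1 - sensitivity`. How the real engine maps sensitivity to its internal decision is not documented here, and the scores below are made up.

```python
def rates(scores_positive, scores_negative, sensitivity):
    """Return (miss_rate, false_alarm_rate) for a toy detector.

    Assumption for illustration only: the detector fires when a score
    is at least `1 - sensitivity`. The real engine's mapping is internal.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be within [0, 1]")
    threshold = 1.0 - sensitivity
    misses = sum(1 for s in scores_positive if s < threshold)
    false_alarms = sum(1 for s in scores_negative if s >= threshold)
    return misses / len(scores_positive), false_alarms / len(scores_negative)


# Made-up scores: utterances of the wake word vs. other audio.
wake_word_scores = [0.9, 0.8, 0.65, 0.45, 0.3]
other_scores = [0.6, 0.4, 0.2, 0.1, 0.05]

for sensitivity in (0.25, 0.5, 0.75):
    miss, false_alarm = rates(wake_word_scores, other_scores, sensitivity)
    print(f"sensitivity={sensitivity}: miss={miss:.1f}, false alarm={false_alarm:.1f}")
```

On this toy data, raising the sensitivity lowers the miss rate while the false alarm rate climbs, which is the behaviour the paragraph above describes.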

handle is an instance of Picovoice runtime engine that detects utterances of the wake phrase provided by keyword_path. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined in context_path. wake_word_callback is invoked upon the detection of wake phrase and inference_callback is invoked upon completion of follow-on voice command inference.

Picovoice accepts single-channel, 16-bit PCM audio. The sample rate can be retrieved using pv_sample_rate(). Picovoice accepts input audio in consecutive chunks (aka frames); the length of each frame can be retrieved using pv_porcupine_frame_length().

extern const int16_t *get_next_audio_frame(void);

while (true) {
    const int16_t *pcm = get_next_audio_frame();
    const pv_status_t status = pv_picovoice_process(handle, pcm);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }
}

Finally, when done be sure to release the acquired resources.

pv_picovoice_delete(handle);

Microcontroller

Picovoice is implemented in ANSI C and therefore can be directly linked to embedded C projects. Its public header file contains relevant information. An instance of the Picovoice object can be constructed as follows:

#define MEMORY_BUFFER_SIZE ...
static uint8_t memory_buffer[MEMORY_BUFFER_SIZE] __attribute__((aligned(16)));

static const uint8_t *keyword_array = ...
const float porcupine_sensitivity = 0.5f;

static void wake_word_callback(void) {
    // logic to execute upon detection of wake word
}

static const uint8_t *context_array = ...
const float rhino_sensitivity = 0.75f;

static void inference_callback(pv_inference_t *inference) {
    // `inference` exposes three immutable properties:
    // (1) `IsUnderstood`
    // (2) `Intent`
    // (3) `Slots`
    // ..
    pv_inference_delete(inference);
}

pv_picovoice_t *handle = NULL;

const pv_status_t status = pv_picovoice_init(
        MEMORY_BUFFER_SIZE,
        memory_buffer,
        sizeof(keyword_array),
        keyword_array,
        porcupine_sensitivity,
        wake_word_callback,
        sizeof(context_array),
        context_array,
        rhino_sensitivity,
        inference_callback,
        &handle);

if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating-point number within [0, 1]. A higher sensitivity reduces miss rate (false reject rate) at cost of increased false alarm rate.

handle is an instance of Picovoice runtime engine that detects utterances of wake phrase defined in keyword_array. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined in context_array. wake_word_callback is invoked upon the detection of wake phrase and inference_callback is invoked upon completion of follow-on voice command inference.

Picovoice accepts single-channel, 16-bit PCM audio. The sample rate can be retrieved using pv_sample_rate(). Picovoice accepts input audio in consecutive chunks (aka frames); the length of each frame can be retrieved using pv_porcupine_frame_length().

extern const int16_t *get_next_audio_frame(void);

while (true) {
    const int16_t *pcm = get_next_audio_frame();
    const pv_status_t status = pv_picovoice_process(handle, pcm);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }
}

Finally, when done be sure to release the acquired resources.

pv_picovoice_delete(handle);

Releases

v3.0.0 - October 26th, 2023

  • Improvements to error reporting
  • Upgrades to authorization and authentication system
  • Added reset() function to API
  • PicovoiceManager classes can now access context information without a call to start()
  • Added Farsi support for microcontrollers
  • Various bug fixes and improvements
  • Node min support bumped to 16
  • Unity editor min support bumped to 2021
  • Patches to .NET support

v2.2.0 - April 12th, 2023

  • Added language support for Arabic, Dutch, Hindi, Mandarin, Polish, Russian, Swedish and Vietnamese
  • Added support for .NET 7.0 and fixed support for .NET Standard 2.0
  • iOS minimum support moved to 11.0
  • Improved stability and performance

v2.1.0 - January 20th, 2022

  • macOS arm64 (Apple Silicon) support added for Java and Unity SDKs
  • Various bug fixes and improvements

v2.0.0 - November 25th, 2021

  • Improved accuracy
  • Added Rust SDK
  • macOS arm64 support
  • Added NodeJS support for Windows, NVIDIA Jetson Nano, and BeagleBone
  • Added .NET support for NVIDIA Jetson Nano and BeagleBone
  • Runtime optimization

v1.1.0 - December 2nd, 2020

  • Improved accuracy
  • Runtime optimizations
  • .NET SDK
  • Java SDK
  • React Native SDK
  • C SDK

v1.0.0 - October 22nd, 2020

  • Initial release

FAQ

You can find the FAQ here.

leopard's People

Contributors

albho, dejaydev, dependabot[bot], erismik, kenarsa, ksyeo1010, laves, mrrostam

leopard's Issues

Leopard Issue: [LeopardIOError: Process failed.]

Expected behaviour

I am selecting an MP3 or WAV audio file with expo-document-picker and passing its absolute path to the processFile method of Leopard. I expect it to transcribe the audio file and return the transcript and words after processing. The same file works on web, but it does not work on React Native.

Actual behaviour

It throws a [LeopardIOError: Process failed.] error, which is not clear enough for me to understand what I am doing wrong.

Steps to reproduce the behaviour

import { StatusBar } from 'expo-status-bar';
import { StyleSheet,Text, View,Button } from 'react-native';
import { Leopard } from '@picovoice/leopard-react-native'
import { useEffect, useRef, useCallback, useState } from 'react';
import * as DocumentPicker from 'expo-document-picker';

const accessKey = "<accessKey>";

export default function App() {
  const [fileResponse, setFileResponse] = useState([]);
  const leopard = useRef();

  useEffect(() => {
    createLeopardInstance();

    return () => {
      deleteLeopardInstance();
    }
  },[]);

  const createLeopardInstance = async () => {
    try {
      leopard.current = await Leopard.create(accessKey, "models/leopard_params.pv", {enableAutomaticPunctuation: true});
    } catch (err) {
      if (err) {
        // handle error
        console.log("===",err);
      }
    }
  }

  const deleteLeopardInstance = async () => {
    if(!leopard.current) return;

    await leopard.current.delete();
  }

  const transcribeAudio = async (path) => {
    try {
      const { transcript, words } = await leopard.current.processFile(path);
      console.log(transcript,words)
    } catch (err) {
      if (err) {
        // handle error
        console.log("===",err);
      }
    }
  }

  const handleDocumentSelection = useCallback(async () => {
    try {
      const response = await DocumentPicker.getDocumentAsync({
        presentationStyle: 'fullScreen',
      });
      setFileResponse(response);
      transcribeAudio(response.assets[0].uri);
    } catch (err) {
      console.warn(err);
    }
  }, []);

  return (
    <View style={styles.container}>
      <Text>Hello</Text>
      <Button title="Select 📑" onPress={handleDocumentSelection} />
      <StatusBar style="auto" />
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#fff',
    alignItems: 'center',
    justifyContent: 'center',
  },
});

picovoice.h uses visibility attribute not available in MSVC

Currently, picovoice.h has

#define PV_API __attribute__((visibility("default")))

This does not work with MSVC. There should be an additional switch for MSVC that uses either __declspec(dllimport) or __declspec(dllexport), depending on whether it is for your internal build of the libraries or for the consumable client build.

Turning words to numbers and avoid saving a file

For a project, I tried using pvleopard but had to ultimately decide against it. This was because of two issues:

  1. It would take the user saying 'forty four' and return the words 'forty four' instead of the number.
  2. I could only use it for a limited amount of time or I would have to use speech recognition to save the file and then process the wav file.

I was wondering if there were any ways around these issues. For the second one, I would like to avoid using speech recognition but would need the program to stop listening when the user has stopped talking, not after a set amount of time. Sorry if this isn't the right place for this.
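For the first point, one possible workaround is to post-process the transcript. The Python sketch below is a hypothetical helper, not a Leopard feature: it converts English number words below one thousand into an integer and does not validate word order.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def words_to_number(text):
    """Convert English number words below one thousand to an int.

    Returns None if the text is empty or contains a word that is not
    a recognized number word. Word order is not validated.
    """
    words = text.lower().replace("-", " ").split()
    if not words:
        return None
    total = 0
    for word in words:
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        elif word == "hundred":
            # "hundred" alone counts as one hundred.
            total = max(total, 1) * 100
        elif word != "and":
            return None
    return total


print(words_to_number("forty four"))
print(words_to_number("two hundred and five"))
```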

[Error] license file belongs to a different version of the library

Hi Sir,

I am using Ubuntu 14.04 LTS to try your demo. I compiled it using gcc -I include/ -O3 demo/c/leopard_demo.c -ldl -o leopard_demo. However, I ran into a problem when I tried to run the following:
./leopard_demo
./lib/linux/x86_64/libpv_leopard.so
./lib/common/acoustic_model.pv
./lib/common/language_model.pv
./resources/license/leopard_eval_linux.lic
./resources/audio_samples/test.wav

It gave me the following error:

[Error] license file belongs to a different version of the library

Is there any way to resolve this?

Only supports English?

When I run this project, I find that only English is accepted. Is multi-language support not available?

libpv_leopard.so: wrong ELF class: ELFCLASS64 - Python

I've just reinstalled pvleopard for Python 3.9 on my Raspberry Pi 4. Upon attempting to create a leopard object, I receive an error. It worked fine prior to the reinstallation, but I'm now running v1.2. I tried downgrading to v1.1 and even v1.0, but I just get the same error.

import pvleopard

with open('access_key.txt', 'r') as file:
    key = file.read().strip()  # drop any trailing newline so it is not part of the key

leopard = pvleopard.create(access_key=key)

Here's the error I receive.

Traceback (most recent call last):
  File "/home/pi/Desktop/Project/test.py", line 6, in <module>
    leopard = pvleopard.create(access_key=key)
  File "/home/pi/Desktop/Project/venv/lib/python3.9/site-packages/pvleopard/__init__.py", line 47, in create
    return Leopard(access_key=access_key, library_path=library_path, model_path=model_path)
  File "/home/pi/Desktop/Project/venv/lib/python3.9/site-packages/pvleopard/leopard.py", line 117, in __init__
    library = cdll.LoadLibrary(library_path)
  File "/usr/lib/python3.9/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/pi/Desktop/Project/venv/lib/python3.9/site-packages/pvleopard/lib/raspberry-pi/cortex-a72-aarch64/libpv_leopard.so: wrong ELF class: ELFCLASS64

From the error message, I can tell the problem is with the library version, as my platform is 32-bit. Would you mind helping me figure out why pip is installing the wrong version and how I can specify the version that I want installed? Thanks very much.
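
A quick way to confirm the mismatch is to compare what the kernel reports with the pointer size of the running interpreter. This is a minimal sketch using only the standard library. The interpretation (a 64-bit kernel with a 32-bit Python) is an assumption about this setup, not something the traceback proves.

```python
import platform
import struct


def interpreter_bits():
    """Pointer size of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


def describe_runtime():
    """Collect the values that matter when a native library fails to load."""
    return {
        # Architecture reported by the kernel, e.g. 'aarch64' or 'armv7l'
        "machine": platform.machine(),
        # 32 or 64, depending on how this Python binary was built
        "interpreter_bits": interpreter_bits(),
    }


if __name__ == "__main__":
    print(describe_runtime())
```

If `machine` reports a 64-bit architecture while `interpreter_bits` is 32, a 32-bit process is being handed a 64-bit shared object, which matches the ELFCLASS64 message.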

Leopard Issue: Golang - Panic when no text was transcribed

Go 1.18.2, v1.1.1 leopard bindings

Expected behaviour

When the Process function is called with audio data that contains no transcribable speech, it should return an empty string or otherwise indicate that no text was transcribed.

Actual behaviour

A panic happens and the program crashes, here is the trace https://pastebin.com/raw/PKzqVmnj

A defer-recover function could be used to prevent a crash, but that is not ideal.

Steps to reproduce the behaviour

Call the Process function with audio data that does not have any transcribe-able voice in it.

requires SoundFile-0.10.*

Dear Picovoice,
Please mention in the documentation that the Python test requires the SoundFile module to be installed:

python2 -m pip install SoundFile
python3 -m pip install SoundFile

Russian language support

Is your feature request related to a problem? Please describe.
Leopard doesn't support the Russian language.

Describe the solution you'd like
Leopard being able to recognize Russian

Additional context
I know support for additional languages is planned, but there is no clear way to be notified when a particular language becomes available. If that's OK, maybe this FR could be used for that: those interested could subscribe, and when support for Russian lands, this FR gets closed and all subscribers are automatically notified by GitHub : )

And thank you for your work! 🤘

Leopard Issue: OSError: exception: stack overflow

Expected behaviour

Leopard outputs transcribed text from audio.

Actual behaviour

Leopard throws an OSError.

Steps to reproduce the behaviour

  1. pip3 install pvleoparddemo
  2. leopard_demo_file --access_key ${ACCESS_KEY} --audio_paths ${AUDIO_PATH}
  3. Wait

Other Information

I cannot provide the audio file I am using, sorry.
I have tried a different audio file and Leopard does work correctly for it.
The audio file I am using is not broken, and behaves normally in any other circumstance I've observed.

Full traceback:
Traceback (most recent call last):
  File "C:\Users\jgogo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\jgogo\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\jgogo\AppData\Local\Programs\Python\Python310\Scripts\leopard_demo_file.exe\__main__.py", line 7, in <module>
  File "C:\Users\jgogo\AppData\Local\Programs\Python\Python310\lib\site-packages\pvleoparddemo\leopard_demo_file.py", line 36, in main
    transcript, words = o.process_file(audio_path)
  File "C:\Users\jgogo\AppData\Local\Programs\Python\Python310\lib\site-packages\pvleopard\leopard.py", line 253, in process_file
    status = self._process_file_func(
OSError: exception: stack overflow

Transcripts with Word Alignments

Hello,

Is it possible to output word alignments, similar to Mozilla's DeepSpeech, when using Leopard?

I tried looking around the docs and can't seem to find it. If it is not possible, do you have any suggestions on how I can achieve it using Picovoice?

Thank you!

iOS and android libraries

Hey guys, I previously asked about a Flutter plug-in and I know it doesn't exist.

But what about the iOS and Android libraries? How can I access them?

Leopard Documentation Issue

Hi,

I have a question regarding the accessKey. As I understand it from the docs, the key needs to be present in the client to use the API. But that makes it easily extractable, especially in web usage. How can it be kept secret if it needs to be sent to every user in a production environment?

Thanks

Question

I don't understand the free version, can I use it in a web app, no matter how users will access the web app?

Can't install lib

I have some problems installing the library.

I used npm to install it and received this error:

➜  TestPicoVoice npm install @picovoice/web-voice-processor @picovoice/cheetah-web
npm ERR! code 127
npm ERR! path /Directory/Workspace/TestPicoVoice/node_modules/@picovoice/cheetah-web
npm ERR! command failed
npm ERR! command sh -c yarn copywasm
npm ERR! sh: yarn: command not found

npm ERR! A complete log of this run can be found in:
npm ERR!    /Directory/.npm/_logs/2022-11-07T13_54_11_535Z-debug-0.log

How can I fix it? Thanks guys!
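
The log shows the install script calling yarn, which the shell could not find. The following is a minimal sketch for checking which of the required commands are missing from PATH before installing; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(names):
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["npm", "yarn"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required commands are available.")
```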

Questions about custom vocabulary and keywords boosting features

Hello, I have questions about the custom vocabulary and keyword boosting features.
Are there limitations?

I have two use cases, one with 1K custom words and the other with 10K. Is this still a viable approach accuracy- and performance-wise, or should I consider an alternative or train my own model?
The same question applies to Cheetah as well!

Thanks

Ability to control length of subtitles with Leopard

I'm using Leopard to create subtitles for a project. The problem is that each subtitle contains too many words and is too long, and I don't see any way to control this.
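
One way to control cue length is to regroup the per-word output after transcription. The following is a minimal sketch, assuming each word carries a start and end time. The `Word` tuple below is a stand-in with assumed field names, and `group_words`, `max_words` and `max_gap_sec` are hypothetical.

```python
from collections import namedtuple

# Stand-in for per-word metadata; the field names are assumptions.
Word = namedtuple("Word", ["word", "start_sec", "end_sec"])


def group_words(words, max_words=7, max_gap_sec=0.6):
    """Split words into cues of at most max_words, also breaking on long pauses.

    Returns a list of (start_sec, end_sec, text) tuples.
    """
    cues = []
    current = []
    for w in words:
        if current:
            too_long = len(current) >= max_words
            long_pause = (w.start_sec - current[-1].end_sec) > max_gap_sec
            if too_long or long_pause:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)
    return [
        (cue[0].start_sec, cue[-1].end_sec, " ".join(x.word for x in cue))
        for cue in cues
    ]


if __name__ == "__main__":
    demo = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("friend", 2.0, 2.5)]
    for cue in group_words(demo, max_words=2):
        print(cue)
```

Breaking on pauses as well as on word count tends to keep cues aligned with natural phrase boundaries. Both thresholds are arbitrary starting points.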

Leopard Issue: Android Build Failed | JAVA Version Clash

Make sure you have read the documentation, and have put forth a reasonable effort to find an existing answer.

Expected behaviour

The local system has Java 17 installed. There is a clash between the local Java version and the Java version used in the Android demo.

Actual behaviour

image image

Steps to reproduce the behaviour

Following the steps in the documentation, run npm run android-run en to see the issue. I expected the build to succeed without version clashes.

(Include enough details so that the issue can be reproduced independently.)

Error when running python demo

When running the Python demo, the error below appears.
I made sure to use the whole folder as provided on GitHub and to include my license file:

Traceback (most recent call last):
  File "demo/python/leopard_demo.py", line 61, in <module>
    license_path=args.license_path)
  File "demo/python../../binding/python\leopard.py", line 54, in __init__
    self.libc = CDLL(find_library('c'))
  File "C:\Users\parth\anaconda3\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
TypeError: LoadLibrary() argument 1 must be str, not None
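
The traceback shows that find_library('c') returned None and that value reached CDLL. The following is a minimal sketch of guarding that lookup. The fallback to "msvcrt" on Windows is an assumption for illustration and not necessarily how the binding should be fixed, and `c_runtime_name` is a hypothetical helper.

```python
import platform
from ctypes.util import find_library


def c_runtime_name(found, system):
    """Pick a C runtime library name, given the find_library('c') result."""
    if found is not None:
        return found
    if system == "Windows":
        # Assumed fallback: the Microsoft C runtime shipped with Windows
        return "msvcrt"
    raise OSError("could not locate a C runtime library")


if __name__ == "__main__":
    print(c_runtime_name(find_library("c"), platform.system()))
```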

Request: Cortex-A7 Linux lib

I would like to run a golang application on a Qualcomm APQ8009 armv7 CPU which utilizes Leopard. I tried modifying the code a bit to use the android/armeabi-v7a lib but the golang wrappers don't seem to be compatible with it. By default, on an APQ8009 running embedded Linux, I get an error saying "Unsupported CPU: 0xc07". I also have a Banana Pi with a Cortex-A7 processor which I would like to try running this on. Judging by how well Leopard works on old Core 2 Duo processors it seems like it would run well on this too.

Leopard Issue:

Expected behaviour

I downloaded the default models from https://github.com/Picovoice/leopard/tree/master/lib/common.
I put all of them into the config/language_models folder. When I try to load one using the model path, an error occurs:

pvleopard.create(access_key=Mykey,
model_path = 'config/language_models/leopard_params.pv')

Actual behaviour


LeopardInvalidArgumentError               Traceback (most recent call last)
/var/folders/n0/7rhb7mg55rdfrz13sxnn0dgc0000gn/T/ipykernel_14326/2304799399.py in <module>
----> 1 pvleopard.create(access_key=myKey,
      2 model_path = 'config/language_models/leopard_params.pv')

~/opt/anaconda3/lib/python3.9/site-packages/pvleopard/_factory.py in create(access_key, model_path, library_path, enable_automatic_punctuation)
     38         library_path = default_library_path('')
     39
---> 40     return Leopard(
     41         access_key=access_key,
     42         model_path=model_path,

~/opt/anaconda3/lib/python3.9/site-packages/pvleopard/_leopard.py in __init__(self, access_key, model_path, library_path, enable_automatic_punctuation)
    156         status = init_func(access_key.encode(), model_path.encode(), enable_automatic_punctuation, byref(self._handle))
    157         if status is not self.PicovoiceStatuses.SUCCESS:
--> 158             raise self._PICOVOICE_STATUS_TO_EXCEPTION[status]()
    159
    160         self._delete_func = library.pv_leopard_delete

LeopardInvalidArgumentError:

Steps to reproduce the behaviour

(Include enough details so that the issue can be reproduced independently.)

Additional info

I installed the package using pip; the actual version is pvleopard==1.2.2.
I also tried a model downloaded from the Picovoice Console, but the result is the same error.
The code is correct: if I copy the default model that ships with the package into the same folder, it works.

Leopard Issue: Demo doesn't run

Have you checked the docs and existing issues?

  • I have read all of the relevant Picovoice Leopard docs
  • I have searched the existing issues for Leopard

SDK

.NET

Leopard package version

2.0.1

Framework version

.NET 8.0

Platform

Windows (x86_64)

OS/Browser version

N/A

Describe the bug

Pv.LeopardActivationLimitException: Leopard init failed:
  [0] Picovoice Error (code `00000136`)
  [1] Picovoice Error (code `00000136`)
  [2] Picovoice Error (code `0000012C`)
   at Pv.Leopard..ctor(String accessKey, String modelPath, Boolean enableAutomaticPunctuation, Boolean enableDiarization)
   at Pv.Leopard.Create(String accessKey, String modelPath, Boolean enableAutomaticPunctuation, Boolean enableDiarization)
   at LeopardDemo.MicDemo.RunDemo(String accessKey, String modelPath, Boolean enableAutomaticPunctuation, Boolean enableDiarization, Boolean verbose, Int32 audioDeviceIndex) in C:\git\picovoice\leopard\demo\dotnet\LeopardDemo\MicDemo.cs:line 59
   at LeopardDemo.MicDemo.Main(String[] args) in C:\git\picovoice\leopard\demo\dotnet\LeopardDemo\MicDemo.cs:line 249

Steps To Reproduce

  1. Open project
  2. Switch "StartupObject" to "LeopardDemo.MicDemo"
  3. Switch Framework version to net8.0 (net6.0 was LTS, but the support has lapsed)
  4. Run the project

Expected Behavior

Demo runs. This previously worked before the 2.* upgrade.

Leopard Issue: An error occurred while creating video: Unsupported CPU: `0x000`

Expected behaviour

image

Actual behaviour

image

Steps to reproduce the behaviour

pvleopard.create(access_key=access_key)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/pvleopard/_factory.py", line 42, in create
    library_path = default_library_path('')
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvleopard/_util.py", line 59, in default_library_path
    linux_machine = _linux_machine()
                    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pvleopard/_util.py", line 45, in _linux_machine
    raise NotImplementedError("Unsupported CPU: %s." % cpu_part)
NotImplementedError: Unsupported CPU: 0x000.

This is running in a Docker container:

FROM python:3.11-slim
# FROM python:3.11-alpine3.18

and the system architecture is:

root@b6136564c5d7:/app# uname -a
Linux b6136564c5d7 5.15.49-linuxkit-pr #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023 aarch64 GNU/Linux
root@b6136564c5d7:/app# uname -am
Linux b6136564c5d7 5.15.49-linuxkit-pr #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023 aarch64 GNU/Linux
root@b6136564c5d7:/app# uname -m
aarch64
root@b6136564c5d7:/app#
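
The traceback shows the binding rejecting a CPU part value of 0x000. The following is a minimal sketch for inspecting what the container exposes, assuming the value comes from the "CPU part" lines of /proc/cpuinfo (an assumption, since the detection code itself is not shown here). `KNOWN_PARTS` lists a few part numbers published by Arm and is not the binding's actual table.

```python
# A few Arm part numbers for illustration; not the binding's actual table.
KNOWN_PARTS = {
    "0xc07": "cortex-a7",
    "0xd03": "cortex-a53",
    "0xd08": "cortex-a72",
}


def cpu_parts(cpuinfo_text):
    """Return the distinct 'CPU part' values found in /proc/cpuinfo text."""
    parts = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("cpu part") and ":" in line:
            parts.add(line.split(":", 1)[1].strip().lower())
    return sorted(parts)


def describe(cpuinfo_text):
    """Map each CPU part to a core name, or 'unrecognized'."""
    return {p: KNOWN_PARTS.get(p, "unrecognized") for p in cpu_parts(cpuinfo_text)}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(describe(f.read()))
```

A virtualized aarch64 environment may report a part number that corresponds to no physical core, which would be consistent with the 0x000 in the error.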

Leopard Issue: Error trying to run demo on windows 10

windows 10
node v8.11.3
npm 5.6.0

0 info it worked if it ends with ok
1 verbose cli [ 'C:\Program Files\nodejs\node.exe',
1 verbose cli 'C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js',
1 verbose cli 'run',
1 verbose cli 'start' ]
2 info using npm@5.6.0
3 info using node@v8.11.3
4 verbose run-script [ 'prestart', 'start', 'poststart' ]
5 info lifecycle [email protected]prestart: [email protected]
6 info lifecycle [email protected]
start: [email protected]
7 verbose lifecycle [email protected]start: unsafe-perm in lifecycle true
8 verbose lifecycle [email protected]
start: PATH: C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\node-gyp-bin;C:\Ricardo\git\leopard\demo\web\node_modules.bin;C:\Program Files\Microsoft MPI\Bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files\dotnet;C:\Program Files (x86)\GtkSharp\2.12\bin;C:\Tools\ffmpeg\bin;C:\Program Files (x86)\Calibre2;C:\WINDOWS\System32\OpenSSH;C:\Program Files\nodejs;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files (x86)\Microsoft VS Code\bin;C:\HaxeToolkit\haxe;C:\HaxeToolkit\neko;C:\Program Files (x86)\dotnet;C:\Program Files\PuTTY;C:\ProgramData\UNIVALI\Portugol Studio;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn;C:\Users\Ricardo\AppData\Local\Programs\Python\Python311\Scripts;C:\Users\Ricardo\AppData\Local\Programs\Python\Python311;C:\Users\Ricardo.cargo\bin;C:\Users\Ricardo\AppData\Local\Programs\Python\Python36-32\Scripts;C:\Users\Ricardo\AppData\Local\Programs\Python\Python36-32;C:\Users\Ricardo\AppData\Local\Microsoft\WindowsApps;C:\Program Files (x86)\Microsoft VS Code\bin;C:\Program Files\MongoDB\Server\3.4\bin;C:\Program Files\Heroku\bin;C:\Users\Ricardo\AppData\Roaming\npm;C:\Tools\C3PO;;C:\Users\Ricardo\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\Ricardo.dotnet\tools
9 verbose lifecycle [email protected]start: CWD: C:\Ricardo\git\leopard\demo\web
10 silly lifecycle [email protected]
start: Args: [ '/d /s /c', 'yarn run http-server -a localhost -p 5000' ]
11 silly lifecycle [email protected]start: Returned: code: 1 signal: null
12 info lifecycle [email protected]
start: Failed to exec start script
13 verbose stack Error: [email protected] start: yarn run http-server -a localhost -p 5000
13 verbose stack Exit status 1
13 verbose stack at EventEmitter. (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\index.js:285:16)
13 verbose stack at emitTwo (events.js:126:13)
13 verbose stack at EventEmitter.emit (events.js:214:7)
13 verbose stack at ChildProcess. (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\lib\spawn.js:55:14)
13 verbose stack at emitTwo (events.js:126:13)
13 verbose stack at ChildProcess.emit (events.js:214:7)
13 verbose stack at maybeClose (internal/child_process.js:925:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)
14 verbose pkgid [email protected]
15 verbose cwd C:\Ricardo\git\leopard\demo\web
16 verbose Windows_NT 10.0.19044
17 verbose argv "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" "run" "start"
18 verbose node v8.11.3
19 verbose npm v5.6.0
20 error code ELIFECYCLE
21 error errno 1
22 error [email protected] start: yarn run http-server -a localhost -p 5000
22 error Exit status 1
23 error Failed at the [email protected] start script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

Leopard Issue: App crashing in Non-GMS device and in APK

When I try to run the application APK on any device, it crashes on tapping START. I have been creating APKs using "flutter build apk --split-per-abi". I would also like to know whether this application can run on a device without Google services, such as a non-Google Android device without any GMS.

Expected behaviour

App should work fine.

Actual behaviour

App Crashing

Steps to reproduce the behaviour

(Include enough details so that the issue can be reproduced independently.)

Leopard Issue:

Expected behaviour

The model file "assets/models/leopard_params_it.pv" should be read.

Actual behaviour

1. I put leopard_params_it.pv in assets/models.
2. I added the asset entry:
flutter:
  assets:
    - assets/models/leopard_params_it.pv

Steps to reproduce the behaviour

In my code I have:
final String modelPath = "assets/models/leopard_params_it.pv";
_leopard = await Leopard.create(accessKey, modelPath,
    enableAutomaticPunctuation: true);

Error: "failed to extract 'assets/models/leopard_params_it.pv'"

pvLeopard not exported by package leopard

Is your feature request related to a problem? Please describe.
I am using Golang 1.19 and am trying to define a Leopard object. In older versions of Leopard, I could define it like

var leopardInstance leopard.Leopard

But now, the name has changed to pvLeopard (starting with a lowercase character) rather than Leopard (starting with a capital character), so it gives the error shown in the title.

Describe the solution you'd like

I would like pvLeopard to start with a capital letter so it is exported, like leopard.PvLeopard or leopard.Leopard rather than leopard.pvLeopard.

Additional context
The intended use is like this:

leopardInstance := leopard.NewLeopard("key")
(do stuff with leopardInstance)

But my use-case involves having multiple Leopard processes ready to be used at any time, and there isn't enough time to initiate a process each time a request comes in. This is what my code was like before this change

var leopardSTTArray []leopard.Leopard
for i := 0; i < picovoiceInstances; i++ {
    fmt.Println("Initializing Picovoice Instance " + strconv.Itoa(i))
    leopardSTTArray = append(leopardSTTArray, leopard.NewLeopard(picovoiceKey))
    leopardSTTArray[i].Init()
}

Leopard Issue: Unable to use Android demo because of "initialization failed" error message

Expected behaviour

To be able to use the Android demo with my AccessKey from the console.

Actual behaviour

I have used the same AccessKey in the python demo and it works, but when I use that AccessKey in the Android Demo I get:
Initialisation Failed.
Ensure your AccessKey 'removed for security reasons' is valid.

I have checked that the app has access to the internet, and it reports that it is connected. I verified this by adding the following code to your demo:
ConnectivityManager cm = (ConnectivityManager)getApplicationContext().getSystemService(Context.CONNECTIVITY_SERVICE);
NetworkInfo nInfo = cm.getActiveNetworkInfo();
boolean connected = nInfo != null && nInfo.isAvailable() && nInfo.isConnected();
if (connected) Log.d("ben", "connected = yes");
else Log.d("ben", "connected = no");

Steps to reproduce the behaviour

Android 11 and Android 13 phones
Use my access key with the Android demo app. I have submitted this error under a personal account, but if you search my name in your console you'll see I have an account with my work email that I didn't want to share online.

Leopard Issue: Python library not working - GLIBC_2.29 not found

Expected behaviour

pvleopard.create(...) should work similarly to the other constructors, e.g. pvrhino.create(...) and pvporcupine.create(...).

Actual behaviour

The constructors pvrhino.create(...) and pvporcupine.create(...), which I have been using for several months now, work as expected with Python 3.7.3 on Raspbian (Buster).

The constructor pvleopard.create(...) fails:

self._leopard = pvleopard.create(access_key=ACCESS_KEY, model_path=LEOPARD_MODEL_PATH)

  File "/home/patrick/.local/lib/python3.7/site-packages/pvleopard/_factory.py", line 44, in create
    enable_automatic_punctuation=enable_automatic_punctuation)
  File "/home/patrick/.local/lib/python3.7/site-packages/pvleopard/_leopard.py", line 148, in __init__
    library = cdll.LoadLibrary(library_path)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 434, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib/arm-linux-gnueabihf/libm.so.6: version `GLIBC_2.29' not found (required by /home/patrick/.local/lib/python3.7/site-packages/pvleopard/lib/raspberry-pi/cortex-a72/libpv_leopard.so)

When I try to manually upgrade GLIBC, I get:

sudo apt-get install libc6
Reading package lists... Done
Building dependency tree
Reading state information... Done
libc6 is already the newest version (2.28-10+rpt2+rpi1+deb10u2).
The following package was automatically installed and is no longer required:
libva-wayland2
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Is it really the case that Leopard - unlike the other libraries - needs GLIBC 2.29?
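
The installed glibc version can be compared against the one named in the error before loading the library. This is a minimal sketch. The "2.29" default is taken from the error message above, and `glibc_satisfies` is a hypothetical helper.

```python
import platform


def parse_version(text):
    """Turn '2.28' into (2, 28) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_satisfies(installed, required="2.29"):
    """True when the installed glibc version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    # Returns ('glibc', '<version>') on glibc-based Linux systems
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(version, "ok" if glibc_satisfies(version) else "too old")
    else:
        print("glibc version could not be determined")
```

Comparing tuples rather than strings avoids treating "2.9" as newer than "2.29".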

Getting Timestamps in Output

Hello Picovoice team!

I was wondering if your tool provides a way to extract word-level timestamps in the transcript, or a way to output the occurrences of words in audio files (maybe returned as JSON)?

Thanks!
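
Once per-word timing is available, turning it into JSON or looking up occurrences of a word takes only a few lines. The following is a minimal sketch. The `Word` tuple is a stand-in with assumed field names rather than the library's own type, and both helper functions are hypothetical.

```python
import json
from collections import namedtuple

# Stand-in for per-word metadata; the field names are assumptions.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def words_to_json(words):
    """Serialize word-level timestamps as a JSON array of objects."""
    return json.dumps([dict(w._asdict()) for w in words], indent=2)


def occurrences(words, target):
    """Return (start_sec, end_sec) for every case-insensitive match of target."""
    wanted = target.lower()
    return [(w.start_sec, w.end_sec) for w in words if w.word.lower() == wanted]


if __name__ == "__main__":
    demo = [Word("hello", 0.0, 0.4, 0.9), Word("world", 0.5, 0.9, 0.8)]
    print(words_to_json(demo))
    print(occurrences(demo, "world"))
```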

Are there any tools to recognize only digits in popular languages?

Hello. Are there any tools, maybe by Picovoice or others, that can recognize digits in many languages at once?
I want to quickly transform all digits heard into a string of numbers, in many popular languages such as English, Italian, Spanish, Russian, and German.
Thank you.

Defining new words

Can Leopard or Cheetah be enhanced with new words, e.g. by giving a mapping from phonemes to words?

LeopardIOError not otherwise specified

On Ubuntu 20.04, with version 1.1.2 installed via pip3, calling the process_file method with either absolute or relative paths produces the following error:

"LeopardIOError:"

I cannot figure this out, because wherever this error is raised in the code it appears to always carry additional detail ("cannot find the [audio/model/etc]"). My error message does not specify what the IO error is. The sample rate of the file is 48000, which is greater than the sample rate parameter (16000), but based on the comments this should be supported by the process_file method.
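
When the error carries no detail, it can help to confirm independently that the file opens as a PCM WAV and to check what its header says. This is a minimal sketch using the standard wave module. It does not reproduce Leopard's own file handling, and `wav_info` is a hypothetical helper.

```python
import wave


def wav_info(path):
    """Read basic header fields from a PCM WAV file."""
    # wave.open raises wave.Error if the file is not a PCM WAV
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_sec": wav.getnframes() / float(rate),
        }


if __name__ == "__main__":
    import sys
    print(wav_info(sys.argv[1]))
```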

Flutter plugin

Do you guys offer a Flutter plugin for this that works alongside the Picovoice manager?

help with using leopard with pyaudio

I want to use PyAudio to record from the mic, detect the sentence, and print the text. This should happen in a loop, but I can't get it to work in any way other than with the Leopard demo package. Please help me with this.
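
The looping part can be separated from the audio source. The following is a minimal sketch of energy-based endpointing over frames of 16-bit samples. The frames are assumed to come from a microphone stream (for example, PyAudio reads unpacked into integers), and each yielded utterance would then be handed to the engine. The `threshold` and `hangover` values are arbitrary assumptions.

```python
def utterances(frames, threshold=1000, hangover=15):
    """Group frames into utterances that end after `hangover` quiet frames.

    frames: iterable of sequences of signed 16-bit sample values.
    Yields one flat list of samples per detected utterance.
    """
    buffer = []
    quiet = 0
    for frame in frames:
        loud = max((abs(s) for s in frame), default=0) >= threshold
        if loud:
            buffer.extend(frame)
            quiet = 0
        elif buffer:
            # Keep trailing silence so the last word is not clipped
            buffer.extend(frame)
            quiet += 1
            if quiet >= hangover:
                yield buffer
                buffer = []
                quiet = 0
    if buffer:
        yield buffer


if __name__ == "__main__":
    demo = [[0, 0], [5000, -4000], [0, 0], [0, 0]]
    for samples in utterances(demo, hangover=2):
        print(len(samples), "samples")
```

Leading silence is skipped entirely, so each utterance starts at the first loud frame.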

Add flutter macos plugin

Hello,

I can see there is a Flutter plugin compatible with Android/iOS. We are building a commercial macOS product in Flutter. Would it be feasible for you to add macOS support to your Flutter library, please?

Thank you,
Jakub

Leopard crashes when hearing the word "misses"

Expected behaviour

Leopard can understand the word "misses"

Actual behaviour

Crashes on the word "misses"

Steps to reproduce the behaviour

This is super weird. I don't know if it's specific to my voice or what, but it has crashed about 5 times when I say "misses." It'll be happily running along, and then I'll say "misses," which often gets transcribed as the wrong word (usually "Mrs.," which would make sense). If I repeat it, the program often appears to fault all the way out.

I'm running a lightly modified version of the Leopard Mic demo for dotnet. I have a little bit of silence detection and then a little bit of logic to detect the end of a sentence (probably not the best thing to do here, but I was experimenting):

                Task recordingTask = Task.Run(() =>
                {
                    audioFrame.Clear();
                    recorder.Start();
                    var keepListening = 0;
                    while (!token.IsCancellationRequested)
                    {
                        short[] pcm = recorder.Read();
                        var loudest = pcm
                            .Max(p => Math.Abs(p));

                        if (loudest < 1000 || keepListening > 0)
                        {
                            if (keepListening > 0)
                            {
                                keepListening--;
                            }
                            else if (audioFrame.Count > 0)
                            {
                                var result = leopard.Process(audioFrame.ToArray());
                                audioFrame.Clear();
                                Console.WriteLine(result.TranscriptString);
                                continue;
                            }
                            else
                            {
                                continue;
                            }
                        }
                        else
                        {
                            keepListening = 15;
                        }
                        audioFrame.AddRange(pcm);
                    }
                    recorder.Stop();
                });

image

I'm not particularly worried about it, but thought someone might want to have a second look at it.

Way to Track Progress of Processing?

Is there any way to track the progress of the process function of Leopard? A way to find the percentage complete would be amazing. Maybe with a callback to the Leopard object? I'm using Python.

Thanks very much.

Error in C# binding - Attempted to Read or Write Protected Memory

Expected behaviour

Processing an audio file should produce a transcript on each and every call.

var transcript = leopardTranscriber.ProcessFile(audioFilePath);

Actual behaviour

The transcription requests are being handled by a server and work very well when the requests come from a single client. However, when I send requests simultaneously from two clients:

  1. The requests get serviced properly for a while
  2. Then after some unpredictable time the server crashes with the error:
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at Pv.Leopard.pv_leopard_process_file(IntPtr, IntPtr, IntPtr ByRef, Int32 ByRef, IntPtr ByRef)
--------------------------------
   at Pv.Leopard.ProcessFile(System.String)
   at AudioTranscriber.Services.TranscriptionService.GetTranscript(AudioTranscriber.AudioRequest, Grpc.Core.ServerCallContext)

After some research, I came across this plausible explanation for the behavior: https://stackoverflow.com/a/42382470/212076

  • The problem may be due to mixed build platforms DLLs in the project. i.e You build your project to Any CPU but have some DLLs in the project already built for x86 platform. These will cause random crashes because of different memory mapping of 32bit and 64bit architecture. If all the DLLs are built for one platform the problem can be solved.

I examined the DLLs that ship with the Leopard Nuget package:

  • Leopard.dll: 32-bit
  • libpv_leopard.dll: 64-bit

Since my project targets 64-bit architecture, the issue must be triggered by the involvement of the 32-bit Leopard.dll. Under normal circumstances, it plays nicely with its 64-bit counterparts. However, under increased load from multiple simultaneous requests, it causes memory access issues arising from the different memory mapping.

Is it possible to have a 64-bit version of Leopard.dll available for testing to verify this assertion?

Steps to reproduce the behaviour

  1. Create a C# gRPC Server that

    • Initializes the Leopard library
      • var leopardTranscriber = Leopard.Create(accessKey);
    • Waits for requests from clients
    • Handles each client request and returns the transcript
      • var transcript = leopardTranscriber.ProcessFile(audioFilePath);
      • return Task.FromResult(transcript);
  2. Create a C# gRPC Client that

    • Initializes the gRPC connection

      • var port = 3050;
      • var serverUrl = $"http://localhost:{port}";
      • var channel = GrpcChannel.ForAddress(serverUrl);
      • var client = new AudioTranscriber.AudioTranscriberClient(channel);
      • var audioRequest = new AudioRequest { AudioFilePath = audioFilePath };
    • Sends a request to the server for a transcript

      • var transcript = await client.GetTranscript(audioRequest);
  3. Start two instances of the Client and have them repeatedly send requests to the Server. Exactly as you would to stress test the Server.

After a while of getting proper transcripts you get the AccessViolationException and the server crashes.
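
Because the crash only appears with simultaneous requests, one experiment is to serialize calls into the shared engine and see whether it disappears. The following is a minimal sketch of that idea. Whether the engine is safe for concurrent calls is not established here, and `SerializedTranscriber` is a hypothetical wrapper around any object that exposes process_file.

```python
import threading


class SerializedTranscriber:
    """Wrap an engine so that only one process_file call runs at a time."""

    def __init__(self, engine):
        self._engine = engine
        self._lock = threading.Lock()

    def process_file(self, path):
        # Requests from other threads wait here until the current call returns
        with self._lock:
            return self._engine.process_file(path)
```

If the crash stops under the lock, concurrent access to a single instance is the more likely cause. If it persists, the 32-bit/64-bit explanation deserves a closer look.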

Leopard Implementation Issue - React Native and Expo Go:

Have you checked the docs and existing issues?

  • I have read all of the relevant Picovoice Leopard docs
  • I have searched the existing issues for Leopard

SDK

React Native

Leopard package version

2.0.2

Framework version

React Native: 0.72.6 Expo:^49.0.13

Platform

Windows (x86_64)

OS/Browser version

Windows 11

Describe the bug

I am attempting to integrate the Leopard STT API into my project, which is built using React Native and Expo Go. When calling const leopard = await Leopard.create(accessKey, modelPath);, I get the error [Error: unexpected code: undefined, message: Cannot read property 'create' of null], even though I followed all of the installation steps and verified that my Leopard import works (console.log(Leopard) prints [Function Leopard] prior to my create call). Are there compatibility issues between Leopard and Expo? I have also ensured that the accessKey and model path are passed correctly.

Steps To Reproduce

  1. Import Leopard
  2. Use useState to set and use the Leopard instance
  3. Define an asynchronous function to get an instance of Leopard
  4. Use leopardInstance in the stopRecording() function when recording is done and the audio needs to be transcribed

The attached text file is a simple app.js for a basic React Native and Expo setup that throws this error:
leopard_error.txt

Expected Behavior

I would expect the audio file being passed (in my case, just a test .wav file) to be transcribed, and I would expect leopardInstance to be set correctly, since everything is done as described in the docs.

Automatic punctuation in "words" list

Is your feature request related to a problem? Please describe.
I am trying to generate subtitles for videos with Leopard's leopard.process_file() function, and I want them to include punctuation. However, enable_automatic_punctuation=True only affects the "transcript" string and not the list of words, so the subtitles have no punctuation. I am using Python.

Describe the solution you'd like
If enable_automatic_punctuation is True, punctuation should be applied not only to the transcript but to the words as well.

Describe alternatives you've considered
I would like to see an additional flag, such as "enable_punctuation_in_words", that adds punctuation to the words in the same way as in the transcript string.

Additional context
image
image
As you can see in these images, the transcript contains the end of the sentence (including the period), but there is no period in the words list.
I also have a question: which punctuation marks can Leopard insert into the text? I could not find this in the documentation, sorry.
Thank you very much!
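Until such a flag exists, one possible workaround is to copy the punctuated tokens from the transcript onto the word entries. The following is a minimal sketch, not an official feature. It assumes that splitting the transcript on whitespace yields one token per entry in the words list, and that each entry exposes word, start_sec and end_sec fields. The Word class and punctuate_words function are hypothetical stand-ins written for this example.

```python
import string
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Word:
    """Stand-in for the word entries returned alongside the transcript (assumed fields)."""
    word: str
    start_sec: float
    end_sec: float


def punctuate_words(transcript: str, words: Sequence[Word]) -> List[Word]:
    """Return copies of `words` whose text is taken from the punctuated transcript.

    Falls back to the original words if the tokens do not line up one-to-one.
    """
    tokens = transcript.split()
    if len(tokens) != len(words):
        return list(words)

    result = []
    for token, w in zip(tokens, words):
        # Compare while ignoring case and leading/trailing punctuation.
        bare_token = token.strip(string.punctuation).casefold()
        bare_word = w.word.strip(string.punctuation).casefold()
        if bare_token != bare_word:
            return list(words)
        result.append(Word(token, w.start_sec, w.end_sec))
    return result


# Sample data made up for illustration.
words = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("goodbye", 1.4, 1.9)]
punctuated = punctuate_words("Hello world. Goodbye.", words)
print([w.word for w in punctuated])
```

The timestamps are preserved, so the punctuated entries can be grouped into subtitle lines, for example by starting a new line after each token that ends with a period.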

Leopard Documentation Issue: Need for a specific glibc version on Raspberry Pi

What is the URL of the doc?

https://picovoice.ai/docs/quick-start/leopard-python/

What's the nature of the issue? (e.g. steps do not work, typos/grammar/spelling, etc., out of date)

I tried to use Leopard on my Raspberry Pi 4 running Raspbian Buster. Unfortunately, I get the following error saying that I don't have the correct glibc version: /lib/arm-linux-gnueabihf/libm.so.6: version 'GLIBC_2.29' not found. I can't find any information in the documentation about a specific version being required. From what I have read, upgrading only glibc is discouraged. So does one need the latest Raspberry Pi OS for Leopard to work?

Cheetah and Porcupine work flawlessly. Really great software. Thanks so much for developing it and making it available for personal users :)
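To check a system before installing, the glibc version can be read from Python's standard library. This is a minimal sketch: the 2.29 threshold is taken only from the 'GLIBC_2.29' symbol in the error message above, not from any published requirement, and parse_version and glibc_is_sufficient are helper names introduced here for illustration.

```python
import platform

# Assumption: threshold inferred from the 'GLIBC_2.29' symbol in the error message.
REQUIRED = (2, 29)


def parse_version(text):
    """Turn a version string such as '2.28' into a tuple of ints, e.g. (2, 28)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def glibc_is_sufficient(version_text, required=REQUIRED):
    """Return True if the given glibc version is at least `required`."""
    return parse_version(version_text) >= required


lib, version = platform.libc_ver()
if lib == "glibc":
    status = "ok" if glibc_is_sufficient(version) else "too old"
    print(f"glibc {version}: {status}")
else:
    print("glibc not detected on this system")
```

On the command line, ldd --version reports the same version number.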

Publish iOS React Native Demo for iPhone 14

After running npm run ios-run en, the issue below occurs. My updated operating system does not have iPhone 13. Please update the demo so that I can also use it for the iOS app.

image
