Giter VIP home page Giter VIP logo

picovoice's Introduction

Picovoice

Made in Vancouver, Canada by Picovoice

Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate. Using Picovoice, one can infer a user’s intent from a naturally spoken utterance such as:

Hey Edison, set the lights in the living room to blue.

Picovoice detects the occurrence of the custom wake word (Hey Edison), and then extracts the intent from the follow-on spoken command:

{
  "intent": "changeColor",
  "slots": {
    "location": "living room",
    "color": "blue"
  }
}

Why Picovoice

  • Private & Secure: Everything is processed offline. Intrinsically private; HIPAA and GDPR compliant.
  • Accurate: Resilient to noise and reverberation. Outperforms cloud-based alternatives by wide margins.
  • Cross-Platform: Design once, deploy anywhere. Build using familiar languages and frameworks. Raspberry Pi, BeagleBone, Android, iOS, Linux (x86_64), macOS (x86_64), Windows (x86_64), and modern web browsers are supported. Enterprise customers can access the ARM Cortex-M SDK.
  • Self-Service: Design, train, and test voice interfaces instantly in your browser, using Picovoice Console.
  • Reliable: Runs locally without needing continuous connectivity.
  • Zero Latency: Edge-first architecture eliminates unpredictable network delay.

Build with Picovoice

  1. Evaluate: The Picovoice SDK is a cross-platform library for adding voice to anything. It includes some pre-trained speech models. The SDK is licensed under Apache 2.0 and available on GitHub to encourage independent benchmarking and integration testing. You are empowered to make a data-driven decision.

  2. Design: Picovoice Console is a cloud-based platform for designing voice interfaces and training speech models, all within your web browser. No machine learning skills are required. Simply describe what you need with text and export trained models.

  3. Develop: Exported models can run on Picovoice SDK without requiring constant connectivity. The SDK runs on a wide range of platforms and supports a large number of frameworks. The Picovoice Console and Picovoice SDK enable you to design, build and iterate fast.

  4. Deploy: Deploy at scale without having to maintain complex cloud infrastructure. Avoid unbounded cloud fees, limitations, and control imposed by big tech.

Picovoice in Action

Platform Features

Custom Wake Words

Picovoice makes use of the Porcupine wake word engine to detect utterances of given wake phrases. You can train custom wake words using Picovoice Console and then run the exported wake word model on the Picovoice SDK.

Intent Inference

Picovoice relies on the Rhino Speech-to-Intent engine to directly infer user's intent from spoken commands within a given domain of interest (a "context"). You can design and train custom contexts for your product using Picovoice Console. The exported Rhino models then can run with the Picovoice SDK on any supported platform.

License & Terms

The Picovoice SDK is free and licensed under Apache 2.0 including the models released within. Picovoice Console offers two types of subscriptions: Personal and Enterprise. Personal accounts can train custom speech models that run on the Picovoice SDK, subject to limitations and strictly for non-commercial purposes. Personal accounts empower researchers, hobbyists, and tinkerers to experiment. Enterprise accounts can unlock all capabilities of Picovoice Console, are permitted for use in commercial settings, and have a path to graduate to commercial distribution*.

Table of Contents

Language Support

  • English, German, French, and Spanish.
  • Support for additional languages is available for commercial customers on a case-by-case basis.

Performance

Picovoice makes use of the Porcupine wake word engine to detect utterances of given wake phrases. An open-source benchmark of Porcupine is available here. In summary, compared to the best-performing alternative, Porcupine's standard model is 5.4 times more accurate.

Picovoice relies on the Rhino Speech-to-Intent engine to directly infer user's intent from spoken commands within a given domain of interest (a "context"). An open-source benchmark of Rhino is available here. Rhino outperforms all major cloud-based alternatives with wide margins.

Picovoice Console

Picovoice Console is a web-based platform for designing, testing, and training voice user interfaces. Using Picovoice Console you can train custom wake word, and domain-specific NLU (Speech-to-Intent) models.

Demos

If using SSH, clone the repository with:

git clone --recurse-submodules [email protected]:Picovoice/picovoice.git

If using HTTPS, clone the repository with:

git clone --recurse-submodules https://github.com/Picovoice/picovoice.git

Python Demos

Install PyAudio and then the demo package:

sudo pip3 install picovoicedemo

From the root of the repository run the following in the terminal:

picovoice_demo_mic \
--keyword_path resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
--context_path resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. raspberry-pi, beaglebone, linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints [Listening ...] to the console. Then say:

Porcupine, set the lights in the kitchen to purple.

Upon success, the demo prints the following into the terminal:

[wake word]

{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'purple'
  }
}

For more information regarding Python demos refer to their documentation.

NodeJS Demos

Make sure there is a working microphone connected to your device. Refer to documentation within node-record-lpm16 to set up your microphone for access from NodeJS. Then install the demo package:

npm install -g @picovoice/picovoice-node-demo

From the root of the repository run:

pv-mic-demo \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
-c resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. raspberry-pi, linux, or mac). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening for wake word 'porcupine' ... to the console. Then say:

Porcupine, turn on the lights.

Upon success, the demo prints the following into the terminal:

Inference:
{
    "isUnderstood": true,
    "intent": "changeLightState",
    "slots": {
        "state": "on"
    }
}

Please see the demo instructions for details.

.NET Demos

Install OpenAL. From the root of the repository run the following in the terminal:

dotnet run -p demo/dotnet/PicovoiceDemo/PicovoiceDemo.csproj -c MicDemo.Release -- \
--keyword_path resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
--context_path resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success the following it printed into the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about .NET demos go to demo/dotnet.

Java Demos

Make sure there is a working microphone connected to your device. Then, from the root of the repository run the following in a terminal:

java -jar demo/java/bin/picovoice-mic-demo.jar \
-k resources/porcupine/resources/keyword_files/${PLATFORM}/porcupine_${PLATFORM}.ppn \
-c resources/rhino/resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn

Replace ${PLATFORM} with the platform you are running the demo on (e.g. linux, mac, or windows). The microphone demo opens an audio stream from the microphone, detects utterances of a given wake phrase, and infers intent from the follow-on spoken command. Once the demo initializes, it prints Listening ... to the console. Then say:

Porcupine, set the lights in the kitchen to orange.

Upon success the following it printed into the terminal:

[wake word]
{
  intent : 'changeColor'
  slots : {
    location : 'kitchen'
    color : 'orange'
  }
}

For more information about the Java demos go to demo/java.

Unity Demos

To run the Picovoice Unity demo, import the Picovoice Unity package into your project, open the PicovoiceDemo scene and hit play. To run on other platforms or in the player, go to File > Build Settings, choose your platform and hit the Build and Run button.

To browse the demo source go to demo/unity.

Flutter Demos

To run the Picovoice demo on Android or iOS with Flutter, you must have the Flutter SDK installed on your system. Once installed, you can run flutter doctor to determine any other missing requirements for your relevant platform. Once your environment has been set up, launch a simulator or connect an Android/iOS device.

Before launching the app, use the copy_assets.sh script to copy the Picovoice demo assets into the demo project. (NOTE: on Windows, Git Bash or another bash shell is required, or you will have to manually copy the context into the project.).

Run the following command from demo/flutter to build and deploy the demo to your device:

flutter run

Once the application has been deployed, press the start button and say:

Picovoice, turn of the lights in the kitchen.

For the full set of supported commands refer to demo's readme.

React Native Demos

To run the React Native Picovoice demo app you'll first need to install yarn and setup your React Native environment. For this, please refer to React Native's documentation. Once your environment has been set up, you can run the following commands:

Running On Android

cd demo/react-native
yarn android-install    # sets up environment
yarn android-run        # builds and deploys to Android

Running On iOS

cd demo/react-native
yarn ios-install        # sets up environment
yarn ios-run            # builds and deploys to iOS

Once the application has been deployed, press the start button and say

Porcupine, turn of the lights in the kitchen.

For the full set of supported commands refer to demo's readme.

Android Demos

Using Android Studio, open demo/android/Activity as an Android project and then run the application. Press the start button and say

Porcupine, turn of the lights in the kitchen.

For the full set of supported commands refer to demo's readme.

iOS Demos

Using Xcode, open demo/ios/PicovoiceDemo/PicovoiceDemo.xcodeproj and run the application. Press the start button and say

Porcupine, shut of the lights in the living room.

For the full set of supported commands refer to demo's readme.

JavaScript Demos

There is a "Vanilla" JavaScript demo and React demo available, both of which run offline in the browser.

Microcontroller Demos

There are several projects for various development boards inside the mcu demo folder.

SDKs

Python

Install the package:

pip3 install picovoice

Create a new instance of Picovoice:

from picovoice import Picovoice

keyword_path = ...

def wake_word_callback():
    pass

context_path = ...

def inference_callback(inference):
    print(inference.is_understood)
    print(inference.intent)
    print(inference.slots)

handle = Picovoice(
        keyword_path=keyword_path,
        wake_word_callback=wake_word_callback,
        context_path=context_path,
        inference_callback=inference_callback)

handle is an instance of the Picovoice runtime engine. It detects utterances of wake phrase defined in the file located at keyword_path. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined by the file located at context_path. keyword_path is the absolute path to the Porcupine wake word engine keyword file (with .ppn extension). context_path is the absolute path to the Rhino Speech-to-Intent engine context file (with .rhn extension). wake_word_callback is invoked upon the detection of wake phrase and inference_callback is invoked upon completion of follow-on voice command inference.

When instantiated, the required rate can be obtained via handle.sample_rate. Expected number of audio samples per frame is handle.frame_length. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio. The set of supported commands can be retrieved (in YAML format) via handle.context_info.

def get_next_audio_frame():
    pass

while True:
    handle.process(get_next_audio_frame())

When done, resources have to be released explicitly handle.delete().

NodeJS

The Picovoice SDK for NodeJS is available from NPM:

yarn add @picovoice/picovoice-node

(or)

npm install @picovoice/picovoice-node

The SDK provides the Picovoice class. Create an instance of this class using a Porcupine keyword (with .ppn extension) and Rhino context file (with .rhn extension), as well as callback functions that will be invoked on wake word detection and command inference completion events, respectively:

const Picovoice = require("@picovoice/picovoice-node");

let keywordCallback = function (keyword) {
  console.log(`Wake word detected`);
};

let inferenceCallback = function (inference) {
  console.log("Inference:");
  console.log(JSON.stringify(inference, null, 4));
};

let handle = new Picovoice(
  keywordArgument,
  keywordCallback,
  contextPath,
  inferenceCallback
);

The keywordArgument can either be a path to a Porcupine keyword file (.ppn), or one of the built-in keywords (integer enums). The contextPath is the path to the Rhino context file (.rhn).

Upon constructing the Picovoice class, send it frames of audio via its process method. Internally, Picovoice will switch between wake word detection and inference. The Picovoice class includes frameLength and sampleRate properties for the format of audio required.

// process audio frames that match the Picovoice requirements (16-bit linear pcm audio, single-channel)
while (true) {
  handle.process(frame);
}

As the audio is processed through the Picovoice engines, the callbacks will fire.

.NET

You can install the latest version of Picovoice by adding the latest Picovoice NuGet package in Visual Studio or using the .NET CLI.

dotnet add package Picovoice

To create an instance of Picovoice, do the following:

using Pv;

string keywordPath = "/absolute/path/to/keyword.ppn";

void wakeWordCallback() => {..}

string contextPath = "/absolute/path/to/context.rhn";

void inferenceCallback(Inference inference)
{
    // `inference` exposes three immutable properties:
    // (1) `IsUnderstood`
    // (2) `Intent`
    // (3) `Slots`
    // ..
}

Picovoice handle = new Picovoice(keywordPath,
                                 wakeWordCallback,
                                 contextPath,
                                 inferenceCallback);

handle is an instance of Picovoice runtime engine that detects utterances of wake phrase defined in the file located at keywordPath. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined by the file located at contextPath. keywordPath is the absolute path to Porcupine wake word engine keyword file (with .ppn extension). contextPath is the absolute path to Rhino Speech-to-Intent engine context file (with .rhn extension). wakeWordCallback is invoked upon the detection of wake phrase and inferenceCallback is invoked upon completion of follow-on voice command inference.

When instantiated, the required sample rate can be obtained via handle.SampleRate. The expected number of audio samples per frame is handle.FrameLength. The Picovoice engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

while(true)
{
    handle.Process(GetNextAudioFrame());
}

Porcupine will have its resources freed by the garbage collector, but to have resources freed immediately after use, wrap it in a using statement:

using(Picovoice handle = new Picovoice(keywordPath, wakeWordCallback, contextPath, inferenceCallback))
{
    // .. Picovoice usage here
}

Java

You can add the Picovoice Java SDK by downloading and referencing the latest Picovoice JAR available here.

The easiest way to create an instance of the engine is with the Picovoice Builder:

import ai.picovoice.picovoice.*;

String keywordPath = "/absolute/path/to/keyword.ppn"

PicovoiceWakeWordCallback wakeWordCallback = () -> {..};

String contextPath = "/absolute/path/to/context.rhn"

PicovoiceInferenceCallback inferenceCallback = inference -> {
    // `inference` exposes three getters:
    // (1) `getIsUnderstood()`
    // (2) `getIntent()`
    // (3) `getSlots()`
    // ..
};

try{
    Picovoice handle = new Picovoice.Builder()
                    .setKeywordPath(keywordPath)
                    .setWakeWordCallback(wakeWordCallback)
                    .setContextPath(contextPath)
                    .setInferenceCallback(inferenceCallback)
                    .build();
} catch (PicovoiceException e) { }

handle is an instance of the Picovoice runtime engine that detects utterances of wake phrase defined in the file located at keywordPath. Upon detection of wake word it starts inferring the user's intent from the follow-on voice command within the context defined by the file located at contextPath. keywordPath is the absolute path to Porcupine wake word engine keyword file (with .ppn extension). contextPath is the absolute path to Rhino Speech-to-Intent engine context file (with .rhn extension). wakeWordCallback is invoked upon the detection of wake phrase and inferenceCallback is invoked upon completion of follow-on voice command inference.

When instantiated, the required sample rate can be obtained via handle.getSampleRate(). The expected number of audio samples per frame is handle.getFrameLength(). The Picovoice engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] getNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

while(true)
{
    handle.process(getNextAudioFrame());
}

Once you're done with Picovoice, ensure you release its resources explicitly:

handle.delete();

Unity

Import the Picovoice Unity Package into your Unity project.

The SDK provides two APIs:

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This is the quickest way to get started.

The constructor PicovoiceManager.Create will create an instance of the PicovoiceManager using the Porcupine keyword and Rhino context files that you pass to it.

using Pv.Unity;

try 
{    
    PicovoiceManager _picovoiceManager = PicovoiceManager.Create(
                                    "/path/to/keyword/file.ppn",
                                    (keywordIndex) => {},
                                    "/path/to/context/file.rhn",
                                    (inference) => {};
}
catch (Exception ex)
{
    // handle picovoice init error
}

Once you have instantiated a PicovoiceManager, you can start audio capture and processing by calling:

_picovoiceManager.Start();
// .. use picovoice
_picovoiceManager.Stop();

Once the app is done with using an instance of PicovoiceManager, you can explicitly release the audio resources and the resources allocated to Picovoice:

_picovoiceManager.Delete();

PicovoiceManager uses our unity-voice-processor Unity package to capture frames of audio and automatically pass it to the Picovoice platform.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into a already existing audio processing pipeline.

Picovoice is created by passing a Porcupine keyword file and Rhino context file to the Create static constructor.

using Pv.Unity;

try
{    
    Picovoice _picovoice = Picovoice.Create(
                                "path/to/keyword/file.ppn",
                                OnWakeWordDetected,
                                "path/to/context/file.rhn",
                                OnInferenceResult);
} 
catch (Exception ex) 
{
    // handle Picovoice init error
}

To use Picovoice, you must pass frames of audio to the Process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

short[] buffer = GetNextAudioFrame();
try 
{
    _picovoice.Process(buffer);
}
catch (Exception ex)
{
    Debug.LogError(ex.ToString());
}  

For Process to work correctly, the provided audio must be single-channel and 16-bit linearly-encoded.

Picovoice implements the IDisposable interface, so you can use Picovoice in a using block. If you don't use a using block, resources will be released by the garbage collector automatically or you can explicitly release the resources like so:

_picovoice.Dispose();

Flutter

Add the Picovoice Flutter package to your pub.yaml.

dependencies:  
  picovoice: ^<version>

The SDK provides two APIs:

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The static constructor PicovoiceManager.create will create an instance of a PicovoiceManager using a Porcupine keyword file and Rhino context file that you pass to it.

import 'package:picovoice/picovoice_manager.dart';
import 'package:picovoice/picovoice_error.dart';

void createPicovoiceManager() async {
    try{
        _picovoiceManager = await PicovoiceManager.create(
            "/path/to/keyword/file.ppn",
            _wakeWordCallback,
            "/path/to/context/file.rhn",
            _inferenceCallback);
    } on PvError catch (err) {
        // handle picovoice init error
    }
}

The wakeWordCallback and inferenceCallback parameters are functions that you want to execute when a wake word is detected and when an inference is made.

Once you have instantiated a PicovoiceManager, you can start/stop audio capture and processing by calling:

await _picovoiceManager.start();
// .. use for detecting wake words and commands
await _picovoiceManager.stop();

Once the app is done with using an instance of PicovoiceManager, be sure you explicitly release the resources allocated to Picovoice:

await _picovoiceManager.delete();

Our flutter_voice_processor Flutter plugin handles audio capture and passes frames to Picovoice for you.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into a already existing audio processing pipeline.

Picovoice is created by passing a a Porcupine keyword file and Rhino context file to the create static constructor. Sensitivity and model files are optional.

import 'package:picovoice/picovoice_manager.dart';
import 'package:picovoice/picovoice_error.dart';

void createPicovoice() async {
    double porcupineSensitivity = 0.7;
    double rhinoSensitivity = 0.6;
    try{
        _picovoice = await Picovoice.create(
            "/path/to/keyword/file.ppn",
            wakeWordCallback,
            "/path/to/context/file.rhn",
            inferenceCallback,
            porcupineSensitivity,
            rhinoSensitivity,
            "/path/to/porcupine/model.pv",
            "/path/to/rhino/model.pv");
    } on PvError catch (err) {
        // handle picovoice init error
    }
}

To use Picovoice, just pass frames of audio to the process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

List<int> buffer = getAudioFrame();

try {
    _picovoice.process(buffer);
} on PvError catch (error) {
    // handle error
}

// once you are done using Picovoice
_picovoice.delete();

React Native

First add our React Native modules to your project via yarn or npm:

yarn add @picovoice/react-native-voice-processor
yarn add @picovoice/porcupine-react-native
yarn add @picovoice/rhino-react-native
yarn add @picovoice/picovoice-react-native

The @picovoice/picovoice-react-native package exposes a high-level and a low-level API for integrating Picovoice into your application.

High-Level API

PicovoiceManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The static constructor PicovoiceManager.create will create an instance of a PicovoiceManager using a Porcupine keyword file and Rhino context file that you pass to it.

async createPicovoiceManager(){
    try{
        this._picovoiceManager = await PicovoiceManager.create(
            '/path/to/keyword/file.ppn',
            wakeWordCallback,
            '/path/to/context/file.rhn',
            inferenceCallback);
    } catch (err) {
        // handle error
    }
}

The wakeWordCallback and inferenceCallback parameters are functions that you want to execute when a wake word is detected and when an inference is made.

Once you have instantiated a PicovoiceManager, you can start/stop audio capture and processing by calling:

let didStart = await this._picovoiceManager.start();
// .. use for detecting wake words and commands
let didStop = await this._picovoiceManager.stop();

Once the app is done with using PicovoiceManager, be sure you explicitly release the resources allocated for it:

this._picovoiceManager.delete();

@picovoice/react-native-voice-processor module handles audio capture and passes frames to Picovoice for you.

Low-Level API

Picovoice provides low-level access to the Picovoice platform for those who want to incorporate it into a already existing audio processing pipeline.

Picovoice is created by passing a a Porcupine keyword file and Rhino context file to the create static constructor. Sensitivity and model files are optional.

async createPicovoice(){
    let porcupineSensitivity = 0.7
    let rhinoSensitivity = 0.6

    try{
        this._picovoice = await Picovoice.create(
            '/path/to/keyword/file.ppn',
            wakeWordCallback,
            '/path/to/context/file.rhn',
            inferenceCallback,
            porcupineSensitivity,
            rhinoSensitivity,
            "/path/to/porcupine/model.pv",
            "/path/to/rhino/model.pv")
    } catch (err) {
        // handle error
    }
}

To use Picovoice, just pass frames of audio to the process function. The callbacks will automatically trigger when the wake word is detected and then when the follow-on command is detected.

let buffer = getAudioFrame();

try {
    await this._picovoice.process(buffer);
} catch (e) {
    // handle error
}

// once you are done
this._picovoice.delete();

Android

There are two possibilities for integrating Picovoice into an Android application.

High-Level API

PicovoiceManager provides a high-level API for integrating Picovoice into Android applications. It manages all activities related to creating an input audio stream, feeding it into Picovoice engine, and invoking user-defined callbacks upon wake word detection and inference completion. The class can be initialized as follows:

import ai.picovoice.picovoice.PicovoiceManager;

final String porcupineModelPath = ...
final String keywordPath = ...
final float porcupineSensitivity = 0.5f;
final String rhinoModelPath = ...
final String contextPath = ...
final float rhinoSensitivity = 0.5f;

PicovoiceManager manager = new PicovoiceManager(
    porcupineModelPath,
    keywordPath,
    porcupineSensitivity,
    new PicovoiceWakeWordCallback() {
        @Override
        public void invoke() {
            // logic to execute upon deletection of wake word
        }
    },
    rhinoModelPath,
    contextPath,
    rhinoSensitivity,
    new PicovoiceInferenceCallback() {
        @Override
        public void invoke(final RhinoInference inference) {
            // logic to execute upon completion of intent inference
        }
    }
);

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating point number within [0, 1]. A higher sensitivity reduces miss rate at cost of increased false alarm rate.

When initialized, input audio can be processed using:

manager.start();

Stop the manager with:

manager.stop();

When done be sure to release resources:

manager.delete();

Low-Level API

Picovoice.java provides a low-level binding for Android. It can be initialized as follows:

import ai.picovoice.picovoice.Picovoice;

final String porcupineModelPath = ...
final String keywordPath = ...
final float porcupineSensitivity = 0.5f;
final String rhinoModelPath = ...
final String contextPath = ...
final float rhinoSensitivity = 0.5f;

Picovoice picovoice = new Picovoice(
    porcupineModelPath,
    keywordPath,
    porcupineSensitivity,
    new PicovoiceWakeWordCallback() {
        @Override
        public void invoke() {
            // logic to execute upon deletection of wake word
        }
    },
    rhinoModelPath,
    contextPath,
    rhinoSensitivity,
    new PicovoiceInferenceCallback() {
        @Override
        public void invoke(final RhinoInference inference) {
            // logic to execute upon completion of intent inference
        }
    }
);

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating point number within [0, 1]. A higher sensitivity reduces miss rate at cost of increased false alarm rate.

Once initialized, picovoice can be used to process incoming audio.

private short[] getNextAudioFrame();

while (true) {
    try {
        picovoice.process(getNextAudioFrame());
    } catch (PicovoiceException e) {
        // error handling logic
    }
}

Finally, be sure to explicitly release resources acquired as the binding class does not rely on the garbage collector for releasing native resources:

picovoice.delete();

iOS

PicovoiceManager class manages all activities related to creating an audio input stream, feeding it into Picovoice engine, and invoking user-defined callbacks upon wake word detection and completion of intent inference. The class can be initialized as below:

let porcupineModelpath: String = ...
let keywordPath: String = ...
let porcupineSensitivity: Float32 = 0.5
let rhinoModelPath: String = ...
let contextPath: String = ...
let rhinoSensitivity: Float32 = 0.5
let manager = PicovoiceManager(
    porcupineModelpath: porcupineModelpath,
    keywordPath: keywordPath,
    porcupineSensitivity: porcupineSensitivity,
    onWakeWordDetection: {
        // logic to execute upon wake word detection.
    },
    rhinoModelPath: rhinoModelPath,
    contextPath: contextPath,
    rhinoSensitivity: rhinoSensitivity,
    onInference: {
        // logic to execute upon intent inference completion.
    }
)

when initialized input audio can be processed using manager.start(). The processing can be interrupted using manager.stop().

Microcontroller

Picovoice is implemented in ANSI C and therefore can be directly linked to embedded C projects. Its public header file contains relevant information. An instance of the Picovoice object can be constructed as follows:

#define MEMORY_BUFFER_SIZE ...
static uint8_t memory_buffer[MEMORY_BUFFER_SIZE] __attribute__((aligned(16)));

static const uint8_t *keyword_array = ...
const float porcupine_sensitivity = 0.5f

static void wake_word_callback(void) {
    // logic to execute upon detection of wake word
}

static const uint8_t *context_array = ...
const float rhino_sensitivity = 0.75f

static void inference_callback(pv_inference_t *inference) {
    // `inference` exposes three immutable properties:
    // (1) `IsUnderstood`
    // (2) `Intent`
    // (3) `Slots`
    // ..
    pv_inference_delete(inference);
}

pv_picovoice_t *handle = NULL;

const pv_status_t status = pv_picovoice_init(
        MEMORY_BUFFER_SIZE,
        memory_buffer,
        sizeof(keyword_array),
        keyword_array,
        porcupine_sensitivity,
        wake_word_callback,
        sizeof(context_array),
        context_array,
        rhino_sensitivity,
        inference_callback,
        &handle);

if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating-point number within [0, 1]. A higher sensitivity reduces miss rate (false reject rate) at cost of increased false alarm rate.

handle is an instance of Picovoice runtime engine that detects utterances of wake phrase defined in keyword_array. Upon detection of wake word it starts inferring user's intent from the follow-on voice command within the context defined in context_array. wake_word_callback is invoked upon the detection of wake phrase and inference_callback is invoked upon completion of follow-on voice command inference.

Picovoice accepts single channel, 16-bit PCM audio. The sample rate can be retrieved using pv_sample_rate(). Finally, Picovoice accepts input audio in consecutive chunks (aka frames) the length of each frame can be retrieved using pv_porcupine_frame_length().

extern const int16_t *get_next_audio_frame(void);

while (true) {
    const int16_t *pcm = get_next_audio_frame();
    const pv_status_t status = pv_picovoice_process(handle, pcm);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }
}

Finally, when done be sure to release the acquired resources.

pv_picovoice_delete(handle);

Releases

v1.1.0 - December 2nd, 2020

  • Improved accuracy.
  • Runtime optimizations.
  • .NET SDK.
  • Java SDK.
  • React Native SDK.
  • C SDK.

v1.0.0 - October 22, 2020

  • Initial release.

FAQ

You can find the FAQ here.

picovoice's People

Contributors

kenarsa avatar laves avatar mrrostam avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.