
voice-engine's Introduction

Voice Engine


The library is used to build voice interface applications. It includes building blocks such as KWS (keyword spotting) and DOA (direction of arrival), plus elements that measure RMS level (in dBFS or dB(A)).
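As a rough illustration of what an RMS measurement in dBFS involves, here is a minimal standard-library sketch. It is not the library's own element: the function name `rms_dbfs` is hypothetical, and it assumes mono 16-bit PCM in native byte order, with 0 dBFS defined as an RMS equal to the largest 16-bit magnitude (under that convention a full-scale sine reads about 3 dB lower).

```python
import array
import math

FULL_SCALE = 32768.0  # largest magnitude of a signed 16-bit sample


def rms_dbfs(pcm_bytes):
    """Return the RMS level of 16-bit PCM audio in dBFS (hypothetical helper).

    Assumes native-endian signed 16-bit samples; the byte string must
    therefore have an even length.
    """
    samples = array.array('h', pcm_bytes)
    if len(samples) == 0:
        return float('-inf')
    mean_square = sum(s * s for s in samples) / float(len(samples))
    if mean_square == 0:
        return float('-inf')  # digital silence has no finite level
    # 10*log10 of a power ratio equals 20*log10 of the amplitude ratio
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)


# One second of a full-scale 1 kHz sine sampled at 16 kHz
sine = array.array(
    'h',
    [int(32767 * math.sin(2 * math.pi * 1000 * n / 16000.0)) for n in range(16000)],
)
level = rms_dbfs(sine.tobytes())
print('level: {:.2f} dBFS'.format(level))
```

A dB(A) reading would additionally need an A-weighting filter and a calibration against a known sound pressure level, which this sketch does not attempt.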

Requirements

  • pyaudio
  • numpy
  • snowboy

Installation

Install pyaudio, numpy and snowboy. The commands below use virtualenv to create a virtual Python environment.

sudo apt install python-pyaudio python-numpy python-virtualenv
sudo apt-get install swig python-dev libatlas-base-dev build-essential make
git clone --depth 1 https://github.com/Kitt-AI/snowboy.git
cd snowboy
virtualenv --system-site-packages env
source env/bin/activate
python setup.py build
python setup.py bdist_wheel
pip install dist/snowboy*.whl
cd ..
git clone https://github.com/voice-engine/voice-engine.git
cd voice-engine
python setup.py bdist_wheel
pip install dist/*.whl

Get started

To record audio and search for the keyword "snowboy" (see also kws_snowboy.py):

import time
from voice_engine.kws import KWS
from voice_engine.source import Source

src = Source()
kws = KWS()
src.link(kws)

def on_detected(keyword):
    print('found {}'.format(keyword))
kws.on_detected = on_detected

kws.start()
src.start()
while True:
    try:
        time.sleep(1)
    except KeyboardInterrupt:
        break
kws.stop()
src.stop()

Building blocks

The library uses GStreamer-like elements which can be linked together into an audio pipeline. One element can connect to more than one other element.

The topology can be:

Source --> ChannelPicker --> KWS          Source --> ChannelPicker --> KWS --> Alexa
  |                          /\
  V                        /   \
 DOA                   Alexa   Google Assistant
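The linking idea can be sketched in a few lines. The classes below (`Node`, `PickChannel`, `Collector`) are illustrative stand-ins written for this example, not the library's own classes; they assume that an element receives data through a `put()` method and forwards it to every element it is linked to.

```python
class Node(object):
    """Hypothetical pipeline element: forwards whatever it receives to its sinks."""

    def __init__(self):
        self.sinks = []

    def link(self, sink):
        self.sinks.append(sink)
        return sink  # returning the sink allows a.link(b).link(c)

    def put(self, data):
        for sink in self.sinks:
            sink.put(data)


class PickChannel(Node):
    """Keeps one channel out of interleaved multi-channel samples."""

    def __init__(self, channels, pick):
        super(PickChannel, self).__init__()
        self.channels = channels
        self.pick = pick

    def put(self, data):
        super(PickChannel, self).put(data[self.pick::self.channels])


class Collector(Node):
    """Stands in for a consumer such as KWS or DOA; it just stores its input."""

    def __init__(self):
        super(Collector, self).__init__()
        self.received = []

    def put(self, data):
        self.received.append(data)


source = Node()
picker = PickChannel(channels=2, pick=0)
kws_like = Collector()
doa_like = Collector()

source.link(picker).link(kws_like)  # Source --> ChannelPicker --> KWS
source.link(doa_like)               # Source --> DOA (second branch)

# Interleaved stereo samples: channel 0 is 1, 2, 3 and channel 1 is 10, 20, 30
source.put([1, 10, 2, 20, 3, 30])
print(kws_like.received)
print(doa_like.received)
```

The KWS-like branch sees only channel 0, while the DOA-like branch receives every channel, which mirrors the first topology in the diagram.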
  

voice-engine's People

Contributors

goldenqwerty, jerryyip, petcupaula, wingyiu, xiongyihui


voice-engine's Issues

the demo program ns_kws_doa.py is not responding

Dear Voice Engine developers,
Does anyone know how to fix this? After running the command python ns_kws_doa.py, I only get the output below:
pi@raspberrypi:~/respeaker/2mic $ python ns_kws_doa.py
['arecord', '-t', 'raw', '-f', 'S16_LE', '-c', '2', '-r', '16000', '-D', 'default', '-q']
The program does not respond to anything, even though I have already set up a default sound card (ReSpeaker 2-Mics Pi HAT).
Thank you so much for your assistance!

How to print the text after we say alexa

Team,

I need to print the text of what I say after I call Alexa.

How does this function get triggered when I only say "alexa"?

def on_detected(keyword):
    direction = doa.get_direction()
    print('detected {} at direction {}'.format(keyword, direction))
    alexa.listen()
    pixel_ring.wakeup(direction)

Playing audio after kws

Hey guys, any idea how to play audio after KWS? I have the full code of a personal assistant I built using the Microsoft tech stack, but whenever I try to play any audio it gives:
aplay: main:788: audio open error: No such file or directory
I tried the sample code on the main page and added audio playback through os.system, and that doesn't work either. Any ideas?

import time
import os
from voice_engine.kws import KWS
from voice_engine.source import Source

src = Source()
kws = KWS()
src.link(kws)

def on_detected(keyword):
    print('found {}'.format(keyword))
    os.system("aplay -c3 sample.wav")

kws.on_detected = on_detected

kws.start()
src.start()
while True:
    try:
        time.sleep(1)
    except KeyboardInterrupt:
        break
kws.stop()
src.stop()

Snowboy doa precision 4 mic array

Hey,

We set up snowboy on a Raspberry Pi with a ReSpeaker 4-mic array. We were wondering if we can increase the precision of the DOA supplied by voice-engine. By default it is set up so that doa.direction() returns directions that correspond directly to the LED positions.

ImportError: No module named _webrtc_audio_processing

I got the following ImportError when running the example "example/ns_kws.py" on ReSpeaker Core V2.0:

Traceback (most recent call last):
  File "ns_kws.py", line 9, in <module>
    from voice_engine.ns import NS
  File "/usr/local/lib/python2.7/dist-packages/voice_engine/ns.py", line 7, in <module>
    from webrtc_audio_processing import AP
  File "/home/respeaker/.local/lib/python2.7/site-packages/webrtc_audio_processing/__init__.py", line 2, in <module>
    from .webrtc_audio_processing import AudioProcessingModule
  File "/home/respeaker/.local/lib/python2.7/site-packages/webrtc_audio_processing/webrtc_audio_processing.py", line 17, in <module>
    _webrtc_audio_processing = swig_import_helper()
  File "/home/respeaker/.local/lib/python2.7/site-packages/webrtc_audio_processing/webrtc_audio_processing.py", line 16, in swig_import_helper
    return importlib.import_module('_webrtc_audio_processing')
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named _webrtc_audio_processing

I already installed the voice-engine and python-webrtc-audio-processing packages, and the examples aec_kws.py and kws_doa.py work fine.

Does anyone have an idea about the above issue?
Thanks a lot!

record audio

I want to record audio for 5 seconds after the keyword is detected and save it as a file. How can I do that?
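One possible approach, sketched under assumptions rather than taken from the library: add a sink that buffers incoming audio once it is armed and writes a WAV file when enough has arrived. The class name `WavRecorder` and its methods are hypothetical, and the sketch assumes the sink receives raw 16-bit PCM bytes through a `put()` method, the way the pipeline elements in the README example are linked.

```python
import wave


class WavRecorder(object):
    """Hypothetical sink: after start_recording(), saves a fixed duration to a WAV file."""

    def __init__(self, path, seconds=5, rate=16000, channels=1, sample_width=2):
        self.path = path
        self.rate = rate
        self.channels = channels
        self.sample_width = sample_width
        # Number of bytes that make up the requested duration
        self.bytes_wanted = int(seconds * rate) * channels * sample_width
        self.buffer = bytearray()
        self.recording = False

    def start_recording(self):
        """Arm the recorder, for example from a keyword callback."""
        self.buffer = bytearray()
        self.recording = True

    def put(self, data):
        """Receive one chunk of raw PCM bytes; ignored unless armed."""
        if not self.recording:
            return
        self.buffer.extend(data)
        if len(self.buffer) >= self.bytes_wanted:
            self.recording = False
            self._save()

    def _save(self):
        wav = wave.open(self.path, 'wb')
        try:
            wav.setnchannels(self.channels)
            wav.setsampwidth(self.sample_width)
            wav.setframerate(self.rate)
            wav.writeframes(bytes(self.buffer[:self.bytes_wanted]))
        finally:
            wav.close()


# Assumed wiring, following the README example (not verified against the library):
#   recorder = WavRecorder('after_keyword.wav', seconds=5)
#   src.link(recorder)
#   def on_detected(keyword):
#       recorder.start_recording()
```

The rate and channel count must match what the source actually delivers, otherwise the saved file will play back at the wrong speed or with channels mixed up.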

voice to text

Team,
Is there any function in voice_engine that I can use to convert voice to text?

I need to search for a word.

How do I change the wake word?

Hello,

I am now using kws_doa.py and it works,

but I want to use a wake word other than snowboy.

I'd appreciate it if you could tell me how.

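KWS recognition channel question

In kws_doa_alexa_respeaker_v2.py and many other examples, why is only one channel (for example ch0) picked as the keyword recognition channel, rather than using all channels or running keyword recognition on every channel?

Also, besides snowboy, I would like to use another keyword spotter (such as Porcupine) together with respeakerd. After studying respeakerd, it seems that if I start respeakerd with "manual_without_kws", beamforming on mic0 is enabled from the very beginning? That makes it hard for me to listen for the keyword from all directions (using Porcupine), then, once triggered, get the DOA and call src.on_set_direction(dir) to beamform. Is there a way to solve this?

If I use the from voice_engine.source approach, there seems to be no beamforming, and the question at the top still applies.

One more question: with the from respeakerd_source import RespeakerdSource approach, how is beamforming cancelled when Alexa starts or finishes speaking? Is that done inside librespeaker? Is there a way to cancel it manually from Python? In other words: set the beamforming angle from Python, then cancel it from Python once speaking has finished.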

Dependence on the geometry of the mic-array

We came across your DOA code implemented for 2, 4 and 6 microphones. We would like to ask whether it is geometry-dependent, i.e. whether the microphones must form a specific geometric shape in order for it to work properly.

Does your DOA code for 4- and 6-microphone arrays assume a certain geometry of the mic array (e.g. the 6 microphones located on a regular hexagon)?

Do you run it on a specific mic-array card? If so, what is its model and how are the microphones arranged (e.g. on a regular hexagon)? What adaptations or changes would I need to make to run it with my own customized mic array?

Is your DOA range 360 degrees (in the mic-array plane)?

Can you explain how the algorithm infers the global DOA angle from the local angles of each pair of microphones?

Thanks in advance.

ImportError: No module named speexdsp and webrtc_audio_processin

I bought a ReSpeaker Core V2.0 and followed the instructions in the readme here exactly. DOA is working fine and KWS is working fine, but the ns and aec examples give the following when run:

ImportError: No module named speexdsp
ImportError: No module named webrtc_audio_processing

I tried to install speexdsp using pip; it installs fine, but the examples still don't work!

Any ideas?

OS: Respeaker V2.0 latest image

Why use argmin(tau) for DoA on 6-mic array?

Dear Voice Engine developers,

I find your voice engine module for KWS and DOA detections particularly useful, so first of all, thank you for providing this important building block for many voice related applications.

While I am reading the code for the 6-mic circular array DOA, I found in line 45 of doa_respeaker_v2_6mic_array.py:

min_index = np.argmin(np.abs(tau))

I was wondering why you take only one pair of the mics rather than 6 mics all together to compute the angle of arrival? and why specifically the one pair with the minimum tau is taken to compute the 'best_guess'?

It would be much appreciated if you could kindly elaborate on this.

All the best
Rui
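One possible reading, not confirmed by the developers: if the angle for a microphone pair is recovered as arcsin(tau / max_tau), the same timing error turns into a larger angle error as |tau| approaches its maximum, so the pair with the smallest |tau| gives the most precise local estimate. The sketch below only illustrates that sensitivity. The microphone spacing, sample rate and interpolation factor are illustrative assumptions, and the helper names are hypothetical rather than taken from doa_respeaker_v2_6mic_array.py.

```python
import math

SOUND_SPEED = 343.0   # m/s, assumed
MIC_DISTANCE = 0.09   # m, illustrative spacing, not taken from the library
MAX_TAU = MIC_DISTANCE / SOUND_SPEED  # largest possible delay between the pair


def pair_angle(tau):
    """Angle in degrees, relative to the pair's broadside, for a delay tau in seconds."""
    ratio = max(-1.0, min(1.0, tau / MAX_TAU))  # clamp against rounding overshoot
    return math.degrees(math.asin(ratio))


def angle_error(tau, tau_error):
    """Change in the estimated angle caused by a timing error of tau_error seconds."""
    return abs(pair_angle(tau + tau_error) - pair_angle(tau))


# Timing resolution assuming 16 kHz audio with 16x interpolation (illustrative)
timing_step = 1.0 / 16000 / 16

near_broadside = angle_error(0.0, timing_step)
near_endfire = angle_error(0.9 * MAX_TAU, timing_step)

print('angle error near tau = 0:       {:.2f} degrees'.format(near_broadside))
print('angle error near tau = 0.9 max: {:.2f} degrees'.format(near_endfire))
```

Under these assumptions the error near the end-fire direction is noticeably larger than near broadside, because the slope of arcsin grows without bound as its argument approaches 1.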

Missing pixel_ring file

Hi,

The code in voice-engine/examples/ds_kws_doa_for_respeaker_4mic_array.py refers to pixel_ring, but this is not in the voice_engine directory nor is it a python requirement.

Guess it's just a missing file?

Thanks,
Akram

how to replace Alexa UMDL

Hello Team,

I want to trigger my own voice .imdl file along with Alexa. Could you please advise, where to change the code and add my voice file.

Regards
vithal

Multiple keywords

Is there an option to add multiple keywords detection as well as direction of arrival?
