Comments (10)
Here is a summary of what I've done so far and the things that are not yet working:
My goal was an offline voice assistant that could be integrated with Home Assistant and node red. With the basic setup of this GitHub project, a couple of modifications to the porcupine_launcher script and a separate Rhasspy instance, I basically got what I wanted.
Instead of Vosk, I went with Rhasspy. Rhasspy can act as the brain of the voice recognition, so speech analysis and intent handling don't have to run on less powerful devices like the LX06/LX01. Rhasspy can be configured as a base or a satellite instance. Satellites are normally installed on a Pi Zero (or something similar), but are still fully fledged Rhasspy installs. You don't actually need a Rhasspy install on a satellite, though: Porcupine and an MQTT client are sufficient.
The way it works:
- Porcupine is constantly listening on the LX06 for the defined wakeword. Once that is recognised, it sends an MQTT message to the Rhasspy base that a hotword was detected.
- the Porcupine script records for x seconds after the wakeword and sends the recorded WAV file over MQTT.
- Rhasspy takes the WAV file from MQTT and runs speech-to-text on it.
- the resulting text is checked against the intent definitions on Rhasspy. "what time is it" gets tagged with GetTime, for example.
- the result is available via websocket as JSON. It contains the tagged intent name and the words that were recognised (among other things).
- node red listens on the websocket from Rhasspy. From there it is just "standard" node red logic: parse the JSON message into a payload you can work with, then the rest of the flow. For example, if the intent name is GetTime, get the current time and send it via TTS back to the speaker.
Instead of node red, you can work directly with the intent interface of HA if you prefer; there are also scripts that pull all devices from HA into Rhasspy. I highly recommend the Rhasspy documentation.
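To give an idea of what arrives on the websocket, here is a hedged sketch of an intent payload and how the intent name could be picked out in a shell script. The field names (intent.intentName, input, siteId) are from my memory of the Hermes-style payload and may differ in your Rhasspy version; check a debug node for the real structure.

```shell
#!/bin/sh
# Hypothetical example of the JSON that Rhasspy publishes after intent
# recognition (field names assumed, verify with a debug node in node red).
payload='{"intent":{"intentName":"GetTime","confidenceScore":1.0},"input":"what time is it","siteId":"default"}'

# Extract the intent name with plain sed. This only works for the simple
# single-line layout above; in node red the JSON node does the parsing.
intent=$(printf '%s' "$payload" | sed -n 's/.*"intentName":"\([^"]*\)".*/\1/p')
echo "recognised intent: $intent"
```

In node red you would of course let the JSON parser node do this; the sed line is only there to show which field carries the intent name.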
Short HowTo on how to get here:
- Install rhasspy
- configure it like this:
mqtt: external or internal, just make sure that you use the same credentials in the porcupine_launcher script
audio recording: disabled
wake word: hermes mqtt
speech to text: kaldi
intent recognition: fsticuffs
text to speech: disabled
audio playing: disabled
dialogue management: rhasspy
intent handling: disabled
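For reference, the settings above roughly correspond to a profile.json like the following. This is a sketch from memory of Rhasspy 2.5's key names ("dummy" meaning disabled); compare it against the profile.json your own install generates rather than copying it verbatim.

```json
{
  "mqtt": {
    "enabled": true,
    "host": "mqtt.local",
    "username": "rhasspy",
    "password": "secret"
  },
  "microphone": { "system": "dummy" },
  "wake": { "system": "hermes" },
  "speech_to_text": { "system": "kaldi" },
  "intent": { "system": "fsticuffs" },
  "text_to_speech": { "system": "dummy" },
  "sounds": { "system": "dummy" },
  "dialogue": { "system": "rhasspy" },
  "handle": { "system": "dummy" }
}
```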
- replace lines 58 to 64 of the porcupine_launcher script (packages/porcupine/config/launcher) with something like this:
# lower volume on the speaker in case anything is currently playing
amixer set mysoftvol 10%
# publish a mqtt message for rhasspy
mosquitto_pub -h $MQTT_HOST -p 1883 -u $MQTT_USER -P $MQTT_PASS -i me --qos 1 -t 'hermes/hotword/default/detected' -m '{"siteId": "default", "modelId": "null"}'
# start recording and save the file once the defined time is up
arecord -N -D$MIC -d $TIME -f S16_LE -c $CHANNEL -r $RATE /data/message.wav
# publish the wav file as an mqtt message
mosquitto_pub -h $MQTT_HOST -p 1883 -u $MQTT_USER -P $MQTT_PASS -i me --qos 1 -t 'hermes/audioServer/default/audioFrame' -f /data/message.wav
# set volume on the speaker to 70%
amixer set mysoftvol 70%
- add a websocket node in node red with a new listener (ws://rhasspy.local:12101/api/events/intent) and a JSON parser as the next node. Add a debug node to see what the payload actually looks like.
As you can see, there is still a lot of room for improvement in the modified Porcupine script:
- use more variables in the script
- Save the current volume before setting to 10%, so that we can restore the value
- There is weird behaviour where, after a successful recording, Porcupine will only start recording again if you say the wakeword 3 times, or you have to wait a bit. Not sure yet why that is.
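The volume save/restore could look roughly like this. It's a sketch under the assumption that `amixer get mysoftvol` prints the current volume as `[NN%]` somewhere in its output (that's how amixer usually formats it, but check on the actual device):

```shell
#!/bin/sh
# Parse the first "[NN%]" occurrence out of amixer-style output.
get_volume() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([0-9]\{1,3\}\)%\].*/\1/p' | head -n 1
}

# In porcupine_launcher you would use it like this (commented out here
# because it needs the real hardware):
#   saved=$(get_volume "$(amixer get mysoftvol)")
#   amixer set mysoftvol 10%
#   ...record and publish...
#   amixer set mysoftvol "${saved}%"

# Demonstration with canned amixer-style output:
sample="Simple mixer control 'mysoftvol',0
  Mono: Playback 45 [70%] [on]"
saved=$(get_volume "$sample")
echo "saved volume: ${saved}%"
```

This way the script restores whatever the user had set instead of hard-coding 70%.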
I haven't figured out yet how to get the name of the speaker that was voice-activated back into node red. If I set the siteId to something other than default, the payload from the websocket still contains default. That would be useful: you could just say "turn on the light" and, based on the siteId, implement logic in node red to turn on the lights in the room of that speaker.
EDIT: siteId corresponds to the name of the satellite, but it has to be added to the rhasspy config as well. Then you can use it in node red as well.
Once that is done, the rest is just config work in rhasspy (adding intents, or sentences to recognise) and the corresponding flows in node red.
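As an example of that config work, a GetTime intent in Rhasspy's sentences.ini is just a section header with the phrases to recognise (the syntax is from the Rhasspy docs; the example phrases are my own):

```ini
[GetTime]
what time is it
tell me the time
```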
Maybe that is of some help to other users out there. If you have any questions or remarks, feel free to comment.
from xiaoai-patch.
Vosk server is websocket-based, and the way its author programmed it is a bit special. Thing is, you won't be able to send the file with curl alone, since you need to continue receiving data while sending the file. websocat works, but only on PC; the prebuilt binary on the speakers does not work properly, the connection hangs up. And I don't want to code another program in C just for websocket connections.
That's why I'm working on a Vosk custom_component for Home Assistant, using it as an STT provider, so I can send audio and get the resulting text. For debug purposes, and until I decide how to process that text, I'm just repeating it locally with Google TTS. Ideally I can send it back to the Home Assistant API conversation to trigger an Intent / Command. But once again, all those actions have to be coded manually. Almond would be ideal here, but it only works in English, and I'd like to use it in Spanish.
Interesting!
Have you looked at Rhasspy for the voice assistant part? Since it offers an HTTP/MQTT endpoint, it might be easier to just use that. There is also work being done on integrating Vosk as a service within the Rhasspy project.
I'll start experimenting with their API over the next couple of days; maybe it's an option.
Oh, btw, what language is tts_google using? Because the pronunciation of "error" sounds really weird :p
If there's a way to integrate Rhasspy into Home Assistant, then I may try it.
Line 4 in e54212d
Already available as an addon (third party) for HA.
https://github.com/synesthesiam/hassio-addons
Rhasspy also has a very good integration with HA when it comes to intents.
Rhasspy communicates with Home Assistant directly over its REST API. Specifically, Rhasspy intents are POST-ed to the /api/intents/handle endpoint.
You must add intent: to your Home Assistant configuration.yaml to enable the endpoint.
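In configuration.yaml that is just an empty one-line entry:

```yaml
# Home Assistant configuration.yaml
intent:
```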
To get started, check out the built-in intents. You can trigger them by simply naming your Rhasspy intents the same:
[HassTurnOn]
turn on the (bedroom light){name}
[HassTurnOff]
turn off the (bedroom light){name}
If you have an entity named "bedroom light" in Home Assistant, you can now turn it on and off from Rhasspy!
Documentation is here
In addition, it is also straightforward to integrate it with node red; see the next chapter in the documentation.
Ok, so progress update. This is way easier than I thought, and we basically already have everything in place for the basics.
With just two messages to MQTT and a little bit of logic in node red, I'm able to turn my office light on and off: activate the hotword listener, send the recorded message to Rhasspy, listen to the Rhasspy websocket via node red, convert the JSON payload and call the Home Assistant service.
I'll modify the porcupine_launcher script so that it works with my setup in order to see how good the wakeword detection with the hardware on the LX06 really is.
But so far, this looks promising. Will keep you posted.
EDIT: Well, as always, it seems. Hotword detection is quite good when there's no other sound, even from a distance. But turn on the radio on the LX06 and hotword detection becomes basically unusable. You have to more or less yell, several times, to "launch" Porcupine. I modified the script to lower the volume to 10% before listening for the command and then restore it, and that works great. But the hotword detection leaves a lot of room for improvement.
Hey @duhow
I wanted to check in again and see how things are going. Have you made any progress with Vosk? I basically abandoned my work on this shortly after the last post; the wakeword detection with Porcupine was really not ready for daily use.
Not sure if you follow the Home Assistant news, but they declared 2023 the year of the voice and hired the Rhasspy dev to work on better integration, as well as on Rhasspy itself. It seems they want to integrate Rhasspy further with Home Assistant.
I tried to build Porcupine 2.1 yesterday (2.0 included some bugfixes and improvements), but it failed. I'll look into it a bit more this afternoon.
Hi there! Happy new year!
No, I did not invest much time in improving the voice assistant thing...
The new Porcupine version very likely requires additional changes for the new additions. What I'm doing is using the "demo" software with a patch to "detect wakeword and exit", so that the next program can continue with the recording.
Any other software that replaces Porcupine and fits in the flash memory would be ok, but I have no idea which. And unfortunately most of the active projects are based on Python...
Ideally the project should use the Home Assistant API to do STT, or send the audio directly to Rhasspy via Hermes MQTT as a stream, so Rhasspy can confirm when the voice message has finished. Still, that's somewhat hard to achieve with Bash only; I don't want to go overboard with a C app, and MicroPython might not be enough.
Thanks and a happy new year to you too!
Ah, yes, now the patch makes sense. I guess it's probably best to wait and see which direction they take with the different components and then continue the journey here.
As a starting point, version 1.9 of Porcupine is good enough, I'd say. Let's see where they take it.
Status update:
With Home Assistant's new services for Whisper, commit 34bf45f allows using it for speech-to-text. Note that this STT is still somewhat inaccurate.
ℹ️ See the Voice Assistant docs for setup details.
There's still a lot of work to do in Home Assistant intents, the STT service and the internal speaker intents (TBD), so I encourage people to take home-assistant/intents more seriously to provide full functionality.
Closing this one, but feel free to provide PRs for improvements or requests.