
Comments (10)

hillbicks commented on May 27, 2024

Here is a summary of what I've done so far and the things that are not yet working:

The goal for me was to have an offline voice assistant that could be integrated with Home Assistant and node red. With the basic setup of this github project, a couple of modifications to the porcupine_launcher script and a separate rhasspy instance, I basically got what I wanted.

Instead of vosk, I went with rhasspy. rhasspy can act as the brain of the voice recognition, so speech analysis and intent handling don't have to run on less powerful devices like the LX06/LX01. rhasspy can be configured as a base or a satellite instance. Satellites are normally installed on a Pi Zero (or something similar), but are still fully fledged rhasspy installs. You don't actually need the rhasspy install on a satellite, though: porcupine and an mqtt client are sufficient.

The way it works:

  • porcupine is constantly listening on the LX06 for the defined wakeword. Once that is recognised, it sends an mqtt message to the rhasspy base that a hotword was detected.
  • the porcupine script records x seconds after the wakeword and sends the recorded wav file over mqtt.
  • rhasspy takes the wav file from mqtt and runs speech to text on it.
  • the resulting text is checked against the intent definitions on rhasspy. "what time is it" gets tagged with GetTime, for example.
  • the result is available via websocket as json. It contains the tagged intent name and the words that were recognised (among other things).
  • node red is listening on the websocket from rhasspy. From there it is just "standard" node red logic: parse the json message into a payload you can work with, then add the rest of the logic. For example, if the intent name is GetTime, get the current time and send it via TTS back to the speaker.

Instead of node red, you can work directly with the intent interface of HA if you prefer; there are also scripts that pull all devices from HA into rhasspy. I highly recommend the rhasspy documentation.

Short HowTo on how to get here:

  • Install rhasspy
  • configure it like this:
mqtt: external or internal, just make sure that you use the same credentials in the porcupine_launcher script
audio recording: disabled
wake word: hermes mqtt
speech to text: kaldi
intent recognition: fsticuffs
text to speech: disabled
audio playing: disabled
dialogue management: rhasspy
intent handling: disabled
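For orientation, the settings above correspond roughly to a profile.json like the sketch below. The key names here are from memory and should be checked against the Rhasspy profile documentation; host and credentials are placeholders (match them to the porcupine_launcher script):

```json
{
  "mqtt": {
    "enabled": "true",
    "host": "mqtt.local",
    "username": "user",
    "password": "pass"
  },
  "wake": { "system": "hermes" },
  "speech_to_text": { "system": "kaldi" },
  "intent": { "system": "fsticuffs" },
  "dialogue": { "system": "rhasspy" }
}
```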
  • replace lines 58 to 64 of the porcupine_launcher script (packages/porcupine/config/launcher) with something like this:
# lower volume on the speaker in case anything is currently playing
amixer set mysoftvol 10%
# publish a mqtt message for rhasspy
mosquitto_pub -h $MQTT_HOST -p 1883 -u $MQTT_USER -P $MQTT_PASS -i me --qos 1 -t 'hermes/hotword/default/detected' -m '{"siteId": "default", "modelId": "null"}'
# start recording and save the file once the defined time is up
arecord -N -D$MIC -d $TIME -f S16_LE -c $CHANNEL -r $RATE /data/message.wav
# publish the wav file as an mqtt message
mosquitto_pub -h $MQTT_HOST -p 1883 -u $MQTT_USER -P $MQTT_PASS -i me --qos 1 -t 'hermes/audioServer/default/audioFrame' -f /data/message.wav
# set volume on the speaker to 70%
amixer set mysoftvol 70%
  • add a websocket node in node red with a new listener, ws://rhasspy.local:12101/api/events/intent, and a json parser as the next node. Add a debug node to see what the payload actually looks like.
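The payload on that debug node looks roughly like the sketch below (field names as documented for Rhasspy's intent events; the real payload carries more fields). A quick shell check of how to pull the intent name out, using python3 for the JSON parsing since jq may not be available on the speaker:

```shell
#!/bin/sh
# Hypothetical intent event payload, shaped after Rhasspy's documented output.
PAYLOAD='{"intent": {"name": "GetTime", "confidence": 1.0}, "text": "what time is it", "siteId": "default"}'
# Extract the tagged intent name; python3 stands in for a JSON parser here.
INTENT=$(printf '%s' "$PAYLOAD" | python3 -c 'import sys, json; print(json.load(sys.stdin)["intent"]["name"])')
echo "$INTENT"
```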

As you can see, there is a lot of room for improvement in this modification of the porcupine script:

  • use more variables in the script
  • Save the current volume before setting to 10%, so that we can restore the value
  • There is weird behaviour where, after a successful recording, porcupine will only start recording again if you say the wakeword 3 times, or if you wait a bit. Not sure yet why that is.
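The second point (saving the current volume so it can be restored) could be sketched like this. It assumes amixer's usual "[NN%]" output format for the mysoftvol control; a sample output line stands in for the live amixer call so the parsing is visible:

```shell
#!/bin/sh
# Parse the current volume out of amixer-style output so it can be restored later.
get_volume() {
  # In the launcher this would be: amixer get mysoftvol
  echo "  Mono: Playback 70 [70%] [on]" | grep -o '[0-9]*%' | head -n 1
}
PREV_VOL=$(get_volume)
# amixer set mysoftvol 10%          # duck while recording the command
# ...record and publish as above...
# amixer set mysoftvol "$PREV_VOL"  # restore instead of hard-coding 70%
echo "$PREV_VOL"
```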

I haven't figured out yet how to get the name of the speaker that was voice activated back into node red. If I set the siteId to something other than default, the payload from the websocket still contains default. That would be useful in order to just say "turn on the light" and, based on the siteId, implement logic in node red to turn on the lights in the room of that speaker.

EDIT: siteId corresponds to the name of the satellite, but it has to be added to the rhasspy config as well. Then you can use it in node red as well.

Once that is done, the rest is just config work in rhasspy (adding intents, or sentences to recognise) and the corresponding flows in node red.

Maybe that is of some help to other users out there. If you have any questions or remarks, feel free to comment.

from xiaoai-patch.

duhow commented on May 27, 2024

Vosk server is websocket-based, and the way its author programmed it is a bit special.
Thing is, you won't be able to send that file with curl alone, since you need to keep receiving data while sending the file.
websocat works, but only on a PC; the prebuilt binary on the speakers does not work properly, the connection hangs up.
And I don't want to code another program in C just for websocket connections.
That's why I'm working on creating a Vosk custom_component for Home Assistant and using it as an STT provider, so I can send audio and get the resulting text back.
For debug purposes, and until I decide how to process that text, I'm just repeating it locally with Google TTS. Ideally I can send it back to the Home Assistant conversation API to trigger an Intent / Command. But once again, all those actions have to be coded manually. Almond would be ideal here, but it only works in English - I'd like to use it in Spanish.


hillbicks commented on May 27, 2024

Interesting!

Have you looked at the functionality of rhasspy for the voice assistant? Since they offer an http/mqtt endpoint, it might be easier to just use that. There is also work being done on integrating vosk as a service within the rhasspy project.

I'll start experimenting with their API over the next couple of days, maybe it's an option.

Oh, btw, what language is tts_google using? Because the pronunciation of "error" sounds really weird :p


duhow commented on May 27, 2024

If there's a way to integrate Rhasspy into Home Assistant, then I may try it.

DEFAULT_LANGUAGE=es


hillbicks commented on May 27, 2024

Already available as an addon (third party) for HA.

https://github.com/synesthesiam/hassio-addons

Rhasspy also has a very good integration with HA when it comes to intents.

Rhasspy communicates with Home Assistant directly over its REST API. Specifically, Rhasspy intents are POST-ed to the /api/intents/handle endpoint.

You must add intent: to your Home Assistant configuration.yaml to enable the endpoint.

To get started, check out the built-in intents. You can trigger them by simply naming your Rhasspy intents the same:

[HassTurnOn]
turn on the (bedroom light){name}

[HassTurnOff]
turn off the (bedroom light){name}

If you have an entity named "bedroom light" in Home Assistant, you can now turn it on and off from Rhasspy!
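The configuration.yaml change mentioned above is just an empty key:

```yaml
# Home Assistant configuration.yaml - enables the /api/intents/handle endpoint
intent:
```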

Documentation is here

In addition, it is also straightforward to integrate with node red. See the next chapter in the documentation.


hillbicks commented on May 27, 2024

Ok, so progress update. This is way easier than I thought, and we basically already have everything in place for the basics.

With just two messages to mqtt and a little bit of logic in node red, I'm able to turn my office light on and off. Activate the hotword listener and then send the recorded message to rhasspy, listen to the rhasspy websocket via node red, convert the json payload and call the home assistant service.

I'll modify the porcupine_launcher script so that it works with my setup in order to see how good the wakeword detection with the hardware on the LX06 really is.

But so far, this looks promising. Will keep you posted.

EDIT: Well, as always it seems. Hotword detection is quite good when there's no other sound, even from a distance. But turn on the radio on the LX06 and the hotword detection becomes basically unusable. You have to yell more or less, and do it several times, in order to "launch" porcupine. I modified the script to lower the volume to 10% before listening for the command and then restore it, so that works great. But the hotword detection leaves a lot of room for improvement.


hillbicks commented on May 27, 2024

Hey @duhow

I wanted to check in again and see how things are going. Have you made any progress with vosk? I basically abandoned my work on this shortly after the last post; the wakeword detection with porcupine was really not ready for daily use.

Not sure if you follow the home assistant news, but they declared 2023 the year of the voice and hired the rhasspy dev to work on better integration as well as on rhasspy itself. It seems they want to integrate rhasspy further with home assistant.

I tried to build porcupine 2.1 yesterday (2.0 included some bugfixes and improvements), but it failed. I'll look into it a bit more this afternoon.


duhow commented on May 27, 2024

Hi there! Happy new year!

No, I did not invest much time to improve the voice assistance thing...

Porcupine's new version will very likely require additional changes for the new additions. What I'm doing is using the "demo" software with a patch to "detect wakeword and exit", so that the next program can continue with the recording.
Any other software replacing Porcupine that fits in the flash memory would be fine, but I don't know of any. And unfortunately most of the active projects are written in Python...

Ideally the project should use the Home Assistant API to do STT, or send the audio directly to Rhasspy via Hermes MQTT as a stream, so Rhasspy can confirm when the voice message has finished. Still, that is somewhat hard to achieve with Bash only; I don't want to overkill with a C app, and MicroPython might not be enough.


hillbicks commented on May 27, 2024

Thanks and a happy new year to you too!

Ah, yes, now the patch makes sense. I guess it is probably best to wait and see in which direction they take the different components, and then continue the journey here.

As a starting point the 1.9 version of porcupine is good enough I'd say. Let's see where they're taking it.


duhow commented on May 27, 2024

Update status:

With Home Assistant's new services for Whisper, the new commit 34bf45f allows using it for Speech-to-text. Note this STT is still somewhat inaccurate.

ℹī¸ Get more details in Voice Assistant docs to setup.

There's still a lot of work to do in Home Assistant intents, the STT service, and internal speaker Intents (TBD), so I encourage people to take home-assistant/intents more seriously to provide full functionality.

Closing this one, but feel free to provide PRs for improvements or requests. 😃
