
gtreshchev / runtimespeechrecognizer

227 stars · 4 watchers · 33 forks · 9.53 MB

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine, based on OpenAI's Whisper technology via whisper.cpp.

License: MIT License

Languages: C++ 97.58%, C 1.38%, C# 1.04%
Topics: openai speech-recognition speech-to-text ue4-plugin ue5-plugin whisper audio-processing ue4 ue5 whis whisper-ai whisper-cpp unreal-engine unreal-engine-4 unreal-engine-5 speech-processing voice-recognition speech-detection

runtimespeechrecognizer's People

Contributors

amartinz, gtreshchev


runtimespeechrecognizer's Issues

Unreal crashes when trying to record from microphone.

I tried both Unreal 5.2 and 5.3, and both cloning the GitHub project and using the Marketplace plugin. Same error.

Unhandled Exception: EXCEPTION_ILLEGAL_INSTRUCTION

UnrealEditor_RuntimeSpeechRecognizer!ggml_vec_dot_f16() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\ThirdParty\whisper.cpp\ggml.c:1583]
UnrealEditor_RuntimeSpeechRecognizer!ggml_compute_forward_mul_mat() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\ThirdParty\whisper.cpp\ggml.c:10944]
UnrealEditor_RuntimeSpeechRecognizer!ggml_compute_forward() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\ThirdParty\whisper.cpp\ggml.c:16223]
UnrealEditor_RuntimeSpeechRecognizer!ggml_graph_compute_thread() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\ThirdParty\whisper.cpp\ggml.c:18390]
kernel32
ntdll

Can the GPU be used when capturing and transcribing audio data?

Hello,
When I use RuntimeSpeechRecognizer on my PC with the medium language model, my CPU usage reaches 100% (on a 16 GB machine). I found that whisper supports CUDA GPU acceleration; can I use the GPU with this plugin?

Missing Nodes in 5.4

I bought the pack from the Marketplace and attempted to load the demo project, but my copy is missing all nodes involving waves, such as "Create Capturable Sound Wave". Does anyone have any solutions? (My editor is 5.4.)

[Blank Audio], I can't capture sound wave

When I set up the BP like this and compile it, everything seems fine, but when I start it, it shows blank audio, and there's no issue with the microphone. So I am confused; I don't know what's happening. Please help me, thanks a lot. Here are the BP (screenshot attached) and the output log:
output_LOG.txt

[part of output log]:
……
LogRuntimeSpeechRecognizer: Pending audio data instead of enqueuing it since it is not enough to fill the step size (pending: 14879, num of samples per step: 80000)
LogRuntimeAudioImporter: No need to resample or mix audio data
LogRuntimeAudioImporter: Reallocating buffer to append data (new capacity: 566400)
LogRuntimeSpeechRecognizer: Pending audio data instead of enqueuing it since it is not enough to fill the step size (pending: 15039, num of samples per step: 80000)
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] Sound capture finished
LogRuntimeSpeechRecognizer: Enqueued audio data from the pending audio to the queue of the speech recognizer as the last data (num of samples: 15039)
LogRuntimeSpeechRecognizer: Processed audio data with the size of 79360 samples to the whisper recognizer
LogRuntimeSpeechRecognizer: Recognized text segment: " [BLANK_AUDIO]"
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] [BLANK_AUDIO]
LogRuntimeSpeechRecognizer: Speech recognition progress: 100
LogRuntimeSpeechRecognizer: Speech recognition progress: 0
LogRuntimeSpeechRecognizer: Processed audio data with the size of 17600 samples to the whisper recognizer
LogRuntimeSpeechRecognizer: Recognized text segment: " [BLANK_AUDIO]"
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] [BLANK_AUDIO]
LogRuntimeSpeechRecognizer: Speech recognition progress: 100
LogBlueprintUserMessages: [BP_ThirdPersonCharacter_C_0] Can't capture sound wave
LogRuntimeSpeechRecognizer: Speech recognition finished
LogRuntimeSpeechRecognizer: Stopping the speech recognizer thread
LogCore: Display: Tracing Screenshot "ScreenShot00007" taken with size: 2578 x 1408
LogCore: Display: Tracing Screenshot "ScreenShot00008" taken with size: 2578 x 1408
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogWorld: BeginTearingDown for /Game/ThirdPerson/Maps/UEDPIE_0_ThirdPersonMap
LogWorld: UWorld::CleanupWorld for ThirdPersonMap, bSessionEnded=true, bCleanupResources=true
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogWorldPartition: UWorldPartition::Uninitialize : World = /Game/ThirdPerson/Maps/UEDPIE_0_ThirdPersonMap.ThirdPersonMap
LogContentBundle: [ThirdPersonMap(Standalone)] Deleting container.
LogWorldMetrics: [UWorldMetricsSubsystem::Deinitialize]
LogWorldMetrics: [UWorldMetricsSubsystem::Clear]
LogPlayLevel: Display: Shutting down PIE online subsystems
LogSlate: InvalidateAllWidgets triggered. All widgets were invalidated
LogRuntimeAudioImporter: Warning: Imported sound wave ('CapturableSoundWave_1') data will be cleared because it is being unloaded
LogSlate: Updating window title bar state: overlay mode, drag disabled, window buttons hidden, title bar hidden
LogAudioMixer: Deinitializing Audio Bus Subsystem for audio device with ID 3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogAudioMixer: FMixerPlatformXAudio2::StopAudioStream() called. InstanceID=3
LogUObjectHash: Compacting FUObjectHashTables data took 0.79ms
LogPlayLevel: Display: Destroying online subsystem :Context_8

How to use the plugin with wake-on-voice

Thanks for the plugin.

I want to use this plugin with wake-on-voice: for example, when the user says "hello alexa, what time is it", the system should recognize the user's voice and act on the command.

So the microphone needs to be working all the time, with a delay of a few seconds before sending the streaming audio data for recognition.

If I restart with "Start Speech Recognition" or with "Start Capture", everything goes wrong. The error is about the thread status or the "bIsStopped" parameter.

So how can I do this? Thanks.

Android build 5.2

  1. Unreal 5.1 ~ 5.2
  2. Runtime 5.2

Built according to the documentation, but the print string recognizes "!!!!!!!!!!!!!!!!!!!!!".

Unreal Engine 5.3 crash

Unreal Engine (both 5.3.1 and 5.3.2) crashes when I try to run the nodes copied from https://blueprintue.com/blueprint/et6u52bm/. I simply copied the nodes into my character BP and added the missing variables.

Full logs below:

Fatal error: [File:D:\build++UE5\Sync\Engine\Source\Runtime\Core\Private\Containers\ContainerHelpers.cpp] [Line: 8] Trying to resize TArray to an invalid size of 2147483648

UnrealEditor_Core!UE::Core::Private::OnInvalidArrayNum() [D:\build++UE5\Sync\Engine\Source\Runtime\Core\Private\Containers\ContainerHelpers.cpp:8]
UnrealEditor_RuntimeSpeechRecognizer!TArray<float,TAlignedHeapAllocator<16> >::ResizeGrow() [D:\RocketSync\5.3.0-27405482+++UE5+Release-5.3\Working\Engine\Source\Runtime\Core\Public\Containers\Array.h:2983]
UnrealEditor_RuntimeSpeechRecognizer!FSpeechRecognizerThread::FPendingAudioData::GetMixedAndResampledAudio() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\RuntimeSpeechRecognizer\Private\SpeechRecognizerThread.cpp:259]
UnrealEditor_RuntimeSpeechRecognizer!FSpeechRecognizerThread::ForceProcessPendingAudioData() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\RuntimeSpeechRecognizer\Private\SpeechRecognizerThread.cpp:542]
UnrealEditor_CoreUObject!UFunction::Invoke() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\Class.cpp:6665]
UnrealEditor_CoreUObject!UObject::CallFunction() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\ScriptCore.cpp:1139]
UnrealEditor_CoreUObject!UObject::ProcessContextOpcode() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\ScriptCore.cpp:3094]
UnrealEditor_CoreUObject!ProcessLocalScriptFunction() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\ScriptCore.cpp:1209]
UnrealEditor_CoreUObject!UObject::ProcessInternal() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\ScriptCore.cpp:1306]
UnrealEditor_CoreUObject!UFunction::Invoke() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\Class.cpp:6665]
UnrealEditor_CoreUObject!UObject::ProcessEvent() [D:\build++UE5\Sync\Engine\Source\Runtime\CoreUObject\Private\UObject\ScriptCore.cpp:2145]
UnrealEditor_Engine!AActor::ProcessEvent() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\Actor.cpp:1122]
UnrealEditor_Engine!FLatentActionManager::TickLatentActionForObject() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\LatentActionManager.cpp:314]
UnrealEditor_Engine!FLatentActionManager::ProcessLatentActions() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\LatentActionManager.cpp:208]
UnrealEditor_Engine!AActor::Tick() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\Actor.cpp:1540]
UnrealEditor_Engine!AActor::TickActor() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\Actor.cpp:1516]
UnrealEditor_Engine!FActorTickFunction::ExecuteTick() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\Actor.cpp:251]
UnrealEditor_Engine!FTickFunctionTask::DoTask() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\TickTaskManager.cpp:278]
UnrealEditor_Engine!TGraphTask::ExecuteTask() [D:\build++UE5\Sync\Engine\Source\Runtime\Core\Public\Async\TaskGraphInterfaces.h:1265]
UnrealEditor_Core!FNamedTaskThread::ProcessTasksNamedThread() [D:\build++UE5\Sync\Engine\Source\Runtime\Core\Private\Async\TaskGraph.cpp:758]
UnrealEditor_Core!FNamedTaskThread::ProcessTasksUntilQuit() [D:\build++UE5\Sync\Engine\Source\Runtime\Core\Private\Async\TaskGraph.cpp:649]
UnrealEditor_Core!FTaskGraphCompatibilityImplementation::WaitUntilTasksComplete() [D:\build++UE5\Sync\Engine\Source\Runtime\Core\Private\Async\TaskGraph.cpp:2125]
UnrealEditor_Engine!FTickTaskSequencer::ReleaseTickGroup() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\TickTaskManager.cpp:556]
UnrealEditor_Engine!FTickTaskManager::RunTickGroup() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\TickTaskManager.cpp:1583]
UnrealEditor_Engine!UWorld::RunTickGroup() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\LevelTick.cpp:771]
UnrealEditor_Engine!UWorld::Tick() [D:\build++UE5\Sync\Engine\Source\Runtime\Engine\Private\LevelTick.cpp:1515]
UnrealEditor_UnrealEd!UEditorEngine::Tick() [D:\build++UE5\Sync\Engine\Source\Editor\UnrealEd\Private\EditorEngine.cpp:1924]
UnrealEditor_UnrealEd!UUnrealEdEngine::Tick() [D:\build++UE5\Sync\Engine\Source\Editor\UnrealEd\Private\UnrealEdEngine.cpp:531]
UnrealEditor!FEngineLoop::Tick() [D:\build++UE5\Sync\Engine\Source\Runtime\Launch\Private\LaunchEngineLoop.cpp:5825]
UnrealEditor!GuardedMain() [D:\build++UE5\Sync\Engine\Source\Runtime\Launch\Private\Launch.cpp:188]
UnrealEditor!GuardedMainWrapper() [D:\build++UE5\Sync\Engine\Source\Runtime\Launch\Private\Windows\LaunchWindows.cpp:118]
UnrealEditor!LaunchWindowsStartup() [D:\build++UE5\Sync\Engine\Source\Runtime\Launch\Private\Windows\LaunchWindows.cpp:258]
UnrealEditor!WinMain() [D:\build++UE5\Sync\Engine\Source\Runtime\Launch\Private\Windows\LaunchWindows.cpp:298]
UnrealEditor!__scrt_common_main_seh() [D:\a_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288]
kernel32
ntdll

I'm facing an issue with ProcessAudioData: I'm unable to get it to work right when directly feeding populated audio data from a captured sound wave.

Testing with Model: "Small" & "English only".
Microphone Properties: 48000 sample rate, 2 channels.
Speech recognition params I used: (screenshot attached)

When I feed the populated audio data directly to SpeechRecognizer->ProcessAudioData as mentioned in the wiki, the recognized words are always random single words that don't make sense: "this", "that", "you", "{Music}", and so on.

But when I change the approach, accumulating the PCM data array until it is big enough and then feeding it to ProcessAudioData, it gives awesome results, with a slight delay of course (a minimal sketch of this approach appears at the end of this issue).

But according to the documentation you provided, the "StepSize" parameter should control the duration of audio data going in for speech processing. In that case, even when the populated audio data is fed directly to ProcessAudioData, the speech should only be processed once every 5 seconds (the StepSize value), right? But that doesn't seem to be the case.

The results are accurate only when the StepSize is >= 5 seconds and ProcessAudioData is fed a PCM data array containing 5 seconds of audio all at once.

Am I understanding the "StepSize" param the wrong way? What's a better way to approach this to get accurate results with as little delay as possible?

Also, I wanted to know if there is a way to get a timestamp along with each recognized text segment.
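
For reference, here is a minimal sketch of the accumulation approach described above, following the ProcessAudioData call pattern used in other issues on this page. The class and member names (UMyComponent, AccumulatedPCM, SpeechRecognizer, CapturableSoundWave) are illustrative assumptions, and the 5-second threshold simply mirrors the StepSize discussed above:

void UMyComponent::OnPopulateAudioData(const TArray<float>& PopulatedAudioData)
{
	// Buffer the incoming samples instead of forwarding each small chunk.
	AccumulatedPCM.Append(PopulatedAudioData);

	const int32 SampleRate = CapturableSoundWave->GetSampleRate();
	const int32 NumChannels = CapturableSoundWave->GetNumOfChannels();

	// Only feed the recognizer once roughly 5 seconds of audio have been
	// collected, matching the 5-second StepSize in the recognizer settings.
	const int32 SamplesPerFiveSeconds = SampleRate * NumChannels * 5;
	if (AccumulatedPCM.Num() >= SamplesPerFiveSeconds)
	{
		SpeechRecognizer->ProcessAudioData(AccumulatedPCM, SampleRate, NumChannels, /*bLast*/ false);
		AccumulatedPCM.Reset();
	}
}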

Deploy build?

My setup:
1. UE 5.1~
2. SDK 31~
3. Runtime Audio / Runtime Speech 5.1 ~ 5.2
4. https://blueprintue.com/blueprint/et6u52bm/
5. Microphone and other file permissions (OK)

(Speech-to-text test)
I tried to use it on Android, but the print string looks like this: "!!!!!!!!!!!!!!!!!". In the Play Editor and a Windows deploy, speech works normally.

[Android phone speech PrintString]
Screenshot_20230612_183756_c

faster-whisper as a custom model fails

Hello!
I tried adding https://huggingface.co/guillaumekln/faster-whisper-tiny.en/tree/main as a custom model; it is a faster implementation of whisper. After loading the custom model, I get an error when activating the mic: "The audio data could not be processed to the recognizer since the thread is stopped". Any idea why? faster-whisper has a great speed advantage in my Python environment...
Cheers

Edit: I forgot to mention that the error message comes from the On_RecognitionError_Event.

Unreal Engine Source does not work with version 5300

I downloaded the Unreal Engine source from the Epic Games GitHub account, downloaded the RuntimeSpeechRecognizer5300 release, and extracted the zip. I created a brand-new game project with the source-built engine and extracted the plugin to my plugins folder, MyGame/Plugins/RuntimeSpeechRecognizer. I ran Generate Project Files and all the other setup steps for an Unreal Engine source build. When I attempt to run the project, I am told that the modules are out of date and asked whether I would like to build them now. After clicking yes, the build attempt fails with this error:

Engine modules are out of date, and cannot be compiled while the engine is running. Please build through your IDE.

I have tried manually building everything, all with the same result. I have verified my Unreal Engine version: 5.3.2-0+UE5.

Packaged build running on Quest 3

Hi, thank you for developing such a great plugin, but I have a small question about the StartSpeechRecognition node. I log the runtime output and find that the OnStarted event never executes on Quest 3 after the StartSpeechRecognition node runs (the project, by the way, packages successfully). Is there any solution that would let me implement speech recognition on Quest 3? After testing, I can obtain recording permissions and run speech recognition successfully on Android phones, but it cannot be used on Quest 3. The StartSpeechRecognition node seems to have big problems there, leaving the model unable to load correctly.

Error: whisper_full_with_state: encoder_begin_callback returned false - aborting

I hope you can help me with this. I don't know the source of the issue, but it seems to only happen when my code runs in C++. I'm using Unreal 5.3 on Windows (for reference).

If I run the Blueprint example, it runs just fine. It loops the pending audio indefinitely (reaches "pending: 79680, num of samples per step: 80000", then starts over). No problems there.

But if I run similar code in C++, the first loop works fine, but at the end of the second a warning and then an error show up.

Warning: Aborting whisper recognition due to stop request
Error: whisper_full_with_state: encoder_begin_callback returned false - aborting

And it never recovers. Can you replicate the behavior? (I leave a snippet of code below.)

Snippet of Log

PCM Info:
Validity of PCM data in memory: Valid, number of PCM frames: 160, PCM data size: 320
LogRuntimeSpeechRecognizer: Pending audio data instead of enqueuing it since it is not enough to fill the step size (pending: 79680, num of samples per step: 80000)
LogRuntimeAudioImporter: Successfully added audio data to streaming sound wave.
Added audio info: SoundWave Basic Info:
Number of channels: 2, sample rate: 16000, duration: 0.010000
PCM Info:
Validity of PCM data in memory: Valid, number of PCM frames: 160, PCM data size: 320
LogRuntimeSpeechRecognizer: Enqueued audio data from the pending audio to the queue of the speech recognizer (num of samples: 79840)
LogRuntimeAudioImporter: Successfully added audio data to streaming sound wave.
Added audio info: SoundWave Basic Info:
Number of channels: 2, sample rate: 16000, duration: 0.010000
PCM Info:
Validity of PCM data in memory: Valid, number of PCM frames: 160, PCM data size: 320
LogRuntimeSpeechRecognizer: Warning: Aborting whisper recognition due to stop request
LogRuntimeSpeechRecognizer: Warning: Aborting whisper recognition due to stop request
LogRuntimeSpeechRecognizer: Error: whisper_full_with_state: encoder_begin_callback returned false - aborting
LogRuntimeSpeechRecognizer: Processed audio data with the size of 79840 samples to the whisper recognizer
LogRuntimeAudioImporter: Successfully added audio data to streaming sound wave.
Added audio info: SoundWave Basic Info:
Number of channels: 2, sample rate: 16000, duration: 0.010000
PCM Info:
Validity of PCM data in memory: Valid, number of PCM frames: 160, PCM data size: 320
LogRuntimeSpeechRecognizer: Error: Audio processing failed: The audio data could not be processed to the recognizer since the thread is stopped
LogTemp: Error: DreamiaCharacter: Error in Speech Recognition: The audio data could not be processed to the recognizer since the thread is stopped
LogRuntimeSpeechRecognizer: Error: Audio processing failed: The audio data could not be processed to the recognizer since the thread is stopped
LogTemp: Error: DreamiaCharacter: Error in Speech Recognition: The audio data could not be processed to the recognizer since the thread is stopped
LogRuntimeAudioImporter: Successfully added audio data to streaming sound wave.
Added audio info: SoundWave Basic Info:
Number of channels: 2, sample rate: 16000, duration: 0.010000

Code Snippet

Header

UFUNCTION(BlueprintCallable, Category="Dreamia|SpeechRecognition")
void StartAudioSession(bool bIsMuted = false);

UFUNCTION(BlueprintCallable, Category="Dreamia|SpeechRecognition")
void StopAudioSession();

UFUNCTION(BlueprintCallable, Category="Dreamia|SpeechRecognition")
void SetAudioSessionMute(const bool bIsMuted);

/**
* Audio Input Device ID (aka which microphone to use).
*/
UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="Dreamia|SpeechRecognition")
int DeviceId = 0;

UPROPERTY()
FOnSpeechRecognitionStartedDynamic OnStartSpeechRecognitionEvent;

// STT Section
UPROPERTY()
bool IsAudioSessionMuted = false;

UPROPERTY()
class USpeechRecognizer* SpeechRecognizer;

UPROPERTY()
class UCapturableSoundWave* CapturableSoundWave;

UFUNCTION()
void OnRecognitionFinished();
UFUNCTION()
void OnRecognitionError(const FString& ShortErrorMessage, const FString& LongErrorMessage);
UFUNCTION()
void OnRecognizedTextSegment(const FString& RecognizedWords);

UFUNCTION()
void OnStartSpeechRecognition(bool bSucceeded);

UFUNCTION()
void OnPopulateAudioData(const TArray<float>& PopulatedAudioData);

Code

void UDreamiaCharacterComponent::StartAudioSession(const bool bIsMuted)
{
	SpeechRecognizer = USpeechRecognizer::CreateSpeechRecognizer();
	SpeechRecognizer->SetLanguage(ESpeechRecognizerLanguage::En);
	SpeechRecognizer->SetNumOfThreads(8);
	SpeechRecognizer->SetStepSize(1000);
	SpeechRecognizer->OnRecognitionFinished.AddUniqueDynamic(this, &UDreamiaCharacterComponent::OnRecognitionFinished);
	SpeechRecognizer->OnRecognitionError.AddUniqueDynamic(this, &UDreamiaCharacterComponent::OnRecognitionError);
	SpeechRecognizer->OnRecognizedTextSegment.AddUniqueDynamic(
		this, &UDreamiaCharacterComponent::OnRecognizedTextSegment);
	SpeechRecognizer->SetStreamingDefaults();

	OnStartSpeechRecognitionEvent.Clear();
	OnStartSpeechRecognitionEvent.BindDynamic(this, &UDreamiaCharacterComponent::OnStartSpeechRecognition);
	SpeechRecognizer->StartSpeechRecognition(OnStartSpeechRecognitionEvent);
}

void UDreamiaCharacterComponent::OnRecognitionError(const FString& ShortErrorMessage, const FString& LongErrorMessage)
{
	UE_LOG(LogTemp, Error, TEXT("%s: Error in Speech Recognition: %s"), *(GetNameSafe(this)), *LongErrorMessage);
}

void UDreamiaCharacterComponent::OnStartSpeechRecognition(bool bSucceeded)
{
	CapturableSoundWave = UCapturableSoundWave::CreateCapturableSoundWave();
	CapturableSoundWave->OnPopulateAudioData.AddUniqueDynamic(this, &UDreamiaCharacterComponent::OnPopulateAudioData);

	CapturableSoundWave->StartCapture(DeviceId);
	// CapturableSoundWave->ToggleMute(IsAudioSessionMuted);
}

void UDreamiaCharacterComponent::OnPopulateAudioData(const TArray<float>& PopulatedAudioData)
{
	if (!SpeechRecognizer)
	{
		// TODO LOG ERROR
		return;
	}
	if (!CapturableSoundWave)
	{
		// TODO LOG ERROR
		return;
	}

	SpeechRecognizer->ProcessAudioData(PopulatedAudioData, CapturableSoundWave->GetSampleRate(),
	                                   CapturableSoundWave->GetNumOfChannels(), false);
}

void UDreamiaCharacterComponent::OnRecognizedTextSegment(const FString& RecognizedWords)
{
	FString FilteredWords = RecognizedWords;

	// PROCESS TEXT
}

void UDreamiaCharacterComponent::OnRecognitionFinished()
{
	if (!SpeechRecognizer)
	{
		// TODO LOG ERROR
		return;
	}
	if (!CapturableSoundWave)
	{
		// TODO LOG ERROR
		SpeechRecognizer->StopSpeechRecognition();
		return;
	}

	if (CapturableSoundWave->IsCapturing())
	{
		SpeechRecognizer->StopSpeechRecognition();
	}
}

void UDreamiaCharacterComponent::StopAudioSession()
{
	if (CapturableSoundWave)
	{
		CapturableSoundWave->StopCapture();
	}
	if (SpeechRecognizer)
	{
		SpeechRecognizer->StopSpeechRecognition();
	}
}

void UDreamiaCharacterComponent::SetAudioSessionMute(const bool bIsMuted)
{
	IsAudioSessionMuted = bIsMuted;
	if (CapturableSoundWave && CapturableSoundWave->IsCapturing())
	{
		CapturableSoundWave->ToggleMute(IsAudioSessionMuted);
	}
}

Works only once per CreateSpeechRecognizer

Hi, I modified your sample to StartCapture/StopCapture on keypress. I notice it only works the first time unless I recreate the speech recognizer (CreateSpeechRecognizer) every time I capture. That doesn't seem right to me; can you please confirm? (A sketch of the recreate-per-capture workaround follows.)
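
For anyone hitting the same thing, a minimal sketch of the recreate-per-capture workaround described above, using only the plugin calls that appear in other issues on this page; the class and member names (UMyComponent, OnStartedEvent, DeviceId) are illustrative assumptions:

void UMyComponent::OnCaptureKeyPressed()
{
	// Workaround: create a fresh recognizer for every capture, since
	// reusing one instance only worked the first time.
	SpeechRecognizer = USpeechRecognizer::CreateSpeechRecognizer();
	SpeechRecognizer->SetStreamingDefaults();
	SpeechRecognizer->OnRecognizedTextSegment.AddUniqueDynamic(this, &UMyComponent::OnRecognizedTextSegment);

	OnStartedEvent.BindDynamic(this, &UMyComponent::OnStarted);
	SpeechRecognizer->StartSpeechRecognition(OnStartedEvent);

	// Capture could also be started from the OnStarted callback instead,
	// as in the other issues on this page.
	CapturableSoundWave->StartCapture(DeviceId);
}

void UMyComponent::OnCaptureKeyReleased()
{
	CapturableSoundWave->StopCapture();
	SpeechRecognizer->StopSpeechRecognition();
}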

Nodes missing in Unreal 5.3.2

Hi, some nodes are missing in Unreal 5.3.2 (screenshots attached).

I tried installing the plugin through the Marketplace and manually. Nothing has worked so far.

UE5.3 records just "you"

The Runtime Audio Importer and Speech Recognizer setup always recognizes just "you".
Tested on a blank UE5.3 project with freshly activated plugins, using a mic to capture audio.
The same setup worked on UE5.1.

There is no Get Sample Rate node from Capturable Sound Wave

I am trying out the plugin, but when I copy-paste the nodes from the documentation (the streaming example, linked there), I notice I cannot copy the Get Sample Rate node from the Capturable Sound Wave. I am using UE 5.2.1.

I think this is what causes the engine to crash when I try to run the streaming template: there is an error that crashes the engine, and the log says it tried to resize an array to an invalid size.

LoginId:fe87237647aa5ca51eede69be64b7aca
EpicAccountId:124ea92a439340ceb337a1ae3ed11351

Fatal error: [File:D:\build\++UE5\Sync\Engine\Source\Runtime\Core\Private\Containers\Array.cpp] [Line: 8] Trying to resize TArray to an invalid size of 2147483648

UnrealEditor_RuntimeSpeechRecognizer!TArray<float,TAlignedHeapAllocator<16> >::ResizeGrow() [D:\RocketSync\5.2.0-25360045+++UE5+Release-5.2\Working\Engine\Source\Runtime\Core\Public\Containers\Array.h:2942]
UnrealEditor_RuntimeSpeechRecognizer!FSpeechRecognizerThread::ProcessPCMData() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\RuntimeSpeechRecognizer\Private\SpeechRecognizerThread.cpp:397]
UnrealEditor_RuntimeSpeechRecognizer!USpeechRecognizer::ProcessAudioData() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\RuntimeSpeechRecognizer\Private\SpeechRecognizer.cpp:64]
UnrealEditor_RuntimeSpeechRecognizer!USpeechRecognizer::ProcessAudioData() [D:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Source\RuntimeSpeechRecognizer\Private\SpeechRecognizer.cpp:59]
UnrealEditor_RuntimeSpeechRecognizer!USpeechRecognizer::execProcessAudioData() [d:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeSpeechRecognizer\Intermediate\Build\Win64\UnrealEditor\Inc\RuntimeSpeechRecognizer\UHT\SpeechRecognizer.gen.cpp:366]
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_Engine
UnrealEditor_RuntimeAudioImporter!TMulticastScriptDelegate<FWeakObjectPtr>::ProcessMulticastDelegate<UObject>() [D:\RocketSync\5.2.0-25360045+++UE5+Release-5.2\Working\Engine\Source\Runtime\Core\Public\UObject\ScriptDelegates.h:565]
UnrealEditor_RuntimeAudioImporter!FOnPopulateAudioData_DelegateWrapper() [d:\build\U5M-Marketplace\Sync\LocalBuilds\PluginTemp\HostProject\Plugins\RuntimeAudioImporter\Intermediate\Build\Win64\UnrealEditor\Inc\RuntimeAudioImporter\UHT\ImportedSoundWave.gen.cpp:259]
UnrealEditor_RuntimeAudioImporter!UE::Core::Private::Function::TFunctionRefCaller<<lambda_b0f2cea00923fb3576dede781dab1b37>,void __cdecl(void)>::Call() [D:\RocketSync\5.2.0-25360045+++UE5+Release-5.2\Working\Engine\Source\Runtime\Core\Public\Templates\Function.h:475]
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Core
UnrealEditor_Engine
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
kernel32
ntdll


FYI: the engine does pick up my microphone.

Any help is appreciated, thanks.

The instructions are not really instructions

Is there no project file, at least, that we can learn how to use this from? An example? Anything?

All the instructions amount to is this (a minimal C++ sketch of these steps follows the list):

1. Create a Speech Recognizer and set the necessary parameters (CreateSpeechRecognizer; for parameters see here).
2. Bind to the needed delegates (OnRecognitionFinished, OnRecognizedTextSegment and OnRecognitionError).
3. Start the speech recognition (StartSpeechRecognition).
4. Process audio data and wait for results from the delegates (ProcessAudioData).
5. Stop the speech recognizer when not needed (e.g., after the OnRecognitionFinished broadcast).
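
As a starting point, here is a minimal C++ sketch of those five steps, assembled only from the API calls that appear in other issues on this page (CreateSpeechRecognizer, the OnRecognition* delegates, StartSpeechRecognition, ProcessAudioData, StopSpeechRecognition). The class and member names are illustrative assumptions, not an official example; OnStartedEvent is assumed to be an FOnSpeechRecognitionStartedDynamic member, as in the snippets elsewhere on this page:

void UMyComponent::RunSpeechRecognition()
{
	// 1. Create the recognizer and set the necessary parameters.
	SpeechRecognizer = USpeechRecognizer::CreateSpeechRecognizer();
	SpeechRecognizer->SetLanguage(ESpeechRecognizerLanguage::En);
	SpeechRecognizer->SetStreamingDefaults();

	// 2. Bind to the needed delegates.
	SpeechRecognizer->OnRecognizedTextSegment.AddUniqueDynamic(this, &UMyComponent::OnRecognizedTextSegment);
	SpeechRecognizer->OnRecognitionFinished.AddUniqueDynamic(this, &UMyComponent::OnRecognitionFinished);
	SpeechRecognizer->OnRecognitionError.AddUniqueDynamic(this, &UMyComponent::OnRecognitionError);

	// 3. Start the speech recognition.
	OnStartedEvent.BindDynamic(this, &UMyComponent::OnStarted);
	SpeechRecognizer->StartSpeechRecognition(OnStartedEvent);

	// 4. Process audio data as it arrives, e.g. from a capturable sound
	//    wave's OnPopulateAudioData delegate:
	//    SpeechRecognizer->ProcessAudioData(Samples, SampleRate, NumChannels, false);

	// 5. Stop the recognizer when it is no longer needed, e.g. after
	//    OnRecognitionFinished broadcasts:
	//    SpeechRecognizer->StopSpeechRecognition();
}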

Voice Activation

Hey,

I saw you had voice activation planned for this tool. I ended up implementing a basic solution on my end and wanted to share it with you.

In my use case, I really needed an open mic where the player speaks whenever they want. So I implemented voice detection through changes in the intensity of the audio (not perfect, but it kind of works). I leave the code below if you want to try it out.

In essence, I pick up the data points from the CapturableSoundWave and maintain a running average over time. If, for a given time frame, the values deviate enough from the average, I activate the voice recording. The recording stops one second after the data points return to the average (the player has stopped speaking).

Header

#pragma once

#include "CoreMinimal.h"
#include "SpeechRecognizer.h"
#include "MyRecorderSpeechRecognition.generated.h"

/**
 * 
 */
UCLASS(Blueprintable)
class MY_API UMyRecorderSpeechRecognition : public UObject
{
	GENERATED_BODY()

public:
	virtual void BeginDestroy() override;

	virtual void Init();
	virtual void StartAudioSession(bool bIsMuted);
	virtual void StopAudioSession();
	virtual void SetAudioSessionMute(const bool bIsMuted);
	virtual bool GetAudioSessionMute();

	/**
	 * Audio Input Device ID (aka which microphone to use).
	 */
	UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="STT")
	int DeviceId = 0;

	/**
	 * Window of captured audio data to be stored prior to voice activation.
	 */
	UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="STT|Voice Activation")
	int VoiceActivationWindow = 5;

	/**
	 * Time to wait after voice activation and before sending the audio data to be recognized.
	 */
	UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="STT|Voice Activation")
	float VoiceActivationDelay = 1.0f;

	/**
	 * Multiplier of the deviation to detect a significant shift in volume to trigger voice activation.
	 */
	UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="STT|Voice Activation")
	float VoiceActivationDeviationMultiplier = 2.0f;

	/**
	 * Minimum value the deviation can have to offer better results. (Most relevant in quiet environments.)
	 */
	UPROPERTY(BlueprintReadWrite, EditAnywhere, Category="STT|Voice Activation")
	float VoiceActivationMinDeviation = 0.5f;

protected:

	UPROPERTY()
	TArray<float> AudioData;

	float ActiveAverage = -1;
	float ActiveDeviation = -1;
	
	// STT Section
	UPROPERTY()
	bool IsAudioSessionMuted = false;

	UPROPERTY(BlueprintReadWrite)
	class USpeechRecognizer* SpeechRecognizer;

	UPROPERTY(BlueprintReadWrite)
	class UCapturableSoundWave* CapturableSoundWave;

	UPROPERTY()
	FOnSpeechRecognitionStartedDynamic OnStartSpeechRecognitionEvent;

	UFUNCTION()
	void OnRecognitionFinished();
	UFUNCTION()
	void OnRecognitionError(const FString& ShortErrorMessage, const FString& LongErrorMessage);
	UFUNCTION()
	void OnRecognizedTextSegment(const FString& RecognizedWords);

	UFUNCTION()
	void OnStartSpeechRecognition(bool bSucceeded);

	UFUNCTION()
	void OnPopulateAudioData(const TArray<float>& PopulatedAudioData);

	UFUNCTION()
	void Reset();

	bool IsVoiceActivated = false;
	
	FTimerDelegate VoiceActivationDelegate{};
	FTimerHandle VoiceActivationTimerHandle{};

	UFUNCTION()
	void OnVoiceActivation();
};

Code

// Fill out your copyright notice in the Description page of Project Settings.


#include "MyRecorderSpeechRecognition.h"

#include "Sound/CapturableSoundWave.h"

void UMyRecorderSpeechRecognition::BeginDestroy()
{
	if (SpeechRecognizer)
	{
		SpeechRecognizer->StopSpeechRecognition();
		SpeechRecognizer->OnRecognitionFinished.RemoveDynamic(
			this, &UMyRecorderSpeechRecognition::OnRecognitionFinished);
		SpeechRecognizer->OnRecognitionError.
		                  RemoveDynamic(this, &UMyRecorderSpeechRecognition::OnRecognitionError);
		SpeechRecognizer->OnRecognizedTextSegment.RemoveDynamic(
			this, &UMyRecorderSpeechRecognition::OnRecognizedTextSegment);
	}

	OnStartSpeechRecognitionEvent.Clear();

	if (CapturableSoundWave)
	{
		CapturableSoundWave->StopCapture();
		CapturableSoundWave->OnPopulateAudioData.RemoveDynamic(
			this, &UMyRecorderSpeechRecognition::OnPopulateAudioData);
	}

	if (GetWorld() && GetWorld()->GetTimerManager().IsTimerActive(VoiceActivationTimerHandle))
	{
		GetWorld()->GetTimerManager().ClearTimer(VoiceActivationTimerHandle);
	}

	Super::BeginDestroy();
}

void UMyRecorderSpeechRecognition::Init()
{
	SpeechRecognizer = USpeechRecognizer::CreateSpeechRecognizer();
	SpeechRecognizer->SetLanguage(ESpeechRecognizerLanguage::En);
	SpeechRecognizer->OnRecognitionFinished.AddUniqueDynamic(
		this, &UMyRecorderSpeechRecognition::OnRecognitionFinished);
	SpeechRecognizer->OnRecognitionError.AddUniqueDynamic(this, &UMyRecorderSpeechRecognition::OnRecognitionError);
	SpeechRecognizer->OnRecognizedTextSegment.AddUniqueDynamic(
		this, &UMyRecorderSpeechRecognition::OnRecognizedTextSegment);
	SpeechRecognizer->SetStreamingDefaults();
	SpeechRecognizer->SetSuppressBlank(true);
	SpeechRecognizer->SetSuppressNonSpeechTokens(true);
	SpeechRecognizer->SetNumOfThreads(0);
	SpeechRecognizer->SetStepSize(0);

	OnStartSpeechRecognitionEvent.BindDynamic(this, &UMyRecorderSpeechRecognition::OnStartSpeechRecognition);

	CapturableSoundWave = UCapturableSoundWave::CreateCapturableSoundWave();
	CapturableSoundWave->OnPopulateAudioData.AddUniqueDynamic(
		this, &UMyRecorderSpeechRecognition::OnPopulateAudioData);
}

void UMyRecorderSpeechRecognition::StartAudioSession(bool bIsMuted)
{
	if (!SpeechRecognizer)
	{
		UE_LOG(LogTemp, Error, TEXT("Unable to start audio session. Speech Recognizer is not defined."));
		return;
	}

	SpeechRecognizer->StartSpeechRecognition(OnStartSpeechRecognitionEvent);
	IsAudioSessionMuted = bIsMuted;
}

void UMyRecorderSpeechRecognition::StopAudioSession()
{
	if (CapturableSoundWave)
	{
		CapturableSoundWave->StopCapture();
	}
	if (SpeechRecognizer)
	{
		SpeechRecognizer->StopSpeechRecognition();
	}
	Reset();
}

void UMyRecorderSpeechRecognition::SetAudioSessionMute(const bool bIsMuted)
{
	if (IsAudioSessionMuted == bIsMuted)
	{
		return;
	}

	IsAudioSessionMuted = bIsMuted;
	if (!CapturableSoundWave)
	{
		return;
	}

	CapturableSoundWave->ToggleMute(IsAudioSessionMuted);
}

bool UMyRecorderSpeechRecognition::GetAudioSessionMute()
{
	return IsAudioSessionMuted;
}

void UMyRecorderSpeechRecognition::OnRecognitionFinished()
{
}

void UMyRecorderSpeechRecognition::OnRecognitionError(const FString& ShortErrorMessage,
                                                           const FString& LongErrorMessage)
{
	UE_LOG(LogTemp, Error, TEXT("Speech Recognition Error. %s. %s"), *ShortErrorMessage, *LongErrorMessage);
}

void UMyRecorderSpeechRecognition::OnRecognizedTextSegment(const FString& RecognizedWords)
{
	// Send text to game.
}

void UMyRecorderSpeechRecognition::OnStartSpeechRecognition(bool bSucceeded)
{
	if (!CapturableSoundWave)
	{
		UE_LOG(LogTemp, Error, TEXT("Unable to start speech recognition. CapturableSoundWave is not defined."));
		return;
	}

	Reset();

	CapturableSoundWave->StartCapture(DeviceId);
	if (!IsAudioSessionMuted) SetAudioSessionMute(IsAudioSessionMuted);
}

float MathSumAbs(const TArray<float>& Population)
{
	float Std = 0;

	for (const auto Data : Population)
	{
		Std += FMath::Abs(Data);
	}
	return Std;
}

void UMyRecorderSpeechRecognition::OnPopulateAudioData(const TArray<float>& PopulatedAudioData)
{
	if (!SpeechRecognizer)
	{
		UE_LOG(LogTemp, Error, TEXT("Unable to process audio data. SpeechRecognizer is not defined."));
		return;
	}
	if (!CapturableSoundWave)
	{
		UE_LOG(LogTemp, Error, TEXT("Unable to process audio data. CapturableSoundWave is not defined."));
		return;
	}

	const float Sum = MathSumAbs(PopulatedAudioData);
	const float Deviation = FMath::Abs(ActiveAverage - Sum);

	if (ActiveAverage <= 0)
	{
		ActiveAverage = Sum;
		ActiveDeviation = FMath::Max(Deviation, VoiceActivationMinDeviation);
		return;
	}

	if (Sum > ActiveAverage + (ActiveDeviation * VoiceActivationDeviationMultiplier))
	{
		FTimerManager& TimerManager = GetWorld()->GetTimerManager();
		if (TimerManager.IsTimerActive(VoiceActivationTimerHandle))
		{
			TimerManager.ClearTimer(VoiceActivationTimerHandle);
		}
		else
		{
			UE_LOG(LogTemp, Log, TEXT("Voice Activation: ON"));
		}

		VoiceActivationDelegate.BindUObject(this, &UMyRecorderSpeechRecognition::OnVoiceActivation);
		TimerManager.SetTimer(VoiceActivationTimerHandle, VoiceActivationDelegate,
		                      VoiceActivationDelay, false);
		IsVoiceActivated = true;
	}

	ActiveAverage = (ActiveAverage * 0.3f) + (Sum * 0.7f);
	ActiveDeviation = FMath::Max((ActiveDeviation + Deviation) * 0.5f, VoiceActivationMinDeviation);

	if (!IsVoiceActivated)
	{
		const int WindowNum = PopulatedAudioData.Num() * (VoiceActivationWindow - 1);
		if (AudioData.Num() > WindowNum)
		{
			const int NumDataToRemove = AudioData.Num() - WindowNum;
			AudioData.RemoveAt(0, NumDataToRemove);
		}
	}
	AudioData.Append(PopulatedAudioData);

	// SpeechRecognizer->ProcessAudioData(PopulatedAudioData, CapturableSoundWave->GetSampleRate(),
	//                                    CapturableSoundWave->GetNumOfChannels(), false);
}

void UMyRecorderSpeechRecognition::Reset()
{
	ActiveAverage = -1;
	ActiveDeviation = VoiceActivationMinDeviation;

	IsVoiceActivated = false;

	AudioData.Empty();
}

void UMyRecorderSpeechRecognition::OnVoiceActivation()
{
	UE_LOG(LogTemp, Log, TEXT("Voice Activation: OFF"));

	SpeechRecognizer->ProcessAudioData(AudioData, CapturableSoundWave->GetSampleRate(),
	                                   CapturableSoundWave->GetNumOfChannels(), true);

	AudioData.Empty();
	IsVoiceActivated = false;
}

Assertion failed: CaptureThread == nullptr on CapturableSoundWave->ToggleMute(false)

Hello again,

I am experiencing an issue with CapturableSoundWave->ToggleMute(false). Setting it to mute (CapturableSoundWave->ToggleMute(true)) works just fine, but when I try to unmute it, it crashes with a failed assertion in a Windows library.

I was able to pin it down to the AudioCapture.StartStream() call in Plugins/RuntimeAudioImporter/Source/RuntimeAudioImporter/Private/Sound/CapturableSoundWave.cpp, line 282. (Crash logs below.)

Can you replicate the behavior? Is this specific to my setup?

I'm using the plugin from commit 47481b5; also, I'm on Windows 11 and using Unreal Engine 5.3.

PS: My current workaround is to stop and restart the capture, like this:

CapturableSoundWave->StopCapture();
CapturableSoundWave->StartCapture(DeviceId);
Crash Logs

Assertion failed: CaptureThread == nullptr [File:D:\build\++UE5\Sync\Engine\Source\Runtime\AudioCaptureImplementations\Windows\AudioCaputureWasapi\Private\WasapiCaptureThread.cpp] [Line: 67]

UnrealEditor_RuntimeAudioImporter!UCapturableSoundWave::ToggleMute() [C:\P4\ricardo_mrkite-ricardo_TDY-RD_main_5595\Plugins\RuntimeAudioImporter\Source\RuntimeAudioImporter\Private\Sound\CapturableSoundWave.cpp:282]
UnrealEditor_Dreamia!UDreamiaCharacterComponent::SetAudioSessionMute() [C:\P4\ricardo_mrkite-ricardo_TDY-RD_main_5595\Plugins\Dreamia\Source\Dreamia\Private\DreamiaCharacterComponent.cpp:302]
UnrealEditor_Dreamia!UDreamiaCharacterComponent::execSetAudioSessionMute() [C:\P4\ricardo_mrkite-ricardo_TDY-RD_main_5595\Plugins\Dreamia\Intermediate\Build\Win64\UnrealEditor\Inc\Dreamia\UHT\DreamiaCharacterComponent.gen.cpp:211]
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_CoreUObject
UnrealEditor_UMG
UnrealEditor_UMG
UnrealEditor_UMG
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_Slate
UnrealEditor_ApplicationCore
UnrealEditor_ApplicationCore
UnrealEditor_ApplicationCore
UnrealEditor_ApplicationCore
user32
user32
UnrealEditor_ApplicationCore
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
UnrealEditor
kernel32
ntdll

Feature: Allow choosing the model at runtime (when creating the recognizer)

It's common practice to use a tiny model for the streaming pass and a larger model to get an accurate result once the full audio has been received.

Could we add an option to choose the model when creating a recognizer? Then we could use two recognizers with different models to accomplish the workflow above.
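
A purely hypothetical sketch of what the requested workflow could look like; CreateSpeechRecognizerWithModel and ESpeechRecognizerModelSize do not exist in the plugin today and are invented names for illustration only (the ProcessAudioData calls mirror usage elsewhere on this page):

void UMyComponent::RunTwoPassRecognition(const TArray<float>& Chunk, const TArray<float>& FullRecording,
                                         int32 SampleRate, int32 NumChannels)
{
	// Hypothetical API: choose the model per recognizer instance.
	USpeechRecognizer* StreamingRecognizer =
		USpeechRecognizer::CreateSpeechRecognizerWithModel(ESpeechRecognizerModelSize::Tiny);   // hypothetical
	USpeechRecognizer* FinalPassRecognizer =
		USpeechRecognizer::CreateSpeechRecognizerWithModel(ESpeechRecognizerModelSize::Medium); // hypothetical

	// Stream partial results with the tiny model while audio is arriving...
	StreamingRecognizer->ProcessAudioData(Chunk, SampleRate, NumChannels, false);

	// ...then re-run the complete recording through the larger model for accuracy.
	FinalPassRecognizer->ProcessAudioData(FullRecording, SampleRate, NumChannels, true);
}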

Short Voice Inputs Not Recognized Immediately After Recording Starts

Environment:
Unreal Engine Version: 5.3

Issue Description:
When attempting to use the Runtime Speech Recognizer plugin to recognize short voice inputs (e.g., single-word commands like "yes"), the plugin fails to detect the input if it is spoken immediately after pressing the record button and the recording is stopped right after the voice input. The issue does not occur if I speak for a longer period or wait a brief moment after pressing record before speaking.

Steps to Reproduce:

1. Start the speech recognition session using the "FROM MICROPHONE" functionality included in the plugin's demo project.
2. Keep the default settings (streaming defaults).
3. Immediately after pressing the "START RECORDING" button, say a short English word like "yes", then stop recording by pressing "STOP RECORDING".

Expected Behavior:
The plugin should be able to recognize short voice inputs spoken immediately after recording starts.

Actual Behavior:
The plugin does not recognize the voice input unless I speak for a longer duration or pause for a moment after pressing the record button before speaking.

Additional Information:
This issue seems to impact the usability of the plugin for applications that rely on quick voice commands or short inputs.

LanguageModel object already exists prompt appears on every editor startup

Hi, I've noticed that after the initial boot-up, once a selected language model has been downloaded and LanguageModel.uasset created, every time I restart the editor I get the prompt about the asset already existing, asking whether I want to replace it or not. If I select no, a new window appears saying that RuntimeSpeechRecognizer cannot function correctly because the language model asset could not be created.

I think there is an issue with the editor code not being able to detect whether the language model asset already exists; it attempts to create it from scratch regardless.

Unreal Engine version: 5.0.3
Plugin version: 1.0 (this is the version specified in the .uplugin file; it is the latest from the Marketplace)

Get Sample Rate showing error after recent update

We are no longer able to connect the Capturable Sound Wave to the 'Target' pin of Get Sample Rate, nor its 'Return Value' to an integer-to-float conversion.

The issue cropped up the moment we applied the recent August updates to the plugin in Unreal Engine 5.1.

Blank Audio returned in Unreal Engine 5.4

Following a YouTube video, I implemented runtime speech recognition in my game on key press, the same way the creator did.

It seems to work, except that no audio message is returned; it just says "You: [blank here where the message should be]". The print string is also returning a value of 0.0.

When "Enable Base Submix" is ticked, I do hear my voice back, but it's quite distorted; when unticked, you wouldn't know anything is happening at all, because there is no sound and no string output.

A few others in the comments are having the same issue. Some found solutions, yet those solutions don't work for everyone, so I thought I'd ask here.

Solutions mentioned that have worked for some:

1. Go to your Windows privacy settings and allow access for your mic.
2. In the Blueprints, under the Start Capture section, experiment with changing the device number to a value between 0 and 3 to ensure it's capturing from the device you are using.

There are still people these fixes didn't work for. Does anyone have any more ideas?

I know I should look at the documentation, but this seems a much quicker way for people to find and solve this issue and related errors when first setting up voice recognition.
