nvidia-riva / tutorials
NVIDIA Riva runnable tutorials
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/deploy-eks.html
This is after changing the k8s version to 1.22 as mentioned here and running eksctl create cluster -f eks_launch_conf.yaml
Hello,
I am currently working with the German speech recognition data provided by this project and came across the following line in the README:
"In addition, we also filter out samples that are considered 'noisy', that is, samples having very high WER (word error rate) or CER (character error rate) w.r.t. a previously trained German model."
Unfortunately, I do not have access to a pre-trained German model to calculate the WER or CER for my dataset. This makes it challenging for me to filter out the noisy samples effectively.
Could you please provide a list of these 'noisy' samples or the criteria used to identify them?
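Absent the original filter list, the criterion can be approximated locally once you have hypothesis transcripts from any reasonably good German ASR model. A stdlib-only sketch of the WER/CER computation (the 0.75/0.5 thresholds are illustrative assumptions; the project's actual cutoffs are not stated):

```python
# Approximate the "noisy sample" filter: drop samples whose WER or CER
# against a reference model's transcript exceeds a threshold.
# The thresholds below are assumptions, not the project's actual criteria.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    return edit_distance(ref.split(), hyp.split()) / max(1, len(ref.split()))

def cer(ref: str, hyp: str) -> float:
    return edit_distance(list(ref), list(hyp)) / max(1, len(ref))

def is_noisy(ref: str, hyp: str, wer_thresh=0.75, cer_thresh=0.5) -> bool:
    return wer(ref, hyp) > wer_thresh or cer(ref, hyp) > cer_thresh
```

Running this over your manifest with transcripts from any pre-trained German model (e.g. one from NGC) should let you reproduce a comparable filter even without the exact model the README refers to.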
Hello everyone!
After the riva model deployment, I'm generating transcriptions from my audio files. It's working fine.
I wanted a head-to-head comparison of the raw .nemo model with the Riva model repositories. I've noticed that in a few cases, such as some (but not all) shorter audio files, I get no transcription from Riva where the raw .nemo model produced one. I've tried both the Quick Start Scripts approach and the Docker approach, with no luck; both give the same results, as expected.
Here's the command I used to build the rmir model:
riva-build speech_recognition /servicemaker-dev/<output name of rmir model> /servicemaker-dev/<riva model name> \
    --name=conformer-bn-BD-asr-streaming \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --ms_per_timestep=40 \
    --endpointing.start_history=200 \
    --nn.fp16_needs_obey_precision_pass \
    --endpointing.residue_blanks_at_start=-2 \
    --chunk_size=0.16 \
    --left_padding_size=1.92 \
    --right_padding_size=1.92 \
    --decoder_type=flashlight \
    --decoding_language_model_binary=<lm_binary> \
    --decoding_vocab=<decoder_vocab_file> \
    --flashlight_decoder.lm_weight=0.2 \
    --flashlight_decoder.word_insertion_score=0.2 \
    --flashlight_decoder.beam_threshold=20. \
    --language_code=bn-BD
What could be the underlying cause for not receiving transcriptions after the model transformation?
CC: @vinhngx
Hello everyone!
I've created my own NeMo model and then completed all the other steps (riva model, rmir model, and Riva repositories) according to the documentation.
I'm using the Quick Start Scripts to deploy the model. After running riva_streaming_asr_client --audio_file <wav file location>, I'm not getting any transcription.
Here is the output I'm getting:
I1031 08:47:32.209681 109 riva_streaming_asr_client.cc:154] Using Insecure Server Credentials
Loading eval dataset...
filename: <wav file location>
Done loading 1 files
File: <wav file location>
Final transcripts:
Audio processed: 7.34695e-40 sec.
Not printing latency statistics because the client is run without the --simulate_realtime option and/or the number of requests sent is not equal to number of requests received. To get latency statistics, run with --simulate_realtime and set the --chunk_duration_ms to be the same as the server chunk duration
Run time: 4.68129 sec.
Total audio processed: 286.728 sec.
Throughput: 61.2499 RTFX
So, as you can see, the Final transcripts response is empty.
From the Docker log, I'm getting this:
> Triton server is ready...
I1031 04:19:51.898931 423 riva_server.cc:120] Using Insecure Server Credentials
I1031 04:19:51.944115 423 model_registry.cc:110] Successfully registered: citrinet-1024-en-US-asr-streaming for ASR
W1031 04:19:51.961644 423 grpc_riva_asr.cc:157] citrinet-1024-en-US-asr-streaming has no configured wfst normalizer model
I1031 04:19:51.980005 423 riva_server.cc:160] Riva Conversational AI Server listening on 0.0.0.0:50051
W1031 04:19:51.980062 423 stats_reporter.cc:41] No API key provided. Stats reporting disabled.
I1031 08:47:00.767860 428 grpc_riva_asr.cc:892] ASRService.StreamingRecognize called.
I1031 08:47:00.768599 428 grpc_riva_asr.cc:919] ASRService.StreamingRecognize performing streaming recognition with sequence id: 1779700260
I1031 08:47:00.800891 428 grpc_riva_asr.cc:976] Using model citrinet-1024-en-US-asr-streaming for inference
I1031 08:47:00.801008 428 grpc_riva_asr.cc:992] Model sample rate= 16000 for inference
I1031 08:47:00.848378 428 riva_asr_stream.cc:214] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
I1031 08:47:03.263229 428 grpc_riva_asr.cc:1093] ASRService.StreamingRecognize returning OK
I1031 08:47:32.256738 428 grpc_riva_asr.cc:892] ASRService.StreamingRecognize called.
I1031 08:47:32.257011 428 grpc_riva_asr.cc:919] ASRService.StreamingRecognize performing streaming recognition with sequence id: 2124845530
I1031 08:47:32.257077 428 grpc_riva_asr.cc:976] Using model citrinet-1024-en-US-asr-streaming for inference
I1031 08:47:32.257154 428 grpc_riva_asr.cc:992] Model sample rate= 16000 for inference
I1031 08:47:32.257484 428 riva_asr_stream.cc:214] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
I1031 08:47:36.936904 428 grpc_riva_asr.cc:1093] ASRService.StreamingRecognize returning OK
Can somebody direct me on how to work out what went wrong?
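Before digging into the server, it may be worth confirming (an assumption about common failure modes, not a diagnosis) that the WAV file really matches what the log's "Detected format" line claims: 16 kHz, mono, 16-bit PCM, and long enough to produce output. A stdlib-only sketch:

```python
# Sanity-check a WAV file against the format the Riva server log reports.
import wave

def describe_wav(path: str) -> dict:
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),          # log expects 1
            "sample_rate": rate,                   # log expects 16000
            "bits_per_sample": w.getsampwidth() * 8,  # log expects 16
            "duration_sec": frames / rate,         # should be non-trivial
        }
```

If the file checks out, the empty transcript more likely comes from the pipeline configuration (e.g. chunk size or endpointing) than from the audio itself.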
Training or Fine-Tuning an Acoustic Model:
Model fine-tuning is a set of techniques that makes small adjustments to a pre-existing model using new data, adapting it to new situations while retaining its original capabilities.
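As a concrete toy illustration of that idea (unrelated to Riva or NeMo internals): start from "pretrained" parameters and take small gradient steps on new data, so the model adapts rather than being re-initialized:

```python
# Toy illustration of fine-tuning (not Riva/NeMo code): adapt a
# "pretrained" linear model to new data with a small learning rate
# so the original fit is adjusted, not destroyed.

def fine_tune(w, b, data, lr=0.05, epochs=200):
    """One-feature linear regression, plain SGD over (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on the new sample
            w -= lr * err * x      # small step: nudge the weights
            b -= lr * err
    return w, b

# "Pretrained" model y = 2x; the new domain behaves like y = 2x + 1.
w, b = fine_tune(2.0, 0.0, [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

The same principle, scaled up, is what TAO/NeMo fine-tuning does: start from the checkpoint's weights and continue training on the new dataset with a conservative learning rate.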
So I downloaded this model, speechtotext_zh_cn_conformer.tlt, modified test_ds.labels in evaluate.yaml for Mandarin, and then used this API:
!tao speech_to_text_citrinet evaluate \
    -e $SPECS_DIR/speech_to_text_citrinet/evaluate.yaml \
    -g 1 \
    -k $KEY \
    -m $RESULTS_DIR/speechtotext_zh_cn_conformer.tlt \
    -r $RESULTS_DIR/citrinet/evaluate \
    test_ds.manifest_filepath=$DATA_DIR/train.json
but I got this error:
raise ValueError("cfg must have tokenizer config to create a tokenizer !")
ValueError: cfg must have tokenizer config to create a tokenizer !
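The error means the model config loaded for evaluation has no tokenizer section. Note that the checkpoint is a Conformer model being run through the speech_to_text_citrinet task, which expects a tokenizer-based (BPE) model; that mismatch may be the cause. A small guard sketch (the dict layout is assumed from typical NeMo configs, not verified against this .tlt file):

```python
# Hypothetical guard: check that a config carries the model.tokenizer
# section NeMo needs before it tries to build a tokenizer.
# Field names ("dir", "type") are assumptions from typical NeMo configs.

def has_tokenizer(cfg: dict) -> bool:
    tok = cfg.get("model", {}).get("tokenizer")
    return bool(tok) and "dir" in tok and "type" in tok

# Tokenizer-based (BPE) config vs. a character-label config.
cfg_ok = {"model": {"tokenizer": {"dir": "/data/tokenizer", "type": "bpe"}}}
cfg_bad = {"model": {"labels": ["a", "b", "c"]}}
```

If the Conformer checkpoint is character-based, evaluating it through a task/spec that assumes a tokenizer would raise exactly this ValueError; matching the task to the checkpoint type is worth checking first.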
riva_target_gpu_family="tegra"
riva_arm64_legacy_platform="xavier"
service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
I just started the Riva server via:
bash riva_init.sh
bash riva_start.sh
then tried the command:
riva_streaming_asr_client --audio_file=/opt/riva/wav/en-US_sample.wav
_InactiveRpcError                         Traceback (most recent call last)
Cell In [5], line 1
----> 1 response = riva_asr.offline_recognize(content, config)
      2 asr_best_transcript = response.results[0].alternatives[0].transcript
      3 print("ASR Transcript:", asr_best_transcript)

File ~/study/python/nemo/.nemo/lib/python3.8/site-packages/riva/client/asr.py:352, in ASRService.offline_recognize(self, audio_bytes, config, future)
    350 request = rasr.RecognizeRequest(config=config, audio=audio_bytes)
    351 func = self.stub.Recognize.future if future else self.stub.Recognize
--> 352 return func(request, metadata=self.auth.get_auth_metadata())

File ~/study/python/nemo/.nemo/lib/python3.8/site-packages/grpc/_channel.py:946, in _UnaryUnaryMultiCallable.__call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    944 state, call, = self._blocking(request, timeout, metadata, credentials,
    945                               wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)

File ~/study/python/nemo/.nemo/lib/python3.8/site-packages/grpc/_channel.py:849, in _end_unary_response_blocking(state, call, with_call, deadline)
    847     return state.response
    848 else:
--> 849     raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Error: Unavailable model requested. Lang: en-US, Type: offline"
    debug_error_string = "UNKNOWN:Error received from peer ipv6:[::1]:50051 {grpc_message:"Error: Unavailable model requested. Lang: en-US, Type: offline", grpc_status:3, created_time:"2022-10-12T13:47:30.950278184+08:00"}"
>
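One way to see why the server rejects (Lang: en-US, Type: offline) is to check which pipelines it actually registered at startup. A small sketch that parses the "Successfully registered" lines from the container log (log format as it appears in the Riva 2.x logs quoted in these issues); if only *-streaming models show up, an offline request has nothing to match:

```python
# Parse a Riva container startup log to list the deployed ASR pipelines.
# Log line format copied from the Riva 2.x logs quoted elsewhere in
# these issues; treat the exact pattern as an assumption.
import re

def registered_models(log: str) -> list:
    return re.findall(r"Successfully registered: (\S+) for ASR", log)

def serves_offline(models) -> bool:
    return any(name.endswith("-offline") for name in models)

# Excerpt from a startup log above: only a streaming pipeline registered.
log = "I1031 04:19:51.944115 423 model_registry.cc:110] Successfully registered: citrinet-1024-en-US-asr-streaming for ASR"
```

If only streaming pipelines are deployed, either re-run riva_init.sh with a config that also deploys offline ASR models, or call the streaming API instead of offline_recognize.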
ASR model export
!pip install nemo2riva fails to resolve dependencies
Cell:
print(riva_version)
!pip install nvidia-pyindex
!ngc registry resource download-version "nvidia/riva/riva_quickstart:"$riva_version
!pip install nemo2riva
!pip install protobuf==3.20.0
Output:
2.10.0
I'm getting the following resolution error:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://pypi.ngc.nvidia.com/
Collecting nemo2riva
Using cached nemo2riva-2.11.0-py3-none-any.whl (33 kB)
Collecting pyarmor<8 (from nemo2riva)
Using cached pyarmor-7.7.4-py2.py3-none-any.whl (2.3 MB)
Requirement already satisfied: nemo-toolkit>=1.13 in /usr/local/lib/python3.10/dist-packages (from nemo2riva) (1.19.0rc0)
INFO: pip is looking at multiple versions of nemo2riva to determine which version is compatible with other requirements. This could take a while.
Collecting nemo2riva
Using cached nemo2riva-2.10.0-py3-none-any.whl (33 kB)
Using cached nemo2riva-2.9.0-py3-none-any.whl (32 kB)
ERROR: Cannot install nemo2riva==2.10.0, nemo2riva==2.11.0 and nemo2riva==2.9.0 because these package versions have conflicting dependencies.
The conflict is caused by:
nemo2riva 2.11.0 depends on nvidia-eff<=0.6.2 and >=0.5.3
nemo2riva 2.10.0 depends on nvidia-eff<=0.6.2 and >=0.5.3
nemo2riva 2.9.0 depends on nvidia-eff<=0.6.2 and >=0.5.3
To fix this you could try to:
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
I deployed NVIDIA Riva on a remote machine using instructions from this quick start guide, with version nvidia/riva/riva_quickstart:2.6.0 of the quickstart.
I was trying to run the asr-python-basics.ipynb notebook; the prediction worked, but the container crashed.
Code for reproduction:
import io
import IPython.display as ipd
import grpc
import riva.client
auth = riva.client.Auth(uri='localhost:50051')
riva_asr = riva.client.ASRService(auth)
path = "./audio_samples/en-US_sample.wav"
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)
config = riva.client.RecognitionConfig()
config.language_code = "en-US"
config.max_alternatives = 1
config.enable_automatic_punctuation = True
config.audio_channel_count = 1
response = riva_asr.offline_recognize(content, config)
asr_best_transcript = response.results[0].alternatives[0].transcript
print("ASR Transcript:", asr_best_transcript)
print("\n\nFull Response Message:")
print(response)
Error from the Riva container logs:
I1024 14:07:25.004743 91 grpc_server.cc:4544] Started GRPCInferenceService at 0.0.0.0:8001
I1024 14:07:25.004979 91 http_server.cc:3242] Started HTTPService at 0.0.0.0:8000
I1024 14:07:25.045824 91 http_server.cc:180] Started Metrics Service at 0.0.0.0:8002
> Triton server is ready...
I1024 14:07:25.194975 403 riva_server.cc:120] Using Insecure Server Credentials
I1024 14:07:25.198563 403 model_registry.cc:110] Successfully registered: citrinet-1024-en-US-asr-offline for ASR
I1024 14:07:25.202152 403 model_registry.cc:110] Successfully registered: citrinet-1024-en-US-asr-streaming for ASR
I1024 14:07:25.205432 403 model_registry.cc:110] Successfully registered: conformer-en-US-asr-offline for ASR
I1024 14:07:25.208616 403 model_registry.cc:110] Successfully registered: conformer-en-US-asr-streaming for ASR
I1024 14:07:25.272111 403 model_registry.cc:110] Successfully registered: riva-punctuation-en-US for NLP
I1024 14:07:25.277859 403 model_registry.cc:110] Successfully registered: riva_intent_weather for NLP
I1024 14:07:25.278462 403 model_registry.cc:110] Successfully registered: riva_ner for NLP
I1024 14:07:25.279049 403 model_registry.cc:110] Successfully registered: riva_qa for NLP
I1024 14:07:25.279526 403 model_registry.cc:110] Successfully registered: riva_text_classification_domain for NLP
I1024 14:07:25.603746 403 model_registry.cc:110] Successfully registered: riva-punctuation-en-US for NLP
I1024 14:07:25.609462 403 model_registry.cc:110] Successfully registered: riva_intent_weather for NLP
I1024 14:07:25.610060 403 model_registry.cc:110] Successfully registered: riva_ner for NLP
I1024 14:07:25.610651 403 model_registry.cc:110] Successfully registered: riva_qa for NLP
I1024 14:07:25.611116 403 model_registry.cc:110] Successfully registered: riva_text_classification_domain for NLP
I1024 14:07:25.628804 403 model_registry.cc:110] Successfully registered: fastpitch_hifigan_ensemble-English-US for TTS
I1024 14:07:25.644143 403 riva_server.cc:160] Riva Conversational AI Server listening on 0.0.0.0:50051
W1024 14:07:25.644161 403 stats_reporter.cc:41] No API key provided. Stats reporting disabled.
W1024 14:07:26.005081 91 metrics.cc:426] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1024 14:07:26.005117 91 metrics.cc:444] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W1024 14:07:26.005121 91 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W1024 14:07:27.005261 91 metrics.cc:426] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1024 14:07:27.005292 91 metrics.cc:444] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W1024 14:07:27.005296 91 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
W1024 14:07:28.006394 91 metrics.cc:426] Unable to get power limit for GPU 0. Status:Success, value:0.000000
W1024 14:07:28.006411 91 metrics.cc:444] Unable to get power usage for GPU 0. Status:Success, value:0.000000
W1024 14:07:28.006415 91 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
I1024 14:19:04.273377 410 grpc_riva_asr.cc:484] ASRService.Recognize called.
I1024 14:19:04.273463 410 riva_asr_stream.cc:214] Detected format: encoding = 1 numchannels = 1 samplerate = 16000 bitspersample = 16
I1024 14:19:04.273468 410 grpc_riva_asr.cc:550] ASRService.Recognize performing streaming recognition with sequence id: 1093626779
I1024 14:19:04.273550 410 grpc_riva_asr.cc:580] Using model citrinet-1024-en-US-asr-offline for inference
I1024 14:19:04.273597 410 grpc_riva_asr.cc:595] Model sample rate= 16000 for inference
terminate called after throwing an instance of 'std::runtime_error'
what(): punct_logits: failed to perform CUDA copy: invalid argument
Signal (6) received.
0# 0x000056392435A7E9 in tritonserver
1# 0x00007F011980A0C0 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# gsignal in /usr/lib/x86_64-linux-gnu/libc.so.6
3# abort in /usr/lib/x86_64-linux-gnu/libc.so.6
4# 0x00007F0119BC3911 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
5# 0x00007F0119BCF38C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007F0119BCF3F7 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
7# 0x00007F0119BCF37F in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
8# 0x00007F0085B76B1E in /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so
9# 0x00007F0085B88A1C in /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so
10# 0x00007F0085B38CC2 in /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so
11# 0x00007F0085B38B34 in /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so
12# 0x00007F0085C34B1F in /opt/tritonserver/backends/riva_nlp_pipeline/libtriton_riva_nlp_pipeline.so
E1024 14:19:10.933988 1386 client_object.cc:116] error: failed to do inference: Socket closed
I1024 14:19:10.934067 1386 grpc_riva_asr.cc:243] Could not get punctuated transcript from punctuator model for transcript "what is natural language processing", adding basic punctuation
I1024 14:19:10.935387 410 grpc_riva_asr.cc:664] ASRService.Recognize returning OK
/opt/riva/bin/start-riva: line 55: 91 Aborted (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
One of the processes has exited unexpectedly. Stopping container.
W1024 14:19:16.092974 403 riva_server.cc:184] Signal: 15
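Since the crash happens inside the punctuation pipeline ("punct_logits: failed to perform CUDA copy") and the server already falls back to basic punctuation, one possible mitigation (an assumption, not a confirmed fix) is to request recognition without automatic punctuation so the punctuator is never invoked. The request settings, mirrored here as a plain dict whose field names match the riva.client.RecognitionConfig attributes used in the repro:

```python
# Mirror of the repro's RecognitionConfig fields as a plain dict.
# Turning off enable_automatic_punctuation keeps the request away
# from the punctuation model that crashed the container.

def recognition_settings(punctuate: bool) -> dict:
    return {
        "language_code": "en-US",
        "max_alternatives": 1,
        "enable_automatic_punctuation": punctuate,
        "audio_channel_count": 1,
    }

settings = recognition_settings(punctuate=False)
```

In the notebook repro this corresponds to setting config.enable_automatic_punctuation = False before calling offline_recognize; it works around the crash rather than fixing the punctuator itself.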
https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tutorials/deploy-eks.html
Repro (bash):
eksctl create cluster -f eks_launch_conf.yaml
Hi, I'm trying to deploy a NeMo Conformer CTC model using Riva. It works well when evaluated with NeMo, but Riva fails at inference. This is the command I used during riva-build:
riva-build speech_recognition -f /servicemaker-dev/ASR-Model-Language-bn-val-wer-0.132.rmir /servicemaker-dev/ASR-Model-Language-bn-val-wer-0.132.riva \
--name=conformer-ctc-med-voicebook-it1-run3 \
--featurizer.use_utterance_norm_params=False \
--featurizer.precalc_norm_time_steps=0 \
--featurizer.precalc_norm_params=False \
--ms_per_timestep=40 \
--nn.fp16_needs_obey_precision_pass \
--chunk_size=0.16 \
--left_padding_size=1.92 \
--right_padding_size=1.92 \
--decoder_type=greedy \
--language_code=bn-BD --force
I'm using this command to transcribe an audio file:
/opt/riva/clients/riva_streaming_asr_client --audio_file /audios/a1.wav --language-code bn-BD --riva_uri localhost:50051
and I'm getting this error in the Riva speech container:
I0105 06:11:48.388618 195 grpc_riva_asr.cc:1464] ASRService.StreamingRecognize called.
I0105 06:11:48.388924 195 grpc_riva_asr.cc:1491] ASRService.StreamingRecognize performing streaming recognition with sequence id: 761739119
I0105 06:11:48.389112 195 grpc_riva_asr.cc:1556] Using model conformer-ctc-med-voicebook-it1-run3 for inference
I0105 06:11:48.389196 195 grpc_riva_asr.cc:1573] Model sample rate= 16000 for inference
I0105 06:11:48.389746 195 riva_asr_stream.cc:214] Detected format: encoding = 1 numchannels = 1 samplerate = 8000 bitspersample = 16
I0105 06:11:48.390592 263 grpc_riva_asr.cc:1227] Creating resampler, audio file sample rate=8000 model sample_rate=16000
E0105 06:11:48.681532 92 ctc-decoder.cc:328] Inference failed in ASR decoder: basic_string::_M_construct null not valid
E0105 06:11:48.681618 92 backend_triton_api.cc:111] Model 'conformer-ctc-med-voicebook-it1-run3-ctc-decoder-cpu-streaming', instance: 'conformer-ctc-med-voicebook-it1-run3-ctc-decoder-cpu-streaming_0': failed executing 1 request(s) as one batch on device 0
W0105 06:13:28.664069 263 grpc_riva_asr.cc:1332] Response timeout. requests sent: 814 received: 52
E0105 06:13:28.664247 195 grpc_riva_asr.cc:1677] ASRService.StreamingRecognize returning failure
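The log shows the server resampling the 8 kHz file to the model's 16 kHz before the decoder fails. To rule the server-side resampler out, one option is to resample on the client before sending. A stdlib-only sketch using naive linear interpolation over PCM16 mono samples (good enough for a debugging check, not for production audio quality):

```python
# Client-side PCM16 mono resampler (naive linear interpolation).
# Useful only to test whether feeding the server native 16 kHz audio
# avoids the decoder failure seen with the built-in resampler.
import struct

def resample_pcm16(raw: bytes, src_hz: int, dst_hz: int) -> bytes:
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    n_out = int(len(samples) * dst_hz / src_hz)
    out = []
    for i in range(n_out):
        pos = i * src_hz / dst_hz          # fractional source position
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(int(a + (b - a) * frac))  # linear interpolation
    return struct.pack("<%dh" % n_out, *out)
```

If a pre-resampled 16 kHz file transcribes correctly, the problem is narrowed to the resampler path; if it still fails, the decoder configuration (e.g. the vocabulary passed at riva-build time) is the more likely suspect.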
I've downloaded the models from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_en_us_fastpitch_ipa and https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechsynthesis_en_us_hifigan_ipa, but I have no idea how to use them. The tutorials only show how to generate speech with the Riva TTS APIs. Can you provide a tutorial on how to generate speech using the downloaded models?
I tried the ASR basic example using the notebook hosted by NVIDIA in the Getting Started with Riva course, and got this error when using .wav audio files other than the sample audio:
<_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Error: config format doesn't match with header format"
    debug_error_string = "{"created":"@1682956061.861311444","description":"Error received from peer ipv4:172.18.0.4:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Error: config format doesn't match with header format","grpc_status":3}"
>
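The message suggests the RecognitionConfig disagrees with the WAV header of the new files (for example a different sample rate or channel count than the sample audio). One way to avoid that is to derive those fields from the file itself; a sketch, where sample_rate_hertz and audio_channel_count are the RecognitionConfig field names used by the Riva Python client:

```python
# Derive RecognitionConfig fields from a WAV file's own header so the
# request config cannot disagree with the audio format.
import wave

def config_fields_for(path: str) -> dict:
    with wave.open(path, "rb") as w:
        return {
            "sample_rate_hertz": w.getframerate(),
            "audio_channel_count": w.getnchannels(),
        }
```

Setting those two fields on the config before calling offline_recognize (or converting the files to match the sample audio's format) should resolve the header mismatch.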
I noticed that once a riva_server is started, it never appears to terminate. I'm using StreamingRecognize and I call end() in Node.js, but that fails to kill the process.