
jetson-voice's Issues

jetson-voice on L4T 35.4.1

Is it possible to run this container on L4T 35.4.1?
If so, what changes do I need to make?

Thank you
Sandeep

Killed

Hi Dusty, I was interested to see how your model compared to the official intent and slots example, as it was not classifying the slots in the examples after 100 epochs (I have started training with 250 epochs now) and performed poorly on additional inputs.

I experienced the following issue on a Jetson Nano 4GB, JetPack 4.5.1-b17:

[TensorRT] VERBOSE: After vertical fusions: 250 layers
[TensorRT] VERBOSE: After final dead-layer removal: 250 layers
[TensorRT] VERBOSE: After tensor merging: 250 layers
[TensorRT] VERBOSE: After concat removal: 250 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.934568 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 [1/1].
[TensorRT] VERBOSE: *************** Autotuning format combination:  -> Float(1,768) ***************
[TensorRT] VERBOSE: *************** Autotuning format combination:  -> Half(1,768) ***************
Killed

Bad ASR prediction on audio with a bit of noise

Hi,
first of all, thank you for providing this repo! I was able to set up speech recognition on my Jetson Nano 2GB relatively easily with it.
However, the quality of the prediction with the microphone I'm using is quite poor:

First I checked the provided dusty.wav file with the asr.py example. The predicted full sentences are, just as in the readme, pretty good:

hi hi this is dusty check on two two three.
what's the weather going to be tomorrow in pittsburg.
today is wednesday tomorrow is thursday.
i would like to order a large pepperoni pizza.

Then I tried to play this audio on a speaker and record it with the microphone that I intend to use for detection. It produced this audio file. If you play it, you can hear some noise, but you can still hear the voice very clearly (apart from the first 5 seconds). Still, the prediction on it is pretty bad:

they're going to be.
dawned.
thursday.
larger.
i going tomorrow.
this.
chat.
so.
three.
what weather.
tomorrow pittsburgh.
today is wednesday.
rotary.
ron.
is going tomorrow.
this is dusty.
ca no.
the.
what the weather tomorrow in pittsburgh.
today is wednesday tomorrow's thursday.

When I talk myself, the prediction is similarly bad.

Do you have an idea what might be the cause of it? Maybe there is a relatively simple fix to the preprocessing pipeline or some configuration that I can try?
I noticed that my recording has a very tiny echo. Maybe it's worth a shot to augment the training data in a similar way and retrain it? If you think that might help, can you outline how I would be able to do that?
Or is there maybe a better version of the QuartzNet model out there? You mentioned Riva in another issue. Sadly I cannot use that because I need to make it work on the Jetson Nano 2GB, and QuartzNet already uses 95% of the memory I have. So it would be nice to make this work.
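On the augmentation idea: NeMo ships its own audio perturbation utilities for training, but the basic operation of noising a clip can be sketched with the standard library alone. The `add_noise` helper below is hypothetical and assumes 16-bit PCM WAV input:

```python
import wave
import array
import random

def add_noise(in_wav, out_wav, noise_level=0.02):
    """Write a copy of a 16-bit PCM WAV with white noise mixed in -
    a crude stand-in for mic/room noise when augmenting training clips."""
    with wave.open(in_wav, "rb") as src:
        params = src.getparams()
        assert params.sampwidth == 2, "expects 16-bit PCM"
        samples = array.array("h", src.readframes(params.nframes))

    scale = int(noise_level * 32767)  # noise amplitude relative to full scale
    noisy = array.array("h", (
        max(-32768, min(32767, s + random.randint(-scale, scale)))
        for s in samples))

    with wave.open(out_wav, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(noisy.tobytes())
```

Running the training clips through something like this (ideally together with a small artificial echo) before retraining would make the data look more like the speaker-to-microphone recordings.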

Models not working

I can run the tests and they pass, but if I attempt to run anything else it fails.

@Jetson:/jetson-voice/examples# ./asr.py --wav data/audio/dusty.wav
Namespace(debug=False, default_backend='tensorrt', global_config=None, list_devices=False, list_models=False, log_level='info', mic=None, model='quartznet', model_dir='data/networks', model_manifest='data/networks/manifest.json', profile=False, verbose=False, wav='data/audio/dusty.wav')
Traceback (most recent call last):
File "./asr.py", line 25, in
asr = ASR(args.model)
File "/jetson-voice/jetson_voice/asr.py", line 18, in ASR
return load_resource(resource, factory_map, *args, **kwargs)
File "/jetson-voice/jetson_voice/utils/resource.py", line 57, in load_resource
manifest = download_model(resource)
File "/jetson-voice/jetson_voice/utils/resource.py", line 166, in download_model
manifest = find_model_manifest(name)
File "/jetson-voice/jetson_voice/utils/resource.py", line 143, in find_model_manifest
manifest = load_models_manifest()
File "/jetson-voice/jetson_voice/utils/resource.py", line 128, in load_models_manifest
with open(path) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'data/networks/manifest.json'


PASSED TEST test_tts.py (fastpitch_hifigan) - return code 0


TEST SUMMARY

test_asr.py (quartznet) PASSED
test_asr.py (quartznet_greedy) PASSED
test_asr.py (matchboxnet) PASSED
test_asr.py (vad_marblenet) PASSED
test_nlp.py (distilbert_qa_128) PASSED
test_nlp.py (distilbert_qa_384) PASSED
test_nlp.py (distilbert_intent) PASSED
test_nlp.py (distilbert_sentiment) PASSED
test_nlp.py (distilbert_ner) PASSED
test_tts.py (fastpitch_hifigan) PASSED

passed 10 of 10 tests
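The FileNotFoundError above indicates the container can't see data/networks/manifest.json, which usually means the data volume isn't mounted or the models were never downloaded. A minimal pre-flight check, using a hypothetical check_manifest helper and the default paths printed in the Namespace:

```python
import json
import os

def check_manifest(model_dir="data/networks"):
    """Fail early with a clearer message if the models manifest is missing."""
    path = os.path.join(model_dir, "manifest.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found - is the data volume mounted into the "
            "container (docker/run.sh mounts ./data to /jetson-voice/data)?")
    with open(path) as file:
        return json.load(file)
```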

Support for other languages

Is there a way you can add, or give instructions on how to adapt, another language for the ASR, for instance Spanish?

Running container Error

Nvidia Jetson Xavier NX | JetPack 4.5 [L4T 32.5.0]

Initially the container ran without any issue; I tested it and everything was working fine. However, the next day it wouldn't run, failing with the following error:
xtend_m2@m1b2-ai:~/jetson-voice$ docker/run.sh
ARCH: aarch64
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.5.0
[sudo] password for xtend_m2:
CONTAINER: dustynv/jetson-voice:r32.5.0
DEV_VOLUME:
DATA_VOLUME: --volume /home/xtend_m2/jetson-voice/data:/jetson-voice/data
USER_VOLUME:
USER_COMMAND:
Unable to find image 'dustynv/jetson-voice:r32.5.0' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 127.0.0.53:53: read udp 127.0.0.1:37823->127.0.0.53:53: i/o timeout.
See 'docker run --help'.

Also, tried on another Xavier NX with clean installation. Same error.

I'd really appreciate any assistance.
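The 127.0.0.53 address in the error is the local systemd-resolved stub, so Docker is timing out on DNS before it ever reaches Docker Hub; this is a name-resolution problem on the device rather than in jetson-voice. A quick check, with a hypothetical can_resolve helper:

```python
import socket

def can_resolve(host="registry-1.docker.io"):
    """Return True if the host resolves through the system resolver."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False
```

If this returns False, inspecting /etc/resolv.conf or restarting systemd-resolved is a common next step before retrying docker/run.sh.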

problems

I am using a Jetson Xavier NX with JetPack 4.5.1. I have Docker installed and my $USER is part of the docker group.

  1. I successfully pulled your repo
  2. executed docker/run.sh from ~/jetson-voice
  3. here is the result:
    ARCH: aarch64
    reading L4T version from /etc/nv_tegra_release
    L4T BSP Version: L4T R32.5.1
    CONTAINER: dustynv/jetson-voice:r32.5.1
    DEV_VOLUME:
    DATA_VOLUME: --volume /home/rick/jetson-voice/data:/jetson-voice/data
    USER_VOLUME:
    USER_COMMAND:
    Unable to find image 'dustynv/jetson-voice:r32.5.1' locally
    docker: Error response from daemon: manifest for dustynv/jetson-voice:r32.5.1 not found: manifest unknown: manifest unknown.
  4. So I tried to pull your image first: docker pull dustynv/jetson-voice
  5. docker image ls shows the image is there
  6. I try docker/run.sh again and get the same result
  7. I try: docker run dustynv/jetsonvoice:r32.5.0 ($USER is part of the docker group)
    result:
    Unable to find image 'dustynv/jetsonvoice:r32.5.0' locally
    docker: Error response from daemon: pull access denied for dustynv/jetsonvoice, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Please advise.
Thanks

Regarding Joint intent/slot classification - Wrong intents

I am trying to build a model for an HVAC and infotainment system using a Jetson Nano, but most of the intents generated were wrong. There were not many adequate labels in the dataset related to my project. Could anyone please suggest a solution for this issue, or recommend a pre-trained model suitable for this project?

These were the results of the queries:

Stop music | audio_volume_mute
Play music | play_music
Play next track | play_music
Play previous track | music_query
Volume up | audio_volume_up
Volume down | audio_volume_up
mute | audio_volume_mute
Unmute | audio_volume_mute
AC temp increase | weather_query
AC temp decrease | weather_query
Fan on | social post
turn on fan | Play radio
on fan | social query
Fan off | audio volume mute
Fan speed increase | audio volume up
Fan speed decrease | audio volume up

Failed testing Result for matchboxnet and vad_marblenet

Hi Dusty
Thanks for the great repo and tutorial introducing ASR and NLP on Jetson devices. I followed the instructions and got errors for 2 models out of 10. Here is my testing result.


TEST SUMMARY

test_asr.py (quartznet) PASSED
test_asr.py (quartznet_greedy) PASSED
test_asr.py (matchboxnet) FAILED
test_asr.py (vad_marblenet) FAILED
test_nlp.py (distilbert_qa_128) PASSED
test_nlp.py (distilbert_qa_384) PASSED
test_nlp.py (distilbert_intent) PASSED
test_nlp.py (distilbert_sentiment) PASSED
test_nlp.py (distilbert_ner) PASSED
test_tts.py (fastpitch_hifigan) PASSED

Matchboxnet testing log

RUNNING TEST (ASR)

model: matchboxnet
config: data/tests/asr_keyword.json

binding 0 - 'audio_signal'
input: True
shape: (1, 64, -1)
dtype: DataType.FLOAT
size: -256
dynamic: True
profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]

binding 1 - 'logits'
input: False
shape: (1, 12)
dtype: DataType.FLOAT
size: 48
dynamic: False
profiles: []

Vad_marblenet testing log

RUNNING TEST (ASR)

model: vad_marblenet
config: data/tests/asr_vad.json

binding 0 - 'audio_signal'
input: True
shape: (1, 64, -1)
dtype: DataType.FLOAT
size: -256
dynamic: True
profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]

binding 1 - 'logits'
input: False
shape: (1, 2)
dtype: DataType.FLOAT
size: 8
dynamic: False
profiles: []

When running the command "examples/asr.py --model matchboxnet --wav data/audio/commands.wav", I got the following error:
RuntimeError: shape '[1, 154, 2]' is invalid for input of size 79156

When running the command "examples/asr.py --model vad_marblenet --wav data/audio/commands.wav", I got a similar error:
RuntimeError: shape '[1, 34, 2]' is invalid for input of size 17476

Have you ever encountered this issue before?
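Not a fix, but one observation that may help with debugging: in both failures the buffer is exactly 257 times larger than the view the post-processing requests, which suggests a dynamic (time) dimension in the output binding isn't being accounted for. The arithmetic:

```python
import math

# Both reported errors show the same pattern: the returned buffer is an
# exact integer multiple (257x) of the element count the reshape asks for.
for target_shape, buffer_size in [((1, 154, 2), 79156),
                                  ((1, 34, 2), 17476)]:
    needed = math.prod(target_shape)
    print(f"view {target_shape} needs {needed} elements, "
          f"buffer has {buffer_size} ({buffer_size // needed}x larger)")
```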

Trying to get tts to load text from a file and size limitations

I'm interested in creating a WAV like you did from the input, but it seems to be quite limited in the amount of text it can load. I'm also looking for a way to load the text from a file. I tried a larger text but got:

[TensorRT] ERROR: 3: [executionContext.cpp::setBindingDimensions::969] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::969, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [1,80,2378] for bindings[0] exceed min ~ max range at index 2, maximum dimension in profile is 1024, minimum dimension in profile is 1, but supplied dimension is 2378.
)
Traceback (most recent call last):
File "tts2.py", line 90, in
audio = tts(args.text)
File "/jetson-voice/jetson_voice/models/tts/tts_engine.py", line 81, in call
audio = self.vocoder.execute(mels)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 114, in execute
setup_binding(self.bindings[idx], input)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 109, in setup_binding
binding.set_shape(input.shape)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_binding.py", line 80, in set_shape
raise ValueError(f"failed to set binding '{self.name}' with shape {shape}")
ValueError: failed to set binding 'mels' with shape (1, 80, 2378)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2308, GPU 3704 (MiB)

tts2.py is just a copy of tts.py with a longer string, saved under a different name.

How does one increase the size limit so it's not that constrained?

Thanks for the help in advance.
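The vocoder engine was built with a maximum mel length of 1024 frames, so long inputs overflow the TensorRT profile. Short of rebuilding the engine with a larger maximum, one workaround is to split the text at sentence boundaries and synthesize chunk by chunk. A sketch; the chunk_text helper and the 400-character cap are hypothetical (tune the cap so each chunk stays under ~1024 mel frames):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text at sentence boundaries into chunks under max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# audio = np.concatenate([tts(chunk) for chunk in chunk_text(long_text)])
```

The commented line shows the intended use: synthesize each chunk separately and concatenate the audio.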

TTS model doesn't fit into Jetson Nano 2GB

I noticed that the provided fastpitch_hifigan model doesn't work with 2GB of RAM. Is anyone aware of a smaller model in NeMo that I could try to convert?
I also tried to run the model with TensorRT instead of the default onnxruntime, but some bugs in TensorRT prevent this.

Running out of disk space

How much disk space is required for the full container? I keep running out of disk space on my 32GB SD card, which had about 6GB free before running docker/run.sh.

rtsp sound classification?

Hi,

I am working on voice classification, just laugh and cry.
May I ask if there is a way to do custom training for sound classification (just like your object detection training)?
If not, is there any Python audio tutorial?
Also, I need to get the sound from an IP cam over RTSP; I assume I should use pyaudio? Not really sure about the details...
Thanks

Btw, I have asked a similar question in the forum:
https://forums.developer.nvidia.com/t/is-there-any-step-by-step-python-example-of-audio-classification-using-nano/213881/6
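On the RTSP part: pyaudio only reads from local capture devices, so for an IP camera a common approach is to let ffmpeg pull the stream and deliver mono 16 kHz audio (the sample rate these ASR/VAD models typically expect). A sketch of the command construction; the rtsp_audio_cmd helper is hypothetical and assumes ffmpeg is installed:

```python
def rtsp_audio_cmd(rtsp_url, out_wav, rate=16000):
    """Build an ffmpeg command extracting mono PCM audio from an RTSP stream."""
    return [
        "ffmpeg",
        "-i", rtsp_url,
        "-vn",              # drop the video stream
        "-ac", "1",         # downmix to mono
        "-ar", str(rate),   # resample for the classifier
        out_wav,
    ]

# import subprocess
# subprocess.run(rtsp_audio_cmd("rtsp://camera/stream", "capture.wav"), check=True)
```

For live classification the same flags can feed raw PCM to stdout instead of a file, to be read in chunks by the model.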

Extra configuration files from Nemo model

I'm doing some ASR tests and I want to use a different model than the ones offered here. I used the nemo_export_onnx script, which produces an ONNX file and a single JSON file. None of the other JSONs and binaries that come with the models you offer are generated. Is this the expected behavior?

How can I check that the conversion was done correctly?
