
jetson-voice's Issues

jetson-voice on L4T 35.4.1

Is it possible to run this container on L4T 35.4.1?
If so, what changes do I need to make?

Thank you
Sandeep

Killed

Hi Dusty, I was interested to see how your model compared to the official intent and slots example, as it was not classifying the slots in the examples after 100 epochs (I have started training with 250 epochs now) and performed poorly on additional inputs.

I experienced the following issue on a Jetson Nano 4GB, JetPack 4.5.1-b17:

[TensorRT] VERBOSE: After vertical fusions: 250 layers
[TensorRT] VERBOSE: After final dead-layer removal: 250 layers
[TensorRT] VERBOSE: After tensor merging: 250 layers
[TensorRT] VERBOSE: After concat removal: 250 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.934568 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 [1/1].
[TensorRT] VERBOSE: *************** Autotuning format combination:  -> Float(1,768) ***************
[TensorRT] VERBOSE: *************** Autotuning format combination:  -> Half(1,768) ***************
Killed

Bad ASR prediction on audio with a bit of noise

Hi,
first of all, thank you for providing this repo! I was able to set up speech recognition on my Jetson Nano 2GB relatively easily with it.
However, the quality of the prediction with the microphone I'm using is quite poor:

First I checked the provided dusty.wav file with the asr.py example. The predicted full sentences are, just as in the readme, pretty good:

hi hi this is dusty check on two two three.
what's the weather going to be tomorrow in pittsburg.
today is wednesday tomorrow is thursday.
i would like to order a large pepperoni pizza.

Then I tried to play this audio on a speaker and record it with the microphone that I intend to use for detection. It produced this audio file. If you play it, you can hear some noise, but you can still hear the voice very clearly (apart from the first 5 seconds). Still, the prediction on it is pretty bad:

they're going to be.
dawned.
thursday.
larger.
i going tomorrow.
this.
chat.
so.
three.
what weather.
tomorrow pittsburgh.
today is wednesday.
rotary.
ron.
is going tomorrow.
this is dusty.
ca no.
the.
what the weather tomorrow in pittsburgh.
today is wednesday tomorrow's thursday.

When I talk myself, the prediction is similarly bad.

Do you have an idea what might be the cause of it? Maybe there is a relatively simple fix to the preprocessing pipeline or some configuration that I can try?
I noticed that my recording has a very tiny echo. Maybe it's worth a shot to augment the training data in a similar way and retrain it? If you think that might help, can you outline how I would be able to do that?
Or is there maybe a better version of the QuartzNet model out there? You mentioned Riva in another issue. Sadly I cannot use that because I need to make it work on the Jetson Nano 2GB, and QuartzNet already uses 95% of the memory I have. So it would be nice to make this work.
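On the augmentation idea: NeMo ships its own audio perturbation utilities for training, but the basic operation of noising a clip can be sketched with the standard library alone. The `add_noise` helper below is hypothetical and assumes 16-bit PCM WAV input:

```python
import wave
import array
import random

def add_noise(in_wav, out_wav, noise_level=0.02):
    """Write a copy of a 16-bit PCM WAV with white noise mixed in -
    a crude stand-in for mic/room noise when augmenting training clips."""
    with wave.open(in_wav, "rb") as src:
        params = src.getparams()
        assert params.sampwidth == 2, "expects 16-bit PCM"
        samples = array.array("h", src.readframes(params.nframes))

    scale = int(noise_level * 32767)  # noise amplitude relative to full scale
    noisy = array.array("h", (
        max(-32768, min(32767, s + random.randint(-scale, scale)))
        for s in samples))

    with wave.open(out_wav, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(noisy.tobytes())
```

Running the training clips through something like this (ideally together with a small artificial echo) before retraining would make the data look more like the speaker-to-microphone recordings.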

Models not working

I can run the tests and they pass, but if I attempt to run anything else it fails.

@Jetson:/jetson-voice/examples# ./asr.py --wav data/audio/dusty.wav
Namespace(debug=False, default_backend='tensorrt', global_config=None, list_devices=False, list_models=False, log_level='info', mic=None, model='quartznet', model_dir='data/networks', model_manifest='data/networks/manifest.json', profile=False, verbose=False, wav='data/audio/dusty.wav')
Traceback (most recent call last):
File "./asr.py", line 25, in
asr = ASR(args.model)
File "/jetson-voice/jetson_voice/asr.py", line 18, in ASR
return load_resource(resource, factory_map, *args, **kwargs)
File "/jetson-voice/jetson_voice/utils/resource.py", line 57, in load_resource
manifest = download_model(resource)
File "/jetson-voice/jetson_voice/utils/resource.py", line 166, in download_model
manifest = find_model_manifest(name)
File "/jetson-voice/jetson_voice/utils/resource.py", line 143, in find_model_manifest
manifest = load_models_manifest()
File "/jetson-voice/jetson_voice/utils/resource.py", line 128, in load_models_manifest
with open(path) as file:
FileNotFoundError: [Errno 2] No such file or directory: 'data/networks/manifest.json'


PASSED TEST test_tts.py (fastpitch_hifigan) - return code 0


TEST SUMMARY

test_asr.py (quartznet) PASSED
test_asr.py (quartznet_greedy) PASSED
test_asr.py (matchboxnet) PASSED
test_asr.py (vad_marblenet) PASSED
test_nlp.py (distilbert_qa_128) PASSED
test_nlp.py (distilbert_qa_384) PASSED
test_nlp.py (distilbert_intent) PASSED
test_nlp.py (distilbert_sentiment) PASSED
test_nlp.py (distilbert_ner) PASSED
test_tts.py (fastpitch_hifigan) PASSED

passed 10 of 10 tests
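The FileNotFoundError above indicates the container can't see data/networks/manifest.json, which usually means the data volume isn't mounted or the models were never downloaded. A minimal pre-flight check, using a hypothetical check_manifest helper and the default paths printed in the Namespace:

```python
import json
import os

def check_manifest(model_dir="data/networks"):
    """Fail early with a clearer message if the models manifest is missing."""
    path = os.path.join(model_dir, "manifest.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found - is the data volume mounted into the "
            "container (docker/run.sh mounts ./data to /jetson-voice/data)?")
    with open(path) as file:
        return json.load(file)
```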

Support for other languages

Is there a way you can add, or give instructions on how to adapt, another language for the ASR, for instance Spanish?

Running container Error

Nvidia Jetson Xavier NX | JetPack 4.5 [L4T 32.5.0]

Initially the container ran without any issue; I tested it and everything was working fine. However, the next day it wouldn't run, failing with the following error:
xtend_m2@m1b2-ai:~/jetson-voice$ docker/run.sh
ARCH: aarch64
reading L4T version from /etc/nv_tegra_release
L4T BSP Version: L4T R32.5.0
[sudo] password for xtend_m2:
CONTAINER: dustynv/jetson-voice:r32.5.0
DEV_VOLUME:
DATA_VOLUME: --volume /home/xtend_m2/jetson-voice/data:/jetson-voice/data
USER_VOLUME:
USER_COMMAND:
Unable to find image 'dustynv/jetson-voice:r32.5.0' locally
docker: Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 127.0.0.53:53: read udp 127.0.0.1:37823->127.0.0.53:53: i/o timeout.
See 'docker run --help'.

Also, tried on another Xavier NX with clean installation. Same error.

I'd really appreciate any assistance.
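The 127.0.0.53 address in the error is the local systemd-resolved stub, so Docker is timing out on DNS before it ever reaches Docker Hub; this is a name-resolution problem on the device rather than in jetson-voice. A quick check, with a hypothetical can_resolve helper:

```python
import socket

def can_resolve(host="registry-1.docker.io"):
    """Return True if the host resolves through the system resolver."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False
```

If this returns False, inspecting /etc/resolv.conf or restarting systemd-resolved is a common next step before retrying docker/run.sh.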

problems

I am using a Jetson Xavier NX with JetPack 4.5.1. I have Docker installed and my $USER is part of the docker group.

  1. I successfully pulled your repo
  2. executed docker/run.sh from ~/jetson-voice
  3. here is the result:
    ARCH: aarch64
    reading L4T version from /etc/nv_tegra_release
    L4T BSP Version: L4T R32.5.1
    CONTAINER: dustynv/jetson-voice:r32.5.1
    DEV_VOLUME:
    DATA_VOLUME: --volume /home/rick/jetson-voice/data:/jetson-voice/data
    USER_VOLUME:
    USER_COMMAND:
    Unable to find image 'dustynv/jetson-voice:r32.5.1' locally
    docker: Error response from daemon: manifest for dustynv/jetson-voice:r32.5.1 not found: manifest unknown: manifest unknown.
  4. So I tried to pull your image first: docker pull dustynv/jetson-voice
  5. docker image ls shows the image is there
  6. I try docker/run.sh again and get the same result
  7. I try: docker run dustynv/jetsonvoice:r32.5.0 ($USER is part of the docker group)
    result:
    Unable to find image 'dustynv/jetsonvoice:r32.5.0' locally
    docker: Error response from daemon: pull access denied for dustynv/jetsonvoice, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

Please advise.
Thanks

Regarding Joint intent/slot classification - Wrong intents

I am trying to build a model for an HVAC and infotainment system using a Jetson Nano, but most of the intents generated were wrong. There were not many adequate labels in the dataset related to my project. Could anyone please suggest a solution for this issue, or recommend a pre-trained model suitable for this project?

These were the results of the queries:

Stop music | audio_volume_mute
Play music | play_music
Play next track | play_music
Play previous track | music_query
Volume up | audio_volume_up
Volume down | audio_volume_up
mute | audio_volume_mute
Unmute | audio_volume_mute
AC temp increase | weather_query
AC temp decrease | weather_query
Fan on | social post
turn on fan | Play radio
on fan | social query
Fan off | audio volume mute
Fan speed increase | audio volume up
Fan speed decrease | audio volume up

Failed testing Result for matchboxnet and vad_marblenet

Hi Dusty
Thanks for the great repo and tutorial introducing ASR and NLP on Jetson devices. I followed the instructions and got errors for 2 models out of 10. Here is my testing result.


TEST SUMMARY

test_asr.py (quartznet) PASSED
test_asr.py (quartznet_greedy) PASSED
test_asr.py (matchboxnet) FAILED
test_asr.py (vad_marblenet) FAILED
test_nlp.py (distilbert_qa_128) PASSED
test_nlp.py (distilbert_qa_384) PASSED
test_nlp.py (distilbert_intent) PASSED
test_nlp.py (distilbert_sentiment) PASSED
test_nlp.py (distilbert_ner) PASSED
test_tts.py (fastpitch_hifigan) PASSED

Matchboxnet testing log

RUNNING TEST (ASR)

model: matchboxnet
config: data/tests/asr_keyword.json

binding 0 - 'audio_signal'
input: True
shape: (1, 64, -1)
dtype: DataType.FLOAT
size: -256
dynamic: True
profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]

binding 1 - 'logits'
input: False
shape: (1, 12)
dtype: DataType.FLOAT
size: 48
dynamic: False
profiles: []

Vad_marblenet testing log

RUNNING TEST (ASR)

model: vad_marblenet
config: data/tests/asr_vad.json

binding 0 - 'audio_signal'
input: True
shape: (1, 64, -1)
dtype: DataType.FLOAT
size: -256
dynamic: True
profiles: [{'min': (1, 64, 10), 'opt': (1, 64, 150), 'max': (1, 64, 300)}]

binding 1 - 'logits'
input: False
shape: (1, 2)
dtype: DataType.FLOAT
size: 8
dynamic: False
profiles: []

When running the command "examples/asr.py --model matchboxnet --wav data/audio/commands.wav", I got the following error:
RuntimeError: shape '[1, 154, 2]' is invalid for input of size 79156

When running the command "examples/asr.py --model vad_marblenet --wav data/audio/commands.wav", I got a similar error:
RuntimeError: shape '[1, 34, 2]' is invalid for input of size 17476

Have you ever encountered this issue before?
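Not a fix, but one observation that may help with debugging: in both failures the buffer is exactly 257 times larger than the view the post-processing requests, which suggests a dynamic (time) dimension in the output binding isn't being accounted for. The arithmetic:

```python
import math

# Both reported errors show the same pattern: the returned buffer is an
# exact integer multiple (257x) of the element count the reshape asks for.
for target_shape, buffer_size in [((1, 154, 2), 79156),
                                  ((1, 34, 2), 17476)]:
    needed = math.prod(target_shape)
    print(f"view {target_shape} needs {needed} elements, "
          f"buffer has {buffer_size} ({buffer_size // needed}x larger)")
```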

Trying to get tts to load text from a file and size limitations

I'm interested in creating a WAV like you did from the input, but it seems to be quite limited in the amount of text it can load. I'm also looking for a way to load the text from a file. I tried a larger text but got:

[TensorRT] ERROR: 3: [executionContext.cpp::setBindingDimensions::969] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setBindingDimensions::969, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [1,80,2378] for bindings[0] exceed min ~ max range at index 2, maximum dimension in profile is 1024, minimum dimension in profile is 1, but supplied dimension is 2378.
)
Traceback (most recent call last):
File "tts2.py", line 90, in
audio = tts(args.text)
File "/jetson-voice/jetson_voice/models/tts/tts_engine.py", line 81, in call
audio = self.vocoder.execute(mels)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 114, in execute
setup_binding(self.bindings[idx], input)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_model.py", line 109, in setup_binding
binding.set_shape(input.shape)
File "/jetson-voice/jetson_voice/backends/tensorrt/trt_binding.py", line 80, in set_shape
raise ValueError(f"failed to set binding '{self.name}' with shape {shape}")
ValueError: failed to set binding 'mels' with shape (1, 80, 2378)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2308, GPU 3704 (MiB)

tts2.py is just a copy of tts.py with a longer string, saved under a different name.

How does one increase the size limit so it's not that constrained?

Thanks for the help in advance.
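The vocoder engine was built with a maximum mel length of 1024 frames, so long inputs overflow the TensorRT profile. Short of rebuilding the engine with a larger maximum, one workaround is to split the text at sentence boundaries and synthesize chunk by chunk. A sketch; the chunk_text helper and the 400-character cap are hypothetical (tune the cap so each chunk stays under ~1024 mel frames):

```python
import re

def chunk_text(text, max_chars=400):
    """Split text at sentence boundaries into chunks under max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# audio = np.concatenate([tts(chunk) for chunk in chunk_text(long_text)])
```

The commented line shows the intended use: synthesize each chunk separately and concatenate the audio.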

TTS model doesn't fit into Jetson Nano 2GB

I noticed that the provided fastpitch_hifigan model doesn't work with 2GB of RAM. Is anyone aware of a smaller model in NeMo that I could try to convert?
I also tried to run the model with TensorRT instead of the default onnxruntime, but some bugs in TensorRT prevent this.

Running out of disk space

How much disk space is required for the full container? I keep running out of disk space on my 32GB SD card, which had about 6GB free before running docker/run.sh.

rtsp sound classification?

Hi,

I am working on voice classification, just laugh and cry.
May I ask if there is a way to do custom training for sound classification (just like your object detection training)?
If not, is there any Python audio tutorial?
Also, I need to get the sound from an IP cam over RTSP; I assume I should use pyaudio? Not really sure about the details...
Thanks

Btw, I have asked a similar question in the forum:
https://forums.developer.nvidia.com/t/is-there-any-step-by-step-python-example-of-audio-classification-using-nano/213881/6
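On the RTSP part: pyaudio only reads from local capture devices, so for an IP camera a common approach is to let ffmpeg pull the stream and deliver mono 16 kHz audio (the sample rate these ASR/VAD models typically expect). A sketch of the command construction; the rtsp_audio_cmd helper is hypothetical and assumes ffmpeg is installed:

```python
def rtsp_audio_cmd(rtsp_url, out_wav, rate=16000):
    """Build an ffmpeg command extracting mono PCM audio from an RTSP stream."""
    return [
        "ffmpeg",
        "-i", rtsp_url,
        "-vn",              # drop the video stream
        "-ac", "1",         # downmix to mono
        "-ar", str(rate),   # resample for the classifier
        out_wav,
    ]

# import subprocess
# subprocess.run(rtsp_audio_cmd("rtsp://camera/stream", "capture.wav"), check=True)
```

For live classification the same flags can feed raw PCM to stdout instead of a file, to be read in chunks by the model.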

Extra configuration files from Nemo model

I'm doing some ASR tests and I want to use a different model than the ones offered here. I used the nemo_export_onnx script, which produces an ONNX file and a single JSON file. None of the other JSONs and binaries that come with the models you offer are generated. Is this the expected behavior?

How can I check that the conversion was done correctly?
