ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray). about speechrecognition HOT 10 OPEN

semirke commented on August 23, 2024

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

from speechrecognition.

Comments (10)

semirke commented on August 23, 2024

OK, I got it: I had to create fixed-size "parts" in audioToTensor.
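
For readers who hit the same error, here is a minimal NumPy-only sketch of what fixed-size parts change. It assumes the error comes from collecting per-file arrays whose first dimension differs, which can only be held as an object array that TensorFlow then refuses to convert. The helper name `pad_parts` and the toy shapes are hypothetical and not taken from the repository.

```python
import numpy as np

def pad_parts(parts, parts_count):
    """Zero-pad or truncate an array of shape (n, part_length, bins) to parts_count rows."""
    fixed = np.zeros((parts_count,) + parts.shape[1:], dtype=np.float32)
    n = min(parts_count, parts.shape[0])
    fixed[:n] = parts[:n]
    return fixed

# Two hypothetical recordings that produced a different number of parts.
a = np.ones((3, 4, 5), dtype=np.float32)
b = np.ones((7, 4, 5), dtype=np.float32)

# A ragged collection can only be stored with dtype=object.
ragged = np.empty(2, dtype=object)
ragged[0] = a
ragged[1] = b
print(ragged.dtype)

# With a fixed part count every file has the same shape and stacks cleanly.
batch = np.stack([pad_parts(a, 6), pad_parts(b, 6)])
print(batch.shape, batch.dtype)
```

The trade-off is that short files carry trailing zero parts and long files lose whatever exceeds the limit.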

aruno14 commented on August 23, 2024

@semirke Hello,

Thank you for reading my tutorial :)
I committed a fix; it may be similar to yours.
Tell me if it works now.

semirke commented on August 23, 2024

Hi !
Oh, thanks for the prompt response.
May I ask a question? Does every record in your dataset have the same length?
It seems strange to me that you use a random wav file to define the model input tensor size.

aruno14 commented on August 23, 2024

@semirke

May I ask a question? Does every record in your dataset have the same length?

Ask anything you want :)
No, not all records have the same length, but I normalize every record to voice_max_length (10s) in the voice_max_length() function.
Code below:

if len(audio_clean)<audio_length*voice_max_length:
    audio = tf.concat([np.zeros([audio_length*voice_max_length-len(audio_clean)]), audio_clean], 0)
else:
    audio = audio_clean[-(audio_length*voice_max_length):]
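
The same pad-or-trim rule can be sketched with NumPy alone. This is an illustration and not the repository code: `fix_length` and `target_len` are hypothetical names, with `target_len` standing in for audio_length*voice_max_length.

```python
import numpy as np

def fix_length(audio_clean, target_len):
    """Return exactly target_len samples: zero-pad at the front or keep the last samples."""
    if len(audio_clean) < target_len:
        pad = np.zeros(target_len - len(audio_clean), dtype=audio_clean.dtype)
        return np.concatenate([pad, audio_clean])
    return audio_clean[-target_len:]

short = fix_length(np.array([1.0, 2.0, 3.0], dtype=np.float32), 5)
long_ = fix_length(np.arange(10, dtype=np.float32), 4)
print(short)  # zeros first, then the original samples
print(long_)  # only the last 4 samples survive
```

Because every record comes out with the same number of samples, every spectrogram built from it has the same number of frames.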

semirke commented on August 23, 2024

Hmm, I might have an old misunderstanding...
I had trouble when I tried to load a model and use it on tensors of different sizes.
Is it all right to use different dimensions with the same model?

aruno14 commented on August 23, 2024

It depends on the model. For example, in sentence.py the model input is: (testParts.shape[0], None, None, 1)
The first dimension is fixed, the second and third are free, and the last is fixed. The two free dimensions are allowed because the next layer fixes them with Resizing(6, 129).

semirke commented on August 23, 2024

Yes, but it IS shape[0] that changes according to the voice length, isn't it?

aruno14 commented on August 23, 2024

shape[0] depends on the length of the test file (testParts = audioToTensor(...)).
All files, including the test file, have the same length, fixed by voice_max_length.
Once the model is created, its input size is fixed and cannot be changed.

semirke commented on August 23, 2024

All files, including the test file, have the same length, fixed by voice_max_length.

Ah, there is my answer :)
I see now. I think we are on the same page then.
Instead, I used a fixed shape size that is supposedly larger than any of my voice files.

Thanks!

semirke commented on August 23, 2024

@semirke

May I ask a question? Does every record in your dataset have the same length?

Ask anything you want :)
No, not all records have the same length, but I normalize every record to voice_max_length (10s) in the voice_max_length() function.
Code below:
if len(audio_clean)<audio_length*voice_max_length:
    audio = tf.concat([np.zeros([audio_length*voice_max_length-len(audio_clean)]), audio_clean], 0)
else:
    audio = audio_clean[-(audio_length*voice_max_length):]

Hm.
Actually the shape change happens with
partsCount = len(range(0, len(spectrogram)-part_length, int(part_length/2)))

This is what I have:

    # Number of half-overlapping windows this spectrogram would produce.
    partsCount = len(range(0, len(spectrogram)-part_length, int(part_length/2)))
    if partsCount > 2000:
        print("MAX PCNT REACHED: " + str(partsCount))
    # Always allocate a fixed 2000 parts; shorter files keep trailing zero parts.
    partsCount = 2000
    parts = np.zeros((partsCount, part_length, 513))
    for i, p in enumerate(range(0, len(spectrogram)-part_length, int(part_length/2))):
        if i >= partsCount:
            break  # drop windows beyond the fixed limit
        parts[i] = np.array(spectrogram[p:p+part_length])
    return parts

However, I'm not sure if this will work out as I expect.
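
To illustrate why the part count follows the spectrogram length, here is a small sketch of the partsCount expression on its own. The helper name `parts_count` and the sample lengths are made up for illustration.

```python
def parts_count(spectrogram_len, part_length):
    """Count the half-overlapping windows, using the same range() as the snippet above."""
    step = int(part_length / 2)
    return len(range(0, spectrogram_len - part_length, step))

# Longer spectrograms yield more parts, so shape[0] grows with the audio length.
for n in (100, 200, 400):
    print(n, parts_count(n, 50))
# 100 2
# 200 6
# 400 14
```

Fixing either the audio length before the spectrogram or the number of parts afterwards removes this dependency.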
