Comments (10)

semirke commented on August 23, 2024

OK, I got it, I had to create fixed size "parts" in audioToTensor.

from speechrecognition.

aruno14 commented on August 23, 2024

@semirke Hello,

Thank you for reading my tutorial :)
I committed a fix; maybe it is similar to yours.
Tell me if it works now.

semirke commented on August 23, 2024

Hi!
Oh, thanks for the prompt response.
May I ask a question? Does every record in your dataset have the same length?
It seems strange to me that you use a random wav file to define the model input tensor size.

aruno14 commented on August 23, 2024

@semirke

May I ask a question? Does every record in your dataset have the same length?

Ask anything you want :)
No, the records do not all have the same length, but I normalize all of them to voice_max_length (10s) in the voice_max_length() function.
Code below:

if len(audio_clean)<audio_length*voice_max_length:
    # Shorter than the target: left-pad with zeros.
    audio = tf.concat([np.zeros([audio_length*voice_max_length-len(audio_clean)]), audio_clean], 0)
else:
    # Otherwise keep only the last audio_length*voice_max_length samples.
    audio = audio_clean[-(audio_length*voice_max_length):]
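The same pad-or-crop idea can be sketched as a standalone function. This is only an illustration under assumptions: it uses NumPy instead of tf.concat, and `pad_or_crop` and `target_len` are hypothetical names (in the snippet above the target would be audio_length*voice_max_length, assumed positive).

```python
import numpy as np


def pad_or_crop(audio, target_len):
    """Return `audio` as a float32 array of exactly `target_len` samples.

    Hypothetical helper: short clips are left-padded with zeros, long
    clips keep only their last `target_len` samples.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) < target_len:
        # Zeros go in front, so the signal stays aligned to the end.
        padding = np.zeros(target_len - len(audio), dtype=np.float32)
        return np.concatenate([padding, audio])
    # Negative slice keeps the tail of the clip.
    return audio[-target_len:]


short = pad_or_crop([1.0, 2.0, 3.0], 5)
long_clip = pad_or_crop([1, 2, 3, 4, 5, 6], 4)
```

Creating the padding with the same dtype as the audio avoids mixing float64 zeros with float32 samples.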

semirke commented on August 23, 2024

Hmm, I might have some old misunderstanding...
I had trouble when I tried to load a model and use it on tensors of different sizes.
Is it all right to use different dimensions with the same model?

aruno14 commented on August 23, 2024

It depends on the model. For example, in sentence.py the model input is: (testParts.shape[0], None, None, 1)
The first dimension is fixed, the second and third are free, and the last is fixed. The two dimensions can be left free because the next layer fixes them using Resizing(6, 129).

semirke commented on August 23, 2024

Yes, but it IS shape[0] that changes according to the voice length, isn't it?

aruno14 commented on August 23, 2024

shape[0] depends on the length of the test file (testParts = audioToTensor(...)).
All files, including the test file, have the same length, fixed using voice_max_length.
After the model is created, the input size is fixed and cannot be changed.

semirke commented on August 23, 2024

All files, including the test file, have the same length, fixed using voice_max_length.

Ah, there is my answer :)
I see now. I think we are on the same page then.
Instead, I used a fixed shape size that is supposedly longer than any of my voice files.

Thanks!

semirke commented on August 23, 2024

@semirke

May I ask a question? Does every record in your dataset have the same length?

Ask anything you want :)
No, the records do not all have the same length, but I normalize all of them to voice_max_length (10s) in the voice_max_length() function.
Code below:

if len(audio_clean)<audio_length*voice_max_length:
    # Shorter than the target: left-pad with zeros.
    audio = tf.concat([np.zeros([audio_length*voice_max_length-len(audio_clean)]), audio_clean], 0)
else:
    # Otherwise keep only the last audio_length*voice_max_length samples.
    audio = audio_clean[-(audio_length*voice_max_length):]

Hm.
Actually the shape change happens with
partsCount = len(range(0, len(spectrogram)-part_length, int(part_length/2)))
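To make the dependence on length concrete, here is a small sketch that evaluates the same expression for two spectrogram lengths. `count_parts` is a hypothetical helper and the frame counts are made-up values, not taken from the tutorial.

```python
def count_parts(n_frames, part_length):
    """Number of half-overlapping windows of `part_length` frames.

    Mirrors len(range(0, len(spectrogram)-part_length, int(part_length/2))).
    """
    step = part_length // 2  # same as int(part_length/2) for positive ints
    return len(range(0, n_frames - part_length, step))


# Two clips of different length give a different first dimension.
parts_short = count_parts(100, 10)  # 18 windows
parts_long = count_parts(200, 10)   # 38 windows
```

If every clip is first padded or cropped to the same number of samples, `n_frames` is constant and so is the part count.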

This is what I have on my side:

    # Number of half-overlapping windows this spectrogram would produce.
    partsCount = len(range(0, len(spectrogram)-part_length, int(part_length/2)))
    if partsCount > 2000:
        print("MAX PCNT REACHED: " + str(partsCount))
    # Always allocate 2000 parts so the output shape never varies.
    partsCount = 2000
    parts = np.zeros((partsCount, part_length, 513))
    for i, p in enumerate(range(0, len(spectrogram)-part_length, int(part_length/2))):
        if i >= partsCount:
            break  # drop windows beyond the fixed limit
        parts[i] = np.array(spectrogram[p:p+part_length])
    return parts

However, I'm not sure if this'll work out as I expect.
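One way to check the fixed-size approach is to run it on small inputs and inspect the shapes. The sketch below is a self-contained variant under assumptions: `to_fixed_parts` and `max_parts` are hypothetical names, the frequency dimension is read from the input instead of being hard-coded to 513, and the function also returns how many parts hold real data so the zero padding can be ignored later.

```python
import numpy as np


def to_fixed_parts(spectrogram, part_length, max_parts):
    """Split a (frames, bins) spectrogram into exactly `max_parts` windows.

    Windows overlap by half. Unused slots stay zero and windows beyond
    `max_parts` are dropped. Returns (parts, number_of_filled_parts).
    """
    spectrogram = np.asarray(spectrogram)
    step = part_length // 2
    starts = range(0, len(spectrogram) - part_length, step)
    parts = np.zeros((max_parts, part_length, spectrogram.shape[1]),
                     dtype=spectrogram.dtype)
    # Slicing a range keeps at most `max_parts` start positions.
    for i, start in enumerate(starts[:max_parts]):
        parts[i] = spectrogram[start:start + part_length]
    return parts, min(len(starts), max_parts)


# Inputs of different length produce the same output shape.
parts_a, filled_a = to_fixed_parts(np.ones((100, 513)), 10, 32)
parts_b, filled_b = to_fixed_parts(np.ones((1000, 513)), 10, 32)
```

Here `parts_a` and `parts_b` both have shape (32, 10, 513); only the filled count differs (18 versus 32), which shows what the truncation and padding do to each clip.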
