can i use this model for multiple word detection ? i want to train model for 5-6 t


Comments (43)

dimanshu commented on June 9, 2024 1

Okay, I'm making the data.

from trigger-word-detection.

dimanshu commented on June 9, 2024 1

I have made data for:
positive = 300
negative = 300
background = 50 (waterfall, road noises, etc.)


dimanshu commented on June 9, 2024 1

yes now it is changing
Epoch 1192/3000
26/26 [==============================] - 3s 104ms/step - loss: 0.8611 - accuracy: 0.8391
Epoch 1193/3000
26/26 [==============================] - 3s 103ms/step - loss: 0.8607 - accuracy: 0.8408


shivammalviya712 commented on June 9, 2024 1

i checked with training data it is not detecting the hot word.

The model requires more training. The accuracy looks high only because the threshold is 0.5; the loss is still high, somewhere around 0.6. So we may change the threshold if required, or take the loss as the reference.
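A minimal sketch of why accuracy at a 0.5 threshold can look fine while the loss stays near 0.6. The labels and scores below are hypothetical, chosen only so that every score sits just on the correct side of 0.5; they are not taken from the model in this thread.

```python
import math

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of steps whose thresholded score matches the 0/1 label."""
    hits = sum(int(p > threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

def binary_crossentropy(y_true, y_prob):
    """Mean binary cross-entropy over all steps."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical output: 7 negative steps, 3 positive steps,
# with scores that are barely on the right side of 0.5.
y_true = [0] * 7 + [1] * 3
y_prob = [0.45] * 7 + [0.55] * 3

print("accuracy @0.5:", binary_accuracy(y_true, y_prob, threshold=0.5))
print("loss:", round(binary_crossentropy(y_true, y_prob), 4))
print("accuracy @0.1:", binary_accuracy(y_true, y_prob, threshold=0.1))
```

With these made-up scores the accuracy at 0.5 is perfect even though the loss is about 0.6, and lowering the threshold to 0.1 turns every step into a positive prediction, so accuracy falls to the share of positive labels.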

keras.metrics.BinaryAccuracy
Stackoverflow binary_accuracy

Thanks for pointing out this bug. I will fix it.


shivammalviya712 commented on June 9, 2024

Question: Can I use this model for multiple word detection?

Solution: Sure, you can. It will work for multiple trigger words, but it will require a huge dataset even if you want to train for one trigger word (large means around 4000 examples). If you have such a large dataset it will work for sure.

Clarification: In the positive folder the number of examples is less because it was just to make sure that the model is getting trained properly. A model which was trained on 4000 training examples for many hours on multiple GPUs is further trained with the help of those examples. But this code can be used to train a new model on a larger dataset from scratch.

Question: I want to train the model for 5-6 trigger words, so I have to add all the .wav files into the positive folder right?

Solution: Yes, that's right. But then you will have to make some minor changes in the main.py file, because the current code loads a model that has already been trained and stored (using the same code) to avoid training it every time we run the code.

Conclusion: It can be done using the same code, but it requires a large dataset and some minor changes.

All the best! I am there to help you.


dimanshu commented on June 9, 2024

What changes do I have to make in main.py to train for 5-6 trigger words?


dimanshu commented on June 9, 2024

I want to train for 5 hot words and then check whether any of the 5 hot words occurs in my file. What changes do I have to make for this?


shivammalviya712 commented on June 9, 2024

"""All the files are integrated here."""

from settings import Settings 
from dataset import Dataset
from model import model, load_model, train_model


print('Everything is successfully imported')
settings = Settings()
print('Settings is done')

data = Dataset(settings)
# This command has to run only once because 
# then the dataset will be saved in X.npy and 
# Y.npy. Then this should be replaced with 
# data.load_dataset()
data.create_training_examples()
print('Data is done')

# This should also be run only once;
# after that, the lines below should be replaced with
# model = load_model()
model = model()
train_model(model, data)
print('Model is ready')

model.summary()

Replace the code in the main.py file with the above code.
Before running it, delete the saved model and dataset (model_trained.h5, X.npy, and Y.npy). I don't think the previously saved files will interfere, but remove them anyway to avoid any bug.
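The cleanup step can be scripted. This is only a sketch: the helper name `remove_stale_artifacts` is hypothetical, and the file locations in the usage part are assumptions (the thread only shows `./model/model_trained.h5` as the save path), so adjust them to your checkout.

```python
from pathlib import Path

def remove_stale_artifacts(paths):
    """Delete every existing file in `paths` and return the ones removed."""
    removed = []
    for path in map(Path, paths):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed

if __name__ == "__main__":
    # Assumed locations; X.npy and Y.npy may live elsewhere in your copy.
    stale = ["model/model_trained.h5", "X.npy", "Y.npy"]
    for path in remove_stale_artifacts(stale):
        print(f"removed {path}")
```

Files that do not exist are skipped silently, so the script is safe to run before every fresh training run.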


dimanshu commented on June 9, 2024

I just have to put all the positive data in the positive folder and train it, right? How much data is required, and how long does training take?


shivammalviya712 commented on June 9, 2024

There are approximately 0.527 million trainable parameters. After preparing the data, for multiple trigger words, we should have at least 5000 training examples.


dimanshu commented on June 9, 2024

Can you tell me all the steps for training? I want to start training with one word, so do I have to put 5000 audio files in the positive folder? How do I make a dataset of 5000 examples for a single word?


shivammalviya712 commented on June 9, 2024

No, you don't need to put 5000 positive files in positives. 5000 training examples are required.
300 positives, 300 negatives and 50 backgrounds will be sufficient to prepare 5000 training examples for a single word.

Training steps:

Change the architecture of the model if you want.
Create training examples (data.create_training_examples()).
Run the train_model() function which will train your model.
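To illustrate how a few hundred clips can yield thousands of training examples: each example can be a fresh random combination of one background with a few positive and negative words placed at random, non-overlapping positions. The sketch below only plans the placement and builds the label vector. Whether `data.create_training_examples()` works exactly this way is an assumption; `LABEL_STEPS` and all function names are hypothetical, while the 10 s clip length and the 1375 output steps come from the file properties and model summary posted in this thread.

```python
import random

CLIP_MS = 10_000   # background length in ms (the 10 s clips in this thread)
TY = 1375          # output time steps, from the model summary in this thread
LABEL_STEPS = 50   # assumption: steps marked as 1 after each trigger word

def pick_segment(length_ms, taken, rng, max_tries=100):
    """Pick a random (start, end) that avoids every segment in `taken`."""
    for _ in range(max_tries):
        start = rng.randint(0, CLIP_MS - length_ms)
        end = start + length_ms
        if all(end <= s or start >= e for s, e in taken):
            return start, end
    return None  # clip is too crowded; the caller skips this word

def make_labels(positive_segments):
    """Label vector with ones right after the end of each trigger word."""
    y = [0] * TY
    for _, end_ms in positive_segments:
        first = int(end_ms * TY / CLIP_MS) + 1
        for i in range(first, min(first + LABEL_STEPS, TY)):
            y[i] = 1
    return y

def plan_example(pos_lengths_ms, neg_lengths_ms, rng):
    """Place the words in one background clip and build its labels."""
    taken, positives = [], []
    for length in pos_lengths_ms:
        seg = pick_segment(length, taken, rng)
        if seg:
            taken.append(seg)
            positives.append(seg)
    for length in neg_lengths_ms:
        seg = pick_segment(length, taken, rng)
        if seg:
            taken.append(seg)
    return taken, make_labels(positives)

rng = random.Random(0)
segments, y = plan_example([900, 950], [700], rng)
print(segments, sum(y))
```

Because the background, the chosen words, and their positions are all drawn at random, the number of distinct examples is not limited by the number of recorded clips.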


dimanshu commented on June 9, 2024

And can I also change the sample rate, from 44100 Hz to 16000 Hz?


shivammalviya712 commented on June 9, 2024

Yes, you can do that. It may also cause the shape of the output of the spectrogram to change, so take care of that.
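A rough way to see the shape change before retraining. The window settings (`nfft=200`, `noverlap=120`) and the convolution settings (`kernel_size=15`, `stride=4`) are assumptions, not values confirmed by the repository; they were picked because they reproduce the (5511, 101) input and the 1375 output steps shown in the model summary posted later in this thread.

```python
def spectrogram_frames(n_samples, nfft=200, noverlap=120):
    """Number of time frames of a windowed spectrogram without padding."""
    hop = nfft - noverlap
    return (n_samples - noverlap) // hop

def frequency_bins(nfft=200):
    """Number of one-sided frequency bins for an FFT of size nfft."""
    return nfft // 2 + 1

def conv1d_steps(n_frames, kernel_size=15, stride=4):
    """Output length of a 1-D convolution with 'valid' padding."""
    return (n_frames - kernel_size) // stride + 1

for rate in (44100, 16000):
    frames = spectrogram_frames(10 * rate)  # 10-second clip
    print(rate, frames, frequency_bins(), conv1d_steps(frames))
```

Under these assumptions a 10 s clip at 44100 Hz gives 5511 frames and 1375 output steps, while 16000 Hz gives 1998 frames and 496 output steps, so the model input shape and the label length would both have to change. The number of frequency bins stays at 101, but each bin then covers a different frequency range.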


dimanshu commented on June 9, 2024

My wav file's properties are like this. My concern is that channels is 1 and bit_rate = 705600. Do I have to change this? Will this affect my training?

[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=1
channel_layout=unknown
bits_per_sample=16
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=40703
duration=0.922971
bit_rate=705600
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]

But your wav file's properties are these:

[STREAM]
index=0
codec_name=pcm_s16le
codec_long_name=PCM signed 16-bit little-endian
profile=unknown
codec_type=audio
codec_time_base=1/44100
codec_tag_string=[1][0][0][0]
codec_tag=0x0001
sample_fmt=s16
sample_rate=44100
channels=2
channel_layout=unknown
bits_per_sample=16
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/44100
start_pts=N/A
start_time=N/A
duration_ts=441000
duration=10.000000
bit_rate=1411200
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]


dimanshu commented on June 9, 2024

i converted this text to speech
"nice to meet you but i was reffering to the word is activate which is very demonstrate and kindly happy to see that which is there and no was but how can you do this to me but there is something in which you can do

but your model is not detecting the word activate


dimanshu commented on June 9, 2024

getting this output
/home/dimanshu/venv/lib/python3.6/site-packages/matplotlib/axes/_axes.py:7581: RuntimeWarning: divide by zero encountered in log10
Z = 10. * np.log10(spec)


shivammalviya712 commented on June 9, 2024

getting this output
/home/dimanshu/venv/lib/python3.6/site-packages/matplotlib/axes/_axes.py:7581: RuntimeWarning: divide by zero encountered in log10
Z = 10. * np.log10(spec)

It is due to zero values in our audio. You may suppress that warning or add a small numerical value to the audio for numerical stability.
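A small sketch of the numerical-stability idea. Note the variation: here the small value is added to the power values just before the logarithm rather than to the raw audio samples, and the constant `EPS` and the helper `power_to_db` are hypothetical names. The example uses plain Python to stay dependency-free; with NumPy the same idea is `10.0 * np.log10(spec + eps)`.

```python
import math

EPS = 1e-10  # assumption: tiny floor, negligible next to real signal power

def power_to_db(power, eps=EPS):
    """Convert power values to decibels without evaluating log10(0)."""
    return [10.0 * math.log10(p + eps) for p in power]

# A silent bin (0.0) no longer breaks the conversion.
print(power_to_db([0.0, 1e-3, 1.0]))
```

The floor caps silent bins at a large negative decibel value instead of negative infinity, which keeps the plot and any later arithmetic finite.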


shivammalviya712 commented on June 9, 2024

i converted this text to speech
"nice to meet you but i was reffering to the word is activate which is very demonstrate and kindly happy to see that which is there and no was but how can you do this to me but there is something in which you can do

but your model is not detecting the word activate

Can you share the spectrogram of this audio?


shivammalviya712 commented on June 9, 2024

My wav file's properties are like this. My concern is that channels is 1 and bit_rate = 705600. Do I have to change this? Will this affect my training?


Understanding Audio Quality: Bit Rate, Sample Rate
The channel count is not a problem, and the bit rate is not a problem either: it is half of mine because your audio is single-channel.
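The arithmetic behind this, as a quick check: for uncompressed PCM the bit rate is the sample rate times the bits per sample times the number of channels. The function name is hypothetical; the input values are the ones in the two file listings earlier in this thread.

```python
def pcm_bit_rate(sample_rate, bits_per_sample, channels):
    """Bit rate in bits per second for uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels

mono = pcm_bit_rate(44100, 16, 1)    # the single-channel file
stereo = pcm_bit_rate(44100, 16, 2)  # the two-channel file
print(mono, stereo)
```

This reproduces the two reported values, 705600 and 1411200, so the difference comes from the channel count alone.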


dimanshu commented on June 9, 2024

actually i'm running this model on server so how can i generate spectrogram ? i converted tts and tested on your server it is not detecting can you please check.


shivammalviya712 commented on June 9, 2024

actually i'm running this model on server so how can i generate spectrogram ? i converted tts and tested on your server it is not detecting can you please check.

Ok, I will check.


dimanshu commented on June 9, 2024

and for that channel thing i converted channel 1 to 2 and now bit rate is same as yours but still it is not detecting the word


shivammalviya712 commented on June 9, 2024

I recorded my voice and model predicted that correctly. Below is the spectrogram

This is a bit noisy, as you can see in the spectrogram, but the model worked.

I would like to share one more spectrogram, but on this audio file the model doesn't work, since the recording contains too much noise.

Both audio files were recorded in the same environment but with different microphones. So please check your spectrogram, if possible; it may show whether the microphone is picking up too much noise.


dimanshu commented on June 9, 2024

hey i trained it and im getting this :-

26/26 [==============================] - 15s 565ms/step - loss: 1.2999 - accuracy: 0.5010
Model is ready
Model: "model_1"

Layer (type)                   Output Shape         Param #
============================================================
input_1 (InputLayer)           (None, 5511, 101)    0
conv1d_1 (Conv1D)              (None, 1375, 196)    297136
batch_normalization_1 (Batch   (None, 1375, 196)    784
activation_1 (Activation)      (None, 1375, 196)    0
dropout_1 (Dropout)            (None, 1375, 196)    0
gru_1 (GRU)                    (None, 1375, 128)    124800
dropout_2 (Dropout)            (None, 1375, 128)    0
batch_normalization_2 (Batch   (None, 1375, 128)    512
gru_2 (GRU)                    (None, 1375, 128)    98688
dropout_3 (Dropout)            (None, 1375, 128)    0
batch_normalization_3 (Batch   (None, 1375, 128)    512
dropout_4 (Dropout)            (None, 1375, 128)    0
time_distributed_1 (TimeDist   (None, 1375, 1)      129
============================================================
Total params: 522,561
Trainable params: 521,657
Non-trainable params: 904

Why is there only one epoch? Can I train for 100 epochs?


shivammalviya712 commented on June 9, 2024

Yes, you can. Just go to the model.py file and change the number of epochs in the function train_model():

def train_model(model, data):
    """Fit the model on the given data.
    
    # Arguments
        model: An instance of class Model
        data: An instance of class Dataset
    """
    opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
    model.fit(data.X, data.Y, batch_size=5, epochs=1)
    model.save('./model/model_trained.h5')


dimanshu commented on June 9, 2024

How many epochs would be ideal, and how does my output look?


shivammalviya712 commented on June 9, 2024

I recorded my voice and model predicted that correctly. Below is the spectrogram

This is a bit noisy, as you can see in the spectrogram, but the model worked.

I would like to share one more spectrogram, but on this audio file the model doesn't work, since the recording contains too much noise.

Both audio files were recorded in the same environment but with different microphones. So please check your spectrogram, if possible; it may show whether the microphone is picking up too much noise.

Was this helpful?


dimanshu commented on June 9, 2024

Yes, actually I converted the text with Google text-to-speech and then sent the audio to this model, but it does not detect the word in the text I sent you.


dimanshu commented on June 9, 2024

How many epochs should I use for training?
I changed the batch size to 64 and the epochs to 400, but the accuracy stays constant at 0.5003.


shivammalviya712 commented on June 9, 2024

yes actually I converted text from google text to speech and then send it to this model but it is not detecting the text I send u

It's hard to find out what the problem is, since it is working fine for me.


shivammalviya712 commented on June 9, 2024

how many epochs should I use for training?

That depends on your dataset and how loss and accuracy are changing. It's difficult to tell without knowing that.

and batch size I changed it to 64 and epoch 400 but accuracy is constant 0.5003

Is it not changing even by a small value?
And how is the loss behaving?
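One way to make "how is the loss behaving" checkable is a simple plateau test over the per-epoch losses. This is a sketch, not part of the repository: the function name, `patience`, and `min_delta` are hypothetical, and the sample losses are made up.

```python
def has_plateaued(losses, patience=5, min_delta=1e-3):
    """True if the last `patience` epochs did not beat the earlier best
    loss by at least `min_delta`."""
    if len(losses) <= patience:
        return False  # not enough history to judge
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta

# Made-up histories: one still improving, one stuck.
improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
stuck = [1.0, 0.5, 0.5, 0.5, 0.5]
print(has_plateaued(improving, patience=3))
print(has_plateaued(stuck, patience=3))
```

Keras offers the same idea as the `EarlyStopping` callback, which takes `monitor`, `min_delta`, and `patience` arguments and stops training automatically once the monitored value stops improving.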


dimanshu commented on June 9, 2024

hey model is done with 3000 epochs and 0.85 accuracy but it is not detecting anything
http://34.83.214.234/test/
can you see this ?


dimanshu commented on June 9, 2024

where im failing getting this output


dimanshu commented on June 9, 2024

i deleted 2 old files from dev was that important i thought it was for your dataset?

FileNotFoundError: [Errno 2] No such file or directory: '/home/dimanshu/Trigger-Word-Detection/dataset/activate/dev/X_dev.npy'


dimanshu commented on June 9, 2024

made positive data by google speech to text approx 100 and then changed its pitch to make it 300.
but now im recording 10 sec audio and passing to my model it is not detecting ,? 0.86 accuracy


dimanshu commented on June 9, 2024

why im getting this graph


shivammalviya712 commented on June 9, 2024

i deleted 2 old files from dev was that important i thought it was for your dataset?

FileNotFoundError: [Errno 2] No such file or directory: '/home/dimanshu/Trigger-Word-Detection/dataset/activate/dev/X_dev.npy'

No, it was just to test the accuracy on unseen data.


shivammalviya712 commented on June 9, 2024

made positive data by google speech to text approx 100 and then changed its pitch to make it 300.
but now im recording 10 sec audio and passing to my model it is not detecting ,? 0.86 accuracy

It might be happening because the data on which you have trained is different from that on which you are testing. You can check this by doing the following:

Check the spectrogram of the training data and of the audio on which you are testing. I suspect that the testing audio's spectrogram is a lot noisier.
Create new examples in the same way you created the training set. If the model's accuracy on this data is close to the training accuracy, then you are getting the wrong prediction because the distribution of the test data (the previous one) is not the same as that of the training data.


shivammalviya712 commented on June 9, 2024

where im failing getting this output

Check how the model predicts on unseen data produced in a manner similar to the training data. If the model performs well there, then you may have to train the model on data similar to the data on which you want to use it.


dimanshu commented on June 9, 2024

i checked with training data it is not detecting the hot word.


dimanshu commented on June 9, 2024

http://34.83.214.234/test/ can you see my model please ?


shivammalviya712 commented on June 9, 2024

import tensorflow as tf

model.compile(loss='binary_crossentropy', optimizer=opt, metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.1)])

This gives an accuracy of 0.1695, but still don't trust accuracy much: most of the values in the label are zero, so even a model that predicts 0 everywhere may get an accuracy of around 0.6.
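To make the caveat concrete, here is a sketch comparing accuracy with precision and recall for a model that predicts 0 at every step. The label vector is hypothetical (1375 steps with 50 marked positive), so the exact accuracy differs from the figure quoted above; it depends on the share of ones in the real labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of steps where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: mostly zeros, one labelled trigger word.
y_true = [0] * 1325 + [1] * 50
y_pred = [0] * 1375  # a model that never fires

print("accuracy:", round(accuracy(y_true, y_pred), 4))
print("precision, recall:", precision_recall(y_true, y_pred))
```

The never-firing model scores a high accuracy while its recall is zero, which is why recall (or the loss) is a better signal than accuracy for this kind of label.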

All the best!!


for multiple trigger word about trigger-word-detection HOT 43 CLOSED
