linhdvu14 / vggvox-speaker-identification

82.0 9.0 34.0 64.01 MB

Speaker identification with VGGVox network

Python 100.00%

speaker-recognition voxceleb vgg vggvox

vggvox-speaker-identification's Introduction

vggvox

A Python adaptation of the VGGVox speaker identification model, based on Nagrani et al. (2017), "VoxCeleb: a large-scale speaker identification dataset".
Evaluation code only, based on the authors' Matlab code and pretrained model.

Instructions

Install python3 and the required packages
Modify cfg/enroll_list.csv and cfg/test_list.csv to point to your local enroll/test wav files
To run evaluation: python3 scoring.py
Results will be stored in res/results.csv. Each line has the format: [path to test wav], [correct speaker], [distance to enroll speaker 1], ..., [distance to enroll speaker N], [predicted speaker], [correct?]
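
As a rough illustration of the row format above, the sketch below splits one row into its fields and recomputes the prediction as the enrolled speaker with the smallest distance. The speaker names, the sample row, and the helper `parse_result_row` are made up for illustration; the actual file may include a header row or differ in other details.

```python
import csv
import io

def parse_result_row(row, enroll_speakers):
    """Split one results row into named fields (assumed layout, see the text above)."""
    n = len(enroll_speakers)
    path, correct = row[0], row[1]
    distances = [float(d) for d in row[2:2 + n]]
    # Assumption: the predicted speaker is the enrolled speaker at minimum distance.
    nearest = enroll_speakers[distances.index(min(distances))]
    return {
        "path": path,
        "correct_speaker": correct,
        "distances": dict(zip(enroll_speakers, distances)),
        "nearest_speaker": nearest,
        "is_correct": nearest == correct,
    }

# Hypothetical row with two enrolled speakers.
sample = "data/test/a.wav,alice,0.21,0.87,alice,True\n"
row = next(csv.reader(io.StringIO(sample)))
parsed = parse_result_row(row, ["alice", "bob"])
print(parsed["nearest_speaker"], parsed["is_correct"])
```

Averaging `is_correct` over all rows would give an identification accuracy for the test list.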

vggvox-speaker-identification's Issues

Short segments

There is a bug when a wav file is shorter than frame_step * sample_rate * list(buckets.keys())[0],
since rsize is empty.
A possible workaround is to zero-pad the signal to the minimum length.
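
A minimal sketch of the zero-padding workaround, assuming the signal is a 1-D NumPy array. The helper `pad_to_min_length` and the numeric values are illustrative and not taken from the repository's configuration; the exact minimum may also depend on the frame length.

```python
import numpy as np

def pad_to_min_length(signal, min_samples):
    """Zero-pad a 1-D signal at the end so it has at least min_samples samples."""
    signal = np.asarray(signal)
    missing = min_samples - signal.shape[0]
    if missing <= 0:
        return signal
    return np.pad(signal, (0, missing), mode="constant")

# Illustrative values only (assumptions, not the repository's settings).
frame_step = 0.01        # seconds between frames
sample_rate = 16000      # Hz
min_bucket_frames = 100  # smallest bucket key, in frames
min_samples = int(round(frame_step * sample_rate * min_bucket_frames))

short = np.ones(5000, dtype=np.float32)
padded = pad_to_min_length(short, min_samples)
print(padded.shape)
```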

VGGVox 2

Hi all,

I'm following @linhdvu14's steps and trying to export VGGVox 2 to Keras/TensorFlow, but apparently things get much more complicated.

I've tried some options like the one explained here (https://sefiks.com/2019/07/15/how-to-convert-matlab-models-to-keras/) but without success. Apparently, the VGGVox2 model is implemented in Matlab using the MatConvNet toolbox and additionally a DAGNN wrapper, which makes the export task more complex.

Any suggestions on how to tackle this?

Many thanks in advance!

about the conv_bn_dynamic_apool

I read your code and found that the 9*1 layer in the conv_bn_dynamic_apool() function is a conv layer.
The paper says "replaced by two layers: a fully connected layer of 9*1 and an average layer with 1/*8...".
I was stuck on this for a long time. Maybe you are right and it is a conv layer, which makes sense.
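
One way to see why the two readings can coincide: a convolution whose kernel spans all 9 frequency rows computes the same thing as a fully connected layer applied to each time column. The NumPy sketch below checks this numerically. The shapes are small illustrative values, not the network's real sizes, and the code is not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
freq, time, c_in, c_out = 9, 5, 4, 3  # small illustrative sizes
x = rng.standard_normal((freq, time, c_in))
w = rng.standard_normal((freq, 1, c_in, c_out))  # a 9x1 kernel

# "Valid" 9x1 convolution (cross-correlation, as in deep learning frameworks):
# the kernel covers all 9 frequency rows, so there is one output per time step.
conv_out = np.einsum("ftc,fco->to", x, w[:, 0])

# The same weights used as a fully connected layer on each time column.
dense_w = w[:, 0].reshape(freq * c_in, c_out)
dense_out = x.transpose(1, 0, 2).reshape(time, freq * c_in) @ dense_w

# Averaging over time then gives a fixed-size vector for any input length.
pooled = conv_out.mean(axis=0)

print(np.allclose(conv_out, dense_out), pooled.shape)
```

Writing the layer as a convolution lets the same weights slide over inputs of any duration, which is presumably why the average pool that follows is described as dynamic.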

input:0 is both fed and fetched.

tensorflow.python.framework.errors_impl.InvalidArgumentError: input:0 is both fed and fetched.

I ran the code on Windows and it worked and then I downloaded it on Linux and it showed this error.
The versions of the packages are fine.

ValueError: Input 0 is incompatible with layer pad7: expected ndim=4, found ndim=2

Code classifies poorly

Hello,
I am trying the model with some IDs retrieved from the VoxCeleb1 database (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) and I am getting around 10% correct classification.
Do you know why this is happening?
The only change I made was replacing "librosa.load" in the signalprocess file with "sr,audio = wavfile.read(filename)", because I couldn't download the avconv file it needs.

Thank you
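
One thing that may be worth checking here: `scipy.io.wavfile.read` returns `(rate, data)` with 16-bit PCM as `int16` at the file's native sample rate, whereas `librosa.load` returns floating-point samples scaled to [-1, 1] and resamples to the requested rate (22050 Hz by default). If the rest of the pipeline expects the librosa convention, the swapped-in reader could feed it differently scaled input. The sketch below shows the scaling step only. `int16_to_float` is a hypothetical helper, and whether this explains the low accuracy is an assumption.

```python
import numpy as np

def int16_to_float(pcm):
    """Scale int16 PCM samples to float32 in [-1, 1), similar to librosa.load output."""
    pcm = np.asarray(pcm)
    if pcm.dtype != np.int16:
        raise TypeError("expected int16 PCM, got %s" % pcm.dtype)
    return pcm.astype(np.float32) / 32768.0

pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
audio = int16_to_float(pcm)
print(audio.dtype, audio.min(), audio.max())
```

Resampling to the rate the model expects would still be a separate step if the files are not already at that rate.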

true_fn and false_fn arguments to tf.cond must have same dimension

File "", line 1, in
runfile('/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py', wdir='/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification')

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 79, in
test()

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 62, in test
model = vggvox_model()

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 44, in vggvox_model
pool='max',pool_size=(3,3),pool_strides=(2,2))

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 19, in conv_bn_pool
x = BatchNormalization(epsilon=1e-5,momentum=1,name='bn{}'.format(layer_idx))(x)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call
outputs = call_fn(cast_inputs, *args, **kwargs)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/layers/normalization.py", line 741, in call
outputs = self._fused_batch_norm(inputs, training=training)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/layers/normalization.py", line 612, in _fused_batch_norm
lambda: 1.0)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 65, in smart_cond
pred, true_fn=true_fn, false_fn=false_fn, name=name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/smart_cond.py", line 59, in smart_cond
name=name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1177, in cond
return cond_v2.cond_v2(pred, true_fn, false_fn, name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 101, in cond_v2
name=scope)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 221, in _build_cond
_check_same_outputs(_COND, [true_graph, false_graph])

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 801, in _check_same_outputs
error(b, "%s and %s have different types" % (b0_out, bn_out))

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 779, in error
detail=error_detail))

TypeError: true_fn and false_fn arguments to tf.cond must have the same number, type, and overall structure of return values.

true_fn output: Tensor("Identity:0", shape=(), dtype=int32)
false_fn output: Tensor("Identity:0", shape=(), dtype=float32)

Error details:
Tensor("Identity:0", shape=(), dtype=int32) and Tensor("Identity:0", shape=(), dtype=float32) have different types

Here is the full error. Something is wrong with batchnorm, but I could not understand why. The same operations are done for both true_fn and false_fn.
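
The traceback gives a hint: the failing `tf.cond` compares a branch returning `int32` with one returning `float32`, and the other branch in `_fused_batch_norm` is `lambda: 1.0`. Since the layer is built with `momentum=1` (a Python int), passing `momentum=1.0` instead may resolve the mismatch. This is an untested guess based on the traceback. The plain-Python sketch below only imitates the kind of check that fails; it is not TensorFlow code, and `check_same_type` is a hypothetical stand-in.

```python
def check_same_type(true_fn, false_fn):
    """Toy stand-in for the branch check: both branches must return the same type."""
    a, b = true_fn(), false_fn()
    if type(a) is not type(b):
        raise TypeError("%s and %s have different types"
                        % (type(a).__name__, type(b).__name__))
    return a

def try_momentum(momentum):
    """Return 'ok' if the two branches agree in type, else the error message."""
    try:
        check_same_type(lambda: momentum, lambda: 1.0)
        return "ok"
    except TypeError as exc:
        return str(exc)

print(try_momentum(1))    # int branch vs float branch
print(try_momentum(1.0))  # float branch vs float branch
```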

MemoryError: Unable to allocate 13.9 MiB for an array with shape (1779, 512) and data type complex128

I am getting this error while running the scoring.py file. I am using an NVIDIA GPU on Windows 10. I have also tried multiple solutions available on Stack Overflow.

Processing enroll samples....
Traceback (most recent call last):

File "E:\task1\code\scoring.py", line 86, in
get_id_result()

File "E:\task1\code\scoring.py", line 60, in get_id_result
enroll_result = get_embeddings_from_list_file(model, c.ENROLL_LIST_FILE, c.MAX_SEC)

File "E:\task1\code\scoring.py", line 48, in get_embeddings_from_list_file
result['features'] = result['filename'].apply(lambda x: get_fft_spectrum(x, buckets))

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\pandas\core\series.py", line 3848, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)

File "pandas_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer

File "E:\task1\code\scoring.py", line 48, in
result['features'] = result['filename'].apply(lambda x: get_fft_spectrum(x, buckets))

File "E:\task1\code\wav_reader.py", line 43, in get_fft_spectrum
fft = abs(np.fft.fft(frames,n=c.NUM_FFT))

File "<array_function internals>", line 6, in fft

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\numpy\fft_pocketfft.py", line 188, in fft
output = _raw_fft(a, n, axis, False, True, inv_norm)

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\numpy\fft_pocketfft.py", line 77, in _raw_fft
r = pfi.execute(a, is_real, is_forward, fct)

MemoryError: Unable to allocate 13.9 MiB for an array with shape (1779, 512) and data type complex128
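
The size in the message can be checked by hand: a `complex128` element takes 16 bytes, so the array is small. A failure to allocate this little suggests the process had already run short of memory or address space before this call (for example, a 32-bit Python build, or many large arrays held from earlier files), though that is a guess from the message alone. A short check of the arithmetic:

```python
rows, cols = 1779, 512
bytes_per_complex128 = 16  # two 64-bit floats per element

total_bytes = rows * cols * bytes_per_complex128
total_mib = total_bytes / (1024 ** 2)
print(total_bytes, round(total_mib, 1))
```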

about MFCC

@linhdvu14 Hi, thanks for your code.
I know you are using the model with weights from VGGVox, but where is the MFCC step?
Or do you use different features?
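
Judging from the traceback in the MemoryError issue, where `get_fft_spectrum` calls `np.fft.fft(frames, n=c.NUM_FFT)`, the input features appear to be FFT magnitude spectra of short frames rather than MFCCs. A minimal sketch of that kind of feature extraction follows. The frame length, step, window, and FFT size are illustrative assumptions, and `magnitude_spectrogram` is not a function from the repository.

```python
import numpy as np

def magnitude_spectrogram(signal, sample_rate=16000, frame_len=0.025,
                          frame_step=0.01, num_fft=512):
    """Frame the signal, apply a Hamming window, and take FFT magnitudes."""
    signal = np.asarray(signal, dtype=np.float64)
    win = int(round(frame_len * sample_rate))
    hop = int(round(frame_step * sample_rate))
    if signal.shape[0] < win:
        raise ValueError("signal shorter than one frame")
    num_frames = 1 + (signal.shape[0] - win) // hop
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(win)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # Zero-pad each frame to num_fft points and keep the magnitude.
    return np.abs(np.fft.fft(frames, n=num_fft))

# One second of a 1 kHz tone as a test input.
t = np.arange(16000) / 16000.0
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)
```

An MFCC pipeline would add a mel filterbank, a logarithm, and a discrete cosine transform on top of a spectrum like this one.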

the pretrained model file

Hi,
Thanks for sharing your code.
I noticed that the pretrained model file provided by the original author is in .mat format, while yours is an .h5 file. Could you please share how you converted the Matlab .mat file to the .h5 file? I need to convert another .mat file to .h5, but I have run into some difficulties.
Thanks.

CSV Example

Hello, I would like to see the csv output ("result.csv" in "res" folder) you have obtained because I don't see the last 2 fields of each audio file. I mean these ones: "[predicted speaker]" and "[correct?]".

What is the method "build_buckets" for?

Hello, thanks for your code!

I'm working on a voice recognition project. I've read your code and am a little confused about the method "build_buckets" in scoring.py. What is the method for? What is the meaning of 's' in build_buckets?

Could you give a short explanation?
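
Without the source at hand, a plausible reading is that `build_buckets` maps each allowed input length (in frames) to the length that remains after the network's convolution and pooling layers, with `s` tracking the size as it passes through each layer. That reading is an assumption. The standard size rule it would rely on is `floor((n - kernel + 2*pad) / stride) + 1`. The sketch below applies it through a made-up layer stack: only the 3x3 max pool with stride 2 appears in the traceback earlier on this page, and the other layer parameters are illustrative.

```python
def out_len(n, kernel, stride=1, pad=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (n - kernel + 2 * pad) // stride + 1

# Hypothetical (kernel, stride, pad) stack along the time axis.
LAYERS = [(7, 2, 1), (3, 2, 0), (5, 2, 1), (3, 2, 0)]

def build_length_table(input_lengths):
    """Map each input length to its length after all layers.

    Inputs that shrink to nothing along the way are skipped.
    """
    table = {}
    for n in input_lengths:
        s = n
        for kernel, stride, pad in LAYERS:
            s = out_len(s, kernel, stride, pad)
            if s <= 0:
                break
        if s > 0:
            table[n] = s
    return table

print(build_length_table([10, 100, 200, 300]))
```

A table like this would let the scoring code pick the largest supported length that fits a given recording, and it would also explain why very short files have no matching bucket.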

Training

Hello,
I am using this code with the VoxCeleb1 database, as is done in the "learnable PINs" paper. I am using only 9 IDs to check how the code works, but I only get 10% correct voice recognition. Could you please send me the training code? Or do you know what might cause this?
I have tried using all the IDs, but in that case the process is killed.
Thank you

Model training

Hi,

I saw that you are using Keras to replicate the model of the VoxCeleb paper. Did you train the model that is part of this repository yourself?
If yes, could you replicate the results from the paper, i.e. an identification accuracy of 80%? I'm currently having trouble replicating this because of overfitting. Any help would be appreciated.

Thanks!

Do you have a model for VGGVox on ResNet50?

Dear linhdvu14,

Your VGGVox model is excellent. Do you have a VGGVox model based on ResNet50?

Thanks

Where is cfg/enroll_list.csv?

FileNotFoundError: File b'cfg/enroll_list.csv' does not exist
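
The path in the error is relative, so one thing to check is whether the script is being run from the repository root and whether the file exists there. If the list has to be created, the sketch below writes one with Python's `csv` module. The `filename` column is suggested by `result['filename']` in a traceback earlier on this page; the `speaker` column name, the wav paths, and `write_list_file` are assumptions for illustration.

```python
import csv
import os
import tempfile

def write_list_file(path, entries):
    """Write (wav path, speaker) pairs to a CSV list file with a header row."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "speaker"])  # 'speaker' is an assumed column name
        writer.writerows(entries)

# Written to a temporary directory here; in practice the target
# would be cfg/enroll_list.csv under the repository root.
root = tempfile.mkdtemp()
list_path = os.path.join(root, "cfg", "enroll_list.csv")
write_list_file(list_path, [("data/enroll/alice.wav", "alice"),
                            ("data/enroll/bob.wav", "bob")])

with open(list_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```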
