vggvox-speaker-identification's Introduction

vggvox

Instructions

  • Install python3 and the required packages
  • Modify cfg/enroll_list.csv and cfg/test_list.csv to point to your local enroll/test wav files
  • To run evaluation: python3 scoring.py
  • Results will be stored in res/results.csv. Each line has format: [path to test wav], [correct speaker], [distance to enroll speaker 1],...[distance to enroll speaker N], [predicted speaker], [correct?]
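As a rough illustration of the result format described above, the sketch below computes identification accuracy from rows laid out like those lines. It assumes the last field is a true/false flag and that any header row has already been skipped. The helper name `accuracy_from_rows` and the sample rows are hypothetical and not part of the repository.

```python
import csv
import io

def accuracy_from_rows(rows):
    """Return the fraction of rows whose last field marks a correct prediction."""
    total = 0
    correct = 0
    for row in rows:
        if not row:
            continue  # skip blank lines
        total += 1
        # Assumption: the final column holds the [correct?] flag as text.
        if row[-1].strip().lower() in ("true", "1"):
            correct += 1
    return correct / total if total else 0.0

# Hypothetical sample in the documented layout, with two enrolled speakers.
sample = io.StringIO(
    "a.wav,spk1,0.12,0.80,spk1,True\n"
    "b.wav,spk2,0.30,0.45,spk1,False\n"
)
print(accuracy_from_rows(csv.reader(sample)))
```

To use it on real output, open res/results.csv and pass `csv.reader(f)` to the function, skipping a header row first if the file has one.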

vggvox-speaker-identification's People

Contributors

linhdvu14

vggvox-speaker-identification's Issues

Short segments

There is a bug when a wav file is shorter than frame_step*sample_rate*list(buckets.keys())[0],
since rsize is empty.
A possible workaround is to zero-pad the signal to the minimum length.
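A minimal sketch of the zero-padding workaround, assuming the audio is a 1-D NumPy array. The helper name `pad_to_min_length` and the numeric values for the frame step, sample rate and smallest bucket are illustrative assumptions; the real values should come from the project's own configuration.

```python
import numpy as np

def pad_to_min_length(signal, min_samples):
    """Zero-pad a 1-D signal at the end so it has at least min_samples samples."""
    signal = np.asarray(signal)
    missing = min_samples - signal.shape[0]
    if missing <= 0:
        return signal  # already long enough, return unchanged
    return np.pad(signal, (0, missing), mode="constant")

# Assumed values for illustration only.
frame_step = 0.01      # seconds per frame
sample_rate = 16000    # samples per second
min_bucket = 100       # smallest bucket key, in frames
min_samples = int(round(frame_step * sample_rate * min_bucket))

short = np.ones(4000, dtype=np.float32)
padded = pad_to_min_length(short, min_samples)
print(padded.shape)
```

Depending on how the framing step rounds, a small safety margin on top of `min_samples` may be needed.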

VGGVox 2

Hi all,

I'm following @linhdvu14's steps and trying to export VGGVox 2 to Keras/TensorFlow, but apparently things get much more complicated.

I've tried some options like the one explained here (https://sefiks.com/2019/07/15/how-to-convert-matlab-models-to-keras/) but I've had no success. Apparently, the VGGVox2 model is implemented in Matlab using the MatConvNet toolbox and additionally a DAGNN wrapper, which makes the export task more complex.

Any suggestions on how to tackle this?

Many thanks in advance!

about the conv_bn_dynamic_apool

I read your code and found that the 9*1 layer is a conv layer in the conv_bn_dynamic_apool() function.
The paper says "replaced by two layers - a fully connected layer of 9*1 and an average layer with 1*8..."
I was stuck on this for a long time. Maybe you are right that it is a conv layer, which makes sense.

input:0 is both fed and fetched.

tensorflow.python.framework.errors_impl.InvalidArgumentError: input:0 is both fed and fetched.

I ran the code on Windows and it worked, but when I downloaded it on Linux it showed this error.
The versions of the packages are fine.

code classifying poorly

Hello,
I am trying the model with some IDs retrieved from the voxCeleb1 database http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html and I am getting around 10% correct classification.
Do you know why this is happening?
The only change I made was replacing the function "librosa.load" in the signalprocess file with "sr,audio = wavfile.read(filename)", because I couldn't download the avconv file it needs.

Thank you
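One possible contributor, not confirmed for this case: `scipy.io.wavfile.read` returns the file's native sample rate and, for 16-bit PCM files, raw int16 samples, whereas `librosa.load` by default returns float samples scaled to [-1, 1], mixed down to mono and resampled to the requested rate. The sketch below shows only the scaling and mono conversion, using NumPy. The helper name `pcm_to_float_mono` is hypothetical, and unsigned 8-bit PCM is not handled.

```python
import numpy as np

def pcm_to_float_mono(audio):
    """Convert signed-integer PCM samples to float32 in [-1, 1] and mix to mono."""
    audio = np.asarray(audio)
    if np.issubdtype(audio.dtype, np.signedinteger):
        # Scale by the magnitude of the most negative value (32768 for int16).
        scale = float(np.iinfo(audio.dtype).max) + 1.0
        audio = audio.astype(np.float32) / scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        # Multi-channel data is shaped (n_samples, n_channels); average channels.
        audio = audio.mean(axis=1)
    return audio

# Stand-in for the data array returned by wavfile.read on a stereo 16-bit file.
stereo_int16 = np.array([[32767, -32768], [0, 16384]], dtype=np.int16)
print(pcm_to_float_mono(stereo_int16))
```

The sample rate returned alongside the data should also be compared with the rate the model expects. If they differ, the audio needs resampling, which is not shown here.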

true_fn and false_fn arguments to tf.cond must have same dimension

File "", line 1, in
runfile('/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py', wdir='/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification')

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 79, in
test()

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 62, in test
model = vggvox_model()

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 44, in vggvox_model
pool='max',pool_size=(3,3),pool_strides=(2,2))

File "/home/batuhan/Desktop/Python/Staj/SpeakerRecognition/vggvox-speaker-identification/model.py", line 19, in conv_bn_pool
x = BatchNormalization(epsilon=1e-5,momentum=1,name='bn{}'.format(layer_idx))(x)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call
outputs = call_fn(cast_inputs, *args, **kwargs)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/layers/normalization.py", line 741, in call
outputs = self._fused_batch_norm(inputs, training=training)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/layers/normalization.py", line 612, in _fused_batch_norm
lambda: 1.0)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 65, in smart_cond
pred, true_fn=true_fn, false_fn=false_fn, name=name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/smart_cond.py", line 59, in smart_cond
name=name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1177, in cond
return cond_v2.cond_v2(pred, true_fn, false_fn, name)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 101, in cond_v2
name=scope)

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 221, in _build_cond
_check_same_outputs(_COND, [true_graph, false_graph])

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 801, in _check_same_outputs
error(b, "%s and %s have different types" % (b0_out, bn_out))

File "/home/batuhan/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/cond_v2.py", line 779, in error
detail=error_detail))

TypeError: true_fn and false_fn arguments to tf.cond must have the same number, type, and overall structure of return values.

true_fn output: Tensor("Identity:0", shape=(), dtype=int32)
false_fn output: Tensor("Identity:0", shape=(), dtype=float32)

Error details:
Tensor("Identity:0", shape=(), dtype=int32) and Tensor("Identity:0", shape=(), dtype=float32) have different types

Here is the full error. Something is wrong with batchnorm, but I could not understand why. The same operations are done for both true_fn and false_fn.

MemoryError: Unable to allocate 13.9 MiB for an array with shape (1779, 512) and data type complex128

I am getting this error while running the scoring.py file. I am using an NVIDIA GPU on Windows 10. I have also tried multiple solutions available on Stack Overflow.


Processing enroll samples....
Traceback (most recent call last):

File "E:\task1\code\scoring.py", line 86, in
get_id_result()

File "E:\task1\code\scoring.py", line 60, in get_id_result
enroll_result = get_embeddings_from_list_file(model, c.ENROLL_LIST_FILE, c.MAX_SEC)

File "E:\task1\code\scoring.py", line 48, in get_embeddings_from_list_file
result['features'] = result['filename'].apply(lambda x: get_fft_spectrum(x, buckets))

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\pandas\core\series.py", line 3848, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)

File "pandas_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer

File "E:\task1\code\scoring.py", line 48, in
result['features'] = result['filename'].apply(lambda x: get_fft_spectrum(x, buckets))

File "E:\task1\code\wav_reader.py", line 43, in get_fft_spectrum
fft = abs(np.fft.fft(frames,n=c.NUM_FFT))

File "<array_function internals>", line 6, in fft

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\numpy\fft_pocketfft.py", line 188, in fft
output = _raw_fft(a, n, axis, False, True, inv_norm)

File "C:\Users\Anaconda3\envs\vgg\lib\site-packages\numpy\fft_pocketfft.py", line 77, in _raw_fft
r = pfi.execute(a, is_real, is_forward, fct)

MemoryError: Unable to allocate 13.9 MiB for an array with shape (1779, 512) and data type complex128

about MFCC

@linhdvu14 Hi, thanks for your code.
I know you are using the model with weights from VGGVOX, but where is the MFCC process?
Or do you use different features?

the pretrained model file

Hi,
Thanks for sharing your code.
I noticed that the pretrained model file provided by the original author is in .mat format, while yours is a .h5 file. Could you please share how you converted the MATLAB .mat file to the .h5 file? I need to convert another .mat file to .h5, but I have encountered some difficulties.
Thanks.

CSV Example

Hello, I would like to see the csv output ("result.csv" in "res" folder) you have obtained because I don't see the last 2 fields of each audio file. I mean these ones: "[predicted speaker]" and "[correct?]".

what is the method "build_buckets" for?

Hello, thanks for your code!

I'm working on a voice recognition project. I've read your code and am a little confused about the method "build_buckets" in scoring.py. What is the method for? What's the meaning of 's' in build_buckets?

Could you give a brief explanation?

Training

Hello,
I am using this code to work with the voxCeleb1 database, as they do in the "learnable PINs" paper. I am using only 9 IDs to check how the code works, but I only get 10% correct voice recognition. Could you please send me the training code? Or do you know why this might be?
I have tried using all the IDs, but in that case the process is killed.
Thank you

Model training

Hi,

I saw that you are using Keras to replicate the model of the VoxCeleb paper. Did you train the model that is part of this repository yourself?
If so, could you replicate the results from the paper, i.e. an identification accuracy of 80%? I'm currently having trouble replicating this because of overfitting. Any help would be appreciated.

Thanks!
