hpi-deeplearning / crnn-lid

Code for the paper Language Identification Using Deep Convolutional Recurrent Neural Networks

License: GNU General Public License v3.0

Python 71.40% Shell 1.41% JavaScript 25.19% HTML 1.07% CSS 0.94%

language-identification deep-learning computer-vision cnns keras

crnn-lid's Introduction

Language Identification Using Deep Convolutional Recurrent Neural Networks

This repository contains the code for the paper "Language Identification Using Deep Convolutional Recurrent Neural Networks", which will be presented at the 24th International Conference on Neural Information Processing (ICONIP 2017).

Structure of the Repository

/data
- Scripts to download training data from Voxforge, European Parliament Speech Repository and YouTube. For usage details see the README in that folder.
/keras
- All the code for setting up and training various models with Keras/Tensorflow.
- Includes training and prediction script. See train.py and predict.py.
- Configure your learning parameters in config.yaml.
- See "Training & Prediction" below for usage.
/tools
- Some handy scripts to clean filenames, normalize audio files and other stuff.
/web-server
- A demo project for language identification. A small web server with a REST interface for classification and a small web frontend to upload audio files. For more information see README in that folder.

Requirements

You can install all Python requirements with pip install -r requirements.txt in the respective folders. You will additionally need to install the following software:

youtube_dl
sox

Models

The repository contains a model for 4 languages (English, German, French, Spanish) and a model for 6 languages (English, German, French, Spanish, Chinese, Russian). You can find these models in the folder web-server/model.

Training & Prediction

To start a training run, go into the keras directory, set all the desired properties and hyperparameters in the config.yaml file and train with Keras:

python train.py --config <config.yaml>

To predict a single audio file run:

python predict.py --model <path/to/model> --input <path/to/speech.mp3>

Audio files can be in any format understood by SoX. The pretrained model files need to be compatible with Keras v1.

To evaluate a trained model you can run:

python evaluate.py --model <path/to/model> --config <config.yaml> --testset True

You can also visualize the clusters the model produces with our tsne.py script:

python tsne.py --model <path/to/model> --config <config.yaml>

To visualize which patterns excite certain layers the most, run:

python visualize_conv.py --model <path/to/model>

Labels

0 English,
1 German,
2 French,
3 Spanish,
4 Mandarin Chinese,
5 Russian
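
As a minimal sketch of how these indices might be used, the snippet below maps a vector of class probabilities to a language name. The LABELS list mirrors the table above; the helper name label_for and the example probabilities are illustrative assumptions and are not taken from the repository.

```python
# Label order as listed above; index i corresponds to LABELS[i].
LABELS = ["English", "German", "French", "Spanish", "Mandarin Chinese", "Russian"]


def label_for(probabilities):
    """Return the language name with the highest probability (hypothetical helper)."""
    if len(probabilities) > len(LABELS):
        raise ValueError("more classes than known labels")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best_index]


# Made-up probabilities for a 4-language model (English, German, French, Spanish).
print(label_for([0.05, 0.80, 0.10, 0.05]))  # German
```

Because the 4-language model uses the first four entries of the same ordering, one list can serve both models.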

LICENSE

GPLv3. See LICENSE for more information.

Citation

If you find this code useful, please cite our paper:

@inproceedings{crnn-lid,
  title={Language Identification Using Deep Convolutional Recurrent Neural Networks},
  author={Bartz, Christian and Herold, Tom and Yang, Haojin and Meinel, Christoph},
  booktitle={International Conference on Neural Information Processing},
  pages={880--889},
  year={2017},
  organization={Springer}
}

crnn-lid's Issues

Installing pysox on ubuntu via pip install, cannot resolve sox.h

If you get a sox error while installing pysox, follow the second answer to this Stack Overflow question:
https://stackoverflow.com/questions/14756346/installing-pysox-on-ubuntu-via-pip-install-cannot-resolve-sox-h

Model Overfitting

I have tried 4 different datasets. The biggest one contains 4 languages, with 20600 PNGs of 10-second spectrograms for each language.

No luck. Train accuracy is 0.97, but validation and test accuracy are 0.2 - 0.4. What dataset size should I use?

P.S. I am using your default config. I changed the code (a little) to use Keras 2 and TensorFlow 1.14.

voxforge download script is corrupted on OSX Catalina

in voxforge/download-data.sh in Line 19: curl $VOXFORGE_DATA_URL | grep -o '<a .*href=.*>' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d' | grep tgz$ > $ZIPS

On my system (OSX Catalina), the "\n" inserts an "n" in front of the ZIPS URLs and stops further processing.
The error is detected by ls in the extract_tgz.sh script...

After removing it I still get the message "tar: failed to set default locale"... However, the script unpacks everything and moves the WAVs to the right folder...
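
One way to sidestep platform differences in how sed handles "\n" would be to extract the .tgz links in Python instead. The following is only a sketch under assumptions: it targets Python 3 (the repository itself was written for Python 2), the names TgzLinkParser and extract_tgz_links are hypothetical, and the sample HTML is made up rather than taken from the VoxForge listing.

```python
from html.parser import HTMLParser


class TgzLinkParser(HTMLParser):
    """Collect href values of <a> tags that end in .tgz."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.endswith(".tgz"):
                self.links.append(value)


def extract_tgz_links(html):
    """Return all .tgz link targets found in the given HTML string."""
    parser = TgzLinkParser()
    parser.feed(html)
    return parser.links


# Made-up directory listing used only for illustration.
sample = '<a href="1028-20100710-hne.tgz">a</a> <a href="README">b</a>'
print(extract_tgz_links(sample))
```

The resulting list could then be written one URL per line to the file the shell script stores in $ZIPS.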

Even data distribution

Hi,

Just a quick question: are the YouTube audio samples evenly distributed between languages in some step? If they are, would you please tell me which script does it? ~~If they aren't, could you explain why?~~ Sorry, I've just read in the paper that they are.

I'd also like to thank you for writing the paper in a really comprehensive way!

Early Stopping occurs during training

I am trying to train the model. But during training, all of a sudden, early stopping occurs. The model is supposed to be trained for up to 50 epochs, but it stops at epoch 15-16.

Can anyone tell me why this early stopping occurs?
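
For context, Keras provides an EarlyStopping callback that halts training once a monitored metric has not improved for a given number of epochs (its patience). How train.py configures it is not shown in this thread, so the following is only a plain-Python sketch of patience-based stopping with made-up validation accuracies. It illustrates why a run scheduled for 50 epochs can end much earlier.

```python
def stopping_epoch(val_accuracies, patience):
    """Return the 1-based epoch at which patience-based early stopping halts,
    or None if training runs through all given epochs."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up history: accuracy peaks at epoch 3 and never improves afterwards.
history = [0.50, 0.60, 0.70, 0.69, 0.68, 0.70, 0.66]
print(stopping_epoch(history, patience=3))  # 6
```

If early stopping is indeed what ends the run, raising the patience or removing the callback would let training continue, at the cost of possibly overfitting.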

Cannot load model in "predict.py"

I was trying to load the model with
model = load_model("../web-server/model/2017-01-31-14-29-14.CRNN_EN_DE_FR_ES_CN_RU.model")
and I got the error below. Have you seen this before? I think it may be because of the Keras version. I think you are using Keras v1, but which specific version of Keras are you using? Thanks!

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-89a8bb686762> in <module>()
     15 print type(ckpt)
     16 
---> 17 model = load_model("../web-server/model/2017-01-31-14-29-14.CRNN_EN_DE_FR_ES_CN_RU.model")
     18 print "get model"
     19 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/models.pyc in load_model(filepath, custom_objects)
    138         raise ValueError('No model found in config file.')
    139     model_config = json.loads(model_config.decode('utf-8'))
--> 140     model = model_from_config(model_config, custom_objects=custom_objects)
    141 
    142     # set weights

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/models.pyc in model_from_config(config, custom_objects)
    187         raise Exception('`model_fom_config` expects a dictionary, not a list. '
    188                         'Maybe you meant to use `Sequential.from_config(config)`?')
--> 189     return layer_from_config(config, custom_objects=custom_objects)
    190 
    191 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/utils/layer_utils.pyc in layer_from_config(config, custom_objects)
     32         layer_class = get_from_module(class_name, globals(), 'layer',
     33                                       instantiate=False)
---> 34     return layer_class.from_config(config['config'])
     35 
     36 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/models.pyc in from_config(cls, config, layer_cache)
   1059             conf = normalize_legacy_config(conf)
   1060             layer = get_or_create_layer(conf)
-> 1061             model.add(layer)
   1062         return model

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/models.pyc in add(self, layer)
    322                  output_shapes=[self.outputs[0]._keras_shape])
    323         else:
--> 324             output_tensor = layer(self.outputs[0])
    325             if type(output_tensor) is list:
    326                 raise Exception('All layers in a Sequential model '

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/engine/topology.pyc in __call__(self, x, mask)
    489                                      '`layer.build(batch_input_shape)`')
    490             if len(input_shapes) == 1:
--> 491                 self.build(input_shapes[0])
    492             else:
    493                 self.build(input_shapes)

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/layers/wrappers.pyc in build(self, input_shape)
    216 
    217     def build(self, input_shape):
--> 218         self.forward_layer.build(input_shape)
    219         self.backward_layer.build(input_shape)
    220 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/layers/recurrent.pyc in build(self, input_shape)
    731                                       self.W_o, self.U_o, self.b_o]
    732 
--> 733             self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
    734             self.U = K.concatenate([self.U_i, self.U_f, self.U_c, self.U_o])
    735             self.b = K.concatenate([self.b_i, self.b_f, self.b_c, self.b_o])

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.pyc in concatenate(tensors, axis)
    751         return tf.sparse_concat(axis, tensors)
    752     else:
--> 753         return tf.concat(axis, [to_dense(x) for x in tensors])
    754 
    755 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.pyc in concat(values, axis, name)
   1108       ops.convert_to_tensor(
   1109           axis, name="concat_dim",
-> 1110           dtype=dtypes.int32).get_shape().assert_is_compatible_with(
   1111               tensor_shape.scalar())
   1112       return identity(values[0], name=scope)

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, preferred_dtype)
   1009       name=name,
   1010       preferred_dtype=preferred_dtype,
-> 1011       as_ref=False)
   1012 
   1013 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.pyc in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
   1105 
   1106     if ret is None:
-> 1107       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1108 
   1109     if ret is NotImplemented:

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    215                                          as_ref=False):
    216   _ = as_ref
--> 217   return constant(v, dtype=dtype, name=name)
    218 
    219 

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name, verify_shape)
    194   tensor_value.tensor.CopyFrom(
    195       tensor_util.make_tensor_proto(
--> 196           value, dtype=dtype, shape=shape, verify_shape=verify_shape))
    197   dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
    198   const_tensor = g.create_op(

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape, verify_shape)
    434       nparray = np.empty(shape, dtype=np_dt)
    435     else:
--> 436       _AssertCompatible(values, dtype)
    437       nparray = np.array(values, dtype=np_dt)
    438       # check to them.

/f/gfs1/yiyangli/.lang_env/local/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.pyc in _AssertCompatible(values, dtype)
    345     else:
    346       raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 347                       (dtype.name, repr(mismatch), type(mismatch).__name__))
    348 
    349 

TypeError: Expected int32, got <tf.Variable 'forward_forward_lstm_1_W_i_5:0' shape=(256, 512) dtype=float32_ref> of type 'Variable' instead.

Predict new audio - audio is not found

I am trying to predict.
I have specified the audio path correctly but still I am getting error:
ValueError: need at least one array to stack

full error:

('SpectrogramGenerator Exception: ', IOError(2, 'No such file or directory'), 'audios/speech.mp3')
Traceback (most recent call last):
  File "predict.py", line 42, in <module>
    predict(cli_args)
  File "predict.py", line 17, in predict
    data = np.stack(data)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/shape_base.py", line 335, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

Does the input audio need to be exactly 10 seconds long?
Another question:
python predict.py --model <path/to/model> --input <path/to/speech.mp3>

Here, what should the path to the model be?
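
The traceback suggests that no spectrogram segments were produced (the input file could not be opened), so np.stack received an empty list. Below is a hedged sketch of a check that could run before stacking. check_segments is a hypothetical helper, not a function from predict.py, and it assumes the segments are collected in a plain list.

```python
import os


def check_segments(input_path, segments):
    """Raise a descriptive error before stacking (hypothetical helper).

    `segments` stands for the list of spectrogram arrays built from `input_path`.
    """
    if not os.path.isfile(input_path):
        raise IOError("input file not found: %s" % os.path.abspath(input_path))
    if len(segments) == 0:
        raise ValueError("no spectrogram segments were produced from %s" % input_path)
    return segments


try:
    check_segments("audios/speech.mp3", [])
except (IOError, ValueError) as error:
    print(error)
```

Printing the absolute path makes it easier to see which directory a relative --input argument was resolved against.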

Training comes to a stand-still

I am trying to train the model on my dataset. However, at epoch 3 the training doesn't go any further; it stands still.
Besides, the code doesn't utilize the GPU.

Can anyone tell me how to make it use the GPU and accelerate training?
Thanks in advance!

some errors when run python train.py

Hi sir,
I am trying to use your crnn-lid. First I downloaded the audio using ./download-data.sh and converted the audio files to spectrograms by running "python wav_to_spectrogram.py". Then I set all the desired properties and hyperparameters in the config.yaml file. After that, I went into the keras directory and ran "python train.py". Unfortunately, I got the following error:

Can you give me some help? Waiting for your reply. Thank you!

Question

Why do we actually need to do this: "Use ffmpeg to convert and split WAV files into 10 second parts"?

After downloading we have big WAV files. We can then directly convert them to spectrogram image files.
This will slice the image into 10-second spectrograms anyway.

Failing to import model package

Inside the keras/models package there is an __init__.py file in which the author imports all the custom modules.
I am getting the error: ModuleNotFoundError: No module named 'topcoder'. The custom modules fail to import.

Why am I getting this? What should I do?
Thanks in advance!

Fails to load weight file

Hello!
I am trying to run train.py module. Getting the following error:

IOError: Unable to open file (unable to open file: name = 'logs/2017-04-08-13-03-44/weights.08.model', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The error comes from line 38 of the topcoder_crnn_finetune.py module:

model.load_weights("logs/2017-04-08-13-03-44/weights.08.model", by_name=True)

Why is there a static 2017-04-08-13-03-44 directory here? As far as I can see, a directory named after the current date and time gets generated inside the logs folder. And where can I get weights.08.model?
Could anyone kindly tell me what's going wrong here?
Thanks in advance!

Training convergence

I'm trying to train the CRNN network using the latest Keras on 2 languages (English and French) on a "youtube spoken" dataset. But the validation accuracy (among other metrics) seems to be stuck at 0.5.

Could you give me some advice about that? I'd like to share some trained models, built with the latest Keras version, for the different models you've implemented.

Thanks again!

bidirectional_1 (Bidirection (None, 512) 1574912

dense_1 (Dense) (None, 2) 1026

Total params: 8,444,418
Trainable params: 8,439,938
Non-trainable params: 4,480

None
WARNING:tensorflow:Variable *= will be deprecated. Use variable.assign_mul if you want assignment to the variable value or 'x = x * y' if you want a new python Tensor object.
Epoch 1/50
16384/16384 [==============================] - 6176s 377ms/step - loss: 0.1435 - acc: 0.9892 - recall: 0.9999 - precision: 0.5000 - val_loss: 1.6510 - val_acc: 0.6196 - val_recall: 1.0000 - val_precision: 0.5000

Epoch 00001: val_acc improved from -inf to 0.61963, saving model to logs/2018-10-12-02-34-27/weights.01.model
Epoch 2/50
16384/16384 [==============================] - 6156s 376ms/step - loss: 0.0580 - acc: 0.9955 - recall: 1.0000 - precision: 0.5000 - val_loss: 4.5258 - val_acc: 0.5111 - val_recall: 1.0000 - val_precision: 0.5000

Epoch 00002: val_acc did not improve from 0.61963
Epoch 3/50
16384/16384 [==============================] - 6152s 375ms/step - loss: 0.0390 - acc: 0.9964 - recall: 1.0000 - precision: 0.5000 - val_loss: 4.0108 - val_acc: 0.5033 - val_recall: 1.0000 - val_precision: 0.5000

ValueError: arrays must all be same length

While running the tsne.py code I'm getting the following error.

Traceback (most recent call last):
File "tsne.py", line 100, in <module>
visualize_cluster(cli_args)
File "tsne.py", line 84, in visualize_cluster
plot_with_labels(lowD_weights, labels, config["label_names"], cli_args.plot_name)
File "tsne.py", line 17, in plot_with_labels
df = DataFrame({"x": lowD_Weights[:, 0], "y": lowD_Weights[:, 1], "label": labels})
File "/home/bini/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 275, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/home/bini/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 411, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/home/bini/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5496, in _arrays_to_mgr
index = extract_index(arrays)
File "/home/bini/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 5544, in extract_index
raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length

GPU-Training supported?

Dear Authors:

I want to know whether this code supports GPU training. I tried installing tensorflow-gpu, both the latest version and version 0.12.1. I get errors saying the tensor shapes don't match. Did you get the same error, and how can it be fixed? Thanks

Size issue with convolutions

Hi,
I'm getting size issues with convolutions for an image size of 500x129. Max pooling reduces the image too much after some convolutions.

Train issue.

I am getting this error when trying to run training:

Logging to logs/2020-04-09-16-19-09
Traceback (most recent call last):
File "/home/varuzhan/Desktop/crnn-lid-master/keras/train.py", line 85, in <module>
shutil.copytree("models", log_dir) # creates the log_dir
File "/usr/lib/python2.7/shutil.py", line 194, in copytree
names = os.listdir(src)
OSError: [Errno 2] No such file or directory: 'models'
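
The README asks users to go into the keras directory before training, while the traceback shows train.py being started through its absolute path. A possible explanation, offered only as a guess, is that the relative name "models" was resolved against a different working directory. The sketch below shows how a path can be resolved relative to the script instead. resolve_from_script is a hypothetical helper, and the example path is made up.

```python
import os


def resolve_from_script(script_path, relative_name):
    """Resolve `relative_name` against the directory that contains the script."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, relative_name)


# Made-up location of train.py, used only for illustration.
print(resolve_from_script("/home/user/crnn-lid/keras/train.py", "models"))
```

Alternatively, changing into the keras directory before running python train.py avoids the problem without touching the code.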

downloading dataset error

Hi, I'm new to language identification and I would like to try your repo, but I encounter an error when downloading the dataset. Any tips on how to solve it?

$ sh download-data.sh english
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1004k 0 1004k 0 0 280k 0 --:--:-- 0:00:03 --:--:-- 280k
download-data.sh: line 24: wget: command not found
tar: tmp/1028-20100710-hne.tgz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
ls: cannot access 'tmp/1028-20100710-hne/wav': No such file or directory
rm -f tmp/1028-20100710-hne.tgz
rm -rf tmp/1028-20100710-hne
download-data.sh: line 24: wget: command not found
tar: tmp/1337ad-20170321-ajg.tgz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
ls: cannot access 'tmp/1337ad-20170321-ajg/wav': No such file or directory

This is the error I get; it can't find the tmp folder. I already tried creating a "tmp" folder, but the error still persists.

Can you solve this?

Training steps.

Can anyone explain the training process to me step by step? I have tried every method and way, but I get no results, only errors :).

value of nb_val_samples is none: validation_data_generator.get_num_files() returns 0

I am trying to run the train.py script. I can see in the training section in line 59:
nb_val_samples=validation_data_generator.get_num_files()

validation_data_generator.get_num_files() is returning 0.
That's why I am getting the exception:
Exception: When using a generator for validation data, you must specify a value for "nb_val_samples".

Kindly tell me what's going wrong!
Thanks in advance!

Performance on short speech

Hi there, first, thanks for the toolkit.

I am interested in applying this on short audios. I did a simple test by chopping the web-server/audio/samples audios into 10 seconds segments and ran predict.py separately on these segments with the existing model from web-server folder (assuming this model would be the best;)). When predicting them separately, the accuracy seemed quite low, about 60%. More similar tests with our own dataset received worse results... I understand short audio would be much tougher, but I still wonder if you'd have any insights if we can improve this. Thanks in advance.

Ben

Train error with finetune_crnn model.

I am getting this error during training (after downloading the data and converting it to spectrograms).

Epoch 1/50
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/varuzhan/.local/lib/python2.7/site-packages/keras/engine/training.py", line 404, in data_generator_task
    generator_output = next(generator)
  File "/home/varuzhan/Desktop/crnn-lid-master/keras/data_loaders/csv_loader.py", line 36, in get_data
    label_batch[i, :] = to_categorical([label], nb_classes=self.config["num_classes"]) # one-hot encoding
  File "/home/varuzhan/.local/lib/python2.7/site-packages/keras/utils/np_utils.py", line 23, in to_categorical
    Y[i, y[i]] = 1.
IndexError: index 4 is out of bounds for axis 1 with size 4
Traceback (most recent call last):
  File "/home/varuzhan/Desktop/crnn-lid-master/keras/train.py", line 88, in <module>
    model_file_name = train(cli_args, log_dir)
  File "/home/varuzhan/Desktop/crnn-lid-master/keras/train.py", line 62, in train
    pickle_safe=True
  File "/home/varuzhan/.local/lib/python2.7/site-packages/keras/models.py", line 907, in fit_generator
    pickle_safe=pickle_safe)
  File "/home/varuzhan/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1425, in fit_generator
    'or (x, y). Found: ' + str(generator_output))
Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
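
The IndexError above is what one-hot encoding raises when a label index is not smaller than the number of classes: here a label of 4 with num_classes set to 4. Below is a minimal sketch of a check that could be run over the label data before training. The helper name find_invalid_labels and the (path, label) row format are assumptions for illustration, and the rows are made up.

```python
def find_invalid_labels(rows, num_classes):
    """Return the (path, label) rows whose label lies outside 0..num_classes-1."""
    return [(path, label) for path, label in rows
            if not 0 <= int(label) < num_classes]


# Made-up rows in the form (file path, label index).
rows = [("en_001.png", 0), ("de_001.png", 1), ("cn_001.png", 4)]
print(find_invalid_labels(rows, num_classes=4))
```

Any rows reported this way mean that either num_classes in the config or the labels in the data need to be adjusted so that they agree.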

train.py error

Hi. In the /keras/models/topcoder_crnn_finetune.py file there is a line:

model.load_weights("logs/2017-04-08-13-03-44/weights.08.model", by_name=True)

But there is no such directory or file in the project. What file is this? How can I load it?

"SpectrogramGenerator Exception: [Errno 2] No such file or directory: 'tmp_images/tmp_91484.png' audio_segment/malayalam"

I'm new to AI, and when I try to run python wav_to_spectrogram.py --source audio_segment --target target_spectrogram it shows the error "SpectrogramGenerator Exception: [Errno 2] No such file or directory: 'tmp_images/tmp_91484.png' audio_segment/malayalam". I look forward to your reply. Thanks in advance.

Live testing not good

Dear Authors:

We trained your model (topcoderCRNN), and it performs well in training and even in a regression test (that is, on data unseen during training). But the live test is really bad. "Live" means we test with data recorded from a microphone. Could you let us know what causes this? Is it a feature issue, or is the microphone-recorded data very different from the training data? Thanks.

SpectrogramGenerator Exception: [Errno 2] No such file or directory:

Does anyone know what is wrong here? I already tried setting the file_name to <os.path.abspath>, but the problem still persists.

Cannot predict some files

While predicting some audio files I am getting the following error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "predict.py", line 41, in <module>
    predict(cli_args)
  File "predict.py", line 16, in predict
    data = np.stack(data)
  File "/home/gamut/anaconda2/envs/xyz/li/lib/python2.7/site-packages/numpy/core/shape_base.py", line 335, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

Unable to open model

I have trained a model on my custom dataset, which only contains English and Japanese audio.
The model training stops early, and no h5f file gets generated. I only get weight files like weights.12.model.
That's why I receive the error:
IOError: Unable to open file (Unable to open file: name = 'models/logs/2020-07-29-05-05-31/weights.12.model', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)

full error:

Traceback (most recent call last):
  File "predict.py", line 45, in <module>
    predict(cli_args)
  File "predict.py", line 23, in predict
    model = load_model(cli_args.model_dir)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 133, in load_model
    f = h5py.File(filepath, mode='r')
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 101, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2840)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-nCYoKW-build/h5py/_objects.c:2798)
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open (/tmp/pip-nCYoKW-build/h5py/h5f.c:2117)
IOError: Unable to open file (Unable to open file: name = 'models/logs/2020-07-29-05-05-31/weights.12.model', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0)

File name issue

Hi. I am getting this error when I am trying to run prediction for a single mp3 file:
Using TensorFlow backend.
('SpectrogramGenerator Exception: ', IOError(2, 'No such file or directory'), 'aaaa.mp3')

File is located in crnn-lid-master folder. Terminal command is:
python keras/predict.py --model home/varuzhan/Desktop/crnn-lid-master/web-server/model/2017-01-31-14-29-14.CRNN_EN_DE_FR_ES_CN_RU.model --input aaaa.mp3

Thanks.

on the usage of an additional labels

Dear authors,

Have you ever considered including a label for an "unknown" class, filled with data from languages that we don't want to identify?
Furthermore, what about a label that represents silence?
I'll let you know when I completed my experiments :)

Training data segmentation

Dear authors:
I just want to know why you chose 10-second segments for training and prediction. Would choosing a smaller segment (e.g. 500 ms), for the sake of latency, hurt the performance? Thanks.

Models origin

Hello. I'm seeing some models that are described as "TOPCODER etc". Ex: topcoder_5s_finetune, Topcoder_CRNN, etc.

Are these models from the SpokenLanguage topcoder contest? If this is the case, can you share the weights?

Thanks in advance!

Your project is SUPERB by the way!

numpy.AxisError: axis 1 is out of bounds for array of dimension 1

While running the evaluate.py code I'm getting the following error. I'm using ubuntu (18.04) terminal

Traceback (most recent call last):
File "evaluate.py", line 66, in <module>
evaluate(cli_args)
File "evaluate.py", line 53, in evaluate
y_pred = np.argmax(probabilities, axis=1)
File "<array_function internals>", line 6, in argmax
File "/home/bini/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1153, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out)
File "/home/bini/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/home/bini/.local/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 47, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1

wav_to_spectogram.py stops converting before it should

Hi,

I'm working with four languages and for each I have downloaded only one video so that I can check that the scripts work as they should before running them on my VM on the cloud.

The issue I have is that the script wav_to_spectogram.py behaves oddly with one language.
The languages and the number of segmented .wav files for each are:

Croatian - 64
English - 42
French - 39
Spanish - 45

So, the expected result after running the script is 38 or 39 .png spectrograms for each language, since the language with the fewest .wav files is French. It does execute as it should when I run it for all the languages except English:

But running the script with English manages to count only 13 files in English, even though there are 42:

I still haven't come up with an explanation to why it's happening, so any clue would be of a great help!

Here's the sources.yml that I used to download the videos, in case someone prefers to check it themselves.

croatian:
  users:
    -
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdEh39boAuP-JPeDR7dy6wih

english:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdHSp1oIY4L_t5xX0dFV3GMH

french:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdEgT-oLhk11Xjbev7Q02F3-
spanish:
  users:
    - 
  playlists:
    - https://www.youtube.com/playlist?list=PLv3j2_RROTdHpKmps4DaomrqXd8VmZV1g

I'd also note that I'm working with 3-second segments, so if someone wants to recreate what I am doing, it is important to change the number of seconds by which the files are split. That means changing line 66 in download_youtube.py from:

command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "10", output_filename]

to:

command = ["ffmpeg", "-y", "-i", f, "-map", "0", "-ac", "1", "-ar", "16000", "-f", "segment", "-segment_time", "3", output_filename]

For the same reason, it is necessary to change the size of the output spectrogram on line 70 in wav_to_spectogram.py from:

parser.add_argument('--shape', dest='shape', default=[129, 500, 1], type=int, nargs=3)

to:

parser.add_argument('--shape', dest='shape', default=[129, 150, 1], type=int, nargs=3)

Thank you!
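
The two defaults quoted above imply a fixed ratio: 500 columns for 10 seconds is 50 columns per second, which gives 150 columns for 3 seconds. A small sketch of that arithmetic follows. spectrogram_shape is a hypothetical helper, and it assumes the width scales linearly with segment length.

```python
def spectrogram_shape(segment_seconds, height=129, columns_per_second=50):
    """Return [height, width, channels] for a segment of the given length.

    50 columns per second is derived from the default width of 500 for
    10-second segments; linear scaling is an assumption.
    """
    return [height, int(round(segment_seconds * columns_per_second)), 1]


print(spectrogram_shape(10))  # [129, 500, 1]
print(spectrogram_shape(3))   # [129, 150, 1]
```

The resulting values are what would be passed to the --shape argument shown above.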

Changing to shorter segments

What do I need to change to train on shorter segments? I used data with a minimum length of 3 seconds, but the wav_to_spec module still only processes speech of 10 seconds or more.

Predict single audio(mp3 file).

command:
python3 predict.py --model 2017-01-31-14-29-14.CRNN_EN_DE_FR_ES_CN_RU.model --input aaa.wav

error:
Traceback (most recent call last):
File "predict.py", line 7, in <module>
from data_loaders.SpectrogramGenerator import SpectrogramGenerator
File "/home/varuzhan/Desktop/crnn-lid-master/keras/data_loaders/__init__.py", line 2, in <module>
from .image_loader import ImageLoader
File "/home/varuzhan/Desktop/crnn-lid-master/keras/data_loaders/image_loader.py", line 2, in <module>
from scipy.misc import imread
ImportError: cannot import name 'imread'