DeepSpectrum is a Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs). It features an extraction pipeline which first creates visual representations for audio data - plots of spectrograms or chromagrams - and then feeds them to a pre-trained Image CNN. Activations of a specific layer then form the final feature vectors.
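The first stage of this pipeline, turning raw audio into a time-frequency representation, can be illustrated with a toy sketch. This uses only NumPy and is purely illustrative, not the DeepSpectrum implementation: the real toolkit renders the spectrogram as a matplotlib image and forwards it through a pre-trained image CNN such as VGG16.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=160):
    """Toy magnitude spectrogram: frame the signal and apply an FFT per frame."""
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft + 1, hop)]
    window = np.hanning(n_fft)
    return np.abs(np.fft.rfft(np.array(frames) * window, axis=1))

# One second of a 440 Hz tone at 16 kHz stands in for an audio file.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # → (97, 257): one row per frame, one column per frequency bin
```

In DeepSpectrum, the equivalent of `spec` is plotted (e.g. as a mel spectrogram), and the activations of a chosen CNN layer on that plot become the feature vector.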
(c) 2017-2020 Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn Schuller: Universität Augsburg. Published under GPLv3; see the LICENSE.md file for details.
Please direct any questions or requests to Shahin Amiriparian (shahin.amiriparian at tum.de) or Maurice Gerczuk (maurice.gerczuk at informatik.uni-augsburg.de).
If you use DeepSpectrum or any code from DeepSpectrum in your research work, you are kindly asked to acknowledge the use of DeepSpectrum in your publications.
S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag, S. Pugachevskiy, and B. Schuller, “Snore Sound Classification Using Image-based Deep Spectrum Features,” in Proceedings INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3512–3516, ISCA, August 2017.
@inproceedings{amiriparian2017Snore,
title = {Snore Sound Classification Using Image-Based Deep Spectrum Features},
booktitle = {Interspeech 2017},
author = {Amiriparian, Shahin and Gerczuk, Maurice and Ottl, Sandra and Cummins, Nicholas and Freitag, Michael and Pugachevskiy, Sergey and Baird, Alice and Schuller, Bj{\"o}rn},
year = {2017},
month = aug,
pages = {3512--3516},
publisher = {{ISCA}},
language = {en}
}
The easiest way to install DeepSpectrum is through the official PyPI package, which is built for every release tag on the master branch. For installing other branches or for a more manual approach, you can also use the setup.py script with pip (Linux only), or the environment.yml file for installation through conda (recommended on Windows and macOS).
- Python 3.7
- ffmpeg
We recommend that you first set up and activate a virtual Python 3.7 environment. Then you can install the toolkit via pip:
pip install deepspectrum
Installation is now complete - you can skip to configuration or usage.
You can use the included environment.yml file to create a new virtual python environment with DeepSpectrum by running:
conda env create -f environment.yml
Then activate the environment with:
conda activate DeepSpectrum
Installation is now complete - you can skip to configuration or usage.
We recommend that you install the DeepSpectrum tool into a virtual environment. To do so, first create a new virtual environment:
virtualenv -p python3 ds_virtualenv
This creates a minimal Python installation in the folder "ds_virtualenv". You can choose a different name than "ds_virtualenv" if you like, but this guide assumes that name. You can then activate the virtualenv (Linux):
source ds_virtualenv/bin/activate
Once the virtualenv is activated, the tool can be installed from the source directory (containing setup.py) with this command:
pip install .
Installation is now complete - you can skip to configuration or usage.
DeepSpectrum uses TensorFlow 1.15.2. GPU support should be available automatically, as long as you have CUDA 10.0 installed. If you cannot install CUDA 10.0 globally, you can use Anaconda to install it in a virtual environment alongside DeepSpectrum.
If you just want to start working with ImageNet pretrained keras-applications models, skip to usage. Otherwise, you can adjust your configuration file to use other weights for the supported models. The default file can be found in deep-spectrum/src/cli/deep.conf:
[main]
size = 227
backend = keras
[keras-nets]
vgg16 = imagenet
vgg19 = imagenet
resnet50 = imagenet
inception_resnet_v2 = imagenet
xception = imagenet
densenet121 = imagenet
densenet169 = imagenet
densenet201 = imagenet
mobilenet = imagenet
mobilenet_v2 = imagenet
nasnet_large = imagenet
nasnet_mobile = imagenet
[pytorch-nets]
alexnet=
squeezenet=
googlenet=
Under keras-nets you can define network weights for the supported models. Setting the weights for a model to imagenet is the default and uses ImageNet pretrained models from keras-applications. Three additional networks are also supported through PyTorch: alexnet, squeezenet and googlenet. For these, no definition of the used weights is needed (or possible, for the time being). The downloaded keras-nets are stored in $HOME/.keras.
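Since deep.conf uses standard INI syntax, its structure can be inspected with Python's configparser. The snippet below parses a trimmed-down copy of the default file shown above; it is a sketch for illustration, not part of DeepSpectrum itself:

```python
from configparser import ConfigParser

# A trimmed-down copy of the default deep.conf shown above.
conf_text = """
[main]
size = 227
backend = keras

[keras-nets]
vgg16 = imagenet
"""

config = ConfigParser()
config.read_string(conf_text)
print(config["main"]["size"])         # → 227
print(config["keras-nets"]["vgg16"])  # → imagenet
```

Swapping a weights value in the [keras-nets] section (e.g. pointing vgg16 at a custom weights file) is all that is needed to make the tool use different weights for that model.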
You can access the scripts provided by the tool from the virtual environment by calling deepspectrum. The feature extraction component is provided by the subcommand features.
The command below extracts features from overlapping 1 second windows spaced with a hop size of 0.1 seconds (-t 1 0.1) of the file Train_DE_01.wav. It plots mel spectrograms (-m mel) and feeds them to a pre-trained VGG16 model (-en vgg16). The activations of the fc2 layer (-fl fc2) are finally written to Train_DE_01.arff as feature vectors in arff format. -nl suppresses writing any labels to the output file. The first argument after deepspectrum features must be the path to the audio file(s).
deepspectrum features Train_DE_01.wav -t 1 0.1 -nl -en vgg16 -fl fc2 -m mel -o Train_DE_01.arff
All options can also be displayed using deepspectrum features --help.
Option | Description | Default |
---|---|---|
-o, --output | The location of the output feature file. Supported output formats are: Comma separated value files and arff files. If the specified output file's extension is .arff, arff is chosen as format, otherwise the output will be in comma separated value format. | None |
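The format decision described above comes down to checking the output file's extension. A minimal sketch of that rule (illustrative only, not the tool's actual code):

```python
def output_format(path):
    """Pick the feature file format as described above: .arff → arff, anything else → csv."""
    return "arff" if path.lower().endswith(".arff") else "csv"

print(output_format("Train_DE_01.arff"))  # → arff
print(output_format("features.csv"))      # → csv
```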
Option | Description | Default |
---|---|---|
-t, --window-size-and-hop | Define window and hopsize for feature extraction. E.g -t 1 0.5 extracts features from 1 second chunks every 0.5 seconds. | Extract from the whole audio file. |
-s, --start | Set a start time (in seconds) from which features should be extracted from the audio files. | 0 |
-e, --end | Set an end time until which features should be extracted from the audio files. | None |
Option | Description | Default |
---|---|---|
-m, --mode | Type of plot to use in the system (Choose from: 'spectrogram', 'mel', 'chroma'). | spectrogram |
-fs, --frequency-scale | Scale for the y-axis of the plots used by the system (Choose from: 'linear', 'log' and 'mel'). This is ignored if mode=chroma or mode=mel. | linear |
-fql, --frequency-limit | Specify a limit for the y-axis in the spectrogram plot in frequency. | None |
-d, --delta | If specified, derivatives of the given order of the selected features are displayed in the plots used by the system. | None |
-nm, --number-of-melbands | Number of melbands used for computing the melspectrogram. Only takes effect with mode=mel. | 128 |
-nfft | The length of the FFT window used for creating the spectrograms in number of samples. Consider choosing smaller values when extracting from small segments. | The next power of two from 0.025 x sampling_rate_of_wav |
-cm, --colour-map | Choose a matplotlib colourmap for creating the spectrogram plots. | viridis |
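The default -nfft value ("the next power of two from 0.025 x sampling_rate_of_wav") can be reproduced as follows. This is a sketch of the stated rule, assuming "next" means the smallest power of two at or above that value; it is not the tool's internal code:

```python
import math

def default_nfft(sample_rate):
    """Smallest power of two >= 0.025 * sample_rate, per the default described above."""
    return 2 ** math.ceil(math.log2(0.025 * sample_rate))

print(default_nfft(16000))  # 0.025 * 16000 = 400    → 512
print(default_nfft(44100))  # 0.025 * 44100 = 1102.5 → 2048
```

For very short segments, such a window may cover most of the chunk, which is why the table above suggests choosing smaller -nfft values when extracting from small segments.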
Option | Description | Default |
---|---|---|
-en, --extraction-network | Choose the net for feature extraction as specified in the config file | vgg16 |
-fl, --feature-layer | Name of the layer from which features should be extracted. | fc2 |
You can use csv files for label information or explicitly set a fixed label for all input files. If you use csv files, numerical features are supported (e.g. for regression). If you do neither of those, each file is assigned the name of its parent directory as its label. This can be useful if your folder structure already represents the class labels, e.g.:
data                     Base directory of your data
├─── class0              Directory containing members of 'class0'
|    └─── instance0.wav
├─── class1              Directory containing members of 'class1'
|    └─── instance4.wav
|    └─── ...
└─── class2              Directory containing members of 'class2'
     └─── instance20.wav
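With such a layout, the label of a file is simply the name of its parent directory. A sketch of that lookup using pathlib (illustrative only, not DeepSpectrum's own code):

```python
from pathlib import Path

def label_from_path(wav_path):
    """Derive the class label from the parent directory, as described above."""
    return Path(wav_path).parent.name

print(label_from_path("data/class0/instance0.wav"))  # → class0
```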
Option | Description | Default |
---|---|---|
-lf, --label-file | Specify a comma separated values file containing labels for each .wav file. It has to include a header and the first column must specify the name of the audio file (with extension!) | None |
-tc, --time-continuous | Set labeling of features to time continuous mode. Only works in conjunction with -t and the specified label file has to provide labels for the specified hops in its second column. | False |
-el, --explicit-label | Specify a single label that will be used for every input file explicitly. | None |
-nts, --no-timestamps | Remove timestamps from the output. | Write timestamps in feature file. |
-nl, --no-labels | Remove labels from the output. | Write labels in feature file. |
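A label file for -lf could look like the fragment below. The header text and label values here are made up for illustration; only the structure described above matters: a header row, the audio file name (with extension) in the first column, and the labels in the following column.

```csv
file_name,label
Train_DE_01.wav,0
Train_DE_02.wav,1
```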
Option | Description | Default |
---|---|---|
-so, --spectrogram-out | Specify a folder to save the plots used during extraction as .pngs | None |
-wo, --wav-out | Convenience function to write the chunks of audio data used in the extraction to the specified folder. | None |
Option | Description | Default |
---|---|---|
-np, --number-of-processes | Specify the number of processes used for the extraction. Defaults to the number of available CPU cores | None |
-c, --config | The path to the configuration file used by the program can be given here. If the file does not exist yet, it is created and filled with standard settings. | deep.conf |
--help | Show help. | None |
The tool also provides a command-line utility for extracting CNN descriptors from image data. It can be accessed through deepspectrum image-features with a reduced set of options. As with deepspectrum features, the first argument should be a folder containing the input image files (.png or .jpg). The available options are: -o, -c, -np, -en, -fl, -bs, -lf, -el, -nl and --help. These function the same as described above for deepspectrum features.