felixbur / nkululeko
Machine learning speaker characteristics
License: MIT License
Hey,
Could you provide a complete example, e.g., using emodb data, so the user only needs to adapt the nkululeko/src
and database's root and then type the following to run:
$ python3 my_experiment.py exp_A.ini
Currently, only a guide to making the .ini file is available. It would be perfect to provide an .ini example with a public dataset that can be run with minimal changes.
For your reference, I use the following .ini file but got the error below.
ini file:
[EXP]
root = /home/bagus/audb/emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = /home/bagus/audb/emodb/
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm
Error:
bagus@m049:nkululeko$ python3.7 my_experiment.py emodb.ini
Traceback (most recent call last):
File "my_experiment.py", line 41, in <module>
main('./maschinelle_sprachverarbeitung/experiment_1/exp_A.ini')
File "my_experiment.py", line 17, in main
expr = exp.Experiment(config)
File "/data/github/nkululeko/src/experiment.py", line 29, in __init__
self.name = glob_conf.config['EXP']['name']
File "/usr/lib/python3.7/configparser.py", line 958, in __getitem__
raise KeyError(key)
KeyError: 'EXP'
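For reference, the traceback shows main() being called with the hardcoded path './maschinelle_sprachverarbeitung/experiment_1/exp_A.ini' instead of the file given on the command line, and configparser.read() silently skips missing files, so the empty config only fails later as KeyError: 'EXP'. A minimal defensive sketch (load_config is a hypothetical helper, not current nkululeko code):

```python
import configparser
import sys

def load_config(path):
    """Load an experiment INI file, failing early with a clear message."""
    config = configparser.ConfigParser()
    # configparser.read() returns the list of files it could parse;
    # a missing file is skipped silently, so check the return value.
    if not config.read(path):
        sys.exit(f"config file not found: {path}")
    if "EXP" not in config:
        sys.exit(f"no [EXP] section in {path}")
    return config
```

Calling load_config(sys.argv[1]) instead of a hardcoded path would have surfaced the real problem (wrong file path) immediately.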
Write an interface to https://github.com/openXBOW/openXBOW, a bag-of-audio-words extractor for openSMILE features.
not working in version 0.44.0
Add an example for a convolutional net
The example INI file currently contains a user-specific path, e.g.,
nkululeko/tests/exp_emodb_os_mlp.ini
Line 9 in 8581a0e
I want to request that the user can override the config without editing the config file. The suggested CLI argument is -o or --override. So the command would be:
python -m nkululeko -i tests/exp_emodb_os_mlp.ini -o "exp_emodb_os_mlp.data.emodb=/home/bagus/emodb"
Benefits: No need to modify INI file for testing/trying Nkululeko.
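A sketch of how such an override could be parsed, assuming a SECTION.key=value form (apply_overrides is a hypothetical name, and the real CLI flag syntax may differ from the example above):

```python
import configparser

def apply_overrides(config, overrides):
    """Apply CLI overrides of the form SECTION.key=value to a parsed config."""
    for item in overrides:
        key_part, value = item.split("=", 1)   # split off the value
        section, key = key_part.split(".", 1)  # split SECTION from key
        if section not in config:
            config.add_section(section)
        config[section][key] = value
```

This would run after reading the INI file and before building the experiment, e.g. apply_overrides(cfg, ["DATA.emodb=/home/bagus/emodb"]).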
Also, should the path variable be consistent across datasets? In emodb the dataset path key is named "emodb", but in other datasets (e.g. Android) it is named "data".
Up to now, no real test framework has been implemented.
As a minimum, I use the test_runs.sh script to perform some experiments based on emodb.
to be used for data quality and bias checking
Nkululeko is being written without a real plan, simply adding features under high time pressure when needed for some projects.
This means that most of the code is spaghetti; the best thing would be a complete re-write based on a stable architecture, second best would be to keep the existing architecture and enhance central classes like experiment, runmanager, dataset, etc.
wav2vec2 has been published by Facebook.
It should be interfaced as a feature extractor,
and in a later step with finetuning on the local training data.
With nkululeko, you can
Now what's missing is a way to say:
Most models require a 16 kHz sampling rate, but data might come in other rates, so it would be nice to automatically resample the data.
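As an illustration only, here is a naive resampler; a real implementation would use a proper anti-aliasing filter (e.g. scipy.signal.resample_poly or torchaudio), so treat this purely as a sketch of the idea:

```python
import numpy as np

def to_16khz(signal, orig_sr, target_sr=16000):
    """Resample a 1-D signal by linear interpolation (no anti-aliasing)."""
    if orig_sr == target_sr:
        return signal
    n_out = int(len(signal) * target_sr / orig_sr)
    x_old = np.linspace(0.0, 1.0, num=len(signal), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, signal)
```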
Support common and new datasets. At a minimum: document how to convert a user's dataset or common speech datasets into CSV or audformat that matches nkululeko (currently only emodb works out of the box). Suggested (I have seen some in the examples): RAVDESS, IEMOCAP, MSP-IMPROV, MSP-Podcast, AESDD, CaFE, CREMA-D, EMOVO, SAVEE, EmoFilm, etc. This can be achieved by providing a preprocessing.py for each dataset.
Make a nkululeko GUI
New users struggle with the command line; they are used to point and click.
Simply make a GUI that lets users select an INI file, perhaps edit it, and shows them the result of a nkululeko call.
I guess it would be better if the labels did not need to be given explicitly but were read from the data file automatically.
I mean that the labels [anger, disgust, happy, ...] are already in the data. Currently you have to tell nkululeko which labels to use, but if you want all of them, that shouldn't be necessary.
For regression I would define default binning,
e.g. automatically assign three bins (low, medium, high) and choose the borders so that the samples are equally distributed.
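With pandas this equal-frequency binning is essentially a one-liner (bin_target is a hypothetical name):

```python
import pandas as pd

def bin_target(values, labels=("low", "medium", "high")):
    """Bin a continuous target into len(labels) equally populated bins."""
    # qcut picks the borders from quantiles, so each bin gets roughly
    # the same number of samples (equal distribution, not equal width).
    return pd.qcut(values, q=len(labels), labels=list(labels))
```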
I keep having problems with datasets that contain wav files that are
It would be nice to have a flag that checks the data before processing (i.e. train and devel) and removes faulty files.
This issue is related to #61.
If the number of given labels is greater than the number of actual labels, the confusion matrix will not be printed and an error is reported:
ERROR reporter: mismatch between confmatrix dim (11) and labels length (12: ['boredom', 'neutral', 'happy', 'sad', 'angry', 'fear', 'disgust', 'surprise', 'excited', 'pleasure', 'pain', 'disapointed'])
So, automatic label detection is needed to avoid this error. Labels can still be given explicitly if the user wants to evaluate specific labels instead of all of them (their number must then not exceed the number of actual labels).
Possible workaround:
An example is using ASVP-ESD dataset (exp.ini) and TESS.
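A sketch of the automatic detection, assuming the data is available as a pandas DataFrame with the target column (detect_labels is hypothetical, not existing nkululeko code):

```python
import pandas as pd

def detect_labels(df, target):
    """Return the sorted set of labels actually present in the data."""
    return sorted(df[target].dropna().unique().tolist())
```

The detected list could then be cross-checked against the labels entry in the INI file, warning when a configured label does not occur in the data.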
Provide tutorials on EmoDB, RAVDESS, and CREMA-D using Google Colab and/or Kaggle.
Put them under the docs.
Currently only target/gender distribution plots are available,
but there might be age, gender, emotion, duration, SNR, etc. information available.
Change value_counts in [EXPL] so that
[DATA]
target = emotion
[EXPL]
value_counts = [['gender'], ['age', 'duration']]
would result in two plots: one showing the distribution of gender per emotion, and the other the scatterplot of age and duration colored by emotion.
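The underlying numbers for such plots could be computed roughly like this (value_count_tables is a hypothetical sketch, not existing nkululeko code):

```python
import pandas as pd

def value_count_tables(df, target, groups):
    """Build one table per requested group for later plotting."""
    tables = {}
    for cols in groups:
        if len(cols) == 1:
            # e.g. distribution of gender per emotion -> bar plot
            tables[cols[0]] = pd.crosstab(df[target], df[cols[0]])
        else:
            # e.g. ['age', 'duration'] -> scatterplot colored by target
            tables[tuple(cols)] = df[[target] + list(cols)]
    return tables
```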
Currently distribution plots like t-SNE or PCA are only two-dimensional, but since we can plot three dimensions, we should do so.
It would be nice to add them for comparison to wav2vec 2.
Make experiments persistent/re-loadable to continue or demo over them
There is no version information in the current release; future releases should include it (a standard feature in scientific packages like numpy, scipy, pandas, etc.):
>>> import nkululeko
>>> nkululeko.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'nkululeko' has no attribute '__version__'
>>> nkululeko.__file__
'/home/bagus/github/nkululeko/nkululeko/__init__.py'
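One common way to provide this without maintaining the number in two places is to read it from the installed package metadata; a sketch of what nkululeko/__init__.py could do (the "unknown" fallback string is an assumption):

```python
from importlib.metadata import PackageNotFoundError, version

def get_version(package):
    """Look up the installed version of a package, with a safe fallback."""
    try:
        return version(package)
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never installed
        return "unknown"

# in nkululeko/__init__.py this would reduce to:
# __version__ = get_version("nkululeko")
```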
When I run the following command,
python -m nkululeko.explore --config tests/exp_emodb_explore_data.ini
I got the following error:
ERROR plots: plot value counts: target gender has more than 2 values
I think the error is caused by the fact that the audEERING model results in three gender values instead of two.
Add a live demo mode for the best models.
The code is almost entirely undocumented.
https://realpython.com/documenting-python-code/
There's a new public model from audeering finetuned on age and gender.
It could be interfaced just like the emotion model,
and actually be used to soft-label data that is not age/gender labeled.
Add the wav2vec2 model from audeering finetuned on emotional dimensions
https://github.com/audeering/w2v2-how-to
as feature embeddings
E.g. combine embeddings and expert features simply by concatenation.
Currently nkululeko is still marked alpha, but I wonder if it shouldn't be stable?
According to this website, these are the criteria:
4 - Beta
Required properties: Software is feature complete.
Typical steps: External testing, fixing bugs and performance problems. Usability testing.
Semantic version: 0.X.X
[PyPI](https://pypi.org/search/?q=&o=&c=Development+Status+%3A%3A+4+-+Beta): 10,000+ projects, including [pint](https://pypi.org/project/Pint/)
5 - Production/Stable
Required properties: No major bugs known, tests cover the most important cases.
Typical steps: Fixing bugs, adding updates and new features.
Semantic version:
Nkululeko only knows two splits: train and test.
but it would be more correct to name the "test" split "dev" (short for development), as we kind of always use it to optimize a model.
Any thoughts?
Add an example for melspec feature extraction
Hi @felixbur,
Thanks for the new video tutorial, I followed it. Using the experiment.ini provided in the Usage section, I got an error when trying to plot the confusion matrix combined per speaker (I still got results).
(nkululeko) bagus@m049:nkulu_work$ python -m nkululeko.nkululeko --config exp_emodb.ini
DEBUG: running exp_emodb, nkululeko version 0.44.0
DEBUG: emodb: loading ...
DEBUG: emodb: loading from ./emodb/
DEBUG: emodb: loading tables: []
DEBUG: Loaded database emodb with 535 samples: got targets: True, got speakers: True, got sexes: True, got age: True
DEBUG: emodb: loaded data with 535 samples: got targets: True, got speakers: True, got sexes: True
DEBUG: splitting database emodb with strategy speaker_split
DEBUG: emodb: [273/262] samples in train/test
DEBUG: emodb: 262 samples in test and 273 samples in train
DEBUG: Categories test: ['anger', 'fear', 'boredom', 'disgust']
DEBUG: Categories train: ['disgust', 'anger', 'boredom', 'fear']
DEBUG: 5 speakers in test and 5 speakers in train
DEBUG: train shape : (172, 5), test shape:(151, 5)
DEBUG: extracting Praat features, this might take a while...
praat: extracting file 0 of 172
praat: extracting file 10 of 172
praat: extracting file 20 of 172
praat: extracting file 30 of 172
praat: extracting file 40 of 172
praat: extracting file 50 of 172
praat: extracting file 60 of 172
praat: extracting file 70 of 172
praat: extracting file 80 of 172
praat: extracting file 90 of 172
praat: extracting file 100 of 172
praat: extracting file 110 of 172
praat: extracting file 120 of 172
praat: extracting file 130 of 172
praat: extracting file 140 of 172
praat: extracting file 150 of 172
praat: extracting file 160 of 172
praat: extracting file 170 of 172
Warning: 1892 infinite in x
<class 'numpy.ndarray'>
DEBUG: praat feature names: Index(['duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
'fitch_vtl', 'delta_f', 'vtl_delta_f'],
dtype='object')
DEBUG: praat: shape : (172, 32)
DEBUG: extracting Praat features, this might take a while...
praat: extracting file 0 of 151
praat: extracting file 10 of 151
praat: extracting file 20 of 151
praat: extracting file 30 of 151
praat: extracting file 40 of 151
praat: extracting file 50 of 151
praat: extracting file 60 of 151
praat: extracting file 70 of 151
praat: extracting file 80 of 151
praat: extracting file 90 of 151
praat: extracting file 100 of 151
praat: extracting file 110 of 151
praat: extracting file 120 of 151
praat: extracting file 130 of 151
praat: extracting file 140 of 151
praat: extracting file 150 of 151
Warning: 1661 infinite in x
<class 'numpy.ndarray'>
DEBUG: praat feature names: Index(['duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
'fitch_vtl', 'delta_f', 'vtl_delta_f'],
dtype='object')
DEBUG: praat: shape : (151, 32)
DEBUG: All features: train shape : (172, 32), test shape:(151, 32)
DEBUG: run 0
DEBUG: run: 0 epoch: 0: result: test: 0.498 UAR
DEBUG: plotting confusion matrix to emodb_svm_praat__0_000_cnf
DEBUG: labels: ['anger' 'boredom' 'disgust' 'fear']
DEBUG: result per class (F1 score): [0.745, 0.846, 0.102, 0.4]
151 151 151
DEBUG: plotting speaker combination (True) confusion matrix to result_combined_per_speaker
ERROR: unkown function True
It seems that the argument True for combined_per_speaker in the INI file is parsed as a function name.
Currently all runs overwrite each other; perhaps add subfolders per run?
Currently, a nkululeko environment is quite large and needs numerous packages to be installed. It would be great to check for each of them whether it could be replaced or at least reduced in size.
Make the layer from which the wav2vec embeddings are taken configurable.
Currently it's always the last hidden one.
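The selection itself is simple once all hidden states are kept; with HuggingFace transformers the per-layer states come from model(..., output_hidden_states=True).hidden_states. A framework-agnostic sketch (layer_embedding is a hypothetical name):

```python
import numpy as np

def layer_embedding(hidden_states, layer=-1):
    """Mean-pool one chosen layer over time into an utterance embedding.

    hidden_states: sequence of per-layer arrays, each shaped (time, dim).
    layer=-1 reproduces the current behavior (last hidden layer).
    """
    return np.asarray(hidden_states[layer]).mean(axis=0)
```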
not working in version 0.44.0
Currently, the filename in the database (CSV) must contain a full path instead of only a basename. In most cases, the provider of a dataset only supplies a file with a list of basenames (for platform independence). So, I would like to request adding audio_path to the DATA section of the INI file.
This can be optional: if the option is not given, Nkululeko searches for the file path given in the CSV file (current behavior).
Example usage (see train.audio_path and dev.audio_path):
[DATA]
databases = ['train', 'test', 'dev']
train = ./data/ravdess/ravdess_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
train.audio_path = ./data/ravdess/ravdess_speech
dev = ./data/ravdess/ravdess_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
dev.audio_path = ./data/ravdess/ravdess_speech
One important note is that Nkululeko should be able to find audio files inside subdirectories of the given audio_path, since database creators sometimes split their audio files into subdirectories instead of keeping them in a single directory.
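A sketch of such a recursive lookup with pathlib (resolve_audio is hypothetical; a real implementation would probably cache the directory scan rather than calling rglob per file):

```python
from pathlib import Path

def resolve_audio(basename, audio_path):
    """Find a file by basename anywhere below audio_path."""
    matches = list(Path(audio_path).rglob(basename))
    if not matches:
        raise FileNotFoundError(f"{basename} not found under {audio_path}")
    return str(matches[0])
```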
Actually, I want to evaluate my experiment here without much effort with Nkululeko :)
Why is
[EXPL]
sample_selection = all
needed? It should at least have a default.
add plots describing the training and evaluation datasets
To do experiments on label propagation: add the ability to export the results of a test process as its own data table that can then be added to the training data.
Somehow inform the user about correlation metrics between the target and other factors like gender, MOS or age in the explore module.
It would be straightforward to collect all the info from the explore module and generate a PDF report from LaTeX sources based on text templates and the database statistics and distribution plots.
Find a framework to estimate articulatory from acoustic features and integrate it.
Add some augmentation modules, e.g. adding noise or bandpass filters.
Add a filter that only uses data points where a specific column has a specific value.
This is a generalization of the sex filter.
e.g.
[DATA]
filter = [['sex', 'female'], ['style', 'reading']]
would use only the data where sex is female and style is reading
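In pandas terms the filter would reduce to something like this (apply_filters is a hypothetical sketch):

```python
import pandas as pd

def apply_filters(df, filters):
    """Keep only rows where every (column, value) pair matches."""
    for column, value in filters:
        df = df[df[column] == value]
    return df
```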
Currently, LOSO / k-fold cross-validation is only possible with sklearn classifiers/regressors because the sklearn functions are used.
Handle it within nkululeko somehow.
Based on VAD; probably simplest to have its own module.
From existing models, predict audio properties, e.g.
and then display data distribution of target task per property, to detect bias in the data
Even though I am already using check_size for the file checker, I still get the following error when using praat features. The error does not show when I change the feature type to os.
DEBUG nkululeko: running results/exp_asvp_praat from config data/asvp-esd/exp.ini, nkululeko version 0.64.0
DEBUG dataset: loading train
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database train with 10100 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: train: loaded data with 10100 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: loading test
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database test with 2525 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: test: loaded data with 2525 samples: got targets: True, got speakers: False, got sexes: False
DEBUG experiment: loaded databases train,test
DEBUG dataset: splitting database train with strategy train
DEBUG dataset: train: 0 samples in test and 10100 samples in train
DEBUG dataset: value for strategy not found, using default: train_test
DEBUG experiment: warn: train test empty
DEBUG dataset: splitting database test with strategy test
DEBUG dataset: test: 2525 samples in test and 0 samples in train
DEBUG dataset: value for strategy not found, using default: train_test
DEBUG experiment: warn: test train empty
DEBUG filechecker: : checked for samples less than 1000 bytes, reduced samples from 8786 to 8786
DEBUG filechecker: : checked for samples less than 1000 bytes, reduced samples from 2214 to 2214
DEBUG experiment: value for filter.sample_selection not found, using default: all
DEBUG experiment: value for type not found, using default: dummy
DEBUG experiment: Categories test: ['pain' 'happy' 'fear' 'excited' 'neutral' 'surprise' 'sad' 'disgust'
'pleasure' 'disapointed' 'boredom']
DEBUG experiment: Categories train: ['happy' 'boredom' 'neutral' 'sad' 'surprise' 'disgust' 'pain' 'excited'
'fear' 'disapointed' 'pleasure']
DEBUG nkululeko: train shape : (8786, 4), test shape:(2214, 4)
DEBUG featureset: value for store_format not found, using default: pkl
DEBUG featureset: extracting Praat features, this might take a while...
praat: extracting file 0 of 8786
praat: extracting file 10 of 8786
praat: extracting file 20 of 8786
praat: extracting file 30 of 8786
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 58, in <module>
main(cwd) # use this if you want to state the config file path on command line
File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 46, in main
expr.extract_feats()
File "/home/bagus/github/nkululeko/nkululeko/experiment.py", line 334, in extract_feats
self.feats_train =self.feature_extractor.extract()
File "/home/bagus/github/nkululeko/nkululeko/feature_extractor.py", line 135, in extract
self.featExtractor.extract()
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feats_praat.py", line 28, in extract
self.df = feinberg_praat.compute_features(self.data_df.index)
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feinberg_praat.py", line 172, in compute_features
(f1_mean, f2_mean, f3_mean, f4_mean, f1_median, f2_median, f3_median, f4_median) = measureFormants(
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feinberg_praat.py", line 86, in measureFormants
f1_mean = statistics.mean(f1_list)
File "/usr/lib/python3.8/statistics.py", line 315, in mean
raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
This could be related to how the features inside Praat are calculated.
An example is the ASVP-ESD dataset with the following INI file.
[EXP]
root = ./
name = results/exp_asvp_os
save = True
[DATA]
databases = ['train', 'test']
train = ./data/asvp-esd/asvp_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
test = ./data/asvp-esd/asvp_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion
; no_reuse = True
labels =["boredom","neutral","happy", "sad","angry", "fear", "disgust", "surprise", "excited","pleasure","pain","disapointed"]
check_size = 1000
; min_duration_of_samples = 1
[FEATS]
type = ['praat']
; scale = standard
[MODEL]
type = svm
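The traceback ends in statistics.mean() being called on an empty formant list in feinberg_praat.py, i.e. Praat found no voiced frames for at least one file. A hedged fix would be to guard the aggregation (safe_mean is a hypothetical helper, not the actual nkululeko code):

```python
import math
import statistics

def safe_mean(values):
    """Mean of a list, or NaN when Praat returned no measurements.

    Empty formant lists (e.g. very short or silent files) otherwise
    raise statistics.StatisticsError: mean requires at least one data point.
    """
    return statistics.mean(values) if values else math.nan
```

Downstream code would then need to tolerate NaN features, e.g. by imputing or dropping the affected rows.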