felixbur / nkululeko
Machine learning speaker characteristics
License: MIT License
Hey,
Could you provide a complete example, e.g., using emodb data, so the user only needs to adapt the nkululeko/src
and database's root and then type the following to run:
$ python3 my_experiment.py exp_A.ini
Currently, only a guide to making the .ini file is available. It would be perfect to provide an .ini example with a public dataset that can be run with minimal changes.
For your reference, I use the following .ini file but got the error below.
ini file:
[EXP]
root = /home/bagus/audb/emodb/
name = exp_emodb
[DATA]
databases = ['emodb']
emodb = /home/bagus/audb/emodb/
emodb.split_strategy = speaker_split
emodb.testsplit = 40
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
[FEATS]
type = os
[MODEL]
type = svm
Error:
bagus@m049:nkululeko$ python3.7 my_experiment.py emodb.ini
Traceback (most recent call last):
File "my_experiment.py", line 41, in <module>
main('./maschinelle_sprachverarbeitung/experiment_1/exp_A.ini')
File "my_experiment.py", line 17, in main
expr = exp.Experiment(config)
File "/data/github/nkululeko/src/experiment.py", line 29, in __init__
self.name = glob_conf.config['EXP']['name']
File "/usr/lib/python3.7/configparser.py", line 958, in __getitem__
raise KeyError(key)
KeyError: 'EXP'
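For reference, the traceback shows main() being called with the hardcoded path './maschinelle_sprachverarbeitung/experiment_1/exp_A.ini' instead of the file given on the command line, and configparser.read() silently skips missing files, so the empty config only fails later as KeyError: 'EXP'. A minimal defensive sketch (load_config is a hypothetical helper, not current nkululeko code):

```python
import configparser
import sys

def load_config(path):
    """Load an experiment INI file, failing early with a clear message."""
    config = configparser.ConfigParser()
    # configparser.read() returns the list of files it could parse;
    # a missing file is skipped silently, so check the return value.
    if not config.read(path):
        sys.exit(f"config file not found: {path}")
    if "EXP" not in config:
        sys.exit(f"no [EXP] section in {path}")
    return config
```

Calling load_config(sys.argv[1]) instead of a hardcoded path would have surfaced the real problem (wrong file path) immediately.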
Write an interface to https://github.com/openXBOW/openXBOW, a bag-of-audio-words extractor for openSMILE features.
not working in version 0.44.0
Add an example for a convolutional net
The example INI file currently contains a user-specific path, e.g.,
nkululeko/tests/exp_emodb_os_mlp.ini
Line 9 in 8581a0e
I want to request that the user can override the config without editing the config file. The suggested CLI argument is -o or --override. So the command would be:
python -m nkululeko -i tests/exp_emodb_os_mlp.ini -o "exp_emodb_os_mlp.data.emodb=/home/bagus/emodb"
Benefits: No need to modify INI file for testing/trying Nkululeko.
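A sketch of how such an override could be parsed, assuming a SECTION.key=value form (apply_overrides is a hypothetical name, and the real CLI flag syntax may differ from the example above):

```python
import configparser

def apply_overrides(config, overrides):
    """Apply CLI overrides of the form SECTION.key=value to a parsed config."""
    for item in overrides:
        key_part, value = item.split("=", 1)   # split off the value
        section, key = key_part.split(".", 1)  # split SECTION from key
        if section not in config:
            config.add_section(section)
        config[section][key] = value
```

This would run after reading the INI file and before building the experiment, e.g. apply_overrides(cfg, ["DATA.emodb=/home/bagus/emodb"]).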
Also, should the path variable be consistent across datasets? In emodb the dataset path key is named "emodb", but in other datasets (e.g. Android) it is named "data".
Up to now, no real test framework has been implemented.
As a minimum, I use the test_runs.sh script to perform some experiments based on emodb.
to be used for data quality and bias checking
Nkululeko is being written without a real plan, simply adding features under high time pressure when needed for some projects.
This means that most of the code is spaghetti; the best thing would be a complete re-write based on a stable architecture, second best would be to keep the existing architecture and enhance central classes like experiment, runmanager, dataset, etc.
wav2vec2 has been published by Facebook.
It should be interfaced as a feature extractor,
and in a later step with finetuning on the local training data.
With nkululeko, you can
Now what's missing is a way to say:
Most models require a 16 kHz sampling rate, but data might come in other rates, so it would be nice to automatically resample the data.
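As an illustration only, here is a naive resampler; a real implementation would use a proper anti-aliasing filter (e.g. scipy.signal.resample_poly or torchaudio), so treat this purely as a sketch of the idea:

```python
import numpy as np

def to_16khz(signal, orig_sr, target_sr=16000):
    """Resample a 1-D signal by linear interpolation (no anti-aliasing)."""
    if orig_sr == target_sr:
        return signal
    n_out = int(len(signal) * target_sr / orig_sr)
    x_old = np.linspace(0.0, 1.0, num=len(signal), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, signal)
```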
Support common and new datasets. At a minimum: document how to convert a user's dataset or common speech datasets into CSV or audformat that matches nkululeko (currently only emodb works out of the box). Suggested (I have seen some in the examples): RAVDESS, IEMOCAP, MSP-IMPROV, MSP-Podcast, AESDD, CaFE, CREMA-D, EMOVO, SAVEE, EmoFilm, etc. This can be achieved by providing a preprocessing.py for each dataset.
Make a nkululeko GUI
New users struggle with the command line; they are used to point and click.
Simply make a GUI that lets users select an INI file, perhaps edit it, and shows them the result of a nkululeko call.
I guess it would be better if the labels did not need to be given explicitly but were read from the data file automatically.
I mean that the labels [anger, disgust, happy, ...] are already in the data. Currently you have to tell nkululeko which labels to use, but if you want all of them, that shouldn't be necessary.
For regression I would define default binning,
e.g. automatically assign three bins (low, medium, high) and choose the borders so that the samples are equally distributed.
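With pandas this equal-frequency binning is essentially a one-liner (bin_target is a hypothetical name):

```python
import pandas as pd

def bin_target(values, labels=("low", "medium", "high")):
    """Bin a continuous target into len(labels) equally populated bins."""
    # qcut picks the borders from quantiles, so each bin gets roughly
    # the same number of samples (equal distribution, not equal width).
    return pd.qcut(values, q=len(labels), labels=list(labels))
```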
I keep having problems with datasets that contain wav files that are
It would be nice to have a flag that checks the data before processing (i.e. train and devel) and removes faulty files.
This issue is related to #61.
If the number of given labels is greater than the number of actual labels, the confusion matrix will not be printed and an error is reported:
ERROR reporter: mismatch between confmatrix dim (11) and labels length (12: ['boredom', 'neutral', 'happy', 'sad', 'angry', 'fear', 'disgust', 'surprise', 'excited', 'pleasure', 'pain', 'disapointed'])
So, automatic label detection is needed to avoid this error. Labels can still be given explicitly if the user wants to evaluate specific labels instead of all of them (their number must then not exceed the number of actual labels).
Possible workaround:
An example is using ASVP-ESD dataset (exp.ini) and TESS.
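A sketch of the automatic detection, assuming the data is available as a pandas DataFrame with the target column (detect_labels is hypothetical, not existing nkululeko code):

```python
import pandas as pd

def detect_labels(df, target):
    """Return the sorted set of labels actually present in the data."""
    return sorted(df[target].dropna().unique().tolist())
```

The detected list could then be cross-checked against the labels entry in the INI file, warning when a configured label does not occur in the data.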
Provide tutorials on EmoDB, RAVDESS, and CREMA-D using Google Colab and/or Kaggle.
Put them under the docs.
Currently only target/gender distribution plots are available,
but there might be age, gender, emotion, duration, SNR, etc. information available.
Change value_counts in [EXPL] so that
[DATA]
target = emotion
[EXPL]
value_counts = [['gender'], ['age', 'duration']]
would result in two plots: one showing the distribution of gender per emotion, and the other the scatterplot of age and duration colored by emotion.
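The underlying numbers for such plots could be computed roughly like this (value_count_tables is a hypothetical sketch, not existing nkululeko code):

```python
import pandas as pd

def value_count_tables(df, target, groups):
    """Build one table per requested group for later plotting."""
    tables = {}
    for cols in groups:
        if len(cols) == 1:
            # e.g. distribution of gender per emotion -> bar plot
            tables[cols[0]] = pd.crosstab(df[target], df[cols[0]])
        else:
            # e.g. ['age', 'duration'] -> scatterplot colored by target
            tables[tuple(cols)] = df[[target] + list(cols)]
    return tables
```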
Currently distribution plots like t-SNE or PCA are only two-dimensional, but since we can plot three dimensions, we should do so.
It would be nice to add them for comparison to wav2vec 2.
Make experiments persistent/re-loadable to continue or demo over them
There is no version information in the current release; future releases should include it (a standard feature in scientific packages like numpy, scipy, pandas, etc.):
>>> import nkululeko
>>> nkululeko.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'nkululeko' has no attribute '__version__'
>>> nkululeko.__file__
'/home/bagus/github/nkululeko/nkululeko/__init__.py'
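One common way to provide this without maintaining the number in two places is to read it from the installed package metadata; a sketch of what nkululeko/__init__.py could do (the "unknown" fallback string is an assumption):

```python
from importlib.metadata import PackageNotFoundError, version

def get_version(package):
    """Look up the installed version of a package, with a safe fallback."""
    try:
        return version(package)
    except PackageNotFoundError:
        # e.g. running from a source checkout that was never installed
        return "unknown"

# in nkululeko/__init__.py this would reduce to:
# __version__ = get_version("nkululeko")
```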
When I run the following command,
python -m nkululeko.explore --config tests/exp_emodb_explore_data.ini
I got the following error:
ERROR plots: plot value counts: target gender has more than 2 values
I think the error is caused by the fact that the audEERING model results in three gender values instead of two.
Add a live demo mode for the best models.
The code is almost entirely undocumented.
https://realpython.com/documenting-python-code/
There's a new public model from audeering finetuned on age and gender.
It could be interfaced just like the emotion model,
and actually be used to soft-label data that is not age/gender labeled.
Add the wav2vec2 model from audeering finetuned on emotional dimensions
https://github.com/audeering/w2v2-how-to
as feature embeddings
E.g. combine embeddings and expert features simply by concatenation.
Currently nkululeko is still marked alpha, but I wonder if it shouldn't be stable?
According to this website, these are the criteria:
4 - Beta
Required properties: Software is feature complete.
Typical steps: External testing, fixing bugs and performance problems. Usability testing.
Semantic version: 0.X.X
[PyPI](https://pypi.org/search/?q=&o=&c=Development+Status+%3A%3A+4+-+Beta): 10,000+ projects, including [pint](https://pypi.org/project/Pint/)
5 - Production/Stable
Required properties: No major bugs known, tests cover the most important cases.
Typical steps: Fixing bugs, adding updates and new features.
Semantic version:
Nkululeko only knows two splits: train and test.
but it would be more correct to name the "test" split "dev" (short for development), as we kind of always use it to optimize a model.
Any thoughts?
Add an example for melspec feature extraction
Hi @felixbur,
Thanks for the new video tutorial, I followed it. Using the experiment.ini provided in the Usage section, I got an error when trying to plot the confusion matrix combined per speaker (I still got results).
(nkululeko) bagus@m049:nkulu_work$ python -m nkululeko.nkululeko --config exp_emodb.ini
DEBUG: running exp_emodb, nkululeko version 0.44.0
DEBUG: emodb: loading ...
DEBUG: emodb: loading from ./emodb/
DEBUG: emodb: loading tables: []
DEBUG: Loaded database emodb with 535 samples: got targets: True, got speakers: True, got sexes: True, got age: True
DEBUG: emodb: loaded data with 535 samples: got targets: True, got speakers: True, got sexes: True
DEBUG: splitting database emodb with strategy speaker_split
DEBUG: emodb: [273/262] samples in train/test
DEBUG: emodb: 262 samples in test and 273 samples in train
DEBUG: Categories test: ['anger', 'fear', 'boredom', 'disgust']
DEBUG: Categories train: ['disgust', 'anger', 'boredom', 'fear']
DEBUG: 5 speakers in test and 5 speakers in train
DEBUG: train shape : (172, 5), test shape:(151, 5)
DEBUG: extracting Praat features, this might take a while...
praat: extracting file 0 of 172
praat: extracting file 10 of 172
praat: extracting file 20 of 172
praat: extracting file 30 of 172
praat: extracting file 40 of 172
praat: extracting file 50 of 172
praat: extracting file 60 of 172
praat: extracting file 70 of 172
praat: extracting file 80 of 172
praat: extracting file 90 of 172
praat: extracting file 100 of 172
praat: extracting file 110 of 172
praat: extracting file 120 of 172
praat: extracting file 130 of 172
praat: extracting file 140 of 172
praat: extracting file 150 of 172
praat: extracting file 160 of 172
praat: extracting file 170 of 172
Warning: 1892 infinite in x
<class 'numpy.ndarray'>
DEBUG: praat feature names: Index(['duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
'fitch_vtl', 'delta_f', 'vtl_delta_f'],
dtype='object')
DEBUG: praat: shape : (172, 32)
DEBUG: extracting Praat features, this might take a while...
praat: extracting file 0 of 151
praat: extracting file 10 of 151
praat: extracting file 20 of 151
praat: extracting file 30 of 151
praat: extracting file 40 of 151
praat: extracting file 50 of 151
praat: extracting file 60 of 151
praat: extracting file 70 of 151
praat: extracting file 80 of 151
praat: extracting file 90 of 151
praat: extracting file 100 of 151
praat: extracting file 110 of 151
praat: extracting file 120 of 151
praat: extracting file 130 of 151
praat: extracting file 140 of 151
praat: extracting file 150 of 151
Warning: 1661 infinite in x
<class 'numpy.ndarray'>
DEBUG: praat feature names: Index(['duration', 'meanF0Hz', 'stdevF0Hz', 'HNR', 'localJitter',
'localabsoluteJitter', 'rapJitter', 'ppq5Jitter', 'ddpJitter',
'localShimmer', 'localdbShimmer', 'apq3Shimmer', 'apq5Shimmer',
'apq11Shimmer', 'ddaShimmer', 'f1_mean', 'f2_mean', 'f3_mean',
'f4_mean', 'f1_median', 'f2_median', 'f3_median', 'f4_median',
'JitterPCA', 'ShimmerPCA', 'pF', 'fdisp', 'avgFormant', 'mff',
'fitch_vtl', 'delta_f', 'vtl_delta_f'],
dtype='object')
DEBUG: praat: shape : (151, 32)
DEBUG: All features: train shape : (172, 32), test shape:(151, 32)
DEBUG: run 0
DEBUG: run: 0 epoch: 0: result: test: 0.498 UAR
DEBUG: plotting confusion matrix to emodb_svm_praat__0_000_cnf
DEBUG: labels: ['anger' 'boredom' 'disgust' 'fear']
DEBUG: result per class (F1 score): [0.745, 0.846, 0.102, 0.4]
151 151 151
DEBUG: plotting speaker combination (True) confusion matrix to result_combined_per_speaker
ERROR: unkown function True
It seems that the argument True for combined_per_speaker in the INI file is parsed as a function name.
Currently all runs overwrite each other; perhaps add subfolders per run?
Currently, a nkululeko environment is quite large and needs numerous packages to be installed. It would be great to check for each of them whether it could be replaced or at least reduced in size.
Make the layer from which the wav2vec embeddings are taken configurable.
Currently it's always the last hidden one.
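The selection itself is simple once all hidden states are kept; with HuggingFace transformers the per-layer states come from model(..., output_hidden_states=True).hidden_states. A framework-agnostic sketch (layer_embedding is a hypothetical name):

```python
import numpy as np

def layer_embedding(hidden_states, layer=-1):
    """Mean-pool one chosen layer over time into an utterance embedding.

    hidden_states: sequence of per-layer arrays, each shaped (time, dim).
    layer=-1 reproduces the current behavior (last hidden layer).
    """
    return np.asarray(hidden_states[layer]).mean(axis=0)
```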
not working in version 0.44.0
Currently, the filename in the database (CSV) must contain a full path instead of only a basename. In most cases, the provider of a dataset only supplies a file with a list of basenames (for platform independence). So, I would like to request adding audio_path to the DATA section of the INI file.
This can be optional: if the option is not given, Nkululeko searches for the file path given in the CSV file (current behavior).
Example usage (see train.audio_path and dev.audio_path):
[DATA]
databases = ['train', 'test', 'dev']
train = ./data/ravdess/ravdess_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
train.audio_path = ./data/ravdess/ravdess_speech
dev = ./data/ravdess/ravdess_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
dev.audio_path = ./data/ravdess/ravdess_speech
One important note is that Nkululeko should be able to find audio files inside subdirectories of the given audio_path, since database creators sometimes split their audio files into subdirectories instead of keeping them in a single directory.
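A sketch of such a recursive lookup with pathlib (resolve_audio is hypothetical; a real implementation would probably cache the directory scan rather than calling rglob per file):

```python
from pathlib import Path

def resolve_audio(basename, audio_path):
    """Find a file by basename anywhere below audio_path."""
    matches = list(Path(audio_path).rglob(basename))
    if not matches:
        raise FileNotFoundError(f"{basename} not found under {audio_path}")
    return str(matches[0])
```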
Actually, I want to evaluate my experiment here without much effort with Nkululeko :)
Why is
[EXPL]
sample_selection = all
needed? It should at least have a default.
add plots describing the training and evaluation datasets
To do experiments on label propagation: add the ability to export the results of a test process as its own data table that can then be added to the training data.
Somehow inform the user about correlation metrics between the target and other factors like gender, MOS or age in the explore module.
It would be straightforward to collect all the info from the explore module and generate a PDF report from LaTeX sources based on text templates and the database statistics and distribution plots.
Find a framework to estimate articulatory from acoustic features and integrate it.
Add some augmentation modules, e.g. adding noise or bandpass filters.
Add a filter that only uses data points where a specific column has a specific value.
This is a generalization of the sex filter.
e.g.
[DATA]
filter = [['sex', 'female'], ['style', 'reading']]
would use only the data where sex is female and style is reading
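In pandas terms the filter would reduce to something like this (apply_filters is a hypothetical sketch):

```python
import pandas as pd

def apply_filters(df, filters):
    """Keep only rows where every (column, value) pair matches."""
    for column, value in filters:
        df = df[df[column] == value]
    return df
```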
Currently, LOSO / k-fold cross-validation is only possible with sklearn classifiers/regressors because the sklearn functions are used.
Handle it within nkululeko somehow.
Based on VAD; probably simplest to have its own module.
From existing models, predict audio properties, e.g.
and then display data distribution of target task per property, to detect bias in the data
Even though I am already using check_size for the file checker, I still get the following error when using praat features. The error does not show when I change the feature type to os.
DEBUG nkululeko: running results/exp_asvp_praat from config data/asvp-esd/exp.ini, nkululeko version 0.64.0
DEBUG dataset: loading train
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database train with 10100 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: train: loaded data with 10100 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: loading test
DEBUG dataset: value for audio_path not found, using default:
DEBUG dataset: Loaded database test with 2525 samples: got targets: True, got speakers: False, got sexes: False
DEBUG dataset: test: loaded data with 2525 samples: got targets: True, got speakers: False, got sexes: False
DEBUG experiment: loaded databases train,test
DEBUG dataset: splitting database train with strategy train
DEBUG dataset: train: 0 samples in test and 10100 samples in train
DEBUG dataset: value for strategy not found, using default: train_test
DEBUG experiment: warn: train test empty
DEBUG dataset: splitting database test with strategy test
DEBUG dataset: test: 2525 samples in test and 0 samples in train
DEBUG dataset: value for strategy not found, using default: train_test
DEBUG experiment: warn: test train empty
DEBUG filechecker: : checked for samples less than 1000 bytes, reduced samples from 8786 to 8786
DEBUG filechecker: : checked for samples less than 1000 bytes, reduced samples from 2214 to 2214
DEBUG experiment: value for filter.sample_selection not found, using default: all
DEBUG experiment: value for type not found, using default: dummy
DEBUG experiment: Categories test: ['pain' 'happy' 'fear' 'excited' 'neutral' 'surprise' 'sad' 'disgust'
'pleasure' 'disapointed' 'boredom']
DEBUG experiment: Categories train: ['happy' 'boredom' 'neutral' 'sad' 'surprise' 'disgust' 'pain' 'excited'
'fear' 'disapointed' 'pleasure']
DEBUG nkululeko: train shape : (8786, 4), test shape:(2214, 4)
DEBUG featureset: value for store_format not found, using default: pkl
DEBUG featureset: extracting Praat features, this might take a while...
praat: extracting file 0 of 8786
praat: extracting file 10 of 8786
praat: extracting file 20 of 8786
praat: extracting file 30 of 8786
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 58, in <module>
main(cwd) # use this if you want to state the config file path on command line
File "/home/bagus/github/nkululeko/nkululeko/nkululeko.py", line 46, in main
expr.extract_feats()
File "/home/bagus/github/nkululeko/nkululeko/experiment.py", line 334, in extract_feats
self.feats_train =self.feature_extractor.extract()
File "/home/bagus/github/nkululeko/nkululeko/feature_extractor.py", line 135, in extract
self.featExtractor.extract()
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feats_praat.py", line 28, in extract
self.df = feinberg_praat.compute_features(self.data_df.index)
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feinberg_praat.py", line 172, in compute_features
(f1_mean, f2_mean, f3_mean, f4_mean, f1_median, f2_median, f3_median, f4_median) = measureFormants(
File "/home/bagus/github/nkululeko/nkululeko/feat_extract/feinberg_praat.py", line 86, in measureFormants
f1_mean = statistics.mean(f1_list)
File "/usr/lib/python3.8/statistics.py", line 315, in mean
raise StatisticsError('mean requires at least one data point')
statistics.StatisticsError: mean requires at least one data point
This could be related to how the features inside Praat are calculated.
An example is the ASVP-ESD dataset with the following INI file.
[EXP]
root = ./
name = results/exp_asvp_os
save = True
[DATA]
databases = ['train', 'test']
train = ./data/asvp-esd/asvp_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
test = ./data/asvp-esd/asvp_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion
; no_reuse = True
labels =["boredom","neutral","happy", "sad","angry", "fear", "disgust", "surprise", "excited","pleasure","pain","disapointed"]
check_size = 1000
; min_duration_of_samples = 1
[FEATS]
type = ['praat']
; scale = standard
[MODEL]
type = svm
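The traceback ends in statistics.mean() being called on an empty formant list in feinberg_praat.py, i.e. Praat found no voiced frames for at least one file. A hedged fix would be to guard the aggregation (safe_mean is a hypothetical helper, not the actual nkululeko code):

```python
import math
import statistics

def safe_mean(values):
    """Mean of a list, or NaN when Praat returned no measurements.

    Empty formant lists (e.g. very short or silent files) otherwise
    raise statistics.StatisticsError: mean requires at least one data point.
    """
    return statistics.mean(values) if values else math.nan
```

Downstream code would then need to tolerate NaN features, e.g. by imputing or dropping the affected rows.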