Comments (7)
The problem with librosa is that it resamples to 22050 Hz by default when you don't specify a sampling rate during loading. For example, when I load the 16000 Hz test file I generated above, I get:
>>> import librosa
>>> signal, sampling_rate = librosa.load('test.wav')
>>> sampling_rate
22050
If I then execute opensmile, I get a different result:
>>> f3 = smile.process_signal(signal, sampling_rate)
>>> f3['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.434783
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
To avoid this, you have to tell librosa the desired sampling rate during loading, or use sr=None to get the sampling rate from the file:
>>> signal, sampling_rate = librosa.load('test.wav', sr=None)
>>> sampling_rate
16000
If you then use opensmile, you get the desired result:
>>> f4 = smile.process_signal(signal, sampling_rate)
>>> f4['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
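A small guard can catch this kind of silent resampling before the array reaches process_signal. The following is only a sketch: the helper names native_sampling_rate and assert_not_resampled are hypothetical, and it assumes the file is a PCM WAV that the standard-library wave module can read.

```python
import wave


def native_sampling_rate(path):
    """Return the sampling rate stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def assert_not_resampled(path, loaded_rate):
    """Raise if a loader returned a rate that differs from the file header.

    Hypothetical helper, not part of librosa or opensmile.
    """
    file_rate = native_sampling_rate(path)
    if loaded_rate != file_rate:
        raise ValueError(
            f"{path}: loaded at {loaded_rate} Hz, but the file is {file_rate} Hz"
        )
    return file_rate
```

Calling assert_not_resampled('test.wav', sampling_rate) right after the loader would raise for the default librosa.load call shown above (22050 vs. 16000) and pass for the sr=None variant.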
from opensmile.
Yes, you are right.
Here is a minimal example of how to reproduce it (even without librosa):
import audiofile
import numpy as np
import opensmile
np.random.seed(0)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
    verbose=True,
)
sampling_rate = 16000
signal = np.random.normal(size=(1, sampling_rate))
audiofile.write('test.wav', signal, sampling_rate)
f1 = smile.process_file('test.wav')
f2 = smile.process_signal(signal, sampling_rate)
and then
>>> f1['audspec_lengthL1norm_sma_maxPos']
file start end
test.wav 0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
>>> f2['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.0
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
from opensmile.
Hi, does someone have an answer for that? I would like to use process_signal, as it is faster than process_file.
from opensmile.
Hi,
I also encountered the above issue!
My workaround is:
- I assume that opensmile's WAV parsing is correct.
- When I want to pass a (WAV) array to opensmile's process_signal, I use torchaudio's load function instead of librosa. torchaudio's load function gives exactly the same smile results as the WAV file.
import torchaudio
# load the wav data and convert to 32-bit float
arr, fs = torchaudio.load(WAV_PATH, normalize=True)
# convert the tensor to a flat numpy array
arr = arr.numpy().ravel()
I also noticed significant differences in feature values when resampling the signal!
E.g., in the visualization below I used the raw 44.1 kHz signal and a 16 kHz sinc-resampled variant of the signal to extract GeMAPSv01b LLDs. 📷 ⬇️
legend:
Smile-orig-n: using 44.1 kHz data
Smile-16kHz-n: using 16 kHz data
It seems that the GeMAPS F0semitone is more robustly extracted in the 16 kHz variant (fewer peaks at 60)?
Is this behavior normal?
from opensmile.
"assuming that opensmile's WAV parsing is correct."
If you are using the Python version, the WAV parsing of opensmile is not used, as the file is first read with audiofile and then processed internally with https://github.com/audeering/opensmile-python/blob/c64837d6fdfa62f1810ba00ed0f44d2c2bd7ddd1/opensmile/core/smile.py#L263-L326
The code I posted above at #46 (comment) to reproduce the error returns different results because I did not normalize the amplitude of the audio.
When I repeat it while ensuring the amplitude is in the range -1..1, I get:
import audiofile
import numpy as np
import opensmile
np.random.seed(0)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
    verbose=True,
)
sampling_rate = 16000
signal = np.random.normal(size=(1, sampling_rate))
signal = signal / (np.max(np.max(np.abs(signal))) + 10 ** -9)
audiofile.write('test.wav', signal, sampling_rate)
f1 = smile.process_file('test.wav')
f2 = smile.process_signal(signal, sampling_rate)
and then
>>> f1['audspec_lengthL1norm_sma_maxPos']
file start end
test.wav 0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
>>> f2['audspec_lengthL1norm_sma_maxPos']
start end
0 days 0 days 00:00:01 0.430108
Name: audspec_lengthL1norm_sma_maxPos, dtype: float32
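One plausible reading of why the normalization matters, which is an assumption and not confirmed in this thread: a fixed-point WAV file can only store values in -1..1, so out-of-range samples may be clipped when the file is written, while the in-memory array passed to process_signal keeps them. The sketch below uses only numpy, with a hypothetical helper peak_normalize, to show how many samples of unit-variance Gaussian noise fall outside that range and that peak normalization brings all of them back inside.

```python
import numpy as np


def peak_normalize(signal, eps=1e-9):
    """Scale a signal so that its largest absolute sample is just below 1."""
    return signal / (np.max(np.abs(signal)) + eps)


np.random.seed(0)
sampling_rate = 16000
signal = np.random.normal(size=(1, sampling_rate))

# Share of samples that fall outside the -1..1 range before normalization.
clipped_share = np.mean(np.abs(signal) > 1.0)
normalized = peak_normalize(signal)

print(f"samples outside -1..1 before normalization: {clipped_share:.1%}")
print(f"peak after normalization: {np.max(np.abs(normalized)):.6f}")
```

For a standard normal distribution roughly a third of the samples lie beyond one standard deviation, so a substantial part of the unnormalized test signal would be affected by any clipping.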
from opensmile.
Hi, indeed, when you use sr=None with librosa, you get the same results as just parsing the .wav file, thanks for helping with that ;).
But my second question was more tailored towards the (rather large) differences in OpenSMILE LLD values when using resampling?
If you click on the 📷 🔝 which I sent in my previous comment, you can see:
- rather significant changes in the jitter and shimmer
  - There is a common trend that jitter values for the 44.1 kHz data are really high when someone begins/ends a voiced segment (see the red VAD line of the upper subplot as a reference for voiced regions)
- some differences in the F0-semitone
  - The 44.1 kHz data has some peaks at 60 (which would imply a peak F0 of 880 Hz, which is not feasible, see 👨🏼💻 ⬇️ )
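The 880 Hz figure can be checked with a short conversion. This is a sketch under the assumption that the semitone scale is referenced to 27.5 Hz, as the GeMAPS LLD name F0semitoneFrom27.5Hz suggests; the function names are hypothetical and not part of opensmile.

```python
import math

# Assumed reference frequency, taken from the LLD name F0semitoneFrom27.5Hz.
REFERENCE_HZ = 27.5


def semitone_to_hz(semitones, reference_hz=REFERENCE_HZ):
    """Convert a semitone value above the reference into a frequency in Hz."""
    return reference_hz * 2.0 ** (semitones / 12.0)


def hz_to_semitone(frequency_hz, reference_hz=REFERENCE_HZ):
    """Convert a frequency in Hz into semitones above the reference."""
    return 12.0 * math.log2(frequency_hz / reference_hz)


# 60 semitones are 5 octaves: 27.5 Hz * 2**5 = 880 Hz
print(semitone_to_hz(60))
```

Under that assumption, a value of 60 corresponds to five octaves above 27.5 Hz.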
What are the possible explanations for these differences, and which sampling rate is recommended when using openSMILE? (In the majority of research papers I find resampling to 16 kHz as a preprocessing step, but I would presume that, for features such as jitter and shimmer, a higher (and thus more temporally accurate) rate should give better results.)
Looking forward to your response and kind regards,
Jonas
from opensmile.
For reference, I'm using the GeMAPSv01b LLD config.
from opensmile.