Comments (12)
I think I understand your issue. Please pass window_spliter=True to the extractor:
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)
Please let me know if it solves your issue. We have been improving how the input data is handled, and in the upcoming release (expected next week) it will no longer be necessary to declare window_spliter=True explicitly. For now, I believe this will solve your issue.
from tsfel.
Thanks for reporting the error on LPCC. We will look into it. Although the current LPCC implementation does not require a sampling frequency, please note that some of the spectral features do. Since you are dealing with a daily sampling rate, I recommend passing fs:
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True, fs=(1/(3600*24)))
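As a small worked example of where the fs value above comes from: one observation per day, expressed in hertz, is one sample per 86400 seconds. The variable names below are only illustrative and are not part of TSFEL.

```python
# Sampling frequency for one observation per day, expressed in hertz.
SECONDS_PER_DAY = 24 * 3600

fs_hz = 1 / SECONDS_PER_DAY   # samples per second
nyquist_hz = fs_hz / 2        # highest frequency resolvable at this rate

print(f"fs = {fs_hz:.3e} Hz, Nyquist = {nyquist_hz:.3e} Hz")
```

Any frequency-valued spectral feature is then reported in the same unit as fs, so a value given in Hz will be very small for daily data.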
To extract features at a daily level by rolling over the time series, I assume that you intend to have some overlap between windows. The overlap can be defined using the overlap parameter:
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True, fs=(1/(3600*24)), overlap=[Your intended overlap percentage])
from tsfel.
Hi,
I appreciate your feedback on TSFEL! I am not sure if I understand your input data structure. Since you mentioned that you have a univariate time series I would assume that the dimension will be m dates and one single feature.
Nevertheless, more generally, if your input is a time series of length N, you should expect an output with N divided by the window size (N / win_size) rows.
Can you share more insights with a concrete example of the shape of your data?
from tsfel.
Hi @dmfolgado, thanks for replying.
For example, take the following time series represented in a dataframe. It has a length of N = 50, and I set the tsfel window_size parameter to 5. Hence, I should expect ten rows as output, but I'm getting a single row with 159 derived variables from the univariate time series.
My question is, how could I extract the features so that I get an output of length 50? That is, a rolling (overlapping) calculation.
Thanks!
from tsfel import get_features_by_domain, time_series_features_extractor
import pandas as pd
df = pd.DataFrame(
{'date': ['2018-05-01','2018-05-02','2018-05-03','2018-05-04','2018-05-05','2018-05-06','2018-05-07','2018-05-08','2018-05-09','2018-05-10','2018-05-11','2018-05-12','2018-05-13','2018-05-14','2018-05-15','2018-05-16','2018-05-17','2018-05-18','2018-05-19','2018-05-20','2018-05-21','2018-05-22','2018-05-23','2018-05-24','2018-05-25','2018-05-26','2018-05-27','2018-05-28','2018-05-29','2018-05-30','2018-05-31','2018-06-01','2018-06-02','2018-06-03','2018-06-04','2018-06-05','2018-06-06','2018-06-07','2018-06-08','2018-06-09','2018-06-10','2018-06-11','2018-06-12','2018-06-13','2018-06-14','2018-06-15','2018-06-16','2018-06-17','2018-06-18','2018-06-19'],
'feat_1': [906.82,923.64,975.9,970.3,986.12,965.87,937,918.47,931.82,901.8,841.24,847.07,868.36,867.07,846.75,833.61,805.43,823.89,823.1,852.37,839.96,797.78,749.48,757.67,745.67,732.77,733.97,709.79,746.52,737.51,748.58,752.1,763.81,771.8,748.83,762.18,765.4,768.89,761.51,749.85,675.8,687.33,654.39,629.51,663.37,638.5,648.58,643.83,670.92,673.55]
})
df.shape
# (50, 2)
cfg = get_features_by_domain()
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5)
# *** Feature extraction started ***
# *** Feature extraction finished ***
X.shape
# (1, 159)
from tsfel.
Tried the following command and got an error message when estimating LPCC features (potential 🐞?).
Command
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)
Error message
```python
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in
--> 2 X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/calc_features.py in time_series_features_extractor(dict_features, signal_windows, fs, window_spliter, verbose, **kwargs)
174 break
175 else:
--> 176 features = calc_window_features(dict_features, wind_sig, fs, features_path=features_path)
177 feat_val = feat_val.append(features)
178

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/calc_features.py in calc_window_features(dict_features, signal_window, fs, **kwargs)
287
288 execf += ')'
--> 289 eval_result = eval(execf, locals())
290
291 # Function returns more than one element

in

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features.py in lpcc(signal, n_coeff)
1420
1421 # 12-20 cepstral coefficients are sufficient for speech recognition
-> 1422 lpc_coeffs = lpc(signal, n_coeff)
1423
1424 if np.sum(lpc_coeffs) == 0:

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features_utils.py in lpc(signal, n_coeff)
211 acf = autocorr_norm(signal)
212 r = -acf[1:n_coeff + 1].T
--> 213 smatrix = create_symmetric_matrix(acf, n_coeff)
214 if np.sum(smatrix) == 0:
215 return tuple(np.zeros(n_coeff))

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features_utils.py in create_symmetric_matrix(acf, n_coeff)
184 for i in range(n_coeff):
185 for j in range(n_coeff):
--> 186 smatrix[i, j] = acf[np.abs(i - j)]
187 return smatrix
188

IndexError: index 5 is out of bounds for axis 0 with size 5
```
Next, I removed LPCC from the configuration and ran the extraction again, which produced (N / window_size) - 1 = 9 rows:
del cfg['spectral']['LPCC']
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)
# *** Feature extraction started ***
# *** Feature extraction finished ***
X.shape
# (9, 119)
How could I achieve feature extraction at a daily level given a window size? Considering the univariate time series provided and the shape of X (i.e. 119 extracted features), the resulting dataframe should have a shape of (50, 119). Of course, dates that don't complete the window_size will have partial information, so those will probably have NaN or metrics estimated with fewer data points.
from tsfel.
Installed tsfel 0.1.3 (commit b44f339) to test whether I could extract overlapping features for daily samples (i.e. one observation a day). There might be an issue with the window_size parameter; values other than None throw an exception.
from tsfel import get_features_by_domain, time_series_features_extractor
import pandas as pd
df = pd.DataFrame(
{'date': ['2018-05-01','2018-05-02','2018-05-03','2018-05-04','2018-05-05','2018-05-06','2018-05-07','2018-05-08','2018-05-09','2018-05-10','2018-05-11','2018-05-12','2018-05-13','2018-05-14','2018-05-15','2018-05-16','2018-05-17','2018-05-18','2018-05-19','2018-05-20','2018-05-21','2018-05-22','2018-05-23','2018-05-24','2018-05-25','2018-05-26','2018-05-27','2018-05-28','2018-05-29','2018-05-30','2018-05-31','2018-06-01','2018-06-02','2018-06-03','2018-06-04','2018-06-05','2018-06-06','2018-06-07','2018-06-08','2018-06-09','2018-06-10','2018-06-11','2018-06-12','2018-06-13','2018-06-14','2018-06-15','2018-06-16','2018-06-17','2018-06-18','2018-06-19'],
'feat_1': [906.82,923.64,975.9,970.3,986.12,965.87,937,918.47,931.82,901.8,841.24,847.07,868.36,867.07,846.75,833.61,805.43,823.89,823.1,852.37,839.96,797.78,749.48,757.67,745.67,732.77,733.97,709.79,746.52,737.51,748.58,752.1,763.81,771.8,748.83,762.18,765.4,768.89,761.51,749.85,675.8,687.33,654.39,629.51,663.37,638.5,648.58,643.83,670.92,673.55]
})
cfg = get_features_by_domain()
X = time_series_features_extractor(cfg, df['feat_1'], window_size=10, fs=1, overlap=1)
# *** Feature extraction started ***
Got the following error message
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 X = time_series_features_extractor(cfg, df['close'], window_size=10, fs=1, overlap=1)
~/.pyenv/versions/3.7.4/envs/python-3.7.4/src/tsfel/tsfel/feature_extraction/calc_features.py in time_series_features_extractor(dict_features, signal_windows, fs, verbose, **kwargs)
245
246 if window_size is not None:
--> 247 signal_windows = signal_window_splitter(signal_windows, window_size, overlap)
248
249 if len(signal_windows) == 0:
~/.pyenv/versions/3.7.4/envs/python-3.7.4/src/tsfel/tsfel/utils/signal_processing.py in signal_window_splitter(signal, window_size, overlap)
27 return [signal[i:i + window_size] for i in range(0, len(signal), step)]
28 else:
---> 29 return [signal[i:i + window_size] for i in range(0, len(signal) - window_size, step)]
30
31
ValueError: range() arg 3 must not be zero
For now, I'm solving my use case with an auxiliary function that rolls over the Series/DataFrame. Note that df and out have the same number of rows. It's not clear to me whether I could solve this use case with the standard tsfel API. Comments?
def rolling_pipe(dataframe, window, fctn):
    return pd.Series(
        [
            dataframe.iloc[i - window : i].pipe(fctn) if i >= window else None
            for i in range(1, len(dataframe) + 1)
        ],
        index=dataframe.index,
    )

out = df['feat_1'].pipe(rolling_pipe, 10, lambda f: time_series_features_extractor(cfg, f))
Thanks!
from tsfel.
Hi @ciberger
The overlap parameter defines the overlap as a fraction of the window_size, so it has to be smaller than 1!
If you choose, for example, overlap=0.5, your example will work.
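To illustrate why overlap=1 fails, here is a minimal sketch of turning a fractional overlap into a step in samples. The helper overlap_to_step is hypothetical, and the rounding rule is an assumption. The traceback above only shows that a step is passed to range() and that a step of zero raises the error.

```python
def overlap_to_step(window_size: int, overlap: float) -> int:
    """Hypothetical helper: convert a fractional overlap into a step in samples.

    Assumes step = round(window_size * (1 - overlap)); this is an illustration,
    not necessarily the rule TSFEL applies internally.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must satisfy 0 <= overlap < 1")
    step = int(round(window_size * (1 - overlap)))
    if step == 0:
        # A zero step would make range(start, stop, step) raise ValueError.
        raise ValueError("overlap is too close to 1 for this window_size")
    return step


print(overlap_to_step(10, 0.5))  # 5: each window starts 5 samples after the previous one
print(overlap_to_step(10, 0.0))  # 10: windows do not overlap
```

Under this assumption, overlap=1 always gives a step of zero, and for small windows a value very close to 1 can round down to zero as well.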
from tsfel.
@tecamenz thanks for your feedback. You're indeed correct in describing how to use the overlap parameter. This discussion made me realize that the documentation is incorrect, as it suggests that overlap=1 is possible, which should not be allowed.
@ciberger let us know if this solved the issue or if you need additional help. I plan to fix the documentation.
from tsfel.
As you correctly mentioned, overlap has to be strictly smaller than 1 to work. I tested two window_size values, 10 and 30, to check the output shape. Using the largest possible value for overlap, I obtained a dataframe whose number of rows equals the length of the input minus the window_size. With window_size=10, overlap=0.5 resulted in 218 observations.
Any comments are welcome.
from tsfel import get_features_by_domain, time_series_features_extractor
cfg = get_features_by_domain()
del cfg['spectral']['FFT mean coefficient']
df.shape
# (1099,)
X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.5)
X.shape
# (218, 131)
X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.95)
X.shape
# (1089, 131)
# Result of 1099 - 10 (window_size) = 1089
X = time_series_features_extractor(cfg, df, window_size=30, overlap=0.97)
X.shape
# (1069, 134)
# Result of 1099 - 30 (window_size) = 1069
from tsfel.
Hi there!
I believe I can add a few comments on this:
The time_series_features_extractor will return a DataFrame in which the rows represent each signal window and the columns the extracted features for the given window.
On this matter, you can tell in advance how many features TSFEL will return (independent of input size) by calling tsfel.get_number_features(cfg). Here we expect 134 features per window.
In the first situation, X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.5), you get 131 features. This is because the window is too short to calculate some coefficients of the LPCC feature. If you increase the window_size as you did in the third case, X = time_series_features_extractor(cfg, df, window_size=30, overlap=0.97), you will get all 134 extracted features.
Regarding the number of rows, if you use a high amount of overlap, e.g. 0.97, the window splitter runs over your signal with step=1. Thus, the number of rows will be len(df) - window_size + 1.
The signal windows that you will obtain for this case can be represented as: [df[i: i + window_size] for i in range(len(df) - window_size + 1)]
Note that some corrections were performed in the signal_window_splitter.
I hope this brings more insights into the output shapes!
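The row counts discussed in this thread can be checked with a short sketch. The function window_starts is hypothetical. Its exclusive bound mirrors the range(0, len(signal) - window_size, step) call visible in the earlier traceback, and its inclusive bound mirrors the list comprehension above. The step values (5 for overlap=0.5 with window_size=10, and 1 for the high overlaps) are assumptions consistent with the reported shapes.

```python
def window_starts(n, window_size, step, include_last=True):
    """Hypothetical helper: start indices of windows over a signal of length n."""
    stop = n - window_size + 1 if include_last else n - window_size
    return range(0, stop, step)


n = 1099  # length of the series used in the shapes reported above

# Bound as seen in the traceback: the final full window is left out.
print(len(window_starts(n, 10, 5, include_last=False)))  # 218
print(len(window_starts(n, 10, 1, include_last=False)))  # 1089

# Bound with "+ 1": the final full window is included.
print(len(window_starts(n, 10, 1)))  # 1090
print(len(window_starts(n, 30, 1)))  # 1070

# The same idea on a tiny signal, listing the actual windows.
signal = list(range(6))
windows = [signal[i:i + 3] for i in window_starts(len(signal), 3, 1)]
print(windows)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

With step=1 the count is always n - window_size + 1, which is one row more than the 1089 and 1069 reported before the correction.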
from tsfel.
Hello, I am working with this library but I do not want to use all the features. Is there a way to show only the features I want?
from tsfel.
Hello, I am working with this library but I do not want to use all the features. Is there a way to show only the features I want?
You need to build a JSON file with the features you want and use the load_json function when configuring the features. Function: https://tsfel.readthedocs.io/en/latest/descriptions/modules/tsfel.feature_extraction.html?highlight=get_features#tsfel.feature_extraction.features_settings.load_json
Example JSON: https://github.com/fraunhoferportugal/tsfel/blob/development/tsfel/feature_extraction/features.json/
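As an alternative to editing the JSON by hand, the configuration can also be trimmed in Python, in the same spirit as the del cfg['spectral']['LPCC'] statement used earlier in this thread. The sketch below assumes only that the configuration is a nested dict of the form domain -> feature name -> settings. The helper keep_only, the stand-in configuration, and the names "Mean" and "Median" are illustrative placeholders, not output of the library.

```python
import copy


def keep_only(cfg, wanted):
    """Hypothetical helper: return a copy of cfg that keeps only the wanted features.

    Domain keys are preserved (possibly empty), mirroring what deleting
    individual features from the original dict would leave behind.
    """
    return {
        domain: {
            name: copy.deepcopy(settings)
            for name, settings in features.items()
            if name in wanted
        }
        for domain, features in cfg.items()
    }


# Stand-in configuration with empty settings; a real one would come from
# get_features_by_domain() or load_json.
example_cfg = {
    "statistical": {"Mean": {}, "Median": {}},
    "spectral": {"LPCC": {}, "FFT mean coefficient": {}},
}

selected = keep_only(example_cfg, {"Mean", "FFT mean coefficient"})
print(selected)
```

The original dict is left untouched, so the full configuration can still be reused for a separate extraction.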
from tsfel.