
Comments (12)

dmfolgado avatar dmfolgado commented on September 15, 2024 2

I think I understand your issue. Please pass window_spliter=True to the extractor:

X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)

Please let me know if this solves your issue. We have been improving how input data is handled, and in the upcoming release (expected next week) it will no longer be necessary to declare window_spliter=True explicitly. For now, I believe this will solve your issue.

from tsfel.

dmfolgado avatar dmfolgado commented on September 15, 2024 1

Thanks for reporting the error on LPCC. We will look into it. Although the current LPCC implementation does not require a sampling frequency, please note that some of the spectral features do. Since you are dealing with a daily sampling rate, I recommend passing fs:

X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True, fs=(1/(3600*24)))

To extract features at a daily level while rolling over the time series, I assume you want some overlap between windows. The overlap can be set with the overlap parameter:

X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True, fs=(1/(3600*24)), overlap=[Your intended overlap percentage])
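For reference, the fs value used above is just the reciprocal of the sampling period in seconds. A minimal sketch using only the standard library; the helper name sampling_frequency_hz is hypothetical and not part of TSFEL:

```python
from datetime import timedelta


def sampling_frequency_hz(period: timedelta) -> float:
    """Return the sampling frequency in Hz for a given sampling period."""
    seconds = period.total_seconds()
    if seconds <= 0:
        raise ValueError("sampling period must be positive")
    return 1.0 / seconds


# One observation per day: 1 / (3600 * 24) Hz, the same value as fs above.
fs_daily = sampling_frequency_hz(timedelta(days=1))
```

The same helper would cover other sampling rates, e.g. timedelta(hours=1) for hourly data.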


dmfolgado avatar dmfolgado commented on September 15, 2024

Hi,

I appreciate your feedback on TSFEL! I am not sure if I understand your input data structure. Since you mentioned that you have a univariate time series I would assume that the dimension will be m dates and one single feature.
More generally, if your input is a time series of length N, you should expect an output with N divided by the window size (N / win_size) rows.

Can you share more insights with a concrete example of the shape of your data?


ciberger avatar ciberger commented on September 15, 2024

Hi @dmfolgado, thanks for replying.

For example, take the following time series represented in a dataframe. It has a length of N = 50 and tsfel window_size parameter of 5. Hence, I should expect ten rows as output, but I'm getting a single row with 159 derived variables from the univariate time series.

My question is: how can I extract the features so that I get an output of length 50? In other words, I want a rolling (overlapping) calculation.

Thanks!

from tsfel import get_features_by_domain, time_series_features_extractor
import pandas as pd

df = pd.DataFrame(
{'date': ['2018-05-01','2018-05-02','2018-05-03','2018-05-04','2018-05-05','2018-05-06','2018-05-07','2018-05-08','2018-05-09','2018-05-10','2018-05-11','2018-05-12','2018-05-13','2018-05-14','2018-05-15','2018-05-16','2018-05-17','2018-05-18','2018-05-19','2018-05-20','2018-05-21','2018-05-22','2018-05-23','2018-05-24','2018-05-25','2018-05-26','2018-05-27','2018-05-28','2018-05-29','2018-05-30','2018-05-31','2018-06-01','2018-06-02','2018-06-03','2018-06-04','2018-06-05','2018-06-06','2018-06-07','2018-06-08','2018-06-09','2018-06-10','2018-06-11','2018-06-12','2018-06-13','2018-06-14','2018-06-15','2018-06-16','2018-06-17','2018-06-18','2018-06-19'],
'feat_1': [906.82,923.64,975.9,970.3,986.12,965.87,937,918.47,931.82,901.8,841.24,847.07,868.36,867.07,846.75,833.61,805.43,823.89,823.1,852.37,839.96,797.78,749.48,757.67,745.67,732.77,733.97,709.79,746.52,737.51,748.58,752.1,763.81,771.8,748.83,762.18,765.4,768.89,761.51,749.85,675.8,687.33,654.39,629.51,663.37,638.5,648.58,643.83,670.92,673.55]
})

df.shape
# (50, 2)

cfg = get_features_by_domain()
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5)

# *** Feature extraction started ***
# *** Feature extraction finished ***

X.shape
# (1, 159)


ciberger avatar ciberger commented on September 15, 2024

Tried the following command and got an error message when estimating LPCC features (potential 🐞?).

Command

X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)

Error message

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in 
--> 2 X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/calc_features.py in time_series_features_extractor(dict_features, signal_windows, fs, window_spliter, verbose, **kwargs)
174 break
175 else:
--> 176 features = calc_window_features(dict_features, wind_sig, fs, features_path=features_path)
177 feat_val = feat_val.append(features)
178

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/calc_features.py in calc_window_features(dict_features, signal_window, fs, **kwargs)
287
288 execf += ')'
--> 289 eval_result = eval(execf, locals())
290
291 # Function returns more than one element

in

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features.py in lpcc(signal, n_coeff)
1420
1421 # 12-20 cepstral coefficients are sufficient for speech recognition
-> 1422 lpc_coeffs = lpc(signal, n_coeff)
1423
1424 if np.sum(lpc_coeffs) == 0:

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features_utils.py in lpc(signal, n_coeff)
211 acf = autocorr_norm(signal)
212 r = -acf[1:n_coeff + 1].T
--> 213 smatrix = create_symmetric_matrix(acf, n_coeff)
214 if np.sum(smatrix) == 0:
215 return tuple(np.zeros(n_coeff))

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tsfel/feature_extraction/features_utils.py in create_symmetric_matrix(acf, n_coeff)
184 for i in range(n_coeff):
185 for j in range(n_coeff):
--> 186 smatrix[i, j] = acf[np.abs(i - j)]
187 return smatrix
188

IndexError: index 5 is out of bounds for axis 0 with size 5
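
The traceback is consistent with the window (5 samples) being shorter than the number of LPC coefficients requested: the matrix needs autocorrelation lags 0 to n_coeff - 1, but only 5 are available. A minimal pure-Python sketch of that indexing pattern, assuming n_coeff = 12 (the default is not shown in the traceback); it mirrors the loop in the traceback but is not TSFEL's code:

```python
def build_symmetric_matrix(acf, n_coeff):
    """Fill an n_coeff x n_coeff matrix with acf[|i - j|], as in the traceback loop."""
    matrix = [[0.0] * n_coeff for _ in range(n_coeff)]
    for i in range(n_coeff):
        for j in range(n_coeff):
            matrix[i][j] = acf[abs(i - j)]  # needs lags 0 .. n_coeff - 1
    return matrix


acf_short = [1.0, 0.8, 0.5, 0.2, 0.1]  # made-up values: 5 lags from a 5-sample window

try:
    build_symmetric_matrix(acf_short, n_coeff=12)  # n_coeff=12 is an assumption
    failed = False
except IndexError:
    failed = True  # lag 5 is requested but only lags 0..4 exist

# With at most as many coefficients as available lags, the matrix can be built.
small = build_symmetric_matrix(acf_short, n_coeff=5)
```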

Next, I ran the following statements, which produced (N / window_size) - 1 rows:

del cfg['spectral']['LPCC']
X = time_series_features_extractor(cfg, df['feat_1'], window_size=5, window_spliter=True)

# *** Feature extraction started ***
# *** Feature extraction finished ***

X.shape
# (9, 119)

How could I achieve a feature extraction at a daily level given a window size? Considering the univariate time series provided and the shape of X (i.e. extracted 119 features), the resulting dataframe should have a shape of (50, 119). Of course, dates that don't complete the window_size will have partial information, so it's probable that those will have NaN or metrics estimated with fewer data points.


ciberger avatar ciberger commented on September 15, 2024

Installed tsfel 0.1.3 (commit b44f339) to test whether I could extract overlapping features for daily samples (i.e. one observation a day). There might be an issue with the window_size parameter; values other than None throw an exception.

from tsfel import get_features_by_domain, time_series_features_extractor
import pandas as pd

df = pd.DataFrame(
{'date': ['2018-05-01','2018-05-02','2018-05-03','2018-05-04','2018-05-05','2018-05-06','2018-05-07','2018-05-08','2018-05-09','2018-05-10','2018-05-11','2018-05-12','2018-05-13','2018-05-14','2018-05-15','2018-05-16','2018-05-17','2018-05-18','2018-05-19','2018-05-20','2018-05-21','2018-05-22','2018-05-23','2018-05-24','2018-05-25','2018-05-26','2018-05-27','2018-05-28','2018-05-29','2018-05-30','2018-05-31','2018-06-01','2018-06-02','2018-06-03','2018-06-04','2018-06-05','2018-06-06','2018-06-07','2018-06-08','2018-06-09','2018-06-10','2018-06-11','2018-06-12','2018-06-13','2018-06-14','2018-06-15','2018-06-16','2018-06-17','2018-06-18','2018-06-19'],
'feat_1': [906.82,923.64,975.9,970.3,986.12,965.87,937,918.47,931.82,901.8,841.24,847.07,868.36,867.07,846.75,833.61,805.43,823.89,823.1,852.37,839.96,797.78,749.48,757.67,745.67,732.77,733.97,709.79,746.52,737.51,748.58,752.1,763.81,771.8,748.83,762.18,765.4,768.89,761.51,749.85,675.8,687.33,654.39,629.51,663.37,638.5,648.58,643.83,670.92,673.55]
})

X = time_series_features_extractor(cfg, df['feat_1'], window_size=10, fs=1, overlap=1)
# *** Feature extraction started ***

Got the following error message

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 X = time_series_features_extractor(cfg, df['close'], window_size=10, fs=1, overlap=1)

~/.pyenv/versions/3.7.4/envs/python-3.7.4/src/tsfel/tsfel/feature_extraction/calc_features.py in time_series_features_extractor(dict_features, signal_windows, fs, verbose, **kwargs)
    245 
    246     if window_size is not None:
--> 247         signal_windows = signal_window_splitter(signal_windows, window_size, overlap)
    248 
    249     if len(signal_windows) == 0:

~/.pyenv/versions/3.7.4/envs/python-3.7.4/src/tsfel/tsfel/utils/signal_processing.py in signal_window_splitter(signal, window_size, overlap)
     27         return [signal[i:i + window_size] for i in range(0, len(signal), step)]
     28     else:
---> 29         return [signal[i:i + window_size] for i in range(0, len(signal) - window_size, step)]
     30 
     31 

ValueError: range() arg 3 must not be zero

For now, I'm solving my use case with an auxiliary function that rolls over the Series/DataFrame. Note that df and out have the same number of rows. It's not clear to me whether I could solve this use case with the standard tsfel API. Comments?

def rolling_pipe(dataframe, window, fctn):
    return pd.Series(
        [
            dataframe.iloc[i - window : i].pipe(fctn) if i >= window else None
            for i in range(1, len(dataframe) + 1)
        ],
        index=dataframe.index,
    )

out = df['feat_1'].pipe(rolling_pipe, 10, lambda f: time_series_features_extractor(cfg, f))

Thanks!


tecamenz avatar tecamenz commented on September 15, 2024

Hi @ciberger
The overlap parameter defines the overlap as a fraction of the window_size, so it has to be smaller than 1!
If you choose, for example, overlap=0.5, your example will work.
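
To make the constraint concrete, here is a minimal sketch of how a window step could be derived from the overlap fraction. The formula step = round(window_size * (1 - overlap)) is an assumption for illustration (the traceback above only shows that step ends up as the third argument of range()), and window_step is a hypothetical helper, not a TSFEL function:

```python
def window_step(window_size: int, overlap: float) -> int:
    """Number of samples between the starts of consecutive windows."""
    if not 0 <= overlap < 1:
        # overlap=1 would mean a step of 0, which range() rejects.
        raise ValueError("overlap must satisfy 0 <= overlap < 1")
    # Never step by less than one sample, even for very high overlap.
    return max(1, round(window_size * (1 - overlap)))


step_half = window_step(10, 0.5)    # windows start every 5 samples
step_high = window_step(10, 0.95)   # clamped to a 1-sample step
```

Under this assumption, overlap=1 is rejected up front instead of surfacing later as "range() arg 3 must not be zero".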


dmfolgado avatar dmfolgado commented on September 15, 2024

@tecamenz thanks for your feedback. You're indeed correct in describing how to use the overlap parameter. This discussion made me realize that the documentation is not correct, as it suggests that overlap=1 is possible, which should not be allowed.
@ciberger let us know if this solved the issue or if you need additional help. I plan to fix the documentation.


ciberger avatar ciberger commented on September 15, 2024

As you correctly mentioned, overlap has to be strictly smaller than 1 to work. I tested window_size values of 10 and 30 to check the output shape. Using the largest possible input value for overlap, I obtained a dataframe whose number of rows equals the length of the input dataframe minus the window_size. overlap=0.5 resulted in 218 observations.

Any comments are welcome.

from tsfel import get_features_by_domain, time_series_features_extractor

cfg = get_features_by_domain()
del cfg['spectral']['FFT mean coefficient']

df.shape
# (1099,)

X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.5)
X.shape
# (218, 131)

X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.95)
X.shape
# (1089, 131)
# Result of 1099 - 10 (window_size) = 1089

X = time_series_features_extractor(cfg, df, window_size=30, overlap=0.97)
X.shape
# (1069, 134)
# Result of 1099 - 30 (window_size) = 1069


smlsantos avatar smlsantos commented on September 15, 2024

Hi there!
I believe I can add a few comments on this:

The time_series_features_extractor will return a DataFrame in which the rows represent each signal window and the columns the extracted features for the given window.

You can also check in advance how many features TSFEL will return (independent of input size) by calling tsfel.get_number_features(cfg). Here we expect 134 features per window.

In the first situation, X = time_series_features_extractor(cfg, df, window_size=10, overlap=0.5), you get 131 features. This is due to the window_size not having enough length for calculating some coefficients of the LPCC feature. If you increase the window_size as you did in the 3rd case, X = time_series_features_extractor(cfg, df, window_size=30, overlap=0.97), you will get the 134 extracted features.

Regarding the number of rows, if you use a high amount of overlap, e.g. 0.97, the window splitter runs over your signal with step=1. The number of rows will then be len(df) - window_size + 1.

The signal windows that you will obtain for this case can be represented as: [df[i: i + window_size] for i in range(len(df) - window_size + 1)]
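
As a quick check of these counts, the sketch below compares the range from the older traceback (range(0, len(signal) - window_size, step)) with the range in the list comprehension above. Plain integers stand in for the data, and the helper names are hypothetical:

```python
def n_windows_old(n: int, window_size: int, step: int) -> int:
    """Window count with the range shown in the earlier traceback."""
    return len(range(0, n - window_size, step))


def n_windows_new(n: int, window_size: int, step: int) -> int:
    """Window count when the last full window is included."""
    return len(range(0, n - window_size + 1, step))


# 50 daily samples, window_size=5, no overlap (step=5).
old_50, new_50 = n_windows_old(50, 5, 5), n_windows_new(50, 5, 5)

# 1099 samples, window_size=10, step=1 (very high overlap).
old_1099, new_1099 = n_windows_old(1099, 10, 1), n_windows_new(1099, 10, 1)
```

Under these assumptions the older range gives 9 and 1089 windows, which is consistent with the shapes reported earlier in the thread, while the inclusive range gives 10 and 1090.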

Note that some corrections were performed in the signal_window_splitter.

I hope this brings more insights into the output shapes!


espjose avatar espjose commented on September 15, 2024

Hi, I am working with this library but I don't want to use all of the features. Is there a way to show only the features I want?


cb3ndev avatar cb3ndev commented on September 15, 2024

Hi, I am working with this library but I don't want to use all of the features. Is there a way to show only the features I want?

You need to build a JSON file with the features you want and use the load_json function when configuring the features. Function: https://tsfel.readthedocs.io/en/latest/descriptions/modules/tsfel.feature_extraction.html?highlight=get_features#tsfel.feature_extraction.features_settings.load_json
Example JSON: https://github.com/fraunhoferportugal/tsfel/blob/development/tsfel/feature_extraction/features.json/
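
As an alternative sketch of the same idea, the configuration dictionary can be trimmed in Python, following the del cfg['spectral']['LPCC'] pattern used earlier in the thread, and then saved as JSON. The toy dictionary below only imitates the nested domain/feature layout; its keys and fields are made up for illustration, and keep_only_features is a hypothetical helper, not part of TSFEL:

```python
import json


def keep_only_features(cfg: dict, wanted: set) -> dict:
    """Return a copy of cfg containing only the wanted feature names."""
    trimmed = {}
    for domain, features in cfg.items():
        kept = {name: spec for name, spec in features.items() if name in wanted}
        if kept:  # drop domains that end up empty
            trimmed[domain] = kept
    return trimmed


# Toy configuration; a real one would come from get_features_by_domain().
toy_cfg = {
    "statistical": {"Mean": {"use": "yes"}, "Variance": {"use": "yes"}},
    "spectral": {"LPCC": {"use": "yes"}},
}

my_cfg = keep_only_features(toy_cfg, {"Mean", "LPCC"})
my_cfg_json = json.dumps(my_cfg, indent=2)  # text that could be written to a .json file
```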

