
gutentag's Introduction


TimeEval

Evaluation Tool for Anomaly Detection Algorithms on Time Series.

License: MIT. Supported Python versions: 3.7, 3.8, 3.9, 3.10, 3.11.

See TimeEval Algorithms for algorithms that are compatible with this tool. The algorithms in that repository are containerized and can be executed using the DockerAdapter of TimeEval.

If you use TimeEval, please consider citing our paper.

📖 TimeEval's documentation is hosted at https://timeeval.readthedocs.io.

Features

  • Large integrated benchmark dataset collection with more than 700 datasets
  • Benchmark dataset interface to select datasets easily
  • Adapter architecture for algorithm integration
    • DockerAdapter
    • JarAdapter
    • DistributedAdapter
    • MultivarAdapter
    • ... (add your own adapter)
  • Large collection of existing algorithm implementations (in TimeEval Algorithms repository)
  • Automatic algorithm detection quality scoring using AUC (Area under the ROC curve, also c-statistic) or range-based metrics
  • Automatic timing of the algorithm execution (differentiates pre-, main-, and post-processing)
  • Distributed experiment execution
  • Output and logfile tracking for subsequent inspection
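
To make the AUC metric from the feature list concrete, here is a minimal sketch that computes the area under the ROC curve as the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal point. The function name roc_auc and the toy data are illustrative assumptions; this is not TimeEval's own metric implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison: the fraction of (anomalous, normal)
    pairs in which the anomalous point has the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and anomaly scores for five time steps.
labels = [0, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2]
auc = roc_auc(labels, scores)

# A detector that emits the same score everywhere cannot rank anything.
constant_auc = roc_auc(labels, [1.0] * len(labels))
```

The pairwise form is quadratic in the number of points, so it only suits small examples, but it shows why a constant score yields an AUC of 0.5.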

Installation

TimeEval can be installed as a package or from source.

⚠️ Attention!

Currently, TimeEval is tested only on Linux and macOS and relies on unixoid capabilities. On Windows, you can use TimeEval within WSL. If you want to use the provided detection algorithms, Docker is required.

Installation using pip

Builds of TimeEval are published to PyPI.

Prerequisites

  • python >= 3.7, <= 3.11
  • pip >= 20
  • Docker (for the anomaly detection algorithms)
  • (optional) rsync for distributed TimeEval

Steps

You can use pip to install TimeEval from PyPI:

pip install TimeEval

Installation from source

tl;dr

git clone git@github.com:TimeEval/TimeEval.git
cd TimeEval/
conda create -n timeeval python=3.7
conda activate timeeval
pip install -r requirements.txt
python setup.py bdist_wheel
pip install dist/TimeEval-*-py3-none-any.whl

Prerequisites

The following tools are required to install TimeEval from source:

  • git
  • Python >= 3.7 and pip (anaconda or miniconda is preferred)

Steps

  1. Clone this repository using git and change into its root directory.
  2. Create a conda-environment and install all required dependencies.
    conda create -n timeeval python=3.7
    conda activate timeeval
    pip install -r requirements.txt
  3. Build TimeEval: python setup.py bdist_wheel. This should create a Python wheel in the dist/-folder.
  4. Install TimeEval and all of its dependencies: pip install dist/TimeEval-*-py3-none-any.whl.
  5. If you want to make changes to TimeEval or run the tests, you need to install the development dependencies from requirements.dev: pip install -r requirements.dev.

Usage

tl;dr

from pathlib import Path
from typing import Dict, Any

import numpy as np

from timeeval import TimeEval, DatasetManager, Algorithm, TrainingType, InputDimensionality
from timeeval.adapters import FunctionAdapter
from timeeval.algorithms import subsequence_if
from timeeval.params import FixedParameters

# Load dataset metadata
dm = DatasetManager(Path("tests/example_data"), create_if_missing=False)


# Define algorithm
def my_algorithm(data: np.ndarray, args: Dict[str, Any]) -> np.ndarray:
    score_value = args.get("score_value", 0)
    return np.full_like(data, fill_value=score_value)


# Select datasets and algorithms
datasets = dm.select()
datasets = datasets[-1:]
# Add algorithms to evaluate...
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FixedParameters({"score_value": 1.})
    ),
    subsequence_if(params=FixedParameters({"n_trees": 50}))
]
timeeval = TimeEval(dm, datasets, algorithms)

# execute evaluation
timeeval.run()
# retrieve results
print(timeeval.get_results())

Citation

If you use TimeEval in your project or research, please cite our demonstration paper:

Phillip Wenig, Sebastian Schmidl, and Thorsten Papenbrock. TimeEval: A Benchmarking Toolkit for Time Series Anomaly Detection Algorithms. PVLDB, 15(12): 3678 - 3681, 2022. doi:10.14778/3554821.3554873

@article{WenigEtAl2022TimeEval,
  title = {TimeEval: {{A}} Benchmarking Toolkit for Time Series Anomaly Detection Algorithms},
  author = {Wenig, Phillip and Schmidl, Sebastian and Papenbrock, Thorsten},
  date = {2022},
  journaltitle = {Proceedings of the {{VLDB Endowment}} ({{PVLDB}})},
  volume = {15},
  number = {12},
  pages = {3678--3681},
  doi = {10.14778/3554821.3554873}
}


gutentag's Issues

Type Mismatch Error When Using Integer Data with Custom Input

Hello,

I encountered an issue when using the custom_input feature with my own dataset, which contains integer values. When I tried to inject an anomaly into my data, I received a numpy.core._exceptions._UFuncOutputCastingError. This error occurred because the apply_variations function in the Consolidator class tried to add float64 values (from bo.noise, bo.trend_series, and bo.offset) to my int64 time series data, resulting in a type mismatch.

Here is the error message:

numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

This error occurred at the following line in the consolidator.py file:

self.timeseries[:, c] += bo.noise + bo.trend_series + bo.offset

The documentation for the CustomInput class does not explicitly state that the input data must be of type float, but I guess in general gutenTAG is designed to work with floating-point time series data.

Suggested Fix:

To prevent this error, I suggest adding a simple check in the CustomInput class to ensure that the input data is of type float. If the data is of type integer, we can automatically convert it to float (maybe with a warning to inform the user).

if df.dtypes[0] == 'int64':
    df = df.astype(float)
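
A minimal sketch of the failure and of the suggested conversion, using plain NumPy arrays instead of the CustomInput class. The helper name ensure_float is hypothetical and not part of gutenTAG; the check covers every integer dtype, not only int64.

```python
import warnings

import numpy as np


def ensure_float(values: np.ndarray) -> np.ndarray:
    """Return a float64 copy if values has an integer dtype, else values unchanged."""
    if np.issubdtype(values.dtype, np.integer):
        warnings.warn("integer input converted to float64", stacklevel=2)
        return values.astype(np.float64)
    return values


series = np.arange(5, dtype=np.int64)  # stands in for an integer time series
noise = np.full(5, 0.5)                # stands in for bo.noise etc. (float64)

try:
    # In-place add cannot store float64 results in an int64 array.
    series += noise
    failed = False
except TypeError:  # NumPy's ufunc casting error is a TypeError subclass
    failed = True

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the demo warning
    fixed = ensure_float(series)
fixed += noise  # works: both operands are float64
```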

Add post-processing options

Add configuration options to post-process each individual channel of a time series.

Post-processing options include:

  • smoothing using convolution along t
  • segment smoothing (smooth different parts of the time series differently)
  • scaling and standardization (MinMax, z-score, ...)

Example configuration:

timeseries:
  - name: "mytest"
    length: 1000
    base-oscillations:
      - kind: "sine"
        frequency: 1
      [...]
    anomalies:
      - position: "beginning"
        [...]
    post-processing:
      - channel: 0
        kind: "smoothing"
        factor: 2.0
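
A rough sketch of what the first two options could look like for a single channel, assuming a moving-average kernel. The function names and the window parameter are illustrative assumptions; they are not the proposed configuration keys, and the meaning of factor in the example above is left open.

```python
from typing import List, Tuple

import numpy as np


def smooth(channel: np.ndarray, window: int) -> np.ndarray:
    """Moving-average smoothing along t via convolution (output keeps the input length)."""
    kernel = np.ones(window) / window
    return np.convolve(channel, kernel, mode="same")


def smooth_segments(channel: np.ndarray, segments: List[Tuple[int, int, int]]) -> np.ndarray:
    """Smooth each (start, end, window) segment with its own window size.

    The whole channel is smoothed first and only the segment is copied over,
    so the segment borders see their real neighbours instead of zero padding.
    """
    result = channel.astype(float)  # astype returns a copy
    for start, end, window in segments:
        result[start:end] = smooth(channel, window)[start:end]
    return result


t = np.linspace(0, 2 * np.pi, 1000)
rng = np.random.default_rng(42)
noisy = np.sin(t) + rng.normal(scale=0.1, size=t.size)

smoothed = smooth(noisy, window=25)
# Smooth only the first half; the second half stays untouched.
half_smoothed = smooth_segments(noisy, [(0, 500, 25)])
```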

Add a smooth transition to the anomaly API

Some existing anomaly kinds use smooth transitions. We should allow all (except extremum) anomalies to do this, and therefore move it to the anomaly itself instead of the anomaly kind.

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: "sine"
    anomalies:
      - position: beginning
        length: 100
        transition-window: 10  # <--
        kinds:
          - kind: "amplitude"
            amplitude_factor: 2
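
One possible reading of transition-window, sketched with NumPy: the anomalous subsequence is blended with the original one using a linear ramp at both ends. The function blend_with_transition and the linear ramp are assumptions made for illustration, not the existing gutenTAG behaviour.

```python
import numpy as np


def blend_with_transition(original: np.ndarray, anomalous: np.ndarray, window: int) -> np.ndarray:
    """Fade from original into anomalous over `window` points at both ends."""
    n = len(original)
    if len(anomalous) != n:
        raise ValueError("both subsequences must have the same length")
    if window < 0 or 2 * window > n:
        raise ValueError("transition window does not fit into the anomaly")
    weights = np.ones(n)
    ramp = np.linspace(0.0, 1.0, window, endpoint=False)  # 0 up to just below 1
    weights[:window] = ramp            # fade in
    weights[n - window:] = ramp[::-1]  # fade out
    return (1.0 - weights) * original + weights * anomalous


t = np.linspace(0, 2 * np.pi, 100)
original = np.sin(t) + 1.5
anomalous = 2.0 * original  # e.g. an amplitude anomaly with factor 2
blended = blend_with_transition(original, anomalous, window=10)
```

Because the weights start and end at 0, the blended subsequence meets the surrounding time series without a jump, while the middle part carries the full anomaly.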

Create new mode - ts_augmentation

We use this repository to create time series. We would like to introduce a new mode, in addition to supervised and semi-supervised, called "ts-augmentation", in which we produce:

  • time series (original)
  • same time series with anomaly (augmented time series)
  • label

We can provide a small piece of code.
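
To illustrate the three requested outputs, here is a small sketch with a hypothetical augment function that injects a simple amplitude change. It does not use gutenTAG's generators; all names and parameters are assumptions.

```python
from typing import Tuple

import numpy as np


def augment(
    original: np.ndarray, start: int, length: int, amplitude_factor: float = 2.0
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Return (original, augmented, labels) for one injected amplitude anomaly."""
    augmented = original.astype(float)  # copy, the original stays untouched
    augmented[start:start + length] *= amplitude_factor
    labels = np.zeros(len(original), dtype=np.int8)
    labels[start:start + length] = 1
    return original, augmented, labels


series = np.sin(np.linspace(0, 8 * np.pi, 400))
original, augmented, labels = augment(series, start=100, length=50)
```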

New anomaly: pattern-flip

Take a subsequence and flip it vertically:

image

This anomaly can be used for periodic datasets only.
Add an option for a smooth transition if the start and end of the flipped subsequence do not fit the surrounding time series.
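
A minimal sketch of the flip, assuming that "vertically" means mirroring the subsequence around its own mean value. The function pattern_flip and that choice of mirror axis are assumptions; the smooth transition is not covered here.

```python
import numpy as np


def pattern_flip(series: np.ndarray, start: int, end: int) -> np.ndarray:
    """Mirror series[start:end] around the mean of that subsequence."""
    result = series.astype(float)  # astype returns a copy
    segment = result[start:end]
    result[start:end] = 2.0 * segment.mean() - segment
    return result


t = np.linspace(0, 8 * np.pi, 800)
series = np.sin(t) + 0.3 * np.sin(2 * t)  # periodic, but not symmetric
flipped = pattern_flip(series, start=200, end=400)
restored = pattern_flip(flipped, start=200, end=400)
```

Mirroring around the mean keeps the level of the subsequence, and applying the flip twice gives back the original series.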

Use existing TS file as base oscillation

Use an existing time series as base oscillation for a new dataset. This allows injecting our anomalies into existing time series. The file must be formatted in our canonical file format.

GutenTAG tries to parse the ground truth (label) information to prevent anomaly overlaps.

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: file
        path: path/to/file.csv
        channel: 0

New anomaly: Normalize

Normalization anomalies can be constructed by combining mean and amplitude anomaly kinds, but this is difficult to achieve. Add a new anomaly kind that allows specifying how a subsequence should be normalized. Normalization options:

  • normalize-z
  • normalize-minmax
  • normalize-median
  • normalize-mean
  • normalize-logistic
  • normalize-tanh

Example configuration:

timeseries:
  - name: "test"
    base-oscillations:
      - kind: "sine"
    anomalies:
      - position: beginning
        length: 100
        kinds:
          - kind: "normalize-minmax"
            min: 0
            max: 1
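
A sketch of how four of the listed options could act on a subsequence. The functions normalize and inject_normalization and the parameter names new_min and new_max are assumptions; the median and mean variants are left out because their exact definition is not given here.

```python
import numpy as np


def normalize(segment: np.ndarray, kind: str, new_min: float = 0.0, new_max: float = 1.0) -> np.ndarray:
    """Normalize one subsequence according to the requested kind."""
    if kind == "normalize-minmax":
        span = segment.max() - segment.min()
        unit = (segment - segment.min()) / span if span > 0 else np.zeros_like(segment)
        return unit * (new_max - new_min) + new_min
    if kind == "normalize-z":
        std = segment.std()
        return (segment - segment.mean()) / std if std > 0 else np.zeros_like(segment)
    if kind == "normalize-tanh":
        return np.tanh(segment)
    if kind == "normalize-logistic":
        return 1.0 / (1.0 + np.exp(-segment))
    raise ValueError(f"unsupported kind: {kind}")


def inject_normalization(series: np.ndarray, start: int, length: int, kind: str, **params: float) -> np.ndarray:
    """Replace series[start:start + length] by its normalized version."""
    result = series.astype(float)  # astype returns a copy
    result[start:start + length] = normalize(result[start:start + length], kind, **params)
    return result


series = 3.0 * np.sin(np.linspace(0, 8 * np.pi, 1000))
anomalous = inject_normalization(series, start=0, length=100, kind="normalize-minmax", new_min=0.0, new_max=1.0)
z_anomalous = inject_normalization(series, start=0, length=100, kind="normalize-z")
```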

Mapping of anomaly-types on time domain

Is there a clean way to label the individual time steps of a generated time series (I am working with the Benchmark GutenTag dataset) with the related anomaly types?

So far I have worked with a run length encoding on is_anomaly to be able to map the respective positions and anomaly types from the yaml.

Unfortunately, correct labeling cannot be achieved if, for example, a time series contains two different anomaly types that both have the position 'middle'.

Further exemplary sample:

image
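
The run-length encoding approach described above can be sketched like this. The function names are hypothetical, and the mapping assumes that the anomaly kinds are given in their order of occurrence, which is exactly the information that is missing when two anomalies share a position such as 'middle'.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def anomaly_runs(is_anomaly: Sequence[int]) -> List[Tuple[int, int]]:
    """Run-length encode a 0/1 label sequence into half-open (start, end) ranges of 1s."""
    runs = []
    index = 0
    for value, group in groupby(is_anomaly):
        length = sum(1 for _ in group)
        if value == 1:
            runs.append((index, index + length))
        index += length
    return runs


def label_steps(is_anomaly: Sequence[int], kinds_in_order: Sequence[str]) -> List[str]:
    """Assign the i-th kind to the i-th anomalous run; all other steps are 'normal'."""
    runs = anomaly_runs(is_anomaly)
    if len(runs) != len(kinds_in_order):
        raise ValueError("number of anomalous runs and anomaly kinds differ")
    step_labels = ["normal"] * len(is_anomaly)
    for (start, end), kind in zip(runs, kinds_in_order):
        step_labels[start:end] = [kind] * (end - start)
    return step_labels


is_anomaly = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
runs = anomaly_runs(is_anomaly)
step_labels = label_steps(is_anomaly, ["amplitude", "frequency"])
```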

Incompatible types: pattern -> cylinder_bell_funnel

Hi, it seems that the pattern anomaly is not compatible with CBF base oscillations.

Here is my config file:

timeseries:
- name: demo
  length: 1000
  semi-supervised: true
  supervised: false
  channels: 1
  base-oscillations:
  - kind: cylinder_bell_funnel
    frequency: 2.0
    amplitude: 1.0
    variance: 0.05
  anomalies:
  - position: beginning
    length: 100
    kinds:
    - kind: pattern
      cbf_pattern_factor: 2.0

Amplitude anomaly fails with `amplitude_bell` size offset

Hi there 👋

first of all, thanks for the great repository.
I came across this example which seems to fail:

timeseries:
  - name: test-amp
    length: 1000
    base-oscillations:
      - kind: sine
    anomalies:
      - position: end
        length: 47
        channel: 0
        kinds:
          - kind: amplitude
            amplitude_factor: 2.0

Running with: python -m gutenTAG --config-yaml tests/configs/config-amp.yaml --seed 42 --no-save --plot
this fails to create the anomaly

Generating datasets:   0%|                                                                                      | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 132, in generate
    results: List[Tuple[Dict, Dict[str, Any], Optional[List[ExtTimeSeries]]]] = Parallel(n_jobs=n_jobs)(
                                                                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/gutenTAG.py", line 157, in internal_generate
    ts.generate(ctx.seed)
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/generator/timeseries.py", line 37, in generate
    self.timeseries, self.labels = consolidator.generate(GenerationContext(seed=self._create_new_seed(random_seed)))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 43, in generate
    self.generate_anomalies(ctx)
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/consolidator.py", line 67, in generate_anomalies
    anomaly_protocol = anomaly.generate(ctx.to_anomaly(current_base_oscillation, positions))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/__init__.py", line 65, in generate
    protocol = anomaly.generate(protocol)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../.venv/lib/python3.11/site-packages/gutenTAG/anomalies/types/amplitude.py", line 55, in generate
    subsequence = anomaly_protocol.base_oscillation.timeseries[anomaly_protocol.start:anomaly_protocol.end] * amplitude_bell
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
ValueError: operands could not be broadcast together with shapes (47,) (46,) 

Maybe I'm wrong, but as far as I can see the error originates in:

transition_length = int(length * 0.2)
and with creeping-length passed in
anomaly_protocol, custom_anomaly_length=int(anomaly_length * 0.8)

I'll open a PR soon, if this is fine for you.
Thanks again for the repo 👍
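
The numbers in the traceback fit this reading. As a small arithmetic check, assuming the bell is assembled from the two truncated lengths quoted above (the variable names below are illustrative, not taken from amplitude.py):

```python
length = 47  # anomaly length from the config above

transition_length = int(length * 0.2)  # int(9.4)  -> 9
plateau_length = int(length * 0.8)     # int(37.6) -> 37

# Both parts are truncated independently, so one point gets lost.
truncated_total = transition_length + plateau_length  # 46, not 47

# Deriving the second part by subtraction keeps the total exact.
plateau_fixed = length - transition_length
fixed_total = transition_length + plateau_fixed
```

Any length for which both products have a fractional part shows the same off-by-one; lengths divisible by 5 are not affected.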
