seldonio / alibi-detect

Algorithms for outlier, adversarial and drift detection

Home Page: https://docs.seldon.io/projects/alibi-detect/en/stable/

License: Other

Python 99.88% Makefile 0.12%

anomaly outlier concept-drift detection unsupervised-learning semi-supervised-learning adversarial data-drift drift-detection time-series

alibi-detect's Issues

add saved models for examples in bucket and update examples

Integrate categorical distance and convergence methods from alibi.

Question: Detecting Concept Drift in Text Datasets

I was hoping to use alibi-detect to detect concept drift in a text data stream. However, based on the documentation, this does not appear to be possible. Can you suggest any other project(s) that offer this functionality, or would your library work if I tokenized my data first?

tfserving .pb version of cifar10 resnet32 model

We've got a resnet32 version of the cifar10 classifier in h5 format. It would be great to also have it in tfserving format, like the other models we have, so we can run it as a SeldonDeployment.

Add VAE OD tabular example

Extend plotting fn to work on lists as well as arrays.

Allow user to pass timestamps for SR outlier detector

Create a Concept Drift Integration for KFServing

The example can follow the existing ones for outlier detection in the repo.

Rewrite save/load fn with singledispatch
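
A minimal sketch of what dispatching on detector type with functools.singledispatch could look like. The detector classes and the returned dictionaries are hypothetical stand-ins, not the library's actual classes or on-disk format.

```python
from functools import singledispatch


class IForestLike:
    """Hypothetical detector that only stores a threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold


class VAELike:
    """Hypothetical detector that stores a threshold and a latent size."""

    def __init__(self, threshold: float, latent_dim: int):
        self.threshold = threshold
        self.latent_dim = latent_dim


@singledispatch
def state_dict(detector) -> dict:
    """Fallback used when no handler is registered for the detector type."""
    raise NotImplementedError(f"no state_dict handler for {type(detector).__name__}")


@state_dict.register(IForestLike)
def _(detector) -> dict:
    # Handler selected when the first argument is an IForestLike instance.
    return {"name": "IForestLike", "threshold": detector.threshold}


@state_dict.register(VAELike)
def _(detector) -> dict:
    # Handler selected when the first argument is a VAELike instance.
    return {
        "name": "VAELike",
        "threshold": detector.threshold,
        "latent_dim": detector.latent_dim,
    }
```

Each new detector type would then only need its own registered handler, rather than another branch in a long if/elif chain inside a single save function.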

Not able to import OutlierProphet

I ran pip install alibi-detect, which completed successfully, but I am not able to import OutlierProphet.
A second issue: I am not able to import KSDrift, although all requirements are satisfied.
I am getting the following exception:

    from .creme_to_sklearn import convert_creme_to_sklearn

  File "t/lib/python3.5/site-packages/creme/compat/creme_to_sklearn.py", line 221
    raise ValueError(f'n_classes is more than 2 but {self.creme_estimator} is a ' +
                                                                                ^
SyntaxError: invalid syntax



ImportError                               Traceback (most recent call last)
<ipython-input-38-254c7647900a> in <module>
----> 1 from alibi_detect.cd import KSDrift

ImportError: cannot import name 'KSDrift'

Thanks
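
The path in the first traceback contains python3.5, and the line it points at is an f-string. f-strings are only valid syntax from Python 3.6 onward, so the SyntaxError is consistent with the interpreter version rather than with a broken install. A small check along these lines can confirm it; the function name is illustrative.

```python
import sys


def supports_fstrings(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version can parse f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if not supports_fstrings():
    print("Python %d.%d cannot parse f-strings; 3.6 or newer is needed." % sys.version_info[:2])
```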

Access scores as attribute preds.data.instance_score

It would be nice to be able to access prediction scores as preds.data.instance_score instead of preds["data"]["instance_score"]. Could be accomplished with addict https://github.com/mewwts/addict (also Bunch could be replaced with addict if it doesn't serve other purpose). This is just a personal opinion though, feel free to ignore. :)
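
For illustration, the requested access pattern can be sketched with a small dict subclass from the standard library alone. This is a stand-in for the idea, not addict itself, and the contents of preds below are made up.

```python
class AttrDict(dict):
    """dict whose keys can also be read as attributes.

    Nested plain dicts are wrapped (as shallow copies) on access, so this is
    a read-side convenience only.
    """

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as `items` still take precedence over keys of the same name.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            return AttrDict(value)
        return value


preds = AttrDict({
    "data": {"instance_score": [0.1, 2.3], "is_outlier": [0, 1]},
    "meta": {"name": "demo"},
})

scores = preds.data.instance_score          # attribute style
same_scores = preds["data"]["instance_score"]  # original key style still works
```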

Add VAE outlier detector example with categorical data

Visualizations: improve confusion matrix examples and add outlier flags to instance level visualizations.

Proba threshold for VAE outlier detector

odcd server and kfserving changes

add test metrics

Add drift integration example

Fix test spectral residual

Vanilla AE outlier detector

Hi,

Is there a reason why there is no vanilla AE outlier detector in od module? This would be nice to have as a baseline.

Issue with Adversarial VAE detection tutorial

Hello everyone.

I have been trying to run the tutorial https://docs.seldon.io/projects/alibi-detect/en/stable/examples/ad_advvae_mnist.html#

However, I have been running into some issues. I will list them below:

If I download the detector from the Google Cloud bucket, load_detector(filepath_ad) does not work, even though the same procedure with load_tf_model worked for loading the MNIST model. The problem seems to lie with naming: when I traced it back in the source code, the name goes from AdversarialVAE to AdversarialAE, so I suspect there is a problem in the pickle files.

I then tried to train the adversarial detector myself, following the code in the link, but I ran into further issues. When calling the init method for AdversarialAE (not AdversarialVAE as written in the tutorial), I write:

ad = AdversarialVAE(threshold=.5,  # threshold for adversarial score
                        model=model,
                        encoder_net=encoder_net,  # can also pass VAE model instead
                        decoder_net=decoder_net,  # of separate encoder and decoder
                        latent_dim=latent_dim,
                        samples=2,  # nb of samples drawn by VAE
                        beta=0.  # weight on KL-divergence loss term of latent space
                       )

However, I get an error saying that there is no latent_dim keyword, and if I remove that line I get an error saying that there is no samples keyword. The source code seems to confirm this:

def __init__(self,
                 threshold: float = None,
                 ae: tf.keras.Model = None,
                 model: tf.keras.Model = None,
                 encoder_net: tf.keras.Sequential = None,
                 decoder_net: tf.keras.Sequential = None,
                 model_hl: List[tf.keras.Model] = None,
                 hidden_layer_kld: dict = None,
                 w_model_hl: list = None,
                 temperature: float = 1.,
                 data_type: str = None
                 ) -> None:

If I remove all the missing keywords, calling the fit method results in a dimension error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [2048,512], In[1]: [50,1568] [Op:MatMul] name: ae/decoder/sequential_1/dense/Tensordot/MatMul/

Could someone look into this?

Define batch size for each detector

Infer threshold method based on FPR when outlier labels are available.

Consolidate plotting fn by inferring data type from metadata

Elbo loss; reduce_sum

Hi,

Why is it reduce_sum and not reduce_mean in the elbo loss? Doesn't it overbalance the KL divergence loss term?

alibi-detect/alibi_detect/models/losses.py

Line 44 in 14ed962

loss = -tf.reduce_sum(y_mn.log_prob(Flatten()(y_true)))
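
The scaling behind the question can be made concrete with plain arithmetic. Summing per-instance log-probabilities over a batch gives a value batch-size times larger than averaging them, so the weight of this term relative to any other loss term depends on the reduction. The numbers are made up, and since the quoted line does not show how the KL term is reduced, it is treated here as a fixed value purely as an assumption.

```python
# Made-up per-instance log-likelihoods for a batch of four instances.
log_probs = [-120.0, -95.0, -110.0, -101.0]
# Made-up KL value, assumed not to change with the choice of reduction.
kl_term = 12.0

recon_sum = -sum(log_probs)                    # what a sum reduction yields
recon_mean = -sum(log_probs) / len(log_probs)  # what a mean reduction yields

# Weight of the reconstruction term relative to the KL term in each case.
ratio_sum = recon_sum / kl_term
ratio_mean = recon_mean / kl_term

print(recon_sum, recon_mean, ratio_sum / ratio_mean)
```

Under these assumptions the two ratios differ by exactly the batch size, which is the factor a weight on the KL term would have to absorb when switching between the two reductions.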

Infer threshold level for outlier detectors based on e.g. percentile of training/validation data.
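
A rough sketch of the percentile idea: compute outlier scores on training or validation data, then pick the threshold as the score below which a chosen percentage of those scores fall. The function and parameter names are hypothetical, and the scores are made up.

```python
def infer_threshold(scores, threshold_perc: float) -> float:
    """Return the `threshold_perc`-th percentile of `scores` (linear interpolation)."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 <= threshold_perc <= 100.0:
        raise ValueError("threshold_perc must lie in [0, 100]")
    ordered = sorted(scores)
    # Fractional position of the requested percentile in the sorted list.
    pos = (len(ordered) - 1) * threshold_perc / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


validation_scores = [0.0, 1.0, 2.0, 3.0, 4.0]  # made-up outlier scores
threshold = infer_threshold(validation_scores, 90.0)
```

If roughly 10% of the validation data is expected to be outlying, a threshold at the 90th percentile flags about that share of instances.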

Add tests

od
utils
models

Add ECG dataset and create seq2seq example

Add above threshold flag or colour code in plot_feature_outlier_image

Add image example AEGMM outlier detector

Fix save/load fn Prophet detector Python 3.6

Can put a temporary fix in place by wiping the stan backend logger as an attribute: facebook/prophet#1361

AEGMM Cholesky decomposition error

Hi,

When training AEGMM on the KDDCUP99 data I often get:
InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky] (raised when computing gmm_params in the loss function).

Do you know what might cause this? From my experiments seems to be related to the size of latent_dim.

Cheers :)
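
As a general illustration, and not a diagnosis of this report: a Cholesky factorization fails when the matrix is not positive definite, which happens for instance when a covariance matrix is singular or nearly so. A common numerical remedy is to add a small value to the diagonal before factorizing. The pure-Python sketch below shows both the failure and the remedy on a made-up 2x2 covariance; the function names and the size of the jitter are assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric matrix given as nested lists."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def add_jitter(a, eps: float = 1e-6):
    """Return a copy of `a` with `eps` added to every diagonal entry."""
    return [[v + (eps if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(a)]


# Covariance of two perfectly correlated features: singular by construction.
cov = [[1.0, 1.0], [1.0, 1.0]]

try:
    cholesky(cov)
    failed = False
except ValueError:
    failed = True

factor = cholesky(add_jitter(cov))  # succeeds once the diagonal is nudged
```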

Benchmark different detection algorithms

Benchmark outlier and adversarial detection algorithms across datasets covering at least tabular (numerical/categorical features and low/high dimensionality), image and time series datasets.

Update docs with utils

Add separate section to docs with utility functions which could be useful as standalone functions (e.g. fetching the datasets, transforming categorical to numerical values, data perturbation functions etc).

make fbprophet an optional dependency

Installing fbprophet always seems to take a long time, and it appears to be used by only one detector.

I suggest making it an optional dependency: define it in extras_require and add a simple check that disables that one detector if fbprophet is not installed.
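
A sketch of the suggested check, using only the standard library. The detector list and the extra name in the comment are hypothetical; only the extras_require keyword and importlib.util.find_spec are real.

```python
import importlib.util

# In setup.py the dependency would move into an extra, for example
# (the extra name "prophet" is an assumption):
#     extras_require={"prophet": ["fbprophet"]}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_detectors() -> list:
    """List detector names, adding the Prophet-based one only if fbprophet is present."""
    detectors = ["IForest", "Mahalanobis"]  # illustrative subset
    if is_installed("fbprophet"):
        detectors.append("OutlierProphet")
    return detectors
```

With this pattern a plain install stays light, and users who need the Prophet-based detector opt in through the extra.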

Improve isolation forest with categorical mapping

Investigate whether Isolation Forests (and more generally naive decision tree models like in scikit-learn) can be improved by first mapping the categorical features to numerical space using ABDM and multidim scaling, similar to the example in #83 or the Mahalanobis outlier detector. The reason is that the splitting of ordinally encoded categorical variables is more natural since the preprocessing technique infers an order between the categories.

Add tensorboard support for custom trainer.

add logo

Add datatype metadata to load fn outlier detectors

Generic save and load methods for outlier and concept drift detectors

Error when installing holidays

The holidays package (a dependency of fbprophet) returns an error (https://github.com/dr-prodigy/python-holidays/issues/277). I fixed the issue by pinning the holidays package to version 0.9.11 (pip install --upgrade holidays==0.9.11).

Standardize to use tensorflow

We should use tensorflow.keras exclusively from now on. Mixing keras and tf.keras operations can break stuff:

https://github.com/SeldonIO/odcd/blob/2323831721607441f1d241b7dd5ac71e0c88a158/odcd/cd/model_symmetries.py#L2

Show impact of std_clip for Mahalanobis outlier detector

The std_clip parameter makes the mean and covariance updates more stable. This can be important when the outliers arrive in batches instead of e.g. uniformly over time. The effect is already visualized in https://github.com/SeldonIO/seldon-core/blob/master/components/outlier-detection/mahalanobis/doc.ipynb.

Extend google bucket fetch fn to outlier detectors

pip install alibi-detect issue

When I install alibi-detect, I get:

ERROR: Could not find a version that satisfies the requirement tensorboard<2.2.0,>=2.1.0 (from tensorflow>=2->alibi-detect) (from versions: 1.6.0rc0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.13.1, 1.14.0, 1.15.0, 2.0.0, 2.0.1, 2.0.2)
ERROR: No matching distribution found for tensorboard<2.2.0,>=2.1.0 (from tensorflow>=2->alibi-detect)

My environment is:
Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 12:30:02) [MSC v.1900 64 bit (AMD64)] on win32

Cannot load AEGMM model from the example

I have a problem running the "AEGMM and VAEGMM outlier detection on KDD Cup '99 dataset" notebook. For either model, after downloading its parameters from the cloud, loading fails with this error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-9-7cf85ac01c21> in <module>
      1 filepath = './od_aegmm_kddcup/'  # change to directory where model is downloaded
      2 if load_outlier_detector:  # load pretrained outlier detector
----> 3     od = load_detector(filepath)
      4 else:  # define model, initialize, train and save outlier detector
      5     # the model defined here is similar to the one defined in the original paper

~/dev/alibi-detect/alibi_detect/utils/saving.py in load_detector(filepath)
    414 
    415     # load outlier detector specific parameters
--> 416     state_dict = pickle.load(open(os.path.join(filepath, detector_name + '.pickle'), 'rb'))
    417 
    418     # initialize outlier detector

ModuleNotFoundError: No module named 'odcd'
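
This error is what pickle raises when a stored object refers to a module path that can no longer be imported. One generic workaround is an Unpickler subclass that rewrites the old package name in find_class. The sketch below demonstrates the mechanism with stand-in module names; whether the downloaded files map cleanly from odcd to alibi_detect with an otherwise identical module layout is an assumption.

```python
import io
import pickle
import sys
import types


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects one top-level package name to another."""

    def __init__(self, file, old_root: str, new_root: str):
        super().__init__(file)
        self.old_root = old_root
        self.new_root = new_root

    def find_class(self, module: str, name: str):
        # Rewrite "old_root" and "old_root.sub.module" but leave other modules alone.
        if module == self.old_root or module.startswith(self.old_root + "."):
            module = self.new_root + module[len(self.old_root):]
        return super().find_class(module, name)


def load_renamed(payload: bytes, old_root: str, new_root: str):
    return RenamingUnpickler(io.BytesIO(payload), old_root, new_root).load()


# --- self-contained demonstration with stand-in module names ---
class State:
    def __init__(self, threshold):
        self.threshold = threshold


# Pretend State lives in a package called "old_pkg" and pickle an instance.
State.__module__ = "old_pkg"
old_pkg = types.ModuleType("old_pkg")
old_pkg.State = State
sys.modules["old_pkg"] = old_pkg
payload = pickle.dumps(State(0.5))

# Simulate the rename: "old_pkg" disappears and "new_pkg" takes its place.
del sys.modules["old_pkg"]
State.__module__ = "new_pkg"
new_pkg = types.ModuleType("new_pkg")
new_pkg.State = State
sys.modules["new_pkg"] = new_pkg

restored = load_renamed(payload, "old_pkg", "new_pkg")
```

Applied to the case above, the call would look something like load_renamed(data, "odcd", "alibi_detect") on the bytes of the pickle file, under the layout assumption already stated. Re-saving the detector afterwards would store the new module path.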

Callbacks in generic tensorflow trainer

Basic functionality to cover:

Early stopping
Learning rate scheduling
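
The two items above can be sketched framework-free as follows. All names are hypothetical, the loop only consumes a list of precomputed losses instead of training a model, and a real trainer would call these hooks at the end of each epoch.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def should_stop(self, loss: float) -> bool:
        if loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def step_decay(initial_lr: float, epoch: int, drop: float = 0.5, every: int = 10) -> float:
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * drop ** (epoch // every)


def run(losses, initial_lr: float = 1e-3, patience: int = 2):
    """Walk through per-epoch losses, recording (epoch, lr, loss) until stopped."""
    stopper = EarlyStopping(patience=patience)
    history = []
    for epoch, loss in enumerate(losses):
        lr = step_decay(initial_lr, epoch)
        history.append((epoch, lr, loss))
        if stopper.should_stop(loss):
            break
    return history


history = run([1.0, 0.8, 0.9, 0.85, 0.7])
```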