RCA: A Deep Collaborative Autoencoder Approach for Anomaly Detection

This is the official implementation of Robust Collaborative Autoencoders (RCA).

Paper Abstract

Unsupervised anomaly detection plays a crucial role in many critical applications. Driven by the success of deep learning, recent years have witnessed a growing interest in applying deep neural networks (DNNs) to anomaly detection problems. A common approach is using autoencoders to learn a feature representation for the normal observations in the data. The reconstruction error of the autoencoder is then used as outlier score to detect the anomalies. However, due to the high complexity brought upon by the over-parameterization of DNNs, the reconstruction error of the anomalies could also be small, which hampers the effectiveness of these methods. To alleviate this problem, we propose a robust framework using collaborative autoencoders to jointly identify normal observations from the data while learning its feature representation. We investigate the theoretical properties of the framework and empirically show its outstanding performance as compared to other DNN-based methods. Our experimental results also show the resiliency of the framework to missing values compared to other baseline methods.

RCA vs Autoencoder

  1. RCA uses multiple autoencoders (we found that two are usually enough).
  2. For each minibatch, RCA updates the model using only the samples with small reconstruction loss, whereas a standard autoencoder (AE) updates on every sample in the minibatch.
  3. Each RCA autoencoder passes its selected samples to the other autoencoder.
  4. RCA keeps dropout active during evaluation to obtain multiple anomaly scores, whereas a standard autoencoder uses dropout only during training.
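
Points 2 and 3 can be illustrated with a minimal pure-Python sketch. It is not the code in trainRCA.py: the function names and the `keep_ratio` parameter are hypothetical, and the README does not state how many samples are kept per minibatch.

```python
def select_small_loss(losses, keep_ratio):
    """Return the indices of the samples with the smallest reconstruction loss."""
    # keep_ratio is a hypothetical parameter: the fraction of the minibatch to keep.
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def exchange_selection(losses_a, losses_b, keep_ratio):
    """Each autoencoder picks its low-loss samples and hands them to the other."""
    picked_by_a = select_small_loss(losses_a, keep_ratio)
    picked_by_b = select_small_loss(losses_b, keep_ratio)
    # Autoencoder A is updated on B's picks, and B on A's picks.
    return {"update_a_with": picked_by_b, "update_b_with": picked_by_a}


# Per-sample reconstruction losses of one minibatch under two autoencoders (toy values).
losses_a = [0.10, 0.90, 0.20, 0.15]
losses_b = [0.12, 0.11, 0.95, 0.30]
plan = exchange_selection(losses_a, losses_b, keep_ratio=0.5)
print(plan)
```

In a real training loop, the returned indices would select the rows of the minibatch on which each autoencoder's loss is backpropagated.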

Conda Environment

We provide the conda virtual environment in environment.yml.

Data

We use the ODDS datasets. The preprocessed data is in the data folder; you first need to unzip the data.rar file. More details can be found on the official page of the ODDS datasets.

Example:

Run RCA on vowels:

python3 trainRCA.py --data vowels --missing_ratio 0

Run RCA on pima:

python3 trainRCA.py --data pima --missing_ratio 0

Run RCA on vowels with 10% missing values and mean imputation:

python3 trainRCA.py --data vowels --missing_ratio 0.1

Run RCA with k autoencoders (replace k with the number of autoencoders):

python3 trainRCAMulti.py --data vowels --missing_ratio 0.0 --n_member k
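
The missing-value example above combines a missing ratio with mean imputation. The sketch below shows one way the two steps can fit together. It is an illustration only: `mask_entries` and `mean_impute` are hypothetical helpers, and the repository's own masking scheme may differ (for example, in how entries are sampled).

```python
import random


def mask_entries(rows, missing_ratio, seed=0):
    """Replace each entry with None independently with probability missing_ratio."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_ratio else v for v in row] for row in rows]


def mean_impute(rows):
    """Fill each None with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    filled = [list(row) for row in rows]
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Hypothetical fallback: use 0.0 when an entire column is missing.
        col_mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled


toy = [[1.0, None], [3.0, 4.0], [None, 8.0]]
imputed = mean_impute(toy)  # observed column means are 2.0 and 6.0
print(imputed)

# With missing_ratio=0.0 nothing is masked, matching the "--missing_ratio 0" runs.
unmasked = mask_entries([[1.0, 2.0], [3.0, 4.0]], missing_ratio=0.0)
```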

Hyperparameters

In unsupervised anomaly detection, no clean validation data is available for tuning hyperparameters. We therefore use the same hyperparameters across all datasets to show that our method does not depend heavily on hyperparameter tuning.

batch size = 128

learning rate = 3e-4 (Adam optimizer)

hidden dimension = 256

bottleneck dimension = 10

The network structure is defined in models/RCA.py. Currently, we use a 6-layer autoencoder.
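
For orientation, the hyperparameters above can be gathered into one configuration object, as sketched below. The `RCAConfig` class and `layer_shapes` function are hypothetical, and the symmetric three-encoder, three-decoder layout is only one plausible reading of "6-layer". The authoritative definition is in models/RCA.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RCAConfig:
    batch_size: int = 128
    learning_rate: float = 3e-4  # used with the Adam optimizer
    hidden_dim: int = 256
    bottleneck_dim: int = 10


def layer_shapes(input_dim, cfg):
    """Return (in, out) sizes for six fully connected layers.

    ASSUMPTION: three encoder layers followed by three mirrored decoder layers.
    This layout is a guess, not taken from models/RCA.py.
    """
    widths = [
        input_dim, cfg.hidden_dim, cfg.hidden_dim, cfg.bottleneck_dim,
        cfg.hidden_dim, cfg.hidden_dim, input_dim,
    ]
    return list(zip(widths[:-1], widths[1:]))


cfg = RCAConfig()
shapes = layer_shapes(input_dim=12, cfg=cfg)  # input_dim=12 is a placeholder value
print(shapes)
```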

Baselines

We implement several baselines. Our implementations of one-class SVM, SO-GAAL, and isolation forest are based on the pyod implementations. The pyod authors also provide an official benchmark on the ODDS datasets, which can be found here.

We implemented DAGMM and the deep one-class baseline (DeepSVDD) ourselves. Our DAGMM implementation relies heavily on this third-party implementation, and we found DAGMM to be highly numerically unstable on the ODDS datasets. For DeepSVDD, we train the autoencoder for 50 epochs as the initialization.

Acknowledgements

This research is funded by NSF-IIS 2006633, EF1638679, NSF-IIS-1749940, Office of Naval Research N00014-20-1-2382, and National Institute on Aging RF1AG072449.

Contributors

liuboyang93
