
dagmm's People

Contributors

tnakae


dagmm's Issues

Tuning Parameters of DAGMM

I have an anomaly-detection dataset with 122 features, split into a training set and a test set as follows.

print(X_train2.shape)
print(y_train2.shape)
print(X_test2.shape)
print(y_test2.shape)
(38484, 122)
(38484,)
(38956, 122)
(38956,)

I applied a basic autoencoder to detect the anomalies with an unsupervised approach and got a 75% precision-recall score, which is not bad. Likewise, I applied DAGMM, but the best score I could get so far is 71%. I expected a score above 75% with DAGMM, so I assume I have not found optimal parameters, although I tried many different configurations of DAGMM's parameters. I also tried the parameter suggestions from the paper (the DAGMM paper you focused on). The best configuration I found for this benchmark dataset is below:

modelx6 = DAGMM(
    comp_hiddens=[100, 50, 20, 10], comp_activation=tf.nn.tanh,
    est_hiddens=[5, 10, 2], est_activation=tf.nn.tanh, est_dropout_ratio=0.5,
    minibatch_size=1024, epoch_size=9000, learning_rate=0.0001,
    random_seed=123, lambda1=0.01, lambda2=0.00001,
)

The scores under the optimal threshold on the energy are as follows:

Precision = 0.712
 Recall    = 0.712
 F1-Score  = 0.712

Can you please suggest how to improve this score beyond the 75% I got from the autoencoder?
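For reference, a minimal sketch of how the "optimal threshold" on the energy can be chosen, by scanning candidate thresholds and keeping the one with the best F1. The data here is a toy example, not this benchmark dataset:

```python
import numpy as np

# Toy energies and labels (illustrative only); higher energy = more anomalous.
energy = np.array([0.1, 0.2, 0.3, 2.5, 3.0, 0.15, 2.8, 0.25])
y_true = np.array([0,   0,   0,   1,   1,   0,    1,   0])

def f1_at(threshold):
    """F1 score when points with energy >= threshold are flagged anomalous."""
    y_pred = (energy >= threshold).astype(int)
    tp = ((y_pred == 1) & (y_true == 1)).sum()
    fp = ((y_pred == 1) & (y_true == 0)).sum()
    fn = ((y_pred == 0) & (y_true == 1)).sum()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

# Scan percentiles of the energy distribution and keep the best threshold.
candidates = np.percentile(energy, np.arange(1, 100))
best = max(candidates, key=f1_at)
```

On real data, the scan should use a validation split rather than the test set, otherwise the reported precision/recall is optimistically biased.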

Cholesky decomposition problem

When I replace the KDDCUP 10 PERCENT dataset with other datasets, it always reports the error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
How can I fix it?
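A common workaround for this class of error (not code from this repository) is to add a small diagonal "jitter" to the covariance matrix before the Cholesky decomposition, so it stays positive definite even when a GMM component nearly collapses:

```python
import numpy as np

def safe_cholesky(cov, eps=1e-6):
    """Cholesky with diagonal regularization: decomposes cov + eps * I,
    which is positive definite even if cov itself is singular."""
    jitter = eps * np.eye(cov.shape[0])
    return np.linalg.cholesky(cov + jitter)

# A rank-deficient covariance fails plain Cholesky but passes with jitter:
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])  # singular (rank 1)
L = safe_cholesky(cov)
```

The `eps` value is a trade-off: large enough to avoid numerical failure, small enough not to distort the learned densities.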

The model tends to cluster data into two categories.

Thank you for your DAGMM Tensorflow implementation!
I want to cluster with DAGMM, and I found that the model tends to cluster the data into two categories when I run the example that uses random samples from a mixture of Gaussians.
I tried adjusting different parameters (e.g., number of layers, lambda1, lambda2, dropout_ratio), but none of it seemed to help.
Have you ever encountered such a situation? If so, how did you solve this problem?
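To make the report reproducible, a sketch of the kind of toy data described (random samples from a mixture of Gaussians). The three component means and unit covariances here are illustrative, not the repository's actual example:

```python
import numpy as np

# Three well-separated Gaussian components in 2-D; a clustering model
# should recover three categories, not two.
rng = np.random.RandomState(123)
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0]), np.array([-5.0, 5.0])]
n_per = 300
X = np.vstack([rng.multivariate_normal(m, np.eye(2), n_per) for m in means])
labels = np.repeat(np.arange(3), n_per)  # ground-truth component ids
```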
Thank you again!

'Tensorflow' has no attribute 'contrib'

DAGMM doesn't work on my system because of the error below.

<ipython-input-7-0e3dc57cf735> in <module>
----> 1 model.fit(X_train1)

~/DAGMM/dagmm/dagmm.py in fit(self, x)
    120             # Build graph
    121             z, x_dash  = self.comp_net.inference(input)
--> 122             gamma = self.est_net.inference(z, drop)
    123             self.gmm.fit(z, gamma)
    124             energy = self.gmm.energy(z)

~/DAGMM/dagmm/estimation_net.py in inference(self, z, dropout_ratio)
     59 
     60             # Softmax output
---> 61             output = tf.contrib.layers.softmax(logits)
     62 
     63         return output

AttributeError: module 'tensorflow.compat.v1' has no attribute 'contrib'

Although I forced TF2 into v1 compatibility mode as follows, it did not work.

# -*- coding: utf-8 -*-
#import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior() 

How can I solve this issue?
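One likely fix: `tf.contrib` was removed entirely in TensorFlow 2.x, so it is unavailable even under `tf.compat.v1`. The core op `tf.nn.softmax` computes the same function and exists in both TF1 and `tf.compat.v1`, so the one-line change in `dagmm/estimation_net.py` would be replacing the `tf.contrib.layers.softmax(logits)` call with `tf.nn.softmax(logits)`. For reference, softmax just normalizes the logits into a probability distribution:

```python
import numpy as np

# Equivalent fix in estimation_net.py (shown as a comment; requires TF):
#     output = tf.nn.softmax(logits)   # instead of tf.contrib.layers.softmax(logits)

def softmax(logits):
    """Numerically stable softmax over the last axis (what both TF ops compute)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

gamma = softmax(np.array([[2.0, 1.0, 0.1]]))  # mixture responsibilities
```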

at least python3.6 is required

Greetings. Thank you for making this software available. The GitHub page states that Python 3 is a prerequisite, but it turns out DAGMM uses Python f-strings, which first appeared in Python 3.6. They don't work in Python 3.5, which is the default in environments such as Ubuntu 16.04.5 on AWS NVIDIA GPU nodes. Updating the documentation would help.
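For anyone stuck on Python 3.5, f-strings (PEP 498) can be rewritten with `str.format`, which produces identical output and works on 3.5:

```python
# f-strings require Python >= 3.6; on 3.5 they raise a SyntaxError at import time.
epoch, loss = 3, 0.125
msg_f = f"epoch {epoch}: loss={loss}"               # Python 3.6+
msg_fmt = "epoch {}: loss={}".format(epoch, loss)   # Python 3.5-compatible
```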

Set random seed

  • Set the random seed at the beginning of fit().
  • Add an option so the user can change the random seed.
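A hypothetical sketch of what this could look like (the helper name `seeded_fit` and its signature are illustrative, not this repository's API). The TensorFlow call is shown as a comment because it needs TF 1.x:

```python
import numpy as np

# Inside the model's graph scope one would also call:
#     tf.set_random_seed(self.random_seed)
def seeded_fit(x, random_seed=123):
    """Seed NumPy at the top of fit() so minibatch shuffling is reproducible."""
    np.random.seed(random_seed)
    idx = np.random.permutation(len(x))  # e.g. a reproducible shuffle
    return x[idx]

a = seeded_fit(np.arange(10), random_seed=42)
b = seeded_fit(np.arange(10), random_seed=42)  # identical to a
```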

remove data with normal label in training set

I noticed that you removed all normal-labeled instances from the training set (in your kddcup99 notebook, input [5]: X_train, y_train = X_train[y_train == 0], y_train[y_train == 0]). However, in my opinion, an unsupervised learning method ought not to use known labels before testing; they are only for evaluating your model's performance. But when I tried to remove this code, I met this problem after running 6 epochs:

InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
	 [[Node: GMM/Cholesky = Cholesky[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GMM/add)]]

I wonder what causes this. Could you give me some ideas? Thanks a lot!

Normalization of Input Data

Because input data must be normalized before being fed to DAGMM, a normalization option is needed at initialization (using sklearn's StandardScaler, for example).

Requirements

Thank you for making this library available. I followed the requirements below in my virtual environment:

python (3.5-3.6)
Tensorflow <= 1.15
Numpy
sklearn

I use Python 3.5.6, TensorFlow 1.14.0, NumPy 1.16.4, and scikit-learn 0.20.1, but I still get the requirement errors below in that virtual environment:

ERROR: Could not find a version that satisfies the requirement DAGMM
ERROR: No matching distribution found for DAGMM

Can you please clarify the requirements? That would be great.
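That pip error usually means the package is simply not published on PyPI, so `pip install DAGMM` has nothing to resolve; it is unrelated to the Python/TensorFlow versions. Assuming the code lives in the tnakae/DAGMM GitHub repository (as the tracebacks in this thread suggest), installing from source would look like:

```shell
# DAGMM is not on PyPI; install it from the cloned repository instead.
git clone https://github.com/tnakae/DAGMM
cd DAGMM
pip install -e .
```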

Negative Energy

Hi, I am trying to use the model for another binary classification task, and the energy turned out to be negative. I'm new to GMMs; is this possible? Thanks.
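Negative energies are in fact possible: the sample energy is a negative log-density, and a probability *density* (unlike a probability) can exceed 1, which makes its negative log negative. A small demonstration with a narrow 1-D Gaussian:

```python
import numpy as np

def gaussian_pdf(z, mu=0.0, sigma=0.1):
    """Density of N(mu, sigma^2) at z; can exceed 1 when sigma is small."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

density = gaussian_pdf(0.0, sigma=0.1)  # peak density of a narrow Gaussian, > 1
energy = -np.log(density)               # negative log-density, so < 0 here
```

What matters for anomaly detection is the *ranking* of energies, not their sign: anomalies still get higher energy than normal points.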

Testdata Evaluation Function

Implement functions to import data, train, and predict for the test datasets used in the original paper.

  • KDDCup Data
  • Thyroid Data
  • Arrhythmia Data
