tnakae / dagmm
DAGMM Tensorflow implementation
License: MIT License
I have an anomaly dataset split into a training set and a test set with 122 features, as follows:
print(X_train2.shape)
print(y_train2.shape)
print(X_test2.shape)
print(y_test2.shape)
(38484, 122)
(38484,)
(38956, 122)
(38956,)
I applied basic Autoencoders to detect the anomalies through an unsupervised approach and got a 75% precision-recall score, which is not bad. Likewise, I applied DAGMM, but the best score I could get so far is only 71%. I expected a score above 75% with DAGMM, but I assume I could not find the optimal parameters, although I tried many different configurations of DAGMM's parameters. I also tried the parameter suggestions from the DAGMM paper. The best configuration I found for this benchmark dataset is below:
modelx6 = DAGMM(
comp_hiddens = [100, 50, 20, 10], comp_activation = tf.nn.tanh,
est_hiddens = [5, 10, 2], est_activation = tf.nn.tanh, est_dropout_ratio = 0.5,
minibatch_size = 1024, epoch_size = 9000, learning_rate = 0.0001, random_seed = 123, lambda1=0.01, lambda2=0.00001
)
The scores under the optimal threshold for the energy are as follows:
Precision = 0.712
Recall = 0.712
F1-Score = 0.712
Can you please suggest how to improve on the 75% score that I got from the Autoencoder?
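The "optimal threshold for the energy" mentioned above can be found with a simple sweep over candidate thresholds. A minimal numpy sketch (the function name and toy data are hypothetical, not part of this repository):

```python
import numpy as np

def best_f1_threshold(energy, y_true):
    """Sweep candidate thresholds over the energy values and return
    (best F1, threshold), treating 'anomaly' as energy >= threshold."""
    best = (0.0, None)
    for th in np.unique(energy):
        pred = (energy >= th).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, th)
    return best

# Toy example: anomalies tend to have higher energy.
energy = np.array([0.1, 0.2, 0.3, 5.0, 6.0])
y_true = np.array([0, 0, 0, 1, 1])
f1, th = best_f1_threshold(energy, y_true)  # here f1 = 1.0 at th = 5.0
```

For a full grid this sweep is O(n²); `sklearn.metrics.precision_recall_curve` does the same thing more efficiently.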
When I replace the KDDCUP 10 PERCENT dataset with other datasets, it always reports the error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
How to fix it?
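This Cholesky failure usually means a GMM component's covariance matrix became singular or nearly singular (for example, a constant feature after scaling, or a component that captured too few samples). Increasing `lambda2`, which in this implementation penalizes small diagonal covariance entries, often helps; another common workaround is adding a small diagonal "jitter" before decomposing. A minimal numpy sketch of the jitter idea (not the library's internal code):

```python
import numpy as np

def safe_cholesky(cov, eps=1e-6, max_tries=5):
    """Try Cholesky; on failure, retry with increasing diagonal jitter.
    Mirrors the usual fix for 'Cholesky decomposition was not successful'."""
    jitter = eps
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
        except np.linalg.LinAlgError:
            jitter *= 10.0
    raise np.linalg.LinAlgError("covariance matrix is too ill-conditioned")

# A singular covariance (second feature is constant) fails without jitter,
# but succeeds once a tiny epsilon is added to the diagonal.
cov = np.array([[1.0, 0.0], [0.0, 0.0]])
L = safe_cholesky(cov)
```

Standardizing the inputs and dropping constant features before training removes the most common cause of this error in the first place.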
Thank you for your DAGMM Tensorflow implementation!
I want to cluster with DAGMM, and I found that the model tends to cluster the data into two categories when I run the example that uses random samples from a mixture of Gaussians.
I tried adjusting different parameters (e.g., # of layers, lambda1, lambda2, dropout_ratio), but it didn't seem to work.
Have you ever encountered such a situation? If so, how did you solve this problem?
Thank you again!
DAGMM does not work on my system because of the error below.
<ipython-input-7-0e3dc57cf735> in <module>
----> 1 model.fit(X_train1)
~/DAGMM/dagmm/dagmm.py in fit(self, x)
120 # Build graph
121 z, x_dash = self.comp_net.inference(input)
--> 122 gamma = self.est_net.inference(z, drop)
123 self.gmm.fit(z, gamma)
124 energy = self.gmm.energy(z)
~/DAGMM/dagmm/estimation_net.py in inference(self, z, dropout_ratio)
59
60 # Softmax output
---> 61 output = tf.contrib.layers.softmax(logits)
62
63 return output
AttributeError: module 'tensorflow.compat.v1' has no attribute 'contrib'
Although I forced TF2 into v1 compatibility mode as follows, it did not work:
# -*- coding: utf-8 -*-
#import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
How can I solve this issue?
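The `tf.compat.v1` shim cannot restore `tf.contrib`, which was removed entirely in TF 2.x. The usual fix (assuming `logits` is the pre-softmax tensor in `estimation_net.py`) is to replace the contrib call with the core op, which exists in both TF1 and TF2:

```python
import tensorflow as tf  # TF 2.x

# tf.contrib was removed in TF 2.x, so in estimation_net.py replace
#   output = tf.contrib.layers.softmax(logits)
# with the core op:
logits = tf.constant([[1.0, 1.0], [0.0, 2.0]])
output = tf.nn.softmax(logits)  # row-wise softmax over the last axis
```

`tf.nn.softmax` is a drop-in replacement here, since both ops normalize over the last axis by default.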
Test code is currently missing from this repository.
Greetings. Thank you for making this software available. The GitHub page states that Python 3 is a prerequisite, but it turns out DAGMM uses Python f-strings, which first appeared in Python 3.6. They don't work in Python 3.5, which is the default in environments such as Ubuntu 16.04.5 on AWS nVidia GPU nodes. If you can update the documentation, that will help.
I noticed that you kept only the instances with label 0 in the training set (in your kddcup99 notebook, input [5]: X_train, y_train = X_train[y_train == 0], y_train[y_train == 0]). However, in my opinion, for an unsupervised learning method you ought not to use the known labels before testing; they are only for evaluating your model's performance. But when I removed this code, I ran into this problem after running 6 epochs:
InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
[[Node: GMM/Cholesky = Cholesky[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GMM/add)]]
I wonder what the reason for this is. Could you give me some ideas? Thanks a lot!
Because input data must be normalized before DAGMM is trained, a normalization option is needed at initialization (using StandardScaler from sklearn, for example).
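Until such an option exists, the preprocessing can be done by hand. A minimal sketch with sklearn's StandardScaler (the DAGMM calls in the final comment are illustrative, not executed here):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Normalize features to zero mean / unit variance before DAGMM training,
# and reuse the *same* scaler (fit on training data only) at test time.
scaler = StandardScaler()
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[2.0, 20.0]])

X_train_s = scaler.fit_transform(X_train)  # fit on train, then transform
X_test_s = scaler.transform(X_test)        # transform test with train stats
# model.fit(X_train_s); energy = model.predict(X_test_s)
```

Fitting the scaler on the training split only avoids leaking test-set statistics into the model.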
Implement CLI Interface
Thank you for making this library available. I followed the requirements below on my virtual environment:
python (3.5-3.6)
Tensorflow <= 1.15
Numpy
sklearn
I use Python 3.5.6, Tensorflow 1.14.0, NumPy 1.16.4, and scikit-learn 0.20.1, but I still get the requirement error below in that virtual environment:
ERROR: Could not find a version that satisfies the requirement DAGMM
ERROR: No matching distribution found for DAGMM
Can you please clarify the requirements? That would be great.
Training data have to be shuffled before training.
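A minimal sketch of such pre-fit shuffling with numpy (the function name is hypothetical), so that minibatches drawn in order are not biased by the file's sort order:

```python
import numpy as np

def shuffled(X, y=None, seed=123):
    """Return a row-shuffled copy of X (and y, if given), using a fixed
    seed so the shuffle is reproducible across runs."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    return X[idx] if y is None else (X[idx], y[idx])

X = np.arange(10).reshape(5, 2)
X_shuf = shuffled(X)  # same rows, random order
```

Shuffling X and y with the same index array keeps features and labels aligned.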
It is not working for x_train.shape = (n, 6), where n is the number of samples.
Hi, I am trying to use the model for another binary classification task, and the energy turned out to be negative. I'm new to GMMs; is this possible? Thanks.
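Negative energy is indeed possible: in DAGMM the sample energy is essentially a negative log-likelihood under the learned GMM, and a probability *density* (unlike a probability) can exceed 1, which makes −log p(z) negative. A tiny self-contained illustration with a narrow 1-D Gaussian:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=0.1):
    """Density of N(mu, sigma^2) at x; can exceed 1 when sigma is small."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p = gaussian_pdf(0.0)   # density at the mean with sigma = 0.1: about 3.99
energy = -math.log(p)   # negative, because p > 1
```

So a negative energy simply means the point sits in a tight, high-density region of the fitted mixture; it is not a bug.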
Implement functions to import data, train, and predict for the test data used in the original paper.