tnakae / dagmm
DAGMM Tensorflow implementation
License: MIT License
I have an anomaly dataset split into a training set and a test set with 122 features, as follows:
print(X_train2.shape)
print(y_train2.shape)
print(X_test2.shape)
print(y_test2.shape)
(38484, 122)
(38484,)
(38956, 122)
(38956,)
I applied basic Autoencoders to detect the anomalies through an unsupervised approach and got a 75% precision-recall score, which is not bad. Likewise, I applied DAGMM, but the best score I could get so far is only 71%. I expected a score above 75% with DAGMM, but I assume I could not find the optimal parameters, although I tried many different configurations of DAGMM's parameters. I also tried the parameter suggestions from the DAGMM paper. The best configuration I found for this benchmark dataset is below:
modelx6 = DAGMM(
comp_hiddens = [100, 50, 20, 10], comp_activation = tf.nn.tanh,
est_hiddens = [5, 10, 2], est_activation = tf.nn.tanh, est_dropout_ratio = 0.5,
minibatch_size = 1024, epoch_size = 9000, learning_rate = 0.0001, random_seed = 123, lambda1=0.01, lambda2=0.00001
)
The scores under the optimal threshold for the energy are as follows:
Precision = 0.712
Recall = 0.712
F1-Score = 0.712
Can you please suggest how to improve on the 75% score that I got from the Autoencoder?
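The "optimal threshold for the energy" mentioned above can be found with a simple sweep over candidate thresholds. A minimal numpy sketch (the function name and toy data are hypothetical, not part of this repository):

```python
import numpy as np

def best_f1_threshold(energy, y_true):
    """Sweep candidate thresholds over the energy values and return
    (best F1, threshold), treating 'anomaly' as energy >= threshold."""
    best = (0.0, None)
    for th in np.unique(energy):
        pred = (energy >= th).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, th)
    return best

# Toy example: anomalies tend to have higher energy.
energy = np.array([0.1, 0.2, 0.3, 5.0, 6.0])
y_true = np.array([0, 0, 0, 1, 1])
f1, th = best_f1_threshold(energy, y_true)  # here f1 = 1.0 at th = 5.0
```

For a full grid this sweep is O(n²); `sklearn.metrics.precision_recall_curve` does the same thing more efficiently.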
When I replace the KDDCUP 10 PERCENT dataset with other datasets, it always reports the error message:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
How to fix it?
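This Cholesky failure usually means a GMM component's covariance matrix became singular or nearly singular (for example, a constant feature after scaling, or a component that captured too few samples). Increasing `lambda2`, which in this implementation penalizes small diagonal covariance entries, often helps; another common workaround is adding a small diagonal "jitter" before decomposing. A minimal numpy sketch of the jitter idea (not the library's internal code):

```python
import numpy as np

def safe_cholesky(cov, eps=1e-6, max_tries=5):
    """Try Cholesky; on failure, retry with increasing diagonal jitter.
    Mirrors the usual fix for 'Cholesky decomposition was not successful'."""
    jitter = eps
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
        except np.linalg.LinAlgError:
            jitter *= 10.0
    raise np.linalg.LinAlgError("covariance matrix is too ill-conditioned")

# A singular covariance (second feature is constant) fails without jitter,
# but succeeds once a tiny epsilon is added to the diagonal.
cov = np.array([[1.0, 0.0], [0.0, 0.0]])
L = safe_cholesky(cov)
```

Standardizing the inputs and dropping constant features before training removes the most common cause of this error in the first place.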
Thank you for your DAGMM Tensorflow implementation!
I want to cluster with DAGMM, and I found that the model tends to cluster the data into two categories when I run the example that uses random samples from a mixture of Gaussians.
I tried adjusting different parameters (e.g., # of layers, lambda1, lambda2, dropout_ratio), but it didn't seem to work.
Have you ever encountered such a situation? If so, how did you solve this problem?
Thank you again!
DAGMM does not work on my system because of the error below.
<ipython-input-7-0e3dc57cf735> in <module>
----> 1 model.fit(X_train1)
~/DAGMM/dagmm/dagmm.py in fit(self, x)
120 # Build graph
121 z, x_dash = self.comp_net.inference(input)
--> 122 gamma = self.est_net.inference(z, drop)
123 self.gmm.fit(z, gamma)
124 energy = self.gmm.energy(z)
~/DAGMM/dagmm/estimation_net.py in inference(self, z, dropout_ratio)
59
60 # Softmax output
---> 61 output = tf.contrib.layers.softmax(logits)
62
63 return output
AttributeError: module 'tensorflow.compat.v1' has no attribute 'contrib'
Although I forced TF2 into v1 compatibility mode as follows, it did not work:
# -*- coding: utf-8 -*-
#import tensorflow as tf
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
How can I solve this issue?
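The `tf.compat.v1` shim cannot restore `tf.contrib`, which was removed entirely in TF 2.x. The usual fix (assuming `logits` is the pre-softmax tensor in `estimation_net.py`) is to replace the contrib call with the core op, which exists in both TF1 and TF2:

```python
import tensorflow as tf  # TF 2.x

# tf.contrib was removed in TF 2.x, so in estimation_net.py replace
#   output = tf.contrib.layers.softmax(logits)
# with the core op:
logits = tf.constant([[1.0, 1.0], [0.0, 2.0]])
output = tf.nn.softmax(logits)  # row-wise softmax over the last axis
```

`tf.nn.softmax` is a drop-in replacement here, since both ops normalize over the last axis by default.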
Test code is currently missing from this repository.
Greetings. Thank you for making this software available. The GitHub page states that Python 3 is a prerequisite, but it turns out DAGMM uses Python f-strings, which first appeared in Python 3.6. They don't work in Python 3.5, which is the default in environments such as Ubuntu 16.04.5 on AWS nVidia GPU nodes. If you can update the documentation, that will help.
I noticed that you kept only the instances with label 0 in the training set (in your kddcup99 notebook, input [5]: X_train, y_train = X_train[y_train == 0], y_train[y_train == 0]). However, in my opinion, for an unsupervised learning method you ought not to use the known labels before testing; they are only for evaluating your model's performance. But when I removed this code, I ran into this problem after running 6 epochs:
InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid.
[[Node: GMM/Cholesky = Cholesky[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GMM/add)]]
I wonder what the reason for this is. Could you give me some ideas? Thanks a lot!
Because input data must be normalized before DAGMM is trained, a normalization option is needed at initialization (using StandardScaler from sklearn, for example).
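Until such an option exists, the preprocessing can be done by hand. A minimal sketch with sklearn's StandardScaler (the DAGMM calls in the final comment are illustrative, not executed here):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Normalize features to zero mean / unit variance before DAGMM training,
# and reuse the *same* scaler (fit on training data only) at test time.
scaler = StandardScaler()
X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X_test = np.array([[2.0, 20.0]])

X_train_s = scaler.fit_transform(X_train)  # fit on train, then transform
X_test_s = scaler.transform(X_test)        # transform test with train stats
# model.fit(X_train_s); energy = model.predict(X_test_s)
```

Fitting the scaler on the training split only avoids leaking test-set statistics into the model.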
Implement CLI Interface
Thank you for making this library available. I followed the requirements below on my virtual environment:
python (3.5-3.6)
Tensorflow <= 1.15
Numpy
sklearn
I use Python 3.5.6, Tensorflow 1.14.0, NumPy 1.16.4, and scikit-learn 0.20.1, but I still get the requirement error below in that virtual environment:
ERROR: Could not find a version that satisfies the requirement DAGMM
ERROR: No matching distribution found for DAGMM
Can you please clarify the requirements? That would be great.
Training data have to be shuffled before training.
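A minimal sketch of such pre-fit shuffling with numpy (the function name is hypothetical), so that minibatches drawn in order are not biased by the file's sort order:

```python
import numpy as np

def shuffled(X, y=None, seed=123):
    """Return a row-shuffled copy of X (and y, if given), using a fixed
    seed so the shuffle is reproducible across runs."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    return X[idx] if y is None else (X[idx], y[idx])

X = np.arange(10).reshape(5, 2)
X_shuf = shuffled(X)  # same rows, random order
```

Shuffling X and y with the same index array keeps features and labels aligned.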
It is not working for x_train.shape = (n, 6), where n is the number of samples.
Hi, I am trying to use the model for another binary classification task, and the energy turned out to be negative. I'm new to GMMs; is this possible? Thanks.
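Negative energy is indeed possible: in DAGMM the sample energy is essentially a negative log-likelihood under the learned GMM, and a probability *density* (unlike a probability) can exceed 1, which makes −log p(z) negative. A tiny self-contained illustration with a narrow 1-D Gaussian:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=0.1):
    """Density of N(mu, sigma^2) at x; can exceed 1 when sigma is small."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

p = gaussian_pdf(0.0)   # density at the mean with sigma = 0.1: about 3.99
energy = -math.log(p)   # negative, because p > 1
```

So a negative energy simply means the point sits in a tight, high-density region of the fitted mixture; it is not a bug.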
Implement functions to import data, train, and predict for the test data used in the original paper.