nickwallen / botnet-dga-classifier
Identifies domains created by Domain Generating Algorithms commonly used by Botnets.
The required data files are currently downloaded into the R session's temp directory, which changes with each R session. This is not typically an issue except when generating R Markdown documents. Each R Markdown document is built in a new session, so the data files are downloaded again on every build.
The initial results give roughly twice as much importance to length as to dictionary words. This may simply be because the magnitude of length (min = 1, mean = 10.98, max = 63) is greater than the magnitude of dictionary words (min = 0, mean = 0.13, max = 1.0).
Simply centering and scaling these features should resolve the issue.
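As a rough illustration of centering and scaling, the sketch below converts each feature to z-scores (subtract the mean, divide by the sample standard deviation), which is the same kind of transformation that base R's scale() applies by default. It is written in Python rather than the project's R, and the function name and sample values are made up for the example.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values, not taken from the project's data set.
lengths = [6, 8, 12, 27, 31]
dictionary = [1.0, 0.5, 0.25, 0.0, 0.1]

z_lengths = standardize(lengths)
z_dictionary = standardize(dictionary)
```

After this step both features have mean 0 and standard deviation 1, so a difference in raw magnitude alone can no longer favor one feature over the other.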
By default, most models use a cutoff probability of 50% to classify. Take a look at the Receiver Operating Characteristic (ROC) curve to determine whether a different cutoff can improve overall sensitivity.
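The effect of the cutoff can be checked by hand before drawing a full ROC curve. The sketch below, in Python rather than the project's R, computes sensitivity and specificity at a few cutoffs. The probabilities and labels are invented for the example (1 = malicious, 0 = legit) and the function name is hypothetical.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Return (sensitivity, specificity) when p >= cutoff means malicious."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Hypothetical predicted probabilities of "malicious" and true labels.
probs = [0.05, 0.20, 0.35, 0.45, 0.60, 0.10, 0.30, 0.40, 0.55, 0.15]
labels = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]

for cutoff in (0.5, 0.4, 0.3):
    sens, spec = sensitivity_specificity(probs, labels, cutoff)
    print(f"cutoff={cutoff:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data, lowering the cutoff raises sensitivity at some cost in specificity, which is the trade-off a ROC curve shows across all cutoffs.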
If the minimum length of an acceptable dictionary word is 2, we can see anomalies like the following.
legit: facebook.com has 6 dictionary words
malicious: trkirbiintwifllswdawckvinon.ru has 17 dictionary words
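A minimal sketch of how such counts can arise is shown below, in Python rather than the project's R. It assumes the feature counts distinct dictionary words that occur as substrings of the first domain label. The tiny word list is hypothetical, so the counts differ from the 6 and 17 reported above, which depend on the dictionary the project actually uses.

```python
def count_dictionary_words(domain, dictionary, min_length=2):
    """Count distinct dictionary words found as substrings of the first label."""
    name = domain.lower().split(".")[0]
    found = set()
    for start in range(len(name)):
        for end in range(start + min_length, len(name) + 1):
            piece = name[start:end]
            if piece in dictionary:
                found.add(piece)
    return len(found)


# Hypothetical word list for illustration only.
WORDS = {"ace", "boo", "book", "face", "ok", "in", "if", "on", "no", "kin"}

for domain in ("facebook.com", "trkirbiintwifllswdawckvinon.ru"):
    for min_length in (2, 4):
        n = count_dictionary_words(domain, WORDS, min_length)
        print(f"{domain} min_length={min_length} words={n}")
```

With this word list, every match in the malicious example is a two-letter word, so raising the minimum length removes them while the legit example keeps its longer matches.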
As currently implemented, the feature does not seem to distinguish legit from malicious domains as well as it could. We also know that malicious domains tend to have more characters in general.
The model is only able to correctly classify 9.2% of malicious domains (sensitivity). How can this be visualized?
One option is to plot only the malicious domains: a scatter plot of length versus dictionary words, colored by whether each domain was classified correctly.
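The data preparation for such a plot might look like the sketch below, written in Python rather than the project's R. The record fields, the sample values, and the function names are all hypothetical. Only the malicious rows are kept, and each point is flagged by whether the model caught it at the given cutoff.

```python
def malicious_scatter_points(records, cutoff=0.5):
    """Keep malicious records and flag whether each was classified correctly."""
    points = []
    for r in records:
        if r["label"] != "malicious":
            continue
        points.append({
            "x": r["length"],            # domain length
            "y": r["dictionary"],        # dictionary word feature
            "correct": r["prob"] >= cutoff,
        })
    return points


def sensitivity(points):
    """Share of malicious points that were classified as malicious."""
    return sum(p["correct"] for p in points) / len(points) if points else 0.0


# Hypothetical records; "prob" is the predicted probability of malicious.
records = [
    {"length": 8, "dictionary": 1.0, "prob": 0.05, "label": "legit"},
    {"length": 27, "dictionary": 0.1, "prob": 0.80, "label": "malicious"},
    {"length": 12, "dictionary": 0.4, "prob": 0.30, "label": "malicious"},
    {"length": 10, "dictionary": 0.5, "prob": 0.20, "label": "malicious"},
    {"length": 30, "dictionary": 0.0, "prob": 0.90, "label": "malicious"},
]

points = malicious_scatter_points(records)
print(sensitivity(points))
```

The x and y values can then be handed to any plotting library, with the correct flag mapped to color, to see where in the feature space the misclassified malicious domains fall.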
The length of the domain name itself is likely to be an important differentiator between malicious and benign domain names.
The number of valid dictionary words embedded as substrings within the domain is likely to be a good differentiator between malicious (DGA-generated) and benign domain names.