

MutantX-S

MutantX-S is a static malware classification system. It uses feature hashing and prototype-based clustering to reduce computational cost, which lets the system scale to large datasets. This work is our Python rendition of MutantX-S, built from the logic and algorithms outlined in the following paper:

  • X. Hu, S. Bhatkar, K. Griffin, and K. G. Shin. 2013. MutantX-S: Scalable Malware Clustering Based on Static Features. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference (San Jose, CA) (USENIX ATC’13). USENIX Association, USA, 187–198.
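The prototype-based clustering step described above can be sketched as follows. This is a minimal, pure-Python illustration of the technique as we read it from Hu et al., not the code in this repository. It assumes samples are already dense feature vectors, uses Euclidean distance, treats p_max as the largest distance allowed between a sample and its nearest prototype, and treats d_min as the distance below which two prototypes are merged into one cluster. The helper names (euclidean, extract_prototypes, cluster_samples) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extract_prototypes(points: List[Sequence[float]], p_max: float) -> List[int]:
    """Farthest-first selection: keep adding the sample farthest from every
    existing prototype until all samples lie within p_max of one."""
    if p_max < 0:
        raise ValueError("p_max must be non-negative")
    if not points:
        return []
    prototypes = [0]  # arbitrary starting prototype: the first sample
    nearest = [euclidean(p, points[0]) for p in points]
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] <= p_max:
            return prototypes
        prototypes.append(far)
        # Each sample's distance to its closest prototype can only shrink.
        nearest = [min(d, euclidean(p, points[far])) for d, p in zip(nearest, points)]


def cluster_samples(points: List[Sequence[float]], p_max: float, d_min: float) -> List[int]:
    """Label each sample with a cluster id (the index of a representative prototype)."""
    protos = extract_prototypes(points, p_max)
    # Assign every sample to its nearest prototype.
    owner = [min(protos, key=lambda j: euclidean(p, points[j])) for p in points]

    # Union-find over prototypes: link any pair closer than d_min (single linkage).
    parent: Dict[int, int] = {j: j for j in protos}

    def find(j: int) -> int:
        while parent[j] != j:
            parent[j] = parent[parent[j]]  # path halving
            j = parent[j]
        return j

    for i, a in enumerate(protos):
        for b in protos[i + 1:]:
            if euclidean(points[a], points[b]) < d_min:
                parent[find(a)] = find(b)
    return [find(o) for o in owner]
```

The saving comes from comparing samples only against prototypes: prototype selection costs roughly (number of samples) × (number of prototypes) distance computations instead of one per pair of samples, and the pairwise merge step runs over the much smaller prototype set.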

The goal of this work is to enable benchmarking between malware classifiers. The original work used extracted opcodes as features, taken from a private dataset. Our rendition instead uses the open-source EMBER dataset, with function imports as features; using an open-source dataset ensures that any experiments run on the system are reproducible.
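A minimal sketch of turning EMBER records into hashed import features follows. It assumes the EMBER JSON files are JSON-lines files whose records carry an "md5" key and an "imports" object mapping each DLL name to a list of imported function names. Forming n-grams over consecutive imports in file order, the "dll:function" token format, the CRC32 hash, and the 4096-bucket vector size are illustrative assumptions rather than this repository's choices; select_imports and hashed_ngram_vector are hypothetical names.

```python
import json
import zlib
from typing import Dict, Iterable, List, Set


def select_imports(lines: Iterable[str], md5s: Set[str]) -> Dict[str, List[str]]:
    """Map each requested MD5 to its imports as "dll:function" tokens.

    Assumed record shape (one JSON object per line):
    {"md5": "...", "imports": {"KERNEL32.dll": ["CreateFileA", ...], ...}}
    """
    selected: Dict[str, List[str]] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("md5") in md5s:
            selected[record["md5"]] = [
                f"{dll.lower()}:{func}"
                for dll, funcs in record.get("imports", {}).items()
                for func in funcs
            ]
    return selected


def hashed_ngram_vector(tokens: List[str], n: int, dim: int = 4096) -> List[float]:
    """Feature hashing: count each n-gram of consecutive tokens in one of
    `dim` buckets, then L2-normalise the vector."""
    vec = [0.0] * dim
    for i in range(len(tokens) - n + 1):
        gram = "|".join(tokens[i:i + n]).encode("utf-8")
        # CRC32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(gram) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec
```

In use, the MD5 file would be read into a set (one MD5 per line) and each EMBER file opened and passed line by line to select_imports. A fixed dim bounds memory no matter how many distinct import n-grams appear, at the cost of occasional hash collisions.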

We used this work as a benchmark for our malware clustering system COUGAR. Results from that comparison can be found in the following paper:

  • N. MacAskill, Z. Wilkins and N. Zincir-Heywood, "Scaling Multi-Objective Optimization for Clustering Malware," 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 2021, pp. 1-8, doi: 10.1109/SSCI50451.2021.9659925.

Usage

To get the system up and running, JSON files of malware data must first be downloaded from EMBER. Each sample in these files has a corresponding MD5. The user must also supply a file listing the MD5s of the malware samples to be clustered, one per line. These files, along with the chosen parameters, can be passed to MutantX-S as command-line arguments in the following order:

n_gram_size p_max d_min md5_file json_files

with each JSON file separated by a space. For more on n_gram_size, p_max and d_min, see the Hu et al. paper cited above. If no arguments are passed on the command line, the user is prompted to enter each one individually.
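The argument handling described above (positional arguments in a fixed order, with interactive prompts as a fallback) can be sketched with only the standard library. The Params type, the parse_params name, and the choice of int for n_gram_size and float for p_max and d_min are assumptions for illustration, not taken from the repository.

```python
from typing import List, NamedTuple


class Params(NamedTuple):
    n_gram_size: int      # assumed integer
    p_max: float          # assumed real-valued threshold
    d_min: float          # assumed real-valued threshold
    md5_file: str
    json_files: List[str]


def parse_params(argv: List[str]) -> Params:
    """Read n_gram_size p_max d_min md5_file json_files... in that order,
    prompting for each value when no arguments were given."""
    if not argv:
        argv = [
            input("n_gram_size: "),
            input("p_max: "),
            input("d_min: "),
            input("md5_file: "),
            *input("json_files (space-separated): ").split(),
        ]
    if len(argv) < 5:
        raise SystemExit(
            "usage: n_gram_size p_max d_min md5_file json_file [json_file ...]"
        )
    return Params(int(argv[0]), float(argv[1]), float(argv[2]), argv[3], argv[4:])
```

A caller would typically pass sys.argv[1:], so running the script with no arguments falls through to the prompts, and every argument after md5_file is collected as a JSON file.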

Contributors

noahmacaskill, znwilkins
