

Malware-detection-of-PE-files



💻 Working demo of the FastAPI app using its Swagger UI.

This project is a malware detection system built with machine learning and a CNN, with the models deployed using FastAPI. The main steps taken to get the results were:

  1. Dataset collection
  2. Feature selection
  3. Data preprocessing
  4. Model building
  5. Deployment with FastAPI

🧠 The project uses two models:
1. RandomForestClassifier: the first model is trained on characteristics of the different sections of portable executable (PE) files, which lets us classify whether a given input file is malicious or not.
2. CNN model: this model is trained on 9639 malware images from 25 malware families, and we use it to classify the malware detected by the first model into one of those 25 families.
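The two models form a pipeline: the RandomForest decides whether a file is malicious, and only flagged files go on to the CNN. The sketch below illustrates that control flow; it is not the project's code. It assumes both models expose a scikit-learn-style `predict_proba` returning one row of class probabilities per sample, that class 1 means "malicious", and a 0.5 threshold. `classify_file`, `StubModel`, and the feature arguments are hypothetical names. A Keras model exposes `predict` rather than `predict_proba`, so in practice the CNN would need a thin wrapper.

```python
from typing import List, Sequence


class StubModel:
    """HYPOTHETICAL stand-in model that returns fixed probabilities per sample."""

    def __init__(self, probs: Sequence[float]):
        self.probs = list(probs)

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        # One row of class probabilities per input sample, like scikit-learn.
        return [list(self.probs) for _ in X]


def classify_file(pe_features, image_features, detector, family_model,
                  family_names, threshold=0.5):
    """Two-stage sketch: detect first, then assign a family only if malicious."""
    # Stage 1: column 1 is ASSUMED to be the "malicious" class.
    p_malicious = detector.predict_proba([pe_features])[0][1]
    result = {
        "malicious": p_malicious >= threshold,
        "confidence_pct": round(p_malicious * 100, 2),
    }
    if not result["malicious"]:
        return result
    # Stage 2: pick the most probable family for the flagged file.
    family_probs = family_model.predict_proba([image_features])[0]
    best = max(range(len(family_probs)), key=lambda i: family_probs[i])
    result["family"] = family_names[best]
    result["family_confidence_pct"] = round(family_probs[best] * 100, 2)
    return result


if __name__ == "__main__":
    detector = StubModel([0.1, 0.9])        # hypothetical: 90% malicious
    families = StubModel([0.2, 0.7, 0.1])   # hypothetical 3-family model
    print(classify_file([0.0], [0.0], detector, families, ["A", "B", "C"]))
```

In scikit-learn, `predict_proba(X)[0][1]` is the probability of the second entry in `classes_` for the first sample, which is the malicious class only when `classes_` is `[0, 1]`; multiplying by 100 just expresses it as a percentage.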
Starting with the first model: since we are working with portable executable files, we need to understand the structure of PE files and which characteristics matter most. You can find an explanation of the PE structure [here](https://tech-zealots.com/malware-analysis/pe-portable-executable-structure-malware-analysis-part-2/).
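Section characteristics are stored as a bit field in each PE section header. As a minimal sketch, here is one way to turn a section into numeric features: decode a few `IMAGE_SCN_*` flags from the PE/COFF specification and compute the section's Shannon entropy (packed or encrypted data tends toward 8 bits per byte). The columns of the Kaggle dataset are not listed here, so the feature names in the hypothetical `section_features` helper are assumptions. With pefile, the inputs would come from a section object's `Characteristics`, `get_data()`, and `Misc_VirtualSize`; pefile also offers `get_entropy()` directly.

```python
import math
from collections import Counter
from typing import Dict, Set

# Section flag values from the PE/COFF specification (IMAGE_SCN_*).
SECTION_FLAGS = {
    "CNT_CODE": 0x00000020,
    "CNT_INITIALIZED_DATA": 0x00000040,
    "CNT_UNINITIALIZED_DATA": 0x00000080,
    "MEM_EXECUTE": 0x20000000,
    "MEM_READ": 0x40000000,
    "MEM_WRITE": 0x80000000,
}


def decode_characteristics(value: int) -> Set[str]:
    """Return the names of the known flags set in a section's Characteristics."""
    return {name for name, bit in SECTION_FLAGS.items() if value & bit}


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for empty/constant data, up to 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def section_features(characteristics: int, data: bytes,
                     virtual_size: int) -> Dict[str, object]:
    """HYPOTHETICAL feature set for one section (names are assumptions)."""
    flags = decode_characteristics(characteristics)
    return {
        "entropy": shannon_entropy(data),
        "raw_size": len(data),
        "virtual_size": virtual_size,
        "executable": "MEM_EXECUTE" in flags,
        "writable": "MEM_WRITE" in flags,
        # Sections that are both writable and executable are a common red flag.
        "write_and_execute": {"MEM_EXECUTE", "MEM_WRITE"} <= flags,
    }
```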

📚 The datasets used:

  1. The dataset used to train the first model is available here https://www.kaggle.com/amauricio/pe-files-malwares
  2. The second dataset, the malware image dataset, is the pre-generated Malimg dataset of grayscale malware images from 25 families. You can download it here: https://www.dropbox.com/s/ep8qjakfwh1rzk4/malimg_dataset.zip?dl=0

These are the datasets used to build two separate models that, in the end, work together as a pipeline. The datasets are somewhat old (published in 2017); they were relevant when this project was made, but you may want to change the data source or build a dataset yourself using some of the utility functions in this project's scripts. Because there wasn't enough time to test the entire project on thousands of PE files, retraining is not part of the pipeline, but the retraining code is in the scripts and you can run it once enough new data is available.
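Malimg-style images are made by reading a binary's raw bytes as 8-bit grayscale pixels and wrapping them into rows of a fixed width. Below is a minimal, standard-library-only sketch of that conversion, in case you build your own image dataset. The width table in the hypothetical `malimg_width` helper follows the file-size-to-width table commonly cited from the original Malimg paper; treat it as an assumption and adjust it (or pass a fixed `width`) to match your CNN's preprocessing. Zero-padding the last row is a choice made here, not necessarily what this project's utility functions do.

```python
from typing import List, Optional

# ASSUMPTION: file-size -> image-width table commonly cited from the Malimg
# paper, as (upper bound in KB, width in pixels); 1000 KB and above use 1024.
_WIDTH_TABLE = ((10, 32), (30, 64), (60, 128), (100, 256),
                (200, 384), (500, 512), (1000, 768))


def malimg_width(num_bytes: int) -> int:
    """Pick an image width from the file size."""
    kb = num_bytes / 1024
    for limit_kb, width in _WIDTH_TABLE:
        if kb < limit_kb:
            return width
    return 1024


def bytes_to_grayscale(data: bytes, width: Optional[int] = None) -> List[List[int]]:
    """Each byte becomes one pixel (0 = black, 255 = white), row-major."""
    if width is None:
        width = malimg_width(len(data))
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad the final partial row
        rows.append(row)
    return rows
```

To save the rows as an image you could convert them to a 2-D `uint8` NumPy array and pass it to Pillow's `Image.fromarray`; a CNN will typically also need every image resized to one fixed input shape.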

⚙️ Requirements

You need Python 3.6+ to install all the dependencies. These are the requirements and dependencies to install in order to run this project on your end:

  1. Everything is written in Python, so the basic requirement is a Python installation; alternatively, you can use Colab notebooks.
  2. Install the pefile module:
    pip install pefile
    The pefile module takes a Portable Executable file as input and produces a dump containing almost all of the file's metadata.
    For more on pefile and usage examples, see the original repository: https://github.com/erocarrera/pefile
  3. Building the CNN model requires TensorFlow as the backend and its Keras API. We used Google Colab, which has these libraries pre-installed, so they only need to be imported.
  4. Deploying to FastAPI requires the fastapi library:
    pip install fastapi
  5. You will also need an ASGI server for production, such as Uvicorn or Hypercorn:
    pip install uvicorn
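To make the pefile item concrete, here is a standard-library-only sketch of the first structures pefile parses: the `MZ` DOS header, the `e_lfanew` pointer at offset 0x3C, the `PE\0\0` signature, the 20-byte COFF file header, and the 40-byte section headers. It is illustrative only and skips the optional header, imports, and resources; for real files, use `pefile.PE(path)` instead. `parse_pe_headers`, `PESummary`, and `SectionHeader` are hypothetical names.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class SectionHeader:
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int


@dataclass
class PESummary:
    machine: int              # e.g. 0x14C = x86, 0x8664 = x64
    number_of_sections: int
    timestamp: int            # TimeDateStamp, seconds since the Unix epoch
    characteristics: int      # COFF file characteristics bit field
    sections: List[SectionHeader] = field(default_factory=list)


def parse_pe_headers(data: bytes) -> PESummary:
    """Parse the DOS header pointer, COFF file header and section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF file header: 20 bytes immediately after the signature.
    (machine, n_sections, timestamp, _sym_ptr, _n_syms,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    summary = PESummary(machine, n_sections, timestamp, characteristics)
    # The section table follows the optional header; each entry is 40 bytes.
    table_offset = pe_offset + 4 + 20 + opt_header_size
    for i in range(n_sections):
        (raw_name, vsize, vaddr, raw_size, raw_ptr,
         _reloc_ptr, _line_ptr, _n_relocs, _n_lines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table_offset + 40 * i)
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        summary.sections.append(
            SectionHeader(name, vsize, vaddr, raw_size, raw_ptr, flags))
    return summary
```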

For more information on FastAPI, see https://fastapi.tiangolo.com/
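A minimal sketch of how a model could be exposed through FastAPI: an upload endpoint that reads the file, checks for the `MZ` signature, and returns a score as JSON. It is not the project's server code. `placeholder_score` is a hypothetical stand-in for feature extraction plus the trained model, and the `/predict` route name is an assumption. The FastAPI import sits inside `create_app` so the helpers work without FastAPI installed; file uploads also require the `python-multipart` package.

```python
from typing import Callable, Dict


def build_response(filename: str, data: bytes,
                   score_fn: Callable[[bytes], float],
                   threshold: float = 0.5) -> Dict[str, object]:
    """Framework-independent part of the endpoint."""
    # Cheap sanity check: every PE file starts with the DOS "MZ" signature.
    if data[:2] != b"MZ":
        return {"filename": filename,
                "error": "not a PE file (missing MZ signature)"}
    p = float(score_fn(data))
    return {"filename": filename, "malicious": p >= threshold,
            "confidence_pct": round(p * 100, 2)}


def placeholder_score(data: bytes) -> float:
    """HYPOTHETICAL stand-in: replace with feature extraction + trained model."""
    return 0.0


def create_app():
    # Imported here so the helpers above can be used without FastAPI.
    # File uploads additionally need the python-multipart package.
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI(title="PE malware detector (sketch)")

    @app.post("/predict")
    async def predict(file: UploadFile = File(...)):
        data = await file.read()
        return build_response(file.filename or "upload", data, placeholder_score)

    return app
```

Assuming the file is saved as `app.py`, `uvicorn app:create_app --factory` starts the server; FastAPI then serves its Swagger UI at `/docs`, where the upload can be tried interactively.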

To run this code in Colab notebooks, run the FastAPI server notebook in the notebooks folder; it installs and imports every required module.


