malware-detection-of-pefiles

Type of project: We plan to pursue this as a combined research and development project and to explore the field as far as possible over its course. Our goal is to understand the domain well enough to apply it to real-world, everyday problems.

Critical Analysis of Research Papers: Some of the research papers we reviewed are:

● https://arxiv.org/pdf/1804.04637.pdf Authors: Hyrum S. Anderson, Phil Roth. This paper describes EMBER, a labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable (PE) files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign). It also demonstrates one use case: comparing a baseline gradient boosted decision tree model, trained with LightGBM at default settings, against MalConv, a recently published end-to-end (featureless) deep learning model for malware detection.

● https://arxiv.org/ftp/arxiv/papers/1709/1709.01471.pdf Authors: Edward Raff, Jared Sylvester, Charles Nicholas. This paper shows the potential for neural networks to learn from raw byte values by classifying executables as benign or malicious without any feature engineering or processing. Restricted to a manageable range of bytes from the PE header, the networks match and even surpass the performance of a domain-knowledge approach that has equivalent information available. The authors also show that the model's underlying representation is robust enough to re-calibrate models to new domains. This may be significant for larger corporations and organizations that receive targeted or otherwise unique intrusion attempts.
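
To make the idea of feeding raw PE header bytes to a model concrete, here is a minimal Python sketch. It is not the paper's pipeline: the window size `HEADER_BYTES`, the choice to start the window at the PE signature, the zero padding, and the helper `make_minimal_pe` (which builds a tiny synthetic buffer for the demo) are all assumptions made for illustration.

```python
import struct
from typing import List

HEADER_BYTES = 256  # hypothetical window size, not taken from the paper


def raw_header_bytes(data: bytes, length: int = HEADER_BYTES) -> List[int]:
    """Return `length` byte values starting at the PE signature, zero-padded."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a DOS/PE image: missing MZ magic")
    # e_lfanew is a little-endian uint32 at offset 0x3C that points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found at e_lfanew")
    window = data[pe_offset:pe_offset + length]
    # Pad short inputs with zeros so every sample has the same length.
    return list(window) + [0] * (length - len(window))


def make_minimal_pe() -> bytes:
    """Build a synthetic buffer with valid MZ and PE magic (demo only, not a runnable program)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE header placed right after the DOS header
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    coff = struct.pack("<HHIIIHH", 0x14C, 1, 0, 0, 0, 0xE0, 0x0102)
    return bytes(dos) + b"PE\0\0" + coff


if __name__ == "__main__":
    vector = raw_header_bytes(make_minimal_pe())
    print(len(vector), vector[:6])
```

Each file becomes a fixed-length list of integers in the range 0 to 255, which a network could consume directly (for example through an embedding layer) without hand-crafted features.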

● http://www.covert.io/research-papers/security/Polonium%20-%20Tera-Scale%20Graph%20Mining%20for%20Malware%20Detection.pdf Authors: Duen Horng Chau, Carey Nachenberg, Jeffrey Wilhelm, Adam Wright, Christos Faloutsos. This paper presents Polonium, a scalable and effective technology for detecting malware. The authors evaluated it on what they describe as the largest anonymized file-submissions dataset ever published, which spans over 60 terabytes of disk space. The paper formulates malware detection as a large-scale graph mining and inference task, for which the authors construct a huge bipartite graph of almost 1 billion nodes from their data: 48 million nodes are users and 903 million are files. The edges, each denoting a file appearing on a machine, number more than 37 billion. Malware is identified by locating files with low reputation.
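
The intuition of scoring files by reputation over a machine-file bipartite graph can be sketched in a few lines of Python. This is a deliberately simplified stand-in: Polonium itself uses belief propagation, whereas the code below uses plain iterative averaging. The function name, the score range (1.0 = good, 0.0 = bad), the neutral prior of 0.5, and the toy graph are all assumptions for illustration.

```python
from collections import defaultdict


def propagate_reputation(edges, known_file_scores, rounds=10, prior=0.5):
    """Spread reputation across a machine-file bipartite graph by repeated averaging.

    edges: iterable of (machine, file) pairs, one per file seen on a machine.
    known_file_scores: labeled files mapped to a score in [0, 1]; these stay fixed.
    """
    files_on = defaultdict(set)
    machines_with = defaultdict(set)
    for machine, file in edges:
        files_on[machine].add(file)
        machines_with[file].add(machine)

    # Unlabeled files start at the neutral prior.
    file_score = {f: known_file_scores.get(f, prior) for f in machines_with}
    machine_score = {}
    for _ in range(rounds):
        # A machine's reputation is the mean score of the files seen on it.
        machine_score = {
            m: sum(file_score[f] for f in fs) / len(fs) for m, fs in files_on.items()
        }
        # An unlabeled file takes the mean reputation of the machines it appears on.
        for f, ms in machines_with.items():
            if f not in known_file_scores:
                file_score[f] = sum(machine_score[m] for m in ms) / len(ms)
    return file_score, machine_score


if __name__ == "__main__":
    toy_edges = [
        ("m1", "good.exe"), ("m1", "unk1.exe"),
        ("m2", "bad.exe"), ("m2", "unk2.exe"),
    ]
    scores, _ = propagate_reputation(toy_edges, {"good.exe": 1.0, "bad.exe": 0.0})
    print(sorted(scores.items()))
```

In the toy graph, the unlabeled file that shares a machine with known-bad software drifts toward a low score, which is the kind of file a reputation-based detector would flag.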

● https://www.researchgate.net/publication/342640653_PE_File-Based_Malware_Detection_Using_Machine_Learning Authors: Namita and Prachi. Published January 2021 in the book Proceedings of International Conference on Artificial Intelligence and Applications. The aim of this paper is to discuss and review malware analysis of PE files. PE files were chosen because they run on the Windows operating system, which the paper reports as the most commonly used OS worldwide (77.93% of users). PE is the 32/64-bit file format used for Windows executables, object code, DLLs, and related files. Malware analysis of PE files can draw on a variety of features, such as byte sequences, strings, information flow tracking, opcodes, control flow graphs, and API calls.

Data Set Source: We are taking our data set from the following sources:
https://archive.ics.uci.edu/ml/datasets/Detect+Malacious+Executable(AntiVirus)
https://marcoramilli.blogspot.com/2016/12/malware-training-sets-machine-learning.html
https://github.com/jivoi/awesome-ml-for-cybersecurity#-datasets
https://zeltser.com/malware-sample-sources/

Overall workflow and flowchart of project:

Action Plan and Proposed Models: PE files are the files that generally run on the Windows platform, with the extension .exe or .dll. A PE file is divided into the PE file header, the section table, and the sections.

Data Set Preprocessing: The dataset is preprocessed with noise filters. After preprocessing, the data is split into fixed-window data, and the windowed data is then split into a 70% training set and a 30% testing set.

Feature Extraction: We extract features from the PE header and the sections.

Feature Engineering: After feature extraction, each file is represented in the proposed vector format, with the extracted features grouped into four categories: file metadata, file packing, imported DLLs, and imported functions. Each entry of the vector represents the corresponding feature.

Malware detection classifies files into two classes: malware or benign. In this project we will therefore apply supervised learning techniques (classification models) to the task, and we may need to test several algorithms on the dataset we have generated. For initial testing we will use the Gaussian Naive Bayes model, which is based on likelihood and probability; we chose it because it is fast, simple, and stable. We then plan to use Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Classification and Regression Trees, Support Vector Machine, and Random Forest as models for the same purpose. We also reset the random number seed before each run so that every algorithm is evaluated on exactly the same data splits, which makes the results directly comparable. Finally, we compare all the models on performance metrics such as the confusion matrix and choose the model with the best accuracy.

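
The evaluation procedure described above (a seeded 70/30 split, a Gaussian Naive Bayes baseline, and a confusion matrix) can be sketched in standard-library Python as follows. This is an illustrative sketch, not the project's implementation: the two features (section entropy and import count), their distributions, the seed values, and all function and class names are hypothetical, and a real project would more likely rely on a library such as scikit-learn.

```python
import math
import random
from collections import defaultdict

SEED = 7  # arbitrary value; what matters is reusing it for every model


def split_70_30(rows, seed=SEED):
    """Shuffle with a freshly seeded generator, then cut 70% train / 30% test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


class GaussianNaiveBayes:
    """Per-class Gaussian likelihood for each feature, combined with the class prior."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.stats = {}
        for label, rows in by_class.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small floor on the variance avoids division by zero.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (len(rows) / len(X), means, variances)
        return self

    def _log_posterior(self, x, label):
        prior, means, variances = self.stats[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(x, c)) for x in X]


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives, treating `positive` as malware."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def synthetic_dataset(n_per_class=200, seed=0):
    """Made-up (section entropy, import count) vectors; label 1 = malware, 0 = benign."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        rows.append(((rng.gauss(5.0, 0.5), rng.gauss(120, 20)), 0))
        rows.append(((rng.gauss(7.2, 0.4), rng.gauss(25, 10)), 1))
    return rows


def run_demo():
    train, test = split_70_30(synthetic_dataset())
    model = GaussianNaiveBayes().fit([f for f, _ in train], [l for _, l in train])
    y_true = [l for _, l in test]
    matrix = confusion_matrix(y_true, model.predict([f for f, _ in test]))
    accuracy = (matrix["tp"] + matrix["tn"]) / len(test)
    return matrix, accuracy


if __name__ == "__main__":
    print(run_demo())
```

Because `split_70_30` builds a new generator from the same seed on every call, any other classifier evaluated through it sees exactly the same training and test rows, which is what makes the accuracy figures comparable across models.
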
Usage of Project - Applications in Daily Use: Malware detection in portable executable files has many applications in daily life across various domains. These are as follows:

● For most organizations and businesses, the weakest area of the information security system is the endpoint (endpoint devices). Endpoints are potential points of entry and are easily exploited through careless staff in the organization. When we download files to our systems, malicious .exe files can also be downloaded without our knowledge and harm those systems. This is where malware detection in portable executable files comes in handy and matters most.

● Android applications now declare as many permissions as possible in order to offer users more functionality, which also poses a severe security threat to those users. Although many permission-based Android malware detection methods have been developed, they are ineffective when the dangerous permissions declared by malicious applications resemble those declared by benign ones. In addition, when apps are downloaded, portable executable files containing malware can be downloaded along with them, and this project can be used to detect such files before they harm users. As Android devices grow in popularity, malware developers release new malware daily to threaten system integrity and user privacy. The proposed framework detects malware in Android apps by performing dynamic analysis.

● Techniques for malware detection are widely deployed in commercial anti-virus products installed on end-user clients. Anti-virus software typically includes a background service that scans files on access and an on-demand scanner that is invoked at regular intervals.

● Sandboxes detect malware by testing potentially malicious code in an isolated virtual environment. This allows researchers to observe the code's real behavior in a safe environment where it cannot spread or harm the system and network it is running on. Sandboxing is a useful malware detection technique, and our project can prove useful alongside it.

● Cynet 360 delivers protection against the ever-changing malware landscape by continuously monitoring file execution and process behavior. Cynet's malware protection comprises multiple complementary layers: signatures, ML-based static analysis, sandboxing, and monitoring of critical memory locations. Cynet integrates NGAV and EDR capabilities to disarm malware before it can fully execute and cause harm.

