
b2sfinder's Introduction

B2SFinder


B2SFinder is a binary-to-source matching tool for detecting OSS reuse in COTS software. This project contains the core code of B2SFinder; the database and pipeline implementations are not included.


Requirements

A Windows server with Python 2.7 (64-bit) and IDA 7.0.

A Linux server with Python 2.7.


Quickstart

Step 0: Download the source code packages of candidate OSS projects.

Step 1: Extract code features of OSS projects.

$ cd SourceFeatureExtract
$ python extract_source_feature.py -pj_root <OSS_project_root>
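
When several candidate OSS projects sit side by side in one directory, Step 1 has to be repeated for each of them. The sketch below shows one possible way to do that, assuming one project per subdirectory and that the script is started from inside SourceFeatureExtract. `build_extract_commands` is a hypothetical helper, not part of B2SFinder, and it relies only on the `-pj_root` option shown above.

```python
import os
import subprocess
import sys


def build_extract_commands(oss_root, python_exe="python"):
    """Return one extract_source_feature.py command per project directory.

    Assumption: every immediate subdirectory of oss_root is one OSS project.
    """
    commands = []
    for name in sorted(os.listdir(oss_root)):
        project_dir = os.path.join(oss_root, name)
        if os.path.isdir(project_dir):  # skip stray files such as archives
            commands.append([python_exe, "extract_source_feature.py",
                             "-pj_root", project_dir])
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] is the directory that holds the unpacked OSS projects.
    for cmd in build_extract_commands(sys.argv[1]):
        subprocess.check_call(cmd)  # stops at the first failing project
```

Building the command list separately from running it makes it easy to print and review the commands before a long extraction run.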

Step 2: Compare the target COTS software with all candidate OSS projects.

$ cd FeatureMatch
$ python feature_match.py -local_match <bin_path/bin_dir>

Code Structure

| dir | file | function |
|---|---|---|
| - | COTS_list.txt | MD5 list of installers of COTS software products. |
| BinaryFeatureExtract | local_binary_feature.py | Extracting code features of a binary file. |
| FeatureMatch | feature_inverted_and_trie.py | Building and searching an inverted index and a Trie, without the database implementation. |
| | feature_match.py | Matching code feature instances between binary code and source code. |
| | feature_preprocessor.py | Preprocessing feature instances to unify their representations. |
| SourceFeatureExtract | extract_source_feature.py | Extracting code features of an OSS project. |
| | get_file_dependency.py | Building the Compilation Dependency Layered Graph. |
| | src_proj_preprocessor.py | Parsing compilation arguments. |
| | if-else-extractor | An LLVM-based tool to extract if/else features. |
| | switch-case-extractor | An LLVM-based tool to extract switch/case features. |
| tools | ida_autoanalysis_70.py | An IDAPython script for extracting binary features. |
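
To illustrate the two data structures named for feature_inverted_and_trie.py, here is a minimal, in-memory sketch of an inverted index and a trie. It is not the repository's code: the class names, method names, and toy project names are all hypothetical, and the real module may store and score features differently.

```python
from collections import defaultdict


class InvertedIndex(object):
    """Maps each feature instance to the set of projects that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, project, features):
        for feature in features:
            self._postings[feature].add(project)

    def match(self, binary_features):
        """Count, per project, how many distinct binary features it shares."""
        hits = defaultdict(int)
        for feature in set(binary_features):
            for project in self._postings.get(feature, ()):
                hits[project] += 1
        return dict(hits)


class Trie(object):
    """Character trie for exact lookups of string features."""

    def __init__(self):
        self._root = {}

    def insert(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-word marker

    def contains(self, word):
        node = self._root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return None in node


# Toy data, made up for illustration only.
index = InvertedIndex()
index.add("proj_a", ["inflate", "deflate", "version 1.2"])
index.add("proj_b", ["read_info", "inflate"])
scores = index.match(["inflate", "read_info", "unknown"])
# proj_b shares 2 features with the binary, proj_a shares 1.
```

The inverted index answers "which projects contain this feature" in one lookup, so a binary's features can be scored against all candidate projects without scanning each project in turn.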

Setup

A Windows server with Python 2.7 (64-bit) and IDA 7.0 and a Linux server with Python are both required.


For Windows Server

  1. Install Python 2.7 (64-bit) and add it to the PATH (IDA 7.0 needs 64-bit Python).

  2. Install the VS2015 runtime library, then install IDA 7.0 (requires Windows 7 SP1+ or Windows 10).

  3. Install the Python dependencies (shutil and re are part of the standard library, so only pefile needs to be installed separately):

    pefile, shutil, re
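
Because IDA 7.0 needs a 64-bit Python 2.7, it can help to verify the interpreter before going further. The following is a small sketch of such a check; the function names are hypothetical and not part of B2SFinder.

```python
import platform
import struct
import sys


def interpreter_summary():
    """Collect the interpreter facts that matter for the Windows setup."""
    try:
        import pefile  # noqa: F401  (third-party; installed via pip)
        has_pefile = True
    except ImportError:
        has_pefile = False
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "bits": struct.calcsize("P") * 8,  # pointer size in bits
        "system": platform.system(),
        "has_pefile": has_pefile,
    }


def meets_windows_requirements(summary):
    """True if the interpreter is 64-bit Python 2.7 with pefile available."""
    return (summary["version"] == "2.7"
            and summary["bits"] == 64
            and summary["has_pefile"])


summary = interpreter_summary()
print(summary)
print("OK" if meets_windows_requirements(summary) else "Setup incomplete")
```

The pointer-size test is used because the version string alone does not reveal whether the interpreter is a 32-bit or 64-bit build.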
    

For Linux Server

  1. Install Python 2.7 and add it to the PATH.

  2. Install clang-3.7.

  3. Install the system dependencies with apt-get:

    python-pip build-essential python-dev liblzma-dev libev4 libev-dev dos2unix cmake
    
  4. Install the Python dependencies:

    pip install backports.lzma clang==3.7
    

b2sfinder's People

Contributors

1dayto0day, floraatom


b2sfinder's Issues

Released Dataset

Hi, it is an interesting and meaningful work! However, I face some difficulties when constructing the dataset. Can you release a dataset containing both source projects and binaries?

Cannot run `feature_match.py`

As shown here, the script feature_match.py needs to import CommonManager. However, there is no package named CommonManager. Could you provide more information about it?

Besides, feature_match.py should compare the features of the input binary with the features extracted from the source code database. However, its only input is the binary. Could you explain in more detail how to use it?

Thanks in advance.

BinaryFeatureExtract seems to lack code to extract some features

The file ida_autoanalysis_70.py seems to extract the features [string, switch, nested_if], and local_binary_feature.py extracts export info. But according to local_binary_feature.py, there are 7 features that need to be extracted:

feature_types = ["export", "string", "switch_case", "nested_if", "const_enum_array", "const_num_array", "string_array"]

Is the code for extracting the other features located elsewhere, or will the author add it in the future?
