Giter VIP home page Giter VIP logo

x86-sok's Introduction

x86-sok

Overview of the source code:

.
|-- ccr			# source code of ccr/randomizer
|-- compare		# scripts that compare the result between gt and disassembler
|-- extract_gt		# scripts that extract ground truth from binary
|-- disassemblers       # scripts that we use to extract disassemblers' result
|-- gt			# modified gcc/clang toolchain
|-- protobuf_def	# protobuf definitions that defines disassembly information and x-ref information
|-- README.md
`-- testsuite		# coreutils and findutils that compiled by gcc/clang toolchain

The Framework of compilers

We modified compilers(gcc/clang) to collect the information of basic blocks and reconstruct the information of binary code. And we want to extract executable file's basic block, jump table, fixup/reference and function information in compilation tool and can help to evaluate disassemblers.

For clang, we re-use the implemenation of CCR.

                                                              |
 ===============       ===============       ===============  |     =============
||             ||     ||             ||     ||             || |    ||           ||
|| preprocess  ||  +  ||   compile   ||  +  ||  assemble   || | +  ||   link    ||  => executable
||             ||     ||             ||     ||             || |    ||           ||
 ===============       ===============       ===============  |     =============
               llvm/clang              MC Componment          |      linker(gold)

For gcc, we modifed gcc and gas(GNU assembler). Modifications can be found at gcc modification and gas modification.

                                        |                      |
 ===============       ===============  |     ===============  |     =============
||             ||     ||             || |    ||             || |    ||           ||
|| preprocess  ||  +  ||   compile   || | +  ||  assemble   || | +  ||   link    ||  => executable
||             ||     ||             || |    ||             || |    ||           ||
 ===============       ===============  |     ===============  |     =============
                  gcc                   |      assembler(gas)  |      linker(gold)

There exist some differences between llvm toolchains(based on ccr) and gcc toolchains. In llvm, it has MC componment that combines compilation and assembling together internally. While gcc outputs the .s file after compilation, and invoke assembler(gas) to assemble the .s file into object. So it is easier to extract the basic block and jump table information in compilation and store this information in object file after assembling in llvm toolchains when comparing to gcc toolchains.

As basic block, function, and jump table information can only be collected in compilation stage, so firstly, we output the related information into .s file, and then we reconstruct these information in assembler(gas).

Build the compilers

We provide two ways to build the toolchains of compilers: ubuntu18.04 and docker.

Ubuntu 18.04

If you are using Ubuntu 18.04, we recommend you to build the toolchain in your computer:

$ git clone [email protected]:junxzm1990/x86-sok.git
$ cd x86-sok/gt
$ bash build.sh

The gcc/g++ are installed in gt/build/executable_gcc/bin, clang/clang++ are installed in gt/build/build_clang/bin. We also build glibc by using our toolchain so that the compiled glibc contains the information emitted by compiler. Glibc is installed in gt/build/glibc_build_32 or gt/build/glibc_build_64.

For convenience, we provide config scripts to set CC, CXX, CFLAGS, CXXFLAGS, LDFLAGS. These configs are gt/gcc64.rc, gt/gcc32.rc, gt/clang64.rc and gt/clang32.rc.

Before compiling, we can set proper configures by:

# for example, we want to compile the source code by gcc
$ source gcc64.rc

# set the proper optimization level
$ export CFLAGS="-O2 $CFLAGS" && export CXXFLAGS="-O2 $CXXFLAGS"

Docker

If you prefer to use Docker, we also provides script to build docker image.

# install docker firstly
$ curl -fsSL https://get.docker.com/ | sudo sh
$ sudo usermod -aG docker [user_id]

# build our toolchain
$ git clone [email protected]:junxzm1990/x86-sok.git
$ cd x86-sok/gt
$ docker build -t x86_gt ./

# check the image
$ docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
x86_gt              latest              85f6fb2d4257        2 minutes ago       20.5GB

# launch the image
$ docker run --rm -it x86_gt:latest /bin/bash

# configure CC, CXX, CFLAGS, CXXFLAGS, LDFLAGS inside container
root@fc44258775ac:/gt_x86# source ./gcc64.rc
root@fc44258775ac:/gt_x86# export CFLAGS="-O2 $CFLAGS" && export CXXFLAGS="-O2 $CXXFLAGS"

Or pull the image from Docker Hub:

$ docker pull bin2415/x86_gt:0.1

# check the image
$ docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
bin2415/x86_gt      0.1                 85f6fb2d4257        3 minutes ago       20.5GB

# launch the image
$ docker run --rm -it bin2415/x86_gt:0.1 /bin/bash

# configure CC, CXX, CFLAGS, CXXFLAGS, LDFLAGS inside container
root@fc44258775ac:/gt_x86# source ./gcc64.rc
root@fc44258775ac:/gt_x86# export CFLAGS="-O2 $CFLAGS" && export CXXFLAGS="-O2 $CXXFLAGS"

Compile binary with our toolchain

Here is an example that explains how to use our toolchain to compile binary:

root@5e8606df7f20:/gt_x86# cd test

root@5e8606df7f20:/gt_x86/test# source ../gcc64.rc

root@5e8606df7f20:/gt_x86/test# export CFLAGS="-O0 $CFLAGS"

root@5e8606df7f20:/gt_x86/test# $CC $CFLAGS -o test_switch test_switch.c
[bbinfo]: DEBUG, the target binary format is: size 64, is big endian 0
Update shuffleInfo Done!
Successfully wrote the ShuffleInfo to the .rand section!

# check the .rand section in the executable
root@5e8606df7f20:/gt_x86/test# readelf -S test_switch | grep -A1 rand
 [35] .rand             PROGBITS         0000000000000000  000034ec
       000000000000025b  0000000000000000           0     0     1

Testsuite

We built the testsuite by using our toolchain. It has more than 4000 binaries, including Linux/Windows 32/64 bits. The subset of testsuite is in testsuite folder. The whole testsuite is in link of google drive.

Exatract Ground truth from binary

Linux

We use the example of test_switch to show how to extract ground truth.

# copy the gt info from binary
root@5e8606df7f20:/gt_x86/test#  objcopy --dump-section .rand=test_switch.gt.gz test_switch && gzip -d test_switch.gt.gz

# there has test_switch.gt in current directory
root@5e8606df7f20:/gt_x86/test# ls
test_switch  test_switch.c  test_switch.gt

# extract disassembly result, and the result is saved in /tmp/gtBlock_test_switch.pb
root@5e8606df7f20:/gt_x86/test# python3 ../../extract_gt/extractBB.py -b test_switch -m test_switch.gt -o /tmp/gtBlock_test_switch.pb
...
...
INFO:=======================================================
INFO:[Summary]: padding cnt is 9
INFO:[Summary]: handcoded bytes is 0
INFO:[Summary]: handcoded number is 0
INFO:[Summary]: Jump tables is 1
INFO:[Summary]: Tail indirect call is 2
INFO:[Summary]: overlapping instructions is 0
INFO:[Summary]: Non-returning function is 2
INFO:[Summary]: Multi-entry function is 0
INFO:[Summary]: overlapping functions is 0
INFO:[Summary]: tail call count is is 1

# extract x-ref result, the result is saved in /tmp/gtRef_test_switch.pb
root@5e8606df7f20:/gt_x86/test# python3 ../../extract_gt/extractXref.py -b test_switch -m test_switch.gt -o /tmp/gtRef_test_switch.pb

Note that the definition of disassembly and x-ref result is in protobuf_def/blocks.proto and protobuf_def/refInf.proto.

We provide script to extract ground truth in batch. It searchs all the binaries in a directory.

ubuntu@ubuntu:/x86-sok/extract_gt: bash run_extract_linux.sh -d <directory> -s ./extractBB.py -p gtBlock
ubuntu@ubuntu:/x86-sok/extract_gt: bash run_extract_linux.sh -d <directory> -s ./extractXref.py -p gtRef

Windows

We prepare an example in extract_gt/pemap/testto explain how to extract ground truth.

# extract fixup info, this step must be completed in windows, as we need to use dumpbin tool
windows@windows:/extract_gt/pemap# python3 dumpfixup.py -p ./test/7zDec.pdb -b ./test/7zDec.exe -o ./test/gtRef_7zDec.pb

# extract disassembly info. Note that we need the fixup info
windows@windows:/extract_gt/pemap# make
windows@windows:/extract_gt/pemap# ./PEMap -iwRFE -P ./test/7zDec.pdb -r ./test/gtRef_7zDec.pb -e ./test/7zDec.exe -o ./test/gtBlock_7zDec.pb

Compare the result

We can use scripts in compare folder to compare results between ground truth and comapred tools.

For example, if we want to compare instructions, we can use compareInsts.py:

ubuntu@ubuntu:/x86_sok/compare# python3 compareInsts.py -b <binary path> -g <ground truth> -c <compared>

Note that before comparring the non-return, we need to extend the non-return lists based on ground truth:

# extend the non-rets
ubuntu@ubuntu:/x86_sok/compare# python3 findNonRets.py -b <binary path> -g <ground truth> -o <ground truth with extended non-rets>
# compare
ubuntu@ubuntu:/x86_sok/compare# python3 compareNonRet.py -b <binary path> -g <ground turht with extended non-rets> -c <compared>

Citation

If your research find one or several components of this work useful, please cite the following paper:

@INPROCEEDINGS {sok-x86,
      author = {Chengbin Pang and Ruotong Yu and Yaohui Chen and Eric Koskinen and Georgios Portokalidis and Bing Mao and Jun Xu},
      booktitle = {42nd IEEE Symposium on Security and Privacy (SP)},
      title = {SoK: All You Ever Wanted to Know About x86/x64 Binary Disassembly But Were Afraid to Ask},
      year = {2021},
}

x86-sok's People

Contributors

bin2415 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.