Giter VIP home page Giter VIP logo

reassessor's Introduction

Reassessor

Reassessor is an automated tool to search symbolization errors from reassembler-generated assembly files. At a high level, Reassessor searches errors by diffing the compiler generated-assembly file and reassembly file. The details of the algorithm in our paper "Reassembly is Hard: A Reflection on Challenges and Strategies" will appear in USENIX Security 2023.

Install

Reassessor currently works on only Linux machine and we tested on Ubuntu 18.04 and Ubuntu 20.04.

1. Clone Reassessor

$ git clone https://github.com/SoftSec-KAIST/Reassessor
$ cd Reassessor

2. Install Dependencies

Reassessor is written in python 3 (3.6), and it depends on pyelftools (>= 0.29) and captone (>=4.0.2).

To install the dependencies, please run:

$ pip3 install -r requirements.txt

3. Install Reassessor

$ python3 setup.py install --user

Usage

Perform a Preprocessing Step

There is a preprocessing step that needs to be performed before operating Reassessor to produce a compiler-generated assembly file, a non-stripped binary file, and a reassembler-generated assembly file.

You can download our benchmark binary files and compiler-generated assembly files at DOI.

Note If you want to make your own binary set, you should build binaries with --save-temps=obj option to force the compilers to preserve all the intermediate files including assembly files generated during a compilation process. Also, you should enable the -g option to produce binaries with debugging information. Lastly, -Wl,--emit-relocs linker option is required especially when you build non-PIE (Position-dependent Executable) binaries. The linker option preserves relocation information.

Next, you can get reassembler-generated assembly files by running preprocessing module.

Note Docker needs to be installed on the same machine to run reassemblers within a Docker container. Our scripts assume that you can run Docker commands as a regular (unprivileged) user; thus, no need to run them with sudo.

$ python3 -m reassessor.preprocessing <binary_path> <output_dir>

During the preprocessing step, STRIP module strips off debug symbols from the binary to get a stripped binary. Ddisasm and Ramblr take the stripped binary as an input binary. However, the stripping process is omitted for RetroWrite since it requires debugging information to reassemble binaries.

The module produces the reassembly files under the <output_dir>/reassem.

$ ls <output_dir>/reassem
ddisasm.s  retrowrite.s

Note that each reassembly tool supports different sets of binaries: Ramblr only works with non-PIE binaries and RetroWrite only works with x86-64 PIE binaries. Thus, preprocessing module will generate a different set of reassembly files depending on binary files.

Note The preprocessing module runs the-state-of-art reassemblers, Ramblr (commit 64d1049, Apr. 2022), RetroWrite (commit 613562, Apr. 2022), and Ddisasm v1.5.3 (docker image digests: a803c9, Apr. 2022), in a dockerized environment, to produce reassembly files. If you want to run Reassessor with a new reassembler, you should update the execution commands in reassemble() method in preprocessing.py file

Run Reassessor

Reassessor takes in a compiler-generated assembly file and a reassembler-generated assembly file, and transforms assembly expressions into a canonical form to ease the comparison. Then, Reassessor searches errors by comparing the normalized assembly code.

To search reassembly errors, you should run reassessor module as follows:

$ python3 -m reassessor.reassessor <binary_path> <assembly_directory> <output_directory> \
  [--ramblr RAMBLR] [--retrowrite RETROWRITE] [--ddisasm DDISASM]

The reassessor module requires <binary_path> and <assembly_directory> to normalize compiler-generated assembly files. Also, it requires reassembly file to normalize the target reassembly file; you can specify the location of reassembly file by using --ramblr, --retrowrite, and --ddisasm options. Then, reassessor module compares the normalized code and produces report files on <output_directory>.

$ python3 -m reassessor.reassessor <binary_path> <assembly_directory> <output_directory> \
  --ddisasm <reassembly_file_path>
$ ls <output_directory>/norm_db
gt.db  ddisasm.db
$ ls <output_directory>/errors/ddisasm
disasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json

The reassessor module generates normalized assembly files under <output_directory>/norm_db folder, and then it takes the two normalized files to find the differences between them. Consequently, the reassessor module produces the following files as output: ddisasm_diff.txt, sym_errors.dat, sym_diff.txt, sym_errors.json. Firstly, disasm_diff.txt contains a list of disassembly errors (one per line); each line contains the relevant address, reassembler-generated assembly line, and compiler-generated assembly line. sym_errors.dat is a raw output file containing a list of symbolization errors. This file is used to generate other two files: sym_errors.json and sym_diff.txt. sym_diff.txt is a human-readable representation of sym_errors.dat. Each line of the file contains address, error type, reassembler-generated assembly code, and compiler-generated code, for each error found. Finally, sym_errors.json contains detailed information about each symbolization error found, including the relevant assembly file, line number, relocatable expression type, normalized code, repairability, and so on. The file is written in JSON format.

Docker

You can use a Docker image to try out Reassessor quickly.

The following command will build the docker image name Reassessor using our Dockerfile.

$ docker build --tag reassessor .

Now, you can run Reassessor within a Docker container.

$ docker run --rm reassessor sh -c "/Reassessor/reassessor.py <binary_path> <assembly_directory> \
  <output_directory> [--ramblr RAMBLR] [--retrowrite RETROWRITE] [--ddisasm DDISASM]

Example

You can test Reassessor with our sample program.

1. Build a source code

$ cd examples
$ make
$ cd ..

2. Reassemble the example program

$ mkdir output
$ python3 -m reassessor.preprocessing ./example/bin/hello ./output
$ ls ./output/reassem
ddisasm.s  retrowrite.s

3. Run Reassessor

$ python3 -m reassessor.reassessor ./example/bin/hello ./example/asm ./output  \
  --retrowrite ./output/reassem/retrowrite.s
$ ls ./output/norm_db
gt.db  retrowrite.db
$ ls ./output/errors/retrowrite
disasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json

Also, you can run Reassessor within a Docker container.

$ docker run --rm -v $(pwd):/input reassessor sh -c "python3 -m reassessor.reassessor \
  /input/example/bin/hello /input/example/asm/ /input/output \
  --retrowrite /input/output/reassem/retrowrite.s"

4. Check Error Report

$ ls ./output/errors/retrowrite/
disasm_diff.txt  sym_diff.txt  sym_errors.dat  sym_errors.json
$ cat ./output/errors/retrowrite/sym_diff.txt
# Instrs to check: 48
# Data to check: 14
Relocatable Expression Type 4 [FP: 3(0) / FN: 0]
E4FP [0] (Disp:3:0) 0x1196  : movl .LC2024(%rip), %eax                  | movl bar+4(%rip), %eax
E4FP [0] (Disp:3:0) 0x11a7  : movl .LC2028(%rip), %eax                  | movl bar+8(%rip), %eax
E4FP [0] (Disp:3:0) 0x11b8  : movl .LC202c(%rip), %eax                  | movl bar+12(%rip), %eax

Dataset

We publicize our benchmark at DOI. (The dataset does not contain SPEC CPU 2006 binaries because of a licensing issue.)

Artifacts

We also provide the artifact to reproduce the experiments in our paper. Please check Reassessor/artifacts/ folder.

Contributions of our works

Reassessor found plentiful symbolization errors from stat-of-art reassemblers. Also, we discovered unseen reassembly errors. We made PR and issues to resolve the errors.

Authors

This research project has been conducted by SoftSec Lab at KAIST and UT Dallas.

Citation

(TBD)

License

See the LICENSE file for license rights and limitations (MIT).

reassessor's People

Contributors

witbring avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

yiruma96

reassessor's Issues

Errors when run examples

Hi, I've followed the instructions in the readme, but error occurs:

(reasm) [itemqq@itemqq-arch Reassessor]$ python --version
Python 3.8.16
(reasm) [itemqq@itemqq-arch Reassessor]$ python -m reassessor.preprocessing ./example/bin/hello ./output
[+] docker run --rm -v /home/itemqq/fault/Reassessor/output/bin:/input -v /home/itemqq/fault/Reassessor/output/reassem:/output reassessor/retrowrite:613562 sh -c "python3 -m retrowrite.librw_x64.rw /input/hello /output/retrowrite.s"
Unable to find image 'reassessor/retrowrite:613562' locally
613562: Pulling from reassessor/retrowrite
e706e0a9f423: Pull complete 
67067e559e33: Pull complete 
Digest: sha256:f5679d9e2bdf521b561698a843dae655cda88aca6f5ce943deec585335c0054d
Status: Downloaded newer image for reassessor/retrowrite:613562
I NEED SOMEBODY HELP
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/retrowrite/librw_x64/rw.py", line 1045, in <module>
    loader.load_functions(flist)
  File "/retrowrite/librw_x64/loader.py", line 68, in load_functions
    print(func.__dict__)
AttributeError: 'NoneType' object has no attribute '__dict__'

Any suggestions on this issue?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.