Giter VIP home page Giter VIP logo

retrowrite-tutorial's Introduction

RetroWrite: fuzzing native binaries at lightning speed

Modern grey-box fuzzers are the most effective way of finding bugs in complex code bases, and instrumentation is fundamental to their effectiveness. Existing instrumentation techniques either require source code (e.g., afl-gcc, ASan) or have a high runtime performance cost (roughly 10x slowdown for e.g., afl-qemu).

We introduce Retrowrite, a binary rewriting framework that enables direct static instrumentation for both user-mode binaries and Linux kernel modules. Unlike dynamic translation and trampolining, rewriting code with Retrowrite does not introduce a performance penalty. We show the effectiveness of Retrowrite for fuzzing by implementing binary-only coverage tracking and ASan instrumentation passes. Our binary instrumentation achieves performance similar to compiler-based instrumentation.

This tutorial consists of three practical phases:

  • Fuzzing native binaries without instrumentation
  • Instrumenting native binaries with coverage tracking
  • Instrumenting native binaries with coverage tracking and address sanitization

Research background

Fuzzing is the method of choice for finding security vulnerabilities in software due to its simplicity and scalability, but it struggles to find deep paths in complex programs, and only detects bugs when the target crashes. Instrumentation greatly helps with both issues by (i) collecting coverage feedback, which drives fuzzing deeper into the target, and (ii) crashing the target immediately when bugs are detected, which lets the fuzzer detect more bugs and produce more precise reports. One of the main difficulties of fuzzing closed-source software is that instrumenting compiled binaries comes at a huge performance cost. For example, simple coverage instrumentation through dynamic binary translation already incurs between 10x and 100x slowdown, which prevents the fuzzer from finding interesting inputs and bugs.

We show how we used static binary rewriting for instrumentation: our approach has low overhead (comparable to compile-time instrumentation) but works on binaries. There are three main techniques to rewrite binaries: recompilation, trampoline insertion and reassembleable assembly. Recompilation is the most powerful but it requires expensive analysis and type recovery, which is an open/unsolved problem. Trampolines add a level of indirection and increase the size of the code, both of which have a negative impact on performance. Reassembleable assembly, the technique that we use, suffers from neither problem. In order to produce reassembleable assembly, we first disassemble the binary and then symbolize all code and data references (replacing offsets and references with references to unique labels). The output can then be fed to a standard assembler to produce a binary. Because symbolization replaces all references with labels, we can now insert instrumentation in the code and the assembler will fix the references for us when reassembling the binary. Symbolization is possible because references to code and global data always use RIP-relative addressing in the class of binaries that we support (position-independent x86_64 binaries). This makes it easy to distinguish between references and integer constants in the disassembly.

See the following resources for more details:

Tutorial outline

We present Retrowrite, a framework for static binary rewriting that can add efficient instrumentation to compiled binaries and scales to real-world code. Retrowrite symbolizes x86_64 position-independent Linux binaries and emits reassembleable assembly, which can be fed to instrumentation passes. We implement a binary version of Address Sanitizer (ASan) that integrates with the source-based version. Retrowrite’s output can also be fed directly to AFL’s afl-gcc to produce a binary with coverage-tracking instrumentation. RetroWrite is openly available for everyone to use and we will demo it during the presentation.

Based on the open-source release of RetroWrite, we will rewrite several binaries (starting with small examples to larger real programs and libraries) and prepare them for fuzzing and sanitization. Using the demo programs the attendees will discover bugs in these binaries and learn how to patch them alongside.

The tutorial consists of three phases:

The playground directory contains a set of buggy binaries to play with along with their source code. The tutorial expects that retrowrite is installed and compiled in the retrowrite directory and the fuzzer in the fuzzer directory. You can set up the necessary tools by executing ./setup.sh.

Contact

This tutorial is created by the HexHive group at EPFL and is first presented at SSSS'20. In case of questions, please contact Mathias Payer, Jean-Michel Crepel, or Antony Vennard.

retrowrite-tutorial's People

Contributors

diagprov avatar gannimo avatar jeanmi151 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.