retrowrite's Introduction

Retrowrite

Retrowrite is a static binary rewriter for x64 and aarch64. It works without heuristics, does not introduce overhead and uses the symbolization technique (also known as reassemblable assembly) to insert instrumentation to binaries without the need for source code.

Please note that the x64 version and the arm64 version use different rewriting algorithms and support a different set of features.

For technical details, you can read the paper (in IEEE S&P'20) for the x64 version and this thesis for the arm64 version.

KRetrowrite is a variant of the x64 version that supports the rewriting of Linux kernel modules.

General setup

Retrowrite is implemented in python3 (3.6). It depends on pyelftools and capstone. To install the dependencies, please run:

pip install -r requirements.txt

It is not recommended to install the dependencies from your distro's package managers, as they might be outdated.

Features

retrowrite-x64 retrowrite-aarch64
stripped binaries ❌ (WIP)
Non-PIE binaries
Non-standard compilers
Zero overhead
Kernel modules support
AFL-coverage instrumentation
ASan instrumentation
C++ support ❌ (WIP) ❌ (WIP)

Command line options

(retro) $ retrowrite --help
usage: retrowrite [-h] [-a] [-A] [-m MODULE] [-k] [--kcov] [-c] [--ignore-no-pie] [--ignore-stripped] [-v] bin outfile

positional arguments:
  bin                   Input binary to load
  outfile               Symbolized ASM output

optional arguments:
  -h, --help            show this help message and exit
  -a, --assemble        Assemble instrumented assembly file into instrumented binary
  -A, --asan            Add binary address sanitizer instrumentation
  -m MODULE, --module MODULE
                        Use specified instrumentation pass/module in rwtools directory
  -k, --kernel          Instrument a kernel module
  --kcov                Instrument the kernel module with kcov
  -c, --cache           Save/load register analysis cache (only used with --asan)
  --ignore-no-pie       Ignore position-independent-executable check (use with caution)
  --ignore-stripped     Ignore stripped executable check (use with caution)
  -v, --verbose         Verbose output

Instrumentation passes

Select the instrumentation pass you would like to apply with retrowrite -m <pass>. You can find the available instrumentation passes in the folders rwtools_x64 and rwtools_arm64.

Available instrumentation passes for x64:

  • AddressSanitizer
  • AFL-coverage information

Available instrumentation passes for aarch64:

  • AddressSanitizer
  • AFL-coverage information + forkserver
  • Coarse-grained control flow integrity on function entries

Example usage

a. Instrument Binary with Binary-Address Sanitizer (BASan)

retrowrite --asan </path/to/binary/> </path/to/output/binary>

Note: On x64, make sure that the binary is position-independent and not stripped. This can be checked using the file command (the output should say ELF shared object).
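The same two properties can also be checked programmatically. The following is a minimal sketch that uses only the Python standard library and assumes a little-endian ELF64 file; the function name inspect_elf64 is made up for this example and is not part of retrowrite.

```python
import struct

ET_DYN = 3        # e_type value for shared objects and PIE executables
SHT_SYMTAB = 2    # section type of the static symbol table (.symtab)


def inspect_elf64(data: bytes) -> dict:
    """Report whether a little-endian ELF64 image is ET_DYN and has a .symtab."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header layout: e_ident[16], e_type, e_machine, e_version, e_entry,
    # e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
    # e_shentsize, e_shnum, e_shstrndx
    fields = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    e_type, e_shoff = fields[1], fields[6]
    e_shentsize, e_shnum = fields[11], fields[12]
    has_symtab = False
    for index in range(e_shnum):
        # sh_type is the second 32-bit field of each section header
        (sh_type,) = struct.unpack_from(
            "<I", data, e_shoff + index * e_shentsize + 4
        )
        if sh_type == SHT_SYMTAB:
            has_symtab = True
    return {"pie": e_type == ET_DYN, "stripped": not has_symtab}
```

For a file on disk this could be called as inspect_elf64(open("/bin/ls", "rb").read()). Note that ET_DYN is also the type of ordinary shared libraries, so "pie" here only means what the file command reports as a shared object.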

For example, to create an instrumented version of /bin/ls:

retrowrite --asan /bin/ls ls-basan-instrumented.s

This will generate an assembly (.s) file. How to assemble it back into a binary depends on the architecture:

x64

The generated assembly can be assembled and linked using any compiler, like:

gcc ls-basan-instrumented.s -lasan -o ls-basan-instrumented

Debugging: if you get the error undefined reference to `__asan_init_v4', replace "asan_init_v4" with "asan_init" in the assembly file. The following command does that: sed -i 's/asan_init_v4/asan_init/g' ls-basan-instrumented.s

aarch64

On aarch64, we also rely on standard compilers to assemble and link but the collection of compiler flags is slightly more involved and so we provide the -a switch on the main retrowrite executable to do that for you:

retrowrite -a ls-basan-instrumented.s -lasan -o ls-basan-instrumented

b. Instrument a binary with coverage information and fuzz with AFL

x64

To generate an AFL-instrumented binary, first generate the symbolized assembly as described above. Then, recompile the symbolized assembly with afl-gcc from afl++ like this:

$ AFL_AS_FORCE_INSTRUMENT=1 afl-gcc foo.s -o foo

or afl-clang.

aarch64

To instrument a binary with coverage information, use the coverage instrumentation pass with retrowrite -m coverage <input file> <output asm>. Re-assemble the binary with retrowrite -a <output asm> <new binary>.

The binary can now be fuzzed with:

afl-fuzz -i <seed folder> -o <out folder> <new binary>

Retrowrite also tries to add instrumentation to act as a forkserver for AFL; in case this causes problems, you can disable this behaviour by using export AFL_NO_FORKSERVER=1

c. Generate Symbolized Assembly

To generate symbolized assembly that may be modified by hand or post-processed by existing tools, just do not specify any instrumentation pass:

retrowrite </path/to/binary> <path/to/output/asm/files>

The output asm files can be freely edited by hand or by other tools. Post-modification, the asm files may be assembled to working binaries as described above.
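As an illustration of such post-processing, here is a minimal sketch that works on the text of the generated assembly file. The function name insert_before_calls and the choice of a nop as the inserted instruction are assumptions made for this example; it does not use the retrowrite API.

```python
def insert_before_calls(asm_text: str, stub: str = "\tnop") -> str:
    """Return asm_text with `stub` inserted before every call instruction."""
    out = []
    for line in asm_text.splitlines():
        tokens = line.split()
        # Instruction lines start with the mnemonic; labels end with ':' and
        # directives start with '.', so neither matches "call"/"callq".
        if tokens and tokens[0] in ("call", "callq"):
            out.append(stub)
        out.append(line)
    return "\n".join(out) + "\n"
```

Because references in the symbolized assembly are labels rather than fixed addresses, inserting instructions like this should not invalidate them when the file is assembled again.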

While retrowrite is interoperable with other tools, we strongly encourage researchers to use the retrowrite API for their binary instrumentation / modification needs! This saves the additional effort of having to load and parse binaries or assembly files.

KRetrowrite

Quick Usage Guide

Setup

Run setup.sh:

  • ./setup.sh kernel

Activate the virtualenv (from root of the repository):

  • source retro/bin/activate

(Bonus) To exit virtualenv when you're done with retrowrite:

  • deactivate

Usage

Commands

Classic instrumentation
  • Instrument Binary with Binary-Address Sanitizer (BASan): retrowrite --asan --kernel </path/to/module.ko> </path/to/output/module_asan.ko>
  • Generate Symbolized Assembly that may be modified by hand or post-processed by existing tools: retrowrite </path/to/module.ko> <path/to/output/asm/files>
Fuzzing

For fuzzing campaigns, please see the fuzzing/ folder.

Developer Guide

In general, librw/ contains the code for loading, disassembly, and symbolization of binaries and forms the core of all transformations. Individual transformation passes that build on top of this rewriting framework, such as our binary-only Address Sanitizer (BASan), are contained as individual tools in rwtools/.

Files and folders whose names start with k belong to the kernel version of retrowrite.

Demos

In the demos/ folder, you will find examples for userspace and kernel retrowrite (demos/user_demo and demos/kernel_demo respectively).

Cite

The following publications cover different parts of the RetroWrite project:

  • RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization Sushant Dinesh, Nathan Burow, Dongyan Xu, and Mathias Payer. In Oakland'20: IEEE International Symposium on Security and Privacy, 2020

  • No source, no problem! High speed binary fuzzing Matteo Rizzo, and Mathias Payer. In 36c3'19: Chaos Communication Congress, 2019

License -- MIT

The MIT License

Copyright (c) 2019 HexHive Group, Sushant Dinesh, Luca Di Bartolomeo, Antony Vennard, Matteo Rizzo, Mathias Payer

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

retrowrite's People

Contributors

cyanpencil, dependabot[bot], diagprov, gannimo, jeanmi151, matrizzo, sushant94, vanhauser-thc

retrowrite's Issues

LLVM IR Corpus leads to cases where functions are not disassembled.

Hi,

As you may know, I developed a tool that creates an LLVM IR corpus for my master thesis @ HexHive :)

Some generated snippets resulted in an assertion error. I haven't had time to analyse the issue yet, so it might not be in the scope of retrowrite.

Files to reproduce:
reproducible.tar.gz

$ clang -O2 -fPIE -fPIC -pie 1198a663f254851c2086795b4c8c54b50e067d7c_SCCP3633_2.c 1198a663f254851c2086795b4c8c54b50e067d7c.ll -o out
$ retrowrite out out.s
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Could not replace value in .init_array
[x] Couldn't find valid section 3de8
[x] Couldn't find valid section 3fd8
[x] Couldn't find valid section 3fe0
[x] Couldn't find valid section 3fe8
[x] Couldn't find valid section 3ff0
[x] Couldn't find valid section 3ff8
Traceback (most recent call last):
  File "path/to/retrowrite/retro/bin/retrowrite", line 176, in <module>
    rw.dump()
  File "path/to/retrowrite/librw/rw.py", line 73, in dump
    results.append("\t.text\n%s" % (function))
  File "path/to/retrowrite/librw/container.py", line 172, in __str__
    assert self.cache, "Function not disassembled!"
AssertionError: Function not disassembled!

Exception: 'r_addend' key missing in relocation

First of all, thanks for publishing the research and source for this tool.

When running python3 -m rwtools.asan.asantool serverbrowser.so serverbrower_instr I get the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/konrad/dev/retrowrite/rwtools/asan/asantool.py", line 83, in <module>
    rewriter = do_symbolization(args.binary, args.outfile)
  File "/home/konrad/dev/retrowrite/rwtools/asan/asantool.py", line 21, in do_symbolization
    reloc_list = loader.reloc_list_from_symtab()
  File "/home/konrad/dev/retrowrite/librw/loader.py", line 113, in reloc_list_from_symtab
    'addend': rel['r_addend'],
  File "/home/konrad/dev/retrowrite/retro/lib/python3.8/site-packages/elftools/elf/relocation.py", line 36, in __getitem__
    return self.entry[name]
  File "/home/konrad/dev/retrowrite/retro/lib/python3.8/site-packages/elftools/construct/lib/container.py", line 35, in __getitem__
    return self.__dict__[name]
KeyError: 'r_addend'

If I print the rel object, there is no r_addend key in the entry in my case:

<Relocation (REL): Container({'r_offset': 3346624, 'r_info': 8, 'r_info_sym': 0, 'r_info_type': 8})>

The binary/library in question is not stripped and has tons of symbols and sections. It is part of the Steam client. I guess I shouldn't share the binary here for copyright reasons.

I haven't debugged this in depth yet, but I guess this issue will show up for others sooner or later.
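For context: in the ELF format, REL-type relocation entries carry no r_addend field; the addend is implicit and stored in the word being relocated. A loader that wants to tolerate such entries could fall back to reading that word, roughly as sketched below. The function name and the image/file_offset parameters are hypothetical, a 32-bit little-endian word is assumed, and this is not how retrowrite currently behaves. REL entries are typical of 32-bit x86 objects, so the library may simply be outside what retrowrite supports; that is only a guess.

```python
import struct


def relocation_addend(rel, image: bytes, file_offset: int) -> int:
    """Return the addend of a relocation entry.

    `rel` is anything indexable by field name (a pyelftools Relocation or a
    plain dict). `image` and `file_offset` locate the relocated word in the
    file; both are hypothetical parameters of this sketch.
    """
    try:
        return rel["r_addend"]          # RELA: explicit addend in the entry
    except KeyError:
        # REL: the addend is implicit, stored in the word being relocated.
        # A 32-bit little-endian signed word is assumed here.
        (implicit,) = struct.unpack_from("<i", image, file_offset)
        return implicit
```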

[Enhancement] can retrowrite support arm32 architecture?

Platform details
Please detail the following:

  • Architecture: arm32
  • Kernel or userspace: STM32 embedded devices
  • Compiler: arm-none-eabi-gcc
  • Language (if not obvious from compiler): c
  • OS: firmware

I tried to port retrowrite to the arm32 architecture, but ran into a lot of open questions. Firmware is usually not built as PIE. Is it possible to make retrowrite work on arm32?

Add support for binaries with MiniDebugInfo/.gnu_debugdata section

Initial reference: https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html

LZMA-compressed debug symbols can often be found in binaries (particularly on CentOS, in my experience), in the .gnu_debugdata section. Supporting these would greatly expand the capabilities of retrowrite, though I don't believe elftools has support for the .gnu_debugdata section. Anyhow, as seen in the attached screenshot, this provides almost full debug info for a binary once parsed. The issue is, of course, decompressing the symbols and applying them. I'm not the greatest Python developer (I've started thinking more in C than in Python), so I leave this as a suggestion/feature request.

An example from the readelf tool.

[user@CentOS-F5 bin]$ readelf -e telnet
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4b19
  Start of program headers:          64 (bytes into file)
  Start of section headers:          99912 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000000254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.bu[...] NOTE             0000000000000274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000298  00000298
       0000000000000038  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000000002d0  000002d0
       00000000000009f0  0000000000000018   A       6     2     8
  [ 6] .dynstr           STRTAB           0000000000000cc0  00000cc0
       0000000000000421  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           00000000000010e2  000010e2
       00000000000000d4  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          00000000000011b8  000011b8
       0000000000000080  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000001238  00001238
       0000000000002748  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000003980  00003980
       0000000000000828  0000000000000018  AI       5    12     8
  [11] .init             PROGBITS         00000000000041a8  000041a8
       000000000000001a  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         00000000000041d0  000041d0
       0000000000000580  0000000000000010  AX       0     0     16
  [13] .text             PROGBITS         0000000000004750  00004750
       000000000000bad2  0000000000000000  AX       0     0     16
  [14] .fini             PROGBITS         0000000000010224  00010224
       0000000000000009  0000000000000000  AX       0     0     4
  [15] .rodata           PROGBITS         0000000000010230  00010230
       0000000000002cb0  0000000000000000   A       0     0     8
  [16] .eh_frame_hdr     PROGBITS         0000000000012ee0  00012ee0
       00000000000005a4  0000000000000000   A       0     0     4
  [17] .eh_frame         PROGBITS         0000000000013488  00013488
       0000000000001ef4  0000000000000000   A       0     0     8
  [18] .init_array       INIT_ARRAY       0000000000215a90  00015a90
       0000000000000008  0000000000000000  WA       0     0     8
  [19] .fini_array       FINI_ARRAY       0000000000215a98  00015a98
       0000000000000008  0000000000000000  WA       0     0     8
  [20] .jcr              PROGBITS         0000000000215aa0  00015aa0
       0000000000000008  0000000000000000  WA       0     0     8
  [21] .data.rel.ro      PROGBITS         0000000000215aa8  00015aa8
       0000000000000008  0000000000000000  WA       0     0     8
  [22] .dynamic          DYNAMIC          0000000000215ab0  00015ab0
       0000000000000220  0000000000000010  WA       6     0     8
  [23] .got              PROGBITS         0000000000215cd0  00015cd0
       0000000000000330  0000000000000008  WA       0     0     8
  [24] .data             PROGBITS         0000000000216000  00016000
       0000000000001ac8  0000000000000000  WA       0     0     32
  [25] .bss              NOBITS           0000000000217ae0  00017ac8
       000000000000cf48  0000000000000000  WA       0     0     32
  [26] .gnu_debuglink    PROGBITS         0000000000000000  00017ac8
       0000000000000014  0000000000000000           0     0     4
  [27] .gnu_debugdata    PROGBITS         0000000000000000  00017adc
       0000000000000a58  0000000000000000           0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  00018534
       0000000000000111  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R E    0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x000000000001537c 0x000000000001537c  R E    0x200000
  LOAD           0x0000000000015a90 0x0000000000215a90 0x0000000000215a90
                 0x0000000000002038 0x000000000000ef98  RW     0x200000
  DYNAMIC        0x0000000000015ab0 0x0000000000215ab0 0x0000000000215ab0
                 0x0000000000000220 0x0000000000000220  RW     0x8
  NOTE           0x0000000000000254 0x0000000000000254 0x0000000000000254
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000012ee0 0x0000000000012ee0 0x0000000000012ee0
                 0x00000000000005a4 0x00000000000005a4  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000015a90 0x0000000000215a90 0x0000000000215a90
                 0x0000000000000570 0x0000000000000570  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame 
   03     .init_array .fini_array .jcr .data.rel.ro .dynamic .got .data .bss 
   04     .dynamic 
   05     .note.ABI-tag .note.gnu.build-id 
   06     .eh_frame_hdr 
   07     
   08     .init_array .fini_array .jcr .data.rel.ro .dynamic .got 

As you'll see from the readelf output, there is also the issue that .gnu_debugdata has no section-to-segment mapping. I believe this could be solved with a fork of PyELFTools, though that's up to you all, as it would complicate installation.

Screen Shot 2021-05-05 at 5 05 21 PM

Regards,

impost0r
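The decompression step of this request can be sketched with the standard library alone. According to the GDB documentation linked above, .gnu_debugdata holds an xz-compressed ELF file, which Python's lzma module can decode. The function name unpack_minidebuginfo is made up for this example, and nothing here is part of retrowrite.

```python
import lzma


def unpack_minidebuginfo(section_data: bytes) -> bytes:
    """Decompress the contents of .gnu_debugdata into the embedded ELF image."""
    # lzma.decompress auto-detects the .xz container format by default
    elf_image = lzma.decompress(section_data)
    if elf_image[:4] != b"\x7fELF":
        raise ValueError("decompressed .gnu_debugdata is not an ELF image")
    return elf_image
```

With pyelftools, the raw section bytes should be obtainable via ELFFile(f).get_section_by_name('.gnu_debugdata').data(), and the returned image can be wrapped in io.BytesIO and opened with ELFFile again to iterate over its symbol table. How those symbols would be merged into retrowrite's loader is left open.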

Load widening

Hi,
this is not an issue but a question.
How do you handle binaries that perform load widening?
ASan avoids this problem by partially disabling the optimization;
I'm curious how retrowrite solves it at the binary level.

Thank you :)

AssertionError in rw.py

Hi,
I'm trying to instrument objdump (2.32.51 from git) with basan.
The binary is PIE and not stripped:

objdump: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 2.6.32, BuildID[sha1]=71d653b38898745c40f2b7e2346b978da2421e41, with debug_info, not stripped

Running asantool I get the following error:

$ python -m rwtools.asan.asantool objdump objdump_basan
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/andrea/Documents/retrowrite/rwtools/asan/asantool.py", line 83, in <module>
    rewriter = do_symbolization(args.binary, args.outfile)
  File "/home/andrea/Documents/retrowrite/rwtools/asan/asantool.py", line 30, in do_symbolization
    rw.symbolize()
  File "/home/andrea/Documents/retrowrite/librw/rw.py", line 57, in symbolize
    symb.symbolize_text_section(self.container, None)
  File "/home/andrea/Documents/retrowrite/librw/rw.py", line 144, in symbolize_text_section
    self.symbolize_mem_accesses(container, context)
  File "/home/andrea/Documents/retrowrite/librw/rw.py", line 331, in symbolize_mem_accesses
    container, target)
  File "/home/andrea/Documents/retrowrite/librw/rw.py", line 265, in _adjust_target
    assert sec is not None
AssertionError

I'm on Ubuntu 18.04.2, and I've just run setup.sh to prepare the retrowrite venv.
My python version is:

Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux

I attach the objdump binary.
objdump.zip

[BUG] Unable to get helloworld.s from retrowrite

Dear developers,

Could someone tell me whether I am misusing retrowrite or whether this is caused by some other problem? Thank you for your help!

Platform: Ubuntu 20.04, x86_64.
gcc version: 9.4.0

source code:
#include <stdio.h>
void main(){
printf("hello\n");
}

I wrote this into case.c and compiled it with the command:
gcc case.c -g -pie -o case

Here is the executable's file info:
:~/retrowrite$ file case
case: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=b912de4ace5d75a39fec09e0d6a7c9e917510d0f, for GNU/Linux 3.2.0, with debug_info, not stripped

Output from retrowrite:
~/retrowrite$ ./retrowrite --asan case case.s
Rewriting case into case.s
IDENTIFIED IMPORTS
{'name': 'frame_dummy', 'cache': [], 'start': 4416, 'sz': 0, 'bytes': b'', 'bbstarts': {4416}, 'bind': 'STB_LOCAL', 'except_table': None, 'cfi_map': None, 'nexts': defaultdict(<class 'list'>, {}), 'analysis': defaultdict(<function Function.init.. at 0x7ffff5ec3c10>, {}), 'instrumented': False, 'is_mangled': False, '_true_name': None}
.init_array frame_dummy pointer removed.
Traceback (most recent call last):
  File "./retrowrite", line 307, in <module>
    loader.load_relocations(reloc_list)
  File "/home/wen/retrowrite/librw_x64/loader.py", line 147, in load_relocations
    debug("[*] Relocations for a section that's not loaded:" + str(reloc_section))
NameError: name 'debug' is not defined

Improve X64 jump table handling

Hi, thanks for your contribution and hard work! Retrowrite is amazing.

I found a small unsoundness issue in reassembly and would like to start a discussion here. I would appreciate it if anyone could comment on this.

In short, my key insight is that although we do not need to distinguish numeric constants from references/labels in PIE binaries, we still need to distinguish numeric constants from label offsets.

I use the latest commit 9e2e633e9ab165681733f3255e648a62b22e6368 for reference.

Case 1

The story begins when I got a program which behaves differently after reassembly-and-recompilation (the attached code is reduced for easy demonstration).

#include <stdio.h>

static const int t[2] = {-187, -184};

int main() {
    int x;
    scanf("%d", &x);
    printf("%d\n", t[x]);
}

My basic setup is:

$ gcc --version 
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc poc.c

$ file a.out
a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f1c5b51f8f0547a7d9e06fdc52bef7d450d34022, not stripped

$ retrowrite -s a.out a.s

$ gcc -no-pie a.s -o b.out
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Could not replace value in .init_array
[x] Couldn't find valid section 200db0
[x] Couldn't find valid section 200fd8
[x] Couldn't find valid section 200fe0
[x] Couldn't find valid section 200fe8
[x] Couldn't find valid section 200ff0
[x] Couldn't find valid section 200ff8

After that, we can execute a.out and b.out to get the execution results.

$ ./a.out
1
-184

$ ./b.out
1
-202

We can see, with the same input 1, a.out prints -184 but b.out prints -202.

Originally, I thought the fault was caused by the PLT warnings. But after some exploration, I found that it is an unsoundness issue in principle. Specifically, let's check the reassembly file a.s:

...

.globl t_818
t_818: # 818 -- 820                    # static const int t[2] locates here
.LC818:
        .long .LC75d-.LC818
.LC81c:
        .long .LC760-.LC818

...

.LC756:
        leaq .LC818(%rip), %rax
.LC75d:
        movl (%rdx, %rax), %eax
.LC760:
        movl %eax, %esi
.LC762:
        leaq .LC823(%rip), %rdi
.LC769:
        movl $0, %eax
.LC76e:
        callq printf@PLT

...

We can see that retrowrite misclassified the numeric elements of const int t[2] as label offsets (e.g., .LC75d-.LC818). After reassembly, these values change.
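The coincidence can be verified with a few lines of arithmetic. The check below is a deliberately simplified stand-in for a jump-table heuristic (every table value, added to the table base, lands on an instruction start); it is an assumption for illustration and not retrowrite's actual code. The addresses are taken from the a.s excerpt above.

```python
TABLE_BASE = 0x818                      # address of t, label .LC818
TABLE_VALUES = [-187, -184]             # contents of static const int t[2]
# Instruction start addresses visible in the a.s excerpt
INSTRUCTION_STARTS = {0x756, 0x75D, 0x760, 0x762, 0x769, 0x76E}


def looks_like_jump_table(base, values, instruction_starts):
    """Simplified heuristic: every base + value hits an instruction start."""
    return all(base + v in instruction_starts for v in values)


targets = [hex(TABLE_BASE + v) for v in TABLE_VALUES]
print(targets)  # ['0x75d', '0x760']
print(looks_like_jump_table(TABLE_BASE, TABLE_VALUES, INSTRUCTION_STARTS))  # True
```

Both data values happen to point at the instructions labelled .LC75d and .LC760, which is exactly the pair of labels that appears in the misclassified table.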

Then I checked the code.

def symbolize_switch_tables(self, container, context):

It seems that retrowrite uses heuristics to symbolize jump tables (which contain many label offsets). In this case, the global const int t[2] happens to satisfy the heuristics, which confuses retrowrite. So some unsoundness remains.

With more study, I think the problem can be summarized as:

Although we do not need to distinguish numeric constants from references in PIE binaries, we still need to distinguish numeric constants from offsets between labels. I believe these label offsets are not recorded in the symbol or relocation tables.

The case above provides an example, where an element of a jump table is the offset between two labels (the target and the jump base). That is why retrowrite gets confused here.

Case 2

To further study the root cause, I hand-crafted a program with inline-assembly. Hope it can help.

test rdi, rdi;
je A;

mov rbx, B-A;
push rbx;
lea r8, [rip + A];

A:
pop r9;
add r8, r9;
jmp r8;

B:
mov rax, 60;     # SYS_EXIT
mov rdi, 0;
syscall;
ret;

Let's go through the code.

test rdi, rdi;
je A;

The first test-je pattern is to let retrowrite know there is a basic block starting at A.

mov rbx, B-A;
push rbx;
lea r8, [rip + A];

The mov instruction is the key, which loads the offset between labels B and A into rbx.
The following push instruction stores the offset into memory, and the lea instruction loads the address of label A into r8.

A:
pop r9;
add r8, r9;
jmp r8;

Later, we pop the offset into r9. Note that, here we use a simple push-pop pattern to simulate the complex behaviors (including the aliasing problems) in real-world binary.
The add instruction adds r8 and r9, which denotes A + B - A = B.
Then, an indirect jump leads the control flow to B.

B:
mov rax, 60;     # SYS_EXIT
mov rdi, 0;
syscall;
ret;

B is a simple exit(0).

My basic setup is:

$ cat poc.c
int main(int argc, char **argv) {
    asm volatile(
        ".intel_syntax noprefix\n"

        "\ttest rdi, rdi;\n"
        "\tje A;\n"

        "\tmov rbx, B-A;\n"
        "\tpush rbx;\n"
        "\tlea r8, [rip + A];\n"

        ".global A\n"
        "A:\n"
        "\tpop r9;\n"
        "\tadd r8, r9;\n"
        "\tjmp r8;\n"

        ".global B\n"
        "B:\n"
        "\tmov rax, 60;\n"
        "\tmov rdi, 0;\n"
        "\tsyscall;\n"

        "\tret;\n"

        ".att_syntax;\n"
        );
}

$ gcc poc.c

$ retrowrite -s a.out a.s
[*] Relocations for a section that's not loaded: .rela.dyn
[x] Could not replace value in .init_array
[x] Couldn't find valid section 200df8
[x] Couldn't find valid section 200fd8
[x] Couldn't find valid section 200fe0
[x] Couldn't find valid section 200fe8
[x] Couldn't find valid section 200ff0
[x] Couldn't find valid section 200ff8

$ AFL_AS_FORCE_INSTRUMENT=1 ~/AFLplusplus/afl-clang -no-pie a.s -o b.out

Let's first check the reassembly file

...

.LC605:
        testq %rdi, %rdi
.LC608:
        je .L619
.LC60a:
        movq $8, %rbx                   # Originally, it is the offset between labels B and A, misclassified as a numerical number 8
.LC611:
        pushq %rbx
.LC612:
        leaq (%rip), %r8
.L619:
.LC619:

...

We can see retrowrite left the label offset (B-A) as a constant number 8.

After instrumentation, the offset has changed, but the constant is still left here, which breaks the recompiled binary (the indirect jump target becomes invalid).
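The effect can be modelled with simple address arithmetic. In the sketch below, the instruction sizes (2, 3 and 3 bytes) are assumed encodings that are consistent with the constant 8 in the listing, and PADDING is a hypothetical number of bytes inserted into block A by instrumentation; the real size depends on the instrumentation used.

```python
# Byte sizes of the three instructions in block A: pop r9, add r8, r9, jmp r8
BLOCK_A_SIZE = 2 + 3 + 3          # = 8, the constant retrowrite kept
ADDR_A = 0x619                    # label A in the original binary


def jump_target(addr_a: int, stored_offset: int) -> int:
    """Address reached by `add r8, r9; jmp r8`: A plus the stored offset."""
    return addr_a + stored_offset


def addr_b(addr_a: int, padding: int) -> int:
    """Address of B when `padding` extra bytes are inserted inside block A."""
    return addr_a + padding + BLOCK_A_SIZE


PADDING = 32                      # hypothetical size of inserted instrumentation

# Only the distance B - A matters, so the same ADDR_A is reused for both cases.
original_ok = jump_target(ADDR_A, BLOCK_A_SIZE) == addr_b(ADDR_A, 0)
rewritten_ok = jump_target(ADDR_A, BLOCK_A_SIZE) == addr_b(ADDR_A, PADDING)
print(original_ok, rewritten_ok)  # True False
```

With any non-zero padding the stored constant no longer equals B - A, so the indirect jump lands inside block A instead of at B.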

$ ./a.out

$ ./b.out
[1]    4754 segmentation fault (core dumped)  ./b.out

The solution is to infer the label offset (B - A). However, the traditional challenge, which is caused by sophisticated memory behaviors, is still there.

More

I have attached the above files. The directory structure is:

- case1    # the printf case
    - poc.c
    - a.out
    - a.s
    - b.out
- case2    # the inline-asm case
    - poc.c
    - a.out
    - a.s
    - b.out

It seems that a USENIX paper also mentions this small unsoundness issue in Section 7.1. I would appreciate it if anyone could share their thoughts here. I also hope this discussion can help the development of retrowrite, which is, again, great work for us to build on.

Thanks!

[BUG] RetroWrite does not disassemble a set of functions

RetroWrite does not disassemble certain functions,
especially when their symbol visibility is 'STV_HIDDEN'.
Moreover, the omission causes recompilation errors.

I examined the source code and found the following code, which filters out hidden functions.

for symbol in section.iter_symbols():
    if symbol['st_other']['visibility'] == "STV_HIDDEN":
        continue

I think the above code should be removed to fix the bug.
Thank you.
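
A minimal sketch of the proposed change, using plain dicts that mimic the shape of pyelftools symbol entries. The helper `function_symbols`, its `skip_hidden` switch, and the sample symbol names are hypothetical; only the `st_other`/`visibility` check comes from the snippet above.

```python
def function_symbols(symbols, skip_hidden=False):
    """Return the names of STT_FUNC symbols, optionally dropping hidden ones."""
    names = []
    for sym in symbols:
        if sym["st_info"]["type"] != "STT_FUNC":
            continue
        # This is the filter the report proposes to remove.
        if skip_hidden and sym["st_other"]["visibility"] == "STV_HIDDEN":
            continue
        names.append(sym["name"])
    return names


# Hypothetical symbol table entries.
symbols = [
    {"name": "main", "st_info": {"type": "STT_FUNC"},
     "st_other": {"visibility": "STV_DEFAULT"}},
    {"name": "helper_hidden", "st_info": {"type": "STT_FUNC"},
     "st_other": {"visibility": "STV_HIDDEN"}},
    {"name": "table", "st_info": {"type": "STT_OBJECT"},
     "st_other": {"visibility": "STV_DEFAULT"}},
]

print(function_symbols(symbols, skip_hidden=True))
print(function_symbols(symbols, skip_hidden=False))
```

With the filter on, `helper_hidden` is never disassembled, so any reference to it in the output would have no definition.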

Test Environment.

  1. Platform: Ubuntu 18.04, x86-64
  2. Compiler: gcc-7
  3. Target binary: binutils-2.31.1/objcopy

issues with handling aliased symbols

Hi,

The current version of retrowrite may not properly handle aliased symbols.

An object, such as a function, can have multiple aliased symbols. However, the loading process at https://github.com/HexHive/retrowrite/blob/master/librw/loader.py#L124 just picks the last one. This can cause problems because other references to the same object use the symbol linked to the relocation (e.g., https://github.com/HexHive/retrowrite/blob/master/librw/rw.py#L185) --- the symbol picked by the loading process and the symbol linked to the relocation can be different, though they are aliases.

A safer strategy is to keep all aliased symbols in the assembly file by using the ".set" primitive.
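
A rough sketch of that strategy, assuming symbols arrive as (name, address) pairs. The function name, the choice of the first symbol as the primary one, and the sample names are all hypothetical; `.set` is the GNU assembler directive mentioned above.

```python
from collections import defaultdict


def emit_alias_directives(symbols):
    """Group symbols by address and keep every alias via `.set`.

    symbols: iterable of (name, address) pairs.
    Returns a list of assembly lines (labels and .set directives only).
    """
    by_addr = defaultdict(list)
    for name, addr in symbols:
        by_addr[addr].append(name)

    lines = []
    for addr in sorted(by_addr):
        primary, *aliases = by_addr[addr]
        lines.append(f"{primary}:")
        for alias in aliases:
            # Every alias resolves to the same address as the primary label.
            lines.append(f".set {alias}, {primary}")
    return lines


# Hypothetical symbols: foo and foo_alias share address 0x90.
sample = [("foo", 0x90), ("foo_alias", 0x90), ("bar", 0xA0)]
for line in emit_alias_directives(sample):
    print(line)
```

Because no alias is dropped, a relocation that names `foo_alias` still resolves even if the loader picked `foo`.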

I have committed a tentative "patch" to the repo I forked locally: https://github.com/junxzm1990/retrowrite.

FYI, a tentative "patch" to the naming issues of global symbols I mentioned at #15 is also committed to my local repo.

Segmentation fault in reassembled binary

Another possible bug.

Issue:

$ ./08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_ref
$ echo $?
0

$ ./08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_mod
Segmentation fault (core dumped)

Build:

$ clang -O2 -fPIC -fPIE -pie 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12.c 08e6e84cab2284e35e4808f1891290b0519f1e3f.ll -o 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_ref
$ retrowrite 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_ref 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_ref.s
$ clang 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_ref.s -o 08e6e84cab2284e35e4808f1891290b0519f1e3f_GlobalDCE1061_12_bin_mod

reproducible.tar.gz

Cannot recompile assembled code[BUG]

When I try to compile the assembled code I get this error

(retro) bash-5.1$ file binary/quich
binary/quich: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=f78f9f350f2b849566bbc2062e349729a5493eba, for GNU/Linux 4.4.0,
with debug_info, not stripped

(retro) bash-5.1$ retrowrite binary/quich asm/quich_reassembled.s
.init_array frame_dummy pointer removed.
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Couldn't find valid section ade0
[x] Couldn't find valid section afd8
[x] Couldn't find valid section afe0
[x] Couldn't find valid section afe8
[x] Couldn't find valid section aff0
[x] Couldn't find valid section aff8

(retro) bash-5.1$ gcc asm/quich_reassembled.s -lm -o binary/quich_reassembled
asm/quich_reassembled.s: Assembler messages:
asm/quich_reassembled.s:4898: Error: unrecognized symbol type "GLIBC_2.2.5_b220"
asm/quich_reassembled.s:4898: Error: junk at end of line, first unrecognized character is `,'
asm/quich_reassembled.s:4899: Error: junk at end of line, first unrecognized character is `@'
asm/quich_reassembled.s:4900: Error: invalid character '@' in mnemonic
asm/quich_reassembled.s:4936: Error: unrecognized symbol type "GLIBC_2.2.5_b230"
asm/quich_reassembled.s:4936: Error: junk at end of line, first unrecognized character is `,'
asm/quich_reassembled.s:4937: Error: junk at end of line, first unrecognized character is `@'
asm/quich_reassembled.s:4938: Error: invalid character '@' in mnemonic
asm/quich_reassembled.s:4974: Error: unrecognized symbol type "GLIBC_2.2.5_b240"
asm/quich_reassembled.s:4974: Error: junk at end of line, first unrecognized character is `,'
asm/quich_reassembled.s:4975: Error: junk at end of line, first unrecognized character is `@'
asm/quich_reassembled.s:4976: Error: invalid character '@' in mnemonic
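
The contents of lines 4898 to 4900 are not shown, so the following is only a guess at the cause: the emitted symbol name may still carry an ELF version suffix such as `@@GLIBC_2.2.5`, which the assembler then tries to parse as a symbol type. Under that assumption, a hypothetical helper could strip the suffix before the name is written out:

```python
import re

# Matches a trailing version suffix such as "@GLIBC_2.2.5" or "@@GLIBC_2.2.5".
_VERSION_SUFFIX = re.compile(r"@@?[A-Za-z0-9_.]+$")


def strip_symbol_version(name):
    """Return the symbol name without its version suffix, if any."""
    return _VERSION_SUFFIX.sub("", name)


# Example names are illustrative, not taken from the failing file.
print(strip_symbol_version("stdout@@GLIBC_2.2.5"))
print(strip_symbol_version("main"))
```

Whether dropping the version is safe for every binary is a separate question; this only shows the string handling.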

Steps To Reproduce:

Source code can be found at https://github.com/Usbac/quich, I added -pie to the
CFLAGS to make sure that the binary is compiled as position independent code
(PIE). Adding or removing the -g flags does not change anything.

Environment:

OS: 5.14.21-2-MANJARO x86_64 GNU/Linux
GCC: gcc (GCC) 11.2.0
retrowrite at commit: 117dad5

capstone.CsError: Invalid option (CS_ERR_OPTION)

I can't run it.

(retro) root:# cd retrowrite/demos/user_demo

(retro) root:retrowrite/demos/user_demo# make

(retro) root:retrowrite/demos/user_demo# retrowrite --asan heap heap_retwo

Traceback (most recent call last):
  File "/home/zhang/retrowrite/retro/bin/retrowrite", line 137, in <module>
    loader.load_data_sections(slist, lambda x: x in Rewriter.DATASECTIONS)
  File "/home/zhang/retrowrite/librw/loader.py", line 69, in load_data_sections
    disasm_bytes(section.data(), seclist[sec]['base']))
  File "/home/zhang/retrowrite/librw/disasm.py", line 6, in disasm_bytes
    md.syntax = CS_OPT_SYNTAX_ATT
  File "/home/zhang/retrowrite/retro/lib/python3.6/site-packages/capstone/__init__.py", line 1012, in syntax
    raise CsError(status)
capstone.CsError: Invalid option (CS_ERR_OPTION)

Could not find valid section ...

I run it as in the example:

(retro) # python3 -m librw.rw /bin/ls ./ls.s 
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Could not replace value in .init_array
[x] Couldn't find valid section 21398
[x] Couldn't find valid section 21fc8
[x] Couldn't find valid section 21fd0
[x] Couldn't find valid section 21fd8
[x] Couldn't find valid section 21fe0
[x] Couldn't find valid section 21fe8
[x] Couldn't find valid section 21ff0
[x] Couldn't find valid section 21ff8

An ls.s file is created, but it fails to compile:

# gcc ls.s -o ls.bin
/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
/bin/ld: /tmp/cc071IcM.o: in function `_obstack_newchunk':
(.text+0x7f): undefined reference to `.L160e0'
/bin/ld: (.text+0x111): undefined reference to `.L16100'
/bin/ld: /tmp/cc071IcM.o: in function `_obstack_free':
(.text+0x178): undefined reference to `.L16100'
/bin/ld: /tmp/cc071IcM.o: in function `_obstack_begin':
(.text+0xd): undefined reference to `.L16120'
/bin/ld: /tmp/cc071IcM.o: in function `_obstack_begin_1':
(.text+0x22): undefined reference to `.L16120'
/bin/ld: /tmp/cc071IcM.o: in function `_obstack_free':
(.text+0x18d): undefined reference to `.L4722'
/bin/ld: /tmp/cc071IcM.o:(.data+0x1f0): undefined reference to `.LCcc60'
/bin/ld: /tmp/cc071IcM.o:(.data+0x260): undefined reference to `.LC160a0'
/bin/ld: /tmp/cc071IcM.o:(.data.rel.ro+0x0): undefined reference to `.LC6ce0'
/bin/ld: /tmp/cc071IcM.o:(.data.rel.ro+0x8): undefined reference to `.LC7200'
[...]

This happens with every binary I try, whether statically or dynamically linked. All of them are 64-bit, though.
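
To narrow down reports like this one before invoking the linker, a small script can list labels that are referenced but never defined in the generated assembly. This is a diagnostic sketch only; it assumes labels follow the `.L<hex>` and `.LC<hex>` patterns visible in the linker errors above, and the function name is made up for illustration.

```python
import re

# A label definition at the start of a line, e.g. ".LC605:" or ".L619:".
LABEL_DEF = re.compile(r"^\s*(\.LC?[0-9a-fA-F]+):", re.MULTILINE)
# Any occurrence of such a label, whether a definition or a reference.
LABEL_USE = re.compile(r"\.LC?[0-9a-fA-F]+")


def undefined_labels(asm_text):
    """Return the sorted labels that are used but have no definition."""
    defined = set(LABEL_DEF.findall(asm_text))
    used = set(LABEL_USE.findall(asm_text))
    return sorted(used - defined)


# Tiny hand-written sample, not actual retrowrite output.
sample_asm = (
    ".LC605:\n"
    "\ttestq %rdi, %rdi\n"
    "\tje .L619\n"
    "\t.quad .LC7200\n"
)
print(undefined_labels(sample_asm))
```

Running it over ls.s would show whether the missing labels point into sections that were never emitted.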

[BUG] cannot use retrowrite with binary compiled with gcc or g++

Description
retrowrite reports no issues when run on a binary compiled with gcc, but when I try to assemble the generated assembly with gcc, I get this error:

/usr/bin/ld:hello.asm: file format not recognized; treating as linker script
/usr/bin/ld:hello.asm:1: syntax error
collect2: error: ld returned 1 exit status

Environment:

OS: 5.14.21-2-MANJARO x86_64 GNU/Linux
GCC: gcc (GCC) 11.2.0
clang:13.0.1
retrowrite at commit: 7c230bc

I tried with a simple hello world program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    printf("Hello, World!\n");

    return 0;
}

I compiled it with gcc: gcc -O0 -ggdb -Wall -Wpedantic -Wextra -fPIC -fPIE -pie hello.c -o hello
Then I ran ./retrowrite hello hello.asm
gcc hello.asm -o hello_instrumented => error
Note that clang hello.asm -o hello_instrumented seems to work on some files.
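
The linker message suggests that gcc did not recognize the `.asm` suffix and passed the file straight to ld. gcc selects the input language from the file suffix (`.s` is assembly), and `-x assembler` forces it. A small sketch of building the command either way follows; the helper name and default compiler are assumptions.

```python
import os


def assemble_command(asm_path, out_path, compiler="gcc"):
    """Build a command line that makes the compiler treat asm_path as assembly."""
    _, ext = os.path.splitext(asm_path)
    cmd = [compiler]
    if ext != ".s":
        # Unknown suffixes are handed to the linker, so force the language.
        # -x applies to the input files that follow it.
        cmd += ["-x", "assembler"]
    cmd += [asm_path, "-o", out_path]
    return cmd


print(" ".join(assemble_command("hello.asm", "hello_instrumented")))
print(" ".join(assemble_command("hello.s", "hello_instrumented")))
```

Naming the output hello.s in the retrowrite invocation avoids the extra flag altogether.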

[Enhancement] Your description

Platform details
Please detail the following:

  • Architecture: x86-64
  • Kernel or userspace:
  • Compiler:
  • Language (if not obvious from compiler):
  • OS: Windows

Is there any plan to extend the static rewriting capability to Windows PE binaries?

undefined reference to `__asan_init_v4' encountered, how to fix it?

Hello! I entered the 'demo' subdirectory and ran 'make heap.asan'. However, I encountered the following message:

[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Couldn't find valid section 403ff0
[x] Couldn't find valid section 403ff8
[*] Loading analysis cache
[*] Instrumented: 25 locations
Stats: [ 4 14 7]
{"rflags": 21, "rdi": 15, "rax": 10, "rbp": 1, "rsi": 2}
rflags live: 4, rflags + 0 regs: 2, rflags + rax: 0, rflags + >= 1 reg: 2
clang heap.asan.s -lasan -o heap.asan
/usr/bin/ld: /tmp/heap-1df0be.o: in function `asan.module_ctor':
(.text+0x542): undefined reference to `__asan_init_v4'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I tried the following: 1) changing the compiler from clang-9 to gcc-4.8, and 2) changing the '-lasan'
linkage option to '-static-libasan', as some tips found online suggested, but both fail with the same error message.
How can I fix this?
Looking forward to your suggestions.

Compilation fail on reassembled code

Hi, I have a binary that fails to reassemble. It is bsdtar from libarchive.

The git version of retrowrite is b842aca0d1ff3ad10b4df71c5f4a2944bae18580

The binary information is:

$ file bsdtar
bsdtar: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c1d62fbb71cec37b6ce7089b476513fb3bc4146e, not stripped

$ checksec bsdtar
CANARY    : ENABLED
FORTIFY   : disabled
NX        : ENABLED
PIE       : ENABLED
RELRO     : Partial

I tried the following commands to get the reassembled code:

$ retrowrite bsdtar a.s
[*] Relocations for a section that's not loaded: .rela.plt
[*] Relocations for a section that's not loaded: .rela.dyn
[x] Could not replace value in .init_array
[x] Couldn't find valid section 308230
[x] Couldn't find valid section 30afd0
[x] Couldn't find valid section 30afd8
[x] Couldn't find valid section 30afe0
[x] Couldn't find valid section 30afe8
[x] Couldn't find valid section 30aff0

$ gcc a.s -llzma -lcrypto -lz -lxml2 -lbz2 -lacl -llz4 -o a.out
/tmp/ccZthWQk.o:(.data+0x30): undefined reference to `.LC0'
/tmp/ccZthWQk.o:(.data+0x38): undefined reference to `.LC0'
collect2: error: ld returned 1 exit status

Then I checked the bug. In a.s, the error happens here:

.type   memset_v.3282_30b8d8,@object
.globl memset_v.3282_30b8d8
memset_v.3282_30b8d8: # 30b8d8 -- 30b8e0
.LC30b8d8:
        .quad .LC0
.type   memset_v.2768_30b8e0,@object
.globl memset_v.2768_30b8e0
memset_v.2768_30b8e0: # 30b8e0 -- 30b8e8
.LC30b8e0:
        .quad .LC0
.section .bss
.align 32
.type   stdout_30b900,@object
.globl stdout_30b900
stdout_30b900: # 30b900 -- 30b908

The label LC0 is invalid. The r2 output for this code is:

            ;-- memset_v.3282:
            ; DATA XREF from sym.secure_zero_memory (0xb4e91)
            0x0030b8d8      .qword 0x0000000000000000                  ; RELOC 64 memset
            ;-- reloc.memset:
            ;-- memset_v.2768:
            ; DATA XREF from sym.secure_zero_memory_1 (0xd3afd)
            0x0030b8e0      .qword 0x0000000000000000                  ; RELOC 64 memset
            ;-- _edata:
            ;-- __bss_start:

When I manually replace the label LC0 with the numerical value 0, it compiles.

It looks like retrowrite fails to symbolize the numerical value 0, but I guess it may be an implementation bug.
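
One possible guard, sketched under the assumption that a pointer should only become a label when it falls inside a known section. The function `emit_pointer` and the region bounds are hypothetical. Since r2 shows a relocation against memset at these slots, emitting the symbol name itself may be the more faithful fix; this sketch covers only the fallback to a plain number.

```python
def emit_pointer(value, regions):
    """Emit a .quad for a data pointer.

    regions: list of (start, end) half-open address ranges of known sections.
    A label is used only when the value lies inside one of them.
    """
    for start, end in regions:
        if start <= value < end:
            return "\t.quad .LC%x" % value
    # Not inside any known section: keep the raw number instead of
    # inventing a label that will never be defined.
    return "\t.quad 0x%x" % value


# Assumed bounds for illustration only.
regions = [(0x30B000, 0x30B900)]
print(emit_pointer(0x0, regions))
print(emit_pointer(0x30B8D8, regions))
```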

All the files are attached here.

Symbolizing memory access fails to identify symbol

As part of an ongoing evaluation of Retrowrite by a third party, we identified a case that fails to symbolize correctly. The following steps reproduce it:

wget https://www.busybox.net/downloads/busybox-1.35.0.tar.bz2
tar xf busybox-1.35.0.tar.bz2
cd busybox-1.35.0
make defconfig
make menuconfig # in here, change to a PIE binary
make

This results in the following exception:

Traceback (most recent call last):
  File "/retrowrite/retro/bin/retrowrite_x64", line 168, in <module>
    rw.symbolize()
  File "/retrowrite/librw_x64/rw.py", line 76, in symbolize
    symb.symbolize_text_section(self.container, None)
  File "/retrowrite/librw_x64/rw.py", line 523, in symbolize_text_section
    self.symbolize_mem_accesses(container, context)
  File "/hexhive/retrowrite/librw_x64/rw.py", line 730, in symbolize_mem_accesses
    target, adjust = self._adjust_target(
  File "/hexhive/retrowrite/librw_x64/rw.py", line 645, in _adjust_target
    assert sec is not None
AssertionError

Adding the following diagnostic code:

diff --git a/librw_x64/rw.py b/librw_x64/rw.py
index 7c36b2f..9e3b9a1 100644
--- a/librw_x64/rw.py
+++ b/librw_x64/rw.py
@@ -680,6 +680,8 @@ class Symbolizer():
                     ripbase = inst.address + inst.sz
                     target = ripbase + value
 
+                    print("RIP REL Information Value=0x%x,RIPBASE=0x%x,TARGET=0x%x" % (value, ripbase, target))
+
                     is_an_import = False
 
                     for relocation in container.relocations[".dyn"]:
@@ -715,10 +717,16 @@ class Symbolizer():
                         # Check if target is contained within a known region
                         in_region = self._is_target_in_region(
                             container, target)
+
                         if in_region:
                             inst.op_str = inst.op_str.replace(
                                 hex(value), ".LC%x" % (target))
                         else:
+                            for sec, sval in container.sections.items():
+                                print("%s 0x%x - 0x%x" % (sec, sval.base, sval.sz))
+                            for fn, fval in container.functions.items():
+                                print("%s 0x%x - 0x%x" % (fval.name, fval.start, fval.sz))
+                            print("[*] Adjusting memory access, context: %s %s 0x%x" % (inst, context, target))
                             target, adjust = self._adjust_target(
                                 container, target)
                             inst.op_str = inst.op_str.replace(

The code being refactored in a separate repo suggests that we are unable to correctly identify a rip-relative lea to a text section function. According to the diagnostics, neither the text section nor the function itself is correctly identified.

The root cause of this bug needs to be tracked down and fixed, but is unrelated to previous init_array issues.

The following issues are likely related: #29, #3.

Can retrowrite run on ARM now? [BUG]

test errors

Testing on the ARM architecture with the command "retrowrite stack stack.asan.s" produces the following error:

Traceback (most recent call last):
  File "/root/retrowrite/retro/bin/retrowrite", line 173, in <module>
    from librw_arm64.analysis.register import RegisterAnalysis
  File "/root/retrowrite/librw_arm64/analysis/register.py", line 9, in <module>
    from archinfo import ArchAArch64, Register
  File "/root/retrowrite/retro/lib/python3.6/site-packages/archinfo/__init__.py", line 26, in <module>
    from .arch_amd64 import ArchAMD64
  File "/root/retrowrite/retro/lib/python3.6/site-packages/archinfo/arch_amd64.py", line 378, in <module>
    register_arch([r'.*amd64|.*x64|.*x86_64|.*metapc'], 64, Endness.LE, ArchAMD64)
  File "/root/retrowrite/retro/lib/python3.6/site-packages/archinfo/arch.py", line 800, in register_arch
    all_arches.append(my_arch(endness))
  File "/root/retrowrite/retro/lib/python3.6/site-packages/archinfo/arch_amd64.py", line 68, in __init__
    self.reg_blacklist.append(register.name)
AttributeError: 'NoneType' object has no attribute 'append'

Testing the demo in the mytest directory gives the same error, and the readme.md file does not seem to have been updated. Can this version run on the ARM architecture now? The current test on ARM does not seem to have been successful.

[BUG] RetroWrite does not symbolize RIP-relative addressing and omits the definition of labels

Describe the bug

  1. RetroWrite fails to symbolize RIP-relative addressing.
    I observed that RetroWrite fails to recover RIP-relative addressing. As an example, given the instruction ‘leaq fix_syms(%rip), %rsi’ found in addr2line of binutils, RetroWrite reassembled the instruction as ‘leaq 5(%rip), %rsi’.
  • Compiler-generated assembly
_bfd_fix_excluded_sec_syms:             
    .cfi_startproc
    movq    %rdi, %rdx
    movq    40(%rsi), %rdi
    leaq    fix_syms(%rip), %rsi
    jmp bfd_link_hash_traverse          # TAILCALL
    .cfi_endproc

fix_syms:                               
    .cfi_startproc
    pushq   %r14
  • Binary
00000000000a663f <_bfd_fix_excluded_sec_syms>:
   a663f:    mov    %rdi,%rdx
   a6642:    mov    0x28(%rsi),%rdi
   a6646:    lea    0x5(%rip),%rsi        # a6652 <fix_syms>
   a664d:    jmpq   a3ec0 <bfd_link_hash_traverse>

00000000000a6652 <fix_syms>:
   a6652:    push   %r14
  • Reassembler-generated assembly 
.LCa6646:
    leaq 5(%rip), %rsi
  2. RetroWrite omits the definition of some labels.
    Also, I found that RetroWrite sometimes omits label definitions. For example, given the data pointer 0x170c80, RetroWrite symbolized the pointer as '.LC170c80' but did not emit a definition for the label '.LC170c80'. As a result, reassembly fails with a compilation error.
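
For the first item, the expected symbolization can be checked by hand: a RIP-relative target is the address of the next instruction plus the displacement. The sketch below works through the lea at 0xa6646 from the listing above (7 bytes long, since the next instruction is at 0xa664d). The helper names are hypothetical and the `.LC%x` label style follows retrowrite's output.

```python
def rip_relative_target(address, size, displacement):
    """RIP points at the next instruction, so add the instruction size first."""
    return address + size + displacement


def symbolize_operand(op_str, address, size, displacement):
    """Replace a numeric RIP-relative displacement with a label."""
    target = rip_relative_target(address, size, displacement)
    label = ".LC%x" % target
    new_op = op_str.replace("%d(%%rip)" % displacement,
                            "%s(%%rip)" % label, 1)
    return new_op, label


# Values taken from the objdump listing: lea 0x5(%rip),%rsi at a6646.
new_op, label = symbolize_operand("5(%rip), %rsi", 0xA6646, 7, 5)
print(hex(rip_relative_target(0xA6646, 7, 5)))
print("leaq " + new_op)
```

The computed target matches the address objdump annotates as fix_syms, so the reassembled output should reference a label there instead of the literal 5.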

Describe how to reproduce the bug

  1. Platform: x86-64.
  2. Compiler: Clang v12.0 and GCC v7.5.0
  3. Binary: addr2line in binutils-2.31.1

AssertionError: Can't find displacement in lexp

I tried to run a kernel fuzzing campaign using kretrowrite, but I cannot proceed because of the following error.

(retro) ➜  retrowrite git:(master) ✗ ./fuzzing/kernel/fuzz-module.sh ext4 
scripts/kconfig/conf  --syncconfig Kconfig

...

Added function num_clusters_in_group

...

[*] ext4_destroy_inline_data_nolock needs redzone stack
[*] trace_event_raw_event_ext4_discard_preallocations needs redzone stack
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/user/retrowrite/rwtools/kasan/asantool.py", line 136, in <module>
    instrumenter.do_instrument()
  File "/home/user/retrowrite/rwtools/kasan/instrument.py", line 685, in do_instrument
    self.instrument_mem_accesses()
  File "/home/user/retrowrite/rwtools/kasan/instrument.py", line 382, in instrument_mem_accesses
    acsz, instruction, midx, free_registers, is_leaf)
  File "/home/user/retrowrite/rwtools/kasan/instrument.py", line 263, in get_mem_instrumentation
    assert False, 'Can\'t find displacement in lexp'
AssertionError: Can't find displacement in lexp

I tried not only ext4 but also several default kernel modules, and all of them fail with the same error.
I would appreciate it if you could tell me how to fix it.

Extensive test harness

Task Overview

In order to give confidence in Retrowrite as a product, we should develop an extensive testing suite using real-world binary programs, as well as test cases for all optimizations we apply and languages we support. Ideally, we would move to a continuous integration setup, but the first step is to create the test cases to evaluate all architectures supported by Retrowrite.

Share code between architectures as much as possible

Description of the problem

Currently, retrowrite duplicates code between its x64 and arm64 implementations.

Proposed solution

The proposal is to share as much code as possible between the two implementations. Specifically, handling of ELF files and DWARF structures should be relatively portable between implementations. Only the specific retrowriting techniques should be architecture-dependent.

[BUG] -- asan triggers traceback of capstone in ubuntu-arm64

Describe the bug
When I tested asan with "user_space" in the demos directory on an AArch64 ubuntu system, I encountered the following traceback:

$ gcc -O0 -ggdb -Wall -Wpedantic -Wextra -fPIC -fPIE -pie ./stack.c -o stack
...
$ ./retrowrite --asan ./stack ./stack.asan.s
[INFO] Found dependency libc.so.6
[INFO] Found dependency ld-linux-aarch64.so.1
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
0x730 _init
0x730 0x744
0x800 _start
0x800 0x838
0x850 deregister_tm_clones
0x850 0x880
0x880 register_tm_clones
0x880 0x8c0
0x8c0 __do_global_dtors_aux
0x8c0 0x908
0x908 frame_dummy
0x908 0x90c
0xbbc _fini
0xbbc 0xbcc
[INFO] Disassembling...
[INFO] Symbolizing...
[INFO] Recovering .eh_frame information
{'name': 'exit', 'st_value': 0, 'offset': 73584, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': '__cxa_finalize', 'st_value': 0, 'offset': 73592, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': 'atoi', 'st_value': 0, 'offset': 73600, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': '__libc_start_main', 'st_value': 0, 'offset': 73608, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': '__stack_chk_fail', 'st_value': 0, 'offset': 73616, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': '__gmon_start__', 'st_value': 0, 'offset': 73624, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': 'abort', 'st_value': 0, 'offset': 73632, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': 'puts', 'st_value': 0, 'offset': 73640, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
{'name': 'printf', 'st_value': 0, 'offset': 73648, 'addend': 0, 'type': 1026}
[*] Unhandled relocation R_AARCH64_JUMP_SLOT
Traceback (most recent call last):
  File "./retrowrite", line 293, in <module>
    asan(rw, loader, args)
  File "./retrowrite", line 52, in asan
    analyze_registers(loader, args)
  File "./retrowrite", line 41, in analyze_registers
    StackFrameAnalysis.analyze(loader.container)
  File "/root/retrowrite/librw_arm64/analysis/stackframe.py", line 20, in analyze
    analyzer.analyze_container(container)
  File "/root/retrowrite/librw_arm64/analysis/stackframe.py", line 25, in analyze_container
    self.analyze_function(fn, container)
  File "/root/retrowrite/librw_arm64/analysis/stackframe.py", line 28, in analyze_function
    self.analyze_is_fn_leaf(function, container)
  File "/root/retrowrite/librw_arm64/analysis/stackframe.py", line 34, in analyze_is_fn_leaf
    target = instruction.cs.operands[-1].imm
  File "/usr/local/lib/python3.8/dist-packages/capstone/__init__.py", line 667, in __getattr__
    raise CsError(CS_ERR_DETAIL)
capstone.CsError: Details are unavailable (CS_ERR_DETAIL)

My environment

$ uname -m
aarch64
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.5 LTS
Release:	20.04
Codename:	focal
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ 

C++ Support: Can't rebuild output of retrowrite on g++ Hello World

Background

  • x86_64
  • Linux 5.11.11
  • Arch Linux
  • GCC 10.2.0
  • Retrowrite at commit 9e2e633

Input file

hello.cpp

#include <iostream>

int main() {
    std::cout << "Hello, world!" << std::endl;
}

Repro

  1. g++ -pie -o hello hello.cpp

  2. ./retro/bin/retrowrite hello hello.s

    noting output:

    [*] Relocations for a section that's not loaded: .rela.dyn
    [*] Relocations for a section that's not loaded: .rela.plt
    [x] Could not replace value in .init_array
    [x] Couldn't find valid section 3db0
    [x] Couldn't find valid section 3fc8
    [x] Couldn't find valid section 3fd0
    [x] Couldn't find valid section 3fd8
    [x] Couldn't find valid section 3fe0
    [x] Couldn't find valid section 3fe8
    [x] Couldn't find valid section 3ff0
    [x] Couldn't find valid section 3ff8
    
  3. g++ hello.s

    fails with output:

    hello.s: Assembler messages:
    hello.s:76: Error: unrecognized symbol type "GLIBCXX_3.4_4080"
    hello.s:76: Error: junk at end of line, first unrecognized character is `,'
    hello.s:77: Error: junk at end of line, first unrecognized character is `@'
    hello.s:78: Error: invalid character '@' in mnemonic
    

It looks like it has generated some syntax that as doesn't like. Cleaning that up (not sure of the consequences but I just removed it), it then fails on an undefined label:

$ g++ hello.s
/usr/bin/ld: /tmp/ccaM02YR.o:(.init_array+0x0): undefined reference to `.LC1160'
collect2: error: ld returned 1 exit status

.LC1160 seems to be necessary for something (otherwise the program segfaults at startup), but I can't tell what it needs to point to.

naming issue of global symbols

Please consider changing the code at https://github.com/HexHive/retrowrite/blob/master/librw/loader.py#L196

The statement here will change the name of global symbols to "original_name_address". For instance, this will change "symbol1" (whose address is 0x90) to "symbol1_90" and keep the new name all the way to the assembly file. However, the other references to the symbol in the code or data will still use the original name. Re-assembling will complain "undefined symbol1_90".
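
One way to keep definitions and references in sync is to apply a single rename map to every emitted line. The sketch below is an assumption about how that could look, not the loader's actual code; the function name and sample lines are invented, and the `symbol1` to `symbol1_90` mapping comes from the example above.

```python
import re


def rename_symbols(asm_lines, renames):
    """Apply the same old-name -> new-name mapping to every line."""
    if not renames:
        return list(asm_lines)
    # Longest names first, and require non-identifier characters on both
    # sides so that "symbol1" does not match inside "symbol10".
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.$])(%s)(?![\w.$])" % "|".join(re.escape(n) for n in names)
    )
    return [pattern.sub(lambda m: renames[m.group(1)], line)
            for line in asm_lines]


lines = ["symbol1:", "\tcallq symbol1", "\t.quad symbol10"]
for line in rename_symbols(lines, {"symbol1": "symbol1_90"}):
    print(line)
```

The alternative is to leave global names untouched, which avoids the mismatch without any rewriting pass.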

Kernel module rewriting

I'm watching the #36c3 talk and it mentions that the kernel rewriting tool would be released in the beginning of 2020

[BUG] Cannot disassemble hello world aarch64 binary

Example program:

#include <stdio.h>
int main() {
    printf("hello world\n");
    return 0;
}

Compiled with the following command (GCC 11.3.0)

$ aarch64-linux-gnu-gcc test.c -o a.out -static

Running retrowrite on the binary to disassemble produces an error:

$ retrowrite a.out out.s
Rewriting a.out into out.s
[*] Relocations for a section that's not loaded: .rela.plt
[INFO] Disassembling...
[INFO] Symbolizing...
[INFO] Recovering .eh_frame information
[CRITICAL] [x] Unhandled DWARF instruction: 7
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 8
[CRITICAL] [x] Unhandled DWARF instruction: 11
[CRITICAL] [x] Unhandled DWARF instruction: 13
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788224, 'addend': 4284560, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788232, 'addend': 4285600, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788240, 'addend': 4427520, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788248, 'addend': 4284880, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788256, 'addend': 4281616, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788264, 'addend': 4427520, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] {'name': None, 'st_value': None, 'offset': 4788272, 'addend': 4281616, 'type': 1032}
[CRITICAL] [*] Unhandled relocation R_AARCH64_TLS_DTPMOD32
[CRITICAL] 40393c: bl #0x4002b0 - target outside code section!

Please let me know if I am doing something incorrect. Thanks!

[BUG] How to set up on an arm64 machine?

I don't know how to set up retrowrite on an arm64 machine. The General setup section of the README.md file only gives the command pip install -r requirements.txt, with no further guidance.
I tried running python setup.py and got the following hint. What should I do next?

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

I hope someone can give me an answer. Thank you very much.

A comma seems to be missing at https://github.com/HexHive/retrowrite/blob/master/setup.py#L8.
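
The contents of that line are not quoted here, so the following only illustrates why a missing comma in a Python list of strings is easy to overlook: adjacent string literals are concatenated silently. The package names are placeholders, not the actual entries in setup.py.

```python
# Placeholder dependency names for illustration only.
with_comma = ["capstone", "pyelftools"]
without_comma = ["capstone" "pyelftools"]  # adjacent literals are joined

print(len(with_comma), with_comma)
print(len(without_comma), without_comma)
```

No syntax error is raised, so the mistake only surfaces when the merged name cannot be resolved.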

Illegal Instruction when running rewritten, re-compiled ELF

Hello, thanks for retrowrite - I enjoyed your 36c3 presentation.

I installed retrowrite in a fresh Ubuntu docker container:

root@aba6d7a9e538:/# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal
root@aba6d7a9e538:/# python3 --version
Python 3.8.5

... and followed the indicated setup steps by running the startup script setup.sh and created a virtual environment. Presumably this installed the correct version of capstone, and there doesn't appear to be a different capstone version installed through APT.

I have an x86_64 ELF that was a CTF challenge ( https://github.com/CUCyber/cuctf-2020-challenges/blob/main/reverse-engineering/virtual/virtual.c ) that I'm attempting to rewrite, but when I execute the re-written and re-compiled binary, I get an Illegal Instruction signal.

I compiled virtual.c with
gcc virtual.c -O0 -ggdb -Wall -Wpedantic -Wextra -fPIC -fPIE -pie -o virtual_pie
(ignore the makefile that's in the linked github repo), confirmed that this executable runs without error, and re-wrote with:
/retrowrite/retrowrite virtual_pie virtual_pie.s

Next, I re-compiled with:
gcc virtual_pie.s -o virtual_pie_retro

Now when I try to run the executable, I get:

(retro) root@aba6d7a9e538:/home# ./virtual_pie_retro
Illegal instruction

Any ideas? Thanks again

can't build my binary with gcc

Do you have any clue why this is happening?
...
[*] Instrumented: 3048 locations
gcc binary-basan.s -lasan -o binary-basan
/usr/bin/ld: /tmp/ccyZLb4w.o: in function `asan.module_ctor':
(.text+0x4b7f2): undefined reference to `__asan_init_v4'
collect2: error: ld returned 1 exit status

AssertionError

I hit an assertion error when reassembling a binary as follows:

python3 -m retrowrite.librw.rw  addr2line addr2line.s
.init_array frame_dummy pointer removed.
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "retrowrite/librw/rw.py", line 439, in <module>
    rw.symbolize()
  File "retrowrite/librw/rw.py", line 58, in symbolize
    symb.symbolize_text_section(self.container, None)
  File "retrowrite/librw/rw.py", line 145, in symbolize_text_section
    self.symbolize_mem_accesses(container, context)
  File "retrowrite/librw/rw.py", line 332, in symbolize_mem_accesses
    container, target)
  File "retrowrite/librw/rw.py", line 266, in _adjust_target
    assert sec is not None
AssertionError

I debugged RetroWrite and found a strange cause.

I think RetroWrite handles the following instruction as a memory access operation.

  File "retrowrite/librw/rw.py", line 332, in symbolize_mem_accesses
    container, target)
(Pdb) hex(inst.address)
'0x36bb3'
objdump -M intel -d 36bb3
36bb3:	48 8d 35 c3 57 00 00 	lea    rsi,[rip+0x57c3]        # 3c37d <bfd_section_hash_newfunc>
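For reference, the target of a RIP-relative operand is the address of the next instruction plus the displacement. A minimal Python sketch of that arithmetic, using the values from the objdump line above (the helper name rip_relative_target is made up for illustration and is not part of RetroWrite):

```python
def rip_relative_target(inst_addr: int, inst_len: int, disp: int) -> int:
    """Return the address referenced by a RIP-relative operand.

    RIP points at the *next* instruction while the current one executes,
    so the target is inst_addr + inst_len + disp.
    """
    return inst_addr + inst_len + disp


# Values taken from the objdump output: 7-byte lea at 0x36bb3, disp 0x57c3.
target = rip_relative_target(0x36bb3, 7, 0x57c3)
print(hex(target))  # 0x3c37d
```

The result matches the address in the objdump comment (bfd_section_hash_newfunc), which is the value the rewriter would need to map to a section.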

[BUG] RetroWrite omits data sections

Description:
RetroWrite does not create some data sections, such as .data.rel.ro.local and .fini.array.
As a result, RetroWrite not only fails to recover certain relocation information,
but also emits incorrect assembly code.

My test program has relocation information in .data.rel.ro.local

$ readelf -r  hello  | grep .data.rel.ro.local -A 20
Relocation section '.rela.data.rel.ro.local' at offset 0x9a3d8 contains 165 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000020360  000100000001 R_X86_64_64       0000000000003c90 .text + 26a0
000000020368  000100000001 R_X86_64_64       0000000000003c90 .text + 2d60
000000020370  000100000001 R_X86_64_64       0000000000003c90 .text + 26b0
000000020378  000100000001 R_X86_64_64       0000000000003c90 .text + 2e20
...

Its binary code refers to the .data.rel.ro.local section.

  • Disassembly code
    61ec:	48 8d 05 6d a1 01 00 	lea    0x1a16d(%rip),%rax        # 20360 <sort_functions>

However, RetroWrite emitted unexpected assembly code.

  • Reassembled code
.LC61ec:
	leaq 41984+.LC15f60(%rip), %rax

I examined the reassembly file and found that RetroWrite did not create .data.rel.ro.local section.
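The emitted operand is numerically equivalent to the original one, which a short Python check makes visible. This is a sketch under one assumption: that the label .LC15f60 stands for original address 0x15f60, following the .LC<hex address> naming seen in the listing.

```python
# Original instruction: 7-byte lea at 0x61ec with displacement 0x1a16d.
original_target = 0x61ec + 7 + 0x1a16d

# Reassembled operand: 41984 + .LC15f60, assuming the label stands for
# original address 0x15f60.
reassembled_target = 0x15f60 + 41984

print(hex(original_target), hex(reassembled_target))  # 0x20360 0x20360
```

Both resolve to 0x20360 (sort_functions) in the original layout, but the reassembled form reaches it as a 41984-byte offset from a different label rather than through a label for the data itself, so it stays correct only if that distance is preserved after reassembly.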

I hope these errors can be fixed.

Thanks.

[BUG] Segmentation fault in reassembled SPEC2006 binaries with asan tool

Describe the bug
When SPEC2006 benchmarks are rewritten with the RetroWrite ASan tool, then recompiled and run, I get "Segmentation fault (core dumped)".

Describe how to reproduce the bug
Step 1) for compiling PIE to be used as input to retrowrite ASAN tool
platform: x86-64
compiler: gcc 5.5
compiler flags: -O2 -PIE
benchmarks: SPEC2006 benchmarks: gcc, bzip2, ....

Step 2)
platform: x86-64

        python3 -m rwtools.asan.asantool ./bzip2_base.gcc ./bzip2_base.gcc-asan
        sed -i 's/asan_init_v4/asan_init/g' bzip2_base.gcc-asan.s

Step 3) For compiling -asan.s file generated by retrowrite
platform: x86-64
compiler: gcc 5.5
compiler flags: -O2
benchmarks: SPEC2006 benchmarks: gcc, bzip2, ....
Compile SPEC :
gcc -g -o bzip2_base.gcc-asan -O2 bzip2_base.gcc-asan.s -pie -lasan -lm

Missing some instrumentations when instrumenting binary with AFL

RetroWrite generates labels in the format .L%x in

instruction.op_str = ".L%x" % (target)

and
results.append(".L%x:" % (instruction.address))

However, the afl-gcc compilers of AFL-family fuzzers such as AFL++ only instrument labels that start with .L%d:

https://github.com/AFLplusplus/AFLplusplus/blob/32a0d6ac31554a47dca591f8978982758fb87677/src/afl-as.c#L464-L466

        if ((isdigit(line[2]) ||
             (clang_mode && !strncmp(line + 1, "LBB", 3))) &&
            R(100) < (long)inst_ratio) {

Taking nm from binutils as an example, the .L9ffea basic block is instrumented while .La0047 and .La0058 are not.
After the fix, the number of instrumented locations increases from 39511 to 47795.

.L9ffea:
.LC9ffea:

/* --- AFL TRAMPOLINE (64-BIT) --- */

.align 4

leaq -(128+24)(%rsp), %rsp
movq %rdx,  0(%rsp)
movq %rcx,  8(%rsp)
movq %rax, 16(%rsp)
movq $0x00006e12, %rcx
call __afl_maybe_log
movq 16(%rsp), %rax
movq  8(%rsp), %rcx
movq  0(%rsp), %rdx
leaq (128+24)(%rsp), %rsp

/* --- END --- */

	movq -0x48(%rbp), %rax
.LC9ffee:
	movq -0x58(%rbp), %rcx
.LC9fff2:
	movq %rax, 8(%rcx)
.LC9fff6:
	movq -0x58(%rbp), %rax
.LC9fffa:
	movl $0xffffffff, 0x60(%rax)
.LCa0001:
	movq -0x58(%rbp), %rax
.LCa0005:
	movl $1, 0x64(%rax)
.LCa000c:
	movl $1, -0x64(%rbp)
.LCa0013:
	movq -0x20(%rbp), %rax
.LCa0017:
	movq (%rax), %rax
.LCa001a:
	cmpq $0, 0x100(%rax)
.LCa0022:
	je .La0047

/* --- AFL TRAMPOLINE (64-BIT) --- */

.align 4

leaq -(128+24)(%rsp), %rsp
movq %rdx,  0(%rsp)
movq %rcx,  8(%rsp)
movq %rax, 16(%rsp)
movq $0x0000740b, %rcx
call __afl_maybe_log
movq 16(%rsp), %rax
movq  8(%rsp), %rcx
movq  0(%rsp), %rdx
leaq (128+24)(%rsp), %rsp

/* --- END --- */

.LCa0028:
	movq -0x20(%rbp), %rax
.LCa002c:
	movq (%rax), %rax
.LCa002f:
	movq 0x100(%rax), %rax
.LCa0036:
	cmpl $0, 0x10(%rax)
.LCa003a:
	jne .La0047

/* --- AFL TRAMPOLINE (64-BIT) --- */

.align 4

leaq -(128+24)(%rsp), %rsp
movq %rdx,  0(%rsp)
movq %rcx,  8(%rsp)
movq %rax, 16(%rsp)
movq $0x000050f3, %rcx
call __afl_maybe_log
movq 16(%rsp), %rax
movq  8(%rsp), %rcx
movq  0(%rsp), %rdx
leaq (128+24)(%rsp), %rsp

/* --- END --- */

.LCa0040:
	movl $0, -0x64(%rbp)
.La0047:
.LCa0047:
	movq -0x20(%rbp), %rax
.LCa004b:
	movq (%rax), %rax
.LCa004e:
	addq $0x100, %rax
.LCa0054:
	movq %rax, -0x60(%rbp)
.La0058:
.LCa0058:
	movq -0x60(%rbp), %rax
.LCa005c:
	cmpq $0, (%rax)
.LCa0060:
	je .La007f

I think RetroWrite could output labels in the format .L%d (see #27), or the code in afl-as.c could be modified to

        if (((isdigit(line[2]) || (line[2] >= 'a' && line[2] <= 'f')) ||
            (clang_mode && !strncmp(line + 1, "LBB", 3))) &&
            R(100) < (long)inst_ratio) {

The assembly code files are attached here.
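The effect of the label check can be sketched in Python. This only approximates the condition quoted from afl-as.c (the inst_ratio part is left out), and both function names are hypothetical helpers for illustration, not RetroWrite or AFL++ APIs.

```python
def passes_quoted_check(line: str, clang_mode: bool = False) -> bool:
    """Approximate the condition quoted from afl-as.c, without inst_ratio.

    Assumption: the caller only passes label lines that start with '.'.
    """
    if len(line) < 3:
        return False
    starts_with_digit = line[2] in "0123456789"       # isdigit(line[2])
    is_clang_block = clang_mode and line[1:4] == "LBB"
    return starts_with_digit or is_clang_block


def decimal_label(address: int) -> str:
    """Format a label as .L%d instead of .L%x (hypothetical helper)."""
    return ".L%d:" % address


# Addresses of the three basic blocks discussed in the listing.
for addr in (0x9ffea, 0xa0047, 0xa0058):
    hex_label = ".L%x:" % addr
    dec_label = decimal_label(addr)
    print(hex_label, passes_quoted_check(hex_label),
          dec_label, passes_quoted_check(dec_label))
```

With hexadecimal labels only the first block passes, because its address happens to start with a decimal digit. If the decimal format were adopted, it would have to be applied both where labels are defined and where jump targets reference them.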

Errors on LD

I'm trying this command

$ gcc ls-basan-instrumented.s -lasan -o ls-basan-instrumented

and I get these errors:

/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':(.text+0x20): undefined reference to `main'
/tmp/ccEO4Rw4.o: In function `_obstack_newchunk':
(.text+0x276): undefined reference to `.L153e0'
(.text+0x4a3): undefined reference to `.L15400'
/tmp/ccEO4Rw4.o: In function `_obstack_free':
(.text+0x5fb): undefined reference to `.L15400'
/tmp/ccEO4Rw4.o: In function `_obstack_begin':
(.text+0x94): undefined reference to `.L15420'
/tmp/ccEO4Rw4.o: In function `_obstack_begin_1':
(.text+0x159): undefined reference to `.L15420'
/tmp/ccEO4Rw4.o:(.data+0x1f0): undefined reference to `.LCc2b0'
/tmp/ccEO4Rw4.o:(.data+0x260): undefined reference to `.LC153a0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x0): undefined reference to `.LC6530'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x8): undefined reference to `.LC6c70'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x10): undefined reference to `.LC6540'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x18): undefined reference to `.LC6d30'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x20): undefined reference to `.LC6070'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x28): undefined reference to `.LC6cd0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x30): undefined reference to `.LC6080'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x38): undefined reference to `.LC6d90'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x40): undefined reference to `.LCb5f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x48): undefined reference to `.LCb9a0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x50): undefined reference to `.LCb580'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x58): undefined reference to `.LCb8f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x60): undefined reference to `.LCb870'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x68): undefined reference to `.LCba50'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x70): undefined reference to `.LCb7f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x78): undefined reference to `.LC7000'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x80): undefined reference to `.LC64c0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x88): undefined reference to `.LC6a90'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x90): undefined reference to `.LC64f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x98): undefined reference to `.LC6b80'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xa0): undefined reference to `.LC5ff0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xa8): undefined reference to `.LC6b00'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xb0): undefined reference to `.LC6030'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xb8): undefined reference to `.LC6bf0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xc0): undefined reference to `.LC6560'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xc8): undefined reference to `.LC6fa0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xd0): undefined reference to `.LC6550'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0xd8): undefined reference to `.LC6f40'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x100): undefined reference to `.LCb080'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x108): undefined reference to `.LCb370'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x110): undefined reference to `.LCafb0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x118): undefined reference to `.LCb480'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x120): undefined reference to `.LCb100'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x128): undefined reference to `.LC6a10'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x130): undefined reference to `.LCb180'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x138): undefined reference to `.LCb770'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x140): undefined reference to `.LCaf20'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x148): undefined reference to `.LCb3f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x150): undefined reference to `.LCb030'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x158): undefined reference to `.LCb260'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x160): undefined reference to `.LCb210'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x168): undefined reference to `.LCb660'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x170): undefined reference to `.LCb1c0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x178): undefined reference to `.LC9880'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x180): undefined reference to `.LCaff0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x188): undefined reference to `.LCb500'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x190): undefined reference to `.LCaf70'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x198): undefined reference to `.LCb2f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x1a0): undefined reference to `.LCb140'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x1a8): undefined reference to `.LCb6f0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x1b0): undefined reference to `.LCb0c0'
/tmp/ccEO4Rw4.o:(.data.rel.ro+0x1b8): undefined reference to `.LC9800'
collect2: error: ld returned 1 exit status

my GCC version is:

gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
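To narrow down a log like this, it can help to list the distinct undefined labels and check whether the generated .s file defines them at all. A minimal Python sketch, assuming the ld message format shown above; the function names and the label-definition regex are illustrative choices, not part of RetroWrite:

```python
import re
from collections import Counter

# Matches the symbol name in: undefined reference to `NAME'
UNDEF_RE = re.compile(r"undefined reference to `([^']+)'")
# Matches a label definition at the start of a line, e.g. ".LC6530:"
LABEL_RE = re.compile(r"^\s*([.\w$]+):", flags=re.MULTILINE)


def undefined_symbols(ld_output: str) -> Counter:
    """Count how often each symbol is reported as undefined."""
    return Counter(UNDEF_RE.findall(ld_output))


def defined_labels(asm_text: str) -> set:
    """Collect the labels that the assembly text defines."""
    return set(LABEL_RE.findall(asm_text))


sample_log = (
    "(.text+0x276): undefined reference to `.L153e0'\n"
    "(.text+0x4a3): undefined reference to `.L15400'\n"
    "(.text+0x5fb): undefined reference to `.L15400'\n"
)
sample_asm = ".L153e0:\n\tret\n"

missing = undefined_symbols(sample_log)
never_defined = sorted(set(missing) - defined_labels(sample_asm))
print(never_defined)
```

In practice the log would come from the linker's stderr and the assembly text from the instrumented .s file; labels that are referenced but never defined point at code or data the rewriter did not emit.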

Exception: 'struct.error: unpack requires a buffer of 4 bytes' when using hardcoded strings shorter than 4 bytes in external function calls

test.c:

#include <stdlib.h>

int main(int argc, char *argv[])
{
        return system("ls");
}

Compile and run asantool on the binary:

$ gcc test.c -o test
$ python3 -m rwtools.asan.asantool test test_instr
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/konrad/dev/retrowrite/rwtools/asan/asantool.py", line 83, in <module>
    rewriter = do_symbolization(args.binary, args.outfile)
  File "/home/konrad/dev/retrowrite/rwtools/asan/asantool.py", line 30, in do_symbolization
    rw.symbolize()
  File "/home/konrad/dev/retrowrite/librw/rw.py", line 57, in symbolize
    symb.symbolize_text_section(self.container, None)
  File "/home/konrad/dev/retrowrite/librw/rw.py", line 145, in symbolize_text_section
    self.symbolize_switch_tables(container, context)
  File "/home/konrad/dev/retrowrite/librw/rw.py", line 228, in symbolize_switch_tables
    value = rodata.read_at(swbase, 4)
  File "/home/konrad/dev/retrowrite/librw/container.py", line 316, in read_at
    value = struct.unpack(
struct.error: unpack requires a buffer of 4 bytes
316  ->	        value = struct.unpack(
317  	            "<I",
318  	            bytes([x.value for x in self.cache[cacheoff:cacheoff + sz]]))[0]

(Pdb) sz
4
(Pdb) len(self.cache[cacheoff:cacheoff + sz])
3

Edit: Just realized that I didn't compile the binary with the -fPIE flag. The same error still shows up if this flag is passed to gcc, or if it's built with -shared.
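The failure is easy to reproduce outside RetroWrite: struct.unpack("<I", ...) needs exactly 4 bytes, and slicing past the end of a buffer silently returns fewer. A minimal sketch with a guarded read; read_u32 is a hypothetical helper, not the project's read_at, and the 3-byte buffer b"ls\x00" is an assumption about what the short data looks like.

```python
import struct
from typing import Optional


def read_u32(buf: bytes, offset: int) -> Optional[int]:
    """Read a little-endian 32-bit value, or None if fewer than 4 bytes remain."""
    chunk = buf[offset:offset + 4]
    if len(chunk) < 4:
        return None
    return struct.unpack("<I", chunk)[0]


short_buf = b"ls\x00"  # 3 bytes: slicing [0:4] yields only 3 bytes

try:
    struct.unpack("<I", short_buf[0:4])
except struct.error as exc:
    print("unguarded read fails:", exc)

print(read_u32(short_buf, 0))            # None
print(read_u32(b"\x78\x56\x34\x12", 0))  # 305419896
```

Whether returning None, padding, or skipping the candidate switch table is the right reaction depends on the caller; the sketch only shows the length check that avoids the exception.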
