macaw-loader's Introduction

This library provides a uniform interface to load a binary (e.g. in ELF format) and get macaw Memory and a list of entry points.

It also helps extract auxiliary information (e.g. the PPC TOC, or Table of Contents) and tries to encapsulate much of the extra logic around entry point identification (such as using the TOC on PowerPC).

The ‘binaryRepr’ used here is more than just a pointer width, so that Mach-O and/or PE representations can be supported in the future alongside the ELF format.
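
As a rough illustration of why such a representation carries more than a width, here is a minimal sketch of a format-indexed GADT. All names in it (`FormatRepr`, `ElfInfo`, `MachOInfo`, `describe`) are hypothetical and are not the library's actual definitions; the Mach-O case is included only to show how a second format could be added.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- Pointer widths, used only at the type level.
data Width = W32 | W64

-- Hypothetical stand-ins for format-specific parsed data.
data ElfInfo (w :: Width) = ElfInfo
data MachOInfo (w :: Width) = MachOInfo

-- A run-time witness of the binary format. The type index records the
-- container format as well as the width, so pattern matching on the
-- witness tells the type checker which format-specific data is in play.
data FormatRepr (fmt :: Type) where
  Elf32   :: FormatRepr (ElfInfo 'W32)
  Elf64   :: FormatRepr (ElfInfo 'W64)
  MachO64 :: FormatRepr (MachOInfo 'W64) -- illustration only

-- Format-specific behaviour can dispatch on the witness.
describe :: FormatRepr fmt -> String
describe Elf32   = "ELF, 32-bit"
describe Elf64   = "ELF, 64-bit"
describe MachO64 = "Mach-O, 64-bit"

main :: IO ()
main = mapM_ putStrLn [describe Elf32, describe Elf64, describe MachO64]
```

A plain width (32 or 64) could not distinguish `Elf64` from `MachO64`, which is the situation the design is meant to leave room for.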

Hierarchy

The modules in this repository should require macaw-base, and possibly semmc, but they should not require any higher-level macaw operations. These modules should be usable independently of, or by, the higher-level macaw modules.

macaw-loader's People

Contributors

bboston7, kquick, ryanglscott, travitch

macaw-loader's Issues

Use HTTPS-based (not SSH-based) submodules

Currently, all of macaw-loader's submodules use git@ prefixes, which means that SSH authentication is required to clone the submodules. There's no particularly good reason for doing this, as all of macaw-loader's submodules are public. We should use HTTPS-based URLs instead to avoid this problem.

`entryPoints` do not include dynamic symbols

Consider this program:

static int ONE = 1;

int getzero(void) {
  return 0;
}

int getone(void) {
  return ONE + getzero();
}

If you compile it to a shared library and strip it:

$ gcc -nostdlib -shared getone.c -o libgetone-stripped.so
$ strip libgetone-stripped.so

The assembly will look like this:

$ objdump -d libgetone-stripped.so 

libgetone-stripped.so:     file format elf64-x86-64


Disassembly of section .plt:

0000000000001000 <getzero@plt-0x10>:
    1000:       ff 35 02 30 00 00       pushq  0x3002(%rip)        # 4008 <getone+0x2fdd>
    1006:       ff 25 04 30 00 00       jmpq   *0x3004(%rip)        # 4010 <getone+0x2fe5>
    100c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000001010 <getzero@plt>:
    1010:       ff 25 02 30 00 00       jmpq   *0x3002(%rip)        # 4018 <getzero+0x2ff8>
    1016:       68 00 00 00 00          pushq  $0x0
    101b:       e9 e0 ff ff ff          jmpq   1000 <getzero@plt-0x10>

Disassembly of section .text:

0000000000001020 <getzero>:
    1020:       55                      push   %rbp
    1021:       48 89 e5                mov    %rsp,%rbp
    1024:       b8 00 00 00 00          mov    $0x0,%eax
    1029:       5d                      pop    %rbp
    102a:       c3                      retq   

000000000000102b <getone>:
    102b:       55                      push   %rbp
    102c:       48 89 e5                mov    %rsp,%rbp
    102f:       e8 dc ff ff ff          callq  1010 <getzero@plt>
    1034:       8b 15 e6 2f 00 00       mov    0x2fe6(%rip),%edx        # 4020 <getone+0x2ff5>
    103a:       01 d0                   add    %edx,%eax
    103c:       5d                      pop    %rbp
    103d:       c3                      retq

Note that there are two function entry points here, one for getzero (at address 0x1020) and another for getone (at address 0x102b). macaw-loader, on the other hand, only discovers the entry point for getzero. This is due to a limitation in how entryPoints is defined:

x86EntryPoints :: (X.MonadThrow m)
               => BL.LoadedBinary MX.X86_64 (E.ElfHeaderInfo 64)
               -> m (NEL.NonEmpty (MM.MemSegmentOff 64))
x86EntryPoints loadedBinary = do
  case BLE.resolveAbsoluteAddress mem addrWord of
    -- n.b. no guarantee of uniqueness, and in particular, entryPoint is probably in symbols somewhere
    Just entryPoint -> return (entryPoint NEL.:| mapMaybe (BLE.resolveAbsoluteAddress mem) symbolWords)
    Nothing -> X.throwM (InvalidEntryPoint addrWord)
  where
    offset = fromMaybe 0 (LC.loadOffset (BL.loadOptions loadedBinary))
    mem = BL.memoryImage loadedBinary
    addrWord = MM.memWord (offset + (fromIntegral (E.headerEntry (E.header (elf (BL.binaryFormatData loadedBinary))))))
    elfData = elf (BL.binaryFormatData loadedBinary)
    symbolWords = [ MM.memWord (fromIntegral (offset + (E.steValue entry)))
                  | Just (Right st) <- [E.decodeHeaderSymtab elfData]
                  , entry <- F.toList (E.symtabEntries st)
                  , E.steType entry == E.STT_FUNC
                  ]

This implementation uses decodeHeaderSymtab, which only consults the static symbol table (which strip removes). The result still happens to contain the address of getzero, but only because that is the ELF header's entry point address for the shared library:

$ readelf -h libgetone-stripped.so | grep "Entry point address:"
  Entry point address:               0x1020

However, libgetone-stripped.so also contains dynamic symbols:

$ readelf --dyn-syms libgetone-stripped.so 

Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000001020    11 FUNC    GLOBAL DEFAULT    7 getzero
     2: 000000000000102b    19 FUNC    GLOBAL DEFAULT    7 getone

If the entryPoints function consulted the dynamic symbols, similarly to how it is done in macaw, it would be able to find the address for getone.

This example uses x86, but it applies to AArch32 and PPC32 as well, which use identical implementations for entryPoints.
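
The suggested fix can be sketched in isolation. The following is a minimal, self-contained model that does not use the elf-edit or macaw APIs: the `Symbol` type, the `entryPointAddrs` function, and the rule for skipping undefined symbols are all assumptions made for illustration. It takes the header entry point first, then adds function symbols from both the static and the dynamic tables, and removes duplicates.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NEL
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical, simplified view of an ELF symbol table entry.
data SymType = Func | Object | NoType
  deriving (Eq, Show)

data Symbol = Symbol
  { symName  :: String
  , symType  :: SymType
  , symValue :: Word64
  } deriving (Show)

-- Candidate entry points: the header entry point first, followed by every
-- function symbol found in either table, with duplicates removed.
entryPointAddrs
  :: Word64    -- load offset
  -> Word64    -- entry point address from the ELF header
  -> [Symbol]  -- static symbols (empty for a stripped binary)
  -> [Symbol]  -- dynamic symbols
  -> NonEmpty Word64
entryPointAddrs offset headerEntry staticSyms dynSyms =
  NEL.nub ((offset + headerEntry) :| funcAddrs)
  where
    funcAddrs =
      [ offset + symValue s
      | s <- staticSyms ++ dynSyms
      , symType s == Func
        -- Simplification: treat a zero value as "undefined" (e.g. an
        -- imported function). A real loader would check the section index.
      , symValue s /= 0
      ]

main :: IO ()
main = do
  -- The .dynsym contents of libgetone-stripped.so from the readelf output.
  let dynSyms =
        [ Symbol "" NoType 0
        , Symbol "getzero" Func 0x1020
        , Symbol "getone" Func 0x102b
        ]
  mapM_ (putStrLn . ("0x" ++) . flip showHex "")
        (entryPointAddrs 0 0x1020 [] dynSyms)
  -- prints 0x1020 and 0x102b, one per line
```

Deduplicating matters here because getzero appears both as the header entry point and as a dynamic symbol, which is the same overlap the n.b. comment in x86EntryPoints mentions for static symbols.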
