galoisinc / macaw-loader

Uniform interface to load a binary executable and get Macaw Memory and a list of entry points.

Haskell 100.00%

macaw-loader's Introduction

This library provides a uniform interface to load a binary (e.g. in ELF format) and get macaw Memory and a list of entry points.

It also helps extract auxiliary information (e.g. the PPC TOC, or Table of Contents), and it tries to encapsulate much of the extra logic around entry point identification, such as using the TOC on PowerPC.

The ‘binaryRepr’ used here is more than a bare address width, so that Mach-O and/or PE representations can be supported in the future alongside the ELF format.

Hierarchy

The modules in this repository should require macaw-base, and possibly semmc, but they should not require any higher-level macaw operations. These modules should be useable independently of or by the higher-level macaw modules.

macaw-loader's Issues

Move `resolveAbsoluteAddress` to a different module

The resolveAbsoluteAddress function currently lives in Data.Macaw.BinaryLoader.ELF, but despite its name, there is nothing ELF-specific about it whatsoever. In light of this, we should move resolveAbsoluteAddress to a less misleading location.
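
To illustrate why the function does not need to live in an ELF module, here is a minimal, self-contained sketch of absolute-address resolution. It is not the macaw-loader implementation: the `Segment` and `SegmentOff` types and the `resolveAbsolute` name are hypothetical toy stand-ins for macaw's memory types, chosen so the example compiles with only `base`. The point is that the lookup depends only on a memory layout and an address, not on any binary format.

```haskell
import Data.List (find)
import Data.Word (Word64)

-- | Toy stand-in for a loaded memory segment (hypothetical, not macaw's type).
data Segment = Segment
  { segBase :: Word64  -- ^ first absolute address covered by the segment
  , segSize :: Word64  -- ^ number of bytes in the segment
  } deriving (Eq, Show)

-- | Toy stand-in for a segment-relative address.
data SegmentOff = SegmentOff
  { offSegment :: Segment  -- ^ segment that contains the address
  , offBytes   :: Word64   -- ^ byte offset from the start of that segment
  } deriving (Eq, Show)

-- | Resolve an absolute address to the first segment that contains it.
-- Nothing here mentions ELF: only the memory layout and the address matter.
resolveAbsolute :: [Segment] -> Word64 -> Maybe SegmentOff
resolveAbsolute segs addr = do
  seg <- find contains segs
  pure (SegmentOff seg (addr - segBase seg))
  where
    -- The lower-bound check comes first, so the subtraction cannot underflow.
    contains seg = addr >= segBase seg && addr - segBase seg < segSize seg
```

An address that falls outside every segment yields `Nothing`, which mirrors how the entry point code elsewhere in this repository treats an unresolvable address as an error case.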

Implement `macaw-riscv-loader`

macaw now has a RISC-V backend, but macaw-loader does not yet support it. We should add that support.

Use HTTPS-based (not SSH-based) submodules

Currently, all of macaw-loader's submodules use git@ URL prefixes, which means that SSH authentication is required to clone them. There is no particularly good reason for this, as all of macaw-loader's submodules are public. We should use HTTPS-based URLs instead, which can be cloned without any credentials.
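
The URL change itself is mechanical. As a rough sketch, the following function rewrites an SSH-style remote of the form `git@host:path` into its HTTPS equivalent. The function name `sshToHttps` is hypothetical, and the URL in the usage note is only an illustrative example; in practice the same substitution would be made by hand in `.gitmodules`.

```haskell
import Data.List (stripPrefix)

-- | Rewrite an SSH-style remote ("git@host:owner/repo.git") as an
-- HTTPS URL ("https://host/owner/repo.git"). Returns Nothing when the
-- input does not have the expected "git@host:path" shape.
sshToHttps :: String -> Maybe String
sshToHttps url = do
  rest <- stripPrefix "git@" url
  -- Split at the first ':' that separates the host from the path.
  let (host, afterHost) = break (== ':') rest
  case afterHost of
    ':' : path
      | not (null host) && not (null path) ->
          Just ("https://" ++ host ++ "/" ++ path)
    _ -> Nothing
```

For example, `sshToHttps "git@github.com:GaloisInc/macaw.git"` evaluates to `Just "https://github.com/GaloisInc/macaw.git"`, while a URL that is already HTTPS-based yields `Nothing`.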

`entryPoints` do not include dynamic symbols

Consider this program:

static int ONE = 1;

int getzero(void) {
  return 0;
}

int getone(void) {
  return ONE + getzero();
}

Compile it to a shared library and strip it:

$ gcc -nostdlib -shared getone.c -o libgetone-stripped.so
$ strip libgetone-stripped.so

The assembly will look like this:

$ objdump -d libgetone-stripped.so 

libgetone-stripped.so:     file format elf64-x86-64


Disassembly of section .plt:

0000000000001000 <getzero@plt-0x10>:
    1000:       ff 35 02 30 00 00       pushq  0x3002(%rip)        # 4008 <getone+0x2fdd>
    1006:       ff 25 04 30 00 00       jmpq   *0x3004(%rip)        # 4010 <getone+0x2fe5>
    100c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000001010 <getzero@plt>:
    1010:       ff 25 02 30 00 00       jmpq   *0x3002(%rip)        # 4018 <getzero+0x2ff8>
    1016:       68 00 00 00 00          pushq  $0x0
    101b:       e9 e0 ff ff ff          jmpq   1000 <getzero@plt-0x10>

Disassembly of section .text:

0000000000001020 <getzero>:
    1020:       55                      push   %rbp
    1021:       48 89 e5                mov    %rsp,%rbp
    1024:       b8 00 00 00 00          mov    $0x0,%eax
    1029:       5d                      pop    %rbp
    102a:       c3                      retq   

000000000000102b <getone>:
    102b:       55                      push   %rbp
    102c:       48 89 e5                mov    %rsp,%rbp
    102f:       e8 dc ff ff ff          callq  1010 <getzero@plt>
    1034:       8b 15 e6 2f 00 00       mov    0x2fe6(%rip),%edx        # 4020 <getone+0x2ff5>
    103a:       01 d0                   add    %edx,%eax
    103c:       5d                      pop    %rbp
    103d:       c3                      retq

Note that there are two function entry points here, one for getzero (at address 0x1020) and another for getone (at address 0x102b). macaw-loader, on the other hand, only discovers the entry point for getzero. This is due to a limitation in how entryPoints is defined:

macaw-loader/macaw-loader-x86/src/Data/Macaw/BinaryLoader/X86.hs

Lines 50 to 67 in 7e26fbe

x86EntryPoints :: (X.MonadThrow m)
               => BL.LoadedBinary MX.X86_64 (E.ElfHeaderInfo 64)
               -> m (NEL.NonEmpty (MM.MemSegmentOff 64))
x86EntryPoints loadedBinary = do
  case BLE.resolveAbsoluteAddress mem addrWord of
    -- n.b. no guarantee of uniqueness, and in particular, entryPoint is probably in symbols somewhere
    Just entryPoint -> return (entryPoint NEL.:| mapMaybe (BLE.resolveAbsoluteAddress mem) symbolWords)
    Nothing -> X.throwM (InvalidEntryPoint addrWord)
  where
    offset = fromMaybe 0 (LC.loadOffset (BL.loadOptions loadedBinary))
    mem = BL.memoryImage loadedBinary
    addrWord = MM.memWord (offset + (fromIntegral (E.headerEntry (E.header (elf (BL.binaryFormatData loadedBinary))))))
    elfData = elf (BL.binaryFormatData loadedBinary)
    symbolWords = [ MM.memWord (fromIntegral (offset + (E.steValue entry)))
                  | Just (Right st) <- [E.decodeHeaderSymtab elfData]
                  , entry <- F.toList (E.symtabEntries st)
                  , E.steType entry == E.STT_FUNC
                  ]

This implementation uses decodeHeaderSymtab, which only consults the static symbol table. The address for getzero is still discovered, but only because it happens to be the main entry point address for the shared library:

$ readelf -h libgetone-stripped.so | grep "Entry point address:"
  Entry point address:               0x1020

However, libgetone-stripped.so also contains dynamic symbols:

$ readelf --dyn-syms libgetone-stripped.so 

Symbol table '.dynsym' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000001020    11 FUNC    GLOBAL DEFAULT    7 getzero
     2: 000000000000102b    19 FUNC    GLOBAL DEFAULT    7 getone

If the entryPoints function consulted the dynamic symbols, similarly to how it is done in macaw, it would be able to find the address for getone.

This example uses x86, but it applies to AArch32 and PPC32 as well, which use identical implementations for entryPoints.
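
The suggested behavior can be sketched in a few lines. This is a toy model, not a patch against macaw-loader: the `Symbol` and `SymType` types and the `entryPointsWithDynSyms` function are hypothetical, addresses are plain `Word64` values rather than resolved `MemSegmentOff`s, and the load offset is ignored. It only shows the shape of the fix, which is to collect function symbols from both the static and the dynamic tables and to drop duplicates of the header entry point.

```haskell
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NEL
import Data.Word (Word64)

-- | Toy symbol type (hypothetical); only function symbols are entry points.
data SymType = Func | Object | NoType
  deriving (Eq, Show)

-- | Toy symbol table entry (hypothetical stand-in for an ELF symbol).
data Symbol = Symbol
  { symName  :: String
  , symType  :: SymType
  , symValue :: Word64
  } deriving (Eq, Show)

-- | Combine the ELF header entry point with function symbols taken from
-- both the static and the dynamic symbol tables. The header entry point
-- always comes first, and later duplicates of any address are removed.
entryPointsWithDynSyms
  :: Word64    -- ^ entry point address from the ELF header
  -> [Symbol]  -- ^ static symbol table (empty for a stripped binary)
  -> [Symbol]  -- ^ dynamic symbol table (.dynsym)
  -> NonEmpty Word64
entryPointsWithDynSyms headerEntry staticSyms dynSyms =
  NEL.nub (headerEntry :| map symValue (filter isFunc (staticSyms ++ dynSyms)))
  where
    isFunc s = symType s == Func
```

Applied to the values from the readelf output above (header entry 0x1020, no static symbols, and the three .dynsym entries), the result contains 0x1020 followed by 0x102b, so getone is no longer missed.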
