
steg86's Introduction

steg86


steg86 is a format-agnostic steganographic tool for x86 and AMD64 binaries. You can use it to hide information in compiled programs, regardless of executable format (PE, ELF, Mach-O, raw, &c). It has no performance or size impact on the files that it modifies (adding a message does not increase binary size or decrease execution speed).

For more details on how steg86 works, see the Theory of Operation section.

Installation

steg86 can be installed via cargo:

$ cargo install steg86

Alternatively, you can build it in this repository with cargo build:

$ cargo build

Usage

See steg86 --help for a full list of flags and subcommands.

Profiling

To profile a binary for steganographic suitability:

$ steg86 profile /bin/bash
Summary for /bin/bash:
  175828 total instructions
  27957 potential semantic pairs
  19 potential commutative instructions
  27944 bits of information capacity (3493 bytes, approx. 3KB)

Embedding

To embed a message into a binary:

$ steg86 embed /bin/bash ./bash.steg <<< "here is my secret message"

By default, steg86 embed writes its output to $input.steg. For example, /lib64/ld-linux-x86-64.so.2 would become /lib64/ld-linux-x86-64.so.2.steg.
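The default naming rule above appends ".steg" to the whole input path. It does not replace the file's extension. The following Rust sketch shows one way to derive that path. The function name default_output_path is hypothetical and does not come from steg86's source.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the default output path for `steg86 embed`
/// by appending ".steg" to the full input path (the extension is kept).
fn default_output_path(input: &Path) -> PathBuf {
    let mut name: OsString = input.as_os_str().to_owned();
    name.push(".steg");
    PathBuf::from(name)
}
```

Appending to the raw OsString is deliberate. Path::with_extension("steg") would replace the trailing ".2" and produce ld-linux-x86-64.so.steg, which is not the documented behavior.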

steg86 embed will exit with a non-zero status if the message cannot be embedded (e.g., if it's too large).

Extraction

To extract a message from a binary:

$ steg86 extract bash.steg > my_message
$ cat my_message
here is my secret message

steg86 extract will exit with a non-zero status if a message cannot be extracted (e.g., if it can't find one).

Theory of Operation

steg86 takes advantage of one of x86's encoding peculiarities: the R/M field of the ModR/M byte:

  7  6  5  4  3  2  1  0
 -------------------------
 | MOD |  REG  |   R/M   |
 -------------------------
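The diagram above can be read as three bit fields. Here is a small Rust sketch that splits a ModR/M byte into its fields and reassembles it. The struct and function names are illustrative only. The field is called md because mod is a Rust keyword.

```rust
/// The three fields of a ModR/M byte (bit 7 is the most significant bit).
#[derive(Debug, PartialEq, Eq)]
struct ModRm {
    md: u8,  // bits 7-6: addressing mode (0b11 = register-direct, no memory access)
    reg: u8, // bits 5-3: register operand (or an opcode extension such as /0)
    rm: u8,  // bits 2-0: register or memory operand
}

/// Split a raw ModR/M byte into its MOD, REG and R/M fields.
fn decode_modrm(byte: u8) -> ModRm {
    ModRm {
        md: byte >> 6,
        reg: (byte >> 3) & 0b111,
        rm: byte & 0b111,
    }
}

/// Pack the three fields back into a single byte.
fn encode_modrm(m: &ModRm) -> u8 {
    ((m.md & 0b11) << 6) | ((m.reg & 0b111) << 3) | (m.rm & 0b111)
}
```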

The ModR/M byte is normally used to support both register-to-memory and memory-to-register variants of the same instruction. For example, the MOV instruction has the following variants (among many others):

opcode mnemonic
89 /r MOV r/m32,r32
8B /r MOV r32,r/m32

Because the ModR/M field can encode either a memory addressing operation or a bare register, opcodes that support both register-to-memory and memory-to-register operations also support multiple encodings of register-to-register operations.

For example, mov eax, ebx can be encoded as either 89 d8 or 8b c3 without any semantic changes. This gives us one bit of information per duplicated instruction semantic. Given enough register-to-register instructions with multiple encodings, we can hide entire messages with those bits.
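To make the one-bit-per-instruction idea concrete, the Rust sketch below encodes a 32-bit register-to-register MOV in either direction form and recovers the hidden bit. It assumes a hypothetical convention where 0 selects the 89 /r form and 1 selects the 8B /r form, and it only handles the eight legacy registers (eax=0, ecx=1, edx=2, ebx=3, ...). steg86 itself covers many more opcodes through iced-x86, and its actual bit convention may differ.

```rust
/// Encode `mov dst, src` (32-bit, register-to-register) so the chosen opcode
/// carries one hidden bit. Hypothetical convention: false -> 89 /r, true -> 8B /r.
fn encode_mov(dst: u8, src: u8, bit: bool) -> [u8; 2] {
    assert!(dst < 8 && src < 8, "only the 8 legacy registers fit in 3 bits");
    if bit {
        // 8B /r: MOV r32, r/m32 -> destination in REG, source in R/M.
        [0x8B, 0xC0 | (dst << 3) | src]
    } else {
        // 89 /r: MOV r/m32, r32 -> destination in R/M, source in REG.
        [0x89, 0xC0 | (src << 3) | dst]
    }
}

/// Recover (dst, src, bit) from a 2-byte register-to-register MOV,
/// or None if the bytes are not one of the two forms handled above.
fn decode_mov(bytes: [u8; 2]) -> Option<(u8, u8, bool)> {
    let [opcode, modrm] = bytes;
    if modrm >> 6 != 0b11 {
        return None; // memory operand: not a register-to-register MOV
    }
    let reg = (modrm >> 3) & 0b111;
    let rm = modrm & 0b111;
    match opcode {
        0x89 => Some((rm, reg, false)),
        0x8B => Some((reg, rm, true)),
        _ => None,
    }
}
```

For mov eax, ebx this produces exactly the two encodings named above: 89 d8 for bit 0 and 8b c3 for bit 1. Both are two bytes long, so swapping one for the other never shifts any surrounding code.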

Additionally, because these semantically identical encodings are frequently the same size, we can modify preexisting binaries without having to fix relocations or RIP-relative addressing.

steg86 does primitive binary translation to accomplish these goals. It uses iced-x86 for encoding and decoding, and goblin for binary format wrangling.

Prior work

The inspiration for steg86 came from @inventednight, who described it as an adaptation of a similar idea (also theirs) for RISC-V binaries.

The technique mentioned above is discussed in detail in Hydan: Hiding Information in Program Binaries (2004).

steg86 constitutes a separate discovery of Hydan's technique and was written entirely independently; the refinements discussed in the paper may or may not be better than the ones implemented in steg86.

Future improvements

  • steg86 currently limits the embedded message to 16KB. This is a purely artificial limitation that could be resolved with some small format changes.

  • x86 (and AMD64) both have multi-byte NOPs, for alignment purposes. Additional information can be hidden in these in a few ways:

    • The 0F 1F /0 multi-byte NOP can be up to 9 bytes, of which up to 5 are free (SIB + 4-byte displacement).
    • There are longer NOPs (11, 15 bytes) that may also be usable.
  • Going beyond register-to-register duals and rewriting add/sub, as Hydan does.
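The multi-byte NOP idea from the first sub-list can be sketched as follows. This Rust code builds the 9-byte form 66 0F 1F 84 (SIB) (disp32) and stores a 5-byte payload in the SIB and displacement bytes. It assumes that any SIB value is acceptable because a NOP never dereferences its operand, and it ignores how an embedder would locate existing 9-byte NOPs in a binary. The function names are hypothetical.

```rust
/// Hypothetical helper: build a 9-byte NOP (66 0F 1F /0 with SIB + disp32)
/// whose SIB byte and 4 displacement bytes carry a 5-byte payload.
fn nop9_with_payload(payload: [u8; 5]) -> [u8; 9] {
    let [sib, d0, d1, d2, d3] = payload;
    [
        0x66,           // operand-size prefix
        0x0F, 0x1F,     // NOP r/m (opcode 0F 1F /0)
        0x84,           // ModR/M: mod=10 (disp32), reg=000 (/0), rm=100 (SIB follows)
        sib,            // SIB: assumed free, since the operand is never accessed
        d0, d1, d2, d3, // disp32 bytes, also never used by the CPU
    ]
}

/// Extract the payload back out, or None if the bytes are not this NOP form.
fn payload_from_nop9(nop: &[u8]) -> Option<[u8; 5]> {
    match nop {
        [0x66, 0x0F, 0x1F, 0x84, sib, d0, d1, d2, d3] => Some([*sib, *d0, *d1, *d2, *d3]),
        _ => None,
    }
}
```

With an all-zero payload this yields 66 0F 1F 84 00 00 00 00 00, the usual 9-byte NOP that assemblers emit for alignment.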

steg86's People

Contributors

autumnontape, dependabot-preview[bot], dependabot[bot], layderv, woodruffw

steg86's Issues

Use SIB trickery to encode information

On further thought I don't think this will actually work since it involves different encoding lengths, but:

For 32-bit x86 binaries, there are two different ways to encode a displacement-only indirect addressing operation: you can either use the disp32 encoding via ModR/M (mod=b00 and rm=b101) or you can use the SIB encoding, which is activated by mod=b00 and rm=b100.

The SIB encoding, then, can be set with index=b100 to mark an invalid index register and base=b101, indicating that only the displacement is used. The result: two separate encodings for the same displacement-only indirect operation.

The downside is that the SIB encoding is 1 byte longer, since it includes the SIB byte itself. So, the only way this would probably work in the context of steg86 is if a particular binary was already using the SIB form, and could be selectively rewritten to use the non-SIB form + a padding NOP.
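The length problem described above can be seen by emitting both forms of a displacement-only load. This Rust sketch uses mov eax, [disp32] (opcode 8B /r) as an example instruction, in 32-bit mode only. In 64-bit mode, mod=00 with rm=101 means RIP-relative addressing instead. The function names are illustrative.

```rust
/// `mov eax, [disp32]` without SIB: ModR/M mod=00, reg=000 (eax), rm=101 -> disp32 follows.
fn mov_eax_disp32_modrm(disp: u32) -> Vec<u8> {
    let mut out = vec![0x8B, 0b00_000_101];
    out.extend_from_slice(&disp.to_le_bytes());
    out
}

/// The same load via SIB: ModR/M rm=100 selects a SIB byte, and SIB
/// index=100 (no index) with base=101 (no base when mod=00) leaves only disp32.
fn mov_eax_disp32_sib(disp: u32) -> Vec<u8> {
    let mut out = vec![0x8B, 0b00_000_100, 0b00_100_101];
    out.extend_from_slice(&disp.to_le_bytes());
    out
}
```

The SIB form is always exactly one byte longer (8B 04 25 ... versus 8B 05 ...). That is why a swap would only be size-neutral if the non-SIB form were paired with a 1-byte padding NOP, as suggested above.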

Use TEST to encode information

TEST has only one register-to-register form:

r/m8, r8
r/m16, r16
r/m32, r32

However, it is commutative, meaning that test eax, ebx and test ebx, eax are semantically identical. Thus, for test r1, r2 where r1 != r2, we can define r1, r2 as false and r2, r1 as true.
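The "r1, r2 vs. r2, r1" rule needs a canonical order so the extractor knows which ordering means false. The Rust sketch below makes a hypothetical choice: the lower-numbered register goes in R/M for bit 0, and the higher-numbered one for bit 1. It handles only the 32-bit form (85 /r) with the eight legacy registers. Because AND is commutative, both orderings set the flags identically.

```rust
/// Encode `test` on two distinct registers so the operand order carries one bit.
/// Hypothetical convention: false -> lower register number in R/M, true -> higher.
fn encode_test(r1: u8, r2: u8, bit: bool) -> Option<[u8; 2]> {
    if r1 == r2 || r1 > 7 || r2 > 7 {
        return None; // test r, r has only one ordering, so it carries no bit
    }
    let (lo, hi) = if r1 < r2 { (r1, r2) } else { (r2, r1) };
    let (rm, reg) = if bit { (hi, lo) } else { (lo, hi) };
    Some([0x85, 0xC0 | (reg << 3) | rm])
}

/// Recover the hidden bit from a register-to-register `test` (85 /r).
fn decode_test(bytes: [u8; 2]) -> Option<bool> {
    let [opcode, modrm] = bytes;
    if opcode != 0x85 || modrm >> 6 != 0b11 {
        return None; // not 85 /r, or a memory operand
    }
    let reg = (modrm >> 3) & 0b111;
    let rm = modrm & 0b111;
    if reg == rm { None } else { Some(rm > reg) }
}
```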

steg86 does not compile.

$ rustc --version
rustc 1.43.1
$ cargo install steg86

Results in a number of errors regarding "with_name". All are similar to this one:
steg86-compile-errors.log

Compiling steg86 v0.1.1
error[E0599]: no function or associated item named `with_name` found for struct `clap::build::arg::Arg<'_>` in the current scope
  --> /home/michaelr/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.1/src/main.rs:19:26
   |
19 |                     Arg::with_name("input")
   |                          ^^^^^^^^^
   |                          |
   |                          function or associated item not found in `clap::build::arg::Arg<'_>`
   |                          help: there is a method with a similar name: `get_name`

Not using --raw is broken

Trying to use any steg86 subcommand without passing --raw results in an error like this:

error: The following required arguments were not provided:
    <input>
    --raw

USAGE:
    steg86 extract <input> --raw --bitness <bitness>

For more information try --help

I'm not very familiar with clap, but removing .requires("raw") from the "bitness" argument of each subcommand fixes the problem. I think clap is acting like bitness is always provided because it has a default value.

cargo install steg86 does not work

Currently, when we run the command cargo install steg86, we get the following errors:

error[E0599]: no variant or associated item named `VersionlessSubcommands` found for enum `AppSettings` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:10:31
   |
10 |         .setting(AppSettings::VersionlessSubcommands)
   |                               ^^^^^^^^^^^^^^^^^^^^^^ variant or associated item not found in `AppSettings`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:20:26
   |
20 |                         .about("treat the input as a raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:26:26
   |
26 |                         .about("the bitness of the raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:35:26
   |
35 |                         .about("the binary to profile")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:45:26
   |
45 |                         .about("treat the input as a raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:51:26
   |
51 |                         .about("the bitness of the raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:60:26
   |
60 |                         .about("the binary to embed into")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:66:26
   |
66 |                         .about("the path to write the steg'd binary to")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:76:26
   |
76 |                         .about("treat the input as a raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:82:26
   |
82 |                         .about("the bitness of the raw binary")
   |                          ^^^^^ method not found in `Arg<'_>`

error[E0599]: no method named `about` found for struct `Arg` in the current scope
  --> /home/elf/.cargo/registry/src/github.com-1ecc6299db9ec823/steg86-0.1.2/src/main.rs:91:26
   |
91 |                         .about("the binary to extract from")
   |                          ^^^^^ method not found in `Arg<'_>`

For more information about this error, try `rustc --explain E0599`.
error: failed to compile `steg86 v0.1.2`, intermediate artifacts can be found at `/tmp/cargo-install4K8u0S`

Caused by:
  could not compile `steg86` due to 11 previous errors

Cloning the repo and building it from source with cargo build works fine though.

Support "raw" binaries

steg86 itself is format agnostic, but the CLI currently assumes that every input is either an ELF, a Mach-O, or a PE.

There should probably be a --raw or similar flag that tells steg86 to treat its input as a plain sequence of x86 instructions and skip extraction of the text section.

Fuck You

what the fuck was that challenge
have a star you goddamn guess god hunter

"encountered an invalid instruction" when operating on PE32/PE32+

I've tried running steg86 profile against several EXEs and DLLs, both PE32 and PE32+, and every time, it has produced an error like this:

Fatal: encountered an invalid instruction at text offset 3678 (file offset 4702)

It seems like this should be easy to reproduce, but I can upload an example file if not. I've had no such problems with ELF files.

CLI unit tests

The following invariants should be tested (and would be easy to test):

  • steg86 profile <input> run on the same <input> multiple times always produces the same results
  • Running steg86 extract on the output of steg86 embed recovers exactly the message that was embedded

Use XCHG pairs to encode information

XCHG has r/mX, rX and rX, r/mX variants, and so is compatible with steg86's primary mechanism. It should be added to the SUPPORTED_OPCODES list and SEMANTIC_PAIRS table.

Fatal: incompatible steg86 version (expected 1, got 0)

I recently tried to extract a message using steg86 v0.2.0 which failed with this error:
Fatal: incompatible steg86 version (expected 1, got 0)
I tried to install an older version of steg86 (either using cargo or from source) but I encountered errors (mainly E0308 and E0599).

Is there a way to solve this issue?
Thanks

Support universal ("fat") Mach-Os

This isn't hard to do, I was just lazy about it:

  1. If our input is a fat Mach-O, iterate over its architecture slices
  2. Select the first x86-64 or i386 slice, preferring x86-64 over i386 if both are present

This could be made configurable (i.e., allowing the user to select the specific slice by its index), but it's probably not worth the effort.
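Step 2 of the plan above reduces to a small search over the slices' CPU types. The Rust sketch below assumes the fat header has already been parsed, for example with goblin, which steg86 already uses for format handling. Only the cputype of each slice is passed in, in file order. The constant values follow Apple's mach/machine.h, and the function name select_slice is hypothetical.

```rust
// Mach-O CPU type constants, as defined in <mach/machine.h>.
const CPU_ARCH_ABI64: u32 = 0x0100_0000;
const CPU_TYPE_X86: u32 = 7; // also known as CPU_TYPE_I386
const CPU_TYPE_X86_64: u32 = CPU_TYPE_X86 | CPU_ARCH_ABI64;

/// Given each slice's cputype in file order, return the index of the first
/// x86-64 slice, falling back to the first i386 slice, or None if neither exists.
fn select_slice(cputypes: &[u32]) -> Option<usize> {
    cputypes
        .iter()
        .position(|&t| t == CPU_TYPE_X86_64)
        .or_else(|| cputypes.iter().position(|&t| t == CPU_TYPE_X86))
}
```

Searching for x86-64 across all slices first, rather than taking whichever x86 slice appears first, is what implements the stated preference when an i386 slice precedes the x86-64 one.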

Allow more than 16KB to be encoded

As per your README, the tool cannot store more than 16KB (header included).

How do you see this being changed? A simple u16 -> u64 change? Is a version bump required (I'd say so)?

Consult a lawyer about current license

I'm not a lawyer, but I've had enough experience with software licensing that I'd be wary of the current license without consulting one.

Slapping restrictions on base licenses that have passages like "without restriction, including without limitation" can have unintended side-effects. (And you don't want to rephrase existing license text without an expert who knows the precedents for how different phrasings have been interpreted by the courts in the past.)

...though, admittedly, I'm more familiar with how they show up in licenses like the GPL or some Creative Commons licenses, where you run into things like explicit "you may not add additional restrictions" (GPL2) or "additional restrictions are invalid and may be ignored" (GPL3) clauses, combined with things like "the text of this license is licensed to you under the condition you do not modify it" (GPL) or "the name of this license is a trademark that is only licensed to you for use with the un-modified text of our version of this license" (Creative Commons).

In the GPL2 example, a GPL2 license with additional restrictions is equivalent to All Rights Reserved because it's impossible to satisfy all the terms simultaneously. That's why the GPL3 changed it to "If someone tries to slap additional restrictions on this, they're invalid and may be ignored".

In the Creative Commons example, trademark law prevents you from mentioning Creative Commons when making a derivative license for the same reason that commercials say "the next leading brand". (You're not allowed to use someone's trademark to promote a competitor without their permission and riding on the name-recognition of the license you derived from counts.)

...plus, of course, the usual concerns:

  1. Often, to save legal costs and paperwork, companies will refuse anything that's not an exact match for the licenses they've already OKed, which means that other people will then do likewise to ensure that their creations aren't hampered in their ability to get popular by dependencies with such "toxic" licenses.
  2. People tend to look unfavourably on licenses which don't get approved by one of the OSI, the FSF, or Debian legal, and the Open Source Definition, the Free Software Foundation's Four Software Freedoms, and the Debian Free Software Guidelines all forbid restrictions on fields of endeavour. (Not that I'm telling you to change it. Just being thorough in giving you context.)
