Giter VIP home page Giter VIP logo

asmdeobf's Introduction

Extracting a CFG

To extract a CFG, we first need to identify conditional branches. Conditional branches that are taken for certain (jz when zf == 0) have to be ignored, as the other dest can be invalid code. However, there needs to be some care taken regarding initial conditions when loops are involved, for-example the following code's label_2 may be ignored if initial conditions are not updated, through discovery of new blocks:

    xor eax, eax
label_1:
    jnz label_2
    inc eax
    jmp label_1
label_2:

So this becomes abstract intrepretation. In order to update initial conditions properly, we re-execute new blocks if the initial conditions have not yet reached a fixed-point. In order to save time, we use BFS so both branches of a conditional are run first before the remaining code is run.

In order to handle jge/jle, and fixed locations in stack/memory, our constraint involves constants with disjunctions, where esp_0 + const is also considered a constant.

Expression manipulations which hinder abstract interpretation

It is possible where are some cases where there are expressions like eax_0 + eax_1 - eax_1 - eax_0. This can hinder constant analysis.

In this case, we may be tempted to use symbolic execution rather than block simplification for performance reasons. To be complete, we need to implement all operations symbolically, then evaluate constraints at the end of the block. But this has some disadvantages:

  • The expressions may be large, even if some sub-expressions are the same variable, this may be expensive.
  • Coding new simplifications duplicates work from block simplification.
  • If there is a need for coding constraint-based block simplification (e.g. range constraints making some parts 0 to allow for further transforms), then there is probably also a need for constraint-based expression simplification, which duplicates work.
  • The symbolic evaluator restricts how the stack/memory is implemented (e.g. that they are separated), instead of leaving this up to the constraint class.

A mitigation to the performance problem is to instead perform block simplification every time there is an unknown predicate, which should not occur often.

API Calls

API calls end a block to facilitate block-by-block deobfuscation.

Functions

In order to identify functions, we look at when the constraints start having a different eip at the start of a block than before. In order to then identify where the call came from, we can perform taint analysis to see which operation tainted eip.

Inlining

Some functions are useless and may be called multiple times, so in order to facilitate analysis (both constraint propagation and block deobfuscation), we allow inlining.

De-obfuscation after the CFG is extracted

De-obfuscation is done on the CFG, as well as block-by-block.

Self-modifying code

The analysis supposes the program does not modify itself.

Decryption loops can be emulated outside of the static analyzer.

For self-modifying conditions, we will provide an api to let the user hook addresses so fixed code can become some user-specified code to be handed to the analyzer.

Other tools

Radeco only works on linux and is not aimed particularly at de-obfuscation. For convenience of python we code own ssa form/simplification routines.

Visualization

asmdeobf's People

Contributors

stevenxxiu avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.