Giter VIP home page Giter VIP logo

Comments (4)

mewmew avatar mewmew commented on August 20, 2024

Hej Félix,

Thanks a lot for working on fcd! I feel very happy that you've decided to do small write-ups on your decompiler development adventures; where you document your progress, ideas, problems, set-backs and major wins at http://zneak.github.io/fcd/

They have been a good source of inspiration!

If you haven't had a chance to see it yet, I'd very warmly recommend watching the talk Alessandro Di Federico presented at a LLVM developer conference in late 2016, where he introduced rev.ng which turns machine code into LLVM IR, using Qemu IR as an intermediate step in the translation.

Using Qemu is interesting, and it comes with its own set of benefits and drawbacks. What rev.ng performs well at is (sometimes non-trivial) control flow analysis, by propagating constraints on register values (similar to what you do during type analysis).

Related to this issue, rev.ng identifies tail calls and functions in a generic way, first by identifying candidate function addresses, and later pruning these candidate function addresses based on a set of constraints (single entry basic block (if I recall correctly), etc). A tail call may then be identified by a call from a function which jumps past other known functions; in other words, by a call to an address which is not part of the address region of the caller function.

I've been playing around with some of these concepts the last couple of days, in a toy experiment to lift 32-bit x86 assembly to LLVM IR.

Relevant extract from the isTailCall function at https://github.com/decomp/exp/blob/master/cmd/bin2ll/x86.go#L204

// isTailCall reports whether the given instruction is a tail call instruction.
func (d *disassembler) isTailCall(funcEntry bin.Address, inst *instruction) bool {
...
		if funcEntry <= target && target < funcEnd {
			// Target inside function.
			return false
		}

The target address of a JMP instruction is compared against known function start (entry basic block) and function end addresses (the last address of the function that is not part of succeeding data (e.g. switch jump tables) or other functions).

I hope this may give some ideas on how tail calls may be identified, and hopefully you may end up incorporating some useful parts and finding yet other ways to identify them.

Wish you all the best.

Happy reversing!

Cheers /u

from fcd.

fay59 avatar fay59 commented on August 20, 2024

Alessandro reached out to me a few months ago, after his talk, so I saw it. :)

The problem is not so much the idea as the time and interest that I have to put into this. Fcd's disassembler is very crude and not very good, and that's what's holding up the fix. It's one of the last concepts that have stayed almost identical since the beginning of the project, when I had a three-month deadline to make something that works, and I had to cut a few (many) corners to make that work.

Right now, the disassembler does a single pass that does function identification, control flow reconstruction, and lifting, all at once. Because of that, it misses out on a lot of information that could be gathered if each of these tasks were a separate step. It would be much easier to solve tail calls and jump tables with a better disassembler.

The other thing is that I'm not sure that I want to get into the disassembler business. It's an extremely difficult problem and some people already do it much better. I'm floating ideas to use disassembly input from radare, Bininja, Hopper, IDA, or any other useful source.

Once this problem is solved, identifying tail calls is straightforward, I think. You would just need to check if the destination is a known function's entry point. You could also make assumptions based on the state of the stack frame, since tail calls pop their stack just like it would for a normal return. I don't really like the address range check (because functions don't have to be contiguous, although they usually are in cooperative programs), but rev.ng's logic looks like what I'd like to do for the same task.

Thanks for your interest! I love to get feedback on the stuff that I do.

from fcd.

mewmew avatar mewmew commented on August 20, 2024

The other thing is that I'm not sure that I want to get into the disassembler business. It's an extremely difficult problem and some people already do it much better. I'm floating ideas to use disassembly input from radare, Bininja, Hopper, IDA, or any other useful source.

Understandably so. This is the path that remill and MC-Semantics have taken, using primarily IDA and Binary Ninja for now.

Once this problem is solved, identifying tail calls is straightforward, I think.

I would agree, for the most part. Identifying tail calls to static addresses is straight forward. Tail calls to dynamic addresses (e.g. jmp eax) will still require a substantial amount of work to get them right; probably relying on information from symbolic execution and constraint propagation.

I don't really like the address range check (because functions don't have to be contiguous, although they usually are in cooperative programs), but rev.ng's logic looks like what I'd like to do for the same task.

Fair point. The preliminary solution to this problem in bin2ll is to track function chunks (using a hash map from basic block address to function entry point). Works quite well for now, to add support for non-continious functions as well.

Thanks for your interest! I love to get feedback on the stuff that I do.

Very happy to. It's great to see so many different approaches to a somewhat tricky problem : )

from fcd.

mewmew avatar mewmew commented on August 20, 2024

For reference, bin2ll also relies on IDA for now for discovery of function and basic block addresses, and function chunks information. The dependency is not tied directly to IDA, as a tool lst2json parses the relevant information (using regular expressions on foo.lst files produced by IDA), to produce JSON input for bin2ll. In the future, support for Binary Ninja and potentially radare will be added, if the bin2ll experiment turns fruit-full.

from fcd.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.