
Comments (4)

zygoloid commented on July 2, 2024

The mangled name grammar is intended to provide a unique (and ideally as short as reasonably possible) name for each external-linkage symbol, to be readable by a (sufficiently experienced) human, and to be sufficient for a demangler such as __cxa_demangle to produce a more pleasant human-readable description of the symbol. But it is not designed to be a representation of the original source rigid enough to allow full reconstruction of the original declaration, and I'd view that as scope creep that we should not succumb to (despite being the author of your (1)). For example, I think it would be reasonable to add a namespace/class distinction only if it provides benefits for one of the intended use cases.
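As a rough illustration of the demangler use case mentioned above, here is a minimal sketch that wraps `abi::__cxa_demangle` (available with GCC and Clang through `<cxxabi.h>`). The helper name `demangle` and the symbol `_ZN3foo3barEv` are illustrative assumptions, not names taken from this thread.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI; GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged if the demangler reports a failure.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Calling `demangle("_ZN3foo3barEv")` yields `foo::bar()`. The same mangled name is produced whether `foo` is a namespace or a class, which is the namespace/class ambiguity under discussion.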

from cxx-abi.

brianthelion commented on July 2, 2024

@zygoloid Appreciate the context, thank you.

The main use-case that I'm interested in involves LD_PRELOAD. Given the importance of function interposition in the developer's toolkit, I don't think it's unreasonable to request that LD_PRELOAD use-cases be admitted to the "intended" category if they aren't there already. You may disagree, but let me elaborate:

99% of the time, effective function interposition through LD_PRELOAD has:

  1. An explicit dependency on mangled names; and
  2. An implicit dependency on demangled names.

The explicit dependency on the mangled name is due to use of dlsym(...) as the primary mechanism of runtime interposition. Yes, some LD_PRELOAD use-cases may not involve interposition at all, but those are in the 1% as far as I can tell.
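To make the explicit dependency concrete, the following is a minimal sketch of an interposition shim, assuming a hypothetical target that is a free function `int foo::bar(int)` in namespace `foo` (mangled as `_ZN3foo3barEi`). The helper `next_symbol` and the target itself are illustrative assumptions; only `dlsym` and `RTLD_NEXT` are real interfaces.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // RTLD_NEXT is a GNU extension to <dlfcn.h>
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: looks up the next definition of `mangled` after the
// calling object in the dynamic linker's search order. Returns nullptr if
// no later object defines the symbol.
template <typename Fn>
Fn next_symbol(const char* mangled) {
    return reinterpret_cast<Fn>(dlsym(RTLD_NEXT, mangled));
}

// Hypothetical target: the shim must define a function whose mangled name
// matches the one being interposed, so the declaration has to be guessed
// correctly from the demangled name.
namespace foo {
int bar(int x) {
    using BarFn = int (*)(int);
    // The mangled name is hard-coded: this is the explicit dependency.
    static BarFn real = next_symbol<BarFn>("_ZN3foo3barEi");
    if (real == nullptr) {
        std::fprintf(stderr, "shim: no next definition of foo::bar(int)\n");
        std::abort();
    }
    std::fprintf(stderr, "trace: foo::bar(%d)\n", x);
    return real(x);   // forward to the real implementation
}
}  // namespace foo
```

Such a shim would typically be built as a shared object (for example `g++ -shared -fPIC shim.cpp -o libshim.so -ldl`) and loaded with `LD_PRELOAD=./libshim.so`. If `foo` were actually a class and `bar` a non-static member, the mangled name would be identical, but the real function would expect a hidden `this` argument, so this declaration would be wrong.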

The implicit dependency on the demangled name arises when trying to get LD_PRELOAD to intercept control flow correctly in the first place. To do this, the author of the interposition shim has to correctly reverse-engineer the declaration for the callable that s/he wants to get in front of. The workflow there almost always starts with demangling. Due to the ambiguities in the mangling grammar, source code exporting the target symbol is required to achieve certainty about the callable declaration's precise syntax. This is problematic, especially when no source code is available.

On the whole, the argument is: (a) LD_PRELOAD is important and (b) critical LD_PRELOAD workflows depend on demangling, so (c) the ABI should provide enhanced support for those use-cases. Interested to hear your thoughts.

Cheers!


mglisse commented on July 2, 2024

> To do this, the author of the interposition shim has to correctly reverse-engineer the declaration for the callable that s/he wants to get in front of. The workflow there almost always starts with demangling. Due to the ambiguities in the mangling grammar, source code exporting the target symbol is required to achieve certainty about the callable declaration's precise syntax. This is problematic, especially when no source code is available.

I don't think it is true that reverse-engineering is the first step. The natural first step is reading the documentation and sources (possibly partial sources, like a header provided to compile plugins). If you do not have any sources, that means that whoever provided the object to you does not support your interposition, and they could have obfuscated it by renaming all the mangled symbols to just "a", "b", etc. Even if you do have the original mangled names, the class/namespace distinction is likely to be much less of an issue than finding the layout of classes, expected semantics, etc.


brianthelion commented on July 2, 2024

@mglisse Thanks for your thoughts here as well. I think you may be overlooking the most critical and popular use-case for interposition, namely, tracing. Given that context, I offer a rebuttal:

  • RE: "Natural first steps" -- Reading docs and sources (if available) is neither efficient nor to-the-point, as the goal of tracing is to understand runtime behavior that can't be deduced in a straightforward manner from manual static analysis. This generally involves mass interposition that would simply take too much time without direct support from the tooling.

  • RE: "Tracing support" in provided shared objects -- I have never heard anyone engaged in a tracing exercise say, "Wow, this library provides great mass interposition support." This just isn't a thing, as the tracing use-case presumes a lack of source-level instrumentation -- "logging" -- in the first place. In the subset of cases where source code is available, the burden of tracing support has fallen to the compilers, via -finstrument-functions. Aside from not working on existing shared objects, this method is a very blunt instrument with several drawbacks: (a) it is inappropriate for latency-sensitive applications; (b) it generates truly massive amounts of trace data, much of which is superfluous; (c) it does not allow for the easy extraction of arguments to the callable at call time.

  • RE: "Expected semantics" of the underlying code -- Again, for the mass interposition use-case, the underlying semantics aren't relevant from the outset. The engineer just wants to get traces back with as little work as possible while not compromising the application. Not having to deal with the semantics of the underlying code is important due to the size of the problem.

I look forward to your feedback.

Cheers!

