itanium-cxx-abi / cxx-abi

C++ ABI Summary

cxx-abi's Issues

Broken table of contents on Exception Handling page

I'm not sure if this is the right repo to file an issue. Apologies if it isn't.

Some minor issues with this web page: http://mentorembedded.github.io/cxx-abi/abi-eh.html

In the table of contents, clicking "Level I: Base ABI" does not bring you to that header. The cause is that the HTML link points to "layout" when it should point to "base-abi".
In the table of contents, clicking "Level III: Implementation" incorrectly brings you to "Level II: C++ ABI". The cause is that the HTML link points to "cxx-abi" when it should point to "cxx-rt".
The table of contents for III has the wrong numbers (2.4 and 2.5 instead of 3.4 and 3.5)

    3.1 Introduction
    3.2 Data Structures
    3.3 Runtime Initialization
    2.4 Throwing an Exception
    2.5 Catching an Exception

Should be

    3.1 Introduction
    3.2 Data Structures
    3.3 Runtime Initialization
    3.4 Throwing an Exception
    3.5 Catching an Exception

Mangling for fold expressions

Support for C++17 fold expressions appears to be missing from the name mangling specification. GCC and Clang appear to use new fL, fR, fl, and fr productions to introduce binary left folds, binary right folds, unary left folds, and unary right folds respectively:

$ cat t.cpp
template<typename ...T>
auto ftbl(T... p) -> decltype((1 + ... + p.dm)) { return (1 + ... + p.dm); }
template<typename ...T>
auto ftbr(T... p) -> decltype((p.dm + ... + 1)) { return (p.dm + ... + 1); }
template<typename ...T>
auto ftur(T... p) -> decltype((p.dm + ...)) { return (p.dm + ...); }
template<typename ...T>
auto ftul(T... p) -> decltype((... + p.dm)) { return (... + p.dm); }
struct X { int dm; };
auto f(X x) {
  ftbl(x);
  ftbr(x);
  ftur(x);
  ftul(x);
}

$ clang -c -std=c++17 t.cpp
...

$ nm t.o
0000000000000000 T _Z1f1X
0000000000000000 W _Z4ftblIJ1XEEDTfLplLi1Edtfp_2dmEDpT_
0000000000000000 W _Z4ftbrIJ1XEEDTfRpldtfp_2dmLi1EEDpT_
0000000000000000 W _Z4ftulIJ1XEEDTflpldtfp_2dmEDpT_
0000000000000000 W _Z4fturIJ1XEEDTfrpldtfp_2dmEDpT_

The grammar productions appear to look like:

<expression> ::= ...
             ::= <fold-expression>
<fold-expression> ::= fL <operator-name> <expression> <expression>  # binary left fold
                  ::= fR <operator-name> <expression> <expression>  # binary right fold
                  ::= fl <operator-name> <expression>  # unary left fold
                  ::= fr <operator-name> <expression>  # unary right fold
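As a rough sketch, the productions above can be assembled from already-mangled pieces as shown here. The FoldKind enum and mangleFold helper are hypothetical names for illustration; only the fL, fR, fl, and fr prefixes and the operand order are taken from the nm output above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper types; not part of any compiler's API.
enum class FoldKind { BinaryLeft, BinaryRight, UnaryLeft, UnaryRight };

// Builds a <fold-expression> from already-mangled pieces.
// Binary folds emit both operands in source order; unary folds ignore `second`.
std::string mangleFold(FoldKind kind, const std::string& op,
                       const std::string& first, const std::string& second = "") {
    switch (kind) {
        case FoldKind::BinaryLeft:  return "fL" + op + first + second;
        case FoldKind::BinaryRight: return "fR" + op + first + second;
        case FoldKind::UnaryLeft:   return "fl" + op + first;
        case FoldKind::UnaryRight:  return "fr" + op + first;
    }
    return {};
}
```

For instance, mangleFold(FoldKind::BinaryLeft, "pl", "Li1E", "dtfp_2dm") reproduces the fLplLi1Edtfp_2dm fragment seen in the ftbl symbol.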

need ABI update for defaulted operator<=> implicitly declaring an operator==

Consider:

struct A {
  virtual std::strong_ordering operator<=>(const A&) const = default; // return type added; needs <compare>
};

The vtable for A needs a slot for the virtual function A::operator<=> and a slot for the implicitly-declared virtual function A::operator==.

Proposal: allocate virtual table slots for virtual operator==s as if they were declared at the end of the class, in order of the declarations of the corresponding operator<=>s, after any virtual table slots inserted for implicitly-declared operator= functions.

[v2] consider more offsets when laying out empty subobjects

Currently, when laying out an empty subobject, we only consider offset zero, followed by offsets >= dsize of the class. We never consider the multiples of [nv]alignof the subobject that are greater than zero and less than the dsize of the class.

Considering those additional offsets would reduce the size of some classes. For example:

struct noncopyable {};
struct A : noncopyable {};
struct B { int n; };
struct C : noncopyable {};
struct D1 : A, B, C {}; // sizeof(D1) == 8, could be 4
struct D2 { // sizeof(D2) == 8, could be 4
  [[no_unique_address]] A a;
  B b;
  [[no_unique_address]] C c;
};
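A simplified sketch of the offset search may make the difference concrete. It models only the conflict between empty subobjects of the same type (such as the two noncopyable bases in D1) and ignores everything else a real layout algorithm tracks. The placeEmptySubobject function and its considerInterior flag are hypothetical, introduced purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Returns the first offset at which an empty subobject could be placed.
// `taken` holds offsets already occupied by a subobject of the same empty type,
// `dsize` is the data size of the class so far, `align` the subobject's alignment.
// `considerInterior` switches on the proposed rule (hypothetical flag).
std::size_t placeEmptySubobject(const std::set<std::size_t>& taken,
                                std::size_t dsize, std::size_t align,
                                bool considerInterior) {
    if (!taken.count(0)) return 0;  // offset zero is always tried first
    if (considerInterior) {
        // Proposed: also try multiples of the alignment below dsize.
        for (std::size_t off = align; off < dsize; off += align)
            if (!taken.count(off)) return off;
    }
    // Current rule: fall back to offsets >= dsize, rounded up to the alignment.
    std::size_t off = (dsize + align - 1) / align * align;
    while (taken.count(off)) off += align;
    return off;
}
```

With offset 0 taken by A's noncopyable base and dsize 4 after B, the current rule places C at offset 4 (growing the class), while the proposed rule would find offset 1.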

Allow '$' in grammar productions to indicate a vendor extension

Coverity uses the Itanium ABI name mangling format with various extensions to uniquely identify entities beyond those of traditional ABIs. For example, we encode unique names for templates, classes, enumerations, Objective-C methods, and Apple Blocks.

We make use of the existing support for vendor extensions for builtin types, type qualifiers, and operators where possible, but this doesn't cover our needs.

At present, the way that we have encoded our extensions has potential to conflict with future changes to the Itanium ABI. Fortunately, we haven't encountered any conflicts so far, but I'm sure it is just a matter of time.

We're wondering about the possibility of reserving a character such as '$' to indicate the start of a vendor extension in any production except identifier. The reason for suggesting '$' is that it is already allowed in identifiers, but is not currently used as a literal in any existing production. The use of such an extension would result in a non-portable name; it isn't expected that all implementations would be able to decode a name that used such extensions. The goal is simply to allow vendor extensions that won't conflict with future ABI changes.

unspecified alignment for guard variables

We specify that guard variables are 64 bits wide, but we don't specify their alignment, and implementations vary. For example, for:

inline void f() { 
    static int n = g();
}

When targeting 32-bit x86, GCC and ICC use 8-byte alignment whereas Clang uses 4-byte alignment. (Generally, Clang uses the alignment of uint64_t whereas the others appear to always use 64-bit alignment.)

Presumably we should say something about this.

do not require complete object or deleting destructor symbols for abstract classes

Given

struct A { virtual ~A() = 0; }; A::~A() {}

we seem to require three symbols to be emitted: _ZN1AD0Ev, _ZN1AD1Ev, and _ZN1AD2Ev (or at least, Clang and GCC both emit all three). Of these, only _ZN1AD2Ev can ever be referenced; the deleting destructor and complete object destructor are not entered into the vtable, and a complete object of type A can never be destroyed directly.

We should not require an implementation to emit these extra symbols. Note in particular that code must be emitted for the operator delete call for D0, which needs whole program analysis (-ffunction-sections, LTO, etc) to remove.

The same holds regardless of whether the destructor is virtual (but if not, then there's at least only the complete object destructor symbol to worry about, which can always be an alias to the base subobject destructor symbol).

mangling for constrained templates

p0734r0 added new forms of overloadable declaration that we need to mangle. For instance, we now need to distinguish:

template<typename T> concept A = ...
template<typename T> concept B = ...
template<A T> void f(T); // f1
template<B T> void f(T); // f2
template<typename T> requires A<T> void g(T); // g1
template<typename T> requires B<T> void g(T); // g2

It is permissible (but not necessary) for the mangling of f1 and g1 to be the same (other than the name).

(There are also requires-clauses on non-template functions, but I don't believe there is any need to mangle those since at most one such function can have its requires-clause evaluate to true, and the rest are never emitted.)

As a general model, I suggest we include "extra information" about a template-parameter (for a function template -- we don't need this for non-overloadable templates) as a prefix on the template-arg mangling. (We should also consider extending this to the case where the template parameter is a template template parameter and the template argument does not have an identical template-parameter-list, to handle the case described in http://sourcerytools.com/pipermail/cxx-abi-dev/2014-December/002791.html)

I suggest we affix the constraint expression from the requires-clause (if any) to the template-args, and do not perform any expansion or canonicalization of the as-written form of the template declaration. Strawman mangling suggestion:

<template-arg> ::= C <concept name> <template-arg>
<template-args> ::= I <template-arg>+ Q <requires-clause expr> E

Example:

template<A T, B U> requires C<T, U> void f();
f<int, 3>(); // _Z1fIC1AiC1BLi3EQ1CIT_T0_EEvv
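To check how the strawman composes, here is a small sketch that assembles the template-args for the example above from already-mangled pieces. The helper names (sourceName, mangleTemplateArgs, ConstrainedArg) are hypothetical, and the sketch assumes a concept name is encoded as a plain length-prefixed source name, which is what the example suggests.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// <source-name>: decimal length followed by the identifier.
std::string sourceName(const std::string& id) {
    return std::to_string(id.size()) + id;
}

// Pairs a concept name (empty if the parameter is unconstrained)
// with the already-mangled template argument.
using ConstrainedArg = std::pair<std::string, std::string>;

// Strawman <template-args>: I <template-arg>+ [Q <requires-clause expr>] E,
// where a constrained <template-arg> is C <concept name> <template-arg>.
std::string mangleTemplateArgs(const std::vector<ConstrainedArg>& args,
                               const std::string& requiresExpr) {
    std::string out = "I";
    for (const auto& a : args) {
        if (!a.first.empty()) out += "C" + sourceName(a.first);
        out += a.second;
    }
    if (!requiresExpr.empty()) out += "Q" + requiresExpr;
    return out + "E";
}
```

Prefixing the result with _Z1f and appending the return and parameter types (vv) gives the symbol quoted in the example.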

Understanding the role of demangling in the toolchain

Apologies for a brief digression here into the more sociological aspects of the ABI.

It's clear who consumes mangled names. But who consumes demangled names?

Naively, it would seem that demangling is provided just to make debugging easier. But then we see at least a couple of examples (1) (2) of semantic extraction from the mangling grammar. Are there existing tools that depend on being able to accurately invert the grammar? What are the ABI standard's responsibilities with respect to demangling support?

I'm interested from the perspective of someone that would find better demangling support helpful. In particular, code generation from demangled names could be possible if the grammar were less ambiguous about the namespace/class distinction. I'd love to understand where the maintainers and the broader community of ABI consumers stand on this sort of thing.

ABI support for contracts

I've been working on implementing contracts in Clang, and it looks like some sort of ABI-level support might be needed.

It'd be handy to have something along the lines of __cxa_contract_violation(const std::contract_violation &) for the compiler to call in the absence of a user-provided handler, akin to __cxa_pure_virtual.
Ideally, one could call an unknown function via an arbitrary function pointer and have any precondition violations report the source location of the caller. This must surely require ABI support.

Mangling _FloatN types

ISO/IEC TS 18661, an extension to C11, defines new floating-point extensions and types. For example, _FloatN is introduced as a binary interchange format, where N can be 16, 32, 64, 128 (or bigger). We are implementing _Float16 support in Clang and would also like to use it in C++ mode. Therefore we need mangling support for _Float16. Is this something the C++ ABI would consider supporting?

mangling for long double

Hi!

The cxx-abi specifies e as the mangling for long double. In GCC for PowerPC, we originally used
64-bit IEEE float as long double. When later IBM extended double ("double-double") was introduced
as the preferred long double type, that was given the mangling g (which is demangled as __float128)
so that libraries can support both floating point formats at the same time.

This doesn't work all that great. For example, demangling _Z1fg will show the wrong type (__float128
instead of long double); as another example, break f(long double) in GDB will not work.
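The demangling problem can be observed with a few lines, assuming a GCC or Clang toolchain that ships <cxxabi.h> (a runtime header, not part of the C++ standard). The demangle wrapper below is a hypothetical helper; the exact text returned depends on the runtime's demangler.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang runtime header; not standard C++
#include <string>

// Demangles `name` with abi::__cxa_demangle; returns an empty string on failure.
std::string demangle(const char* name) {
    int status = 0;
    char* buf = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    std::string result = (status == 0 && buf) ? buf : "";
    std::free(buf);  // the runtime allocates the buffer with malloc
    return result;
}
```

Calling demangle("_Z1fg") and printing the result shows which type name the demangler picks for g on a given platform.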

Now we have a third format for long double: 128-bit IEEE float (IEC 60559 binary128). So we need
a new mangling for that.

I'd like to propose the mangling
k<builtin-type>
to stand for "long double implemented the same as <builtin-type>". It would be demangled as just
long double

(I chose k for no specific reason other than it is short and was available).

Comments?

mangling for instantiation-dependent non-type template parameter types

The mangling for a function template does not include the instantiation-dependent portions of non-type template parameters (including such things transitively within template template parameters). This is becoming increasingly important as people try to write things like:

template<typename T, std::void_t<typename T::x>* = nullptr> void func() {}
template<typename T, std::void_t<typename T::y>* = nullptr> void func() {}

(std::void_t<T> is void for all T. It's not obvious whether it's supposed to be a dependent type, but the above cases are at least instantiation-dependent types.)

For a type T providing both a nested x and a nested y, we will mangle instantiations of the two possible func<T>s the same, despite them being distinct templates.

Including the (pre-substitution) types of non-type template parameters in the mangling (if they're instantiation-dependent) seems like the obvious fix, but it would likely result in an ABI break for a significant amount of existing code.

We should probably at least fix this for ABI v2.

The type of an 'auto' lambda parameter should be mangled as the corresponding template type parameter

The document isn't clear that 'auto' in a generic lambda should be mangled as the underlying artificial template type parameter. Doing that can lead to recursive manglings (which is one reason why I thought it a bug).

Here's a suggested diff (sorry, had to gzip it to attach it)

abi-lambda.diff.gz

unresolved-names with resolved prefixes

Previously: http://sourcerytools.com/pipermail/cxx-abi-dev/2015-October/002867.html

I think the problem here does not only affect substitutable prefixes for unresolved-names. Clang and GCC also disagree about how to mangle this:

inline namespace Y {
  template<typename T, typename U> struct is_same { static const bool value = false; };
  template<typename T> struct is_same<T, T> { static const bool value = true; };
}
template<bool, typename T> struct enable_if {};
template<typename T> struct enable_if<true, T> { typedef T type; };

template <class T> typename enable_if<is_same<T, float>::value, float>::type arg(T __re);
float f = arg<float>(0);

... where GCC gives _Z3argIfEN9enable_ifIXsrN1Y7is_sameIT_fEE5valueEfE4typeES3_ (which doesn't match the grammar in the ABI), and Clang gives _Z3argIfEN9enable_ifIXsr7is_sameIT_fEE5valueEfE4typeES1_ (which can collide with a different function in a different TU).

I think GCC's approach is closer to being the right one. If we can resolve an initial portion of an <unresolved-name>, we should emit the fully-qualified path to the resolved component. Though there are some other open questions here: is the partially-resolved name substitutable? (Do we get S1_ or S3_ at the end of the mangling?) Does it receive a leading N, per GCC's mangling, or no leading N, per Clang's mangling / the ABI?

mangling substitutions for `std` inline namespaces

Both libstdc++ and libc++ use inline namespaces for versioning, which removes the utility of basically all the built-in std substitutions other than St. We should provide a way for the S* substitutions to be used with an inline namespace. I don't have a concrete suggestion yet; whatever we pick, we'll presumably want the std::<inline namespace> part to itself be substitutable, which makes this a bit awkward to fit into the existing scheme.

mangling for non-type template arguments of class type

Standard paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0732r2.pdf

C++20 adds the ability for near-arbitrary class types to be used as the type of non-type template arguments. We need a mangling for these.

We are guaranteed that such classes are not unions and do not contain unions, so perhaps the simplest approach would be to emit pseudo-aggregate-initialization syntax for the flattened sequence of subobjects:

struct A {
  int x;
  int : 0;
  int y;
};
struct B : A { int arr[2]; };
template<auto> struct Q {};
void f(Q<B{1, 2, 3, 4}>); // mangled as _Z1f1QIXtl1BLi1ELi2ELi3ELi4EEEE

These are going to get very long, very fast, so we might want an alternative representation for large objects. One intended use case is for things like:

template<size_t N> struct str {
  constexpr str(const char *p) : data{} { for (int n = 0; *p; ++p) data[n++] = *p; } // n++ so each character lands in its own slot
  char data[N];
};
template<size_t N> str(const char (&)[N]) -> str<N>;
template<str S> struct R {};
void f(R<"some very long string here">);

... and encoding this with a long sequence of Lic123E manglings seems highly undesirable. This ties into another open question: how should string literal expressions be mangled? (Currently we say that we only mangle the length, which is insufficient.) Reusing whatever mangling we use for string literals as the mangling for char arrays within a non-type template argument might be wise. Alternatively / additionally, we could emit only a (suitably secure) hash of the literal value if it's very long.

Another form of mangling is also required here: all non-type template arguments with the same value throughout the entire program are lvalues denoting the same object, so we need a mangling for the global constant holding that object. Proposal:

<special-name> ::= TA <template-arg> # template parameter object for argument <template-arg>

Eg:

template<auto V> const auto *p = &V;
// mangled as _Z1pIXtl1BLi1ELi2ELi3ELi4EEEE
// initialized as pointer to _ZTAXtl1BLi1ELi2ELi3ELi4EEE
template const B *p<B{1, 2, 3, 4}>;

mangling for char8_t

I'm working on adding prototype support for P0482 to gcc and am interested in reserving a mangling for the char8_t type.

char16_t and char32_t use Ds and Di respectively. s and i correspond to short and int. Dc (c for char) or Dh (h for unsigned char) would be the obvious candidates, but both are taken (for decltype(auto) and IEEE 754 support respectively). I believe Du is currently available, so I'll suggest that as a starting point.

For now, I'm using the vendor extended type mangling (u7char8_t) and am fine with that at least until support for char8_t is accepted in upstream gcc.
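For reference, the vendor extended form is just u followed by a length-prefixed name, as in this small sketch. The function names are hypothetical, and the Du entry reflects only the suggestion made in this issue, not a reserved mangling.

```cpp
#include <cassert>
#include <string>

// Vendor extended type: u <source-name>
std::string vendorExtendedType(const std::string& name) {
    return "u" + std::to_string(name.size()) + name;
}

// Manglings for the character types named above.
std::string charTypeMangling(const std::string& type) {
    if (type == "char16_t") return "Ds";
    if (type == "char32_t") return "Di";
    if (type == "char8_t")  return "Du";  // suggested here, not yet reserved
    return vendorExtendedType(type);      // fallback: vendor extension
}
```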

mangling for generic lambda conversion to function pointer and static invoker

How should we mangle the conversion function template to function pointer, and the function that it returns? Example:

inline void test() {
    auto x = [](auto){};

    using F = void(int);
    F *(decltype(x)::*p)() const = &decltype(x)::operator F*;
    
    void (*q)(int) = x;

    static auto p_static = p;
    static auto q_static = q;

    assert(p == p_static);
    assert(q == q_static);
}

Here, the first assertion is required to hold, so we need a consistent mangling for the conversion function. And the second assertion should probably hold too, so we can deduplicate the static invoker function across vendors and so that we get consistent behavior when that value is (eg) used to initialize a global constant that is visible across TUs.

In the conversion function case, we need to decide how to mangle the type, and particularly how to mangle the return type of the function pointer type that the lambda conversion function template converts into. That type is not actually nailed down by the standard to the extent that we could mangle it; instead, we are told that "The return type of the pointer to function shall behave as if it were a decltype-specifier denoting the return type of the corresponding function call operator template specialization."

Current manglings for p:

_ZZ4testvENKUlT_E_cvPFDTcldtdeLPv0EonclscOS_fp_EES_EIiEEv -- GCC
_ZZ4testvENKUlT_E_cvPFDaS_EIiEEv -- Clang and EDG

Neither mangling is great. They pointlessly repeat the lambda parameter signature from the Ul mangling. GCC's exposes an implementation detail, namely the exact decltype expression used under the hood (including a cast of a null pointer to the closure type, and a reference to a not-in-scope function parameter!). Clang's and EDG's give the conversion function itself a deduced return type, which is strictly-speaking incorrect, but in today's C++ can't collide with anything else because it's impossible to declare an operator auto(*)(T)() function, and in any case there can't be one declared in the same scope as the conversion function.

For the static invoker function, we need a name for the function as well as a type. EDG and Clang call this __invoke and GCC calls it _FUN, but those seem like things you would only include in a mangling by accident; our convention is to use special manglings as names for such entities instead.

Some possibilities:

Alternative 1:

for the conversion to function pointer, use the type of the call operator as written as the pointee type: that is, the name of the conversion function is always cvPF<sig>E, where <sig> is the signature of the call operator
for the invoker, use li as the name, and use the type of the call operator as written as the type (that is, lie about it having a deduced return type if the operator() has a deduced return type)

This matches the manglings used today by EDG and Clang, with 8__invoke replaced by li.

Alternative 2 (removing some of the redundancy):

for the conversion to function pointer, use lc as the name instead of cv<...>, removing the need to mangle the return type and to (redundantly) repeat the lambda parameter list from the preceding mangling
for the invoker, use li as the name, and add this case to the list of cases where we do not include the return type in the encoding

That gives:

_ZZ4testvENKUlT_E_lcIiEEv
_ZZ4testvENKUlT_E_2liIiEES_

Note that we still include a redundant v in the lc mangling (consistent with cv manglings), and a redundant S_ (or more generally a redundant sequence of substitutions) in the li mangling, but the consistency those bring seems worthwhile.

Alternative 3 (matching the standard's model):

Add a type mangling for the "decltype-specifier denoting the return type of the corresponding function call operator template specialization" type described in the standard, say Dl, and (as above) use li as the name of the invoker (but otherwise treat it as a regular static function). That gives:

_ZZ4testvENKUlT_E_cvPFDlS_EIiEEv
_ZZ4testvENKUlT_E_2liIiEEDlS_

(where the Dl encoding would be used for all generic lambdas, regardless of whether the operator() has a dependent return type).

Of these, I think I prefer Alternative 1: it's the smallest extension to the ABI, and is closest to existing implementation practice.

Name mangling for possible new Clang type

Hello, I put forward a patch for review not that long ago that adds a new type to Clang. It's a dependent type that I use to enable template parameters in conjunction with address spaces. The patch itself and further details can be viewed here: https://reviews.llvm.org/D33666

Since it is a new type, it would need an appropriate name mangling if it were accepted, so I was hoping to start a discussion on what that mangling should be, if that's possible. At the moment the mangling function is as follows:

void CXXNameMangler::mangleType(const DependentExtAddressSpaceType *T) {
  Out << "DEas";
  mangleExpression(T->getAddrSpaceExpr());
  Out << '_';
  mangleType(T->getPointeeType());
}

I'm quite unaware of the naming conventions used, so I doubt it's ideal at the moment. I followed the existing DependentSizedExtVectorType mangling function, which does something quite similar with its size expr and element type. In this case the AddrSpace expression would be the address space index. The PointeeType would be the type the address space is to be attached to when it's no longer dependent. The DEas abbreviation is the type's name minus the Type suffix.

Thank you very much for your time and consideration on this issue. I apologize if this is the incorrect way to raise this type of issue and would appreciate redirection to the appropriate avenue if that is the case.

describe inherited constructor mangling

[Imported from cxx-abi-dev]

Per http://wg21.link/p0136r1 an inheriting constructor declaration no longer results in the implicit synthesis of derived class constructors, and instead the behavior of a call to an inherited constructor is that:

the portion of a hypothetical defaulted default constructor prior to the base constructor invocation is executed, then
the inherited constructor is invoked, then
the portion of a hypothetical defaulted default constructor after the base constructor invocation is executed
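A minimal illustration of those three steps, using made-up Base and Derived types: the inherited constructor initializes the base, and the derived class's default member initializers still run afterwards, as they would in a defaulted default constructor.

```cpp
#include <cassert>
#include <string>

struct Base {
    int value;
    explicit Base(int v) : value(v) {}
};

struct Derived : Base {
    using Base::Base;                 // inherits Base(int); no Derived(int) is synthesized
    std::string tag = "initialized";  // runs after the base constructor, as in step 3
};
```

Constructing Derived(7) sets value to 7 through the inherited constructor and tag through its default member initializer.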

Proposal:

To avoid emitting the code for (1) and (3) in every inherited constructor call site, add a new form of mangled name for a fake constructor that forwards to a base class constructor, whose <encoding> is that of the base class constructor, except that the <nested-name> is that of the derived class and the <unqualified-name> is

<ctor-dtor-name> ::= CI1 <base class type> # complete object inheriting constructor
<ctor-dtor-name> ::= CI2 <base class type> # base object inheriting constructor

This would give code largely similar to what we generate with the C++11 inheriting constructor rules, except that the additional copy constructions and destructions for parameters would be removed.

The usage of this mangling would be entirely optional; the purpose of including this mangling in the ABI is only to coalesce multiple weak definitions of the same symbol. If an implementation can't forward all the arguments (eg for varargs constructors) or just doesn't want to emit these symbols, the full initialization can be inlined instead (or another technique can be used).

As usual, CI2 constructors do not construct virtual base class subobjects. As a consequence, when a constructor is inherited from a virtual base, the corresponding CI2 symbol does not need the formal parameters, so they are not passed.

need a mangling for string literals in inline variable initializers

Consider a case such as

inline const char *str = "foo";

str is required to have a single value across translation units, so the same string literal object must be used as its initializer in all cases. For example:

// TU 1
inline constexpr const char *str = "foo";
const char *x = str;

// TU 2
#include <cassert>
inline constexpr const char *str = "foo"; // same definition as in TU 1, as the ODR requires
extern const char *x;
const char *y = str;
int main() { assert(x == y); }

The assertion here is not permitted to fail. Unfortunately, this doesn't only affect string literals appearing in inline variable initializers:

inline constexpr const char *f() { return "foo"; }
inline constexpr const char *x = f(); // must be the same string literal object in all TUs

... and templated variables expose the same issue too:

template<int> constexpr const char *x = "foo"; // x<0> must be the same pointer in all TUs

template<int> struct A {
  static const char *const x;
};
template<int N> constexpr const char *A<N>::x = "foo"; // A<0>::x must be the same pointer in all TUs

I think there are two plausible solutions:

extend the existing _Z <function encoding> Es [ <discriminator> ] mangling to cover this case (note that this means we still need to number string literals within functions and classes, even though we usually don't need the number)
mangle the string literals based only on their contents, for example using whatever mangling scheme we settle on for #63 / #64

I'm inclined to prefer option 2. We should probably also remove the existing mangling for string literals if we take that option.

do not register a substitution if the mangling is no longer than the substitution would be

ABI v2 possibility: when creating a mangling, do not register a substitution if the substitution would not be any shorter than the text we just mangled.

Example:

template<typename T> T f(T*, T*, T) {}

f<int> mangles as _Z1fIiET_PS0_S1_S0_. Following this rule, we would instead have _Z1fIiET_PT_PT_T_, which is both shorter and easier to read.

Note that nearly all uses of a <template-param> register a substitution right now, so this rule would fire frequently at least for them.
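A sketch of the length comparison, with hypothetical helper names. The text does not say which index a candidate receives once some entries are skipped, so the sketch takes the candidate index as an input rather than guessing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Text of the n-th substitution: S_ for n == 0, then S0_, S1_, ...
// using a base-36 <seq-id> with digits 0-9 and A-Z.
std::string substitutionText(std::size_t n) {
    if (n == 0) return "S_";
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::size_t id = n - 1;
    std::string seq;
    do {
        seq.insert(seq.begin(), digits[id % 36]);
        id /= 36;
    } while (id != 0);
    return "S" + seq + "_";
}

// Proposed rule: register only if referring back would be strictly shorter
// than repeating the mangled text.
bool shouldRegister(const std::string& mangled, std::size_t candidateIndex) {
    return substitutionText(candidateIndex).size() < mangled.size();
}
```

With the indices from the example, T_ against S0_ and PT_ against S1_ both fail the test, so neither would be registered.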

Hardware interference size

Since the variables in [hardware.interference] are constants, the values should probably be part of the ABI. Depending on the target architecture, of course.

I see that Clang developers were discussing this at http://lists.llvm.org/pipermail/cfe-dev/2018-May/thread.html#58073 but that discussion doesn't seem to have resolved.

It seems pretty clear that both values should be 64 for x86*, but other architectures are less clear; on ARM we might want to use 64 for constructive and 128 for destructive, as conservative answers given current existing variants.
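To see why the values leak into the ABI, consider a type whose layout depends on them. This sketch uses the standard constants when the library provides them and otherwise falls back to 64, which is an assumption taken from the x86 suggestion above. The kConstructive and kDestructive names and the Counters struct are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kConstructive = std::hardware_constructive_interference_size;
constexpr std::size_t kDestructive  = std::hardware_destructive_interference_size;
#else
// Assumed fallback values for illustration only.
constexpr std::size_t kConstructive = 64;
constexpr std::size_t kDestructive  = 64;
#endif

// Two counters padded apart to avoid false sharing. The size and alignment of
// this struct change whenever kDestructive changes.
struct Counters {
    alignas(kDestructive) long a;
    alignas(kDestructive) long b;
};
```

Two translation units built with different values would disagree on sizeof(Counters), which is the kind of mismatch an ABI-level definition would rule out.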

mangling for string literals

The ABI says that string literals in instantiation-dependent expressions are mangled thusly:

<expr-primary> ::= L <string type> E # string literal

... presumably because, in C++98, the type of the literal was the only property that could affect the validity of the instantiation-dependent expression. That is no longer the case; a C++11 program can inspect the contents of such a string literal in an instantiation-dependent expression, so we need to mangle said contents.

Proposal:

<expr-primary> ::= L <string type> <char>* <hash>? E    # string literal
<char> ::= <0-9a-zA-DF-Z>                               # values 48-57, 97-122, 65-68, 70-90
       ::= _<hex><hex>                                  # other chars encoded in (big-endian) hexadecimal
       ::= __<hex><hex><hex><hex>
       ::= ___<hex><hex><hex><hex><hex><hex>
       ::= ____<hex><hex><hex><hex><hex><hex><hex><hex>
<hex> ::= <0-9a-f>
<hash> ::= <hex>{M}

... where the first N (say, 16) characters of the string are encoded directly, followed by a 4M-bit hash of the entire string (algorithm TBD, but following target endianness) if its length is greater than N (where for all purposes other than determining the type, the terminating nul character is ignored).

The idea here is to preserve the string literal contents (at least the start of it) so that demanglers can display it, while avoiding mangling the entire contents of very long strings.

As an example, if we take N = 16, M = 8, and use MD5 as our hashing algorithm (taking the high-order 32 bits of its output), "Hello, world!" would mangle as LA14_cHello_2c_20world_21E, and U"this is a very long string indeed" would mangle as LA34_Dithis_20is_20a_20very_20l1cf8df38.
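The character encoding part of the proposal can be sketched as follows for narrow strings no longer than N. The function names are hypothetical, the hash is left out because its algorithm is still TBD, and the array type is spelled A<size>_c to match the example above.

```cpp
#include <cassert>
#include <string>

// Encodes narrow characters per the proposed <char> production:
// 0-9, a-z, A-D and F-Z are emitted directly; everything else as _<hex><hex>.
std::string encodeChars(const std::string& s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        bool direct = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
                      (c >= 'A' && c <= 'Z' && c != 'E');
        if (direct) {
            out += static_cast<char>(c);
        } else {
            out += '_';
            out += hex[c >> 4];
            out += hex[c & 0xF];
        }
    }
    return out;
}

// L <string type> <char>* E for a short narrow literal (no hash emitted).
// The array size counts the terminating nul, which is not itself encoded.
std::string mangleShortNarrowLiteral(const std::string& s) {
    return "LA" + std::to_string(s.size() + 1) + "_c" + encodeChars(s) + "E";
}
```

E is escaped because it terminates the literal, so encodeChars("E") gives _45.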

If we like this direction, there are a few open questions:

Should we encode the remainder of the string if that would be shorter than the hash?
What hash algorithm should we use (and what values of N and M)? How much do we care about collision-resistance, given that almost any choice will shield us from accidental collisions? It seems plausible that someone will use a pair of strings with known-colliding MD5 sums as template arguments in (eg) test code for an MD5 algorithm, and at least one common way of generating such a pair produces two strings with the same prefix. How much should we care about that? (It'd be easy to "fix" such cases by applying some simple invertible transform on the string data first, such that the colliding pairs that people are likely to want to use in practice are different from the colliding pairs for our hash.)

Specify how scoped enums interact with varargs

Discussion of scoped enums and varargs in CWG led to them being declared conditionally-supported with implementation-defined behavior. So we should agree on semantics here.

operator encoding for spaceship operator

P0515, voted into the C++20 working draft, adds a new operator token <=>, which needs a mangling. This is formally called the "three-way comparison operator", but informally called the "spaceship operator".

I suggest we mangle <=> as ss.

mangling for designated initialization

Now that we've voted designated initialization into the C++ draft, we need a mangling. These are distinct:

template<typename T> void f(decltype(T{.a = 1, .b = 2}));
template<typename T> void f(decltype(T{.c = 1, .d = 2}));

Something like il di 1a Li1E di 1b Li1E E would be enough for what we've voted into C++ (and di 1a di 1b could be used for a multi-level .a.b designator, which implementations will likely support as an extension).
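A sketch of how such a mangling could be assembled for single-level designators with non-negative int values. The function name is hypothetical and the di spelling is only the one floated above. The sketch emits each initializer's own value and omits the spaces used for readability in the text.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Builds il (di <source-name> <int literal>)* E from (field, value) pairs.
std::string mangleDesignatedInit(
        const std::vector<std::pair<std::string, int>>& fields) {
    std::string out = "il";
    for (const auto& f : fields) {
        out += "di" + std::to_string(f.first.size()) + f.first;  // designator
        out += "Li" + std::to_string(f.second) + "E";            // non-negative int literal
    }
    return out + "E";
}
```

For {.a = 1, .b = 2} this produces ildi1aLi1Edi1bLi2EE, while {.c = 1, .d = 2} produces a different string, which is all the two overloads above need.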

Allow the use of virtual member function pointer thunks

The current ABI specifies that member function pointers (MFPs) for virtual functions should be emitted as an offset into the v-table. This is nicely code-efficient but creates substantial problems for systems that aim to provide some level of control-flow integrity, such as pointer authentication, since an exploit can easily overwrite the offset and redirect calls to the MFP to any other virtual function in the v-table, or for that matter any function pointer known to be stored at a fixed relative offset to the v-table in a particular build. Such systems may instead prefer to emit MFPs to virtual functions using virtual dispatch thunks, embedding them into the existing ABI as if the thunk were a non-virtual member function.

If thunk pointers are globally unique, and thunks are used consistently instead of v-table offsets, then this alternative ABI can even provide the same level of MFP equality semantics provided by the standard ABI. However, this is not necessary because C++ states that MFP equality is unspecified for MFPs to virtual functions, and so implementations may reasonably use non-unique thunks. (There are quite a few cases where both ABIs will incorrectly report that two virtual MFPs are different; in fact, in general, equality for virtual MFPs is only well-defined in the context of a specific most-derived class.)

The required changes would be to:

permit the representation of an MFP to a virtual function to be a virtual dispatch thunk and
specify the mangling of such a thunk.
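
To make the idea concrete, here is a minimal sketch of what a virtual dispatch thunk amounts to, written as ordinary C++ rather than compiler output. The names Shape, Square, and area_thunk are hypothetical; a real thunk would be compiler-generated, and its symbol name is exactly what this issue asks the ABI to specify.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const { return 0; }
};

struct Square : Shape {
    int side = 3;
    int area() const override { return side * side; }
};

// Hypothetical thunk: an ordinary, non-virtual function whose only job is to
// perform the virtual call. Under the proposed alternative, an MFP for
// &Shape::area could store this function's address (as if it were a
// non-virtual member function) instead of a v-table offset.
int area_thunk(const Shape* self) {
    return self->area();
}
```

Calling area_thunk on a Square reaches Square::area through normal virtual dispatch, which is the same result the standard representation gives for (obj.*mfp)().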

atexit(3) vs dlclose(3)

Please state explicitly and clearly whether calling atexit(3) from inside a DSO is allowed, and what dlclose() is required to do with handlers registered that way.

Alternatively, state explicitly that this is undefined or implementation-defined behavior.

Some software in the wild relies on this behavior, and it is not portable across all POSIX-like systems that follow the C++ ABI.

NetBSD calls the atexit(3) callback at program termination, which crashes because the function is no longer reachable after dlclose(3). Linux handles this more sanely: it calls the callback upon dlclose(3).

module-scope lambda closure types will need mangling

Consider:

export module M;
export int *f(decltype([]{T t;}) lambda) {
  static int n;
  return &n;
}
export int *g() { return f(); }
export int *f(decltype([]{T t;}) lambda) {
  static int n;
  return &n;
}

(The function parameters have different types, so we have two overloaded f functions rather than a redefinition error.)

If this module is imported into multiple translation units, they must agree on the type of the function parameter; calling g() in those translation units must return the same static variable.

Similarly:

export module M;
using T = decltype([]{});

... must use the same type for lambda in every translation unit in which M is imported, so we need some linkage name for that closure type.

Proposal: number all lambdas appearing outside of any other numbering context in an importable translation unit (module interface unit, partition, or header unit) lexically within the translation unit. Restart the numbering (with some suitable disambiguator) at the module declaration in order to try to make the numbering as stable as possible.

Approach for library-specific mangling compression

libc++ is revising its ABI, at least for some of its clients, and is very interested in using new "catalog" substitutions for the new ABI.

Some of its clients that wish to use a new ABI also correspond to new targets, but libc++ is not suggesting that they would use target-specific mangling rules; instead, they will also be changing their versioning namespace from __1 to __2 for these clients, and so manglings will not change for any existing entities.

We should recognize that the list of "catalog" substitutions is likely to keep growing. This will surely not be the last ABI version of libc++; further, the C++ committee will surely add more entities to the standard library; and then, libc++ may only belatedly realize that a particular entity was worth compressing, such that it will only be in the catalog for ABI versions N and higher. And, of course, this catalog offer also has to be extended to other standard library implementations, and in some cases they may need to put slightly different entries in the catalog. So the cataloging work will scale by the number of implementations, and the number of ABI versions, and the size of the standard library.

Nevertheless, I personally feel that it's appropriate for the Itanium ABI to support a large catalog here. If we're careful about the structure of these substitutions, we can keep the costs from getting too obviously combinatorial. But I'd like to get consensus on this before encouraging libc++ to start investigating which substitutions to include.

My current thinking is that we should add this in a fairly structured way to the grammar:

  <substitution>   ::= S <library-vendor> <library-version number> <library-entity>

  <library-vendor> ::= c     # libc++
  etc.

  <library-entity> ::= s     # lib::basic_string<char, lib::char_traits<char>, lib::allocator<char>>
  <library-entity> ::= up    # lib::unique_ptr
  etc.

with the expectation that there's an ad hoc rule for turning a combination of a library vendor and version into a namespace. Manglers and demanglers then only need to know three things:

the mapping of library-entities to/from relative entities within the library namespace,
the mapping of a particular vendor+version to/from a particular library namespace, and
the set of library-entities that are substituted in any particular vendor+version.
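
A rough sketch of the first two lookups, assuming a table-driven demangler. Everything here is hypothetical: the function names, the "std::__N" namespace pattern (the proposal explicitly does not require version numbers to match the namespace), and the entity table, which contains only the two entries listed above. The third piece of knowledge, which entities are valid for a given vendor+version, is not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical ad hoc rule: vendor + version -> library namespace.
// Only 'c' (libc++) is named in the proposal.
std::optional<std::string> library_namespace(char vendor, unsigned version) {
    if (vendor == 'c') return "std::__" + std::to_string(version);
    return std::nullopt;
}

// Hypothetical entity table: <library-entity> -> name relative to namespace lib.
std::optional<std::string> library_entity(const std::string& code,
                                          const std::string& lib) {
    if (code == "s")
        return lib + "::basic_string<char, " + lib + "::char_traits<char>, " +
               lib + "::allocator<char>>";
    if (code == "up") return lib + "::unique_ptr";
    return std::nullopt;
}

// Expands S <library-vendor> <library-version number> <library-entity>.
std::optional<std::string> expand_catalog(char vendor, unsigned version,
                                          const std::string& code) {
    const auto ns = library_namespace(vendor, version);
    if (!ns) return std::nullopt;
    return library_entity(code, *ns);
}
```

Keeping the namespace rule and the entity table separate is what stops the cost from multiplying: a new ABI version adds one namespace mapping, not a new copy of every entity.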

We should be relatively parsimonious about adding new library-vendor abbreviations, especially one-byte ones; there are only 19 characters available following S. This could create a bit of a political minefield in the future.

Library version numbers don't have to correspond to any versioning scheme used elsewhere. In particular, they do not have to correspond to the number used in e.g. std::__2. Note that one advantage of adding these compressions is that it eliminates some of the pressure for library vendors to use short names for their versioning namespaces in the first place. In fact, we may want to encourage libraries to use namespaces that are systematized the same way as the mangling, e.g. std::__c2 — although they might not want to do that, since such names have a habit of making their way into user-visible diagnostics.

We may want to consider whether these substitutions should introduce candidate substitutions for the seq-id compression. seq-id substitutions will often be shorter than these 4–5-byte catalog substitutions, which isn't possible for the current catalog. Of course, introducing candidates this way may also lengthen other candidates.

need ABI update for consteval virtual functions

As of C++20, we'll have consteval (compile-time-only) virtual functions. These have the following impact:

A class with only consteval virtual functions is still polymorphic
The consteval-ness is part of the notional vtable slot (a consteval function cannot override a non-consteval function and vice versa)
Virtual dispatch on such a function can never happen at runtime

(and the above are highly unlikely to change). As a consequence, we do not need to allocate vtable slots to such functions. (If we do allocate such slots, they will never be used and cannot be filled.) So we should modify the ABI to avoid vtable slot allocation in this case.

mangling updates for lambdas with explicit template parameter lists

Under p0428r2[*] (part of C++2a), lambda-expressions can have explicit template parameters:

inline auto f() {
  return []<typename T>(T t) {
    static T thing;
    return &thing;
  }(0);
}

Our lambda mangling forms a <lambda-sig> from the type of the lambda call operator, whose function parameter types may now contain references to template parameters that we do not encode into the mangling.

Should we include the explicit template parameters in the <lambda-sig> in some way? Or should we allow lambdas with distinct template parameter lists to result in the same <lambda-sig> and distinguish them via the discriminator?

[*] open-std.org is down right now; this document can be viewed on the author's github page instead

support for lambdas in the initializer of an inline variable or variable template

The list in 5.1.8 of contexts in which the mangling of a lambda should include an enclosing declaration as context is incomplete. Two missing cases:

// x() returns the same pointer in every TU
inline auto x = []{ static int n; return &n; };

// y<T>() returns the same pointer for the same T in every TU
template<typename T> auto y = []{ static int n; return &n; };

See pull request #34.

Class objects returned in registers even with non-trivial move constructor

Commit 05fc233 added a non-trivial move constructor to the criteria for requiring a temporary when passing by value. Should there be a similar change made to the rules for returning class values?

There is some discussion on https://stackoverflow.com/questions/38043288/does-the-c-standard-guarantee-that-a-function-return-value-has-a-constant-addr, in particular the top answer.

The current wording in the ABI document, section 3.1.4, does not agree with the wording in the C++17 draft, which says (http://eel.is/c++draft/class.temporary#3):

When an object of class type X is passed to or returned from a function, if each copy constructor, move constructor, and destructor of X is either trivial or deleted, and X has at least one non-deleted copy or move constructor, implementations are permitted to create a temporary object to hold the function parameter or result object. The temporary object is constructed from the function argument or return value, respectively, and the function's parameter or return object is initialized as if by using the non-deleted trivial constructor to copy the temporary (even if that constructor is inaccessible or would not be selected by overload resolution to perform a copy or move of the object). [ Note: This latitude is granted to allow objects of class type to be passed to or returned from functions in registers. — end note ]

Out of band exception return

There's been discussion in WG21 and WG14 about out-of-band error returns:

It's still early, but I think it would be useful for folks involved in Itanium ABI to look at the ongoing discussions.

abi_tag mangling

@jicama provided wording for the GNU abi_tag attribute's mangling here:

jicama@69cea3c

Implementation of this is necessary for ABI compatibility with GCC >=5's libstdc++, so it seems important for the ABI document to cover it.

(Sorry this isn't just a pull request -- github doesn't support PRs for non-branches, and 69cea3c does not seem to be on a branch.)

[[no_unique_address]] with duplicate type

The current specification produces suboptimal layouts for classes where two non-static member variables of the same empty type use [[no_unique_address]].

In #49 and in the paper introducing the feature it was noted that in order to allow existing code to easily transition to [[no_unique_address]], the layout should be the same as if empty base classes had been used, where applicable.

In the case of two instances of the same type however, no base classes could have previously been used, as a class may not directly have two of the same base class.

Here an optimal layout is produced:

struct e {};
struct s {
	[[no_unique_address]] e a;
	[[no_unique_address]] e b;
	int c;
};
// 0: empty a
// 1: empty b
// 0: int c
// size: 4

The same layout can be produced, and is already being produced by GCC and Clang, without [[no_unique_address]]:

struct e {};
struct a : e {};
struct b : e {};
struct s : a, b {
    int x;
};

However moving the int variable to the front changes things:

struct e {};
struct s {
    int c;
    [[no_unique_address]] e a;
    [[no_unique_address]] e b;
};
// 0: int c
// 0: empty a
// 4: empty b
// size: 8

So it is this case where two non-static member variables of the same empty type, adorned with [[no_unique_address]], are placed not at the front of the class, which produces a suboptimal layout when another layout, already being used in other cases, would do.
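
For context on why a and b cannot simply both sit at offset 0: the language requires two distinct objects of the same type to have different addresses, so only the placement of the second one is up to the ABI. The following sketch checks that property; the helper name is hypothetical, and the actual offsets and sizeof(s) depend on the compiler.

```cpp
#include <cassert>

struct e {};
struct s {
    int c;
    [[no_unique_address]] e a;
    [[no_unique_address]] e b;
};

// Returns true when the two empty members of the same type occupy different
// addresses, which the language requires regardless of the layout chosen.
bool members_have_distinct_addresses(const s& obj) {
    return static_cast<const void*>(&obj.a) != static_cast<const void*>(&obj.b);
}
```

Comparing sizeof(s) with sizeof(int) on a given compiler then shows whether b was placed inside the storage of c or appended after it.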

D4/D5 destructor and other manglings not specified

Judging from the GCC and Clang source code (gcc/cp/mangle.c and lib/AST/ItaniumMangle.c), these compilers can produce additional manglings that are currently missing from the cxx-abi documents.

D4: “old-style "[unified]" destructor” / maybe-in-charge destructor [gcc]
D5: “D5 is a comdat name with D1, D2 and, if virtual, D0 in it.” [clang], also https://stackoverflow.com/questions/19485012/what-is-a-destructor-group-symbol-in-gcc-name-mangling
CI: something to do with inheriting constructors [gcc]
C4: “old-style "[unified]" constructor” / maybe-in-charge constructor [gcc]
C5: “C5 is a comdat name with C1 and C2 in it.” [clang]

Does their absence from cxx-abi indicate that these constitute vendor-specific extensions which just failed to make use of the v/U mangling character? (That is to say, _ZN3FooD4Ev should have been something like _ZN3Foov0Ev.) Should C4/C5/D4/D5/CI be marked in cxx-abi as reserved nonetheless, so as to not cause future problems for compilers?

clarify how discriminators are determined when `if constexpr` and pack expansion remove counted entities

Testcase for which GCC and Clang are ABI-incompatible:

template<bool B> int *f() {
  if constexpr (B) {
    return [] {
      static int n;
      return &n;
    } ();
  } else {
    return [] {
      static int n;
      return &n;
    } ();
  }
}

int *p = f<false>();

Clang mangles the lambda as the first lambda within f<false>, GCC mangles it as the second. I think Clang is correct: the lexically first lambda is discarded by the if constexpr.

Similar things happen with pack expansion:

template<typename ...T> int *f() {
  ( ([] { return 0; } () + T()), ... );
  return [] { static int n; return &n; } ();
}

int *g() {
  return f<int, char, double>();
}

Here, the mangling of the returned static int should depend on the number of template arguments passed to f. (Clang implements that; GCC acts as if the static int is within the second lambda in the instantiation.)

Presumably we should clarify the ABI to say that the discriminator is based on the (imaginary) lexical order in the instantiation, not the order in the template definition.

mangling for fixed point types

Hi!

We are attempting to implement fixed point types in clang according to Chapter 4 of the Embedded-C Spec / ISO N1169. This extension includes the addition of up to 24 fixed point types that vary in size, sign, fract/accum, and saturated/not saturated.

signed short _Accum
signed _Accum
signed long _Accum
unsigned short _Accum
unsigned _Accum
unsigned long _Accum
_Sat signed short _Accum
_Sat signed _Accum
_Sat signed long _Accum
_Sat unsigned short _Accum
_Sat unsigned _Accum
_Sat unsigned long _Accum
signed short _Fract
signed _Fract
signed long _Fract
unsigned short _Fract
unsigned _Fract
unsigned long _Fract
_Sat signed short _Fract
_Sat signed _Fract
_Sat signed long _Fract
_Sat unsigned short _Fract
_Sat unsigned _Fract
_Sat unsigned long _Fract

However, the standard does not specify a mangling for these types in C++, so their use is currently limited to C.

Any suggestions for neatly mangling these types?

For now, we can do something along the lines of u7fixed00 to u7fixed23 or u4SulA (for _Sat unsigned long _Accum as an example), but would like to see what other people's thoughts are.
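
Both interim spellings follow the existing vendor extended type form, u followed by a <source-name> (the decimal length of the identifier, then the identifier). A small sketch of that encoding, with a hypothetical helper name:

```cpp
#include <cassert>
#include <string>

// Builds a vendor extended type mangling: 'u', then the identifier's length
// in decimal, then the identifier itself.
std::string vendor_extended_type(const std::string& name) {
    return "u" + std::to_string(name.size()) + name;
}
```

With this helper, "fixed00" yields u7fixed00 and "SulA" yields u4SulA, matching the two examples above. How the remaining 23 types would be abbreviated is left open here.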

"POD for the purpose of layout" is underspecified

Testcase:

struct alignas(2 * sizeof(unsigned)) Base {
    unsigned x;
    ~Base() = default;
};
struct Der : Base {
    unsigned y;
};

If Base is POD for the purpose of layout, then sizeof(Der) == 4 * sizeof(unsigned). Otherwise, sizeof(Der) == 2 * sizeof(unsigned). All versions of GCC disagree with all versions of Clang on this question -- GCC believes that Base is POD for the purpose of layout, and Clang believes that it is not.

The ABI doesn't say who is right. It says that we must use the C++03 definition of POD to answer the question, which says: "A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor."

But "user-defined" is meaningless in C++11 onwards. It looks like Clang interprets it as meaning "user-declared", which makes Base non-POD, and GCC interprets it as meaning "user-provided", which makes Base POD.

So what's the rule? Is it "user-declared" or "user-provided"?

A similar situation occurs if we default the default constructor instead of the destructor. Again, GCC believes that Base is POD and Clang believes that it is not. In that case, the C++03 rules are applicable: Base is not an aggregate because in C++03, "An aggregate is an array or a class with no user-declared constructors [...]" (and for what it's worth, this rule has been changed many times since then, but in C++20, we're back to "no user-declared constructors").
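
A quick way to see which interpretation a given compiler follows is to check whether Der::y was placed in Base's tail padding. This is a sketch built on the testcase above; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct alignas(2 * sizeof(unsigned)) Base {
    unsigned x;
    ~Base() = default;
};
struct Der : Base {
    unsigned y;
};

// true:  y reuses Base's tail padding, so Base was treated as non-POD
//        for the purpose of layout.
// false: y was placed after sizeof(Base), so Base was treated as POD
//        for the purpose of layout.
constexpr bool reuses_base_tail_padding() {
    return sizeof(Der) == 2 * sizeof(unsigned);
}
```

Adding static_assert(reuses_base_tail_padding()) or its negation to a build pins down the behavior the code depends on, which can help catch a silent layout mismatch when mixing compilers.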

mojibake in EH example

https://github.com/itanium-cxx-abi/cxx-abi/blob/master/abi-eh.html#L2223 uses some non-UTF-8 smart quotes, which don't display correctly because the document doesn't specify an encoding.

I suggest converting it to HTML5 by adding <!DOCTYPE html> (which implies UTF-8 by default) and then replacing the quotes with UTF-8 characters.

error in abi-examples.html Table 1b: Example Data Layout

struct X {
  int ix;
  virtual void x();
};
struct E : X, D {
  void f ();
  void h ();
  int ie;
};

sizeof(E) should be 72, not 64.

reserve manglings containing a period for vendor-specific versions / pieces of functions?

Several vendors use mangled names of the form

real_mangled_name.suffix

to represent either a version of a function (eg, parameter 3 is the constant x) or a piece of a function (eg, coroutine resumption slice of a function) or similar. However, demanglers are inconsistent in their handling of this form -- some require the suffix to contain only digits, others support and ignore an arbitrary suffix, others simply reject all such names as being an invalid mangling.

We should officially permit such manglings for internal-linkage symbols, with an arbitrary suffix, in order to give clear guidance to demangler implementers.
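
If such a rule were adopted, a demangler could strip the suffix before parsing. A minimal sketch, assuming a period never occurs inside the real mangled name; the helper name is hypothetical and the symbols used to exercise it are made up.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "real_mangled_name.suffix" at the first period.
// Returns {mangled name, suffix including its leading '.'}; the suffix is
// empty when the symbol contains no period.
std::pair<std::string, std::string>
split_vendor_suffix(const std::string& symbol) {
    const std::string::size_type dot = symbol.find('.');
    if (dot == std::string::npos) return {symbol, ""};
    return {symbol.substr(0, dot), symbol.substr(dot)};
}
```

The first element can be handed to the ordinary demangler unchanged, and the second can be ignored or printed verbatim, which covers both digit-only and arbitrary suffixes.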

difficulties in understanding `nearly empty class`

The third item in the description of nearly empty class, i.e. "has at most one non-virtual, nearly empty direct base class", confuses me: non-virtual and nearly empty can't both be satisfied, right? After all, a nearly empty class must contain a virtual pointer. Also, I think "virtual pointer" here should refer to a virtual table pointer.

The full description is listed below.

nearly empty class
A class that contains a virtual pointer, but no other data except (possibly) virtual bases. In particular, it:
   has no non-static data members and no non-zero-width unnamed bit-fields,
   has no direct base classes that are not either empty, nearly empty, or virtual,
   has at most one non-virtual, nearly empty direct base class, and
   has no proper base class that is empty, not morally virtual, and at an offset other than zero. 
Such classes may be primary base classes even if virtual, sharing a virtual pointer with the derived class.
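
One reading of the third item, offered as an illustration rather than an authoritative answer: "nearly empty" describes the base class itself, while "non-virtual" describes how it is inherited, so both can hold at once. In the sketch below (class and helper names are hypothetical), A is nearly empty and B inherits from it non-virtually.

```cpp
#include <cassert>

struct A {              // nearly empty: a virtual pointer and no other data
    virtual void f() {}
};

struct B : A {};        // A is a non-virtual, nearly empty direct base of B

// Under the Itanium ABI, A is expected to be B's primary base, so B shares
// A's virtual pointer instead of adding one of its own.
constexpr bool shares_virtual_pointer() {
    return sizeof(B) == sizeof(A);
}
```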

class layout for `[[no_unique_address]]`

C++20 adds a [[no_unique_address]] attribute, which allows EBO layout to be requested for non-static data members. We need to update the ABI document to describe how it affects class layout.

As a concrete goal, we should aim to ensure that these two classes are laid out the same:

struct A : T1, T2, ... { ... };
struct A { [[no_unique_address]] T1 t1; [[no_unique_address]] T2 t2; ... };

... in all cases where that is possible. (There are cases where it is not: for example, if A has a primary base class other than T1 and any prior base class is non-empty.)
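
A sketch of how that goal could be checked for one simple case, with two distinct empty types and one int. The type and helper names are hypothetical, and whether the sizes actually match depends on the compiler implementing the attribute this way.

```cpp
#include <cassert>

struct T1 {};
struct T2 {};

// Empty base optimization version.
struct WithBases : T1, T2 {
    int value;
};

// Member version using [[no_unique_address]].
struct WithMembers {
    [[no_unique_address]] T1 t1;
    [[no_unique_address]] T2 t2;
    int value;
};

// True when the two spellings ended up with the same size, which is a
// necessary (not sufficient) condition for identical layout.
constexpr bool same_size() {
    return sizeof(WithBases) == sizeof(WithMembers);
}
```

A static_assert(same_size()) in a test suite would flag a compiler whose member layout diverges from its base-class layout for this case.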

mangling for converted non-type template arguments

See prior discussion here: http://sourcerytools.com/pipermail/cxx-abi-dev/2014-November/002785.html

Recent language changes (in particular, auto template parameters and the allowance of arbitrary constant expressions for pointer and member pointer non-type template arguments) mean that encoding the target of a non-type template argument is not sufficient to uniquely identify the argument. We also need the type in some cases, and for pointers to members, we need the conversion path used to form the type too.

One previously-discussed approach that seemed to have support was to use a cv... expression to describe the conversion if the natural type of the non-type template argument differs from the actual type, and can't be inferred from the parameter (eg, for a function template or when the parameter has a deduced type). For a pointer-to-member, a minimal sequence of cv... expressions would be used to express the derived-to-base or base-to-derived conversion path.

We need concrete rules describing exactly how this should work, of course :)

clang++/g++ disagree on how template arguments in substitutions are resolved

Basically the question is: when a template parameter reference like T_ occurs in a substitution, is the reference looked up in the template instance where the substitution is defined, or where it is used?

It appears that llvm-cxxfilt assumes the former, but c++filt assumes the latter. Consider the (hand-written) mangled symbol _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_:

[roc@localhost cpp_demangle]$ c++filt --version
GNU c++filt (GNU Binutils) 2.29.51
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
[roc@localhost cpp_demangle]$ c++filt _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_
void hello<&(void ignore<Rangitoto>(Rangitoto))>(&(void ignore<Rangitoto>(Rangitoto)))
[roc@localhost cpp_demangle]$ llvm-cxxfilt --version
LLVM (http://llvm.org/):
 LLVM version 7.0.0svn
 Optimized build.
 Default target: x86_64-unknown-linux-gnu
 Host CPU: skylake
[roc@localhost cpp_demangle]$ llvm-cxxfilt _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_
void hello<&(void ignore<Rangitoto>(Rangitoto))>(Rangitoto)

In this case both tools agree that the S2_ substitution refers to the T_, but they disagree on what that expands to.

To my reading, the spec isn't clear on this issue. The most relevant text I can find is

Note that substitutable components are the represented symbolic constructs, not their associated mangling character strings.

which suggests the definition instance is preferred, which also seems logical to me.
