itanium-cxx-abi / cxx-abi
C++ ABI Summary
Several vendors use mangled names of the form
real_mangled_name.suffix
to represent either a version of a function (eg, parameter 3 is the constant x) or a piece of a function (eg, coroutine resumption slice of a function) or similar. However, demanglers are inconsistent in their handling of this form -- some require the suffix to contain only digits, others support and ignore an arbitrary suffix, others simply reject all such names as being an invalid mangling.
We should officially permit such manglings for internal-linkage symbols, with an arbitrary suffix, in order to give clear guidance to demangler implementers.
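A minimal sketch of how a demangler front end might tolerate such names, assuming that the first '.' in the symbol starts the vendor suffix. The helper name split_vendor_suffix is hypothetical and not part of any existing demangler API.

```cpp
#include <string_view>
#include <utility>

// Splits "real_mangled_name.suffix" at the first '.'.
// Returns {mangled name, suffix}; the suffix is empty when no '.' is present.
// Assumption: the part before the first '.' is the real mangled name.
std::pair<std::string_view, std::string_view>
split_vendor_suffix(std::string_view symbol) {
    const auto dot = symbol.find('.');
    if (dot == std::string_view::npos) {
        return {symbol, std::string_view{}};
    }
    return {symbol.substr(0, dot), symbol.substr(dot + 1)};
}
```

A caller would demangle the first element as usual and either ignore the second or print it verbatim.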
In C11 extension ISO/IEC TS 18661, new floating point extensions and types are defined. For example, _FloatN is introduced as a binary interchange format, where N can be 16, 32, 64, 128 (or bigger). We are implementing _Float16 support in Clang and would also like to use it in C++ mode. Therefore we need mangling support for _Float16. Is this something the C++ ABI would consider supporting?
P0515, voted into the C++20 working draft, adds a new operator token <=>, which needs a mangling. This is formally called the "three-way comparison operator", but informally called the "spaceship operator".
I suggest we mangle <=> as ss.
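As an illustration, a demangler could handle the suggested code with one more row in its operator table. This is only a sketch: the table holds just the comparison operators, and the names kComparisonOperators and operator_spelling are hypothetical.

```cpp
#include <string_view>

struct OperatorEntry {
    std::string_view code;      // two-letter <operator-name> code
    std::string_view spelling;  // source-level spelling
};

// Existing comparison operator codes plus the code suggested in this issue.
const OperatorEntry kComparisonOperators[] = {
    {"lt", "<"},  {"le", "<="}, {"gt", ">"}, {"ge", ">="},
    {"eq", "=="}, {"ne", "!="},
    {"ss", "<=>"},  // suggested mangling for the three-way comparison operator
};

// Returns the spelling for a code, or an empty view if the code is unknown.
std::string_view operator_spelling(std::string_view code) {
    for (const OperatorEntry& entry : kComparisonOperators) {
        if (entry.code == code) {
            return entry.spelling;
        }
    }
    return {};
}
```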
p0734r0 added new forms of overloadable declaration that we need to mangle. For instance, we now need to distinguish:
template<typename T> concept A = ...
template<typename T> concept B = ...
template<A T> void f(T); // f1
template<B T> void f(T); // f2
template<typename T> requires A<T> void g(T); // g1
template<typename T> requires B<T> void g(T); // g2
It is permissible (but not necessary) for the mangling of f1 and g1 to be the same (other than the name).
(There are also requires-clauses on non-template functions, but I don't believe there is any need to mangle those since at most one such function can have its requires-clause evaluate to true, and the rest are never emitted.)
As a general model, I suggest we include "extra information" about a template-parameter (for a function template -- we don't need this for non-overloadable templates) as a prefix on the template-arg mangling. (We should also consider extending this to the case where the template parameter is a template template parameter and the template argument does not have an identical template-parameter-list, to handle the case described in http://sourcerytools.com/pipermail/cxx-abi-dev/2014-December/002791.html)
I suggest we affix the constraint expression from the requires-clause (if any) to the template-args, and do not perform any expansion or canonicalization of the as-written form of the template declaration. Strawman mangling suggestion:
<template-arg> ::= C <concept name> <template-arg>
<template-args> ::= I <template-arg>+ Q <requires-clause expr> E
Example:
template<A T, B U> requires C<T, U> void f();
f<int, 3>(); // _Z1fIC1AiC1BLi3EQ1CIT_T0_EEvv
Standard paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0732r2.pdf
C++20 adds the ability for near-arbitrary class types to be used as the type of non-type template arguments. We need a mangling for these.
We are guaranteed that such classes are not unions and do not contain unions, so perhaps the simplest approach would be to emit pseudo-aggregate-initialization syntax for the flattened sequence of subobjects:
struct A {
int x;
int : 0;
int y;
};
struct B : A { int arr[2]; };
template<auto> struct Q {};
void f(Q<B{1, 2, 3, 4}>); // mangled as _Z1f1QIXtl1BLi1ELi2ELi3ELi4EEEE
These are going to get very long, very fast, so we might want an alternative representation for large objects. One intended use case is for things like:
template<size_t N> struct str {
constexpr str(const char *p) : data{} { for (int n = 0; *p; ++p) data[n++] = *p; }
char data[N];
};
template<size_t N> str(const char (&)[N]) -> str<N>;
template<str S> struct R {};
void f(R<"some very long string here">);
... and encoding this with a long sequence of Lic123E manglings seems highly undesirable. This ties into another open question: how should string literal expressions be mangled? (Currently we say that we only mangle the length, which is insufficient.) Reusing whatever mangling we use for string literals as the mangling for char arrays within a non-type template argument might be wise. Alternatively / additionally, we could emit only a (suitably secure) hash of the literal value if it's very long.
Another form of mangling is also required here: all non-type template arguments with the same value throughout the entire program are lvalues denoting the same object, so we need a mangling for the global constant holding that object. Proposal:
<special-name> ::= TA <template-arg> # template parameter object for argument <template-arg>
Eg:
template<auto V> const auto *p = &V;
// mangled as _Z1pIXtl1BLi1ELi2ELi3ELi4EEEE
// initialized as pointer to _ZTAXtl1BLi1ELi2ELi3ELi4EEE
template const B *p<B{1, 2, 3, 4}>;
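The flattened encoding used in the examples above can be reproduced mechanically. The sketch below only handles int subobjects, and the helper names are hypothetical; it assumes the usual L i <value> E literal form with an n prefix for negative values.

```cpp
#include <string>
#include <vector>

// Encodes one int literal as L i <value> E, using 'n' to mark a negative value.
std::string mangle_int_literal(int value) {
    const long long wide = value;
    const std::string digits = std::to_string(wide < 0 ? -wide : wide);
    return std::string("Li") + (wide < 0 ? "n" : "") + digits + "E";
}

// Builds tl <class name> <subobject values...> E for a flattened sequence
// of int subobjects, as in the strawman for class-type template arguments.
std::string mangle_flattened_init(const std::string& class_name,
                                  const std::vector<int>& values) {
    std::string out = "tl" + std::to_string(class_name.size()) + class_name;
    for (int v : values) {
        out += mangle_int_literal(v);
    }
    return out + "E";
}
```

Wrapping the result as "_Z1f1QIX" + ... + "EE" gives the function mangling shown earlier, and "_ZTAX" + ... + "E" gives the proposed template parameter object name.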
Hi!
The cxx-abi specifies e as the mangling for long double. In GCC for PowerPC, we originally used 64-bit IEEE float as long double. When IBM extended double ("double-double") was later introduced as the preferred long double type, it was given the mangling g (which is demangled as __float128) so that libraries can support both floating point formats at the same time.
This doesn't work all that well. For example, demangling _Z1fg will show the wrong type (__float128 instead of long double); as another example, break f(long double) in GDB will not work.
Now we have a third format for long double: 128-bit IEEE float (IEC 60559 binary128). So we need a new mangling for that.
I'd like to propose the mangling
k<builtin-type>
to stand for "long double implemented the same as <builtin-type>". It would be demangled as just long double. (I chose k for no specific reason other than that it is short and was available.)
Comments?
Consider:
export module M;
export int *f(decltype([]{T t;}) lambda) {
static int n;
return &n;
}
export int *g() { return f(); }
export int *f(decltype([]{T t;}) lambda) {
static int n;
return &n;
}
(The function parameters have different types, so we have two overloaded f functions rather than a redefinition error.)
If this module is imported into multiple translation units, they must agree on the type of the function parameter; calling g() in those translation units must return the same static variable.
Similarly:
export module M;
using T = decltype([]{});
... must use the same type for lambda in every translation unit in which M is imported, so we need some linkage name for that closure type.
Proposal: number all lambdas appearing outside of any other numbering context in an importable translation unit (module interface unit, partition, or header unit) lexically within the translation unit. Restart the numbering (with some suitable disambiguator) at the module declaration in order to try to make the numbering as stable as possible.
Commit 05fc233 added a non-trivial move constructor to the criteria for requiring a temporary when passing by value. Should there be a similar change made to the rules for returning class values?
There is some discussion on https://stackoverflow.com/questions/38043288/does-the-c-standard-guarantee-that-a-function-return-value-has-a-constant-addr, in particular the top answer.
The current wording in the ABI document, section 3.1.4, does not agree with the wording in the C++17 draft, which says (http://eel.is/c++draft/class.temporary#3):
When an object of class type X is passed to or returned from a function, if each copy constructor, move constructor, and destructor of X is either trivial or deleted, and X has at least one non-deleted copy or move constructor, implementations are permitted to create a temporary object to hold the function parameter or result object. The temporary object is constructed from the function argument or return value, respectively, and the function's parameter or return object is initialized as if by using the non-deleted trivial constructor to copy the temporary (even if that constructor is inaccessible or would not be selected by overload resolution to perform a copy or move of the object). [ Note: This latitude is granted to allow objects of class type to be passed to or returned from functions in registers. — end note ]
The current ABI specifies that member function pointers (MFPs) for virtual functions should be emitted as an offset into the v-table. This is nicely code-efficient but creates substantial problems for systems that aim to provide some level of control-flow integrity, such as pointer authentication, since an exploit can easily overwrite the offset and redirect calls to the MFP to any other virtual function in the v-table, or for that matter any function pointer known to be stored at a fixed relative offset to the v-table in a particular build. Such systems may instead prefer to emit MFPs to virtual functions using virtual dispatch thunks, embedding them into the existing ABI as if the thunk were a non-virtual member function.
If thunk pointers are globally unique, and thunks are used consistently instead of v-table offsets, then this alternative ABI can even provide the same level of MFP equality semantics provided by the standard ABI. However, this is not necessary because C++ states that MFP equality is unspecified for MFPs to virtual functions, and so implementations may reasonably use non-unique thunks. (There are quite a few cases where both ABIs will incorrectly report that two virtual MFPs are different; in fact, in general, equality for virtual MFPs is only well-defined in the context of a specific most-derived class.)
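To make the attack surface concrete, here is a small model of the generic Itanium member function pointer representation, in which a virtual function is encoded as 1 plus its v-table offset in bytes. The struct and helper names are hypothetical, and the sketch ignores variants (such as ARM) that keep the virtual flag elsewhere.

```cpp
#include <cstddef>

// Model of the generic two-word member function pointer.
struct MemberFunctionPointerModel {
    std::ptrdiff_t ptr;  // function address, or 1 + v-table offset in bytes
    std::ptrdiff_t adj;  // adjustment applied to `this` before the call
};

// A set low bit marks a pointer to a virtual function.
constexpr bool is_virtual(MemberFunctionPointerModel m) {
    return (m.ptr & 1) != 0;
}

// Byte offset of the target slot within the v-table (virtual case only).
constexpr std::ptrdiff_t vtable_offset(MemberFunctionPointerModel m) {
    return m.ptr - 1;
}

// Builds the representation of a pointer to the virtual function whose
// slot sits at the given byte offset.
constexpr MemberFunctionPointerModel make_virtual(std::ptrdiff_t offset_bytes) {
    return {offset_bytes + 1, 0};
}
```

Because the target is only an offset, changing ptr from make_virtual(0).ptr to make_virtual(8).ptr silently selects a different slot, which is the redirection described above. A thunk-based scheme would store a function address in ptr instead.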
The required changes would be to:
Please state explicitly and unambiguously whether calling atexit(3) from inside a DSO is allowed, and how dlclose() behaves in that case.
Alternatively, state explicitly that this is UB/ID.
Some software in the wild relies on this behavior, and it is not portable to all POSIX-like, C++ ABI aware systems.
NetBSD calls the atexit(3) callback at program termination and crashes, because the function is no longer reachable after dlclose(3). Linux handles this differently and more sanely: it calls the callback upon dlclose(3).
See prior discussion here: http://sourcerytools.com/pipermail/cxx-abi-dev/2014-November/002785.html
Recent language changes (in particular, auto template parameters and the allowance of arbitrary constant expressions for pointer and member pointer non-type template arguments) mean that encoding the target of a non-type template argument is not sufficient to uniquely identify the argument. We also need the type in some cases, and for pointers to members, we need the conversion path used to form the type too.
One previously-discussed approach that seemed to have support was to use a cv ... expression to describe the conversion if the natural type of the non-type template argument differs from the actual type, and can't be inferred from the parameter (eg, for a function template or when the parameter has a deduced type). For a pointer-to-member, a minimal sequence of cv ... expressions would be used to express the derived-to-base or base-to-derived conversion path.
We need concrete rules describing exactly how this should work, of course :)
I'm not sure if this is the right repo to file an issue. Apologies if it isn't.
Some minor issues with this web page: http://mentorembedded.github.io/cxx-abi/abi-eh.html
3.1 Introduction
3.2 Data Structures
3.3 Runtime Initialization
2.4 Throwing an Exception
2.5 Catching an Exception
Should be
3.1 Introduction
3.2 Data Structures
3.3 Runtime Initialization
3.4 Throwing an Exception
3.5 Catching an Exception
Discussion of scoped enums and varargs in CWG led to them being declared conditionally-supported with implementation-defined behavior. So we should agree on semantics here.
Previously: http://sourcerytools.com/pipermail/cxx-abi-dev/2015-October/002867.html
I think the problem here does not only affect substitutable prefixes for unresolved-names. Clang and GCC also disagree about how to mangle this:
inline namespace Y {
template<typename T, typename U> struct is_same { static const bool value = false; };
template<typename T> struct is_same<T, T> { static const bool value = true; };
}
template<bool, typename T> struct enable_if {};
template<typename T> struct enable_if<true, T> { typedef T type; };
template <class T> typename enable_if<is_same<T, float>::value, float>::type arg(T __re);
float f = arg<float>(0);
... where GCC gives _Z3argIfEN9enable_ifIXsrN1Y7is_sameIT_fEE5valueEfE4typeES3_ (which doesn't match the grammar in the ABI), and Clang gives _Z3argIfEN9enable_ifIXsr7is_sameIT_fEE5valueEfE4typeES1_ (which can collide with a different function in a different TU).
I think GCC's approach is closer to being the right one. If we can resolve an initial portion of an <unresolved-name>, we should emit the fully-qualified path to the resolved component. Though there are some other open questions here: is the partially-resolved name substitutable? (Do we get S1_ or S3_ at the end of the mangling?) Does it receive a leading N, per GCC's mangling, or no leading N, per Clang's mangling / the ABI?
Support for C++17 fold expressions appears to be missing in the name mangling specification. GCC and Clang appear to use new fL, fR, fl, and fr productions to introduce binary left folds, binary right folds, unary left folds, and unary right folds respectively:
$ cat t.cpp
template<typename ...T>
auto ftbl(T... p) -> decltype((1 + ... + p.dm)) { return (1 + ... + p.dm); }
template<typename ...T>
auto ftbr(T... p) -> decltype((p.dm + ... + 1)) { return (p.dm + ... + 1); }
template<typename ...T>
auto ftur(T... p) -> decltype((p.dm + ...)) { return (p.dm + ...); }
template<typename ...T>
auto ftul(T... p) -> decltype((... + p.dm)) { return (... + p.dm); }
struct X { int dm; };
auto f(X x) {
ftbl(x);
ftbr(x);
ftur(x);
ftul(x);
}
$ clang -c -std=c++17 t.cpp
...
$ nm t.o
0000000000000000 T _Z1f1X
0000000000000000 W _Z4ftblIJ1XEEDTfLplLi1Edtfp_2dmEDpT_
0000000000000000 W _Z4ftbrIJ1XEEDTfRpldtfp_2dmLi1EEDpT_
0000000000000000 W _Z4ftulIJ1XEEDTflpldtfp_2dmEDpT_
0000000000000000 W _Z4fturIJ1XEEDTfrpldtfp_2dmEDpT_
The grammar productions appear to look like:
<expression> ::= ...
::= <fold-expression>
<fold-expression> ::= fL <operator-name> <expression> <expression> # binary left fold
::= fR <operator-name> <expression> <expression> # binary right fold
::= fl <operator-name> <expression> # unary left fold
::= fr <operator-name> <expression> # unary right fold
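The productions above can be checked against the nm output with a few lines of string assembly. This is a sketch with hypothetical helper names; it takes already-mangled operand expressions as input and assumes the operands of a binary fold are emitted in source order, as the observed symbols suggest.

```cpp
#include <string>

// fl / fr <operator-name> <expression>
std::string mangle_unary_fold(bool left, const std::string& op,
                              const std::string& pack) {
    return std::string(left ? "fl" : "fr") + op + pack;
}

// fL / fR <operator-name> <expression> <expression>
std::string mangle_binary_fold(bool left, const std::string& op,
                               const std::string& first,
                               const std::string& second) {
    return std::string(left ? "fL" : "fR") + op + first + second;
}
```

For example, the decltype operand of ftbl is (1 + ... + p.dm), a binary left fold over pl with operands Li1E and dtfp_2dm, which yields the fLplLi1Edtfp_2dm fragment seen in its symbol.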
Testcase for which GCC and Clang are ABI-incompatible:
template<bool B> int *f() {
if constexpr (B) {
return [] {
static int n;
return &n;
} ();
} else {
return [] {
static int n;
return &n;
} ();
}
}
int *p = f<false>();
Clang mangles the lambda as the first lambda within f<false>, GCC mangles it as the second. I think Clang is correct: the lexically first lambda is discarded by the if constexpr.
Similar things happen with pack expansion:
template<typename ...T> int *f() {
( ([] { return 0; } () + T()), ... );
return [] { static int n; return &n; } ();
}
int *g() {
return f<int, char, double>();
}
Here, the mangling of the returned static int should depend on the number of template arguments passed to f. (Clang implements that; GCC acts as if the static int is within the second lambda in the instantiation.)
Presumably we should clarify the ABI to say that the discriminator is based on the (imaginary) lexical order in the instantiation, not the order in the template definition.
Consider a case such as
inline const char *str = "foo";
str is required to have a single value across translation units, so the same string literal object must be used as its initializer in all cases. For example:
// TU 1
inline constexpr const char *str = "foo";
const char *x = str;
// TU 2
#include <cassert>
inline constexpr const char *str = "foo";
extern const char *x;
const char *y = str;
int main() { assert(x == y); }
The assertion here is not permitted to fail. Unfortunately, this doesn't only affect string literals appearing in inline variable initializers:
inline constexpr const char *f() { return "foo"; }
inline constexpr const char *x = f(); // must be the same string literal object in all TUs
... and templated variables expose the same issue too:
template<int> constexpr const char *x = "foo"; // x<0> must be the same pointer in all TUs
template<int> struct A {
static const char *const x;
};
template<int N> constexpr const char *A<N>::x = "foo"; // A<0>::x must be the same pointer in all TUs
I think there are two plausible solutions:
1. extend the existing _Z <function encoding> Es [ <discriminator> ] mangling to cover this case (note that this means we still need to number string literals within functions and classes, even though we usually don't need the number)
2. mangle the string literals based only on their contents, for example using whatever mangling scheme we settle on for #63 / #64
I'm inclined to prefer option 2. We should probably also remove the existing mangling for string literals if we take that option.
Hello, I put forward a patch for review not that long ago that adds a new type to Clang. It's a dependent type that I use to enable template parameters in conjunction with address spaces. The patch itself and further details can be viewed here: https://reviews.llvm.org/D33666
In essence, because it's a new type, it would need an appropriate name mangling if it were accepted. So I was hoping to start a discussion on a name mangling, if that's possible. At the moment the mangling function is as follows:
void CXXNameMangler::mangleType(const DependentExtAddressSpaceType *T) {
Out << "DEas";
mangleExpression(T->getAddrSpaceExpr());
Out << '_';
mangleType(T->getPointeeType());
}
I'm not familiar with the naming conventions used, so I doubt it's ideal at the moment. I followed the existing DependentSizedExtVectorType mangling function, which does something quite similar with its size expression and element type. In this case the AddrSpace expression would be the address space index. The PointeeType would be the type the address space is to be attached to when it's no longer dependent. The DEas acronym is the type's name minus the Type part of the name.
Thank you very much for your time and consideration on this issue. I apologize if this is the incorrect way to raise this type of issue, and would appreciate redirection to the appropriate avenue if that is the case.
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/abi-eh.html#L2223 uses some non-UTF-8 smart quotes, which don't display correctly because the document doesn't specify an encoding.
I suggest converting it to HTML5 by adding <!DOCTYPE html> (which implies UTF-8 by default) and then replacing the quotes with UTF-8 characters.
Hi!
We are attempting to implement fixed point types in clang according to Chapter 4 of the Embedded-C Spec / ISO N1169. This extension includes the addition of up to 24 fixed point types that vary in size, sign, fract/accum, and saturated/not saturated.
signed short _Accum
signed _Accum
signed long _Accum
unsigned short _Accum
unsigned _Accum
unsigned long _Accum
_Sat signed short _Accum
_Sat signed _Accum
_Sat signed long _Accum
_Sat unsigned short _Accum
_Sat unsigned _Accum
_Sat unsigned long _Accum
signed short _Fract
signed _Fract
signed long _Fract
unsigned short _Fract
unsigned _Fract
unsigned long _Fract
_Sat signed short _Fract
_Sat signed _Fract
_Sat signed long _Fract
_Sat unsigned short _Fract
_Sat unsigned _Fract
_Sat unsigned long _Fract
The standard though does not specify mangling when using these types in C++, so usage of these types is limited to C.
Any suggestions for neatly mangling these types?
For now, we can do something along the lines of u7fixed00 to u7fixed23 or u4SulA (for _Sat unsigned long _Accum as an example), but would like to see what other people's thoughts are.
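Both placeholder schemes use the existing vendor extended builtin type form, u followed by a length-prefixed name. A minimal sketch of that encoding, with a hypothetical helper name:

```cpp
#include <string>

// Encodes a vendor extended builtin type as u <length> <name>.
std::string mangle_vendor_type(const std::string& name) {
    return "u" + std::to_string(name.size()) + name;
}
```

Here mangle_vendor_type("fixed00") gives u7fixed00 and mangle_vendor_type("SulA") gives u4SulA, matching the two suggestions.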
Under p0428r2[*] (part of C++2a), lambda-expressions can have explicit template parameters:
inline auto f() {
return []<typename T>(T t) {
static T thing;
return &thing;
}(0);
}
Our lambda mangling forms a <lambda-sig> from the type of the lambda call operator, whose function parameter types may now contain references to template parameters that we do not encode into the mangling.
Should we include the explicit template parameters in the <lambda-sig> in some way? Or should we allow lambdas with distinct template parameter lists to result in the same <lambda-sig> and distinguish them via the discriminator?
[*] open-std.org is down right now; this document can be viewed on the author's github page instead
The third item in the description of nearly empty class, i.e. "has at most one non-virtual, nearly empty direct base class": non-virtual and nearly empty can't both be satisfied, right? A nearly empty class must contain a virtual pointer. BTW, I think virtual pointer here should refer to a virtual table pointer.
The full description is quoted below.
nearly empty class
A class that contains a virtual pointer, but no other data except (possibly) virtual bases. In particular, it: has no non-static data members and no non-zero-width unnamed bit-fields, has no direct base classes that are not either empty, nearly empty, or virtual, has at most one non-virtual, nearly empty direct base class, and has no proper base class that is empty, not morally virtual, and at an offset other than zero.
Such classes may be primary base classes even if virtual, sharing a virtual pointer with the derived class.
Basically the question is: when a template parameter reference like T_ occurs in a substitution, is the reference looked up in the template instance where the substitution is defined, or where it is used?
It appears that llvm-cxxfilt assumes the former, but c++filt assumes the latter. Consider the (hand-written) mangled symbol _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_:
[roc@localhost cpp_demangle]$ c++filt --version
GNU c++filt (GNU Binutils) 2.29.51
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
[roc@localhost cpp_demangle]$ c++filt _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_
void hello<&(void ignore<Rangitoto>(Rangitoto))>(&(void ignore<Rangitoto>(Rangitoto)))
[roc@localhost cpp_demangle]$ llvm-cxxfilt --version
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
[roc@localhost cpp_demangle]$ llvm-cxxfilt _Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_
void hello<&(void ignore<Rangitoto>(Rangitoto))>(Rangitoto)
In this case both tools agree that the S2_ substitution refers to the T_, but they disagree on what that expands to.
To my reading, the spec isn't clear on this issue. The most relevant text I can find is
Note that substitutable components are the represented symbolic constructs, not their associated mangling character strings.
which suggests the definition instance is preferred, which also seems logical to me.
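To compare against a third implementation, the same symbol can be fed to the runtime demangler through abi::__cxa_demangle from <cxxabi.h>. The wrapper below is a sketch with a hypothetical name; what it returns for the hand-written symbol depends on the C++ runtime in use, so no result is claimed for it here.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangles a symbol with the C++ runtime's demangler.
// Returns an empty string if the runtime rejects the name.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer the result is allocated with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return {};
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

Calling demangle("_Z5helloIXadL_Z6ignoreI9RangitotoEvT_EEEvS2_") shows which interpretation the linked runtime takes.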
The current specification produces suboptimal layouts for classes where two non-static member variables of the same empty type use [[no_unique_address]].
In #49 and in the paper introducing the feature it was noted that in order to allow existing code to easily transition to [[no_unique_address]], the layout should be the same as if empty base classes had been used, where applicable.
In the case of two instances of the same type however, no base classes could have previously been used, as a class may not directly have two of the same base class.
Here an optimal layout is produced:
struct e {};
struct s {
[[no_unique_address]] e a;
[[no_unique_address]] e b;
int c;
};
// 0: empty a
// 1: empty b
// 0: int c
// size: 4
The same layout can be produced, and is already being produced by GCC and Clang, without [[no_unique_address]]:
struct e {};
struct a : e {};
struct b : e {};
struct s : a, b {
int x;
};
However, moving the int variable to the front changes things:
struct e {};
struct s {
int c;
[[no_unique_address]] e a;
[[no_unique_address]] e b;
};
// 0: int c
// 0: empty a
// 4: empty b
// size: 8
So it is this case, where two non-static member variables of the same empty type, adorned with [[no_unique_address]], are placed not at the front of the class, that produces a suboptimal layout when another layout, already being used in other cases, would do.
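The two orderings can be compared directly on a given compiler. The sketch below uses hypothetical type names and states only properties that should hold regardless of layout choice: the two empty members must not share an address, because they have the same type. The exact sizes are implementation-dependent, so they are not asserted.

```cpp
struct Empty {};

// Empty members first: the layout the issue calls optimal.
struct EmptiesFirst {
    [[no_unique_address]] Empty a;
    [[no_unique_address]] Empty b;
    int c;
};

// Empty members last: the ordering the issue reports as suboptimal.
struct EmptiesLast {
    int c;
    [[no_unique_address]] Empty a;
    [[no_unique_address]] Empty b;
};

// Two distinct objects of the same type may not share an address,
// which is the constraint that forces a and b apart.
bool same_type_members_have_distinct_addresses() {
    EmptiesLast s{};
    return &s.a != &s.b;
}
```

Printing sizeof(EmptiesFirst) and sizeof(EmptiesLast) on the compiler of interest shows whether it reproduces the 4 versus 8 byte difference described above.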
@jicama provided wording for the GNU abi_tag attribute's mangling here:
Implementation of this is necessary for ABI compatibility with GCC >=5's libstdc++, so it seems important for the ABI document to cover it.
(Sorry this isn't just a pull request -- github doesn't support PRs for non-branches, and 69cea3c does not seem to be on a branch.)
I've been working on implementing contracts in Clang, and it looks like some sort of ABI-level support might be needed.
__cxa_contract_violation(const std::contract_violation &)
for the compiler to call in the absence of a user-provided handler, akin to __cxa_pure_virtual.
The list in 5.1.8 of contexts in which the mangling of a lambda should include an enclosing declaration as context is incomplete. Two missing cases:
// x() returns the same pointer in every TU
inline auto x = []{ static int n; return &n; };
// y<T>() returns the same pointer for the same T in every TU
template<typename T> auto y = []{ static int n; return &n; };
See pull request #34.
There's been discussion in WG21 and WG14 about out-of-band error returns:
It's still early, but I think it would be useful for folks involved in Itanium ABI to look at the ongoing discussions.
Consider:
#include <compare>
struct A {
  virtual std::strong_ordering operator<=>(const A&) const = default;
};
The vtable for A needs a slot for the virtual function A::operator<=> and a slot for the implicitly-declared virtual function A::operator==.
Proposal: allocate virtual table slots for virtual operator==s as if they were declared at the end of the class, in order of the declarations of the corresponding operator<=>s, after any virtual table slots inserted for implicitly-declared operator= functions.
Judging from the GCC and Clang source code (gcc/cp/mangle.c and lib/AST/ItaniumMangle.cpp), these compilers can produce additional manglings currently missing from the cxx-abi documents.
D4: “old-style "[unified]" destructor” / maybe-in-charge destructor [gcc]
D5: “D5 is a comdat name with D1, D2 and, if virtual, D0 in it.” [clang], also https://stackoverflow.com/questions/19485012/what-is-a-destructor-group-symbol-in-gcc-name-mangling
CI: something to do with inheriting constructors [gcc]
C4: “old-style "[unified]" constructor” / maybe-in-charge constructor [gcc]
C5: “C5 is a comdat name with C1 and C2 in it.” [clang]
Does their absence from cxx-abi indicate that these constitute vendor-specific extensions which just failed to make use of the v/U mangling character? (That is to say, _ZN3FooD4Ev should have been something like _ZN3Foov0Ev.) Should C4/C5/D4/D5/CI be marked in cxx-abi as reserved nonetheless, so as to not cause future problems for compilers?
As of C++20, we'll have consteval (compile-time-only) virtual functions. These have the following impact:
(and the above are highly unlikely to change). As a consequence, we do not need to allocate vtable slots to such functions. (If we do allocate such slots, they will never be used and cannot be filled.) So we should modify the ABI to avoid vtable slot allocation in this case.
Coverity uses the Itanium ABI name mangling format with various extensions to uniquely identify entities beyond those of traditional ABIs. For example, we encode unique names for templates, classes, enumerations, Objective-C methods, and Apple Blocks.
We make use of the existing support for vendor extensions for builtin types, type qualifiers, and operators where possible, but this doesn't cover our needs.
At present, the way that we have encoded our extensions has potential to conflict with future changes to the Itanium ABI. Fortunately, we haven't encountered any conflicts so far, but I'm sure it is just a matter of time.
We're wondering about the possibility of reserving a character such as '$' to indicate the start of a vendor extension in any production except identifier. The reason for suggesting '$' is that it is already allowed in identifiers, but is not currently used as a literal in any existing production. The use of such an extension would result in a non-portable name; it isn't expected that all implementations would be able to decode a name that used such extensions. The goal is simply to allow vendor extensions that won't conflict with future ABI changes.
Apologies for a brief digression here into the more sociological aspects of the ABI.
It's clear who consumes mangled names. But who consumes demangled names?
Naively, it would seem that demangling is provided just to make debugging easier. But then we see at least a couple of examples (1) (2) of semantic extraction from the mangling grammar. Are there existing tools that depend on being able to accurately invert the grammar? What are the ABI standard's responsibilities with respect to demangling support?
I'm interested from the perspective of someone that would find better demangling support helpful. In particular, code generation from demangled names could be possible if the grammar were less ambiguous about the namespace/class distinction. I'd love to understand where the maintainers and the broader community of ABI consumers stand on this sort of thing.
libc++ is revising its ABI, at least for some of its clients, and is very interested in using new "catalog" substitutions for the new ABI.
Some of its clients that wish to use a new ABI also correspond to new targets, but libc++ is not suggesting that they would use target-specific mangling rules; instead, they will also be changing their versioning namespace from __1 to __2 for these clients, and so manglings will not change for any existing entities.
We should recognize that the list of "catalog" substitutions is likely to keep growing. This will surely not be the last ABI version of libc++; further, the C++ committee will surely add more entities to the standard library; and then, libc++ may only belatedly realize that a particular entity was worth compressing, such that it will only be in the catalog for ABI versions N and higher. And, of course, this catalog offer also has to be extended to other standard library implementations, and in some cases they may need to put slightly different entries in the catalog. So the cataloging work will scale by the number of implementations, and the number of ABI versions, and the size of the standard library.
Nevertheless, I personally feel that it's appropriate for the Itanium ABI to support a large catalog here. If we're careful about the structure of these substitutions, we can keep the costs from getting too obviously combinatorial. But I'd like to get consensus on this before encouraging libc++ to start investigating which substitutions to include.
My current thinking is that we should add this in a fairly structured way to the grammar:
<substitution> ::= S <library-vendor> <library-version number> <library-entity>
<library-vendor> ::= c # libc++
etc.
<library-entity> ::= s # lib::basic_string<char, lib::char_traits<char>, lib::allocator<char>>
<library-entity> ::= up # lib::unique_ptr
etc.
with the expectation that there's an ad hoc rule for turning a combination of a library vendor and version into a namespace. Manglers and demanglers then only need to know three things:
We should be relatively parsimonious about adding new library-vendor
abbreviations, especially one-byte ones; there are only 19 characters available following S
. This could create a bit of a political minefield in the future.
Library version numbers don't have to correspond to any versioning scheme used elsewhere. In particular, they do not have to correspond to the number used in e.g. std::__2. Note that one advantage of adding these compressions is that it eliminates some of the pressure for library vendors to use short names for their versioning namespaces in the first place. In fact, we may want to encourage libraries to use namespaces that are systematized the same way as the mangling, e.g. std::__c2, although they might not want to do that, since such names have a habit of making their way into user-visible diagnostics.
We may want to consider whether these substitutions should introduce candidate substitutions for the seq-id compression. seq-id substitutions will often be shorter than these 4–5-byte catalog substitutions, which isn't possible for the current catalog. Of course, introducing candidates this way may also lengthen other candidates.
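The figure of 19 free characters can be checked with a short sketch. It assumes that digits and upper-case letters are unavailable because they begin a seq-id, that '_' is taken by the bare S_ form, and that the lower-case letters already in use are those of the existing abbreviations St, Sa, Sb, Ss, Si, So and Sd. The function name is hypothetical.

```cpp
#include <string>

// Lower-case letters that could follow 'S' without clashing with an
// existing <substitution> abbreviation (St, Sa, Sb, Ss, Si, So, Sd).
// Digits, upper-case letters and '_' are assumed unavailable because
// they are used by the S <seq-id> _ and S_ forms.
std::string free_vendor_letters() {
    const std::string taken = "tabsiod";
    std::string free_letters;
    for (char c = 'a'; c <= 'z'; ++c) {
        if (taken.find(c) == std::string::npos) {
            free_letters.push_back(c);
        }
    }
    return free_letters;
}
```

Under those assumptions the 'c' suggested for libc++ is among the free letters.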
[Imported from cxx-abi-dev]
Per http://wg21.link/p0136r1 an inheriting constructor declaration no longer results in the implicit synthesis of derived class constructors, and instead the behavior of a call to an inherited constructor is that:
Proposal:
To avoid emitting the code for (1) and (3) in every inherited constructor call site, add a new form of mangled name for a fake constructor that forwards to a base class constructor, whose <encoding> is that of the base class constructor, except that the <nested-name> is that of the derived class and the <unqualified-name> is
<ctor-dtor-name> ::= CI1 <base class type> # complete object inheriting constructor
<ctor-dtor-name> ::= CI2 <base class type> # base object inheriting constructor
This would give code largely similar to what we generate with the C++11 inheriting constructor rules, except that the additional copy constructions and destructions for parameters would be removed.
The usage of this mangling would be entirely optional; the purpose of including this mangling in the ABI is only to coalesce multiple weak definitions of the same symbol. If an implementation can't forward all the arguments (eg for varargs constructors) or just doesn't want to emit these symbols, the full initialization can be inlined instead (or another technique can be used).
As usual, CI2 constructors do not construct virtual base class subobjects. As a consequence, when a constructor is inherited from a virtual base, the corresponding CI2 symbol does not need the formal parameters, so they are not passed.
The mangling for a function template does not include the instantiation-dependent portions of non-type template parameters (including such things transitively within template template parameters). This is becoming increasingly important as people try to write things like:
template<typename T, std::void_t<typename T::x>* = nullptr> void func() {}
template<typename T, std::void_t<typename T::y>* = nullptr> void func() {}
(std::void_t<T> is void for all T. It's not obvious whether it's supposed to be a dependent type, but the above cases are at least instantiation-dependent types.)
For a type T providing both a nested x and a nested y, we will mangle instantiations of the two possible func<T>s the same, despite them being distinct templates.
Including the (pre-substitution) types of non-type template parameters in the mangling (if they're instantiation-dependent) seems like the obvious fix, but it would likely result in an ABI break for a significant amount of existing code.
We should probably at least fix this for ABI v2.
Currently, when laying out an empty subobject, we only consider offset zero, followed by offsets >= dsize of the class. We never consider the multiples of [nv]alignof the subobject that are greater than zero and less than the dsize of the class.
Considering those additional offsets would reduce the size of some classes. For example:
struct noncopyable {};
struct A : noncopyable {};
struct B { int n; };
struct C : noncopyable {};
struct D1 : A, B, C {}; // sizeof(D1) == 8, could be 4
struct D2 { // sizeof(D2) == 8, could be 4
[[no_unique_address]] A a;
B b;
[[no_unique_address]] C c;
};
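To make the difference concrete, here is a minimal sketch of the offset search for D1, not the ABI's actual layout algorithm. It assumes a 4-byte int with 4-byte alignment, models only the conflict between the two noncopyable subobjects, and uses hypothetical helper names.

```cpp
#include <cstddef>
#include <set>

// Pick an offset for an empty subobject. `occupied` holds offsets where a
// subobject of the same type already lives. With interior == false only
// offset 0 and offsets >= dsize are tried (the current rule); with
// interior == true every multiple of `align` is tried (the suggested rule).
std::size_t place_empty(const std::set<std::size_t>& occupied,
                        std::size_t dsize, std::size_t align, bool interior) {
    for (std::size_t off = 0;; off += align) {
        const bool allowed = off == 0 || off >= dsize || interior;
        if (allowed && occupied.count(off) == 0) {
            return off;
        }
    }
}

// Size of D1 : A, B, C. A's noncopyable base sits at offset 0, B adds
// 4 bytes of data, and C (size 1, alignment 1) carries its own
// noncopyable base, so C cannot also go at offset 0.
std::size_t sizeof_d1(bool interior) {
    const std::size_t dsize = 4;
    const std::size_t class_align = 4;
    std::set<std::size_t> occupied;
    occupied.insert(0);
    const std::size_t c_off = place_empty(occupied, dsize, 1, interior);
    const std::size_t end = c_off + 1 > dsize ? c_off + 1 : dsize;
    return (end + class_align - 1) / class_align * class_align;
}
```

With the current rule C lands at offset 4 and the class rounds up to 8 bytes; with interior offsets allowed C lands at offset 1 and the class stays at 4 bytes.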
Since the variables in [hardware.interference] are constants, the values should probably be part of the ABI. Depending on the target architecture, of course.
I see that Clang developers were discussing this at http://lists.llvm.org/pipermail/cfe-dev/2018-May/thread.html#58073 but that discussion doesn't seem to have resolved.
It seems pretty clear that both values should be 64 for x86*, but other architectures are less clear; on ARM we might want to use 64 for constructive and 128 for destructive, as conservative answers given current existing variants.
Now that we've voted designated initialization into the C++ draft, we need a mangling. These are distinct:
template<typename T> void f(decltype(T{.a = 1, .b = 2}));
template<typename T> void f(decltype(T{.c = 1, .d = 2}));
Something like il di 1a Li1E di 1b Li1E E would be enough for what we've voted into C++ (and di 1a di 1b could be used for a multi-level .a.b designator, which implementations will likely support as an extension).
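As a rough illustration of the suggested shape, the sketch below assembles such a mangling from a list of single-level designators with non-negative int initializers. The function name and the restriction to int literals are assumptions made for this example; the il ... E wrapper and the di prefix are taken from the suggestion above, written without the spaces used there for readability.

```cpp
#include <string>
#include <utility>
#include <vector>

// Build "il" (di <source-name> <int literal>)* "E" for T{.name = value, ...}.
// Only non-negative int initializers are handled in this sketch.
std::string mangle_designated(
    const std::vector<std::pair<std::string, int>>& inits) {
    std::string out = "il";
    for (const auto& init : inits) {
        out += "di";
        // <source-name> is the identifier length followed by the identifier.
        out += std::to_string(init.first.size()) + init.first;
        // Integer literal of type int: L i <value> E.
        out += "Li" + std::to_string(init.second) + "E";
    }
    return out + "E";
}
```

For the initializers {.a = 1, .b = 2} this yields ildi1aLi1Edi1bLi2EE, where the second literal is Li2E because it encodes the value 2.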
Given
struct A { virtual ~A() = 0; }; A::~A() {}
we seem to require three symbols to be emitted: _ZN1AD0Ev, _ZN1AD1Ev, and _ZN1AD2Ev (or at least, Clang and GCC both emit all three). Of these, only _ZN1AD2Ev can ever be referenced; the deleting destructor and complete object destructor are not entered into the vtable, and a complete object of type A can never be destroyed directly.
We should not require an implementation to emit these extra symbols. Note in particular that code must be emitted for the operator delete call for D0, which needs whole program analysis (-ffunction-sections, LTO, etc) to remove.
The same holds regardless of whether the destructor is virtual (but if not, then there's at least only the complete object destructor symbol to worry about, which can always be an alias to the base subobject destructor symbol).
I'm working on adding prototype support for P0482 to gcc and am interested in reserving a mangling for the char8_t type.
char16_t and char32_t use Ds and Di respectively. s and i correspond to short and int. Dc (c for char) or Dh (h for unsigned char) would be the obvious candidates, but both are taken (for decltype(auto) and IEEE 754 support respectively). I believe Du is currently available, so I'll suggest that as a starting point.
For now, I'm using the vendor extended type mangling (u7char8_t) and am fine with that at least until support for char8_t is accepted in upstream gcc.
ABI v2 possibility: when creating a mangling, do not register a substitution if the substitution would not be any shorter than the text we just mangled.
Example:
template<typename T> T f(T*, T*, T) {}
f<int> mangles as _Z1fIiET_PS0_S1_S0_. Following this rule, we would instead have _Z1fIiET_PT_PT_T_, which is both shorter and easier to read.
Note that nearly all uses of a <template-param> register a substitution right now, so this rule would fire frequently at least for them.
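The length comparison behind this rule can be sketched as follows. The helper names are hypothetical; the seq-id numbering (S_, S0_, S1_, ..., in base 36 with digits then upper-case letters) follows the existing substitution scheme.

```cpp
#include <cstddef>
#include <string>

// Text of the n-th substitution, counting from 0:
// S_, S0_, S1_, ..., S9_, SA_, ..., SZ_, S10_, ...
std::string seq_id(std::size_t n) {
    if (n == 0) {
        return "S_";
    }
    const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::string body;
    for (std::size_t v = n - 1;; v /= 36) {
        body.insert(body.begin(), digits[v % 36]);
        if (v < 36) {
            break;
        }
    }
    return "S" + body + "_";
}

// Suggested rule: register a candidate only if a back-reference to it
// would be strictly shorter than the text that was just mangled.
bool worth_registering(const std::string& mangled, std::size_t next_index) {
    return seq_id(next_index).size() < mangled.size();
}
```

In the example, f already occupies S_, so T_ would become S0_ (three bytes against two) and is skipped; PT_ would then also be S0_ (three bytes against three) and is skipped as well, which yields _Z1fIiET_PT_PT_T_.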
We specify that guard variables are 64 bits wide, but we don't specify their alignment, and implementations vary. For example, for:
inline void f() {
static int n = g();
}
When targeting 32-bit x86, GCC and ICC use 8-byte alignment whereas Clang uses 4-byte alignment. (Generally, Clang uses the alignment of uint64_t whereas the others appear to always use 64-bit alignment.)
Presumably we should say something about this.
C++20 adds a [[no_unique_address]] attribute, which allows EBO layout to be requested for non-static data members. We need to update the ABI document to describe how it affects class layout.
As a concrete goal, we should aim to ensure that these two classes are laid out the same:
struct A : T1, T2, ... { ... };
struct A { [[no_unique_address]] T1 t1; [[no_unique_address]] T2 t2; ... };
... in all cases where that is possible. (There are cases where it is not: for example, if A has a primary base class other than T1 and any prior base class is non-empty.)
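A small probe for the simplest instance of this goal might look like the sketch below. It assumes a compiler that honors [[no_unique_address]] on an Itanium-ABI target; the type names ByBase and ByMember are made up for the example, and it only compares sizes rather than full layouts.

```cpp
#include <cstddef>

struct T1 {};
struct T2 {};

// Empty classes as base classes: the traditional empty base optimization.
struct ByBase : T1, T2 {
    int n;
};

// The same empty classes as members that request EBO-style layout.
struct ByMember {
    [[no_unique_address]] T1 t1;
    [[no_unique_address]] T2 t2;
    int n;
};

// True when both formulations end up with the same size.
bool same_size() { return sizeof(ByBase) == sizeof(ByMember); }
```

A compiler that ignores the attribute would be expected to give ByMember a larger size, so same_size() doubles as a quick check that the attribute is in effect.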
Both libstdc++ and libc++ use inline namespaces for versioning, which removes the utility of basically all the built-in std substitutions other than St. We should provide a way for the S* substitutions to be used with an inline namespace. I don't have a concrete suggestion yet; whatever we pick, we'll presumably want the std::<inline namespace> part to itself be substitutable, which makes this a bit awkward to fit into the existing scheme.
struct X {
int ix;
virtual void x();
};
struct E : X, D {
void f ();
void h ();
int ie;
};
sizeof(E) should be 72, not 64
Testcase:
struct alignas(2 * sizeof(unsigned)) Base {
unsigned x;
~Base() = default;
};
struct Der : Base {
unsigned y;
};
If Base is POD for the purpose of layout, then sizeof(Der) == 4 * sizeof(unsigned). Otherwise, sizeof(Der) == 2 * sizeof(unsigned). All versions of GCC disagree with all versions of Clang on this question -- GCC believes that Base is POD for the purpose of layout, and Clang believes that it is not.
The ABI doesn't say who is right. It says that we must use the C++03 definition of POD to answer the question, which says: "A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor."
But "user-defined" is meaningless in C++11 onwards. It looks like Clang interprets it as meaning "user-declared", which makes Base non-POD, and GCC interprets it as meaning "user-provided", which makes Base POD.
So what's the rule? Is it "user-declared" or "user-provided"?
A similar situation occurs if we default the default constructor instead of the destructor. Again, GCC believes that Base is POD and Clang believes that it is not. In that case, the C++03 rules are applicable: Base is not an aggregate because in C++03, "An aggregate is an array or a class with no user-declared constructors [...]" (and for what it's worth, this rule has been changed many times since then, but in C++20, we're back to "no user-declared constructors").
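To see which interpretation a given compiler applies to the destructor testcase, a probe along these lines could be used. The two helper functions are hypothetical names; the structs repeat the testcase, and the size comparisons follow the two outcomes described above.

```cpp
#include <cstddef>

struct alignas(2 * sizeof(unsigned)) Base {
    unsigned x;
    ~Base() = default;
};

struct Der : Base {
    unsigned y;
};

// Base treated as POD for layout: its tail padding is not reused,
// so Der::y is placed after the full size of Base.
bool base_treated_as_pod() { return sizeof(Der) == 4 * sizeof(unsigned); }

// Base treated as non-POD for layout: Der::y is placed in Base's
// tail padding and Der is no larger than Base.
bool tail_padding_reused() { return sizeof(Der) == 2 * sizeof(unsigned); }
```

Exactly one of the two functions is expected to return true on a given compiler; which one it is corresponds to the "user-provided" or the "user-declared" reading.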
The document isn't clear that 'auto' in a generic lambda should be mangled as the underlying artificial template type parameter. Doing that can lead to recursive manglings (which is one reason why I thought it was a bug).
Here's a suggested diff (sorry, had to gzip it to attach it)
The ABI says that string literals in instantiation-dependent expressions are mangled thusly:
<expr-primary> ::= L <string type> E # string literal
... presumably because, in C++98, the type of the literal was the only property that could affect the validity of the instantiation-dependent expression. That is no longer the case; a C++11 program can inspect the contents of such a string literal in an instantiation-dependent expression, so we need to mangle said contents.
Proposal:
<expr-primary> ::= L <string type> <char>* <hash>? E # string literal
<char> ::= <0-9a-zA-DF-Z> # values 48-57, 97-122, 65-68, 70-90
::= _<hex><hex> # other chars encoded in (big-endian) hexadecimal
::= __<hex><hex><hex><hex>
::= ___<hex><hex><hex><hex><hex><hex>
::= ____<hex><hex><hex><hex><hex><hex><hex><hex>
<hex> ::= <0-9a-f>
<hash> ::= <hex>{M}
... where the first N (say, 16) characters of the string are encoded directly, followed by a 4M-bit hash of the entire string (algorithm TBD, but following target endianness) if its length is greater than N (where for all purposes other than determining the type, the terminating nul character is ignored).
The idea here is to preserve the string literal contents (at least the start of it) so that demanglers can display it, while avoiding mangling the entire contents of very long strings.
As an example, if we take N = 16, M = 8, and use MD5 as our hashing algorithm (taking the high-order 32 bits of its output), "Hello, world!" would mangle as LA14_cHello_2c_20world_21E, and U"this is a very long string indeed" would mangle as LA34_Dithis_20is_20a_20very_20l1cf8df38.
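The character-encoding part of the proposal can be sketched as below. This is an illustration only: the function names are invented, the input is taken as a std::u32string for simplicity, and the trailing <hash> is left out because the proposal leaves the hashing algorithm undecided.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Encode one code unit following the proposed <char> production.
// 'E' is not emitted literally because it terminates the <expr-primary>.
std::string encode_char(char32_t c) {
    const bool literal = (c >= U'0' && c <= U'9') || (c >= U'a' && c <= U'z') ||
                         (c >= U'A' && c <= U'D') || (c >= U'F' && c <= U'Z');
    if (literal) {
        return std::string(1, static_cast<char>(c));
    }
    // One underscore per byte needed, then that many bytes of lower-case hex.
    const std::size_t bytes = c <= 0xFF ? 1 : c <= 0xFFFF ? 2 : c <= 0xFFFFFF ? 3 : 4;
    char hex[9];
    std::snprintf(hex, sizeof hex, "%0*lx", static_cast<int>(2 * bytes),
                  static_cast<unsigned long>(c));
    return std::string(bytes, '_') + hex;
}

// Encode the first n code units of a literal (the directly encoded prefix).
std::string encode_prefix(const std::u32string& s, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < s.size() && i < n; ++i) {
        out += encode_char(s[i]);
    }
    return out;
}
```

With n = 16 this reproduces the directly encoded portions of both examples above: Hello_2c_20world_21 and this_20is_20a_20very_20l.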
If we like this direction, there are a few open questions:
How should we mangle the conversion function template to function pointer, and the function that it returns? Example:
inline void test() {
auto x = [](auto){};
using F = void(int);
F *(decltype(x)::*p)() const = &decltype(x)::operator F*;
void (*q)(int) = x;
static auto p_static = p;
static auto q_static = q;
assert(p == p_static);
assert(q == q_static);
}
Here, the first assertion is required to hold, so we need a consistent mangling for the conversion function. And the second assertion should probably hold too, so we can deduplicate the static invoker function across vendors and so that we get consistent behavior when that value is (eg) used to initialize a global constant that is visible across TUs.
In the conversion function case, we need to decide how to mangle the type, and particularly how to mangle the return type of the function pointer type that the lambda conversion function template converts into. That type is not actually nailed down by the standard to the extent that we could mangle it; instead, we are told that "The return type of the pointer to function shall behave as if it were a decltype-specifier denoting the return type of the corresponding function call operator template specialization."
Current manglings for p:
_ZZ4testvENKUlT_E_cvPFDTcldtdeLPv0EonclscOS_fp_EES_EIiEEv -- GCC
_ZZ4testvENKUlT_E_cvPFDaS_EIiEEv -- Clang and EDG
Neither mangling is great. They pointlessly repeat the lambda parameter signature from the Ul mangling. GCC's exposes an implementation detail, namely the exact decltype expression used under the hood (including a cast of a null pointer to the closure type, and a reference to a not-in-scope function parameter!). Clang's and EDG's give the conversion function itself a deduced return type, which is strictly-speaking incorrect, but in today's C++ can't collide with anything else because it's impossible to declare an operator auto(*)(T)() function, and in any case there can't be one declared in the same scope as the conversion function.
For the static invoker function, we need a name for the function as well as a type. EDG and Clang call this __invoke and GCC calls it _FUN, but those seem like things you would only include in a mangling by accident; our convention is to use special manglings as names for such entities instead.
Some possibilities:
Alternative 1:
For the conversion function, use cvPF<sig>E, where <sig> is the signature of the call operator.
For the static invoker, use li as the name, and use the type of the call operator as written as the type (that is, lie about it having a deduced return type if the operator() has a deduced return type).
This matches the manglings used today by EDG and Clang, with 8__invoke replaced by li.
Alternative 2 (removing some of the redundancy):
For the conversion function, use lc as the name instead of cv<...>, removing the need to mangle the return type and to (redundantly) repeat the lambda parameter list from the preceding mangling.
For the static invoker, use li as the name, and add this case to the list of cases where we do not include the return type in the encoding.
That gives:
_ZZ4testvENKUlT_E_lcIiEEv
_ZZ4testvENKUlT_E_2liIiEES_
Note that we still include a redundant v in the lc mangling (consistent with cv manglings), and a redundant S_ (or more generally a redundant sequence of substitutions) in the li mangling, but the consistency those bring seems worthwhile.
Alternative 3 (matching the standard's model):
Add a type mangling for the "decltype-specifier denoting the return type of the corresponding function call operator template specialization" type described in the standard, say Dl, and (as above) use li as the name of the invoker (but otherwise treat it as a regular static function). That gives:
_ZZ4testvENKUlT_E_cvPFDlS_EIiEEv
_ZZ4testvENKUlT_E_2liIiEEDlS_
(where the Dl encoding would be used for all generic lambdas, regardless of whether the operator() has a dependent return type).
Of these, I think I prefer Alternative 1: it's the smallest extension to the ABI, and is closest to existing implementation practice.