See prior discussion here: http://sourcerytools.com/pipermail/cxx-abi-dev/201

Comments (15)

daveedvdv commented on July 20, 2024

The link in this issue appears dead. Here is the thread summarized from my personal mail archive:

Subject: N4198 and mangling for member pointer template arguments

Richard Smith, Nov. 25, 2014:
N4198 (accepted at Urbana) makes it possible for a template parameter of type T U::* to have a template argument of type T V::*, where V is a base class of U or vice versa. A naive attempt to apply the existing ABI rules leads to mangling collisions in cases like this:

```
struct A { int n; };
struct B : A {};
template<int A::*> void f() {}
template<int B::*> void f() {}
void g() {
  constexpr int A::*p = &A::n;
  constexpr int B::*q = p;
  f<p>();
  f<q>();
}
```

(Here, a naive approach would use XadL_ZN1A1nEEE as the template argument value in both calls.)

In order to resolve this, I suggest we introduce a new mangling for the case of a member pointer template argument where the class containing the member is different from the class in the template parameter. The minimal information we'll need to include is the class in the template parameter and a designator if the base class is a repeated base class.

One approach would be to use

  sc <type> ad L<member>E

and to explicitly include the final type plus those intermediate types that introduce multiple inheritance from the base class (that is, just enough to uniquely identify the path).

Another would be to introduce a new mangling that incorporates the final type and an offset or discriminator.

Thoughts?

John McCall, Dec. 1, 2014:
Do we have the same problem for references and pointers to base subobjects? Okay, I see that the answer is “no”, but only because you kept that restriction in N4198. I think we can assume that that’s not permanent.

Richard Smith, replies same day:
I agree; I expect we'll eventually pare back the restrictions to something like "no pointers/references to union members, and no one-past-the-end pointers", or even remove all restrictions altogether if no-one gets upset that different template arguments can compare equal. (We've actually already crossed this bridge by specifying that pointers to members of a union compare equal even if they point to different members, but no-one has got upset about it yet...)

John McCall, Dec. 1, 2014:
I like the idea of using (possibly invented) static_casts; it’s not optimally compact, but it at least theoretically works with existing demanglers. Have you checked to see if it actually works?

Richard Smith, replies same day:
For _Z1fIXscM1BiadL_ZN1A1nEEEEvv (from my example above):

GCC's c++filt gives void f<static_cast<int B::*>(&A::n)>()
libc++abi's demangler gives void f<static_cast<int B::*>(&(A::n))>() ... which is wrong, but it's equally wrong without the static_cast.
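To make the composition of these strings concrete, the following C++ sketch assembles them by hand for a single-argument void f<arg>(). The helper names are hypothetical and only the handful of productions used in this thread are covered, so this is an illustration rather than a real mangler.

```cpp
#include <cassert>
#include <string>

// <source-name> ::= <length> <identifier>
std::string sourceName(const std::string& id) {
  return std::to_string(id.size()) + id;
}

// Pointer-to-member constant &Cls::member as an expression:
// ad L _Z N <class> <member> E E
std::string memberPointerConstant(const std::string& cls, const std::string& member) {
  return "adL_ZN" + sourceName(cls) + sourceName(member) + "EE";
}

// Pointer-to-member type: M <class type> <member type> ("i" is int).
std::string memberPointerType(const std::string& cls, const std::string& memberType) {
  return "M" + sourceName(cls) + memberType;
}

// Encoding of `void name<arg>()` where arg is an expression argument:
// _Z <name> I X <expr> E E v (return type) v (empty parameter list)
std::string mangleVoidTemplate(const std::string& name, const std::string& argExpr) {
  return "_Z" + sourceName(name) + "IX" + argExpr + "EEvv";
}

// Naive mangling: f<p>() and f<q>() both see the value &A::n unchanged.
std::string naiveMangling() {
  return mangleVoidTemplate("f", memberPointerConstant("A", "n"));
}

// Proposed mangling for f<q>(): a synthetic static_cast to int B::*.
std::string castMangling() {
  return mangleVoidTemplate(
      "f", "sc" + memberPointerType("B", "i") + memberPointerConstant("A", "n"));
}
```

Under these assumptions the naive string is what both calls would receive, while the cast form reproduces the name quoted above for the f<q>() call.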

John McCall, Dec. 1, 2014:
I agree with only including those intermediate steps necessary to uniquely determine the path.

We’d have to specify in what dependent situations we include the path. “Never” is the easiest answer, so that in

  template <class T, int T::*member> void foo(decltype(T() + temp<&A::baz>()));

we’d mangle &A::baz without a path clarification even if we could type-check “temp<&A::baz>()” at template definition time.

Richard Smith, replies same day:
That seems reasonable to me, but I'm not exactly sure what classifies as a "dependent situation"; do you mean that we should mangle the path only if the is not nested within an instantiation-dependent ?

There's another issue that we should probably fix at the same time: qualification conversions are permitted in template arguments, and we currently mangle a signature that performs a qualification conversion the same way as we mangle a signature that does not. We could either fold the qualification conversion into the last (synthetic) static_cast, or add an explicit synthetic const_cast to model it. I'm inclined to favour the latter, even though it will give longer manglings in the (hopefully rare) case where both conversions occur (because it also works if the user has cast away constness, and because it's simpler). Example:

```
// tu1
extern int n;
template<int*> void f() {}
void g() { f<&n>(); }

// tu2
extern int n;
template<const int*> void f() {}
void h() { f<&n>(); }
```

Here:

g calls _Z1fIXadL_Z1nEEEvv
h calls _Z1fIXccPKiadL_Z1nEEEvv
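A similar hand-built sketch for the qualification-conversion case, again with hypothetical helper names, wraps the address of n in a synthetic const_cast (cc) only when the parameter type differs from the type of the argument expression:

```cpp
#include <cassert>
#include <string>

// <source-name> ::= <length> <identifier>
std::string sourceName(const std::string& id) {
  return std::to_string(id.size()) + id;
}

// Address of a namespace-scope variable: ad L _Z <encoding> E
std::string addressOf(const std::string& var) {
  return "adL_Z" + sourceName(var) + "E";
}

// Synthetic const_cast: cc <type> <expression>
std::string constCast(const std::string& type, const std::string& expr) {
  return "cc" + type + expr;
}

// Encoding of `void name<arg>()` with an expression argument.
std::string mangleVoidTemplate(const std::string& name, const std::string& argExpr) {
  return "_Z" + sourceName(name) + "IX" + argExpr + "EEvv";
}
```

Here the argument for g needs no cast, while the argument for h is cast to PKi (pointer to const int).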

from cxx-abi.

daveedvdv commented on July 20, 2024

To illustrate the auto case that Richard implied in his original issue submission, we ran into cases like this one:

```
template <auto* p> struct A {};
char c;
A<&c> a1;
const char& cr = c;
A<&cr> a2;
static_assert(!__is_same(A<&c>, A<&cr>));
```

(Some implementations incorrectly fail the static assertion check.)

If I understand correctly, the type of a1 would be mangled as 1AIXadL_Z1cEEE, and the type of a2 would be mangled as 1AIXccPKcadL_Z1cEEE?


zygoloid commented on July 20, 2024

Concrete suggestion: [UPDATED 2023-03-29 to specify the natural template parameter corresponding to a pack argument.]

STEP 1: Form a base mangling

- Use L <type> <value> E for fundamental types and null pointers and null pointers to members.
- Use L <enum type> <value> E for enumeration types. (This is current implementation practice but not covered by the ABI doc.)
- Use ad <expression> for non-null pointers to members.
- Use L _Z <encoding> E for non-null pointers and for references, where the <encoding> identifies the complete object.
- See #63 for class types.
- For types and templates, use <type>.
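As a rough sketch, the STEP 1 forms could be produced by helpers like the ones below. The helper names are hypothetical, types and encodings are passed in already mangled, and negative literal values use the ABI's n prefix for numbers.

```cpp
#include <cassert>
#include <string>

// <number> ::= [n] <non-negative decimal integer>
std::string mangleNumber(long long v) {
  if (v < 0) return "n" + std::to_string(0ULL - static_cast<unsigned long long>(v));
  return std::to_string(v);
}

// Fundamental and enumeration values: L <type> <value> E.
// (The sketch assumes a null pointer would be passed here with value 0.)
std::string literal(const std::string& type, long long value) {
  return "L" + type + mangleNumber(value) + "E";
}

// Non-null pointers and references start from the complete object:
// L _Z <encoding> E
std::string completeObject(const std::string& encoding) {
  return "L_Z" + encoding + "E";
}

// Non-null pointers to members: ad <expression>
std::string memberPointer(const std::string& memberEncoding) {
  return "ad" + completeObject(memberEncoding);
}
```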

STEP 2: Add the path to the subobject (or containing class)

For a non-null pointer or a reference, apply the minimal series of member access expressions and array indexing in order to reach the most-derived object of the identified subobject.

Class member accesses are expressed using dt. An unqualified name is used (e.g., dt <expr> 3foo) unless it would be ambiguous, in which case the resolved type name is used as a qualifier: dt sr <type> <simple-id>. This changes:
```
 <unresolved-name> ::= sr <unresolved-type> <base-unresolved-name>     # T::x / decltype(p)::x
```
to
```
 <unresolved-name> ::= sr <type> <base-unresolved-name>     # T::x / decltype(p)::x
```
with the understanding that the <type> is encoded as an <unresolved-type> except in a non-type template argument.
Array indexing is expressed using ix, with a constant index of type ptrdiff_t, except when the parameter is a pointer and the array indexing would be the last step in denoting the subobject. This case, and the case of a one-past-the-end pointer, are handled further below.

For a reference or for a non-null pointer or pointer-to-member, if the object identified so far (or, for a pointer to member, the class in the value's type) is of a different class type from that of the parameter, those classes must be related by inheritance, and a minimal sequence of casts is inserted. For each class in the inheritance path between the class type associated with the argument and the class type associated with the parameter that directly inherits from multiple base classes that have a path to the target base class, a cast to the chosen base class of that class is inserted. (For a base-to-derived pointer-to-member conversion, these casts are prepended in base-to-derived order; otherwise, they are prepended in derived-to-base order.) Casts are expressed using cv.

For a pointer, the address is taken using ad. For a one-past-the-end pointer or a pointer to an array element, a final indexing is performed using pl, with a constant index of type ptrdiff_t as the second operand.

If the template parameter is a non-type template parameter with a deduced type, and the type of the parameter is not the same as that of the <template-arg> expression, a final cast is added to the parameter type using cv. (This could involve a qualification conversion, derived-to-base conversion, base-to-derived pointer-to-member conversion, change in noexcept for a function pointer, or change in array bound.)

Finally, if the expression for a non-type template argument is not an <expr-primary>, it is wrapped in X ... E.

STEP 3: Ensure a unique overload is identified

If the template is not overloadable, use the <template-arg> identified above. A template is overloadable if it is a function template that is not the call operator or conversion function of a generic lambda.

Otherwise, determine the natural template parameter for the <template-arg>.

- For a template template argument, the natural template parameter is the one with an exactly-matching template head.
- For a type template argument, the natural template parameter is (unconstrained) typename / class.
- For a non-type template argument, the natural template parameter is an unconstrained parameter whose type is that of the <template-arg> expression, or an lvalue reference to that type if the expression is an lvalue.
- For an empty pack argument (JE), the natural template parameter is typename....
- For any other pack argument, the natural template parameter is determined based on the first pack element.

If the natural template parameter for the <template-arg> does not match the actual template parameter (that is, if the mangling of the natural template parameter differs from the mangling of the actual template parameter), the <template-arg> is prefixed with a <template-param-decl> (see #31 / #85) describing the actual template parameter.

Comparison to status quo

This proposal aims to preserve all unambiguous status quo manglings wherever possible. The following existing manglings are changed:

- For function templates, we will sometimes include a template parameter decl mangling that we didn't previously include. This may prove problematic for explicit specializations and explicit instantiations of function templates, but most features that lead to this being included are recent (deduced template parameter types, mismatched template template parameters and template template arguments, constrained template parameters, ...), so the risk to ABI if we make this change now is probably not huge.
- For pointers to members, we include a conversion path. The relevant feature is currently unimplemented in at least one compiler pending an ABI rule being invented.
- For pointers and references, we include a subobject path. This exists only to support a C++20 rule change permitting essentially unrestricted subobject paths, so the ABI risk is minimal.


zygoloid commented on July 20, 2024

For Daveed's example, this gives the suggested type for a1 but gives 1AIXcvPKcadL_Z1cEEE for a2. Contrary to my earlier suggestion, I think it's more consistent to always use cv rather than sc or cc to represent casts, for a few reasons:

- we want to express a cast that ignores access checks
- the final cast for a NTTP with a deduced type may involve both a const_cast and a static_cast, and it's wasteful to produce both, or even to require implementations to figure out which one is needed
- we want to express a notional cast rather than a particular syntax, and cv is used for both T(x) and (T)x

So I think we should not bother with named casts and just use cv.

The above approach is based around the idea of trying not to invent new mangling productions wherever possible. Another way we could go for identifying subobjects would be to introduce a new mini-grammar to more compactly describe subobject paths. E.g., foo.x[3].y could be encoded as L_Z3foo1xA3_1yE rather than XdtixdtL_Z3fooE1xLi3E1yE if we wanted to go that way; it'd mean more changes to demanglers and a less regular grammar, but shorter manglings for these new cases.
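The two encodings of foo.x[3].y can be compared with a small hypothetical builder that records each member access and array index in both forms. The index literal is written as an int in the usual L <type> <value> E form, as in the comment's example; a real implementation following STEP 2 would use ptrdiff_t instead.

```cpp
#include <cassert>
#include <string>

// <source-name> ::= <length> <identifier>
std::string sourceName(const std::string& id) {
  return std::to_string(id.size()) + id;
}

// Hypothetical builder: expression form (dt/ix) and compact mini-grammar.
class SubobjectPath {
 public:
  explicit SubobjectPath(const std::string& object)
      : expr_("L_Z" + sourceName(object) + "E"), compact_(sourceName(object)) {}

  // Member access .name  ->  dt <expr> <source-name>
  SubobjectPath& member(const std::string& name) {
    expr_ = "dt" + expr_ + sourceName(name);
    compact_ += sourceName(name);
    hasSteps_ = true;
    return *this;
  }

  // Array index [i]  ->  ix <expr> Li<i>E  (compact form: A<i>_)
  SubobjectPath& index(unsigned long i) {
    expr_ = "ix" + expr_ + "Li" + std::to_string(i) + "E";
    compact_ += "A" + std::to_string(i) + "_";
    hasSteps_ = true;
    return *this;
  }

  // Anything other than a bare L_Z...E literal is not an <expr-primary>,
  // so it is wrapped in X ... E.
  std::string expressionForm() const {
    return hasSteps_ ? "X" + expr_ + "E" : expr_;
  }

  // Compact form: L _Z <object> <steps> E
  std::string compactForm() const { return "L_Z" + compact_ + "E"; }

 private:
  std::string expr_;
  std::string compact_;
  bool hasSteps_ = false;
};
```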


rjmccall commented on July 20, 2024

For each class in the inheritance path between the class type associated with the argument and the class type associated with the parameter that directly inherits from multiple base classes that have a path to the target base class, a cast to the chosen base class of that class is inserted.

This could generate a needlessly long path under certain circumstances, e.g.

```
struct Base {};
template <unsigned N> struct Derived : Derived<N-1>, Base {};
template <> struct Derived<0> : Base {};
```

Derived<100> -> Derived<99> -> ... -> Derived<0> -> Base can be shortened to Derived<100> -> Derived<0> -> Base, but this rule will require a conversion at each step.

It should be straightforward to get an actually minimal path by just updating the target as we go, starting from the most-derived end:

Given a fully explicit base path P := C_n -> ... -> C_0, the minimized base path Min(P) is defined as follows: let C_i be the element with the largest index i for which the conversion from C_i to C_0 is unambiguous; if that element is C_n, the minimized path is C_n -> C_0; otherwise, the minimized path is Min(C_n -> ... -> C_i) -> C_0.

Minimizing the path is useful no matter how we end up mangling it.
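A minimal C++ sketch of Min(P) follows, assuming only non-virtual inheritance (so ambiguity can be detected by counting inheritance paths) and a hypothetical Hierarchy map from each class to its direct bases. It scans from the most-derived end for the first class whose conversion to C_0 is unambiguous and recurses on the prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Each class maps to its direct (non-virtual) bases.
using Hierarchy = std::map<std::string, std::vector<std::string>>;

// Number of `to` subobjects inside a `from` object; 1 means unambiguous.
long countPaths(const Hierarchy& h, const std::string& from, const std::string& to) {
  if (from == to) return 1;
  auto it = h.find(from);
  if (it == h.end()) return 0;
  long n = 0;
  for (const std::string& base : it->second) n += countPaths(h, base, to);
  return n;
}

// Min(P) for P stored most-derived first: path[0] == C_n, path.back() == C_0.
std::vector<std::string> minimizePath(const Hierarchy& h,
                                      const std::vector<std::string>& path) {
  if (path.size() <= 2) return path;
  const std::string& target = path.back();
  // Most-derived C_i (i >= 1) with an unambiguous conversion to C_0;
  // fall back to C_1 so that the recursion always shrinks.
  std::size_t j = path.size() - 2;
  for (std::size_t k = 0; k + 1 < path.size(); ++k) {
    if (countPaths(h, path[k], target) == 1) { j = k; break; }
  }
  if (j == 0) return {path[0], target};  // C_n -> C_0 directly
  std::vector<std::string> prefix(path.begin(),
                                  path.begin() + static_cast<std::ptrdiff_t>(j + 1));
  std::vector<std::string> result = minimizePath(h, prefix);
  result.push_back(target);
  return result;
}

// The example above: Derived<k> : Derived<k-1>, Base; Derived<0> : Base.
Hierarchy exampleHierarchy(int n) {
  Hierarchy h;
  h["Derived<0>"] = {"Base"};
  for (int k = 1; k <= n; ++k)
    h["Derived<" + std::to_string(k) + ">"] = {"Derived<" + std::to_string(k - 1) + ">", "Base"};
  return h;
}

// Fully explicit path Derived<n> -> ... -> Derived<0> -> Base.
std::vector<std::string> explicitPath(int n) {
  std::vector<std::string> path;
  for (int k = n; k >= 0; --k) path.push_back("Derived<" + std::to_string(k) + ">");
  path.push_back("Base");
  return path;
}
```

With n = 100 this yields Derived<100> -> Derived<0> -> Base, matching the shortened path described above.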


rjmccall commented on July 20, 2024

As for how to mangle the path: you're right that it's probably simplest to use casts, but they can definitely get verbose, especially with member pointers. (Member pointers won't be as bad as they could be because the member type should be reliably substituted, but it's still a lot.) Also, using casts means that all of those intermediate cast types will be substitution candidates, which might be really awkward for implementations. I wouldn't be opposed to some sort of optimized conversion mangling that's only used (for now) in template arguments.


zygoloid commented on July 20, 2024

Agreed, your approach to minimizing the sequence of conversions is strictly better, and seems no harder to implement. Please pretend I suggested the smarter thing ;-)

Regarding an optimized mangling for conversion paths:

If we want a minimal representation for pointers, references, and pointers-to-members, I think all we need is:

- the target type
- a constant offset (for pointers-to-members, a this adjustment) if it's ambiguous
- a list of union members traversed
- one bit to indicate whether we have a one-past-the-end pointer
The above still lets us produce correct expression forms for template arguments, albeit ugly and uninformative ones such as (T*)((char*)&x + N)+1 (for a one-past-the-end pointer to T at offset N).
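To see why a target type plus a constant offset is enough to pin down the address, the sketch below uses hypothetical standard-layout types Inner and Outer, computes the offset of x.items[2].b with offsetof, and rebuilds the address in the (T*)((char*)&x + N) style. It illustrates the representation only and is not a mangling.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout types, not taken from the thread.
struct Inner { int a; int b; };
struct Outer { char tag; Inner items[4]; };

Outer x;

// Constant offset of x.items[2].b from the start of the complete object x.
constexpr std::size_t kOffset =
    offsetof(Outer, items) + 2 * sizeof(Inner) + offsetof(Inner, b);

// Byte address at offset n inside x: (char*)&x + n
char* byteAt(std::size_t n) { return reinterpret_cast<char*>(&x) + n; }

// Demangler-style reconstruction (T*)((char*)&x + N), with T = int.
int* subobjectAt(std::size_t n) { return reinterpret_cast<int*>(byteAt(n)); }
```

A one-past-the-end pointer to that int then sits sizeof(int) bytes further on, which is the "+1" in the demangled form.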

Alternatively, we could number the possible values and just use a base value plus an index. That should even be workable for pointers-to-members -- whenever we have a non-null constant value of type A T::* pointing to a member of class U, both T and U must necessarily be complete class types, so we can number the base classes if T as a base of U (or vice versa) is ambiguous. But this approach would have no way to demangle to valid C++ code.

At the other end of the scale, if we want a good, readable demangling, we need a path of member accesses, base classes (only where ambiguous), array indexes, and a final "one past the end" flag. This could be very compact, eg 3mem5otherA13_ for ".mem.other[13]".

So I think we need to consider how much information we want in the demangled form versus how much mangled name length we're prepared to pay. (And also the cost of non-uniformity between dependent and resolved names.)


zygoloid commented on July 20, 2024

I'm reconsidering "STEP 2" above. It seems important that renaming a private non-static data member is not an ABI-affecting change (except in weird cases, like forming a pointer-to-member or using the member name in a SFINAE context). So given:

```
class A {
  int n;
public:
  constexpr const int &get() const { return n; }
};
A a;
template<const int &p> struct B {};
```

... it seems important that the mangling for B<a.get()> does not depend on the private member name n; I think this concern is more important than the ability to produce "nice" demanglings.

For pointers to members, something somewhat similar but more subtle (and probably far less important) can happen if we encode the conversion path:

```
template<typename T, typename Alloc> struct vector : private __vector_base<T>, private __alloc_holder<Alloc> {};
```

Here, suppose __vector_base<T> and __alloc_holder<Alloc> have a base class in common (for example, because the allocator derives from vector<U>, __alloc_holder derives from the allocator, and __vector_base derives from a non-templated __vector_helper), and we expose a pointer-to-member pointing to a member of __vector_helper. Then changing the implementation to:

```
template<typename T> struct __vector_base_2 : __vector_base<T> {};
template<typename T, typename Alloc> struct vector : private __vector_base_2<T>, private __alloc_holder<Alloc> {};
```

... affects the ABI. More broadly: private implementation-detail classes in the middle of the inheritance hierarchy can show up in pointer-to-member conversion paths.

So I suggest we replace STEP 2 with something like this:

STEP 2: Add the offset to the subobject (or containing class)

For a non-null pointer or a reference, if the value is a subobject pointer/reference or a past-the-end pointer value, identify the subobject with a new expression mangling (used only for this purpose):

<expression> ::= so <referent type> <expr> [<offset number>] <union-selector>* [p] E
<union-selector> ::= _ [<number>]

where

- referent type is the pointee type for the pointer or the referenced type for the reference, or the unqualified type of the subobject for a pointer to void
- offset is the offset from the complete object to the subobject, and is omitted if it is zero
- one union-selector is present for each union member in the path to the subobject
- the number in each union-selector is the index of the active union member in declaration order minus 2 (or omitted for the first member) and
- the p is present when (and only when) forming a one-past-the-end pointer (including the case of forming arr + N when arr is an array of size N).
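A hypothetical builder for the so production might look like this. It takes the referent type and the complete object's encoding already mangled, omits a zero offset, and reads the "index minus 2" rule as applying to a 1-based declaration index (so the first member gives _, the second _0, the third _1); that reading is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// so <referent type> <expr> [<offset number>] <union-selector>* [p] E
std::string mangleSubobject(const std::string& referentType,
                            const std::string& completeObjectEncoding,
                            std::size_t offset,
                            const std::vector<std::size_t>& activeUnionMembers,
                            bool onePastTheEnd) {
  std::string s = "so" + referentType + "L_Z" + completeObjectEncoding + "E";
  if (offset != 0) s += std::to_string(offset);  // zero offset is omitted
  for (std::size_t index : activeUnionMembers) {
    // index is 0-based; 0-based index minus 1 == 1-based index minus 2.
    s += '_';
    if (index != 0) s += std::to_string(index - 1);
  }
  if (onePastTheEnd) s += 'p';
  return s + "E";
}

// For a pointer parameter: take the address with ad, then wrap in X ... E.
std::string pointerTemplateArg(const std::string& subobjectExpr) {
  return "Xad" + subobjectExpr + "E";
}
```

For example, with int arr[3] and a 4-byte int (an assumption), &arr[1] uses offset 4 and arr + 3 uses offset 12 with the p flag.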

For a pointer, the address is then taken using ad.

For a pointer-to-member, if the class of the member is different from that in the parameter type, it is converted using a new expression mangling (used only for this purpose):

<expression> ::= mc <parameter type> <expr> [<offset number>] E

where

- parameter type is the (pointer-to-member) type of the parameter, including any qualification conversions
- expr is the adL_Z...E expression forming the pointer-to-member
- offset is the this adjustment from the parameter's class type to the pointer-to-member's class type (negative values are prefixed with n)
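The mc production can be sketched the same way. The helper name is hypothetical, the parameter type and member-pointer expression are passed in already mangled, and the sketch assumes that a zero this adjustment is omitted, by analogy with the so offset; the offsets used in the checks are illustrative rather than derived from a particular layout.

```cpp
#include <cassert>
#include <string>

// <number> ::= [n] <non-negative decimal integer>
std::string mangleNumber(long long v) {
  if (v < 0) return "n" + std::to_string(0ULL - static_cast<unsigned long long>(v));
  return std::to_string(v);
}

// mc <parameter type> <expr> [<offset number>] E
std::string mangleMemberPointerConversion(const std::string& parameterType,
                                          const std::string& memberPointerExpr,
                                          long long thisAdjustment) {
  std::string s = "mc" + parameterType + memberPointerExpr;
  if (thisAdjustment != 0) s += mangleNumber(thisAdjustment);  // assumed omitted when 0
  return s + "E";
}
```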

If either the template parameter is a non-type template parameter with a deduced type, or the template name is unresolved or refers to an overloadable template, and the type of the parameter is not the same as that of the <template-arg> expression, a final cast is added to the parameter type using cv. This can only involve a qualification conversion (change in qualifiers or array bounds), or a function pointer conversion (change in noexcept), or a cast to possibly-qualified void *.

Finally, if the expression for a non-type template argument is not an <expr-primary>, it is wrapped in X ... E.

Commentary

This adds new expression manglings that are not used for expressions in general, only for expressions in template arguments. That seems a little unfortunate, but there's really no way to express the past-the-end flag and the union member accesses with any current mangling, without relying on field names. We could mangle converted pointers to members as (e.g.) &C::x + 4, but it seems a little unsafe to use a mangling corresponding to actual (but incorrect) source syntax.

The above manglings should be invariant under most changes that don't have broader effects on ABI (renaming, rearranging class hierarchies without layout changes, and so on). The exceptions are:

- Reordering union members now affects ABI, as does adding new unions. It seems hard to avoid ABI changes from both renaming and reordering union members. Maybe there's something else we could key off? (Using the type of the active union member seems worse, and is in any case incomplete, as there can be more than one union member with the same type.)
- Renaming members to which pointers-to-members are formed can affect ABI, as can refactoring the enclosing class (and thereby changing the offset from the parameter type to the type containing the member), but this seems largely to be a pre-existing problem. We could switch to a different mangling where possible (e.g., mangling all pointers to data members as L <type> <offset> E), but (a) we can't address the most common case (no conversions) without an ABI change for existing code, (b) we can't address pointers to non-virtual member functions that way, and (c) the language rules expect us to distinguish pointers to different (union) members at the same offset, so it doesn't seem worthwhile to pursue.

Folding a qualification conversion into the so / mc mangling is just a minor tweak to reduce the length of manglings and simplify the output from demanglers. mc <type> <expr> <n> E can be demangled as (<type>)<expr>. so <type> <expr> <offset> ... E can be demangled as *(<type>*)((char*)<expr> + <offset>), but it's probably better to demangle it more abstractly as something like {<type> at offset <offset> in <expr>}.
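To show how a demangler might produce the abstract rendering suggested above, here is a deliberately tiny, hypothetical parser for the so form. It accepts only a one-letter builtin referent type and a complete object named by a single <source-name>, and rejects union selectors and the past-the-end flag rather than guessing how to render them.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Renders "so <type> L_Z <name> E [<offset>] E" as "{<type> at offset <n> in <name>}".
std::optional<std::string> demangleSubobject(const std::string& s) {
  static const std::map<char, std::string> builtins = {
      {'c', "char"}, {'i', "int"}, {'l', "long"}, {'f', "float"}, {'d', "double"}};
  std::size_t pos = 0;
  auto consume = [&](const std::string& tok) {
    if (s.compare(pos, tok.size(), tok) != 0) return false;
    pos += tok.size();
    return true;
  };
  auto readNumber = [&]() -> std::optional<std::size_t> {
    std::size_t start = pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
    if (start == pos) return std::nullopt;
    return std::stoul(s.substr(start, pos - start));
  };

  if (!consume("so") || pos >= s.size()) return std::nullopt;
  auto type = builtins.find(s[pos++]);
  if (type == builtins.end()) return std::nullopt;

  // Complete object: L _Z <length> <identifier> E
  if (!consume("L_Z")) return std::nullopt;
  auto len = readNumber();
  if (!len || pos + *len > s.size()) return std::nullopt;
  std::string object = s.substr(pos, *len);
  pos += *len;
  if (!consume("E")) return std::nullopt;

  std::size_t offset = readNumber().value_or(0);  // omitted offset means 0

  // Union selectors and 'p' are not handled by this sketch.
  if (!consume("E") || pos != s.size()) return std::nullopt;
  return "{" + type->second + " at offset " + std::to_string(offset) + " in " + object + "}";
}
```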


jicama commented on July 20, 2024

Renaming members to which pointers-to-members are formed can affect ABI, as can refactoring the enclosing class (and thereby changing the offset from the parameter type to the type containing the member), but this seems largely to be a pre-existing problem.

Indeed, non-static data member names are already part of the ABI in some situations. Is it such a problem to have that existing condition extended to the new functionality? And identifying union members by name seems to me the best choice.


zygoloid commented on July 20, 2024

Indeed, non-static data member names are already part of the ABI in some situations. Is it such a problem to have that existing condition extended to the new functionality?

I think it's fine for us to use the names of union members in class NTTP manglings, because those members are required to be public. But I think it's not reasonable for private member names to be used in pointer and reference NTTP manglings with the class author being able to do nothing about it. (Today, if you don't form a pointer to member, and you don't use the member name in a SFINAE context, you can rename private members and reorganize in ways that don't change class layout freely.) For example, there is no way to avoid the private member names of, say, std::tuple becoming part of the ABI with the previous approach, because they're exposed by reference to constant evaluations. I imagine the folks who want to provide ABI stability guarantees would be very upset by that.

And identifying union members by name seems to me the best choice.

I'm much more on the fence on this one. But the positional approach results in shorter manglings, and if we're not going to be able to recover the full path anyway, using a minimal discriminator seems like it might be preferable.


rjmccall commented on July 20, 2024

@zygoloid, I know compilers have been adopting some variant of this. Any changes you want to make before I land this?


zygoloid commented on July 20, 2024

I think I'm happy enough with the rules as described above if others are, and Clang has implemented those rules for a while, so we have at least some experience with them.


rjmccall commented on July 20, 2024

I think I buy Richard's argument about just using an offset in step 2:

- It's quite plausible that you could end up with one of these pointer/reference arguments for a private member of a stdlib type via a constexpr function call, e.g. to std::array::data(). That would make it ABI-breaking to fix private member names in the stdlib headers to avoid macro collisions with other system headers. Member names are currently exposed in the mangling of dot/arrow expressions and member pointer constants, but you have to directly use that name in a dependent function signature or in a template argument.
- Inventing and mangling the canonical expression path seems like it introduces a lot of implementation complexity, especially if we have compression in it around derived-to-base conversions. There's a lot less complexity in just computing an offset.

@jicama, does that part of Richard's proposal seem acceptable?


rjmccall commented on July 20, 2024

And I think the ABI break in Part 3 is probably necessary and best made as soon as possible.


rjmccall commented on July 20, 2024

Okay, I've pushed a complete draft for this as #166. This includes the specification of template-param-decl from #85 and the specification of class constant mangling from #63.

There are a few substantive differences between my draft and Richard's last proposal. Chiefly, I took the way that mangling was different for function template arguments and formalized it into the idea of contexts that require "precise typing". That then made it pretty easy to stop using precise typing in some places where it's not necessary, like if we're mangling a template-param-decl anyway. And I've clarified that class constants do not use precise typing for members.


Issue: mangling for converted non-type template arguments (cxx-abi, open)
