from cxx-abi.
It's pretty unfortunate that we have independent discriminator sequences for different <lambda-sig>s, but what's done is done.
I feel like we should at least mangle how many explicit template parameters there are. Do we need anything else? We don't otherwise mangle template parameter lists, I think.
from cxx-abi.
I feel like we should at least mangle how many explicit template parameters there are. Do we need anything else?
I don't think we need anything at all; I think this is largely a question of readability of mangled and demangled names. Is a count of template parameters useful to a demangler?
I have a (very) slight preference for mangling the template parameters, for consistency between []<int>{} vs []<long>{} and [](int){} vs [](long){}. This would require us to invent manglings for template parameters, but given cases like #20, that may be inevitable in the long run.
from cxx-abi.
I have a (very) slight preference for mangling the template parameters, for consistency between []<int>{} vs []<long>{} and [](int){} vs [](long){}. This would require us to invent manglings for template parameters, but given cases like #20, that may be inevitable in the long run.
I share this preference. I recently had to explore mangling for template parameters so that we could generate unique names for function template definitions (not their specializations). I ended up with the following extensions (using the recently added '$' to indicate vendor extensions):
<template-param-decl>
::= $t
::= $n <type>
::= $T <template-args>
<template-arg>
::= <template-param-decl> <template-param>
...
With the above, <lambda-sig> could be modified to the following (I don't think there are conflicts here, but I haven't looked closely).
<lambda-sig>
::= [<template-args>] <parameter type>+
from cxx-abi.
The only reason I can think of that knowing the template parameters would be useful to a demangler is that it would let the demangler reject some ill-formed mangled names, e.g. if they referred to a non-existent or incorrectly-kinded template parameter. So, not really.
I'm fine with inventing a mangling for template parameters. It'll make some pretty long symbols even longer, but these are symbols that can almost always be stripped anyway. And yeah, it'll be useful for some other tools, as well as for any potential use cases that come up down the road (maybe with concepts?).
from cxx-abi.
Edit: consistently call the new production <template-param-decl> instead of sometimes using the (already-existing) production name <template-param>
Edit 2023-03-31: Add both locations of requires-clauses for lambdas, update to latest version of #24, add Tp
from later discussion
Based somewhat on @tahonermann's approach, how about:
<template-param-decl>
::= Ty # template type parameter
::= Tn <type> # template non-type parameter
::= Tt <template-param-decl>* E # template template parameter
::= Tp <template-param-decl> # template parameter pack
<lambda-sig>
::= <template-param-decl>* <parameter type>+
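To make the proposed <template-param-decl> productions concrete, here is a minimal C++ sketch of an encoder. The `ParamDecl` struct and the helpers `type_param`, `nontype_param`, `template_param` and `pack` are hypothetical names invented for illustration. The non-type parameter's <type> is assumed to be supplied already mangled (e.g. `i` for `int`), and packs get the `Tp` prefix from the production above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one template parameter declaration.
struct ParamDecl {
    enum Kind { Type, NonType, Template } kind;
    bool is_pack = false;
    std::string nontype_type;       // already-mangled <type>, e.g. "i" for int
    std::vector<ParamDecl> params;  // parameter list of a template template parameter
};

ParamDecl type_param() { return ParamDecl{ParamDecl::Type}; }

ParamDecl nontype_param(std::string mangled_type) {
    ParamDecl d{ParamDecl::NonType};
    d.nontype_type = std::move(mangled_type);
    return d;
}

ParamDecl template_param(std::vector<ParamDecl> params) {
    ParamDecl d{ParamDecl::Template};
    d.params = std::move(params);
    return d;
}

ParamDecl pack(ParamDecl d) {
    d.is_pack = true;
    return d;
}

// <template-param-decl> ::= Ty
//                       ::= Tn <type>
//                       ::= Tt <template-param-decl>* E
//                       ::= Tp <template-param-decl>
std::string mangle_decl(const ParamDecl& d) {
    std::string body;
    switch (d.kind) {
    case ParamDecl::Type:
        body = "Ty";
        break;
    case ParamDecl::NonType:
        body = "Tn" + d.nontype_type;
        break;
    case ParamDecl::Template:
        body = "Tt";
        for (const ParamDecl& p : d.params) body += mangle_decl(p);
        body += "E";
        break;
    }
    return d.is_pack ? "Tp" + body : body;
}
```

For example, the parameter of `template<template<typename T> class C>` encodes as `TtTyE`, while that of `template<template<typename T, typename ...> class C>` encodes as `TtTyTpTyE`, so the two parameter declarations are distinguishable.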
Issue #24 would add
<template-param-decl> ::= Tk <concept name> [ <template-args> ] # constrained parameter
and would extend <lambda-sig> to:
<lambda-sig> ::= <template-param-decl>* [Q <early requires-clause expression>]
<parameter type>+ [Q <late requires-clause expression>]
Issue #47 would add
<template-arg> ::= <template-param-decl> <template-arg>
... for cases where it's permissible to overload on the <template-param-decl>, and where the <template-param-decl> is not obvious from the form of the <template-arg>. Specifically, we'd use this when
- the <template-param-decl> is instantiation-dependent or contains a placeholder type
- the <template-param-decl> is a concept-name or partial-concept-id
- the <template-param-decl> is a template template parameter that is not an exact match to the <template-arg>
(Per discussion on issue #20, we may want to defer the non-concepts part of this to ABI v2. But we've made bigger changes already to fix mangling collisions, so I don't think we should.)
For a type difference between a non-type template parameter and its <template-arg> (particularly for the pointer-to-member case), I think we previously agreed to use a "cv" mangling to describe the conversion. A <template-param-decl> is not sufficient there; we need to know the conversion path, not just the final type, in some cases.
from cxx-abi.
<template-param>
::= Ty # template type parameter
::= Tn <type> # template non-type parameter
::= Tt <template-param-decl>* E # template template parameter
I see both <template-param> and <template-param-decl> there. Did you intend to extend the existing <template-param> here? If so, the <template-param-decl> use should presumably be <template-param>.
I would prefer using a different name to extending the meaning of <template-param>. Actually, I'd prefer to re-purpose that name for the new production above and rename the existing <template-param> to <template-param-ref> (and ideally rename <function-param> and <template-template-param> similarly).
The reason I extended <template-arg> to include both a <template-param-decl> and <template-param> was to register a substitution candidate for use by later template parameter references from a potentially different template parameter scope. This allowed generation of distinct names for cases like the following. The concern here is similar to that raised in issue #20, but is, I think, only a concern for our use of the described extension in order to name templates as opposed to their specializations. I don't see this concern being applicable to lambdas with explicit template parameters, but thought I'd mention it in case it might be relevant for other potential uses of the proposed <template-param> changes.
template<typename T, template<typename U, T*> class> void ft();
template<typename T, template<typename U, U*> class> void ft();
from cxx-abi.
@tahonermann Thanks, I've edited my comment to fix the <template-param> typo to <template-param-decl> throughout. This is intended to be a new production unrelated to the existing <template-param>.
I don't think I like attaching the <template-param-decl> to a use of a <template-param>, and I expect that will lead to ambiguities down the road in templates that do not use all of their parameters; I would prefer that we give the template parameter declarations up-front (in the <template-args> list of the template, for a use of an overloadable template).
Your example demonstrates that my proposed scheme is missing something: in
template<typename T, template<typename U, T *P> class K> void ft() {}
template<typename T, template<typename U, U *P> class K> void ft() {}
... we need a way to reference both T and U within the type of P, and can't call them both T_. We could probably get away with just numbering the template parameters that are in scope, in lexical order, so T would be T_ and U would be T0_ within the declaration of P (and T0_ would instead name K for contexts lexically after the declaration of K). But perhaps that's too cute.
from cxx-abi.
I don't think I like attaching the <template-param-decl> to a use of a <template-param>, and I expect that will lead to ambiguities down the road in templates that do not use all of their parameters; I would prefer that we give the template parameter declarations up-front (in the <template-args> list of the template, for a use of an overloadable template).
It feels weird to me too. The way we used it (well, intend to, we haven't actually rolled this out yet) is in the <template-args> list for a function template (only). The <template-arg> addition I described allowed including the <template-param-decl> encoding for overload disambiguation purposes and the <template-param> encoding to register a substitution candidate for resolution across template parameter scopes. For these functions:
template<typename T, template<typename U, T *P> class K> void ft() {}
template<typename T, template<typename U, U *P> class K> void ft() {}
mangled names like the following are generated. This is using your proposed <template-param-decl> additions and my <template-arg> addition. I probably have other errors in here; the point is the use of 'S0_' in the first (to reference 'T') where 'T_' is used in the second (to reference 'U'):
_Z2ftITyT_TtTyT_TnPS0_T0_ET0_Evv
_Z2ftITyT_TtTyT_TnPT_T0_ET0_Evv
from cxx-abi.
Interesting. It looks like you're relying on all references to a template parameter other than the first being mangled as a <substitution>. That appears to almost be true; I think there are precisely three contexts where a <template-param> can appear that do not support a <substitution>:
- a non-type template parameter in an expression
- the operand of sizeof... (technically, sP allows substitutions for the type/template case but sZ does not)
- in an <unresolved-type> (but see #38 for that)
Presumably your scheme doesn't work in those cases? (It's incidentally unnerving to observe how much we're polluting the substitution space with template parameters that already have names that are as short as the substitution or shorter.)
The idea of using substitutions to express something that could not be expressed without them seems odd to me; my mental model of a substitution is that it refers to some earlier substring of the mangling, and you should be able to first replace all substitutions with their corresponding substrings and then demangle and get the same result. (In fact, I have an idea for a multi-pass in-place constant-storage demangler that relies on this property, but this margin is too small to contain it etc.)
Is there a problem with the lexical numbering of visible parameters approach?
FWIW, I think I would mangle a function template as if its <template-args> contained a list of <template-param-decl>s alone. So your two ft templates would be:
_Z2ftITyTtTyTnPT_EEvv
_Z2ftITyTtTyTnPT0_EEvv
from cxx-abi.
Yes, I have been relying on all template parameter references being mangled as substitutions; I wasn't aware of the exceptions you mentioned, but yes, I agree it looks like my approach does fail those cases. My approach also looks to fail for <expression> as well; I suspect cases like this would result in name collisions:
template<typename T, template<typename U, decltype(T{})> class> void ft() {}
template<typename T, template<typename U, decltype(U{})> class> void ft() {}
When coming up with this scheme, I do recall having distinct is-this-really-a-valid-use-of-substitutions thoughts :)
Is there a problem with the lexical numbering of visible parameters approach?
I think there may be at the implementation level. I've only looked at EDG's implementation and I'm not strongly familiar with it. Perhaps Daveed could chime in. From what I recall, EDG uses a coordinate system to identify a template parameter declaration scope and position and uses that to generate the parameter index value for <template-param>. I'm not sure how difficult it would be to support a lexical numbering system.
FWIW, I think I would mangle a function template as if its <template-args> contained a list of <template-param-decl>s alone.
Yeah, I think I'll have to revisit this. I only introduced the <template-param> uses to resolve ambiguities that arose with the naming that was already in place and never felt really satisfied with that approach. Thanks for sharing your suggestions.
from cxx-abi.
My approach also looks to fail for <expression> as well
And, of course, that is what you meant when you stated
a non-type template parameter in an expression
(I apparently struggle with reading comprehension...)
from cxx-abi.
From what I recall, EDG uses a coordinate system to identify a template parameter declaration scope and position and uses that to generate the parameter index value for <template-param>. I'm not sure how difficult it would be to support a lexical numbering system.
We do the same in Clang, identifying a template parameter by depth and index. I think lexical numbering of in-scope parameters in such a model should be fairly simple: when you start mangling a template template parameter, track its lexical index, and add that to the index of parameters within it to form their index. (That is, store an array mapping from depth to offset, populated when you enter a template template parameter; the lexical index of a template parameter is then "offset[depth] + index".)
Or we could directly invent a <template-param> mangling that specifies both a depth and an index for the depth != 0 cases, or even encode the complete path of indexes from the outermost template to the parameter (the lexical index can be viewed as a compressed form of the latter approach: the lexical index is simply the sum of those indexes). Among these, the lexical index has the advantage of being the shortest, but I'd be fine with the other options.
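The offset-table bookkeeping described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: `LexicalNumbering`, `enter_template_template_param`, `lexical_index` and `spell_template_param` are hypothetical names, depth 0 is taken to be the outermost template parameter list, and the resulting lexical index is spelled with the existing `T_` / `T <n-1> _` forms.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: offsets_[d] is the lexical index at which the
// template parameter list at depth d starts (depth 0 = outermost list).
class LexicalNumbering {
public:
    LexicalNumbering() : offsets_{0} {}

    // Call when the mangler begins the parameter list of a template template
    // parameter located at (depth, index). The enclosing lists must already
    // have been entered; offsets of deeper lists that have closed are dropped.
    void enter_template_template_param(unsigned depth, unsigned index) {
        offsets_.resize(depth + 1);
        offsets_.push_back(offsets_[depth] + index);
    }

    // Lexical index of the parameter at (depth, index): offset[depth] + index.
    unsigned lexical_index(unsigned depth, unsigned index) const {
        return offsets_.at(depth) + index;
    }

private:
    std::vector<unsigned> offsets_;
};

// Spells a lexical index with the existing <template-param> forms T_ / T <n-1> _.
std::string spell_template_param(unsigned lexical_index) {
    if (lexical_index == 0) return "T_";
    return "T" + std::to_string(lexical_index - 1) + "_";
}
```

For `template<typename T, template<typename U, T *P> class K>`, entering `K` (depth 0, index 1) records an offset of 1 for depth 1. Within the type of `P`, `T` is then spelled `T_` and `U` is spelled `T0_`; after the declaration of `K`, `K` itself is also `T0_`.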
from cxx-abi.
Or we could directly invent a <template-param> mangling that specifies both a depth and an index for the depth != 0 cases, or even encode the complete path of indexes from the outermost template to the parameter (the lexical index can be viewed as a compressed form of the latter approach: the lexical index is simply the sum of those indexes).
Both of those approaches sound reasonable. I'm assuming depth == 0 means the outermost list. Would a relative depth approach for referencing a parameter from an enclosing template parameter list be reasonable? That would avoid the need to specify a depth for reference to a parameter in the same parameter list which, I assume, is the common case. For example, if we changed <template-param> to the following (where 'U' stands for "up"):
<template-param>
# First parameter of the current or innermost enclosing list (omit optional U for current list)
::= T [ U ] _
# Nth parameter of the current or innermost enclosing list (omit optional U for current list)
::= T [ U ] <parameter-2 non-negative number> _
# First parameter of an enclosing list other than the innermost
::= T U <distance-2 non-negative number> _ _
# Nth parameter of an enclosing list other than the innermost
::= T U <distance-2 non-negative number> _ <parameter-2 non-negative number> _
then the following function templates:
template<typename T, template<typename U, T *P> class K> void ft() {}
template<typename T, template<typename U, U *P> class K> void ft() {}
would mangle to:
_Z2ftITyTtTyTnPTU_EEvv
_Z2ftITyTtTyTnPT_EEvv
Presumably, these references would be substitution candidates. If so, that would again break with your model of substitutions representing substrings since the context of the reference would affect the mangling.
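The relative-depth spelling proposed here can be sketched as a small C++ encoder. The function name and the `distance` parameter (0 for the list containing the reference, 1 for the innermost enclosing list, and so on) are assumptions made for illustration, and `index` is taken to be 0-based within the referenced list.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoder for the relative-depth <template-param> forms:
//   T [U] _, T [U] <index-1> _, TU <distance-2> __, TU <distance-2> _ <index-1> _
std::string mangle_relative_template_param(unsigned distance, unsigned index) {
    std::string s = "T";
    if (distance >= 1) s += "U";                                  // not the current list
    if (distance >= 2) s += std::to_string(distance - 2) + "_";   // beyond the innermost enclosing list
    if (index >= 1) s += std::to_string(index - 1);               // not the first parameter
    return s + "_";
}
```

Applied to the two `ft` templates, the reference to `T` from inside the template template parameter's list is `TU_` (distance 1, index 0), and the reference to `U` is `T_` (distance 0, index 0), reproducing the two mangled names given above.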
from cxx-abi.
I think in the name of consistency we should follow what we do for function parameters (and what implementations do in practice) and number based on depth and index within that depth. Specifically:
<template-param> ::= T_ # first template parameter at level 0
::= T <parameter-2 non-negative number> _ # subsequent template parameters at level 0
::= TL <L-1 non-negative number> __ # first template parameter at level L
::= TL <L-1 non-negative number> _ <parameter-2 non-negative number> _ # subsequent template parameters at level L
(where L is the level number excluding any levels of template parameters that have already been substituted; for example, the template parameters of a function template that is a member of a class template would get mangled with L = 0, because the class template's arguments get substituted before mangling).
This seems straightforward for compilers and for demanglers, and has the advantage for human readers that the name for a template parameter doesn't change in different scopes in the same mangling.
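A round-trip sketch of the level/index spelling above, assuming levels are numbered from the outermost unsubstituted template parameter list (level 0) inward and that parameter indexes are 0-based within a level. `ParamRef`, `mangle_template_param`, `parse_number` and `demangle_template_param` are illustrative names, not taken from any existing mangler or demangler.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Illustrative reference to a template parameter.
struct ParamRef {
    unsigned level;  // 0 = outermost unsubstituted parameter list
    unsigned index;  // 0-based position within that list
};

// Spelling: "T", then "L <level-1> _" if level > 0, then "<index-1>" if
// index > 0, then "_". This yields T_, T0_, TL0__, TL0_0_, TL1__, ...
std::string mangle_template_param(ParamRef r) {
    std::string s = "T";
    if (r.level > 0) s += "L" + std::to_string(r.level - 1) + "_";
    if (r.index > 0) s += std::to_string(r.index - 1);
    return s + "_";
}

// Reads a non-negative decimal <number>; returns false if none is present.
bool parse_number(std::string_view& s, unsigned& out) {
    if (s.empty() || s.front() < '0' || s.front() > '9') return false;
    out = 0;
    while (!s.empty() && s.front() >= '0' && s.front() <= '9') {
        out = out * 10 + static_cast<unsigned>(s.front() - '0');
        s.remove_prefix(1);
    }
    return true;
}

// Consumes one <template-param> from the front of `s` on success; leaves `s`
// unchanged and returns std::nullopt otherwise (e.g. for "Ty" or "Tn...").
std::optional<ParamRef> demangle_template_param(std::string_view& s) {
    std::string_view t = s;
    if (t.empty() || t.front() != 'T') return std::nullopt;
    t.remove_prefix(1);
    ParamRef r{0, 0};
    if (!t.empty() && t.front() == 'L') {
        t.remove_prefix(1);
        unsigned n = 0;
        if (!parse_number(t, n) || t.empty() || t.front() != '_') return std::nullopt;
        t.remove_prefix(1);
        r.level = n + 1;
    }
    if (!t.empty() && t.front() != '_') {
        unsigned n = 0;
        if (!parse_number(t, n)) return std::nullopt;
        r.index = n + 1;
    }
    if (t.empty() || t.front() != '_') return std::nullopt;
    t.remove_prefix(1);
    s = t;
    return r;
}
```

Because a parameter's spelling depends only on its own level and index, the same parameter is written identically wherever it is referenced in one mangling, which is the readability property mentioned above.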
from cxx-abi.
That sounds fine to me. The text should indicate what order levels are numbered in (presumably outside-in). Do you want to write up a patch?
from cxx-abi.
Yes, I'll assemble a patch.
One more question: should we mangle template parameter packs differently from non-pack parameters? The <template-param-decl> approach described above (and implemented in Clang) does not do so.
I would note that the mangling for these two have conflicted since C++11:
#ifdef TU1
template<template<typename T> class C> void f();
#else
template<template<typename T, typename ...> class C> void f();
#endif
template<typename T> struct C {};
void g() { f<C>(); }
... so I think using distinct manglings for pack parameters would be appropriate.
Suggestion:
<template-param-decl> ::= Tp <template-param-decl> # template parameter pack
(by analogy to the use of Dp <type> for function parameter packs).
from cxx-abi.
That seems like a template-template-argument example of the general problem that we neither mangle template parameter lists nor reflect coercions in template arguments, right? I still feel like if we're interested in solving that problem, we should solve it in general rather than poking around the edges.
from cxx-abi.
I think in the name of consistency we should follow what we do for function parameters (and what implementations do in practice) and number based on depth and index within that depth. Specifically:
Quite coincidentally, I proposed the same extension to <template-param> (with slightly different term names) for our internal use earlier this week. So I'm definitely +1 for this approach.
I'm also in favor of the Tp extension to specify template parameter packs.
from cxx-abi.
P0315R4 means that lambdas can appear in the signature of a function template, so we need a mangling for lambda-expressions too. I think the obvious mangling is the following (and I've incorporated it into pull request #85 with the other changes from this issue):
<expr-primary> ::= L <lambda type> E # lambda expression
I've implemented all of this other than lambda literals (and P0315R4 more broadly) in Clang and libc++abi, and it seems to work well enough.
from cxx-abi.
Do you think that leaves adequate room for the 100% inevitable extension of SFINAE to lambda bodies?
from cxx-abi.
Less snarkily:
- Is a <lambda type> the <closure-type-name> for the lambda or just a signature?
- What is the context for a lambda that appears in a function signature? Is it the function? Remember that the context matters for three things:
  - It might be mangled as part of the <closure-type-name> when mangling the lambda expression, depending on the answer to the question above. I think this would require the ultimate symbol name to be infinitely long. :)
  - It's mangled as part of the <closure-type-name> in other contexts, including the context of the operator() as well as any template specializations that use it.
  - It's the scope of unique discriminators for lambdas with identical signatures. Apropos, we also need to decide the inter-ordering of parameters and the return type (and does this change if there's a trailing return type?), as well as whether the function body continues the same scope or not (and if not, how are the contexts differentiated in the mangling?).
- Is the note about not sharing between translation units still fully true under modules?
from cxx-abi.
CWG is pretty solidly opposed to permitting SFINAE of lambda bodies; at this point it would be a complete reversal of direction. Of course, that doesn't mean it won't happen, but I don't think it's all that likely. That said, I don't think this approach actually precludes such an extension. The contexts in which we will mangle lambda expressions with this proposal are the contexts in which you couldn't overload on a SFINAE constraint in the closure type anyway. (This proposal only covers member functions, members of local classes, and other similar things that appear within some larger ODR context. If a lambda-expression is part of the signature of a non-member function template, the symbols for specializations of that function template should not have external linkage, modulo the wrinkle with modules.) If we changed our minds and started allowing SFINAE on lambda bodies in the future, we would need to mangle the lambdas' bodies in the cases where we would currently give function template specializations internal linkage. So we'd end up with inconsistent rules (sometimes mangling bodies, sometimes not), but no ABI break.
On to the questions:
- The idea is that the <lambda type> is the <type> for the lambda (probably not directly a <closure-type-name>, more likely a <nested-name> or <local-name> involving one -- though perhaps we can argue the qualifier is entirely redundant and just encode the <closure-type-name> directly?). Even if we omit the qualifier, I think we need to include the discriminator as part of the mangling; consider:
struct S {
  template<bool B> void f(X<[]{ return B; }()>) { ... }
  template<bool B> void f(X<[]{ return !B; }()>) { ... }
};
(Here, the two different function templates S::f need distinct manglings.)
- When a lambda-expression appears at class scope but not within a default member initializer or default argument, I think we should use the class as the context for the numbering. (I think we already cover all the other contexts to which the ODR applies with the existing rules.) I missed this with my updates so far; I'll add it to the pull request now.
- Yes, there are new complications for lambdas in modules, see #84 -- but those complications arise regardless of the other changes here.
from cxx-abi.
Oh, there's one other context in which a lambda can appear where we will need to give it a context and a numbering: in the type of an inline variable or variable template:
inline decltype([]{ static int n = 0; return ++n; }) x;
This one seems hard to deal with, since we don't find out the mangling context until after we have already finished processing the lambda. Ouch. I'm going to take that case back to CWG to see if we can disallow that.
from cxx-abi.
Having thought about this a bit, I think it's preferable to use L <closure-type-name> E rather than L <type> E for lambda expressions:
- it's much easier to demangle, since you can match LUl rather than needing to step over arbitrary name prefixes to determine whether you have a lambda expression
- it's always sufficient; the context information is always unnecessary (you always get "the context that would be used for a lambda appearing in this position", which you already know)
- it more directly matches the approach of "mangle what you see in the source code"
Pull request updated to match.
So far, CWG seems to agree with the direction of disallowing lambdas in the types of inline variables and variable templates.
from cxx-abi.
Can we also say that there's never a discriminator so that we can completely bypass the context problem?
I was asking about modules because, if something with formally internal linkage in the module interface has to be usable across translation units, we basically have to solve that set of problems anyway, so there's no point in including a statement like "implementations should make sure they use internal linkage for these".
Lambdas that are used in e.g. template arguments can still have their members used, right? So we still need a mangling for the type and its operator()?
from cxx-abi.
The struct S example a few comments back needs us to mangle the discriminator. I think that's unavoidable due to cases like that one.
For a lambda in a module, we will have a module name (or module partition name) to number within; I think we will still need to say something about the case where we don't have such a name.
Yes, lambdas used in template arguments or decltype can still have their members used. For example:
struct A {
int f(decltype([]{ static int n; return ++n; }) lambda = {}) {
return lambda();
}
};
int a = A().f(), b = A().f();
... and the ODR says that's the same lambda and the same static local variable across all translation units.
from cxx-abi.
Okay. So, I'm extremely uncomfortable about making the ABI of a declaration depend on the exact set of declarations that have come before it in its class / namespace / module. If it was local to a translation unit, that would be bad enough, but already it's not because of modules, and I wouldn't really believe that it would stay restricted anyway. If we can ask the committee to reconsider anything, can we ask them to reconsider whether cases like your struct S need to be supported?
from cxx-abi.
Asked; for those with C++ committee reflector access, this is being discussed in this thread.
from cxx-abi.
@zygoloid, in your specification for:
<template-param> ::= T_ # first template parameter at level 0
::= T <parameter-2 non-negative number> _ # subsequent template parameters at level 0
::= TL <L-1 non-negative number> __ # first template parameter at level L
::= TL <L-1 non-negative number> _ <parameter-2 non-negative number> _ # subsequent template parameters at level L
Should that be L-2 with the level number omitted for L=1? Or was it intentional that a number always follows L? My expectation was that:
- For Level 0, Param 0: T_
- For Level 0, Param 1: T0_
- For Level 1, Param 0: TL__
- For Level 1, Param 1: TL_0_
- For Level 2, Param 0: TL0__
- For Level 2, Param 1: TL0_0_
from cxx-abi.
@tahonermann, my intention was to be consistent with the directly analogous mangling for function parameters, which use fp_ / fp0_ / ... for level 0, fL0p_ / fL0p0_ / ... for level 1, fL1p_ / fL1p0_ / ... for level 2, and so on. (I did consider the approach you suggested, but weighed the consistency argument -- and keeping the rule a little simpler -- slightly higher.)
I would have no objection to switching to the scheme you mention if we think the complexity is worthwhile.
from cxx-abi.
Thanks @zygoloid. Mostly I just wanted to confirm the intent. I wasn't aware that the function parameter mangling didn't omit the number for L=1. The consistency argument is persuasive to me, so I'm happy to stick with what you proposed.
from cxx-abi.
Is any additional followup still needed for this issue? The associated PR is still open (and now has conflicts that will need to be resolved).
from cxx-abi.
Well, we were blocked on the CWG discussion about numbering lambdas, and as someone who doesn't pore over mailings, it wasn't really clear to me how that was resolved. The idea of making these lambdas internal linkage seemed appealing, but I'm not sure whether CWG actually went for that in the end. It would be good to get some resolution there.
from cxx-abi.
I'm not sure there is a CWG issue with respect to numbering. My impression of the above is that either a lexical or depth mechanism would work but there was a preference for the depth/position mechanism.
I'm not clear on the difference between this issue and #85. This seems mostly like a subset of that one.
CWG did make some changes to make it ill-formed to use a TU-local closure type from another TU via modules, but I think that is orthogonal to this issue.
from cxx-abi.
I'm not clear on the difference between this issue and #85.
@jhsedg, this issue (#31) tracks the general problem, #85 is the pull request with the changes proposed to resolve it.
from cxx-abi.
Any progress on merging this in?
from cxx-abi.