codeplaysoftware / standards-proposals
Repository for publicly sharing proposals in various standards groups
License: Apache License 2.0
Following on from our discussion regarding the lifetime of execution resources, I have recently been thinking about how we should define equality of execution resources.
It would be very useful for users to be able to compare one execution resource against another, to check whether they point to the same underlying resource. However, this raises another question: should execution resources be required to be consistent identifiers?
For example, if you were to discover the system topology multiple times, should the same hardware resources always be represented by comparable execution resources? Or if you were to construct a particular type of execution context that does not require construction from an execution resource, but can return an execution resource, should it be possible to compare this resource against the equivalent from a system topology discovery?
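One way the "consistent identifier" option could work is sketched below. This is purely a hypothetical illustration, not part of any proposal: it assumes each resource wraps a stable platform identifier (for example an hwloc object index), so that resources obtained from two separate topology discoveries compare equal when they denote the same hardware.

```cpp
#include <cstdint>

// Hypothetical sketch: an execution resource as a lightweight handle wrapping
// a stable platform identifier. If the identifier is stable across topology
// discoveries, equality can compare underlying hardware identity rather than
// C++ object identity.
class execution_resource {
public:
  explicit execution_resource(std::uint64_t native_id) : id_(native_id) {}

  friend bool operator==(const execution_resource& a,
                         const execution_resource& b) {
    return a.id_ == b.id_;  // identity of the underlying hardware
  }
  friend bool operator!=(const execution_resource& a,
                         const execution_resource& b) {
    return !(a == b);
  }

private:
  std::uint64_t id_;  // assumed stable across repeated topology discoveries
};
```

Under this model, two discoveries of the same core yield handles that compare equal, which is the behaviour the question above asks whether we should require.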
The current interface for affinity queries supports querying latency, bandwidth, capacity and power consumption. However, there is currently no way to query the relationship between an execution resource and a memory resource for memory region properties, such as whether they support atomic operations for concurrent access to the memory.
We may be able to support this with the existing affinity query interface, though this will need to be investigated.
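As a starting point for that investigation, here is a minimal sketch of what such a relationship query might look like. All names here (`memory_access_property`, `affinity_query`, the struct fields) are assumptions for illustration, not proposed interface:

```cpp
// Hypothetical sketch: extending the affinity query interface with
// relationship properties between an execution resource and a memory
// resource, such as whether concurrent atomic access is supported.
enum class memory_access_property {
  atomics_supported,  // concurrent atomics across this pairing
  cache_coherent      // coherent caching between the two resources
};

struct execution_resource { int id; };
struct memory_resource   { int id; bool atomics; bool coherent; };

// A query in the style of the existing latency/bandwidth queries, but
// returning whether the given property holds for this resource pairing.
inline bool affinity_query(const execution_resource&,
                           const memory_resource& mem,
                           memory_access_property p) {
  switch (p) {
    case memory_access_property::atomics_supported: return mem.atomics;
    case memory_access_property::cache_coherent:    return mem.coherent;
  }
  return false;
}
```

The open question is whether boolean relationship properties like these fit the existing magnitude-based affinity query interface, or need a separate query.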
The background section is still missing discussions about NUMA architectures, Chapel and other PGAS models from various discussions on these topics. We should add further discussion of these for the next revision of the paper.
The Affinity execution_context has asymmetry in that it is constructed from an execution resource but supplies a memory resource. Should the relationship be between execution resources and memory resources without an intervening execution context?
Differences in the intersection of papers Affinity - D0796r2 and Context - P0737r0.
std::thread specific resource: Affinity defines execution_resource, which is implied to be an execution resource that executes std::thread. Context defines thread_execution_resource_t, which is explicit about executing std::thread. Question: Should we lay the foundation for non-std::thread execution resources (e.g., GPU) by embedding the name thread in the initially proposed execution resource?
Affinity execution_resource is moveable and copyable; Context thread_execution_resource_t is neither moveable nor copyable. Question: Should an execution resource be a PIMPL (pointer to implementation) value type or an immutable reference? Creating and managing vectors of execution resources requires PIMPL.
Affinity has std::vector<resource> execution_resource::resources() const noexcept; and Context has const thread_execution_resource_t& thread_execution_resource_t::partition(size_t i) const noexcept;. Related to the PIMPL question.
Affinity has std::vector<execution_resource> this_system::resources() noexcept; and Context has thread_execution_resource_t program_thread_execution_resource;. Question: Should there be a root execution resource that is a handle to the union of all individual execution resources? A root execution resource, by definition, has a vector of nested resources which is equivalent to the Affinity proposal's root vector of resources.
Affinity execution_context has name(). Is this for the type of resource, to denote the topological identity of the resource, or both? Context avoided this design point for the initial proposal.
Affinity can_place_* methods open the design point of placement without addressing how to query if placement has occurred or how placement can be performed. Context avoided this design point in the initial proposal.
Affinity execution_context is constructed from an execution_resource without additional properties, implying a 1-to-1 correspondence. Context execution context (concept) construction is undefined.
In the Rapperswil feedback, it was suggested that the bulk_execution_affinity interface should be extended to also incorporate the unit of displacement; for example, whether to scatter by core or by socket.
This could be introduced via a new property or a parameter to the existing property.
Though we should also consider that introducing this may require us to enumerate types of execution resource, such as threads, cores and sockets.
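To make the trade-off concrete, here is a sketch of the "parameter to the existing property" option. Everything here is hypothetical (the enumerator names, the factory-style call syntax), and it deliberately demonstrates the concern above: the parameter forces us to enumerate resource kinds.

```cpp
// Hypothetical sketch: parameterising a bulk_execution_affinity-style
// scatter property with a unit of displacement. Note the enumeration of
// resource kinds below is exactly the concern raised above: the parameter
// requires naming types of execution resource.
enum class affinity_unit { thread, core, socket };

struct bulk_execution_affinity_scatter {
  affinity_unit unit = affinity_unit::core;  // assumed default granularity
  // Property "factory": scatter(affinity_unit::socket) yields a scatter
  // property whose displacement unit is the socket.
  bulk_execution_affinity_scatter operator()(affinity_unit u) const {
    return bulk_execution_affinity_scatter{u};
  }
};

inline constexpr bulk_execution_affinity_scatter scatter_default{};
```

Usage would then read `require(ex, scatter_default(affinity_unit::socket))` in the style of the properties mechanism, though whether that mechanism admits parameterised properties like this is exactly what needs investigating.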
It was suggested by SG1 at the Prague meeting that we should change the paper name to be more inclusive of other domains as well as heterogeneous and distributed computing. Perhaps we could change it simply to "System topology discovery for C++"?
There was feedback from Jacksonville that not all users would want to dive into the fine-grained work of querying a system's topology and manually binding resources, and that many users would instead want a higher-level descriptive interface which allows the implementation to decide how to allocate resources based on the user's requirements. There were concerns that we don't yet clearly present what a high-level interface for affinity would look like.
For Rapperswil we should provide clarity on the different levels of interface that we are proposing and their target users, and also on how we envision a high-level interface for affinity in C++ looking, preferably with examples.
One aspect of the feedback from SG1 on P1795r1 at Belfast was that we need to demonstrate how the abstract topology discovery interface proposed would work in practice, and provide some examples of how properties of the system topology could be used generically within applications.
I think one of the first steps in this is to identify the potential abstract properties or queries that could be expressed generically, i.e. not pertaining to any particular kind of processor or system component.
So far I have the following list:
We don't need to propose all of these properties now, but we can prepare a provisional list of properties or queries for expository purposes, to demonstrate how algorithms could take advantage of this interface.
There was feedback from Jacksonville suggesting that we add a way to retrieve the execution resource or execution context of the current thread of execution.
It was raised that the current design limits the creation of an execution_context to a single execution_resource, therefore enforcing that the execution_context represent all member execution_resources. This excludes the case where you may want to, for example, create an execution_context from half the threads of a core. In this case, you would want to list the execution_resources you want the execution_context to represent.
For this we need to add an additional constructor to the execution_context, or alter the existing constructor to allow it to take a set of execution_resources. Perhaps we could do this by providing partitioning algorithms which can partition an execution_resource in a particular way and return a new iterable which could then be passed to the execution_context constructor.
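The partitioning-algorithm idea could be sketched as follows. The names (`take_first_half`, the constructor shape) are assumptions for illustration only:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: a partitioning algorithm takes an execution
// resource's members and returns the subset a context should represent,
// e.g. the first half of a core's threads; the result is then passed to an
// execution_context constructor taking a set of resources.
struct execution_resource { int native_id; };

inline std::vector<execution_resource>
take_first_half(const std::vector<execution_resource>& members) {
  return {members.begin(),
          members.begin() + static_cast<std::ptrdiff_t>(members.size() / 2)};
}

// Context constructible from any set of resources, not just a single one.
class execution_context {
  std::vector<execution_resource> resources_;
public:
  explicit execution_context(std::vector<execution_resource> rs)
      : resources_(std::move(rs)) {}
  std::size_t size() const { return resources_.size(); }
};
```

The appeal of this shape is that the partitioning step stays a free algorithm, so new partitioning strategies can be added without touching the context's constructor set.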
There was feedback from Jacksonville that we should start looking into how to support dynamic device discovery, where an execution resource within a system may become available or unavailable during execution.
In order to reliably support this feature, there needs to be a guarantee that, if a device can become available or unavailable during execution, this can be handled gracefully. For this reason we should aim to make this feature optional, so that implementations which cannot handle dynamic device discovery gracefully can opt not to support it.
Supporting dynamic device discovery would also mean that the system topology may change between one query and another. So there needs to be a way for users to be notified of a change to the system topology through some form of callback mechanism, and there needs to be a way for users to update the topology information when this happens.
Another option could be to not be specific about whether an execution resource is dynamic or static, but simply to allow some execution resources to be updated by dynamic device discovery.
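A minimal sketch of the callback mechanism mentioned above might look like this; the names (`system_topology`, `on_change`, `topology_event`) are invented for illustration:

```cpp
#include <functional>
#include <vector>

// Hypothetical sketch: a topology object that lets users register a callback
// invoked whenever a resource becomes available or unavailable, so cached
// topology information can be refreshed.
enum class topology_event { resource_added, resource_removed };

class system_topology {
  std::vector<std::function<void(topology_event)>> observers_;
public:
  void on_change(std::function<void(topology_event)> cb) {
    observers_.push_back(std::move(cb));
  }
  // Called by the implementation when dynamic discovery detects a change.
  void notify(topology_event ev) {
    for (auto& cb : observers_) cb(ev);
  }
};
```

An implementation that cannot support dynamic discovery could simply never invoke the callbacks, which fits the "make this feature optional" direction above.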
@AerialMantis (cc: @Ruyk)
We are stalled on D0796r2 waiting for you to weigh in with reviews on pull request #52 and others.
There was feedback from Jacksonville that, while the internal structure of a system's topology when being queried is inherently hierarchical, it's generally desired that the user interface not be hierarchical.
For Rapperswil, we should add further discussion on the requirements of the system topology structure, and highlight that the more hierarchical structure is only for the low-level interface and that the high-level interface would be more descriptive.
In our last discussion, we decided that we should create a motivational example of how a developer could use the topology discovery design proposed in P1795 to optimise an algorithm such as matrix multiply based on different system architectures.
Feedback from the Rapperswil meeting suggested that we should remove the this_thread::bind and this_thread::unbind interface, as it is too open to misuse and also conflicts with other more desirable approaches.
We decided that, in order to avoid having to return a container of member resources from each resource, we should instead make the execution_resource (and the subsequent memory resource type) iterable. This means replacing the resources member function with begin and end member functions. Perhaps we should also define iterator traits for the resource types.
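The iterable shape could be sketched like this (a hypothetical illustration; the real proposal wording would need to pin down iterator category and traits):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: replacing the resources() member function with
// begin()/end(), so an execution_resource is directly iterable over its
// member resources and works with range-based for and std algorithms.
class execution_resource {
  std::vector<execution_resource> members_;
public:
  execution_resource() = default;
  explicit execution_resource(std::vector<execution_resource> members)
      : members_(std::move(members)) {}

  // Deduced return types keep the sketch simple; the proposal would define
  // iterator traits for these.
  auto begin() const { return members_.begin(); }
  auto end() const { return members_.end(); }
  std::size_t size() const { return members_.size(); }
};
```

With begin/end in place, `for (const auto& core : socket)` works directly, which is the usability gain over returning a container by value.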
In the current revision of the paper the execution_context is lightly defined, and simply provides a way to construct a context around a particular part of the system topology in order to get an executor or allocator with affinity to specific resources.
After discussing this design with Chris K on the executors telecon, he raised the very good question of what the execution_context is trying to be. Is it (a) the execution context: a polymorphic type which can serve as a wrapper for other concrete execution context types such as a thread pool, in the same way the polymorphic executor does? Or is it (b) simply another concrete execution context type, like the static thread pool type, that is specifically designed for managing the resources of a discoverable topology?
Both of these are reasonable goals, though they have very different scopes. If we were to aim for (a) then the scope is much larger: the execution_context must be fully compatible with all concrete execution context types, which would likely mean introducing explicit properties which can be mapped to the various properties of the concrete execution context types. If we were to aim for (b) then the scope is smaller: the execution_context can be limited to functionality which is required for managing resources discoverable in the topology. Additionally, if we were to aim for (b) we should probably rename the execution context type to something like resource_execution_context.
Personally, I think we should aim for (b) as it has a more limited scope, and trying to define a more generic execution context type means making the design compatible with many other concrete execution context types, of which there are currently not many. I feel it may be too early to try to define what a standard execution context should look like.
Feedback from SG14 on the proposed wording of P2155r0:
HMM (Heterogeneous Memory Management) is a proposed interface for supporting non-conventional memory architectures into the regular kernel path. We should look into this as background research for the paper.
https://github.com/torvalds/linux/blob/master/Documentation/vm/hmm.rst
Some feedback was received in the Belfast meeting that it would be useful to identify whether resources within a system topology are contested and be able to discover only the parts of the system topology which are non-contested, perhaps via some kind of flag.
The current design doesn't make any guarantee as to whether the resources reflected in the system topology are uniquely available and uncontested by another part of the application or another process. It would be beneficial to define when users can expect to have uncontested access to resources, when it's possible for the implementation to do so, and to provide a way to discover only resources that are available. Though this might have to be done at a fine-grained level, as some resources may not be able to reflect this information and some resources may only be partially contested; for example, a bounded thread pool may take a specific number of threads.
I can think of three different situations where this information could be available when discovering the system topology:
Perhaps this is something which needs to be queryable on a per-resource basis.
Another point to consider here is that whether resources are contested could change dynamically, so it would have to factor in consideration about how the topology is updated and how users are notified of changes.
Some feedback from the Belfast meeting was that it would be useful to have configuration providers which can inject information about resources into the system topology relevant to a particular environment.
Some initial thoughts: I wonder if such an interface could be used to add entirely new resources to the topology, or simply to add additional information to existing resources. I think the latter should be relatively straightforward, provided the resources available in the system match the expectations of the configuration provider. The former may be more complicated: adding new resources could be fairly trivial if we provide a way to create resources and populate their information; the difficulty would come in when defining connections, or possible contentions, with existing resources in the system topology.
It was pointed out that this is also a nice solution to the problem we have of how to define non-standard, domain-specific identification of the abstract C++ resources. If configuration providers can see the topology when injecting information, then they could be used to provide concrete labels for specific resources, even by just checking their names. These configuration providers could then be provided open-source, supplementary to the standard.
In the new update methods of the handler (to/from device), it seems that the case where buffer contents are updated with other buffer contents is missing.
Following on from a discussion here and here prior to P1795r1 about the return type of traverse_topology.
The two alternatives considered were to either have traverse_topology return a vector<system_resource>, which requires system_resource to be copy constructible, or to have traverse_topology return a ranges::view<system_resource> so that the collection can be further processed lazily after it is returned. We also discussed the possibility of combining the best of both, by having system_resource be semiregular and then returning a ranges::view<system_resource> that is temporarily tied to the lifetime of the system_resource but capable of being assigned to a container such as a vector after any lazy transformation is done.
This also raised the question of whether the topology information contained within a system_topology object is static; I believe we are leaning towards this being the case, to avoid the topology being modified asynchronously while it's being inspected.
We should continue the discussion of this and clarify this in P1795r2.
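To illustrate the combined option without depending on the ranges library, here is a hand-rolled sketch: `system_resource` is semiregular, the view is tied to the topology data it was built from, and it can be materialised into a `std::vector` once lazy filtering is done. All names are illustrative assumptions:

```cpp
#include <functional>
#include <vector>

// Hypothetical sketch of the middle-ground option discussed above.
struct system_resource { int id = 0; };  // semiregular by design

class resource_view {
  const std::vector<system_resource>* src_;  // tied to the topology's lifetime
  std::function<bool(const system_resource&)> pred_;
public:
  resource_view(const std::vector<system_resource>& src,
                std::function<bool(const system_resource&)> pred)
      : src_(&src), pred_(std::move(pred)) {}

  // Materialise: copies the selected semiregular elements out of the view,
  // detaching the result from the topology's lifetime.
  std::vector<system_resource> to_vector() const {
    std::vector<system_resource> out;
    for (const auto& r : *src_)
      if (pred_(r)) out.push_back(r);
    return out;
  }
};
```

The static-topology question matters here because the view dereferences the topology's storage lazily; if the topology could change asynchronously, the view would need some synchronisation or snapshot semantics.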
Currently, the requirements for the execution_resource are quite vague:
[Note: The intention is that the actual implementation details of a resource topology are
described in an execution context when required. This allows the execution resource objects
to be lightweight objects that serve as identifiers that are only referenced. --end note]
In #40 we decided that the execution_resource should be copyable and moveable, but must act as an opaque type with reference counting semantics.
Taken from #40:
- Answering the second point, the execution_resource should remain copyable and moveable so that it can be used within std algorithms, but it should be an opaque type with reference counting semantics.
Perhaps we want to introduce normative wording which requires certain behaviour of the execution_resource when being copied or moved, in order to guarantee the correct behaviour.
Feedback from some users after trying out the placeholder accessors seems to indicate that they should be default constructible, and that the buffer should be assigned later during the requirement setting stage.
In the last call we discussed the direction to go in for P1437: System topology discovery for heterogeneous & distributed computing, now that it's been split off from P0796. We looked at some of the use-cases for having a low-level affinity interface in C++ and what we would like such an interface to look like. We decided that based on the feedback from Kona we should refocus the motivation and goals of the proposal for a low-level affinity interface in the first revision of P1437.
Some of the benefits of a low-level affinity interface in C++ that we discussed were:
We discussed that having a standardized interface in C++ for querying the topology of a system for its execution resources and the affinity relationships between those resources would be highly beneficial for writing generic code that can target heterogeneous platforms. However, we also recognised that expecting C++ to keep up with the rapidly changing and developing architectures within heterogeneous computing domains, and to support their various unique features and capabilities, is unrealistic. To this end, we would like to aim instead for C++ to provide a unified layer between future hardware standardization efforts like HMM and executor-based programming models such as thread pools, SYCL or Kokkos. This would provide a middle layer for users and library implementors to target in order to write more generic and potentially "performance portable" applications and programming models, whilst also providing hardware vendors with a way to extend the interface to provide support for the more unique features and capabilities of their architectures.
We discussed concerns that the current C++ abstract machine and the language around it are just not sufficient for describing heterogeneous systems. So while expecting the C++ abstract machine to be completely revamped to cover a range of different hardware features and capabilities is unrealistic, there will have to be some new language introduced to allow C++ to describe the system topologies that are being queried. We noted that this is something that is even becoming evident in P0443, the unified executors proposal, where it's proving difficult to express certain properties in the language the C++ abstract machine currently provides.
Closely related to this we discussed the move towards a unified address space in heterogeneous systems via SVM and HMM. We made the point that this move actually makes the case for affinity in C++ stronger, because while you have different address spaces, the distinction between different hardware memory regions and their capabilities are clear, but once you have a single unified address space, potentially with cache coherency, distinguishing different memory regions becomes much more subtle. Therefore it becomes much more important to understand the various memory regions and their affinity relationships in order to achieve good performance on various hardware.
We also discussed one of the more controversial aspects of P0796: the current representation of the system topology, still largely hierarchical, is closely based on Hwloc. While Hwloc is widely used in many domains, it does not always accurately represent existing machines, because its structure is strictly hierarchical, while many machines no longer have a simply hierarchical topology. To solve this we discussed a potential graph representation for a system topology, where you have node relationships that represent the containment relationships of machines, sockets, CPUs, etc., but also node relationships that represent network and memory region connections. So the graph becomes more of an opaque system representation that can be viewed from a number of different perspectives, depending on which relationships you are interested in.
Going forward here I think we should have some further discussion of the motivation and goals and perhaps decide on some clear use cases and then at some point I would like to put together a merge request for updating the front matter of P1437, and perhaps take out the proposed interface for now.
As discussed in #40, we need to provide a way for users to identify the type of an execution_resource.
Taken from #40:
Answering the first point, the execution_resource should be a generic execution resource type that isn't associated with any particular type of resource; however, we should introduce some way of identifying what kind of resource a particular execution_resource is. A runtime approach would be favourable over a compile-time approach: firstly because many low-level APIs which provide access to a system's topology, such as Hwloc, HSA and OpenCL, are runtime discoverable, so a compile-time interface would not be suitable for expressing this; and secondly because having a compile-time interface would mean introducing a large number of types, which would reduce or complicate the ability to store resources generically.
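A runtime identification query could be as simple as the following sketch. The enumerator set and member names are assumptions; a real interface would likely need vendor-specific extension beyond a fixed enum:

```cpp
#include <string>

// Hypothetical sketch: a runtime query identifying what kind of resource a
// generic execution_resource represents, avoiding a compile-time type per
// resource kind.
enum class resource_kind { thread, core, socket, gpu, numa_node, other };

class execution_resource {
  resource_kind kind_;
  std::string name_;
public:
  execution_resource(resource_kind k, std::string name)
      : kind_(k), name_(std::move(name)) {}
  resource_kind kind() const { return kind_; }  // runtime, not a type
  const std::string& name() const { return name_; }
};
```

Because the kind is a runtime value rather than a type, resources of different kinds can be stored in one container, which is the "store resources generically" point above.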
With the new proposal for buffer properties the optional mutex parameter of the buffer constructor can instead be provided as a property. This will reduce the number of buffer constructors and allow the mutex to be used in combination with other constructors not possible before.
One comment that was made in Belfast was that the naming of the properties reflects an older revision of OpenMP, so one of the first things I propose for this paper is to update the naming to that of OpenMP 5.0.
This would mean:
bulk_execution_affinity.none (remains the same)
bulk_execution_affinity.scatter -> bulk_execution_affinity.spread
bulk_execution_affinity.compact -> bulk_execution_affinity.close
bulk_execution_affinity.balanced (remains the same)
I also wanted to clarify the meaning of the concurrency property in P1436r2, particularly as I think this could be relevant to the wording of the bulk_execution_affinity properties. The intention is that it represents the maximum number of concurrent execution agents available to an executor when used in a single invocation of execution::bulk_execute. This does not guarantee that these execution agents will always be created with concurrent forward progress, and it also assumes that the execution resources are uncontested by other executors or third-party libraries. One concern with this definition that we may want to address is that it does not allow any control over the domain or level of the hierarchy it is applied to, so you cannot use this property for nested calls to execution::bulk_execute with different affinity bindings as you would in, say, OpenMP; this is perhaps something we want to address.
For the wording of the bulk_execution_affinity properties, I have drafted initial wording based on the discussions in Belfast (I hope I accurately captured the direction we were going in). The basis of this wording is the assumption that an invocation of execution::bulk_execute(e, f, s) creates a consecutive sequence of work-items from 0 to s-1, mapped to the available concurrency, that is some number of execution resources, which are subdivided in some implementation-defined way.
Property | Wording |
---|---|
bulk_execution_affinity.none | A call to execution::bulk_execute(e, f, s) is not required to bind the created execution agents for the work-items of the iteration space specified by s to execution resources. |
bulk_execution_affinity.close | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance between adjacent work-items is minimized, only binding subsequent execution agents to a resource if no other resource would otherwise result in fewer execution agents being bound to it. |
bulk_execution_affinity.spread | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance of adjacent work-items in the same subdivision of the available concurrency is maximized and the average locality distance of adjacent work-items in different subdivisions of the available concurrency is maximized, only binding subsequent execution agents to a resource if no other resource would otherwise result in fewer execution agents being bound to it. |
bulk_execution_affinity.balanced | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance of adjacent work-items in the same subdivision of the available concurrency is minimized and the average locality distance of adjacent work-items in different subdivisions of the available concurrency is maximized, only binding subsequent execution agents to a resource if no other resource would otherwise result in fewer execution agents being bound to it. |
Note: the subdivision of the available concurrency is implementation-defined.
Note: when the number of work-items is greater than the available concurrency, the binding should wrap following the same subdivision.
We may want to reconsider the terms "concurrency" and "locality distance" in the above wording; another suggestion during the SG1 session was to incorporate the idea of "interference", as used in the existing hardware_[constructive|destructive]_interference queries.
Additionally, the current behaviour is that when the number of work-items is greater than the available concurrency the binding wraps; however, we may wish to define further properties for alternative chunking patterns.
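To sanity-check the wording, the close and spread patterns (with wrap-around) can be modelled as simple index mappings. This is my own sketch under the assumption that the available concurrency is subdivided into equal contiguous chunks; it is not proposal text:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the binding patterns: map each work-item index to a
// resource index. For "close", adjacent work-items land on adjacent
// resources; for "spread", adjacent work-items land in different
// subdivisions. Items beyond the available concurrency wrap over the same
// pattern.
std::vector<std::size_t> bind_close(std::size_t items, std::size_t resources) {
  std::vector<std::size_t> map(items);
  for (std::size_t i = 0; i < items; ++i)
    map[i] = i % resources;  // adjacent items on adjacent resources, wrapping
  return map;
}

std::vector<std::size_t> bind_spread(std::size_t items, std::size_t resources,
                                     std::size_t subdivisions) {
  std::vector<std::size_t> map(items);
  const std::size_t per_sub = resources / subdivisions;  // assumed equal chunks
  for (std::size_t i = 0; i < items; ++i) {
    const std::size_t j = i % resources;  // wrap over the same subdivision
    // Adjacent items alternate between subdivisions, then advance within one.
    map[i] = (j % subdivisions) * per_sub + (j / subdivisions);
  }
  return map;
}
```

For 4 resources in 2 subdivisions ({0,1} and {2,3}), spread maps work-items 0..3 to resources 0, 2, 1, 3, so adjacent items fall in different subdivisions, while close maps them to 0, 1, 2, 3.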
This proposed wording was also sent to the SG1 mailing list to start a discussion there.
I feel we make that switch rather fast, without an overall high-level design description. Or maybe it is scattered through the Proposed Wording.
I didn't find that part in the paper, are we not posting that part for review?
It was suggested at the Rapperswil meeting that we should consider alternatives to having the affinity_query comparison operators return a size_t which describes the magnitude of the relative affinity, such as having the comparison operators return a boolean.
Update the contributor list for D0796r2.
We decided that the polymorphic allocator does fit the requirements we have for affinity-based allocation, so we should drop the pmr_memory_resource_type from the paper and just leave the allocator_type.
Whilst adding the bulk_execution_affinity properties to the paper, there was some discussion about who should be responsible for specifying the properties: the execution context, the executor, or both.
(taken from #48):
Ruyk:
Let me see if I understand this properly. Assuming this is a simple fixed size thread pool of size 4 underneath:
The thread pool itself is created and "maintained" by the execution context. That means the threads are created on construction of the execution context. At this point, threads are bound to whatever execution resource(s) have been indicated on construction of the executionContext, if any.
Then we perform a require for a bulk executor following a scatter policy. What do you expect to happen?
A. The threads of the thread pool are re-bound following the scatter placement of threads per resource
B. The new affExec will enforce placement of execution agents on the thread pool threads following the scatter policy.
C. Neither of the above
My understanding is B from the proposal.
However, if what I want to do is place the actual execution threads following scatter policy on the given cores, I would need to pass the policy on construction of the execution context, rather than the executor (since the thread pool has been created already). I could potentially re-bind threads after they have been created, but that has a cost that could be avoided if the initial placement of resources of the thread pool is done on construction. Can we have an alternative constructor where these policies are passed to the execution context? If we are having a fine-grain selection of member_of resources, this will allow for the high-level interface to be used in that case.
Now we call bulk_execute with the callable. How is each execution agent placed onto an execution resource? If the executor property is thread_execution_mapping_t or other_execution_mapping_t, that means the executor will query the placement (presumably an id) and place agents on existing threads. Which one is a valid placement?
A. Agent 0 in Thread 0, Agent 1 in Thread 2, Agent 2 in Thread 1, Agent 3 in Thread 3, then loop over again for the remaining 4 agents
B. Agent 0 in Thread 0, Agent 1 in Thread 3, Agent 2 in Thread 1, Agent 3 in Thread 3, then loop over again for the remaining 4 agents
C. Either of the above
My understanding is A will be correct, or at least commonly expected.
However, when oversubscribing agents on threads, it is not clear what to expect when using these policies. Will the executor hold execution of an agent until the "far away" resource is available, or will it execute consecutively if possible?
Also note that the same execution context (in this case, the same thread pool) may be in use by multiple executors. What is the expectation in terms of placement of agents on threads?
AerialMantis:
The way I see this working is that you might construct an execution context, say the static_thread_pool, and initialise it with a set number of threads of execution, such as 4. By default, the bulk_execution_affinity property would be none, so the thread pool wouldn't be required to make any kind of guarantee as to how its threads of execution are bound to the underlying resources. It could automatically bind each of its threads of execution to a certain resource if it makes sense for it to do so, but it wouldn't have to. It would only be when a different bulk_execution_affinity property was requested by an executor that it would be required to perform binding in a particular pattern. So when this happens, if the thread pool had already performed binding, then it may have to rebind to achieve the binding pattern requested by the executor.
The reason for having these properties on the executor rather than the execution context is to make them more usable as a high-level feature. There are some executor implementations which will not make use of an execution context, such as inline executors, which are just created as concrete executor types and used without referring to an explicit execution context; an execution context still exists, though it's implicit.
I think we should have a more fine-grained interface for configuring execution contexts, where you can specify a specific affinity binding on construction. In this case, the executor which the execution context provides would only support the bulk_execution_affinity property which the execution context was constructed with, so it could not be requested to use another, essentially becoming query-only. So essentially this would mean that you could specify affinity binding at the execution context level or the executor level, but specifying it at the execution context level would take precedence over the executor level, as the executor would inherit the property.
In terms of how the affinity binding is implemented within bulk_execute, I would also agree that A would be the most commonly expected behaviour, though I think it would be okay to allow an implementation to decide, provided it matched the requirements of the pattern and was consistent across invocations of bulk_execute. For oversubscribed agents I would expect multiple agents to be mapped to the same thread of execution, so only one agent could make progress at a time. How those agents are ordered would be at the discretion of the implementation, though implementations would be expected to order agents in the way that is most efficient for the requested binding pattern.
You raise an interesting point about how an execution context will deal with multiple executors submitting work to it. In terms of affinity binding patterns, I would expect that if one executor requested one bulk_execution_affinity property and another executor requested a different one, then one task would have to be scheduled before the other, with the affinity binding pattern being altered between tasks. We would recommend users not do this frequently, as repeatedly rebinding the threads of execution could be inefficient. If both tasks used the same affinity binding pattern, then they could conceivably be overlaid, though whether this could be done efficiently would be at the discretion of the implementation.
Ruyk:
So essentially this would mean that you could specify affinity binding at the execution context level or the executor level, but specifying it at the execution context level would take precedence over the executor level as the executor would inherit the property.
I guess the opposite would make more sense: the one closer to the user should make the final decision.
My concern is that associating the binding with the executor may incur extra costs, since executor dispatch may be called in a performance-critical part of the code, whereas the execution context can be defined anywhere.
Apart from that, what you say makes sense to me. Since this is an exploratory paper, I am happy for this to be merged with the minor change above, plus a straw poll for deciding where the affinity should be specified (execution context, executor or both).
AerialMantis:
Yeah, that's a good point; perhaps what we want, then, is for the executor to be able to override the behaviour of the execution context. The property of the execution context could be inherited by the executor as the default behaviour, but could still be altered by the executor.
I think that's a fair concern: putting control over affinity binding at the executor level could incur costs at the wrong time. However, if we also add the ability to configure the execution context with the same property, this should alleviate that, as the cost would be paid at configuration time.
Okay great, I will add some notes to the paper covering some of the points we discussed here and I will add a straw poll for where the affinity binding capability should go.
It was raised at the Rapperswil meeting that the proposal for mdspan (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0009r7.html) also provides mechanisms for specifying how memory is laid out and that we should ensure that the affinity proposal aligns with this.
There was feedback from Jacksonville that it would be useful to be able to query the load factor of an execution context, so as to make a decision based on the current load of different contexts.
I wonder if we need a memory_context, similar to the execution_context, that has the allocation capabilities, e.g. you retrieve an allocator that is bound to the memory resource. This would allow an implementation to add the machinery required to allocate, or to bind allocation, on the memory_context itself.
In the current revision of the paper (r3), the discover_topology function is permitted to throw an exception in the case of a failure in discovering the system's topology. However, this could be problematic, as the failure could prevent a library dependent on this discovery from functioning, even if the failure had nothing to do with the resources the library was looking to utilise.
A solution could be to support partial failure in topology discovery: calling discover_topology would be permitted to fail, yet still return a valid topology structure representing whatever was discovered successfully. The way in which these errors are reported (i.e. exceptions or error values) would have to be decided; exceptions could be problematic, as they could unwind the stack before the important topology information has been captured.
Affinity has an affinity_query between two execution resources. We recommend this be between an execution resource and a memory resource.
In P0796r2, affinity queries are performed by using the comparison operators > and < between two affinity_query objects, returning an expected<size_t, error_type> representing the magnitude of the difference between the two properties. However, some feedback has pointed out that this approach is problematic, as it means the <= and >= operators cannot be supported consistently.
We should consider alternative approaches which allow for a more typical use of the equality operators but still meet the existing requirements.
Adding a memset operation to SYCL by building on clEnqueueFillBuffer: https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clEnqueueFillBuffer.html