EDIT: 2020-02-12 Expanded the introduction section with more details explaining the existing problems in non-structural async.
Abstract
This is a proposal to change the semantics of `async` procs in a way that enforces more structured control flow. The goal of the new APIs is to force you to await your async operations, while still allowing you to easily execute multiple operations in parallel. The proposal eliminates a large category of usage errors possible with the old APIs and enables additional optimisations, such as storing `Future[T]` results on the stack and creating async procs that consume stack-based `openarray[T]` inputs and `var` parameters.
For a more comprehensive rationale for enforcing the structured control flow proposed here, please read the following article:
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
Acknowledgements: some of the key ideas in this proposal were first suggested by @yglukhov in private conversations.
Current problems:
P1) All non-ref input parameters to async procs must be copied
Consider the following async proc:
```nim
proc checkBrokenLinks(uris: seq[Uri]): Future[seq[bool]] {.async.} =
  ## Tests all supplied URLs in parallel and returns
  ## whether they are still reachable or not.
  ...
```
If this wasn't an async proc, Nim would pass the supplied input sequence as a read-only reference (please note that I'm using C++ terminology here). This relies on the fact that the lifetime of the sequence at the call-site will surely extend until the moment the function delivers its result.
Unfortunately, in the async world this is no longer the case. The caller of `checkBrokenLinks` is free to use it like this:
```nim
proc brokenCode: Future[seq[bool]] =
  var uris = @[
    Uri.init("git://github.com/status-im/nim-chronos"),
    Uri.init("https://status.team")
  ]

  let brokenLinksFut = checkBrokenLinks(uris)
  ...
  return brokenLinksFut
```
If the `uris` sequence was passed by reference, the async proc might be resumed after `brokenCode` returns, which may result in accessing the now-dangling reference. To avoid this problem, the Nim compiler makes sure to copy all input parameters of the async proc into corresponding fields of the "environment" object associated with the async proc's closure iterator. This copying may be quite expensive for value types such as `string` and `seq`, so users are advised to avoid such types in async procs and to prefer `ref` parameters, where only a pointer must be copied.
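To make the copying concrete, here is a rough sketch of the hidden environment object the compiler conceptually generates for `checkBrokenLinks` (the type and field names are illustrative, not actual compiler output):

```nim
# Illustrative only: the closure iterator's environment keeps
# full copies of all value-type parameters, so the async proc
# can be resumed safely after the caller's frame is gone.
type
  CheckBrokenLinksEnv = ref object
    uris: seq[Uri]     # deep copy of the caller's sequence
    state: int         # resume point of the closure iterator
    result: seq[bool]  # eventual result of the future
```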
P2) `var` and `openarray` parameters are not supported

As a corollary of the previous problem, it becomes impossible to use `var` and `openarray` parameters with async procs, because these require the input data to be passed by reference.
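For example, neither of the following signatures can be used with `{.async.}` today, because `var` and `openarray` parameters are implemented as pointers into the caller's frame (the procs themselves are hypothetical, for illustration only):

```nim
# Rejected/unsafe under the current copying scheme:
proc readInto(s: AsyncSocket, buf: var array[1024, byte]) {.async.} =
  ...  # `buf` would dangle once the caller's frame is gone

proc sendAll(s: AsyncSocket, data: openarray[byte]) {.async.} =
  ...  # `data` is a (pointer, len) pair into the caller's stack
```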
P3) The async/await syntax has easily accessible degrees of freedom that may be dangerous for novice users
Consider the following simple async proc:
```nim
proc terminateConnection(s: AsyncSocket) {.async.} =
  var myDisconnectMsg: array[X, byte]
  prepareDisconnect(myDisconnectMsg)
  var res = s.send(addr myDisconnectMsg[0], X) # oops, forgot to call await here
  s.close()
```
It showcases two critical problems triggered by a simple omission of `await` in the code:

- The socket will be closed prematurely.
- The `send` operation will be working with bogus data.
This proposal argues that the default behavior should be completely safe and impossible to misuse, while the more advanced concerns such as enabling parallel execution could be handled with a more specialized syntax.
The Proposed Solution:
We create a new set of APIs that hide the explicit use of `Future` values in the user code and enforce awaiting of all async operations. If all operations are awaited, it becomes possible to store the inputs of said operations in the "pseudo stack" associated with the async proc, which in turn enables the use of reference types such as `lent`, `var` and `openarray`, providing much better safety than the current `pointer`/`len` inputs.
So, here is the full set of new APIs:
1. Allow `await` to be given multiple arguments or a tuple
```nim
proc httpRequest(url: string): Future[HttpResult]
proc jsonRpcCall(url: string): Future[JsonNode]

proc foo {.async.} =
  var keyword = "Status"
  var (googlePage, jsonData) = await(httpRequest(&"http://google.com/?q={keyword}"),
                                     jsonRpcCall("localhost/myApi"))
  echo "HTTP RESPONSE ", googlePage.status, "\n", googlePage.body
  echo "JSON RESPONSE\n", jsonData{"result"}
```
This form of `await` just performs the I/O operations in parallel, returning a tuple of the final results. It is similar to using `var r1 = request(); var r2 = request(); await all(r1, r2)` in the current design. For convenience, `await (foo, bar)` is considered the same as `await(foo, bar)`.
2. Introduce a new `select` API (EDIT: this point is made partially obsolete by point 4)
`select` is a new API that is given a number of I/O operations to be started in parallel. The key difference from `await` is that the handler installed for each operation is executed as soon as that operation's result is ready. Control-flow keywords such as `return` and `break` can be used to cancel some of the outstanding operations:
```nim
proc foo {.async.} =
  var keyword = "Status"
  var timedOut = false

  select:
    httpRequest(&"http://google.com/?q={keyword}") as response:
      # executed as soon as the response is ready
      echo "HTTP RESPONSE ", response.status, "\n", response.body

    jsonRpcCall("localhost/myApi") as jsonData:
      echo jsonData{"result"}
      return # returns from the current proc; skips the other handlers

    *timeout(100):
      # `timeout` is the same as `sleepAsync`
      timedOut = true
      break # continues after the select; skips the other handlers

  echo "async ops ", if timedOut: "timed out" else: "finished"
```
Execution continues after the `select` block once all of the handlers have been executed, although there must be a way to mark some of them as optional (here, I've used `*` for this). The named results are considered in scope after the `select` statement. You can also choose to only name a particular result without providing a handling block.
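A minimal sketch of the "name only" form under the proposed semantics:

```nim
# Sketch of the proposed API: `response` gets only a name, no handler block.
select:
  httpRequest("http://google.com/?q=Status") as response
  jsonRpcCall("localhost/myApi") as jsonData:
    echo jsonData{"result"}

# Both named results remain in scope after the select:
echo response.status
```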
3. Introduce a new `safeasync` pragma (EDIT: this may well be the default mode)
The `safeasync` pragma is responsible for inserting the `await` keyword automatically. It also takes over the role of the current `multisync` pragma, in the sense that it allows you to compile the same code for both sync and async usage:
```nim
proc foo: bool {.safeasync.} =
  var keyword = "Status"
  # Notice how I don't need to use await anymore
  var (googlePage, jsonData) = (httpRequest(&"http://google.com/?q={keyword}"),
                                jsonRpcCall("localhost/myApi"))
  return googlePage.status == 200 and not jsonData.hasKey("error")
```
How does this work? The pragma inserts a call to a template called `implicitAwait` around each expression within the proc's body. `implicitAwait` is defined as the identity for all non-future types and as a regular `await` statement for all futures:

```nim
template implicitAwait(x: auto): auto = x
template implicitAwait(x: Future): auto = await x
```
Please note that the body of a `safeasync` proc will also work in synchronous mode, by executing each operation in turn. It's also possible to compile the code for implicit off-loading to a background thread pool in programs that don't feature an asynchronous event loop.
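For illustration, after the rewrite the tuple assignment in the `foo` example above would behave as if written like this (a sketch of the expansion, not literal compiler output):

```nim
# Effective expansion: every expression is wrapped in implicitAwait,
# so the two Future results are awaited while everything else
# passes through unchanged.
var (googlePage, jsonData) = (implicitAwait(httpRequest(&"http://google.com/?q={keyword}")),
                              implicitAwait(jsonRpcCall("localhost/myApi")))
```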
Addendum 3.A
Please note that the `await` statement may still be supported inside `safeasync` procs; one may use it to improve code clarity. It's also possible to implement `safeasync` in an alternative way that requires the use of `await` and signals any omission as an error, but the arguments for this are not very strong - in any code there may be significant differences between operations that are algorithmically cheaper or heavier, and it's usually the names of the operations that reveal where the I/O waits will happen.
4. Support async operations in `parallel` blocks
I'm extending the proposal to also enhance Nim's `parallel` construct with additional support for async operations. This can replace the need for a separate `select` API, although `select` could still exist as a simple high-level helper. The new features, applying within `parallel` blocks, are the following:
4.1) Allow `spawn` to be followed by a `do` block that will be executed with the result of the operation, once it's complete.
4.2) Allow `spawn` to be used with procs returning `Future[T]` results. `spawn` immediately starts the async operation and adds the `Future` to a list of tasks that are awaited just before the exit of the parallel block. This enforces structured handling of the async operations, but one can still work with the returned futures in the familiar fashion - passing them to helper procs, setting up callbacks and so on. It is guaranteed that the callbacks will be executed in the same thread that entered the parallel block.
4.3) Add a new call called `spawnOptional` that launches non-critical parallel operations. If the `parallel` block is able to complete before all such operations have finished, they are simply cancelled.
4.4) Support `break` and `return` in parallel blocks by cancelling all outstanding operations.
With such an API, the `select` example above becomes:
```nim
proc foo {.async.} =
  var keyword = "Status"
  var timedOut = false

  parallel:
    spawn httpRequest(&"http://google.com/?q={keyword}") do (response):
      # executed as soon as the response is ready
      echo "HTTP RESPONSE ", response.status, "\n", response.body

    spawn jsonRpcCall("localhost/myApi") do (jsonData):
      echo jsonData{"result"}
      return # returns from the current proc; skips the other handlers

    spawnOptional timeout(100) do:
      # `timeout` is the same as `sleepAsync`
      timedOut = true
      break # continues after the parallel block; skips the other handlers

  echo "async ops ", if timedOut: "timed out" else: "finished"
```
Please note that such a `parallel` block is more powerful than the `select` construct, because it enables you to add multiple tasks to be awaited from a loop.
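For example, a dynamic number of requests could be spawned from a loop, something `select` cannot express (a sketch of the proposed API, reusing the `httpRequest` proc from above):

```nim
proc checkAll(uris: seq[Uri]) {.async.} =
  parallel:
    for uri in uris:
      spawn httpRequest($uri) do (response):
        echo uri, ": ", response.status
  # all spawned futures have been awaited here, at the block's exit
```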
The use of `parallel` blocks and `spawn` comes at a cost: all parameters passed in the `spawn` expression must be copied inside the spawned task. Please note that this precisely matches the behavior of `spawn` when it comes to sending computational tasks to a thread pool.
4.5) Introduce an underlying object representing the "parallel block" and create an accessor for it (e.g. a `thisParallelBlock` magic valid only inside the block). This object will feature operations such as `addThreadJob`, `addAsyncIO` and `addOptionalAsyncIO`. It is the equivalent of the `nursery` object described in the article linked in the abstract. Its goal is to enable the creation of helper libraries that perform something with the parallel block context.
```nim
parallel:
  addJobs(thisParallelBlock)
```
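A helper such as `addJobs` might then be implemented along these lines (`ParallelBlock`, `addAsyncIO` and `addOptionalAsyncIO` are the hypothetical names from 4.5):

```nim
proc addJobs(p: ParallelBlock) =
  # A library proc that contributes tasks to the caller's parallel
  # block; the block awaits them before it exits.
  p.addAsyncIO jsonRpcCall("localhost/myApi")
  p.addOptionalAsyncIO timeout(100)
```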
4.6) Define the exception-handling semantics inside parallel blocks: if an exception is raised by a spawned task inside a parallel block, it will be re-raised in the thread that entered the block. All other spawned tasks are cancelled.
5. Support async tasks in `spawn` outside of `parallel` blocks
This is an escape hatch that will replace the current usages of `asyncCheck` and `traceAsyncErrors`. Semantically, it spawns a new "async thread" of execution. Just like when spawning a regular thread, all parameters passed to the `spawn` expression must be copied or moved into the new thread. The spawned proc must be annotated with `raises: [Defect]`. If it terminates with a `Defect`, the whole process is also terminated.
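A sketch of the intended usage, replacing today's `asyncCheck` (the proc name and endpoint are illustrative):

```nim
proc reportMetrics(endpoint: string) {.async, raises: [Defect].} =
  ...  # fire-and-forget work; any Defect terminates the process

# Outside a parallel block, spawn starts a detached "async thread".
# The `endpoint` argument is copied into the spawned task:
spawn reportMetrics("localhost/metrics")
```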
6. Migration path for the current `async`
A semi-backwards-compatible `async` pragma can be added to serve as a drop-in replacement for the existing one. It will differ in only one way: all started async operations will be added to a list that is awaited at the end of their scope. This is not strictly backwards-compatible, but most existing async code should not be affected by the change in semantics.
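Under the new pragma, a forgotten `await` would thus become harmless (sketch, reusing `httpRequest` from above):

```nim
proc foo {.async.} =
  let fut = httpRequest("https://status.team")  # await omitted
  doSomethingElse()
  # old async: `fut` may still be pending when foo completes
  # new async: `fut` is added to a scope list and awaited right here,
  # at the end of its scope
```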