
A bleeding-edge, lock-free, wait-free, continuation-stealing tasking library built on C++20's coroutines

Home Page: https://conorwilliams.github.io/libfork/

License: Mozilla Public License 2.0

cpp20 coroutines wait-free threadpool scheduler work-stealing lock-free task-graph tasking concurrency

libfork's Introduction

Welcome to libfork 🍴

A bleeding-edge, lock-free, wait-free, continuation-stealing coroutine-tasking library.

**Now supplying 🌵!**

Libfork is primarily an abstraction for fully-portable, strict, fork-join parallelism. This is made possible, without any macros or inline assembly, by C++20's coroutines. Ultra-fine-grained parallelism (the ability to spawn tasks with very low overhead) is enabled by an innovative implementation of an (almost) non-allocating cactus-stack built on segmented stacks. Libfork presents a cross-platform API that decouples scheduling tasks (a customization point) from writing tasks. Additionally, libfork provides performant NUMA-aware work-stealing schedulers for general use. If you'd like to learn more, check out the tour of libfork, then try it on compiler explorer, or just grok the TLDR:

#include "libfork/core.hpp"

inline constexpr auto fib = [](auto fib, int n) -> lf::task<int> { 
  
  if (n < 2) {
    co_return n;
  }

  int a, b;

  co_await lf::fork[&a, fib](n - 1);    // Spawn a child task.
  co_await lf::call[&b, fib](n - 2);    // Execute a child inline.

  co_await lf::join;                    // Wait for children.

  co_return a + b;                      // Safe to read after a join.
};

Performance

Libfork is engineered for performance and has a comprehensive benchmark suite. For a detailed review of libfork on 1-112 cores see the paper. The headline results are linear time/memory scaling, which translates to:

  • Up to 7.5× faster and 19× less memory consumption than OneTBB.
  • Up to 24× faster and 24× less memory consumption than OpenMP (libomp).
  • Up to 100× faster and >100× less memory consumption than Taskflow.

Scheduler overhead

For a quick comparison with other libraries, the average time to spawn/run a task during the recursive Fibonacci benchmark gives a good approximation of the tasking overhead and peak throughput:

[Graph: Fibonacci task benchmark results]

Memory consumption

Libfork is competitive with other libraries in terms of memory consumption; below is the peak (physical) memory allocation during the T3L unbalanced tree search benchmark:

[Graph: peak memory consumption during the T3L unbalanced tree search benchmark]

Using libfork

Libfork is a header-only library with full CMake support and zero required dependencies. Refer to the BUILDING document for full details on compiling the tests/benchmarks/docs, installation, optional dependencies, and tools for developers. See below for the easiest ways to consume libfork in your CMake projects.

Package managers

We recommend consuming libfork via a package manager; this streamlines the management of optional dependencies.

vcpkg

Libfork is available via vcpkg. Add the following to your vcpkg.json:

"dependencies": [
    "libfork"
]

You may then use the library in your project's CMake:

find_package(libfork CONFIG REQUIRED)  

target_link_libraries(
    project_target PRIVATE libfork::libfork
)

NOTE: The libfork port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Conan2

Libfork is available in the Conan Center Index. To install the latest version, run the following (make sure you use a C++20-ready Conan profile):

conan install --requires="libfork/[*]" --build=missing

Or add libfork/[*] to your conanfile recipe requirements.

You may then use the library in your project's CMake:

find_package(libfork CONFIG REQUIRED)  

target_link_libraries(
    project_target PRIVATE libfork::libfork
)

NOTE: The libfork recipe in Conan is kept up to date by Conan team members and community contributors. If the version is out of date, please create an issue or pull request on the Conan repository.

With CMake

If you have installed libfork from source, following the BUILDING document, then you can use the following CMake code to consume libfork in your project:

find_package(libfork CONFIG REQUIRED)

target_link_libraries(
    project_target PRIVATE libfork::libfork
)

Using CMake's FetchContent

If you have not installed libfork you may use CMake's FetchContent to download and build libfork as part of your project:

include(FetchContent)

FetchContent_Declare(
    libfork
    GIT_REPOSITORY https://github.com/conorwilliams/libfork.git
    GIT_TAG v3.7.2
    GIT_SHALLOW TRUE
)

FetchContent_MakeAvailable(libfork)

target_link_libraries(
    project_target PRIVATE libfork::libfork
)

Using git submodules

You can incorporate libfork into your project as a git submodule. In this case, assuming you cloned libfork as a submodule into "external/libfork":

add_subdirectory(external/libfork)

target_link_libraries(
    project_target PRIVATE libfork::libfork
)

Single header

Although this is not recommended and primarily exists for easy integration with godbolt, libfork supplies a single header that you can copy and paste into your project. See the BUILDING document's note about hwloc integration and compiler flags.

API reference

See the generated docs.

Contributing

  1. Read the CODE_OF_CONDUCT document.
  2. Read the BUILDING document.
  3. Have a snoop around the impl namespace.
  4. Ask as many questions as you can think of!

Changelog

See the ChangeLog document.

A tour of libfork

This section provides some background and highlights of the core API. For details on implementing your own schedulers on top of libfork, see the extension documentation. Don't forget you can play around with libfork on godbolt.


Fork-join

Definitions:

  • Task: A unit of work that can be executed concurrently with other tasks.
  • Parent: A task that spawns other tasks.
  • Child: A task that is spawned by another task.

The tasking/fork-join interface is designed to mirror Cilk and other fork-join frameworks. The best way to learn is by example, so let's start with the canonical introduction to fork-join: the recursive Fibonacci function. In regular C++ it looks like this:

auto fib(int n) -> int {
  
  if (n < 2) {
    return n;
  }

  int a = fib(n - 1);
  int b = fib(n - 2);

  return a + b;
}

We've already seen how to implement this with libfork in the TLDR, but here it is again with line numbers:

 1| #include "libfork/core.hpp"
 2|
 3| inline constexpr auto fib = [](auto fib, int n) -> lf::task<int> { 
 4|  
 5|   if (n < 2) {
 6|     co_return n;
 7|   }
 8|
 9|   int a, b;
10|
11|   co_await lf::fork[&a, fib](n - 1);    
12|   co_await lf::call[&b, fib](n - 2);  
13|
14|   co_await lf::join;                  
15|
16|   co_return a + b;                    
17| };

NOTE: If your compiler does not support the lf::fork[&a, fib] syntax then you can use lf::fork(&a, fib) and similarly for lf::call.
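
For example, the fork and call from lines 11 and 12 written with the parenthesis syntax look like this:

co_await lf::fork(&a, fib)(n - 1);    // Equivalent to lf::fork[&a, fib](n - 1).
co_await lf::call(&b, fib)(n - 2);    // Equivalent to lf::call[&b, fib](n - 2).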

This looks almost like the regular recursive Fibonacci function. However, there are some important differences, which we'll explain in a moment. First, the above Fibonacci function can be launched on a scheduler, like lazy_pool, as follows:

#include "libfork/schedule.hpp"

int main() {
  
  lf::lazy_pool pool(4); // 4 worker threads

  int fib_10 = lf::sync_wait(pool, fib, 10);
}

The call to sync_wait will block the main thread (i.e. the thread that calls main()) until the pool has completed execution of the task. Let's now break down the fib function, line by line:

  • Line 3: First we define an async function. An async function is a function-object with a templated first argument that returns an lf::task<T>. The first argument is used by the library to pass static and dynamic context from parent to child. Additionally, it acts as a y-combinator - allowing the lambda to be recursive - and provides a few methods which we will discuss later.
  • Line 9: Next we construct the variables that will be bound to the return values of following forks/calls.
  • Line 11: This is the first call to lf::fork, which marks the beginning of an async scope. lf::fork[&a, fib] binds the return address of the function fib to the integer a. Internally the child coroutine will have to store a pointer to the return variable, so we make this explicit at the call site. The bound function is then invoked with the argument n - 1. The semantics of all of this are: the execution of the forked function (in this case fib) can continue concurrently with the execution of the next line of code, i.e. the continuation. As libfork is a continuation-stealing library, the worker/thread that performed the fork will immediately begin executing the forked function while another thread may steal the continuation.
  • Line 12: An lf::call binds arguments and a return address in the same way as lf::fork; however, it has the semantics of a serial function call. This is used instead of an lf::fork because there is no further work to do in the current task, so stealing it would be a waste of resources.
  • Line 14: Execution cannot continue past a join-point until all child tasks have completed. After this point it is safe to access the results (a and b) of the child tasks. Only a single worker will continue execution after the join. This marks the end of the async scope that began at the fork.
  • Line 16: Finally, we return the result of the computation to the parent task. This has similar semantics to a regular return; however, behind the scenes the return value is assigned to the parent's return address. This is the end of the async function. The worker will attempt to resume the parent task (if it has not already been stolen) just as a regular function would resume execution of the caller.

NOTE: At every co_await the OS-thread executing the task may change!

NOTE: Libfork implements strict fork-join, which means all children must be joined before a task returns. This restriction gives some nice mathematical properties to the underlying directed acyclic graph (DAG) of tasks, which enables many optimizations.

Ignoring a result

If you want to ignore the result of a fork/call (i.e. if you want the side effect only) you can simply omit the return address from lines 11 and 12, e.g.:

co_await lf::fork[fib](n - 1);    
co_await lf::call[fib](n - 2); 

The cactus-stack

Normally each call to a coroutine would allocate on the heap. However, libfork implements a cactus-stack - supported by segmented stacks - which allows each coroutine to be allocated on a fragment of linear stack; this has almost the same overhead as allocating on the real stack. This means the overhead of a fork/call in libfork is very low compared to most traditional library-based implementations (about 10x the overhead of a bare function call).

The internal cactus-stack is exposed to the user via the co_new function:

inline constexpr auto co_new_demo = [](auto co_new_demo, std::span<int> inputs) -> lf::task<int> {
 
  // Allocate space for results, outputs is a std::span<int>
  auto [outputs] = co_await lf::co_new<int>(inputs.size());

  // Launch a task for each input.
  for(std::size_t i = 0; i < inputs.size(); ++i) {
    co_await lf::fork[&outputs[i], some_function](inputs[i]);
  }

  co_await lf::join; // Wait for all tasks to complete.

  co_return std::accumulate(outputs.begin(), outputs.end(), 0);
};

Here the co_await on the result of lf::co_new returns an immovable RAII class which will manage the lifetime of the allocation.

Restrictions on references

References as inputs to coroutines can be error-prone, for example:

co_await lf::fork[process_string](std::string("32")); 

This would dangle if process_string accepted its argument by reference. Specifically, a process_string accepting std::string & would not compile (a non-const lvalue reference cannot bind to a temporary), while std::string const & and std::string && would compile but dangle. To avoid this, libfork bans std::string && -> std::string const & conversions and r-value reference arguments for forked async-functions. If you want to move a value into a forked coroutine then pass it by value, as shown below.
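
As a minimal sketch (process_string here remains an illustrative placeholder, as above): if process_string takes its argument by value, the temporary is moved into the coroutine frame and cannot dangle.

// Illustrative async function that takes its argument by value.
inline constexpr auto process_string = [](auto, std::string str) -> lf::task<> {
  // str is owned by this coroutine frame, so it cannot dangle.
  co_return;
};

co_await lf::fork[process_string](std::string("32"));  // OK: the temporary is moved in.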

Note: You can still dangle by ending the lifetime of an l-value referenced object after a fork e.g.:

{
  int x;

  co_await lf::fork[&x, some_function]();

} // Lifetime of x ends here, return address is now dangling!

co_await lf::join; 
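
One way to avoid this is to join before the referenced object's lifetime ends, for example:

{
  int x;

  co_await lf::fork[&x, some_function]();

  co_await lf::join; // x is still alive here, so writing the result into it is safe.

} // Lifetime of x ends only after the join.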

Delaying construction

Some types are expensive or impossible to default-construct; for these cases libfork provides the lf::eventually template type. lf::eventually functions like a std::optional that is only constructed once and additionally supports references:

// Not default constructible.
struct difficult {
  difficult(int) {}
};

// Async function that returns a difficult.
inline constexpr auto make_difficult = [](auto) -> lf::task<difficult> {
  co_return 42;
};

// Async function that returns a reference.
inline constexpr auto reference = [](auto) -> lf::task<int &> {
  co_return /* some reference */;
};

inline constexpr auto eventually_demo = [](auto) -> lf::task<> { 

  // Use lf::eventually to delay construction.
  lf::eventually<difficult> a;
  lf::eventually<int &> b;
  
  co_await lf::fork[&a, make_difficult]();    
  co_await lf::fork[&b, reference]();

  co_await lf::join;     

  std::cout << *b << std::endl; // lf::eventually supports operators * and ->
};

Exceptions

Libfork supports exceptions in async functions. If an exception escapes an async function then it will be stored in its parent and re-thrown when the parent reaches a join-point. For example:

inline constexpr auto exception_demo = [](auto) -> lf::task<> { 
  
  co_await lf::fork[throwing_work](/* args.. */);    
  co_await lf::fork[throwing_work](/* args.. */);   

  co_await lf::join; // Will (re)throw one of the exceptions from the children.                                    
};

However, you need to be very careful when throwing exceptions inside a fork-join scope because it is UB for a task which has forked children to return (regularly or by exception) without first calling lf::join. For example:

inline constexpr auto bad_code = [](auto) -> lf::task<> { 
  
  co_await lf::fork[work](/* args.. */);    

  function_which_could_throw();  // UB on exception! No join before return.

  co_await lf::join;                                       
};

Instead, you must wrap your potentially-throwing code in a try-catch block, stash the exception, and then call lf::join:

inline constexpr auto good_code = [](auto good_code) -> lf::task<> { 
  
  co_await lf::fork[work](/* args.. */);    

  try {
    function_which_could_throw(); 
  } catch (...) {
    good_code.stash_exception(); // Stores the exception.
  } 

  co_await lf::join; // Exception from child or stashed-exception will be re-thrown here.                             
};

If/when C++ adds asynchronous RAII then this will be made much cleaner.

If you would like to capture exceptions for each child individually then you can use a return object that supports capturing exceptions, for example:

inline constexpr auto exception_stash_demo = [](auto) -> lf::task<> { 
  
  lf::try_eventually<int> ret;
  
  co_await lf::fork[&ret, int_or_throw](/* args.. */);    

  co_await lf::join; // Will not throw, exception stored in ret.

  if (ret.has_exception()) {
    // Handle exception.
  } else {
    // Handle result.
  }                            
};

Any return pointer which satisfies the stash_exception_in_return concept will trigger libfork to store the exception in the return object. This concept is specified as follows:

template <typename I>
concept stash_exception_in_return = lf::quasi_pointer<I> && requires (I ptr) {
  { stash_exception(*ptr) } noexcept;
};

Note: the call to stash_exception must be noexcept.
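
As a rough sketch only (not part of the library), a user-defined return object satisfying this concept could look like the following; this assumes, as described earlier, that libfork writes a child's result by assigning through the return pointer, and that a plain pointer models lf::quasi_pointer:

#include <exception>

// Illustrative return object that stores either a value or an exception.
struct my_result {
  int value = 0;
  std::exception_ptr error = nullptr;

  // The child's return value is written by assignment through the return pointer.
  auto operator=(int v) -> my_result & {
    value = v;
    return *this;
  }
};

// Found via ADL; must be noexcept to satisfy stash_exception_in_return.
inline void stash_exception(my_result &r) noexcept {
  r.error = std::current_exception();
}

A my_result could then be bound as the return address of a fork in the same way as lf::try_eventually above.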

Immediate invocation

Sometimes you may want to just call an async function without a fork-join scope, for example:

int result;

co_await lf::call[&result, some_function](/* args.. */);

co_await lf::join; // Still needed in case of exceptions.

In this case you could simplify the above with lf::just:

int result = co_await lf::just[some_function](/* args.. */);
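
Since co_await is an expression, the result of lf::just can also be used inline; a small sketch reusing the fib function from above:

inline constexpr auto double_fib = [](auto, int n) -> lf::task<int> {
  // No fork-join scope needed for a single immediate invocation.
  co_return 2 * co_await lf::just[fib](n);
};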

Explicit scheduling

Normally in libfork, where a task is executed is controlled by the runtime. However, you may want to explicitly schedule a task to resume on a certain worker, or write an awaitable that transfers execution to a different pool of workers. This is made possible through the context_switcher API: instead of writing a regular awaitable, write one that conforms to the context_switcher concept, like this:

struct my_special_awaitable {
  auto await_ready() -> bool;
  auto await_suspend(lf::submit_handle handle) -> void;
  auto await_resume() -> /* [T] */;
};

This can be co_awaited inside a libfork task. If await_ready returns false then the task will be suspended and await_suspend will be called with a handle to the suspended task; this handle can be resumed by any worker you like.

This is used by libfork's template<scheduler T> auto resume_on(T *) to enable explicit scheduling.
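
For example, a task can hop to another pool by co_awaiting the awaitable returned by resume_on; a minimal sketch, assuming lf::lazy_pool models the scheduler concept and that resume_on lives in the lf namespace:

inline constexpr auto hop_demo = [](auto, lf::lazy_pool *other) -> lf::task<> {
  // ...work on whichever pool launched this task...

  co_await lf::resume_on(other); // Transfer execution to a worker of *other.

  // ...continue on a worker of *other...
  co_return;
};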

Contexts and schedulers

We have already encountered a scheduler in the fork-join section; however, we have not yet discussed what a scheduler is or how to implement one. A scheduler is a type that conforms to the lf::scheduler concept. This is a customization point that allows you to implement your own scheduling strategy and makes a type suitable for use with lf::sync_wait. Have a look at the extensions API for further details.

Three schedulers are provided by libfork:

  • lf::lazy_pool A NUMA-aware work-stealing scheduler that is suitable for general use. This should be the default choice for most applications.
  • lf::busy_pool Also a NUMA-aware work-stealing scheduler; however, workers busy-wait for work instead of sleeping. This often gains very little performance over lf::lazy_pool and should only be preferred if you have an otherwise-idle machine and are willing to trade a lot of power consumption for a very small performance gain.
  • lf::unit_pool A single-threaded scheduler that is suitable for testing and debugging.

NOTE: The workers inside libfork's thread pools should never block, i.e. do not call sync_wait or any other blocking function inside a task.
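
For example, a unit test can run the same code deterministically on a single thread with lf::unit_pool (a minimal sketch, assuming unit_pool is default-constructible and reusing the fib function from the TLDR):

#include "libfork/schedule.hpp"

int main() {
  lf::unit_pool pool; // One worker; handy for tests and debugging.

  int fib_10 = lf::sync_wait(pool, fib, 10);
}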

libfork's People

Contributors

abrilrbs, conorwilliams, imgbotapp, martymcflyinthesky, tzcnt


libfork's Issues

Benchmarks on c7g.16xlarge

I managed to get the benchmark suite to run on an AWS c7g.16xlarge instance running Ubuntu 22.04, built with Clang 15 (with Release build type), with the machine otherwise quiet. Benchmarks were done on 8dd4984 (current tip of tree).

See the attached log: results.txt

Findhwloc CMake script not working

Your findhwloc CMake script doesn't work in my environment. I get this output:

[cmake] -- Could NOT find HwlocLibfork (missing: HwlocLibfork_INCLUDE_DIRS) (found suitable version "2.9.0", minimum required is "2.5.0")
[cmake] -- Found hwloc 2.9.0 in :hwloc
[cmake] CMake Warning at CMakeLists.txt:87 (message):
[cmake]   HWLOC not found, NUMA support disabled!

It seems to be failing to find the include file. I'm using Debian 12. My hwloc is installed as a system package, and the include file is in the most usual place...

➜  ~ ls /usr/include | grep hwloc.h
hwloc.h
➜  ~ sudo apt install libhwloc-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libhwloc-dev is already the newest version (2.9.0-1).

My substantially less complex script seems to find the library and include file just fine.

Performance, benchmarks, ...

Hi, I've just discovered this potential gem. But to be sure the features hold up in a practical setting, we'd need to test it first.

Do you have any such performance benchmarks (even if not rigorous...)?

I'd be mostly interested in a direct comparison with Nim's Weave, which I myself consider a state-of-the-art library. So ideally the benchmarks could be aligned with Weave's ones 😉.

Thoughts?

Performance section desired simplification

The Performance section in the readme doesn't provide any timings, while the paper is hard to read because it's written in an academic manner: formulas, coefficients, etc.
It would be great to have simple timings for ordinary developers who just want to use the library and want a practical comparison in nanoseconds or CPU cycles: how long it takes to construct the task body, and how long it takes for a worker thread to pick up the task.

Think of comprehensive benchmarks like here: https://github.com/SergeyMakeev/ExcaliburHash

P.S. anyway, thanks for the amazing work!
I was thinking about such a library some time ago and found this. That's amazing!

Invalid default compiler flag on ARM

On Ubuntu 22.04 running on a c7g.16xlarge AWS instance with Clang 15, I get the following error when following HACKING.md:

CMake Error at /usr/share/cmake-3.22/Modules/CMakeTestCXXCompiler.cmake:62 (message):
  The C++ compiler

    "/usr/bin/clang++-15"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/ubuntu/libfork/build/dev/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/ninja cmTC_1185a && [1/2] Building CXX object CMakeFiles/cmTC_1185a.dir/testCXXCompiler.cxx.o
    FAILED: CMakeFiles/cmTC_1185a.dir/testCXXCompiler.cxx.o
    /usr/bin/clang++-15   -Wno-assume -D_FORTIFY_SOURCE=3 -fstack-protector-strong -fcf-protection=full -fstack-clash-protection -Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion -Wcast-qual -Wformat=2 -Wundef -Werror=float-equal -Wshadow -Wcast-align -Wunused -Wnull-dereference -Wdouble-promotion -Wimplicit-fallthrough -Wextra-semi -Woverloaded-virtual -Wnon-virtual-dtor -Wold-style-cast  -std=c++20 -MD -MT CMakeFiles/cmTC_1185a.dir/testCXXCompiler.cxx.o -MF CMakeFiles/cmTC_1185a.dir/testCXXCompiler.cxx.o.d -o CMakeFiles/cmTC_1185a.dir/testCXXCompiler.cxx.o -c /home/ubuntu/libfork/build/dev/CMakeFiles/CMakeTmp/testCXXCompiler.cxx

The following change allows it to build:

diff --git a/CMakePresets.json b/CMakePresets.json
index 3134d07..c173de0 100644
--- a/CMakePresets.json
+++ b/CMakePresets.json
@@ -61,7 +61,7 @@
       "name": "flags-linux",
       "hidden": true,
       "cacheVariables": {
-        "CMAKE_CXX_FLAGS": "-Wno-assume -D_FORTIFY_SOURCE=3 -fstack-protector-strong -fcf-protection=full -fstack-clash-protection -Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion -Wcast-qual -Wformat=2 -Wundef -Werror=float-equal -Wshadow -Wcast-align -Wunused -Wnull-dereference -Wdouble-promotion -Wimplicit-fallthrough -Wextra-semi -Woverloaded-virtual -Wnon-virtual-dtor -Wold-style-cast",
+        "CMAKE_CXX_FLAGS": "-Wno-assume -D_FORTIFY_SOURCE=3 -fstack-protector-strong -fstack-clash-protection -Wall -Wextra -Wpedantic -Wconversion -Wsign-conversion -Wcast-qual -Wformat=2 -Wundef -Werror=float-equal -Wshadow -Wcast-align -Wunused -Wnull-dereference -Wdouble-promotion -Wimplicit-fallthrough -Wextra-semi -Woverloaded-virtual -Wnon-virtual-dtor -Wold-style-cast",
         "CMAKE_EXE_LINKER_FLAGS": "-Wl,--allow-shlib-undefined,--as-needed,-z,noexecstack,-z,relro,-z,now",
         "CMAKE_SHARED_LINKER_FLAGS": "-Wl,--allow-shlib-undefined,--as-needed,-z,noexecstack,-z,relro,-z,now"
       }
