pika-org / pika

pika builds on C++ std::execution with fiber, CUDA, HIP, and MPI support.

Home Page: https://pikacpp.org

License: Boost Software License 1.0

Languages: C++ 87.99%, CMake 8.06%, Python 1.35%, Cuda 1.30%, Shell 1.09%, Assembly 0.16%, Awk 0.03%, Batchfile 0.01%, Nix 0.01%
Topics: concurrency, cplusplus, cpp, cuda, gpu, hip, mpi, p2300, parallelism, rocm, stdexec

pika's Issues

Revisit CPO structure

The sender/receiver CPOs currently use a helper base class to define fallback implementations with tag_fallback_invoke. The need for tag_fallback_invoke should be revisited, and the CPO types should potentially be moved into nested namespaces so that the tag_fallback_invoke overloads do not end up in the overload set of unrelated CPOs. This could improve compile times.
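
A minimal sketch of the direction this could take, assuming a plain tag_invoke-style CPO (the names are illustrative, not pika's actual code):

```cpp
// Hypothetical sketch: one CPO per nested namespace. Overloads of
// tag_invoke for start_t are then only associated with this CPO's tag
// type and never enter the overload set of unrelated CPOs.
namespace example {
    namespace start_ns {
        struct start_t
        {
            template <typename OperationState>
            void operator()(OperationState& os) const noexcept
            {
                // Found via ADL on start_t and OperationState only.
                tag_invoke(*this, os);
            }
        };
    }

    // Only the tag object is exported at namespace scope.
    inline constexpr start_ns::start_t start{};
}
```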

Use bors for all CI

This would reduce duplicate and unnecessary builds. It requires:

  • CI that is stable for all enabled builds
  • Jenkins sets the commit status on all builds

Add tracy support

https://github.com/wolfpld/tracy

This is not terribly difficult on a basic level, but integrating tracy into projects and running applications with it is a bit clunky, especially for multi-node runs, since a tracy-instrumented application needs to send its data to a server.
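
For reference, the basic instrumentation itself is small; a minimal sketch, assuming the application is compiled with TRACY_ENABLE defined and linked against the Tracy client:

```cpp
#include <tracy/Tracy.hpp>

void process_chunk()
{
    // Records a named zone for this scope and streams it to a running
    // Tracy server, which is what makes multi-node runs clunky.
    ZoneScopedN("process_chunk");
    // ... work ...
}
```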

#252 adds basic support. Next steps would be one or both of the following:

  1. Start using stackless threads as much as possible since that is easier to do with senders. This would mean the Tracy integration needs no further changes.
  2. Add support for fibers or saving/restoring the annotations on suspend/resume.

Consider adding more "official" API headers for CUDA and MPI functionality

The use of async_cuda and async_mpi functionality has so far been through pika/modules/async_{cuda,mpi}.hpp. Since we don't consider pika/modules headers to be public API headers, we should probably add something more official to access that functionality (see the sketch after the list below).

Possible options:

  • pika/execution/{cuda,hip,gpu,mpi,communication}.hpp
  • pika/{cuda,hip,gpu,mpi,communication}.hpp
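
For illustration, assuming the second option were chosen, usage would change roughly like this (hypothetical; no decision has been made):

```cpp
#include <pika/modules/async_cuda.hpp>    // current; pika/modules is not public API
#include <pika/cuda.hpp>                  // proposed official header
```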

Investigate `small_vector` performance issue

pika::detail::small_vector seems to be significantly slower than boost::container::small_vector. It's unclear whether it's "just" a bug in the implementation or something more inherent in the implementation's use of standard library features.

We should:

  • find a performance test that reproduces the regression (most likely something involving future::then, since small_vector is used for storing continuations)
  • profile and debug to find out whether pika::detail::small_vector is fixable

If we can't find a suitable regression test within pika, the following DLA-Future run shows a clear performance drop (on the Piz Daint mc partition): srun -n4 -c36 miniapp/miniapp_triangular_solver --m 20480 --n 20480 --mb 128 --nb 128 --grid-rows 2 --grid-cols 2 --nruns 5 --pika:use-process-mask. Performance is roughly 1150 GFlop/s with Boost's small_vector and roughly 800 GFlop/s with pika's.
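
A rough micro-benchmark sketch for isolating the difference; the pika header path and template parameters are assumptions and may need adjusting:

```cpp
#include <boost/container/small_vector.hpp>
#include <pika/datastructures/detail/small_vector.hpp>    // assumed path

#include <chrono>
#include <cstddef>
#include <cstdio>

template <typename Vector>
double time_push_back(std::size_t iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        Vector v;
        // Stay within the inline capacity, as continuation storage would.
        for (int j = 0; j < 3; ++j) { v.push_back(j); }
    }
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    return d.count();
}

int main()
{
    constexpr std::size_t n = 10'000'000;
    std::printf("pika:  %f s\n", time_push_back<pika::detail::small_vector<int, 4>>(n));
    std::printf("boost: %f s\n", time_push_back<boost::container::small_vector<int, 4>>(n));
}
```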

Enable hipsolver functionality

We currently only have a macro translation layer mapping basic CUDA and cuBLAS functionality to HIP equivalents. cuSOLVER functionality is currently CUDA-only.
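
Extending the layer could look roughly like this (a hypothetical sketch; the exact hipSOLVER names and header path should be checked against the hipSOLVER documentation):

```cpp
#if defined(PIKA_HAVE_HIP)
# include <hipsolver/hipsolver.h>
# define cusolverDnHandle_t hipsolverHandle_t
# define cusolverDnCreate   hipsolverCreate
# define cusolverDnDestroy  hipsolverDestroy
#else
# include <cusolverDn.h>
#endif
```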

Test if CUDA callbacks would again be a viable replacement for polling

The event polling has been successful and turned out to perform significantly better than CUDA callbacks. However, that was tested when CUDA callbacks still required runtime registration on the CUDA thread. We should check:

  • whether plain CUDA callbacks would again be a competitive alternative to event polling in the scheduler, or
  • if the former does not work well enough, whether a separate polling thread would work well enough.

Either of these would be beneficial architecturally, since it would decouple the CUDA senders from the schedulers (see the sketch below). Related: #17.
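
A sketch of what the callback-based option could look like with the newer cudaLaunchHostFunc API; the completion hook and its wiring into a sender's operation state are assumptions:

```cpp
#include <cuda_runtime.h>

void completion_hook(void* op_state)
{
    // Runs on a CUDA-managed thread once all previously enqueued work on
    // the stream has finished. It must not call CUDA APIs or block, so it
    // should only mark the operation state complete / schedule a task.
    (void) op_state;
}

void enqueue_completion(cudaStream_t stream, void* op_state)
{
    cudaLaunchHostFunc(stream, &completion_hook, op_state);
}
```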

Use `fmt`

The internal string formatting implementation could perhaps be replaced by fmt.
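
For illustration, a call site would then look like this (a sketch only; whether fmt becomes a required dependency is the open question):

```cpp
#include <fmt/format.h>

#include <string>

std::string describe_pool(std::string const& name, int num_threads)
{
    // fmt::format would replace the hand-rolled formatting utilities.
    return fmt::format("pool {}: {} threads", name, num_threads);
}
```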

Enable sanitizers in CI

This is potentially very important for debugging. We currently only test with lsan (leak sanitizer). We should try to add tsan (thread sanitizer), msan (memory sanitizer), asan (address sanitizer), and ubsan (undefined behaviour sanitizer). Enabling them with heavy suppression files would be a good start, just to allow consumers of pika to enable sanitizers.

  • lsan
  • tsan
  • asan
  • ubsan

  • msan

msan additionally requires recompiling all dependencies with -fsanitize=memory, including the standard library: https://github.com/google/sanitizers/wiki/MemorySanitizer#using-instrumented-libraries. This is something to consider doing with the CSCS CI pipelines and spack.

Remove future functionality

This includes hpx::async/apply/dataflow/when_all. It can be done once DLA-Future has been completely ported to use senders. The cleanup itself is not complicated and only requires removing functionality; the open question is what functionality is still missing on the sender/receiver side.
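
A sketch of the sender-based equivalent of typical future code, assuming pika's std::execution-style API (exact namespaces and names may differ by version):

```cpp
#include <pika/execution.hpp>

#include <utility>

namespace ex = pika::execution::experimental;
namespace tt = pika::this_thread::experimental;

int compute(ex::thread_pool_scheduler sched)
{
    auto work = ex::schedule(sched)                   // roughly async
        | ex::then([] { return 21; })                 // roughly future::then
        | ex::then([](int i) { return 2 * i; });
    // sync_wait returns an optional<tuple<...>>; roughly future::get.
    auto [result] = tt::sync_wait(std::move(work)).value();
    return result;
}
```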

Add variant of `when_all` that accepts containers of senders and sends a container of values

dataflow recursively waits for any containers of futures (of containers). One can, for example, write dataflow(unwrapping([](vector<T>){...}), vector<future<T>>{...}). There is no equivalent sender adaptor: when_all is variadic and requires all of its arguments to be senders themselves.

This feature is used in DLA-Future. A when_all_vector(vector<sender_of<T>>) would be sufficient as a starting point.
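
A hypothetical usage sketch of the proposed adaptor; when_all_vector and unique_any_sender are used here as assumed names, not existing API:

```cpp
#include <pika/execution.hpp>

#include <utility>
#include <vector>

namespace ex = pika::execution::experimental;

void example(std::vector<ex::unique_any_sender<double>> senders)
{
    // Sends a single std::vector<double> once all inputs have completed,
    // mirroring dataflow(unwrapping(f), vector<future<T>>{...}).
    auto combined = ex::when_all_vector(std::move(senders))
        | ex::then([](std::vector<double> values) { /* use values */ });
    (void) combined;
}
```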

Use CSCS GitLab CI to replace as many Jenkins builds as possible

In practice this means replacing the non-Cray builds currently running on Jenkins. Work was started in STEllAR-GROUP/hpx-local#5.

Make use of the GitLab matrix functionality at least for release/debug builds, if not also for different compiler etc. configurations.

Also replace the HIP testing done with Jenkins with container-runner-hohgant-mi200 (as already started for DLA-Future in eth-cscs/DLA-Future#982, though that is blocked on an MPI issue; we can initially run all non-MPI tests for pika).

Add documentation for pika

We could initially refer to hpx's documentation and provide only the things that differ from it. However, pika might diverge too much from hpx in the future for that to remain workable.

To try to have something actionable here, I think we would need the following, in order of importance:

  • list of public API headers (0.5.0) (#225)
  • document thread binding behaviour from #739 (#751)
  • examples of typical use cases
  • list of public API functions/classes/variables (doxygen, and manually curated?)
  • examples of using specific APIs
  • detailed documentation per function/class/variable

Revive APEX support

With the distributed functionality removed, the actual APEX support was removed as well. APEX can be turned into a direct dependency with no special support required on the APEX side, and this is quite straightforward to implement. I have some old working code for this from HPX; it needs to be revived.

Enable MPI tests in CI

We currently don't test MPI functionality anywhere. It needs to be added to at least one CI configuration.

Make `--pika:use-process-mask` the default?

This is the common case and should possibly be the default. The open question is what the default should be when there is no process mask (i.e. all PUs are in the mask). We currently default to one worker thread per core rather than per PU. We can probably keep this behaviour even if using the process mask becomes the default. The only confusing aspect is that a process mask containing all PUs will not necessarily create as many worker threads as there are bits in the mask (though that is also the case at the moment).

Separate algorithms into a separate repository

The main open question is whether the algorithms project should rely on the pika runtime, the other way around, or neither. The default execution policies assume that a global thread pool exists, which would normally be set up by the runtime.

Move everything except public functionality to detail namespace

The public API of pika is small: sender/receiver functionality, runtime initialization, what else?

Hidden functionality can then gradually be brought into the public namespace through pika::experimental:: or directly into pika::.
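
The pattern would roughly be the following (names are illustrative):

```cpp
namespace pika {
    namespace detail {
        // Implementation lives here first; not part of the public API.
        void print_thread_bindings();
    }

    namespace experimental {
        // Later, selectively re-exposed once considered stable.
        using detail::print_thread_bindings;
    }
}
```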

The only reasonable way to do this is module by module:

  • affinity (#152)
  • algorithms (#377, #411, #475)
  • allocator_support (#153)
  • assertion (#155)
  • async_base (#158)
  • async_combinators (#160)
  • async_cuda (#196)
  • async (nothing to be done)
  • async_mpi (#374)
  • command_line_handling (#216)
  • concepts (nothing to be done)
  • concurrency (#246)
  • config (#248)
  • coroutines (#257)
  • datastructures (#276)
  • debugging (#324)
  • errors (#365)
  • execution (#508)
  • execution_base (#508)
  • executors (#508)
  • filesystem (#379)
  • format (#487)
  • functional (#380)
  • futures (#525)
  • hardware (#607)
  • hashing (#631)
  • include (#632)
  • ini (#633)
  • init_runtime (#634)
  • iterator_support
  • itt_notify
  • lock_registration
  • logging
  • memory
  • mpi_base
  • pack_traversal
  • prefix (#1177)
  • preprocessor
  • program_options
  • properties (#673)
  • resource_partitioner
  • runtime_configuration
  • runtime (#826, #1091)
  • schedulers (#625)
  • string_util (#595)
  • synchronization (#483)
  • tag_invoke (#596) (#599)
  • testing (#594)
  • thread_pool_util (#509)
  • thread_pools (#462)
  • thread_support (#461)
  • threading (nothing to be done)
  • threading_base (#445)
  • threadmanager (#428)
  • timing (#209)
  • topology (#179)
  • type_support (#386, #400)
  • util (#420)
  • version (#166)

This is also a good opportunity to do general cleanup.
See also: Avoid nesting detail namespaces into the experimental namespace (#448).

Fix or disable remaining failing tests using timed suspension

The following (and possibly a few more) tests fail for various reasons after disabling timed suspensions. They need to be dealt with before the first release.

  • tests.unit.modules.synchronization.shared_mutex.shared_mutex1
  • tests.unit.modules.threading.condition_variable2
  • tests.unit.modules.threading.stop_token_cb1
  • tests.unit.modules.threading.thread
  • tests.performance.local.tls_overhead

Separate CUDA/HIP/GPU support into another repository

The exact requirements of this are not 100% clear. At a minimum this needs:

  • a generic way for the schedulers to register polling callbacks, or the polling to be done on a separate thread (a possible interface is sketched after this list)
  • the runtime/a thread pool to know whether it should wait for all events to finish
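
A hypothetical shape for the first point (nothing like this exists verbatim in pika):

```cpp
#include <functional>
#include <string>

namespace pika::experimental {
    enum class polling_status { idle, busy };

    // The GPU (or MPI) support library registers a callback that the
    // scheduler invokes between tasks. The return value tells a thread
    // pool whether events are still in flight, i.e. whether it must keep
    // polling before shutting down.
    void register_polling_callback(
        std::string const& pool_name, std::function<polling_status()> poll);
}
```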

Update performance test references

The current references seem to be somewhat too strict. Alternatively, can we slightly relax the criteria without missing performance regressions?
