eomii / rules_ll
An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming
Home Page: https://ll.eomii.org
License: Other
Draft #98 adds experimental support for the C++23 std module. Getting things to work required some customizations to the internal file inputs and to the way we handle toolchain.cpp_stdlib. This is not pretty. We should rework things in a way that doesn't require hacky list indexing and .to_list()ing depsets.
It may be desirable for non-rules_ll users to get bzlmod support for the original Clang/LLVM Bazel overlay. The files whose contents we may be able to upstream are ll/extensions.bzl, MODULE.bzl and .bazelrc. Ideally, bzlmod users should be able to import llvm-project via the bazel-central-registry.

rules_ll-specific extensions should remain in this repository and the bazel-eomii-registry.
Since zlib still hasn't addressed madler/zlib#633 after almost a year, we should consider it unsupported and deprecated.
I've already sent https://reviews.llvm.org/D143320, but that will take some time to land in LLVM main because the official overlay doesn't use bzlmod by default yet. However, we can already use the patch in rules_ll.
We should probably also aim to upstream our zlib-ng buildfile to the BCR.
New users might not know how to use Bazel's caching effectively across projects.
Our setup should make users aware that local caching exists and how to enable it.
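As a concrete pointer for such docs: Bazel has a built-in local disk cache that can be shared across projects with a couple of .bazelrc lines. The paths below are illustrative, not a rules_ll default:

```
# Illustrative ~/.bazelrc snippet: reuse build outputs across projects.
build --disk_cache=~/.cache/bazel-disk-cache
# Also cache downloaded external repositories.
build --repository_cache=~/.cache/bazel-repo-cache
```

Since both flags point at plain directories, the same cache is picked up by every workspace on the machine that uses this .bazelrc.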
The docs for ll.defs are messed up. Somehow we have some super long lines in there.
Vale pre-commit hooks don't work properly at the moment. Tracking progress in errata-ai/vale#575.
When attempting to adjust linking paths for libraries such as OpenSSL or libcrypto under WSL Ubuntu 22.04 with the -L flag, the build uses the system libraries instead of the ones provided by the development environment.
This is already possible, but we should probably document the workflow. I suspect that this is especially relevant for WSL2 users because the WSL CUDA driver tends to differ from the one we package in rules_ll#unfree.
Maybe we should add explicit checks that set certain rpath values for WSL as well?
Using the compilation_database rule is too clunky otherwise.
We already use extensive toolchain transitions to handle our various compilation_modes. It looks like this is not enough anymore.
Our current approach is limited in the following ways: we can't use ll_binary tools in genrules, since that requires the ll_binary to be in the exec configuration. We need some way to transition from the compilation_mode-specific target configurations to an exec configuration. This is not supported at the moment. We need to be careful that opening up the toolchains to handle such cases leads to excessive rebuilds only when absolutely necessary. Otherwise users may end up building LLVM several times just to get a trivial ll_binary working in a genrule. We may also need better platform support to tackle this elegantly.
Things work at the moment because we can fall back to rules_cc for exec tools. This is a very undesirable limitation of the current implementation.
Attempts to run the tests in CI via remote execution currently don't work because Bazel doesn't like to run in a nix-built container. build and run work, but test doesn't, most likely due to bazelbuild/bazel#12579.
Technically it's already decent coverage if just the builds pass, but many issues arise from dynamic linking behavior and are only visible at runtime. So at the moment we'd either have to run all examples manually without the ll_test wrappers, or only run bazel build cpp without executing anything.
Another option would be to build a custom Bazel which we distribute as part of rules_ll. Building a custom Bazel against an LLVM toolchain and statically linking libc++ could be an option that keeps things portable between CI and regular usage, but it might lead to issues for non-nix workflows.
@JannisFengler @SpamDoodler @jaroeichler What do you think? Statically linking Bazel with libc++ would add a few MB to all images, caches, the devenv etc., because we'd have duplicate libc++ functions in every subbinary, and we'd have to think about infrastructure for staying in sync with the upstream Bazel sources. That would make it easier to get remote execution to work though. Do we want to go down that path, or should we try to find another solution?
This tool currently requires users to manually run the pre-commit hooks to reliably check whether generated configs have changed. This should be integrated into the rbegen invocation.
We should also add a release attribute to the tool that tags the image with a release version and pushes it to a remote registry. We need this to release the next version of rules_ll.
It can be tricky to write CUDA/HIP code that at least remotely looks like C++. At the moment the examples are littered with // NOLINT directives so that clang-tidy doesn't completely ragequit.
Let's try to find better ways to write these examples.
Hi Aaron, do you have time to meet in the city center now, so I can lend you an Nvidia GPU to fix the CUDA toolchain errors?
Best regards,
Jannis
While precompilations correctly cannot see each other if they are specified in the same interfaces attribute, the same is not true for the implicit BMI-to-object compilation. This is a bug. Only files in srcs should be able to see BMIs from interfaces.
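To make the intended visibility concrete, here is a hypothetical BUILD sketch; the rule name ll_library and the file names are illustrative, not taken from the repository:

```python
# Hypothetical target (names illustrative):
ll_library(
    name = "my_module",
    # Files in srcs may import the BMIs produced from interfaces.
    srcs = ["impl.cpp"],
    # The precompilations of a.cppm and b.cppm must not see each
    # other's BMIs, and neither should the implicit BMI-to-object
    # compilation step for either of them.
    interfaces = [
        "a.cppm",
        "b.cppm",
    ],
)
```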
NixOS/nixpkgs#224646 (comment) mentioned that the way we currently import CUDA from nix is outdated. We should change imports from the outdated pkgs.cudaPackages.cudatoolkit to cudaPackages.{lib,cuda_foo}.
@JannisFengler @SpamDoodler This might make WSL compatibility work.
As part of the transition to the flake-based workflow we removed Vale.
Getting it to run again is slightly tricky, as we need an additional config step before we can run the vale binary. Let's try to make things work again in a reproducible manner, i.e. ideally without having to rely on Vale's irreproducible autoinstaller.
I started testing Steam Deck support for GPU (and CPU) code execution. My excuse is that Teslas run on a similar APU architecture, and I think the performance gained here is worth looking into.
The only remote execution image currently provided is the default image which we use for the tests and pin in rbe/default/config/BUILD.
The default image includes openssl because the examples require it. This is not ideal. Since all the toolchain and container auto generation can be difficult to grasp we should provide a straightforward, documented way to customize it.
This is a nuanced topic requiring dedicated docs and architecture explanations.
Heterogeneous code takes forever to check. This is likely caused by all the CUDA and HIP headers we have to include. There should be some builtin default setting to exclude these headers from the checks.
We should handle the include search flags (-iquote, -isystem, -I, -idirafter, -isystem-after) and support strip_prefix as in rules_cc. Otherwise we have to manually specify includes for external repositories. There were reasons for not implementing strip_prefix like in rules_cc; I will post an update when I remember the details. Currently, the compiler is invoked at the top level of the action sandbox. If we were to move it into the build subdirectory within that sandbox, we would need to change the way inclusions of external headers are handled (maybe prefix with ../../ or something like that).
Running the default CUDA example in WSL fails to detect the GPU.
Setting LD_LIBRARY_PATH resolves the issue:
export LD_LIBRARY_PATH="/usr/lib/wsl/lib:$LD_LIBRARY_PATH"
rules_ll should automatically append /usr/lib/wsl/lib to LD_LIBRARY_PATH when running in WSL.
(Not sure if this belongs in the nix flake or in the bazel rules.)
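If we go the automatic route, the decision could live in a small helper along these lines. This is only a sketch; detecting WSL via the "microsoft" marker in the kernel version string is an assumption, and the function name is hypothetical:

```shell
# Pure helper: given the current LD_LIBRARY_PATH value and the kernel
# version string, return the value rules_ll should use. WSL kernels
# advertise "microsoft" in their version string.
wsl_ld_library_path() {
    current="$1"
    kernel="$2"
    case "$kernel" in
        *[Mm]icrosoft*) printf '%s' "/usr/lib/wsl/lib:${current}" ;;
        *) printf '%s' "${current}" ;;
    esac
}

# A caller would use it roughly like this:
# export LD_LIBRARY_PATH="$(wsl_ld_library_path "$LD_LIBRARY_PATH" "$(cat /proc/version)")"
```

Keeping the decision in a pure function like this makes it trivial to test on non-WSL machines.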
Building Clang from upstream depends on libxml2-dev. It should be added as an external dependency in the bazel-eomii-registry and to the dependencies of rules_ll.
We can't use go_deps.module_override in upstream dependencies. This means that importers of rules_ll will break. Circl is required for our cluster setup and needs to be usable from downstream repos.
While the new remote execution workflows are very efficient, we are still running gigantic builds compared to most other projects. This means that we quickly fall out of the "free" or "open source" tiers of remote execution services. Self-hosting might be inevitable.
For a full setup there is buildfarm. Regarding remote caching there is the pretty good bazel-remote, but it might also be fun to try wrapping dragonflydb with the remote-api gRPC calls and use that as cache.
The remote-apis are fairly straightforward, so we could also build an entire stack ourselves.
We are missing #include <sanitizer/msan_interface.h>. Probably caused by drift from upstream. Should be easy to fix.
To make sure that we don't accidentally destroy people's workspaces, the current ll init only appends some contents to files. If one runs the command more than once, this leads to duplicate code in those files, which can look somewhat buggy.
We should probably factor the command out into a separate shell script and add more flexibility/checking/whatever to improve its user experience. This should be an actual shell script instead of a nix string template so that we can properly run linters on it. This likely requires changing the structure of the command in a way that every variable is passed as an external argument. I'm thinking something along the lines of invoking it like this in the flake:
''${ll} \
--bazelversion=${./.bazelversion} \
--module=${./examples/MODULE.bazel} \
--bazelrc=${./examples/.bazelrc}''
Blocks #4.
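Whatever shape the final script takes, the duplicate-append problem itself is easy to guard against with a marker comment. A sketch, where the marker text and the example content are assumptions:

```shell
# Append a managed block to a file only if its marker line isn't
# present yet, so repeated `ll init` runs stay idempotent.
append_once() {
    file="$1"
    marker="$2"
    content="$3"
    if ! grep -qF "$marker" "$file" 2>/dev/null; then
        printf '%s\n%s\n' "$marker" "$content" >> "$file"
    fi
}
```

The marker doubles as documentation in the user's file, showing which lines ll init owns.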
After porting the pre-commit hooks to nix it seems that shellcheck is always skipped.
bazelbuild/bazel#14659 prevents us from leveraging our registry.
We can probably work around this by moving the third-party-overlays/*.BUILD.bazel files to static variables.
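Sketched out, the workaround might look like this; the file name, constant name, and the heavily abbreviated zlib-ng build file are all hypothetical:

```python
# third_party_overlays.bzl (hypothetical name): keep overlay BUILD file
# contents as Starlark string constants instead of *.BUILD.bazel files,
# so repository rules can consume them via build_file_content.
ZLIB_NG_BUILD_FILE = """\
cc_library(
    name = "zlib-ng",
    srcs = glob(["*.c"]),
    hdrs = glob(["*.h"]),
    visibility = ["//visibility:public"],
)
"""
```

A repository rule could then pass the constant to build_file_content instead of referencing a checked-in *.BUILD.bazel file, sidestepping the registry limitation.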
We currently do not support loading shared objects at runtime. An ll_pkg rule may be an option.
ld.lld is unable to find libraries when building with Ubuntu 22.04 and gcc 11.2.0.
ERROR: /home/ubuntu/rules_ll/examples/format_example/BUILD.bazel:3:10: LlLinkExecutable format_example/format_example failed: (Exit 1): ld.lld failed: error executing command (from target //format_example:format_example) bazel-out/k8-fastbuild/bin/external/@rules_ll.override/ll/ld.lld --color-diagnostics '-dynamic-linker=/lib64/ld-linux-x86-64.so.2' --lto-O3 --pie --nostdlib -L/usr/lib64 -lm -ldl -lpthread -lc ... (remaining 11 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
ld.lld: error: unable to find library -lm
ld.lld: error: unable to find library -ldl
ld.lld: error: unable to find library -lpthread
ld.lld: error: unable to find library -lc
Target //format_example:format_example failed to build
I set the symlinks for crt*.o and Scrt*.o manually, but all the libraries are in /usr/lib/x86_64-linux-gnu and not in /usr/lib64.
Maybe it makes sense to set /usr/lib/x86_64-linux-gnu as the default?
After adding support for shared libraries our heterogeneous toolchains broke. The new shared library linking causes us to blindly link all of CUDA's shared libraries which is of course not what we want. Instead of rewriting the linking logic, we may want to consider rewriting the CUDA-related build files and/or making the toolchains finer-grained in the sense that static and shared libraries are more clearly separated.
We need support for creating dynamic shared objects. Clang plugins such as the hipsycl plugin require this.
The best way to implement this is probably by reworking the aggregate attribute. This will require support for position-independent code.
Tests are missing shared objects like libamdhip64.so. Sometimes these tests can flakily pass by chance if the corresponding library path has been populated before.
Probably needs some runfile/symlink tweaking.
Playing around with clippy in rules_rust made me notice how incredibly convenient it would be to have clang-tidy run as a plugin that just prints warnings like "regular" compiler warnings. I'm not sure whether this is possible, but if it is, it could be a significant improvement for our user experience and would obsolete the ll_compilation_database targets in many cases.
Let's see whether it's possible to copy the rules_rust/clippy behavior to rules_ll/clang-tidy.
One of the main goals of rules_ll is to build a Clang/LLVM based toolchain from upstream. This should work with Clang and GCC.
One error occurs when running the examples with GCC as default compiler:
error: zlib.h: no such file or directory
This can be fixed by installing the libz-dev package.
After installing the missing headers, the build fails with the following error message:
ERROR: /root/.cache/bazel/_bazel_root/79c7c71f78facf0e35780b9a06528730/external/@rules_ll.override.llvm_project_overlay.llvm-project/llvm/BUILD.bazel:164:11: Compiling llvm/lib/Support/Process.cpp [for tool] failed: (Exit 1): gcc failed: error executing command (from target @@rules_ll.override.llvm_project_overlay.llvm-project//llvm:Support) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 70 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/@rules_ll.override.llvm_project_overlay.llvm-project/llvm/lib/Support/Process.cpp:107:
external/@rules_ll.override.llvm_project_overlay.llvm-project/llvm/lib/Support/Unix/Process.inc: In static member function 'static size_t llvm::sys::Process::GetMallocUsage()':
external/@rules_ll.override.llvm_project_overlay.llvm-project/llvm/lib/Support/Unix/Process.inc:93:20: error: aggregate 'llvm::sys::Process::GetMallocUsage()::mallinfo2 mi' has incomplete type and cannot be defined
93 | struct mallinfo2 mi;
| ^~
external/@rules_ll.override.llvm_project_overlay.llvm-project/llvm/lib/Support/Unix/Process.inc:94:10: error: '::mallinfo2' has not been declared
94 | mi = ::mallinfo2();
| ^~~~~~~~~
Target //format_example:format_example failed to build
The system is Ubuntu 20.04.4 LTS; the GCC version is 9.4.0.
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
These problems occurred while renovating this repository.
Warning
Renovate failed to look up the following dependencies: Could not determine new digest for update (github-tags package eomii/rules_ll).
Files affected: templates/default/MODULE.bazel
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
MODULE.bazel
platforms 0.0.10
rules_cc 0.0.9
bazel_skylib 1.7.0
rules_java 7.6.1
stardoc 0.6.2
llvm-project-overlay 17-init-bcr.3
templates/default/MODULE.bazel
rules_ll <TODO: USE THE COMMIT FROM THE FLAKE HERE>
.bazelversion
bazel 8.0.0-pre.20240516.1
templates/default/.bazelversion
bazel 8.0.0-pre.20240516.1
.github/workflows/docs.yml
ubuntu 22.04
.github/workflows/pre-commit.yml
ubuntu 22.04
.github/workflows/scorecard.yml
ubuntu 22.04
templates/default/.github/workflows/pre-commit.yml
ubuntu 22.04
Surely there is some workaround.
examples/flake.lock causes direnv/devenv to break. ../ is not reproducible across machines (for different users, the absolute path to the directory is different). One could use git update-index --skip-worktree for this, but that doesn't work with devenv. What a dilemma lmao.
For now I've sent #67, but that's hardly a satisfactory solution.