buildbarn / bb-deployments
Example deployments of Buildbarn on various platforms
License: Apache License 2.0
Hi,
We're currently trying out Buildbarn and have a setup running on a local Kubernetes cluster. What I gather from the documentation is that our Bazel client should be able to report the SHA-256 hashes we need to look up CAS and AC entries in the Buildbarn browser.
But, for me this does not seem to work, or we're missing something obvious.
We've updated our browserUrl in https://github.com/buildbarn/bb-deployments/blob/master/kubernetes/config/common.yaml, and we tried adding browserUrl: common.browserUrl, to the config/scheduler and config/worker yaml files. Alas, this does not change anything in the output of the Bazel client.
We've verified that bb-browser is actually working: it shows some nice details when we look up the CAS/AC IDs taken from the Kubernetes pod logs.
It would be very useful if these things can be reported in the client somehow, so that a user does not have to look through the logs to get this detailed info.
Any ideas on what might be the issue and how we can resolve it?
I am trying to use the docker-compose deployment and I am getting this error:
runner-installer_1 | /bb
worker-ubuntu16-04_1 | 2020/07/27 04:41:42 Runner is not ready yet: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /worker/runner: connect: no such file or directory"
I have not made any changes to the repo.
It is possible for the runner container to fail to start, due to this error:
runner-ubuntu16-04_1 | [FATAL tini (7)] exec /bb/bb_runner failed: Text file busy
This may happen if the runner container is started while the bb_runner binary is still being copied.
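One common way to avoid this kind of race is to copy the binary to a temporary name on the same filesystem and rename it into place only once the copy is complete, so that the final path never refers to a half-written file. A minimal Python sketch of that pattern follows; the install_atomically helper and its paths are hypothetical, and this is not necessarily how the runner installer in this repository works.

```python
import os
import shutil
import tempfile


def install_atomically(src: str, dst: str) -> None:
    """Copy src to dst so that dst only ever appears fully written."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Create the temporary file in the destination directory so the final
    # rename stays on one filesystem and is therefore atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
        os.chmod(tmp_path, 0o755)
        os.replace(tmp_path, dst)
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A container that waits for the final path to exist would then see either no file or the complete one.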
The new config jsonnet files in the bare deployment example are internally inconsistent, for example referencing "blobstore.clientBlobstore" rather than the correct "blobstore.client_blobstore" entry. All the metrics references are further examples.
Once those are fixed, the resulting json files are not accepted by the various bb-applications because they contain entries that are "unknown"; for example, bb_event_service complains about "grpc.GRPCClientConfiguration.endpoint".
Also, the run.sh script assumes that the json files are in the config directory, but currently it only contains jsonnet files, which means the script needs to add a jsonnet -> json conversion to be usable out of the box.
Hello developer,
Is there a wizard that can show me how to connect buildbarn with goma (remoteexec_proxy) please?
Best Regards
It seems rbe_autoconfig is being deprecated in favour of a new script:
https://github.com/bazelbuild/bazel-toolchains#where-is-rbe_autoconfig
We should update the instructions to reflect this.
GKE clusters have an option to run nodes on preemptible VMs, which are much cheaper: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
Is it safe to deploy Buildbarn to such a cluster?
Hi, the build fails with duplicate declarations as below. Running in Windows 11 Pro in a Git Bash (as the build requires bash for patching). Any clues as to how to fix?
$ bazel build -- ///bare:bare
DEBUG: C:/users/kschz/_bazel_kschz/u6fkmli4/external/com_github_bazelbuild_remote_apis/repository_rules.bzl:12:10: The switched_rules_by_language macro is deprecated. Consumers of @bazel_remote_apis should specify per-language dependencies in their own workspace.
INFO: Analyzed target //bare:bare (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: C:/users/kschz/_bazel_kschz/u6fkmli4/external/com_github_buildbarn_go_xdr/pkg/compiler/parser/BUILD.bazel:10:11: GoCompilePkg external/com_github_buildbarn_go_xdr/pkg/compiler/parser/parser.a [for host] failed: (Exit 1): builder.exe failed: error executing command bazel-out\host\bin\external\go_sdk\builder.exe compilepkg -sdk external/go_sdk -installsuffix windows_amd64 -src external/com_github_buildbarn_go_xdr/pkg/compiler/parser/xdr_base_listener.go -src ... (remaining 37 arguments skipped)
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_base_listener.go:8:6: BaseXDRListener redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_base_listener.go:8:6: other declaration of BaseXDRListener
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:18:6: XDRLexer redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:21:6: other declaration of XDRLexer
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:25:5: xdrlexerLexerStaticData redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:28:5: other declaration of xdrlexerLexerStaticData
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:38:6: xdrlexerLexerInit redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:41:6: other declaration of xdrlexerLexerInit
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:237:6: XDRLexerInit redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:240:6: other declaration of XDRLexerInit
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:243:6: NewXDRLexer redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:246:6: other declaration of NewXDRLexer
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:262:2: XDRLexerT__0 redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:265:2: other declaration of XDRLexerT__0
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:263:2: XDRLexerT__1 redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:266:2: other declaration of XDRLexerT__1
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:264:2: XDRLexerT__2 redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:267:2: other declaration of XDRLexerT__2
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:265:2: XDRLexerT__3 redeclared in this block
external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr_lexer.go:268:2: other declaration of XDRLexerT__3
bazel-out\host\bin\external\com_github_buildbarn_go_xdr\pkg\compiler\parser\xdr.go\xdr_lexer.go:265:2: too many errors
compilepkg: error running subcommand external\go_sdk\pkg\tool\windows_amd64\compile.exe: exit status 2
Target //bare:bare failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.442s, Critical Path: 0.08s
INFO: 4 processes: 4 internal.
FAILED: Build did NOT complete successfully
Hi,
If I pick the bb-remote-execution version referenced in the master branch of this repo (e664853), I get many of these errors:
3: Failed to obtain input directory ".": Buffer is 158 bytes in size, while a maximum of 0 bytes is permitted
In fact, at that point in time, the cas.blobAccessDirectoryFetcher.maximumMessageSizeBytes member is left uninitialized by the NewBlobAccessDirectoryFetcher function.
I know that it will be fixed in a following commit (buildbarn/bb-remote-execution@85954ef) but I am not sure which commit I should pick such that the combination is stable.
The k8s example doesn't work with the newer versions of the Docker images that have been published. It would be great to get an updated example of how to work with the newer images. Thanks!
I'd like to try Buildbarn, but I've got an ARM cluster. Could using docker buildx make Buildbarn work on ARM?
The latest version of Buildbarn contains many changes to the configuration file schema. This causes the existing deployments to break. The Docker Compose based one has been updated to use the latest container images, but the bare and Kubernetes deployments remain unchanged.
These deployments should be patched up to work once more.
Hi, I'm trying to set up a working Buildbarn cluster following the instructions provided in README.md.
I'm a little wary about the first line from this part:
build:mycluster --bes_backend=grpc://fill-in-the-frontend-service-hostname-here:8985
build:mycluster --bes_results_url=http://fill-in-the-browser-service-hostname-here/build_events/bb-event-service/
build:mycluster --remote_executor=grpc://fill-in-the-frontend-service-hostname-here:8980
build:mycluster --remote_instance_name=remote-execution
According to the Bazel documentation, --bes_backend should point to bb-event-service, not bb-frontend-service. Is my understanding correct?
The config files for Kubernetes do not match the new protobufs:
It would be useful to have a working example of sharded storage.
I am trying to run the abseil-hello program using docker-compose. With the code as it was, I was getting the repeating error message:
worker-ubuntu16-04_1 | 2022/07/18 10:45:02 Worker {"datacenter":"paris","hostname":"ubuntu-worker.example.com","rack":"4","slot":"15","thread":"5"}: rpc error: code = Unavailable desc = Failed to synchronize with scheduler: connection error: desc = "transport: Error while dialing failed to do connect handshake, response: \"HTTP/1.1 403 Forbidden\\r\\nContent-Length: 3380\\r\\nConnection: keep-alive\\r\\nContent-Language: en\\r\\nContent-Type: text/html;charset=utf-8\\r\\nDate: Mon, 18 Jul 2022 10:37:57 GMT\\r\\nMime-Version: 1.0\\r\\nServer: squid\\r\\nVary: Accept-Language\\r\\nX-Cache: MISS from localhost\\r\\nX-Cache-Lookup: NONE from localhost:8080\\r\\nX-Squid-Error: ERR_ACCESS_DENIED 0\\r\\n\\r\\n\\n<html><head>\\n<meta type=\\\"copyright\\\" content=\\\"Copyright (C) 1996-2017 The Squid Software Foundation and contributors\\\">\\n<meta http-equiv=\\\"Content-Type\\\" content=\\\"text/html; charset=utf-8\\\">\\n<title>ERROR: The requested URL could not be retrieved</title>\\n<style type=\\\"text/css\\\"><!--\\n /*\\n * Copyright (C) 1996-2017 The Squid Software Foundation and contributors\\n *\\n * Squid software is distributed under GPLv2+ license and includes\\n * contributions from numerous individuals and organizations.\\n * Please see the COPYING and CONTRIBUTORS files for details.\\n */\\n\\n/*\\n Stylesheet for Squid Error pages\\n Adapted from design by Free CSS Templates\\n http://www.freecsstemplates.org\\n Released for free under a Creative Commons Attribution 2.5 License\\n*/\\n\\n/* Page basics */\\n* {\\n\\tfont-family: verdana, sans-serif;\\n}\\n\\nhtml body {\\n\\tmargin: 0;\\n\\tpadding: 0;\\n\\tbackground: #efefef;\\n\\tfont-size: 12px;\\n\\tcolor: #1e1e1e;\\n}\\n\\n/* Page displayed title area */\\n#titles {\\n\\tmargin-left: 15px;\\n\\tpadding: 10px;\\n\\tpadding-left: 100px;\\n\\tbackground: url('/squid-internal-static/icons/SN.png') no-repeat left;\\n}\\n\\n/* initial title */\\n#titles h1 {\\n\\tcolor: #000000;\\n}\\n#titles h2 
{\\n\\tcolor: #000000;\\n}\\n\\n/* special event: FTP success page titles */\\n#titles ftpsuccess {\\n\\tbackground-color:#00ff00;\\n\\twidth:100%;\\n}\\n\\n/* Page displayed body content area */\\n#content {\\n\\tpadding: 10px;\\n\\tbackground: #ffffff;\\n}\\n\\n/* General text */\\np {\\n}\\n\\n/* error brief description */\\n#error p {\\n}\\n\\n/* some data which may have caused the problem */\\n#data {\\n}\\n\\n/* the error message received from the system or other software */\\n#sysmsg {\\n}\\n\\npre {\\n font-family:sans-serif;\\n}\\n\\n/* special event: FTP / Gopher directory listing */\\n#dirmsg {\\n font-family: courier;\\n color: black;\\n font-size: 10pt;\\n}\\n#dirlisting {\\n margin-left: 2%;\\n margin-right: 2%;\\n}\\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\\n border-bottom: groove;\\n}\\n#dirlisting td.size {\\n width: 50px;\\n text-align: right;\\n padding-right: 5px;\\n}\\n\\n/* horizontal lines */\\nhr {\\n\\tmargin: 0;\\n}\\n\\n/* page displayed footer area */\\n#footer {\\n\\tfont-size: 9px;\\n\\tpadding-left: 10px;\\n}\\n\\n\\nbody\\n:lang(fa) { direction: rtl; font-size: 100%; font-family: Tahoma, Roya, sans-serif; float: right; }\\n:lang(he) { direction: rtl; }\\n --></style>\\n</head><body id=\\\"ERR_ACCESS_DENIED\\\">\\n<div id=\\\"titles\\\">\\n<h1>ERROR</h1>\\n<h2>The requested URL could not be retrieved</h2>\\n</div>\\n<hr>\\n\\n<div id=\\\"content\\\">\\n<p>The following error was encountered while trying to retrieve the URL: <a href=\\\"scheduler:8983\\\">scheduler:8983</a></p>\\n\\n<blockquote id=\\\"error\\\">\\n<p><b>Access Denied.</b></p>\\n</blockquote>\\n\\n<p>Access control configuration prevents your request from being allowed at this time. 
Please contact your service provider if you feel this is incorrect.</p>\\n\\n<p>Your cache administrator is <a href=\\\"mailto:admin@localhost?subject=CacheErrorInfo%20-%20ERR_ACCESS_DENIED&body=CacheHost%3A%20localhost%0D%0AErrPage%3A%20ERR_ACCESS_DENIED%0D%0AErr%3A%20%5Bnone%5D%0D%0ATimeStamp%3A%20Mon,%2018%20Jul%202022%2010%3A37%3A57%20GMT%0D%0A%0D%0AClientIP%3A%2010.212.185.2%0D%0A%0D%0AHTTP%20Request%3A%0D%0ACONNECT%20%2F%20HTTP%2F1.1%0AUser-Agent%3A%20grpc-go%2F1.45.0%0D%0AHost%3A%20scheduler%3A8983%0D%0A%0D%0A%0D%0A\\\">admin@localhost</a>.</p>\\n<br>\\n</div>\\n\\n<hr>\\n<div id=\\\"footer\\\">\\n<p>Generated Mon, 18 Jul 2022 10:37:57 GMT by localhost (squid)</p>\\n<!-- ERR_ACCESS_DENIED -->\\n</div>\\n</body></html>\\n\""
So I deleted the url 'ubuntu-worker.example.com' and replaced it with 'localhost'. I received the following error instead:
worker-ubuntu16-04_1 | 2022/07/18 11:19:42 Worker {"datacenter":"paris","hostname":"localhost","rack":"4","slot":"15","thread":"4"}: rpc error: code = Unavailable desc = Failed to synchronize with scheduler: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:8983: connect: connection refused"
Both errors were repeating continuously. Any suggestions?
I tried running the bare configurations and they fail to start up: the configurations have been updated to work with newer releases, but the binaries have not (for example, bb-storage is built from a commit that is more than a year old).
I tried updating the versions of bb-storage and the other repositories in WORKSPACE.bazel, but that fails with strange patching errors.
Maybe this would be a good time to rethink the process: for example, instead of using Bazel to build all the parts, make a GitHub action build master / release versions of the different binaries (bb-storage, bb-browser, etc.) so they don't need to be rebuilt to run in this example setup?
In essence this is what happens with the docker / k8s setup - the images are not built, but rather pulled in from the image registries.
Currently the README states that one can use the RBE container image by using the rbe_autoconfig rule from bazel-toolchains. However, this only works when running on Linux; on other hosts the build fails:
ERROR: An error occurred during the fetch of repository 'rbe_default':
Traceback (most recent call last):
File "/private/var/tmp/_bazel_mcarl/b866f019b4047dbc85cbb95ed2dc14e0/external/bazel_toolchains/rules/rbe_repo.bzl", line 505
validate_host(ctx)
File "/private/var/tmp/_bazel_mcarl/b866f019b4047dbc85cbb95ed2dc14e0/external/bazel_toolchains/rules/rbe_repo/util.bzl", line 147, in validate_host
fail(<1 more arguments>)
Not running on linux host, cannot run rbe_autoconfig.
Hello Buildbarn folks
I am so happy to see you have jsonnet examples for bare and docker-compose! Is it possible to add jsonnet libs for the kubernetes deployments? We source most of our kubernetes manifests remotely (via bazel and rules_jsonnet), so this would make maintaining these manifests internally much simpler.
I wouldn't mind putting in a PR for this, if it will be reviewed and accepted?
Thank you!
I am following https://github.com/buildbarn/bb-adrs/blob/master/0002-storage.md
Setting up a configuration based on it doesn't seem to work (to be fair, that doc is a few years old). For example, in my common.libsonnet I have
contentAddressableStorage: {
  sharding: {
    hashInitialization: 11946695773637837490,
    shards: [
      {
        backend: {
          mirrored: {
            backendA: { grpc: { address: 'storage-a-0.storage.buildbarn:7982' } },
            backendB: { grpc: { address: 'storage-a-1.storage.buildbarn:7982' } },
          },
        },
        weight: 1,
      },
      {
        backend: {
          mirrored: {
            backendA: { grpc: { address: 'storage-b-0.storage.buildbarn:7982' } },
            backendB: { grpc: { address: 'storage-b-1.storage.buildbarn:7982' } },
          },
        },
        weight: 1,
      },
      {
        backend: {
          mirrored: {
            backendA: { grpc: { address: 'storage-c-0.storage.buildbarn:7982' } },
            backendB: { grpc: { address: 'storage-c-1.storage.buildbarn:7982' } },
          },
        },
        weight: 1,
      },
    ],
  },
},
This results in errors in the frontend / browser services as
2023/12/04 17:19:16 Fatal error: rpc error: code = InvalidArgument desc = Failed to create Content Addressable Storage: Replicator configuration not specified
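The error message suggests that each mirrored backend is expected to carry replicator settings in addition to backendA and backendB. The field names replicatorAToB and replicatorBToA used below are an assumption based on bb-storage's blobstore.proto and should be checked against the version you deploy. This Python sketch walks a configuration that has already been rendered to JSON and lists every mirrored block that lacks them; the find_incomplete_mirrors helper is hypothetical.

```python
from typing import Any, Iterator

# Assumed field names; verify against the blobstore.proto of your bb-storage version.
REQUIRED_MIRROR_KEYS = ("backendA", "backendB", "replicatorAToB", "replicatorBToA")


def find_incomplete_mirrors(node: Any, path: str = "") -> Iterator[tuple[str, list[str]]]:
    """Yield (path, missing_keys) for every 'mirrored' block missing a required key."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "mirrored" and isinstance(value, dict):
                missing = [k for k in REQUIRED_MIRROR_KEYS if k not in value]
                if missing:
                    yield child, missing
            yield from find_incomplete_mirrors(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_incomplete_mirrors(item, f"{path}[{index}]")


# One shard from the configuration above, as it would look after rendering to JSON.
config = {
    "contentAddressableStorage": {
        "sharding": {
            "shards": [
                {
                    "backend": {
                        "mirrored": {
                            "backendA": {"grpc": {"address": "storage-a-0.storage.buildbarn:7982"}},
                            "backendB": {"grpc": {"address": "storage-a-1.storage.buildbarn:7982"}},
                        },
                    },
                    "weight": 1,
                },
            ],
        },
    },
}

for location, missing in find_incomplete_mirrors(config):
    print(f"{location}: missing {', '.join(missing)}")
```

Running the same check over the full rendered frontend and browser configurations should point at each shard that triggers the error, provided the assumed field names are right.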
The diagram in bare/ shows a single bb_storage instance, but the example creates two instances with the frontend.jsonnet and storage.jsonnet config files. Would it be better to fix the diagram to match the example, or vice versa?
Setting the JDK to 11 seems to cause the worker to hang on
Action external/bazel_tools/tools/jdk/platformclasspath_classes/DumpPlatformClassPath.class [for host];
Action external/bazel_tools/tools/jdk/platformclasspath_classes/DumpPlatformClassPath.class;
For the record we updated to JDK11 by changing bazelrc values:
build:rbe-ubuntu16-04 --host_java_toolchain=@bazel_tools//tools/jdk:toolchain_java11
build:rbe-ubuntu16-04 --host_javabase=@bazel_tools//tools/jdk:remote_jdk11
build:rbe-ubuntu16-04 --java_toolchain=@bazel_tools//tools/jdk:toolchain_java11
build:rbe-ubuntu16-04 --javabase=@bazel_tools//tools/jdk:remote_jdk11
Everything else is the same as the example in the README. I feel I am missing some obvious config option. Please let me know if you support JDK 11/have encountered this.
Currently, while we document how to set Buildbarn up, we don't really show how to use platform properties in conjunction with it.
I deployed Buildbarn to Kubernetes engine, but there are some issues:
error while evaluating the ingress spec: service "buildbarn/browser" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
Looks like bb-browser.example.com
is a placeholder, but the readme does not mention that.
Hello @EdSchouten
I tried the Buildbarn example to see how it works.
The first thing I noticed is that there is no build event service deployment in the bare run.sh, so I started it manually using the command below:
../../bb-event-service/bazel-bin/cmd/bb_event_service/linux_amd64_pure_stripped/bb_event_service -blobstore-config blobstore-storage-clients.conf -web.listen-address localhost:7983
I picked the port as in the docker-compose file.
And below are the build options I chose in my bazelrc file:
build --experimental_strict_action_env --spawn_strategy=remote --genrule_strategy=remote --strategy=Javac=remote --strategy=Closure=remote --remote_executor=localhost:8980
build --jobs=8 --remote_instance_name=local --bes_backend=localhost:7984 --bes_results_url=http://localhost:7984/build_events/bb-event-service/
I am able to see the action results and able to view in browser url (http://localhost:7984/action/local/0488e607fb8389d0b3f92e96973f009f2347c8aec383e759a19877f8e93721c9/142/), and log here:
-bash-4.2$ 2019/04/30 15:11:16 Action: http://localhost:7984/action/local/73104ef5a33e79746cf6271dc47e4638ed2404fe24a9c2cf383029237c456ed5/142/
2019/04/30 15:11:16 ExecuteResponse: result:<output_files:<path:"bazel-out/k8-fastbuild/bin/main/_objs/hello-world/hello-world.pic.d" digest:<hash:"39350f1b30c43496f8193580cb062a17bef2f91ebe1d02c5e6fae17632866b69" size_bytes:8486 > > output_files:<path:"bazel-out/k8-fastbuild/bin/main/_objs/hello-world/hello-world.pic.o" digest:<hash:"89b366c59d8d24860bc790d46da131bbfa1e7e8edb7b1306b8243b1b75896145" size_bytes:6632 > > > message:"Action details (cached result): http://localhost:7984/action/local/73104ef5a33e79746cf6271dc47e4638ed2404fe24a9c2cf383029237c456ed5/142/"
2019/04/30 15:11:16 Action: http://localhost:7984/action/local/371b1a71b0e05eea276bc63598b440b6e79dde745bca2c821c4544a53d9bd4b1/142/
2019/04/30 15:11:17 ExecuteResponse: result:<output_files:<path:"bazel-out/k8-fastbuild/bin/main/hello-world" digest:<hash:"634dad4e327c62c3d517792d73e8bcc3219b74f495a5c72496fc25b741d23f43" size_bytes:13920 > is_executable:true > > message:"Action details (cached result): http://localhost:7984/action/local/371b1a71b0e05eea276bc63598b440b6e79dde745bca2c821c4544a53d9bd4b1/142/"
I'm having issues uploading Build Event Protocol results. The error I see in the client is:
ERROR: The Build Event Protocol upload failed: INTERNAL: First received frame was not SETTINGS. Hex dump for first 5 bytes: 485454502f
INFO: Partial Build Event Protocol results may be available at http://localhost:7983/build_events/local/5c6d5f8e-5277-43a0-ba6d-3a69e1cc2c18
Please let me know how I can get rid of this error and see the results in the browser.
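The hex dump in the error message is itself a useful clue. Decoding it shows the first bytes the client received, which suggests that the port given to --bes_backend answered in plain HTTP/1.x rather than gRPC. One thing worth checking is whether that flag points at a service's HTTP listener instead of its gRPC listener; that diagnosis is an assumption and is not confirmed in this thread.

```python
# The five bytes reported by Bazel in the "not SETTINGS" error.
first_bytes = bytes.fromhex("485454502f")
print(first_bytes)

# An HTTP/1.x response starts with the text "HTTP/", whereas a gRPC (HTTP/2)
# server would open the connection with a binary SETTINGS frame.
looks_like_http1 = first_bytes.startswith(b"HTTP/")
print(looks_like_http1)
```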
(And)
I have one more question: we already have a remote cache server (nginx). Is there any way to use just the browser and the Build Event Protocol with our existing remote cache, rather than the whole Buildbarn bundle? If so, a pointer to how to achieve that would be much appreciated.
Buildbarn allows an action to modify its input files, whereas a local Bazel build doesn't.
Here is a simple action that modifies an input file:
def _convert_to_uppercase_impl(ctx):
    in_file = ctx.file.input
    out_file = ctx.actions.declare_file("hello_world")
    ctx.actions.run_shell(
        outputs = [out_file],
        inputs = [in_file],
        arguments = [in_file.path, out_file.path],
        # Deliberately overwrites the input file ($1) before appending it to the output ($2).
        command = "echo \"command exit not normally\" > $1 && cat $1 >> $2",
    )
    return [DefaultInfo(files = depset([out_file]))]
bazel build local result:
: shell_command/foo.txt: Read-only file system
bazel build remote with Buildbarn: succeeds
My guess is that running the runner image as root causes this behavior. But the example in bb-deployments also runs the runner image as root.
This leads to a bug: when someone modifies their own 0-byte input file, other people's 0-byte input files are affected as well, because all 0-byte input files are hard links to the same file. Builds then fail in ways that are hard to reproduce.
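The sharing effect described above can be reproduced outside Buildbarn. This is a minimal Python sketch, assuming a filesystem that supports hard links; the file names are hypothetical and no Buildbarn code is involved.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "empty_a")
    alias = os.path.join(workdir, "empty_b")

    # Two names for one zero-byte file, as a hard-linking input root might create.
    open(original, "wb").close()
    os.link(original, alias)
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino

    # An action that writes to its own input...
    with open(original, "w") as f:
        f.write("command exit not normally\n")

    # ...also changes what every other holder of the link sees.
    with open(alias) as f:
        leaked = f.read()

print(same_inode)
print(repr(leaked))
```

Both names refer to one inode, so a write through either name is visible through the other.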
Hi,
Looking through the Kubernetes deployment files, I have a question about the labels/config that I'm hoping you can clarify.
In config/storage.yaml, the following is set for the action cache:
"allowAcUpdatesForInstances": ["bb-event-service", "ubuntu16-04"]
My assumption (which could be totally wrong :)) is that this relates to the instance label on the relevant pods.
In worker-ubuntu16-04.yaml I can see that the instance is set (instance: ubuntu16-04), but in event-service.yaml there is no instance label.
Is this label required?
Thanks
Jon
Hi there,
I've set up Buildbarn on a bare metal K8s cluster (running Talos), using Rook for the underlying PersistentVolumeClaims. The behavior I'm seeing is that remote builds against this build farm instance get to a certain point and then just freeze, e.g.:
$ bazel build --config=remote-local //some/target
...
[56 / 411] 3 actions, 0 running
GoLink nogo_actual_/nogo_actual [for tool]; 10459s remote
Compiling src/google/protobuf/compiler/main.cc [for tool]; 10459s remote
GoLink external/io_bazel_rules_go/go/tools/builders/go-protoc-bin_/go-protoc-bin [for tool]; 10459s remote
I originally thought it was a result of the Rook storage being slow, but I independently benchmarked that and found it was plenty fast (100+ MB/s). My next guess was that the volumes didn't have enough storage space, but those looked fine as well:
/dev/rbd1 49G 3.4G 46G 7% /storage-cas
/dev/rbd0 2.0G 588K 1.9G 1% /storage-ac
I made some minor changes to the given storage.yaml, but nothing that I'd expect to cause this:
$ git diff storage.yaml
diff --git a/kubernetes/storage.yaml b/kubernetes/storage.yaml
index 68a0cfe..41f58c2 100644
--- a/kubernetes/storage.yaml
+++ b/kubernetes/storage.yaml
@@ -62,17 +62,19 @@ spec:
spec:
accessModes:
- ReadWriteOnce
+ storageClassName: rook-ceph-block
resources:
requests:
- storage: 12Gi
+ storage: 50Gi
- metadata:
name: ac
spec:
accessModes:
- ReadWriteOnce
+ storageClassName: rook-ceph-block
resources:
requests:
- storage: 1Gi
+ storage: 2Gi
And lowered the replica count on the workers, as I was crashing my dusty old servers (I'm currently running 3 nodes):
$ git diff worker-ubuntu22-04.yaml
diff --git a/kubernetes/worker-ubuntu22-04.yaml b/kubernetes/worker-ubuntu22-04.yaml
index bbec852..59929f3 100644
--- a/kubernetes/worker-ubuntu22-04.yaml
+++ b/kubernetes/worker-ubuntu22-04.yaml
@@ -7,7 +7,7 @@ metadata:
prometheus.io/port: "80"
prometheus.io/scrape: "true"
spec:
- replicas: 8
+ replicas: 3
build:remote --remote_download_toplevel
build:remote --dynamic_mode=off
build:remote --jobs=3
build:remote --extra_execution_platforms=//tools/remote-toolchains:ubuntu-act-22-04-platform
# build:remote --extra_toolchains=//tools/remote-toolchains:all
build:remote-local --config=remote
build:remote-local --remote_executor=grpc://192.168.5.2:8980
build:remote-ci --config=remote
build:remote-ci --remote_executor=grpc://10.98.227.156:8980
Any ideas?
With Docker on macOS, the idea of sharing /worker/runner as a Unix socket between worker and runner fails with the error
gRPC server failure: rpc error: code = Unknown desc = Failed to create listening socket for "/worker/runner": listen unix /worker/runner: bind: file name too long
This appears to be due to this: moby/moby#23545 (comment)
The bb_runner's socket directory is shared via a mount, and on macOS it is consumed as a full path. This makes the socket path exceed the length limit. Can we run runner and worker in the same container?
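For background, Unix socket paths are stored in the fixed-size sun_path field of sockaddr_un, which is 108 bytes on Linux and 104 bytes on macOS, so a sufficiently long expanded path cannot be bound at all. The Python sketch below checks a candidate path against that size. The fits_in_sun_path helper and the long example path are hypothetical, and the check conservatively reserves one byte for a trailing NUL.

```python
import sys

# Size of sockaddr_un.sun_path: 108 bytes on Linux, 104 on macOS and the BSDs.
# 104 is used for every non-Linux platform here as a conservative default.
SUN_PATH_SIZE = 108 if sys.platform.startswith("linux") else 104


def fits_in_sun_path(path: str, size: int = SUN_PATH_SIZE) -> bool:
    """Return True if path plus a NUL terminator fits in sun_path."""
    return len(path.encode()) + 1 <= size


print(fits_in_sun_path("/worker/runner"))
# Hypothetical expanded path, standing in for a long host-side mount path.
print(fits_in_sun_path("/" + "d" * 120 + "/worker/runner"))
```

If the expanded path is the problem, a shorter mount location is one thing to try before merging the two containers.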
Hello,
I've started using Buildbarn with the docker-compose setup to facilitate hermetic builds in my Bazel project. After a few days of hassle-free experience I encountered this error:
Failed to store previous blob ...: Shard 0: Blob is 296016884 bytes in size, while this backend is only capable of storing blobs of up to 238608384 bytes in size
I see 2 issues here:
It's not obvious how to increase this limit. I tried bumping some constants in config/storage.jsonnet, but that didn't help. According to the bb-storage source it should be equal to int64(sectorSizeBytes)*blockSectorCount, but neither of these constants is present in the example configuration. 238608384 is not even divisible by 2^20.
I think 227 MiB is too small to be the default blob size limit. It's common for C++ projects to have binaries with debug symbols that weigh hundreds of megabytes.
Thank you!
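The numbers in the error can be checked with a few lines of arithmetic. The sketch below uses only the two sizes from the message. It shows that the limit is a multiple of 8192 bytes but not of 1 MiB, which is at least consistent with a product of a sector size and a sector count; how that product follows from the example configuration is not established here.

```python
MIB = 2**20

limit_bytes = 238608384  # limit reported by the backend
blob_bytes = 296016884   # blob that was rejected

print(limit_bytes / MIB)  # a little over 227.5 MiB
print(blob_bytes / MIB)   # a little over 282 MiB
print(limit_bytes % MIB)  # non-zero: not a whole number of MiB

# Largest power of two that divides the limit, and the remaining odd factor.
power_of_two = limit_bytes & -limit_bytes
odd_factor = limit_bytes // power_of_two
print(power_of_two, odd_factor)
```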
I'm trying to expose the frontend service to the outside world from an OpenShift cluster, on the assumption that it speaks gRPC (over HTTP/2). Is that correct, or is it a custom protocol that needs a TCP tunnel?
Sometimes issue #39 does not occur and docker-compose works without any warnings. In that case, when I try to build using Bazel, I run into the following:
INFO: Invocation ID: ccdc75c7-6884-4795-a2c8-13a572417bd0
DEBUG: /root/.cache/bazel/_bazel_root/8c069df52082beee3c95ca17836fb8e2/external/bazel_toolchains/rules/rbe_repo/version_check.bzl:68:14:
Current running Bazel is ahead of bazel-toolchains repo. Please update your pin to bazel-toolchains repo in your WORKSPACE file.
DEBUG: /root/.cache/bazel/_bazel_root/8c069df52082beee3c95ca17836fb8e2/external/bazel_toolchains/rules/rbe_repo.bzl:491:10: Bazel 3.4.1 is used in rbe_default.
INFO: Analyzed target //main:hello-world (17 packages loaded, 53 targets configured).
INFO: Found 1 target...
ERROR: /app/main/BUILD:3:10: C++ compilation of rule '//main:hello-world' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed catastrophically UNAVAILABLE: No workers exist for instance "remote-execution" platform {"properties":[{"name":"OSFamily","value":"Linux"},{"name":"container-image","value":"docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:1a8ed713f40267bb51fe17de012fa631a20c52df818ccb317aaed2ee068dfc61"}]}
Target //main:hello-world failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 7.582s, Critical Path: 3.32s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
I ran command:
bazel build --config=mycluster-ubuntu16-04 //...
I am compiling a C++ Bazel app with docker-compose, but I am getting the following error. My guess is that workers are not being attached properly, but I don't know enough about it. Can someone help with this error?
ERROR: /Users/deep/Development/uptimize/rough-bazel-project/BUILD:1:1: C++ compilation of rule '//:hello-cio' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed FAILED_PRECONDITION: No workers exist for instance "remote-execution" platform {"properties":[{"name":"OSFamily","value":"Linux"},{"name":"container-image","value":"docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:ac36d37616b044ee77813fc7cd36607a6dc43c65357f3e2ca39f3ad723e426f6"}]}
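The error prints the exact platform the client asked for, so it can be compared with what the workers advertise. The Python sketch below assumes that a worker only matches when its set of platform properties is identical to the requested one; that assumption is not confirmed in this thread. The offered platform is hypothetical and simply reuses the image digest from the other report above, to show that a different digest alone is enough to cause a mismatch.

```python
import json


def platform_key(platform_json: str) -> frozenset:
    """Reduce a platform message to a comparable set of (name, value) pairs."""
    properties = json.loads(platform_json)["properties"]
    return frozenset((p["name"], p["value"]) for p in properties)


IMAGE = "docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:"

# Platform copied from the client error message.
requested = platform_key(
    '{"properties":[{"name":"OSFamily","value":"Linux"},'
    '{"name":"container-image","value":"' + IMAGE +
    'ac36d37616b044ee77813fc7cd36607a6dc43c65357f3e2ca39f3ad723e426f6"}]}'
)
# Hypothetical platform advertised by a worker, with a different image digest.
offered = platform_key(
    '{"properties":[{"name":"OSFamily","value":"Linux"},'
    '{"name":"container-image","value":"' + IMAGE +
    '1a8ed713f40267bb51fe17de012fa631a20c52df818ccb317aaed2ee068dfc61"}]}'
)

print(requested == offered)
for name, value in sorted(requested - offered):
    print(f"client asks for {name}={value}")
for name, value in sorted(offered - requested):
    print(f"worker offers  {name}={value}")
```

Comparing the platform in the error with the platform in the worker configuration is a quick first check before looking at connectivity.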
When I try to build my project, I get
ERROR: /data/myname/work/test/2023/2023-10-17/basis/BUILD:67:11: Compiling sbase/models/kv/zmap.cpp failed: (Exit 34): 6 errors during bulk transfer:
java.io.IOException: Error while uploading artifact with digest '6a0cc14b6acc6b7e3b6e4b3feb8886844eb57ebb87fbcd24042c220cecadf9d4/1193960'
java.io.IOException: Error while uploading artifact with digest '102292b1aeafa3928c2f345a0da6a6fe2b3b99c7dd65eb957df3c36dffec7798/1602653'
java.io.IOException: Error while uploading artifact with digest 'e33be885f01e4a828eee2397194577c390ed1fbd23c5e0879b1e8ac619e36d42/1956836'
java.io.IOException: Error while uploading artifact with digest '9bbe3936f7a8c7d5d33e57d26bcc3edb40eef44b23d7495ae09cc93b424fad5c/2328744'
java.io.IOException: Error while uploading artifact with digest 'baa46cddf79e51d5201fb7764a7fba505e4e8fd108e3e23b20bc7f43e99e131a/1367011'
java.io.IOException: Error while uploading artifact with digest '49199a0bc4890b48bab688db5abe1e69f34f4339affd24778104c10b80d7a179/1249715'
and sometimes I get
at io.grpc.Status.asRuntimeException(Status.java:535)
... 10 more
Suppressed: java.io.IOException: Error while uploading artifact with digest 'e33be885f01e4a828eee2397194577c390ed1fbd23c5e0879b1e8ac619e36d42/1956836'
at com.google.devtools.build.lib.remote.ByteStreamUploader.lambda$uploadBlobAsync$0(ByteStreamUploader.java:171)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:203)
at com.google.common.util.concurrent.AbstractCatchingFuture$AsyncCatchingFuture.doFallback(AbstractCatchingFuture.java:190)
at com.google.common.util.concurrent.AbstractCatchingFuture.run(AbstractCatchingFuture.java:133)
... 57 more
Caused by: io.grpc.StatusRuntimeException: UNKNOWN: HTTP status code 413
invalid content-type: text/html
headers: Metadata(:status=413,server=SGW,date=Fri, 20 Oct 2023 03:56:27 GMT,content-type=text/html,content-length=176)
DATA-----------------------------
<html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>
In the execution_log_json_file, I found the artifact named in the error output:
{
  "path": "external/boost~1.80.0/include/boost/geometry/srs/projections/epsg_traits.hpp",
  "digest": {
    "hash": "e33be885f01e4a828eee2397194577c390ed1fbd23c5e0879b1e8ac619e36d42",
    "sizeBytes": "1956836",
    "hashFunctionName": "SHA-256"
  }
}
They are Boost headers, which are too big.
May I ask which component sets the threshold for this size: Buildbarn, Bazel, or my nginx proxy?
What should I do to make my task work properly?
BTW, I have tried modifying the "size bytes" settings in https://github.com/buildbarn/bb-deployments/blob/master/kubernetes/config/common.yaml, but it does not seem to work for me.
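One way to narrow this down is to compare the sizes embedded in the failing digests against the proxy's request-body limit. The sketch below is only an illustration. It assumes the digests appear in the hash/size form shown in the errors above, and it assumes a limit of 1 MiB, which is nginx's documented default for client_max_body_size; the actual limit configured on the proxy in this setup is unknown. The names parse_digests, over_limit and ASSUMED_PROXY_LIMIT_BYTES are hypothetical helpers, not part of Bazel or Buildbarn.

```python
import re

# Assumption: nginx documents a default client_max_body_size of 1m (1 MiB).
# The proxy in front of this cluster may be configured differently.
ASSUMED_PROXY_LIMIT_BYTES = 1024 * 1024

# Matches the "digest '<hash>/<size>'" fragment of Bazel's upload errors.
DIGEST_RE = re.compile(r"digest '([0-9a-f]+)/(\d+)'")


def parse_digests(log_text):
    """Return a sorted list of unique (hash, size_bytes) pairs found in the log."""
    return sorted({(m.group(1), int(m.group(2))) for m in DIGEST_RE.finditer(log_text)})


def over_limit(digests, limit=ASSUMED_PROXY_LIMIT_BYTES):
    """Return the digests whose blob size exceeds the given limit in bytes."""
    return [(h, size) for h, size in digests if size > limit]


# Two of the error lines quoted above; the second digest appears twice on purpose
# to show that duplicates (e.g. from "Suppressed:" entries) are collapsed.
SAMPLE_LOG = """
java.io.IOException: Error while uploading artifact with digest '49199a0bc4890b48bab688db5abe1e69f34f4339affd24778104c10b80d7a179/1249715'
java.io.IOException: Error while uploading artifact with digest 'e33be885f01e4a828eee2397194577c390ed1fbd23c5e0879b1e8ac619e36d42/1956836'
Suppressed: java.io.IOException: Error while uploading artifact with digest 'e33be885f01e4a828eee2397194577c390ed1fbd23c5e0879b1e8ac619e36d42/1956836'
"""

if __name__ == "__main__":
    for blob_hash, size in over_limit(parse_digests(SAMPLE_LOG)):
        print(f"{blob_hash[:12]}  {size} bytes  ({size / 2**20:.2f} MiB)")
```

If every failing blob is larger than the assumed limit while smaller uploads succeed, that would point at the proxy rather than Bazel or Buildbarn, though this check alone does not prove it.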
The Docker containers are out of date, so the Kubernetes deployments won't work out of the box with the new config files.
Hi, after setting up Buildbarn via ./run.sh, attempting to perform a bazel build with the --remote_cache flag pointing to the instance returns the following error:
INFO: Invocation ID: 627d373c-0f1f-44d6-93c6-0dcbd09812a3
ERROR: --remote_upload_local_results is set, but the current account is not authorized to write local results to the remote cache.
Any ideas what could be causing this? The part of the code which generates this error in bazel is here.
Bazel is a great build tool for monorepos. However, this project splits its services into multiple repositories, which makes it hard to browse the code, package images, and deploy.
The Kubernetes test-infra project, which has multiple microservices, uses Bazel to build a monorepo. So I wonder: why not merge them into one repo? Is it possible to support this?
I ended up getting around this by using a subPath.
The error happens because a non-root user doesn't have the necessary rights to do chmod 0777 /worker
command:
- sh
- -c
- mkdir -pm 0777 /worker/build && mkdir -pm 0700 /worker/cache && chmod 0777 /worker
I thought the following might help, but I don't think it affects empty dirs:
pod.spec.securityContext.fsGroup
Hello,
Although it's not all covered here, I mostly use the documented configuration. Have you come across this issue before?
Any ideas on how to resolve it would be much appreciated.
# k get pods -n buildbarn -w
NAME READY STATUS RESTARTS AGE
asset-0 0/1 CrashLoopBackOff 6 (93s ago) 7m26s
# k logs pod/asset-0 -n buildbarn
2023/12/20 12:57:14 Fatal error: rpc error: code = Unimplemented desc = Failed to create asset store and CAS: NewBlockListGrowthPolicy unimplemeted for assetBlobAccessCreator
Hi,
I tried to get started with the Recommended Setup (the run.sh bash script, the .bazelrc file, and the bazel build command), and I got this error:
ERROR: Config value 'rbe-ubuntu-16-04' is not defined in any .rc file
Contents of the .bazelrc file:
build:mycluster --remote_executor=grpc://localhost:8980
build:mycluster --remote_instance_name=remote-execution
build:mycluster-ubuntu16-04 --config=mycluster
build:mycluster-ubuntu16-04 --config=rbe-ubuntu16-04
build:mycluster-ubuntu16-04 --jobs=64
My bazel version: 5.1.1
My OS info: Ubuntu 18.04 x86_64
Am I missing something? Thanks in advance!
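A quick way to see which --config names a .bazelrc refers to without defining them is sketched below. It is a simplified illustration under stated assumptions: undefined_configs is a hypothetical helper, not a Bazel feature, and it only inspects the text it is given. Files pulled in through import or try-import lines are not followed, so a name reported here may still be defined elsewhere.

```python
import re

# "build:mycluster --flag" defines the config named "mycluster".
CONFIG_DEF_RE = re.compile(r"^\s*\w+:([\w.-]+)\s")
# "--config=name" references a config that must be defined somewhere.
CONFIG_REF_RE = re.compile(r"--config=([\w.-]+)")


def undefined_configs(bazelrc_text):
    """Return sorted config names that are referenced but never defined in the text."""
    defined, referenced = set(), set()
    for line in bazelrc_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = CONFIG_DEF_RE.match(line)
        if match:
            defined.add(match.group(1))
        referenced.update(CONFIG_REF_RE.findall(line))
    return sorted(referenced - defined)


# The .bazelrc contents quoted above.
SAMPLE_BAZELRC = """\
build:mycluster --remote_executor=grpc://localhost:8980
build:mycluster --remote_instance_name=remote-execution
build:mycluster-ubuntu16-04 --config=mycluster
build:mycluster-ubuntu16-04 --config=rbe-ubuntu16-04
build:mycluster-ubuntu16-04 --jobs=64
"""

if __name__ == "__main__":
    for name in undefined_configs(SAMPLE_BAZELRC):
        print(f"referenced but not defined in this file: {name}")
```

For the file above, the only name the sketch reports is rbe-ubuntu16-04, which suggests that its definition is expected to come from another rc file that Bazel did not load.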
I'm having trouble using remote execution on Ubuntu Focal 20.04. The same works flawlessly on Ubuntu Bionic 18.04.
Bazel version is 3.3.0.
Here are the logs from my execution:
logs.txt
What am I doing wrong?
As of Bazel 0.27, the execution strategy is selected automatically (blog post).
When attempting a build with the provided bazelrc settings, the build fails with this error message:
ERROR: /home/ebongers/.cache/bazel/_bazel_ebongers/e3be84758b49246fcf039c20384168ff/external/bazel_gazelle/vendor/github.com/bazelbuild/buildtools/build/BUILD.bazel:5:1: Executing genrule @bazel_gazelle//vendor/github.com/bazelbuild/buildtools/build:parse.y.go_yacc failed: No usable spawn strategy found for spawn with mnemonic Genrule. Your --spawn_strategyor --strategy flags are probably too strict. Visit https://github.com/bazelbuild/bazel/issues/7480 for migration advises
Please update the documentation or provided bazelrc file to reflect this change.
Settings that work for me are:
# Common settings for remote builds.
build:remote --experimental_strict_action_env
build:remote --spawn_strategy=remote,linux-sandbox
Hi,
I'm new to Bazel and I'm testing the Kubernetes deployment on top of a cluster (Azure AKS). Unfortunately, the remote instance always tries to start containers based on the RBE images from gcr.io, which are only accessible if I pre-pull the images on the target hosts. It should be possible to reference the Buildbarn RBE images (https://hub.docker.com/r/buildbarn/bb-runner-ubuntu16-04) instead, right? I would like to avoid pre-pulling the images.
I have modified the following config to point to the publicly available Buildbarn image instead:
https://github.com/Qinusty/bb-deployments/blob/qinusty/fix-k8s-deploy/kubernetes/config/worker-ubuntu16-04.yaml#L27
Are there any additional steps required to get a remote environment on Kubernetes up and running?
The error I receive is:
WARNING: option '--remote_instance_name' was expanded to from both option '--config=mycluster-ubuntu16-04' (source command line options) and option '--config=mycluster-ubuntu16-04' (source command line options)
WARNING: option '--remote_instance_name' was expanded to from both option '--config=mycluster-ubuntu16-04' (source command line options) and option '--config=mycluster-ubuntu16-04' (source command line options)
INFO: Invocation ID: c8ec811a-32b4-49fb-ae1b-a758318c1a99
INFO: Streaming build results to: http://xxxxx/build_events/bb-event-service/c8ec811a-32b4-49fb-ae1b-a758318c1a99
INFO: Analyzed 331 targets (44 packages loaded, 1757 targets configured).
INFO: Found 331 targets...
INFO: Deleting stale sandbox base /home/ivan/.cache/bazel/_bazel_root/5ca4ca104c78138db28b17b8a63ba397/sandbox
ERROR: /home/ivan/Source/abseil-cpp/absl/base/BUILD.bazel:177:1: C++ compilation of rule '//absl/base:base' failed (Exit 34). Note: Remote connection/protocol failed with: execution failed FAILED_PRECONDITION: No workers exist for instance "remote-execution" platform {"properties":[{"name":"container-image","value":"docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:da0f21c71abce3bbb92c3a0c44c3737f007a82b60f8bd2930abc55fe64fc2729"}]}
INFO: Elapsed time: 12.867s, Critical Path: 0.95s
INFO: 0 processes.
INFO: Streaming build results to: http://xxxxxxx/build_events/bb-event-service/c8ec811a-32b4-49fb-ae1b-a758318c1a99
FAILED: Build did NOT complete successfully