bazelbuild / remote-apis-sdks
This repository contains client libraries for the Remote Execution API: https://github.com/bazelbuild/remote-apis
License: Apache License 2.0
https://github.com/bazelbuild/remote-apis-sdks/tree/master/go/pkg/balancer relies upon APIs that are slated for deletion, as announced in grpc/grpc-go#3180.
If this is essential functionality, it should be migrated ASAP. Note that the replacement API is also experimental, so if outages caused by API changes are not acceptable, another approach should be found. Any transitive package dependencies should also be considered experimental.
remote_execution.proto specifies for the ByteStream resource_name that "If it [instance_name] is the empty path, the leading slash is omitted". The CAS download and upload code currently doesn't omit the slash, causing incompatibility with buildbox-casd and possibly other servers:
go/pkg/cas/upload.go: req.ResourceName = fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", u.InstanceName, uuid.New(), digest.Hash, digest.SizeBytes)
go/pkg/casng/uploader.go: return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", instanceName, uuid.New(), hash, size)
go/pkg/client/cas_download.go: return fmt.Sprintf("%s/blobs/%s/%d", c.InstanceName, hash, sizeBytes)
go/pkg/client/cas_upload.go: return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", c.InstanceName, uuid.New(), hash, sizeBytes)
When we retry a failed Write call, we should respect the committed_bytes returned by the server and not redo everything from scratch, saving time and bandwidth.
Currently rexec assumes that the CAS and the Execution Service reside at the same endpoint (service); you cannot configure them independently. Access to the separate gRPC services shares the same gRPC connection.
I have started adding support for connecting to a CAS at a separate endpoint. We can use this for tracking (let me know if this is a bad idea, or something already being worked on!).
Node properties are a (relatively recent) part of the RE-API which is relied upon by Bazel's persistent remote workers feature: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java#L398
Bazel uses this to "mark" some files as tools that should persist on the remote worker. This is used by RE implementations (definitely by EngFlow, unsure about Google's RBE).
However, https://github.com/bazelbuild/remote-apis-sdks doesn't support node properties at the moment, which, for example, makes it impossible to use the remotetool to debug actions that use remote persistent workers.
Introduce optional node properties specification throughout the rexec client layer, starting with adding it to the command proto:
message InputSpec {
  ...
  map<string, NodeProperties> input_node_properties = 7;
}

// Either copied from RE-API (https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto#L852) or introduce a proto dependency on it
message NodeProperties {
  ...
}
and then reading these in GetDirectoryTree and passing them through to rexec at the relevant call sites.
Please let me know if you're okay with this plan, and I'll try to send a PR shortly.
Thank you!
When trying to use rexec against BuildGrid, I got an error trying to batch upload/download blobs, specifically because the blobs in the message were larger than the configured max batch size. It looks like the max batch size (MaxBatchSz) is hardcoded to a value just under 4MB, whereas BuildGrid advertised 2MB as part of its CacheCapabilities:
remote-apis-sdks/go/client/cas.go
Line 139 in 3e40a08
Ideally, the CAS client would check the CacheCapabilities endpoint for this value and, if set, use that instead of its default. It looks like no capabilities logic currently exists in remote-apis-sdks, so maybe this would be a good starting point?
When the uploads/downloads are unified, we don't really support cancellations -- the context may be canceled and the caller immediately returns, but the upload/download still happens in the background, consuming resources.
Ideally, for large (non-batched) reads and writes, if ALL clients interested in the digest are canceled, then we should terminate the RPC.
TestDownloadActionOutputsConcurrencyUsingBatch consistently fails at master on my Mac:
Executing tests from //go/pkg/client:client_test
-----------------------------------------------------------------------------
--- FAIL: TestDownloadActionOutputsConcurrency (0.00s)
--- FAIL: TestDownloadActionOutputsConcurrency/TestDownloadActionOutputsConcurrencyUsingBatch:true,UnifiedDownloads:true (20.66s)
cas_test.go:1633: 75 requests were made to BatchReadBlobs, wanted <= 50
FAIL
Since the BuildKite build is green, I presume it's a Mac issue.
If you have an action like:
work/
├── in
│ └── input.txt
└── out
└── link -> ../input.txt
calling ComputeOutputsToUpload on out gives an error like "work/in/input.txt is not under work/out". This is allowed by the API (it says that output symlinks "may be links to other output files, or input files, ...").
I've had a bit of a look at tree.go but so far haven't managed to fix it without making things worse. I'm pretty sure this is because absPath is passed to loadFiles instead of execRoot, but simply altering that and passing path instead of the empty string prefixes the outputs in an unexpected way.
Downloading from a single background thread would allow to batch downloading files from multiple actions, saving bandwidth and time.
# github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer
../go/pkg/mod/github.com/bazelbuild/[email protected]/go/pkg/balancer/gcp_balancer.go:35:9: cannot use &gcpBalancer literal (type *gcpBalancer) as type balancer.Balancer in return argument:
*gcpBalancer does not implement balancer.Balancer (missing ResolverError method)
It modifies a singleton instance of a SingleFlightCache, and therefore only passes if the test cases run in a particular order.
On both inputs and outputs, currently only directories that have a file somewhere underneath will be uploaded/downloaded.
This refers to the case where Execute returns a NOT_FOUND because of a cache miss. This is not supposed to happen often (in fact, ongoing RBE work is under way to make sure that does not happen at all), but it can happen with other backends. In this case, we need to re-upload missing inputs and retry the whole flow.
The things we need to add are, to begin with:
Whether the action was an Action Cache hit is already obvious from the ResultStatus.
Only read from the file when the chunk needs to be uploaded, and read in chunks when uploading, too -- it's okay to have a bigger in-memory buffer if that proves more efficient than repeated IO, but what's not okay is reading the entire file contents into memory before the upload even starts (unless it's a small file). The current situation is even worse: ALL the inputs are read into memory first thing, even before we start querying GetMissingBlobs.
The structure will follow https://github.com/golang-standards/project-layout. I'm adding new packages based on this, but the old ones need refactoring.
Currently we just package all files as executable.
Currently, we translate an input of "a/b/../foo" to a tree containing only a/foo. Ideally, it would be nice to translate it to a tree which also includes the a/b directory, even though it is not "really" used, because the bot would expect that.
Note: this would require #42 plus empty directories support on the server side as well.
Uploading from a single background thread would allow to batch uploads from multiple actions, saving bandwidth and time.
The rexec tool is a great one-stop shop for submitting and executing jobs synchronously.
We have a use case where we would like to get results directly from the ActionCache/CAS given only an ActionId (without the original input root, so we cannot submit the job again).
Would you be open to an additional command line tool dedicated to CAS interaction? Specifically, it would support upload/download of blobs and trees. Additionally, it would be able to interact with the ActionCache and check/download results.
We would be happy to contribute it!
The API specifies that action output paths should be relative to the working directory and not exec root. We have been doing it wrong since the beginning in RBE; now RBE supports both paths (to be backwards-compatible). We should modify the SDK to switch to the workdir relative paths wherever applicable.
It is possible that this will require no SDK changes per se, but rather changes to clients upstream of the SDK (such as the re-client).
This is an external tracking issue for this work as it concerns clients using the SDK.
TestBatchUpdateBlobsIndividualRequestRetries flaked out with:
--- FAIL: TestBatchUpdateBlobsIndividualRequestRetries (0.44s)
batch_retries_test.go:189: client.BatchWriteBlobs(ctx, blobs) diff on request at index 0 (want -> got):
(*remoteexecution.BatchUpdateBlobsRequest)(
- s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > `,
+ s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > `,
)
FAIL
IIUC, the file tree currently produced by ComputeMerkleTree() (remote-apis-sdks/go/pkg/tree/tree.go, line 110 in c00c035) contains only FileNode or DirectoryNode entries, but not SymlinkNode.
This could be very useful for Swarming's isolate support, since the .isolate file could contain symlinks. It could even contain invalid symlinks for testing purposes (@atetubou, correct me if I'm wrong). As an example, I wrote this small case:
{
	desc: "File invalid symlink included",
	input: []*inputPath{
		{path: "foo", isSymlink: true, symlinkTarget: "fooDir/foo"}, // invalid
	},
	spec: &command.InputSpec{
		Inputs: []string{"foo"},
	},
	rootDir: &repb.Directory{
		Symlinks: []*repb.SymlinkNode{{Name: "foo", Target: "fooDir/foo"}},
	},
	additionalBlobs: [][]byte{},
	// ignore the cache stats
},
But I got an empty repb.Directory{} instead.
From what I can see, symlink support needs to be added in several places:
- filemeta.Metadata needs a symlink flag: remote-apis-sdks/go/pkg/filemetadata/filemetadata.go, lines 12 to 16 in c00c035
- the tree implementation needs to distinguish symlinks from actual files: remote-apis-sdks/go/pkg/tree/tree.go, lines 24 to 33 in c00c035
- VirtualInput might also need this flag: remote-apis-sdks/go/pkg/command/command.go, lines 56 to 70 in c00c035
My biggest concern is around the filemeta cache. Right now it just treats invalid symlinks as an error (remote-apis-sdks/go/pkg/filemetadata/filemetadata.go, lines 46 to 48 in c00c035); it would be better to record the symlink in the returned Metadata and let the upper-level module decide how to deal with it.
Happy to contribute this feature, but I'd like to hear your opinions first. Thanks!
Currently uploads in the SDK client work by first reading everything into memory, and then uploading individual byte slices (in parallel, and with batching). Instead, uploads need to stream data as it is read.
We (the grpc-go team) noticed that the grpc.WithBalancerName API is being used in your repo. It was deprecated about 4 years ago and was deleted recently in grpc/grpc-go#5232.
We would recommend replacing grpc.WithBalancerName with the following two dial options:
grpc.WithDefaultServiceConfig(fmt.Sprintf(`{"loadBalancingConfig": [{"%s":{}}]}`, balancer.Name))
grpc.WithDisableServiceConfig()
Thanks
Predicated on #32, I need to get rid of the current way of "load everything into a bunch of []byte in memory" and go from there.
One of the tiny number of things https://github.com/bazelbuild/tools_remote can do and these tools can't is parse the RemoteExecutionLog. It would be nice to figure out a solution to that.
Annoyingly, that proto isn't part of github.com/bazelbuild/remote-apis so this is a more thorny puzzle than just implementing the (fairly simple) features.
Today the API for the SDK only allows clients to specify Inputs (files available to the client) and VirtualInputs (contents supplied in memory by the client), per command.go.
But I would like to extend this to support providing digests that are available in the remote server.
I have a fork with this code that I am able to use for this. But I wanted to ask folks here if it would make sense to add to the SDK?
This would be useful when trying to do "build-without-bytes" for example.
Either regular inputs should win, or we should error out.
An advanced configurable fake for an in-process RE worker is taking shape with PRs #15 and #19.
But here, I mean an actual gRPC server like https://github.com/bazelbuild/bazel/tree/master/src/tools/remote/src/main/java/com/google/devtools/build/remote/worker
This should be configurable with a parameter, but the default behavior should be to avoid querying GetMissingBlobs for previously uploaded blobs, drastically reducing its time. However, GetMissingBlobs doesn't take all that long in itself, so this is lower priority. It is, however, very easy to implement.
When I call ReadBlobStreamed from multiple goroutines, I get the following panic:
fatal error: concurrent map read and map write
goroutine 23 [running]:
runtime.throw(0xcdfe2b, 0x21)
/usr/lib/google-golang/src/runtime/panic.go:1122 +0x72 fp=0xc0000d7cf0 sp=0xc0000d7cc0 pc=0x582092
runtime.mapaccess1(0xbdf7e0, 0xc000380480, 0xc0000d7db0, 0xc00f5ecbf8)
/usr/lib/google-golang/src/runtime/map.go:411 +0x24f fp=0xc0000d7d30 sp=0xc0000d7cf0 pc=0x55b48f
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).regeneratePicker(0xc000392100)
/usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:267 +0x147 fp=0xc0000d7e30 sp=0xc0000d7d30 pc=0xb24b07
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).UpdateSubConnState(0xc000392100, 0xd93520, 0xc0207db620, 0x2, 0x0, 0x0)
/usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:308 +0x374 fp=0xc0000d7ea8 sp=0xc0000d7e30 pc=0xb25074
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0000ee6c0)
/usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:77 +0x357 fp=0xc0000d7fd8 sp=0xc0000d7ea8 pc=0xa66fb7
runtime.goexit()
/usr/lib/google-golang/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000d7fe0 sp=0xc0000d7fd8 pc=0x5b7a01
created by google.golang.org/grpc.newCCBalancerWrapper
/usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:60 +0x172
Does gb.scRefs need to be guarded by a mutex?
Hard-coding the port number in test code is flaky.
go.mod and WORKSPACE are not synced on dependencies. Dependabot usually posts a PR to update a dependency, e.g. #512, which leaves WORKSPACE behind.
Performance improvement similar to #38, but much more important.
The error is:
ERROR: Source forest creation failed: /usr/local/google/home/olaola/.cache/bazel/_bazel_olaola/7e5a745c7956785bd79aca70eb9e660b/execroot/bazel_remote_apis_sdks/external (File exists)
and it is resistant to bazel clean --expunge. We may have to update our dependencies, or file a bug with Bazel.
Suspect: the logic used to batch uploads.
Apparently, 20 ms is not enough for an error margin. I'll just make it a second!
Batch uploads are already there; this feature will greatly enhance performance for workers/bots using this library to download action inputs. It matters a lot less for build clients, because actions usually do not produce lots of outputs.
Similar to #43, but the implementation is not super-trivial: in order to do this, we need a map of digest->channel (in Go; Future in Java), so that a function that needs to write a bunch of blobs knows which futures to wait on. This way, if WriteBlobs(b1, b2) is called followed by WriteBlobs(b1, b3), both functions will wait on the same future that writes the b1 blob.
It will, however, result in significant upload cost savings (I believe) on an uncached build for clients using a hermetic toolchain.
Unfortunately glog is pretty unfriendly to use in a library; when we call functions in this package we get messages printed to the console (often errors like "Logging before flag.Parse"). There doesn't seem to be any sensible way to control it (e.g. set its verbosity or configure where it outputs to) and its output is trampling over things we're trying to print at the same time.
A friendlier alternative might be google.golang.org/grpc/grpclog; it allows registering a custom logger which lets users integrate it gracefully with their chosen logging framework. Since the remote API is heavily based on grpc it's not introducing any extra dependency, although I don't know if grpc intends that to be used by anyone else or not.
I'm happy to make the code changes if we agree on what the solution should be.
An item raised by George at the API meeting: the server losing the operation or the current connection should be a retriable error.
These APIs are labeled as experimental, and they will be changed in the near-ish future. Details are in grpc/grpc-go#6472. Note that any package that uses gRPC's experimental APIs should itself be considered experimental, as a change to them would break compilation for the entire package that uses them. So we do NOT recommend the use of these features in any other libraries or applications that aren't tolerant to breakages like this.
https://github.com/googleapis/google-cloud-go may have another solution to the same type of problem, which is that it creates multiple channels and load balances between them. See https://github.com/googleapis/google-api-go-client/blob/caea95689f82049822552cd649765335123831e0/transport/grpc/pool.go#L30 for details.