bazelbuild / remote-apis-sdks
This repository contains client libraries for the Remote Execution API: https://github.com/bazelbuild/remote-apis
License: Apache License 2.0
https://github.com/bazelbuild/remote-apis-sdks/tree/master/go/pkg/balancer relies upon APIs that are slated for deletion, as announced in grpc/grpc-go#3180.
If this is essential functionality, it should be migrated ASAP. Note that the replacement API is also experimental, so if outages caused by API changes are not acceptable, another approach should be found. Any transitive package dependencies should also be considered experimental.
remote_execution.proto specifies for the ByteStream resource_name that "If it [instance_name] is the empty path, the leading slash is omitted". The CAS download and upload code currently doesn't omit the slash, causing incompatibility with buildbox-casd and possibly other servers:
go/pkg/cas/upload.go: req.ResourceName = fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", u.InstanceName, uuid.New(), digest.Hash, digest.SizeBytes)
go/pkg/casng/uploader.go: return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", instanceName, uuid.New(), hash, size)
go/pkg/client/cas_download.go: return fmt.Sprintf("%s/blobs/%s/%d", c.InstanceName, hash, sizeBytes)
go/pkg/client/cas_upload.go: return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", c.InstanceName, uuid.New(), hash, sizeBytes)
When we retry a failed Write call, we should respect the committed_bytes returned by the server and not redo everything from scratch, saving time and bandwidth.
Currently rexec assumes that the CAS and the Execution Service reside at the same endpoint (service); you cannot configure them independently. Access to the separate gRPC services shares the same gRPC connection.
I have started adding support for connecting to a CAS at a separate endpoint. We can use this for tracking (let me know if this is a bad idea, or something already being worked on!).
Node properties are a (relatively recent) part of the RE-API which is relied upon by Bazel's persistent remote workers feature: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java#L398
Bazel uses this to "mark" some files as tools that should persist on the remote worker. This is used by RE implementations (definitely by EngFlow, unsure about Google's RBE).
However, https://github.com/bazelbuild/remote-apis-sdks doesn't support node properties at the moment, which, for example, makes it impossible to use the remotetool to debug actions that use remote persistent workers.
Introduce optional node properties specification throughout the rexec client layer, starting with adding it to the command proto:
message InputSpec {
  ...
  map<string, NodeProperties> input_node_properties = 7;
}

// Either copied from RE-API (https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto#L852) or introduce a proto dependency on it
message NodeProperties {
  ...
}
and then reading these in GetDirectoryTree and passing them through to rexec at the relevant call sites.
Please let me know if you're okay with this plan, and I'll try to send a PR shortly.
Thank you!
When trying to use rexec against BuildGrid, I got an error trying to batch upload/download blobs, specifically because the blobs in the message were larger than the configured max batch size. It looks like the max batch size (MaxBatchSz) is hardcoded to a value just under 4MB, whereas BuildGrid advertised 2MB as part of its CacheCapabilities:
remote-apis-sdks/go/client/cas.go
Line 139 in 3e40a08
Ideally, the CAS client would check the CacheCapabilities endpoint for this value and, if set, use that instead of its default. It looks like no capabilities logic currently exists in remote-apis-sdks, so maybe this would be a good starting point?
When the uploads/downloads are unified, we don't really support cancellations -- the context may be canceled and the caller immediately returns, but the upload/download still happens in the background, consuming resources.
Ideally, for large (non-batched) reads and writes, if ALL clients interested in the digest are canceled, then we should terminate the RPC.
TestDownloadActionOutputsConcurrencyUsingBatch consistently fails at master on my Mac:
Executing tests from //go/pkg/client:client_test
-----------------------------------------------------------------------------
--- FAIL: TestDownloadActionOutputsConcurrency (0.00s)
--- FAIL: TestDownloadActionOutputsConcurrency/TestDownloadActionOutputsConcurrencyUsingBatch:true,UnifiedDownloads:true (20.66s)
cas_test.go:1633: 75 requests were made to BatchReadBlobs, wanted <= 50
FAIL
Since the BuildKite build is green, I presume it's a Mac issue.
If you have an action like:
work/
├── in
│ └── input.txt
└── out
└── link -> ../input.txt
calling ComputeOutputsToUpload on out gives an error like "work/in/input.txt is not under work/out". This is allowed by the API (it says that output symlinks "may be links to other output files, or input files, ...").
I've had a bit of a look at tree.go but so far haven't managed to fix it without making things worse. I'm pretty sure this is because absPath is passed to loadFiles instead of execRoot, but simply altering that and passing path instead of the empty string prefixes the outputs in an unexpected way.
Downloading from a single background thread would allow to batch downloading files from multiple actions, saving bandwidth and time.
# github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer
../go/pkg/mod/github.com/bazelbuild/[email protected]/go/pkg/balancer/gcp_balancer.go:35:9: cannot use &gcpBalancer literal (type *gcpBalancer) as type balancer.Balancer in return argument:
*gcpBalancer does not implement balancer.Balancer (missing ResolverError method)
It modifies a singleton instance of a SingleFlightCache, and therefore only passes if the test cases run in a particular order.
On both inputs and outputs, currently only directories that have a file somewhere underneath will be uploaded/downloaded.
This refers to the case where Execute returns a NOT_FOUND because of a cache miss. This is not supposed to happen often (in fact, ongoing RBE work is under way to make sure that does not happen at all), but it can happen with other backends. In this case, we need to re-upload missing inputs and retry the whole flow.
The things we need to add are, to begin with:
Whether the action was an Action Cache hit is already obvious from the ResultStatus.
Only read from the file when the chunk needs to be uploaded, and read in chunks when uploading, too -- it's okay to have a bigger in-memory buffer if that proves more efficient than repeated IO, but what's not okay is reading the entire file contents into memory before the upload even starts (unless it's a small file). The current situation is even worse: ALL the inputs are read into memory first thing, even before we start querying GetMissingBlobs.
The structure will follow https://github.com/golang-standards/project-layout. I'm adding new packages based on this, but the old ones need refactoring.
Currently we just package all files as executable.
Currently, we translate an input of "a/b/../foo" to a tree containing only a/foo. Ideally, it would be nice to translate it to a tree which also includes the a/b directory, even though it is not "really" used, because the bot would expect that.
Note: this would require #42 plus empty directories support on the server side as well.
Uploading from a single background thread would allow to batch uploads from multiple actions, saving bandwidth and time.
The rexec tool is a great one-stop shop for submitting and executing jobs synchronously.
We have a use case where we would like to get results directly from the ActionCache/CAS given only an ActionId (without the original input root, so we cannot submit the job again).
Would you be open to an additional command line tool dedicated to CAS interaction? Specifically, it would support upload/download of blobs and trees. Additionally, it would be able to interact with the ActionCache and check/download results.
We would be happy to contribute it!
The API specifies that action output paths should be relative to the working directory and not exec root. We have been doing it wrong since the beginning in RBE; now RBE supports both paths (to be backwards-compatible). We should modify the SDK to switch to the workdir relative paths wherever applicable.
It is possible that this will require no SDK changes per se, but rather changes to clients upstream of the SDK (such as the re-client).
This is an external tracking issue for this work as it concerns clients using the SDK.
TestBatchUpdateBlobsIndividualRequestRetries flaked out with:
--- FAIL: TestBatchUpdateBlobsIndividualRequestRetries (0.44s)
batch_retries_test.go:189: client.BatchWriteBlobs(ctx, blobs) diff on request at index 0 (want -> got):
(*remoteexecution.BatchUpdateBlobsRequest)(
- s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > `,
+ s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > `,
)
FAIL
IIUC, the file tree currently produced by ComputeMerkleTree() (remote-apis-sdks/go/pkg/tree/tree.go, line 110 in c00c035) contains only FileNode or DirectoryNode entries, but not SymlinkNode.
This could be very useful for Swarming's isolate support, since the .isolate file could contain symlinks. It could even contain invalid symlinks for testing purposes (@atetubou, correct me if I'm wrong). As an example, I wrote this small case:
{
	desc: "File invalid symlink included",
	input: []*inputPath{
		{path: "foo", isSymlink: true, symlinkTarget: "fooDir/foo"}, // invalid
	},
	spec: &command.InputSpec{
		Inputs: []string{"foo"},
	},
	rootDir: &repb.Directory{
		Symlinks: []*repb.SymlinkNode{{Name: "foo", Target: "fooDir/foo"}},
	},
	additionalBlobs: [][]byte{},
	// ignore the cache stats
},
But I got an empty repb.Directory{} instead.
From what I can see, symlink support needs to be added in several places:
- filemeta.Metadata needs a symlink flag: remote-apis-sdks/go/pkg/filemetadata/filemetadata.go, lines 12 to 16 in c00c035
- the tree implementation needs to distinguish symlinks from actual files: remote-apis-sdks/go/pkg/tree/tree.go, lines 24 to 33 in c00c035
- VirtualInput might also need this flag: remote-apis-sdks/go/pkg/command/command.go, lines 56 to 70 in c00c035
My biggest concern is around the filemeta cache. Right now it just treats invalid symlinks as an error (remote-apis-sdks/go/pkg/filemetadata/filemetadata.go, lines 46 to 48 in c00c035); it would be better to record the symlink in the returned Metadata and let the upper-level module decide how to deal with it.
Happy to contribute this feature, but I'd like to hear your opinions first. Thanks!
Currently uploads in the SDK client work by first reading everything into memory, and then uploading individual byte slices (in parallel, and with batching). Instead, uploads need to stream data as it is read.
We (the grpc-go team) noticed that the grpc.WithBalancerName API is being used in your repo. It was deprecated about 4 years ago and was deleted recently in grpc/grpc-go#5232.
We would recommend replacing grpc.WithBalancerName with the following two dial options:
grpc.WithDefaultServiceConfig(fmt.Sprintf(`{"loadBalancingConfig": [{"%s":{}}]}`, balancer.Name))
grpc.WithDisableServiceConfig()
Thanks
Predicated on #32, I need to get rid of the current way of "load everything into a bunch of []byte in memory" and go from there.
One of the tiny number of things https://github.com/bazelbuild/tools_remote can do and these tools can't is parse the RemoteExecutionLog. It would be nice to figure out a solution to that.
Annoyingly, that proto isn't part of github.com/bazelbuild/remote-apis so this is a more thorny puzzle than just implementing the (fairly simple) features.
Today the API for the SDK only allows clients to specify Inputs (files available to the client) and VirtualInputs (contents supplied in memory by the client), per command.go.
But I would like to extend this to support providing digests that are available in the remote server.
I have a fork with this code that I am able to use for this. But I wanted to ask folks here if it would make sense to add to the SDK?
This would be useful when trying to do "build-without-bytes" for example.
Either regular inputs should win, or we should error out.
An advanced configurable fake for an in-process RE worker is taking shape with PRs #15 and #19.
But here, I mean an actual gRPC server like https://github.com/bazelbuild/bazel/tree/master/src/tools/remote/src/main/java/com/google/devtools/build/remote/worker
This should be configurable with a parameter, but the default behavior should be to avoid querying GetMissingBlobs for previously uploaded blobs, drastically reducing its time. However, GetMissingBlobs doesn't take all that long in itself, so this is lower priority. It is, however, very easy to implement.
When I call ReadBlobStreamed from multiple goroutines, I get the following panic:
fatal error: concurrent map read and map write
goroutine 23 [running]:
runtime.throw(0xcdfe2b, 0x21)
/usr/lib/google-golang/src/runtime/panic.go:1122 +0x72 fp=0xc0000d7cf0 sp=0xc0000d7cc0 pc=0x582092
runtime.mapaccess1(0xbdf7e0, 0xc000380480, 0xc0000d7db0, 0xc00f5ecbf8)
/usr/lib/google-golang/src/runtime/map.go:411 +0x24f fp=0xc0000d7d30 sp=0xc0000d7cf0 pc=0x55b48f
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).regeneratePicker(0xc000392100)
/usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:267 +0x147 fp=0xc0000d7e30 sp=0xc0000d7d30 pc=0xb24b07
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).UpdateSubConnState(0xc000392100, 0xd93520, 0xc0207db620, 0x2, 0x0, 0x0)
/usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:308 +0x374 fp=0xc0000d7ea8 sp=0xc0000d7e30 pc=0xb25074
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0000ee6c0)
/usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:77 +0x357 fp=0xc0000d7fd8 sp=0xc0000d7ea8 pc=0xa66fb7
runtime.goexit()
/usr/lib/google-golang/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000d7fe0 sp=0xc0000d7fd8 pc=0x5b7a01
created by google.golang.org/grpc.newCCBalancerWrapper
/usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:60 +0x172
Does gb.scRefs need to be guarded by a mutex?
Hard-coding the port number in test code is flaky.
go.mod and WORKSPACE are not synced on dependencies. Dependabot usually posts a PR to update a dependency, e.g. #512, which leaves WORKSPACE behind.
Performance improvement similar to #38, but much more important.
The error is:
ERROR: Source forest creation failed: /usr/local/google/home/olaola/.cache/bazel/_bazel_olaola/7e5a745c7956785bd79aca70eb9e660b/execroot/bazel_remote_apis_sdks/external (File exists)
and it is resistant to bazel clean --expunge. We may have to update our dependencies, or file a bug with Bazel.
Suspect: the logic used to batch uploads.
Apparently, 20 ms is not enough for an error margin. I'll just make it a second!
Batch uploads are already there; this feature will greatly enhance performance for workers/bots using this library to download action inputs. It matters a lot less for build clients, because actions usually do not produce lots of outputs.
Similar to #43, but the implementation is not super-trivial: in order to do this, we need a map of digest->channel (in Go; Future in Java), so that a function that needs to write a bunch of blobs knows which futures to wait on. This way, if WriteBlobs(b1, b2) is called followed by WriteBlobs(b1, b3), both functions will wait on the same future that writes the b1 blob.
It will, however, result in significant upload cost savings (I believe) on an uncached build for clients using a hermetic toolchain.
Unfortunately glog is pretty unfriendly to use in a library; when we call functions in this package we get messages printed to the console (often errors like "Logging before flag.Parse"). There doesn't seem to be any sensible way to control it (e.g. set its verbosity or configure where it outputs to) and its output is trampling over things we're trying to print at the same time.
A friendlier alternative might be google.golang.org/grpc/grpclog; it allows registering a custom logger which lets users integrate it gracefully with their chosen logging framework. Since the remote API is heavily based on grpc it's not introducing any extra dependency, although I don't know if grpc intends that to be used by anyone else or not.
I'm happy to make the code changes if we agree on what the solution should be.
An item raised by George at the API meeting: the server losing the operation or the current connection should be a retriable error.
These APIs are labeled as experimental, and they will be changed in the near-ish future. Details are in grpc/grpc-go#6472. Note that any package that uses gRPC's experimental APIs should itself be considered experimental, as a change to them would break compilation for the entire package that uses them. So we do NOT recommend the use of these features in any other libraries or applications that aren't tolerant to breakages like this.
https://github.com/googleapis/google-cloud-go may have another solution to the same type of problem, which is that it creates multiple channels and load balances between them. See https://github.com/googleapis/google-api-go-client/blob/caea95689f82049822552cd649765335123831e0/transport/grpc/pool.go#L30 for details.