
remote-apis-sdks's People

Contributors

andusy, atetubou, banksean, bansalvinayak, bentekkie, bergsieker, bozydarsz, bwkimmel, danny-kuminov, danw, dependabot[bot], gkousik, ifoox, jasharpe, jwata, k-ye, mostynb, mrahs, nodirg, ola-rozenfeld, orn688, peterebden, philwo, ramymedhat, rjobredeaux3, rubensf, yannic, yoshisatoyanagisawa, yukasadaoka, ywmei-brt1


remote-apis-sdks's Issues

gRPC Balancer uses experimental/deprecated (and soon to be deleted) gRPC APIs

https://github.com/bazelbuild/remote-apis-sdks/tree/master/go/pkg/balancer relies upon APIs that are slated for deletion, as announced in grpc/grpc-go#3180.

If this is essential functionality, it should be migrated ASAP. Note that the replacement API is also experimental, so if outages caused by API changes are not acceptable, another approach should be found. Any transitive package dependencies should likewise be considered experimental.

ByteStream `resource_name` is incorrect if the instance name is empty

remote_execution.proto specifies for the ByteStream resource_name that

If it [instance_name] is the empty path, the leading slash is omitted

The CAS download and upload code currently doesn't omit the slash, causing incompatibility with buildbox-casd and maybe other servers.

go/pkg/cas/upload.go:           req.ResourceName = fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", u.InstanceName, uuid.New(), digest.Hash, digest.SizeBytes)
go/pkg/casng/uploader.go:       return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", instanceName, uuid.New(), hash, size)
go/pkg/client/cas_download.go:  return fmt.Sprintf("%s/blobs/%s/%d", c.InstanceName, hash, sizeBytes)
go/pkg/client/cas_upload.go:    return fmt.Sprintf("%s/uploads/%s/blobs/%s/%d", c.InstanceName, uuid.New(), hash, sizeBytes)
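
A minimal sketch of a fix, using a hypothetical helper (uploadResourceName is not an existing SDK function; fmt and github.com/google/uuid are assumed imported) that omits the leading slash when the instance name is empty:

    // uploadResourceName builds a ByteStream upload resource name, omitting the
    // leading slash when instanceName is empty, as remote_execution.proto requires.
    // Illustrative sketch only, not the SDK's current code.
    func uploadResourceName(instanceName, hash string, sizeBytes int64) string {
        name := fmt.Sprintf("uploads/%s/blobs/%s/%d", uuid.New(), hash, sizeBytes)
        if instanceName == "" {
            return name
        }
        return instanceName + "/" + name
    }

The download resource name would need the same treatment for the "%s/blobs/%s/%d" form.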

Support resumable uploads

When we retry a failed Write call, we should respect the committed_bytes returned by the server and not redo everything from scratch, saving time and bandwidth.
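
As a rough sketch of the resume step, assuming the generated ByteStream bindings (google.golang.org/genproto/googleapis/bytestream, imported here as bspb), a retry could first ask the server how much it already has:

    // resumeOffset asks the server how many bytes of a previous Write were
    // committed, so a retried upload can continue from that offset instead of
    // resending everything. Sketch only; retry policy and error handling elided.
    func resumeOffset(ctx context.Context, bs bspb.ByteStreamClient, resourceName string) (int64, error) {
        resp, err := bs.QueryWriteStatus(ctx, &bspb.QueryWriteStatusRequest{ResourceName: resourceName})
        if err != nil {
            return 0, err
        }
        return resp.CommittedSize, nil
    }

The retried Write would then set its write offset to this value and stream only the remaining chunks.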

Add support for hosting CAS at a separate endpoint

Currently rexec assumes that the CAS and the Execution Service reside at the same endpoint (service); they cannot be configured independently, and access to the separate gRPC services shares the same gRPC connection.

I have started adding support for connecting to a CAS at a separate endpoint. We can use this issue for tracking (let me know if this is a bad idea, or if it is already being worked on!).

Support node properties in the remote tool

Problem

Node properties are a (relatively recent) part of the RE-API which is relied upon by Bazel's persistent remote workers feature: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java#L398
Bazel uses this to "mark" some files as tools that should persist on the remote worker. This is used by RE implementations (definitely by EngFlow, unsure about Google's RBE).
However, https://github.com/bazelbuild/remote-apis-sdks doesn't support node properties at the moment, which, for example, makes it impossible to use the remotetool to debug actions that use remote persistent workers.

Suggested fix

Introduce optional node properties specification throughout the rexec client layer, starting with adding it to the command proto:

message InputSpec {
  ...
  map<string,NodeProperties> input_node_properties = 7;
}

// Either copied from [RE-API](https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto#L852) or introduce a proto dependency on it
message NodeProperties {
...
}

and then reading these in GetDirectoryTree, passing them through to rexec here, here and here.

Please let me know if you're okay with this plan, and I'll try to send a PR shortly.

Thank you!

Query the cache server Capabilities to determine max batch size

When trying to use rexec against BuildGrid, I got an error while batch uploading/downloading blobs: the blobs in the message were larger than the maximum the server advertised. It looks like the max batch size (MaxBatchSz) is hardcoded to a value just under 4MB, whereas BuildGrid advertised 2MB as part of its CacheCapabilities:

MaxBatchSz = 4*1024*1024 - 1024

Ideally, the CAS client would query the server's CacheCapabilities for this value and, if set, use it instead of its default. It looks like no capabilities logic is currently in remote-apis-sdks, so maybe this would be a good starting point?
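
A sketch of what that could look like, assuming the generated remote-apis Go bindings (imported as repb) and falling back to the current client-side default when the server doesn't advertise a limit:

    // maxBatchSize queries the server's CacheCapabilities for its advertised
    // batch limit and falls back to the client default when none is set.
    // Illustrative sketch, not existing SDK code.
    func maxBatchSize(ctx context.Context, conn *grpc.ClientConn, instance string, fallback int64) int64 {
        caps, err := repb.NewCapabilitiesClient(conn).GetCapabilities(ctx,
            &repb.GetCapabilitiesRequest{InstanceName: instance})
        if err != nil || caps.GetCacheCapabilities().GetMaxBatchTotalSizeBytes() == 0 {
            return fallback
        }
        return caps.GetCacheCapabilities().GetMaxBatchTotalSizeBytes()
    }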

implement upload/download cancellation

When uploads/downloads are unified, we don't really support cancellation -- the context may be canceled and the caller returns immediately, but the upload/download still happens in the background, consuming resources.
Ideally, for large (non-batched) reads and writes, if ALL clients interested in the digest are canceled, then we should terminate the RPC.
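
One possible shape for this, shown as a sketch only: keep a reference count per in-flight digest and cancel the shared RPC context when the last interested caller goes away.

    // refCancel tracks how many callers still want an in-flight transfer and
    // cancels the underlying RPC context only when the last one gives up.
    // Illustrative sketch, not the SDK's implementation; assumes one instance
    // per digest, created together with context.WithCancel.
    type refCancel struct {
        mu     sync.Mutex
        count  int
        cancel context.CancelFunc
    }

    func (r *refCancel) add() {
        r.mu.Lock()
        r.count++
        r.mu.Unlock()
    }

    func (r *refCancel) done() {
        r.mu.Lock()
        defer r.mu.Unlock()
        if r.count--; r.count == 0 {
            r.cancel() // nobody is waiting for this digest anymore; stop the RPC
        }
    }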

test failing at master (Mac)

TestDownloadActionOutputsConcurrencyUsingBatch consistently fails at master on my Mac:

Executing tests from //go/pkg/client:client_test
-----------------------------------------------------------------------------
--- FAIL: TestDownloadActionOutputsConcurrency (0.00s)
    --- FAIL: TestDownloadActionOutputsConcurrency/TestDownloadActionOutputsConcurrencyUsingBatch:true,UnifiedDownloads:true (20.66s)
        cas_test.go:1633: 75 requests were made to BatchReadBlobs, wanted <= 50
FAIL

Since the BuildKite build is green, I presume it's a Mac issue.

Symlinks to inputs are incorrectly rejected

If you have an action like:

work/
├── in
│   └── input.txt
└── out
    └── link -> ../input.txt

on calling ComputeOutputsToUpload on out you get an error like work/in/input.txt is not under work/out. This is allowed by the API (it says that output symlinks "may be links to other output files, or input files, ...").

I've had a bit of a look at tree.go but so far haven't managed to fix it without making things worse. I'm pretty sure this is because of passing absPath instead of execRoot to loadFiles, but simply altering that and passing path instead of the empty string prefixes the outputs in an unexpected way.

implement a unified downloader

Downloading from a single background thread would allow batching downloads of files needed by multiple actions, saving bandwidth and time.

build fails with grpc-go v1.30.0

# github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer
../go/pkg/mod/github.com/bazelbuild/remote-apis-sdks@<version>/go/pkg/balancer/gcp_balancer.go:35:9: cannot use &gcpBalancer literal (type *gcpBalancer) as type balancer.Balancer in return argument:
	*gcpBalancer does not implement balancer.Balancer (missing ResolverError method)

cache_test is order dependent

It modifies a singleton instance of a SingleFlightCache, and therefore only passes if the test cases run in a particular order.

Implement the cache miss retry loop

This refers to the case where Execute returns NOT_FOUND because of a cache miss. This is not supposed to happen often (in fact, RBE work is under way to make sure it does not happen at all), but it can happen with other backends. In this case, we need to re-upload the missing inputs and retry the whole flow, as sketched below.
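
A sketch of the shape of that loop (execute and uploadInputs are hypothetical stand-ins for the SDK's execute and upload steps; status and codes are the usual grpc-go packages):

    // runWithCacheMissRetry re-uploads inputs and retries the flow whenever
    // Execute fails with NOT_FOUND (a CAS cache miss). Sketch only.
    func runWithCacheMissRetry(ctx context.Context, attempts int) error {
        var err error
        for i := 0; i < attempts; i++ {
            if err = execute(ctx); status.Code(err) != codes.NotFound {
                return err // success, or an error that re-uploading won't fix
            }
            if uerr := uploadInputs(ctx); uerr != nil {
                return uerr
            }
        }
        return err
    }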

Report execution metadata for performance analysis

The things we need to add are, to begin with:

  • Start and end times for various interesting events, such as computing the Merkle tree, checking action cache, remote and local execution stages, etc.
  • Number of inputs, outputs, total bytes of inputs and outputs
  • CAS cache hits for inputs

Whether the action was an Action Cache hit is already obvious from the ResultStatus.

Lazy file reads in uploader

Only read from a file when its chunk needs to be uploaded, and read in chunks as well -- a bigger in-memory buffer is fine if that proves more efficient than repeated IO, but reading the entire file contents into memory before the upload even starts is not (unless the file is small). The current situation is even worse: ALL the inputs are read into memory first thing, even before we start querying GetMissingBlobs.
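
A minimal sketch of the lazy, chunked read (the send callback stands in for whatever writes a chunk to the ByteStream; only one chunk is held in memory at a time):

    // streamFile reads path in fixed-size chunks and hands each one to send,
    // instead of slurping the whole file up front. Sketch of the idea only;
    // the real uploader also has to interact with batching and retries.
    func streamFile(path string, chunkSize int, send func([]byte) error) error {
        f, err := os.Open(path)
        if err != nil {
            return err
        }
        defer f.Close()
        buf := make([]byte, chunkSize)
        for {
            n, rerr := f.Read(buf)
            if n > 0 {
                if serr := send(buf[:n]); serr != nil {
                    return serr
                }
            }
            if rerr == io.EOF {
                return nil
            }
            if rerr != nil {
                return rerr
            }
        }
    }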

Support paths including .. in inputs

Currently, we translate an input of "a/b/../foo" to a tree containing only a/foo. Ideally, the tree would also include the a/b directory, even though it is not "really" used, because the bot would expect it.
Note: this would require #42 plus empty directories support on the server side as well.

implement a unified uploader

Uploading from a single background thread would allow batching uploads from multiple actions, saving bandwidth and time.

Standalone tool for CAS Interaction

The rexec tool is a great one-stop shop for submitting and executing jobs synchronously.

We have a use case where we would like to get results directly from the ActionCache/CAS given only an ActionId (without the original input root, so we cannot submit the job again).

Would you be open to an additional command line tool dedicated to CAS interaction? Specifically, it would support upload/download of blobs and trees. Additionally, it would be able to interact with the ActionCache and check/download results.

We would be happy to contribute it!

Output paths should be relative to the working directory

The API specifies that action output paths should be relative to the working directory and not exec root. We have been doing it wrong since the beginning in RBE; now RBE supports both paths (to be backwards-compatible). We should modify the SDK to switch to the workdir relative paths wherever applicable.
It is possible that this will require no SDK changes per se, but rather changes to clients upstream of the SDK (such as the re-client).
This is an external tracking issue for this work as it concerns clients using the SDK.

flaky test of batch updates

TestBatchUpdateBlobsIndividualRequestRetries flaked out with:

--- FAIL: TestBatchUpdateBlobsIndividualRequestRetries (0.44s)
    batch_retries_test.go:189: client.BatchWriteBlobs(ctx, blobs) diff on request at index 0 (want -> got):
          (*remoteexecution.BatchUpdateBlobsRequest)(
        -       s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > `,
        +       s`instance_name:"instance" requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000d" size_bytes:1 > data:"\004" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000a" size_bytes:1 > data:"\001" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000b" size_bytes:1 > data:"\002" > requests:<digest:<hash:"000000000000000000000000000000000000000000000000000000000000000c" size_bytes:1 > data:"\003" > `,
          )
FAIL

Support `SymlinkNode` in `ComputeMerkleTree()`?

IIUC, the file tree currently produced by ComputeMerkleTree()

func ComputeMerkleTree(execRoot string, is *command.InputSpec, chunkSize int, cache filemetadata.Cache) (root digest.Digest, inputs []*chunker.Chunker, stats *Stats, err error) {

can only contain FileNode or DirectoryNode, but not SymlinkNode.

This could be very useful for Swarming's isolate support, since the .isolate file can contain symlinks. It can even contain invalid symlinks for testing purposes (@atetubou, correct me if I'm wrong). As an example, I wrote this small test case:

    {
        desc: "File invalid symlink included",
        input: []*inputPath{
            {path: "foo", isSymlink: true, symlinkTarget: "fooDir/foo"}, // invalid
        },
        spec: &command.InputSpec{
            Inputs: []string{"foo"},
        },
        rootDir: &repb.Directory{
            Symlinks: []*repb.SymlinkNode{{Name: "foo", Target: "fooDir/foo"}},
        },
        additionalBlobs: [][]byte{},
        // ignore the cache stats
    },

But I got an empty repb.Directory{} instead.

From what I can see, the symlink support needs to be added at several places:

  1. filemeta.Metadata needs a symlink flag:

        type Metadata struct {
            Digest       digest.Digest
            IsExecutable bool
            Err          error
        }

  2. The tree implementation needs to distinguish symlinks from actual files:

        type treeNode struct {
            Files map[string]*fileNode
            Dirs  map[string]*treeNode
        }

        type fileNode struct {
            Chunker              *chunker.Chunker
            IsExecutable         bool
            EmptyDirectoryMarker bool
        }

  3. VirtualInput might also need this flag:

        type VirtualInput struct {
            // The path for the input file to be staged at, relative to the ExecRoot.
            Path string
            // The byte contents of the file to be staged.
            Contents []byte
            // Whether the file should be staged as executable.
            IsExecutable bool
            // Whether the file is actually an empty directory. This is used to provide
            // empty directory inputs. When this is set, Contents and IsExecutable are
            // ignored.
            IsEmptyDirectory bool
        }

My biggest concern is around the filemeta cache. Right now it just treats invalid symlinks as an error:

    if s, err := isSymlink(filename); err == nil && s {
        fe.IsInvalidSymlink = true
    }

I wonder if we could instead add this invalidity info to Metadata, and let the upper-level module decide how to deal with it.

Happy to contribute this feature, but I'd like to hear your opinions first. Thanks!

Centralize and limit upload threads

Currently uploads in the SDK client work by first reading everything into memory, and then uploading individual byte slices (in parallel, and with batching). The way it needs to work is to:

  1. Not read the file into memory until/unless it actually needs to be uploaded. (#32)
  2. Use a central, limited thread pool for uploads, so that many concurrent actions cannot cause an OOM (this issue); a minimal sketch follows this list.
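
A minimal sketch of point 2 using golang.org/x/sync/semaphore (the limit of 64 and the writeBlob helper are made up for illustration):

    // uploadLimiter is a single, process-wide budget for upload concurrency,
    // shared by all actions, so parallel actions cannot multiply the number of
    // in-flight uploads and exhaust memory. Sketch only.
    var uploadLimiter = semaphore.NewWeighted(64)

    func uploadBlob(ctx context.Context, blob []byte) error {
        if err := uploadLimiter.Acquire(ctx, 1); err != nil {
            return err
        }
        defer uploadLimiter.Release(1)
        return writeBlob(ctx, blob) // hypothetical single-blob write
    }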

Remove deleted api `grpc.WithBalancerName`

We (grpc-go team) noticed that the grpc.WithBalancerName API is being used in your repo. This was deprecated about 4 years ago and was deleted recently in grpc/grpc-go#5232.

We would recommend that you replace grpc.WithBalancerName with the following two dial options (a combined sketch follows the list):

  • Specify the balancer name of your choice using a default service config:
    WithDefaultServiceConfig(fmt.Sprintf(`{"loadBalancingConfig": [{"%s":{}}]}`, balancer.Name))
  • Disable the service config specified by your resolver:
    grpc.WithDisableServiceConfig
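
Combined, a dial could look like the sketch below (insecure transport credentials are included only so the snippet is complete; name is the balancer's registered name, e.g. the constant exported by go/pkg/balancer):

    // dialWithBalancer replaces grpc.WithBalancerName with the two dial options
    // suggested above. Sketch only; real code would use proper credentials.
    func dialWithBalancer(address, name string) (*grpc.ClientConn, error) {
        return grpc.Dial(address,
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithDefaultServiceConfig(fmt.Sprintf(`{"loadBalancingConfig": [{"%s":{}}]}`, name)),
            grpc.WithDisableServiceConfig(),
        )
    }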

Thanks

implement resumable uploads

When we retry a failed Write call, we should respect the committed_bytes returned by the server and not redo everything from scratch, saving time and bandwidth.

support parsing RemoteExecutionLog

One of the few things https://github.com/bazelbuild/tools_remote can do that these tools can't is parse the RemoteExecutionLog. It would be nice to figure out a solution for that.

Annoyingly, that proto isn't part of github.com/bazelbuild/remote-apis, so this is a thornier puzzle than just implementing the (fairly simple) features.

Allow clients to provide InputDigests to avoid unnecessary transfer

Today the SDK's API only allows clients to specify Inputs (files available to the client) and VirtualInputs (blob contents supplied in memory by the client), as defined in command.go. I would like to extend this to support providing digests of blobs that are already available on the remote server.

I have a fork with this code that I am able to use for this, but I wanted to ask folks here whether it would make sense to add it to the SDK.

This would be useful when trying to do "build-without-bytes" for example.

Don't upload previously uploaded blobs

This should be configurable with a parameter, but the default behavior should be to avoid querying GetMissingBlobs for previously uploaded blobs, drastically reducing the time it takes. However, GetMissingBlobs doesn't take all that long in itself, so this is lower priority. It is, however, very easy to implement.
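
A sketch of the default behavior, assuming the SDK's digest.Digest type (which is comparable and therefore usable as a map key); a real version would need a size bound and the opt-out parameter mentioned above:

    // uploadedCache remembers digests this process has already uploaded so we
    // can skip both GetMissingBlobs and the re-upload for them. Sketch only.
    type uploadedCache struct {
        mu   sync.Mutex
        seen map[digest.Digest]bool
    }

    // filterNew returns only digests not seen before and marks them as seen.
    // A production version would unmark digests whose upload later fails.
    func (c *uploadedCache) filterNew(digests []digest.Digest) []digest.Digest {
        c.mu.Lock()
        defer c.mu.Unlock()
        var out []digest.Digest
        for _, d := range digests {
            if !c.seen[d] {
                c.seen[d] = true
                out = append(out, d)
            }
        }
        return out
    }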

"fatal error: concurrent map read and map write" from (*gcpBalancer).regeneratePicker

When I call ReadBlobStreamed from multiple goroutines, I get the following panic.

fatal error: concurrent map read and map write

goroutine 23 [running]:
runtime.throw(0xcdfe2b, 0x21)
        /usr/lib/google-golang/src/runtime/panic.go:1122 +0x72 fp=0xc0000d7cf0 sp=0xc0000d7cc0 pc=0x582092
runtime.mapaccess1(0xbdf7e0, 0xc000380480, 0xc0000d7db0, 0xc00f5ecbf8)
        /usr/lib/google-golang/src/runtime/map.go:411 +0x24f fp=0xc0000d7d30 sp=0xc0000d7cf0 pc=0x55b48f
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).regeneratePicker(0xc000392100)
        /usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:267 +0x147 fp=0xc0000d7e30 sp=0xc0000d7d30 pc=0xb24b07
github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer.(*gcpBalancer).UpdateSubConnState(0xc000392100, 0xd93520, 0xc0207db620, 0x2, 0x0, 0x0)
        /usr/local/google/home/tikuta/go/src/github.com/bazelbuild/remote-apis-sdks/go/pkg/balancer/gcp_balancer.go:308 +0x374 fp=0xc0000d7ea8 sp=0xc0000d7e30 pc=0xb25074
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc0000ee6c0)
        /usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:77 +0x357 fp=0xc0000d7fd8 sp=0xc0000d7ea8 pc=0xa66fb7
runtime.goexit()
        /usr/lib/google-golang/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000d7fe0 sp=0xc0000d7fd8 pc=0x5b7a01
created by google.golang.org/grpc.newCCBalancerWrapper
        /usr/local/google/home/tikuta/go/src/google.golang.org/grpc/balancer_conn_wrappers.go:60 +0x172

Does gb.scRefs need to be guarded by a mutex?

bazel run //:gazelle no longer works with Bazel 0.28.0

The error is:

ERROR: Source forest creation failed: /usr/local/google/home/olaola/.cache/bazel/_bazel_olaola/7e5a745c7956785bd79aca70eb9e660b/execroot/bazel_remote_apis_sdks/external (File exists)

and it is resistant to bazel clean --expunge. We may have to update our dependencies, or file a bug with Bazel.

Implement batch downloads

Batch uploads are already there; this feature will greatly enhance performance for workers/bots using this library to download action inputs. It matters a lot less for build clients, because actions usually do not produce lots of outputs.

Don't upload the same input concurrently

Similar to #43, but the implementation is not super-trivial: in order to do this, we need a map of digest->channel (in Go; Future in Java), so that a function that needs to write a bunch of blobs knows which futures it should wait on. This way, if WriteBlobs(b1, b2) is called followed by WriteBlobs(b1, b3), both calls will wait on the same future that writes the b1 blob.
It will, however, result in significant upload cost savings (I believe) on an uncached build for clients using a hermetic toolchain.
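
A sketch of the digest->channel idea in Go (error propagation to waiters is elided; digest.Digest is assumed comparable, as in the SDK):

    // dedupWriter makes sure each digest is written at most once even when many
    // callers ask for it concurrently: the first caller performs the write and
    // closes the channel, later callers just wait on it. Sketch only.
    type dedupWriter struct {
        mu       sync.Mutex
        inFlight map[digest.Digest]chan struct{}
    }

    func (w *dedupWriter) write(ctx context.Context, d digest.Digest, do func() error) error {
        w.mu.Lock()
        if ch, ok := w.inFlight[d]; ok {
            w.mu.Unlock()
            select {
            case <-ch: // another caller wrote (or is writing) this blob
                return nil
            case <-ctx.Done():
                return ctx.Err()
            }
        }
        ch := make(chan struct{})
        w.inFlight[d] = ch
        w.mu.Unlock()
        defer close(ch) // release all waiters once the write finishes
        return do()
    }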

Please consider a logging library other than glog

Unfortunately glog is pretty unfriendly to use in a library: when we call functions in this package, we get messages printed to the console (often errors like "Logging before flag.Parse"). There doesn't seem to be any sensible way to control it (e.g. set its verbosity or configure where it outputs to), and its output tramples over things we're trying to print at the same time.

A friendlier alternative might be google.golang.org/grpc/grpclog: it allows registering a custom logger, which lets users integrate it gracefully with their chosen logging framework. Since the remote API is heavily based on gRPC, this wouldn't introduce any extra dependency, although I don't know whether gRPC intends grpclog to be used by third parties.
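
For example, a client application could silence gRPC's Info output while keeping warnings and errors, or plug in its own logging framework via the same hook (sketch using the existing grpclog API):

    // Route gRPC's internal logging: discard Info, keep warnings and errors on
    // stderr. Applications could instead pass an adapter for their own logger.
    grpclog.SetLoggerV2(grpclog.NewLoggerV2(io.Discard, os.Stderr, os.Stderr))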

I'm happy to make the code changes if we agree on what the solution should be.

go/pkg/balancer uses experimental gRPC APIs

These APIs are labeled as experimental, and they will be changed in the near-ish future. Details are in grpc/grpc-go#6472. Note that any package that uses gRPC's experimental APIs should itself be considered experimental, since a change to them would break compilation of the entire package that uses them. So we do NOT recommend using these features in any library or application that isn't tolerant of breakages like this.

https://github.com/googleapis/google-cloud-go may have another solution to the same type of problem, which is that it creates multiple channels and load balances between them. See https://github.com/googleapis/google-api-go-client/blob/caea95689f82049822552cd649765335123831e0/transport/grpc/pool.go#L30 for details.
