
rbt's People

Contributors

brianhicks, celsobonutti, rtfeldman, zwilias


rbt's Issues

make Roc API stuff opaque

Right now we have to do things like:

Rbt = [Rbt { default: Job }]

but it would be better to make that opaque and a non-union type:

Rbt := { default : Job }

The problem is, glue cannot see through module boundaries right now, so making that change in Package-Config.roc would mean that the API in Rbt.roc could not construct values!

Once glue can see through module boundaries, this should be trivial to fix (and should simplify the generated glue code a lot!)

implement FileMapping

As in ADR 008 (docs/adrs/008-unified-inputs.md) we need some data structure like this to avoid conflicts between jobs and source files:

FileMapping := { sourceFile : Str, workspacePath : Str }

sourceFile : Str -> FileMapping
sourceFile = \path -> @FileMapping { sourceFile : path, workspacePath : path }

withWorkspacePath : FileMapping, Str -> FileMapping
withWorkspacePath = \@FileMapping { sourceFile }, workspacePath ->
    @FileMapping { sourceFile, workspacePath }

Right now we're just using a Str for input paths—they should be replaced by this data structure.

think through how to handle downloads

We want to be able to download files. We probably need to do this fairly frequently, and should be able to do this without invoking some system tool like curl. At minimum, we probably need to be able to:

  • download files from HTTP(S)
  • verify that the downloaded files match some hash
  • store the output in the content-addressable cache

The task here is not to actually implement this, but to create an ADR detailing what kind of API we want, exactly.
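To make the middle two bullets concrete, here is a minimal sketch of the verify-and-store step, assuming the download has already produced a byte buffer. `DefaultHasher` is only a dependency-free stand-in for a real cryptographic hash (something like SHA-256 via the `sha2` crate), and the cache itself is elided:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a content-addressed key for the downloaded bytes. A real
// implementation would use a cryptographic hash; DefaultHasher is a
// placeholder to keep the sketch dependency-free.
fn content_key(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

// Verify the bytes against the expected digest before admitting them to
// the content-addressable cache.
fn verify_and_key(bytes: &[u8], expected: u64) -> Result<u64, String> {
    let key = content_key(bytes);
    if key == expected {
        Ok(key) // safe to store under `key` in the CAS
    } else {
        Err(format!("hash mismatch: expected {expected}, got {key}"))
    }
}
```

The exact API shape (and whether verification belongs in the Roc API or the host) is exactly what the ADR should settle.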

it should be possible to capture stdout or stderr to a file

When implementing #7, @celsobonutti found that it's pretty inconvenient to capture stdout as a build result: you need to spawn a shell and do output redirection. This is a common-enough pattern, in my mind, that we should allow for it in the rbt API. So, this issue needs two things:

  • a new ADR detailing what the new API should be for capturing stdout to a file
  • after getting input on the ADR, implementing it

move to a data store that can be used across threads

In order to walk the build graph in parallel, the data store has to be accessible across threads. We've been thinking about using sled for a while (in fact, it's still in the dependencies from the initial spike!).
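To make the requirement concrete, here is a sketch of the interface the parallel graph walkers would need. The `Store` type here is a placeholder built on `Arc<RwLock<HashMap>>`; sled's `Db` (which is `Send + Sync + Clone`) would slot into the same spot:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Placeholder for a thread-safe data store. Cloning shares the same
// underlying map, the way cloning a sled::Db shares the same database.
#[derive(Clone, Default)]
struct Store(Arc<RwLock<HashMap<String, Vec<u8>>>>);

impl Store {
    fn insert(&self, key: &str, value: &[u8]) {
        self.0.write().unwrap().insert(key.to_string(), value.to_vec());
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.read().unwrap().get(key).cloned()
    }
}
```

Any handle can be cloned and moved into a worker thread, which is the property the coordinator needs.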

jobs should be able to depend on other jobs

We have an ADR for this (docs/adrs/008-unified-inputs.md) but we're going to need to make some modifications to work around some problems in how Roc interprets the memory. ("alias references to mutually recursive types" in the Roc Zulip).

We will probably need to have Job be defined something like this for the first pass, and work towards the design from ADR 008 in future PRs.

Job := [Job { command : Command, inputs : List [FromProjectSource (List FileMapping), FromJob Job (List FileMapping)], outputs : List Str }]

empty out environment

We currently inherit environment variables from the parent process. We need to stop doing that!

This should be a pretty reasonable first issue for someone, except for the fact that a lot of tools will get annoyed at you if you don't set some environment variables. The most common one I've seen is HOME (which we're setting in #62) but I tested a few things out and it seems like it might be necessary to set LOCALE to some fake value as well.
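A hedged sketch of what this might look like on the Rust side, using `std::process::Command::env_clear`. The `fake_home` argument and the `LANG=C` placeholder are assumptions, not settled decisions (which locale variable to set is exactly the open question above):

```rust
use std::process::Command;

// Build a command that starts from an empty environment instead of
// inheriting one, then sets the few variables tools tend to require.
fn isolated_command(program: &str, fake_home: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env_clear() // drop everything inherited from the parent process
        .env("HOME", fake_home) // see #62
        .env("LANG", "C"); // placeholder locale value; still undecided
    cmd
}
```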

do something more reasonable with jobs' log output

Right now, stdout and stderr from jobs get dropped. That's not really tenable! We probably want to store it so we can more easily debug builds.

This might depend on #72 if we decide we want to store it in the database (or if we want to store it in the CAS.)

If someone other than @BrianHicks picks this up, the best bet is probably to have a chat with him on the Roc Zulip about design considerations here!
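As a rough illustration of the direction (not a committed design), a job's streams could be redirected into log files that a later change moves into the database or the CAS; the log paths here are hypothetical:

```rust
use std::fs::File;
use std::process::{Command, ExitStatus, Stdio};

// Run a command with stdout and stderr captured to files instead of
// being dropped. Where those files ultimately live (database? CAS?) is
// the open design question.
fn run_with_logs(
    mut cmd: Command,
    stdout_log: &str,
    stderr_log: &str,
) -> std::io::Result<ExitStatus> {
    cmd.stdout(Stdio::from(File::create(stdout_log)?))
        .stderr(Stdio::from(File::create(stderr_log)?))
        .status()
}
```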

add optional names to jobs

Jobs currently just have their IDs and the first 20 characters (or so) of their commands. We should do two things to improve this:

  1. allow jobs to take an optional name field. If we get that, we should present jobs with that name everywhere we can.
  2. finalize what we want to do in the case where we don't have a name (the current solution is a little hacky.)

Those items can be one PR or two—they're at least partially separate!

Remove `.envrc` from the repo

I don't think we should check .envrc into the repo. Instead, we should let users specify their own if they want one, so that nobody is stuck with the default .envrc. For example, I use lorri instead of direnv directly, which means I want my .envrc to be eval "$(lorri direnv)" instead of use nix.

chore: refactor responsibilities

Currently:

  • Workspace is responsible for setting up files for a job
  • Job is responsible for creating the command
  • Runner is responsible for bringing the workspace and job together
  • Coordinator is responsible for executing jobs in the right order

It might make sense to refactor these in terms of responsibilities:

  • Runner is responsible for taking a job (not a job and workspace, just a job), running it, and saying where we can get the output
    • for local builds, this probably just looks like managing the workspace
    • for remote builds, this looks like shipping the job definition and any necessary files to a remote executor, then downloading the result again (to be clear, remote builds are not part of this story!)
  • Workspace is responsible for isolating the file system as much as possible
    • it symlinks files into the working directory
    • it creates a fake HOME (see #62)
    • it symlinks any other jobs in
  • Job is responsible for command isolation
    • it isolates the PATH (see #61, or should Workspace maybe handle this?)
    • or maybe we should have some other kind of command that wraps std::process::Command that can do this wrapping for us?
  • Coordinator is responsible for the same things as before

This'll be a biggish refactor, but should mean we have better-defined responsibilities for all the components in the system.
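A sketch of how the Runner responsibility above might look as a trait; every name here is hypothetical, and the actual workspace management is elided:

```rust
use std::path::PathBuf;

// Stand-in for rbt's real job type.
struct Job {
    command: String,
}

// A Runner consumes only a job and reports where the output landed.
// Keeping the workspace out of the signature leaves room for a future
// remote implementation behind the same trait.
trait Runner {
    fn run(&self, job: &Job) -> Result<PathBuf, String>;
}

// For local builds, running a job mostly means managing a workspace.
struct LocalRunner {
    workspace_root: PathBuf,
}

impl Runner for LocalRunner {
    fn run(&self, job: &Job) -> Result<PathBuf, String> {
        // (workspace setup, symlinking, and actual execution elided)
        Ok(self.workspace_root.join(format!("output-for-{}", job.command)))
    }
}
```

The Coordinator would then only ever talk to a `dyn Runner`, whether the job runs locally or (someday) remotely.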

think through how to download package dependencies

Lots of languages I've used (Ruby, Elm, Python, Rust) have a single manifest file (sometimes with a lock file) to define all the dependencies and versions needed for a project. In a lot of cases, we're just going to want to leave this alone and call (e.g.) pip install with the manifest and lock files as targets. But in certain cases, we may be able to take over some of the downloading and installing. (For example, look at all the x2nix packages out there!)

So, in the case where we can download a bunch of files, we probably want to cache those—how? Does the cache need to be mutable? In what cases? Can we avoid having a shared mutable state escape hatch altogether while still supporting these use cases?

validate that we aren't getting hash collisions

When rbt gets a Job from Roc, it uses the information to create a key, which is then used as the identifier for the job in the build graph. However unlikely it is, it's possible that we could get a collision there.

To guard against this, rbt could keep a mapping of key to job while constructing the build graph. Every time we got a new job, we'd insert into this mapping. If we already had an item in there, we'd verify that the new value was exactly equal to the existing one. If it wasn't, we'd raise an error and ask the caller to set some field designed to avoid hash collisions.

Thanks, @bhansconnect, for the idea of how to mitigate this!
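The mitigation described above could be sketched like this; `Job` is a stand-in for rbt's real job type, and `u64` stands in for whatever key type rbt actually uses:

```rust
use std::collections::HashMap;

// Stand-in for rbt's real job type.
#[derive(Debug, PartialEq)]
struct Job {
    command: String,
}

// Insert a (key, job) pair while building the graph. Re-inserting an
// identical job under the same key is fine; two *different* jobs under
// one key is a collision, and the caller is asked to disambiguate.
fn insert_checked(
    seen: &mut HashMap<u64, Job>,
    key: u64,
    job: Job,
) -> Result<(), String> {
    match seen.get(&key) {
        Some(existing) if *existing != job => Err(format!(
            "hash collision on key {key}; set a disambiguating field on one of the jobs"
        )),
        _ => {
            seen.insert(key, job);
            Ok(())
        }
    }
}
```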

Require review from other code owner

The way to go seems to be to:

  • set up CODEOWNERS
  • require one review for the trunk branch
  • require reviews from CODEOWNERS

The CODEOWNERS file should be set up to have at least two trusted contributors as code owners per folder.
That way every PR will be reviewed by a (different) trusted contributor.

  • Verify settings by trying to merge PR with review from someone who is not a code owner.
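A hypothetical CODEOWNERS sketch, following the "two trusted contributors per folder" rule above; the paths and owner pairings are illustrative only, not a proposal for who owns what:

```
# Illustrative only: at least two trusted contributors per folder.
/src/        @brianhicks @zwilias
/docs/adrs/  @brianhicks @rtfeldman
```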

think through how to handle file patterns

We know that it'll be way too annoying to manually specify every single one of our files, so we'd like something a little better!

I'm assigning @zwilias on this since we've already paired on it and he has more context on the ADR that will be written up to close this issue.

isolate binaries to only those specified in the job

We currently fudge a little with systemTool: we assume the tool is on PATH, but we don't actually check that, and we also don't prevent the job from running anything else on PATH by name.

Implementation idea: look up the binary in PATH, then symlink the binary location to some discrete bin directory that the job has access to. Takes a little more work but isolates more and lets us give better error messages for missing binaries.
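The lookup half of that idea might look like this; the symlinking into a per-job bin directory is elided, and a real version would also check the execute bit rather than just `is_file`:

```rust
use std::path::PathBuf;

// Resolve a tool name against a PATH-style string the way the shell
// would, so rbt can fail early with a good error for a missing binary
// (and later symlink the resolved path into a per-job bin directory).
fn find_in_path(tool: &str, path_var: &str) -> Option<PathBuf> {
    path_var
        .split(':')
        .map(|dir| PathBuf::from(dir).join(tool))
        .find(|candidate| candidate.is_file())
}
```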

isolate HOME

We don't currently empty out the environment, which means that paths like HOME are totally available for caching, config files, etc. Not isolated even a little!

What we want: create a fake HOME, then look through it after the build completes. If the build leaves anything in it, issue a warning. Eventually this will be an error, but for now we don't have any way to work with mutable caches so we should be a little gentler.
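The post-build check could be sketched as follows; creating the fake HOME in the first place and wiring the warning into rbt's output are both elided:

```rust
use std::fs;
use std::path::Path;

// After the job finishes, list anything left behind in the fake HOME.
// A non-empty list would produce a warning (eventually an error, once
// mutable caches have a real home).
fn leftover_files(fake_home: &Path) -> std::io::Result<Vec<String>> {
    let mut leftovers = Vec::new();
    for entry in fs::read_dir(fake_home)? {
        leftovers.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(leftovers)
}
```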
