roc-lang / rbt
Roc Build Tool
License: Universal Permissive License v1.0
Right now we have to do things like:

Rbt = [Rbt { default: Job }]

but it would be better to make that an opaque, non-union type:

Rbt := { default : Job }

The problem is that glue cannot see through module boundaries right now, so making that change in Package-Config.roc would mean that the API in Rbt.roc could not construct values! Once glue can see through module boundaries, this should be trivial to fix (and should simplify the generated glue code a lot!)
As in ADR 008 (docs/adrs/008-unified-inputs.md) we need some data structure like this to avoid conflicts between jobs and source files:

FileMapping := { sourceFile : Str, workspacePath : Str }

sourceFile : Str -> FileMapping
sourceFile = \path -> @FileMapping { sourceFile: path, workspacePath: path }

withWorkspacePath : FileMapping, Str -> FileMapping
withWorkspacePath = \@FileMapping { sourceFile }, workspacePath ->
    @FileMapping { sourceFile, workspacePath }

Right now we're just using a Str for input paths; they should be replaced by this data structure.
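For orientation, the host side of that data structure might look roughly like this in Rust. The names, derives, and builder-style methods are assumptions for the sketch, not the actual glue-generated types:

```rust
// Hypothetical Rust mirror of the Roc `FileMapping` opaque type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileMapping {
    source_file: String,
    workspace_path: String,
}

impl FileMapping {
    // Mirrors Roc's `sourceFile`: both paths start out the same.
    fn source_file(path: &str) -> Self {
        FileMapping {
            source_file: path.to_string(),
            workspace_path: path.to_string(),
        }
    }

    // Mirrors Roc's `withWorkspacePath`.
    fn with_workspace_path(self, workspace_path: &str) -> Self {
        FileMapping {
            workspace_path: workspace_path.to_string(),
            ..self
        }
    }
}

fn main() {
    let mapping = FileMapping::source_file("src/main.roc").with_workspace_path("main.roc");
    assert_eq!(mapping.source_file, "src/main.roc");
    assert_eq!(mapping.workspace_path, "main.roc");
}
```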
Right now, rbt creates directories and moves output files around before checking whether it actually needs to. See these comment threads for a way to improve the situation:

The temporary directory should also probably live somewhere more inspectable in case of failure. (#33 (comment))
Once we have a bunch of jobs running in parallel (#73) we should make a nicer CLI output. Something like superconsole should come in quite handy!
We want to be able to download files. We probably need to do this fairly frequently, and should be able to do it without invoking some system tool like curl. At minimum, we probably need to be able to:
The task here is not to actually implement this, but to create an ADR detailing what kind of API we want, exactly.
When implementing #7, @celsobonutti found that it's pretty inconvenient to capture stdout as a build result: you need to spawn a shell and do input redirection. This is a common-enough pattern, in my mind, that we should allow for it in the rbt API. So, this issue needs two things:
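For reference, capturing stdout on the Rust side doesn't need a shell at all; std::process::Command buffers it for us. A sketch of what the runner could do (capture_stdout is a hypothetical helper, not rbt's actual API):

```rust
use std::process::Command;

// Run a command and capture its stdout as a build output, with no shell
// or input redirection involved. Error handling is simplified.
fn capture_stdout(program: &str, args: &[&str]) -> std::io::Result<Vec<u8>> {
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("{} exited with {}", program, output.status),
        ));
    }
    Ok(output.stdout)
}

fn main() {
    let out = capture_stdout("echo", &["hello"]).expect("echo should succeed");
    assert_eq!(out, b"hello\n");
}
```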
In order to avoid pointers, we currently copy a little too much memory. @bhansconnect wrote a helper to avoid doing some of that, which we should use instead: roc-lang/roc#4361. That way we could have owned lists and dicts, and in so doing avoid some copies!
In order to walk the build graph in parallel, the data store has to be accessible across threads. We've been thinking about using sled for a while (in fact, it's still in the dependencies from the initial spike!)
We have an ADR for this (docs/adrs/008-unified-inputs.md) but we're going to need to make some modifications to work around some problems in how Roc interprets the memory ("alias references to mutually recursive types" in the Roc Zulip).

We will probably need to have Job be defined something like this for the first pass, and work towards the design from ADR 008 in future PRs:

Job := [Job {
    command : Command,
    inputs : List [FromProjectSource (List FileMapping), FromJob Job (List FileMapping)],
    outputs : List Str,
}]
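To make the memory-layout trouble concrete, here is a rough Rust analogue of that type (all names are illustrative, not the real generated glue). The recursion through FromJob is exactly the part that needs indirection:

```rust
// Rough Rust analogue of the Roc `Job` type above. The recursive
// `FromJob` variant needs a Box in Rust, which hints at why mutually
// recursive type aliases are awkward for glue to interpret.
#[derive(Debug)]
struct FileMapping {
    source_file: String,
    workspace_path: String,
}

#[derive(Debug)]
enum Input {
    FromProjectSource(Vec<FileMapping>),
    FromJob(Box<Job>, Vec<FileMapping>),
}

#[derive(Debug)]
struct Command {
    tool: String,
    args: Vec<String>,
}

#[derive(Debug)]
struct Job {
    command: Command,
    inputs: Vec<Input>,
    outputs: Vec<String>,
}

fn main() {
    let compile = Job {
        command: Command { tool: "gcc".into(), args: vec!["-c".into(), "main.c".into()] },
        inputs: vec![Input::FromProjectSource(vec![FileMapping {
            source_file: "main.c".into(),
            workspace_path: "main.c".into(),
        }])],
        outputs: vec!["main.o".into()],
    };
    // A downstream job can depend on the compile job's outputs.
    let link = Job {
        command: Command { tool: "gcc".into(), args: vec!["main.o".into()] },
        inputs: vec![Input::FromJob(Box::new(compile), vec![])],
        outputs: vec!["a.out".into()],
    };
    assert!(matches!(&link.inputs[0], Input::FromJob(_, _)));
}
```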
We're currently using the default Rust hasher, as @bhansconnect points out in https://github.com/roc-lang/rbt/pull/33/files#r954403324. We should use something else; xxhash (crate) looks like a pretty reasonable/fast option.
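Whatever hasher we settle on, keeping the key computation generic over BuildHasher would make the swap cheap; xxhash could then drop in via its BuildHasher implementation. A sketch using only std types:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Compute a job key with any pluggable hasher. Swapping in xxhash (or
// seahash) later just means passing a different `BuildHasher`.
// Note: `RandomState` is randomized per process, so it's shown only to
// demonstrate the shape; a cache key needs a *stable* hasher.
fn job_key<T: Hash, S: BuildHasher>(job: &T, build: &S) -> u64 {
    let mut hasher = build.build_hasher();
    job.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let state = RandomState::new();
    // Equal inputs always produce equal keys under the same BuildHasher.
    assert_eq!(job_key(&"echo hello", &state), job_key(&"echo hello", &state));
}
```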
We currently inherit environment variables from the parent process. We need to stop doing that!

This should be a pretty reasonable first issue for someone, except for the fact that a lot of tools will get annoyed at you if you don't set some environment variables. The most common one I've seen is HOME (which we're setting in #62), but I tested a few things out and it seems like it might be necessary to set LOCALE to some fake value as well.
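A minimal sketch of what the fix could look like on the Rust side, using std's env_clear (the fake HOME path is a made-up placeholder):

```rust
use std::process::Command;

fn main() {
    // Spawn a job with an emptied environment, then add back only what
    // the job is allowed to see. `/tmp/rbt-fake-home` stands in for
    // whatever fake HOME we end up creating.
    let output = Command::new("/usr/bin/env")
        .env_clear()
        .env("HOME", "/tmp/rbt-fake-home")
        .output()
        .expect("failed to run env");

    let vars = String::from_utf8_lossy(&output.stdout);
    // Only the variable we set explicitly is present; nothing inherited.
    assert!(vars.contains("HOME=/tmp/rbt-fake-home"));
    assert!(!vars.contains("PATH="));
}
```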
https://github.com/tokio-rs/console should be a useful tool for us to figure out if rbt is performing well after merging #82. Let's set it up and see if it can give us insight into how well the system is working!
Right now, stdout and stderr from jobs get dropped. That's not really tenable! We probably want to store it so we can more easily debug builds.
This might depend on #72 if we decide we want to store it in the database (or if we want to store it in the CAS).
If someone other than @BrianHicks picks this up, the best bet is probably to have a chat with him on the Roc Zulip about design considerations here!
Jobs currently just have their IDs and the first 20 characters (or so) of their commands. We should do two things to improve this:

1. Allow job to take an optional name field.
2. If we get that field, present jobs with it everywhere we can.

Those items can be one PR or two; they're at least partially separate!
With #69, we have an actual build graph. Next step: walk it in parallel!
Dependencies:
We probably don't want to do this in an async context, by the way. The async_std implementation of process is unstable, and we're going to want to avoid spawning all the build tasks at once!
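Here's one possible shape for the parallel walk using plain threads and a channel, with a parallelism limit so we don't spawn everything at once. The walk function and job names are illustrative, not rbt's real code:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Walk a dependency graph with bounded parallelism: at most `limit`
// jobs run at once, and a job only starts after its dependencies have
// finished. Assumes the graph is acyclic, every job appears as a key
// in `deps`, and `limit >= 1`. Returns jobs in completion order.
fn walk(deps: &HashMap<&'static str, Vec<&'static str>>, limit: usize) -> Vec<&'static str> {
    // dependents: reverse edges; indegree: unfinished dependency count.
    let mut dependents: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    let mut indegree: HashMap<&'static str, usize> = HashMap::new();
    for (&job, ds) in deps {
        indegree.insert(job, ds.len());
        for &d in ds {
            dependents.entry(d).or_default().push(job);
        }
    }

    let (done_tx, done_rx) = mpsc::channel();
    let mut ready: Vec<&'static str> = indegree
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&j, _)| j)
        .collect();
    let mut running = 0;
    let mut finished = Vec::new();

    while finished.len() < deps.len() {
        // Start ready jobs up to the parallelism limit.
        while running < limit {
            match ready.pop() {
                Some(job) => {
                    let tx = done_tx.clone();
                    running += 1;
                    thread::spawn(move || {
                        // The real work (running the command) goes here.
                        tx.send(job).unwrap();
                    });
                }
                None => break,
            }
        }
        // Wait for one job to finish, then release its dependents.
        let job = done_rx.recv().unwrap();
        running -= 1;
        finished.push(job);
        if let Some(nexts) = dependents.get(job) {
            for &next in nexts {
                let n = indegree.get_mut(next).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push(next);
                }
            }
        }
    }
    finished
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("link", vec!["compile-a", "compile-b"]);
    deps.insert("compile-a", vec![]);
    deps.insert("compile-b", vec![]);

    let order = walk(&deps, 2);
    // Both compiles finish before "link" is allowed to start.
    assert_eq!(order.len(), 3);
    assert_eq!(order.last(), Some(&"link"));
}
```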
I think that we shouldn't check .envrc into the repo. Instead we should let the user specify it if they want it. The reason for this is so that users aren't stuck with the default .envrc. For example, I use lorri instead of direnv directly, which means that I want .envrc to be eval "$(lorri direnv)" instead of use nix.
Currently:

- Workspace is responsible for setting up files for a job
- Job is responsible for creating the command
- Runner is responsible for bringing the workspace and job together
- Coordinator is responsible for executing jobs in the right order

It might make sense to refactor these in terms of responsibilities:

- Runner is responsible for taking a job (not a job and workspace, just a job), running it, and saying where we can get the output
- Workspace is responsible for isolating the file system as much as possible
  - HOME (see #62)
- Job is responsible for command isolation
  - PATH (see #61, or should Workspace maybe handle this?)
  - is there a wrapper around std::process::Command that can do this wrapping for us?
- Coordinator is responsible for the same things as before

This'll be a biggish refactor, but should mean we have better-defined responsibilities for all the components in the system.
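One way to make the proposed boundaries concrete is to sketch them as traits. All of these names and signatures are hypothetical, purely to illustrate the split:

```rust
use std::path::PathBuf;

// Hypothetical shapes for the refactor described above; none of these
// are rbt's real types.

struct Job; // placeholder for the real Job type

// Runner: takes *just* a job, runs it, says where the output landed.
trait Runner {
    fn run(&self, job: &Job) -> std::io::Result<PathBuf>;
}

// Workspace: isolates the file system (fake HOME, scratch dirs, ...).
trait Workspace {
    fn isolated_root(&self) -> PathBuf;
}

// Coordinator: executes jobs in the right order, same as today.
trait Coordinator {
    fn run_all(&mut self) -> std::io::Result<()>;
}

// A no-op runner just to show the trait compiles and can be called.
struct DryRunner;

impl Runner for DryRunner {
    fn run(&self, _job: &Job) -> std::io::Result<PathBuf> {
        Ok(PathBuf::from("/dev/null"))
    }
}

fn main() {
    let runner = DryRunner;
    let out = runner.run(&Job).unwrap();
    assert_eq!(out, PathBuf::from("/dev/null"));
}
```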
Lots of languages I've used (Ruby, Elm, Python, Rust) have a single manifest file (sometimes with a lock file) to define all the dependencies and versions needed for a project. In a lot of cases, we're just going to want to leave this alone and call (e.g.) pip install with the manifest and lock files as targets. But in certain cases, we may be able to take over some of the downloading and installing. (For example, look at all the x2nix packages out there!)
So, in the case where we can download a bunch of files, we probably want to cache those—how? Does the cache need to be mutable? In what cases? Can we avoid having a shared mutable state escape hatch altogether while still supporting these use cases?
When rbt gets a Job from Roc, it uses the information to create a key, which is then used as the identifier for the job in the build graph. However unlikely it is, it's possible that we could get a collision there.
To guard against this, rbt could keep a mapping of key to job while constructing the build graph. Every time we got a new job, we'd insert into this mapping. If we already had an item in there, we'd verify that the new value was exactly equal to the existing one. If it wasn't, we'd raise an error and ask the caller to set some field designed to avoid hash collisions.
Thanks, @bhansconnect, for the idea of how to mitigate this!
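The mitigation could look something like this (Key and Job are simplified stand-ins for rbt's real types):

```rust
use std::collections::HashMap;

// Stand-ins for rbt's real key/job types, just for the sketch.
type Key = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
struct Job {
    command: String,
}

// Insert a job under its key, erroring if a *different* job already
// claimed that key (an actual collision, as opposed to a duplicate).
fn insert_job(seen: &mut HashMap<Key, Job>, key: Key, job: Job) -> Result<(), String> {
    match seen.get(&key) {
        None => {
            seen.insert(key, job);
            Ok(())
        }
        Some(existing) if *existing == job => Ok(()), // same job, fine
        Some(existing) => Err(format!(
            "key collision: {:?} vs {:?}; set a collision-avoidance field on one of them",
            existing, job
        )),
    }
}

fn main() {
    let mut seen = HashMap::new();
    let a = Job { command: "echo a".into() };
    assert!(insert_job(&mut seen, 1, a.clone()).is_ok());
    // Re-inserting the identical job is fine...
    assert!(insert_job(&mut seen, 1, a).is_ok());
    // ...but a different job under the same key is a collision.
    let b = Job { command: "echo b".into() };
    assert!(insert_job(&mut seen, 1, b).is_err());
}
```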
The way to go seems to be to set up the CODEOWNERS file with at least two trusted contributors as code owners per folder. That way every PR will be reviewed by a (different) trusted contributor.
We know that it'll be way too annoying to manually specify every single one of our files, so we'd like something a little better!
I'm assigning @zwilias on this since we've already paired on it and he has more context on the ADR that will be written up to close this issue.
We currently fudge a little with systemTool: we assume the tool is in PATH, but we don't check at all and also don't prevent it from running anything else by name in PATH.
Implementation idea: look up the binary in PATH, then symlink the binary location to some discrete bin directory that the job has access to. Takes a little more work but isolates more and lets us give better error messages for missing binaries.
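The PATH-lookup half of that idea might look like this in Rust (the symlinking half, e.g. via std::os::unix::fs::symlink into a per-job bin directory, is left out of the sketch):

```rust
use std::env;
use std::path::PathBuf;

// Find a binary by searching PATH, the first half of the idea above.
// Note: this checks for an existing file but not the executable bit;
// a real implementation would check permissions too.
fn which(name: &str) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    env::split_paths(&path)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    // A missing tool gives us the chance to print a good error message
    // instead of a confusing exec failure at job runtime.
    match which("sh") {
        Some(path) => println!("found sh at {}", path.display()),
        None => eprintln!("job needs `sh`, but it isn't anywhere in PATH"),
    }
}
```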
I've had this as a personal TODO forever but realized it needs to be public! ADR 4 (https://github.com/rtfeldman/rbt/blob/trunk/docs/adrs/004-symlinking.md) is not implemented and it totally could be picked up by someone else if they'd like!
Just a note for later, as I've just learned about seahash: https://lib.rs/crates/seahash
We don't currently empty out the environment, which means that paths like HOME are totally available for caching, config files, etc. Not isolated even a little!

What we want: create a fake HOME, then look through it after the build completes. If the build leaves anything in it, issue a warning. Eventually this will be an error, but for now we don't have any way to work with mutable caches, so we should be a little gentler.
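A sketch of that post-build check (the paths and helper name are made up for illustration):

```rust
use std::fs;
use std::path::Path;

// After the build finishes, look inside the fake HOME. Anything left
// behind means the job wrote to HOME, which we warn about for now (and
// would eventually reject).
fn leftover_files(fake_home: &Path) -> std::io::Result<Vec<String>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(fake_home)? {
        found.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(found)
}

fn main() -> std::io::Result<()> {
    let fake_home = std::env::temp_dir().join("rbt-fake-home-demo");
    fs::create_dir_all(&fake_home)?;

    // Pretend a job dropped a cache file in HOME during the build.
    fs::write(fake_home.join(".some-tool-cache"), b"oops")?;

    let leftovers = leftover_files(&fake_home)?;
    if !leftovers.is_empty() {
        eprintln!("warning: build left files in HOME: {:?}", leftovers);
    }
    assert_eq!(leftovers, vec![".some-tool-cache"]);

    fs::remove_dir_all(&fake_home)?;
    Ok(())
}
```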