crev-dev / cargo-crev
A cryptographically verifiable code review system for the cargo (Rust) package manager.
License: Apache License 2.0
People are going to get their machines compromised, and CrevIDs stolen.
My plan was that people should just create a self-Trust Proof with `distrust` set to non-`None` and publish that. Any client that finds a Trust Proof like that should immediately distrust the whole CrevId. It might even include such a Proof in its own trust db to publish it for others to see, but only for CrevIDs that it considered trusted before, to prevent spamming.
The rest of the problem should be covered by the fact that the default number of reviews required to consider code trusted should be at least 2. This way a single compromised or malicious individual cannot compromise anything. For this to work, the graph/trust algorithm will have to get smarter too and consider only non-overlapping paths, so that people can't create a new CrevId, trust it, and have it count as another reviewer.
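A minimal sketch of the self-distrust idea described above, just to make it concrete. The `Level` enum, the `TrustProof` struct and both function names are hypothetical simplifications, not the actual crev types, and signature verification is left out entirely.

```rust
use std::collections::HashSet;

/// Hypothetical levels; the real set of values is still under discussion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Level {
    None,
    Low,
    Medium,
    High,
}

/// Hypothetical, heavily simplified Trust Proof: who signed it, whom it is
/// about, and the `distrust` value it carries.
#[derive(Clone, Debug)]
struct TrustProof {
    from: String,
    to: String,
    distrust: Level,
}

/// True for a self-Trust Proof with `distrust` set to non-`None`.
fn is_self_distrust(p: &TrustProof) -> bool {
    p.from == p.to && p.distrust != Level::None
}

/// Every CrevId that published a self-distrust proof and should therefore
/// be distrusted as a whole.
fn revoked_ids(proofs: &[TrustProof]) -> HashSet<String> {
    proofs
        .iter()
        .filter(|p| is_self_distrust(p))
        .map(|p| p.from.clone())
        .collect()
}

/// Self-distrust proofs worth copying into our own trust db: only those for
/// IDs we considered trusted before, so strangers cannot use this to spam.
fn proofs_to_republish<'a>(
    proofs: &'a [TrustProof],
    previously_trusted: &HashSet<String>,
) -> Vec<&'a TrustProof> {
    proofs
        .iter()
        .filter(|p| is_self_distrust(p) && previously_trusted.contains(&p.from))
        .collect()
}
```

A real client would only accept such a proof after checking that it is signed by the very ID it revokes.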
I've improved formatting, but at least some color would help.
At least one user expressed that two are needed: https://www.reddit.com/r/rust/comments/99aiea/idea_for_a_scalable_code_reviewtrust_system_not/e4pc8rh/
So there could be eg.:
trust: some
transitive-trust: none
meaning "I trust code reviews of this person, but I don't necessarily trust people that this person trusts".
If we're going with two fields, things become more complicated. How exactly should transitive trust work (especially given that it is supposed to be configurable)?
This should read the WoT and, from the perspective of the user's ID, display the trust status of all files in the project.
The ID should be configurable, so it should be possible to do this for other IDs as well.
Right now Review Proof and Trust Proof come with a `trust` field (among others). I was planning that every field would have 4 possible values:
The question is: should `none` be a "negative review" and mean "I know this crate should not be used", or should there be an additional field `distrust`, and `trust: none` mean "I have no trust in this code, but I don't necessarily distrust it"?
In my opinion, the fewer potential fields and values the better, to a point. Too many fields and potential values make the user's decision too difficult, and reasoning about the whole trust harder.
Too few fields/values might make certain important scenarios impossible to express.
cargo crev db commit # git ci -a
cargo crev db push # git push
cargo crev db pull # git pull --rebase (?)
One issue of any review system, is that not all reviewers are created equal:
It can be difficult enough to weigh in the opinion of multiple reviews when one personally knows the reviewers; when they are anonymous and numerous, such as in crev, it is just unwieldy. And unfortunately the "average" only gets you the opinion of the masses, which risks drowning the voices of the experts in the noise.
I have been thinking hard about this problem of scaling trust from a handful of individuals to an unlimited number of them, and my answer is to trust a few founders, and let them delegate (part of) their authority in a hierarchical Web of Trust. Then, associating a weight with a handful of webs seems like "reasonable" homework, and more casual users can just go with the community consensus.
I have described the system, at length, at Scaling Trust: Weighted Webs of Trust. It is relatively complete, as far as I can see, and should support this goal in a scalable way.
The one remaining question, however, is how susceptible the system is to malign individuals. That is, how easy it would be for a determined individual, or group, to subvert the system? I have already tried to imagine multiple attack vectors, their consequences, and the available mitigations and responses... however it only takes one vulnerability to upend all this, so I would appreciate additional eyes.
I'm planning to give https://github.com/vitiral/artifact a try. 2.0.0 is just around the corner, and git version is building for me just fine.
`cargo-crev` is kind of working already. In a sense it's even quite feature complete (alpha quality, though).
See https://github.com/dpc/crev/tree/master/cargo-crev for instructions.
We should fill in `project-name`, `project-version` - for information purposes.
If it is possible to get it reliably from `cargo`, the `revision` would be awesome too.
... should run `git <cmd>` inside a `~/.config/crev/<ownid>` so it's easy to initialize a git repo and push it somewhere etc.
Right now it's just ignored.
Moving discussion from #47 to a new thread.
`cargo crev <verb> <obj> <args>` seems to work quite well.
Edit: I'll be updating this list. Not all implemented yet.
cargo crev
  new
    id
  change
    id
    readme
  review <pkg> <version>
  trust <ids>
  push
  pull
  fetch
    url <url>
    trusted
    all
  query
    id
      current [--urls]
      own [--urls]
      trusted [--urls]
      distrusted [--urls]
      all [--urls]
    review [--by <id>] [--trusted] [--distrusted] <pkg> <version>
    package
      outdated [--trusted]
  ...
Right now `cargo crev db git -- commit -a` has to be used, which is inconvenient. Anything after `git` can't be an argument to `cargo-crev` itself.
If the project is using multiple versions of the same `<crate>`, the `<version>` argument must be passed to specify which.
Right now it will create a proof for the "random" one.
Clone github repositories to:
~/.cache/crev/remote/<id>/<blake2(id-url)>/...
It should be:
$HOME/.config/crev/<pubid>/{trust,review}/{year}-{month}.crev
This way it's easy to share the whole `<pubid>`, eg. on github, and year-month is a good balance between too many files and rewriting too big files.
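For illustration, the proposed layout could be computed roughly like this. It is only a sketch: `ProofKind` and `proof_file_path` are made-up names, and zero-padding the month to two digits is an assumption, since the layout above only says `{year}-{month}`.

```rust
use std::path::{Path, PathBuf};

/// The two kinds of proofs, each stored in its own subdirectory.
#[derive(Clone, Copy, Debug)]
enum ProofKind {
    Trust,
    Review,
}

/// Build `<config_root>/<pubid>/{trust,review}/{year}-{month}.crev`.
/// `config_root` would be `$HOME/.config/crev`.
/// ASSUMPTION: the month is zero-padded to two digits.
fn proof_file_path(
    config_root: &Path,
    pubid: &str,
    kind: ProofKind,
    year: u16,
    month: u8,
) -> PathBuf {
    let dir = match kind {
        ProofKind::Trust => "trust",
        ProofKind::Review => "review",
    };
    config_root
        .join(pubid)
        .join(dir)
        .join(format!("{:04}-{:02}.crev", year, month))
}
```

Padding the month keeps the files sorted chronologically in a plain directory listing.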
This is related to #44.
Right now the WoT graph is built by just a cost-bounded flooding of the graph. This makes it possible for anyone to create a new CrevId, trust it, and this way artificially increase the possible count of reviews for a given crate.
The algorithm should keep track of the path (the id(s) that directly trusted this one) on each step back to the root of the trust tree, so that when calculating the count of reviews, it's possible to "merge" reviews coming from a common path for the purpose of calculating the total trust count.
Example: If you directly trust only one other CrevId, you can only have a trust count equal to 1 for any given crate, no matter how many people reviewed it.
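A rough sketch of the merging idea, under simplifying assumptions: every reachable ID is attributed to the single directly trusted ID (its "first hop") through which a breadth-first search reaches it first, and reviewers sharing a first hop count as one. This is a conservative stand-in, not a full non-overlapping-paths computation, it ignores the cost bound, and the function names and the plain `HashMap` graph are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// For every ID reachable from `root`, record the directly trusted ID
/// (the "first hop") through which breadth-first search reached it first.
fn first_hops(trust: &HashMap<String, Vec<String>>, root: &str) -> HashMap<String, String> {
    let mut hop_of: HashMap<String, String> = HashMap::new();
    let mut queue: VecDeque<String> = VecDeque::new();

    // Directly trusted IDs are their own first hop.
    for direct in trust.get(root).into_iter().flatten() {
        if direct != root && !hop_of.contains_key(direct) {
            hop_of.insert(direct.clone(), direct.clone());
            queue.push_back(direct.clone());
        }
    }

    // Everything further away inherits the first hop of the ID that led to it.
    while let Some(id) = queue.pop_front() {
        let hop = hop_of[&id].clone();
        for next in trust.get(&id).into_iter().flatten() {
            if next != root && !hop_of.contains_key(next) {
                hop_of.insert(next.clone(), hop.clone());
                queue.push_back(next.clone());
            }
        }
    }
    hop_of
}

/// Count reviews so that all reviewers reached through the same first hop
/// are merged into a single unit of trust. Unknown reviewers count as 0.
fn trust_count(trust: &HashMap<String, Vec<String>>, root: &str, reviewers: &[&str]) -> usize {
    let hop_of = first_hops(trust, root);
    reviewers
        .iter()
        .filter_map(|r| hop_of.get(*r))
        .map(|hop| hop.as_str())
        .collect::<HashSet<&str>>()
        .len()
}
```

With this, an ID that is trusted directly and then vouches for any number of fresh CrevIds still contributes a count of 1, which matches the example above.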
Would it make sense to take a shorter `digest`, just to make the output shorter? Is the loss in security significant?
This should be an easy fix, and these needless quotes are ruining everything.
Just like `git commit`:
<review goes here>
# lines after # are ignored
# blabhablah
# thoroughness means:
# none - you don't trust
# low - you're not sure
# medium - you trust somewhat
# high - you trust
# blabhablah
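Reading the edited text back could be as simple as dropping every line that starts with `#`. A minimal sketch, where `strip_comment_lines` is a made-up name and the trimming of surrounding blank lines is an assumption:

```rust
/// Return the edited text with every line starting with `#` removed.
/// ASSUMPTION: leading and trailing blank lines are trimmed as well.
fn strip_comment_lines(edited: &str) -> String {
    edited
        .lines()
        .filter(|line| !line.starts_with('#'))
        .collect::<Vec<_>>()
        .join("\n")
        .trim()
        .to_string()
}
```

A `#` later in a line is kept, so review text that merely mentions `#47` is not affected.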
Hello,
I have just come upon https://github.com/dpc/crev/wiki/cargo-trust:-Concept ; and have a few comments about it.
First, `$ cargo trust verify` `$ cargo trust project` (EDIT: oops) appears to trigger `Updating registry`. This means that somehow, signing something triggers a download. If it does, then how do I know that what I'm signing is what I have reviewed? It should definitely not need to download anything, because if it does, there's a TOCTOU attack.
Second, `trust project` and `trust id` are under the same subcommand. This is one of the design errors of GnuPG (and in a way OpenPGP): putting ownertrust and key validity under the term “sign”. Here, you want to consider a project's validity, and another key's trust. As such, I don't think it makes sense to jam the two under the same command, and it will likely lead to the same kind of confusion brought by GnuPG's interface and everyone confusing ownertrust and key validity. Maybe `validate project` and `trust id` would be better names.
Finally, one of the big drawbacks of reimplementing one's own crypto (by having one's own keypair) is that it means it can't be put on a secure hardware token (eg. smartcard). Which is not nice from a security point of view, if you assume that identities are supposed to survive computer compromise.
HTH,
Leo
They should be in-memory data structures that allow easy lookup.
So something like:
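use std::collections::{BTreeMap, HashMap};

type Id = usize;

struct ReviewStore {
    /// The "table": every known proof, keyed by an internal id.
    review_by_id: HashMap<Id, ReviewProof>,
    /// Index from public id to proofs (a MultiMap<String, Id> would work too).
    pubid_to_id: BTreeMap<String, Vec<Id>>,
    // any other "index" for lookup
}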
Functionally, it's a one-column table in relational database + many indices.
These should be able to (de-)serialize to/from a file.
The mode of operation would be: on every command that requires it, `crev` scans stuff and builds a small database like that, later used to perform many lookups when verifying trust, traversing the graph of trust, etc.
We can start with just loading everything every time, and if it doesn't scale, we can switch to SQLite or something. Then we can introduce a `trait ReviewStore` and have many implementations.
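To sketch what that trait split might look like: the following is only an illustration, with a placeholder `ReviewProof` and hypothetical method names (`insert`, `get`, `by_pubid`). The in-memory version is the "load everything every time" variant.

```rust
use std::collections::{BTreeMap, HashMap};

type Id = usize;

/// Placeholder for the real proof type.
#[derive(Clone, Debug, PartialEq)]
struct ReviewProof {
    pubid: String,
    content: String,
}

/// Hypothetical lookup interface that different backends could implement.
trait ReviewStore {
    fn insert(&mut self, proof: ReviewProof) -> Id;
    fn get(&self, id: Id) -> Option<&ReviewProof>;
    fn by_pubid(&self, pubid: &str) -> Vec<&ReviewProof>;
}

/// One "table" of proofs plus one index, all kept in memory.
#[derive(Default)]
struct InMemoryStore {
    review_by_id: HashMap<Id, ReviewProof>,
    pubid_to_id: BTreeMap<String, Vec<Id>>,
    next_id: Id,
}

impl ReviewStore for InMemoryStore {
    fn insert(&mut self, proof: ReviewProof) -> Id {
        let id = self.next_id;
        self.next_id += 1;
        // Keep the index in sync with the main table.
        self.pubid_to_id
            .entry(proof.pubid.clone())
            .or_default()
            .push(id);
        self.review_by_id.insert(id, proof);
        id
    }

    fn get(&self, id: Id) -> Option<&ReviewProof> {
        self.review_by_id.get(&id)
    }

    fn by_pubid(&self, pubid: &str) -> Vec<&ReviewProof> {
        self.pubid_to_id
            .get(pubid)
            .into_iter()
            .flatten()
            .filter_map(|id| self.review_by_id.get(id))
            .collect()
    }
}
```

A database-backed implementation would probably want to return owned values instead of references, so the exact signatures are very much open.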
It would be awesome if you could add a CONTRIBUTING.md and/or some documentation on the more technical side. Doesn't need to be much since this is still early and sure to change, but some notes on where everything is/how the code is organized could make it easier for others to contribute :)
I'm planning to migrate to:
license = "MPL-2.0 OR MIT OR Apache-2.0"
everywhere. Just to give people more choice. I already did so in the digest crates, before releasing.
@Dylan-DPC @rffrancon . Ack? :)
Just recursively add all files.
`cargo crev` is using a custom recursive digest algorithm to calculate the unique digest of the crate content. It is vitally important that this operation is cryptographically secure and free of bugs, so that it never has to be fixed in a way that makes all previously calculated and signed digests incorrect.
I'd like at least one person to go through that fairly small sub-crate and review it.
cargo crev
  list
    ids all # all known
    ids trusted
    ids mine # currently `id list`
    reviews <package_name> <package_version>
We need to figure out, what is the best way to keep up with proofs being created.
One idea that I have is to embed URLs in both IDs and Review Proofs, which `crev` would use to fetch updates in the future.
Review Proof is already supposed to have a `project_urls` field: a sequence of URLs. The main idea was to allow identifying what a given Review Proof is reviewing. But a secondary function could be fetching future Review Proofs.
Eg. maintainers release version `1.2.3`. The snapshot at the point of the release (that is uploaded to NPM.org/crates.io) does contain some Review Proofs already, but only after it is released will people find out about it, review it, and hopefully submit reviews as PRs to the project.
If the existing Review Proofs contain a URL to the upstream repository, `crev` can download the up-to-date revision and use the Review Proofs that came up after the release.
Similarly, it might be in the user's best interest to keep a public personal git repository with all their own Review Proofs, all Trust Proofs from other users, and all Trust Proofs of their own. This way all people that trust a given ID, and already have it in their local WoT, can fetch the latest version and keep up to date. So we should add a `urls` field with a sequence of URLs to the user ID (and thus to Trust Proofs).
The only problem that I see here is finding a balance between keeping up to date, and potentially recursively having to download too much data.
The Trust Reviews URLs have a natural cut-off point: download updates only for IDs you trust (since you trust them).
The repository URLs don't, since "trusting" and "not trusting" is not clearly defined, and changes all the time. But the "the current one and all dependencies" seems OK...
It is possible to `cargo crev trust <garbage>` and generate a Trust Proof out of it.
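A cheap sanity check before generating the proof would already reject most garbage. The sketch below rests on an assumption that is not stated anywhere in this thread: that an ID is the unpadded URL-safe base64 encoding of a 32-byte public key, which would make it exactly 43 characters long. `looks_like_id` is a made-up name.

```rust
/// ASSUMPTION: an ID is the unpadded URL-safe base64 encoding of a 32-byte
/// public key, i.e. exactly 43 characters from `A-Z a-z 0-9 - _`.
/// This only checks the shape; it does not decode or verify the key.
fn looks_like_id(s: &str) -> bool {
    s.len() == 43
        && s.bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}
```

If the real ID encoding differs, only the length and alphabet in this function would need to change.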
Related to #11
The current idea is that we will try to fetch repositories from urls of trusted IDs.
`data` should only abstractly handle data without any IO/files/paths considerations.
`lib` should handle the core logic, without concern for CLI.
`bin` should be just a simple CLI over `lib`.
There are two slightly different ways to serialize proofs:
`version` should not be editable by the user, but be there in the final serialization
`comment` should always be there when editing, but hidden if empty in the final version
`digest` and `revision` when editing (are we sure?)
The solution here is probably to have two versions of each struct, with different `serde` annotations, and some methods to conveniently convert them back and forth.
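The "two versions of each struct" idea might look roughly like the sketch below. The struct names, the fields shown and `CURRENT_VERSION` are hypothetical, and the `serde` annotations are only described in comments so the example stays free of dependencies. Whether `digest` should be visible while editing is the open question above; it is simply kept in both forms here.

```rust
/// Hypothetical version number written into every final proof.
const CURRENT_VERSION: u32 = 1;

/// Final, serialized form: `version` is always present, and an empty
/// comment is left out (with serde, for example through
/// `#[serde(skip_serializing_if = "Option::is_none")]`).
#[derive(Clone, Debug, PartialEq)]
struct ReviewFinal {
    version: u32,
    digest: String,
    comment: Option<String>,
}

/// Form presented in the editor: no `version`, `comment` always present.
#[derive(Clone, Debug, PartialEq)]
struct ReviewDraft {
    digest: String,
    comment: String,
}

impl From<ReviewFinal> for ReviewDraft {
    fn from(f: ReviewFinal) -> Self {
        ReviewDraft {
            digest: f.digest,
            // A missing comment becomes an empty one the user can fill in.
            comment: f.comment.unwrap_or_default(),
        }
    }
}

impl From<ReviewDraft> for ReviewFinal {
    fn from(d: ReviewDraft) -> Self {
        ReviewFinal {
            // The user never edits the version; it is filled in here.
            version: CURRENT_VERSION,
            digest: d.digest,
            comment: if d.comment.trim().is_empty() {
                None
            } else {
                Some(d.comment)
            },
        }
    }
}
```

Keeping the conversion in `From` impls means the editing code only ever sees the draft type.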
`README.md` inside the proof directory.
Move `fetch` from `cargo crev db` to the `cargo crev` command.
cargo crev
  fetch
    url <url>
    trusted
    known
Fetch:
This is going to be a primary way for people to discover other reviews and IDs.
This will probably solve itself soon, after Edition 2018 is stable.
`ed25519-dalek`, `miscreant` are both in pre-final release.
`ed25519-dalek` is almost 1.0.0 but other crypto libraries are not. Is it a bad idea to use them?
When is it OK to release `crev` to the public?
There's some confusion created by the name of the cargo subcommand (`trust`), which is a verb.
Eg. to trust someone it would have to be:
cargo trust trust <id>
Originally I wanted to name it `cargo crev`, then switched to `cargo trust`. If there are no other ideas, I will revert back.
It should be possible to call it from anywhere.
`cargo` does some stuff to the original `Cargo.toml` and the directory where it downloads the crate. If anything about it changes in the future, the `digest` that we've calculated could change, and all the existing Project Review Proofs would stop working.
Can we do anything about it? Will the `cargo` team agree to keep the download directory immutable? rust-lang/cargo#6340
Right now when calculating the checksum `cargo-crev` will skip the `.cargo-ok` file.
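As a small illustration of that skip, here is a sketch of how the list of files feeding the digest could be prepared. `digest_inputs` is a hypothetical helper, not the actual `cargo-crev` code; sorting for a stable order and skipping `.cargo-ok` at any depth (rather than only at the top level) are both assumptions.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

/// Keep only the paths that should feed the digest, in a stable order.
/// ASSUMPTION: any file named `.cargo-ok` is skipped, wherever it sits.
fn digest_inputs(mut files: Vec<PathBuf>) -> Vec<PathBuf> {
    files.retain(|p| p.file_name() != Some(OsStr::new(".cargo-ok")));
    // Sort so the result does not depend on directory listing order.
    files.sort();
    files
}
```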
I wish `cargo` would just leave the whole directory alone, with any additional files or modifications happening in another place, letting the crate source stay immutable.
This would allow nicely:
Because it isn't right now, the first push has to be:
cargo trust db git -- push --set-upstream origin master
while the following should just work:
cargo trust db git -- push
Yaml is my favorite:
Few problems I've noticed:
Because of serde shortcomings (serde-rs/serde#1410) it doesn't look optimal.