
in-toto Attestation Framework

The in-toto Attestation Framework provides a specification for generating verifiable claims about any aspect of how a piece of software is produced. Consumers or users of software can then validate the origins of the software, and establish trust in its supply chain, using in-toto attestations.

Learning about in-toto attestations

To get started, check out the overview of the in-toto Attestation Framework.

For a deeper dive, we recommend reading through our documentation to learn more about the goals of the in-toto Attestation Framework. If you encountered in-toto via the SLSA project, take a look at this blog post to understand how the two frameworks intersect and how you can use in-toto for SLSA.

Visit https://in-toto.io to learn about the larger in-toto project.

Working with in-toto attestations

The core of the in-toto Attestation Framework is the specification that defines the format for in-toto attestations and the metadata they contain.

We also provide a set of attestation predicates, which are metadata formats vetted by our maintainers to cover a number of common use cases.

For tooling integration, we provide protobuf definitions of the spec. We currently only provide language bindings for Go and Python.

Is your use case not covered by existing predicate types?

Take a look at the open issues or pull requests to see if your usage has already been reported. We can help with use cases, thinking through options, and questions about existing predicates. Feel free to comment on an existing issue or PR.

Want to propose a new predicate type?

If you still can't find what you're looking for, open a new issue or pull request. Before opening a request for a new metadata format, please review our New Predicate Guidelines.

Governance

The in-toto Attestation Framework is part of the in-toto project under the CNCF. For more information, see GOVERNANCE.md.

Use @in-toto/attestation-maintainers to tag the maintainers on GitHub.

Insights and Activities

Stay up-to-date with the latest activities and discussions in the in-toto Attestation Framework by exploring the maintainers' notes.

Disclaimer

The in-toto Attestation Framework is still under development. We are in the process of developing tooling to enable better integration and adoption of the framework. In the meantime, please visit any of the language-specific in-toto implementations to become familiar with current tooling options.


Issues

Support subjects that are not digests

We have use cases where we need to refer to subjects that cannot be described by a cryptographic hash. Example: a Subversion repository revision, which is identified only by URI. We'd only want to support things that are semantically immutable.

Document convention for versioning

It would be good to come up with some convention for version numbers, using that for official specs (Statement, Provenance, etc.) and recommending it for other predicates.

Proposal

Strawman based on Semantic Versioning 2.0.0: "vMAJOR.MINOR". A change MAY increment only the minor version if consumers can parse the version as an earlier minor revision and have it still be considered accurate according to that earlier revision. For example, parsing a 1.1 message as 1.0 results in something that has the same semantic meaning as if the producer produced 1.0 directly. (I don't see a need for PATCH number since this is just a spec, not code.)

Examples for Provenance, going from 1.0 to 1.1 vs 2.0:

  • Adding a new buildFinished timestamp may increment MINOR because the field does not affect the meaning of any other field. The predicate still makes sense as a 1.0 message ignoring that field.
  • Adding a new recipe.extraArgs field requires incrementing MAJOR because the overall meaning of the recipe changes. If you ignore the field, it is NOT semantically a 1.0 message, because the recipe assumed that the existing 1.0 fields fully defined it.
  • Modifying the meaning of builder.id requires incrementing MAJOR because it's changing an existing field. Interpreting it as 1.0 is not valid.
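As a sketch, the strawman's compatibility rule could be checked like this (helper names are illustrative, not part of any spec):

```python
def parse_version(v):
    """Parse a "vMAJOR.MINOR" spec version, e.g. "v1.1" -> (1, 1)."""
    major, minor = v.lstrip("v").split(".")
    return (int(major), int(minor))

def compatible(producer_version, consumer_version):
    """Under the strawman, any two versions with the same MAJOR are
    mutually intelligible: a newer MINOR only adds ignorable fields,
    so parsing a 1.1 message as 1.0 remains semantically accurate."""
    return parse_version(producer_version)[0] == parse_version(consumer_version)[0]
```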

Alternatives

An alternative would be to have just MAJOR versions (v1, v2, v3, ...) and increment on every single change. The downside is that any minor change would then require all consumers to update.

Any thoughts?

Provide better guidance on `subject.name`

The documentation currently does not give clear guidance on how to properly use subject[*].name. For example, is it OK to consider one attestation {subject: [A, B], predicate: P} identical to two attestations {subject: [A], predicate: P} and {subject: [B], predicate: P}? That is what the Processing Model implies, but I don't think that is what we want in the general case. There might be cases where you want to know that A and B were both produced by the same process.
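To make the implied equivalence concrete, the Processing Model's reading would treat a multi-subject statement as splittable into per-subject statements, roughly as below (a sketch of the interpretation in question, not a recommendation):

```python
def split_by_subject(statement):
    """Rewrite {subject: [A, B], ...} as the list of single-subject
    statements the Processing Model would treat as equivalent."""
    return [dict(statement, subject=[subj]) for subj in statement["subject"]]
```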

Thanks @nenaddedic for reporting.

Provenance: rename `reproducibility`

The name recipe.reproducibility sounds like whether it's a reproducible build or not, rather than the set of builder-controlled values that affected the build.

Maybe recipe.environment?

Make predicateType optional

For the case of plain code signing (e.g. via cosign), there is no predicate type. (More accurately, there is some implicit predicate implied by the public key.) To support such use cases, we should make predicateType optional if predicate is not present.

Explain "completeness"

Explain that "completeness" is ultimately an agreement between producer and consumer, usually through some intermediary "accreditor" as in SLSA. Counterexamples would help.

Consideration: where to place "workload" attestations

I'm thinking of something like a SPIRE SVID/Bundle to attest that a build step was actually carried out by a particular workload, on a particular node.

In the case of provenance generated by a separate orchestrator - the orchestrator could obviously include some cryptographic proof of where it was running. But we could also potentially include an attestation from the build process itself.

Where would make the most sense to include this?

Predicate-agnostic graph representation

Multiple predicate types may include "links" to other artifacts, which effectively forms a graph. For example, Provenance has materials and a future "PolicyDecision" may have "input policy". Right now, to traverse the graph, one needs to understand the predicate type in order to pull out the graph edges and labels.

Several people, such as @tiziano88 and @SantiagoTorres, have expressed a desire to have a generic mechanism to walk the graph without having to understand the predicate type. Prior discussion: in-toto/ITE#15 (comment)

As explained in that discussion, there are several major open questions that prevent such a common feature from being useful, such as:

  • What is the real-world use case for doing such predicate-agnostic graph traversal?
  • How would the graph traversal work if the size of the graph is intractably large and you don't understand the edge labels? If you trim at, say, depth N, why is that meaningful?
  • What is the abstract model and terminology for how this works?
  • If there are other predicates that don't support this, e.g. SPDX which has its own link convention, is there still value in doing this? In other words, if you're going to have to support multiple predicates anyway, why do we need a standard?

In my opinion, all of these details need to be worked out before we add such a feature.

In the meantime, we can suggest a convention based on what the Provenance predicate does. Other predicate types can use the same data structure and perhaps even the same field name.

Define an "attestation bundle" data structure and naming convention

Copied from secure-systems-lab/dsse#20.

We need a data structure and file naming convention to associate multiple attestations to a single software artifact.

Motivating use case

Suppose file foo.out is associated with two attestations, both of which are required by the layout of foo.out:

  • Provenance: the build system generates a link saying that foo.out was produced from material foo.c.
  • Vulnerability scan: a scanner generates a link saying that foo.out is free of known vulnerabilities.

The build system and the scanner need to know where to place the links on the filesystem, and the verifier needs to know how to find those links when evaluating foo.out.

Data structure

Define a "bundle" data structure containing multiple attestations (i.e. multiple DSSEs).

Ideas:

  • JSON Lines
  • ZIP file with particular naming convention
  • JSON object or array

File naming convention

Define a convention for locating the attestation bundle related to a given file on disk.

Proposal: <file>.<bundle_extension>

Example: If we choose JSON Lines, perhaps the attestations for foo.out could be found in foo.out.intoto.jsonl?
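If the JSON Lines idea were adopted, producing and locating a bundle might look roughly like this (the .intoto.jsonl extension is the proposal above, not a settled convention):

```python
import json

BUNDLE_EXTENSION = ".intoto.jsonl"  # proposed extension, not finalized

def bundle_path(artifact_path):
    """Locate the attestation bundle for a given artifact on disk."""
    return artifact_path + BUNDLE_EXTENSION

def write_bundle(path, envelopes):
    """Write one DSSE envelope (as JSON) per line, i.e. JSON Lines."""
    with open(path, "w") as f:
        for envelope in envelopes:
            f.write(json.dumps(envelope) + "\n")

def read_bundle(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```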

Provenance: indicate completeness of `recipe`

We currently have a metadata.materialsComplete field that indicates whether materials is complete or not. We need the equivalent for recipe.arguments and recipe.reproducibility. One minor thing: it doesn't make sense for reproducibility to be complete but not arguments.

Ideas:

  • Add metadata.argumentsComplete and metadata.reproducibilityComplete.
  • Add completeness.arguments, completeness.reproducibility, and completeness.materials (moved from metadata).
  • Use a field mask to list complete fields, as in complete: ["materials", "recipe.arguments"].
  • Do not support set-but-incomplete, and instead say that unset/null means "unknown" and set (but possibly empty) means "complete".
  • Use an enum for NONE, ARGUMENTS_ONLY, and ARGUMENTS_AND_REPRODUCIBILITY.

My inclination is the completeness.* option. That seems like it's the most straightforward. It also moves a security-critical field out of metadata.
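For illustration, the completeness.* option might look like this inside a predicate (a sketch; field values and any names beyond the ideas above are hypothetical):

```python
# A sketch of the `completeness.*` layout (option 2 above); the example
# values are hypothetical, not from any spec.
predicate_fragment = {
    "recipe": {
        "arguments": {"target": "all"},
        "reproducibility": {"SOURCE_DATE_EPOCH": "0"},
    },
    "materials": [],
    "completeness": {
        "arguments": True,
        "reproducibility": False,
        "materials": True,  # moved here from `metadata`
    },
}

def is_complete(predicate, field):
    """Treat a missing completeness entry as unknown, i.e. not complete."""
    return bool(predicate.get("completeness", {}).get(field, False))
```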

Link to known attestations in repository

In today's in-toto community call, while discussing open ITEs, we noted that it would be useful for both adopters and maintainers to have a collection of the different in-toto attestations that exist in the wild.

Some examples I know of:

Defining a generalized predicate format for "human reviews" of artifacts

We started discussing what code review attestations should look like, and @iamwillbar suggested checking if we could define a generalized predicate for reviews. We could then derive code review and other formats (perhaps vuln scans like #58) from this general predicate. As a starting point, we've potentially got the following sections in the predicate.

Edit: the general consensus is that we should only handle attestations for human reviews here, so that's what we're going to be focusing on.

Meta Information

In the case of code reviews, this could identify the person performing the review, but in the case of a vuln scan, it could identify the scanner used. One thing to note is that this may be unnecessary in the case of some reviews, because we could use the signature to identify the functionary as we do with links. Further, we could capture temporal information about when the review was performed.

Artifact Information

This section essentially identifies the artifacts the review applies to. We can incorporate ITE-4 here as well. One project we're looking to work with is crev, so we could use something like purl as well to identify package-level reviews as a whole. One open question may be reviews for diffs, and how they'd chain together to apply to a file as a whole.

Edit: we can probably lean on VCS semantics via ITE-4 and tie CRs to the underlying VCS changes / commits they correspond to. Also, as noted below, this would be part of the statement's subject rather than the predicate, but it was included here to nail down exactly what we're likely to be capturing in a review.

Review Result

This is pretty self-explanatory for both CRs and vuln scans. I'm not sure a negative-result CR should exist, except maybe for package-level results as they're used in crev.


These are all early thoughts in defining a generalized review format, and I'm curious to hear what people think about this. Also open to hearing about other projects like crev we should be paying attention to when working on this.

Question: Support of multiple SBOM formats.

Hi,
I'm new to the conversation; I hope I'm not repeating a closed issue (I didn't find one).
The current SPDX predicate is SPDX-specific. This does not comply with the "separating statement from predicate" goal: suppose one wishes to use CycloneDX; she would need to define a new "cyclonedx" predicate. Thus a policy engine cannot query "was an SBOM attestation created?" without parsing the actual SBOM object.

An alternative would be:

predicateType: "sbom"
predicate: {
  "sbom_type": "<spdx | cyclonedx | ...>",
  "sbom_body": { ... }
}

What am I missing?

Provenance: add a policy section

In the provenance spec, add a policy section documenting how the provenance is intended to be consumed. This should aid readability, just as the Processing Model helped with the Envelope/Statement spec.

Authenticated how?

The README says:

This repository defines the in-toto attestation format, which represents authenticated metadata about a set of software artifacts. Attestations are intended for consumption by automated policy engines, such as in-toto and Binary Authorization.

How is the metadata authenticated?

A good TLDR for attestations

It wasn't clear to me until the community meeting today that attestations are a superset of classic in-toto links. Each attestation is simply like a standardized protobuf for certain types of additional information. Would it be good to clarify this as a TL;DR in the README?

Attestation Context / Meta-data / Meta-information

Following #58 here, opening the mentioned issue.

The statement-level meta-data should hold enough information to enable:

  1. Simple policy decisions that are agnostic to the predicate details.
  2. Enable a first level of indexing of the attestations for later recall.
  3. Enable parsing of the predicate.

The current mandatory fields at the statement level are the subject and the predicate type, which is, as a matter of fact, the predicate media type.

Fields that could be of use at the statement level include:

  • Predicate abstract type: "sbom", "provenance".
  • Predicate media type: the exact format (URI) (for SBOM: SPDX, SPDX-Lite, CycloneDX; for provenance: slsa-provenance).
  • When the attestation was taken: timestamp (#46).
  • Where the attestation was taken: location in the pipeline. I suggest an abstract location and a specific location: the abstract context could be a string with recommended values (user workstation, git server, build machine, etc.), and the specific context could be some machine ID.
  • Project ID: could be a URL such as https://github/myproject or simply a string set by the entity creating the attestation. There is a difference between the project ID and the subject; the subject would typically be an artifact, but a project may produce many subjects. One could of course use multiple subject fields (as is supposed to be supported), but that is not natural.
  • An application-specific object field: it is always convenient to have a placeholder for a generic implementation-specific object. As I understand it, this is supposed to be supported (see the parsing rules), but it would be better not to rely on undefined behavior and instead explicitly define an application-specific object placeholder.

Such fields enable elaborated policies at the statement level (for example: require an sbom produced at build, without caring about the SBOM details), and would enable indexing to support searching attestations: search by project, subject, time, part of pipeline etc.
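To illustrate, a statement carrying the proposed fields, and the kind of predicate-agnostic policy check they would enable, might look roughly like this (all added field names are the proposal's, not the current spec's, and the values are hypothetical):

```python
# Statement-level additions below are the proposal's, not current spec.
proposed_statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{"name": "app.tar.gz", "digest": {"sha256": "..."}}],
    "predicateType": "https://spdx.dev/Document",        # the media type (URI)
    "predicateAbstractType": "sbom",                      # proposed
    "timestamp": "2021-08-01T12:00:00Z",                  # proposed (#46)
    "context": {"abstract": "build machine",              # proposed
                "specific": "machine-id-1234"},
    "projectId": "https://github.com/example/myproject",  # proposed
    "application": {"anything": "implementation-specific"},  # proposed
    "predicate": {"spdxVersion": "SPDX-2.2"},
}

def sbom_produced_at_build(statements):
    """A predicate-agnostic policy: was an SBOM attested at build time?"""
    return any(s.get("predicateAbstractType") == "sbom"
               and s.get("context", {}).get("abstract") == "build machine"
               for s in statements)
```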

What are the attestation community thoughts about this?

Only require 1 approver

We keep finding ourselves lacking two separate reviewers for PRs. IMO it's unnecessarily burdensome to require two reviews for every change. Any objections to decreasing this to 1?

Provenance: remove `mediaType` and `tags` in favor of extensions

In Provenance, the materials[*].mediaType and materials[*].tags fields are not well defined. It is unclear exactly how they should be used or what the conventional values are. Given that we have a way to add extension fields (#8), let's remove these fields for now until we have a better idea on how to standardize them.

[Help Wanted] Determine attestation format for vuln scans

This is a follow-up issue for the sigstore/cosign#442.

I thought it would be more appropriate to continue the discussion here, since the spec is not cosign-specific.


With @developer-guy, we are currently trying to determine a vuln scan spec as best we can. We could enrich the following brand-new attestation spec:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "alpine",
      "git_commit": "a1b2c3",
      "digest": {
        "sha256": "c201c331d6142766c866"
      }
    }
  ],
  "predicateType": "SARIF",
  "predicate": {
    "timestamp": "1627564731",
    "owner": {
      "name": "<WHO_RAN_THIS_SCAN>"
    },
    "environment": {
      "name": "GitHub",
      "by": "<PIPELINE_ID>",
      "type": "<CI/CD?> (i.e., GitLab Runner)",
      "cluster": "us-east1-a.prod-cluster",
      "namespace": "<namespace>"
    },
    "success": true,
    "scanners": [
      {
        "name": "trivy",
        "version": "0.19.2",
        "db": {
          "name": "trivydb",
          "version": "v1-2021072912"
        },
        "timestamp": "1627564731",
        "result": "<SARIF RESULT HERE?>"
      },
      {
        "name": "dockle",
        "version": "v0.3.15",
        "timestamp": "1627564731",
        "result": "<SARIF RESULT HERE?>"
      }
    ]
  }
}

We called the predicateType SARIF, but I don't think that name fits, since the content is not in SARIF format. We may have to reconsider the name.

It's obviously a bit hard to think of best practices while implementing the first version of the spec. It would be great if you maintainers could get involved and give us a hand improving the overall structure, so we can implement the model in the in-toto project to validate and generate these attestations. Does that make sense to you? Thanks! We are waiting for your feedback.

FYI @dlorenc @NitinJain2 @trishankatdatadog

Clarify that any envelope can be used

The Envelope layer is not specified in this spec. We recommend a particular one (signing-spec) but the attestation format can be used with any envelope, so long as the producer and consumer agree.

Support attestation revocation

It would be useful to have a mechanism for revoking specific sets of attestations without having to revoke an entire key. A real-world use case: a builder had a bad release and generated bad provenance for a short period of time. We'd like to revoke only the provenance generated by that bad release, without doing a full key revocation, since the latter would have a much larger negative impact.

Note that signature revocation was mentioned in secure-systems-lab/dsse#39, where we said it would be a better fit inside the payload. That's why I filed the issue here.

It's also possible we push this down further into the predicate and have predicate-specific methods. For the use case above, https://slsa.dev/provenance could have a builderVersion field and we could revoke based on that. But I don't particularly like that idea since revocation seems like it would apply equally to all attestations.

I don't have good ideas for solutions, but wanted to mention this here since it is a real issue that has already come up.

Proposal: support subjects that have no digests

Currently, an in-toto Attestation subject has the form {name, digest}, where both fields must be present. A name field can have the value "_" when a subject has no meaningful name, but it is currently impossible to specify a subject that has no meaningful digest.

We would like to support subjects of the form {name, uri} for subjects that do not have a meaningful digest. Examples:

  • A subject URI that identifies a builder (example: https://build.example.com/worker@<version>). This could be the subject of an attestation that the builder meets a certain SLSA level, and it could be referenced in the builder section of a provenance attestation.
  • A subject URI that identifies a specific revision of a source-code repository (for example, svn+ssh://<host>/<repo-name>/<revision-number>). This subject could be referenced in the materials section of a provenance attestation, and it could match the subject of an attestation that the repository meets a certain SLSA level.

Why not use a content digest?

  • Computing a digest may not be feasible. For example, computing a digest over a (large) source repository at a specific revision number; or over the components that make up a build system stack, from the applications that manage the execution of compilers and other tools, to the operating systems and virtual machines that the software runs on.
  • A digest may not have useful semantics. For example, if a build system's content digest were hypothetically computed over only the builder-specific components of the build system stack, even a trivial software update would change that digest without affecting any of the build system's SLSA security properties, i.e., the things we actually care about.

Proposal

  • In a Statement subject:
    • A subject can have the new form {name, uri}, as an alternative to the existing form {name, digest}.
    • The subject uri field uses resource URI syntax. See "What makes a good subject URI?" below for desirable properties.
    • When searching for an attestation, require an exact match with a subject URI.
    • No change is needed in the subject name field spec. This field may contain "_", or additional information to evaluate the attestation (for example to select between "production" vs "testing").
  • No change is needed in the Provenance predicate spec. The fields of interest, builder and materials, already support a URI without a digest.

What makes a good subject URI?

  • A good subject URI has immutable semantics. If a resource is semantically changed, then its subject must also change. For example, a source repository URI svn+ssh://<host>/<repo-name>/<revision-number> has a new revision number after each committed change, and a build system URI https://build.example.com/worker@<version> gets a new version after a change in its SLSA security posture.
  • A good subject URI has universal semantics. If a subject has different semantics for different observers, then it is not a good subject.

Potential extensions: subject matching

For now, we require exact matches when searching for an attestation subject URI. In some cases it may be desirable that an attestation can apply to a collection instead of a specific instance. For example, an attestation that all revisions of a source repository meet a certain SLSA level as of some revision number or point in time. This potential extension would require a way to match a specific URI instance in provenance, against a collection (or class) specified in an attestation subject.
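As a sketch, the proposed exact-match lookup for digest-less subjects could look like this (function names and the example URIs are hypothetical):

```python
def subject_matches(subject, uri=None, digest=None):
    """Match a Statement subject of either form: the existing
    {name, digest} or the proposed {name, uri}. URI matching is
    exact string equality, per the proposal."""
    if uri is not None:
        return subject.get("uri") == uri
    if digest is not None:
        # A digest query matches if every named hash agrees.
        return all(subject.get("digest", {}).get(alg) == val
                   for alg, val in digest.items())
    return False

def find_attestations(statements, **query):
    """Return statements whose subject list contains a match."""
    return [s for s in statements
            if any(subject_matches(subj, **query) for subj in s["subject"])]
```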

Provide better guidance on `materials`

Currently it's unclear what "complete" means for materials. Does it include materials used by the builder itself, e.g. the GitHub Actions orchestrator, or just the "runner"? To minimize the list, do we recommend that builds execute within a container? Is a container a sufficient security boundary? How confident do you need to be that it's complete?

My thinking is that we're shooting for "everything inside the runner container," and it's OK if malicious builds break out of the container, but you're capturing the intent of a "good" build.

Does this mean completeness.materials should be an enum instead of a boolean?

@TomHennen FYI who raised this question.

Reproducible builds expressed as Provenance predicate

There are a few open questions for how to support reproducible builds in the Provenance predicate:

Q1. Should each rebuilder produce its own attestation, or should all rebuilders sign the same attestation?

(option A) With the current schema, builder is required so each rebuilder must produce a unique attestation. The consumer would then verify that all the fields are identical except builder.

(option B) An alternate idea would be for all rebuilders to sign the same statement. This would only work if builder is optional and implicit from the signing key.
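Option A's consumer-side check (every field except builder must be identical across rebuilders) could be sketched as:

```python
def agree_except_builder(predicates):
    """Option A consumer check: each rebuilder produced its own
    attestation; verify all fields except `builder` are identical."""
    stripped = [{k: v for k, v in p.items() if k != "builder"}
                for p in predicates]
    return all(s == stripped[0] for s in stripped[1:])
```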

Q2. Should the Provenance attestation be sufficient to reproduce a build?

I think that's a good idea, but I'm not sure we're there yet.

At a minimum, we should document that notion and perhaps have a reproducible: true field to indicate that the builder thinks it is reproducible.

Even then, the rebuilder needs to understand the recipe. For example, if GitHub Actions is used, they need to understand how to parse and run the workflow. It would be nice if we had some standardized convention for recipes, so that all builders speak the same language. That's too big a project for the attestation format, but something to think about as part of SLSA.

Container Native Provenance Predicate?

Starting a rough brainstorm here to capture some ideas I had around the existing Provenance Predicate in this repo. The existing In-Toto Link format, Grafeas BuildProvenance and new In-Toto Attestation formats are all slightly off from my mental model.

At a high level, there are three things we care about: recipe (set of steps), materials (inputs), and final artifact (output). They each differ in how many of these you can have, and how they're defined.

Here's a comparison on their cardinality, and what I think I want:

Provenance

https://gist.github.com/dlorenc/2f31fcb3c9a5d0a06ad944f8b831b213

  • One Subject
  • One Recipe
    • Recipe only contains pointer to a material
    • Entrypoint (top level script)
    • Arguments
    • Environment
  • Multiple Materials

Link

  • Multiple Products
  • Multiple Byproducts
  • Multiple Materials
  • Single Environment

Grafeas BuildProvenance

  • Multiple Artifacts
  • Multiple Commands
    • Environment
    • Command
    • Args
  • Single Source

What I want

  • One Subject
  • One Recipe
    • Multiple Steps
      • Environment
      • Entrypoint
      • Arguments
  • Multiple Materials

In Graphviz format: https://gist.github.com/dlorenc/84fb062d0f0fd532b2cb603dc8648543


We can have multiple materials and one recipe that contains a set of steps. Each step can have its own environment, typically a container image. These steps run against the materials and produce an output. If the steps produce multiple output artifacts, the system should generate one of these for each individual output artifact.
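As a rough sketch, the desired shape might serialize like this (every field name and value below is illustrative only, not a proposed spec):

```python
# Illustrative only: one subject, one recipe with multiple steps,
# each step with its own container-image environment, and multiple
# materials.
desired_provenance = {
    "subject": {"name": "app.img", "digest": {"sha256": "..."}},  # exactly one
    "recipe": {
        "steps": [
            {"environment": "gcr.io/builders/golang:1.16",
             "entrypoint": "go",
             "arguments": ["build", "./..."]},
            {"environment": "gcr.io/builders/packer:v1",
             "entrypoint": "package.sh",
             "arguments": []},
        ],
    },
    "materials": [
        {"uri": "git+https://github.com/example/app@main"},
    ],
}
```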

This is all pretty rough and high level still, but I think this more closely matches the models of GCB, GitHub Actions, and TektonCD.

Cut initial release

We plan to cut an initial release before the end of April. Note that each layer can have separate revisions.

Questions:

  • How should we number the revision? 1? 0.1? 2021-04? Something else?
  • Should the first one be some sort of alpha / draft / provisional, or a full release?

If we use integers, the decision is simple. First release is 1.
If we use 0.x/1.x (or "draft"), which of Statement / Provenance / SPDX is 1.0 and which is 0.1 / draft?
If we use dates, what date should we use?

My suggestion:

  • Statement: v1 (we're fairly confident that it is good enough, and any changes could be reasonably considered v2)
  • Provenance: v1-beta1 (I'm not super confident that we won't make changes)

Thoughts?

Compositional Notion On Attestation Types

It'd be nice to describe a strategy for different types of attestations within the same action.

Should we have these two types of attestations live under the same object? or would they be separate attestations attached to the same action. I.e., you could have 1 subject with multiple predicates.

E.g., having a provenance type that includes a measured boot record:

{
  "subject": [
    { "name": "curl-7.72.0.tar.bz2",
      "digest": { "sha256": "ad91970864102a59765e20ce16216efc9d6ad381471f7accceceab7d905703ef" } }
  ],
  "predicateType": "https://in-toto.io/Provenance/v1",
  "predicate": {
    "builder": { "id": "https://github.com/Attestations/GitHubHostedActions@v1" },
    "recipe": {
      "type": "https://github.com/Attestations/GitHubActionsWorkflow@v1",
      "definedInMaterial": 0,
      "entryPoint": "build.yaml:maketgz"
    },
    "metadata": {
      "buildStartedOn": "2020-08-19T08:38:00Z"
    },
    "materials": [
      {
        "uri": "git+https://github.com/curl/curl-docker@master",
        "digest": { "sha1": "d6525c840a62b398424a78d792f457477135d0cf" },
        "mediaType": "application/vnd.git.commit",
        "tags": ["source"]
      }
    ],
    "tpm-measured-boot": {
       "PCR0": "xxxx",
       "PCR1": "yyyy",
       ...
    }
  }
}

This way, we would be able to know provenance information of the build + information about the host's integrity.

Add support for SCAI predicate

Per my discussions with @SantiagoTorres and @MarkLodato, I'm opening an issue to get this discussion going on a broader forum.

CDI (or Code Deployment Integrity) is a framework for high-integrity provenance for software artifacts. CDI enables verifiers to make trust decisions about artifacts based on attested code properties, and enables verifiers to additionally establish trust in the attesters generating provenance metadata. (See our position paper)

In more detail, CDI can enhance in-toto in three key areas:

  1. Capture metadata about code security properties/behavior.
    CDI attestations contain not only information about a step in the supply chain, but also authenticated claims about specific security properties (e.g. claim: "this binary enforces strict memory bounds checks", evidence: builder is a verified Wasm AoT compiler). CDI also supports obtaining attestations from static/binary analysis tools.

  2. Integrity for the attesters: Tool endorsements
    In order to be able to substantiate claims about code properties inserted by a specific supply chain step, CDI obtains additional attestations or endorsements (signed claims) about attesters. Endorsements about the tools may come from a variety of endorsers such as CAs or third-party auditors, or from running tools on trusted execution environments (e.g., Intel SGX). These endorsements provide additional integrity metadata about the tools, reducing trust in the tools (i.e., attesters) and allowing verifiers to determine the confidence they have in attestations coming from certain attesters.

  3. Contextual deployment policies
    Attesting to code properties throughout the supply chain enables developers to specify fine-grained contextual policies about artifacts that can be enforced at deployment time. For instance, I may wish for a container running my binary to be deployed with co-tenants so long as they are side channel-free. This last feature requires post facto auditing, but I envision these policies being attached as part of the metadata for an artifact, and enforced by a verifier (e.g., a container orchestrator) at deployment time. I also envision eventually being able to specify such policies for supply chain steps (e.g., "build my artifact with an AES library that is attested to be side channel free").
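The contextual-policy idea above can be sketched as a deployment-time check. This is a minimal, hypothetical illustration: all predicate and field names (`claims`, `attester`, the `side-channel-free` tag, etc.) are invented for this sketch and are not part of any existing predicate.

```python
# Hypothetical sketch of a deployment-time policy check over CDI-style
# attestations. All field names here are invented for illustration.

def find_claim(attestations, subject_digest, claim_name):
    """Return the first attestation claiming `claim_name` about the subject."""
    for att in attestations:
        if att["subject"]["sha256"] == subject_digest and claim_name in att["claims"]:
            return att
    return None

def policy_allows_deploy(attestations, endorsements, subject_digest):
    # Policy: the co-tenant binary must be claimed side-channel-free, and
    # the attester making the claim must itself be endorsed.
    claim = find_claim(attestations, subject_digest, "side-channel-free")
    if claim is None:
        return False  # absence of an attestation never moves us to ALLOW
    return claim["attester"] in endorsements

attestations = [{
    "subject": {"sha256": "abc123"},
    "claims": {"side-channel-free": True},
    "attester": "wasm-aot-compiler",
}]
endorsements = {"wasm-aot-compiler"}  # e.g. endorsed via a TEE quote or audit

print(policy_allows_deploy(attestations, endorsements, "abc123"))  # True
```

The key point of the sketch is the second check: the claim alone is not enough, the verifier also requires an endorsement for the attester that produced it.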

A couple of extra notes: some of these features are better fleshed out than others. I'm also currently working on a prototype that demonstrates running compilation inside a TEE like SGX, but supporting other TEEs will be important as well.

Where I'm uncertain is the most effective way to integrate support for CDI with in-toto. On the one hand, amending the in-toto Link predicate may make sense since CDI is complementary. At the same time, it may be better for backward compatibility to develop an in-toto compatible predicate schema for CDI, especially considering that in-toto also supports attestation bundles now. Or perhaps a combination of the two.

I'd love some input on this. Thoughts?

Provenance: consider splitting `builder.id`

"Builder" primarily refers to the (id)entity generating the provenance, not the "runner" that actually did the build. This could easily be broken into two pieces to make things clearer (especially at the lower SLSA levels): 1) the authority that attests to the result of the build / content of the provenance, and 2) the runner that executes the build steps.

Originally posted by @msuozzo in #39 (comment)

Should the predicateType's hash be included as a digest?

Per https://github.com/in-toto/attestation/blob/main/spec/README.md, the predicateType value should be a TypeURI that defines both the shape and meaning of the predicate.

Unlike most other URIs used in in-toto, the hash of the predicateType does not seem to be recorded in the attestation.

I would be curious why this is the case, as this seems like a security vulnerability. The host of the URI referenced in the predicateType may silently update the content it responds with. This then creates a situation in which the meaning of the predicate object in the attestation is now different, with potentially unwelcome consequences. A consumer of the attestation must therefore trust the host of the predicateType to serve the same content that the attestation author referenced.

It seems adding a DigestSet for its content would remedy this?
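As a sketch of the proposed remedy: a `predicateTypeDigest` field (hypothetical — it does not exist in the current spec) could pin the content served at the predicateType URI, so the host cannot silently change the predicate's meaning.

```python
# Sketch: pinning the content behind predicateType with a digest.
# The "predicateTypeDigest" field is hypothetical -- it is the remedy
# proposed in this issue, not part of the current spec.
import hashlib

def make_statement(subject, predicate_type, schema_bytes, predicate):
    return {
        "_type": "https://in-toto.io/Statement/v0.1",
        "subject": subject,
        "predicateType": predicate_type,
        "predicateTypeDigest": {  # hypothetical DigestSet over the schema
            "sha256": hashlib.sha256(schema_bytes).hexdigest()
        },
        "predicate": predicate,
    }

def verify_schema(statement, fetched_schema_bytes):
    # A consumer re-fetches the predicateType URI and checks the digest
    # against what the attestation author pinned.
    expected = statement["predicateTypeDigest"]["sha256"]
    return hashlib.sha256(fetched_schema_bytes).hexdigest() == expected

schema = b'{"description": "example predicate schema"}'
stmt = make_statement([], "https://example.com/Predicate/v1", schema, {})
print(verify_schema(stmt, schema))  # True
```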

Ability to refer to one attestation from another

There are likely use cases where you want to refer to one attestation from another. For example, you could have a "Policy Decision" predicate that says "I allow subject x to run in environment Y, based on input attestation Z."

Since this is just theoretical at the moment, we'll wait until we have a few concrete use cases to actually design and implement this. Please add any use cases you might have to this issue.

The straightforward solution is to treat the entire attestation as an artifact, and thus you refer to the attestation as a hash over the envelope. The downside to this approach is that it prevents one from re-encoding the envelope, such as if you add a signature to it, because doing so changes the hash. (Counter-point: don't do that.) Perhaps this is the best option.

Alternative ideas:

  • Refer to the hash of the statement, rather than the envelope. Downside: that loses who signed it, which is critical information.
  • Refer to the hash of some canonicalization of the envelope. Downside: that relies on canonicalization (frowned upon, see ITE-5), needs to be invented, and adds complexity.
  • Each attestation has a UUID. Downside: adds complexity and could be error prone. For example, what if someone else creates one with the same UUID, either accidentally or on purpose?
  • Store attestations in some ledger, then you can refer to the location within the ledger (transaction ID or leaf hash). Downside: requires a ledger, and is seemingly not any better than the straightforward solution.
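The straightforward solution could be sketched as follows. This is illustrative only: the referencing predicate and its `basedOn` field are invented names, and the envelope contents are dummies.

```python
# Sketch of the "straightforward solution": refer to another attestation
# by hashing the entire serialized DSSE envelope. The referencing
# predicate and its "basedOn" field are hypothetical.
import hashlib, json

def envelope_digest(envelope_bytes):
    return {"sha256": hashlib.sha256(envelope_bytes).hexdigest()}

# The input attestation (normally a real DSSE envelope in a store).
input_envelope = json.dumps({
    "payloadType": "application/vnd.in-toto+json",
    "payload": "eyJfdHlwZSI6ICIuLi4ifQ==",
    "signatures": [{"keyid": "key1", "sig": "..."}],
}, sort_keys=True).encode()

# A hypothetical "Policy Decision" predicate referencing it.
policy_decision = {
    "decision": "allow",
    "environment": "Y",
    "basedOn": [envelope_digest(input_envelope)],
}

# Caveat from the discussion above: re-encoding the envelope (e.g.
# appending a signature) changes these bytes and breaks the reference.
print(len(policy_decision["basedOn"][0]["sha256"]))  # 64 hex chars
```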

Where to put implementations?

Does anyone have an opinion on where implementations of the in-toto Attestation should go?

We need to create a Java implementation that handles the 'Provenance' predicate (soon to move to the SLSA repo), in an in-toto Statement (this repo), wrapped in a DSSE envelope.

The existing in-toto-java repo seems to handle the existing link format (but not DSSE, etc...).

FYI @Alos who is looking into implementation options.

Create an index of predicates

Related to #54

With the provenance predicate moved to the SLSA repository, there was talk of creating an index of predicates that we can link to here. While we have few predicates now, it's worth discussing how this index should work, along with the formal process for adding predicates to the index and keeping them updated long term. What sort of review mechanisms do you envision for predicates indexed in the in-toto/attestation repository?

cc @MarkLodato @TomHennen @SantiagoTorres @joshuagl anyone I missed?

Consider adding a "digest kind" and/or "content type" to subject

Currently, the attestation subject is a pure content digest. There are two related pieces of information we may want to consider adding:

  1. "digest kind": how to serialize the artifact into the hashing algorithm. Examples:
    • PE file: straight hash or the hash used for Authenticode?
    • JAR: straight hash or the hash used for JAR signing?
    • Git: commit ID or tag ID (if an annotated tag)?
  2. "content type": how the content is intended to be interpreted. Examples:
    • Docker image
    • Git commit
    • JPEG file
    • ZIP file

The main question is: Are there any compelling security or implementation reasons why we need this? If not, we leave it out to simplify things. If there are, then we need to come up with a design that overcomes the challenges below.

Background

Originally we had included something like this into the property name of DigestSet, e.g. "subject": {"gitCommit": "<sha1>"}. If you used "sha256", it meant straight file hash (no particular content type), but specific content types would use a digest kind appropriate for that type. Ultimately we decided against that because it added complexity without an obvious benefit.

Prior art: Rekor has a type registry that appears to be the same as "digest kind".

Potential benefits

  1. "Digest Kind" could tell verifiers how to compute the hash.

    • Counterpoint: They can just compute it in some canonical way or in multiple ways, without having to record it.
    • Counterpoint: In many applications, the hashing is done prior to reading the attestation, so having this information wouldn't help.
  2. There could be security vulnerabilities if a producer intended one kind/type and a consumer interpreted it as another. For example, the producer signed a PE Authenticode hash, but then the consumer interpreted it as a raw file hash.

    • Counterpoint: This is too abstract to warrant the cost. We need some sort of proof-of-concept to show that this would be exploitable in a somewhat realistic scenario.

Challenges

  1. Do you include the kind/type in the matching? For example, if one Provenance attestation lists a material sha256: X, type: Y and another attestation has subject: X, type: Z, does it match?

  2. What if the producer and consumer don't agree on the content type, or don't know the content type? For example, if a build system just produces files, it might not know that it is a docker image vs a zip file vs something else. Adding that configuration would add a lot of friction and room for mistakes. And if producer and consumer disagree, is that always a security issue? What if one thinks it's application/json and the other thinks it is application/vnd.oci.image.manifest.v1+json?

  3. How do you register these various digest kinds? If someone has a new one, how do they use it? What if that new type is private, e.g. a company-internal format?

    • If we use content type (which uses MediaType), do we have some mapping from media type to digest kind? What if that's wrong or incomplete?
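Challenge 1 can be made concrete with a small sketch. Everything here is illustrative: the `type` key and the matching semantics are exactly what this issue leaves undecided.

```python
# Sketch of Challenge 1: does the kind/type participate in matching?
# The "type" key and both policies shown are illustrative only.

def matches(ref_a, ref_b, require_same_type=True):
    """Match two {algo: digest, "type": ...} references."""
    same_digest = any(
        ref_a.get(algo) == digest
        for algo, digest in ref_b.items() if algo != "type"
    )
    if not same_digest:
        return False
    if require_same_type:
        return ref_a.get("type") == ref_b.get("type")
    return True

material = {"sha256": "aabb", "type": "Y"}
subject  = {"sha256": "aabb", "type": "Z"}
print(matches(material, subject))                           # False: types differ
print(matches(material, subject, require_same_type=False))  # True: digests match
```

Whichever policy is chosen, every verifier would have to implement it identically, which is part of the cost being weighed here.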

Provide guidance on level of granularity for `recipe`

We need to provide guidance on how granular a recipe should be because the question has already come up. Most builds end up executing a series of steps. We want to discourage listing every single step that was run because it makes writing policies too difficult (duplication between policy and workflow definition, and the need to chain ephemeral artifacts between steps). At the same time, if the recipe is too coarse, it can lose valuable security information, such as the distinction between a "prod" and a "test" build. This is a bit of a judgement call.

Proposed guidance: a recipe SHOULD be the smallest unit of work that a policy would reasonably want to identify. In a CI/CD scenario, each execution SHOULD be independent: the execution of one recipe SHOULD NOT affect the starting state of the next execution.

Example: A GitHub Actions Workflow has three levels: workflow → job → step. A yaml file defines a workflow, which may contain multiple jobs, each of which may contain multiple steps. Jobs are independent, meaning that each is run within a fresh VM. Steps are dependent, meaning that each step uses the state of the previous step. Therefore, the right level of granularity is the "job" and the recipe.entryPoint is ":".
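A recipe recorded at the proposed job granularity might look roughly like this. This is a sketch: the `type` URI and the `entryPoint` format shown are assumptions, not settled spec.

```python
# Sketch of a recipe recorded at GitHub Actions *job* granularity, per
# the proposed guidance. The type URI and entryPoint format shown here
# are hypothetical.
recipe = {
    "type": "https://example.com/GitHubActionsWorkflow@v1",  # hypothetical
    "definedInMaterial": 0,             # index of the material holding the yaml
    "entryPoint": "build.yml:release",  # hypothetical "<workflow>:<job>" form
    # One recipe per job, not per step: jobs run in fresh VMs and are
    # independent; steps share state and are too fine-grained for policy.
}
print(recipe["entryPoint"].split(":"))  # ['build.yml', 'release']
```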

Add support for Cyclonedx as a predicate type

in-toto attestations currently document the SPDX predicate type. SBOMs, and BOMs in general, are a diverse space, and CycloneDX is the other leading industry alternative to SPDX, recognized by the NTIA as an SBOM format.

CycloneDX supports other capabilities apart from SBOMs. A particularly interesting one is the VEX capability, which introduces a standard format for attaching vulnerability information.

in-toto should document and introduce well-defined predicate types for the various CycloneDX BOM formats (not just SBOM).
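Such a predicate might look roughly like the sketch below. The predicateType URI is hypothetical (an official one is exactly what this issue proposes), and the BOM fields are a minimal CycloneDX-shaped example.

```python
# Sketch: wrapping a CycloneDX-style BOM as an in-toto predicate.
# The predicateType URI is hypothetical -- defining an official one is
# what this issue proposes. BOM content is a minimal dummy example.
statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{"name": "app", "digest": {"sha256": "aabbcc"}}],
    "predicateType": "https://example.com/cyclonedx-bom",  # hypothetical
    "predicate": {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": [
            {"type": "library", "name": "libexample", "version": "1.2.3"}
        ],
    },
}
print(statement["predicate"]["bomFormat"])  # CycloneDX
```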

cc: @stevespringett @coderpatros

Add ability to differentiate between different types of `materials`

Currently all materials are lumped together (except recipe.definedInMaterial). It would be useful to be able to differentiate between different types of materials. We can probably look to SPDX Relationships for prior art and inspiration.

Example classifications:

  • "source" = application-specific code
  • "library" / "dependency" = code that was compiled in but part of another project
  • "build tool" / "dev dependency" = thing that was used as part of the build invocation but did not get "compiled in"
  • "base image" = starting point for the build invocation
  • "build orchestrator" = thing that ran on the build orchestrator (not sure if this is even in scope; see #25)

Example use cases:

  • Supply chain integrity / SLSA: better prioritize the "most important" materials.
  • Licensing: identify how the license of the material transfers to the product.
  • Vulnerability tracing: better identify which materials are likely to have affected the product.

Currently this can be done ad-hoc using extension fields, but it is probably valuable to standardize this. The main challenge is coming up with something that works well for most cases and can be done in practice by generic builders like GitHub Actions and Google Cloud Build.
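One possible shape is a `tags` list on each material entry. This is a sketch of the idea, not a standardized design: the tag vocabulary and the second material entry are illustrative.

```python
# Hypothetical sketch: classifying materials with a "tags" field.
# The tag vocabulary and the second entry are illustrative only.
materials = [
    {"uri": "git+https://github.com/curl/curl-docker@master",
     "digest": {"sha1": "d6525c840a62b398424a78d792f457477135d0cf"},
     "tags": ["source"]},
    {"uri": "pkg:deb/debian/gcc@10.2.1",
     "digest": {"sha256": "ddeeff"},
     "tags": ["build tool"]},
]

def materials_tagged(materials, tag):
    """E.g. prioritize the "most important" materials for SLSA, or scope
    license/vulnerability analysis to compiled-in code."""
    return [m["uri"] for m in materials if tag in m.get("tags", [])]

print(materials_tagged(materials, "source"))
# ['git+https://github.com/curl/curl-docker@master']
```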

How do attestations change what verifiers are expected to support?

An issue raised in the community meeting today is that it's not yet entirely clear how ITE-6 (aka attestations) changes what verifiers are expected to support by default.

For example, Santiago suggested that verifiers should support classic link "attestations" by default, and the community should decide how to add support for the rest.

Explain why we recommend PURL and SPDX Download Location

Right now the docs simply recommend PURL and SPDX Download Location without explaining why. We have had a question for the rationale behind this suggestion.

Draft:

  • SPDX Download Location covers version control systems. There is no standard way to reference a git repo via a URI (and identify that it is git). SPDX Download Location is one of the most popular ways of doing so, so we chose it.
  • PURL covers most packaging ecosystems. If one is missing, you can add it. It is really the only universal option. I haven't found any other scheme that does this.
  • Both SPDX and PURL are used extensively within the SBOM ecosystem, which fits nicely with attestations and SLSA.
  • Regular https can be used when it's literally just fetching a file with a GET request.

You can use a different URI scheme if needed. For example, within Google we'll use some internal URIs for systems that are not public. But on the internet, I suspect PURL and SPDX will cover most cases.
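To make the three cases concrete, here are illustrative URIs of each kind (the specific package versions are examples only), plus a trivial classifier:

```python
# Illustrative URIs of each recommended kind (values are examples only).
uris = [
    # SPDX Download Location: names the VCS *and* the scheme (git here).
    "git+https://github.com/curl/curl-docker@master",
    # PURL: one universal scheme covering most packaging ecosystems.
    "pkg:npm/lodash@4.17.21",
    "pkg:pypi/requests@2.28.1",
    # Plain https: literally just fetching a file with a GET request.
    "https://example.com/artifact.tar.gz",
]

def uri_kind(uri):
    if uri.startswith("pkg:"):
        return "purl"
    if "+" in uri.split("://")[0]:
        return "spdx-download-location"
    return "plain"

print([uri_kind(u) for u in uris])
# ['spdx-download-location', 'purl', 'purl', 'plain']
```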

Move Provenance to SLSA repo

The current Provenance predicate is described as a generic way to express provenance, but it was designed expressly for SLSA. It makes certain assumptions and trade-offs, such as carefully designing the fields to avoid mistakes when applying a SLSA policy. Other use cases of "provenance" may make different trade-offs, such as including the list of build steps that were performed to allow policies to detect `curl | bash`, which for SLSA is unnecessary and may lead to confusion.

To avoid these issues, it might be best to move all of the predicates out of this repo and instead maintain an index of links.

  • Provenance -> SLSA
  • Link -> in-toto (a different repo)
  • SPDX -> something maintained by SPDX team

That would make it more clear that (a) other definitions of "provenance" are OK for different use cases, and (b) not all predicates need to be defined in this repo.

Any thoughts? cc @adityasaky @TomHennen @dlorenc

Provenance: add ability to add extensions?

We almost definitely will not get the Provenance schema right on the first try. When there is some new information that producers want to add to the provenance, it would be nice if they had a way to do so without having to either fork the spec or wait for a new version that adds that feature.

A few ideas:

  • Not allowed.
  • Allow additional fields to be added to any object, with any name.
    • Example: {"materials": [{..., "someNewField": "yay!"}]}
    • Pro: Simple.
    • Con: Possibility of name clash, where two producers use the same name but with different meaning.
    • Con: Precludes the "minor version" idea from #4, since a field of that name may be added in a future version.
  • Allow additional fields to be added to any object, with constraints:
    • Field name MUST be a URI (to address the two cons from above).
    • The meaning of all other fields MUST be unchanged if that field is ignored. For example, if you add a {"recipe": {..., "https://example.com/foo": true}}, then it should be perfectly safe for a consumer to ignore that new field.
  • Add an extensions field to each object, with basically the same constraints as above.

I'm leaning towards the second approach (URI fields) but would be happy to hear opinions.
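The URI-fields approach can be sketched from the consumer's side: unknown URI-named fields are simply dropped, and by the constraint above, the remaining fields mean exactly the same thing. The extension field name here is the example URI from above; the schema-field set is illustrative.

```python
# Sketch of the URI-named extension fields approach: a consumer that
# does not understand an extension safely ignores it. The extension
# name and schema-field set below are illustrative.

def known_fields(obj, schema_fields):
    """Drop extension fields; per the constraint, the remaining fields'
    meaning must be unchanged when extensions are ignored."""
    return {k: v for k, v in obj.items() if k in schema_fields}

recipe = {
    "type": "https://example.com/RecipeType",
    "entryPoint": "build",
    "https://example.com/foo": True,   # URI-named extension field
}

core = known_fields(
    recipe,
    {"type", "definedInMaterial", "entryPoint", "arguments", "environment"},
)
print(core)
# {'type': 'https://example.com/RecipeType', 'entryPoint': 'build'}
```

Because field names are URIs, two producers cannot accidentally clash, and no future spec version will ever define a field whose name is someone's URI — which is what rescues the "minor version" idea from #4.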

Timestamp somewhere?

We've been discussing support for vulnerability scans as a type of "attestation" over here: sigstore/cosign#442

and it's clear that these will need some form of timestamp to work correctly. A vulnerability scan is timely, and should only be considered valid for a specific period of time after it is generated. This also helps align with the principle of "monotonicity", where the absence of an attestation should never move a decision from DISALLOW to ALLOW.

This could be done with a timestamp inside a custom scan predicate, but it might also be useful to place this at the statement layer. I'm not convinced either way yet.
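A sketch of the predicate-level variant: a scan predicate carrying a timestamp, checked against a validity window at decision time. The field names (`scanFinishedOn`, `scanner`) and the 7-day window are assumptions for illustration.

```python
# Sketch: a scan predicate with a timestamp, checked against a validity
# window at decision time. Field names and the window are illustrative.
from datetime import datetime, timedelta, timezone

def scan_is_fresh(predicate, max_age=timedelta(days=7), now=None):
    """Monotonicity: a missing or stale scan yields DISALLOW, never ALLOW."""
    now = now or datetime.now(timezone.utc)
    scanned_at = datetime.fromisoformat(predicate["scanFinishedOn"])
    return now - scanned_at <= max_age

predicate = {
    "scanner": "example-scanner",
    "scanFinishedOn": "2021-06-01T12:00:00+00:00",
}
print(scan_is_fresh(predicate, now=datetime(2021, 6, 3, tzinfo=timezone.utc)))  # True
print(scan_is_fresh(predicate, now=datetime(2021, 7, 1, tzinfo=timezone.utc)))  # False
```

Putting the timestamp at the statement layer instead would give every predicate type this property for free, at the cost of forcing a single notion of "validity" on all of them.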

cc @joshuagl @SantiagoTorres

Add complete examples

Add complete examples that show all the steps, including the crypto (with dummy public keys checked in).
