guacsec / guac
GUAC aggregates software security metadata into a high fidelity graph database.
Home Page: https://guac.sh
License: Apache License 2.0
Bots can help with various tasks, and it would be useful to set up a few for this project/organization:
I think we should consider adding at least these 3 bots.
Welcome! This thread is on expressing interest in contributing to GUAC! We are glad to welcome our fellow open source contributors! As the project is starting up, we will be creating issues that folks can pick up and work on. In the meantime, as the code base is forming up, we'd like to engage directly with our contributors!
BTW we now have a slack channel: https://openssf.slack.com/archives/C03U677QD46
If you are interested in contributing, it would be very helpful to provide the following details (copy and paste into your comment):
1. I am interested in contributing to:
- [ ] Development
- [ ] Documentation
- [ ] Issue triage and community
- [ ] Technical advisory (review [governance document](https://github.com/artifact-ff/artifact-ff/blob/main/GOVERNANCE.md#technical-advisory-members))
2. I am here because:
- [ ] Personal interest
- [ ] My company/orgs I work with are interested in this
3. What is your associated company/org if you're contributing in their capacity? _________
4. Depending on how things go, I may be interested in becoming a maintainer of the project
- [ ] Yes
5. (optional) I have expertise in:
- [ ] Neo4j
- [ ] Cypher
- [ ] GraphQL
- [ ] Intoto
- [ ] SPDX
- [ ] CycloneDX
- [ ] Others (fill in):
Create a makefile with make check to check golang compilation, testing and linting
Write a collector to ingest deps.dev bigquery data https://deps.dev/data
The current SLSA parser emits digest strings with additional quotes:
{
"identity": 568482,
"labels": [
"Artifact"
],
"properties": {
"name": "git+https://github.com/kubernetes/kubernetes",
"digest": "sha1:'3c7da84d8fc03c30d3409e9c846ae4bc2de0b4d5'"
}
}
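One way to address the stray quotes is to trim them from the value part of the digest before the node is created. The following is a minimal sketch, not the parser's actual code: `cleanDigest` is a hypothetical helper name, and it assumes digests follow the `algorithm:value` shape shown above.

```go
package main

import "strings"

// cleanDigest removes stray single or double quotes that wrap the value part
// of an "algorithm:value" digest string. It is a hypothetical helper used for
// illustration only and is not part of the GUAC code base.
func cleanDigest(digest string) string {
	algo, value, found := strings.Cut(digest, ":")
	if !found {
		// No algorithm prefix: just trim quotes around the whole string.
		return strings.Trim(digest, `'"`)
	}
	return algo + ":" + strings.Trim(value, `'"`)
}
```

With this helper, the digest from the node above would become `sha1:3c7da84d8fc03c30d3409e9c846ae4bc2de0b4d5`, and digests that are already clean pass through unchanged.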
As part of #26, we want to be able to create guessers to identify what format type a document is (using the guesser interface defined in https://github.com/guacsec/guac/tree/main/pkg/handler/processor/guesser), based on the foundations laid in #27.
This issue is to create an ITE6 document guesser.
As we get more and more contributors coming in, we want to make sure that there are some contribution guides to help, and make sure that the modes of communication are available (i.e. ensure mailing list works)
Collectors that obtain documents need somewhere to emit them to. The processor, which is the next part of the pipeline, needs to gather the documents and process them.
There are a couple options naturally:
This boils down to how we want collectors and processors to be run in the architecture. The ingestor will most likely be tied to the assembler.
Deliberation:
We agreed that in the long term, the ingestor would need to have a way to communicate information up/down the tree in order to make edges and annotations between the elements of each node in the document tree.
However, to get started with an e2e poc, we decided to defer the implementation of the recursive processing model.
Relevant Conversation:
#39 (comment)
#39 (comment)
#39 (comment)
Update the Verifier interface to add the Key Wrapper and register new providers.
Add a performance warning to the README noting that the current proof of concept does not include neo4j optimizations and may see degraded performance. Create a separate PERFORMANCE.md file with some ideas for improving performance in the meantime.
Define an initial set of GuacNodes and GuacEdges to satisfy the basic example of SLSA attestations based on @mihaimaruseac 's script and @mlieberman85 's data.
Added a sigstore verifier to validate the signatures based on the public keys
Support the ingestion of SPDX documents.
Implement:
The identity for an edge should apply to almost any type of document/node, and thus should be definable on any GuacNode. This should be done along with any other cleanup required around identity for graphBuilder.
As part of #26, we want to be able to create guessers to identify what format type a document is (using the guesser interface defined in https://github.com/guacsec/guac/tree/main/pkg/handler/processor/guesser), based on the foundations laid in #27.
This issue is to create a DSSE document guesser.
Some entities may have multiple identifiers. Let's figure out what's the best way to handle them, especially for merging nodes and insertion of new edges/relating new information. Another tricky question also revolves around empty identifier fields and possible lists of identifiers vs having multiple nodes.
Currently the parser tests (in pkg/ingestor/parser) do not test GetIdentities.
EDIT: upon reading the tests more, it seems like it just moves a lot of the test case checks outside the test definition
The SLSA parser tests should specify expected edges and nodes within the test itself rather than having it just be purely part of the body (explicitly being linked to a test case)
Provide a guacone subcommand (or include it as part of the files command) that creates the indices to help performance.
Move the channel logic into Collect and hide all this channel stuff? (reference discussion: https://github.com/guacsec/guac/pull/23/files#r953929317)
Make it Collect(ctx context.Context, emitter processor.Emitter, handleErr collector.ErrHandler), with:
type ErrHandler func(error) bool
type Emitter func(*processor.Document) error
Scorecards information is useful for reasoning about source repositories; it would be great to integrate it into the GUAC data flow.
Create a collector that will read from a file path to import a bunch of documents.
https://github.com/guacsec/guac/blob/main/pkg/handler/collector/collector.go
For a simple test reference; see https://github.com/guacsec/guac/blob/main/cmd/collector/cmd/mockcollector/mock_collector.go
Create an end to end command line tool to take in a folder of documents and populate a graph for debugging and to show end to end flow.
Right now, there is a bit of difference between the CI and makefile
Identities should be considered separate from any given key material, as it's potentially a many-to-many situation. One identity might have multiple keys, and one key might be associated with multiple identities.
Note: Identity in this context is still abstract. It should not be tied back to a specific person, especially anonymous/pseudonymous folks. The primary goal is to associate keys with identities tied to a project, most likely organizations or known maintainers.
Docs:
Create parser interface and make it a plugin model similar to the collector and processor.
Certain document type importers may not have sufficient heuristics to determine if a document is indeed the type guessed. For example, if the fields in the JSON are optional for that document type then it may mistake any JSON document as its document type. (This happens in certain cases in SPDX thus the requirement to check for existence of field).
We should write a unit test to make sure that no other document guesser misguesses a document.
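A cross-check of this kind could look roughly like the sketch below: every sample document is fed to every guesser, and any guesser that claims a sample of a different type (or rejects its own) is reported. The `guessFunc` signature, the two example guessers, and the field checks inside them are simplified assumptions for illustration and do not reflect the actual guesser interface in GUAC.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// guessFunc is a simplified, hypothetical guesser: it reports whether the
// blob looks like the document type the guesser is responsible for.
type guessFunc func(blob []byte) bool

// guessSPDX requires fields to exist rather than accepting any JSON document.
func guessSPDX(blob []byte) bool {
	var doc struct {
		SPDXVersion string `json:"spdxVersion"`
		SPDXID      string `json:"SPDXID"`
	}
	if err := json.Unmarshal(blob, &doc); err != nil {
		return false
	}
	return strings.HasPrefix(doc.SPDXVersion, "SPDX-") && doc.SPDXID != ""
}

// guessITE6 checks for the in-toto statement type marker.
func guessITE6(blob []byte) bool {
	var doc struct {
		Type string `json:"_type"`
	}
	if err := json.Unmarshal(blob, &doc); err != nil {
		return false
	}
	return strings.HasPrefix(doc.Type, "https://in-toto.io/Statement/")
}

// misguesses runs every guesser against every sample. Both maps are keyed by
// document type name; it returns a sorted list of problems found.
func misguesses(guessers map[string]guessFunc, samples map[string][]byte) []string {
	var problems []string
	for wantType, blob := range samples {
		for name, guess := range guessers {
			got := guess(blob)
			switch {
			case name == wantType && !got:
				problems = append(problems, fmt.Sprintf("%s guesser rejected its own sample", name))
			case name != wantType && got:
				problems = append(problems, fmt.Sprintf("%s guesser claimed a %s sample", name, wantType))
			}
		}
	}
	sort.Strings(problems)
	return problems
}
```

In a real test, the list returned by `misguesses` would be expected to be empty, and each new guesser would add one entry to both maps so that it is automatically checked against all existing sample documents.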
We need to come up with a design for how queries will happen, along with examples of the interfaces that would be required. Example queries are those in the use case section of the GUAC design doc.
Create a graph DB package that creates an instance of the neo4j/cypher driver to talk to the graph. No need for pluggability of the graph DB for now, since we currently do not foresee supporting additional graph DBs in the near future.
Create a DocumentUnknown pre-processor that takes in a document blob and guesses the format and document type between each iteration of the processor.
i.e. given a Document with a blob, tell me what the type and the format are
#27 added the initial foundations
TODO
Create collector interface to be able to emit lists of documents to be processed by the docprocessor. (#16)
Right now, syft isn't putting the top level package as SPDX objects
I think for now we can add a PURL OCI reference type by heuristics based on the name in the document. But I'll open an issue in Syft to include this as well (anchore/syft#1241).
The checksum is not currently stored, but would be good to also include "name" as the package ref
{
"SPDXID": "SPDXRef-DOCUMENT",
"name": "gcr.io/google-containers/kube-addon-manager-v8.9",
"spdxVersion": "SPDX-2.2",
"creationInfo": {
"created": "2022-10-03T14:41:17.720701835Z",
"creators": [
"Organization: Anchore, Inc",
"Tool: syft-0.58.0"
],
The in-memory key provider is an implementation of the key interface. It will store keys in memory.
Create a SLSA document ingestor based on @mihaimaruseac 's script that takes in SLSA documents as a list of Documents and outputs GuacNodes and GuacEdges. Depends on #11.
For logging, the logrus library is being used. However, it is no longer actively developed:
Logrus is in maintenance-mode. We will not be introducing new features. It's simply too hard to do in a way that won't break many people's projects, which is the last thing you want from your Logging library (again...).
They recommend using other libraries:
SLSA level 3 attestations should contain source information. This information can be included in the graph, which will help link to other data sources (e.g. scorecards).
SLSA parser crashes with multiple subjects or multiple hashes
Multiple digests error:
SLSA multiple digests example:
"subject": [
{
"name": "gs://kubernetes-release/release/v1.25.2/bin/linux/arm64/kube-apiserver",
"digest": {
"sha256": "5522c9bcd76863fa24a658d9faeb6fa2ca999d022806e301e922efca747043f6",
"sha512": "aa989e60525ac208bc1a7469b486eecb02bf4e7ceb3530c97bae5e0cbc8d4361ce040a8899fa7d9eb56f573fdfc605325e4fcaf956f5efa930cf1a52cb5ebb10"
}
}
],
Error:
panic: runtime error: index out of range [342] with length 342
goroutine 1 [running]:
github.com/guacsec/guac/pkg/assembler.StoreGraph({{0xc00017c800, 0x156, 0x180}, {0xc000372000, 0x3f9, 0x400}}, {0x1d75098?, 0xc0000c6f20?})
/Users/lumb/go/src/github.com/guacsec/guac/pkg/assembler/graphdb.go:62 +0xa1d
github.com/guacsec/guac/cmd/guacone/cmd.getAssembler.func1({0xc0005a7a70?, 0x1?, 0xc00019e540?})
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:178 +0xc5
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1.1(0xc00019f020)
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:109 +0x13b
github.com/guacsec/guac/pkg/handler/collector.Collect({0x1d73be8?, 0xc0008001e0}, 0xc00035dcc8, 0xc00035dc58)
/Users/lumb/go/src/github.com/guacsec/guac/pkg/handler/collector/collector.go:84 +0x2f0
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1(0x2488a80?, {0xc000800180, 0x1, 0x3})
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:125 +0x56a
github.com/spf13/cobra.(*Command).execute(0x2488a80, {0xc000800120, 0x3, 0x3})
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x2488d00)
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:918
github.com/guacsec/guac/cmd/guacone/cmd.Execute()
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/root.go:35 +0x25
main.main()
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/main.go:23 +0x17
Multiple Subjects error:
SLSA subject section example:
"subject": [
{
"name": "gs://kubernetes-release/release/v1.25.2/bin/windows/amd64/kubectl-convert.exe",
"digest": {
"sha512": "aa989e60525ac208bc1a7469b486eecb02bf4e7ceb3530c97bae5e0cbc8d4361ce040a8899fa7d9eb56f573fdfc605325e4fcaf956f5efa930cf1a52cb5ebb10"
}
},
{
"name": "gs://kubernetes-release/release/v1.25.2/bin/linux/arm64/kube-apiserver",
"digest": {
"sha256": "5522c9bcd76863fa24a658d9faeb6fa2ca999d022806e301e922efca747043f6"
}
}
],
panic: runtime error: index out of range [5] with length 5
goroutine 1 [running]:
github.com/guacsec/guac/pkg/assembler.StoreGraph({{0xc00003c0f0, 0x5, 0x5}, {0xc0001049c0, 0x6, 0x6}}, {0x1d75098?, 0xc000486f20?})
/Users/lumb/go/src/github.com/guacsec/guac/pkg/assembler/graphdb.go:62 +0xa1d
github.com/guacsec/guac/cmd/guacone/cmd.getAssembler.func1({0xc00030ce70?, 0x1?, 0xc0001046c0?})
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:178 +0xc5
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1.1(0xc000104780)
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:109 +0x13b
github.com/guacsec/guac/pkg/handler/collector.Collect({0x1d73be8?, 0xc00010fb30}, 0xc00063fcc8, 0xc00063fc58)
/Users/lumb/go/src/github.com/guacsec/guac/pkg/handler/collector/collector.go:84 +0x2f0
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1(0x2488a80?, {0xc00010fad0, 0x1, 0x3})
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:125 +0x56a
github.com/spf13/cobra.(*Command).execute(0x2488a80, {0xc00010fa70, 0x3, 0x3})
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x2488d00)
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:918
github.com/guacsec/guac/cmd/guacone/cmd.Execute()
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/root.go:35 +0x25
main.main()
/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/main.go:23 +0x17
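The traces above do not show why the index goes out of range in StoreGraph, so the following does not diagnose the panic. It is only a sketch of one way a parser could handle the two inputs shown: walk every subject and every digest, and emit one artifact per (subject, digest) pair. The `subject` and `artifact` types and the function name are hypothetical, not GUAC's actual types.

```go
package main

import (
	"encoding/json"
	"sort"
)

// subject mirrors one entry of the "subject" array shown above.
type subject struct {
	Name   string            `json:"name"`
	Digest map[string]string `json:"digest"`
}

// artifact is a hypothetical stand-in for an artifact node.
type artifact struct {
	Name   string
	Digest string // "algorithm:value"
}

// artifactsFromSubjects returns one artifact per (subject, digest) pair, so
// neither multiple subjects nor multiple digests are dropped or collapsed.
func artifactsFromSubjects(raw []byte) ([]artifact, error) {
	var stmt struct {
		Subject []subject `json:"subject"`
	}
	if err := json.Unmarshal(raw, &stmt); err != nil {
		return nil, err
	}

	var out []artifact
	for _, s := range stmt.Subject {
		// Sort the algorithm names so the output order is deterministic.
		algos := make([]string, 0, len(s.Digest))
		for algo := range s.Digest {
			algos = append(algos, algo)
		}
		sort.Strings(algos)

		for _, algo := range algos {
			out = append(out, artifact{Name: s.Name, Digest: algo + ":" + s.Digest[algo]})
		}
	}
	return out, nil
}
```

Whether several digests of the same subject should become separate nodes or a single node with multiple identifiers is a design question this sketch does not settle; it only shows that all entries can be visited without indexing assumptions.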
Add a tracing monitor (such as Jaeger) so that we can collect metrics for troubleshooting and track the time taken by each action.
In certain cases, there may be slight variations in identifiers, for example a digest being "sha256:abc..." vs "SHA256:abc...". This should be handled so that nodes with a common identifier are not duplicated.
This can probably be done easily with the Properties function (https://github.com/guacsec/guac/blob/main/pkg/assembler/nodes.go#L56)
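As an illustration of the normalization step, the sketch below lowercases both the algorithm and the value of a digest before it is used as an identifier. `normalizeDigest` is a hypothetical helper, and it assumes hex-encoded digest values; lowercasing would corrupt a case-sensitive encoding such as base64.

```go
package main

import "strings"

// normalizeDigest lowercases an "algorithm:value" digest so that
// "SHA256:ABC..." and "sha256:abc..." map to the same identifier.
// Hypothetical helper; assumes the value is hex-encoded.
func normalizeDigest(digest string) string {
	digest = strings.TrimSpace(digest)
	algo, value, found := strings.Cut(digest, ":")
	if !found {
		return strings.ToLower(digest)
	}
	return strings.ToLower(algo) + ":" + strings.ToLower(value)
}
```

Applying this wherever a node's properties are built would make both spellings in the example above resolve to `sha256:abc...`.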
Create a Key interface that will be used to implement various key providers
As part of #26, we want to be able to create guessers to identify what format type a document is (using the guesser interface defined in https://github.com/guacsec/guac/tree/main/pkg/handler/processor/guesser), based on the foundations laid in #27.
This issue is to create a SLSA document guesser.
Add support to ingest CycloneDX documents
After the DSSE processor has completed, the ITE6 processor runs next, determines whether the predicate type is SLSA, and unpacks it.
Implement a Rekor collector.
Pointers:
Create a new processor to unpack JSON lines, including creating a new format called FormatJSONLines and document type DocumentJSONLines
JSON lines: https://jsonlines.org/
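The core of such a processor is splitting a blob into one JSON value per line. Below is a minimal sketch of that step using only the standard library; `splitJSONLines` is a hypothetical name, and the choices to skip blank lines and to cap lines at 16 MiB are assumptions for the example, not requirements stated in the issue.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// splitJSONLines splits a JSON Lines blob into its individual JSON values.
// Blank lines are skipped; any other line that is not valid JSON is an error.
func splitJSONLines(blob []byte) ([][]byte, error) {
	scanner := bufio.NewScanner(bytes.NewReader(blob))
	// The default Scanner limit is 64 KiB per line; raise it for large documents.
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)

	var docs [][]byte
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := bytes.TrimSpace(scanner.Bytes())
		if len(line) == 0 {
			continue
		}
		if !json.Valid(line) {
			return nil, fmt.Errorf("line %d: invalid JSON", lineNo)
		}
		// Copy the line: the Scanner reuses its buffer on the next Scan call.
		docs = append(docs, append([]byte(nil), line...))
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	return docs, nil
}
```

Each returned slice could then be wrapped as a child document of unknown type, so that the guessers decide what each line contains.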
Simplify processor interface to remove trust info and validation and output a document tree instead.