guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.

Home Page: https://guac.sh

License: Apache License 2.0

Go 99.57% Nix 0.01% Makefile 0.28% Shell 0.13% Starlark 0.01%
security software-supply-chain software-supply-chain-security supply-chain supply-chain-security supply-chain-visibility supply-chain-analytics

guac's Introduction

GUAC: Graph for Understanding Artifact Composition

Note: GUAC is under active development. If you are interested in contributing, please look at the contributor guide.

Graph for Understanding Artifact Composition (GUAC) aggregates software security metadata into a high fidelity graph database—normalizing entity identities and mapping standard relationships between them. Querying this graph can drive higher-level organizational outcomes such as audit, policy, risk management, and even developer assistance.

Conceptually, GUAC occupies the “aggregation and synthesis” layer of the software supply chain transparency logical model:

[Image: software supply chain transparency logical model]

A few examples of questions answered by GUAC include:

[Image: examples of questions answered by GUAC]

Quickstart

Our documentation is a good place to get started.

We have various demo use cases that you can take a look at.

Start the GUAC services with our docker compose quickstart.

Docs

All documentation for GUAC lives on docs.guac.sh, backed by the docs GitHub repository.

Architecture

Here is an overview of the architecture of GUAC:

[Image: guac_api architecture diagram]

For an in-depth view and explanation of components of the GUAC Beta, please refer to how GUAC works.

Supported input documents

Note that GUAC uses software identifier standards to help link metadata together. However, these identifiers are not always available, and heuristics need to be used to link them. Therefore, there may be unhandled edge cases and errors when ingesting data. We would appreciate it if you could create a data quality issue if you encounter any errors or bugs with ingestion.

GraphQL backends

GUAC supports multiple backends behind a software abstraction layer. The GraphQL API is always the same and clients should be unaffected by which backend is in use. The backends are categorized into:

  1. Supported/Unsupported: Supported backends are those which the GUAC project is committed to actively maintain. Unsupported backends are not actively maintained but will accept community contributions.

  2. Complete/Incomplete: Complete backends support all mandatory GraphQL APIs. Incomplete backends support a subset of those APIs and may not be feature complete.

  3. Optimized: The backend has gone through a level of optimization to help improve performance.

The two backends that are Supported, Complete, and Optimized are:

The other backends are:

Additional References

Communication

For more information on how to get involved in the community, mailing lists and meetings, please refer to our community page

For security issues or code of conduct concerns, an e-mail should be sent to [email protected].

Governance

Information about governance can be found here.

guac's Issues

IdentityFor edge should be generic

The IdentityFor edge should apply to almost any type of document/node, and thus should be definable on any GuacNode. This should be done along with any other cleanup required around identity for graphBuilder.

Write document guesser test to make sure that other guessers are not accidentally misguessing another document type

Certain document type importers may not have sufficient heuristics to determine whether a document is indeed the type guessed. For example, if the fields in the JSON are optional for that document type, then it may mistake any JSON document for its own document type. (This happens in certain cases in SPDX, hence the requirement to check for the existence of a field.)

We should write a unit test to make sure that no other document guesser misguesses a document.
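As a rough illustration of the cross-check described above, the following sketch runs every guesser against a sample of every document type and reports any guesser that claims a document belonging to another type. All names here (DocType, guesser, hasFields, misguesses) and the marker fields chosen for each type are hypothetical and are not taken from the GUAC code base.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DocType is a hypothetical label for a guessed document type.
type DocType string

const (
	TypeSPDX DocType = "SPDX"
	TypeSLSA DocType = "SLSA"
)

// guesser reports whether a blob looks like its own document type.
type guesser func(blob []byte) bool

// hasFields builds a guesser that requires every named top-level JSON field
// to exist, so an arbitrary JSON document is not claimed by accident.
func hasFields(fields ...string) guesser {
	return func(blob []byte) bool {
		var doc map[string]json.RawMessage
		if err := json.Unmarshal(blob, &doc); err != nil {
			return false
		}
		for _, f := range fields {
			if _, ok := doc[f]; !ok {
				return false
			}
		}
		return true
	}
}

// guessers maps each type to its guesser. The marker fields are assumptions
// made for this sketch only.
var guessers = map[DocType]guesser{
	TypeSPDX: hasFields("spdxVersion", "SPDXID"),
	TypeSLSA: hasFields("predicateType", "subject"),
}

// misguesses returns every guesser, other than the expected one, that claims the blob.
func misguesses(expected DocType, blob []byte) []DocType {
	var wrong []DocType
	for t, g := range guessers {
		if t != expected && g(blob) {
			wrong = append(wrong, t)
		}
	}
	return wrong
}

func main() {
	samples := map[DocType][]byte{
		TypeSPDX: []byte(`{"spdxVersion":"SPDX-2.2","SPDXID":"SPDXRef-DOCUMENT"}`),
		TypeSLSA: []byte(`{"predicateType":"https://slsa.dev/provenance/v0.2","subject":[]}`),
	}
	for t, blob := range samples {
		fmt.Println(t, "claimed by own guesser:", guessers[t](blob), "misguessed by:", misguesses(t, blob))
	}
}
```

In a real unit test the loop in main would become a table-driven test that fails whenever misguesses returns a non-empty slice.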

SLSA parser crashes on multiple subjects OR multiple hashes

SLSA parser crashes with multiple subjects or multiple hashes

Multiple digests error:
SLSA multiple digests example:

  "subject": [
    {
      "name": "gs://kubernetes-release/release/v1.25.2/bin/linux/arm64/kube-apiserver",
      "digest": {
        "sha256": "5522c9bcd76863fa24a658d9faeb6fa2ca999d022806e301e922efca747043f6",
        "sha512": "aa989e60525ac208bc1a7469b486eecb02bf4e7ceb3530c97bae5e0cbc8d4361ce040a8899fa7d9eb56f573fdfc605325e4fcaf956f5efa930cf1a52cb5ebb10"
      }
    }
  ],

Error:

panic: runtime error: index out of range [342] with length 342

goroutine 1 [running]:
github.com/guacsec/guac/pkg/assembler.StoreGraph({{0xc00017c800, 0x156, 0x180}, {0xc000372000, 0x3f9, 0x400}}, {0x1d75098?, 0xc0000c6f20?})
	/Users/lumb/go/src/github.com/guacsec/guac/pkg/assembler/graphdb.go:62 +0xa1d
github.com/guacsec/guac/cmd/guacone/cmd.getAssembler.func1({0xc0005a7a70?, 0x1?, 0xc00019e540?})
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:178 +0xc5
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1.1(0xc00019f020)
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:109 +0x13b
github.com/guacsec/guac/pkg/handler/collector.Collect({0x1d73be8?, 0xc0008001e0}, 0xc00035dcc8, 0xc00035dc58)
	/Users/lumb/go/src/github.com/guacsec/guac/pkg/handler/collector/collector.go:84 +0x2f0
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1(0x2488a80?, {0xc000800180, 0x1, 0x3})
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:125 +0x56a
github.com/spf13/cobra.(*Command).execute(0x2488a80, {0xc000800120, 0x3, 0x3})
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x2488d00)
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:918
github.com/guacsec/guac/cmd/guacone/cmd.Execute()
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/root.go:35 +0x25
main.main()
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/main.go:23 +0x17

Multiple Subjects error:
SLSA subject section example:

  "subject": [
    {
      "name": "gs://kubernetes-release/release/v1.25.2/bin/windows/amd64/kubectl-convert.exe",
      "digest": {
        "sha512": "aa989e60525ac208bc1a7469b486eecb02bf4e7ceb3530c97bae5e0cbc8d4361ce040a8899fa7d9eb56f573fdfc605325e4fcaf956f5efa930cf1a52cb5ebb10"
      }
    },
    {
      "name": "gs://kubernetes-release/release/v1.25.2/bin/linux/arm64/kube-apiserver",
      "digest": {
        "sha256": "5522c9bcd76863fa24a658d9faeb6fa2ca999d022806e301e922efca747043f6"
      }
    }
  ],

Error:

panic: runtime error: index out of range [5] with length 5

goroutine 1 [running]:
github.com/guacsec/guac/pkg/assembler.StoreGraph({{0xc00003c0f0, 0x5, 0x5}, {0xc0001049c0, 0x6, 0x6}}, {0x1d75098?, 0xc000486f20?})
	/Users/lumb/go/src/github.com/guacsec/guac/pkg/assembler/graphdb.go:62 +0xa1d
github.com/guacsec/guac/cmd/guacone/cmd.getAssembler.func1({0xc00030ce70?, 0x1?, 0xc0001046c0?})
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:178 +0xc5
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1.1(0xc000104780)
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:109 +0x13b
github.com/guacsec/guac/pkg/handler/collector.Collect({0x1d73be8?, 0xc00010fb30}, 0xc00063fcc8, 0xc00063fc58)
	/Users/lumb/go/src/github.com/guacsec/guac/pkg/handler/collector/collector.go:84 +0x2f0
github.com/guacsec/guac/cmd/guacone/cmd.glob..func1(0x2488a80?, {0xc00010fad0, 0x1, 0x3})
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/files.go:125 +0x56a
github.com/spf13/cobra.(*Command).execute(0x2488a80, {0xc00010fa70, 0x3, 0x3})
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0x2488d00)
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/lumb/go/pkg/mod/github.com/spf13/[email protected]/command.go:918
github.com/guacsec/guac/cmd/guacone/cmd.Execute()
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/cmd/root.go:35 +0x25
main.main()
	/Users/lumb/go/src/github.com/guacsec/guac/cmd/guacone/main.go:23 +0x17
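The traces above do not show the root cause inside StoreGraph, so the following is only a sketch of one way a parser could handle the two inputs: it expands every subject and every digest into its own flat entry, one per (subject, algorithm) pair. The subject, artifact, and artifactsFromSubjects names are hypothetical and are not GUAC's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// subject mirrors one entry of the "subject" array in the examples above.
type subject struct {
	Name   string            `json:"name"`
	Digest map[string]string `json:"digest"`
}

// artifact is a hypothetical flattened node: one per (subject, digest algorithm) pair.
type artifact struct {
	Name   string
	Digest string // "<algorithm>:<hex value>"
}

// artifactsFromSubjects expands every subject and every digest it carries,
// instead of assuming exactly one subject with exactly one digest.
func artifactsFromSubjects(raw []byte) ([]artifact, error) {
	var subjects []subject
	if err := json.Unmarshal(raw, &subjects); err != nil {
		return nil, fmt.Errorf("parsing subjects: %w", err)
	}
	var out []artifact
	for _, s := range subjects {
		algs := make([]string, 0, len(s.Digest))
		for alg := range s.Digest {
			algs = append(algs, alg)
		}
		sort.Strings(algs) // map iteration order is random; sort for stable output
		for _, alg := range algs {
			out = append(out, artifact{Name: s.Name, Digest: alg + ":" + s.Digest[alg]})
		}
	}
	return out, nil
}

func main() {
	// Shortened stand-ins for the two cases reported above.
	raw := []byte(`[
	  {"name": "kubectl-convert.exe", "digest": {"sha512": "aa98", "sha256": "5522"}},
	  {"name": "kube-apiserver", "digest": {"sha256": "5522"}}
	]`)
	arts, err := artifactsFromSubjects(raw)
	if err != nil {
		panic(err)
	}
	for _, a := range arts {
		fmt.Println(a.Name, a.Digest)
	}
}
```

Whether each digest should become a separate node or several properties of one node is a modelling decision this sketch does not settle.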

Design and implement full integration with pub/sub for GUAC flow for collector/processor/assembler

Collectors that obtain documents need somewhere to emit them to. The processor, which is the next part of the pipeline, needs to gather the documents and process them.

There are a few natural options:

  1. Processor runs as a gRPC server
  2. Processor obtains documents from a Pub/Sub queue (e.g. kafka, nats.io, etc.)
  3. Processor ingests from STDIN or file
  4. Processor and Collector are part of the same process.

This boils down to how we want collectors and processors to be run in the architecture. The ingestor will most likely be tied to the assembler.

Deliberation:

  • Will all the collectors be run in a single executable? The processor will cache duplicate documents, so it is beneficial to have an n:m relationship (where n>m) between collectors and executables. If the answer is no, this excludes options 3 and 4.
    • I think it is likely that the answer is no, given that collectors need credentials and no single account/team would have all of them.
  • Options 1 and 2 are similar, with a trade-off between simplicity and scale.

task: [assembler] create graphDB package for neo4j

Create a graph DB package that will be used to create an instance of the neo4j/cypher driver to talk to the graph. There is no need for pluggability of the graph DB for now, since we currently do not foresee supporting additional graph DBs in the near future.

SPDX Heuristic for Syft SPDX SBOMs

Right now, Syft isn't emitting the top-level package as an SPDX object.

I think for now we can add a PURL OCI reference type by heuristics based on the name in the document. But I'll open an issue in Syft to include this as well (anchore/syft#1241).

The checksum is not currently stored, but it would be good to also include "name" as the package ref.

{
 "SPDXID": "SPDXRef-DOCUMENT",
 "name": "gcr.io/google-containers/kube-addon-manager-v8.9",
 "spdxVersion": "SPDX-2.2",
 "creationInfo": {
  "created": "2022-10-03T14:41:17.720701835Z",
  "creators": [
   "Organization: Anchore, Inc",
   "Tool: syft-0.58.0"
  ],
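The name-based heuristic could look roughly like the sketch below. It assumes the document name is an image reference of the form registry/repository/image, and it assumes the "oci" PURL layout of pkg:oci/<image>?repository_url=<repository>, with the version (a digest) omitted because the checksum is not available. The function name ociPurlFromName is hypothetical, and the exact PURL shape should be checked against the purl specification before relying on it.

```go
package main

import (
	"fmt"
	"strings"
)

// ociPurlFromName guesses an "oci" PURL from an SPDX document name that looks
// like an image reference. It returns false when the name has no
// registry/repository prefix and so is not obviously an image reference.
func ociPurlFromName(name string) (string, bool) {
	ref := strings.TrimSpace(name)
	// Drop a trailing "@<digest>" if present.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	slash := strings.LastIndex(ref, "/")
	if slash <= 0 || slash == len(ref)-1 {
		return "", false
	}
	// Drop a trailing ":<tag>" on the last path segment (a colon before the
	// last slash would be a registry port and is kept).
	if colon := strings.LastIndex(ref, ":"); colon > slash {
		ref = ref[:colon]
	}
	image := strings.ToLower(ref[slash+1:])
	if image == "" {
		return "", false
	}
	return fmt.Sprintf("pkg:oci/%s?repository_url=%s", image, strings.ToLower(ref)), true
}

func main() {
	purl, ok := ociPurlFromName("gcr.io/google-containers/kube-addon-manager-v8.9")
	fmt.Println(purl, ok)
}
```

Because this is a heuristic, a caller would probably want to record that the reference was inferred and not stated by the SBOM itself.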

SLSA parser digest format contains stray quotes

Current SLSA parser digest string has additional quotes

{
  "identity": 568482,
  "labels": [
    "Artifact"
  ],
  "properties": {
    "name": "git+https://github.com/kubernetes/kubernetes",
    "digest": "sha1:'3c7da84d8fc03c30d3409e9c846ae4bc2de0b4d5'"
  }
}
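The issue does not say where in the parser the quotes are introduced, so the sketch below only shows a defensive normalization step that could run before the digest is stored. normalizeDigest is a hypothetical helper, not an existing GUAC function.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDigest returns "<algorithm>:<value>" in lower case, with surrounding
// whitespace and single or double quotes stripped from both parts.
func normalizeDigest(alg, value string) string {
	clean := func(s string) string {
		return strings.ToLower(strings.Trim(strings.TrimSpace(s), `'"`))
	}
	return clean(alg) + ":" + clean(value)
}

func main() {
	fmt.Println(normalizeDigest("sha1", "'3c7da84d8fc03c30d3409e9c846ae4bc2de0b4d5'"))
	// prints: sha1:3c7da84d8fc03c30d3409e9c846ae4bc2de0b4d5
}
```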

chore: reconcile CI with local makefile

Right now, there is a bit of difference between the CI and makefile

  • some tests require neo4j to run (perhaps tag them to not run locally)
  • Linting rules in makefile and CI are different

task: [processor] create DocumentUnknown pre-processor

Create a DocumentUnknown pre-processor that takes in a document blob and guesses the format and document type between each iteration of the processor.

i.e. given a Document with a blob, tell me what the type and the format is

#27 added the initial foundations
TODO

  • add call from processor
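A minimal sketch of such a pre-processor step follows, assuming a document carries a blob plus format and type fields that start out unknown. The Document struct, the Format constants, and the guessFormat, guessType, and preprocess functions are hypothetical, and only JSON and SPDX are recognized here to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Format is a hypothetical label for the encoding of a blob.
type Format string

const (
	FormatUnknown Format = "UNKNOWN"
	FormatJSON    Format = "JSON"
)

// Document is a hypothetical blob with metadata that may not be known yet.
type Document struct {
	Blob   []byte
	Format Format
	Type   string // "UNKNOWN" until a type guesser recognizes the blob
}

// guessFormat reports JSON when the blob is syntactically valid JSON.
func guessFormat(blob []byte) Format {
	if json.Valid(blob) {
		return FormatJSON
	}
	return FormatUnknown
}

// guessType recognizes SPDX only when both marker fields are present and non-empty.
func guessType(blob []byte, f Format) string {
	if f != FormatJSON {
		return "UNKNOWN"
	}
	var probe struct {
		SPDXVersion string `json:"spdxVersion"`
		SPDXID      string `json:"SPDXID"`
	}
	if json.Unmarshal(blob, &probe) == nil && probe.SPDXVersion != "" && probe.SPDXID != "" {
		return "SPDX"
	}
	return "UNKNOWN"
}

// preprocess fills in whatever is still unknown and leaves known values alone.
func preprocess(doc *Document) {
	if doc.Format == "" || doc.Format == FormatUnknown {
		doc.Format = guessFormat(doc.Blob)
	}
	if doc.Type == "" || doc.Type == "UNKNOWN" {
		doc.Type = guessType(doc.Blob, doc.Format)
	}
}

func main() {
	doc := Document{Blob: []byte(`{"spdxVersion":"SPDX-2.2","SPDXID":"SPDXRef-DOCUMENT"}`)}
	preprocess(&doc)
	fmt.Println(doc.Format, doc.Type)
}
```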

Logging library not being developed anymore

For logging, the logrus library is being used. However, it is not actively developed anymore:

Logrus is in maintenance-mode. We will not be introducing new features. It's simply too hard to do in a way that won't break many people's projects, which is the last thing you want from your Logging library (again...).

They recommend using other libraries:

Check out, for example, Zerolog, Zap, and Apex.

Add performance warning in README

Add a performance warning to the README noting that the current proof of concept does not include optimizations to neo4j and may see some degradation of performance. Create a separate PERFORMANCE.md file with some ideas for increasing performance in the meantime.

Interested in Dev/Contributing to GUAC?

Welcome! This thread is for expressing interest in contributing to GUAC! We are glad to welcome our fellow open source contributors! As the project is starting up, we will be creating issues that folks can pick up and work on. In the meantime, as the code base is taking shape, we'd like to engage directly with our contributors!

BTW we now have a slack channel: https://openssf.slack.com/archives/C03U677QD46

If you are interested in contributing, it would be very helpful to provide the following details (copy and paste into your comment):

1. I am interested in contributing to:
- [ ] Development
- [ ] Documentation
- [ ] Issue triage and community
- [ ] Technical advisory (review [governance document](https://github.com/artifact-ff/artifact-ff/blob/main/GOVERNANCE.md#technical-advisory-members))

2. I am here because:
- [ ] Personal interest
- [ ] My company/orgs I work with are interested in this

3. What is your associated company/org if you're contributing in their capacity? _________

4. Depending on how things go, I may be interested in becoming a maintainer of the project
- [ ] Yes

5. (optional) I have expertise in:
- [ ] Neo4j
- [ ] Cypher
- [ ] GraphQL
- [ ] Intoto
- [ ] SPDX
- [ ] CycloneDX
- [ ] Others (fill in):

fyi: ingestor tree parsing

We agreed that in the long term, the ingestor would need to have a way to communicate information up/down the tree in order to make edges and annotations between the elements of each node in the document tree.

However, to get started with an e2e poc, we decided to defer the implementation of the recursive processing model.

Relevant Conversation:
#39 (comment)
#39 (comment)
#39 (comment)

Refactor for SLSA parser tests

EDIT: upon reading the tests more, it seems like this would just move a lot of the test case checks outside the test definition.

The SLSA parser tests should specify the expected edges and nodes within each test case, explicitly linked to that case, rather than only in the body of the test.

task: map keys to identities and trust

Identities should be considered separate from any given key material, as it's potentially a many-to-many situation. One identity might have multiple keys, and one key might be associated with multiple identities.

Note: Identity in this context is still abstract. It should not be tied back to a specific person, especially anonymous/pseudonymous folks. The primary goal is to associate keys with identities associated with a project, most likely organizations or known maintainers.
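The many-to-many relationship can be sketched as a pair of indexes, one from keys to identities and one from identities to keys. This is only an illustration under stated assumptions: KeyID, IdentityID, and Bindings are hypothetical names, identities are opaque strings (for example an organization or a maintainer group, not a person), and trust levels are left out entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyID and IdentityID are hypothetical opaque identifiers.
type KeyID string
type IdentityID string

// Bindings stores a many-to-many relation between keys and identities,
// indexed in both directions.
type Bindings struct {
	byKey      map[KeyID]map[IdentityID]struct{}
	byIdentity map[IdentityID]map[KeyID]struct{}
}

// NewBindings returns an empty relation.
func NewBindings() *Bindings {
	return &Bindings{
		byKey:      map[KeyID]map[IdentityID]struct{}{},
		byIdentity: map[IdentityID]map[KeyID]struct{}{},
	}
}

// Bind records that a key is associated with an identity. Binding the same
// pair twice has no further effect.
func (b *Bindings) Bind(k KeyID, id IdentityID) {
	if b.byKey[k] == nil {
		b.byKey[k] = map[IdentityID]struct{}{}
	}
	if b.byIdentity[id] == nil {
		b.byIdentity[id] = map[KeyID]struct{}{}
	}
	b.byKey[k][id] = struct{}{}
	b.byIdentity[id][k] = struct{}{}
}

// IdentitiesFor lists, in sorted order, the identities bound to a key.
func (b *Bindings) IdentitiesFor(k KeyID) []IdentityID {
	out := make([]IdentityID, 0, len(b.byKey[k]))
	for id := range b.byKey[k] {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// KeysFor lists, in sorted order, the keys bound to an identity.
func (b *Bindings) KeysFor(id IdentityID) []KeyID {
	out := make([]KeyID, 0, len(b.byIdentity[id]))
	for k := range b.byIdentity[id] {
		out = append(out, k)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	b := NewBindings()
	b.Bind("key-1", "org-a")
	b.Bind("key-2", "org-a")        // one identity, several keys
	b.Bind("key-1", "maintainer-b") // one key, several identities
	fmt.Println(b.KeysFor("org-a"), b.IdentitiesFor("key-1"))
}
```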

Docs:
