safedep / vet
Tool to achieve policy driven vetting of open source dependencies
License: Apache License 2.0
As a security engineer, I want to adopt vet in CI/CD as a security gate to prevent introduction of new packages that violate my policy so that I can prevent increasing security & technical debt while I work on mitigating the existing problem
This means we need a way to add the findings that exist at the time of adoption to an exception list. Filters can then ignore packages in the exception list, so that vet does not fail in CI for existing packages.
Use vet query command to generate exception list
vet query --from /path/to/json-dump --exception-add --exception-file /path/to/exceptions.yml
Subsequently, when a filter query is executed, the packages in the exception list should be ignored. vet should also load a default exceptions file, if available, from:
$PWD/.vet/exceptions.yml
$scanDirectory/.vet/exceptions.yml
The generated exceptions.yml should be pushed to the repository in a standard path for autoload, such as .vet/exceptions.yml, or explicitly passed as a param during scan:
vet scan -D /path/to/repo --exception-file /path/to/exception.yml --filter '...' --filter-fail
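The intended behavior can be sketched as follows. This is a minimal illustration in Python, not vet's implementation: the exception entry shape (ecosystem, name, version) and the helper names `default_exceptions_file` and `apply_exceptions` are assumptions. Only the lookup order of the default file paths comes from the text above.

```python
from pathlib import Path
from typing import Iterable, List, NamedTuple, Optional


class Package(NamedTuple):
    # Hypothetical identity of a package inside an exception list.
    ecosystem: str
    name: str
    version: str


def default_exceptions_file(cwd: str, scan_dir: str) -> Optional[Path]:
    """Return the first default exceptions file that exists, if any.

    Checks $PWD/.vet/exceptions.yml first, then $scanDirectory/.vet/exceptions.yml.
    """
    for base in (cwd, scan_dir):
        candidate = Path(base) / ".vet" / "exceptions.yml"
        if candidate.is_file():
            return candidate
    return None


def apply_exceptions(
    matches: Iterable[Package], exceptions: Iterable[Package]
) -> List[Package]:
    """Drop filter matches that are already recorded as exceptions."""
    excepted = set(exceptions)
    return [pkg for pkg in matches if pkg not in excepted]
```

With this shape, a CI gate fails only when `apply_exceptions` returns a non-empty list, that is, when a package violating the policy is not part of the recorded backlog.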
Package URL (PURL) is a standard for representing a package using a URL notation. Example:
pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c
pkg:golang/google.golang.org/genproto#googleapis/api/annotations
pkg:maven/org.apache.xmlgraphics/[email protected]?packaging=sources
pkg:maven/org.apache.xmlgraphics/[email protected]?repository_url=repo.spring.io%2Frelease
pkg:npm/%40angular/[email protected]
pkg:npm/[email protected]
Having the ability to scan package URL is useful for vetting single packages. Example:
vet scan --purl pkg:npm/[email protected] --purl pkg:maven/org.apache.xmlgraphics/[email protected]
We need to support a list of PURLs as input and scan them as a single virtual manifest.
This should be implemented as a reader implemented in #53
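A minimal sketch of reading a list of PURLs into one virtual manifest, written in Python for illustration only. The `Purl` type and the `parse_purl` and `virtual_manifest` helpers are hypothetical names, and the parser covers only the common `pkg:type/namespace/name@version?qualifiers#subpath` layout rather than every rule in the PURL specification.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional
from urllib.parse import parse_qsl, unquote


class Purl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Split a package URL into its components (simplified)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a package URL: {purl!r}")
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    # The version is whatever follows the last '@'; an '@' inside a
    # namespace is percent-encoded as %40, so it does not interfere.
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""
    segments = [unquote(s) for s in path.strip("/").split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"package URL needs a type and a name: {purl!r}")
    return Purl(
        type=segments[0].lower(),
        namespace="/".join(segments[1:-1]) or None,
        name=segments[-1],
        version=unquote(version) or None,
        qualifiers=dict(parse_qsl(query)),
        subpath=subpath or None,
    )


def virtual_manifest(purls: Iterable[str]) -> List[Purl]:
    """Treat several PURLs as the package list of one virtual manifest."""
    return [parse_purl(p) for p in purls]
```

Each `--purl` argument would contribute one entry, and the resulting list can then flow through the same enrichment and filtering steps as packages read from a lockfile.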
Many projects, especially legacy ones, still use package.json or yarn.json without lock files. As a result, vet does not detect their dependencies or the vulnerabilities in them.
Our proto3 spec management is poor and causes developer friction because we are not using any package manager, such as https://buf.build/, to manage external proto files.
Adopt https://buf.build/ for managing proto files in api/. Maybe refactor all proto files to api/proto.
The Insights API returns 401 for a bad or expired API key. Handle this case and print the correct error message from the backend.
Whenever the id starts with PYSEC-***, the title is empty. Otherwise it is not.
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-19" aliases:"BIT-2022-22818" aliases:"BIT-django-2022-22818" aliases:"CVE-2022-22818" aliases:"GHSA-95rw-fx8r-36v6" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-190" aliases:"BIT-2022-28346" aliases:"BIT-django-2022-28346" aliases:"CVE-2022-28346" aliases:"GHSA-2gwj-7jmv-h26r" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-191" aliases:"BIT-2022-28347" aliases:"BIT-django-2022-28347" aliases:"CVE-2022-28347" aliases:"GHSA-w24h-v9qh-8gxj" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-2" aliases:"BIT-2021-45116" aliases:"BIT-django-2021-45116" aliases:"CVE-2021-45116" aliases:"GHSA-8c5j-9r9f-c6w8" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-20" aliases:"BIT-2022-23833" aliases:"BIT-django-2022-23833" aliases:"CVE-2022-23833" aliases:"GHSA-6cw3-g6wv-c2xv" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-213" aliases:"BIT-2022-34265" aliases:"BIT-django-2022-34265" aliases:"CVE-2022-34265" aliases:"GHSA-p64x-8rxx-wf6q" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-245" aliases:"BIT-2022-36359" aliases:"BIT-django-2022-36359" aliases:"CVE-2022-36359" aliases:"CVE-2022-45442" aliases:"GHSA-2x8x-jmrp-phxw" aliases:"GHSA-8x94-hmjh-97hq" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-3" aliases:"BIT-2021-45452" aliases:"BIT-django-2021-45452" aliases:"CVE-2021-45452" aliases:"GHSA-jrh2-hc4r-7jwx" {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.896+0530 DEBUG vet/vet2events.go:139 Found vuln with empty title id:"PYSEC-2022-304" aliases:"BIT-2022-41323" aliases:"BIT-django-2022-41323" aliases:"CVE-2022-41323" aliases:"GHSA-qrw5-5h28-6cmg" {"service": "sd-github-app", "l": "zap"}
Other example
2023-11-21T10:21:25.897+0530 DEBUG vet/vet2events.go:128 Found vuln id:"GHSA-72xf-g2v4-qvf3" title:"tough-cookie Prototype Pollution vulnerability" aliases:"CVE-2023-26136" severities:{type:CVSSV3 score:"CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N" risk:MEDIUM} {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.897+0530 DEBUG vet/vet2events.go:128 Found vuln id:"GHSA-wgfq-7857-4jcc" title:"Uncontrolled Resource Consumption in json-bigint" aliases:"CVE-2020-8237" severities:{type:CVSSV3 score:"CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H" risk:HIGH} {"service": "sd-github-app", "l": "zap"}
2023-11-21T10:21:25.897+0530 DEBUG vet/vet2events.go:128 Found vuln id:"GHSA-gwg9-rgvj-4h5j" title:"Code Injection in morgan" aliases:"CVE-2019-5413" severities:{type:CVSSV3 score:"CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H" risk:CRITICAL} {"service": "sd-github-app", "l": "zap"}
Vet should detect dependencies in a comprehensive manner as compared to other open source tools such as cdxgen
Currently we do not have a good way to visualize all information about a package. While showing this in markdown report is possible, it would be a problem if there are a lot of packages.
We need to identify the best way to show meta information, including vulnerability information of packages in a meaningful and useful manner
Currently ignorable directories are hard coded; it would be nice if they were either
For example, it considers node_modules but does not yet consider Python venv. It considers the directory for git but not for svn.
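One way to make the ignore list configurable is sketched below in Python. This is an illustration under assumptions, not vet's code: the default set shown here contains only the two directories named above, and `walk_files` and the extra entries `venv` and `.svn` are hypothetical.

```python
import os
from typing import Iterable, Iterator

# Assumed defaults, taken from the two directories mentioned in the text.
DEFAULT_IGNORED_DIRS = frozenset({"node_modules", ".git"})


def walk_files(
    root: str, ignored_dirs: Iterable[str] = DEFAULT_IGNORED_DIRS
) -> Iterator[str]:
    """Yield file paths under root, skipping any directory named in ignored_dirs."""
    ignored = set(ignored_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

A caller could extend the defaults, for example `walk_files(repo, DEFAULT_IGNORED_DIRS | {"venv", ".svn"})`, with the extra names coming from a flag or a config file.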
Ran it over an npm codebase, and the recommendations specifically point to packages that are 3 or 5 levels deep in the dependency tree.
The current summary table as well as the markdown report generator sorts the table by Risk Score. This is causing confusion since Risk Score is undefined. Also, it is not known why the risk score came out high for a few packages.
Associate tags with each package during risk score calculation to help identify which types of risk (vulnerability, popularity, version drift) have contributed to the risk score.
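The idea of recording a tag next to every contribution to the score can be sketched as below. Everything numeric here is an assumption made up for illustration: the weights, the popularity threshold, and the `assess` inputs are hypothetical and do not describe how vet computes its risk score.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskAssessment:
    score: int = 0
    tags: List[str] = field(default_factory=list)

    def add(self, tag: str, weight: int) -> None:
        """Add weight to the score and remember which risk type caused it."""
        if weight > 0:
            self.score += weight
            self.tags.append(tag)


def assess(vulnerabilities: int, stars: int, major_drift: int) -> RiskAssessment:
    """Score a package with illustrative weights (not vet's real formula)."""
    result = RiskAssessment()
    result.add("vulnerability", 10 * vulnerabilities)
    result.add("popularity", 5 if stars < 10 else 0)
    result.add("version drift", 2 * major_drift)
    return result
```

The report can then print `tags` beside the score, so a reader sees why a package ranks high instead of only how high it ranks.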
It may be desirable to use vet to build a security guardrail that prevents introducing new insecure dependencies while the existing ones (backlog) are being worked on. To achieve this, we need the ability to scan only the dependency changes across two versions of a code base.
We can implement a package manifest reader specifically to read diff of dependencies across two versions of code. This can be implemented by:
Github Dependency Graph API can be used to fetch dependency changes across base and head branches in a PR.
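At its core, such a reader computes a set difference between the dependencies of the base and the head version. The Python sketch below shows only that step and makes no API calls. The `Dependency` tuple shape and the `added_dependencies` helper are assumptions for illustration, and the package names in the usage note are made up.

```python
from typing import Iterable, List, Tuple

# Hypothetical identity of a dependency: (ecosystem, name, version).
Dependency = Tuple[str, str, str]


def added_dependencies(
    base: Iterable[Dependency], head: Iterable[Dependency]
) -> List[Dependency]:
    """Return dependencies present in head but not in base.

    A version bump counts as an addition because the version is part
    of the identity, so the new version gets vetted as well.
    """
    return sorted(set(head) - set(base))
```

For instance, with `("npm", "lodash", "4.17.20")` on the base branch and `("npm", "lodash", "4.17.21")` plus a newly added package on the head branch, only those two head entries would be passed on to the filters.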
vet is currently dependent on the Insights API for package enrichment. This requires an API key.
Based on user feedback, we want to support an option to use vet without strongly coupling it to the SafeDep Insights API (backend). This will help ease adoption by removing the hard requirement to get an API key.
This is partly possible using OSV and Deps.dev API directly
Explore the feasibility of using the Deps.dev and OSV APIs directly and identify the tasks required to support this as an optional feature incrementally. We will start with the following experience:
vet auth configure --community
Subsequent scans will use public data sources directly.
Currently different operations are performed to read package manifests from:
In future, we may need to be able to read from SBOM (SPDX, CycloneDX). To be able to ensure separation of concerns, we should
SPDX SBOM scanning, such as that used when scanning Github Dependency Insight API output (SPDX SBOM data), results in incorrect package ecosystem detection. In this case the Manifest type is SPDX, but the Package ecosystem should be the detected ecosystem and should not be the same as the Manifest's ecosystem (SPDX).
Example:
vet scan --github https://github.com/safedep/vet --report-json /tmp/vet.json
cat /tmp/vet.json | jq '.packages[5]'
{
  "package": {
    "ecosystem": "SpdxSBOM",
    "name": "postcss-minify-params",
    "version": "5.1.4"
  },
  "manifests": [
    "55b2cc8afd01bd61"
  ]
}
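One possible fix is to derive the package ecosystem from the PURL that an SPDX package entry may carry in its `externalRefs`. The Python sketch below assumes the SBOM includes such `purl` references. The label mapping and the `detect_ecosystem` helper are hypothetical, and the manifest-level value is used only as a fallback.

```python
from typing import Any, Dict

# Hypothetical mapping from PURL type to an ecosystem label.
PURL_TYPE_TO_ECOSYSTEM = {
    "npm": "npm",
    "pypi": "PyPI",
    "golang": "Go",
    "maven": "Maven",
    "gem": "RubyGems",
}


def detect_ecosystem(spdx_package: Dict[str, Any], fallback: str = "SpdxSBOM") -> str:
    """Pick the ecosystem from the package's PURL reference, if it has one."""
    for ref in spdx_package.get("externalRefs", []):
        if ref.get("referenceType") != "purl":
            continue
        locator = ref.get("referenceLocator", "")
        if not locator.startswith("pkg:"):
            continue
        # The PURL type is the first path segment after the "pkg:" scheme.
        purl_type = locator[len("pkg:"):].lstrip("/").split("/", 1)[0].lower()
        return PURL_TYPE_TO_ECOSYSTEM.get(purl_type, fallback)
    return fallback
```

For the package shown above, a reference with the locator `pkg:npm/postcss-minify-params@5.1.4` would yield `npm` instead of `SpdxSBOM`.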
Consider upgrading the following libraries for maximum impact:
┌──────────────────────────┬───────────┬────────┐
│ PACKAGE │ UPDATE TO │ IMPACT │
├──────────────────────────┼───────────┼────────┤
│ [email protected] │ 39.0.1 │ 13 │
│ vulnerability drift │ │ │
├──────────────────────────┼───────────┼────────┤
│ [email protected] │ 39.0.1 │ 13 │
│ vulnerability drift │ │ │
└──────────────────────────┴───────────┴────────┘
https://github.com/ko-build/ko
The goreleaser-action is broken due to CGO_ENABLED=1.
Examples: https://github.com/safedep/vet/actions/workflows/goreleaser.yml
This is because we want to cross-compile for MacOS, which needs a native compiler tool chain when CGO_ENABLED=1. We need CGO because we are now using Tree Sitter for static code analysis (parsing).
We need to explore using goreleaser cross compilation tool chain or look at using Github Action's MacOS build environment.
To accurately analyse Open Source dependencies for a project, it is important to build an accurate dependency tree that identifies all external (OSS) components that will be included in the final deployable artifact (build output). This list of components needs to be hierarchical (a tree) to retain the knowledge of which components are direct and which are introduced transitively. This is required for effective querying because, from a remediation perspective, direct dependencies are what matter, even when the critical issue is in a transitive dependency.
Lockfile-based dependency identification (such as package-lock.json, gradle.lockfile, etc.) is easy and gives coverage of all components that may be included in the build. However, most lockfiles do not retain the direct / transitive relationship of dependencies. They appear as a flat list of dependencies.
Package manifests like pom.xml, build.gradle, etc., when resolved by the appropriate package manager, internally build a dependency tree and can optionally write it as output as well. But parsing such package manifests is complex and is very specific to each package manager's behavior spec.
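To show why the direct / transitive relationship matters for remediation, the Python sketch below finds which direct dependencies pull in a given transitive package. It assumes the tree is already available as an adjacency map. The `direct_introducers` helper and the package names in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def direct_introducers(
    graph: Dict[str, List[str]], root: str, target: str
) -> List[str]:
    """Return the direct dependencies of root through which target is reachable."""

    def reaches(node: str, seen: Set[str]) -> bool:
        if node == target:
            return True
        if node in seen:
            # Already explored on this search; also guards against cycles.
            return False
        seen.add(node)
        return any(reaches(child, seen) for child in graph.get(node, []))

    return [dep for dep in graph.get(root, []) if reaches(dep, set())]
```

Given `{"app": ["express", "lodash"], "express": ["body-parser"], "body-parser": ["qs"]}`, a finding in `qs` maps back to `express`, which is the dependency the developer can actually upgrade. A flat lockfile list cannot answer this question.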
We need to explore the right approach that will allow us to:
The markdown report should include packages that have been identified as violations by a filter query or filter suites.
Show usage along with error msg when vet auth is invoked without required params
Vulnerability Exploitability eXchange (VEX) is a form of a security advisory where the goal is to communicate the exploitability of components with known vulnerabilities in the context of the product in which they are used.
https://cyclonedx.org/capabilities/vex/
Security scanners will detect and flag components in software that have been identified as being vulnerable. Often, software is not necessarily affected as signaled by security scanners for many reasons such as: the vulnerable component may have been already patched, may not be present, or may not be able to be executed. To turn off false alerts like these, a scanner may consume VEX data from the software supplier.
https://github.com/openvex/spec#about-vex
vet Context
vet is a tool intended to identify OSS dependencies and subsequently identify risks in such dependencies using configured policies. While generating SBOM is not an absolute requirement, vet can do that using its data model.
From a user's perspective, it may be useful to continuously generate SBOM using vet and maintain an inventory of SBOMs associated with each release of a software component. This may also be useful in audit use-cases, where an auditor persona uses vet to generate an SBOM for an application which in turn is used for risk assessment.
In this context, it may be very useful for such user persona to generate VEX statements to associate additional information with the vulnerabilities / risks identified by vet and included in SBOM. Particularly, vulnerabilities can be marked as fixed or not applicable as required.
More thought and user survey is required to define this. But at a high level, it should be:
Dependency on external services may increase the time and API cost of tool usage.
Implement a client side cache for Insight API responses. This can be implemented using an sqlite3 DB by re-using the caching interface implemented for Insights API Service
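A client-side cache of this kind could look like the Python sketch below. It is not the caching interface of the Insights API Service, which is not shown here. The `ResponseCache` class, its table layout, the TTL policy, and `cached_fetch` are all assumptions for illustration.

```python
import json
import sqlite3
import time
from typing import Any, Callable, Optional


class ResponseCache:
    """Store JSON-serialisable API responses in sqlite3 with a time-to-live."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600.0):
        self._db = sqlite3.connect(path)
        self._ttl = ttl_seconds
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "key TEXT PRIMARY KEY, body TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT body, stored_at FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > self._ttl:
            return None  # missing or expired
        return json.loads(row[0])

    def put(self, key: str, value: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._db.execute(
            "INSERT OR REPLACE INTO responses (key, body, stored_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
        self._db.commit()


def cached_fetch(cache: ResponseCache, key: str, fetch: Callable[[], Any]) -> Any:
    """Return the cached response for key, calling fetch only on a miss."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = fetch()
    cache.put(key, value)
    return value
```

A natural cache key is the package identity (ecosystem, name, version), so repeated scans of the same lockfile do not repeat the same API calls until the entry expires.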
When the configured trial key is expired, scan runs without error but enrichment fails and hence no result is displayed. Only when we run with verbose logging can we know that the trial key has expired.
Introduce key verification and verify key before starting scan. Fail if key is invalid. This should be done only for scan command and not for offline analysis like query command
Generate an exportable software bill of materials (SBOM) in the NTIA-approved data formats (i.e., SPDX, CycloneDX, and SWID tags)
vet can clone the repo but we must ensure:
We currently have quite a few linter issues.
✗ golint ./... | wc -l
124
We need to fix them and introduce a linter guard rail in https://github.com/safedep/vet/blob/main/.github/workflows/ci.yml
The guard rail can be as simple as integrating https://github.com/golangci/golangci-lint-action
Create Github container action for using vet as a Github action.
This is a tool experience
./vet scan ...
This is a workflow experience
steps:
  - name: OSS Vet
    uses: safedep/vet
    with:
      fail_on_match: true # This is default
      suite: default # or .vet/suites/custom.yml
      exceptions: .vet/exceptions.yml
How do you know whether a vulnerability in method-X of library-Y is actually reachable from your application, and therefore has a real impact, rather than being just more noise generated by scanning tools?
This is a real problem for most SCA tools because of how they operate based on version matching algorithms. Implementing reachability analysis will greatly reduce false positives related to vulnerability detection. However, doing this, especially in a language agnostic manner is challenging, if not impossible.
We should explore this problem in two stages:
Doing [2] is not easy as it requires having source code of all 3rd party dependencies as well to identify paths that are reachable indirectly from the target application.
Implement vet auth verify to verify the validity of the configured API key.
This can be done in two ways:
[2] is probably the way forward given other use-cases in future.
Introduce version drift in filter input spec to allow filters such as:
drifts.major > 0
drifts.major > 2
drifts.minor > 3
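How the drift values might be computed is sketched below in Python. The definition used here is an assumption, not vet's: drift is reported only in the most significant component that differs between the current and the latest version, and versions are treated as plain `major.minor.patch` numbers. `Drift` and `version_drift` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Drift(NamedTuple):
    major: int
    minor: int
    patch: int


def _parse(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch', ignoring a leading 'v' and any suffix."""
    core = version.lstrip("v").split("-", 1)[0].split("+", 1)[0]
    parts = (core.split(".") + ["0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def version_drift(current: str, latest: str) -> Drift:
    """Return how far current lags behind latest (assumed definition)."""
    cur, new = _parse(current), _parse(latest)
    if new <= cur:
        return Drift(0, 0, 0)
    if new[0] != cur[0]:
        return Drift(new[0] - cur[0], 0, 0)
    if new[1] != cur[1]:
        return Drift(0, new[1] - cur[1], 0)
    return Drift(0, 0, new[2] - cur[2])
```

Under this definition a package at 1.2.3 with a latest version of 3.0.0 has a major drift of 2, so it matches `drifts.major > 0` but not `drifts.major > 2`.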
Provide baseline filters that can be used to get started.
Running vet on the gohugo repository gives odd recommendations.
It felt like it wants to say the libs have low popularity, but then it shows an update to a version that already exists in the repo. The view is not very clear, and the markdown format does the same.
Also, flags such as low popularity and drift are missing in the md file.
We need to build a framework for running behavior tests on the generated vet binary to ensure we don't end up breaking the cli contract. We need to explore whether there are any cli testing frameworks. Alternatively, we can just use something like RSpec and write custom helpers that wrap vet execution with params and package manifests (fixture files).
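The custom-helper approach can be sketched with Python's standard `subprocess` module in place of RSpec. The `run_cli` helper and `CliResult` type below are hypothetical. They wrap any command, so the vet binary path and the fixture locations are supplied by the test suite.

```python
import subprocess
import sys
from typing import NamedTuple, Sequence


class CliResult(NamedTuple):
    exit_code: int
    stdout: str
    stderr: str


def run_cli(
    binary: Sequence[str], args: Sequence[str], timeout: float = 60.0
) -> CliResult:
    """Run a command with arguments and capture its exit code and output."""
    proc = subprocess.run(
        [*binary, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return CliResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a vet binary present.
    result = run_cli([sys.executable, "-c", "print('ok')"], [])
    print(result.exit_code, result.stdout.strip())
```

A behavior test would call something like `run_cli(["./vet"], ["scan", "-D", fixture_dir, "--filter", expr, "--filter-fail"])` and assert on the exit code and output that the cli contract defines for that fixture.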
Key flows that need to be tested:
Exceptions management is currently implemented as a global package. This is bad because we can't use vet as a package and run concurrent scans. We need to refactor exceptions management into an object of its own, with an optional global instance to maintain current feature parity.
Dogfood vet :)
Set up a vetting workflow for this repository using vet. This should include creating an appropriate policy, an exceptions configuration and a Github action that runs on PRs to identify issues.
vet currently executes a scan based on command line arguments. While this is flexible, there are quite a lot of args and the number will increase as the tool evolves. This will make CI integration complex; in particular, building a Github Action runner that considers all args will not be a good experience. We have already identified this as a problem in #23
.vet/scan.yml
vet automatically decodes the scan spec and executes the scan based on it without command line args
The summary reporter presents a table of packages that are recommended for upgrade. It does not show the ecosystem of the package and only shows the name and version. While this is fine for a scan where only a single lockfile was scanned, it is a problem where multiple lockfiles from different ecosystems were scanned.
To start improving the UX, we should begin by showing the Ecosystem name in the report table.
Any real-life application will depend on frameworks & other direct dependencies which in turn introduces multiple layers of transitive dependencies. The number of effective (direct & transitive) dependencies for any real-life application can be easily 100+.
When we scan dependencies, we end up finding issues (vulnerability / popularity / security posture) in a lot of dependencies, thus increasing the remediation cost significantly. Many times, remediation is infeasible or painful due to the sheer volume of issues produced by a tool, vet included.
Our goal is to improve the user experience when it comes to remediating issues in OSS dependencies while ensuring that we do not provide a false sense of security by missing critical issues. To do this, we need to provide a paved path for the remediation journey instead of dumping issues on the user and having the user make the decision / prioritisation / plan.
We need a user experience like this
Lockfiles like gradle.lockfile, package-lock.json, Gemfile.lock, etc. already contain the locked versions of the direct and transitive dependencies that actually compose the deployable app.
For these lockfiles, we do not need to depend on Insights API for resolving dependencies.
As a user, I want to perform a dependency scan on all/partial projects on GitHub org to generate the most critical risks such as license risks in one shot.
Optionally, I should be able to perform dependency scanning of selected projects in my orgs
The example command can be
vet scan https://github.com/OrgName --github-token ....
The scan should generate violations in a report
Possible behavior:
The tool can utilize the SBOM provided by Github to perform the assessment.
Dependency Track is a continuous SBOM management and analysis platform. For DT to be effective, it is important to continuously import SBOMs into DT. We want vet to make it very easy for an organization to continuously sync their repositories into DT by generating SBOMs and using DT's REST API to upload them to DT.
We will start by supporting Github and eventually maybe Gitlab. For the Github integration, we will provide an experience on top of the existing --github scan option to scan a remote Github repository. The scan will look like
vet scan --github-org https://github.com/safedep
For syncing results to DependencyTrack, we will build a new reporting module that syncs to DependencyTrack instance.
VET_DT_BASE_URL="..." VET_DT_TOKEN="..." \
vet scan --github-org https://github.com/safedep --report-dependency-track