

vunnel

A tool for fetching, transforming, and storing vulnerability data from a variety of sources.

GitHub release License: Apache-2.0 Join our Discourse


Supported data sources:

Installation

With pip:

pip install vunnel

With docker:

docker run \
  --rm -it \
  -v $(pwd)/data:/data \
  -v $(pwd)/.vunnel.yaml:/.vunnel.yaml \
    ghcr.io/anchore/vunnel:latest  \
      run nvd

Where:

  • the data volume keeps the processed data on the host
  • the .vunnel.yaml uses the host application config (if present)
  • you can swap latest for a specific version (same as the git tags)

See the vunnel package for a full listing of available tags.

Getting Started

List the available vulnerability data providers:

$ vunnel list

alpine
amazon
chainguard
debian
github
mariner
nvd
oracle
rhel
sles
ubuntu
wolfi

Download and process a provider:

$ vunnel run wolfi

2023-01-04 13:42:58 root [INFO] running wolfi provider
2023-01-04 13:42:58 wolfi [INFO] downloading Wolfi secdb https://packages.wolfi.dev/os/security.json
2023-01-04 13:42:59 wolfi [INFO] wrote 56 entries
2023-01-04 13:42:59 wolfi [INFO] recording workspace state

You will see the processed vulnerability data in the local ./data directory:

$ tree data

data
└── wolfi
    ├── checksums
    ├── metadata.json
    ├── input
    │   └── secdb
    │       └── os
    │           └── security.json
    └── results
        └── wolfi:rolling
            ├── CVE-2016-2781.json
            ├── CVE-2017-8806.json
            ├── CVE-2018-1000156.json
            └── ...

Note: to get more verbose output, use -v, -vv, or -vvv (e.g. vunnel -vv run wolfi)

Delete existing input and result data for one or more providers:

$ vunnel clear wolfi

2023-01-04 13:48:31 root [INFO] clearing wolfi provider state

Example config file for changing application behavior:

# .vunnel.yaml
root: ./processed-data

log:
  level: trace

providers:
  wolfi:
    request_timeout: 125
    runtime:
      existing_input: keep
      existing_results: delete-before-write
      on_error:
        action: fail
        input: keep
        results: keep
        retry_count: 3
        retry_delay: 10

Use vunnel config to get a better idea of all of the possible configuration options.

FAQ

Can I implement a new provider?

Yes you can! See the provider docs for more information.

Why is it called "vunnel"?

This tool "funnels" vulnerability data into a single spot for easy processing... say "vulnerability data funnel" 100x fast enough and eventually it'll slur to "vunnel" :).

vunnel's People

Contributors

andrew, asomya, benoitgui, bot190, dependabot[bot], joshbressers, juan131, juanjsebgarcia, luhring, nurmi, popey, spiffcs, tomersein, wagoodman, westonsteimel, willmurphyscode


vunnel's Issues

schemas version 2 wishlist

The original vunnel schemas were very heavily influenced by the existing results format from Anchore Enterprise since we were trying to rip that out with as little disruption as possible. Now we would like to begin evaluating what a future iteration of these schemas could look like.

  • The OS schema should not emit FixedIn entries but rather actual constraints. This way each provider can tailor the constraint creation logic to its needs, and grype-db does not have to figure out how to interpret multiple FixedIn entries.
  • Needs to allow capture of upstream lifecycle dates (published, modified, withdrawn, etc) where available
  • Needs to allow override of the severity at a per-package level (example anchore/grype-db#108 (comment))
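The wishlist above could be sketched as a record shape like the following. This is purely illustrative (the class and field names are hypothetical, not a committed v2 schema): a real constraint per package instead of FixedIn entries, upstream lifecycle dates, and a per-package severity override.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AffectedPackage:
    name: str
    constraint: str                 # e.g. ">= 5.10, < 5.10.62-55.141.amzn2"
    severity: Optional[str] = None  # per-package override of the record severity

@dataclass
class VulnerabilityV2:
    id: str
    namespace: str
    severity: str
    # upstream lifecycle dates, where the source provides them
    published: Optional[str] = None
    modified: Optional[str] = None
    withdrawn: Optional[str] = None
    packages: list = field(default_factory=list)

record = VulnerabilityV2(
    id="CVE-2021-3753",
    namespace="amazon:2",
    severity="Medium",
    published="2021-09-02",
    packages=[AffectedPackage("kernel", ">= 5.10, < 5.10.62-55.141.amzn2", severity="High")],
)
```

With a shape like this, each provider owns its constraint-building logic, and grype-db no longer has to interpret multiple FixedIn entries.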

Use the new NVD CPE v2 API instead of the deprecated feeds

The NVD provider should be refactored to use the new CVE API (V2):

In late 2023, the NVD will retire its legacy data feeds while working to guide any remaining data feed users to updated application-programming interfaces (APIs). APIs have many benefits over data feeds and have been the proven and preferred approach to web-based automation for over a decade. For additional information on the NVD API, please visit the developers pages. Click here for more information on the NVD timeline.

(from https://nvd.nist.gov/vuln/data-feeds)

Assets

Issue used to upload assets to github for reference in markdown files. (Do not delete this issue)

Potential missing CVE/package associations in database for SLES

Steps to reproduce the issue
Build sles database with:
grype-db build -g -p sles
Open the database and compare with the original OVAL.
What happened:
Some packages seem to be missing from the database for specific CVEs on some systems. If we for example take CVE-2016-5440, we have in the database:
[screenshot: database rows for CVE-2016-5440]
However, in the original OVAL file we have the two following criteria (both are in the same OR criteria for vuln CVE-2016-5440, timestamp of the OVAL: 2024-07-23T05:27:20, line 693181):
[screenshot: first criteria]
and
[screenshot: second criteria]

What you expected to happen:
Please correct me if I am wrong here, but the packages in the first criteria should also be in the database. This issue leads to a significant number of missing package/CVE associations in the database, and so to potentially missing visibility of CVEs on a system.

Anything else we need to know?:
I have not tried to investigate where the issue is in the source code.

Environment:
grype-db version: 0.23.2
Seems to affect all SLES systems

Normalize provider output to conform to defined in-repo schemas

Today the providers are outputting slightly different record shapes that don't entirely conform to the same schema (e.g. amazon vs alpine for the .Vulnerability.Metadata.CVE field... one is a []string while the other is an object). There seem to be issues downstream in grype-db-builder using the existing output. I'm not certain why this would be the case, since there was no logical change to record processing; however, it may be that there was a last-minute alteration to drivers just before writing to the database that was not ported.

This issue is here to capture the work needed for this field (.Vulnerability.Metadata.CVE) as well as others found along the way while integrating vunnel and grype-db-builder.

Split all providers into "download" and "process" steps

Most of the parsers internally have `_download` and `_process` (or similar) methods to organize the work; however, there is no strict separation of these steps when calling the get method on the parsers. Ideally there should be a way to download all state in one step and process that state in another. This could be exposed on the CLI as well (e.g. `vunnel download <provider>` and `vunnel process <provider>`, leaving `vunnel run <provider>` as is, calling both internally).

This would help with provider development, where most of the work is on the processing side: download could run once, and "process" could then be invoked repeatedly with a guarantee of no network calls or mutation of input state.
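A minimal sketch of the proposed split (this is illustrative, not vunnel's actual provider API): download is the only step allowed to touch the network, process works purely from saved input, and run keeps today's behavior by calling both.

```python
# Hypothetical provider shape with separated download/process steps.
class Provider:
    def __init__(self):
        self.input_state = None
        self.results = None

    def download(self):
        # fetch remote data into the workspace; the only step that
        # is allowed to make network calls (stubbed here)
        self.input_state = {"secdb": [{"cve": "CVE-2016-2781"}]}

    def process(self):
        # transform already-downloaded input; guaranteed offline
        if self.input_state is None:
            raise RuntimeError("no input state: run download first")
        self.results = [entry["cve"] for entry in self.input_state["secdb"]]

    def run(self):
        # `vunnel run <provider>` stays as-is: both steps in sequence
        self.download()
        self.process()

p = Provider()
p.run()
```

Calling process on a fresh provider without download would fail fast, which is exactly the guarantee that helps during provider development.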

Make troubleshooting quality gate failures easier

Today we run the quality gates in CI but throw away the .yardstick dir when the run ends. If we pushed an archive of this to CI we'd be able to pull and reproduce the results much faster (from upwards of an hour down to minutes)

Upstream SUSE OVAL archives and CVSS data is changing

What would you like to be added:

  • SUSE OVAL data can now be downloaded .bz2-compressed. The .gz-compressed data is planned to be discontinued.
  • The severity impact mappings have changed: the official CVSS v3.1 values are now used: low, medium, high, critical

Why is this needed:

Changes on the SUSE side.

Additional context:

Test: add quality gate for percentage of namespaces exercised by at least one test image

What would you like to be added:
During the quality gate we want to check if results are being tested from > 70% of namespaces.

Example: coverage of namespaces sampled is at 71%; a user adds a new provider that introduces a new namespace; coverage drops below 70%, causing the gate to fail since the new namespace is not yet exercised by at least one test image.

Why is this needed:
To catch if there are failures using our test data against current/new providers before running a vunnel release

Additional context:
The tools team is able to point to a Post-Mortem regarding missing debian data that resulted in this issue being filed
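The gate described above could be sketched as follows (the threshold and inputs are illustrative, not the quality gate's actual implementation):

```python
# Fraction of known namespaces exercised by at least one test image.
def namespace_coverage(all_namespaces, exercised_namespaces):
    exercised = set(exercised_namespaces) & set(all_namespaces)
    return len(exercised) / len(all_namespaces)

# Gate passes only while coverage stays strictly above the threshold.
def gate_passes(all_namespaces, exercised_namespaces, threshold=0.70):
    return namespace_coverage(all_namespaces, exercised_namespaces) > threshold

namespaces = ["alpine", "amazon", "debian", "nvd"]
assert gate_passes(namespaces, ["alpine", "amazon", "debian"])  # 3/4 = 75%
# a new provider namespace with no test image drops coverage to 3/5 = 60%
assert not gate_passes(namespaces + ["newdistro"], ["alpine", "amazon", "debian"])
```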

Enhance quality gate to fail when a new provider is not configured

When adding a new provider you need to remember to add a new configuration in the tests section of https://github.com/anchore/vunnel/blob/main/tests/quality/config.yaml . This is problematic since you could have a passing QG that is unaware of the new provider (thus nothing was tested).

It would be more ideal if the QG could look at src/vunnel/providers and understand if there is a provider that is missing a config.yaml entry and fail when that case is detected.
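The check could be as simple as a set difference between provider packages found on disk and provider names present in the quality config (the config shape below is an assumption for illustration):

```python
# Return providers that exist under src/vunnel/providers but have no
# corresponding entry in tests/quality/config.yaml.
def missing_provider_configs(provider_dirs, configured_tests):
    configured = {t["provider"] for t in configured_tests}
    return sorted(set(provider_dirs) - configured)

providers_on_disk = ["alpine", "amazon", "wolfi"]   # e.g. from os.listdir(...)
quality_config = [{"provider": "alpine"}, {"provider": "amazon"}]

missing = missing_provider_configs(providers_on_disk, quality_config)
assert missing == ["wolfi"]  # gate should fail and name the unconfigured provider
```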

Wolfi and Chainguard providers are not handling download errors as expected

What happened:
I ran a feed service sync on Anchore Enterprise that uses vunnel to download provider vulnerability data.
This was run with my internet disabled to see if each provider would error and mark the status as failed as it should.

Running:  
Completed:  chainguard wolfi
Failed:  anchore-match-exclusions sles mariner nvd rhel ubuntu oracle debian alpine amazon

However as you can see chainguard and wolfi returned their status as completed.

        {
            "driver_id": "chainguard",
            "end_time": "2024-03-14T10:14:09.651947+00:00",
            "parent_task_id": 139,
            "start_time": "2024-03-14T10:12:12.256996+00:00",
            "started_by": "system",
            "status": "completed",
            "task_id": 151,
            "task_type": "VunnelProviderExecutionTask"
        },
        {
            "driver_id": "wolfi",
            "end_time": "2024-03-14T10:12:12.255998+00:00",
            "parent_task_id": 139,
            "start_time": "2024-03-14T10:10:14.932096+00:00",
            "started_by": "system",
            "status": "completed",
            "task_id": 150,
            "task_type": "VunnelProviderExecutionTask"
        },
        ....
        {
            "driver_id": "sles",
            "end_time": "2024-03-14T10:09:55.441728+00:00",
            "parent_task_id": 139,
            "result": {
                "error": "The sles vunnel provider failed to complete succesfully.  Check the feed service logs for specific details of the failure.  If available, stale sles results will be used until the next successful run.",
                "error_details": "error: 1 error occurred:\n\t* failed to pull data from \"sles\" provider: command failed: 1\n\n",
                "failed_command": "grype-db pull -c /tmp/feeds_workspace/drivers/grypedb/grype-db.yaml -p sles"
            },
            "start_time": "2024-03-14T10:04:49.132466+00:00",
            "started_by": "system",
            "status": "failed",
            "task_id": 148,
            "task_type": "VunnelProviderExecutionTask"
        },
        ...

What you expected to happen:

I expected to see "status": "failed" due to the lack of internet connectivity to reach the provider service

And output similar to below

        {
            "driver_id": "sles",
            "end_time": "2024-03-14T10:09:55.441728+00:00",
            "parent_task_id": 139,
            "result": {
                "error": "The sles vunnel provider failed to complete succesfully.  Check the feed service logs for specific details of the failure.  If available, stale sles results will be used until the next successful run.",
                "error_details": "error: 1 error occurred:\n\t* failed to pull data from \"sles\" provider: command failed: 1\n\n",
                "failed_command": "grype-db pull -c /tmp/feeds_workspace/drivers/grypedb/grype-db.yaml -p sles"
            },
            "start_time": "2024-03-14T10:04:49.132466+00:00",
            "started_by": "system",
            "status": "failed",
            "task_id": 148,
            "task_type": "VunnelProviderExecutionTask"
        },

How to reproduce it (as minimally and precisely as possible):

Run vunnel with no internet, and see that the status returns 'completed'

Anything else we need to know?:

This might be due to how wolfi and chainguard both use the same endpoint.
It could also be due to an overly aggressive try/except that swallows all errors around downloading:
https://github.com/anchore/vunnel/blob/main/src/vunnel/providers/wolfi/parser.py#L56
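One way to avoid the swallowed-error behavior is to keep the exception handling narrow and let download failures propagate so the runtime marks the provider as failed. The sketch below is illustrative (not the wolfi parser's actual code) and uses only the standard library:

```python
import urllib.request
import urllib.error

# Hypothetical download helper: failures surface to the provider runtime
# instead of returning an empty result that looks like success.
def download_secdb(url, opener=urllib.request.urlopen):
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.URLError as e:
        # re-raise as a hard failure so the run's status becomes "failed"
        raise RuntimeError(f"failed to download {url}: {e}") from e
```

With internet disabled, this raises instead of completing silently, which matches the expected "status": "failed" behavior above.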

Environment:

  • Output of vunnel version:
    vunnel 0.18.4
  • OS (e.g: cat /etc/os-release or similar):
    Redhat (Anchore Enterprise 4.9.5 image)

Every provider should have "data to vuln object" unit test

What would you like to be added:

Every provider has unit tests that assert that it can process a representative sample of records - completeness of our ability to process provider data is asserted at the unit test level.

Why is this needed:

We recently had some bugs merged into vunnel that should have failed unit tests - this issue exists to track getting the test coverage high enough.

Additional context:
The recent failure was in the debian provider, but we should check all the providers.

Providers:

grype showing disputed CVE in Mariner 2.0

What happened:

Grype reports DISPUTED CVEs.

For instance, CVE-2023-0687 is not only DISPUTED upstream (https://nvd.nist.gov/vuln/detail/CVE-2023-0687)
but is also marked in our OVAL file as Not Applicable:

https://raw.githubusercontent.com/microsoft/CBL-MarinerVulnerabilityData/main/cbl-mariner-2.0-oval.xml

    <definition class="vulnerability" id="oval:com.microsoft.cbl-mariner:def:13348" version="0">
      <metadata>
        <title>CVE-2023-0687 affecting package glibc 2.35-4</title>
        <affected family="unix">
          <platform>CBL-Mariner</platform>
        </affected>
        <reference ref_id="CVE-2023-0687" ref_url="https://nvd.nist.gov/vuln/detail/CVE-2023-0687" source="CVE"/>
    ==>    <patchable>Not Applicable</patchable>
        <advisory_id>13348</advisory_id>
        <severity>Critical</severity>
        <description>CVE-2023-0687 affecting package glibc 2.35-4. This CVE either no longer is or was never applicable.</description>
      </metadata>
      <criteria operator="AND">
        <criterion comment="Package glibc is installed with version 2.35-4 or earlier" test_ref="oval:com.microsoft.cbl-mariner:tst:13348000"/>
      </criteria>
    </definition>

What you expected to happen:
We expect grype to not report the CVE with patchable set to 'Not Applicable'.

How to reproduce it (as minimally and precisely as possible):
grype mcr.microsoft.com/cbl-mariner/base/core:2.0

 ✔ Vulnerability DB                [updated]
 ✔ Parsed image                    sha256:1f28c8aa4ec798dfd78fc26e14165be1812c2767b36382e324113ef09afac75f
 ✔ Cataloged packages              [72 packages]
 ✔ Scanned for vulnerabilities     [8 vulnerabilities]
   ├── 1 critical, 6 high, 1 medium, 0 low, 0 negligible
   └── 0 fixed
NAME       INSTALLED     FIXED-IN  TYPE  VULNERABILITY   SEVERITY
glibc      2.35-3.cm2              rpm   CVE-2010-4756   Medium
glibc      2.35-3.cm2              rpm   CVE-2021-3998   High
glibc      2.35-3.cm2              rpm   CVE-2023-0687   Critical
libgcc     11.2.0-4.cm2            rpm   CVE-2022-41724  High
libgcc     11.2.0-4.cm2            rpm   CVE-2022-41725  High
libstdc++  11.2.0-4.cm2            rpm   CVE-2022-41724  High
libstdc++  11.2.0-4.cm2            rpm   CVE-2022-41725  High
nghttp2    1.46.0-2.cm2            rpm   CVE-2021-46023  High

Anything else we need to know?:

Environment:

- Output of `grype version`:
Application:          grype
Version:              0.64.1
Syft Version:         v0.85.0
BuildDate:            2023-07-17T20:31:39Z
GitCommit:            43bcf301c445d13360d724971fd089cd7a61ead9
GitDescription:       v0.64.1
Platform:             linux/amd64
GoVersion:            go1.19.10
Compiler:             gc
Supported DB Schema:  5
  • OS (e.g: cat /etc/os-release or similar):
NAME="Common Base Linux Mariner"
VERSION="2.0.20230621"
ID=mariner
VERSION_ID="2.0"
PRETTY_NAME="CBL-Mariner/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://aka.ms/cbl-mariner"
BUG_REPORT_URL="https://aka.ms/cbl-mariner"
SUPPORT_URL="https://aka.ms/cbl-mariner"

amazon: improve version constraint construction for ALASKERNEL-x.x-* advisories to reduce false positive matches

What happened:

Amazon linux advisories like ALASKERNEL-5.10-2022-005 should only apply to that specific kernel series (5.10.x), because they create an advisory per kernel line in these cases:

So ALASKERNEL-5.10-2022-005 patches CVE-2021-3753 which shows:

[screenshot: advisory details for CVE-2021-3753]

Right now, since we construct the affected version constraint as < 5.10.62-55.141.amzn2 for ALASKERNEL-5.10-2022-005, even a 4.x series kernel with the patch from ALAS-2021-1704, or a 5.4 series kernel with the patch from ALASKERNEL-5.4-2022-007, will still match. In fact, it's even worse for a 4.x kernel, since it'll potentially report all 3. After looking at the data, I think it should be correct to build a version constraint like >= 5.10, < 5.10.62-55.141.amzn2 for these cases. This should lead to correctly interpreted results.

There are some other instances where AWS issues multiple advisories across several lines of a product; however, in all of those other cases the package has a distinct name and shouldn't require this workaround. The kernel packages do not have this property.
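The proposed constraint construction could be sketched like this (the function and its heuristic are illustrative, not the amazon provider's actual code): derive the series lower bound from the ALASKERNEL advisory id itself.

```python
import re

# Hypothetical helper: for ALASKERNEL-<series>-* advisories, bound the
# constraint to that kernel series so other series can't match.
def kernel_constraint(advisory_id, fixed_version):
    m = re.match(r"ALASKERNEL-(\d+\.\d+)-", advisory_id)
    if m:
        series = m.group(1)
        return f">= {series}, < {fixed_version}"
    # non-kernel-series advisories keep the plain upper bound
    return f"< {fixed_version}"

assert kernel_constraint("ALASKERNEL-5.10-2022-005", "5.10.62-55.141.amzn2") == \
    ">= 5.10, < 5.10.62-55.141.amzn2"
```

A 4.x or 5.4 kernel version falls outside `>= 5.10`, so only the advisory for its own series would be reported.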

What you expected to happen:

Only ALAS-2021-1704 should be reported for a vulnerable 4.x kernel line, and only ALASKERNEL-5.4-2022-007 should be reported for a vulnerable 5.4.x kernel line.

How to reproduce it (as minimally and precisely as possible):

docker run --rm anchore/grype:v0.65.2 docker.io/anchore/test_images:vulnerabilities-amazonlinux-2-5c26ce9@sha256:cf742eca189b02902a0a7926ac3fbb423e799937bf4358b0d2acc6cc36ab82aa

Anything else we need to know?:

Environment:

  • Output of vunnel version:
  • OS (e.g: cat /etc/os-release or similar):

skip_if_exists not passed to centos parser within RHEL parser

The RHEL parser uses the centos parser in order to download results. The centos parser has a skip_if_exists attribute that is leveraged by the centos provider based on the runtime config (wired into centos.Parser.get()). The same functionality in the RHEL parser is not wired to the internal centos parser instance even though there is a skip_if_exists feature for the RHEL provider as a whole.

Question: is leaving out skip_if_exists on the centos parser calls within the RHEL parser intentional? If so, we need some comments as to why to help folks out in the future.

Add GitHub Security Advisory data for Swift

What would you like to be added:
Add support for pulling in GitHub Security Advisory data for Swift

Why is this needed:
Syft supports cataloging of swift packages but there is currently no vuln data being built into grype-db to match against in grype

Additional context:
Should hopefully just be a simple add to the mapping in:

ecosystem_map = {
    "COMPOSER": "composer",
    "GO": "go",
    "MAVEN": "java",
    "NPM": "npm",
    "NUGET": "nuget",
    "PIP": "python",
    "RUBYGEMS": "gem",
    "RUST": "rust",
}
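Assuming GitHub's advisory API exposes a SWIFT ecosystem value (as it does for the ecosystems already listed), the change would be a single new mapping entry; the target value "swift" is an assumption about what grype-db expects:

```python
# ecosystem_map with the proposed Swift entry added (illustrative).
ecosystem_map = {
    "COMPOSER": "composer",
    "GO": "go",
    "MAVEN": "java",
    "NPM": "npm",
    "NUGET": "nuget",
    "PIP": "python",
    "RUBYGEMS": "gem",
    "RUST": "rust",
    "SWIFT": "swift",  # proposed addition
}
```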

Add GitHub Security Advisory data for Erlang packages

What would you like to be added:
Add support for pulling in GitHub Security Advisory data for Erlang

Why is this needed:
Syft supports cataloging of Erlang packages but there is currently no vuln data being built into grype-db to match against in grype

Additional context:
Should hopefully just be a simple add to the mapping in:

ecosystem_map = {
    "COMPOSER": "composer",
    "GO": "go",
    "MAVEN": "java",
    "NPM": "npm",
    "NUGET": "nuget",
    "PIP": "python",
    "RUBYGEMS": "gem",
    "RUST": "rust",
}

Consume bitnami vulnerability data

What would you like to be added:
Bitnami has started publishing vulnerability data for their products directly to https://github.com/bitnami/vulndb following the OSV format. Because they also include SPDX SBOMs in their images, it might make sense to use the identifiers in those SBOMs to get much more accurate vulnerability matches for Bitnami products.

Trivy is currently making use of this dataset

Redhat `package_name` with `/` do not always reference modules

While looking into anchore/grype#1541 we found a secondary issue: we have a section of code in the rhel provider that I believe (after some debate/discovery with @willmurphyscode @nurmi @kzantow @tgerla ) is incorrectly inferring an RPM modularity from a value, specifically from package_name fields containing /:

...
 {
    "product_name" : "Red Hat Enterprise Linux 9",
    "fix_state" : "Affected",
    "package_name" : "inkscape:flatpak/python-lxml",
    "cpe" : "cpe:/o:redhat:enterprise_linux:9"
  },
...
full output
❯ cat data/vunnel/rhel/input/cve/full/CVE-2022-2309
{
  "threat_severity" : "Moderate",
  "public_date" : "2022-07-05T00:00:00Z",
  "bugzilla" : {
    "description" : "CVE-2022-2309 lxml: NULL Pointer Dereference in lxml",
    "id" : "2107571",
    "url" : "https://bugzilla.redhat.com/show_bug.cgi?id=2107571"
  },
  "cvss3" : {
    "cvss3_base_score" : "7.5",
    "cvss3_scoring_vector" : "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H",
    "status" : "verified"
  },
  "cwe" : "CWE-476",
  "details" : [ "NULL Pointer Dereference allows attackers to cause a denial of service (or application crash). This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14. libxml2 2.9.9 and earlier are not affected. It allows triggering crashes through forged input data, given a vulnerable code sequence in the application. The vulnerability is caused by the iterwalk function (also used by the canonicalize function). Such code shouldn't be in wide-spread use, given that parsing + iterwalk would usually be replaced with the more efficient iterparse function. However, an XML converter that serialises to C14N would also be vulnerable, for example, and there are legitimate use cases for this code sequence. If untrusted input is received (also remotely) and processed via iterwalk function, a crash can be triggered.", "A NULL Pointer dereference vulnerability found in lxml, caused by the iterwalk function (also used by the canonicalize function). This flaw can lead to a crash when the incorrect parser input occurs together with usages." ],
  "affected_release" : [ {
    "product_name" : "Red Hat Enterprise Linux 9",
    "release_date" : "2022-11-15T00:00:00Z",
    "advisory" : "RHSA-2022:8226",
    "cpe" : "cpe:/a:redhat:enterprise_linux:9",
    "package" : "python-lxml-0:4.6.5-3.el9"
  } ],
  "package_state" : [ {
    "product_name" : "Red Hat Enterprise Linux 8",
    "fix_state" : "Not affected",
    "package_name" : "python38:3.8/python-lxml",
    "cpe" : "cpe:/o:redhat:enterprise_linux:8"
  }, {
    "product_name" : "Red Hat Enterprise Linux 8",
    "fix_state" : "Not affected",
    "package_name" : "python39:3.9/python-lxml",
    "cpe" : "cpe:/o:redhat:enterprise_linux:8"
  }, {
    "product_name" : "Red Hat Enterprise Linux 8",
    "fix_state" : "Not affected",
    "package_name" : "python-lxml",
    "cpe" : "cpe:/o:redhat:enterprise_linux:8"
  }, {
    "product_name" : "Red Hat Enterprise Linux 9",
    "fix_state" : "Affected",
    "package_name" : "inkscape:flatpak/python-lxml",
    "cpe" : "cpe:/o:redhat:enterprise_linux:9"
  }, {
    "product_name" : "Red Hat Software Collections",
    "fix_state" : "Not affected",
    "package_name" : "rh-python38-python-lxml",
    "cpe" : "cpe:/a:redhat:rhel_software_collections:3"
  } ],
  "references" : [ "https://www.cve.org/CVERecord?id=CVE-2022-2309\nhttps://nvd.nist.gov/vuln/detail/CVE-2022-2309" ],
  "name" : "CVE-2022-2309",
  "csaw" : false
}

Specifically, in inkscape:flatpak/python-lxml I think the value inkscape:flatpak is being parsed as the RPM modularity:

sqlite> select id,package_name,package_qualifiers,version_constraint from vulnerability where id == "CVE-2022-2309" and namespace == "redhat:distro:redhat:9";
id             package_name  package_qualifiers                                       version_constraint
-------------  ------------  -------------------------------------------------------  ------------------
CVE-2022-2309  python-lxml   [{"kind":"rpm-modularity","module":"inkscape:flatpak"}]
CVE-2022-2309  python-lxml                                                            < 0:4.6.5-3.el9

(note the first row).

I cannot find any reference to python-lxml that is part of an appstream named this, however, the indicator :flatpak might indicate that this is referring to a python-lxml package within the contents of the inkscape flatpak image.

This means that the row in the DB is wrong (it should not exist). My vote would be to drop any entry where the modularity value is *:flatpak.

CC @westonsteimel
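The proposed handling could be sketched like this (the function is hypothetical; the field values follow the Red Hat CVE data shown above):

```python
# Split "module:stream/name" into (module, name); drop flatpak references,
# which refer to the contents of a flatpak image rather than an RPM module.
def parse_modular_package(package_name):
    if "/" not in package_name:
        return None, package_name
    module, _, name = package_name.partition("/")
    if module.endswith(":flatpak"):
        # e.g. "inkscape:flatpak/python-lxml" -- skip the entry entirely
        return None, None
    return module, name

assert parse_modular_package("python39:3.9/python-lxml") == ("python39:3.9", "python-lxml")
assert parse_modular_package("inkscape:flatpak/python-lxml") == (None, None)
assert parse_modular_package("python-lxml") == (None, "python-lxml")
```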

Quality gate should use `expected_namespaces` to filter results

Today we filter the results of the grype scans down to what namespaces are in the subject database. A more robust way to do this would be to filter down to what is expected (which is available in the config.yaml file for each provider). Additionally this would allow for a narrower measure of what is under test. Today it's not as ideal since the github provider needs to have alpine results be accurate, however, we only need to measure language-specific results... which means we are not as sensitive as we could be to possible changes to the specific providers under test, which is not great.

Vunnel should retain "not applicable" items so that grype can use them as negative evidence

What would you like to be added:

Right now, at least 2 vunnel providers (RHEL and Mariner) simply drop vulnerabilities that the feed considers "not applicable". Instead, we should keep them in the database with fixed status "not applicable" and the version constraint < 0.

Why is this needed:

When matching, in order to avoid false positives, grype should be able to consider the explicit claim by the feed operators that a given package is not vulnerable as evidence that it is not vulnerable. Right now, a distro feed being silent on a given CVE, and a distro feed explicitly reporting that the CVE is not applicable to their package, both result in having no row in the grype database for that CVE/namespace/package. But the explicit claim by the feed operators that a given package is not vulnerable is valuable evidence and should be retained.

More details at anchore/grype#1426 for the reason grype should have access to negative matches.

Additional context:

Mariner provider dropping N/A matches:

if d.metadata and d.metadata.patchable and d.metadata.patchable in IGNORED_PATCHABLE_VALUES:
    continue

RHEL provider dropping "Not affected" matches:

elif state in [
    "New",
    "Not affected",
    "Under investigation",
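The proposed change could be sketched like this (the record shape and function are illustrative, not vunnel's actual output format): instead of skipping "not applicable" states, emit an entry with an unsatisfiable version constraint so grype can use it as negative evidence.

```python
NOT_APPLICABLE_STATES = {"Not affected", "Not Applicable"}

# Hypothetical record builder: retained not-applicable entries carry the
# constraint "< 0", which no installed version can satisfy, so the row
# acts purely as an explicit "not vulnerable" claim.
def to_fixed_in(package, state, fixed_version=None):
    if state in NOT_APPLICABLE_STATES:
        return {"Name": package, "Version": "< 0", "State": "not-applicable"}
    return {"Name": package, "Version": fixed_version or "None", "State": state}

entry = to_fixed_in("glibc", "Not affected")
assert entry["Version"] == "< 0"
```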

ubuntu provider git url should be configurable

What would you like to be added:
The url for the ubuntu-cve-tracker repo used by the ubuntu provider needs to be made configurable. This would allow specifying a mirrored data source, or opting into https://git.launchpad.net/ubuntu-cve-tracker even though it is less reliable than git://git.launchpad.net/ubuntu-cve-tracker. The default will remain the unencrypted git:// protocol endpoint if nothing else is specified.

Why is this needed:
Longer-term it likely makes sense for us to use automation to mirror this repository into github so that consumers pulling data won't be subjected to the flakiness of the ubuntu launchpad hosted repo. Currently using the unencrypted git protocol is much more reliable than using the https endpoint; however, there will likely be pushback around using that protocol since there are some security concerns around it (GitHub disabled its use entirely)

Add provider label features

What would you like to be added:
A labeling system that allows end users to select providers to run by capability or description. Something like:

Select providers that have a label describing a capability:

$ vunnel list --label linux
alpine
rhel
centos
ubuntu
debian
wolfi
...

Signal to downstream consumers the state of a provider (labels with values):

$ vunnel list --label stable=true
alpine
amazon
oracle
...

Why is this needed:
This allows for automatic selection (and exclusion) of vunnel providers by consumers (like grype-db) without needing to update the grype-db exclusion list when new providers are added.

chore: replace frequently used dict literals with data classes

For example:

vulnerability_element = {
    "Vulnerability": {
        "Severity": None,
        "NamespaceName": None,
        "FixedIn": [],
        "Link": None,
        "Description": "",
        "Metadata": {},
        "Name": None,
        "CVSS": [],
    },
}
gets copy.deepcopy-ed around a lot, as a template for a dict literal. The code would be cleaner and have more useful type hinting if this were replaced with a data class.

Examples of copying

v = copy.deepcopy(vulnerability_element)
ns_name = config.ns_format.format(re.search(config.platform_version_pattern, platform_element.text).group(1))
v["Vulnerability"]["NamespaceName"] = ns_name
v["Vulnerability"]["Severity"] = severity or ""
v["Vulnerability"]["Metadata"] = (
    {"Issued": issued, "Updated": updated, "RefId": ref_id} if updated else {"Issued": issued, "RefId": ref_id}
)
v["Vulnerability"]["Name"] = name
v["Vulnerability"]["Link"] = link
v["Vulnerability"]["Description"] = description

and

vuln_dict[vid] = copy.deepcopy(vulnerability.vulnerability_element)
vuln_record = vuln_dict[vid]
reference_links = vulnerability.build_reference_links(vid)
# populate the static information about the new vuln record
vuln_record["Vulnerability"]["Name"] = str(vid)
vuln_record["Vulnerability"]["NamespaceName"] = self.namespace + ":" + str(release)
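The suggested refactor could look like this (field names mirror the dict template above; the class itself is a sketch, not vunnel's actual code). Each record becomes a fresh, type-hinted instance instead of a deepcopy of a shared dict:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class Vulnerability:
    Name: Optional[str] = None
    NamespaceName: Optional[str] = None
    Severity: Optional[str] = None
    Link: Optional[str] = None
    Description: str = ""
    FixedIn: list = field(default_factory=list)
    Metadata: dict = field(default_factory=dict)
    CVSS: list = field(default_factory=list)

# construction replaces copy.deepcopy(vulnerability_element) + key assignment
v = Vulnerability(Name="CVE-2016-2781", NamespaceName="wolfi:rolling", Severity="Medium")
assert asdict(v)["FixedIn"] == []  # mutable defaults are per-instance
```

asdict can recover the original dict shape at serialization time, so the on-disk output need not change.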

There should be an easier way to add test cases for providers

When working on providers, it's common to add test cases that are made essentially by subsetting flat files that carry vulnerability data.

For example, trying to test #650, it would be nice to quickly change this file to also include the definition, rpminfo_tests, states, and objects for CVE-2016-5440. However, the file that contains this vulnerability definition is, as of this writing, 2681586 lines of XML. Many text editors I've tried have crashed when opening it, and there doesn't appear to be a tool as high quality as jq for doing stream transformations of the XML.

I think the right approach is probably to write a utility that accepts an OVAL XML file and a list of CVEs and returns the subset of the OVAL XML file that is relevant to those CVEs. It's possible such a tool exists.

Having such a script would make it trivial to add unit tests to PRs that fix a class of incorrect parsing, and would therefore increase the rate at which we can improve vunnel data.
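Such a utility could be sketched with the standard library's streaming XML parser, which avoids loading the multi-million-line file into memory. This is a rough sketch: the element matching is simplified, and a real tool would also have to pull in the tests, objects, and states each definition references.

```python
import io
import xml.etree.ElementTree as ET

# Stream through an OVAL file and keep only <definition> elements that
# reference one of the wanted CVE ids; discard everything else as we go.
def definitions_for_cves(xml_stream, wanted_cves):
    keep = []
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag.endswith("definition"):
            refs = {r.get("ref_id") for r in elem.iter() if r.tag.endswith("reference")}
            if refs & wanted_cves:
                keep.append(elem)
            else:
                elem.clear()  # free memory for definitions we don't need
    return keep

sample = io.BytesIO(b"""<defs>
  <definition id="d1"><reference ref_id="CVE-2016-5440"/></definition>
  <definition id="d2"><reference ref_id="CVE-2020-0001"/></definition>
</defs>""")
subset = definitions_for_cves(sample, {"CVE-2016-5440"})
assert [d.get("id") for d in subset] == ["d1"]
```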

Refactor provider names to be more consistent

Today we have names like rhel where el stands for "enterprise linux", however we additionally have "oracle" which is short for "oracle linux". This is inconsistent and should be normalized before release.

Other names that appear inconsistent: "amazon"

Add descriptions to amazonlinux vulnerability entries

What would you like to be added:
Today, vulnerabilities from the amazon provider do not contain a description.
Since we already pull the HTML page for each vulnerability, is it possible to extract the description so it can be surfaced in grype?
Why is this needed:
Add descriptions for ALAS vulnerabilities.
Additional context:

Improve error handling of deterministic minor errors

Background:

Every now and then we see 403s from ALAS issues (e.g. as of this writing, https://alas.aws.amazon.com/AL2/ALAS-2024-2510.html returns 403).

Right now, this causes the entire operation of vunnel run -p amazon to exit non-zero, which might not be the behavior we want. Concretely, the first exception raised during the provider run halts the execution. HTTP GETs are retried 5 times, but this 403 is deterministic, so the retries don't help.

What would you like to be added:

We should be able to configure some continue-on-error semantics for vunnel; right now it's too all-or-nothing. For example, I should be able to write down, "provider X claims that there's a vulnerability we should download from example.com/some-cve, which is unreachable. Ignore this specific error." Or maybe "if you have fewer than 5 records that couldn't be retrieved, still consider the run successful."
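As a sketch only (none of these keys exist in vunnel today), such a policy might look something like this in the application config:

```yaml
# .vunnel.yaml -- hypothetical keys, shown only to illustrate the idea
providers:
  amazon:
    on_error:
      ignore_urls:
        - https://alas.aws.amazon.com/AL2/ALAS-2024-2510.html
      max_failed_records: 5   # still consider the run successful below this
```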

This would allow us to better balance the competing priorities of "use yesterday's data instead of bad data," and "old data is bad."

Additional context:

Example failure: https://github.com/anchore/grype-db/actions/runs/8730962418/job/23970839142#step:6:1440

Investigate using the OVAL data for ubuntu provider

Today we parse the CVE information for ubuntu distributions from git://git.launchpad.net/ubuntu-cve-tracker . This is probably correct for unsupported distro versions, but for supported versions we should be leveraging the OVAL data at https://security-metadata.canonical.com/oval/ . Searching through the git history to merge record changes is slow (hours with the current implementation), so finding ways to improve this section of the code, or eliminate the need altogether, would be ideal.

More investigation is needed to understand:

  • where the bottlenecks in the current implementation are today
  • whether the hot spots can be refactored to alleviate time and resource pains
  • what the OVAL data has (or doesn't have) over the current git cve-tracker repo

Avoid scraping HTML in the amazon provider

Today the amazon provider, ported from enterprise, scrapes the published HTML from https://alas.aws.amazon.com/ . However, this can be improved:

# get the release versions...
$ curl -O https://al2022-repos-us-west-2-9761ab97.s3.dualstack.us-west-2.amazonaws.com/core/releasemd.xml

# from this file you have versions like "2022.0.20221207"... use this to look up the mirror list
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/mirrors/2022.0.20221207/x86_64/mirror.list

# the mirror list contains the URL to get the "repomd" index
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/guids/581859ea114d36f96a58435ad4169541fe3fccb88e0130c85b3ed542a34171a2/x86_64/repodata/repomd.xml

# that index contains the checksums and location for "updateinfo" (contains vulnerability info)
$ curl -O https://al2022-repos-us-east-1-9761ab97.s3.dualstack.us-east-1.amazonaws.com/core/guids/581859ea114d36f96a58435ad4169541fe3fccb88e0130c85b3ed542a34171a2/x86_64/repodata/updateinfo.xml.gz

$ gunzip updateinfo.xml.gz
$ head updateinfo.xml
<?xml version="1.0" ?>
<updates><update status="final" version="1.4" author="[email protected]" type="security" from="[email protected]"><id>ALAS2022-2021-001</id><title>Amazon Linux 2022 - ALAS2022-2021-001: Medium priority package update for vim</title><issued date="2021-10-26 02:25" /><updated date="2021-10-27 00:24" /><severity>Medium</severity><description>Package updates are available for Amazon Linux 2022 that fix the following vulnerabilities:
CVE-2021-3875:
        There's an out-of-bounds read flaw in Vim's ex_docmd.c. An attacker who is capable of tricking a user into opening a specially crafted file could trigger an out-of-bounds read on a memmove operation, potentially causing an impact to application availability.
2014661: CVE-2021-3875 vim: heap-based buffer overflow
...
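Once fetched, the updateinfo document is plain XML and straightforward to parse; a rough sketch (element and attribute names taken from the sample above):

```python
# Sketch: pull ALAS entries out of an updateinfo.xml document.
# Field names mirror the sample shown above.
import xml.etree.ElementTree as ET

def parse_updateinfo(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    entries = []
    for update in root.iter("update"):
        entries.append({
            "id": update.findtext("id"),
            "severity": update.findtext("severity"),
            "issued": update.find("issued").get("date"),
            "description": update.findtext("description"),
        })
    return entries
```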

Disallow all bare try-except clauses

Today we have several noqa: E722 statements as well as ignoring blank noqa statements (here and here). We should:

  • not allow bare noqa statements, as this can hide a lot of issues
  • not allow any E722 exceptions, as this is a popular antipattern in this codebase and has led to production regressions

Additionally, all except: blocks should log the exception and some useful context, to make feed failures easier to diagnose from logs.
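The replacement pattern is roughly as follows (fetch_record and the session object are illustrative, not existing vunnel code):

```python
# Sketch: a narrow catch that records context, instead of a bare `except:`.
import logging

logger = logging.getLogger(__name__)

def fetch_record(url, session):
    try:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        return resp.json()
    except Exception:
        # logger.exception includes the traceback, so feed failures
        # can be diagnosed from logs alone
        logger.exception("failed fetching record from %s", url)
        raise
```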

Add GitHub Security Advisory data for Dart packages

What would you like to be added:
Add support for pulling in GitHub Security Advisory data for Dart

Why is this needed:
Syft supports cataloging of Dart packages but there is currently no vuln data being built into grype-db to match against in grype

Additional context:
Should hopefully just be a simple add to the mapping in:

ecosystem_map = {
    "COMPOSER": "composer",
    "GO": "go",
    "MAVEN": "java",
    "NPM": "npm",
    "NUGET": "nuget",
    "PIP": "python",
    "RUBYGEMS": "gem",
    "RUST": "rust",
}
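If I'm reading GitHub's GraphQL schema correctly, the SecurityAdvisoryEcosystem enum value for Dart is PUB (the pub.dev package manager); the grype-side name "dart" is my assumption. The change would look roughly like:

```python
# Hypothetical sketch of the mapping with Dart added; "PUB" is assumed to
# be the GraphQL enum value and "dart" the downstream ecosystem name.
ecosystem_map = {
    "COMPOSER": "composer",
    "GO": "go",
    "MAVEN": "java",
    "NPM": "npm",
    "NUGET": "nuget",
    "PIP": "python",
    "PUB": "dart",  # hypothetical addition for Dart advisories
    "RUBYGEMS": "gem",
    "RUST": "rust",
}
```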

github: persist all of the reference links in the final result

What would you like to be added:

The vunnel provider pulls all of the advisory reference links from the GitHub GraphQL API, but does not persist them in the final vunnel result shape.

Why is this needed:
Additional metadata that is useful when triaging a specific finding.

Port remaining feed drivers from enterprise

Port the feed service drivers from enterprise to vunnel:

  • nvd #1
  • ghsa #7
  • alpine #6
  • centos (part of initial concept commit)
  • amazon #12
  • debian #13
  • oracle #15
  • rhel #19
  • sles #20
  • ubuntu #22
  • wolfi #21

Remaining tasks:

Tempting refactors:

  • rewrite the NVD provider: #9 #11 (implemented in #27)
  • rewrite the ubuntu provider: #24 (refactored based on findings mentioned in #24 (comment))
  • split parsers into download and process steps: #26 (this can be deferred)

Items to capture as issues:

  • tech debt: remove all nosec linter ignore comments from ported providers
  • tech debt: remove all flake8: noqa (full file) linter ignore comments from ported providers

Change provider interface to separate "download" and "process" steps

When doing the initial port of the drivers, most supported a single entrypoint (fetch or refresh), which was rolled into the provider.Provider base class as update().

It would be better to split the provider's operations into download() and process(), enabling better local development and giving assurance that processing results will not incur network calls.
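A rough sketch of the split interface (hypothetical; the actual base class exposes only update() today):

```python
# Sketch of the proposed download()/process() split. update() is kept as
# the composition of the two, preserving the current single entrypoint.
from abc import ABC, abstractmethod

class Provider(ABC):
    @abstractmethod
    def download(self) -> None:
        """Fetch raw upstream data into the workspace (network allowed)."""

    @abstractmethod
    def process(self):
        """Transform already-downloaded data; must not touch the network."""

    def update(self):
        self.download()
        return self.process()
```

With this shape, process() can be exercised in tests against checked-in fixture data without any network access.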
