Current Behavior When uploading a BOM, components are merged based

Use stricter identity comparison when merging components

mykter commented on August 25, 2024

Use stricter identity comparison when merging components

from dependency-track.

Comments (5)

nscuro commented on August 25, 2024

Valid request. And it will be even more valid once we support additional metadata such as occurrences.

The identity-based de-duplication has always been there, but I think with the recent refactoring of BOM processing, as well as the introduction of component property support, it's now more obvious.

De-duplication is a major concern for users who merge multiple BOMs prior to upload - most merge tools don't pay attention to duplicates, so it's up to DT to resolve that. There are also BOM generators out there that will produce duplicate component records for monorepos, or multi-module projects.

That being said, even in those cases, I'd expect properties outside of the core identity to match as well. So I'm inclined to say we should be able to just switch to full equality and be done with it.
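To make the difference concrete, here is a minimal sketch of the two de-duplication strategies. It is not Dependency-Track's actual code: the `Component` class, the `identity_key` helper, and the choice of identity fields (purl, cpe, swid, name, version) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    # Hypothetical, simplified component model (not DT's real one).
    name: str
    version: str
    purl: Optional[str] = None
    cpe: Optional[str] = None
    swid: Optional[str] = None
    properties: frozenset = frozenset()  # set of (name, value) pairs


def identity_key(c):
    # Assumed "core identity"; the exact fields DT compares may differ.
    return (c.purl, c.cpe, c.swid, c.name, c.version)


def dedupe(components, key):
    # Keep the first component seen for each key, drop the rest.
    seen = {}
    for c in components:
        seen.setdefault(key(c), c)
    return list(seen.values())


purl = "pkg:maven/acme/lib@1.0"
bom = [
    Component("lib", "1.0", purl, properties=frozenset({("module", "api")})),
    Component("lib", "1.0", purl, properties=frozenset({("module", "web")})),
    Component("lib", "1.0", purl, properties=frozenset({("module", "api")})),
]

by_identity = dedupe(bom, identity_key)   # properties are ignored
by_equality = dedupe(bom, lambda c: c)    # every field must match

print(len(by_identity), len(by_equality))
```

With identity-based merging the three records collapse into one and the `module=web` variant is lost. With full equality only the exact duplicate is dropped.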

If we need to support both behaviours, we could make it a flag in the BOM upload request, defaulting to identity-based de-duplication.

What could be problematic are BOM generators that yield non-reproducible outputs, for example by putting timestamps or other dynamic data in properties. In that case you'll get lots of churn whenever you re-upload BOMs to existing projects.

mykter commented on August 25, 2024

most merge tools don't pay attention to duplicates, so it's up to DT to resolve that. There are also BOM generators out there that will produce duplicate component records for monorepos, or multi-module projects.

I think there's a reasonable argument that it's up to the BOM producers to resolve that, not DT. Being able to say the BOM is the source of truth is a powerful simplifier, both for users and developers.

So I'm inclined to say we should be able to just switch to full equality and be done with it.

Sounds good! Would you be open to PRs to implement this? Would we need it behind an experimental flag?

nscuro commented on August 25, 2024

Would you be open to PRs to implement this?

Most certainly.

Would we need it behind an experimental flag?

I think that would be good.

We can still decide to remove the flag later if we deem it unnecessary, but initially we should assume that there will be noticeable differences that users will need to "opt in" to.

mykter commented on August 25, 2024

I've been thinking about this some more and came up with a potential problem. Let's say you upgrade your BOM generator, and it adds a new metadata field as a property. I don't think anyone would expect this to cause a problem, but if we were using strict component equality then every vulnerability and policy violation would disappear and be recreated afresh the first time this new BOM was uploaded, with no triage status or notes etc.
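A rough sketch of that failure mode, assuming triage state is looked up by whatever key is used to match components between uploads. The `component_key` and `carry_over_triage` helpers, the property name, and the state strings are all made up for illustration and are not taken from the Dependency-Track code base.

```python
def component_key(component, strict):
    # Assumed core identity; strict mode additionally compares all properties.
    identity = (component["purl"], component["name"], component["version"])
    if not strict:
        return identity
    props = tuple(sorted(component.get("properties", {}).items()))
    return identity + (props,)


def carry_over_triage(old_state, new_components, strict):
    # A component that matches an old key keeps its triage state.
    # Anything else is treated as brand new and starts untriaged.
    new_state = {}
    for c in new_components:
        key = component_key(c, strict)
        new_state[key] = old_state.get(key, "NOT_SET")
    return new_state


old = {
    "purl": "pkg:npm/left-pad@1.3.0",
    "name": "left-pad",
    "version": "1.3.0",
    "properties": {},
}
# Same component after a generator upgrade that adds one property.
new = dict(old, properties={"generator:scope": "runtime"})

results = {}
for strict in (False, True):
    previous = {component_key(old, strict): "NOT_AFFECTED"}
    results[strict] = list(carry_over_triage(previous, [new], strict).values())

print(results)
```

Under identity-based matching the earlier `NOT_AFFECTED` decision survives the upgrade. Under strict matching the component no longer matches its previous record, so the decision is gone.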

So on either extreme we have:

1. Use strict equality (as we discussed above): consumers need to deal with vulnerabilities and policy violations that get recreated on any change to the generated BOM
2. Use the existing identity based equality: consumers have to deal with not being able to represent multiple different instances of the same dependency in a BOM

Middle grounds I can think of:

3. Choose the behaviour on upload. In theory this allows the best of both worlds, as you could use strict equality most of the time, and identity based when your BOM changes in some way. In practice I don't think this is realistic - you can't be fiddling with automated BOM uploads for one-off activities, and you'd probably only notice you needed this behaviour when it was too late and all your vulnerabilities had been recreated.
4. Make the equality check configurable to some degree: start with the existing fields as a base (or perhaps a bigger default set?), select other fields that you want to include, and if a component doesn't match on all these fields then it's treated as distinct.
   - It could be configured in the BOM upload request, at the project level, or globally. These options get progressively simpler (good!) and less flexible (bad). Arguably the per-request option can be made to behave like the per-project one by the client: use a consistent equality definition whenever you're uploading to a project.

In addition to purl/cpe/swid/name/version, I can see deviations in fields like these warranting separate components:

- pedigree
- call stack
- named properties
- occurrences

Option 4 feels safer to me, whilst still meeting the need to be able to represent multiple instances of the same component. It is more complex and subtle though.
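As a sketch of what the configurable check could look like: a fixed base set of fields plus an opt-in list of extra ones. `BASE_FIELDS`, `make_key`, and `_freeze` are hypothetical names, and where the extra fields would be configured (request, project, or global) is left open.

```python
BASE_FIELDS = ("purl", "cpe", "swid", "name", "version")


def _freeze(value):
    # Turn nested dicts/lists into hashable, order-independent tuples.
    if isinstance(value, dict):
        return tuple(sorted((k, _freeze(v)) for k, v in value.items()))
    if isinstance(value, (list, set, tuple)):
        return tuple(sorted((_freeze(v) for v in value), key=repr))
    return value


def make_key(extra_fields=()):
    # Build a key function from the base fields plus any selected extras.
    fields = BASE_FIELDS + tuple(extra_fields)

    def key(component):
        return tuple(_freeze(component.get(f)) for f in fields)

    return key


patched = {
    "purl": "pkg:pypi/requests@2.31.0",
    "name": "requests",
    "version": "2.31.0",
    "pedigree": {"patches": ["fix-1"]},
}
# Same package without the pedigree information.
upstream = {k: v for k, v in patched.items() if k != "pedigree"}

default_key = make_key()
pedigree_key = make_key(["pedigree"])

print(default_key(patched) == default_key(upstream))    # same component
print(pedigree_key(patched) == pedigree_key(upstream))  # distinct components
```

A new property added by a generator upgrade would only split components if someone had explicitly opted that field into the key, which is what makes this safer than full equality.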

Are there other options I'm not thinking of?

nscuro commented on August 25, 2024

I think option 4 is going in the right direction: we need to find a minimal subset of component properties that reliably and uniquely identifies a component.

I'm not sure if giving too much choice to clients is a good idea though. Ideally we would identify one "approved" way of doing things and run with it. The more opportunities for variation we offer, the further people's experiences will drift apart. It will be challenging to support users if the de-duplication is too customizable, if that makes sense.

In addition to purl/cpe/swid/name/version, I can see deviations in fields like these warranting separate components:

- pedigree
- call stack
- named properties
- occurrences

We definitely need to consider hashes as well. Probably also licenses.

RE occurrences: Consider that across project versions, the same component can appear in different places, or additional occurrences can get added from one project version to the next. We wouldn't want the component to be recreated just because it is imported from more locations. Call stack may have similar semantics.
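One way to read this, sketched under assumptions: hashes take part in the identity, while occurrences are left out of it and unioned when records match. The `identity` and `merge` helpers are hypothetical, the hash value is a placeholder, and merging occurrences by union is my guess at the desired semantics rather than anything decided here.

```python
def identity(component):
    # Hashes are part of the identity; occurrences are deliberately excluded.
    hashes = tuple(sorted(component.get("hashes", {}).items()))
    return (component.get("purl"), component["name"], component["version"], hashes)


def merge(components):
    # Records with the same identity collapse into one whose occurrences
    # are the union of all locations seen.
    merged = {}
    for c in components:
        key = identity(c)
        occurrences = set(c.get("occurrences", ()))
        if key in merged:
            merged[key]["occurrences"] |= occurrences
        else:
            merged[key] = dict(c, occurrences=occurrences)
    return list(merged.values())


v1 = {
    "purl": "pkg:pypi/urllib3@2.2.1",
    "name": "urllib3",
    "version": "2.2.1",
    "hashes": {"SHA-256": "abc123"},  # placeholder value
    "occurrences": ["src/a.py"],
}
# Next project version: same component, imported from one more location.
v2 = dict(v1, occurrences=["src/a.py", "src/b.py"])

result = merge([v1, v2])
print(identity(v1) == identity(v2), sorted(result[0]["occurrences"]))
```

The component keeps its identity when a new location shows up, so nothing attached to it would need to be recreated.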
