package-url / purl-spec

A minimal specification for purl aka. a package "mostly universal" URL, join the discussion at https://gitter.im/package-url/Lobby

Home Page: https://github.com/package-url/purl-spec

License: Other

purl package-url package url cyclonedx dependencies package-management sbom spdx

purl-spec's People

pombredanne jonoyang chinyeungli brianf bhamail haikoschol jdillon isimluk mlinksva gsmall7 kartiksibal jeremylong theteebox siemens lirantal sa-tasche bradcupit voltone codeshane infamous19 sbs2001 tg1999 hboutemy psychohippo athos-ribeiro sthagen david-a-wheeler rlneumiller gaybro8777 josiahparry prebuilder tevoinea magnusbaeck mrandreastoth hixio-mh j-s-3 mattt mealingr vbarbaros stevelasker aidandelaney arawutp rnjudge nishakm masireddy65 nexb jhutchings1 cdupuis khuey coderpatros shibumi masahiro331 jgse captn3m0 brphelps surfndez gfs ericlarssen-wf jocluby jpinz foxboron samkenxstream westonsteimel fishseabowl jsteinhofff marklodato aetotvl6789 gorkalertxundi secure-sdl maxhbr cpendery puerco bluesentinelsec jkowalleck xiphoseer taoxinyi juxtin mkoch jlb-bb oliverchang sify21 tiegz briandealwis mprpic maitre-matt ninoseki lf32 hishamhm williamboman bman98 yessadik2 oatovar richardfan0606 allen7lee qpc-github ikesyo jasnow mrdvt92 jshmrtn sporho

purl-spec's Issues

Spec and Maven Rebuilds

A few questions:

Does the specification allow for specifying the location of the source repository? What happens if the source repository has been cloned from the original upstream and built from the clone?
Does the specification handle the scenario where the upstream version has been rebuilt and reversioned (as is common in the case of Maven builds being released to a company repository)? Which version should be specified: the upstream version or the released version?

Minor typo in qualifiers section

Following line of the qualifiers section reads as follows

- For each pair of `key` = `value`:
    ...
    - A `value` must be must be a percent-encoded string

Looks like the second must be a typo?

Rename docker with ociimage

Docker images are standardized under the OCI image specification. https://github.com/opencontainers/image-spec
Docker is only one vendor that supports the image spec.

qualifiers sorting

The spec reads

If the qualifiers are not empty and not composed only of key/value pairs where the value is empty:
...
 * sort this list of qualifier strings lexicographically
...

Is the "qualifier strings" the entire key/value pair or just the key?

More importantly, though, why is it important to sort this at all? Sorting implies an ordering, which in this generic aspect of a purl for qualifiers is erroneous. I can't think of any case where the qualifier order would be important, and if it were important, the importance would be based on the initial order given, and the implementation of that order would be highly package-type specific.

It seems to me that this aspect of the spec should be removed and replaced either by a statement that qualifiers are non-ordered, or by one that they are ordered and the order should be retained from the original input purl. I'd suggest, though, that order here isn't meant to be important, so the directive to sort makes this a bit problematic and implies that the qualifiers are indeed meant to be ordered.

Since this aspect of a purl is so very format dependent, purl imposing this rule on how to render is dangerous, as you could have formats that actually do need ordering here and others that don't care. I'd suggest formats shouldn't care about order here, but the fact that the spec requires order implies it's an ordered structure, and I think that is generally bad and at the very least misleading.
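
To make the first question concrete, here is a minimal Python sketch of the two possible readings of "sort this list of qualifier strings". The helper names and the `arch` / `arch2` qualifier keys are hypothetical, picked only because they sort differently under the two readings; none of this is taken from the spec or a reference implementation.

```python
def sort_joined_pairs(qualifiers):
    """Reading 1: build each 'key=value' string first, then sort the strings."""
    return sorted(f"{key}={value}" for key, value in qualifiers.items())


def sort_by_key(qualifiers):
    """Reading 2: sort by key only, then build the 'key=value' strings."""
    return [f"{key}={value}" for key, value in sorted(qualifiers.items())]


# Hypothetical qualifiers: '2' (0x32) sorts before '=' (0x3D), so the
# joined string "arch2=y" sorts before "arch=x86" even though the key
# "arch" sorts before the key "arch2".
qualifiers = {"arch": "x86", "arch2": "y"}

print(sort_joined_pairs(qualifiers))  # ['arch2=y', 'arch=x86']
print(sort_by_key(qualifiers))        # ['arch=x86', 'arch2=y']
```

The two readings only diverge when one key is a prefix of another and the next character sorts below "=", but that is enough for two implementations to disagree on the canonical string.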

What about packages on ftp?

There are projects that regularly publish their canonical releases on ftp.
A typical example would be something like ftp://ftp.gnu.org/gnu/gcc/gcc-7.2.0/gcc-7.2.0.tar.gz.
Are there plans for such a type? The spec currently says:

Special URL schemes [...] such as [...] ftp:// are NOT valid purl types

Inconsistency in the description of the checksum qualifier.

According to the spec:

checksum is a qualifier for one or more checksums stored as a comma-separated list. Each item in the value is in form of lowercase_algorithm:hex_encoded_lowercase_value

and an abbreviated example is given as checksum=sha1:ad9503c3e994a4f...

However, also according to the spec:

A [qualifier] value must be a percent-encoded string

And to build a purl string which has qualifiers, one must

create a string by joining the lowercased key, the equal '=' sign and the percent-encoded value to create a qualifier

In a percent-encoded string, the colon character, ':', is encoded as '%3A'. And in fact the reference Java implementation will encode the above as checksum=sha1%3Aad9503c3e994a4f...
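
The inconsistency can be reproduced with Python's standard library. This is only an illustrative sketch: the checksum value is the abbreviated one quoted above, not a real digest, and treating ":" as a safe character is just one possible way to get the unencoded form shown in the spec's example.

```python
from urllib.parse import quote

# Abbreviated value from the spec example (not a complete SHA-1 digest).
value = "sha1:ad9503c3e994a4f"

# Strict percent-encoding: ':' is a reserved character and gets encoded.
strict = "checksum=" + quote(value, safe="")

# Treating ':' as safe inside qualifier values keeps the spec's example form.
lenient = "checksum=" + quote(value, safe=":")

print(strict)   # checksum=sha1%3Aad9503c3e994a4f
print(lenient)  # checksum=sha1:ad9503c3e994a4f
```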

What do purls do about different artifacts for the same version?

This is somewhat related to #10 and #5.

Often, package maintainers re-release a new tarball for the same version and overwrite the old version at the same URL. Yes this is bad, but it happens in real life.

How should purls handle this? Should this just remain ambiguous? Should a (specific type of?) archive digest be included as a qualifier to disambiguate? e.g., should we standardize ?sha256=blahblah? Example: suppose the first release of foo version 1.0.0 has a vulnerability, but the developers push a new tarball after a few days without bumping the version number?

Seems like there should at least be a section in the spec for how this is handled (or not handled).

Note: there are other ways you could get different artifacts for a single version, e.g. some packages have different release archives for different platforms. I assume qualifiers will be used to disambiguate in that case.

Add Sonatype OSS as adopter

At https://ossindex.sonatype.org/doc/coordinates they clearly mention they use purl. By the way, this implies the whole Nexus platform is using purl in some way or another.

Definitely add this in our README

Parsing spec is missing the version

The parsing section makes no mention of the version. I assume there should be another right split on '@' prior to the left split on ':', i.e.

Split the remainder once from right on '?'...

Split the remainder once from right on '@'

The left side is the remainder

Percent-decode the right side

UTF-8-decode the version if needed in your programming language

This is the version

Split the remainder once from left on ':'...
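
The proposed version step could look roughly like the Python sketch below. It assumes the subpath and qualifiers have already been split off, and `split_version` is a hypothetical helper name, not something defined by the spec.

```python
from urllib.parse import unquote


def split_version(remainder):
    """Split the remainder once from the right on '@'.

    Returns (new_remainder, version); version is None when no '@' is present.
    """
    left, separator, right = remainder.rpartition("@")
    if not separator:
        # No '@' at all: the purl carries no version.
        return remainder, None
    # Percent-decode the right side; unquote() decodes as UTF-8 by default.
    return left, unquote(right)


print(split_version("npm:foo@1"))          # ('npm:foo', '1')
print(split_version("npm:foo"))            # ('npm:foo', None)
print(split_version("deb:foo@1.0%2B1"))    # ('deb:foo', '1.0+1')
```

Note that this relies on any '@' inside the namespace or name being percent-encoded; an unencoded '@' there would be mistaken for the version separator.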

Clarify contributing... and "governance"

Here is what I suggest:

add a simple DCO to the repo and document this in a CONTRIBUTING file, requesting sign-off in the good ole, time-tested Linux way that I am familiar with
clarify that the license of the spec and test data is a dedication to the public domain (as listed in the spec alright) OR alternatively available under the CC0 for jurisdictions that do not recognize PD dedications (e.g. anybody in Germany ;) )
clarify that tools are preferably licensed using the MIT license and should use the DCO too. This is what the Go and Python implementations are doing already. We also use a streamlined copyright notice to avoid overloading these with dates and names
eventually add an AUTHORS file or authors section to the repo for reference?

@andrew @ashcrow @R2wenD2 @sschuberth this should be easy peasy to agree on

We need a code of conduct too and I defer to @andrew on this.

Clarify the handling of version control system URLs

This has been reported in #1
The discussion started here: #1 (comment) by @iwillbar
And these:

#1 (comment)
#1 (comment) by @tgamblin

Purl spec icon

Hi,

We here at Sonatype are building out a new search.maven.org site. We have a BETA running on https://search.maven.org/beta/

Since the purl-spec caught our attention, we have added it to our list of available modules by providing a quick way to get a generated purl.

On our artifacts page we added it as shown below:

For a live demo, see: https://search.maven.org/beta/artifact/org.springframework.boot/spring-boot/2.0.1.RELEASE/jar

As you can see, for the other formats we have a recognizable icon indicating their association. We were wondering if you have anything like that to indicate the purl-spec as well.

Concerns with type-specific component value transformations

Howdy folks, I've been looking over this specification and it's pretty complete, but I have some concerns about the per-type component value transformations.

Specifically, I mean the various per-type rules under which the canonical form may need to be case-sensitive or case-insensitive, or may require translating characters (like "_" to "-").

For a generic spec, and for implementations that are meant to parse and form a package-url generically, such edge cases mean that any implementation would eventually become invalid, since it could not possibly encode the details of presently unspecified package types or of whatever new package systems are created in the future.

The docs for the pypi type state that pypi treats "-" and "_" the same, but require that "_" be translated into "-". This seems like an overcomplication if the underlying system would treat them the same.

The docs for the npm type state that the value must be lower-cased. While I understand the underlying npm system may require that, having to encode this detail into the package-url specification seems like it may lead to sustainability issues in the future. An implementation could encode this, but suppose some new format comes along, say a fictitious "upper" type for a fictitious package system where everything is always UPPER-CASE (and anything other than UPPER-CASE is not valid). It's not likely that existing package-url implementations would know about that type, and they would end up making invalid canonical string representations.

It seems almost like if you were to consider the URL specification, that the spec would treat path/query/fragment details different depending on the host:port part of the identifiers. Or similarly for URI spec that the scheme would indicate how you would transform the rest of the components. This would make for hugely complex implementations (which would probably be eventually if not already wrong). I feel like the package-url specification is already like that with these type specific transformation wrinkles.

I believe it would be simpler and more normal, to ignore case (except perhaps for type itself) and ignore content transformation (except for percent encoding). This would imply you could end up with:

npm:FOO@1

... which may not be proper with respect to the package expectation that the name is lower-cased. But that seems like it's an input problem and not really something that a generic specification to identify and generalize package identifiers should be concerned with.

npm:foo@1

... would be more correct in terms of how the NPM community has decided to normalize their identifiers, but in terms of the package-url specification, it seems like it really should not care. Since requiring such transformations is not reasonable (or even possible at present, with some formats needing lower case and some needing mixed case), it seems like the specification should be more general and support future formats without requiring any such transformations.
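
The concern can be sketched in a few lines of Python. The table below is a hypothetical illustration of how an implementation might hard-code the two per-type rules mentioned in this issue; `NORMALIZERS` and `normalize_name` are made-up names, and the "upper" type is the fictitious one from the text above.

```python
# Per-type name rules known to this (hypothetical) implementation.
NORMALIZERS = {
    "npm": str.lower,                             # npm names are lower-cased
    "pypi": lambda name: name.replace("_", "-"),  # pypi: "_" becomes "-"
}


def normalize_name(purl_type, name):
    """Apply the type-specific rule if one is known, else pass through."""
    rule = NORMALIZERS.get(purl_type)
    return rule(name) if rule else name


print(normalize_name("npm", "FOO"))      # foo
print(normalize_name("pypi", "my_pkg"))  # my-pkg

# An implementation released before the fictitious "upper" type existed has
# no rule for it, so a non-canonical name passes through unchanged.
print(normalize_name("upper", "foo"))    # foo
```

Every new type with its own rule requires updating every implementation's table, which is the sustainability problem described here.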

Go is called Go, not Golang

Since the language is called Go, even though its domain is golang.org, it would be nice (and short) if the package identifier for Go modules was "go" rather than "golang".

Default Maven Repository

Currently the default repository_url value for maven is listed as "repo1.maven.org". I would propose changing this to "http://repo.maven.apache.org/maven2" to match the current Super POM definition and also to specify the full URL (so the protocol and path are not ambiguous).

Purpose of distro qualifier for Debian packages?

I don't understand the purpose of the "distro" qualifier for Debian packages. If I'm not completely mistaken, Debian and derivatives use a shared package pool for all distribution releases, so e.g. jessie and stretch can share a completely identical package.

In turn, this means that namespace, name, version plus the arch qualifier fully describe a Debian package. So the distro qualifier should be optional.

This also means that the note "There is no default package repository:..." should state that the repository URL should be derived from the namespace, not the "distro" qualifier.

(In contrast, RPM-based distributions have distinct package repositories for each release, so the distro key is necessary there!)

Invite @stevespringett as a co-admin

@andrew
@ashcrow
@R2wenD2
@sschuberth
unless you have an objection, I am inviting @stevespringett as a co-owner of this org.
Please reply with a +1 or -1 comment.
Steve is part of OWASP, has contributed the Java implementation of the package URL, and uses purl in several projects.

Should all the qualifiers be type-specific?

The current spec defines purl qualifiers as:

qualifiers: extra qualifying data for a package such as an OS, architecture, a distro, etc. Optional and type-specific.

Should all of those be type-specific? Architectures and OS distro names/versions seem like things that should have standard names across different purls, and it may become very confusing to, say, query different purls if these are defined differently across them. Incidentally, we do include these types of things in Spack specs.

If there are standard/reserved qualifiers, how do the names become standardized?

JSON Schema for PURL?

Is there a JSON schema for PURL?

A well-defined JSON schema would be able to encode the rules and restrictions on each PURL component in a programmer-friendly way. This would enable the use of existing JSON schema validation and manipulation tools in multiple programming languages to create, validate or exchange PURL information. One may also embed or store PURL in a JSON document in a more programmatically accessible format (e.g. with CVE JSON data as on CVElist). This would also help in storing PURL information without the need to parse a PURL on every use (think SQL or NoSQL queries).

The test-suite-data.json seems to contain PURL components encoded as JSON, but is missing scheme, and may not need is_invalid.

  {
    "description": "valid maven purl",
    "purl": "pkg:maven/org.apache.commons/io@1.3.4",
    "canonical_purl": "pkg:maven/org.apache.commons/io@1.3.4",
    "type": "maven",
    "namespace": "org.apache.commons",
    "name": "io",
    "version": "1.3.4",
    "qualifiers": null,
    "subpath": null,
    "is_invalid": false
  }
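
As a rough illustration of what storing the components as JSON enables, the Python sketch below rebuilds a purl string from the component fields of the entry above. `build_purl` is a hypothetical helper; it deliberately skips percent-encoding, qualifiers and subpath, so it is only adequate for simple entries like this one.

```python
import json

# Component fields copied from the test-suite entry shown above.
entry = json.loads("""
{
  "type": "maven",
  "namespace": "org.apache.commons",
  "name": "io",
  "version": "1.3.4",
  "qualifiers": null,
  "subpath": null
}
""")


def build_purl(components):
    """Join type, namespace, name and version (no encoding, no qualifiers)."""
    purl = f"pkg:{components['type']}/"
    if components.get("namespace"):
        purl += components["namespace"] + "/"
    purl += components["name"]
    if components.get("version"):
        purl += "@" + components["version"]
    return purl


print(build_purl(entry))  # pkg:maven/org.apache.commons/io@1.3.4
```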

Propose 1.0 Milestone

There are many pull requests that need to be merged and unanswered questions among some of the issues.

The security industry is in the process of fully adopting PackageURL, with OWASP and Sonatype already supporting it and others joining. However, we need to come to an agreed-upon 1.0 release, and that means setting a target date and addressing some of the test suite issues and specification questions.

I'm open to having regularly scheduled calls (webex, etc) to sort some of this stuff out.

Update wiki when latest spec is merged

.... just a reminder

Clarifications

I have been going over the specification and I have a few minor things I was hoping to get clarification on.

The checksum qualifier doesn't have a formal restriction on the algorithm name, I'm assuming it should be one ASCII letter followed by any number of ASCII alphanumerics (possibly with the addition of a hyphen, though that seems like it could conflict with subresource integrity's use of "-" as a delimiter). Also, should the canonical form of checksum list be deduped or sorted?

When parsing, should "strip leading and trailing /" include runs of slashes or just a single slash? It seems minor, but with the introduction of the "pkg:" scheme, it means that the type could lead with a run. I wasn't sure if multiple slash removal was always necessary or not.

With character encoding, the Wikipedia article cites RFC 3986 in reference to the reserved and unreserved characters. I'm assuming that all reserved characters should be encoded (unless explicitly used as a purl delimiter) and none of the unreserved characters should be encoded. That is, when a reserved character has no special meaning to purl, it should still be encoded (e.g. the Maven GAV o'doyle:rules!:1.0 should be pkg:maven/o%27doyle/rules%21@1.0 and not pkg:maven/o'doyle/rules!@1.0). Also, would referencing RFC 3986 directly make more sense than the Wikipedia article? It seems like one is less of a moving target than the other.
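
The encoding assumption in the last point can be checked with Python's `urllib.parse.quote`, which leaves only letters, digits and `_.-~` unencoded when `safe=""` is passed. This is a sketch of that one interpretation (encode every reserved character inside a component), not a statement of what the spec requires.

```python
from urllib.parse import quote

# Maven GAV from the example above: o'doyle:rules!:1.0
group, artifact, version = "o'doyle", "rules!", "1.0"

# Encode each component on its own so the purl delimiters stay literal.
purl = "pkg:maven/{}/{}@{}".format(
    quote(group, safe=""),
    quote(artifact, safe=""),
    quote(version, safe=""),
)

print(purl)  # pkg:maven/o%27doyle/rules%21@1.0
```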

NuGet has no namespace

See NuGet/Home#6180 (comment) for details as reported by @anangaur

Authorities and private registries

Many companies run private deployments of package registries, and the names of packages in those registries are often both:

Entirely unrelated to the corresponding package with the same name in the well-known public registry
Only available at a private IP address

Suppose that I publish and consume private packages such as this. How would I mint PURLs that refer to these packages? If I can, how do I ensure they aren't confused with PURLs referring to the same package name in the public registry? If I can't, does that mean I can't ever use tools built on PURLs?

Version range

Is there desire for PURL to support version ranges or is that out of scope? For example, to describe vulnerable versions of a package.

clarify handling of "+" (plus) sign and blanks (spaces)

As "+" is frequently used within Debian package versions, I'd like to see a clarification (and probably examples/tests for it) whether this needs to be percent-encoded or not.

This also makes me wonder how blanks (" ") shall be encoded within the different parts of a purl - theoretically, I assume it should be "%20" in most cases? While I would consider "+" way more readable (as in usual HTML form encoding), it probably is just wrong according to RFC3986 if used anywhere else than in the qualifiers, right?
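
For illustration, this is how the two conventions behave in Python's standard library. The version and name strings are hypothetical examples; which convention a purl should follow is exactly the open question here.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

version = "1.2.3+dfsg-1"  # hypothetical Debian-style version
name = "my package"       # hypothetical name containing a blank

# RFC 3986 style percent-encoding: '+' -> %2B, ' ' -> %20.
print(quote(version, safe=""))  # 1.2.3%2Bdfsg-1
print(quote(name, safe=""))     # my%20package

# HTML form encoding: ' ' -> '+', so a literal '+' must still become %2B.
print(quote_plus(name))         # my+package
print(quote_plus(version))      # 1.2.3%2Bdfsg-1

# The ambiguity on decoding: an unencoded '+' survives unquote() but is
# turned into a blank by unquote_plus().
print(unquote(version))         # 1.2.3+dfsg-1
print(unquote_plus(version))    # 1.2.3 dfsg-1
```

If a producer leaves "+" unencoded and a consumer decodes with form-encoding rules, the version silently changes, which is why an explicit rule and test cases would help.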

What's the use of namespace?

Why can't each type of purl simply have a name that might (or might not) be hierarchical?

The resulting purl would be the same, e.g. github:package-url/purl-spec@244fd47e07d1004f0aed9c, but it would be decomposed into github : package-url/purl-spec @ 244fd47e07d1004f0aed9c.

What is the advantage of having namespaces?

We really really need a cool logo

We really really need a cool logo: what about some cat-like logo ... 😹 this sounds like purr and everybody loves them, and even if I am more of a dog person I melt when I see them

Based on #9
From @sschuberth and @R2wenD2

Percent encoding spec and : and /

The notes in the specification about percent encoding of ":" are a bit confusing:

the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. They may need to be encoded elsewhere
the ':' type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere

It seems like these two contradict each other. The former indicates that ":" may need to be encoded elsewhere. The latter indicates that ":" is "unambiguous unencoded everywhere".

Similarly the qualifier component documentation says:

A value must be must be a percent-encoded string

... and does not mention anything about "/". But the test-suite-data.json references canonical_purl representations like repository_url=repo.spring.io/release.

And the percent-encoding docs state:

the '/' used as namespace/name and subpath segments separator does not need to and must NOT be percent-encoded. It is unambiguous unencoded everywhere

My interpretation of this boils down to: "/" is never encoded, but the language in these parts of the specification is unclear. The name, namespace and subpath parts are clear with respect to "/", which leaves the qualifier and version bits vague as to whether "/" is supposed to be percent-encoded or not.

Distinction from Persistent URLs (PURLs)

Hi, I just came across this project and I like the idea to provide universal identifiers for software packages. However, I would like to express some concerns regarding the name PURL. This name is already used for persistent URLs (see archive.org, Wikipedia). I think it is not a good idea to add unambiguousness at one place by introducing new ambiguousness at another place without need.

Overlap with SWID

I know there is a FAQ on this, and some further comment on grafeas about flexibility of using SWID or purl, though the efforts here are very much redundant with SWID.
Red Hat has raised the concerns around the paywall, but that doesn't negate the likes of standards. And I have heard that this barrier is currently or soon to be much lower (like orders of magnitude lower).
Over the life of this project, adopters who are likely to face SWID requirements may end up with duplicate requirements.

Add repo for test suite

See the spec for details. This should be a JSON file with purl and parts to test parsing and building and idempotence.

Retrieval isn't defined

URLs, in contrast to URIs, locate resources in addition to identifying them, allowing clients to retrieve a representation of the resource. Furthermore, how to retrieve such a representation must be defined solely by the specification of the URI scheme and must be globally unambiguous, so that two spec-conforming implementations of the scheme can't have two inconsistent interpretations of what "retrieval" means.

Note that consistent interpretation of retrieval does not imply identical retrieved resources, as a scheme specification's definition of retrieval can define ways for such retrieval to depend on configuration (such as a client's choice of DNS resolver) and other external state.

If a goal of purl-spec is to define URLs instead of URNs (which I suspect is a more useful goal), then the spec MUST (in the sense of RFC 2119) define how representations of package resources are retrieved and that definition MUST be unambiguous. Otherwise, two implementations could provide entirely different package representations for the same client retrieving the same package with no clear answer as to which retrieved representation is "correct".

Reasons for github, bitbucket and generic types?

I was just looking over the spec again and noticed we have these non-type specific generic formats:

github
bitbucket
generic

These do not actually express what the package is and I think these types are an anti-pattern.

You could imagine that all other types could be expressed as these, but that does not help to indicate what the type of that thing is, only where it comes from as some opaque binary, or in the case of "generic" that it's just some named octet-stream. I think these are perversions of the intended nature of this specification and should be removed.

You could consider maybe you have a type that is http or https and then its really just a redirected URL representation as a PURL but that really doesn't help anything. The point of this spec is IIUC to identify packages (which have a specific known type, and some agreed upon coordinates). So these github, bitbucket and generic types are really useless and IMO harmful to the viability of the package-url specification.

Add cargo packages in spec

Document adopters and implementations

@stevespringett implemented Purl in his dependency-track, which is a package vulnerabilities tracker.
I think this is an awesome use case.
https://github.com/search?l=&q=purl+user%3Astevespringett&ref=advsearch&type=Code&utf8=%E2%9C%93

We should have a page or doc of sorts that showcases adopters and users!
For now some of this is in the spec alright, but it should eventually be moved out of it and made more prominent.

Process suggestion

The single PR with comments is becoming hard to follow. I'd suggest merging the current PR as a version 0.1 of the spec. We can then use issues and pull requests so we can have one artifact per issue rather than trying to track it all in a single artifact.

Link relations?

PURLs seem to imply not just one resource, but a family of related resources:

Package name - some name in some package registry of some package ecosystem
Package version - some specific version of a package name, defined according to some sort of versioning system
Package registry - some namespace of packages within some package ecosystem, typically with the authority to handle requests for information about and installation of those packages
Package ecosystem - some existing system and protocol for depending on, publishing, and installing packages, such as "all Python packages installable with pip"
Package versioning system - some system for minting structured version identifiers that allows clients to reason about the relationships between different versions in the same system (e.g. semver, git commits, docker registry labels, etc)
Package variants - some alternate form of a specific package version which clients with specific needs may opt-in to

It seems within the PURL spec's scope to define these terms and standardize ways for a package registry to say something like "these are my supported versioning systems". This can be done by defining RFC 8288 Link Relations, and indeed that's the preferred means of doing so. There are many standardized ways in existing protocols for resources to declare relations to other resources dynamically (e.g. the HTTP Link header), so you'd be able to leverage a lot more existing work while making a simpler and more flexible spec.

RPM considerations, and maybe for others as well

It looks like some of the media references include a checksum of the final package or similar.

One thing that can be an issue for RPMs when determining an advisory is the provenance of the sources and dependencies used in producing that binary.
e.g. I could have a malicious gcc or glibc-static present at the time of building rpm:fedora/[email protected]?arch=i386&distro=fedora-25 but the rpm attributes of that package do not change.

The added difficulty of this is that for packages like RPM, the signature of the RPM may be embedded in the archive itself, so simply a checksum of the *.rpm is not sufficient, nor only a checksum of the cpio stream contained within as metadata like xattrs is separated from that cpio stream.

There is something in having a way to cite the identity of the author of that build, and further a way to cite the context(s) in which the package was built. In this way it could be possible to cobble together a URI referencing the specific build of the package in question.

Should the purl scheme/type be prefixed with `purl+`?

From #1 (comment) :

While reviewing this IANA list of registered URI schemes https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml two thoughts came to my crooked mind:

we should reference it in the spec here https://github.com/package-url/purl-spec/pull/1/files#diff-88b99bb28683bd5b7e3a204826ead112R138

to avoid any type/scheme naming conflict (there is already an official scheme for go://) we could consider stating that:

we cannot use any official or known schemes unless registered for purl usage

OR we could prefix all the type with purl+type as in purl+go:github.com/gorilla/context ....

I kinda like the purl+ type prefix as this makes it always clear we deal with a purl BUT at the same time this makes the string a tad heavier with more characters.
And we could also register with IANA purl and purl+* as official schemes

Thoughts?

Media type instead of schemes?

So as is, the PURL specification's use of URIs has some issues standing in the way of adoption by the wider Internet community:

Multiple schemes that are part of a broader interconnected protocol with new schemes required whenever a new kind of package manager is supported. Scheme deployment is lengthy and expensive, often requiring custom code not just in consumers and producers but in proxies, gateways, network-level middleboxes, application-level routers, and any number of other agents.
Improper use of fragments which schemes are explicitly not allowed to place syntax constraints on. A fragment identifies a "secondary resource" contained within a retrieved representation of the primary resource. Because of its dependence on the format of the representation, only format definitions (IANA Media Types) can specify how fragments are interpreted. A scheme therefore can only add meaning to fragments indirectly by restricting what media types are legal representations. But none of the PURL schemes define how to retrieve packages at all, let alone what media types can be retrieved.
Overlap with existing definition of URI authorities. The README states: "A namespace segment may sometimes look like a host but its interpretation is specific to a type." But schemes are already allowed to define authorities with custom interpretation; there's no requirement that locator schemes define locations in terms of IP / DNS addresses. RFC 3986 even explicitly defines authorities as governors of some namespace so there's clear overlap in purpose here.
Most of all: unclear utility to consumers. What is a client supposed to do with a PURL? From use cases you've described to me, it seems the primary purpose of a PURL is to have a compact textual representation of various kinds of package data that generic package tools parse and use for comparison to verify they're talking about the "same" package. None of "dereferencing", "installing", "publishing", or "deleting" occurs with a PURL because those protocols are completely dependent on what kind of package is used, which generic PURL clients are trying to avoid.

To expand on the unclear utility point, here are the proposed use cases as I understand them:

Cross-system metadata indexing to search and monitor packages by metadata like available versions, dependencies, contributors, etc. across multiple package managers (libraries.io)
Vulnerability tracking to determine whether a package's set of possible transitive dependencies includes a known vulnerability and whether the version constraints of that dependency graph allow or prevent patching
Other kinds of package-content-agnostic analysis tools, especially tools that look at the dependency graphs of package ecosystems

All of these use cases really just need a common format for package metadata, particularly one that allows testing for whether two representations refer to the "same" package. I don't think a whole new URI scheme is needed for that. Instead, make a media type that represents this metadata and use that type in existing schemes like HTTP. A good existing type to model yours after is the RFC 7807 application/problem+json type which looks like this:

Content-Type: application/problem+json
{
    "type": "https://example.com/probs/out-of-credit",
    "title": "You do not have enough credit.",
    "detail": "Your current balance is 30, but that costs 50.",
    "instance": "/account/12345/msgs/abc",
    "balance": 30,
    "accounts": ["/account/12345",
                 "/account/67890"]
}

You could define something similar for package metadata. You could also define a second format that's purely textual (e.g. text/pkgmeta to complement application/pkgmeta+json) and acts as a short human-friendly string. Having both standard structured representations and text representations means that only human-facing systems need to care about parsing, making it far easier for non-human-facing systems to implement the format and avoid encoding bugs.

Media type registrations are much easier to obtain than schemes and less impactful on the wider Internet. A vendor tree registration like application/vnd.librariesio.pkgmeta+json takes little more than an email; see RFC 6838 for details.

Reuse Subresource Integrity spec for checksums

The current checksum qualifier acts as an algorithm agnostic way to specify the exact representation that should be retrieved for a package, ensuring the PURL refers to a truly immutable revision of a package. But the alg:hex format doesn't really say what alg values are allowed, whether there are length limitations, how case sensitivity is handled, or whether known-broken algorithms should be disallowed even by intermediaries, and it uses hex-encoding instead of the more space-efficient base64 encoding preferred for URI-embedded binary data.

You could instead reuse the existing Subresource Integrity spec, which is designed to solve all of these sorts of problems for exactly this sort of use case. If that sounds reasonable, renaming the qualifier to integrity instead of checksum might be appropriate.
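To make the difference from alg:hex concrete, here is a minimal sketch of how an SRI-style value is built: the algorithm name, a hyphen, then the base64-encoded digest. The function name sri_integrity is hypothetical, and the allowed-algorithm list reflects the SHA-2 family named by the Subresource Integrity spec. How such a value would be escaped inside a purl qualifier is not defined anywhere yet, so that part is left out.

```python
import base64
import hashlib

# Algorithms named by the Subresource Integrity spec.
ALLOWED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_integrity(data: bytes, alg: str = "sha384") -> str:
    """Return an SRI-style string of the form '<alg>-<base64 digest>'.

    Hypothetical helper for illustration; not part of any purl library.
    """
    if alg not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {alg}")
    digest = hashlib.new(alg, data).digest()
    return f"{alg}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    print(sri_integrity(b"example package contents", "sha256"))
```

Note that standard base64 output can contain +, / and =, so a value like this would still need percent-encoding when embedded in a URL query string.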

Expected PURL for swift

I was trying to identify the expected PURL for swift packages, and was wondering if there was a suggested format.

My first guess is that it is likely similar to (but not quite the same as) golang, as it appears that swift packages require both a repository path and a package name. Since a package name is required, I suspect that the repository path is all contained within the namespace.

For example, from the examples on the linked swift package site perhaps we would have:

pkg:swift/github.com/apple/example-package-fisheryates/[email protected]
pkg:swift/github.com/apple/example-package-playingcard/[email protected]
pkg:swift/github.com/apple/example-package-deckofplayingcards/[email protected]

Note that there is one description of a possible way to allow multiple swift modules in one repository here. It is not official by any means, from what I can tell, but it appears that it could be supported using the subpath component of PURL.

Does this fulfill the expectations of the PURL for swift?

Thanks

Ambiguity between test cases and specification

Hi there,

I'm working on a Rust module to parse and build purls. It's mostly complete, but I do have a question concerning qualifier order:

The specification says that qualifiers must be added to the purl in lexicographic order.
In the test cases, the canonical purls of the "maven uses qualifiers" and "maven pom reference" examples do not have their qualifiers in that order.

So, is this an error in the test cases? Or do maven purls have a special rule for qualifiers? The same goes for maven pom reference: I'm not sure how the canonical purl is supposed to be derived from the purl.
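For reference, this is roughly what I understand the ordering rule to mean, as a minimal Python sketch. The function name and the sample keys and values are made up for illustration, and percent-encoding of values is deliberately left out, so this only shows the sorting step.

```python
def sorted_qualifiers(qualifiers: dict) -> str:
    """Join qualifiers as 'key=value' pairs sorted lexicographically by key.

    Hypothetical helper: keys are lowercased and pairs with an empty value
    are dropped; percent-encoding of values is intentionally omitted.
    """
    pairs = sorted((key.lower(), value) for key, value in qualifiers.items() if value)
    return "&".join(f"{key}={value}" for key, value in pairs)


if __name__ == "__main__":
    # Illustrative input only; these are not taken from the test suite.
    print(sorted_qualifiers({"type": "pom", "classifier": "dist"}))
```

With this rule, "classifier" always comes before "type" regardless of the order in which the qualifiers were supplied.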

Typo in README

The README gives three examples of Docker purl strings:

pkg:docker/cassandra@latest
pkg:docker/smartentry/debian@dc437cc87d10
docker:gcr.io/customer/dockerimage@sha256:244fd47e07d10

Unless I misunderstood the syntax, the third one appears to be a typo?

separate type from provider

In the current spec, the type of a package and the provider of a package are compressed into the type element. For example, type = npm implies npmjs.com as the provider. While this is true in general, it gets complicated when talking about a package type that can live on different providers (e.g., an npm package on GitHub).

One possible path is to use the git-style + approach to get something like

pkg://npm+github/myorg/foo@a68381e

or more generally

pkg:type[+provider][/namespace]/[name][@version]

This example indicates that there is an npm formatted entity on github in the foo repo in the myorg org with commit hash a68381e.

In this way, the current type element remains the type or format of the entity being located by the purl, but the provider (if supplied) dictates the rest of the purl structure in the same way that the type does currently. If the provider is omitted, then a spec'd default provider for the given type is used (e.g., npmjs for npm).

The purl spec should enumerate separately the set of types and providers with canonical values. For providers it is likely best if the values are as symbolic as possible. That is, use npmjs rather than npmjs.com. This simplifies the URLs for the user (npmjs.com? npmjs.org? www.npmjs.*?) and insulates URLs from changes in the provider's deployment.
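A minimal sketch of how a parser might split the proposed type[+provider] element and fall back to a default provider. Everything here is hypothetical: the function name, the DEFAULT_PROVIDERS table, and its single entry (taken from the npm/npmjs example above) are assumptions, not part of the current spec.

```python
from typing import Optional, Tuple

# Hypothetical table of default providers per type. Only the npm entry
# comes from the proposal; a real spec would enumerate these explicitly.
DEFAULT_PROVIDERS = {"npm": "npmjs"}


def split_type(type_element: str) -> Tuple[str, Optional[str]]:
    """Split 'type[+provider]' into (type, provider).

    Falls back to the default provider for the type, or None if no
    default is known.
    """
    package_type, _, provider = type_element.partition("+")
    if not provider:
        provider = DEFAULT_PROVIDERS.get(package_type)
    return package_type, provider


if __name__ == "__main__":
    print(split_type("npm+github"))
    print(split_type("npm"))
```

Keeping the provider symbolic (npmjs, github) means the lookup table is the only place that would need to know about actual host names.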

Move test suite to this repo

This is simpler

ambiguity: RPM versions with an epoch

The README says:

"the version is the combined epoch (if not 0), version and release of an RPM."

But the list of RPM examples doesn't include any with an epoch:

pkg:rpm/fedora/[email protected]?arch=i386&distro=fedora-25
pkg:rpm/opensuse/[email protected].?arch=i386&distro=opensuse-tumbleweed

Question: how should we combine the epoch with the version?

Specify registry & organisation name for container images

Suggestion: for container images, the registry and user/org name should always be specified, e.g.

docker:docker.io/library/redis@sha256:123...

rather than

docker:redis@sha256:123...

and

docker:docker.io/lizrice/hello@sha256:456...

rather than

docker:lizrice/hello@sha256:456...

This would be consistent with containerd and would mean tooling doesn't have to handle default values.
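To illustrate the default handling that tooling currently has to do, here is a minimal sketch that approximates the usual Docker naming conventions: a missing registry means docker.io, and a single-component name on docker.io lives under library/. The function name is hypothetical, the input is assumed to be the image name without tag or digest, and this is not a complete reimplementation of any real reference parser.

```python
def normalize_image_name(name: str) -> str:
    """Expand a short image name to 'registry/org/name'.

    Approximates Docker's conventions; assumes `name` carries no tag
    or digest suffix.
    """
    first, slash, rest = name.partition("/")
    # The first component is treated as a registry host only if it looks
    # like one: it contains a dot or a port, or it is 'localhost'.
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if slash and looks_like_host:
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", name
    # Official images on Docker Hub live under the 'library' organisation.
    if registry == "docker.io" and "/" not in remainder:
        remainder = "library/" + remainder
    return f"{registry}/{remainder}"


if __name__ == "__main__":
    for short in ("redis", "lizrice/hello", "gcr.io/customer/dockerimage"):
        print(short, "->", normalize_image_name(short))
```

If purls always carried the fully qualified form, none of this guessing would be needed on the consuming side.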

How are golang sub-modules supposed to be expressed by purl?

I am confused reading the spec for purl in relation to golang sub-modules.
For example, looking at the submodule expressed in this go.mod file: https://github.com/go-modules-by-example/submodules/blob/master/a/go.mod, released by the a/v1.0.0 tag: https://github.com/go-modules-by-example/submodules/releases

Is the purl one of the following?

pkg:golang/github.com/go-modules-by-example/submodules/[email protected]
pkg:golang/github.com/go-modules-by-example%2Fsubmodules%[email protected]
pkg:golang/github.com/[email protected]#submodule/a
pkg:golang/github.com/go-modules-by-example/[email protected]#a
pkg:golang/github.com%2Fgo-modules-by-example%2Fsubmodules%[email protected]

It basically comes down to what is the namespace (if any), what is the name and what is the sub-path (if any) for this submodule.