Comments (6)

acdha commented on June 20, 2024

Thanks for the report; this is a pretty busy week for me, but I had a couple of initial thoughts. Clearly we need to add some tests to https://github.com/LibraryOfCongress/bagit-conformance-suite/!

One thing I was wondering is how many of these tools promise support for BagIt 1.0, and thus whether this might simply be treated as part of bringing an existing (likely 0.97-focused) tool into full support for the spec. I definitely would like to make a 1.0 update to bagit-python.

That language was introduced between 0.97 and the first 1.0 drafts, and I'm not sure which came first: this portion of the payload manifest description or the corresponding portion of the fetch.txt description, which is obviously very URL-focused. I'd have to check the old discussion, but I believe the intention was to avoid ambiguity by making it always safe to run filenames through a URL-decoding function, even in cases where the original filename was itself URL-encoded (which is not uncommon in certain communities). Since most of the common escaping conventions use characters that are valid in filenames on at least some operating system and filesystem combinations (e.g. new\nline is a valid filename), I'm not sure there's a better option than working with each of the implementers to cover this case.

from bagit-spec.

pwinckles commented on June 20, 2024

It appears the encoding was introduced here in response to newline characters appearing in some filenames, a case the 0.97 spec did not support. However, the draft language did not also require encoding %, which I think is where the confusion originated. Encoding % is essential; otherwise you would not be able to distinguish between the distinct files new\nline.txt and new%0Aline.txt.
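A minimal Python sketch of the 1.0 rule (percent-encode CR, LF and %, and nothing else). `encode_bagit_path` and `decode_bagit_path` are hypothetical helpers, not part of bagit-python or any other implementation. Encoding % first is what keeps new\nline.txt and new%0Aline.txt distinct, and because every % in a correctly encoded path then begins one of %25, %0D or %0A, the standard `urllib.parse.unquote` recovers the original name.

```python
from urllib.parse import unquote


def encode_bagit_path(path: str) -> str:
    """Percent-encode CR, LF and % (and nothing else) in a filepath.

    Hypothetical helper. % is replaced first so that the escapes produced
    for CR and LF are not themselves re-encoded.
    """
    return path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def decode_bagit_path(encoded: str) -> str:
    """Hypothetical helper: reverse encode_bagit_path.

    In a correctly encoded path every % begins %25, %0D or %0A, so a
    general-purpose URL decoder returns the original filepath.
    """
    return unquote(encoded)


# Two distinct files that would collide if % were left unencoded.
for name in ("new\nline.txt", "new%0Aline.txt"):
    encoded = encode_bagit_path(name)
    assert decode_bagit_path(encoded) == name  # round trip is lossless
    print(repr(name), "->", encoded)
```

The first name encodes to new%0Aline.txt and the second to new%250Aline.txt, so the two manifest entries stay distinct.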

All of the implementations that I opened issues for claim to support 1.0. With the exception of bagit-python, I did not open issues against 0.97 implementations.

If there is ever a next version of the BagIt spec, I think it would be nice if the escaping were handled the same way checksum utilities handle it. Compatibility with them is enormously useful.

pwinckles commented on June 20, 2024

I might also point out that if implementations percent-decode only the CR, LF, and % characters, then they will remain mostly compatible with incorrect 1.0 implementations. Normally, when you percent-decode, you decode any encoded character as described here. However, by not decoding every encoded character, a correct 1.0 implementation could validate an unescaped path like test%201.txt from a current implementation.

I don't think I have ever used a percent-encoding library that lets you control which characters are decoded, so doing this will likely require a custom implementation or a series of string search-and-replace operations.
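As a rough sketch of such a custom implementation, assuming Python's `re` module and a hypothetical `lenient_decode_bagit_path` helper: a single regex pass decodes only %0A, %0D and %25 and leaves any other escape, such as the %20 in test%201.txt, untouched.

```python
import re
from urllib.parse import unquote

# Only these three escapes are decoded. Hex digits are matched
# case-insensitively because RFC 3986 treats %0a and %0A as equivalent.
_BAGIT_ESCAPE = re.compile(r"%(0A|0D|25)", re.IGNORECASE)
_DECODED = {"0A": "\n", "0D": "\r", "25": "%"}


def lenient_decode_bagit_path(encoded: str) -> str:
    """Hypothetical helper: decode %0A, %0D and %25, leave other %XX as-is.

    re.sub scans left to right in a single pass, so %250A becomes the
    literal text %0A rather than being decoded twice into a newline.
    """
    return _BAGIT_ESCAPE.sub(lambda m: _DECODED[m.group(1).upper()], encoded)


# A path written unescaped by an implementation that does not encode %...
legacy = "test%201.txt"
print(unquote(legacy))                    # full decoding changes the name
print(lenient_decode_bagit_path(legacy))  # selective decoding keeps it
# ...and the same file as a correct 1.0 implementation would write it.
print(lenient_decode_bagit_path("test%25201.txt"))
```

Both manifest spellings resolve to test%201.txt here. The "mostly" still matters: a file literally named new%0Aline.txt, written unescaped by an incorrect implementation, would be decoded to a name containing a newline.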

richardrodgers commented on June 20, 2024

See the issue in my repo for my response, which is in line with these remarks...

acdha commented on June 20, 2024

Link to that comment: richardrodgers/bagit#33 (comment)

pwinckles commented on June 20, 2024

After spending some time discussing this with some coworkers, I see that I did a poor job of succinctly stating the change I would like to see in the spec.

The existing language:

If a filepath includes a Line Feed (LF), a Carriage Return (CR), a Carriage-Return Line Feed (CRLF), or a percent sign (%), those characters (and only those) MUST be percent-encoded following [RFC3986].

Should be changed to something like:

If a filepath includes a Line Feed (LF), a Carriage Return (CR), a Carriage-Return Line Feed (CRLF), or a backslash (\), then those characters MUST be replaced with the literal strings \n, \r, \r\n, and \\ respectively. Additionally, if any characters are replaced in a path, then the manifest entry line MUST be prefixed with a backslash (\). For example: \d8e8fca2dc0f896fd7cb4cb0031ba249 file-with\nnewline

This would make the BagIt format compatible with unix checksum utilities.
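A sketch of the proposed escaping in Python, with hypothetical helper names (`escape_path`, `unescape_path`, `manifest_line`, `parse_manifest_line`). It assumes a single space between checksum and path, as in the example in the proposal, and it does not claim byte-for-byte parity with any particular checksum utility. Decoding scans escapes left to right, since chained `replace()` calls would misread an escaped backslash followed by the letter n.

```python
from __future__ import annotations

# Hypothetical sketch of the proposed scheme: backslash, CR and LF are
# written as \\, \r and \n, and an escaped manifest line gets a leading
# backslash. This table maps each escape letter back to its character.
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r"}


def escape_path(path: str) -> tuple[str, bool]:
    """Return (escaped_path, changed)."""
    # Backslash first, so the backslashes added for \n and \r are kept.
    escaped = path.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")
    return escaped, escaped != path


def unescape_path(escaped: str) -> str:
    # Scan left to right: chained replace() calls would misread an escaped
    # backslash followed by the letter n as a newline escape.
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\":
            code = escaped[i + 1 : i + 2]
            if code not in _UNESCAPES:
                raise ValueError(f"invalid escape at offset {i} in {escaped!r}")
            out.append(_UNESCAPES[code])
            i += 2
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


def manifest_line(checksum: str, path: str) -> str:
    escaped, changed = escape_path(path)
    prefix = "\\" if changed else ""
    # Single space between checksum and path, as in the proposal's example.
    return f"{prefix}{checksum} {escaped}"


def parse_manifest_line(line: str) -> tuple[str, str]:
    is_escaped = line.startswith("\\")
    if is_escaped:
        line = line[1:]
    checksum, _, path = line.partition(" ")
    return checksum, unescape_path(path) if is_escaped else path


line = manifest_line("d8e8fca2dc0f896fd7cb4cb0031ba249", "file-with\nnewline")
print(line)
assert parse_manifest_line(line) == ("d8e8fca2dc0f896fd7cb4cb0031ba249", "file-with\nnewline")
```

Printing `line` reproduces the proposal's example, \d8e8fca2dc0f896fd7cb4cb0031ba249 file-with\nnewline. Lines without the leading backslash are passed through untouched, so ordinary paths (including ones containing %) need no decoding at all.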
