Comments (6)
Thanks for the report. This is a pretty busy week for me but I had a couple of initial thoughts. Clearly we need to add some additional tests to https://github.com/LibraryOfCongress/bagit-conformance-suite/!
One thing I was wondering is how many of these tools promise support for BagIt 1.0, and thus whether this might simply be treated as part of bringing an existing (likely 0.97-focused) tool into full support for the spec. I definitely would like to make a 1.0 update to bagit-python.
That language was introduced between 0.97 and the first 1.0 drafts, and I'm not sure which came first: this portion of the payload manifest description or the same portion of the fetch.txt description, which obviously is very URL-focused. I'd have to check the old discussion, but I believe the intention was to avoid ambiguity by making it always safe to run filenames through a URL-decoding function, even in cases where the original filename was itself URL-encoded (which is not uncommon in certain communities). Since most of the common escaping conventions are valid characters in filenames on at least some operating system and filesystem combinations (e.g. new\nline is a valid filename), I'm not sure there's a better option than working with each of the implementers to cover this case.
from bagit-spec.
It appears the encoding was introduced here as a response to newline characters appearing in some file names, a case the 0.97 spec did not support. However, the draft language did not also include encoding %, which I think is where the confusion originated. Encoding the % is essential; otherwise you would not be able to distinguish between the distinct files new\nline.txt and new%0Aline.txt.
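To make the collision concrete, here is a minimal Python sketch. The function names are hypothetical and are not taken from any BagIt implementation; the sketch only assumes the 1.0 rule quoted later in this thread (encode LF, CR, and %, and nothing else).

```python
def encode_without_percent(path: str) -> str:
    """Draft-style encoding (hypothetical helper): LF and CR only."""
    return path.replace("\n", "%0A").replace("\r", "%0D")


def encode_filepath(path: str) -> str:
    """1.0-style encoding (hypothetical helper): %, LF, and CR."""
    # Encode % first so the escapes added afterwards are not encoded again.
    return path.replace("%", "%25").replace("\n", "%0A").replace("\r", "%0D")


with_newline = "new\nline.txt"    # contains a real line feed
with_percent = "new%0Aline.txt"   # contains a literal percent sign

# Without encoding %, both files produce the same manifest entry.
collides = encode_without_percent(with_newline) == encode_without_percent(with_percent)

# With % encoded, the two entries stay distinct.
distinct = encode_filepath(with_newline) != encode_filepath(with_percent)
```

Encoding % before the other characters matters: doing it last would turn the freshly written %0A into %250A.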
All of the implementations that I opened issues for claim to support 1.0. With the exception of bagit-python, I did not open issues against 0.97 implementations.
If there ever is a next version of the BagIt spec, I think it would be nice if the escaping was handled the same way as checksum utilities. Compatibility with them is enormously useful.
I might also point out that if implementations percent-decode by only decoding the CR, LF, and % characters then they will remain mostly compatible with incorrect 1.0 implementations. Normally, when you percent-decode you decode any encoded character as described here. However, by not decoding every encoded character a correct 1.0 implementation could validate an unescaped path like test%201.txt from a current implementation.
I don't think I have ever used a percent-encoding library that allows you to control the characters that are decoded, so doing this will likely require a custom implementation or a series of string search-and-replace operations.
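A custom decoder of that kind can be quite small. The following Python sketch is only an illustration under the assumptions stated in this comment: it decodes the three escapes for LF, CR, and % and leaves every other percent sequence untouched. The name decode_filepath is hypothetical.

```python
import re
from urllib.parse import unquote

# Only these three escapes are decoded; anything else is left as-is.
_ESCAPES = {"%0A": "\n", "%0D": "\r", "%25": "%"}
_PATTERN = re.compile(r"%(?:0A|0D|25)", re.IGNORECASE)


def decode_filepath(encoded: str) -> str:
    """Selective percent-decoding (hypothetical helper, not a library API)."""
    # A single regex pass avoids double decoding: "%250A" becomes "%0A",
    # and that result is not scanned a second time.
    return _PATTERN.sub(lambda m: _ESCAPES[m.group(0).upper()], encoded)


# A path written unescaped by a non-conforming tool survives unchanged...
selective = decode_filepath("test%201.txt")

# ...whereas a general-purpose decoder rewrites it.
general = unquote("test%201.txt")
```

Chained str.replace calls would also work, provided %25 is replaced last; the single-pass regex simply removes the ordering concern.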
See issue in my repo for response, in line with these remarks...
Link to that comment: richardrodgers/bagit#33 (comment)
After spending some time discussing this with some coworkers, I see that I did a poor job succinctly stating a desired change to the spec.
The existing language:
If a filepath includes a Line Feed (LF), a Carriage Return (CR), a Carriage-Return Line Feed (CRLF), or a percent sign (%), those characters (and only those) MUST be percent-encoded following [RFC3986].
Should be changed to something like:
If a filepath includes a Line Feed (LF), a Carriage Return (CR), a Carriage-Return Line Feed (CRLF), or a backslash (\), then those characters MUST be replaced with the literal strings \n, \r, \r\n, and \\ respectively. Additionally, if any characters are replaced in a path, then the manifest entry line MUST be prefixed with a backslash (\). For example:
\d8e8fca2dc0f896fd7cb4cb0031ba249 file-with\nnewline
This would make the BagIt format compatible with unix checksum utilities.
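As a rough illustration of the proposed wording, the escaping could be sketched in Python as follows. The function names are hypothetical, and the single-space separator simply mirrors the example line above; it is an assumption, not a statement about what any checksum utility emits.

```python
from typing import Tuple


def escape_path(path: str) -> Tuple[str, bool]:
    """Apply the proposed backslash escaping (hypothetical helper).

    Returns the escaped path and whether anything was replaced.
    """
    # Escape backslashes first so the backslashes added for \n and \r
    # are not doubled afterwards.
    escaped = (
        path.replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return escaped, escaped != path


def manifest_line(digest: str, path: str) -> str:
    """Build one manifest entry, prefixed with a backslash if escaped."""
    escaped, changed = escape_path(path)
    prefix = "\\" if changed else ""
    # Single-space separator assumed, matching the example in the proposal.
    return f"{prefix}{digest} {escaped}"


line = manifest_line("d8e8fca2dc0f896fd7cb4cb0031ba249", "file-with\nnewline")
```

A CRLF pair needs no special case here, since escaping CR and LF individually already yields \r\n.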