This issue has been migrated to an image.sc topic after the 2020-05-06 community discussion. Authors are still encouraged to make use of the specification in their own libraries. As the v3 extension mechanism matures, the specification will be updated and registered as appropriate. Feedback and change requests are welcome either on this repository or on image.sc.
As a first draft of support for the multiscale use-case (#23), this issue proposes an intermediate nomenclature for describing groups of Zarr arrays which are scaled down versions of one another, e.g.:
example/
├── 0 # Full-sized array
├── 1 # Scaled down 0, e.g. 0.5; for images, in the X&Y dimensions
├── 2 # Scaled down 1, ...
├── 3 # Scaled down 2, ...
└── 4 # Etc.
This layout was independently developed in a number of implementations and has since been implemented in others, including:
Using a common metadata representation across implementations:
- fosters a common vocabulary between existing implementations
- enables other implementations to reliably detect multiscale arrays
- permits the upgrade of v0.1 arrays to future versions of this or other extensions
- tests this extension for limitations against multiple use cases
A basic example of the metadata that is added to the containing Zarr group is seen here:
{
  "multiscales": [
    {
      "datasets": [
        {"path": "0"},
        {"path": "1"},
        {"path": "2"},
        {"path": "3"},
        {"path": "4"}
      ],
      "version": "0.1"
    }
    // See the detailed example below for optional metadata
  ]
}
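As a rough illustration of how a reader might consume this block, the sketch below pulls the dataset paths out of a group's attributes. It works on a plain dictionary (for example, the parsed contents of a group's `.zattrs` file) rather than on any particular Zarr library, and the function name `multiscale_paths` is hypothetical, not part of an existing API.

```python
import json


def multiscale_paths(attrs, index=0):
    """Return the dataset paths of one multiscale entry, in listed order.

    `attrs` is a group's attribute dictionary. An empty list is returned
    when the group carries no multiscale metadata at all.
    """
    multiscales = attrs.get("multiscales", [])
    if not multiscales:
        return []
    return [dataset["path"] for dataset in multiscales[index]["datasets"]]


# Hypothetical attributes, mirroring the basic example above.
attrs = json.loads("""
{
  "multiscales": [
    {
      "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
      "version": "0.1"
    }
  ]
}
""")

print(multiscale_paths(attrs))
```

Returning an empty list for unannotated groups lets a caller treat "not multiscale" as an ordinary case instead of an error.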
Process
An RFC process for Zarr does not yet exist, and the v3 spec is a work in progress. However, since the implementations listed above, as well as others, are already being developed, I'd propose that if a consensus can be reached here, this issue be turned into an .rst file similar to those in the v3 branches (e.g. filters) and used as a temporary spec for defining arrays, with the understanding that this is a prototype intended to be amended and brought into the general extension mechanism as it develops.
I'd welcome any suggestions/feedback, but especially around:
- Better terms for "multiscale" and "series"
- The most useful enum values
- Is this already too complicated? (Limit to one series per group?) or on the flip side:
- Are there existing use cases that aren't supported? (Note: I'm aware of some examples like BDV's N5 format but I'd suggest they are higher-level than just "multiscale arrays".)
Deadline for a first round of comments: March 15, 2020
Deadline for a second round of comments: April 15, 2020
Detailed example
Color key (according to https://www.ietf.org/rfc/rfc2119.txt):
- MUST : If these values are not present, the multiscale series will not be detected.
! SHOULD : Missing values may cause issues in future versions.
+ MAY : Optional values which can be readily omitted.
# UNPARSED : When updating between versions, no transformation will be performed on these values.
Color-coded example:
-{
-  "multiscales": [
-    {
!      "version": "0.1",
!      "name": "example",
-      "datasets": [
-        {"path": "0"},
-        {"path": "1"},
-        {"path": "2"}
-      ],
!      "type": "gaussian",
!      "metadata": {
+        "method":
#          "skimage.transform.pyramid_gaussian",
+        "version":
#          "0.16.1",
+        "args":
#          [true],
+        "kwargs":
#          {"multichannel": true}
!      }
-    }
-  ]
-}
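The MUST/SHOULD split in the color key could be checked mechanically. The following is a minimal sketch, assuming the keys shown in the color-coded example are the complete set. The function name `check_multiscales` and the message wording are hypothetical and not part of the proposal.

```python
# Keys marked SHOULD in the color-coded example (assumed complete here).
SHOULD_KEYS = ("version", "name", "type", "metadata")


def check_multiscales(attrs):
    """Return (errors, warnings) for a group's attribute dictionary.

    Errors correspond to missing MUST values (the series would not be
    detected); warnings correspond to missing SHOULD values.
    """
    errors, warnings = [], []
    multiscales = attrs.get("multiscales")
    if not isinstance(multiscales, list) or not multiscales:
        return ["missing 'multiscales' list"], warnings
    for i, entry in enumerate(multiscales):
        datasets = entry.get("datasets")
        if not isinstance(datasets, list) or not datasets:
            errors.append(f"multiscales[{i}]: missing 'datasets'")
        else:
            for j, dataset in enumerate(datasets):
                if "path" not in dataset:
                    errors.append(
                        f"multiscales[{i}].datasets[{j}]: missing 'path'"
                    )
        for key in SHOULD_KEYS:
            if key not in entry:
                warnings.append(f"multiscales[{i}]: missing '{key}'")
    return errors, warnings


# A minimal entry: valid, but lacking several SHOULD values.
errors, warnings = check_multiscales(
    {"multiscales": [{"datasets": [{"path": "0"}], "version": "0.1"}]}
)
print(errors)
print(warnings)
```

MAY and UNPARSED values are deliberately left unchecked, since they can be omitted or carry arbitrary content.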
Explanation
- Multiple multiscale series of datasets can be present in a single group.
- By convention, the first multiscale should be chosen if all else is equal.
- Alternatively, a multiscale can be chosen by name or, with slightly more effort, by the zarray metadata such as chunk size.
- The paths to the arrays are ordered from largest to smallest.
- These paths could potentially point to datasets in other groups via "../foo/0" in the future. For now, the identifiers MUST be local to the annotated group.
- These values SHOULD (MUST?) come from the enumeration below.
- The metadata example is taken from https://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.pyramid_reduce
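The selection convention and the locality rule described in this list might look as follows in code. This is only a sketch: `select_multiscale` and `is_local_path` are hypothetical names, and reading "local to the annotated group" as "a direct child, with no path separators" is an assumption rather than something the proposal states.

```python
def select_multiscale(multiscales, name=None):
    """Pick a multiscale entry by name, or fall back to the first one."""
    if name is None:
        return multiscales[0]
    for entry in multiscales:
        if entry.get("name") == name:
            return entry
    raise KeyError(name)


def is_local_path(path):
    """True if `path` names a direct child of the annotated group.

    Assumption: "local" excludes any path containing a separator,
    which also rules out references such as "../foo/0".
    """
    return path not in ("", ".", "..") and "/" not in path


# Hypothetical metadata with two series in a single group.
multiscales = [
    {"name": "gaussian", "datasets": [{"path": "base"}, {"path": "G1"}]},
    {"name": "laplacian", "datasets": [{"path": "base"}, {"path": "L1"}]},
]

default = select_multiscale(multiscales)
chosen = select_multiscale(multiscales, name="laplacian")
print(default["name"], [d["path"] for d in chosen["datasets"]])
```

Selecting by zarray metadata such as chunk size would additionally require opening each referenced array, so it is left out of this sketch.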
Type enumeration:
Sample code
#!/usr/bin/env python
import argparse
import zarr
import numpy as np
from skimage import data
from skimage.transform import pyramid_gaussian, pyramid_laplacian

parser = argparse.ArgumentParser()
parser.add_argument("zarr_directory")
ns = parser.parse_args()

# 1. Setup of data and Zarr directory
base = np.tile(data.astronaut(), (2, 2, 1))

gaussian = list(
    pyramid_gaussian(base, downscale=2, max_layer=4, multichannel=True)
)

laplacian = list(
    pyramid_laplacian(base, downscale=2, max_layer=4, multichannel=True)
)

store = zarr.DirectoryStore(ns.zarr_directory)
grp = zarr.group(store)
grp.create_dataset("base", data=base)

# 2. Generate datasets
# Level 0 of each series reuses the shared "base" array.
series_G = []
for g, dataset in enumerate(gaussian):
    if g == 0:
        path = "base"
    else:
        path = "G%s" % g
        grp.create_dataset(path, data=gaussian[g])
    series_G.append({"path": path})

series_L = []
for l, dataset in enumerate(laplacian):
    if l == 0:
        path = "base"
    else:
        path = "L%s" % l
        grp.create_dataset(path, data=laplacian[l])
    series_L.append({"path": path})

# 3. Generate metadata block
multiscales = []
for name, series in (("gaussian", series_G),
                     ("laplacian", series_L)):
    multiscale = {
        "version": "0.1",
        "name": name,
        "datasets": series,
        "type": name,
    }
    multiscales.append(multiscale)
grp.attrs["multiscales"] = multiscales
which results in a .zattrs file of the form:
{
    "multiscales": [
        {
            "datasets": [
                {
                    "path": "base"
                },
                {
                    "path": "G1"
                },
                {
                    "path": "G2"
                },
                {
                    "path": "G3"
                },
                {
                    "path": "G4"
                }
            ],
            "name": "gaussian",
            "type": "gaussian",
            "version": "0.1"
        },
        {
            "datasets": [
                {
                    "path": "base"
                },
                {
                    "path": "L1"
                },
                {
                    "path": "L2"
                },
                {
                    "path": "L3"
                },
                {
                    "path": "L4"
                }
            ],
            "name": "laplacian",
            "type": "laplacian",
            "version": "0.1"
        }
    ]
}
and the following on-disk layout:
/var/folders/z5/txc_jj6x5l5cm81r56ck1n9c0000gn/T/tmp77n1ga3r.zarr
├── G1
│   ├── 0.0.0
...
│   └── 3.1.1
├── G2
│   ├── 0.0.0
│   ├── 0.1.0
│   ├── 1.0.0
│   └── 1.1.0
├── G3
│   ├── 0.0.0
│   └── 1.0.0
├── G4
│   └── 0.0.0
├── L1
│   ├── 0.0.0
...
│   └── 3.1.1
├── L2
│   ├── 0.0.0
│   ├── 0.1.0
│   ├── 1.0.0
│   └── 1.1.0
├── L3
│   ├── 0.0.0
│   └── 1.0.0
├── L4
│   └── 0.0.0
└── base
    ├── 0.0.0
...
    └── 1.1.1
9 directories, 54 files
| Revision | Source | Date | Description |
| --- | --- | --- | --- |
| 6 | External feedback on twitter and image.sc | 2020-05-06 | Remove "scale"; clarify ordering and naming |
| 5 | External bug report from @mtbc | 2020-04-21 | Fixed error in the simple example |
| 4 | #50 (comment) | 2020-04-08 | Changed "name" to "path" |
| 3 | Discussions up through #50 (comment) | 2020-04-01 | Updated naming schema |
| 2 | #50 (comment) | 2020-03-07 | Fixed typo |
| 1 | @joshmoore | 2020-03-06 | Original text from in-person discussions |
Thanks to @ryan-williams, @jakirkham, @freeman-lab, @petebankhead, @jni, @sofroniewn, @chris-allan, and anyone else whose GitHub account I've forgotten for the preliminary discussions.