Comments (11)
Not presently with the explode command, but there's nothing about the file format that would prevent it.
Can you describe in more detail what you'd like to do?
from asdf.
I wonder whether, for the sake of consistency/sanity, the ASDF standard should specify a default naming scheme for the files produced by "exploding" a file, while giving libraries the option to use a different scheme (left up to the implementation) if requested by the user.
"while giving libraries the option to use a different scheme (left up to the implementation) if requested by the user."
Do you intend to do that in pyasdf?
Would the specification of a destination pattern be enough? For example:
some_directory/{source}_{block_no}.asdf
where {source} is replaced with the original root filename, and {block_no} is replaced with the block number?
By this convention, the current behavior would be defined as {source}{block_no}.asdf.
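To make the proposal concrete, here is a minimal sketch of how such a destination pattern could be expanded. The function name `block_filename` and the constant `DEFAULT_PATTERN` are hypothetical and not part of pyasdf; the default pattern is taken from the description of the current behavior above.

```python
from pathlib import Path

# Default taken from the comment above; the name is hypothetical.
DEFAULT_PATTERN = "{source}{block_no}.asdf"


def block_filename(source_path, block_no, pattern=DEFAULT_PATTERN):
    """Expand a destination pattern for one exploded block.

    {source} becomes the root filename without its extension and
    {block_no} becomes the block number.
    """
    source = Path(source_path).stem
    return pattern.format(source=source, block_no=block_no)


# Default pattern, then a user-specified pattern that includes a directory.
default_name = block_filename("observation.asdf", 0)
custom_name = block_filename(
    "observation.asdf", 3, "some_directory/{source}_{block_no}.asdf"
)
print(default_name)  # observation0.asdf
print(custom_name)   # some_directory/observation_3.asdf
```

Because the pattern is an ordinary format string, a caller could also ask for zero-padded block numbers with something like `{block_no:04d}` without any extra machinery.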
That's sort of what I was thinking too. If just a directory destination is given it could use the default pattern. But allowing a user-specified pattern (including the directory) would work too.
Actually, in our case it would be more complicated, since we'd want to use a subdirectory structure based on the hierarchy in the tree.
Can you describe your use case in more detail? I think that may break down if data in a block is shared between multiple arrays in the tree.
I think writing out individual child-objects in a hierarchical data structure is a different use case than what exploded form is for.
To make a FITS analogy, exploded form is (somewhat) like writing the FITS header and the binary data to separate files. Whereas I think what @rossant is asking is more akin to writing each HDU to a separate file (albeit with a directory structure representing hierarchy that doesn't exist in FITS, but may in ASDF). That may be a little too application specific, but sounds worth talking about.
Long story short, we're looking for a format for neurophysiology data that enables easy discovery of key data arrays. For a given dataset, we have a hierarchy of data arrays, but only 1 or 2 are used by 95% of our users. Having explicit names for the files would let a typical user find these important arrays easily.
Here's an example. You're a typical user, you have a dataset, and you don't know anything about the format. You see a subdirectory named spike_times containing a binary array and a metadata JSON file with the array's information (dtype, shape, etc.). Then you should be able to open that array with no difficulty in any programming language (typically MATLAB, which is still one of the dominant languages in the community...)
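As a rough illustration of the layout described above, the following sketch writes and reads one such directory using only the Python standard library. The file names `meta.json` and `data.bin`, the `byteorder` key, and the dtype table are all assumptions made for the example; nothing here is defined by ASDF.

```python
import json
import struct
import tempfile
from pathlib import Path

# Hypothetical mapping from metadata dtype names to struct format codes.
STRUCT_CODES = {"float64": "d", "float32": "f", "int32": "i", "int64": "q"}


def write_array(directory, values, dtype="float64"):
    """Write a 1-D array as a flat little-endian binary file plus JSON metadata."""
    directory.mkdir(parents=True, exist_ok=True)
    meta = {"dtype": dtype, "shape": [len(values)], "byteorder": "little"}
    (directory / "meta.json").write_text(json.dumps(meta))
    fmt = f"<{len(values)}{STRUCT_CODES[dtype]}"
    (directory / "data.bin").write_bytes(struct.pack(fmt, *values))


def read_array(directory):
    """Read the array back using nothing but the JSON metadata."""
    meta = json.loads((directory / "meta.json").read_text())
    count = 1
    for dim in meta["shape"]:
        count *= dim
    prefix = "<" if meta["byteorder"] == "little" else ">"
    fmt = f"{prefix}{count}{STRUCT_CODES[meta['dtype']]}"
    return list(struct.unpack(fmt, (directory / "data.bin").read_bytes()))


with tempfile.TemporaryDirectory() as tmp:
    spike_dir = Path(tmp) / "spike_times"
    write_array(spike_dir, [0.012, 0.340, 1.275])
    spike_times = read_array(spike_dir)
```

The reader needs no format-specific library, which is the accessibility property being asked for; the same two files could be opened from MATLAB with its built-in binary and JSON readers.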
So far we've been using HDF5, but we're having way too many problems. Accessibility is bad; you need an HDF5 library in order to see what's in a file, whereas a text metadata file can be viewed by anyone, and a flat binary file can be opened easily in any language.
We were about to create our own custom format, but then we discovered ASDF which is pretty close to what we need. The two main differences are directory structure and YAML, which seems basically unsupported in MATLAB.
I had a quick look around and came up with at least a couple of YAML interfaces for MATLAB that use LibYAML wrapped in a MEX binary. But I'm guessing your point is that MATLAB has JSON support out of the box (I don't know)?
That said, I think that, with a YAML interface, a rudimentary ASDF reader in MATLAB could be achieved pretty easily. We also have plans for a C implementation of ASDF on the horizon, which could be added to MATLAB via the same approach.
Getting back to your specific use case though, it does make a lot of sense. However, even in the "exploded" form the individual binary blocks have a block header of I think about 40 bytes, so your user would still have to know at least enough to skip past that header to reach the array.
The "exploded form" was not really meant for this case--I think (and @mdboom can expand) it is more of a performance trick. For example, if an application has to stream some data to the end of a table that's embedded in an ASDF file, it can first "explode" the file so that the binary block containing the table is in a file by itself, and can be streamed to directly without having to shift around the rest of the file. But once the writing is done, the full file can then be reassembled. There is also a kind of "streaming" block for this use case, which carries with it the restriction that no other blocks can follow it in the file.
That said, there might be a case for including simple instructions somewhere for manually reading the array data in an ASDF header, and translating that to reading the array in from the binary block. What do you think? It would be great to get the neuroscience community using ASDF--we have them to thank for matplotlib too by way of John Hunter :)
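As a starting point for such instructions, here is a minimal sketch of skipping the block header by hand. It assumes the block layout as I understand it from the ASDF standard: a 4-byte magic token, a 2-byte big-endian `header_size`, then a header holding flags, compression, three 64-bit sizes and a checksum. Please check this against the specification before relying on it. The function name `read_block_payload` is hypothetical, and the `start` offset is a parameter because a block file may hold other content before the block.

```python
import struct

BLOCK_MAGIC = b"\xd3BLK"  # block magic token, per my reading of the standard


def read_block_payload(buf, start=0):
    """Return the raw payload bytes of the block that begins at `start`."""
    if buf[start:start + 4] != BLOCK_MAGIC:
        raise ValueError("no block magic token at this offset")
    # header_size counts the bytes after itself, so the header length is
    # read from the file rather than hard-coded.
    (header_size,) = struct.unpack_from(">H", buf, start + 4)
    header_start = start + 6
    flags, compression, allocated, used, data_size = struct.unpack_from(
        ">I4sQQQ", buf, header_start
    )
    if compression != b"\0\0\0\0":
        raise ValueError("block is compressed; this sketch handles raw data only")
    data_start = header_start + header_size
    return buf[data_start:data_start + used]


# Build a synthetic uncompressed block holding three little-endian float64 values.
payload = struct.pack("<3d", 1.0, 2.5, 4.0)
header = struct.pack(
    ">I4sQQQ16s", 0, b"\0" * 4, len(payload), len(payload), len(payload), b"\0" * 16
)
block = BLOCK_MAGIC + struct.pack(">H", len(header)) + header + payload

values = struct.unpack("<3d", read_block_payload(block))
print(values)  # (1.0, 2.5, 4.0)
```

The dtype, byte order and shape needed to interpret the payload would still come from the array's entry in the YAML tree; the sketch only covers getting past the header.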