Comments (11)

mdboom commented on August 30, 2024

Not presently with the explode command, but there's nothing about the file format that would prevent it.

Can you describe in more detail what you'd like to do?

from asdf.

embray commented on August 30, 2024

I wonder whether, for the sake of consistency and sanity, the ASDF standard should specify a default naming scheme for the files produced by "exploding" a file into exploded form, while giving libraries the option to use a different scheme (left up to the implementation) if the user requests one.

rossant commented on August 30, 2024

> while giving libraries the option to use a different scheme (left up to the implementation) if requested by the user.

Do you intend to do that in pyasdf?

mdboom commented on August 30, 2024

Would the specification of a destination pattern be enough? For example:

some_directory/{source}_{block_no}.asdf

where {source} is replaced with the original root filename, and {block_no} is replaced with the block number?

By this convention, the current behavior would be defined as {source}{block_no}.asdf.
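A destination pattern like this maps directly onto Python's `str.format`. The following is a minimal sketch, not pyasdf's actual API: the helper name `exploded_filenames`, numbering blocks from 0, and reading "root filename" as the name without its extension are all assumptions.

```python
from pathlib import Path

# The pattern that would describe the current explode behaviour.
DEFAULT_PATTERN = "{source}{block_no}.asdf"


def exploded_filenames(source_file, n_blocks, pattern=DEFAULT_PATTERN):
    """Expand a destination pattern once per binary block (hypothetical helper)."""
    # Assumption: {source} is the original filename without its extension.
    source = Path(source_file).stem
    # Assumption: blocks are numbered from 0 in file order.
    return [pattern.format(source=source, block_no=i) for i in range(n_blocks)]


print(exploded_filenames("recording.asdf", 2))
# ['recording0.asdf', 'recording1.asdf']
print(exploded_filenames("recording.asdf", 2, "some_directory/{source}_{block_no}.asdf"))
# ['some_directory/recording_0.asdf', 'some_directory/recording_1.asdf']
```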

embray commented on August 30, 2024

That's sort of what I was thinking too. If just a destination directory is given, it could use the default pattern. But allowing a user-specified pattern (including the directory) would work too.
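Both options can share one entry point. In this sketch (hypothetical function name and behaviour), a destination containing `{` is treated as a full pattern, and anything else is treated as a directory into which the default pattern is applied:

```python
from pathlib import Path

DEFAULT_PATTERN = "{source}{block_no}.asdf"


def resolve_destination(destination, source_file, block_no):
    """Return the output path for one block (hypothetical helper)."""
    # Assumption: {source} is the original filename without its extension.
    fields = {"source": Path(source_file).stem, "block_no": block_no}
    if "{" in destination:
        # A user-supplied pattern that already includes the directory.
        return Path(destination.format(**fields))
    # Only a directory was given: apply the default pattern inside it.
    return Path(destination) / DEFAULT_PATTERN.format(**fields)


print(resolve_destination("out", "rec.asdf", 3))                             # out/rec3.asdf
print(resolve_destination("out/{source}-{block_no}.asdf", "rec.asdf", 1))    # out/rec-1.asdf
```

Checking for `{` is a crude heuristic; taking the directory and the pattern as separate arguments would avoid ambiguity with directory names that contain braces.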

rossant commented on August 30, 2024

Actually, in our case it would be more complicated, since we'd want to use a subdirectory structure based on the hierarchy in the tree.
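One way to derive such a layout is to walk the tree and turn each chain of keys into a relative path. This is a sketch under assumptions: the tree is modelled as a plain nested `dict`, every non-dict value is treated as a leaf, and the key names in the example are made up.

```python
from pathlib import PurePosixPath


def leaf_paths(tree, prefix=PurePosixPath()):
    """Yield (relative_path, leaf) for every non-dict value in a nested dict."""
    for key, value in tree.items():
        path = prefix / str(key)
        if isinstance(value, dict):
            # Inner nodes of the tree become subdirectories.
            yield from leaf_paths(value, path)
        else:
            # Leaves (e.g. arrays) become the entries inside those subdirectories.
            yield path, value


tree = {
    "spikes": {"spike_times": [0.1, 0.4], "spike_clusters": [3, 7]},
    "metadata": {"sample_rate": 25000},
}
for path, _ in leaf_paths(tree):
    print(path)
# spikes/spike_times
# spikes/spike_clusters
# metadata/sample_rate
```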

mdboom commented on August 30, 2024

Can you describe your use case in more detail? I think that may break down if data in a block is shared between multiple arrays in the tree.
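The concern can be made concrete: if two tree paths are views into the same underlying buffer, a one-file-per-path layout has to either duplicate the bytes or pick one path as the owner. A minimal standard-library sketch, using `memoryview` objects as stand-ins for arrays backed by blocks (an assumption for illustration; the helper name is hypothetical):

```python
def shared_buffers(leaves):
    """Return groups of paths whose memoryviews point into the same buffer."""
    groups = {}
    for path, view in leaves:
        # memoryview.obj is the object that exported the buffer, even for slices.
        groups.setdefault(id(view.obj), []).append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


block = bytearray(16)
leaves = [
    ("spikes/spike_times", memoryview(block)),
    ("spikes/first_half", memoryview(block)[:8]),  # a view into the same block
    ("lfp/samples", memoryview(bytearray(8))),      # an independent buffer
]
print(shared_buffers(leaves))
# [['spikes/spike_times', 'spikes/first_half']]
```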

embray commented on August 30, 2024

I think writing out individual child objects of a hierarchical data structure is a different use case from what exploded form is for.

embray commented on August 30, 2024

To make a FITS analogy, exploded form is (somewhat) like writing the FITS header and the binary data to separate files, whereas I think what @rossant is asking for is more akin to writing each HDU to a separate file (albeit with a directory structure representing a hierarchy that doesn't exist in FITS, but may in ASDF). That may be a little too application-specific, but it sounds worth talking about.

rossant commented on August 30, 2024

Long story short, we're looking for a format for neurophysiology data that enables easy discovery of key data arrays. For a given dataset, we have a hierarchy of data arrays, but only 1 or 2 are used by 95% of our users. Having explicit names for the files would let a typical user find these important arrays easily.

Here's an example. You're a typical user, you have a dataset, and you don't know anything about the format. You see a subdirectory named spike_times containing a binary array and a metadata JSON file with the array's information (dtype, shape, etc.). Then you should be able to open that array with no difficulty in any programming language (typically MATLAB, which is still one of the dominant languages in the community...)
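The layout described here (one folder per array, holding a raw binary file and a JSON description) is simple enough to sketch with the standard library. The file names `data.bin` and `meta.json`, the JSON keys, and the NumPy-style dtype strings are illustrative assumptions, not an existing format; only 1-D float64 and int64 arrays are handled.

```python
import array
import json
import sys
import tempfile
from pathlib import Path

# array.array typecodes supported by this sketch, with NumPy-style dtype codes.
_TO_DTYPE = {"d": "f8", "q": "i8"}
_FROM_DTYPE = {v: k for k, v in _TO_DTYPE.items()}
_NATIVE = "<" if sys.byteorder == "little" else ">"


def write_flat_array(root, name, values):
    """Write values to root/name/data.bin plus a JSON sidecar describing them."""
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "data.bin").write_bytes(values.tobytes())
    meta = {"dtype": _NATIVE + _TO_DTYPE[values.typecode], "shape": [len(values)]}
    (folder / "meta.json").write_text(json.dumps(meta, indent=2))
    return folder


def read_flat_array(folder):
    """Read an array back using only its JSON description."""
    folder = Path(folder)
    meta = json.loads((folder / "meta.json").read_text())
    order, code = meta["dtype"][0], meta["dtype"][1:]
    values = array.array(_FROM_DTYPE[code])
    values.frombytes((folder / "data.bin").read_bytes())
    if order != _NATIVE:
        values.byteswap()  # file was written on a machine with the other byte order
    return values


with tempfile.TemporaryDirectory() as tmp:
    folder = write_flat_array(tmp, "spike_times", array.array("d", [0.0125, 0.5, 1.75]))
    meta = json.loads((folder / "meta.json").read_text())
    roundtrip = read_flat_array(folder).tolist()

print(meta["shape"], roundtrip)  # [3] [0.0125, 0.5, 1.75]
```

A reader in another language only needs to parse the small JSON file and then read raw values, for example with MATLAB's `fread`.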

So far we've been using HDF5, but we're having way too many problems. Accessibility is bad; you need an HDF5 library in order to see what's in a file, whereas a text metadata file can be viewed by anyone, and a flat binary file can be opened easily in any language.

We were about to create our own custom format, but then we discovered ASDF which is pretty close to what we need. The two main differences are directory structure and YAML, which seems basically unsupported in MATLAB.

embray commented on August 30, 2024

I did a quick look around and came up with at least a couple of YAML interfaces for MATLAB that use LibYAML wrapped in a MEX binary. But I'm guessing your point is that MATLAB has JSON support out of the box (I don't know)?

That said, I think that with a YAML interface, a rudimentary ASDF reader in MATLAB could be written pretty easily. We also have plans for a C implementation of ASDF on the horizon, which could be added to MATLAB via the same approach.

Getting back to your specific use case, though, it does make a lot of sense. However, even in the "exploded" form, each binary block starts with a block header of, I think, a little over 50 bytes (the header records its own size), so your users would still have to know at least enough to skip past that header to reach the array data.
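Skipping the header is mechanical once its layout is known. This sketch assumes the block layout from the ASDF Standard: a 4-byte magic token `\xd3BLK`, a big-endian 16-bit `header_size`, then `header_size` bytes of header beginning with flags (uint32), compression (4 bytes), allocated_size, used_size and data_size (big-endian uint64 each), and a 16-byte checksum. With the standard 48-byte header the data starts 54 bytes into the block, but reading the offset from `header_size` keeps the code correct if the header grows. The function name is hypothetical, and compressed blocks are deliberately rejected.

```python
import struct

BLOCK_MAGIC = b"\xd3BLK"


def block_payload(raw):
    """Return the data bytes of an uncompressed ASDF block at the start of raw."""
    if raw[:4] != BLOCK_MAGIC:
        raise ValueError("not an ASDF block: bad magic token")
    # header_size counts the bytes after this field, not the magic or the field itself.
    (header_size,) = struct.unpack(">H", raw[4:6])
    _flags, compression, _allocated, used_size, _data_size = struct.unpack_from(">I4sQQQ", raw, 6)
    if compression != b"\0\0\0\0":
        raise ValueError("compressed blocks are not handled in this sketch")
    start = 6 + header_size
    return raw[start:start + used_size]


# Build a block by hand: three little-endian float64 values, zeroed checksum.
payload = struct.pack("<3d", 0.0125, 0.5, 1.75)
n = len(payload)
header = struct.pack(">I4sQQQ16s", 0, b"\0\0\0\0", n, n, n, bytes(16))
raw = BLOCK_MAGIC + struct.pack(">H", len(header)) + header + payload

print(6 + len(header))                           # 54
print(struct.unpack("<3d", block_payload(raw)))  # (0.0125, 0.5, 1.75)
```

In another language the same idea reduces to: read two bytes at offset 4 as a big-endian integer, then seek to `6 + header_size` and read the raw values.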

The "exploded form" was not really meant for this case. I think (and @mdboom can expand on this) it is more of a performance trick. For example, if an application has to stream data to the end of a table embedded in an ASDF file, it can first "explode" the file so that the binary block containing the table is in a file by itself and can be streamed to directly, without having to shift around the rest of the file. Once the writing is done, the full file can be reassembled. There is also a kind of "streaming" block for this use case, which carries the restriction that no other blocks can follow it in the file.
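The cost difference behind this trick can be illustrated without ASDF at all: growing data in the middle of a single file means rewriting everything after it, while growing a block that lives in its own file is a plain append. Both helpers below are hypothetical and ignore ASDF's headers and padding.

```python
import tempfile
from pathlib import Path


def insert_into_combined(path, offset, new_bytes):
    """Insert new_bytes at offset in a single file; return how many bytes had to move."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()        # everything stored after the growing block
        f.seek(offset)
        f.write(new_bytes)
        f.write(tail)          # shift the tail down by len(new_bytes)
    return len(tail)


def append_to_exploded(path, new_bytes):
    """Append new_bytes to a block stored in its own file; nothing else is rewritten."""
    with open(path, "ab") as f:
        f.write(new_bytes)


with tempfile.TemporaryDirectory() as tmp:
    combined = Path(tmp) / "combined.bin"
    combined.write_bytes(b"HEADER" + b"TABLE" + b"OTHERBLOCKS")
    moved = insert_into_combined(combined, len(b"HEADERTABLE"), b"+ROWS")
    combined_bytes = combined.read_bytes()

    exploded = Path(tmp) / "table.bin"
    exploded.write_bytes(b"TABLE")
    append_to_exploded(exploded, b"+ROWS")
    exploded_bytes = exploded.read_bytes()

print(moved)           # 11: every byte after the table had to be rewritten
print(combined_bytes)  # b'HEADERTABLE+ROWSOTHERBLOCKS'
print(exploded_bytes)  # b'TABLE+ROWS'
```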

That said, there might be a case for including simple instructions somewhere for manually reading the array description in an ASDF header and using it to read the array from the binary block. What do you think? It would be great to get the neuroscience community using ASDF; we have them to thank for matplotlib too, by way of John Hunter :)
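Such instructions could be short. The sketch below assumes the YAML has already been parsed into a dict (by whatever YAML library is available) and that the block's data bytes are already in hand; the `source` key says which block that is. It uses the `datatype`, `byteorder`, `shape` and optional `offset` keys of ASDF `ndarray` objects and handles only C-ordered numeric arrays: no `strides`, no streamed (`*`) dimensions, no string or structured datatypes. The dtype table is a partial, assumed mapping and the function names are hypothetical.

```python
import struct
from math import prod

# Partial mapping from ASDF datatype names to struct format codes (assumed subset).
_FORMATS = {"int8": "b", "uint8": "B", "int16": "h", "uint16": "H",
            "int32": "i", "uint32": "I", "int64": "q", "uint64": "Q",
            "float32": "f", "float64": "d"}


def _reshape(flat, shape):
    """Nest a flat list into lists following shape (C order)."""
    if len(shape) <= 1:
        return flat
    if shape[0] == 0:
        return []
    step = len(flat) // shape[0]
    return [_reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def decode_ndarray(meta, block_data):
    """Turn an ndarray description plus its block bytes into nested lists."""
    order = "<" if meta["byteorder"] == "little" else ">"
    count = prod(meta["shape"])
    fmt = f"{order}{count}{_FORMATS[meta['datatype']]}"
    flat = struct.unpack_from(fmt, block_data, meta.get("offset", 0))
    return _reshape(list(flat), meta["shape"])


meta = {"source": 0, "datatype": "float64", "byteorder": "little", "shape": [2, 3]}
data = struct.pack("<6d", 1, 2, 3, 4, 5, 6)
print(decode_ndarray(meta, data))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```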
