snirf's Issues

Cut a new release of SNIRF

The most recent release of the SNIRF protocol was in 2019 (#51 (comment)). There have been many great improvements to the protocol since then, so I suggest cutting a new release so that developers can work against something stable rather than a moving target.

@dboas and I have been in contact about slight differences in implementation in recent days. These issues cannot be avoided, but they are exacerbated by outdated specifications and validators. As such, I suggest merging #67, #68 and #69, then cutting another release.

The naming of the release is not important to me. There seems to be some reluctance to use the term 1.0, so I suggest v1.0 Draft 4.

While it may be tempting to wait until all the issues in #44 are completed, I see no issue in making some closely spaced releases as we move towards 1.0.

Thoughts @dboas @fangq? If you are happy with the idea then I am happy to write up a PR that adds a release procedure, bumps the version number in the spec, etc. Then, if that's merged, I could tag the release.

sd.landmarkLabels for sources and detectors

Qianqian, sd.landmarkLabels allows for user-defined landmark labels. Should we provide a specification for what labels to use to reference specific sources and detectors?
We already have sd.srcLabels and sd.detLabels. We could just indicate that sd.landmarkLabels should match the src or det label.
What do you think?
If you agree, can you add that to the spec?

Units for data saved with dataType 99999: Processed

Does the specification describe the units that processed data is stored with? I took a fresh read and couldn't find it, but maybe I missed it.

I have some Kernel SNIRF files with HbO/HbR data, and I am guessing the data is stored in uMol given the large values (is this correct @Zahra-M-Aghajan?).

The default time unit is seconds, I assume to match the SI base units. If I were to store processed data, I would have followed this convention and stored the data in moles. Many software packages also store everything internally in SI base units to simplify calculations and reduce the chance of scaling errors. So I think there is some ambiguity here that we may wish to address.

Should we add an optional metadata tag analogous to LengthUnit or TimeUnit for specifying the units of the 99999: Processed data types? Something like SubstanceUnit or ProcessedUnit?
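For illustration only, here is a minimal h5py sketch of what such an optional tag could look like if adopted; "SubstanceUnit" is only the name suggested above and is not part of the current specification:

    import h5py

    # Hypothetical: record the unit of 99999:Processed concentration data
    # alongside the existing metaDataTags. "SubstanceUnit" is NOT in the
    # current specification; it is only the name suggested above.
    with h5py.File("processed_example.snirf", "a") as f:
        tags = f.require_group("/nirs/metaDataTags")
        tags.create_dataset("SubstanceUnit", data="molar")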

Advancing the spec to Version 1.1.0

Here is a summary of the action items from today's meeting. I will work on the spec changes; after we all agree on the changes, I will create a "Draft 4" document.

  • writing a validator (Python) - @huppertt
  • updating fNIRS/snirf-samples - @fangq
    • fix 1-D arrays in easyh5, regenerate the sample data files (wavelengths, ...)
  • spec changes/clarifications (@fangq, @huppertt, @jayd1860, @dboas)
  1. measurementDate/Time - needs to allow "unknown"

  2. language for *Labels (k) needs to change to a 1-D string array or 2-D char array

  3. revisit the language for string/string array:

string: either an H5T.C_S1 (null-terminated string) type, an ASCII-encoded 8-bit char array, or a UNICODE UTF-16 array, defined by the H5T.NATIVE_CHAR or H5T.H5T_NATIVE_B16 datatypes in H5T. (Note: at this time HDF5 does not have a UTF-16 native type, so H5T_NATIVE_B16 will need to be converted to/from UTF-16 within the read/write code.)

  • drop src/det/landmarkPos

  • leave wavelengths for "nominal" wavelengths, and define measurementList.wavelengthsActual, wavelengthsEmissionActual (optional) for actual per-optode wavelength, if available

  • wavelengths list can be empty for processed data, but must be present

  • measurementList.dataTypeIndex needs a probe.dataTypeLabels field to look labels up (see the sketch below)
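For illustration only, a rough sketch of how a reader might resolve that lookup once a dataTypeLabels field exists; the field name and layout follow the proposal above and are not yet fixed in the spec:

    import h5py

    # Sketch only: resolve a channel's label via the proposed
    # probe/dataTypeLabels lookup table. The field name follows the
    # proposal above and may change before the spec is updated.
    with h5py.File("example.snirf", "r") as f:
        labels = [s.decode() if isinstance(s, bytes) else s
                  for s in f["/nirs/probe/dataTypeLabels"][()]]
        idx = int(f["/nirs/data1/measurementList1/dataTypeIndex"][()])
        print(labels[idx - 1])  # SNIRF indices are 1-based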

sourcePos2D is required but 3D is not. Suggest making either one required, rather than specifically 2D.

Currently the specification requires sourcePos2D positions. It states:

This field describes the position (in LengthUnit units) of each source optode. 
The positions are coordinates in a flattened 2D probe layout. 

I would like to suggest that the specification be changed so that either the 2D or the 3D positions are required, but not necessarily the 2D. I suggest this because 2D positions are not always available, for example if you have a 3D digitizer. In this case I could invent some transform to 2D, but it would not necessarily be meaningful or possible for someone else to reverse engineer. (I would obviously open-source the code, but I can imagine situations where that is not possible in other labs.) I could store both 3D and 2D, but what would the point be then?

Another example arises in toolbox development. Currently in MNE we read the data in 3D; how should I convert this to 2D? The current description is not sufficiently detailed for me to implement a transform and ensure it's the same as in other toolboxes. If the specification is to maintain its requirement for 2D locations, then I suggest that a standard transform from 3D to 2D be provided with the specification, although I suspect this would be quite some work.

(edit: I have recently checked, and the 3D->2D transforms in different toolboxes differ from Homer's)

Finally, it's my opinion that the specification should encourage users to store data in the most raw and accurate format, and not downgrade their spatial information. Of course this is just one opinion, so I would be keen to hear what you think of this proposal. I suspect there is some legacy reason why 2D was the preferred option, and I understand that.

This change would be backwards compatible. I.e., all existing SNIRF files would still be compatible with this change.

Difference between sourcePos and sourcePos3D

Hi,

I am starting to make Brainstorm compatible with SNIRF (see brainstorm-tools/brainstorm3#283 (comment)), but I don't understand the difference between sourcePos and sourcePos3D.

The SNIRF data format summary says that sourcePos contains 2D positions and sourcePos3D contains 3D positions, but the specification then says that both contain x, y, z coordinates, so they both seem to contain 3D positions.

Can someone explain the difference between the two to me?
Thanks a lot,
Edouard

acknowledgement at the end of the specification

There have been many contributors.
I need to dig through the emails.
Mainly, it was started by Blaise Frederick and David Boas, with significant contributions from Ted, Jay, and Qianqian. And then there have been many community suggestions that have been adopted.
Qianqian or Jay, can you update this in the spec?

Aux time series dimensionality limit

We are examining how best to 'fit' auxiliary data from LUMO into a valid SNIRF file, pertinent examples of which are:

  1. Up to 54 individual time series of 6- or 9-axis MPU data at 100Hz
  2. Per-channel time series data metrics/flags, such as coupling quality

The most natural way for us to achieve this would be to write these data in the /aux group, which I note (from #86) is intended for the storage of arbitrary time series data. As specified, the auxiliary time series are limited to being vectors:

/nirs(i)/aux(j)/dataTimeSeries

  • Presence: optional; required if aux is used
  • Type: numeric 1-D array
  • Location: /nirs(i)/aux(j)/dataTimeSeries

If we are limited to a vector, the former example could require up to 486 auxiliary time series, and the latter two orders of magnitude more!

Might it be possible to extend the specification to permit (manufacturer specific) auxiliary data to be a matrix?
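As a concrete illustration of the extension being requested, something like the sketch below would let the 54 x 9 MPU stream live in a single aux entry. This is a sketch of the proposal only and is not valid SNIRF under the current specification:

    import h5py
    import numpy as np

    n_samples = 60000            # ~10 minutes at 100 Hz
    n_mpu_axes = 54 * 9          # 54 modules x 9 axes

    # Proposal sketch: a matrix-valued aux time series (time x channels).
    # The current spec restricts aux/dataTimeSeries to a 1-D array, so
    # this layout is NOT valid SNIRF today.
    with h5py.File("lumo_aux_example.snirf", "a") as f:
        aux = f.require_group("/nirs/aux1")
        aux.create_dataset("name", data="MPU")
        aux.create_dataset("dataTimeSeries",
                           data=np.zeros((n_samples, n_mpu_axes)),
                           compression="gzip", chunks=True)
        aux.create_dataset("time", data=np.arange(n_samples) / 100.0)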

Supported Data Types

@jayd1860 @fangq and I and Ted discussed expanding the data types described in the appendix of the spec.
Ted had the excellent suggestion of organizing such that blocks of numbers are reserved for different modalities and then ranges within a block follow similar structures. So,

MODALITIES
0-99 : CW
100-199 : FD
200-299 : TD gates
300-399 : TD moments
400-499 : DCS

WITHIN MODALITY
0-49 : Non-fluorescent
50-99 : Fluorescent

We started to provide more detail to handle derived / processed data types
So, what would work for CW and also for FD and TD is:
0 - Intensity
1 - dOD
2 - mua
10 - musp
11 - a (prefactor in musp = a * lambda^(-b))
12 - b (the lambda exponent in musp = a * lambda^(-b))
20 - HbO
21 - HbR
22 - HbT
23 - H2O
24 - Lipid

DCS would be different
0 - g2
10 - BFi
Have to think more
Also, what about CMRO2?

We will also want standard deviation... but this is likely best handled by adding data(i).dataTimeSeriesStdev, which then has access to all of the data type indices.
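To make the block arithmetic explicit, a code under the scheme sketched above would decompose as modality base + fluorescence offset + quantity. The snippet below is purely illustrative; none of these numbers are ratified:

    # Illustration of the proposed block layout (not ratified):
    #   code = modality_base + fluorescence_offset + quantity
    MODALITY = {"CW": 0, "FD": 100, "TD_GATES": 200, "TD_MOMENTS": 300, "DCS": 400}
    FLUOR_OFFSET = {False: 0, True: 50}   # 0-49 non-fluorescent, 50-99 fluorescent
    QUANTITY = {"intensity": 0, "dOD": 1, "mua": 2, "musp": 10, "HbO": 20, "HbR": 21}

    def proposed_code(modality, quantity, fluorescent=False):
        """Compose a data-type code under the proposed numbering (sketch only)."""
        return MODALITY[modality] + FLUOR_OFFSET[fluorescent] + QUANTITY[quantity]

    print(proposed_code("FD", "HbO"))                     # 120 under this scheme
    print(proposed_code("CW", "dOD", fluorescent=True))   # 51 under this scheme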

Converters for other formats to SNIRF

Thanks for the great initiative.

I would like to convert my existing data (NIRX) to the SNIRF format. Are there any conversion scripts available? I have looked in the fNIRS organisation and can only find SNIRF readers and writers.

Apologies if I have posted this in the wrong location, I was not sure what repository to ask this question in.

dataTypeIndex is defined ambiguously

The dataTypeIndex field is not clearly defined. In the data format summary it is defined as an <i> integer, but the text description states "Data-type specific parameter indices" and "Note that the Time Domain and Diffuse Correlation Spectroscopy data types have two additional parameters and so the data type index must be a vector with 2 elements that index the additional parameters".
Also, the type here needs to be changed from "Type: integer" to "Type: numeric 1-D array".

Simplify the readme for USERS of the SNIRF specification

A colleague of mine opened the SNIRF specification to understand it better (after my pushing) and had two follow-up comments that I thought I would share with you: 1) "Do I need to install TortoiseGit?" and 2) "I don't use git, so I don't think SNIRF is for me." As such, I suggest the following:

Split the README into user instructions (friendly, with text links), then developer instructions (all the git info you can dream of).

The instructions on how to use TortoiseGit are the second thing you see on the SNIRF landing page; it is almost the most dominant aspect of the README. For new users this may be confusing, as they will wonder if they need this software to use SNIRF, when in fact they don't: SNIRF is a specification, and instructions for how to clone a repository are mainly developer focused. If you wish to link to software, the average user just wants to know that Homer, NIRS-Toolbox, MNE, FieldTrip, etc. support it. So maybe it would be better to include a list or table of software that supports SNIRF?

I suggest moving the TortoiseGit section down to the bottom of the page under the How to Participate section, or even removing it; it's a very specific tool and not particularly widely used. Is TortoiseGit something used at BU?

Move all the details about git recursive cloning, submodules, etc to the developer focused section.

I hope this user feedback helps. While your average user won't be implementing SNIRF, this is likely the page they will land on when googling the topic.

If there is general agreement for this change I'll open a PR.

Standardized aux channel names

It would be good to have a standardized list of aux channel identifiers in the appendix. Other formats like BIDS have these, and it would be nice to have a well-defined conversion between them.

Specifying /nirs(i)/probe/momentOrders as string as opposed to numeric array

At the moment, in the specification, the momentOrders are required to be specified as 1-D numeric arrays. However, this could easily lead to confusion and, as a result, I was wondering whether this could be changed to strings.
As an example, consider the following scenarios:

  1. Some groups might provide 3 moments (sum, mean ToF, variance ToF) but encode them in alphabetical order (so 1: mean, 2: sum, and 3: var; numbers will correspond to the dataTypeIndex).
  2. Some groups might provide only 2 moments: sum and mean in some cases, and mean and variance in others, and both cases would be represented as 1 and 2.

I hope these examples were helpful in demonstrating the clarity that the use of strings could bring.

Local to global indexing

If we use local indexing, that is to say that:

  • for each channel we set a sourceModuleIndex, and a detectorModuleIndex, and use local indices for sourceIndex and detectorIndex
  • we set useLocalIndex non-zero             

Is the nature of the global indexing into, e.g., /nirs(i)/probe/sourcePos3D defined by the specification?

Meet the GitHub community guidelines

As this is a community-oriented project, I suggest we update the repository to meet the GitHub community guidelines (https://docs.github.com/en/communities). Most of this work has already been done by @fangq, but I suggest simply organising it as recommended by GitHub. See https://github.com/fNIRS/snirf/community for our scorecard.

This will involve:

I am happy to chip away at this in those times where I am waiting for my code to run 😉

Source power unit inconsistency

Note that the current and draft specification states:

/nirs(i)/data(j)/measurementList(k)/sourcePower

  • Presence: optional
  • Type: numeric
  • Location: /nirs(i)/data(j)/measurementList(k)/sourcePower

Source power in milliwatt (mW).

But later in the discussion it is noted that:

sourcePower provides the option for information about the source power for that channel to be saved along with the data. The units are not defined, unless the user takes the option of using a metaDataTag described below to define, for instance, sourcePowerUnit

For our purposes we would prefer to be able to specify a unit (since our power measures are relative), but in any case one of the statements should be altered for clarity.

Data streaming support

Hi @dboas, I remember Ted mentioned this in a teleconference previously. Recently, my student Morris Vanegas also brought up this issue.

I think we should add some sequence numbers in the data structure to allow grouping data chunks into a larger dataset, possibly in the metaDataTags section. The DICOM format uses the "Accession Number" for this, although it is stored as a string.

I can think about something related to this feature, and propose an update.
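As a rough sketch of what that could look like, a streaming writer might stamp each chunk in metaDataTags; the tag names below ("SessionUID", "SequenceNumber") are purely hypothetical placeholders, not part of the spec:

    import h5py

    # Hypothetical streaming sketch: each chunk is written as its own file
    # carrying a sequence number so a consumer can reassemble the session.
    # "SessionUID" and "SequenceNumber" are placeholder names only.
    for seq, chunk_file in enumerate(["run1_000.snirf", "run1_001.snirf"]):
        with h5py.File(chunk_file, "a") as f:
            tags = f.require_group("/nirs/metaDataTags")
            tags.create_dataset("SessionUID", data="subj01-run1")
            tags.create_dataset("SequenceNumber", data=str(seq))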

question about stim value

Jay wrote:
David and Qianqian,

Had a question about the stim structure. We now have

stim(n).name
stim(n).data

stim(n).data is a 3 column array where each row corresponds to a stimulus trial and the three columns indicate [starttime duration value].

In Homer2, users can manually reject/toggle stims and we indicate this with various non-zero and non-one values. Assuming we want to preserve this functionality in SNIRF and Homer3, do we use the last column for this purpose?

I didn't see any description in the Spec about the last column. Should we add a short description of its use?

Multiple aux streams each with their own sampling rates and properties

What is your recommendation for cases where more than one type of aux data is present? For example, you can imagine a scenario where there are EEG channels, analog TTLs, digital TTLs, and IMU data, i.e. both discrete (e.g., digital TTLs) and continuous data types, each with its own sampling frequency. Is it then required to resample everything to a common frequency (which is not ideal)? I couldn't find much info here: https://github.com/fNIRS/snirf/blob/master/snirf_specification.md#nirsiauxj

Definition of Channel

Just a short remark to clarify the definition of a channel.
As there is no clear definition within the format, I assume that a channel can contain basically any data described in detail in the measurementList, and that it is essentially defined as a source-detector pair.
This then implies that a source-detector pair can define 0 to n channels.
By this definition, I could create a SNIRF file containing, say, only one channel of HbO data, or 20 channels of only WL1 data, for whatever reason I might want; the point is that a source-detector pair doesn't need to include all data types used in the SNIRF file, since this is never directly defined within the format.

Thanks for the clarification!

compliance of sample SNIRF files and readers and writers

@huppertt ,
I am copying Jay's comments about your sample file here. We can use this thread to resolve compliance and consistency between your reader/writer and his.
David

On another note, I thought that it would be good at this point to make sure our various SNIRF readers/writers are compatible and in agreement. It's also a good test of one of our main goals which is shareability of data files. To that end, I downloaded Ted's SNIRF_example.snirf file and tried to open it in Homer3. As one would expect nothing ever works the first time it's tried so I looked into it. First, it was extremely helpful to see someone else's actual data file (thanks!) besides the ones I generate for myself and it helped me catch and fix a number of bugs and wrong assumptions in my code. Once I fixed those I still hit some snags in the file format itself which prevented correct loading. When I added debug code to compensate for those issues it loaded and I could display the data (See the screenshot attachment).

List of issues/questions about SNIRF_example.snirf:

  1. /probe is under /nirs/data1 instead of under /nirs. Spec says probe is under /nirs?

  2. /stim is under /nirs/data1 instead of under /nirs. Spec says stim is under /nirs?

  3. /metaDataTags is under /nirs/data1. Spec says metaDataTags is under /nirs?

  4. /timeOffset is under /nirs/aux1. Spec says it's under /nirs.

  5. The spec says that each row (i.e. first dimension) of probe/sourcePos and probe/detectorPos represents an optode and each column (i.e. second dimension) is a coordinate (x,y, or z). In SNIRF_example it's the opposite. I had to transpose the arrays when loading the file in order to get the correct info.

  6. I see in the file dump that for string types such as aux1/name and stim1/name the file uses the H5T_STD_I8LE (int8) type. Wondering why not use the HDF5-supported string type H5T_STRING, which can also be an array of strings, useful for something like sourceLabels/detectorLabels? I wrote a routine for Homer3 to convert from H5T_STD_I8LE to strings (it also has to discard an extra int8 0 for each character).

  7. Naive question: I had a bit of a hard time finding and downloading this patch-2, which contains the SNIRF_example and file dump. Just curious, why is it an SVN project? I was sort of expecting a forked git repo or branch. I don't have all that much experience with git or GitHub. In any case (back to Qianqian's question), I think it is a really great idea to have that example file + dump rolled into the main repo. In fact I'd have an Examples folder with several more small sample files.

Jay

Fixed vs. variable length strings in snirf files

We have run into an issue in which strings saved to SNIRF files by Python interfaces appear differently after being loaded by the Homer3 interface in the MATLAB environment. These strings are loaded by the HDF5 code employed by the Homer3 SNIRF interface as 1x1 MATLAB cells containing the string itself. Strings saved by the Homer3 SNIRF interface do not exhibit this behavior.

Investigation with HDFView suggests the issue is due to the strings being saved as fixed length vs. variable length (see HDF5 datatypes guide ch. 6, section 5.1):

Data Type description in HDFView for SNIRF files generated by the Python interface: "variable" length.

Data Type description in HDFView for SNIRF files generated by the MATLAB Homer3 interface: fixed length = 14.

Zahra Aghajan implemented a fix which used NumPy's interface to create a string with fixed maximum length:
np.array(length_unit.encode("UTF-8"), dtype="S10")
But this is not really a more correct string per the specification; it just appeases the Homer3 code.
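For anyone reproducing this, the difference comes down to the dtype handed to HDF5. A minimal h5py illustration (the np.array line above is the fixed-length case):

    import h5py
    import numpy as np

    with h5py.File("string_demo.h5", "w") as f:
        # Variable-length string: what h5py writes by default for a Python str.
        f.create_dataset("LengthUnit_variable", data="m")
        # Fixed-length string, padded to 10 bytes -- this is what the
        # np.array(..., dtype="S10") workaround above produces.
        f.create_dataset("LengthUnit_fixed",
                         data=np.array("m".encode("UTF-8"), dtype="S10"))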

Both strings passed the now-outdated Python validator.

Any thoughts? @dboas raised the point that probably the SNIRF specification does not fully specify an HDF5 file. Perhaps we should choose one string format and stick to that?

ready to release?

@fangq and @jayd1860, what more needs to be completed before we release the spec?
Maybe all the issues except for the streaming one?
Should we invite all the supporters to this project and give them a little more time to comment?
Anything else?
@dboas

Consider removing support for multiple /nirs{i} blocks: for many applications, a SNIRF file should only contain one acquisition from a single subject

We (@dboas, @rob-luke) are making an effort to make SNIRF agree with the BIDS dataset.

The BIDS specification is dependent on acquisition files belonging to a single subject, session and "run"-- with multiple runs being broken up into several files.

This means that SNIRF files that include multiple /nirs{i} groups are not BIDS compliant.

It was suggested that @fangq saves files with multiple /nirs{i} groups?

We don't want to break backwards compatibility, but wanted to discuss here the motivation for supporting /nirs{i} blocks and encouraging their use in the future.

Add optional coordinate system field to probe

There are a variety of standardised coordinate systems used in fNIRS (this problem also exists in MRI, MEG, and EEG). Currently it is difficult to determine the coordinate frame used in different SNIRF files, as you need to reverse engineer it from the optional landmarkLabels and landmarkPos fields.

Here are some nice references on the different coordinate systems:

I propose that we add a new optional field called /nirs(i)/probe/coordinatesystem, and specify a recommended list of standard names to encourage interoperability.

For reference, I believe both NIRx and Kernel use the MRI coordinate system (same as FreeSurfer). I do not know what Homer uses (but I would like to find out, please). I was only able to determine this by trial and error, and currently cannot determine it automatically from the SNIRF file alone (e.g. Kernel does not use the optional landmarkLabels field).

Possible reasons not to include this

It is possible to try to reverse engineer the coordinate frame from landmarkLabels and landmarkPos, so one might argue this new field is superfluous. But this reverse engineering is difficult, and sometimes you can get it wrong.

See also mne-tools/mne-python#9929

Easy way to convert an fNIRS dataset to SNIRF format from another format (like CSV)?

Hi SNIRF community,

I'm part of a team at Tufts that has released a new open-access fNIRS dataset [see links 1,2 below]

We hope this data could be of broad interest to the BCI/fNIRS community, especially those that want to build and evaluate machine learning classifiers of a user's mental workload intensity level given short windows (say 30 seconds) of multivariate fNIRS recordings.

So far, we've been releasing data in plain-text CSV format. We became aware of SNIRF due to some helpful reviewer comments, but we haven't used it ourselves before. We are hoping you can help us figure out if SNIRF might be a good fit for our work.

My questions are:

  1. do you know of tools that would let us adapt our data from a CSV file (or some other format) to SNIRF format?
  2. are there tools available for reading SNIRF data into machine learning pipelines? (we work in Python, so I'm especially curious if there are tools for helping SNIRF play well with sklearn or pytorch).

Helpful details about our data: for each subject we have a CSV file with a row for every timestep (we recorded measurements at 5.2 Hz). The columns tell you the estimated oxy/deoxy hemoglobin concentrations at that instant across several channels. You can see a screenshot of the data format here: https://tufts-hci-lab.github.io/code_and_datasets/fNIRS2MW.html#sliding-window-fnirs-data-for-classifiers (I'm sure we have other more "raw" fNIRS measurements too, but this preprocessed format is what seems most helpful to release to encourage ML folks to work on this problem).

Other open-access datasets for fNIRS and mental workload also seem to be available in other formats (e.g. the dataset by Shin et al. [3] recorded at TU Berlin), so I think a general guide/outline of how to "convert" data to SNIRF would be broadly helpful in moving more of the community to SNIRF and the benefits of open standards.

Best,
Mike Hughes

[1] Paper: https://openreview.net/pdf?id=QzNHE7QHhut
[2] Project Website: https://tufts-hci-lab.github.io/code_and_datasets/fNIRS2MW.html
[3] Shin et al dataset link: http://doc.ml.tu-berlin.de/simultaneous_EEG_NIRS/
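(Not an official answer, but to give a sense of the shape of such a conversion, here is a minimal h5py sketch that writes one subject's processed HbO columns as 99999:Processed channels. The HDF5 paths follow the SNIRF spec; the CSV column layout, file names, and placeholder source/detector geometry are assumptions about the dataset described above.)

    import h5py
    import numpy as np
    import pandas as pd

    # Rough sketch only: column names and geometry are assumptions, not
    # part of any official converter.
    df = pd.read_csv("subject01.csv")
    hbo = df[[c for c in df.columns if "HbO" in c]].to_numpy()

    with h5py.File("subject01.snirf", "w") as f:
        f.create_dataset("formatVersion", data="1.0")
        nirs = f.create_group("/nirs")
        nirs.create_group("metaDataTags")        # required tags omitted for brevity
        data = nirs.create_group("data1")
        data.create_dataset("dataTimeSeries", data=hbo)
        data.create_dataset("time", data=np.arange(len(df)) / 5.2)
        for k in range(hbo.shape[1]):
            ml = data.create_group(f"measurementList{k + 1}")
            ml.create_dataset("sourceIndex", data=k + 1)    # placeholder geometry
            ml.create_dataset("detectorIndex", data=k + 1)
            ml.create_dataset("wavelengthIndex", data=1)
            ml.create_dataset("dataType", data=99999)       # processed
            ml.create_dataset("dataTypeLabel", data="HbO")
            ml.create_dataset("dataTypeIndex", data=1)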

Datasets erroneously saved as arrays

Lots of implementations (critically, the BUNPC-sponsored Homer3/AtlasViewer MATLAB apps) are saving fields that should just be single ints, floats, or strings as HDF5 arrays with length 1, ndim 1.

This is a formatting difference that affects the way such Datasets must be loaded across reader implementations.

pysnirf2 marks these files as invalid. Because this implementation is so widespread, it is worth considering supporting it; I am opposed, just as I am opposed to supporting both fixed- and variable-length strings. As in #72, the issue has arisen because the default behavior of the HDF5 library differs.

container for multimodal data

Hasan Ayaz raised the issue

One suggestion relates to adding a dedicated multimodal data collection component by adding a new container. This could contain a string list of other biomedical signals (e.g. EEG, ECG, PPG, EMG, EDA, etc.) collected together with fNIRS, as well as time synchronization (i.e. the time offset between the other time series and the fNIRS origin, and/or other key events), which is needed between these datasets in any case. These could be implemented over existing parts (e.g. stim or aux), but having a dedicated mechanism would be useful going forward.

I believe that this is already covered within the aux container. Am I missing something?

proposal to add metadata to parallel /nirs(i)/stim(j)/data

Some users would like to have metadata attached to each stimulus trial that they can then utilize to include or exclude trials from averaging. This is quite useful, for instance, with infant looking time during each trial.

To permit this, we would need to add an optional matrix of values, something like /nirs(i)/stim(j)/metadata. This would have to have the same number of rows as /nirs(i)/stim(j)/data, but the number of columns is unconstrained.

/nirs(i)/stim(j)/metadataNames
This would be optional and provide a useful descriptor name for each column of the metadata.
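For concreteness, a sketch of what the proposed fields could look like on disk; the metadata and metadataNames fields follow the proposal above and are not yet part of the spec:

    import h5py
    import numpy as np

    # Sketch of the proposed optional fields (NOT yet in the spec):
    # stim1/metadata has one row per trial (matching stim1/data) and any
    # number of columns; metadataNames labels each column.
    with h5py.File("infant_example.snirf", "a") as f:
        stim = f.require_group("/nirs/stim1")
        stim.create_dataset("name", data="faces")
        stim.create_dataset("data", data=np.array([[10.0, 2.0, 1.0],
                                                   [30.0, 2.0, 1.0],
                                                   [50.0, 2.0, 1.0]]))
        stim.create_dataset("metadata", data=np.array([[1.8], [0.4], [2.1]]))
        stim.create_dataset("metadataNames", data=["lookingTimeSeconds"],
                            dtype=h5py.string_dtype())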

@fangq @sstucker @jayd1860 and others
Please comment on any issues with adding these optional fields to SNIRF.

Question about how to handle calculated data type

Dear all,

This question is related to #22, as I don't understand how chromophores such as HbO, HbR, and HbT can be stored in a SNIRF file.

What I understand from the specification and from #22 is that, when storing processed data such as HbO, in /nirs(i)/data(j)/measurementList(k)/:

  • dataTypeLabel is either "HbO", "HbR", or "HbT",
  • dataType is 99999

but then I don't know how to set the following values in :

  • dataTypeIndex: should it be 4, as HbO is the fourth data type label?
  • wavelengthIndex?

Also, shouldn't /nirs(i)/probe/wavelengths be optional when storing processed data such as HbO, HbR, HbT?

A similar question: when storing raw data, what should measurementList(k).dataTypeLabel contain? Shouldn't there be a "raw" option here? It's an optional field, so I guess we don't specify it, but measurementList(k)/dataTypeIndex is required, so what should the value of dataTypeIndex be in this case?

Best regards,
Edouard

metaDataTags specification

Hi Qianqian,

NOTE: Before I respond to your email, I just wanted to mention that while we say what an HDF5 Group is in the spec, we do not explicitly say what an HDF5 Dataset is (maybe we should, to avoid confusion, but that's a discussion for a later time). We only imply it by using any non-group basic type such as "numeric" or "string" to mean an HDF5 Dataset. So my answer will assume that "numeric" or "string" next to "Type:" is basically an HDF5 Dataset. Now on to my response to your email...

Your statement: "the entire metaDataTags can be just a single dataset of string ...".
Yes exactly! If we choose to define metaDataTags as a Dataset of type string, then we simply change Type: from the current "group array" to "2-D string array" like this

  /nirs/metaDataTags
  Type: 2-D string array 

Regarding the alternative way: sorry, I didn't mean to imply anything complicated like hash maps (we don't care about quick searches through hash tables, for instance). My other proposal was simply to use the same type of definition we use for all the other indexed groups of structured variables in the SNIRF spec. Let's take stim as an example: each element of the stim group contains a single name/data condition. In the SNIRF specification it looks like this:

 /nirs(i)/stim(j)
 Type: indexed group

 /nirs(i)/stim(j)/name
 Type: string

 /nirs(i)/stim(j)/data
 Type: numeric 2-D array

I'm proposing we define metaDataTags the same way, as an indexed group with 2 Dataset fields: name and value. So each element of metaDataTags would contain ONE tag. This definition adapted to our SNIRF spec would look like this

  /nirs/metaDataTags(i)
  Type: indexed group

  /nirs/metaDataTags(i)/name
  Type: string

  /nirs/metaDataTags(i)/value
  Type: string

To be clear, it really does not matter to me which of the 2 ways we use, as long as we address my main point: that the current definition of metaDataTags does NOT define ANY dataset - we only define a group, which in the HDF5 world does not contain basic data (not directly anyway).

Either of the above proposals solves this issue by defining Datasets: the first as a string array, the second as an indexed group of name/value string Dataset fields.

Lastly, I'm not clear about one aspect of the "SNIRF data format summary" section (which is otherwise a great overview of the SNIRF data structure!). Under metaDataTags, there's a list of tag values like "SubjectID", "MeasurementDate", "MeasurementTime", "LengthUnit", etc., as if they were variable names. But, unless I'm misunderstanding something, these are not variable names; they are tag values (required though they may be). It seems to me that their required status belongs only in the description, not in the data structure specification, where only variable names belong.

In any case let me know what you think.

I would be happy to modify the spec to use one of the above definitions if no one has any objections.

Jay

Not sure if @huppertt will get this.

Handling of calculated or derived data types

Extracted from the Google Docs: https://docs.google.com/document/d/1kLQnFNSXsCAcUNcCPtyTerSwatCOnLcX4fAaZsidCbI/edit

David Boas
One issue that remains to be resolved is how to handle calculated or derived data types. The specification presently supports several raw data types. It is desirable to add a data type for concentration results. An issue we are struggling with is that every channel of data, i.e. column of "d", has a corresponding descriptor in the "ml" structure. The "ml" structure indexes the source, detector, and data type for the corresponding data channel. It also indexes the wavelength. For concentration, there is no wavelength. Thus, it seems that if we have a data type for concentration, then the corresponding "ml(n).wavelengthIndex" field would be ignored. In addition, the "ml(n).dataTypeIndex" could be used to reference what chromophore is stored in the data. The list of chromophores could be provided by sd.Chromophores, which could be a string array with possible entries of "HbO", "HbR", "H2O", "aa3", etc.

Luca Pollonini
I agree on having to address the data type, whether raw vs. processed, if not both. For instance, Shimadzu outputs both raw (3 wavelengths) and hemoglobin (2 chromophores) data in the same file, and it would be nice to make SNIRF compatible with their preferred output to help them embrace the SNIRF standard.
A separate issue for discussion is whether to include other probe/anatomical variables of interest, e.g. a certain number of fiducial points alongside the source and detector positions. If the qform matrix is missing, these will be needed to perform any affine transformation towards another coordinate system or an anatomical scan. The aux group seems to be suited for time course variables, so it is less than ideal for additional locations. We could always promote recording 3D locations in a separate file (e.g., AtlasViewer-like), but it would be nice to attempt to standardize that as well.
In addition, I also wonder if there should be additional field(s) in stim for adding optional trial-specific data, such as reaction time, response accuracy or other variables extracted from presentation programs like E-Prime. As above, these could live in a separate file, but it could be useful to think ahead about integrating other behavioral data.

Felipe Orihuela-Espina
Will a set of predefined constants ensure better compatibility (e.g. it would avoid differing nomenclature such as "HbO2" vs. "O2Hb"), although at the cost of needing periodic review of the constant list as new parameters may be measured?
Also, will this file format accommodate data that is not just reconstructed but also more or less heavily processed? Is it expected to keep track of the status of the data alone, or to be fully capable of "remembering" every processing operation that the data has undergone, as well as each operation's metaparameters? Perhaps a solution is to have a "list" of data, where each "data" unit represents the data in a particular state of processing, and then include some information akin to the "Command" software pattern.

nirs(i).probe compatibility with nirs.SD probe definition

Is there a reason why snirf(i).probe does not simply carry over the .SD probe structure used previously by .nirs?

While I understand the addition of fields to support fluorescence (wavelengthEmission), frequency domain (frequencies), and time domain (timeDelays, timeDelayWidths, momentOrders) data types, it seems some important fields were lost. Some field names, while they changed from SD to snirf(i).probe, are easily converted (SD.SrcPos became snirf(i).sourcePos2D, SD.Lambda became snirf(i).wavelengths).

However, it is unclear how DummyPos, SpringList, and AnchorList are translated from .SD to snirf. These three fields allow the translation of a 2D probe onto a 3D model. In particular, modular fNIRS designs heavily leverage DummyPos, SpringList, and AnchorList to maintain rigid distances between optodes on the same module (intra-module channels), and allow for flexibility between optodes on different modules (inter-module channels), when registering a probe to a 3D head model.

For example (https://github.com/cotilab/moca#3d-export): a modular probe designed in 2D maintains its modular architecture through extensive optode relationships (inter- vs. intra-module springs). These spring relationships ensure that modules (fixed circuit boards) don't stretch; only distances between modules can be adjusted. This allows "modules" (or relationships between certain channels) to be registered.
(images: demo_SD_channels, SDgui, register)

I see two potential solutions.

  1. Add .SD as a field inside nirs(i).probe (for example, nirs(i).probe.SD), or
  2. Add all the fields of the .SD file into nirs(i).probe (the .SD fields are copied below).

Option 1 seems like the simplest, most compatible option. It would allow AtlasViewer to export .SD files and easily append them to snirf files. The downside is that you'll have multiple fields with different names describing the same things (e.g. nirs(i).probe.SD.SrcPos and snirf(i).sourcePos2D define the same thing).
Option 2 would require AtlasViewer to export snirf files, with only the nirs(i).probe populated.

Whichever option we choose, we also need to update the nirs2snirf converter. (For example, I am unable to load probes using PlotProbe in Homer3 from snirf files that have been converted from .nirs files with nirs.SD inside, likely because these .SD fields are not carried over. I'll post this issue in the Homer3 repository to keep us organized.)

Thoughts?

Consider more human readable and consistent fieldnames

It is advantageous for outsiders and beginners to have human-readable fieldnames. I understand that many aspects of the .nirs specification were carried over, but the fieldnames can be very confusing and could certainly be longer than one or two characters.

Some proposals for more human-readable names:

  • data.t, why not call it time?
  • data.ml, why not call it measurementList? Even more intuitive would be something like probeMapping, probeGeometry, optodeMapping, optodeSpecification or similar
  • data.d - this is more complicated. data.data sounds strange, but is not infeasible. data.raw, data.rawData, or data.samples could also make sense. data.data could make sense because stim.data also exists (and not stim.d), but then the format is completely different again, so the same fieldname with a different format would not be very smart...
  • sd, why not call it probeGeometry or just probe? Less cryptic than sd.

Some proposals concerning consistency:

  • why is it data.d and stim.data, but then aux.d again? Rectifying this would be more intuitive (edit: but see my note above about different formats for the two)
  • generally, why use abbreviations sometimes and not other times? sd.detPos is abbreviated, sd.landmarkLabels is not. Either go for abbreviations or don't.
  • sd.lambda vs. data.t (for time): one field name is taken from the physical symbol/unit and the other from the quantity's name. If you go for sd.lambda, then also go for data.second (not seconds; it is also not lambdas, see the next point).
  • sd.srcLabels vs. sd.lambda: why are labels plural, but lambda not?
  • sd.srcLabels vs. aux.name: why is one label and the other name?

Reference:
Compare with other standard formats, such as:

Overhead of channel descriptor groups

We have been internally evaluating the use of SNIRF as a native output format for Gowerlabs' Lumo system.

Lumo is a high density system, and our full head adult caps contain 54 modules, each with 3 dual-wavelength sources and 4 detectors. We are able to provide a dense output, which results in (54 x 4 x 54 x 6 = ) circa 70k channels.

The use of an HDF5 group per channel descriptor (e.g. /data1/measurementList{i}) appears to incur significant overhead. For example, a SNIRF file containing only metadata (no channel data) for a full head system amounts to ~200MiB, or ~3KiB per channel. The actual information content of each descriptor (containing only the required fields plus module indices) amounts to only (7 x 4 = ) 28 bytes, so this is an overhead of approximately 99%.

Our results appear vaguely consistent with this analysis:

The overhead involved just in representing the group structure is enough that it doesn't make sense to store small arrays, or to have many groups, each containing only a small amount of data. There does not seem to be any way to reduce the overhead per group, which I measured at about 2.2 kB.

Evidently the size of the metadata grows linearly with the number of channels, as does the data rate of the channel time series, and hence for longer recordings the size of the metadata becomes proportionally smaller. However in absolute terms we find that (with appropriate chunking and online compression) the metadata corresponds to around four minutes of compressed raw channel data. Given the length of a typical measurement session, the overhead remains significant.

I appreciate that the majority of systems (such as those of the manufacturers listed on the SNIRF specification page) are of a much lower density than Lumo, and that even high density systems often produce sparse data, but evidently the trend is towards increasing density and the number of wavelengths. Our future products would, based on the current SNIRF specification, generate over 0.5GiB of metadata.

  • Have you previously considered this?
  • Might it be possible to use an array of a compound datatype to represent channel descriptors? (A sketch of this idea follows below.)
  • Do you have any alternative suggestions as to how we might reduce this overhead?
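To make the compound-datatype suggestion concrete, a packed channel table might look like the sketch below (28 bytes per channel, matching the estimate above). This is only an illustration of the idea; the dataset name measurementTable is hypothetical, and this layout is not sanctioned by the current SNIRF specification:

    import h5py
    import numpy as np

    # Sketch: one packed table in place of ~70k measurementList{i} groups.
    # "measurementTable" is a hypothetical name, not a spec field.
    channel_dtype = np.dtype([
        ("sourceIndex", "<i4"), ("detectorIndex", "<i4"),
        ("wavelengthIndex", "<i4"), ("dataType", "<i4"),
        ("dataTypeIndex", "<i4"),
        ("sourceModuleIndex", "<i4"), ("detectorModuleIndex", "<i4"),
    ])

    channels = np.zeros(69984, dtype=channel_dtype)  # ~70k dense channels

    with h5py.File("lumo_compound_example.h5", "w") as f:
        f.create_dataset("/nirs/data1/measurementTable", data=channels,
                         compression="gzip")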

SNIRF independent entries / groups in metaDataTags

Does the SNIRF format support adding individual fields that are not defined in the SNIRF format, or would this cause an incompatibility with the format?
I'm thinking about more detailed descriptions of any preprocessing that may have been performed, or more experimental / participant / manufacturer / device information that is currently not part of the SNIRF format.

I could already store additional information in /nirs(i)/metaDataTags/, but as far as I understand it, this group supports only datasets, not nested groups ("Each metadata record is represented as a dataset..."). Or would it be OK to also store groups within the metaDataTags group?

Thanks for the clarification.

Specify data types, endianness and maximum string length

Here is a list of sub-fields and their preferred data types (a brief typing sketch follows these lists):

Container type (struct or cell-like data in MATLAB):

  • data(idx)
    • data(idx).ml
  • stim
  • sd
  • metaDataTags
  • aux

Floating-point types (double as default - should we also support single-precision?):

  • data(idx).d
  • data(idx).t
  • data(idx).ml.sourcePower
  • data(idx).ml.detectorGain
  • stim(n).data
  • sd.lambda
  • sd.lambdaEmission
  • sd.srcPos
  • sd.detPos
  • aux(n).d
  • aux(n).t
  • timeOffset

Integer types:

  • data(idx).ml.sourceIndex
  • data(idx).ml.detectorIndex
  • data(idx).ml.wavelengthIndex
  • data(idx).ml.dataType
  • data(idx).ml.dataTypeIndex

String types:

  • formatVersion
  • stim(n).name
  • sd.srcLabels
  • sd.detLabels
  • aux(n).name
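If the list above is adopted, a writer can pin the types and endianness explicitly rather than relying on library defaults. A brief h5py/NumPy illustration (the paths use the current HDF5 layout rather than the .nirs-style names in the lists above):

    import h5py
    import numpy as np

    # Sketch: write fields with explicit little-endian types instead of
    # whatever the library defaults to on the current platform.
    with h5py.File("typed_example.snirf", "w") as f:
        f.create_dataset("/nirs/data1/dataTimeSeries",
                         data=np.zeros((100, 4)), dtype="<f8")   # float64
        f.create_dataset("/nirs/data1/measurementList1/sourceIndex",
                         data=1, dtype="<i4")                    # int32
        f.create_dataset("/nirs/data1/measurementList1/sourcePower",
                         data=1.5, dtype="<f4")                  # single precision
        f.create_dataset("formatVersion", data="1.0",
                         dtype=h5py.string_dtype(encoding="utf-8"))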

Release version 1.1.0

Once the following issues are resolved I suggest we release a new version 1.1.0:

In addition to the issues above, here is the changelog since version 1.0:

  • Add dataUnit to indexed groups aux and measurementList
  • Uses of the word "numerical" were replaced with "numeric" for consistency.
  • stim/data was incorrectly described as [<f>,...]+ in the table when it should be [[<f>,...]]+
  • aux/dataTimeSeries was incorrectly described as [[<f>,...]]+ in the table when it should be [<f>,...]+
  • aux/dataTimeSeries and aux/time were described as numeric in the document when they should be 1-D numeric array
  • probe/detectorPos2D was described as numeric in the document when it should be 2-D array

Additionally, we now have the wonderful validator, so that's great too.

Thoughts @sstucker @dboas ?

Contradictory array sizing in time vector

It is specified that for "the special case of equal sample spacing a shorthand <2x1> array is allowed" for the /nirs(i)/data(j)/time dataset (and other time vectors). However, the type of this field is "numeric 1-D array", and it is also specified that a "SNIRF field specified by this document as a numeric 1-D array must occupy a dataspace with rank of 1".

Evidently this is contradictory. Should this text perhaps read, e.g., "For the special case of equal sample spacing, an array of length 2 is allowed where ..." ?
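Either way, the shorthand is presumably just a start time plus a sampling period; a reader would expand it roughly as below (a sketch, assuming the [start, step] interpretation of the equal-spacing case):

    import numpy as np

    def expand_time(time, n_samples):
        """Expand a SNIRF time vector: either explicit sample times, or the
        length-2 shorthand [start, step] for equally spaced samples.
        (Ambiguous when the recording itself has exactly 2 samples.)"""
        time = np.asarray(time).ravel()
        if time.size == 2 and n_samples != 2:
            start, step = time
            return start + step * np.arange(n_samples)
        return time

    print(expand_time([0.0, 0.1], n_samples=5))  # [0.  0.1 0.2 0.3 0.4]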

How to address formatVersion for drafts

I am writing code for reading and writing SNIRF files, and I would like to check against specific versions of the specification. Currently I am targeting draft 3, which has a ‘formatVersion’ of 1.0.

Will draft 4 also have a ‘formatVersion’ of 1.0? And if so, how is software to differentiate these two versions? My understanding is that SNIRF has been officially released [1], so it would be useful to track different versions in our continuous integration.

If there is currently no way to differentiate between files created with different drafts, then can I suggest that we specify that files with v1 draft 3 have ‘formatVersion’ 1.0.3 to indicate a minor non-breaking change, and that draft 4 would be 1.0.4? Or something like https://semver.org/

Adding multiple moduleIndex for inter-module channels

Rather than a single module index for both source and detector:
/nirs(i)/data(j)/measurementList(k)/moduleIndex,

Can we use individual optode module indices
/nirs(i)/data(j)/measurementList(k)/sourceModuleIndex
/nirs(i)/data(j)/measurementList(k)/detectorModuleIndex

As modular fNIRS becomes more abundant, inter-module channels (channels with sources and detectors on different modules) are a simple way of increasing channel density. The current SNIRF specification only allows for intra-module channels (i.e. channels where sources and detectors are on the same module, since they share the same moduleIndex).

I understand that simply globally enumerating sources and detectors (regardless of modules) solves this problem, but it doesn't allow a user to study only intra- or only inter-module channels.
