SNIRF Format Specification
Home Page: http://fnirs.org/resources/software/snirf/
License: Other
The most recent release of the SNIRF protocol was 2019 (#51 (comment)). There have been many great improvements to the protocol since then, so I suggest cutting a new release so developers can work with something stationary rather than a moving target.
@dboas and I have been in contact about slight differences in implementation in recent days. These issues cannot be avoided, but they are exacerbated by outdated specifications and validators. As such, I suggest merging #67, #68 and #69, then cutting another release.
The naming of the release is not important to me. There seems to be some reluctance to use the term 1.0, so I suggest v1.0 Draft 4.
While it may be tempting to wait until all the issues in #44 are completed, I see no issue in making some closely spaced releases as we move towards 1.0.
Thoughts @dboas @fangq? If you are happy with the idea then I am happy to write up a PR which contains a release procedure, bump the version number in the spec, etc. Then if that's merged I could tag the release.
Qianqian, sd.landmarkLabels allows for user defined landmark labels. Should we provide a specification for what labels to use to reference specific sources and detectors?
We already have sd.srcLabels and sd.detLabels. We could just indicate that the sd.landmarkLabels should match the src or det label.
What do you think?
If you agree, can you add that to the spec?
Does the specification describe the units that processed data is stored with? I took a fresh read and couldn't find it, but maybe I missed it.
I have some Kernel SNIRF files with HbO/HbR data, and I am guessing it's storing the data in uMol due to the large values (is this correct @Zahra-M-Aghajan ?).
The default time unit is seconds, I assume to match the SI base units. If I were to store processed data I would have followed this convention and stored the data in mole. Many software packages also store everything internally as SI base units to simplify calculations and reduce the chance of scaling errors. So I think there is some ambiguity here that we may wish to address.
Should we add an optional metadata tag, analogous to LengthUnit or TimeUnit, for specifying the units of the 99999: Processed data types? Something like SubstanceUnit or ProcessedUnit?
Here is a summary of the action items from today's meeting. I will work on the spec changes; after we all agree on the changes, I will create a "Draft 4" document.
- measurementDate/Time - need to allow "unknown"
- language for *Labels (k) needs to change to a 1D string array or 2D char array
- revisit the language for string/string array: string: either a H5T.C_S1 (null terminated string) type, or an ASCII encoded 8-bit char array, or a UNICODE UTF-16 array, defined by the H5T.NATIVE_CHAR or H5T.H5T_NATIVE_B16 datatypes in H5T. (Note: at this time HDF5 does not have a UTF-16 native type, so H5T_NATIVE_B16 will need to be converted to/from UTF-16 within the read/write code.)
- drop src/det/landmarkPos
- leave wavelengths for "nominal" wavelengths, and define measurementList.wavelengthsActual, wavelengthsEmissionActual (optional) for the actual per-optode wavelength, if available
- wavelengths list can be empty for processed data, but must be present
- [ ] measurementList.dataTypeIndex needs a probe.dataTypeLabels to look up
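The last action item can be sketched as follows; this is a hypothetical illustration, assuming probe.dataTypeLabels is a list of label strings addressed with SNIRF's 1-based indexing (the labels shown are invented for the example):

```python
# Hypothetical sketch: resolving a channel's processed data type via
# measurementList(k).dataTypeIndex used as a 1-based index into
# probe.dataTypeLabels (names per the action item above; labels invented).

probe_data_type_labels = ["HbO", "HbR", "HbT"]  # example probe.dataTypeLabels

def lookup_data_type_label(data_type_index, labels):
    """Return the label for a 1-based dataTypeIndex (SNIRF indexing is 1-based)."""
    if not 1 <= data_type_index <= len(labels):
        raise ValueError(f"dataTypeIndex {data_type_index} out of range")
    return labels[data_type_index - 1]

print(lookup_data_type_label(2, probe_data_type_labels))  # HbR
```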
Currently the specification requires sourcePos2D positions. It states:
This field describes the position (in LengthUnit units) of each source optode.
The positions are coordinates in a flattened 2D probe layout.
I would like to suggest that the specification be changed so that either the 2D or the 3D positions are required, but not necessarily the 2D. I suggest this because 2D positions are not always available, for example if you have a 3D digitizer. In this case I could invent some transform to 2D, but it would not necessarily be meaningful or possible for someone else to reverse engineer. (I would obviously open-source the code, but I can imagine situations in other labs where that is not possible.) I could store both 3D and 2D, but what would the point be then?
Another example is in the development of a toolbox. Currently in MNE we read the data in 3D; how should I convert this to 2D? The current description is not sufficiently detailed for me to implement a transform and ensure it's the same as in other toolboxes. If the specification is to maintain its requirement for 2D locations, then I suggest that a standard transform from 3D to 2D be provided with the specification, although I suspect this would be quite some work.
(Edit: I have recently checked that the 3D->2D transforms in different toolboxes are different from Homer's.)
Finally, it's my opinion that the specification should encourage users to store data in the most raw and accurate format, and not downgrade their spatial information. Of course this is just one opinion, so I would be keen to hear what you think of this proposal. I suspect there is some legacy reason why 2D was the preferred option, I get that.
This change would be backwards compatible. I.e., all existing SNIRF files would still be compatible with this change.
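To make the ambiguity concrete, here is a minimal sketch of one possible 3D-to-2D flattening. This is not the transform used by Homer, MNE, or any other toolbox; the point of the issue is precisely that no standard projection is defined:

```python
import math

# One arbitrary flattening (azimuthal-equidistant projection about the +z
# axis). Illustrative only: the spec defines no standard 3D -> 2D transform,
# so another lab could not reverse engineer positions flattened this way.

def sphere_to_plane(x, y, z):
    """Project a 3D optode position onto a 2D plane about the +z axis."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)   # polar angle from +z
    phi = math.atan2(y, x)     # azimuth
    return (r * theta * math.cos(phi), r * theta * math.sin(phi))

print(sphere_to_plane(0.0, 0.0, 9.0))  # a point on +z maps to the origin
```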
Hi,
I am starting to make Brainstorm compatible with SNIRF (see brainstorm-tools/brainstorm3#283 (comment)), but I don't understand the difference between sourcePos and sourcePos3D.
The SNIRF data format summary says that sourcePos contains 2D positions and sourcePos3D contains 3D positions, but the specification then says that both contain x, y, z coordinates, so they both seem to contain 3D positions.
Can someone explain the difference between the two to me?
Thanks a lot,
Edouard
There have been many contributors.
I need to dig through the emails.
Mainly, it was started by Blaise Frederick and David Boas, with significant contributions from Ted, Jay, and Qianqian. And then there has been a community of suggestions that have been adopted.
Qianqian or Jay, can you update this in the spec?
We are examining how best to 'fit' auxiliary data from LUMO into a valid SNIRF file, pertinent examples of which are:
The most natural way for us to achieve this would be to write these data in the /aux group, which I note (from #86) is intended for the storage of arbitrary time series data. As specified, the auxiliary time series are limited to being vectors:
/nirs(i)/aux(j)/dataTimeSeries
- Presence: optional; required if aux is used
- Type: numeric 1-D array
- Location: /nirs(i)/aux(j)/dataTimeSeries
If we are limited to a vector, the former example could require up to 486 auxiliary time series, and the latter two orders of magnitude more!
Might it be possible to extend the specification to permit (manufacturer specific) auxiliary data to be a matrix?
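As a sketch of what that extension would look like (NumPy arrays standing in for the HDF5 datasets; the <time points> x <channels> orientation is an assumption, not something the spec defines):

```python
import numpy as np

# Today: each aux entry is a 1-D vector, so 486 channels need 486 aux groups.
n_time, n_chan = 1000, 486
per_channel_vectors = [np.zeros(n_time) for _ in range(n_chan)]

# Proposed: a single 2-D dataTimeSeries of shape <time points> x <channels>
# (orientation assumed here) collapses them into one aux entry.
aux_matrix = np.zeros((n_time, n_chan))

print(len(per_channel_vectors), aux_matrix.shape)
```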
@jayd1860, @fangq, Ted and I discussed expanding the data types described in the appendix of the spec.
Ted had the excellent suggestion of organizing such that blocks of numbers are reserved for different modalities and then ranges within a block follow similar structures. So,
MODALITIES
0-99 : CW
100-199 : FD
200-299 : TD gates
300-399 : TD moments
400-499 : DCS
WITHIN MODALITY
0-49 : Non-fluorescent
50-99 : Fluorescent
We started to provide more detail to handle derived / processed data types
So, what would work for CW, and also for FD and TD, is
0 - Intensity
1 - dOD
2 - mua
10 - musp
11 - a (musp intercept for musp = a lambda^-b)
12 - b (lambda argument)
20 - HbO
21 - HbR
22 - HbT
23 - H2O
24 - Lipid
DCS would be different
0 - g2
10 - BFi
Have to think more
Also, what about CMRO2?
We will also want standard deviation... but this is likely best handled by adding data(i).dataTimeSeriesStdev which then has access to all of the data data type indices.
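The block scheme above can be sketched as a decoder; the ranges and the 50-offset fluorescence rule are taken from this discussion and are draft values, not final spec:

```python
# Draft decoder for the proposed block layout (ranges from the discussion
# above; nothing here is final spec).

MODALITY_BLOCKS = {
    "CW": range(0, 100),
    "FD": range(100, 200),
    "TD gates": range(200, 300),
    "TD moments": range(300, 400),
    "DCS": range(400, 500),
}

def decode_data_type(code):
    """Split a dataType code into (modality, within-modality offset, fluorescent)."""
    for name, block in MODALITY_BLOCKS.items():
        if code in block:
            offset = code - block.start
            return name, offset % 50, offset >= 50  # 50-99 in a block: fluorescent
    raise ValueError(f"unknown dataType code {code}")

print(decode_data_type(120))  # ('FD', 20, False): FD HbO under this scheme
```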
Thanks for the great initiative.
I would like to convert my existing data (NIRX) to the SNIRF format. Are there any conversion scripts available? I have looked in the fNIRS organisation and can only find SNIRF readers and writers.
Apologies if I have posted this in the wrong location, I was not sure what repository to ask this question in.
The dataTypeIndex is not clearly defined. In the data format summary it is defined as an <i> integer, but the text description states "Data-type specific parameter indices" and notes that the Time Domain and Diffuse Correlation Spectroscopy data types have two additional parameters, so the data type index must be a vector with 2 elements that index the additional parameters.
Also here the type needs to be changed from Type: integer to Type: numeric 1-D array.
A colleague of mine opened the SNIRF specification to understand it better (after my pushing) and had two follow-up comments that I thought I would share with you: 1) Do I need to install TortoiseGit? And, 2) I don't use git, so I don't think SNIRF is for me. As such, I suggest the following:
Split the README into user instructions (friendly, with text links), then developer instructions (all the git info you can dream of).
The instructions on how to use TortoiseGit are the second thing you see on the SNIRF landing page; they are almost the most dominant aspect of the README. For new users this may be confusing, as they will wonder if they need this software to use SNIRF, whereas they don't: SNIRF is a specification, and the instructions for how to clone a repository are mainly developer focused. If you wish to link to software, the average user just wants to know that Homer, NIRS-Toolbox, MNE, Fieldtrip, etc. support it. So maybe it would be better to include a list or table of software that supports SNIRF?
I suggest moving the TortoiseGit section down to the bottom of the page under the How to Participate section. Or even removing it; it's a very specific tool and not particularly widely used. Is TortoiseGit something used at BU?
Move all the details about git recursive cloning, submodules, etc to the developer focused section.
I hope this user feedback helps. While your average user won't be implementing SNIRF, this is likely the page they will land on when googling the topic.
If there is general agreement for this change I'll open a PR.
For some reason, github does not display italic markups (*...*) correctly in a number of items, such as in the sd.detPos section:
06e2103?diff=unified#diff-ee33e2b4d0875712a7909355275c703aR148
need to figure out why and fix those
@fangq provided a description of the versioning for SNIRF at #51 (comment) and concluded that we are not yet at version 1.0 and are still in drafts.
Can the https://fnirs.org/resources/software/snirf/ page be modified to match the decision above? Otherwise there is conflicting information being presented to users. I believe @dboas has the keys to that page?
It would be good to have a standardized list of aux channel identifiers in the appendix. Other formats like BIDS have these and it would be nice to have defined conversion.
Extracted from the Google Docs: https://docs.google.com/document/d/1kLQnFNSXsCAcUNcCPtyTerSwatCOnLcX4fAaZsidCbI/edit
HITACHI ETG family used to make a distinction between "nominal" wavelength e.g. 690nm common to all channels and "real" measurement wavelength e.g. 696.6nm, specific at each channel. Is there any advantage to keep track of both?
Extracted from the Google Docs: https://docs.google.com/document/d/1kLQnFNSXsCAcUNcCPtyTerSwatCOnLcX4fAaZsidCbI/edit
In the supported data types for “dataTimeSeries”, how are variants of DCS indicated e.g. TD-DCS?
At the moment, in the specification, the momentOrders are required to be specified as 1D numeric arrays. However, this could easily lead to confusion and, as a result, I was wondering whether this could be changed to strings?
As an example, consider the following scenarios:
I hope these examples were helpful in demonstrating the clarity that the usage of strings can bring.
If we use local indexing (that is to say, we have a sourceModuleIndex and a detectorModuleIndex, use local indices for sourceIndex and detectorIndex, and set useLocalIndex non-zero), is the nature of the global indexing into, e.g., /nirs(i)/probe/sourcePos3D defined by the specification?
As this is a community-oriented project, I suggest we update the repository to meet the GitHub community guidelines: https://docs.github.com/en/communities. Most of this work has already been done by @fangq, but I suggest simply organising it as recommended by GitHub. See https://github.com/fNIRS/snirf/community for our scorecard.
This will involve:
I am happy to chip away at this in those times where I am waiting for my code to run 😉
Note that the current and draft specification states:
/nirs(i)/data(j)/measurementList(k)/sourcePower
- Presence: optional
- Type: numeric
- Location:
/nirs(i)/data(j)/measurementList(k)/sourcePower
Source power in milliwatt (mW).
But later in the discussion it is noted that:
sourcePower provides the option for information about the source power for that channel to be saved along with the data. The units are not defined, unless the user takes the option of using a metaDataTag described below to define, for instance, sourcePowerUnit.
For our purposes we would prefer to be able to specify a unit (since our power measures are relative), but in any case one of the statements should be altered for clarity.
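A minimal sketch of the metaDataTags workaround the discussion describes; the tag name sourcePowerUnit is only the example mentioned there, not a defined spec field (a plain dict stands in for the HDF5 group):

```python
# Sketch: recording a unit for measurementList(k).sourcePower via a
# user-defined metaDataTag. "sourcePowerUnit" is illustrative, not spec.

meta_data_tags = {
    "SubjectID": "subj01",
    "MeasurementDate": "unknown",
    "MeasurementTime": "unknown",
    "LengthUnit": "mm",
    "TimeUnit": "s",
    "sourcePowerUnit": "relative",  # our power measures are relative, not mW
}

# A reader falls back to the spec's stated mW when the tag is absent.
print(meta_data_tags.get("sourcePowerUnit", "mW"))
```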
Hi @dboas, I remember Ted mentioned this in a teleconference previously. Recently, my student Morris Vanegas also brought up this issue.
I think we should add some sequence numbers in the data structure to allow grouping data chunks into a larger dataset, possibly in the metaDataTags section. DICOM format uses the "Accession Number", although it is a string format.
I can think about something related to this feature, and propose an update.
Jay wrote:
David and Qianqian,
Had a question about the stim structure. We now have
stim(n).name
stim(n).data
stim(n).data is a 3 column array where each row corresponds to a stimulus trial and the three columns indicate [starttime duration value].
In Homer2, users can manually reject/toggle stims and we indicate this with various non-zero and non-one values. Assuming we want to preserve this functionality in SNIRF and Homer3, do we use the last column for this purpose?
I didn't see any description in the spec about the last column. Should we add a short description of its use?
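A sketch of what using the third column as a keep/reject flag might look like; the convention of value == 1 meaning an accepted trial is an assumption mirroring Homer2's toggling behavior, not something the spec defines:

```python
import numpy as np

# stim(n).data rows are [starttime duration value]; here value == 1 is
# (hypothetically) an accepted trial and other values mark rejected ones.
stim_data = np.array([
    [ 5.0, 2.0, 1.0],  # accepted
    [15.0, 2.0, 0.0],  # manually rejected in the GUI
    [25.0, 2.0, 1.0],  # accepted
])

accepted = stim_data[stim_data[:, 2] == 1.0]
print(accepted[:, 0])  # onsets of the accepted trials
```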
What is your recommendation for the cases where more than one type of aux data is present? For example, you can imagine a scenario where there are EEG channels, analog TTLs, digital TTLs and IMU data, i.e. both discrete (e.g., digital TTLs) and continuous data types (each with their own sampling frequency). Is it required to resample everything to a common frequency then (which is not ideal)? I couldn't find much info [here](https://github.com/fNIRS/snirf/blob/master/snirf_specification.md#nirsiauxj).
Just a short remark to clarify the definition of a channel.
As there is no clear definition within the format, I assume that a channel can contain basically any data defined in detail in the measurementList and that it is basically defined as a source-detector pair.
This then implies that a source-detector pair can define 0 - n channels.
By this definition, I could create a SNIRF file containing, say, only one channel of HbO data, or 20 channels of only WL1 data, for whatever reason I might want; the point is just that a source-detector pair doesn't need to include all data types that are used in the SNIRF file, since this is never directly defined within the format.
Thanks for the clarification!
@huppertt ,
I am copying Jay's comments about your sample file here. We can use this thread to resolve compliance and consistency between your and his reader and writer.
David
On another note, I thought that it would be good at this point to make sure our various SNIRF readers/writers are compatible and in agreement. It's also a good test of one of our main goals which is shareability of data files. To that end, I downloaded Ted's SNIRF_example.snirf file and tried to open it in Homer3. As one would expect nothing ever works the first time it's tried so I looked into it. First, it was extremely helpful to see someone else's actual data file (thanks!) besides the ones I generate for myself and it helped me catch and fix a number of bugs and wrong assumptions in my code. Once I fixed those I still hit some snags in the file format itself which prevented correct loading. When I added debug code to compensate for those issues it loaded and I could display the data (See the screenshot attachment).
List of issues/questions about SNIRF_example.snirf:
/probe is under /nirs/data1 instead of under /nirs. Spec says probe is under /nirs?
/stim is under /nirs/data1 instead of under /nirs. Spec says stim is under /nirs?
/metaDataTags is under /nirs/data1. Spec says metaDataTags is under /nirs?
/timeOffset is under /nirs/aux1. Spec says it's under /nirs.
The spec says that each row (i.e. first dimension) of probe/sourcePos and probe/detectorPos represents an optode and each column (i.e. second dimension) is a coordinate (x,y, or z). In SNIRF_example it's the opposite. I had to transpose the arrays when loading the file in order to get the correct info.
I see in the file dump that for string types such as aux1/name and stim1/name the file uses the H5T_STD_I8LE (int8) type. I'm wondering why not use the HDF5-supported string type H5T_STRING, which can also be an array of strings, useful for something like sourceLabels/detectorLabels? I wrote a routine for Homer3 to convert from H5T_STD_I8LE to strings (it also has to discard an extra int8 0 for each character).
Naive question... I had a bit of a hard time finding and downloading this patch-2, which contains the SNIRF_example and file dump. Just curious, why is it an SVN project? I was sort of expecting a forked git repo or branch; I don't have all that much experience with git or GitHub. In any case (back to Qianqian's question), I think it is a really, really great idea to have that example file + dump rolled into the main repo. In fact, I'd have an Examples folder with several more small sample files.
Jay
Possible licenses (permitting commercial use) include, but are not limited to:
For related documents, see:
https://help.github.com/articles/licensing-a-repository/
https://en.wikipedia.org/wiki/Creative_Commons_license
https://www.dreamsongs.com/IHE/IHE-50.html
We have run into an issue in which strings saved to SNIRF files by Python interfaces appear differently after being loaded by the Homer3 interface in the MATLAB environment. These strings are loaded by the HDF5 code employed by the Homer3 SNIRF interface as 1x1 MATLAB cells containing the string itself. Strings saved by the Homer3 SNIRF interface do not exhibit this behavior.
Investigation with HDFView suggests the issue is due to the strings being saved as fixed length vs. variable length (see HDF5 datatypes guide ch. 6, section 5.1):
Data Type description in HDFView for SNIRF files generated by Python interface-- "variable" length
Data Type description in HDFView for SNIRF files generated by MATLAB Homer3 interface, fixed length = 14
Zahra Aghajan implemented a fix which used NumPy's interface to create a string with fixed maximum length:
np.array(length_unit.encode("UTF-8"), dtype="S10")
But really this is not a more correct string per the specification; it just appeases the Homer3 code.
Both strings passed the now-outdated Python validator.
Any thoughts? @dboas raised the point that probably the SNIRF specification does not fully specify an HDF5 file. Perhaps we should choose one string format and stick to that?
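To illustrate the two layouts at issue (NumPy standing in for the HDF5 datatypes; as an assumption based on common h5py behavior, writing a Python str produces a variable-length HDF5 string, while an "S&lt;n&gt;" dtype produces a fixed-length one):

```python
import numpy as np

# Fixed-length: the value is padded to a declared byte width on disk.
fixed = np.array("second".encode("UTF-8"), dtype="S10")
print(fixed.dtype.itemsize)  # 10 bytes regardless of content length
print(fixed.item())          # trailing padding is stripped on read

# Variable-length: the string carries its own length; no padding involved.
variable = "second"
print(len(variable))
```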
We (@dboas, @rob-luke) are making an effort to make SNIRF agree with the BIDS dataset.
The BIDS specification is dependent on acquisition files belonging to a single subject, session and "run"-- with multiple runs being broken up into several files.
This means that SNIRF files that include multiple /nirs{i} groups are not BIDS compliant.
It was suggested that @fangq saves files with multiple /nirs{i} groups?
We don't want to break backwards compatibility, but wanted to discuss here the motivation for supporting /nirs{i} blocks and encouraging their use in the future.
There are a variety of standardised coordinate systems used in fNIRS (this problem also exists in MRI, MEG, EEG). Currently it is difficult to determine the coordinate frame used in different SNIRF files, as you need to reverse engineer it from the optional fields landmarkLabels and landmarkPos.
Here are some nice references on the different coordinate systems:
I propose that we add a new optional field called /nirs(i)/probe/coordinatesystem, and specify a recommended list of standard names to encourage interoperability.
For reference, I believe both NIRx and Kernel use the MRI coordinate system (same as FreeSurfer). I do not know what Homer uses (but I would like to find out, please). I was only able to determine this by trial and error, and currently cannot determine it automatically from the SNIRF file alone (e.g., Kernel does not use the optional landmarkLabels field).
It is possible to try to reverse engineer the coordinate frame from the landmarkLabels and landmarkPos fields, so one might argue this new field is superfluous. But this reverse engineering is difficult, and sometimes you can get it wrong.
See also mne-tools/mne-python#9929
Hi SNIRF community,
I'm part of a team at Tufts that has released a new open-access fNIRS dataset [see links 1,2 below]
We hope this data could be of broad interest to the BCI/fNIRS community, especially those that want to build and evaluate machine learning classifiers of a user's mental workload intensity level given short windows (say 30 seconds) of multivariate fNIRS recordings.
So far, we've been releasing data in plain-text CSV format. We became aware of SNIRF due to some helpful reviewer comments, but we haven't used it ourselves before. We are hoping you can help us figure out if SNIRF might be a good fit for our work.
My questions are:
Helpful details about our data: for each subject we have a CSV file with a row for every timestep (we recorded measurements at 5.2 Hz). The columns tell you the estimated oxy/deoxy hemoglobin concentrations at that instant across several channels. You can see a screenshot of the data format here: https://tufts-hci-lab.github.io/code_and_datasets/fNIRS2MW.html#sliding-window-fnirs-data-for-classifiers (I'm sure we have other more "raw" fNIRS measurements too, but this preprocessed format is what seems most helpful to release to encourage ML folks to work on this problem).
Other open-access datasets for fNIRS and mental workload also seem to be available in other formats (e.g. the dataset by Shin et al [3] recorded at TU-Berlin), so a general guide/outline of how to "convert" data to SNIRF I think would be broadly helpful in moving more of the community to use SNIRF and benefit from open standards.
Best,
Mike Hughes
[1] Paper: https://openreview.net/pdf?id=QzNHE7QHhut
[2] Project Website: https://tufts-hci-lab.github.io/code_and_datasets/fNIRS2MW.html
[3] Shin et al dataset link: http://doc.ml.tu-berlin.de/simultaneous_EEG_NIRS/
Lots of implementations (critically, the Homer3/AtlasViewer BUNPC sponsored MATLAB apps) are saving fields that should just be single ints, floats, strings as H5_ARRAY with length 1, ndim 1.
This is a formatting difference that affects the way such Datasets must be loaded across reader implementations.
pysnirf2 invalidates these files. Because this implementation is so widespread, it is worth considering supporting it. I am opposed, just like I am opposed to supporting both fixed and variable length strings... As in #72 the issue has arisen because the default behavior of the HDF5 library differs.
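If tolerating these files is the route taken, a reader-side shim is straightforward; this is a sketch of one such shim, not pysnirf2's actual behavior:

```python
import numpy as np

# Accept both a true scalar and the length-1, ndim-1 array some writers emit
# for scalar fields. This papers over the difference at read time; it does
# not make such files valid against the spec.

def as_scalar(value):
    arr = np.asarray(value)
    if arr.ndim == 0:
        return arr.item()
    if arr.shape == (1,):
        return arr[0].item()  # tolerate an HDF5 array of length 1
    raise ValueError(f"expected scalar, got shape {arr.shape}")

print(as_scalar(3.5), as_scalar(np.array([3.5])))
```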
Hasan Ayaz raised the issue
One suggestion is related to adding a dedicated multimodal data collection component by adding a new container. This could contain a string list of other biomedical signals (e.g. EEG, ECG, PPG, EMG, EDA, etc.) collected together with fNIRS, as well as the time synchronization (i.e. the time offset between the other time series and the fNIRS origin, and/or other key events) which is needed between these datasets in any case. These could be implemented over existing parts (e.g. stim or aux), but having a dedicated mechanism would be useful going forward.
I believe that this is already covered within the aux container. Am I missing something?
Some users would like to have metadata attached to each stimulus trial that they can then utilize to include or exclude trials from averaging. This is quite useful, for instance, with infant looking time during each trial.
To permit this, we would need to add an optional matrix of values, something like /nirs(i)/stim(j)/metadata. This would have to have the same number of rows as /nirs(i)/stim(j)/data, but the number of columns is unconstrained.
We would also add /nirs(i)/stim(j)/metadataNames. This would be optional and provide a useful descriptor name for each column of the metadata.
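A sketch of how the proposed fields would be used to exclude trials from averaging; the field names follow the proposal above, and the column meanings are invented for illustration:

```python
import numpy as np

# /nirs(i)/stim(j)/data: one [starttime duration value] row per trial.
stim_data = np.array([
    [ 5.0, 2.0, 1.0],
    [15.0, 2.0, 1.0],
])

# Proposed /nirs(i)/stim(j)/metadata: same number of rows, free columns.
stim_metadata = np.array([
    [4.1, 1.0],  # e.g. infant looked 4.1 s -> include
    [0.9, 0.0],  # looked 0.9 s -> exclude from averaging
])
stim_metadata_names = ["lookingTime_s", "includeInAverage"]  # proposed metadataNames

assert stim_metadata.shape[0] == stim_data.shape[0]
keep = stim_metadata[:, 1] == 1.0
print(stim_data[keep, 0])  # onsets of the trials kept for averaging
```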
@fangq @sstucker @jayd1860 and others
Please comment on any issues with adding these optional fields to SNIRF.
Dear all,
This question is related to #22, as I don't understand how chromophores such as HbO, HbR, and HbT can be stored in a SNIRF file.
What I understand from the specification and from #22 is that, when storing processed data such as HbO, in /nirs(i)/data(j)/measurementList(k)/ :
but then I don't know how to set the following values in :
Also, shouldn't /nirs(i)/probe/wavelengths be optional when storing processed data such as HbO, HbR, HbT?
A similar question: when storing raw data, what should measurementList(k).dataTypeLabel contain? Shouldn't there be a "raw" option here? It's an optional field, so I guess we don't specify it, but measurementList(k)/dataTypeIndex is required, so what should the value of dataTypeIndex be in this case?
Best regards,
Edouard
Hi Qianqian,
NOTE: Before I respond to your email, I just wanted to mention: while we say in the spec what an HDF5 Group is, we do not explicitly say what an HDF5 Dataset is (maybe we should, to avoid confusion, but that's a discussion for a later time). We only imply it by using any non-group basic type such as "numeric" or "string" to mean HDF5 Dataset. So my answer will assume that "numeric" or "string" next to "Type:" is basically an HDF5 Dataset. Now on to my response to your email...
Your statement: "the entire metaDataTags can be just a single dataset of string ...".
Yes exactly! If we choose to define metaDataTags as a Dataset of type string, then we simply change Type: from the current "group array" to "2-D string array" like this
/nirs/metaDataTags
Type: 2-D string array
Regarding the alternative way: sorry, I didn't mean to imply anything complicated like hash maps (we don't care about quick searches through hash tables, for instance). My other proposal was simply to use the same type of definition we use for all the other indexed groups of structured variables in the SNIRF spec. Let's take stim as an example: each element of the stim group contains a single name/data condition. In the SNIRF specification it looks like this:
/nirs(i)/stim(j)
Type: indexed group
/nirs(i)/stim(j)/name
Type: string
/nirs(i)/stim(j)/data
Type: numeric 2-D array
I'm proposing we define metaDataTags the same way, as an indexed group with 2 Dataset fields: name and value. So each element of metaDataTags would contain ONE tag. This definition adapted to our SNIRF spec would look like this
/nirs/metaDataTags(i)
Type: indexed group
/nirs/metaDataTags(i)/name
Type: string
/nirs/metaDataTags(i)/value
Type: string
To be clear, it really does not matter to me which of the 2 ways we use, as long as we address my main point: that the current definition of metaDataTags does NOT define ANY dataset - we only define a group, which in the HDF5 world does not contain basic data (not directly anyway).
Either of the above proposals solves this issue by defining Datasets: the first as a string array, the second as an indexed group of name/value string Dataset fields.
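A plain-Python stand-in for the second proposal, an indexed group of name/value string pairs (field names per the message above; the tag contents are examples):

```python
# Each element of metaDataTags holds exactly one tag as name/value strings,
# mirroring /nirs/metaDataTags(i)/name and /nirs/metaDataTags(i)/value.

meta_data_tags = [
    {"name": "SubjectID", "value": "subj01"},
    {"name": "LengthUnit", "value": "mm"},
]

def get_tag(tags, name):
    """Linear lookup; fine for the handful of tags a file carries."""
    for tag in tags:
        if tag["name"] == name:
            return tag["value"]
    return None

print(get_tag(meta_data_tags, "LengthUnit"))
```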
Lastly, I'm not clear about one aspect of the "SNIRF data format summary" section (which is otherwise a great overview of the SNIRF data structure!). Under metaDataTags, there's a list of tag values like "SubjectID", "MeasurementDate", "MeasurementTime", "LengthUnit", etc., as if they were variable names. But, unless I'm misunderstanding something, these are not variable names, they are tag values (required though they may be). It seems to me their required status belongs only in the description, not in the data structure specification, where only variable names belong.
In any case let me know what you think.
I would be happy to modify the spec to use one of the above definitions if no one has any objections.
Jay
Not sure if @huppertt will get this.
Qianqian, Jay, Meryem and I think that instead of aux.landmark and aux.landmarkName it should be sd.landmark and sd.landmarkName.
Are you okay with that?
Extracted from the Google Docs: https://docs.google.com/document/d/1kLQnFNSXsCAcUNcCPtyTerSwatCOnLcX4fAaZsidCbI/edit
David Boas
One issue that remains to be resolved is how to handle calculated or derived data types. The specification presently supports several raw data types. It is desirable to add a data type for concentration results. An issue we are struggling with is that every channel of data, i.e. column of "d", has a corresponding descriptor in the "ml" structure. The "ml" structure indexes the source, detector, and data type for the corresponding data channel. It also indexes the wavelength. For concentration, there is no wavelength. Thus, it seems that if we have a data type for concentration, then the corresponding "ml(n).wavelengthIndex" field would be ignored. In addition, the "ml(n).dataTypeIndex" could be used to reference what chromophore is stored in the data. The list of chromophores could be provided by sd.Chromophores, which could be a string array with possible entries of "HbO", "HbR", "H2O", "aa3", etc.
Luca Pollonini
I agree on having to address the data type, as in raw vs. processed, if not both. For instance, Shimadzu outputs both raw (3 wavelengths) and hemoglobin (2 chromophores) data in the same file, and it would be nice to make SNIRF compatible with their preferred output to facilitate their embracing the SNIRF standard.
A separate issue for discussion is whether to include other probe/anatomical variables of interest, e.g. a certain number of fiducial points alongside the source and detector positions. If the qform matrix is missing, these will be needed to perform any affine transformation towards another coordinate system or an anatomical scan. The aux group seems to be suited for time course variables, so it is less than ideal for additional locations. We could always promote recording 3D locations in a separate file (e.g., AtlasViewer-like), but it would be nice to attempt to standardize that as well.
In addition, I also wonder if there should be an additional field(s) in stim for adding optional trial-specific data, such as reaction time, response accuracy or other variables extracted from presentation programs like E-prime. As above, these could live in a separate file, but it could be useful to forward think about integrating other behavioral data.
Felipe Orihuela-Espina
Will a set of predefined constants ensure better compatibility (e.g. it will avoid differing nomenclature such as "HbO2" vs. "O2Hb"), although at the cost of needing periodic review of the constant list as new parameters may be measured?
Also, will this file format consider data not just reconstructed but also more or less heavily processed? Is it expected to keep track of the status of the data alone, or to be fully capable of "remembering" every processing operation that the data have undergone, as well as the processing operations' metaparameters? Perhaps a solution is to have a "list" of data, where each "data" unit represents the data in a particular state of processing, and then include some information akin to the "Command" software pattern.
Is there a reason why snirf(i).probe does not simply carry over the .SD probe structure previously used by .nirs?
While I understand the addition of fields to support fluorescence (wavelengthEmission), frequency-domain (frequencies), and time-domain (timeDelays, timeDelayWidths, momentOrders) data types, it seems some important fields were lost. Some field names, while changed from SD to snirf(i).probe, are easily converted (SD.SrcPos became snirf(i).sourcePos2D, SD.Lambda became snirf(i).wavelengths).
However, it is unclear how DummyPos, SpringList, and AnchorList are translated from .SD to snirf. These three fields allow the translation of a 2D probe onto a 3D model. In particular, modular fNIRS designs heavily leverage DummyPos, SpringList, and AnchorList to maintain rigid distances between optodes on the same module (intra-module channels), and allow for flexibility between optodes on different modules (inter-module channels), when registering a probe to a 3D head model.
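To make the distinction concrete, here is a minimal sketch of how a converter might carry over the directly-translatable .SD fields while surfacing, rather than silently dropping, the fields with no SNIRF equivalent. Only the mappings named above are included; the `DetPos` entry and the dict-based representation are my own illustrative assumptions, not part of either spec.

```python
# Hypothetical mapping from legacy .SD field names to SNIRF probe fields.
# Only SrcPos and Lambda are confirmed above; DetPos is assumed analogous.
SD_TO_SNIRF_PROBE = {
    "SrcPos": "sourcePos2D",
    "DetPos": "detectorPos2D",  # assumption: mirrors the SrcPos mapping
    "Lambda": "wavelengths",
}

def convert_probe_fields(sd):
    """Carry over translatable .SD fields; return everything else
    (e.g. DummyPos, SpringList, AnchorList) separately so the caller
    can decide how to preserve it instead of losing it."""
    probe, leftover = {}, {}
    for key, value in sd.items():
        if key in SD_TO_SNIRF_PROBE:
            probe[SD_TO_SNIRF_PROBE[key]] = value
        else:
            leftover[key] = value
    return probe, leftover
```

A converter built this way could at least warn when `leftover` is non-empty, which would have flagged the PlotProbe problem described below.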
For example (https://github.com/cotilab/moca#3d-export):
A modular probe designed in 2D
Maintains its modular architecture through extensive optode relationships (inter- vs. intra-module springs). These spring relationships ensure modules (fixed circuit boards) don't stretch; only distances between modules can be adjusted.
Which allow "modules" (or relationships between certain channels) to be registered.
I see two potential solutions.
Option 1 seems like the simplest, most compatible option. It would allow AtlasViewer to export .SD files and easily append them to snirf files. The downside is that you'll have multiple fields with different names describing the same things (e.g. nirs(i).probe.SD.SrcPos and snirf(i).sourcePos2D define the same thing).
Option 2 would require AtlasViewer to export snirf files, with only the nirs(i).probe populated.
Whichever option we choose, we also need to update the nirs2snirf converter. (For example, I am unable to load probes using PlotProbe in Homer3 from snirf files that have been converted from nirs files with nirs.SD inside, likely because these .SD fields are not carried over. However, I'll post this issue in the Homer3 repository to keep us organized.)
Thoughts?
I just attempted to sign up to the mailing list by following the link in the README to https://fnirs.org/resources/software/snirf/ and sending an email to the subscription address [email protected]. However, the message bounced with the error snirf-subscribe wasn't found at fnirs.org.
Does the mailing list still exist? If so, is it down? Or have I misunderstood the instructions on the website?
It is advantageous for outsiders and beginners to have human-readable field names. I understand that many aspects were carried over from the .nirs specification, but the field names can be very confusing and could certainly be longer than one or two characters.
Some proposals for more human-readable names:
Some proposals concerning consistency:
Reference:
Compare with other standard formats, such as:
We have been internally evaluating the use of SNIRF as a native output format for Gowerlabs' Lumo system.
Lumo is a high density system, and our full head adult caps contain 54 modules, each with 3 dual-wavelength sources and 4 detectors. We are able to provide a dense output, which results in (54 x 4 x 54 x 6 = ) circa 70k channels.
The use of an HDF5 group per channel descriptor (e.g. /data1/measurementList{i}) appears to incur significant overhead. For example, a SNIRF file containing only metadata (no channel data) for a full-head system amounts to ~200MiB, or ~3KiB per channel. The actual information content of each descriptor (containing only the required fields plus module indices) amounts to only (7 x 4 = ) 28 bytes, so the overhead is approximately 99%.
Our results appear vaguely consistent with this analysis:
The overhead involved just in representing the group structure is enough that it doesn't make sense to store small arrays, or to have many groups, each containing only a small amount of data. There does not seem to be any way to reduce the overhead per group, which I measured at about 2.2 kB.
Evidently the size of the metadata grows linearly with the number of channels, as does the data rate of the channel time series, and hence for longer recordings the size of the metadata becomes proportionally smaller. However in absolute terms we find that (with appropriate chunking and online compression) the metadata corresponds to around four minutes of compressed raw channel data. Given the length of a typical measurement session, the overhead remains significant.
I appreciate that the majority of systems (such as those of the manufacturers listed on the SNIRF specification page) are of a much lower density than Lumo, and that even high density systems often produce sparse data, but evidently the trend is towards increasing density and the number of wavelengths. Our future products would, based on the current SNIRF specification, generate over 0.5GiB of metadata.
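The arithmetic behind these figures is easy to reproduce. This is a back-of-the-envelope check using only the numbers quoted above (54 modules, 4 detectors and 3 dual-wavelength sources per module, ~3 KiB of observed HDF5 storage per measurementList group versus 28 bytes of payload); nothing here is measured, it just confirms the quoted totals are self-consistent.

```python
# Values taken from the post above.
modules, dets_per_module, wl_srcs_per_module = 54, 4, 6  # 3 sources x 2 wavelengths

# Every detector can pair with every source-wavelength, across all modules.
channels = (modules * dets_per_module) * (modules * wl_srcs_per_module)  # 69984, "circa 70k"

payload_bytes = 7 * 4          # 7 required int32 fields per descriptor = 28 bytes
per_channel_bytes = 3 * 1024   # observed ~3 KiB of file size per channel

overhead_fraction = 1 - payload_bytes / per_channel_bytes   # ~0.99, i.e. ~99% overhead
total_mib = channels * per_channel_bytes / 2**20            # ~205 MiB of metadata alone
```

So the ~200 MiB metadata-only file size follows directly from ~3 KiB per channel at this channel count, which is why the overhead scales linearly with density.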
Does the snirf format support adding custom fields that are not defined in the snirf format, or would this cause an incompatibility with the format?
I'm thinking of more detailed descriptions of any preprocessing that may have been performed, or experimental / participant / manufacturer / device information that is currently not part of the snirf format.
I could already store additional information in /nirs(i)/metaDataTags/, but as far as I understand it, this group supports only datasets and not sub-groups, if I'm not mistaken ("Each metadata record is represented as a dataset..."). Or would it be OK to also store groups within the metaDataTags group?
Thanks for the clarification.
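If metaDataTags is indeed restricted to flat datasets, one workaround is to flatten nested custom metadata into compound key names before writing. This is only a sketch of that idea; the underscore separator and the example tag names are my own convention, not anything the SNIRF spec defines.

```python
def flatten_tags(tags, prefix="", sep="_"):
    """Flatten a nested dict of custom metadata into the flat
    key -> value form that a datasets-only /nirs(i)/metaDataTags/
    group can hold. The separator convention is illustrative."""
    flat = {}
    for key, value in tags.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_tags(value, name, sep))
        else:
            flat[name] = value
    return flat

# Hypothetical preprocessing description, flattened before writing:
tags = flatten_tags({"Preprocessing": {"Filter": "bandpass 0.01-0.5 Hz",
                                       "MotionCorrection": "spline"}})
```

Each resulting key (e.g. `Preprocessing_Filter`) would then be written as one string dataset, staying within the "each metadata record is a dataset" rule.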
Here is a list of sub-fields and their preferred data types:
I have a python library to read and write .snirf files here. Is it best to merge it into this tree, or keep it separate?
Once the following issues are resolved I suggest we release a new version, 1.1.0:
In addition to the issues above, here is the changelog since version 1.0:
- `[<f>,...]+` in the table when it should be `[[<f>,...]]+`
- `[[<f>,...]]+` in the table when it should be `[<f>,...]+`
- `numeric` in the document when it should be `1-D numeric array`
- `numeric` in the document when it should be `2-D array`
Additionally, we now have the wonderful validator, so that's great too.
It is specified that for "the special case of equal sample spacing a shorthand <2x1> array is allowed" for the /nirs(i)/data(j)/time dataset (and other time vectors). However, the type of this field is "numeric 1-D array", and it is also specified that a "SNIRF field specified by this document as a numeric 1-D array must occupy a dataspace with rank of 1".
Evidently this is contradictory. Should this text perhaps read, e.g., "For the special case of equal sample spacing, an array of length 2 is allowed where ..." ?
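Whatever wording is chosen, readers have to implement this special case, so here is a minimal sketch of the expansion logic a reader might use. Note it exposes a second ambiguity worth addressing in the spec text: when a recording has exactly two samples, a length-2 time array cannot be distinguished from the [start, increment] shorthand (this sketch treats it as per-sample timestamps).

```python
def expand_time(time, n_samples):
    """Expand /nirs(i)/data(j)/time into a full per-sample time vector.

    A length-2 array is treated as the [start, increment] shorthand for
    equal sample spacing, except when the data itself has exactly two
    samples, in which case it is taken as two explicit timestamps."""
    if len(time) == 2 and n_samples != 2:
        start, step = time
        return [start + step * k for k in range(n_samples)]
    if len(time) != n_samples:
        raise ValueError("time vector length does not match sample count")
    return list(time)
```

The two-sample corner case is exactly the kind of thing the reworded spec text could pin down explicitly.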
I am writing code for reading and writing snirf files, and I would like to check against specific versions of the specification. Currently I am targeting draft 3, which has a formatVersion of 1.0.
Will draft 4 also have a formatVersion of 1.0? If so, how is software to differentiate these two versions? My understanding is that snirf has been officially released [1], so it would be useful to track different versions in our continuous integration.
If there is currently no way to differentiate between files created with different drafts, then can I suggest we specify that files following v1 draft 3 have formatVersion 1.0.3, to indicate a minor non-breaking change, and draft 4 would be 1.0.4? Or something like https://semver.org/.
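For CI purposes, comparing such version strings is straightforward if they are parsed into tuples. This sketch assumes the three-component numbering suggested above (which is a proposal, not the current spec) and pads shorter strings like the existing "1.0" so they compare sensibly.

```python
def parse_format_version(s):
    """Parse a formatVersion string such as '1.0' or the proposed
    '1.0.3' into a (major, minor, patch) tuple, padding missing
    components with zeros so tuples compare correctly."""
    parts = [int(p) for p in s.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])

# e.g. gate a test on the draft-4 numbering proposed above:
supports_draft4 = parse_format_version("1.0.4") >= (1, 0, 4)
```

Tuple comparison gives the semver-style ordering (1.0 < 1.0.3 < 1.0.4) without any extra dependencies.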
Rather than a single module index for both source and detector:
/nirs(i)/data(j)/measurementList(k)/moduleIndex
can we use individual optode module indices?
/nirs(i)/data(j)/measurementList(k)/sourceModuleIndex
/nirs(i)/data(j)/measurementList(k)/detectorModuleIndex
As modular fNIRS systems become more abundant, inter-module channels (channels whose sources and detectors are on different modules) are a simple way of increasing channel density. The current snirf specification only allows for intra-module channels (i.e. channels where the source and detector are on the same module, since they share a single moduleIndex).
I understand that simply globally enumerating sources and detectors (regardless of modules) solves this problem, but it doesn't allow a user to study only intra- or only inter-module channels.
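To illustrate the analysis the proposal enables, here is a sketch of splitting channels into intra- and inter-module sets, assuming each measurementList entry is represented as a dict carrying the proposed (hypothetical) sourceModuleIndex / detectorModuleIndex fields.

```python
def split_by_module(measurement_list):
    """Split channels into intra-module (source and detector on the
    same module) and inter-module sets, using the per-optode module
    indices proposed above (not part of the current spec)."""
    intra, inter = [], []
    for ch in measurement_list:
        if ch["sourceModuleIndex"] == ch["detectorModuleIndex"]:
            intra.append(ch)
        else:
            inter.append(ch)
    return intra, inter
```

With a single shared moduleIndex this split is impossible to express, which is the gap the proposal addresses.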