hed-standard / hed-javascript
HED/BIDS-friendly JavaScript validator.
Home Page: https://hed-javascript.readthedocs.io/en/latest
License: MIT License
In order to provide more flexibility and separation of concerns, we're planning to restructure the SchemaAttributes object. Instead of its current role as a direct container for mappings from (string) entity names to attribute data, the plan is to create a new suite of objects to contain this data, transforming SchemaAttributes into a container for these new entity objects.
The error message when no HEDVersion is given in the dataset_description.json is not informative:

Your dataset is not a valid BIDS dataset.
view 1 error in 1 file
Error 1: [Code 104] HED_ERROR
The validation on this HED string returned an error.
The ParsedHedTag class assumes in too many places that its schema field is filled with an actual Schema object, rather than undefined. This causes errors like the one seen in bids-standard/bids-validator#1869 (comment).
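One way to harden such accessors is to degrade gracefully when the schema field is undefined. The sketch below is purely illustrative: the class shape and the tagHasAttribute method are stand-ins for the real implementation, not the actual API.

```javascript
// Hypothetical sketch: guarding accessors against an undefined schema field.
// ParsedHedTag's real fields and methods may differ; tagHasAttribute is invented here.
class ParsedHedTag {
  constructor(formattedTag, schema) {
    this.formattedTag = formattedTag
    this.schema = schema // may legitimately be undefined
  }

  // Any accessor touching schema data should not assume schema is present.
  get takesValue() {
    // Optional chaining avoids a TypeError when this.schema is undefined.
    return this.schema?.tagHasAttribute(this.formattedTag, 'takesValue') ?? false
  }
}

const mockSchema = {
  tagHasAttribute: (tag, attr) => tag === 'duration/#' && attr === 'takesValue',
}

const withSchema = new ParsedHedTag('duration/#', mockSchema)
const withoutSchema = new ParsedHedTag('duration/#', undefined)

console.log(withSchema.takesValue) // true
console.log(withoutSchema.takesValue) // false, instead of throwing
```

The same pattern (optional chaining plus an explicit fallback) could be applied at each site that currently dereferences the schema field unconditionally.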
Currently, non-SI units that should have plural forms generated do not have plurals in HED 3.
The convertOldSpecToSchemasSpec function was left unimplemented when support for the current BIDS HED schema specification model was released. It was intended to map the originally proposed format (see hed-standard/hed-specification#156) to the approved spec. Two questions:
CC @VisLab
The warning for a missing column value in the BIDS validator does not say what column value is missing:

Warning 2: [Code 108] HED_MISSING_VALUE_IN_SIDECAR
The json sidecar does not contain this column value as a possible key to a HED string.

The link just goes to Neurostars and isn't relevant.
The stringParser test module is light on tests that focus on issues specific to HED 3 concerns, such as strings with multiple groups and deeply nested groups. We should add specific tests to verify that the parser correctly handles the many parentheses used in HED 3 strings, as well as correctly dealing with string conversion issues.
As a long-range goal, we should try to integrate Babel so we can rewrite some of the more verbose code using more modern ECMAScript idioms and syntax. This should help with the Code Climate score.
Definitions in BIDS datasets are often defined using sidecar keys that are dedicated to this purpose. These keys do not correspond to any actual TSV column names. Currently, since no TSV references this data, the definitions contained in these key values do not make it to the definition map, causing a plethora of missingDefinition errors (since this issue does not have a defined message, it shows up as a generic HED error).
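A fix would need to harvest definitions from every sidecar key, whether or not a TSV column references it. The sketch below illustrates the idea only: the sidecar shape follows BIDS conventions, but the function name, the regex, and the example keys are all invented for this sketch.

```javascript
// Hypothetical sketch: collect Definition groups from all sidecar keys,
// including "dummy" keys with no corresponding TSV column.
function collectDefinitions(sidecar) {
  const definitionMap = new Map()
  // Simplified pattern; the real parser works on parsed tag groups, not regexes.
  const definitionPattern = /\(Definition\/([\w-]+)[^)]*\)/g
  for (const value of Object.values(sidecar)) {
    // Categorical columns map values to HED strings; value columns hold one string.
    const hedStrings =
      typeof value.HED === 'string' ? [value.HED] : Object.values(value.HED ?? {})
    for (const hedString of hedStrings) {
      for (const match of hedString.matchAll(definitionPattern)) {
        definitionMap.set(match[1].toLowerCase(), match[0])
      }
    }
  }
  return definitionMap
}

const sidecar = {
  // A "dummy" key dedicated to definitions; no TSV column is named "my_defs".
  my_defs: { HED: { def1: '(Definition/Fixation-point, (Visual-presentation))' } },
  event_type: { HED: { show: 'Def/Fixation-point' } },
}

const defs = collectDefinitions(sidecar)
console.log(defs.has('fixation-point')) // true
```

With the definition map built this way, the later Def/Fixation-point reference would resolve instead of raising missingDefinition.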
This should not be an error. Users do not know about HED schema nodes when creating their TSV files; those columns can contain anything.
Currently, BIDS TSV issues do not contain the line number in the TSV file at which the problematic string occurs. The Issue constructor in BIDS contains a line parameter for this purpose, and we can add a line field to BidsIssue, which would be passed verbatim to that constructor by the BIDS linking code. However, line index integrity is not kept by the HED string array returned by parseTsvHed, which omits blanks. There are two steps to resolving this issue: preserving the original line indices in the data returned by parseTsvHed, and mapping Issue objects to the line indices (or indirectly through the HED strings) so the mapped BidsHedIssue can be passed the line index. This issue results from, but is independent of, #58.
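One way to preserve line integrity is to return a map keyed by line number instead of a dense array. This sketch is an assumption about shape only: the real parseTsvHed has a different signature, and the helper name here is hypothetical.

```javascript
// Hypothetical sketch: keep TSV line numbers alongside parsed HED strings,
// so Issue objects can later be given the correct line.
function parseTsvHedWithLines(tsvContents, hedColumnIndex) {
  const hedStrings = new Map() // line number -> HED string
  const rows = tsvContents.split('\n')
  rows.forEach((row, index) => {
    if (index === 0) return // skip the header row
    const cells = row.split('\t')
    const hedCell = cells[hedColumnIndex]
    if (hedCell && hedCell !== 'n/a') {
      // Keyed by the 1-based TSV line number, so omitted blanks no longer shift indices.
      hedStrings.set(index + 1, hedCell)
    }
  })
  return hedStrings
}

const tsv = 'onset\tduration\tHED\n1.0\t0.5\tRed\n2.0\t0.5\tn/a\n3.0\t0.5\tBlue'
const parsed = parseTsvHedWithLines(tsv, 2)
console.log([...parsed.entries()]) // [[2, 'Red'], [4, 'Blue']]
```

Because the blank/n-a row at line 3 is simply absent from the map rather than compacted away, a BidsHedIssue built from line 4's string can report line 4.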
If event-level validation fails, file-level validation may produce unexpected or useless errors. Therefore, it should be skipped.
The reduced version of the schema is obsolete. We need to update the tests to use v8.0.0-alpha.1.
The ugly tokenizeHedString function should be refactored using an OO model. This will help clean up the variable scoping and modularize the code.
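The refactor could take the shape of a tokenizer class whose loop state lives in instance fields rather than local variables. This is a minimal skeleton only, assuming a simplified tag/parenthesis grammar; the real tokenizer also tracks column positions and collects issues.

```javascript
// Hypothetical OO skeleton for the tokenizeHedString refactor.
// State (tokens, current accumulator) moves into instance fields.
class HedStringTokenizer {
  constructor(hedString) {
    this.hedString = hedString
    this.tokens = []
    this.current = ''
  }

  tokenize() {
    for (const char of this.hedString) {
      if (char === ',' || char === '(' || char === ')') {
        this.pushCurrent()
        if (char !== ',') this.tokens.push(char)
      } else {
        this.current += char
      }
    }
    this.pushCurrent()
    return this.tokens
  }

  // Flush the accumulated tag text, ignoring surrounding whitespace.
  pushCurrent() {
    const trimmed = this.current.trim()
    if (trimmed.length > 0) this.tokens.push(trimmed)
    this.current = ''
  }
}

const tokens = new HedStringTokenizer('Red, (Onset, Def/My-def)').tokenize()
console.log(tokens) // ['Red', '(', 'Onset', 'Def/My-def', ')']
```

Splitting the flush logic into its own method is the kind of modularization the issue asks for: each concern gets a named method instead of sharing one large function scope.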
Reorganize the tests using a for-each testing facility (e.g., Jest's test.each) to run the same tests with different versions of the schema (i.e., 8.0.0, 8.1.0) in an organized fashion.
The Hed3Validator class uses its own code path to access the list of tags with the unique and required attributes. Therefore, for code coverage reasons, there should be separate HED 3-specific tests for these two validation checks. As of the posting of this issue, there are no tags in any released HED 3 stable schema with either attribute, so this issue serves as a reminder to go back once such a schema is available and add tests using those tags.
Temporal tags (i.e., Onset, Offset, and Inset) are dependent on the timestamp of their event in the event's TSV file. Currently, the timestamps (found in the onset column of the TSV file, where present) are not used in validation. In order to properly validate temporal tags in a time-respecting way, we must use these timestamps during validation.
A clarification has been made in the spec that allows values to be node names. This is in the process of being incorporated into the HED validator. Definitions are a special case: definition names cannot be node names. However, definitions can also take values:

(Definition/My-def/# .... )

The values substituted for # can be node names, as in Def/My-def/Red or (Def-expand/My-def/Red .... )
The modeling for schema entries was completely revamped for HED 3 schemas by #56. The HED 3 schema parsing tests therefore need to be rewritten to use the new classes.
The warning for tag extensions reports every occurrence rather than reporting once, and it doesn't show the location in the file, just that the issue occurred. The message shows "102985 more files" have this problem; in fact, this number corresponds to the total number of occurrences of the issue across the dataset, not the number of files. This also causes the validator to run extremely slowly. How can we address this?
3: [WARN] The validation on this HED string returned a warning. (code: 105 - HED_WARNING)
./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
... and 102985 more files having this issue (Use --verbose to see them all).
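One possible fix is to aggregate occurrences before reporting, emitting each (code, file) pair once with a count. The sketch below is an assumption about approach, not the validator's actual reporting code; the issue-object shape is invented for illustration.

```javascript
// Hypothetical sketch: collapse repeated warnings by grouping occurrences
// per (code, file) pair and reporting each pair once with a count.
function summarizeIssues(issues) {
  const counts = new Map()
  for (const { code, file } of issues) {
    const key = `${code}\u0000${file}` // NUL-joined compound key
    counts.set(key, (counts.get(key) ?? 0) + 1)
  }
  return [...counts.entries()].map(([key, count]) => {
    const [code, file] = key.split('\u0000')
    return { code, file, count }
  })
}

const summary = summarizeIssues([
  { code: 'HED_TAG_EXTENDED', file: 'sub-001_events.tsv' },
  { code: 'HED_TAG_EXTENDED', file: 'sub-001_events.tsv' },
  { code: 'HED_TAG_EXTENDED', file: 'sub-002_events.tsv' },
])
console.log(summary)
```

Aggregating this way would also fix the misleading "more files" count, since the summary distinguishes the number of affected files from the number of occurrences.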
The HED 3 spec specifies special handling for Onset and Offset tags, as follows:

Onset and Offset must have an associated Label tag in the same tag group.
An Offset tag must occur after a corresponding Onset tag with the same Label.

A validation check (or checks) must be added to check for both of these requirements.
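The two requirements above can be sketched as a single pass over the events in order, tracking which labels have an open Onset. This is only an illustration of the logic: it assumes (tag, label) pairs have already been extracted from the tag groups, which the real validation would do on parsed HED structures.

```javascript
// Hypothetical sketch of the two checks, operating on already-extracted
// (temporalTag, label) pairs in event order.
function checkTemporalOrder(events) {
  const issues = []
  const openOnsets = new Set()
  for (const { tag, label } of events) {
    if (label === undefined) {
      // Requirement 1: every Onset/Offset group needs an associated label.
      issues.push(`${tag} tag group is missing a label`)
      continue
    }
    if (tag === 'Onset') {
      openOnsets.add(label)
    } else if (tag === 'Offset') {
      // Requirement 2: an Offset must follow an Onset with the same label.
      if (!openOnsets.has(label)) {
        issues.push(`Offset for "${label}" has no preceding Onset`)
      } else {
        openOnsets.delete(label)
      }
    }
  }
  return issues
}

const issues = checkTemporalOrder([
  { tag: 'Onset', label: 'trial' },
  { tag: 'Offset', label: 'trial' },
  { tag: 'Offset', label: 'fixation' }, // no matching Onset
])
console.log(issues) // ['Offset for "fixation" has no preceding Onset']
```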
The HED 3 specification specifies valid values for placeholders (#) in accordance with hed-standard/hed-specification#33 (comment). The validation of placeholder values in this validator needs to be rewritten to comply with this new requirement.
The top-level validation code (which checks for required tags) does not check for a valid schema, which can lead to errors.
The schema attribute parser will have to be rewritten to conform to the new schema format. The new SchemaAttributes format is as follows:

{
  "tags": [ /*array of tags*/ ],
  "tagAttributes": {/*tag*/: {/*attribute*/: /*value or [array of values]*/}},
  "tagUnitClasses": {/*tag*/: [ /*array of unit classes*/ ]},
  "unitClasses": {/*unitClassName*/: [ /*array of units*/ ]},
  "unitClassAttributes": {/*unitClassName*/: {/*attribute*/: /*value or [array of values]*/}},
  "unitAttributes": {/*unitName*/: [ /*array of attributes*/ ]},
  "unitModifiers": {/*unitModifierType*/: [ /*array of unit modifiers*/ ]}
}
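A lookup against that shape might be chained as follows. The concrete tag, attribute, and unit-class names below are examples invented for illustration, not the real schema contents.

```javascript
// Illustrative instance of the proposed SchemaAttributes shape, with example data.
const schemaAttributes = {
  tags: ['event/duration'],
  tagAttributes: { 'event/duration': { takesValue: true } },
  tagUnitClasses: { 'event/duration': ['time'] },
  unitClasses: { time: ['second', 'hour'] },
  unitClassAttributes: { time: { defaultUnits: 'second' } },
  unitAttributes: { second: ['SIUnit'] },
  unitModifiers: { SIUnitModifier: ['milli', 'kilo'] },
}

// Resolve the legal units for a tag via its unit classes.
function legalUnitsForTag(attributes, tag) {
  const unitClasses = attributes.tagUnitClasses[tag] ?? []
  return unitClasses.flatMap((unitClass) => attributes.unitClasses[unitClass] ?? [])
}

console.log(legalUnitsForTag(schemaAttributes, 'event/duration')) // ['second', 'hour']
```

Note that two lookups (tag to unit classes, unit class to units) are needed, which is the indirection the tagUnitClasses/unitClasses split introduces.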
The HED specification provides a JSON-based test suite at https://github.com/hed-standard/hed-specification/tree/master/tests, containing example data (including strings, sidecars, TSV files, and combinations thereof) that should pass, along with expected errors for the examples that should fail. This validator should eventually support validating against this test suite in an automated fashion; this would not directly affect the validator's built-in test modules. The validation could be implemented in two stages.
HED 3 introduced the concept of value classes to represent the sets of legal values for each value-taking tag. This will make value validation more robust and schema-driven, as well as allow library schemas to define their own value classes. In order to support value validation using value classes, the value classes have to be parsed (an initial implementation was added as part of the rewrite in #56), and a new validation function must be written to use the value class data required for a given tag. HED 2 will continue to use the existing implementation.
The HED 3 spec allows people to implement library schemas to supplement the base schema. We must support these library schemas by writing code to load them, modifying the parser to detect when they are used in tags, and adjusting existing code to reference their SchemaEntry data when tags are pulled from them.
Configure the .codeclimate.yml file to allow a threshold. See https://github.com/hed-standard/hed-python/blob/master/.codeclimate.yml for an example.
As a result of fixes made in #45, node names which do not appear in the schema in any form now generate two invalidTag issues with basically the same content: one from short-to-long conversion and the other from tag validation. Since we can't eliminate either check entirely (they don't completely overlap), we'll need to decide whether to conflate the issues (i.e., merge them) or keep both (perhaps distinguishing them with a field introduced for #46).
Currently, conversion (i.e., short-to-long and long-to-short) issues are only distinguished from standard validation issues by the inclusion of the sourceString field, which represents the pre-conversion string. Adding an additional field to Issue to tag whether an issue is the result of conversion or validation could be useful in the future.
The expected issue lists in the event.spec.js test suite contain some redundant code, namely repeated calls to generateIssue and converterGenerateIssue. In the future, we should consider whether it would be desirable to condense these lists to include just the data and move those calls to the validator base functions. For example, the current test in dataset.spec.js (a good example of one with multiple issue types), edited to the form of event.spec.js, would be:
[
generateIssue('extension', {
tag: testDatasets.multipleInvalid[0],
}),
generateIssue('unitClassInvalidUnit', {
tag: testDatasets.multipleInvalid[1],
unitClassUnits: legalTimeUnits.sort().join(','),
}),
converterGenerateIssue(
'invalidTag',
testDatasets.multipleInvalid[2],
{},
[0, 12],
),
]
Several possibilities exist for condensing this. Some are listed below.
{
extension: [{
tag: testDatasets.multipleInvalid[0],
}],
unitClassInvalidUnit: [{
tag: testDatasets.multipleInvalid[1],
unitClassUnits: legalTimeUnits.sort().join(','),
}],
invalidTag: [[
testDatasets.multipleInvalid[2],
{},
[0, 12],
]],
}
Support for validating Delay and Duration tag groups is required as of HED schema version 8.2.0. Currently, these tag groups are only validated using the topLevelTagGroup schema attribute on the respective tags. The following validation also needs to be performed:

Delay and Duration behave similarly to Onset and Inset syntactically, in that tag groups with that tag must contain the tag and a tag group, and nothing else (with one exception).
Delay and Duration may appear together in the same tag group (at the first level of the group).

See #140 for further details.
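The syntactic shape described above can be sketched as a structural check on a group. This is only an illustration under a simplified model (strings for tags, nested arrays for subgroups); it deliberately ignores the "one exception" mentioned above, and the function name is hypothetical.

```javascript
// Hypothetical check: a Delay/Duration tag group must contain only the
// timing tag(s) plus exactly one inner tag group, and nothing else.
function checkDelayDurationGroup(group) {
  const timingTags = group.filter(
    (item) => typeof item === 'string' && /^(Delay|Duration)\b/.test(item),
  )
  const subGroups = group.filter((item) => Array.isArray(item))
  const otherTags = group.length - timingTags.length - subGroups.length
  return timingTags.length >= 1 && subGroups.length === 1 && otherTags === 0
}

console.log(checkDelayDurationGroup(['Delay/5 s', ['Red']])) // true
console.log(checkDelayDurationGroup(['Delay/5 s', 'Duration/2 s', ['Red']])) // true: may co-occur
console.log(checkDelayDurationGroup(['Delay/5 s', 'Red'])) // false: stray tag, no subgroup
```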
Even though HED 2 (schema versions < 8.0.0-alpha) is considered superseded, we must still support validation of HED 2 datasets indefinitely. However, much of the code restructuring that is currently ongoing is based on the HED 3 schema model, which is not entirely compatible with HED 2's, and adapting the new structure to the older HED 2 schema model would be painful. Therefore, as much of the HED 2-specific code as possible should be sectioned off and frozen, with only the bare minimum of changes made from the current state of the code to ensure compatibility. This will likely involve liberal use of additional object orientation and subclassing (much like the recent rewrite of the event-level validation code).
There are currently several skipped tests (as of the posting of this issue), including tests for Onset and Offset references in the same string. These need to be cleaned up, modified, or deleted.
The issue return format from several stringParser functions was changed in #45, resulting in a few tests breaking. A temporary fix was included in that PR to convert the new format into a version compatible with the old one, which deferred the needed test adjustments at that time. This issue has been posted as a note to go back and fix those tests to match the new return format, which is more fine-grained and should result in better tests overall.
Other noted test cleanups include async tests with Promise returns and () => {} forms.

The readme has not been updated to include short/long conversion, and is generally hard to add to. We should rewrite it to make it read more like an API document.
The HED 3 spec states that Definition tags may only appear within "dummy" entries within a sidecar. This must be checked at the BIDS layer.

HED columns and sidecar keys corresponding to TSV columns.
Definition groups. Any other tag combination is an error.
Definition group. (Any sidecar key containing a Definition group must only contain Definition groups.)
potentialSidecars array in descending order of slashes (leaf to root) and iterate until the column name is found. Add that key to that sidecar's list of non-Definition keys.

Some of the test cases use the bundled v8.0.0-alpha schema versions. These need to be ported to the stable v8.0.0 schema, which will require some string rewrites.
The validator should give a warning when a tag extension is made (not for #). What has happened is that people accidentally extend when they actually meant to use a tag already in the hierarchy.
The memoized properties in ParsedHedGroup were accidentally merged with empty-string placeholder keys, rendering them useless. This needs to be fixed.
Copied from hed-standard/hed-python#189.

The HED validator is not correctly detecting duplicate tags that are extensions. For example, Attribute/Red has no validation errors, but short-to-long conversion correctly detects the error:

Unable to convert HED string:
ERROR: 'Red' appears as 'Attribute/Sensory/Visual/Color/CSS-color/Red-color/Red' and cannot be used as an extension. 10, 13

Based on the Python issue, this is how it should be handled:

The schema already has Red in its hierarchy, so Attribute/Red would always be an error, regardless of HED version.
Attribute/Bananas would always generate a warning, regardless of the version, since Bananas is a tag extension.

The idea is that adding the restriction that tag extensions can't already be in the hierarchy simplifies the logic.
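The proposed rule can be sketched as a set lookup over the schema's leaf names: an extension whose leaf already exists anywhere in the hierarchy is an error, and any other extension is only a warning. The function name and schema fragment below are invented for illustration.

```javascript
// Hypothetical sketch of the rule above: an extension whose leaf name already
// exists anywhere in the schema is an error; otherwise it is only a warning.
function classifyExtension(extendedTag, schemaLongTags) {
  const leaf = extendedTag.split('/').pop().toLowerCase()
  const existingLeaves = new Set(
    schemaLongTags.map((longTag) => longTag.split('/').pop().toLowerCase()),
  )
  return existingLeaves.has(leaf) ? 'error' : 'warning'
}

// Illustrative schema fragment; "red" exists as a leaf node, "bananas" does not.
const schemaLongTags = ['attribute/sensory/visual/color/css-color/red-color/red']

console.log(classifyExtension('Attribute/Red', schemaLongTags)) // 'error'
console.log(classifyExtension('Attribute/Bananas', schemaLongTags)) // 'warning'
```

Precomputing the leaf-name set once per schema would keep this check cheap even for large datasets.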
The validator does not currently validate the isNumeric attribute, thus allowing tags like RGB-red/Blah (in 8.0.0-alpha.1). This should be fixed.
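The missing check amounts to verifying that the value portion of an isNumeric tag parses as a number. This sketch is illustrative only: the function name and the accepted numeric grammar (integers, decimals, scientific notation) are assumptions, and the real check would read the attribute from the schema.

```javascript
// Hypothetical sketch of enforcing the isNumeric attribute on a tag's value.
function validateNumericValue(tagValue, tagIsNumeric) {
  if (!tagIsNumeric) return true // attribute absent: no numeric constraint
  // Accept integers, decimals, and scientific notation, with an optional sign.
  return /^[-+]?\d+(\.\d+)?([eE][-+]?\d+)?$/.test(tagValue.trim())
}

console.log(validateNumericValue('0.5', true)) // true
console.log(validateNumericValue('Blah', true)) // false: rejects RGB-red/Blah
console.log(validateNumericValue('Blah', false)) // true: non-numeric tags unaffected
```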
Most of the short-to-long converter tests are duplicated, where the short-to-long test is an inversion of the long-to-short test. The test module could be greatly simplified by merging duplicate data.
Currently, there is a stub named validateSchemaSpec that was intended to validate schema specifications (i.e., nickname/library/version specs), probably in the already-parsed form of SchemaSpec objects, but nothing calls it, and it currently only does an object type check. What would need to be done to implement this?
HED data is being included in BIDS data files beyond HED's traditional use in events, such as participants.tsv. This validator needs to support validating those newer uses.
The ONSET_OFFSET_INSET_ERROR has been renamed TEMPORAL_TAG_ERROR in the HED specification so that Duration and Delay errors will fall under this umbrella.
What is the status of validation for the Duration and Delay tags beyond the top-level tag group? How is the requirement that at most one Duration and one Delay can be in a tag group being handled?

Ref: PRs hed-python#879 and hed-specification#567.
Currently, the HED column of BIDS TSV files only undergoes event-level validation. However, fixing #58 will require restricting tag extension warning generation to string-level validation only, thus necessitating string-level validation of this column as well.
Over time, we should take steps to improve the quality of the code base, and consequently the Code Climate score.
This is a tracking bug.
The error message strings should be stored separately from the validator code.