hed-standard / hed-javascript
HED/BIDS-friendly JavaScript validator.
Home Page: https://hed-javascript.readthedocs.io/en/latest
License: MIT License
In order to provide more flexibility and separation of concerns, we're planning to restructure the SchemaAttributes object. Instead of its current role as a direct container for mappings from (string) entity names to attribute data, the plan is to create a new suite of objects to contain this data, transforming SchemaAttributes into a container for these new entity objects.
The error message when no HEDVersion is given in the dataset_description.json is not informative:

Your dataset is not a valid BIDS dataset.
view 1 error in 1 file
Error 1: [Code 104] HED_ERROR
The validation on this HED string returned an error.
The ParsedHedTag class assumes in too many places that its schema field is filled with an actual Schema object, rather than undefined. This causes errors like the one seen in bids-standard/bids-validator#1869 (comment).
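One way to harden such accessors is to degrade gracefully when the schema field is undefined. The sketch below is purely illustrative: the class shape and the tagHasAttribute method are stand-ins for the real implementation, not the actual API.

```javascript
// Hypothetical sketch: guarding accessors against an undefined schema field.
// ParsedHedTag's real fields and methods may differ; tagHasAttribute is invented here.
class ParsedHedTag {
  constructor(formattedTag, schema) {
    this.formattedTag = formattedTag
    this.schema = schema // may legitimately be undefined
  }

  // Any accessor touching schema data should not assume schema is present.
  get takesValue() {
    // Optional chaining avoids a TypeError when this.schema is undefined.
    return this.schema?.tagHasAttribute(this.formattedTag, 'takesValue') ?? false
  }
}

const mockSchema = {
  tagHasAttribute: (tag, attr) => tag === 'duration/#' && attr === 'takesValue',
}

const withSchema = new ParsedHedTag('duration/#', mockSchema)
const withoutSchema = new ParsedHedTag('duration/#', undefined)

console.log(withSchema.takesValue) // true
console.log(withoutSchema.takesValue) // false, instead of throwing
```

The same pattern (optional chaining plus an explicit fallback) could be applied at each site that currently dereferences the schema field unconditionally.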
Currently, non-SI units that should have plural forms generated do not have plurals in HED 3.
The convertOldSpecToSchemasSpec function was left unimplemented when support for the current BIDS HED schema specification model was released. It was intended to map the originally proposed format (see hed-standard/hed-specification#156) to the approved spec. Two questions:
CC @VisLab
The warning for a missing column value in the BIDS validator does not say what column value is missing:

Warning 2: [Code 108] HED_MISSING_VALUE_IN_SIDECAR
The json sidecar does not contain this column value as a possible key to a HED string.

The link just goes to Neurostars and isn't relevant.
The stringParser test module is light on tests that focus on issues specific to HED 3 concerns, such as strings with multiple groups and deeply nested groups. We should add specific tests to verify that the parser correctly handles the many parentheses used in HED 3 strings, as well as correctly dealing with string conversion issues.
As a long-range goal, we should try to integrate Babel so we can rewrite some of the more verbose code using more modern ECMAScript idioms and syntax. This should help with the Code Climate score.
Definitions in BIDS datasets are often defined using sidecar keys that are dedicated to this purpose. These keys do not correspond to any actual TSV column names. Currently, since no TSV references this data, the definitions contained in these key values do not make it to the definition map, causing a plethora of missingDefinition errors (since this issue does not have a defined message, it shows up as a generic HED error).
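A fix would need to harvest definitions from every sidecar key, whether or not a TSV column references it. The sketch below illustrates the idea only: the sidecar shape follows BIDS conventions, but the function name, the regex, and the example keys are all invented for this sketch.

```javascript
// Hypothetical sketch: collect Definition groups from all sidecar keys,
// including "dummy" keys with no corresponding TSV column.
function collectDefinitions(sidecar) {
  const definitionMap = new Map()
  // Simplified pattern; the real parser works on parsed tag groups, not regexes.
  const definitionPattern = /\(Definition\/([\w-]+)[^)]*\)/g
  for (const value of Object.values(sidecar)) {
    // Categorical columns map values to HED strings; value columns hold one string.
    const hedStrings =
      typeof value.HED === 'string' ? [value.HED] : Object.values(value.HED ?? {})
    for (const hedString of hedStrings) {
      for (const match of hedString.matchAll(definitionPattern)) {
        definitionMap.set(match[1].toLowerCase(), match[0])
      }
    }
  }
  return definitionMap
}

const sidecar = {
  // A "dummy" key dedicated to definitions; no TSV column is named "my_defs".
  my_defs: { HED: { def1: '(Definition/Fixation-point, (Visual-presentation))' } },
  event_type: { HED: { show: 'Def/Fixation-point' } },
}

const defs = collectDefinitions(sidecar)
console.log(defs.has('fixation-point')) // true
```

With the definition map built this way, the later Def/Fixation-point reference would resolve instead of raising missingDefinition.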
This should not be an error. Users do not know about HED schema nodes when creating their TSV files; those columns can contain anything.
Currently, BIDS TSV issues do not contain the line number in the TSV file at which the problematic string occurs. The Issue constructor in BIDS contains a line parameter for this purpose, and we can add a line field to BidsIssue, which would be passed verbatim to that constructor by the BIDS linking code. However, line index integrity is not kept by the HED string array returned by parseTsvHed, which omits blanks. There are two steps to resolving this issue: preserving the original line indices in the data returned by parseTsvHed, and mapping Issue objects to the line indices (or indirectly through the HED strings) so the mapped BidsHedIssue can be passed the line index. This issue results from, but is independent of, #58.
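One way to preserve line integrity is to return a map keyed by line number instead of a dense array. This sketch is an assumption about shape only: the real parseTsvHed has a different signature, and the helper name here is hypothetical.

```javascript
// Hypothetical sketch: keep TSV line numbers alongside parsed HED strings,
// so Issue objects can later be given the correct line.
function parseTsvHedWithLines(tsvContents, hedColumnIndex) {
  const hedStrings = new Map() // line number -> HED string
  const rows = tsvContents.split('\n')
  rows.forEach((row, index) => {
    if (index === 0) return // skip the header row
    const cells = row.split('\t')
    const hedCell = cells[hedColumnIndex]
    if (hedCell && hedCell !== 'n/a') {
      // Keyed by the 1-based TSV line number, so omitted blanks no longer shift indices.
      hedStrings.set(index + 1, hedCell)
    }
  })
  return hedStrings
}

const tsv = 'onset\tduration\tHED\n1.0\t0.5\tRed\n2.0\t0.5\tn/a\n3.0\t0.5\tBlue'
const parsed = parseTsvHedWithLines(tsv, 2)
console.log([...parsed.entries()]) // [[2, 'Red'], [4, 'Blue']]
```

Because the blank/n-a row at line 3 is simply absent from the map rather than compacted away, a BidsHedIssue built from line 4's string can report line 4.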
If event-level validation fails, file-level validation may produce unexpected or useless errors. Therefore, it should be skipped.
The reduced version of the schema is obsolete. We need to update the tests to use v8.0.0-alpha.1.
The ugly tokenizeHedString function should be refactored using an OO model. This will help clean up the variable scoping and modularize the code.
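The refactor could take the shape of a tokenizer class whose loop state lives in instance fields rather than local variables. This is a minimal skeleton only, assuming a simplified tag/parenthesis grammar; the real tokenizer also tracks column positions and collects issues.

```javascript
// Hypothetical OO skeleton for the tokenizeHedString refactor.
// State (tokens, current accumulator) moves into instance fields.
class HedStringTokenizer {
  constructor(hedString) {
    this.hedString = hedString
    this.tokens = []
    this.current = ''
  }

  tokenize() {
    for (const char of this.hedString) {
      if (char === ',' || char === '(' || char === ')') {
        this.pushCurrent()
        if (char !== ',') this.tokens.push(char)
      } else {
        this.current += char
      }
    }
    this.pushCurrent()
    return this.tokens
  }

  // Flush the accumulated tag text, ignoring surrounding whitespace.
  pushCurrent() {
    const trimmed = this.current.trim()
    if (trimmed.length > 0) this.tokens.push(trimmed)
    this.current = ''
  }
}

const tokens = new HedStringTokenizer('Red, (Onset, Def/My-def)').tokenize()
console.log(tokens) // ['Red', '(', 'Onset', 'Def/My-def', ')']
```

Splitting the flush logic into its own method is the kind of modularization the issue asks for: each concern gets a named method instead of sharing one large function scope.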
Reorganize the tests using a for-each testing facility (e.g., Jest's test.each) to run the same tests with different versions of the schema (i.e., 8.0.0, 8.1.0) in an organized fashion.
The Hed3Validator class uses its own code path to access the list of tags with the unique and required attributes. Therefore, for code coverage reasons, there should be separate HED 3-specific tests for these two validation checks. As of the posting of this issue, there are no tags in any released HED 3 stable schema with either attribute, so this issue serves as a reminder to go back once such a schema is available and add tests using those tags.
Temporal tags (i.e., Onset, Offset, and Inset) are dependent on the timestamp of their event in the event's TSV file. Currently, the timestamps (found in the onset column of the TSV file, where present) are not used in validation. In order to properly validate temporal tags in a time-respecting way, we must use these timestamps during validation.
A clarification has been made in the spec that allows values to be node names. This is in the process of being incorporated into the HED validator. Definitions are a special case: definition names cannot be node names. However, definitions can also take values:

(Definition/My-def/# .... )

The values substituted for # can be node names, as in Def/My-def/Red or (Def-expand/My-def/Red .... )
The modeling for schema entries was completely revamped for HED 3 schemas by #56. The HED 3 schema parsing tests therefore need to be rewritten to use the new classes.
The warning for tag extensions reports every occurrence rather than reporting once, and it doesn't show the location in the file, just that the issue occurred. The message shows "102985 more files" have this problem; in fact, this number corresponds to the total number of occurrences of the issue across the dataset, not the number of files. This also causes the validator to run extremely slowly. How can we address this?
3: [WARN] The validation on this HED string returned a warning. (code: 105 - HED_WARNING)
./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
... and 102985 more files having this issue (Use --verbose to see them all).
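One possible fix is to aggregate occurrences before reporting, emitting each (code, file) pair once with a count. The sketch below is an assumption about approach, not the validator's actual reporting code; the issue-object shape is invented for illustration.

```javascript
// Hypothetical sketch: collapse repeated warnings by grouping occurrences
// per (code, file) pair and reporting each pair once with a count.
function summarizeIssues(issues) {
  const counts = new Map()
  for (const { code, file } of issues) {
    const key = `${code}\u0000${file}` // NUL-joined compound key
    counts.set(key, (counts.get(key) ?? 0) + 1)
  }
  return [...counts.entries()].map(([key, count]) => {
    const [code, file] = key.split('\u0000')
    return { code, file, count }
  })
}

const summary = summarizeIssues([
  { code: 'HED_TAG_EXTENDED', file: 'sub-001_events.tsv' },
  { code: 'HED_TAG_EXTENDED', file: 'sub-001_events.tsv' },
  { code: 'HED_TAG_EXTENDED', file: 'sub-002_events.tsv' },
])
console.log(summary)
```

Aggregating this way would also fix the misleading "more files" count, since the summary distinguishes the number of affected files from the number of occurrences.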
The HED 3 spec specifies special handling for Onset and Offset tags, as follows:

Onset and Offset must have an associated Label tag in the same tag group.
An Offset tag must occur after a corresponding Onset tag with the same Label.

A validation check (or checks) must be added to check for both of these requirements.
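The two requirements above can be sketched as a single pass over the events in order, tracking which labels have an open Onset. This is only an illustration of the logic: it assumes (tag, label) pairs have already been extracted from the tag groups, which the real validation would do on parsed HED structures.

```javascript
// Hypothetical sketch of the two checks, operating on already-extracted
// (temporalTag, label) pairs in event order.
function checkTemporalOrder(events) {
  const issues = []
  const openOnsets = new Set()
  for (const { tag, label } of events) {
    if (label === undefined) {
      // Requirement 1: every Onset/Offset group needs an associated label.
      issues.push(`${tag} tag group is missing a label`)
      continue
    }
    if (tag === 'Onset') {
      openOnsets.add(label)
    } else if (tag === 'Offset') {
      // Requirement 2: an Offset must follow an Onset with the same label.
      if (!openOnsets.has(label)) {
        issues.push(`Offset for "${label}" has no preceding Onset`)
      } else {
        openOnsets.delete(label)
      }
    }
  }
  return issues
}

const issues = checkTemporalOrder([
  { tag: 'Onset', label: 'trial' },
  { tag: 'Offset', label: 'trial' },
  { tag: 'Offset', label: 'fixation' }, // no matching Onset
])
console.log(issues) // ['Offset for "fixation" has no preceding Onset']
```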
The HED 3 specification specifies valid values for placeholders (#) in accordance with hed-standard/hed-specification#33 (comment). The validation of placeholder values in this validator needs to be rewritten to comply with this new requirement.
The top-level validation code (which checks for required tags) does not check for a valid schema, which can lead to errors.
The schema attribute parser will have to be rewritten to conform to the new schema format. The new SchemaAttributes format is as follows:

{
  "tags": [ /*array of tags*/ ],
  "tagAttributes": {/*tag*/: {/*attribute*/: /*value or [array of values]*/}},
  "tagUnitClasses": {/*tag*/: [ /*array of unit classes*/ ]},
  "unitClasses": {/*unitClassName*/: [ /*array of units*/ ]},
  "unitClassAttributes": {/*unitClassName*/: {/*attribute*/: /*value or [array of values]*/}},
  "unitAttributes": {/*unitName*/: [ /*array of attributes*/ ]},
  "unitModifiers": {/*unitModifierType*/: [ /*array of unit modifiers*/ ]}
}
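A lookup against that shape might be chained as follows. The concrete tag, attribute, and unit-class names below are examples invented for illustration, not the real schema contents.

```javascript
// Illustrative instance of the proposed SchemaAttributes shape, with example data.
const schemaAttributes = {
  tags: ['event/duration'],
  tagAttributes: { 'event/duration': { takesValue: true } },
  tagUnitClasses: { 'event/duration': ['time'] },
  unitClasses: { time: ['second', 'hour'] },
  unitClassAttributes: { time: { defaultUnits: 'second' } },
  unitAttributes: { second: ['SIUnit'] },
  unitModifiers: { SIUnitModifier: ['milli', 'kilo'] },
}

// Resolve the legal units for a tag via its unit classes.
function legalUnitsForTag(attributes, tag) {
  const unitClasses = attributes.tagUnitClasses[tag] ?? []
  return unitClasses.flatMap((unitClass) => attributes.unitClasses[unitClass] ?? [])
}

console.log(legalUnitsForTag(schemaAttributes, 'event/duration')) // ['second', 'hour']
```

Note that two lookups (tag to unit classes, unit class to units) are needed, which is the indirection the tagUnitClasses/unitClasses split introduces.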
The HED specification provides a JSON-based test suite at https://github.com/hed-standard/hed-specification/tree/master/tests, containing example data (including strings, sidecars, TSV files, and combinations thereof) that should pass, along with expected errors for the examples that should fail. This validator should eventually support validating against this test suite in an automated fashion; this would not directly affect the validator's built-in test modules. The validation could be implemented in two stages.
HED 3 introduced the concept of value classes to represent the sets of legal values for each value-taking tag. This will make value validation more robust and schema-driven, as well as allow library schemas to define their own value classes. In order to support value validation using value classes, the value classes have to be parsed (an initial implementation was added as part of the rewrite in #56), and a new validation function must be written to use the value class data required for a given tag. HED 2 will continue to use the existing implementation.
The HED 3 spec allows people to implement library schemas to supplement the base schema. We must support these library schemas by writing code to load them, modifying the parser to detect when they are used in tags, and adjusting existing code to reference their SchemaEntry data when tags are pulled from them.
Configure the .codeclimate.yml file to allow a threshold. See https://github.com/hed-standard/hed-python/blob/master/.codeclimate.yml for an example.
As a result of fixes made in #45, node names which do not appear in the schema in any form now generate two invalidTag issues with basically the same content: one from short-to-long conversion and the other from tag validation. Since we can't eliminate either check entirely (they don't completely overlap), we'll need to decide whether to conflate the issues (i.e., merge them) or keep both (perhaps distinguishing them with a field introduced for #46).
Currently, conversion (i.e., short-to-long and long-to-short) issues are only distinguished from standard validation issues by the inclusion of the sourceString field, which represents the pre-conversion string. Adding an additional field to Issue to tag whether an issue is the result of conversion or validation could be useful in the future.
The expected issue lists in the event.spec.js test suite contain some redundant code, namely repeated calls to generateIssue and converterGenerateIssue. In the future, we should consider whether it would be desirable to condense these lists to include just the data and move those calls to the validator base functions. For example, the current test in dataset.spec.js (a good example of one with multiple issue types), edited to the form of event.spec.js, would be:
[
generateIssue('extension', {
tag: testDatasets.multipleInvalid[0],
}),
generateIssue('unitClassInvalidUnit', {
tag: testDatasets.multipleInvalid[1],
unitClassUnits: legalTimeUnits.sort().join(','),
}),
converterGenerateIssue(
'invalidTag',
testDatasets.multipleInvalid[2],
{},
[0, 12],
),
]
Several possibilities exist for condensing this. Some are listed below.
{
extension: [{
tag: testDatasets.multipleInvalid[0],
}],
unitClassInvalidUnit: [{
tag: testDatasets.multipleInvalid[1],
unitClassUnits: legalTimeUnits.sort().join(','),
}],
invalidTag: [[
testDatasets.multipleInvalid[2],
{},
[0, 12],
]],
}
Support for validating Delay and Duration tag groups is required as of HED schema version 8.2.0. Currently, these tag groups are only validated using the topLevelTagGroup schema attribute on the respective tags. The following validation also needs to be performed:

Delay and Duration behave similarly to Onset and Inset syntactically, in that tag groups with that tag must contain the tag and a tag group, and nothing else (with one exception).
Delay and Duration may appear together in the same tag group (at the first level of the group).

See #140 for further details.
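The syntactic shape described above can be sketched as a structural check on a group. This is only an illustration under a simplified model (strings for tags, nested arrays for subgroups); it deliberately ignores the "one exception" mentioned above, and the function name is hypothetical.

```javascript
// Hypothetical check: a Delay/Duration tag group must contain only the
// timing tag(s) plus exactly one inner tag group, and nothing else.
function checkDelayDurationGroup(group) {
  const timingTags = group.filter(
    (item) => typeof item === 'string' && /^(Delay|Duration)\b/.test(item),
  )
  const subGroups = group.filter((item) => Array.isArray(item))
  const otherTags = group.length - timingTags.length - subGroups.length
  return timingTags.length >= 1 && subGroups.length === 1 && otherTags === 0
}

console.log(checkDelayDurationGroup(['Delay/5 s', ['Red']])) // true
console.log(checkDelayDurationGroup(['Delay/5 s', 'Duration/2 s', ['Red']])) // true: may co-occur
console.log(checkDelayDurationGroup(['Delay/5 s', 'Red'])) // false: stray tag, no subgroup
```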
Even though HED 2 (schema versions < 8.0.0-alpha) is considered superseded, we must still support validation of HED 2 datasets indefinitely. However, much of the code restructuring that is currently ongoing is based on the HED 3 schema model, which is not entirely compatible with HED 2's, and adapting the new structure to the older HED 2 schema model would be painful. Therefore, as much of the HED 2-specific code as possible should be sectioned off and frozen, with only the bare minimum of changes made from the current state of the code to ensure compatibility. This will likely involve liberal use of additional object orientation and subclassing (much like the recent rewrite of the event-level validation code).
There are currently several skipped tests (as of the posting of this issue), including tests for Onset and Offset references in the same string. These need to be cleaned up, modified, or deleted.
The issue return format from several stringParser functions was changed in #45, resulting in a few tests breaking. A temporary fix was included in that PR to convert the new format into a version compatible with the old one, which deferred the needed test adjustments at that time. This issue has been posted as a note to go back and fix those tests to match the new return format, which is more fine-grained and should result in better tests overall.
Other noted test cleanups include async tests with Promise returns and () => {} forms.

The readme has not been updated to include short/long conversion, and is generally hard to add to. We should rewrite it to make it read more like an API document.
The HED 3 spec states that Definition tags may only appear within "dummy" entries within a sidecar. This must be checked at the BIDS layer.

HED columns and sidecar keys corresponding to TSV columns.
Definition groups. Any other tag combination is an error.
Definition group. (Any sidecar key containing a Definition group must only contain Definition groups.)
potentialSidecars array in descending order of slashes (leaf to root) and iterate until the column name is found. Add that key to that sidecar's list of non-Definition keys.

Some of the test cases use the bundled v8.0.0-alpha schema versions. These need to be ported to the stable v8.0.0 schema, which will require some string rewrites.
The validator should give a warning when a tag extension is made (not for #). What has happened is that people accidentally extend when they actually meant to use a tag already in the hierarchy.
The memoized properties in ParsedHedGroup were accidentally merged with empty-string placeholder keys, rendering them useless. This needs to be fixed.
Copied from hed-standard/hed-python#189.

The HED validator is not correctly detecting duplicate tags that are extensions. For example, Attribute/Red has no validation errors, but short-to-long conversion correctly detects the error:

Unable to convert HED string:
ERROR: 'Red' appears as 'Attribute/Sensory/Visual/Color/CSS-color/Red-color/Red' and cannot be used as an extension. 10, 13

Based on the Python issue, this is how it should be handled:

The schema already has Red in its hierarchy, so Attribute/Red would always be an error, regardless of HED version.
Attribute/Bananas would always generate a warning, regardless of the version, since Bananas is a tag extension.

The idea is that adding the restriction that tag extensions can't already be in the hierarchy simplifies the logic.
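The proposed rule can be sketched as a set lookup over the schema's leaf names: an extension whose leaf already exists anywhere in the hierarchy is an error, and any other extension is only a warning. The function name and schema fragment below are invented for illustration.

```javascript
// Hypothetical sketch of the rule above: an extension whose leaf name already
// exists anywhere in the schema is an error; otherwise it is only a warning.
function classifyExtension(extendedTag, schemaLongTags) {
  const leaf = extendedTag.split('/').pop().toLowerCase()
  const existingLeaves = new Set(
    schemaLongTags.map((longTag) => longTag.split('/').pop().toLowerCase()),
  )
  return existingLeaves.has(leaf) ? 'error' : 'warning'
}

// Illustrative schema fragment; "red" exists as a leaf node, "bananas" does not.
const schemaLongTags = ['attribute/sensory/visual/color/css-color/red-color/red']

console.log(classifyExtension('Attribute/Red', schemaLongTags)) // 'error'
console.log(classifyExtension('Attribute/Bananas', schemaLongTags)) // 'warning'
```

Precomputing the leaf-name set once per schema would keep this check cheap even for large datasets.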
The validator does not currently validate the isNumeric attribute, thus allowing tags like RGB-red/Blah (in 8.0.0-alpha.1). This should be fixed.
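The missing check amounts to verifying that the value portion of an isNumeric tag parses as a number. This sketch is illustrative only: the function name and the accepted numeric grammar (integers, decimals, scientific notation) are assumptions, and the real check would read the attribute from the schema.

```javascript
// Hypothetical sketch of enforcing the isNumeric attribute on a tag's value.
function validateNumericValue(tagValue, tagIsNumeric) {
  if (!tagIsNumeric) return true // attribute absent: no numeric constraint
  // Accept integers, decimals, and scientific notation, with an optional sign.
  return /^[-+]?\d+(\.\d+)?([eE][-+]?\d+)?$/.test(tagValue.trim())
}

console.log(validateNumericValue('0.5', true)) // true
console.log(validateNumericValue('Blah', true)) // false: rejects RGB-red/Blah
console.log(validateNumericValue('Blah', false)) // true: non-numeric tags unaffected
```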
Most of the short-to-long converter tests are duplicated, where the short-to-long test is an inversion of the long-to-short test. The test module could be greatly simplified by merging duplicate data.
Currently, there is a stub named validateSchemaSpec that was intended to validate schema specifications (i.e., nickname/library/version specs), probably in the already-parsed form of SchemaSpec objects, but nothing calls it, and it currently only does an object type check. What would need to be done to implement this?
HED data is being included in BIDS data files beyond HED's traditional use in events, such as participants.tsv. This validator needs to support validating those newer uses.
The ONSET_OFFSET_INSET_ERROR has been renamed TEMPORAL_TAG_ERROR in the HED specification so that Duration and Delay errors will fall under this umbrella.
What is the status of validation for the Duration and Delay tags beyond the top-level tag group? How is the requirement that at most one Duration and one Delay can be in a tag group being handled?

Ref: PRs hed-python#879 and hed-specification#567.
Currently, the HED column of BIDS TSV files only undergoes event-level validation. However, fixing #58 will require restricting tag extension warning generation to string-level validation only, thus necessitating string-level validation of this column as well.
Over time, we should take steps to improve the quality of the code base, and consequently the Code Climate score.
This is a tracking bug.
The error message strings should be stored separately from the validator code.