hed-javascript's People

Contributors

dependabot[bot], happy5214, ianca, vislab

hed-javascript's Issues

Restructure SchemaAttributes object

In order to provide more flexibility and separation of concerns, we're planning to restructure the SchemaAttributes object. Instead of its current role as a direct container for mappings from (string) entity names to attribute data, the plan is to create a new suite of objects to contain this data, transforming SchemaAttributes into a container for these new entity objects.

Add new HED 3-specific stringParser tests

The stringParser test module is light on tests that focus on issues specific to HED 3 concerns, like strings with multiple groups and deeply nested groups. We should add specific tests to verify that the parser correctly handles the many parentheses used in HED 3 strings, as well as correctly dealing with string conversion issues.

Rewrite in modern JavaScript dialect

As a long-range goal, we should try to integrate Babel so we can rewrite some of the more verbose code using more modern ECMAScript idioms and syntax. This should help with the Code Climate score.

Definition-only sidecar keys are not used

Definitions in BIDS datasets are often defined using sidecar keys that are dedicated to this purpose. These keys do not correspond to any actual TSV column names. Currently, since no TSV references this data, the definitions contained in these key values do not make it to the definition map, thus causing a plethora of missingDefinition errors (since this issue does not have a defined message, this shows up as a generic HED error).

Pass TSV line number through to BidsHedIssue object

Currently, BIDS TSV issues do not contain the line number in the TSV file at which the problematic string occurs. The Issue constructor in BIDS contains a line parameter for this purpose, and we can add a line field to BidsIssue, which would be passed verbatim to that constructor by the BIDS linking code.

However, the HED string array returned by parseTsvHed omits blank entries, so its indices no longer correspond to TSV line numbers. There are two steps to resolving this issue:

  1. Restoring the index integrity.
  2. Linking the generated (HED) Issue objects to the line indices (or indirectly through the HED strings) so the mapped BidsHedIssue can be passed the line index.

This issue results from, but is independent of, #58.
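
The two steps above could look roughly like the following. This is a minimal sketch, not the existing parseTsvHed: the parseTsvHedWithLines name, the `{ hedString, line }` entry shape, and the simplified TSV handling are all assumptions made for illustration.

```javascript
// Hypothetical helper (not the real parseTsvHed): pairs each non-blank HED
// string with the TSV line it came from, so blanks can be skipped without
// losing the line index.
function parseTsvHedWithLines(tsvContents) {
  const rows = tsvContents.split('\n')
  const hedColumn = rows[0].trim().split('\t').indexOf('HED')
  if (hedColumn === -1) {
    return []
  }
  const entries = []
  for (let i = 1; i < rows.length; i++) {
    const value = (rows[i].split('\t')[hedColumn] || '').trim()
    // Skip blank cells and "n/a" placeholders, but keep counting lines.
    if (value === '' || value === 'n/a') {
      continue
    }
    // 1-based file line number; the header row is line 1.
    entries.push({ hedString: value, line: i + 1 })
  }
  return entries
}

const exampleTsv = 'onset\tduration\tHED\n1.0\t0.5\tRed\n2.0\t0.5\tn/a\n3.0\t0.5\tBlue\n'
console.log(parseTsvHedWithLines(exampleTsv))
```

Each generated Issue could then carry the `line` value of the entry it was produced from, which the BIDS linking code would pass on to the BidsHedIssue.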

Rewrite tokenizeHedString as class

The ugly tokenizeHedString function should be refactored using an OO model. This will help clean up the variable scoping and modularize the code.

Add HED 3-specific tests for unique and required tags

The Hed3Validator class uses its own code path to access the list of tags with the unique and required attributes. Therefore, for code coverage reasons, there should be separate HED 3-specific tests for these two validation checks.

As of the posting of this issue, there are no tags in any released HED 3 stable schema with either attribute. Therefore, this issue serves as a reminder to go back once such a schema is available and add tests using those tags.

Use TSV event times in temporal tag validation

Temporal tags (i.e. Onset, Offset, and Inset) are dependent on the timestamp of their event in the event's TSV file. Currently, the timestamps (found in the onset column of the TSV file where present) are not used in validation.

In order to properly validate temporal tags in a time-respecting way, we must:

Values that are node names

A clarification has been made in the spec that allows values to be node names. This is in the process of being incorporated into the HED validator.

Definitions are a special case: Definition names cannot be node names. However, Definitions can also take values:

(Definition/My-def/# .... )

The values substituted for # can be node names, as in Def/My-def/Red or (Def-expand/My-def/Red .... ).

Warning issues causing the validator to run extremely slowly.

The tag extension warning is reported for every occurrence rather than once, and it does not show the location within the file, only that it occurred. The message says that "102985 more files" have this problem. In fact, this number corresponds to the total number of occurrences of the issue across the dataset, not the number of files. This causes the validator to run extremely slowly. How can we address this?

   3: [WARN] The validation on this HED string returned a warning. (code: 105 - HED_WARNING)
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ./sub-001/eeg/sub-001_task-AuditoryVisualShift_run-01_events.tsv
                    Evidence: WARNING: [HED_TAG_EXTENDED] Tag extension found - "Think/Ignore"
            ... and 102985 more files having this issue (Use --verbose to see them all).
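
One possible direction is to collapse identical warnings into a single entry with an occurrence count before they are reported. The sketch below is only an illustration: the collapseIssues name and the `{ file, evidence }` issue shape are assumptions, not the validator's actual data model.

```javascript
// Hypothetical issue shape: { file, evidence }. Identical pairs are merged
// into one entry that records how many times the issue occurred.
function collapseIssues(issues) {
  const grouped = new Map()
  for (const issue of issues) {
    // The NUL separator keeps file and evidence from running together.
    const key = `${issue.file}\u0000${issue.evidence}`
    const existing = grouped.get(key)
    if (existing) {
      existing.count += 1
    } else {
      grouped.set(key, { ...issue, count: 1 })
    }
  }
  return [...grouped.values()]
}
```

With this shape, the ten identical lines above would become one entry with a count of 10.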

Implement validation of Onset and Offset tags

The HED 3 spec specifies special handling for Onset and Offset tags, as follows:

  • Both Onset and Offset must have an associated Label tag in the same tag group.
  • An Offset tag must occur after a corresponding Onset tag with the same Label.

A validation check (or checks) must be added to check for both of these requirements.
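
As a rough sketch of what such a check might look like, assume a simplified model in which each event is an array of tag groups and each tag group is an array of tag strings. The checkOnsetOffset name and the issue codes are hypothetical, and real parsed HED strings are more structured than this.

```javascript
// Simplified model: events is an ordered array; each event is an array of
// tag groups such as ['Onset', 'Label/Fixation'].
function checkOnsetOffset(events) {
  const issues = []
  const activeLabels = new Set()
  events.forEach((tagGroups, eventIndex) => {
    for (const group of tagGroups) {
      const temporalTag = group.find((tag) => tag === 'Onset' || tag === 'Offset')
      if (temporalTag === undefined) {
        continue
      }
      // Requirement 1: a Label tag must sit in the same tag group.
      const labelTag = group.find((tag) => tag.startsWith('Label/'))
      if (labelTag === undefined) {
        issues.push({ code: 'missingLabel', tag: temporalTag, eventIndex })
        continue
      }
      if (temporalTag === 'Onset') {
        activeLabels.add(labelTag)
      } else if (!activeLabels.delete(labelTag)) {
        // Requirement 2: Set#delete returns false when no Onset with this
        // Label is currently active.
        issues.push({ code: 'offsetBeforeOnset', tag: labelTag, eventIndex })
      }
    }
  })
  return issues
}
```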

Write new schema attribute parser

The schema attribute parser will have to be rewritten to conform to the new schema format. The new SchemaAttributes format is as follows:

{
    "tags": [ /*array of tags*/ ],
    "tagAttributes": {/*tag*/: {/*attribute*/: /*value or [array of values]*/}},
    "tagUnitClasses": {/*tag*/: [ /*array of unit classes*/ ]},
    "unitClasses": {/*unitClassName*/: [ /*array of units*/ ]},
    "unitClassAttributes": {/*unitClassName*/: {/*attribute*/: /*value or [array of values]*/}},
    "unitAttributes": {/*unitName*/: [ /*array of attributes*/ ]},
    "unitModifiers": {/*unitModifierType*/: [ /*array of unit modifiers*/ ]},
}

Implement support for HED specification's JSON test suite

The HED specification provides a JSON-based test suite at https://github.com/hed-standard/hed-specification/tree/master/tests. It contains example data (including strings, sidecars, TSV files, and combinations thereof) that should pass validation, along with the expected errors for examples that should fail. This validator should eventually support validating against this test suite in an automated fashion. This would not directly affect the validator's built-in test modules.

The validation could be implemented in two stages:

  1. Simple pass-fail validation (without regard to the exact error code).
  2. Validating the actual error code.
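
The two stages could share one runner with a switch between them. In the sketch below, the test case shape (`errorCode`, `passes`, `fails`) is a hypothetical stand-in and not the suite's actual JSON layout, and `validate` is any function that maps a HED string to an array of error codes.

```javascript
// Returns a list of failures for one (hypothetical) test case.
// Stage 1: checkErrorCode = false (pass-fail only).
// Stage 2: checkErrorCode = true (the expected code must be reported).
function runCase(testCase, validate, checkErrorCode = false) {
  const failures = []
  for (const hedString of testCase.passes) {
    if (validate(hedString).length > 0) {
      failures.push({ hedString, reason: 'expected no errors' })
    }
  }
  for (const hedString of testCase.fails) {
    const codes = validate(hedString)
    if (codes.length === 0) {
      failures.push({ hedString, reason: 'expected an error' })
    } else if (checkErrorCode && !codes.includes(testCase.errorCode)) {
      failures.push({ hedString, reason: `expected ${testCase.errorCode}` })
    }
  }
  return failures
}
```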

Re-implement value validation using value classes

HED 3 introduced the concept of value classes to represent the sets of legal values for each value-taking tag. This will make value validation more robust and schema-driven, as well as allow library schemas to define their own value classes. In order to support value validation using value classes, the value classes have to be parsed (an initial implementation was added as part of the rewrite in #56), and a new validation function must be written to use the value class data required for a given tag. HED 2 will continue to use the existing implementation.

Implement support for library schemas

The HED 3 spec allows people to implement library schemas to supplement the base schema. We must support these library schemas by writing code to load them, modifying the parser to detect when they are used in tags, and adjusting existing code to reference their SchemaEntry data when tags are pulled from them.

Fix duplicate invalidTag issue generation

As a result of fixes made in #45, node names which do not appear in the schema in any form now generate two invalidTag issues with basically the same content, one from short-to-long conversion and the other from tag validation. Since we can't eliminate either check entirely (since they don't completely overlap), we'll need to decide whether to conflate the issues (i.e. merge them) or keep both (perhaps distinguishing them with a field introduced for #46).

Explicitly tag conversion issues

Currently, conversion (i.e. short-to-long and long-to-short) issues are only distinguished from standard validation issues by the inclusion of the sourceString field, which represents the pre-conversion string. Adding an additional field to Issue to tag whether an issue is the result of conversion or validation could be useful in the future.

Simplify issue lists in event test suite

The expected issue lists in the event.spec.js test suite contain some redundant code, namely repeated calls to generateIssue and converterGenerateIssue. In the future, we should consider whether it would be desirable to condense these lists to include just the data and move those calls to the validator base functions.

For example, the current test in dataset.spec.js (a good example of one with multiple issue types), edited to the form of event.spec.js, would be:

[
  generateIssue('extension', {
    tag: testDatasets.multipleInvalid[0],
  }),
  generateIssue('unitClassInvalidUnit', {
    tag: testDatasets.multipleInvalid[1],
    unitClassUnits: legalTimeUnits.sort().join(','),
  }),
  converterGenerateIssue(
    'invalidTag',
    testDatasets.multipleInvalid[2],
    {},
    [0, 12],
  ),
]

Several possibilities exist for condensing this. Some are listed below.

{
  extension: [{
    tag: testDatasets.multipleInvalid[0],
  }],
  unitClassInvalidUnit: [{
    tag: testDatasets.multipleInvalid[1],
    unitClassUnits: legalTimeUnits.sort().join(','),
  }],
  invalidTag: [[
    testDatasets.multipleInvalid[2],
    {},
    [0, 12],
  ]],
}
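
With the condensed form above, the validator base functions would need to expand each entry back into an issue. The following is a sketch of one way to do that; expandExpectedIssues is a hypothetical name, and the two helpers are stand-ins defined here only so the example runs on its own (the real generateIssue and converterGenerateIssue live in the code base).

```javascript
// Stand-ins for the real helpers, mirroring the argument order used above.
const generateIssue = (code, parameters) => ({ code, parameters })
const converterGenerateIssue = (code, sourceString, parameters, bounds) => ({
  code,
  sourceString,
  parameters,
  bounds,
})

// Object entries go to generateIssue; array entries are spread into
// converterGenerateIssue.
function expandExpectedIssues(condensed) {
  const issues = []
  for (const [code, entries] of Object.entries(condensed)) {
    for (const entry of entries) {
      if (Array.isArray(entry)) {
        issues.push(converterGenerateIssue(code, ...entry))
      } else {
        issues.push(generateIssue(code, entry))
      }
    }
  }
  return issues
}
```

Distinguishing the two helpers by entry type keeps the test data free of function calls, at the cost of making the array-versus-object convention implicit.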

Implement validation for Delay and Duration groups

Support for validating Delay and Duration tag groups is required as of HED schema version 8.2.0. Currently, these tag groups are only validated using the topLevelTagGroup schema attribute on the respective tags. The following validation also needs to be performed:

  • Delay and Duration behave similarly to Onset and Inset syntactically, in that tag groups with that tag must contain the tag and a tag group, and nothing else (with one exception).
  • Delay and Duration may appear together in the same tag group (at the first level of the group).

See #140 for further details.

Segregate and freeze HED 2-specific code

Even though HED 2 (schema versions < 8.0.0-alpha) is considered superseded, we must still support validation of HED 2 datasets indefinitely. However, much of the code restructuring that is currently ongoing is based on the HED 3 schema model, which is not entirely compatible with HED 2's, and adapting the new structure to the older HED 2 schema model would be painful. Therefore, as much of the HED 2-specific code as possible should be sectioned off and frozen, with only the bare minimum number of changes made from the current state of the code to ensure compatibility. This will likely involve liberal use of additional object-orientation and subclassing (much like the recent rewrite of the event-level validation code).

Clean up skipped tests

There are currently several skipped tests (as of the posting of this issue):

  • HED 2 BIDS datasets
  • Remote loading of HED schemas
  • Node names as values (in converter) (see #57)
  • Node names as extensions (written as a HED 2 test)
  • Overlapping Onset and Offset references in the same string
  • Detecting slash locations
  • Validating values (HED 2)
  • Stripping valid units from valid values (HED 2)

These need to be cleaned up, modified, or deleted.

Improve stringParser tests affected by #45

The issue return format from several stringParser functions was changed in #45, resulting in a few tests breaking. A temporary fix was included in that PR to convert the new format into a version compatible with the old one, which deferred the needed test adjustments at that time. This issue has been posted as a note to go back and fix those to match the new return format, which is more fine-grained and should result in better tests overall.

Code quality

  • Replace all async tests with Promise returns.
  • Replace anonymous function syntax with () => {} forms.

Update readme

The readme has not been updated to include short/long conversion, and is generally hard to add to. We should rewrite it to make it read more like an API document.

Check that Definition tags are only found within BIDS sidecar "dummy" columns

The HED 3 spec states that Definition tags may only appear within "dummy" entries within a sidecar. This must be checked at the BIDS layer.

  • Non-"dummy" columns include TSV HED columns and sidecar keys corresponding to TSV columns.
  • "Dummy" columns can only contain Definition groups. Any other tag combination is an error.
    • Assume for this check that a "dummy" column is any sidecar key containing a Definition group. (Any sidecar key containing a Definition group must only contain Definition groups.)
  • When a TSV column is encountered, sort the potentialSidecars array in descending order of slashes (leaf to root) and iterate until the column name is found. Add that key to that sidecar's list of non-Definition keys.
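
The lookup in the last bullet could be sketched as follows. The sidecar shape (`path` and `keys`) and both function names are assumptions made for illustration; only the potentialSidecars name comes from the description above.

```javascript
// Hypothetical sidecar shape: { path: string, keys: string[] }.
function findDefiningSidecar(columnName, potentialSidecars) {
  const slashCount = (path) => path.split('/').length - 1
  // Copy before sorting so the caller's array is left untouched.
  const leafToRoot = [...potentialSidecars].sort(
    (a, b) => slashCount(b.path) - slashCount(a.path),
  )
  return leafToRoot.find((sidecar) => sidecar.keys.includes(columnName))
}

// Records the column as a non-Definition key of the sidecar that supplies it.
// nonDefinitionKeys is a Map from sidecar path to a Set of key names.
function markNonDefinitionKey(columnName, potentialSidecars, nonDefinitionKeys) {
  const sidecar = findDefiningSidecar(columnName, potentialSidecars)
  if (sidecar === undefined) {
    return
  }
  if (!nonDefinitionKeys.has(sidecar.path)) {
    nonDefinitionKeys.set(sidecar.path, new Set())
  }
  nonDefinitionKeys.get(sidecar.path).add(columnName)
}
```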

Port HED 3 tests to stable v8.0.0 schema

Some of the testcases use the bundled v8.0.0-alpha schema versions. These need to be ported to the stable v8.0.0 schema, which will require some string rewrites.

Validator should give a warning on tag extension

The validator should give a warning whenever a tag is extended (but not when a value is substituted for a # placeholder). People have been accidentally extending tags when they actually meant to use a tag already in the hierarchy.

Validator is not detecting duplicate extension tags

Copied from hed-standard/hed-python#189.

The HED validator is not correctly detecting duplicate tags that are extensions.

Example: Attribute/Red has no validation errors. Short-to-long conversion correctly detects the error:

Unable to convert HED string:
ERROR: 'Red' appears as 'Attribute/Sensory/Visual/Color/CSS-color/Red-color/Red' and cannot be used as an extension. 10, 13

Based on the Python issue, this is how it should be handled:

  1. No duplicate tags should ever be allowed as extensions regardless of HED version. HED-2G and HED-3G both have Red in their hierarchy so Attribute/Red would always be an error regardless of HED version.
  2. Tag extensions would always generate a warning, so Attribute/Bananas would always generate a warning regardless of the version since Bananas is a tag extension.

The idea is that putting in the restriction that tag extensions can't be in the hierarchy already simplifies the logic.
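
The two rules can be illustrated with a toy check. This sketch assumes the schema is available as a Set of long-form tags and treats the last path component as the extension; the checkExtension name and the issue codes are hypothetical, and the check does not verify whether the parent tag allows extensions at all.

```javascript
// Returns null for exact schema tags, an error when the extension duplicates
// an existing node name, and a warning for any other extension.
function checkExtension(tag, schemaTags) {
  if (schemaTags.has(tag)) {
    return null
  }
  // Collect every node name that appears anywhere in the schema.
  const nodeNames = new Set()
  for (const schemaTag of schemaTags) {
    for (const node of schemaTag.split('/')) {
      nodeNames.add(node)
    }
  }
  const extension = tag.split('/').pop()
  if (nodeNames.has(extension)) {
    return { level: 'error', code: 'duplicateExtension', tag }
  }
  return { level: 'warning', code: 'extension', tag }
}

// Toy schema built from the example above; real schemas are far larger.
const toySchemaTags = new Set([
  'Attribute',
  'Attribute/Sensory/Visual/Color/CSS-color/Red-color/Red',
])
console.log(checkExtension('Attribute/Red', toySchemaTags))
console.log(checkExtension('Attribute/Bananas', toySchemaTags))
```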

Validate isNumeric

The validator does not currently validate the isNumeric attribute, thus allowing tags like (in 8.0.0-alpha.1) RGB-red/Blah. This should be fixed.
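
A minimal sketch of such a check is shown below. It assumes the set of tags carrying the isNumeric attribute has already been extracted from the schema, and it uses a deliberately narrow pattern for plain decimal numbers; units, scientific notation, and the real schema lookup are all left out. The checkIsNumeric name and the issue code are hypothetical.

```javascript
// numericTags: Set of tag names assumed to carry the isNumeric attribute.
function checkIsNumeric(tag, numericTags) {
  const slashIndex = tag.lastIndexOf('/')
  if (slashIndex === -1) {
    return null // no value part to check
  }
  const parent = tag.slice(0, slashIndex)
  const value = tag.slice(slashIndex + 1)
  if (!numericTags.has(parent)) {
    return null
  }
  // Narrow on purpose: optional sign, digits, optional fractional part.
  const looksNumeric = /^-?\d+(\.\d+)?$/.test(value)
  return looksNumeric ? null : { code: 'invalidValue', tag }
}
```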

Simplify converter testcases

Most of the short-to-long converter tests are duplicated, where the short-to-long test is an inversion of the long-to-short test. The test module could be greatly simplified by merging duplicate data.
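
For instance, each pair could be written once and both directions derived from it. The buildConverterCases name and the `{ input, expected }` case shape are hypothetical; the sample pair reuses the Red path from the conversion error quoted earlier in this list.

```javascript
// Each pair is written once as [shortForm, longForm]; both test directions
// are generated from the same data.
function buildConverterCases(pairs) {
  return {
    shortToLong: pairs.map(([short, long]) => ({ input: short, expected: long })),
    longToShort: pairs.map(([short, long]) => ({ input: long, expected: short })),
  }
}

const converterCases = buildConverterCases([
  ['Red', 'Attribute/Sensory/Visual/Color/CSS-color/Red-color/Red'],
])
console.log(converterCases.shortToLong[0].expected)
```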

Implement validation for BIDS HED schema version specifications

Currently, there is a stub named validateSchemaSpec that was intended to validate schema specifications (i.e. nickname/library/version specs), probably in the already parsed form of SchemaSpec objects, but nothing calls it and it currently only does an object type check. What would need to be done to implement this?

Perform string-level validation on HED column of TSV files

Currently, the HED column of BIDS TSV files only undergoes event-level validation. However, fixing #58 will require restricting tag extension warning generation to string-level validation only, thus necessitating string-level validation of this column as well.

Improve Code Climate score

Over time, we should take steps to improve the quality of the code base, and consequently the Code Climate score.

This is a tracking bug.
