bridge2ai / standards-schemas

Data schema for Bridge2AI Standards.

Home Page: https://bridge2ai.github.io/standards-schemas/

License: MIT License

Makefile 7.67% Python 91.98% Shell 0.35%

standards-schemas's Issues

Register w3id(s)

The schema URIs are defined with w3id prefixes, and with just a few short steps they can be real live identifiers!
See https://w3id.org/#new

Add validation counterexamples

Examples are great, but it's also helpful to have examples of invalid data, especially if it's invalid in multiple ways.
Place it in its own directory in src/data/.

`NamedThing` needs `related_to` slot

In standards_schema.yaml, NamedThing needs the related_to slot in addition to subclass_of, as it doesn't currently inherit the slot from anywhere.

Consider adding slots/relation types to link to anatomical entities

Some data topics, like "Ophthalmic Imaging", could be linked to Uberon to show shared links to anatomical regions.

Tests and site building don't work for multiple schemas

The test scripts run through the Makefile (make test) assume there will be a direct path to a single schema YAML file.
That isn't the case here: the linter is happy to operate on a directory, but the tests aren't.

The Makefile runs the following:

poetry run gen-project -d tmp src/standards_schemas/schema/

but we get:

ALL_SCHEMAS = ['src/standards_schemas/schema/']
INFO:root:Generating: graphql
INFO:root: SCHEMA: src/standards_schemas/schema/
INFO:root: PARENT=tmp/graphql
IsADirectoryError: [Errno 21] Is a directory: '/home/harry/standards-schemas/src/standards_schemas/schema'
make: *** [Makefile:97: test-schema] Error 1
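
One possible workaround is to call gen-project once per schema file instead of once on the directory. The sketch below only builds the commands and does not run them. It assumes the schemas are top-level .yaml files in the schema directory and that each schema can be written to its own output subdirectory; the function names are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def find_schema_files(schema_dir: str) -> List[Path]:
    """Return every top-level .yaml file in schema_dir, sorted by name."""
    return sorted(p for p in Path(schema_dir).glob("*.yaml") if p.is_file())


def gen_project_commands(schema_dir: str, out_dir: str = "tmp") -> List[List[str]]:
    """Build one gen-project command per schema file.

    Each schema gets its own output subdirectory. That layout is an
    assumption made here so artifacts from different schemas do not collide.
    """
    commands = []
    for schema in find_schema_files(schema_dir):
        target = str(Path(out_dir) / schema.stem)
        commands.append(
            ["poetry", "run", "gen-project", "-d", target, str(schema)]
        )
    return commands
```

Each returned list could be passed to subprocess.run, or the same loop could be written directly in the Makefile.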

Include some example data

Include some example data for each schema to clearly delineate classes and how relation types are applied.
This should go in src/data/.

contact.md missing

contact.md is referenced in CODE_OF_CONDUCT.md for reporting conduct violations but is missing in the repository.

Remove docs referencing obsolete slots

Some docs, e.g. MeSH_ID.md, are left over from earlier project builds and the corresponding slots have since been renamed. The old files should be removed to avoid confusion.

Set up with schemasheets

When these schemas were still being assembled, I attempted to translate them to schemasheets. This did not go as well as expected, likely because the data model was still incompletely defined. If we can translate the current version to a Google Sheet and sync it with COGS, we may be able to update the model more smoothly (i.e., without needing to edit the YAML schema directly).

Consider adding DUO values

As per discussion at the Bridge2AI all-hands meeting on April 18, 2024.

Organizations incorrectly passing validation when not CURIEs

As seen in bridge2ai/b2ai-standards-registry#76, Organization values (and potentially others) pass validation even when they are plain strings rather than URIs/CURIEs. The model may need to be more stringent here, as these values are only linkable when they are URIs or CURIEs.
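
To illustrate the kind of check that is missing, here is a minimal sketch of a URI-or-CURIE test. The two regular expressions are deliberately simplified assumptions, not the full W3C CURIE grammar and not what LinkML uses internally. In the schema itself, the equivalent would likely be a stricter range on the slot.

```python
import re

# Simplified CURIE shape: a prefix, a colon, then a non-empty local part
# that contains no whitespace and does not start with ':' or '/'.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*:[^\s:/]\S*$")

# Simplified URI shape: a scheme followed by '://' and a non-empty remainder.
URI_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://\S+$")


def is_uri_or_curie(value: str) -> bool:
    """Return True if value looks like a URI or a CURIE, False for free text."""
    return bool(URI_PATTERN.match(value) or CURIE_PATTERN.match(value))
```

With this check, a value like "ROR:1" is accepted, while a free-text organization name containing no prefix is rejected.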

Add provenance slots

NamedThing objects should contain the following slots to track provenance, at minimum:

Contributor Name
Contributor GitHub Username
Contributor ORCID
Date of Contribution
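
As a rough sketch of what these four slots might carry, the dataclass below mirrors them in Python. The field names are hypothetical, and the ORCID check only tests the 0000-0000-0000-0000 shape (with an optional trailing X); it does not verify the checksum digit.

```python
import re
from dataclasses import dataclass
from datetime import date

# Shape check only: four groups of four characters, last character may be X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


@dataclass
class Provenance:
    """Hypothetical container for the proposed provenance slots."""

    contributor_name: str
    contributor_github_name: str
    contributor_orcid: str
    contribution_date: date

    def __post_init__(self) -> None:
        # Reject values that do not have the ORCID shape.
        if not ORCID_PATTERN.match(self.contributor_orcid):
            raise ValueError(f"Not an ORCID: {self.contributor_orcid!r}")
```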

Add workflow for validating examples

If the examples are in a specific file structure (see #10) they can be validated using linkml-run-examples (see https://github.com/linkml/linkml/blob/main/linkml/workspaces/example_runner.py)
This can accompany a workflow with all the other standard tests, linting, etc.

Align with NIH Common Data Elements

See https://cde.nlm.nih.gov/cde/search
These Common Data Elements range from the very general, like "Address", to specific survey questions, such as items from PROMIS.

Consider alternative namespace prefix(es)

The current prefixes, like STANDARDSDATASTANDARDORTOOL, are long, not terribly readable, and more specific than they need to be.
Consider alternatives:

Shorten "standards" to something more project-specific, like "BRIDGE2AI" or "B2AI"
One namespace for all objects
Namespace subsets, something like B2AI.TOPIC
Names instead of numbers - they are much more human-readable, though their uniqueness still needs to be enforced (this will already happen as part of validation)

All namespace changes will require updates to https://github.com/bridge2ai/b2ai-standards-registry and potentially elsewhere in Bridge2AI standards, but it will make for more pleasant and usable standards in the long run.
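
If the prefixes do change, existing identifiers would need to be rewritten in bulk. The following is a minimal sketch of that step, assuming identifiers are simple prefix:local CURIEs. The replacement prefix "B2AI.STANDARD" is a hypothetical example in the style of "B2AI.TOPIC" above, not a decided name.

```python
from typing import Dict

# Hypothetical mapping from a current prefix to a shorter replacement.
PREFIX_MAP: Dict[str, str] = {
    "STANDARDSDATASTANDARDORTOOL": "B2AI.STANDARD",
}


def rewrite_curie(curie: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Swap the prefix of a CURIE if it is in prefix_map; otherwise return it unchanged."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        # No colon, so this is not a CURIE; leave it alone.
        return curie
    return f"{prefix_map.get(prefix, prefix)}:{local}"
```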

Incorporate class/slots for data validation rules

As per discussion at the Bridge2AI all-hands meeting on April 18, 2024.

Add container classes

Container classes allow sets of data objects to be defined together, like in this example borrowed from the LinkML tests:

persons:
  - id: P:001
    name: fred bloggs
    age_in_years: 33
  - id: P:002
    name: joe schmoe
    has_employment_history:
      - employed_at: ROR:1
        started_at_time: 2019-01-01
        is_current: true

In that example, the schema defines the container like this:

classes:
  Dataset:
    attributes:
      persons:
        range: Person
        inlined: true
        inlined_as_list: true
        multivalued: true

It could also look like:

classes:
  Container:
    tree_root: true
    slots:
      - name
      - persons
      - organizations
      - places

slots:
  persons:
    range: Person
    inlined: true
    inlined_as_list: true
    multivalued: true

The containers could be defined in a generic way for standards-schemas within standards_schema_all.yaml, but that limits some of the point of having the schemas treated as different modules. The modules could still have their own container types, defined in each schema.
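
To make the container idea concrete, here is a small Python sketch of how a consumer might read a document shaped like the persons example. The Person and Container dataclasses are hand-written stand-ins for illustration, cover only a few of the fields shown, and are not code generated from any schema in this repository.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Person:
    id: str
    name: str
    age_in_years: Optional[int] = None


@dataclass
class Container:
    # Mirrors a multivalued slot that is inlined as a list.
    persons: List[Person] = field(default_factory=list)


def load_container(doc: Dict[str, Any]) -> Container:
    """Build a Container from an already-parsed document (e.g. loaded YAML)."""
    return Container(persons=[Person(**p) for p in doc.get("persons", [])])


# A parsed document with the same shape as the persons example.
example = {
    "persons": [
        {"id": "P:001", "name": "fred bloggs", "age_in_years": 33},
        {"id": "P:002", "name": "joe schmoe"},
    ]
}
container = load_container(example)
```

The container is what gives a data file a single root object, so every entity in the file can be reached and validated from one place.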

Modify this repo to encompass all Bridge2AI Standards schemas

Add category slot

Entities currently have categories only implicitly, as far as the schema is concerned: the category is essentially determined by the container and the data document an entity is in, but nothing explicitly states what type of thing an entity is. A category slot would clear that up, especially for the Standards and Tools, in which a variety of categories exist but don't really differ in slot usage.

Standardize naming conventions

Some class, slot, and type names may not be named consistently (CamelCase vs under_score vs ALLCAPS etc) - this is a job for the linkml linter.
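
For a sense of what such a check involves, the sketch below flags class names that are not CamelCase and slot names that are not snake_case. It is an illustration only: the patterns are assumptions about the desired convention, and this is not how the LinkML linter is implemented.

```python
import re
from typing import Iterable, List, Tuple

# Assumed conventions: CamelCase for classes, snake_case for slots.
CAMEL_CASE = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def check_names(
    classes: Iterable[str], slots: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for every name that breaks its convention."""
    violations = []
    for name in classes:
        if not CAMEL_CASE.match(name):
            violations.append(("class", name))
    for name in slots:
        if not SNAKE_CASE.match(name):
            violations.append(("slot", name))
    return violations
```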