standards-schemas's People

Contributors

amc-corey-cox, caufieldjh, monicacecilia

standards-schemas's Issues

Add validation counterexamples

Examples are great, but it's also helpful to have examples of invalid data, especially data that is invalid in multiple ways.
Place these counterexamples in their own directory in src/data/.
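A counterexample is most useful when the validator reports every problem instead of stopping at the first. The sketch below is a hypothetical stand-in for schema validation, written in plain Python rather than with LinkML's own validator: the required slots (id, name) and the ID pattern are assumptions chosen for illustration, not taken from standards_schema.yaml.

```python
import re

# Hypothetical rules for illustration only; the real constraints live in the schema.
REQUIRED_SLOTS = ("id", "name")
ID_PATTERN = re.compile(r"^[A-Z0-9_]+:\d+$")  # assumed PREFIX:number form


def collect_errors(record: dict) -> list[str]:
    """Return every problem found in a record instead of stopping at the first."""
    errors = []
    for slot in REQUIRED_SLOTS:
        if slot not in record:
            errors.append(f"missing required slot: {slot}")
    record_id = record.get("id")
    if record_id is not None and not ID_PATTERN.match(str(record_id)):
        errors.append(f"malformed id: {record_id!r}")
    return errors


# A counterexample that is invalid in two ways: no name, and a badly formed id.
bad_record = {"id": "not an id", "description": "placeholder"}
for problem in collect_errors(bad_record):
    print(problem)
```

Each counterexample file in src/data/ could then be paired with the list of errors it is expected to trigger.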

`NamedThing` needs `related_to` slot

In standards_schema.yaml, NamedThing needs the related_to slot in addition to subclass_of, as it doesn't currently inherit the slot from anywhere.
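LinkML gives a class its own slots plus those inherited through its is_a parent and any mixins. The sketch below mimics that lookup over a hypothetical, simplified schema fragment (the Entity class and its slots are assumptions, not the real contents of standards_schema.yaml) to show why related_to is absent until it is declared on NamedThing itself.

```python
# Hypothetical, simplified schema fragment; the class and slot layout are assumptions.
SCHEMA = {
    "Entity": {"is_a": None, "mixins": [], "slots": ["id", "name"]},
    "NamedThing": {"is_a": "Entity", "mixins": [], "slots": ["subclass_of"]},
}


def induced_slots(schema: dict, class_name: str) -> set[str]:
    """Collect a class's own slots plus everything inherited via is_a and mixins."""
    cls = schema[class_name]
    slots = set(cls["slots"])
    parents = ([cls["is_a"]] if cls["is_a"] else []) + cls["mixins"]
    for parent in parents:
        slots |= induced_slots(schema, parent)
    return slots


print("related_to" in induced_slots(SCHEMA, "NamedThing"))  # False: no ancestor provides it

# The proposed fix: declare related_to directly on NamedThing.
SCHEMA["NamedThing"]["slots"].append("related_to")
print("related_to" in induced_slots(SCHEMA, "NamedThing"))  # True
```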

Tests and site building don't work for multiple schemas

The test scripts run through the Makefile (make test) assume a direct path to a single schema YAML file.
That isn't the case here: the linter is happy to operate on a directory, but the tests aren't.

The Makefile runs the following:

poetry run gen-project -d tmp src/standards_schemas/schema/

but we get:

ALL_SCHEMAS = ['src/standards_schemas/schema/']
INFO:root:Generating: graphql
INFO:root: SCHEMA: src/standards_schemas/schema/
INFO:root: PARENT=tmp/graphql
IsADirectoryError: [Errno 21] Is a directory: '/home/harry/standards-schemas/src/standards_schemas/schema'
make: *** [Makefile:97: test-schema] Error 1
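One possible workaround, sketched below as a hypothetical Python helper rather than anything the project has adopted, is to call gen-project once per schema file instead of once on the directory. The helper names and the per-schema output subdirectories under tmp/ are assumptions, and it is untested whether each module generates cleanly on its own when it imports the others.

```python
import subprocess
from pathlib import Path


def schema_files(schema_dir: str) -> list[Path]:
    """List the schema YAML files in a directory, in a stable order."""
    return sorted(Path(schema_dir).glob("*.yaml"))


def gen_project_command(schema: Path, out_root: str = "tmp") -> list[str]:
    """One gen-project call per schema, each writing to its own subdirectory."""
    out_dir = Path(out_root) / schema.stem
    return ["poetry", "run", "gen-project", "-d", str(out_dir), str(schema)]


def run_all(schema_dir: str = "src/standards_schemas/schema/") -> None:
    """Generate every schema; check=True raises CalledProcessError on the first failure."""
    for schema in schema_files(schema_dir):
        subprocess.run(gen_project_command(schema), check=True)
```

Giving each schema its own output directory keeps one module's generated artifacts from overwriting another's; calling run_all() from the repository root would replace the single directory-wide invocation.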

Include some example data

Include some example data for each schema to clearly delineate the classes and show how relation types are applied.
This should go in src/data/.

contact.md missing

contact.md is referenced in CODE_OF_CONDUCT.md for reporting conduct violations but is missing in the repository.

Set up with schemasheets

When these schemas were still being assembled, I attempted to translate them to schemasheets; this did not go as well as expected, likely because the data model was still incompletely defined. If we can translate the current version to a Google Sheet and sync it with COGS, we may be able to update the model more smoothly (i.e., without needing to edit the YAML schema directly).

Add provenance slots

At minimum, NamedThing objects should contain the following slots to track provenance:

  • Contributor Name
  • Contributor GitHub Username
  • Contributor ORCID
  • Date of Contribution
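The four slots above could be checked at load time. The sketch below uses a Python dataclass with hypothetical snake_case slot names (assumptions, not names from the schema); the ORCID check applies the ISO 7064 MOD 11-2 check character that ORCID iDs use, and the date is parsed as an ISO 8601 calendar date.

```python
import re
from dataclasses import dataclass
from datetime import date

# Four groups of characters; the final one may be the check character X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD layout and its ISO 7064 MOD 11-2 check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    chars = orcid.replace("-", "")
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


@dataclass
class Provenance:
    """Hypothetical provenance slots for a NamedThing; the slot names are assumptions."""

    contributor_name: str
    contributor_github_username: str
    contributor_orcid: str
    contribution_date: date

    def __post_init__(self) -> None:
        # Reject records whose ORCID fails the format or checksum test.
        if not orcid_is_valid(self.contributor_orcid):
            raise ValueError(f"invalid ORCID: {self.contributor_orcid}")


example = Provenance(
    contributor_name="Example Contributor",  # illustrative value
    contributor_github_username="example-user",  # illustrative value
    contributor_orcid="0000-0002-1825-0097",  # ORCID's documented example iD
    contribution_date=date.fromisoformat("2023-01-15"),  # illustrative value
)
```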

Consider alternative namespace prefix(es)

The current prefixes, like STANDARDSDATASTANDARDORTOOL, are long, not terribly readable, and more specific than they need to be.
Consider alternatives:

  • Shorten "standards" to something more project-specific, like "BRIDGE2AI" or "B2AI"
  • One namespace for all objects
  • Namespace subsets, something like B2AI.TOPIC
  • Names instead of numbers: they are much more human-readable, though their uniqueness still needs to be enforced (this will already happen as part of validation)

All namespace changes will require updates to https://github.com/bridge2ai/b2ai-standards-registry and potentially elsewhere in Bridge2AI standards, but the result should be more pleasant and usable standards in the long run.
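To make the "names instead of numbers" option concrete, the sketch below derives readable CURIEs from names under a single short prefix and reports names that would collide. The B2AI prefix is one of the options floated above, not an adopted choice, and the lowercase-with-underscores slug rule is an assumption.

```python
import re
from collections import defaultdict

PREFIX = "B2AI"  # one of the candidate prefixes above; not yet adopted


def name_to_curie(name: str, prefix: str = PREFIX) -> str:
    """Derive a readable CURIE from a name, e.g. 'Data Standard' -> 'B2AI:data_standard'."""
    local_id = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{prefix}:{local_id}"


def find_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group names that map to the same CURIE; any group larger than one is a clash."""
    by_curie = defaultdict(list)
    for name in names:
        by_curie[name_to_curie(name)].append(name)
    return {curie: group for curie, group in by_curie.items() if len(group) > 1}


print(find_collisions(["FHIR", "fhir", "DICOM"]))
```

Case-insensitive slugs make near-duplicates such as "FHIR" and "fhir" collide on purpose, so they surface during validation rather than producing two distinct identifiers.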

Add container classes

Container classes allow sets of data objects to be defined together, like in this example borrowed from the LinkML tests:

persons:
  - id: P:001
    name: fred bloggs
    age_in_years: 33
  - id: P:002
    name: joe schmoe
    has_employment_history:
      - employed_at: ROR:1
        started_at_time: 2019-01-01
        is_current: true

In that example, the schema defines the container like this:

classes:
  Dataset:
    attributes:
      persons:
        range: Person
        inlined: true
        inlined_as_list: true
        multivalued: true

It could also look like:

classes:
  Container:
    tree_root: true
    slots:
      - name
      - persons
      - organizations
      - places

slots:
  persons:
    range: Person
    inlined: true
    inlined_as_list: true
    multivalued: true

The containers could be defined generically for standards-schemas within standards_schema_all.yaml, but that would undercut some of the point of treating the schemas as separate modules. The modules could still have their own container types, defined in each schema.
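The inlined_as_list: true setting in the persons slot is what makes the data a YAML list; with inlined_as_list set to false, LinkML instead expects the inlined objects as a dictionary keyed by each object's identifier. The sketch below converts the example's list form into that keyed form in plain Python, assuming id is the identifier slot of Person (as the example data suggests), and rejects duplicate identifiers along the way.

```python
def list_to_keyed(objects: list[dict], key: str = "id") -> dict[str, dict]:
    """Turn an inlined list into a dict keyed by identifier, rejecting duplicate keys."""
    keyed = {}
    for obj in objects:
        ident = obj[key]
        if ident in keyed:
            raise ValueError(f"duplicate identifier: {ident}")
        # The identifier becomes the dict key, so it is dropped from the value.
        keyed[ident] = {k: v for k, v in obj.items() if k != key}
    return keyed


# The persons list from the LinkML test example, written as parsed Python data.
persons = [
    {"id": "P:001", "name": "fred bloggs", "age_in_years": 33},
    {
        "id": "P:002",
        "name": "joe schmoe",
        "has_employment_history": [
            {"employed_at": "ROR:1", "started_at_time": "2019-01-01", "is_current": True},
        ],
    },
]
print(list_to_keyed(persons))
```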

Add category slot

Entities currently have categories only implicitly: as far as the schema is concerned, an entity's type is determined by the container it's in and by the data document, but nothing explicitly states what type of thing it is. A category slot would clear that up, especially for Standards and Tools, where a variety of categories exist but don't really differ in slot usage.
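In a LinkML schema, a category slot would presumably take an enum as its range. The sketch below shows the same idea in Python: the category is read from an explicit field instead of being inferred from the container. The enum members and record IDs are placeholders, not the registry's actual categories.

```python
from collections import Counter
from enum import Enum


class StandardCategory(Enum):
    """Placeholder values; the real category list would come from the schema."""

    DATA_STANDARD = "DataStandard"
    SOFTWARE_TOOL = "SoftwareTool"
    ONTOLOGY = "Ontology"


def parse_category(record: dict) -> StandardCategory:
    """Read the explicit category slot; the Enum lookup raises ValueError on unknown values."""
    return StandardCategory(record["category"])


# Hypothetical records that share one container but state their own category.
records = [
    {"id": "B2AI:1", "category": "Ontology"},
    {"id": "B2AI:2", "category": "SoftwareTool"},
    {"id": "B2AI:3", "category": "Ontology"},
]
counts = Counter(parse_category(r) for r in records)
print(counts[StandardCategory.ONTOLOGY])  # 2
```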
