EarthCube CDF Registry Working Group

TL;DR

The work of the registry working group can be summed up rather quickly. Use existing vocabularies like schema.org and re3data terms to expose facility metadata using web architecture patterns. Leverage HTML5 microdata publishing, JSON-LD and standard web architecture (hypermedia) to both expose and collect metadata.

About

The EarthCube Council of Data Facilities (CDF) formed the Registry Working Group to review alignment of existing approaches to research facility description and discovery. The involved parties include the EarthCube CDF, Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) and the Registry of Research Data Repositories (re3data).

Documents

Repository structure

  • JSON-LD Documents: A collection of JSON-LD documents used to test ideas and the use of schema.org and re3data types and terms.
  • Documentation: Assorted presentations and posters.
  • Notebooks: A simple Jupyter notebook demonstrating how more human-approachable formats such as YAML could let people more easily create example JSON-LD documents for reference.
  • Server code: The Go-based code for hosting the test interface and triple store. This is the service available at repograph.net.
  • Schema Builder: Related to the notebooks above, this explores a method for more human-approachable schema.org building, similar to the Structured Markup Editor but focused on CDF needs.

Simple Scenario

  1. A facility has both metadata about the facility and links to service description documents such as Swagger, OGC or THREDDS.
  2. These are assembled into a JSON-LD document following schema.org patterns, with possible use of external vocabularies. This document is then placed into the facility landing page (or another designated page) via a script element of the form
    <script type="application/ld+json">
  3. Items that cannot be defined by schema.org can then be defined via an external vocabulary.
  4. The whitelist of sites/URLs is fed through something like https://github.com/fils/contextBuilder or through DataONE tools. This example code looks for the schema.org JSON-LD packages described in item 2. More advanced crawling solutions might use tools like https://github.com/anaskhan96/soup or https://github.com/PuerkitoBio/fetchbot.
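The extraction part of item 4 can be sketched in Go with only the standard library. This is a hedged illustration, not contextBuilder's implementation: it uses a regular expression where a production harvester would use a real HTML parser (for example the golang.org/x/net/html package), and ExtractSchemaOrg is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// ldScript matches <script type="application/ld+json"> ... </script> blocks.
// A regular expression is a simplification: it assumes double-quoted
// attributes and ignores edge cases an HTML parser would handle.
var ldScript = regexp.MustCompile(`(?is)<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>`)

// ExtractSchemaOrg returns the JSON-LD objects on a page whose @context
// string mentions schema.org. Malformed JSON, top-level arrays and
// non-string contexts are skipped in this sketch.
func ExtractSchemaOrg(page string) []map[string]any {
	var docs []map[string]any
	for _, m := range ldScript.FindAllStringSubmatch(page, -1) {
		var doc map[string]any
		if err := json.Unmarshal([]byte(m[1]), &doc); err != nil {
			continue // skip a broken package rather than failing the whole page
		}
		if ctx, _ := doc["@context"].(string); strings.Contains(ctx, "schema.org") {
			docs = append(docs, doc)
		}
	}
	return docs
}

func main() {
	// A hypothetical landing page with one valid package and one broken one.
	page := `<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Organization", "name": "Example Data Facility"}
</script>
<script type="application/ld+json">{ not json }</script>
</head><body>...</body></html>`
	for _, doc := range ExtractSchemaOrg(page) {
		fmt.Println(doc["@type"], doc["name"])
	}
}
```

Skipping malformed packages instead of aborting keeps one badly formed landing page from stopping a harvest across the whole whitelist.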

After reading in the JSON-LD, a harvesting group could convert it to RDF for a triple store, or feed it into other data storage or indexing approaches.
There is no blessed harvesting or presentation site. Any number of groups or organizations could harvest and provide access to this material.
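Converting harvested JSON-LD to RDF properly requires a JSON-LD processor that resolves @context (in Go, for example, the piprate/json-gold library). The sketch below only shows the shape of the result: it flattens a compact node into N-Triples under stated simplifications. The hard-coded http://schema.org/ vocabulary IRI and the ToNTriples name are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	// Assumption: terms resolve against this vocabulary IRI. A real pipeline
	// should let a JSON-LD processor resolve @context instead of hard-coding it.
	vocab   = "http://schema.org/"
	rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
)

// literal writes s as an N-Triples plain literal, escaping the characters
// that would otherwise break the line.
func literal(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\r", `\r`)
	return `"` + r.Replace(s) + `"`
}

// asList treats a JSON array as multiple values and anything else as one.
func asList(v any) []any {
	if xs, ok := v.([]any); ok {
		return xs
	}
	return []any{v}
}

// ToNTriples flattens one compact JSON-LD node into N-Triples lines.
// Simplifications: no context processing, string values become plain
// literals, nested objects without @id become blank nodes, and numbers
// and booleans are skipped.
func ToNTriples(node map[string]any) []string {
	var out []string
	blank := 0
	var walk func(n map[string]any) string
	walk = func(n map[string]any) string {
		var subj string
		if id, ok := n["@id"].(string); ok {
			subj = "<" + id + ">"
		} else {
			blank++
			subj = fmt.Sprintf("_:b%d", blank)
		}
		keys := make([]string, 0, len(n))
		for k := range n {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output order
		for _, k := range keys {
			if k == "@type" {
				for _, t := range asList(n[k]) {
					if s, ok := t.(string); ok {
						out = append(out, fmt.Sprintf("%s <%s> <%s%s> .", subj, rdfType, vocab, s))
					}
				}
				continue
			}
			if strings.HasPrefix(k, "@") {
				continue // @id, @context and other keywords are not properties
			}
			pred := "<" + vocab + k + ">"
			for _, v := range asList(n[k]) {
				switch x := v.(type) {
				case string:
					out = append(out, fmt.Sprintf("%s %s %s .", subj, pred, literal(x)))
				case map[string]any:
					obj := walk(x) // emits the nested node's triples first
					out = append(out, fmt.Sprintf("%s %s %s .", subj, pred, obj))
				}
			}
		}
		return subj
	}
	walk(node)
	return out
}

func main() {
	doc := map[string]any{
		"@context": "https://schema.org/",
		"@id":      "https://facility.example.org/",
		"@type":    "Organization",
		"name":     "Example Data Facility",
		"funder":   map[string]any{"@type": "Organization", "name": "National Science Foundation"},
	}
	for _, line := range ToNTriples(doc) {
		fmt.Println(line)
	}
}
```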

The following image gives a brief overview of how facilities might take their descriptor documents and metadata and expose this material up through a workflow to aggregation and interface clients.

Image of Flow

Errata

On ad hoc implementation

As noted, a test crawler, harvester and indexer is being developed at contextBuilder. This is a simple (and not production-ready) application for harvesting from a whitelist and extracting the JSON-LD package. The next step will be to convert this JSON-LD to triples and move them into a standard triple store. A focused JSON-LD crawler is also in development at https://github.com/ESIPFed/snapHacks/tree/master/sh01-jsonldCrawl

On external vocabularies

The registryC5 file is testing some external vocabulary uses. It is valid JSON-LD, but Google will always throw an error since it doesn't see these terms as properties of a known schema.org class. This should be fine, and I have tested it, but with Google there is always the worry that you will not know when its handling of this case will change. Their typical response has been, "try and get things you need in core schema.org".
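One way to anticipate the validator complaints described above is to list which property names in a document come from outside schema.org. The sketch below does this for compact JSON-LD whose @context is an object of prefix mappings. The "ext" prefix and its namespace are hypothetical, terms aliased in the context are not resolved, and the output only flags terms for review; it does not predict how Google will actually respond.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// isExternal reports whether a property name expands to an IRI outside
// schema.org. Plain terms (no colon) are assumed to resolve through
// @vocab to schema.org; aliases defined in the context are not followed.
func isExternal(term string, prefixes map[string]string) bool {
	if strings.HasPrefix(term, "@") {
		return false
	}
	i := strings.Index(term, ":")
	if i <= 0 {
		return false
	}
	iri := term
	if ns, ok := prefixes[term[:i]]; ok {
		iri = ns + term[i+1:]
	}
	return !strings.Contains(iri, "schema.org/")
}

// ExternalTerms lists, sorted and de-duplicated, the property names in a
// JSON-LD document that come from vocabularies other than schema.org.
func ExternalTerms(doc map[string]any) []string {
	prefixes := map[string]string{}
	if ctx, ok := doc["@context"].(map[string]any); ok {
		for k, v := range ctx {
			if s, ok := v.(string); ok && !strings.HasPrefix(k, "@") {
				prefixes[k] = s
			}
		}
	}
	seen := map[string]bool{}
	var walk func(v any)
	walk = func(v any) {
		switch x := v.(type) {
		case map[string]any:
			for k, child := range x {
				if k == "@context" {
					continue
				}
				if isExternal(k, prefixes) {
					seen[k] = true
				}
				walk(child)
			}
		case []any:
			for _, item := range x {
				walk(item)
			}
		}
	}
	walk(doc)
	terms := make([]string, 0, len(seen))
	for k := range seen {
		terms = append(terms, k)
	}
	sort.Strings(terms)
	return terms
}

func main() {
	// "ext" is a hypothetical external vocabulary used only for illustration.
	raw := `{
  "@context": {"@vocab": "http://schema.org/", "ext": "https://vocab.example.org/terms#"},
  "@type": "Organization",
  "name": "Example Data Facility",
  "ext:certification": "hypothetical value"
}`
	var doc map[string]any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}
	fmt.Println(ExternalTerms(doc))
}
```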

Contributors

ashepherd, fils

Issues

ID's for certain shared concepts

We need a unique identifier (UID) for shared concepts, such as re3data in the members list (use its DOI?), and for things like SPARQL (use its MIME type?).
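If the DOI route is taken, the different spellings of one DOI need to collapse to a single IRI; otherwise a triple store sees separate nodes for the same concept. The sketch below is a hypothetical normalizer, not part of the repository. It relies on DOI names being case-insensitive, and the DOI in the example is a placeholder, not re3data's actual DOI.

```go
package main

import (
	"fmt"
	"strings"
)

// doiPrefixes are common ways a DOI is written in metadata.
var doiPrefixes = []string{"https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:"}

// NormalizeDOI maps the different spellings of one DOI to a single
// https://doi.org/ IRI suitable for use as a shared JSON-LD @id. DOI names
// are case-insensitive, so the result is lower-cased. It returns false when
// the input does not look like a DOI (a "10." prefix followed by a "/").
func NormalizeDOI(raw string) (string, bool) {
	s := strings.ToLower(strings.TrimSpace(raw))
	for _, p := range doiPrefixes {
		if strings.HasPrefix(s, p) {
			s = strings.TrimPrefix(s, p)
			break
		}
	}
	if !strings.HasPrefix(s, "10.") || !strings.Contains(s, "/") {
		return "", false
	}
	return "https://doi.org/" + s, true
}

func main() {
	// 10.1234/example is a placeholder DOI for illustration.
	for _, in := range []string{"doi:10.1234/EXAMPLE", "https://dx.doi.org/10.1234/example", "10.1234/example"} {
		id, ok := NormalizeDOI(in)
		fmt.Println(in, "->", id, ok)
	}
}
```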

Add in funder

Need to use the schema.org/Organization funder property to add in NSF as a funder organization

URL templates are not being used according to RFC 6570

The schema.org guidance for the urlTemplate element says its value should be a template according to IETF RFC 6570, but the urlTemplate content in the example docs is the URL of a service description document (Swagger, WADL).

Seems like the encoding should be more like this:

"target": {
    "@type": "EntryPoint",
    "contentType": "mime type that identifies Swagger, OpenAPI, WADL, WSDL, GetCapabilities, or other service self-description document; unfortunately conventions for such mime types do not appear to be standardized, see e.g. [media type for Swagger Object](https://github.com/OAI/OpenAPI-Specification/issues/110)",
    "identifier": "alternate option, if there is a URI that identifies an appropriate specification for the service description document",
    "potentialAction": "service invocation (this is another possible property that might help a client identify the EntryPoint of interest)",
    "url": "http://service.iris.edu/fdsnws/dataselect/1/application.wadl",
    "description": "Data Select WADL description document",
    "httpMethod": "GET"
}
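The distinction in this issue is that a urlTemplate is something a client expands into a concrete request URL, whereas a link to a WADL or Swagger document fits EntryPoint's url. To make that concrete, here is a hedged sketch of RFC 6570 Level 1 (simple string) expansion only. The template and variable names are hypothetical, not an actual IRIS endpoint; Levels 2 to 4 (for example form-style queries such as {?net,sta}) and percent-encoded variable names are not handled, so a complete implementation would be needed for those.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnreserved reports whether c is in RFC 3986's unreserved set, the only
// characters Level 1 expansion leaves un-encoded.
func isUnreserved(c byte) bool {
	return c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z' || c >= '0' && c <= '9' ||
		c == '-' || c == '.' || c == '_' || c == '~'
}

// pctEncode percent-encodes every byte of the UTF-8 value outside the unreserved set.
func pctEncode(s string) string {
	const hex = "0123456789ABCDEF"
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if isUnreserved(c) {
			b.WriteByte(c)
		} else {
			b.WriteByte('%')
			b.WriteByte(hex[c>>4])
			b.WriteByte(hex[c&0x0F])
		}
	}
	return b.String()
}

// validName accepts Level 1 variable names: letters, digits and "_", with
// single "." separators. Operators such as "?" or "+", comma-separated
// lists and modifiers belong to higher levels and are rejected.
func validName(name string) bool {
	if name == "" || name[0] == '.' || name[len(name)-1] == '.' || strings.Contains(name, "..") {
		return false
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		if !(c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z' || c >= '0' && c <= '9' || c == '_' || c == '.') {
			return false
		}
	}
	return true
}

// ExpandLevel1 performs RFC 6570 Level 1 expansion.
// Undefined variables expand to the empty string.
func ExpandLevel1(tmpl string, vars map[string]string) (string, error) {
	var b strings.Builder
	rest := tmpl
	for {
		open := strings.IndexByte(rest, '{')
		if open < 0 {
			b.WriteString(rest)
			return b.String(), nil
		}
		end := strings.IndexByte(rest[open:], '}')
		if end < 0 {
			return "", fmt.Errorf("unterminated expression in %q", tmpl)
		}
		name := rest[open+1 : open+end]
		if !validName(name) {
			return "", fmt.Errorf("{%s} is not a Level 1 expression", name)
		}
		b.WriteString(rest[:open])
		b.WriteString(pctEncode(vars[name]))
		rest = rest[open+end+1:]
	}
}

func main() {
	// Hypothetical template; not an actual IRIS service endpoint.
	u, err := ExpandLevel1("https://example.org/stations/{network}/{station}",
		map[string]string{"network": "IU", "station": "ANMO"})
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

Rejecting higher-level expressions outright, rather than passing them through unexpanded, makes it obvious when a document relies on template features this sketch does not support.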
