coleridge-initiative / adrf-onto

Controlled vocabulary used for Rich Context and ADRF.

Home Page: https://coleridgeinitiative.org/richcontext

License: Creative Commons Zero v1.0 Universal

Python 100.00%
linked-data ontology skos metadata rich-context json-ld data-governance dcat rdflib knowledge-graph controlled-vocabularies datasets evidence-based jupyter

adrf-onto's Introduction

ADRF Ontology

Welcome to adrf-onto, which is used to construct, validate, and leverage the Rich Context knowledge graph for the ADRF framework.

Some files of particular interest:

  • adrf.ttl -- a mid-level ontology specification for ADRF
  • rcc.ttl -- a subgraph of results from the Rich Context Competition, using this vocabulary for ADRF
  • onto.py -- a brief Python script used to load and validate the graph data
  • vocab.json -- a JSON-LD context for compaction of the output graph files

Note that the parts of the graph that humans read and write are represented in Turtle format (file extension .ttl). The parts that machines consume or produce use JSON-LD format. A couple of lines of Python convert between the two formats.

Dependencies

The following assumes that your Python binary is located at /usr/bin/python3 -- change that as needed.

To set up a virtual environment for Python 3.x using virtualenv:

virtualenv -p /usr/bin/python3 ~/venv

Then activate the environment and install the dependencies into it:

source ~/venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Validation

A variety of "unit tests" can be performed on this ontology spec, so that when multiple people collaborate on it, we can make sure that the committed files remain consistent.

To load, parse, and validate the files used to construct the graph:

./onto.py adrf.ttl rcc.ttl

Then review the generated tmp.ttl output file to make sure it doesn't show any errors.

The tmp.json file shows that same graph in JSON-LD format (machine-readable), which is used as a test case for the JupyterLab metadata service.

For more details about the inference rules used, see SKOS-Inference.

We're using the following packages:

Roadmap

Later, we'll automate the tests.

See the wiki for this repo for more detailed specifications.

adrf-onto's People

Contributors

abhi-balaji, ceteri


adrf-onto's Issues

adrf ontology is all wrong

  • You can't just graft classes from different ontologies, instantiate them and put them into a SKOS concept scheme
  • Eg when you say :Dataset a dcat:Dataset, it means that :Dataset is one particular dataset. This cannot be owl:sameAs dbr:Data_set because that's (arguably) the concept of a Dataset.
  • Eg when you say :Provider a foaf:Organization, it means :Provider is some specific organization. And it's doubly impossible to say owl:sameAs dbr:Data_publishing because publishing (providing) data is an activity, not an organization.
  • prefix : is missing, so it seems to me https://github.com/Coleridge-Initiative/adrf-onto/blob/master/adrf.ttl is invalid
  • https://github.com/Coleridge-Initiative/adrf-onto/wiki/Vocabulary says "subclassed from" but a (rdf:type) is not subclassing, it's instantiation
  • It says "Corpus Subclassed from SKOS:Collection" (even though that's missing from the actual ontology), but skos:Collection has a specific use (ad-hoc collection of Concepts), whereas Corpus is a collection of documents (texts), so you can't do that
  • you have two different definitions of :Topic. One is tied to LCSH, which is wrong
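The missing-prefix point in the list above can be checked mechanically before a file is committed. The following is a rough heuristic sketch using only the Python standard library, not a Turtle parser: it looks for a line-initial declaration of the empty prefix and would miss a declaration that shares a line with other content. The function name and the sample strings are hypothetical.

```python
import re

# Matches "@prefix : <...>" or SPARQL-style "PREFIX : <...>" at the start of a line.
_EMPTY_PREFIX = re.compile(
    r"^\s*(?:@prefix|PREFIX)\s+:\s*<[^>]*>",
    re.MULTILINE | re.IGNORECASE,
)


def declares_empty_prefix(ttl_text: str) -> bool:
    """Return True if the text appears to declare the empty (':') prefix."""
    return _EMPTY_PREFIX.search(ttl_text) is not None


if __name__ == "__main__":
    with_prefix = "@prefix : <http://example.org/adrf#> .\n:Dataset a :Thing ."
    without_prefix = (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        ":Dataset a skos:Concept ."
    )
    print(declares_empty_prefix(with_prefix))
    print(declares_empty_prefix(without_prefix))
```

A check like this only flags the symptom. Whether a given parser rejects the file or silently assigns a default namespace depends on the parser, so it is safer to declare the prefix explicitly.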

It seems to me you are in dire need of some ontology consulting, by someone who knows ontologies for describing:

  • machine learning and data science (there are at least 8 I know of)
  • datasets (DCAT2 and ADMS are the most important)
  • corpora and NLP (NIF and friends)

rcc.ttl is all wrong

Continuing the examination from #3:

  • again the : prefix is missing, so rcc.ttl is invalid
  • what's the purpose of using xsd:anyURI literals rather than actual URIs?
  • You say :Catalog :dataset :dataset481, :dataset_x001 but you have not defined such a property. Thinking about dcat: doesn't magically bring its props into your namespace
  • You say :Corpus :publication :publication338, :publication340 but you have not defined such a property

We've done significant work in KGs of Science so the following recommendations come from that experience:

  • authors and publications: use separate props to capture identifiers in important datasets like ORCID, DOI, Google Scholar, ResearchGate, Scopus. Eg Wikidata has external-ids for all of that
  • publications: use some established bibliographic ontology, don't just wing it with dc/dct
  • If you use DCT for a bunch of props, why suddenly switch to PAV for pav:createdOn? Use dct:created
  • dct:alternative is declared subprop of dct:title, so you'll get two dct:title: "National Health and Nutrition Examination Survey" and "NHANES", is that what you want?
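The dct:alternative point in the last bullet follows from the RDFS subproperty rule (rdfs7): if p is a subproperty of q, every triple using p also entails the same triple using q. A minimal sketch in plain Python, with triples as tuples, shows the effect. The urn:example:nhanes subject and the function name are hypothetical, and this is a toy rule application rather than the inference engine the project uses.

```python
DCT = "http://purl.org/dc/terms/"
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def apply_subproperty_rule(triples):
    """Apply rdfs7 until no new triples appear: (s p o), (p subPropertyOf q) => (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        pairs = {(s, o) for (s, p, o) in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


triples = {
    (DCT + "alternative", SUBPROPERTY_OF, DCT + "title"),
    ("urn:example:nhanes", DCT + "title",
     "National Health and Nutrition Examination Survey"),
    ("urn:example:nhanes", DCT + "alternative", "NHANES"),
}

titles = sorted(
    o for (s, p, o) in apply_subproperty_rule(triples)
    if s == "urn:example:nhanes" and p == DCT + "title"
)

if __name__ == "__main__":
    # Both the full name and the abbreviation now appear as dct:title values.
    print(titles)
```

Under RDFS inference the dataset therefore ends up with two dct:title values, which is the outcome the bullet asks about.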
