
callahantiff / pheknowlator


PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models

Home Page: https://github.com/callahantiff/PheKnowLator/wiki

License: Apache License 2.0

Python 65.02% Jupyter Notebook 34.09% Dockerfile 0.61% Shell 0.29%
knowledge-graph ontologies biomedical-applications mechanisms translational-research linked-open-data owl semantic-web benchmarks obofoundry

pheknowlator's Introduction



What is PheKnowLator?

PheKnowLator (Phenotype Knowledge Translator), or pkt_kg, is the first fully customizable knowledge graph (KG) construction framework. It enables users to build complex KGs that are Semantic Web compliant and amenable to automatic Web Ontology Language (OWL) reasoning, to generate contemporary property graphs, and to produce output importable by today's popular graph toolkits. Please see the project Wiki for additional information.

📢 Please see our preprint 👉 https://arxiv.org/abs/2307.05727

What Does This Repository Provide?

  1. A Knowledge Graph Sharing Hub: Prebuilt KGs and associated metadata. Each KG is provided as triple edge lists, OWL API-formatted RDF/XML, and NetworkX graph-pickled MultiDiGraphs. We also make available text files containing node and relation metadata.
  2. A Knowledge Graph Building Framework: An automated Python 3 library designed for optimized construction of semantically-rich, large-scale biomedical KGs from complex heterogeneous data. The framework also includes Jupyter Notebooks to greatly simplify the generation of required input dependencies.

NOTE. A table listing and describing all output files generated for each build along with example output from each file can be found here.

How do I Learn More?

  • Join and/or start a Discussion
  • The Project Wiki for available knowledge graphs, pkt_kg data sources, and the knowledge graph construction process
  • A Zenodo Community has been established to provide access to software releases, presentations, and preprints related to this project

Getting Started

Install Library

This program requires Python version 3.6. To install the library from PyPI, run:

pip install pkt_kg

You can also clone the repository directly from GitHub by running:

git clone https://github.com/callahantiff/PheKnowLator.git

Note. Sometimes OWLTools, which comes with the cloned/forked repository (./pkt_kg/libs/owltools), loses "executable" permission. To avoid any potential issues, I recommend running the following in the terminal from the PheKnowLator directory:

chmod +x pkt_kg/libs/owltools

Set-Up Environment

The pkt_kg library requires a specific project directory structure.

  • If you plan to run the code from a cloned version of this repository, then no additional steps are needed.
  • If you are planning to utilize the library without cloning the repository, please make sure that your project directory matches the following:
PheKnowLator/
    |
    |---- resources/
              |
              |---- construction_approach/
              |---- edge_data/
              |---- knowledge_graphs/
              |---- node_data/
              |---- ontologies/
              |---- owl_decoding/
              |---- relations_data/
Dependencies

Several input documents must be created before the pkt_kg library can be utilized. Each of the input documents is listed below by knowledge graph build step:

DOWNLOAD DATA

This code requires three documents within the resources directory to run successfully. For more information on these documents, see Document Dependencies.

For assistance in creating these documents, please run the following from the root directory:

python3 generates_dependency_documents.py

Prior to running this step, make sure that all mapping and filtering data referenced in resources/resource_info.txt have been created. To generate these data yourself, please see the Data_Preparation.ipynb Jupyter Notebook for detailed examples of the steps used to build the v2.0.0 knowledge graph.

Note. To ensure reproducibility, after downloading data, a metadata file is output for the ontologies (ontology_source_metadata.txt) and edge data sources (edge_source_metadata.txt).

CONSTRUCT KNOWLEDGE GRAPH

The KG Construction Wiki page provides a detailed description of the knowledge graph construction process (please see the knowledge graph README for more information). Please make sure the documents listed below are present in the specified locations prior to constructing a knowledge graph. Click on each document for additional information. Note that cloning this library will include a version of these documents that points to the current build. If you use this version, there is no need to download anything prior to running the program.


Running the pkt Library

pkt_kg can be run via the provided main.py script, the main.ipynb Jupyter Notebook, or a Docker container.

Main Script or Jupyter Notebook

The program can be run locally using the main.py script or the main.ipynb Jupyter Notebook. An example of the workflow used by both approaches is shown below.

import psutil
import ray
import pkt_kg as pkt

# initialize ray
ray.init()

# determine number of cpus available
available_cpus = psutil.cpu_count(logical=False)

# DOWNLOAD DATA
# ontology data
ont = pkt.OntData('resources/ontology_source_list.txt')
ont.downloads_data_from_url()
ont.writes_source_metadata_locally()

# edge data sources
edges = pkt.LinkedData('resources/edge_source_list.txt')
edges.downloads_data_from_url()
edges.writes_source_metadata_locally()

# CREATE MASTER EDGE LIST
combined_edges = dict(edges.data_files, **ont.data_files)

# initialize edge dictionary class
master_edges = pkt.CreatesEdgeList(data_files=combined_edges, source_file='./resources/resource_info.txt')
master_edges.runs_creates_knowledge_graph_edges(source_file='./resources/resource_info.txt',
                                                data_files=combined_edges,
                                                cpus=available_cpus)

# BUILD KNOWLEDGE GRAPH
# full build, subclass construction approach, with inverse relations and node metadata, and decode owl
kg = pkt.FullBuild(kg_version='v2.0.0',
                   write_location='./resources/knowledge_graphs',
                   construction='subclass',
                   node_data='yes',
                   inverse_relations='yes',
                   cpus=available_cpus,
                   decode_owl='yes')

kg.construct_knowledge_graph()
ray.shutdown()

main.py

The example below provides the details needed to run pkt_kg using ./main.py.

python3 main.py -h
usage: main.py [-h] [-p CPUS] -g ONTS -e EDG -a APP -t RES -b KG -o OUT -n NDE -r REL -s OWL -m KGM

PheKnowLator: This program builds a biomedical knowledge graph using Open Biomedical Ontologies
and linked open data. The program takes the following arguments:

optional arguments:
-h, --help            show this help message and exit
-p CPUS, --cpus CPUS  # workers to use; defaults to use all available cores
-g ONTS, --onts ONTS  name/path to text file containing ontologies
-e EDG,  --edg EDG    name/path to text file containing edge sources
-a APP,  --app APP    construction approach to use (i.e. instance or subclass)
-t RES,  --res RES    name/path to text file containing resource_info
-b KG,   --kg KG      the build, can be "partial", "full", or "post-closure"
-o OUT,  --out OUT    name/path to directory where to write knowledge graph
-n NDE,  --nde NDE    yes/no - adding node metadata to knowledge graph
-r REL,  --rel REL    yes/no - adding inverse relations to knowledge graph
-s OWL,  --owl OWL    yes/no - removing OWL Semantics from knowledge graph

main.ipynb

The ./main.ipynb Jupyter notebook provides detailed instructions for how to run the pkt_kg algorithm and build a knowledge graph from scratch.

Docker Container

pkt_kg can be run using a Docker instance. In order to utilize the Dockerized version of the code, please make sure that you have installed the newest version of Docker. There are two ways to utilize Docker with this repository:

  • Obtain Pre-Built Container from DockerHub
  • Build the Container (see details below)

Obtaining a Container

Obtain a Pre-Built Container: Pre-built containers can be obtained directly from DockerHub.

Build the Container: To build the container yourself, download a stable release of this repository (or fork/clone it). Once downloaded, you will have everything needed to build the container, including the ./Dockerfile and ./.dockerignore. The code shown below builds the container. Make sure to replace [VERSION] with the current pkt_kg version before running the code.

cd /path/to/PheKnowLator  # the directory containing the Dockerfile
docker build -t pkt:[VERSION] .

Notes:

  • Before building the container locally, update PheKnowLator/resources/resource_info.txt, PheKnowLator/resources/edge_source_list.txt, and PheKnowLator/resources/ontology_source_list.txt
  • Building the container "as-is" off of DockerHub will include a download of the data used in the latest release; there is no need to update any scripts or pre-download any data

Running a Container

The following code can be used to run pkt_kg from outside of the container (after obtaining a prebuilt container or after building the container locally):

docker run --name [DOCKER CONTAINER NAME] -it pkt:[VERSION] --app subclass --kg full --nde yes --rel yes --owl no --kgm yes

Notes:

  • The example shown above builds a full version of the knowledge graph using the subclass construction approach with node metadata, inverse relations, and decoding of OWL classes. See the Running the pkt Library section for more information on the parameters that can be passed to pkt_kg.
  • The Docker container cannot write to an encrypted filesystem, so please make sure /local/path/to/PheKnowLator/resources/knowledge_graphs references a directory that is not encrypted.

Finding Data Inside a Container

In order to enable persistent data, a volume is mounted within the Dockerfile. By default, Docker names volumes using a hash, so to find the correct mounted volume you can run the following:

Command 1: Obtains the volume hash:

docker inspect --format='{{json .Mounts}}' [DOCKER CONTAINER NAME] | python -m json.tool

Command 2: View data written to the volume:

sudo ls /var/lib/docker/volumes/[VOLUME HASH]/_data

Get In Touch or Get Involved

Contribution

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Contact Us

We’d love to hear from you! To get in touch with us, please join or start a new Discussion, create an issue or send us an email 💌

Attribution

Licensing

This project is licensed under Apache License 2.0 - see the LICENSE.md file for details.

Citing this Work

Please see our preprint: https://arxiv.org/abs/2307.05727

pheknowlator's People

Contributors

bill-baumgartner, callahantiff, dependabot[bot], jwyrwa, lucacappelletti94


pheknowlator's Issues

Cannot Apply OWLAPI Formatting to Very Large KGs

Problem: When applying OWLAPI formatting to very large KGs (~86 million triples; i.e. the subclass + inverse relations KG with non-ontology metadata added), the OWLAPI hangs and the process never completes. See the error message from Dave Farrell below:

The current process that is in the “S” state is hung on a futex wait. At one point, there was a thread with process id of 3912 that was holding a resource. The thread has ended or crashed but it did not release the hold which is why, I believe, the current parent process 3907 is “waiting/sleeping”. In Java, methods such as lock(), park() or unpark() use futex_wait(). Strace is the utility that can help track these events but I do not know how to use it to pinpoint the actual resource being blocked. The command that I used to see what was happening with your process was:

strace -p 3907
strace: Process 3907 attached
futex(0x7f77b8f7f9d0, FUTEX_WAIT, 3912, NULL

Script: knowledge_graph.py

Current Solution: Not adding non-ontology metadata to the KG and leaving the rest of the KG build workflow intact. All ontology and non-ontology metadata (i.e. labels, definitions, and synonyms) get written out to a .txt file, so the information is still available to users, just not as part of the KG.

Multi-Stage Docker Build -- Not Runnable Outside of Container

The multi-stage Docker container is runnable from within the container, but not from outside of it. No error message is generated, but the container hangs in main() and never seems to instantiate the first class in the workflow.


Dockerfile Location: https://github.com/callahantiff/PheKnowLator/blob/adding_docker/Dockerfile


Updates:

  • Docker runs successfully on my laptop
  • Dave is helping me update Tantor to enable updating Docker to 17.05, the version needed to run multi-stage containers

Preparing for ISMB Bio-Ontologies Task

Task

Complete the needed tasks in order to get PheKnowLator prepared for the 2021 ISMB Bio-Ontologies Challenge. The specific tasks are organized and described below.

SPARQL Endpoint

Not sure this is actually needed to support the challenge, but if it is, we need to answer the following questions:

  • What do we use?
  • Where do we host it?
  • Do we want to think about using something other than SPARQL (i.e. Neo4J) that can support other

Frequent Refresh of KG Builds

  • Confirm with organizing committee which pkt build type to use - OWL-NETS versions of the builds are not true property graphs and may not be compatible with the evaluation framework. Probably want an instance and subclass build and their
  • Discuss potential changes that may be needed in order to create monthly KG updates to support challenge
    • What needs to be done in the build framework for ontology preprocessing and other preprocessing steps in order to automate monthly builds
      • Figure out about API set-up and requiring/providing keys for folks to download resources
      • Probably best to support a system that refreshes existing edge lists and not yet worry about sustaining/adding new edges sources yet
      • Set-up CI system that builds it once a month and deposits files (see if this works with GitHub Actions)

TODO: Perform OWL Reasoner Evaluation

TODO: Perform comparison of OWL reasoners.

During today's meeting with @bill-baumgartner, we outlined how we will perform an evaluation of OWL reasoners on the PheKnowLator V.2.0 KG.

OWL Reasoner Selection Criteria
Using the following reviews (shown below), we selected reasoners that met the following criteria:

  1. Low response time
  2. Available via the OWLAPI
  3. Open source
Khamparia A, Pandey B. Comprehensive analysis of semantic web reasoners and tools: a survey. Education and Information Technologies. 2017 Nov 1;22(6):3121-45.

Parsia B, Matentzoglu N, Gonçalves RS, Glimm B, Steigmiller A. The OWL reasoner evaluation (ORE) 2015 competition report. Journal of Automated Reasoning. 2017 Dec 1;59(4):455-82.

Eligible Reasoners:

Reasoner   Language   OWLTools
ELK        EL         Yes
ELepHant   EL         No
Pellet     DL         Yes
RACER      DL         No
FaCT++     DL         No
Chainsaw   DL         No
Konclude   DL         No
Crack      DL         No
TrOWL      DL+EL      No
MORe       DL+EL      No

@bill-baumgartner - to determine what reasoners were available in OWLTools, I ran the following:

./owltools -h

Next Steps:

  • Verify which of the reasoners above are currently included in OWLTools. For those that are not in the list, @bill-baumgartner and I will look into what is involved with adding the missing algorithms to OWLTools.

Evaluation Steps:

  1. Benchmark each of the algorithms on HPO+Imports

    • Run-time
    • Justifications
    • Count of inferred axioms
    • Consistency
  2. For all algorithms that pass the benchmark, run them against PheKnowLator

    • Including disjointness axioms
    • Excluding disjointness axioms
  3. Clinician Evaluation via @jwyrwa

    • Create a spreadsheet of the inferred axioms by algorithm and mark them as:
      • Correct/Incorrect
      • Definitely clinically relevant, maybe clinically relevant, or not clinically relevant

@bill-baumgartner - did I forget anything?

Add integer and identifiers to node metadata

Problem: Right now, the node metadata that is output is keyed by an identifier, which means that if you use the integer edge lists but want node labels, you first have to use the provided dictionary that maps node integers to identifiers.

Solution: In the next iteration, I will add a new column that includes both the identifier and the integer. Examples of each output are shown below.

An example of the current output:

node_id | label  | description/definition                                                                               | synonym
388324  | INCA1  | INCA1 has locus group 'protein-coding' and is located on chromosome 17 (map_location: 17p13.2).      | HSD45; protein INCA1
92106   | OXNAD1 | OXNAD1 has locus group 'protein-coding' and is located on chromosome 3 (map_location: 3p25.1-p24.3). | oxidoreductase NAD-binding domain-containing protein 1
56140   | PCDHA8 | PCDHA8 has locus group 'protein-coding' and is located on chromosome 5 (map_location: 5q31.3).       | PCDH-ALPHA8; protocadherin alpha-8

An example of the improved output:

node_integer | node_id | label  | description/definition                                                                               | synonym
0            | 388324  | INCA1  | INCA1 has locus group 'protein-coding' and is located on chromosome 17 (map_location: 17p13.2).      | HSD45; protein INCA1
1            | 92106   | OXNAD1 | OXNAD1 has locus group 'protein-coding' and is located on chromosome 3 (map_location: 3p25.1-p24.3). | oxidoreductase NAD-binding domain-containing protein 1
2            | 56140   | PCDHA8 | PCDHA8 has locus group 'protein-coding' and is located on chromosome 5 (map_location: 5q31.3).       | PCDH-ALPHA8; protocadherin alpha-8
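
For context, the sketch below shows the two-step lookup the current output forces on users of the integer edge lists; the file names are hypothetical. With the proposed node_integer column, this merge becomes unnecessary because the integer ships in the metadata file itself.

import pandas as pd

# Current workflow: join the integer-to-identifier map against the
# identifier-keyed metadata before labels can be used (file names hypothetical)
node_int_map = pd.read_csv('node_integer_identifier_map.tsv', sep='\t')  # columns: node_integer, node_id
metadata = pd.read_csv('node_metadata.tsv', sep='\t')                    # columns: node_id, label, ...
labeled = node_int_map.merge(metadata, on='node_id', how='left')
print(labeled[['node_integer', 'node_id', 'label']].head())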

Build V3 - edge_list.py Work Needed

V3 Build Changes.
Script: edge_list.py

Requested Changes:

  • using eval() to handle the filtering of downloaded data; consider replacing this usage in the filter_data() method
  • modify the data_reader() method to stream/chunk large data files instead of reading them all into memory (a sketch of this behavior follows below)
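
A minimal sketch of the requested streaming behavior, assuming a tab-delimited edge file and a hypothetical filter column (the function name is illustrative, not part of the current codebase):

import pandas as pd

def stream_filter(path, column, keep_value, chunksize=100_000):
    """Reads a large edge file in fixed-size chunks and filters each chunk,
    rather than loading the entire file into memory at once."""
    filtered = [chunk[chunk[column] == keep_value]
                for chunk in pd.read_csv(path, sep='\t', chunksize=chunksize)]
    return pd.concat(filtered, ignore_index=True)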

Coding: Improve storage of original triples when running OWL-NETS

TASK

Task Type: CODEBASE

Improve the storage of removed OWL semantics when running the OWL-NETS version of the build. Replace the dictionary constraint with something network-based.

TODO

  • Come up with a better solution to handle triples removed from the full KG when creating OWL-NETS
    • Information that is compressed when converting a class from OWL to OWL-NETS
    • Information that is purposefully ignored or deleted (i.e. specific triples, disjointness)
  • Determine if we want to update OWL-NETS to follow any of the OWL to RDF specifications mentioned in the OWL2Vec* paper:
    • W3C Defined Graph Mapping (here)
    • Projection rule-based approach (here)

@bill-baumgartner - can we talk about the other OWL transformations soon? From what I can tell, OWL-NETS is a more extreme version of the transformations described above.

Other: Also use persistent RDFlib store for output graphs

Once a graph has been built, it may be useful to also import the resulting .owl file into an RDFlib persistent store. Use of a persistent store allows for the graph to be accessed using RDFlib without having to import the entire structure into memory, which may be advantageous when working with large graphs. Below is a sample implementation that uses the Berkeley Database as a persistent backend. RDFlib has built-in support for this particular backend. Note that Berkeley DB was formerly developed by Sleepycat Software, hence the use of "Sleepycat" as the backend name when creating the Graph object.

import rdflib
# The persistent store requires an identifier
graph_id = rdflib.URIRef(identifier)
# Open the graph with the "Sleepycat" Berkeley DB Backend
graph = rdflib.Graph("Sleepycat", identifier=graph_id)
# Open the graph and create it if it doesn't exist
graph.open(uri, create=True)
# Parse the graph at 'graph_path', typically XML formatted
# This could take many hours if the graph is large
graph.parse(graph_path)
# Close the graph to free resources. Mostly unnecessary due
# to the small overhead of the on-disk store
graph.close()

Alternatively, the following code wraps the above functionality in a context manager, allowing the graph to be managed inside of a with block for convenience:

from contextlib import contextmanager
import rdflib


@contextmanager
def open_persistent_graph(uri, identifier, graph_path=None):
    """Provides a context manager for working with an OWL graph while also
    automatically closing it afterward. URI is the location of the
    graph store directory and IDENTIFIER is the name of the graph
    within that store. Optional argument GRAPH_PATH specifies an
    appropriately formatted RDF file to import when opening the graph.

    """
    # Only force create if a path is provided
    create_graph = bool(graph_path)
    # Create the graph object before entering the try block so that the
    # finally clause never references an unbound name
    graph_id = rdflib.URIRef(identifier)
    graph = rdflib.Graph("Sleepycat", identifier=graph_id)
    try:
        # Open and load the on-disk store
        graph.open(uri, create=create_graph)
        # Parse the file at GRAPH_PATH if set
        if graph_path:
            graph.parse(graph_path)
        yield graph
    finally:
        graph.close()
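
A hypothetical usage example follows; the store directory, graph name, and input file are placeholders. Note that the Sleepycat store also requires the bsddb3 package and was renamed in newer RDFlib releases, so this assumes an older RDFlib version.

# Import a build into the store once, then reopen it cheaply afterward
with open_persistent_graph('./kg_store', 'pheknowlator', graph_path='PheKnowLator_full.owl') as graph:
    print(len(graph))  # triple count, served from the on-disk store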

TODO: Create YouTube PKT Video

TASK

Task Type: PKT DATA DELIVERY

Create a YouTube video (maybe more than one) that can help new users understand the codebase and learn how to build their own KG and/or obtain data on current releases/builds

TODO

  • Create a YouTube video that provides a walk through the repo and includes examples of:
    • Where to find a current build's data
    • How to build the project from PyPI vs. command line vs. Jupyter Notebook vs. Docker

HELP - Verifying README Content

Thanks so much for being willing to help with this too @jwyrwa! 🙏

I am hoping that you can proof each of the following README pages (listed below) to verify each is free of spelling/grammar errors and to make sure that the content makes sense (i.e. if you were trying to use this repo, this content would be helpful):

TODO: Finalize KG Construction Survey

I am working on the qualitative component of our evaluation and am requesting your review of a Google Form I created to help organize this information.

TODO: Please take a look at the Google Form (link to form can be found here) and let me know if you have any edits by 11:59pm on May 22, 2020.

Verify KG Output File Types

TASK

Task Type: CODEBASE

Improve the naming of generated data and verify the output file types we will provide for each build.

TODO

Problem:

  • Add n-triples format for OWL-NETS builds
  • Check input file specifications for GraphDB
  • Check input file specifications for Neo4J
  • Fix OWL-NETS build output
    • Problem: When running the OWL-NETS parameter, the full .owl file created during the build is named "NoOWL" when it is actually the original OWL KG. Make sure to fix the name of this file to ensure it's clear that it is not part of the OWL-NETS output. Thanks to @rkboyce for helping identify this error!

Update Dockerfile

See issue #43 - Need to update current Dockerfile to copy pkt repo rather than cloning from GitHub.

Bug: Instance-based OWL-NETS builds have anonymous nodes

Bug: Thanks to @MSBradshaw for noticing that the OWL-NETS code, when applied to the instance-based builds, includes blank nodes.

TODO: Clean up the code to identify what the blank nodes are and add tests to catch this type of error in the future. A potential fix could be something as simple as replacing the instance URI with the node URI that is represented by that instance. A quick check for this bug is sketched below.
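
A minimal sketch of such a test, assuming the NetworkX pickle keeps rdflib terms as node objects; the file name is hypothetical:

import pickle
from rdflib import BNode

# Regression check: an OWL-NETS graph should contain no blank nodes
with open('PheKnowLator_OWLNETS_NetworkxMultiDiGraph.gpickle', 'rb') as f:
    graph = pickle.load(f)

blank_nodes = [n for n in graph.nodes() if isinstance(n, BNode)]
assert not blank_nodes, f'OWL-NETS graph contains {len(blank_nodes)} blank nodes'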

HELP: Creating New Ontology Classes with Constructors

@bill-baumgartner

I was hoping I could ask for your advice on how to go about adding new terms, which reflect my many-HP-concepts-to-1-clinical-concept mappings, to an existing ontology. I understand that I will create a new class, give it an identifier (making sure that identifier does not already exist in HP) and a label, and then create the connection between it and existing terms using an equivalent class. This equivalent class would be constructed using the and (owl:intersectionOf), or (owl:unionOf), and not (owl:complementOf) operators.

OK, so with that in mind, I'm not entirely sure that I fully understand how to do this in a way that still permits me to close my knowledge graph. Below I include 3 examples I found in HP/CL, along with my attempt at applying this logic to my own cases. Note that I reuse the anonymous nodes from the examples for ease.

Would you mind taking a look and letting me know if this is correct?




EXAMPLE 1: owl:intersectionOf
Class: http://purl.obolibrary.org/obo/HP_0040261

class 'has part'
  some ('increased size'
    and ('inheres in' some 'pharyngeal tonsil')
    and ('has modifier' some 'abnormal'))
# class has_part
http://purl.obolibrary.org/obo/HP_0040261, http://www.w3.org/2002/07/owl#equivalentClass, Nf8a96b7801764cb7859eb8289ee641e4
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/BFO_0000051

# some
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/2002/07/owl#someValuesFrom, N21ee3aecd67e44f3a64920531702a4a7
N21ee3aecd67e44f3a64920531702a4a7, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class

# and  
N21ee3aecd67e44f3a64920531702a4a7, http://www.w3.org/2002/07/owl#intersectionOf, Nb60bf88cd2554c9d8ee95d3bd8d7caf5
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nc5e5c1d339564599a1824bbc4d35b0ec

# increased size
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/PATO_0000586
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nca0ddadf8e974e9a91db380d88cd5ad7

# inheres in 
Nca0ddadf8e974e9a91db380d88cd5ad7, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, Nfca2a5a88bc44e5da8ff8a38601fd834
Nca0ddadf8e974e9a91db380d88cd5ad7, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, N3a404d2e114e46da98d369033e6ee7c0
Nfca2a5a88bc44e5da8ff8a38601fd834, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/RO_0000052
Nfca2a5a88bc44e5da8ff8a38601fd834, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction

# some pharyngeal tonsil
Nfca2a5a88bc44e5da8ff8a38601fd834, http://www.w3.org/2002/07/owl#someValuesFrom, http://purl.obolibrary.org/obo/UBERON_0001732

# and
N3a404d2e114e46da98d369033e6ee7c0, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, Nc5e5c1d339564599a1824bbc4d35b0ec
N3a404d2e114e46da98d369033e6ee7c0, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

# has modifier abnormal
Nc5e5c1d339564599a1824bbc4d35b0ec, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
Nc5e5c1d339564599a1824bbc4d35b0ec, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/RO_0002573
Nc5e5c1d339564599a1824bbc4d35b0ec, http://www.w3.org/2002/07/owl#someValuesFrom, http://purl.obolibrary.org/obo/PATO_0000460

My Example
The OMOP_4128371 concept (Acute rejection of renal transplant) maps to AND(Acute, Renal insufficiency, Status post organ transplantation)

Class: https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4128371 (Or does this need to be a NEW HP term?)

class 'has part'
  some ('acute'
    and ('renal insufficiency' some 'status post organ transplantation'))
# class has_part
https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4128371, http://www.w3.org/2002/07/owl#equivalentClass, Nf8a96b7801764cb7859eb8289ee641e4
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/BFO_0000051

# some
Nf8a96b7801764cb7859eb8289ee641e4, http://www.w3.org/2002/07/owl#someValuesFrom, N21ee3aecd67e44f3a64920531702a4a7 
N21ee3aecd67e44f3a64920531702a4a7, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class

# and  
N21ee3aecd67e44f3a64920531702a4a7, http://www.w3.org/2002/07/owl#intersectionOf, Nb60bf88cd2554c9d8ee95d3bd8d7caf5
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nc5e5c1d339564599a1824bbc4d35b0ec

# acute
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0011009
Nb60bf88cd2554c9d8ee95d3bd8d7caf5, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nca0ddadf8e974e9a91db380d88cd5ad7

# and
Nca0ddadf8e974e9a91db380d88cd5ad7, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, Nfca2a5a88bc44e5da8ff8a38601fd834
Nca0ddadf8e974e9a91db380d88cd5ad7, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

# renal insufficiency
Nfca2a5a88bc44e5da8ff8a38601fd834, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0000083
Nfca2a5a88bc44e5da8ff8a38601fd834, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

# and
Nc5e5c1d339564599a1824bbc4d35b0ec, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, N3a404d2e114e46da98d369033e6ee7c0
Nc5e5c1d339564599a1824bbc4d35b0ec, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

# status post organ transplantation
N3a404d2e114e46da98d369033e6ee7c0, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0032444
N3a404d2e114e46da98d369033e6ee7c0, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil



EXAMPLE 2: owl:unionOf
Class: http://purl.obolibrary.org/obo/HP_0100258

class 'has part'
  some ('Preaxial hand polydactyly'
    or 'Preaxial foot polydactyly')
# some
http://purl.obolibrary.org/obo/HP_0100258, http://www.w3.org/2002/07/owl#equivalentClass, N26938016efc64d77800b84e87e7ec4f9
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/BFO_0000051
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/2002/07/owl#someValuesFrom, N1619f38e707c466db37a30fc78b91db5

# or
N1619f38e707c466db37a30fc78b91db5, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N1619f38e707c466db37a30fc78b91db5, http://www.w3.org/2002/07/owl#unionOf, N931ca444e1c84f1ea3e1250819f9f47d

# preaxial hand polydactyly
N931ca444e1c84f1ea3e1250819f9f47d, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0001177
N931ca444e1c84f1ea3e1250819f9f47d, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nd5c2d15afbda49e89f803d0a426523f2

# preaxial foot polydactyly
Nd5c2d15afbda49e89f803d0a426523f2, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0001841
Nd5c2d15afbda49e89f803d0a426523f2, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

My Example
The OMOP_4048191 concept (Enlargement of tonsil or adenoid) maps to OR(Enlarged tonsils, Increased size of nasopharyngeal adenoids)

Class: https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4048191 (Or does this need to be a NEW HP term?)

class 'has part'
  some ('Enlarged tonsils'
    or 'Increased size of nasopharyngeal adenoids')
# some
https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4048191, http://www.w3.org/2002/07/owl#equivalentClass, N26938016efc64d77800b84e87e7ec4f9
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/BFO_0000051
N26938016efc64d77800b84e87e7ec4f9, http://www.w3.org/2002/07/owl#someValuesFrom, N1619f38e707c466db37a30fc78b91db5

# or
N1619f38e707c466db37a30fc78b91db5, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N1619f38e707c466db37a30fc78b91db5, http://www.w3.org/2002/07/owl#unionOf, N931ca444e1c84f1ea3e1250819f9f47d

# enlarged tonsils
N931ca444e1c84f1ea3e1250819f9f47d, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0030812
N931ca444e1c84f1ea3e1250819f9f47d, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, Nd5c2d15afbda49e89f803d0a426523f2

# increased size of nasopharyngeal adenoids
Nd5c2d15afbda49e89f803d0a426523f2, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/HP_0040261
Nd5c2d15afbda49e89f803d0a426523f2, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil



EXAMPLE 3: owl:complementOf
Class: http://purl.obolibrary.org/obo/CL_0001068

class 'group 1 innate lymphoid cell'
  and (not ('capable of'
    some 'leukocyte mediated cytotoxicity'))
# and group 1 innate lymphoid cell
http://purl.obolibrary.org/obo/CL_0001068, http://www.w3.org/2002/07/owl#equivalentClass, N3b65c766ed0b4fa5a5a8cca6d1371af9
N3b65c766ed0b4fa5a5a8cca6d1371af9, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N3b65c766ed0b4fa5a5a8cca6d1371af9, http://www.w3.org/2002/07/owl#intersectionOf, N00ff83ae947d44fba9c2a2ec4c4a443b
N00ff83ae947d44fba9c2a2ec4c4a443b, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/CL_0001067
N00ff83ae947d44fba9c2a2ec4c4a443b, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, N5332defaa8f149d28d5ea2b137ea6315

# not
N5332defaa8f149d28d5ea2b137ea6315, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, N0e9ef34e56a7491eae04492497dcf34d
N5332defaa8f149d28d5ea2b137ea6315, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
N0e9ef34e56a7491eae04492497dcf34d, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N0e9ef34e56a7491eae04492497dcf34d, http://www.w3.org/2002/07/owl#complementOf, N9df33997e3d14fea8be760c624beae89

# capable of some leukocyte mediated cytotoxicity
N9df33997e3d14fea8be760c624beae89, http://www.w3.org/2002/07/owl#onProperty, http://purl.obolibrary.org/obo/RO_0002215
N9df33997e3d14fea8be760c624beae89, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
N9df33997e3d14fea8be760c624beae89, http://www.w3.org/2002/07/owl#someValuesFrom, http://purl.obolibrary.org/obo/GO_0001909

My Example
The OMOP_4021760 concept (Non-infectious pneumonia) maps to AND(pneumonia, NOT(disease by infectious agent))

Class: https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4021760 (Or does this need to be a NEW DOID term?)

class 'pneumonia'
  and (not (
    some 'disease by infectious agent'))
# and pneumonia
https://github.com/callahantiff/PheKnowLator/obo/ext/OMOP_4021760, http://www.w3.org/2002/07/owl#equivalentClass, N3b65c766ed0b4fa5a5a8cca6d1371af9  
N3b65c766ed0b4fa5a5a8cca6d1371af9, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N3b65c766ed0b4fa5a5a8cca6d1371af9, http://www.w3.org/2002/07/owl#intersectionOf, N00ff83ae947d44fba9c2a2ec4c4a443b
N00ff83ae947d44fba9c2a2ec4c4a443b, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, http://purl.obolibrary.org/obo/DOID_552
N00ff83ae947d44fba9c2a2ec4c4a443b, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, N5332defaa8f149d28d5ea2b137ea6315

# not
N5332defaa8f149d28d5ea2b137ea6315, http://www.w3.org/1999/02/22-rdf-syntax-ns#first, N0e9ef34e56a7491eae04492497dcf34d
N5332defaa8f149d28d5ea2b137ea6315, http://www.w3.org/1999/02/22-rdf-syntax-ns#rest, http://www.w3.org/1999/02/22-rdf-syntax-ns#nil
N0e9ef34e56a7491eae04492497dcf34d, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class
N0e9ef34e56a7491eae04492497dcf34d, http://www.w3.org/2002/07/owl#complementOf, N9df33997e3d14fea8be760c624beae89

# disease by infectious agent
N9df33997e3d14fea8be760c624beae89, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Restriction
N9df33997e3d14fea8be760c624beae89, http://www.w3.org/2002/07/owl#someValuesFrom, http://purl.obolibrary.org/obo/DOID_0050117

Coding: Add Jupyter Notebook to Dockerfile

Enhancement: It would be useful to add a Jupyter Notebook to the Dockerfile to facilitate running the example notebooks within a Docker container. This would also include an updated Docker usage example in the README. I'm happy to put forth a pull request with this if you'd like, @callahantiff - just let me know. The work you're doing is very cool!

Add Baseline KG Embeddings To CI/CD

TASK

Task Type: CODEBASE

Create code that generates a single set of baseline embeddings for each KG build output by PheKnowLator. This code should be run as part of the CI/CD workflow.

TODO

  • Choose an embedding method
  • Select reasonable hyperparameter settings for each OWL and OWL-NETS build
  • Incorporate build into CI/CD workflow (#68)

TODO: Update DOID nodes with MONDO Identifiers

TASK

Task Type: CODEBASE
Update all DOID identifiers with MONDO identifiers

TODO

Script(s) Impacted: Data_Preparation.ipynb

Proposed Solution:

  • Add a small function to the notebook that pulls all DbXRefs from MONDO to DOID and stores them as a dictionary (a sketch follows below)
  • Add dictionary with mappings to Wiki
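
A minimal sketch of the proposed function, assuming a local copy of mondo.owl; MONDO stores its cross-references as oboInOwl:hasDbXref literals of the form "DOID:1234":

from rdflib import Graph, URIRef

HAS_DBXREF = URIRef('http://www.geneontology.org/formats/oboInOwl#hasDbXref')

# Map DOID cross-references to their corresponding MONDO classes
mondo = Graph().parse('mondo.owl')
doid_to_mondo = {}
for mondo_class, _, xref in mondo.triples((None, HAS_DBXREF, None)):
    if str(xref).startswith('DOID:'):
        doid_to_mondo[str(xref)] = str(mondo_class)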

Other: Add sparse KG representation output

TASK

Task Type: CODEBASE

TODO

  • Consider including an additional KG output format that is sparse. A sparse representation is needed for most graph representation learning algorithms and significantly decreases the time needed to load the graph into memory (see the sketch below)

Libraries to Consider: CSRGraph
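
As a sketch of what this output could look like with SciPy, assuming a two-column, integer-encoded subject/object edge list (the file names are hypothetical):

import numpy as np
from scipy.sparse import coo_matrix, save_npz

# Build a sparse adjacency matrix from an integer edge list and save it
edges = np.loadtxt('PheKnowLator_edges_integers.txt', dtype=np.int64)
n_nodes = int(edges.max()) + 1
adjacency = coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                       shape=(n_nodes, n_nodes))
save_npz('PheKnowLator_adjacency.npz', adjacency.tocsr())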

Enhancement: Improve Networkx MultiDiGraph Metadata

TASK

Task Type: CODEBASE

Improve the node and edge metadata when outputting the NetworkX MultiDiGraph versions of each build. Thanks to @rkboyce, who suggested that we could make very small changes to the current NetworkX graph and drastically improve the usability of the output structure.

TODO

Impacted Scripts:

  • knowledge_graph.py
  • converts_rdflib_to_networkx() in utils/kg_utils.py

Needed Functionality:

  • Add a helper function to utils/kg_utils.py that can be called by converts_rdflib_to_networkx(). The helper function will set graph attributes for edges (a sketch follows below):
    • key: a unique value for each predicate with respect to the triple it appears in; this could be a hash of the triple, as long as the value is unique
    • weight: default to 0
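
A minimal sketch of the proposed helper; the function name is illustrative, and it assumes the MultiDiGraph produced by converts_rdflib_to_networkx(), where the predicate serves as the edge key:

import hashlib
import networkx as nx

def sets_edge_attributes(nx_graph: nx.MultiDiGraph) -> None:
    """Adds a unique 'key' (a hash of the triple) and a default
    'weight' of 0 to every edge in the graph."""
    for subj, obj, pred in nx_graph.edges(keys=True):
        triple_hash = hashlib.md5(f'{subj}{pred}{obj}'.encode()).hexdigest()
        nx_graph[subj][obj][pred]['key'] = triple_hash
        nx_graph[subj][obj][pred]['weight'] = 0.0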

@rkboyce, can you please verify that I have covered the needed changes that we discussed this week correctly above?

I will also be implementing a few changes to the OWL-NETS architecture (issue #56) and will be storing the collapsed semantic information from the full graph as attributes of the transformed OWL-NETS graph, likely in the form of edge and node dictionary entries.

Codacy and Code Climate test coverage set-up

It looks like both the Codacy and Code Climate test coverage reports are not being generated correctly. The web dashboards for each of these apps state that coverage still needs to be set up.

@LucaCappelletti94 - would you be willing to take a look at this with me? I'm sure it is something simple I am missing.

TODO - Project Organization: add contributing information

Task: Add documentation for contributing

Description: Create or modify contribution information for the project. A good example of how to do this can be seen here.

Here is a general outline:

Contributing to the PheKnowLator Project

🎉 👏 First off, thanks so much for being willing to contribute to our project! 👍 :bowtie:

We welcome contributions to our project and ask that you please follow the Code of Conduct.


We Support Reproducible Research

Please also take a look at how we use GitHub to enable reproducible research. We are also working on creating guidelines we would like our project collaborators to follow. Please take a look; if you have suggestions, we'd love to hear them here.


Contributing

Issues

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change. Whenever possible, the issue templates should be selected according to their description:

  • Bug report: A built-in issue template that should be used when you find an issue in the code base that needs to be fixed.
  • Coding Tasks: A custom template that should be used when you want to request a change be made to existing code or when you want to suggest new code that could be added to the code base.
  • Feature request: A built-in issue template that is used when you have a new idea or suggestion that you would like to share with the project developers.
  • Help: A custom template that should be used when you have a question on how to contribute to the repository. This can also be used as a place for asking any question about contributing to this repository.
  • Manuscript Tasks: A custom template that should be used when you want to create a task that is related to a manuscript being written about/using this project.
  • Meetings: A custom template that should be used when you want follow-up on a task assigned during a meeting or when you want to suggest a new topic for discussion at an upcoming meeting.
  • Other: A custom template that should be used when you are unable to use any of the other issue templates (e.g. general questions about the project).
  • Project Organization Tasks: A custom template that should be used when you want to add a task related to the organization of the project (e.g. adding collaborators or modifying project boards or milestones).
  • Wiki: A custom template that should be used when you want to suggest an edit to the project Wiki page.

Once you have selected the type of issue you want to submit, you will be presented with an empty template, specific to that issue, and asked to provide certain information.

Pull Requests

In general, we follow the "fork-and-pull" Git workflow.

  1. Fork the repo on GitHub
  2. Clone the project to your own machine
  3. Commit changes to your own branch
  4. Push your work back up to your fork
  5. Submit a Pull request to our development branch so that we can review and add your changes

NOTE: Be sure to merge the latest from "development" before making a pull request!

Code of Conduct

Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

Our Standards

Examples of behavior that contributes to creating a positive environment include:

  • Using welcoming and inclusive language
  • Being respectful of differing viewpoints and experiences
  • Gracefully accepting constructive criticism
  • Focusing on what is best for the community
  • Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

  • The use of sexualized language or imagery and unwelcome sexual attention or
    advances
  • Trolling, insulting/derogatory comments, and personal or political attacks
  • Public or private harassment
  • Publishing others' private information, such as a physical or electronic
    address, without explicit permission
  • Other conduct which could reasonably be considered inappropriate in a
    professional setting

Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team here. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

Attribution

This document was inspired by Atom's CONTRIBUTING documentation.

This Code of Conduct is adapted from the Contributor Covenant, version 1.4,
available at http://contributor-covenant.org/version/1/4

Handling Unicode Encoding Errors in Ontology Metadata

Problem: Unicode errors occur when writing out knowledge graph metadata locally, depending on the OS and Python version used.

Script: metadata.py

Current Solution: encode/decode ontology term labels, definitions, and synonyms, explicitly ignoring UnicodeEncodeError.

Proposed Solution: Add functionality to better handle the processing of UnicodeEncodeError (the current workaround and one alternative are sketched below)
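
For reference, a minimal sketch of the current workaround and one possible improvement; the function names are illustrative, not taken from metadata.py:

import unicodedata

# Current workaround: round-trip the string, silently dropping characters
# the target encoding cannot represent
def cleans_metadata_string(text: str) -> str:
    return text.encode('ascii', errors='ignore').decode('ascii')

# Possible improvement: normalize first so accented characters degrade to
# their ASCII base form instead of disappearing entirely
def normalizes_metadata_string(text: str) -> str:
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', errors='ignore').decode('ascii')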

Build V3 - knowledge_graph.py

V3 Build Changes.
Script: knowledge_graph.py

Requested Changes:

  • Extend functionality of the code to improve KR for:
    • Instance-based builds that include connections between 2 instance nodes (for example, the complex and reaction nodes from Reactome, shown in the figure below)
    • The ability to combine instance and subclass-based methods

[Figure: PheKnowLator v2.0.0 knowledge representation, instance-based build]

Set-up SPARQL Endpoint

TASK

Task Type: PKT DATA DELIVERY

Select and set-up a SPARQL endpoint for exploring KG build data

TODO

  • Pick an endpoint. Here is a Medium article that compares and contrasts existing triplestores. Considering what we care about (Docker compatibility, RDF support, and querying speed), I have selected a few and ordered them from best to worst:
  • Configure with CI/CD
  • Figure out where to host it
    • Through Google Cloud Run?

Questions:

  • @bill-baumgartner - Which one do you think we should use?
  • Two versions are needed: 1 for our production build and 1 for those who want to build their own

TODO: Move print statements to logging

TASK

Task Type: CODEBASE

There are a lot of print statements created during the KG build that provide useful information as well as track build progress. Most of these should be moved to logging.

TODO

  • Move all print statements and processing to logging (a sketch of the target pattern is shown below)
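
A minimal sketch of the target pattern; the log file name and message are illustrative:

import logging

logging.basicConfig(filename='pkt_kg_build.log', level=logging.INFO,
                    format='%(asctime)s %(name)s %(levelname)s: %(message)s')
logger = logging.getLogger(__name__)

# before: print('Processed {} edges'.format(edge_count))
# after:
edge_count = 0  # placeholder value for illustration
logger.info('Processed %s edges', edge_count)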

Project Meeting -- 09/18/2019 @ 13:00

Meeting Date: September 18, 2019
Topic: Weekly Meeting
Attendees: @bill-baumgartner

Proposed Agenda:

  • Comparing full KG to full KG + reasoner inferences
    • Discuss KaBOB script used to convert anonymous nodes to non-anonymous nodes to properly add edges inferred from running reasoners
  • Continue discussing developing KR for clinical to biological concept mappings (see #12)

KG V2.0.0 - Finalizing Knowledge Representation

Extending Knowledge Representation for current KG

Current Release: v2.0.0

Description
Adding the following entities/data sources to the current KG build:

  • Variants via Clinvar
  • Proteins via PRO
  • New connections from existing ChEBI, GO, and Reactome concepts to proteins and genes - here:
    • protein-protein, RO_0002434 (interacts with)
    • protein-gobp, RO_0000056 (participates in)
    • protein-gomf, RO_0000085 (has function)
    • protein-gocc, RO_0001025 (located in)
    • protein-cofactor/catalyst (ChEBI), RO_0002436 (molecularly interacts with)
    • protein-complex (reactome), RO_0002436 (molecularly interacts with)
    • gene-protein, RO_0002211 (regulates)
    • chemical (ChEBI)-complex, RO_0002436 (molecularly interacts with)
    • complex (reactome)-complex (reactome), RO_0002436 (molecularly interacts with)
    • protein-pathway (reactome), RO_0000056 (participates in)
    • protein-reaction (reactome), RO_0000056 (participates in)

TODO 📋 💻 📝

  • Create edge types to connect variants to KG
  • Verify ontological assumptions for edges provided by @ignaciot to ensure satisfiability and consistency with existing KR
  • Investigate which version of PRO to download, specifically searching for one which only includes human proteins
  • Update KR schema and verify it
  • Update input documentation
  • Add new data sources to wiki

@callahantiff Due Dates:

  • Have KR and Wiki updated and finalized by 10/23/19
  • Begin building KG v2.1.0 by 10/23/19

Setting Up an End-to-End CI/CD Framework

Task

Task Type: INFRASTRUCTURE
Determine which tools we will use in order to set-up an end-to-end CI/CD framework.

TODO

The requirements for this system include:

  • Leveraging GitHub Actions to:
    • Test the codebase
    • Download needed resources and build the Docker container
    • Deploy and run the Docker container via Google Cloud Run (one for each KG build type)
    • Generate baseline embeddings (#71)
    • Return all results
    • Push certain files to a Neo4J instance and the SPARQL endpoint

Potential Configurations:

  • CI/CD with Serverless Containers on GCP - Described here
  • Consider using Google Cloud Composer to kick off the first task of the monthly build process, which downloads and preprocesses the data used for each build (LOD and ontology data)

Proposed Tasks for CI/CD

  • Download all LOD and Ontology data
  • Preprocess and Clean data
  • KG Build

Related GitHub Issues: #47, #49

CTD Data Source - CAPTCHA

Issue: CTD now has a CAPTCHA in place to prevent automatic downloading of data. This impacts the current build, as there is no solution currently in place to work around it.

Temporary Workaround: All CTD data sources need to be manually downloaded to the resources/edge_data directory prior to running the download step of the build. Each downloaded file also needs to be unzipped and have the edge type label appended to the front of the file name (example below; these steps are also sketched in Python at the end of this issue).


File: edge_source_list.txt

chemical-disease, http://ctdbase.org/reports/CTD_chemicals_diseases.tsv.gz
chemical-gene, http://ctdbase.org/reports/CTD_chem_gene_ixns.tsv.gz
chemical-phenotype, http://ctdbase.org/reports/CTD_chemicals_diseases.tsv.gz
chemical-protein, http://ctdbase.org/reports/CTD_chem_gene_ixns.tsv.gz

Directory: resources/edge_data/
chemical-disease_CTD_chemicals_diseases.tsv
chemical-gene_CTD_chem_gene_ixns.tsv
chemical-phenotype_CTD_chemicals_diseases.tsv
chemical-protein_CTD_chem_gene_ixns.tsv
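
A sketch of these preparation steps in Python, one edge type shown; because of the CAPTCHA, the download itself may need to happen manually in a browser, with the resulting local file path substituted below:

import gzip
import shutil

# Unzip the manually downloaded CTD file and prepend the edge type label
edge_type = 'chemical-disease'
source_file = 'CTD_chemicals_diseases.tsv.gz'          # manually downloaded
target_file = f'resources/edge_data/{edge_type}_CTD_chemicals_diseases.tsv'

with gzip.open(source_file, 'rb') as f_in, open(target_file, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)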
