np-kg's Issues

Documentation for NP-KG construction

Add documentation for constructing NP-KG from scratch.

Include -

  1. How to install PheKnowLator (also link forked repository).
  2. How to run PheKnowLator notebooks.
  3. How to install relation extraction systems - SemRep and INDRA with REACH.
  4. Relation extraction workflow.
  5. Literature graph processing and construction.
  6. NP-KG construction and evaluation.

Construct literature graphs for 30 natural products

v2.0.0 extends the literature-based graph to include predications for 30 natural products from full texts of scientific articles. See the NP-KG paper for search strategies and the PMIDs folder for the included PubMed IDs. Graph artifacts and code are available as well.

[Data Source] Add SPLICER side effects database

Add the SPLICER side effects database to NP-KG so that side effects of drugs can be distinguished in the KG. SPLICER extracts side effects for drugs from structured product labels. Data are standardized to RxNorm concepts for drugs and MedDRA concepts for side effects/adverse events.

Requirements for KG construction

Add requirements for constructing NP-KG components from scratch -

  1. PheKnowLator KG
  2. Literature-based graph.

PheKnowLator KG requirements can be linked to the original or forked repositories. Create a requirements file for the literature-based graph construction, including the systems (SemRep, INDRA, REACH) and processing (e.g., clips).

Get TSV data file

Dear authors,

I have been looking into the data files deposited on Zenodo, but they seem to be TSV files generated from RDF. Would it be possible for you to provide a TSV file in the following format: source CURIE, relation, target CURIE?

Thank You.
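A minimal sketch of the conversion requested above, assuming the deposited files hold one tab-separated triple per line with full IRIs in subject, predicate, object order. The column order, the `rdfs` prefix mapping, and the function names `to_curie` and `convert` are illustrative assumptions, not part of the NP-KG code base. IRIs that do not match a known namespace are passed through unchanged.

```python
import csv
import io

OBO = "http://purl.obolibrary.org/obo/"
# Hypothetical extra namespaces; extend this map to match the real data.
OTHER_PREFIXES = {"http://www.w3.org/2000/01/rdf-schema#": "rdfs"}


def to_curie(iri):
    """Return a CURIE for recognised namespaces, else the IRI unchanged."""
    if iri.startswith(OBO):
        # OBO IRIs look like .../obo/UBERON_0000473 -> UBERON:0000473
        prefix, sep, local_id = iri[len(OBO):].partition("_")
        if sep and prefix.isalnum() and local_id:
            return f"{prefix}:{local_id}"
        return iri
    for namespace, prefix in OTHER_PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


def convert(rows_in, out):
    """Rewrite (subject, predicate, object) IRI rows as CURIE rows."""
    reader = csv.reader(rows_in, delimiter="\t")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for subject, predicate, obj in reader:
        writer.writerow([to_curie(subject), to_curie(predicate), to_curie(obj)])


# Illustrative row only, not taken from NP-KG.
demo = io.StringIO(
    "http://purl.obolibrary.org/obo/CHEBI_1\t"
    "http://www.w3.org/2000/01/rdf-schema#subClassOf\t"
    "http://purl.obolibrary.org/obo/CHEBI_2\n"
)
result = io.StringIO()
convert(demo, result)
print(result.getvalue(), end="")
```

IRIs such as `http://purl.obolibrary.org/obo/mondo#disease_has_basis_in_accumulation_of` do not follow the `PREFIX_ID` pattern, so this sketch deliberately leaves them as full IRIs rather than guessing a CURIE.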

Nodes in edge list missing from node label file
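One way to check for the problem named in this issue title is to compare the node identifiers used in the edge list against those in the node label file. The sketch below assumes both have already been loaded into memory; the function name `missing_nodes` and the triple layout are hypothetical.

```python
def missing_nodes(edges, labelled_nodes):
    """Return edge-list nodes that have no entry in the node label file.

    edges: iterable of (source, relation, target) tuples.
    labelled_nodes: iterable of node identifiers from the label file.
    """
    labelled = set(labelled_nodes)
    seen = set()
    for source, _relation, target in edges:
        seen.add(source)
        seen.add(target)
    return sorted(seen - labelled)


# Illustrative identifiers only.
edges = [("A", "rel", "B"), ("B", "rel", "C")]
print(missing_nodes(edges, ["A", "B"]))
```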

Graph report from GRAPE

Hello, I will leave the report for the directed graph version here. Do feel free to do what you may want with it.

I will soon post the undirected version in a comment.

NPKG

The directed multigraph NPKG has 757.83K heterogeneous nodes and 7.25M heterogeneous edges. The RAM requirements for the nodes and edges data structures are 193.02MB and 17.07MB, respectively.

Degree centrality

The minimum node degree is 0, the maximum node degree is 21.86K, the mode degree is 2, the mean degree is 9.57, and the node degree median is 3.

The nodes with the highest degree centrality are http://purl.obolibrary.org/obo/UBERON_0000473 (degree 21.86K and node type testis), http://purl.obolibrary.org/obo/GO_0005515 (degree 12.33K and node type protein binding), http://purl.obolibrary.org/obo/UBERON_0000007 (degree 10.73K and node type pituitary gland), http://purl.obolibrary.org/obo/UBERON_0002046 (degree 10.56K and node type thyroid gland) and http://purl.obolibrary.org/obo/UBERON_0001323 (degree 10.39K and node type tibial nerve).
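The degree statistics above can be reproduced from a plain edge list. The sketch below assumes that "degree" means the number of outgoing edges per node, which is consistent with the reported mean (7.25M edges / 757.83K nodes ≈ 9.57), but that reading is an assumption and not confirmed by the report. The function name `degree_summary` is hypothetical, and ties for the mode are broken by taking the smallest value.

```python
from collections import Counter
from statistics import mean, median, multimode


def degree_summary(nodes, edges):
    """Summarise out-degrees for a directed multigraph.

    nodes: iterable of node identifiers (so that degree-0 nodes are counted).
    edges: iterable of (source, target) pairs; repeated pairs count separately.
    """
    degree = Counter({node: 0 for node in nodes})
    for source, _target in edges:
        degree[source] += 1
    values = list(degree.values())
    return {
        "min": min(values),
        "max": max(values),
        "mode": min(multimode(values)),  # smallest value among tied modes
        "mean": mean(values),
        "median": median(values),
    }


# Illustrative toy graph only.
summary = degree_summary(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "c")])
print(summary)
```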

Node types

The graph has 707.46K node types, of which the 10 most common are 3R (6.37K nodes, 0.84%), 3S (2.89K nodes, 0.38%), 4S (2.33K nodes, 0.31%), 4R (1.47K nodes, 0.19%), 9S (1.39K nodes, 0.18%), 9R (1.33K nodes, 0.18%), signal peptide removed form (human) (1.21K nodes, 0.16%), 7R (1.15K nodes, 0.15%), 7S (1.15K nodes, 0.15%) and 2 (1.13K nodes, 0.15%). The RAM requirement for the node types data structure is 165.16MB.

Singleton node types

Singleton node types are node types that are assigned exclusively to a single node, making the node type relatively meaningless, as it adds no more information than the node name itself. The graph contains 702.25K singleton node types, which are syntaphilin isoform h1 (human) (http://purl.obolibrary.org/obo/PR_O15079-1 (degree 2 and node type syntaphilin isoform h1 (human))), ASXL2-205 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000672666 (degree 3 and node type ASXL2-205)), GNG2-205 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000554736 (degree 4 and node type GNG2-205)), alpha-D-galactosyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-beta-D-glucosyl-(1<->1')-ceramide(t18:0) (http://purl.obolibrary.org/obo/CHEBI_144633 (degree 3 and node type alpha-D-galactosyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-beta-D-glucosyl-(1<->1')-ceramide(t18:0))), Pro-Thr-Phe (http://purl.obolibrary.org/obo/CHEBI_162683 (node type Pro-Thr-Phe)), pachyonychia congenita 4 (http://purl.obolibrary.org/obo/MONDO_0014325 (degree 7 and node type pachyonychia congenita 4)), Arg-Thr-Asn (http://purl.obolibrary.org/obo/CHEBI_159322 (node type Arg-Thr-Asn)), PCDH7-208 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000621961 (degree 3 and node type PCDH7-208)), ZNF630-201 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000276054 (degree 4 and node type ZNF630-201)) and MDM2-215 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000428863 (degree 3 and node type MDM2-215)), plus other 702.24K singleton node types.
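The definition above translates directly into a frequency count. This is a minimal sketch, assuming node types are available as a mapping from node identifier to a single type label, with `None` for unknown types. The function name `singleton_node_types` is hypothetical and this is not how GRAPE computes the report.

```python
from collections import Counter


def singleton_node_types(node_types):
    """Return node types that are assigned to exactly one node.

    node_types: dict mapping node identifier -> type label, or None if unknown.
    """
    counts = Counter(label for label in node_types.values() if label is not None)
    return sorted(label for label, count in counts.items() if count == 1)


# Illustrative data only.
example = {"n1": "3R", "n2": "3R", "n3": "Pro-Thr-Phe", "n4": None}
print(singleton_node_types(example))
```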

Unknown node types

Nodes with unknown node types are nodes with a node type that was not provided during the creation of the graph, which may be desired as the output of a node-label holdout. The graph contains 61 nodes with unknown node type, which are http://purl.obolibrary.org/obo/NCIT_C68749 (degree 1), http://purl.obolibrary.org/obo/PR_P69345 (degree 3), http://purl.obolibrary.org/obo/GO_0005578 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_9775 (degree 1), http://purl.obolibrary.org/obo/MFOMD_0000107 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_399537 (degree 1), http://purl.obolibrary.org/obo/NCIT_C37109 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_9261 (degree 1), http://purl.obolibrary.org/obo/MFOMD_0000101 (degree 1) and http://purl.obolibrary.org/obo/NBO_0000034 (degree 1), plus 51 other nodes with unknown node types, making up 0.01% of the nodes.

Edge types

The graph has 299 edge types, of which the 10 most common are http://www.w3.org/2000/01/rdf-schema#subClassOf (1.16M edges, 16.01%), http://purl.obolibrary.org/obo/RO_0002436 (1.01M edges, 14.00%), http://purl.obolibrary.org/obo/RO_0001025 (688.97K edges, 9.50%), http://purl.obolibrary.org/obo/RO_0001015 (688.79K edges, 9.50%), http://purl.obolibrary.org/obo/RO_0002201 (420.57K edges, 5.80%), http://purl.obolibrary.org/obo/RO_0002200 (420.57K edges, 5.80%), http://purl.obolibrary.org/obo/RO_0000057 (381.16K edges, 5.26%), http://purl.obolibrary.org/obo/RO_0000056 (381.14K edges, 5.26%), http://purl.obolibrary.org/obo/RO_0002606 (275.76K edges, 3.80%) and http://purl.obolibrary.org/obo/RO_0002302 (275.28K edges, 3.80%). The RAM requirement for the edge types data structure is 29.07MB.

Singleton edge types

Singleton edge types are edge types that are assigned exclusively to a single edge, making the edge type relatively meaningless, as it adds no more information than the name of the edge itself. The graph contains 42 edges with singleton edge types, which are http://purl.obolibrary.org/obo/GO_0016579, http://purl.obolibrary.org/obo/RO_0002480, http://purl.obolibrary.org/obo/RO_0002002, http://purl.obolibrary.org/obo/RO_0003309, http://purl.obolibrary.org/obo/mondo#disease_has_basis_in_accumulation_of, http://purl.obolibrary.org/obo/uberon/core#evolved_from, http://purl.obolibrary.org/obo/RO_0002482, http://purl.obolibrary.org/obo/RO_0002497, http://purl.obolibrary.org/obo/CLO_0054408 and http://purl.obolibrary.org/obo/CLO_0054409, plus 32 other edges with singleton edge types.

Topological Oddities

A topological oddity is a set of nodes in the graph that may be derived by an error during the generation of the edge list of the graph and, depending on the task, could bias the results of topology-based models. Note that in a directed graph we only support the detection of isomorphic nodes. In the following paragraph, we will describe the detected topological oddities.

Singleton nodes

A singleton node is a node disconnected from all other nodes. We have detected 12.31K singleton nodes in the graph, involving a total of 12.31K nodes (1.62%). The detected singleton nodes are:

And other 12.30K singleton nodes.
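A rough way to list singleton nodes from a node list and an edge list is sketched below. It assumes that an edge in either direction connects two nodes and that self-loops do not count as a connection to another node. Both choices, and the function name `singleton_nodes`, are assumptions made for illustration and may differ from GRAPE's own definition.

```python
def singleton_nodes(nodes, edges):
    """Return nodes that share no edge with any other node.

    nodes: iterable of node identifiers.
    edges: iterable of (source, target) pairs; direction is ignored here.
    """
    connected = set()
    for source, target in edges:
        if source != target:  # assumption: a self-loop does not connect to another node
            connected.add(source)
            connected.add(target)
    return sorted(node for node in nodes if node not in connected)


# Illustrative toy graph only.
print(singleton_nodes(["a", "b", "c", "d"], [("a", "b"), ("c", "c")]))
```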

Isomorphic node groups

Isomorphic groups are nodes with exactly the same neighbours and node types (if present in the graph). Nodes in such groups are topologically indistinguishable, that is, swapping their IDs would not change the graph topology. We have detected 5.98K isomorphic node groups in the graph, involving a total of 57.72K nodes (7.62%) and 756.89K edges (10.44%), with the largest one involving 8.78K nodes and 52.65K edges. The detected isomorphic node groups, sorted by decreasing size (the order below is consistent with the number of edges involved, roughly node count times degree, rather than node count alone), are:

  1. Group with 8.78K nodes (degree 6 and node type DA04093 cell): http://purl.obolibrary.org/obo/CLO_0029419, http://purl.obolibrary.org/obo/CLO_0022715, http://purl.obolibrary.org/obo/CLO_0017887, http://purl.obolibrary.org/obo/CLO_0036426, http://purl.obolibrary.org/obo/CLO_0036483 and other 8.77K.

  2. Group with 758 nodes (degree 55 and node type Y_RNA.281-201): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000364774, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000363549, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000516933, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000363746, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000362807 and other 753.

  3. Group with 6.33K nodes (degree 6 and node type GM18869 cell): http://purl.obolibrary.org/obo/CLO_0026252, http://purl.obolibrary.org/obo/CLO_0020148, http://purl.obolibrary.org/obo/CLO_0010791, http://purl.obolibrary.org/obo/CLO_0026960, http://purl.obolibrary.org/obo/CLO_0010628 and other 6.33K.

  4. Group with 426 nodes (degree 56 and node type LINC00342-207): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660608, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667622, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000428029, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667637, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000658836 and other 421.

  5. Group with 3.09K nodes (degree 5 and node type IGF047/78 cell): http://purl.obolibrary.org/obo/CLO_0005370, http://purl.obolibrary.org/obo/CLO_0006275, http://purl.obolibrary.org/obo/CLO_0005758, http://purl.obolibrary.org/obo/CLO_0006025, http://purl.obolibrary.org/obo/CLO_0004699 and other 3.09K.

  6. Group with 238 nodes (degree 56 and node type PCBP1-AS1-270): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000614826, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660903, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000418564, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000415060, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000598586 and other 233.

  7. Group with 151 nodes (degree 56 and node type FRG1HP-202): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000611606, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000674524, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000507287, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000582986, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000674747 and other 146.

  8. Group with 143 nodes (degree 54 and node type SNHG14-312): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667788, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656463, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000580438, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000654902, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000551631 and other 138.

  9. Group with 169 nodes (degree 44 and node type Metazoa_SRP.125-201): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000578843, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000613165, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000616324, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000366402, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000615037 and other 164.

  10. Group with 1.00K nodes (degree 7 and node type ND13629 cell): http://purl.obolibrary.org/obo/CLO_0034115, http://purl.obolibrary.org/obo/CLO_0020098, http://purl.obolibrary.org/obo/CLO_0036969, http://purl.obolibrary.org/obo/CLO_0022263, http://purl.obolibrary.org/obo/CLO_0022794 and other 998.

  11. Group with 143 nodes (degree 45 and node type MIR99AHG-339): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000671058, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000659209, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000669276, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000602323, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000663516 and other 138.

  12. Group with 176 nodes (degree 34 and node type PVT1-295): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660631, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000657356, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656693, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000658018, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000521600 and other 171.

  13. Group with 727 nodes (degree 7 and node type ND03608 cell): http://purl.obolibrary.org/obo/CLO_0013069, http://purl.obolibrary.org/obo/CLO_0015523, http://purl.obolibrary.org/obo/CLO_0017605, http://purl.obolibrary.org/obo/CLO_0027906, http://purl.obolibrary.org/obo/CLO_0028339 and other 722.

  14. Group with 86 nodes (degree 57 and node type SNHG17-282): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000669114, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000662568, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656320, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660064, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000662572 and other 81.

  15. Group with 678 nodes (degree 7 and node type ND13864 cell): http://purl.obolibrary.org/obo/CLO_0034600, http://purl.obolibrary.org/obo/CLO_0029029, http://purl.obolibrary.org/obo/CLO_0031286, http://purl.obolibrary.org/obo/CLO_0034092, http://purl.obolibrary.org/obo/CLO_0031698 and other 673.

And other 5.97K isomorphic node groups.
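The grouping described in this section can be approximated by bucketing nodes on their neighbour set and node type. The sketch below is a simplified illustration, not GRAPE's algorithm: it assumes "neighbours" means outgoing neighbours, ignores edge types and edge multiplicity, and skips nodes with no outgoing edges so that all sink nodes are not lumped into one group. The function name `isomorphic_groups` is hypothetical.

```python
from collections import defaultdict


def isomorphic_groups(edges, node_types):
    """Group nodes that share the same out-neighbour set and node type.

    edges: iterable of (source, target) pairs.
    node_types: dict mapping node identifier -> type label (missing means unknown).
    Returns groups of two or more nodes, largest group first.
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)

    buckets = defaultdict(list)
    for node, targets in neighbours.items():
        key = (frozenset(targets), node_types.get(node))
        buckets[key].append(node)

    groups = [sorted(members) for members in buckets.values() if len(members) > 1]
    return sorted(groups, key=lambda group: (-len(group), group))


# Illustrative toy graph only.
toy_edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x")]
print(isomorphic_groups(toy_edges, {}))
```

Using a `frozenset` as the bucket key makes the comparison independent of the order in which edges were read, at the cost of holding every neighbour set in memory.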
