sanyabt / np-kg
NP-KG: Knowledge Graph Framework for Natural Product-Drug Interactions
Home Page: https://doi.org/10.5281/zenodo.6814507
License: Apache License 2.0
Ontology extensions for natural products previously included phytoconstituents extracted only from GSRS. v2.0.0 extracts constituents from both GSRS and the European Medicines Agency to create the ontology extensions in NP-KG.
Include -
v2.0.0 extends the literature-based graph to include predications for 30 natural products from full texts of scientific articles. See the NP-KG paper for search strategies and the PMIDs folder for the included PubMed IDs. Graph artifacts and code are available as well.
Refactor the code (especially utils) to use the curies library (https://github.com/cthoyt/curies) for URI-to-CURIE conversion and vice versa. Notebooks that should be changed: https://github.com/sanyabt/np-kg/blob/main/util-notebooks/NPKG-generate-tsv.ipynb
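As a rough, library-free sketch of what this conversion does (curies itself exposes it through `Converter.from_prefix_map(...)` and the converter's `compress` and `expand` methods), the helpers below map URIs to CURIEs and back with a prefix map. The prefix map and the function names are hypothetical and for illustration only; NP-KG would need every namespace that occurs in the graph.

```python
# Hypothetical prefix map for illustration only; NP-KG would need every
# namespace that appears in the graph (OBO, Ensembl, etc.).
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "UBERON": "http://purl.obolibrary.org/obo/UBERON_",
}


def compress(uri, prefix_map=PREFIX_MAP):
    """Return the CURIE for a URI, or None if no prefix in the map matches."""
    # Check longer base URIs first so a more specific namespace wins.
    for prefix, base in sorted(prefix_map.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return None


def expand(curie, prefix_map=PREFIX_MAP):
    """Return the URI for a CURIE, or None if its prefix is unknown."""
    prefix, sep, local_id = curie.partition(":")
    base = prefix_map.get(prefix)
    if not sep or base is None:
        return None
    return base + local_id
```

For example, `compress("http://purl.obolibrary.org/obo/UBERON_0000473")` returns `"UBERON:0000473"` and `expand("GO:0005515")` returns `"http://purl.obolibrary.org/obo/GO_0005515"`. Returning `None` for unmatched inputs keeps unknown namespaces visible instead of silently passing them through.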
Add a side effects database to NP-KG so that drug side effects can be distinguished in the KG. SPLICER uses structured product labels to extract side effects for drugs. Data is standardized to RxNorm concepts for drugs and MedDRA concepts for side effects/adverse events.
The manuscript contains results of literature-based graph counts and figures with edge counts and labels. Add notebooks for reproducing the counts and figures.
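A minimal sketch of the counting step, assuming the literature-based graph is available as a tab-separated subject/predicate/object file (the file layout and the function name are assumptions, not the repository's actual notebooks):

```python
import csv
from collections import Counter


def count_edge_labels(lines, has_header=False):
    """Count edges per predicate in tab-separated subject/predicate/object rows."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        # Skip short rows and rows with an empty subject, predicate or object.
        if len(row) < 3 or not all(row[:3]):
            continue
        counts[row[1]] += 1
    return counts
```

Usage: open the file with `open(path, newline="")`, pass the handle to `count_edge_labels`, and `counts.most_common(10)` then gives the label/count pairs for an edge-count figure.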
GitHub release v1.0.0 after testing evaluation notebooks. Link to Zenodo and get a DOI for the GitHub package.
I have loaded the edge list directly from Zenodo, skipping rows with NaNs, and I find that there are more nodes than edges, specifically 21711446 and 21711146. Is this correct? Am I doing something wrong?
With such low density, I cannot use a surprisingly large number of library features.
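One way to sanity-check such counts is to recompute them straight from the edge list. The sketch below assumes a tab-separated file whose first three columns are subject, predicate and object, and treats rows with an empty field as the "NaN" rows to skip; the function name is hypothetical.

```python
import csv


def graph_summary(lines, has_header=False):
    """Count distinct nodes and edges in a subject/predicate/object TSV edge list."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)
    nodes = set()
    edges = 0
    for row in reader:
        if len(row) < 3 or not all(row[:3]):
            continue  # mirror "skipping rows with NaNs"
        subject, _predicate, obj = row[:3]
        nodes.update((subject, obj))
        edges += 1
    n = len(nodes)
    # Directed density: edges over the n * (n - 1) possible ordered pairs.
    # In a multigraph with many parallel edges this ratio can exceed 1.
    density = edges / (n * (n - 1)) if n > 1 else 0.0
    return n, edges, density
```

Because every node counted this way comes from at least one edge, the node count can be at most twice the edge count; nodes that exist only in a separate node file would have to be added on top.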
Add requirements for constructing NP-KG components from scratch -
PheKnowLator KG requirements can be linked to the original or forked repositories. Create a requirements file for the literature-based graph construction, including the systems (SemRep, INDRA, REACH) and processing steps (e.g., clips).
Dear authors,
I have been looking into the data files deposited on Zenodo, but they seem to be TSV files generated from RDF. Would it be possible for you to provide the TSV file in the following format: source CURIE, relation, target CURIE?
Thank You.
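If the RDF-derived TSV stores each term as a full URI, possibly wrapped in angle brackets as in N-Triples (an assumption about the file layout), a conversion to source/relation/target CURIE rows could look like the sketch below. The prefix map and function names are hypothetical.

```python
import csv
import io

# Hypothetical prefix map; a real conversion needs every namespace in the graph.
PREFIX_MAP = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def to_curie(term):
    """Strip optional angle brackets and compress the URI when a prefix matches."""
    uri = term.strip()
    if uri.startswith("<") and uri.endswith(">"):
        uri = uri[1:-1]
    for prefix, base in sorted(PREFIX_MAP.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri  # leave URIs with no known prefix unchanged


def rdf_tsv_to_curie_triples(in_lines, out_stream):
    """Write a source/relation/target TSV with CURIEs in place of URIs."""
    writer = csv.writer(out_stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "relation", "target"])
    for row in csv.reader(in_lines, delimiter="\t"):
        if len(row) >= 3:
            writer.writerow([to_curie(term) for term in row[:3]])


if __name__ == "__main__":
    # Tiny demonstration with one RDF-style row.
    out = io.StringIO()
    rdf_tsv_to_curie_triples(
        ["<http://purl.obolibrary.org/obo/CHEBI_1>\t"
         "<http://www.w3.org/2000/01/rdf-schema#subClassOf>\t"
         "<http://purl.obolibrary.org/obo/CHEBI_2>"],
        out,
    )
    print(out.getvalue(), end="")
```

Leaving unmatched URIs unchanged, rather than dropping them, means no triples are lost when the prefix map is incomplete.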
Hello,
I was trying to create a visualization of the graph that also included the node types, but a significant number of nodes that appear in the edge list are missing from the node label file. A small subset is:
Should I just ignore the node types?
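To see how many nodes are affected, the edge list and the node label file can be compared directly. This sketch assumes both are tab-separated, with the node ID in the first column of the label file and the subject and object in the first and third columns of the edge list; the function name is hypothetical.

```python
import csv


def missing_labeled_nodes(edge_lines, label_lines):
    """Return nodes that occur in the edge list but have no row in the label file."""
    # Node IDs that have a label row (first column of the label file).
    labeled = {row[0] for row in csv.reader(label_lines, delimiter="\t") if row}
    # Node IDs that occur as subject or object of any edge.
    in_edges = set()
    for row in csv.reader(edge_lines, delimiter="\t"):
        if len(row) >= 3:
            in_edges.update((row[0], row[2]))
    return in_edges - labeled
```

Rather than discarding node types altogether, the returned nodes could be given a placeholder type such as "unknown" in the visualization.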
New data source to be added in v2.0.0 for chemical-disease edges ('treats'). repoDB - https://unmtid-shinyapps.net/shiny/repodb/
MASI (http://www.aiddlab.com/MASI/index.html) contains herbal substance-microbe associations with identifiers for microbes. Ingest data in next KG version with predicate for "Substances alter microbe abundance".
Hello, I will leave the report for the directed graph version here. Feel free to use it however you like.
I will soon post the undirected version in a comment.
The directed multigraph NPKG has 757.83K heterogeneous nodes and 7.25M heterogeneous edges. The RAM requirements for the nodes and edges data structures are 193.02MB and 17.07MB, respectively.
The minimum node degree is 0, the maximum node degree is 21.86K, the mode degree is 2, the mean degree is 9.57, and the node degree median is 3.
The nodes with the highest degree centrality are http://purl.obolibrary.org/obo/UBERON_0000473 (degree 21.86K and node type testis), http://purl.obolibrary.org/obo/GO_0005515 (degree 12.33K and node type protein binding), http://purl.obolibrary.org/obo/UBERON_0000007 (degree 10.73K and node type pituitary gland), http://purl.obolibrary.org/obo/UBERON_0002046 (degree 10.56K and node type thyroid gland) and http://purl.obolibrary.org/obo/UBERON_0001323 (degree 10.39K and node type tibial nerve).
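The degree figures above can be recomputed from an edge list with standard-library Python. This sketch uses out-degree, so the mean is simply edges divided by nodes; 7.25M / 757.83K ≈ 9.57 matches the reported mean, which suggests (but does not prove) that the report counts degree this way. Function and parameter names are illustrative.

```python
import heapq
import statistics
from collections import Counter


def degree_report(nodes, edges, top=5):
    """Out-degree summary of a directed multigraph.

    nodes: iterable of node IDs, so isolated nodes are counted with degree 0.
    edges: iterable of (source, target) pairs; parallel edges each count.
    """
    degree = Counter({node: 0 for node in nodes})
    for src, _dst in edges:
        degree[src] += 1
    values = list(degree.values())
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Highest-degree nodes, analogous to the degree-centrality list above.
        "top": heapq.nlargest(top, degree.items(), key=lambda item: item[1]),
    }
```

Passing the full node list, not just the nodes seen in edges, is what lets the minimum degree be 0 as in the report.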
The graph has 707.46K node types, of which the 10 most common are 3R (6.37K nodes, 0.84%), 3S (2.89K nodes, 0.38%), 4S (2.33K nodes, 0.31%), 4R (1.47K nodes, 0.19%), 9S (1.39K nodes, 0.18%), 9R (1.33K nodes, 0.18%), signal peptide removed form (human)" (1.21K nodes, 0.16%), 7R (1.15K nodes, 0.15%), 7S (1.15K nodes, 0.15%) and 2 (1.13K nodes, 0.15%). The RAM requirement for the node types data structure is 165.16MB.
Singleton node types are node types that are assigned exclusively to a single node, making the node type relatively meaningless, as it adds no more information than the node name itself. The graph contains 702.25K singleton node types, which are syntaphilin isoform h1 (human) (http://purl.obolibrary.org/obo/PR_O15079-1 (degree 2 and node type syntaphilin isoform h1 (human))), ASXL2-205 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000672666 (degree 3 and node type ASXL2-205)), GNG2-205 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000554736 (degree 4 and node type GNG2-205)), alpha-D-galactosyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-beta-D-glucosyl-(1<->1')-ceramide(t18:0) (http://purl.obolibrary.org/obo/CHEBI_144633 (degree 3 and node type alpha-D-galactosyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-N-acetyl-beta-D-glucosaminyl-(1->3)-beta-D-galactosyl-(1->4)-beta-D-glucosyl-(1<->1')-ceramide(t18:0))), Pro-Thr-Phe (http://purl.obolibrary.org/obo/CHEBI_162683 (node type Pro-Thr-Phe)), pachyonychia congenita 4 (http://purl.obolibrary.org/obo/MONDO_0014325 (degree 7 and node type pachyonychia congenita 4)), Arg-Thr-Asn (http://purl.obolibrary.org/obo/CHEBI_159322 (node type Arg-Thr-Asn)), PCDH7-208 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000621961 (degree 3 and node type PCDH7-208)), ZNF630-201 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000276054 (degree 4 and node type ZNF630-201)) and MDM2-215 (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000428863 (degree 3 and node type MDM2-215)), plus other 702.24K singleton node types.
Nodes with unknown node types are nodes with a node type that was not provided during the creation of the graph, which may be desired as the output of a node-label holdout. The graph contains 61 nodes with unknown node type, which are http://purl.obolibrary.org/obo/NCIT_C68749 (degree 1), http://purl.obolibrary.org/obo/PR_P69345 (degree 3), http://purl.obolibrary.org/obo/GO_0005578 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_9775 (degree 1), http://purl.obolibrary.org/obo/MFOMD_0000107 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_399537 (degree 1), http://purl.obolibrary.org/obo/NCIT_C37109 (degree 1), http://purl.obolibrary.org/obo/NCBITaxon_9261 (degree 1), http://purl.obolibrary.org/obo/MFOMD_0000101 (degree 1) and http://purl.obolibrary.org/obo/NBO_0000034 (degree 1), plus other 51 nodes with unknown node types, making up 0.01% of the nodes.
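Node type frequencies, singleton node types and nodes with unknown types, as reported above, can be derived from a node-to-type mapping. This is a sketch assuming one type per node, with `None` marking an unknown type; the function name is hypothetical.

```python
from collections import Counter


def node_type_summary(node_types, top=10):
    """Summarize a dict mapping node ID -> node type (None means unknown).

    Returns the `top` most common types with their percentage of all nodes,
    the set of singleton types (used by exactly one node), and the nodes
    whose type is unknown.
    """
    counts = Counter(t for t in node_types.values() if t is not None)
    total = len(node_types)
    most_common = [(t, c, 100 * c / total) for t, c in counts.most_common(top)]
    singletons = {t for t, c in counts.items() if c == 1}
    unknown = [node for node, t in node_types.items() if t is None]
    return most_common, singletons, unknown
```

A very large share of singleton types, as in this report, is a hint that the "type" column may be carrying per-node labels rather than categories.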
The graph has 299 edge types, of which the 10 most common are http://www.w3.org/2000/01/rdf-schema#subClassOf (1.16M edges, 16.01%), http://purl.obolibrary.org/obo/RO_0002436 (1.01M edges, 14.00%), http://purl.obolibrary.org/obo/RO_0001025 (688.97K edges, 9.50%), http://purl.obolibrary.org/obo/RO_0001015 (688.79K edges, 9.50%), http://purl.obolibrary.org/obo/RO_0002201 (420.57K edges, 5.80%), http://purl.obolibrary.org/obo/RO_0002200 (420.57K edges, 5.80%), http://purl.obolibrary.org/obo/RO_0000057 (381.16K edges, 5.26%), http://purl.obolibrary.org/obo/RO_0000056 (381.14K edges, 5.26%), http://purl.obolibrary.org/obo/RO_0002606 (275.76K edges, 3.80%) and http://purl.obolibrary.org/obo/RO_0002302 (275.28K edges, 3.80%). The RAM requirement for the edge types data structure is 29.07MB.
Singleton edge types are edge types that are assigned exclusively to a single edge, making the edge type relatively meaningless, as it adds no more information than the name of the edge itself. The graph contains 42 edges with singleton edge types, which are http://purl.obolibrary.org/obo/GO_0016579, http://purl.obolibrary.org/obo/RO_0002480, http://purl.obolibrary.org/obo/RO_0002002, http://purl.obolibrary.org/obo/RO_0003309, http://purl.obolibrary.org/obo/mondo#disease_has_basis_in_accumulation_of, http://purl.obolibrary.org/obo/uberon/core#evolved_from, http://purl.obolibrary.org/obo/RO_0002482, http://purl.obolibrary.org/obo/RO_0002497, http://purl.obolibrary.org/obo/CLO_0054408 and http://purl.obolibrary.org/obo/CLO_0054409, plus other 32 edges with singleton edge types.
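Edge type shares and singleton edge types follow the same pattern, counting one predicate per edge. A minimal sketch with illustrative names:

```python
from collections import Counter


def edge_type_summary(edge_types, top=10):
    """Summarize an iterable holding one predicate IRI per edge.

    Returns the `top` most common predicates with their percentage of all
    edges, and the sorted predicates that occur on exactly one edge.
    """
    counts = Counter(edge_types)
    total = sum(counts.values())
    most_common = [(p, c, 100 * c / total) for p, c in counts.most_common(top)]
    singletons = sorted(p for p, c in counts.items() if c == 1)
    return most_common, singletons
```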
A topological oddity is a set of nodes in the graph that may result from an error during the generation of the edge list and, depending on the task, could bias the results of topology-based models. Note that in a directed graph we only support the detection of isomorphic nodes. The following paragraphs describe the detected topological oddities.
A singleton node is a node disconnected from all other nodes. We have detected 12.31K singleton nodes in the graph, involving a total of 12.31K nodes (1.62%). The detected singleton nodes are:
http://purl.obolibrary.org/obo/CHEBI_166390 (node type Val-Thr-Trp)
http://purl.obolibrary.org/obo/CHEBI_160256 (node type Lys-Ser-Ala)
http://purl.obolibrary.org/obo/CHEBI_162384 (node type Gln-Phe-Cys)
http://purl.obolibrary.org/obo/CHEBI_165305 (node type 15-Methyl-15S-PGE2)
http://purl.obolibrary.org/obo/CHEBI_163670 (node type Thr-Arg-Lys)
http://purl.obolibrary.org/obo/CHEBI_161717 (node type Cys-Thr-His)
http://purl.obolibrary.org/obo/CHEBI_162506 (node type Gln-Thr-Glu)
http://purl.obolibrary.org/obo/CHEBI_166674 (node type Emmotin A)
http://purl.obolibrary.org/obo/CHEBI_165101 (node type Tyr-His-Leu)
http://purl.obolibrary.org/obo/CHEBI_163620 (node type Thr-Ala-Cys)
http://purl.obolibrary.org/obo/CHEBI_159925 (node type Lys-Glu-Met)
http://purl.obolibrary.org/obo/CHEBI_161347 (node type Cys-Gly-Asp)
http://purl.obolibrary.org/obo/CHEBI_160854 (node type Asp-Pro-Phe)
http://purl.obolibrary.org/obo/CHEBI_159656 (node type Asn-Cys-Lys)
http://purl.obolibrary.org/obo/CHEBI_163062 (node type Ser-Glu-Asn)
And other 12.30K singleton nodes.
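Detecting singleton nodes only requires knowing which nodes touch at least one edge. A minimal sketch, assuming the node list and the (source, target) edge pairs are available separately, since a singleton node can never appear in an edge list alone; names are illustrative.

```python
def singleton_nodes(nodes, edges):
    """Return nodes with no incoming or outgoing edges (self-loops count as edges)."""
    connected = set()
    for src, dst in edges:
        connected.add(src)
        connected.add(dst)
    return [node for node in nodes if node not in connected]
```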
Isomorphic groups are nodes with exactly the same neighbours and node types (if present in the graph). Nodes in such groups are topologically indistinguishable; that is, swapping their IDs would not change the graph topology. We have detected 5.98K isomorphic node groups in the graph, involving a total of 57.72K nodes (7.62%) and 756.89K edges (10.44%), with the largest one involving 8.78K nodes and 52.65K edges. The detected isomorphic node groups, sorted by decreasing size, are:
Group with 8.78K nodes (degree 6 and node type DA04093 cell): http://purl.obolibrary.org/obo/CLO_0029419, http://purl.obolibrary.org/obo/CLO_0022715, http://purl.obolibrary.org/obo/CLO_0017887, http://purl.obolibrary.org/obo/CLO_0036426, http://purl.obolibrary.org/obo/CLO_0036483 and other 8.77K.
Group with 758 nodes (degree 55 and node type Y_RNA.281-201): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000364774, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000363549, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000516933, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000363746, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000362807 and other 753.
Group with 6.33K nodes (degree 6 and node type GM18869 cell): http://purl.obolibrary.org/obo/CLO_0026252, http://purl.obolibrary.org/obo/CLO_0020148, http://purl.obolibrary.org/obo/CLO_0010791, http://purl.obolibrary.org/obo/CLO_0026960, http://purl.obolibrary.org/obo/CLO_0010628 and other 6.33K.
Group with 426 nodes (degree 56 and node type LINC00342-207): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660608, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667622, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000428029, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667637, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000658836 and other 421.
Group with 3.09K nodes (degree 5 and node type IGF047/78 cell): http://purl.obolibrary.org/obo/CLO_0005370, http://purl.obolibrary.org/obo/CLO_0006275, http://purl.obolibrary.org/obo/CLO_0005758, http://purl.obolibrary.org/obo/CLO_0006025, http://purl.obolibrary.org/obo/CLO_0004699 and other 3.09K.
Group with 238 nodes (degree 56 and node type PCBP1-AS1-270): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000614826, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660903, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000418564, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000415060, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000598586 and other 233.
Group with 151 nodes (degree 56 and node type FRG1HP-202): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000611606, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000674524, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000507287, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000582986, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000674747 and other 146.
Group with 143 nodes (degree 54 and node type SNHG14-312): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000667788, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656463, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000580438, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000654902, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000551631 and other 138.
Group with 169 nodes (degree 44 and node type Metazoa_SRP.125-201): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000578843, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000613165, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000616324, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000366402, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000615037 and other 164.
Group with 1.00K nodes (degree 7 and node type ND13629 cell): http://purl.obolibrary.org/obo/CLO_0034115, http://purl.obolibrary.org/obo/CLO_0020098, http://purl.obolibrary.org/obo/CLO_0036969, http://purl.obolibrary.org/obo/CLO_0022263, http://purl.obolibrary.org/obo/CLO_0022794 and other 998.
Group with 143 nodes (degree 45 and node type MIR99AHG-339): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000671058, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000659209, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000669276, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000602323, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000663516 and other 138.
Group with 176 nodes (degree 34 and node type PVT1-295): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660631, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000657356, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656693, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000658018, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000521600 and other 171.
Group with 727 nodes (degree 7 and node type ND03608 cell): http://purl.obolibrary.org/obo/CLO_0013069, http://purl.obolibrary.org/obo/CLO_0015523, http://purl.obolibrary.org/obo/CLO_0017605, http://purl.obolibrary.org/obo/CLO_0027906, http://purl.obolibrary.org/obo/CLO_0028339 and other 722.
Group with 86 nodes (degree 57 and node type SNHG17-282): https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000669114, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000662568, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000656320, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000660064, https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?t=ENST00000662572 and other 81.
Group with 678 nodes (degree 7 and node type ND13864 cell): http://purl.obolibrary.org/obo/CLO_0034600, http://purl.obolibrary.org/obo/CLO_0029029, http://purl.obolibrary.org/obo/CLO_0031286, http://purl.obolibrary.org/obo/CLO_0034092, http://purl.obolibrary.org/obo/CLO_0031698 and other 673.
And other 5.97K isomorphic node groups.
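A simple way to reproduce this kind of grouping is to key each node by its node type together with its sets of out-neighbours and in-neighbours, and collect nodes that share a key. This is a plain exact-grouping sketch, not GRAPE's implementation; it ignores edge types and parallel edges, skips singleton nodes, and uses hypothetical names.

```python
from collections import defaultdict


def isomorphic_groups(node_types, edges):
    """Group nodes sharing node type, out-neighbours and in-neighbours.

    node_types: dict node -> type; edges: iterable of (source, target) pairs.
    Returns groups with at least two nodes, largest first.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    groups = defaultdict(list)
    for node, ntype in node_types.items():
        if not out_nbrs[node] and not in_nbrs[node]:
            continue  # singleton nodes are reported separately
        key = (ntype, frozenset(out_nbrs[node]), frozenset(in_nbrs[node]))
        groups[key].append(node)
    return sorted((g for g in groups.values() if len(g) > 1), key=len, reverse=True)
```

Hashing the full neighbour sets is straightforward but memory-hungry for high-degree nodes; a production tool would more likely hash neighbourhoods first and verify candidate groups afterwards.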
Natural Product Activity and Species Source Database (https://bidd.group/NPASS/downloadnpass.html) contains properties of natural products including structural information, physico-chemical properties, ADMET properties, biological activity (source: CHEMBL and literature) with references. Download the data and ingest it into the KG with appropriate relations.
Question: what is the most important data to include? - https://bidd.group/NPASS/compound.php?compoundID=NPC291948