ramp-db's Introduction

New! RaMP 2.0!

RaMP 2.0 is now released and includes an updated backend database with expanded annotations for >150,000 metabolites and ~14,000 genes/proteins. Annotations include biological pathways, chemical classes and structures (metabolites only), ontologies (metabolites only), and enzyme-metabolite relationships based on chemical reactions. Annotations are drawn from HMDB, KEGG (through HMDB), LIPID MAPS, WikiPathways, Reactome, and ChEBI.

This R package includes functions that allow users to interface with this up-to-date and comprehensive resource. Functionalities include 1) simple and batch queries for pathways, ontologies, chemical annotations, and reaction-level gene-metabolite relationships; and 2) pathway and chemical enrichment analyses.

The code used to build the backend RaMP database is freely available at https://github.com/ncats/RaMP-Backend.

Please click here to view our latest manuscript.

Web Interface

Our new revamped web interface can be found at https://rampdb.nih.gov/. The code is publicly available at https://github.com/ncats/RaMP-Client/.

APIs

API access is now available here.

Why RaMP (Relational Database of Metabolomic Pathways)

The purpose of RaMP is to provide a publicly available database that integrates biological, chemical, and other annotations for metabolites and genes/proteins from multiple sources. The database structure and data are available as a MySQL dump file, which can be downloaded directly from Figshare for integration into any tool. Please see the Installation Instructions for the database download link. Please note that this project is in continuous development and we appreciate any feedback.

Contact Info:

For any questions or feedback, please send us a note at [email protected].

If you find a bug, please submit an issue through this GitHub repo.

Basic Features:

The R packages and associated app perform the following queries:

1. Retrieve analytes (genes, proteins, metabolites) given pathway(s) as input.
2. Retrieve pathway annotations given analytes as input.
3. Retrieve chemical annotations/structures given metabolites as input.
4. Retrieve analytes involved in the same reaction (e.g. enzymes catalyzing reactions involving input metabolites).
5. Retrieve ontologies (e.g. biospecimen location, disease, etc.) given input metabolites.

The following analyses are also supported:

1. Multi-omic pathway enrichment analysis
2. Chemical enrichment analyses

Last date of dump file update: 03/02/2023

Vignette

Detailed instructions for installing RaMP locally are below. We've also put together a vignette to get you started on the analyses. Click here for vignette.

Citation

If you use RaMP-DB, please cite the following work:

Braisted J, Patt A, Tindall C, Sheils T, Neyra J, Spencer K, Eicher T, Mathé EA. RaMP-DB 2.0: a renovated knowledgebase for deriving biological and chemical insight from metabolites, proteins, and genes. Bioinformatics. 2023 Jan 1;39(1):btac726. doi: 10.1093/bioinformatics/btac726. PMID: 36373969; PMCID: PMC9825745. To access, click here

Zhang, B., et al., RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites. Metabolites, 2018. 8(1). PMID: 29470400; PMCID: PMC5876005; DOI: 10.3390/metabo8010016 To access, click here

Installation Instructions

In order to use this R package locally, you will need the following:

The R code under this repo
The mysql dump file that contains the RaMP database. Download here.

If you would like to know how to build the RaMP database from scratch, please see the RaMP-Backend repository at https://github.com/ncats/RaMP-Backend.

MySQL set-up

RaMP requires that MySQL and the RaMP database be set up on the machine from which you will run the R package. To download MySQL, go to the MySQL Downloads page.

When installing, you will be prompted to create a password for the user "root", or one will be created automatically for you. Importantly, remember your MySQL password! You will need it to log in to MySQL and to pass it as an argument to the RaMP R Shiny web application.

If you want to reset your password, see "How to Reset the Root Password" in the MySQL 5.7 Reference Manual: https://dev.mysql.com/doc/refman/5.7/en/resetting-permissions.html

Please note that you will need administrator privileges for this step.

If you are using a Mac, we recommend using brew to install MySQL. Here's a good tutorial: https://www.novicedev.com/blog/how-install-mysql-macos-homebrew.

Creating the database locally

Once your MySQL environment is in place, creating the RaMP database locally is trivial. First, launch MySQL and create the database:

> mysql -u root -p
mysql> create database ramp;
mysql> exit;

Here, we are naming the database "ramp", but you can use any name you'd like. Note, though, that the R package assumes by default that the database is named "ramp". If you change the name, remember to pass that name as an argument to the R package functions.

Second, download and unzip the latest RaMP database. Download here.

Third, populate the named database with the MySQL dump file. Supply the path and file name of the unzipped SQL file that you've downloaded.

> mysql -u root -p ramp < /your/file/path/here/ramp_<current_version_id_here>.sql

You're done!

Your "ramp" database should contain the following 12 tables:

analyte
analytehasontology
analytehaspathway
analytesynonym
catalyzed
chem_props
db_version
metabolite_class
ontology
pathway
source
version_info

If you want to explore this in MySQL, you can try:

mysql -u root -p
use ramp;
show tables;
select * from source limit 4; 
select * from source where commonName = "creatine riboside";
select distinct(HMDBOntologyType) from ontology;

Install and load the RaMP package

You can install this package directly from GitHub using the install_github() function available through the devtools package. In the R Console, type the following:

# Locally install RaMP
install.packages("devtools")
library(devtools)
install_github("ncats/RAMP-DB")

# Load the package
library(RaMP)

# Set up your connection to the RaMP2.0 database:
pkg.globals <- setConnectionToRaMP(dbname="ramp",username="root",conpass="",host = "localhost")

Note that prior to using RaMP functions, users must set the parameters required to connect to their local database (if they are not using the web app). This is done with a single function call (the last line in the above code snippet).

If the username is different than "root", specify it in the "username" parameter. Similarly, if the name of the database is different than "ramp", specify it in the "dbname" parameter.

Important Notes

If you reinstall the latest version of the RaMP package, be sure to also install the latest version of the MySQL RaMP dump file.

Also, when gene or metabolite IDs are input for queries, each ID should be prepended with its database of origin, e.g. kegg:C02712, hmdb:HMDB04824, or CAS:2566-39-4. The list of metabolite or gene/protein IDs may mix sources. Remember to include the colon in the prefix. The ID prefixes currently included in RaMP are:

Analyte Type	ID Prefix Types
Metabolites	hmdb, pubchem, chebi, chemspider, kegg, CAS, LIPIDMAPS, swisslipids, lipidbank, wikidata, plantfa, kegg_glycan
Genes/Proteins	ensembl, entrez, gene_symbol, uniprot, hmdb, ncbiprotein, EN, wikidata, chebi
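The prefix rule above can be checked before submitting a query. The following is a minimal sketch in Python (chosen only for illustration; it is not part of the RaMP R package). The function names are hypothetical, the prefix sets are copied from the table above, and treating the prefixes as case-sensitive is an assumption.

```python
# Prefixes copied from the table above. Case-sensitive matching is an
# assumption; RaMP itself may be more lenient.
METABOLITE_PREFIXES = {
    "hmdb", "pubchem", "chebi", "chemspider", "kegg", "CAS", "LIPIDMAPS",
    "swisslipids", "lipidbank", "wikidata", "plantfa", "kegg_glycan",
}
GENE_PREFIXES = {
    "ensembl", "entrez", "gene_symbol", "uniprot", "hmdb", "ncbiprotein",
    "EN", "wikidata", "chebi",
}
KNOWN_PREFIXES = METABOLITE_PREFIXES | GENE_PREFIXES


def split_analyte_id(analyte_id):
    """Split 'prefix:value' at the first colon.

    Returns (None, analyte_id) when the colon or the value is missing.
    """
    prefix, sep, value = analyte_id.partition(":")
    if not sep or not value:
        return None, analyte_id
    return prefix, value


def find_malformed_ids(analyte_ids):
    """Return the IDs whose prefix is missing or not in the table above."""
    return [a for a in analyte_ids if split_analyte_id(a)[0] not in KNOWN_PREFIXES]
```

For example, `find_malformed_ids(["kegg:C02712", "C02712"])` flags only the second entry, which lacks a prefix.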

To query the supported ID types in MySQL:

mysql> select distinct(IDtype) from source where geneOrCompound ="compound";
mysql> select distinct(IDtype) from source where geneOrCompound ="gene";

Current Authors and Testers

John Braisted - [email protected]
Tara Eicher - [email protected]
Ewy Mathé - [email protected]
Andrew Patt - [email protected]
Tim Sheils - [email protected]
Kyle Spencer - [email protected]

Previous Authors/Testers

Cole Tindall -
Bofei Zhang - Bofei5675
Shunchao Wang -
Rohith Vanam -
Jorge Neyra - Jorgeso

ramp-db's Issues

Error installing RaMP locally with R

Hi,
I am using a Windows laptop + Rstudio and I am having the following error when installing RaMP as a remote db.
Could you please take a look?

remotes::install_github("Mathelab/RaMP-DB")
....
── R CMD build ───────────────────────────────────────────────────────────────
WARNING: Rtools is required to build R packages, but is not currently installed.

Please download and install Rtools 4.2 from https://cran.r-project.org/bin/windows/Rtools/ or https://www.r-project.org/nosvn/winutf8/ucrt3/.
✔  checking for file 'C:\Users\krist\AppData\Local\Temp\Rtmpy6pOjk\remotes2cf01a0a77eb\Mathelab-RaMP-DB-5acaded/DESCRIPTION' ...
─  preparing 'RaMP':
✔  checking DESCRIPTION meta-information ... 
   Warning in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
     PCRE pattern compilation error
   	'unrecognized character follows \'
   	at 'img/.*$'
   Error in grepl(e, files, perl = TRUE, ignore.case = TRUE) : 
     invalid regular expression '^\img/.*$'
   Execution halted
Error: Failed to install 'RaMP' from GitHub:
  ! System command 'Rcmd.exe' failed

I am not sure if that's related to Rtools that cannot be detected.
Thank you in advance
Kristina

504 Server Error: Gateway Time-out

Hi,

Thank you for making such a great resource, that's really handy for the metabolomics research community.

I am using the API through python and when requesting the "analytes-from-pathways" I am occasionally having the following error:

....
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 504 Server Error: Gateway Time-out for url: https://rampdb.nih.gov/api/analytes-from-pathways?pathway=2...

My list includes about 80 pathways.

Do you maybe have any suggestion that can make my request working? Have you experienced a previous issue like this?

Thank you in advance
Kristina
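One thing a caller could try is to split the pathway list into smaller requests and retry a batch that fails. The sketch below shows only that batching and retry logic. `fetch` is a hypothetical callable that you supply (for example, a wrapper around your own request to the endpoint above), and the batch size, retry count and pause are arbitrary illustrative values. Whether smaller batches actually avoid the 504 is an assumption, not something the report confirms.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(pathways, fetch, batch_size=10, retries=3, pause=2.0):
    """Call `fetch(batch)` per batch, retrying a failed batch with a growing pause.

    `fetch` is a hypothetical user-supplied function that returns a list of
    results for one batch and raises an exception on failure.
    """
    results = []
    for batch in chunked(list(pathways), batch_size):
        for attempt in range(1, retries + 1):
            try:
                results.extend(fetch(batch))
                break
            except Exception:
                # A real caller would catch a narrower error, e.g.
                # requests.exceptions.HTTPError, instead of Exception.
                if attempt == retries:
                    raise
                time.sleep(pause * attempt)
    return results
```

With about 80 pathways and `batch_size=10`, this issues eight smaller requests instead of one large one.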

RaMP Pathway Result Download - Option to export input ids (or names) found in each pathway

If a pathway contains 10 metabolites and 5 of them are within our input list, the exported table might include the set of matched input ids or names. These values can be a concatenated list of input ids that match, or RaMP synonyms. Perhaps a pipe delimiter? Format TBD.
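As a rough illustration of the proposed column, the sketch below builds a pipe-delimited string of matched input IDs per pathway. The function name and the input layout (a dict of pathway to member IDs) are assumptions made for the example; the actual export format is still to be decided, as noted above.

```python
def matched_inputs_by_pathway(pathway_members, input_ids, delimiter="|"):
    """Map each pathway to a delimited string of the input IDs it contains.

    `pathway_members` is a hypothetical dict: pathway name -> iterable of IDs.
    """
    wanted = set(input_ids)
    report = {}
    for pathway, members in pathway_members.items():
        hits = sorted(wanted.intersection(members))  # sorted for stable output
        report[pathway] = delimiter.join(hits)
    return report
```

A pathway with no matching inputs gets an empty string, so every pathway keeps a row in the exported table.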

Install failed

I successfully loaded the mySQL database, but am getting an error when I try to download the R package from this website:

> library(devtools)
> install_github("ncats/RAMP-DB")
Downloading GitHub repo ncats/RAMP-DB@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
download from 'https://api.github.com/repos/ncats/RAMP-DB/tarball/HEAD' failed

I haven't had problems using the install_github() function with other packages. Would appreciate any advice. Thanks!

Option to supply a 'population' of analytes.

Fisher exact scores build a 2x2 contingency matrix that tally the items (analytes) being tested for associations. If the analytes are genes, for instance, the population of genes of interest isn't the full genome, but rather the list of genes that were quantified from which the set of interest was selected (by stats or clustering, etc.).
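To show why the choice of population matters, here is a minimal sketch of a right-tailed Fisher exact test in which the population size is an explicit argument. It is written from the hypergeometric definition using only the standard library. It is not RaMP's implementation, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, selected, pathway_size, population):
    """Right-tailed Fisher exact p-value, P(X >= hits).

    hits:         analytes of interest that fall in the pathway
    selected:     size of the set of interest
    pathway_size: pathway members present in the population
    population:   all analytes that were measured (the background)
    """
    if max(selected, pathway_size) > population:
        raise ValueError("selected and pathway_size cannot exceed population")
    upper = min(selected, pathway_size)
    if not 0 <= hits <= upper:
        raise ValueError("hits is out of range")
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, selected)
```

With 3 of 5 selected analytes falling in a 5-member pathway, the p-value is 0.5 against a population of 10 measured analytes but far smaller against a population of 100, which is why the background should reflect what was actually quantified.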

Cannot install RaMP R package on NIH laptop: PCRE compilation error

I am running RStudio Version 1.3.959 on an NIH laptop with Windows 10 Enterprise. I have saved the RaMP database file, installed MySQL version 5.7, and set up the RaMP database in MySQL as specified in the instructions. I have also installed the devtools package in R. However, when I try to install the RaMP R package, I get a PCRE compilation error as shown below:

library(devtools)
Loading required package: usethis
install_github("ncats/RAMP-DB")
Downloading GitHub repo ncats/RAMP-DB@master
√ checking for file 'C:\Users\eichertd\AppData\Local\Temp\RtmpemI9FW\remotes297c1e725c1b\ncats-RaMP-DB-1e7a7d2/DESCRIPTION' (365ms)
-- preparing 'RaMP': (339ms)
√ checking DESCRIPTION meta-information ...
Warning in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
PCRE pattern compilation error
'unrecognized character follows \'
at 'img/.*$'
Error in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
invalid regular expression '^\img/.*$'
Execution halted
Error: Failed to install 'RaMP' from GitHub:
System command 'Rcmd.exe' failed, exit status: 1, stdout + stderr:
E> * checking for file 'C:\Users\eichertd\AppData\Local\Temp\RtmpemI9FW\remotes297c1e725c1b\ncats-RaMP-DB-1e7a7d2/DESCRIPTION' ... OK
E> * preparing 'RaMP':
E> * checking DESCRIPTION meta-information ... OK
E> Warning in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
E> PCRE pattern compilation error
E> 'unrecognized character follows \'
E> at 'img/.*$'
E> Error in grepl(e, files, perl = TRUE, ignore.case = TRUE) :
E> invalid regular expression '^\img/.*$'
E> Execution halted

Biological Pathway Enrichment Issues with Web GUI

Hi,
I am analyzing around 115 HMDB numbers from metabolites identified in blood samples and was trying to perform biological pathway enrichment. However, the biological pathway enrichment only completes when no sample type is selected and no p-values are ever calculated, even with a subset of data. Additionally, I cannot download the output for the full dataset, only a subset. I am currently attempting to switch to running this analysis in R on my local device. Will that help? Any other suggestions?
Thanks for your help!

SQL Call issue only_full_group_by

Running into a MySQL issue when calling any RaMP functions that connect with the ramp MySQL database:

> fisher.results <- runCombinedFisherTest(analytes = c("hmdb:HMDB0000033","hmdb:HMDB0000052"))
[1] "Running Fisher's tests on metabolites"
[1] "Fisher Testing ......"
[1] "Starting getPathwayFromAnalyte()"
[1] "Working on ID List..."
Error: Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'ramp2.p.pathwayName' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by [1055]

The issue is due to the ONLY_FULL_GROUP_BY SQL mode, which is enabled by default as of MySQL 5.7.5.
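A common server-side workaround for error 1055 is to remove ONLY_FULL_GROUP_BY from `sql_mode`. The sketch below only builds the statement from the current mode string (the value returned by `SELECT @@sql_mode;`). The helper names are hypothetical, and this is a workaround on the MySQL side, not a fix to the RaMP queries. A SESSION setting applies only to the connection that issues it, so a GLOBAL setting, which needs sufficient privileges, may be required for connections opened by the R package.

```python
def without_only_full_group_by(sql_mode):
    """Drop ONLY_FULL_GROUP_BY from a comma-separated sql_mode string."""
    modes = [m.strip() for m in sql_mode.split(",") if m.strip()]
    kept = [m for m in modes if m.upper() != "ONLY_FULL_GROUP_BY"]
    return ",".join(kept)


def set_sql_mode_statement(sql_mode, scope="SESSION"):
    """Build a SET statement that keeps every mode except ONLY_FULL_GROUP_BY."""
    if scope not in ("SESSION", "GLOBAL"):
        raise ValueError("scope must be 'SESSION' or 'GLOBAL'")
    return "SET {} sql_mode = '{}';".format(scope, without_only_full_group_by(sql_mode))
```

The resulting statement can be pasted into the mysql client; all other modes in the original string are preserved.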

Issue with RaMP when performing Pathway Enrichment

I am trying to return pathways from given analytes by using a batch query for multiple analytes on the server hosted by NCATS at https://rampdb.ncats.io. When I paste the list of analytes and hit submit query everything appears to be fine. When I hit run pathway analyses the following error in the attached screenshot occurs. I have also provided an Excel file with the list of analytes (one on each row). Any help would be greatly appreciated.

Analytes_for_GitHub_Issue.xlsx

Source DB Table, change commonName Field Type to prevent truncation

We should extend source::commonName from 30 to 64 characters to prevent name truncation. 82,784 records of ~221K records have truncated commonName fields.

Biological pathway enrichment only include metabolites

I'm using the web tool for biological pathway enrichment analysis with genes, proteins, metabolites and lipids as inputs. However, the analytes included in the enriched pathways are only metabolites and lipids.
Is there a bug in the code that considers only metabolites and lipids for the analysis?

warnings in findCluster under R 4.2

Hi. Running the vignette, findCluster gives warnings related to the && in line 107 of the function definition under R 4.2.0. This will become an error at some point in the future. See "Changes to logical functions" at https://www.r-bloggers.com/2022/04/new-features-in-r-4-2-0/.

It's great to be able to access all of these resources in one place with a clean, programmable interface. Looking forward to using it.

Issue with Data Tables Implementation on web based server.

Kyle 8:34 PM
When using the RaMP tool online I keep getting this error. Does anyone know what it means?
"DataTables warning: table id = DataTables_Table_2 - Requested unknown parameter for '1' for row 44, column 1."

Garrett 11:22 AM
DataTables is an html/javascript package for displaying tables. The R package is a convenient wrapper, but all the logic/display is in javascript.
11:23
That error message (https://datatables.net/manual/tech-notes/4) seems to be coming from the javascript. Looks like shiny is producing a table with a missing element somewhere.
11:23
I'd check to make sure row 44 column 1 of the requested table doesn't have an NA or something weird in it

Kyle 11:35 AM
I have confirmed the input file looks fine

Question about RaMP database installing

I am trying to install RaMP database according to the instruction. I am not familiar with mysql language, but it looks like it does not work.
Could you please help me out? Below is my screenshot in R. Many thanks for your help.

mysql -u root -p ramp < myramp.sql
Error: unexpected symbol in "mysql -u root"

Thanks,
Yuanyuan

Proposed Table Enhancements for RaMP

1.) Express stat p-values in scientific notation with a limit on the number of significant figures shown to 3 or 4.
2.) Consider moving the RaMP pathway name to be the second column next to Ramp Pathway ID so that viewers hit that first as they look left to right.
3.) Double check initial sorting behavior. Do we want to focus on results with multi-omics support, sorted by adjusted p-values?
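The first proposal, scientific notation with a fixed number of significant figures, can be sketched as follows. The function name is hypothetical, and the web client may format values differently.

```python
def format_p_value(p, sig_figs=3):
    """Format a p-value in scientific notation with `sig_figs` significant figures."""
    if sig_figs < 1:
        raise ValueError("sig_figs must be at least 1")
    # One digit before the decimal point plus (sig_figs - 1) after it.
    return "{:.{digits}e}".format(p, digits=sig_figs - 1)
```

For example, `format_p_value(0.000123456)` gives `1.23e-04`, and `format_p_value(0.05, sig_figs=4)` gives `5.000e-02`.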

Pathway Annotation Survey

Supply a collection of analytes representing your identified set (not set of interest) and receive a report indicating RaMP pathway coverage. So if there's a pathway of interest, this would tell you if you have good representation in the assay relative to the total number of analytes possible for a pathway.
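A coverage report of this kind could be computed roughly as below. The function name and the input layout (a dict of pathway to member IDs) are assumptions made for illustration and do not describe an existing RaMP function.

```python
def pathway_coverage(identified, pathway_members):
    """Return (pathway, found, total, fraction) rows, best coverage first.

    `identified` is the full set of analytes the assay can detect;
    `pathway_members` is a hypothetical dict: pathway -> iterable of IDs.
    """
    seen = set(identified)
    report = []
    for pathway, members in pathway_members.items():
        member_set = set(members)
        found = len(seen & member_set)
        fraction = found / len(member_set) if member_set else 0.0
        report.append((pathway, found, len(member_set), fraction))
    report.sort(key=lambda row: row[3], reverse=True)
    return report
```

A pathway of interest with a low fraction indicates that the assay captures few of its analytes, so a lack of enrichment there should be interpreted with caution.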

Are there plans to incorporate Metacyc

Hi RaMP DB team,

Are there any plans to incorporate data from MetaCyc into the current RaMP DB? If not, have you attempted to quantify how much information we would lose by depending solely on RaMP?

Thanks and this is an awesome resource

issue with load MySQL dump file

I’m trying to load the MySQL dump file downloaded here: https://figshare.com/ndownloader/files/34941486.

I got the following error:
$ mysql -u **** -p**** -h **** PubChem < ramp_2.0.7_20220428.sql
ERROR 1146 (42S02) at line 30: Table 'PubChem.analyte' doesn't exist

I have already created the PubChem db. It looks like there is no schema defined in the above sql file, i.e. no CREATE TABLE statements, so mysql has no idea where to insert the data. Could you please check?

Error with call to RaMP() on Mac

The version of RaMP on the sqlite branch was installed. The below traceback occurred with github actions when running R CMD check on the RcometsAnalytics R package. The error only occurs on a Mac.

*** caught segfault ***
address 0x0, cause 'unknown'

Traceback:
1: getLoadedDLLs()
2: get_lib_path()
3: extension_load(db@ptr, get_lib_path(), paste0("sqlite3_", extension, "_init"))
4: initExtension(conn)
5: .local(drv, ...)
6: dbConnect(SQLite(), dbname = dbfile, cache_size = 64000L, synchronous = "off", flags = SQLITE_RO, vfs = "unix-none")
7: dbConnect(SQLite(), dbname = dbfile, cache_size = 64000L, synchronous = "off", flags = SQLITE_RO, vfs = "unix-none")
8: .sql_connect_RO(.sql_dbfile(bfc))
9: tryCatchList(expr, classes, parentenv, handlers)
10: tryCatch({ info <- .sql_connect_RO(.sql_dbfile(bfc)) con <- info$con src <- src_dbi(con) tbl <- tbl(src, "metadata") %>% collect(n = Inf)}, finally = { .sql_disconnect(info)})
11: .sql_schema_version(bfc)
12: .sql_validate_version(bfc)
13: .sql_create_db(bfc)
14: BiocFileCache(cache = getBFCOption("CACHE"), ask = FALSE)
15: listRaMPVersions(local = TRUE)
16: RaMP()

Calculate Effect Size from Chemical Class Enrichment

This is not necessarily an issue, but how would someone calculate effect size from chemical class enrichment results? I've looked at a few methods for fisher exact test such as Cramer's V or odds ratio, but it's not clear to me how to apply the output from chemical class enrichment to these methods. Perhaps you have a better suggestion for calculating an effect size?

Thank you for your help!
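One option mentioned above, the odds ratio, can be computed from the four counts of the 2x2 table behind a Fisher exact test. The sketch assumes you can obtain those counts yourself; how they map onto the columns of the chemical class enrichment output is not stated here, so the parameter names are hypothetical. The 0.5 added to every cell when a zero is present is the Haldane-Anscombe correction, a common convention and not something RaMP is known to apply.

```python
def odds_ratio(interest_in_class, interest_not_in_class,
               background_in_class, background_not_in_class,
               correction=0.5):
    """Odds ratio (a*d)/(b*c) for a 2x2 enrichment table.

    a = metabolites of interest in the class
    b = metabolites of interest outside the class
    c = remaining background metabolites in the class
    d = remaining background metabolites outside the class
    """
    cells = [interest_in_class, interest_not_in_class,
             background_in_class, background_not_in_class]
    if any(value < 0 for value in cells):
        raise ValueError("counts must be non-negative")
    if any(value == 0 for value in cells):
        # Avoid division by zero: add the correction to every cell.
        cells = [value + correction for value in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)
```

An odds ratio above 1 indicates that the class is over-represented among the metabolites of interest relative to the background.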

RAMP not recognizing known HMDBs

Hello,

I recently input a list of HMDBs into both "Chemical Classes from Metabolites" and "Biological Pathway Enrichment" into the GUI version and was surprised to see how many metabolites were not recognized, even though they seem to be present in HMDB 5.0, dating back to before the date you have listed on the Source Data section. Any help/explanation you can provide for this?
Examples: "There were no matches for the following pathways: hmdb:hmdb00532, hmdb:hmdb00821, hmdb:hmdb32055, hmdb:hmdb62551, hmdb:hmdb61115, hmdb:hmdb04983"
Thanks!
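One thing worth ruling out is ID formatting. The IDs in the example use a lowercase accession and the legacy 5-digit HMDB form, while current HMDB accessions have 7 digits (e.g. HMDB0000532). The sketch below rewrites such IDs to the 7-digit, upper-case form. Whether this explains the missing matches is an assumption, and the function name is hypothetical.

```python
import re

# Matches 'hmdb:' followed by an HMDB accession in any letter case.
_HMDB_ID = re.compile(r"^hmdb:hmdb(\d+)$", re.IGNORECASE)


def normalize_hmdb_id(analyte_id):
    """Rewrite HMDB IDs as 'hmdb:HMDB' plus a zero-padded 7-digit number.

    IDs that are not HMDB IDs are returned unchanged.
    """
    match = _HMDB_ID.match(analyte_id.strip())
    if match is None:
        return analyte_id
    return "hmdb:HMDB" + match.group(1).zfill(7)
```

For instance, `normalize_hmdb_id("hmdb:hmdb00532")` returns `hmdb:HMDB0000532`, and IDs with other prefixes pass through untouched.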