evolgeniusteam / gmrepoprogrammableaccess
programmable access to GM repo
License: GNU General Public License v3.0
Hi, I followed the API docs for "Get relative species/genus abundances for a sample/run", but I only retrieved the run information.
```python
import json

import pandas as pd
import requests

query = {"run_id": "ERR475468"}
url = 'https://gmrepo.humangut.info/api/getRunDetailsByRunID'
data = requests.post(url, data=json.dumps(query)).json()

# -- get run information
run = data.get("run")

# -- get DataFrames of species and genus abundances
species = pd.DataFrame(data.get("species"))
genus = pd.DataFrame(data.get("genus"))
```
The returned data:
```
{'run': {'project_id': 'PRJEB6070',
'original_sample_description': 'Potential of fecal microbiota for early stage detection of colorectal cancer',
'run_id': 'ERR475468',
'experiment_type': 'Amplicon',
'instrument_model': 'Illumina',
'nr_reads_sequenced': None,
'host_age': 74,
'sex': None,
'BMI': 27,
'country': 'France',
'longitude': None,
'latitude': None,
'loaded_uid': 54204,
'QCStatus': 0,
'QCMessage': 'a single taxon unknown account for 100 percent of abundance, which is too much!!',
'Original_Project_description': 'Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved >45% relative to the FOBT while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early and late-stage cancer and could be validated in independent patient and control populations (N=335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host-microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients accompanied by an increase of lipopolysaccharide metabolism. '},
'phenotypes': [{'disease': 'D006262', 'term': 'Health'}],
'phenotypes_exist': True}
```
Could you please help me retrieve the full information, including the species and genus abundances?
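Because `data.get("species")` returns `None` when the key is missing, and `pd.DataFrame(None)` builds an empty DataFrame, the code above fails silently when the response carries no abundance records. A minimal defensive sketch (the helper name `extract_abundance_records` is hypothetical, not part of the GMrepo API) that raises instead, reporting which keys came back together with the run's `QCStatus` and `QCMessage` fields. In this response those are `0` and a warning about the "unknown" taxon; whether that QC result is why the abundances are missing is not confirmed in this thread.

```python
def extract_abundance_records(data: dict, level: str = "species") -> list:
    """Return the abundance records for `level` ("species" or "genus").

    Raises ValueError instead of returning an empty result, so a response that
    carries only run metadata is noticed. The message includes the run's QC
    fields, since they may help explain the missing data (unconfirmed).
    """
    records = data.get(level)
    if records:
        return records
    run = data.get("run") or {}
    raise ValueError(
        f"no '{level}' records in response (keys present: {sorted(data)}); "
        f"QCStatus={run.get('QCStatus')!r}, QCMessage={run.get('QCMessage')!r}"
    )
```

With the request code above, this would be used as `species = pd.DataFrame(extract_abundance_records(data, "species"))`.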
The API and the website return different abundance profiles for the same sample.
When using getRunDetailsByRunID as described in the documentation, the abundance profile for sample ERR475468 is as follows:
scientific_name | relative_abundance |
---|---|
Others | 33.1793295 |
Unknown | 30.255 |
Ruminococcus bromii | 11.7594 |
Faecalibacterium sp. MC_41 | 5.4017 |
Bacteroides vulgatus | 3.96235 |
[Eubacterium] eligens | 3.09785 |
Bacteroides uniformis | 2.79168 |
Escherichia coli | 2.6621 |
Sphingomonas sanguinis | 2.39532 |
Dialister invisus | 2.33688 |
Sphingomonas paucimobilis | 2.15839 |
However, when downloading the relative species abundance table as a TSV from the website, the abundance profile for sample ERR475468 is as follows:
relative_abundance | scientific_name |
---|---|
30.255 | Unknown |
11.7594 | Ruminococcus bromii |
5.4017 | Faecalibacterium sp. MC_41 |
3.96235 | Bacteroides vulgatus |
3.09785 | [Eubacterium] eligens |
2.79168 | Bacteroides uniformis |
2.6621 | Escherichia coli |
2.39532 | Sphingomonas sanguinis |
2.33688 | Dialister invisus |
2.15839 | Sphingomonas paucimobilis |
2.04215 | Oscillibacter valericigenes |
1.88589 | Blautia obeum |
1.88018 | Streptococcus salivarius |
1.09825 | Methanobrevibacter smithii |
0.971213 | Streptococcus mutans |
etc | etc |
When downloading data from the website, the individual taxa that make up the "Others" group are reported. I am interested in hundreds of samples and don't want to download their profiles manually.
How can I retrieve the full taxonomic profile programmatically?
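The two tables agree on the ten named taxa, and the API's "Others" row looks like the remainder: the ten named values sum to 66.82067, and 100 - 66.82067 = 33.17933, which matches 33.1793295 up to rounding. If so, the taxa inside "Others" cannot be recovered from that single summed value. A minimal sketch, assuming (from this one sample only) that the API keeps the ten most abundant taxa, "Unknown" included, and sums the rest into "Others"; the function names are hypothetical:

```python
from __future__ import annotations

from typing import Iterable


def implied_others(named_abundances: Iterable[float]) -> float:
    """Remainder left for 'Others' if the profile sums to 100 percent."""
    return 100.0 - sum(named_abundances)


def collapse_top_n(profile: dict[str, float], n: int = 10) -> list[tuple[str, float]]:
    """Keep the n most abundant taxa and sum the rest into one 'Others' row.

    Assumption (inferred from ERR475468 only): this mirrors how the API
    builds its profile; 'Unknown' is treated like any other taxon.
    """
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    return top + [("Others", sum(value for _, value in rest))]


# The ten named taxa returned by getRunDetailsByRunID for ERR475468.
api_named = {
    "Unknown": 30.255,
    "Ruminococcus bromii": 11.7594,
    "Faecalibacterium sp. MC_41": 5.4017,
    "Bacteroides vulgatus": 3.96235,
    "[Eubacterium] eligens": 3.09785,
    "Bacteroides uniformis": 2.79168,
    "Escherichia coli": 2.6621,
    "Sphingomonas sanguinis": 2.39532,
    "Dialister invisus": 2.33688,
    "Sphingomonas paucimobilis": 2.15839,
}
print(round(implied_others(api_named.values()), 5))  # 33.17933
```

Passing a full profile (for example, one parsed from the downloaded TSV) to `collapse_top_n` should reproduce the API rows if the assumption holds.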
Hi, I get an HTTP 500 error with an empty response body when getting data through the Python API for certain projects. The following code reproduces the error:
```python
import json
import requests
mesh_id = "D008103"
project_id = "PRJNA431746"
query = {"mesh_id": mesh_id, "project_id": project_id}
# Query data
url = 'https://gmrepo.humangut.info/api/getMicrobeAbundancesByPhenotypeMeshIDAndProjectID'
post_result = requests.post(url, data=json.dumps(query))
print(post_result)
# <Response [500]>
print(post_result.text == "")
# True
```
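The thread does not say why the server returns 500 for this project, but the client side can at least fail clearly. A minimal sketch (the helper name `parse_api_response` is hypothetical) that checks the status code and body before decoding, instead of letting an empty 500 response surface as an opaque JSON decode error:

```python
import json


def parse_api_response(status_code: int, text: str):
    """Decode a GMrepo API response body, failing with a readable error.

    An HTTP 500 with an empty body would otherwise surface as an opaque JSON
    decode error when .json() is called on it.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            f"server returned HTTP {status_code} with a body of {len(text)} characters"
        )
    if not text.strip():
        raise RuntimeError(f"server returned HTTP {status_code} with an empty body")
    return json.loads(text)
```

With the request above this would be called as `parse_api_response(post_result.status_code, post_result.text)`, which raises `RuntimeError: server returned HTTP 500 with a body of 0 characters` for this project.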
Hi, I have downloaded the microbial abundances and phenotype information, but I can't find a data dictionary or a way to map the run/sample IDs to the microbiome and phenotype data. For example, I want to map the microbial abundances to the phenotype of each run/sample so that I have labeled data for testing a machine learning classifier. How can I do that using the programmable access tool?
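One way to build such a labeled table is to pivot long-format abundance records into one feature row per run and attach a label taken from each run's `phenotypes` field (the getRunDetailsByRunID response shown earlier has `'phenotypes': [{'disease': 'D006262', 'term': 'Health'}]`). A minimal sketch, assuming each abundance record carries `run_id`, `scientific_name` and `relative_abundance` keys; those field names are an assumption to check against your download, and both function names are hypothetical:

```python
from __future__ import annotations

from collections import defaultdict


def label_from_run_details(data: dict) -> str | None:
    """First phenotype term from a getRunDetailsByRunID response, e.g. 'Health'."""
    phenotypes = data.get("phenotypes") or []
    return phenotypes[0]["term"] if phenotypes else None


def build_labeled_table(abundance_records, run_labels: dict):
    """Pivot abundance records into one feature row per labeled run.

    abundance_records: iterable of dicts with 'run_id', 'scientific_name' and
    'relative_abundance' keys (assumed field names).
    run_labels: dict mapping run_id -> phenotype label.
    Returns (taxa, rows), where each row is (run_id, label, abundances ordered
    like taxa); taxa absent from a run get 0.0.
    """
    per_run = defaultdict(dict)
    for rec in abundance_records:
        per_run[rec["run_id"]][rec["scientific_name"]] = float(rec["relative_abundance"])
    taxa = sorted({taxon for profile in per_run.values() for taxon in profile})
    rows = []
    for run_id in sorted(per_run):
        if run_id not in run_labels:
            continue  # skip runs without a phenotype label
        profile = per_run[run_id]
        rows.append((run_id, run_labels[run_id], [profile.get(t, 0.0) for t in taxa]))
    return taxa, rows
```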
Is there a way to get a list of curated projects through the API?
Could I please ask one technical question? The article (https://academic.oup.com/nar/article/50/D1/D777/6426060#authorNotesSectionTitle) states that amplicon data were mapped to the GreenGenes database, while the database itself uses the NCBI taxonomy. Could you please explain how the conversion from the GreenGenes taxonomy to NCBI was done?
Thank you in advance!
Hi there,
When using the "download data as TSV" functionality on the main GMrepo webpage, I get 13 column headers in the .txt file, but the rows contain only 12 values. I believe the "Disease name" column should be dropped, but it would be great to confirm this with the team.
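A quick way to confirm the mismatch, and to apply the proposed workaround locally until the team confirms it, is to compare the header width with the row widths. A minimal sketch using Python's `csv` module; the function names are hypothetical, and dropping "Disease name" is only correct if that column is genuinely absent from the rows:

```python
import csv
import io


def check_tsv_columns(text: str) -> tuple:
    """Return the header fields and the set of distinct row widths in a TSV."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    widths = {len(row) for row in reader if row}
    return header, widths


def drop_header_column(text: str, name: str) -> str:
    """Remove one named header field so the header width matches the rows.

    Raises ValueError if `name` is not in the header. Whether this is the
    right fix for the GMrepo export is unconfirmed.
    """
    lines = text.splitlines()
    header = lines[0].split("\t")
    header.remove(name)
    return "\n".join(["\t".join(header)] + lines[1:])
```

For the downloaded file, `check_tsv_columns` would be expected to report 13 header fields and a row width of `{12}`.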
The link for GMRepo (https://gmrepo.humangut.info/) produces a "Site can't be reached" error.
If anyone knows how to solve this issue, that would be highly appreciated. Thank you
Hi, when I use the API countAssociatedRunsByPhenotypeMeshID to get the total number of runs for, e.g., D003424, I get 2671 runs. But when I use getAssociatedRunsByPhenotypeMeshIDLimit to retrieve the runs with a limit of 1000 per request, I only get 2000 runs. This is very strange; please help me out. Thanks!
For reproducibility, the Python script is attached: gmrepo_age_distribution.zip
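The attached script is not shown here, so the cause cannot be confirmed, but one arithmetic pitfall produces exactly this number: computing the page count as 2671 // 1000 = 2 fetches 2 × 1000 = 2000 runs and drops the final partial page of 671. A minimal sketch of a loop that keeps requesting until it receives a short page, and warns if the total still disagrees with countAssociatedRunsByPhenotypeMeshID. `fetch_page(skip, limit)` is a hypothetical wrapper you would write around getAssociatedRunsByPhenotypeMeshIDLimit; the `skip`/`limit` parameter names are an assumption to check against the API documentation.

```python
import warnings
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int, int], List], expected_total: int, limit: int = 1000) -> List:
    """Collect every record by paging with skip/limit.

    Stops on a short or empty page, or once expected_total records are in hand,
    so a final partial page (e.g. 671 of 2671) is not dropped.
    """
    results: List = []
    skip = 0
    while len(results) < expected_total:
        page = fetch_page(skip, limit)
        results.extend(page)
        if len(page) < limit:
            break  # short or empty page: nothing more to fetch
        skip += limit
    if len(results) != expected_total:
        warnings.warn(
            f"retrieved {len(results)} records but the count endpoint reported {expected_total}"
        )
    return results
```

If this loop also stops at 2000 and emits the warning, the shortfall would point to the server side rather than the client's page arithmetic.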
Hello, and thanks for your repository.
I have been looking at the data, and I see that all accessible abundances are relative abundances. I haven't found any way to download count data. Is count data available?
Also, since I am interested in count abundances, I looked for a way to calculate them from the relative abundances and found a field called "nr_reads_sequenced". What does that number represent? For example, for metagenomic data, does nr_reads_sequenced count all sequenced reads, or only the reads assigned to bins?
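A back-of-the-envelope sketch of the conversion the question has in mind, assuming that relative abundances are percentages (the profiles above sum to about 100) and that they were computed over the same reads that nr_reads_sequenced counts. Neither assumption is confirmed here, and nr_reads_sequenced can be `None` (as in the ERR475468 response earlier on this page), so the results are rough estimates, not true counts. The function name is hypothetical.

```python
from __future__ import annotations


def approximate_counts(profile: dict[str, float], nr_reads: int | None) -> dict[str, int] | None:
    """Back-calculate approximate read counts from percent relative abundances.

    Assumes each percentage refers to the same read total that
    nr_reads_sequenced reports (unconfirmed). Returns None when
    nr_reads_sequenced is missing or zero.
    """
    if not nr_reads:
        return None
    return {taxon: round(pct / 100 * nr_reads) for taxon, pct in profile.items()}
```

For example, 30.255 percent of a hypothetical 1,000,000 reads gives an estimate of 302,550 reads.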
Hi,
Is GMrepo undergoing maintenance? I have been trying to access it all day from different browsers, but in every case the session times out and the GMrepo main page does not load. I also cannot access specific projects.
Does anyone else have the same problem?
I searched online for a reported outage but did not find one, so I am writing here. Apologies.
Thanks,
Hi there,
This is an extension to #8, where the getCuratedProjectsList method was added to the API.
Here's the code I used to fetch the list of curated project IDs.
```python
import json

import requests


def get_curated_project_ids():
    query = {}
    url = 'https://gmrepo.humangut.info/api/getCuratedProjectsList'
    content = requests.post(url, data=json.dumps(query))
    # Collect the project IDs from the returned list of project records
    return {x["project_id"] for x in content.json()}
```
After running this code, I manually checked whether known curated project IDs are included in the output. For example, PRJEB1775 is a project with metagenomic samples associated with diarrhea. However,
```python
pid_set = get_curated_project_ids()
print("PRJEB1775" in pid_set)
# False
```
Is it possible that getCuratedProjectsList returns an incomplete list of project IDs?