biothings / myvariant.info
MyVariant.info: A BioThings API for human variant annotations
Home Page: http://myvariant.info
License: Other
http://www.hgmd.cf.ac.uk/ac/index.php
Seems to be very frequently used by a lot of labs working on variant annotation pipelines.
https://goo.gl/ will discontinue on March 30, 2019.
Provide annotation based on ACMG guidelines
ACMG guidelines are widely used to interpret variants.
We could provide variant classification results based on ACMG guidelines.
https://www.acmg.net/docs/standards_guidelines_for_the_interpretation_of_sequence_variants.pdf
When installing myvariant and testing, it asks for a BioThings config file. Which file should we use or how should we configure it? Thanks.
Output file: list of all _id in each myvariant's assembly. Feature was deactivated in 2cf6144 after switching to cold/hot collection design.
With cold/hot, since we never have the full merged collection in mongo, the only efficient way to generate such a list is to use the cold and hot cache files, then sort/uniq them (as some hot _ids are already in cold) to create the output file.
Note: this file is used by clingen team to generate CAID for myvariant
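A minimal sketch of the sort/uniq step described above, assuming the cold and hot caches can each be read as a plain iterable of _id strings (the function name and the in-memory lists are hypothetical; the real cache file format is not shown here):

```python
import heapq
from itertools import groupby


def merge_id_lists(cold_ids, hot_ids):
    """Yield each _id exactly once, in sorted order, from two iterables of _ids."""
    cold_sorted = sorted(cold_ids)
    hot_sorted = sorted(hot_ids)
    # heapq.merge lazily merges inputs that are already sorted;
    # groupby then collapses _ids that appear in both cold and hot.
    for _id, _group in groupby(heapq.merge(cold_sorted, hot_sorted)):
        yield _id


# Illustrative _ids only; the second one is present in both caches.
cold = ["chr1:g.35367G>A", "chr7:g.140453136A>T"]
hot = ["chr7:g.140453136A>T", "chr4:g.55141036T>C"]
merged = list(merge_id_lists(cold, hot))
```

For caches too large to sort in memory, the same idea would apply to files that are pre-sorted on disk and streamed line by line.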
CIViC is loaded through API query. Should trigger it every month.
switch to right-handed helix
dbNSFP parser needs to be updated for version 4.0b1a
Details could be found at: https://sites.google.com/site/jpopgen/dbNSFP
Thank you for this amazing resource!
We are in the process of adding selected relevant information from myvariant.info to CIViC (civicdb.org).
While considering options, we hoped to add cosmic mutation frequency. But the mutation frequency available appears to be the frequency for a single tumor site? And this is chosen arbitrarily from several possibilities perhaps?
Consider this example (which seems representative of other variants in myvariant.info):
http://myvariant.info/v1/variant/chr7:g.140453136A%3ET
This is BRAF V600E.
http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=476
The mutation frequency information returned for COSMIC is:
"mut_freq": 2.83,
"tumor_site": "biliary_tract"
This seems odd. How is this being determined? Would it be possible to determine overall mutation frequency across all tumor_sites, and then for each tumor_site and perhaps return the top site(s) and their frequencies?
Our relevant CIViC github issues are:
griffithlab/civic-server#243
griffithlab/civic-server#38
For now we will move on without using the COSMIC info but it would be great to have more options to select from here.
Hi,
We are wondering how the variant normalization is done in myvariant.info? When you import the variants from each database, do you do any sort of internal variant normalization or just take the chr,pos,ref,alt directly from the source?
Thanks
Currently, we use the "hg19" and "hg38" reference genomes (from UCSC) to produce snpeff annotations. The result is missing the "gene_id" field (the value is the same as "gene_name"). We can switch to the GRCh37 and GRCh38 reference genomes available here:
https://sourceforge.net/projects/snpeff/files/databases/v4_3/
We could also upgrade the snpeff version we use.
minor note -- I just noticed that usage stats on the front page of mygene.info are updating, but not so for myvariant.info. (last stamp is 2016-11-15...)
... it should send the notification only when both premerge and hot collections are indexed
The format for the field ann, nested in snpeff, is a list in variants like in chr1:g.35367G>A, and an object in variants like chr7:g.140453136A>T. While trying to parse the output, this complicates the mapping of the key and values. Was this intended?
Thanks!
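Until the format is made consistent, a client-side workaround is to coerce the field to a list before iterating. A minimal sketch, assuming the field lives at snpeff.ann as described above; the helper names and the sample documents (including the "effect" key and its values) are illustrative only:

```python
def as_list(value):
    """Return value wrapped in a list unless it already is one; None becomes []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def snpeff_annotations(doc):
    """Return snpeff.ann as a list, whether the API sent an object or a list."""
    return as_list(doc.get("snpeff", {}).get("ann"))


# Made-up documents mimicking the two shapes reported in this issue.
doc_with_list = {"snpeff": {"ann": [{"effect": "a"}, {"effect": "b"}]}}
doc_with_object = {"snpeff": {"ann": {"effect": "c"}}}

effects = [
    ann["effect"]
    for doc in (doc_with_list, doc_with_object)
    for ann in snpeff_annotations(doc)
]
```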
http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000113368&facets=cadd.polyphen.cat&size=0
gives
{
"success": false,
"error": "Could not execute query due to the following exception(s): ['illegal_argument_exception Fielddata is disabled on text fields by default. Set fielddata=true on [cadd.polyphen.cat] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.']"
}
Need to update the CADD mapping (and rebuild the pre-merge/cold collection)
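To find other fields that would hit the same error, one option is to scan the mapping for text fields without fielddata enabled. A rough sketch, assuming an Elasticsearch-style mapping dict with nested "properties"; the sample mapping below is hypothetical and is not the actual CADD mapping:

```python
def text_fields(mapping, prefix=""):
    """Return dotted paths of fields typed `text` with fielddata not enabled."""
    found = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            found.extend(text_fields(spec, path + "."))
        elif spec.get("type") == "text" and not spec.get("fielddata", False):
            found.append(path)
    return found


# Hypothetical fragment for illustration only.
sample_mapping = {
    "properties": {
        "cadd": {
            "properties": {
                "polyphen": {
                    "properties": {
                        "cat": {"type": "text"},
                        "val": {"type": "float"},
                    }
                },
                "gene": {"properties": {"gene_id": {"type": "keyword"}}},
            }
        }
    }
}

not_facetable = text_fields(sample_mapping)
```

Fields reported this way would be candidates for the keyword type, as the error message itself suggests.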
We now have clingen CA id loaded for hg38 index, we should have this clingen.caid field indexed (as "string_lowercase").
Example:
CHROM: 22
POS: 18898839
REF: A
ALT: NONE
This example comes from dbSNP v151
Those notes can be added here:
It will then be rendered in the "Notes" column of the available-fields table in the docs:
http://docs.myvariant.info/en/latest/doc/data.html#available-fields
Hi!
We've noticed that there seem to be some inconsistencies with the COSMIC data being returned in the variant annotation service.
Here's an example query:
GET myvariant.info/v1/variant/chr4:g.55141036T>C?fields=cosmic,mutdb
And the response:
{
"_id": "chr4:g.55141036T>C",
"_version": 2,
"cosmic": {
"alt": "C",
"chrom": "4",
"cosmic_id": "COSM1430077",
"hg19": {
"end": 55141036,
"start": 55141036
},
"mut_freq": 0.14,
"mut_nt": "T>C",
"ref": "T",
"tumor_site": "large_intestine"
},
"mutdb": {
"alt": "C",
"chrom": "4",
"cosmic_id": "85787",
"hg19": {
"end": 55141036,
"start": 55141036
},
"mutpred_score": -1,
"ref": "T",
"rsid": null,
"strand": "p"
}
}
The cosmic id returned under the cosmic top-level key (body['cosmic']['cosmic_id']) doesn't match the cosmic id returned under the mutdb top-level key (body['mutdb']['cosmic_id']). Additionally, the cosmic id returned in the cosmic section isn't a valid cosmic id at all, while the one in the mutdb section appears to be the correct one for the variant in question.
I assume this is likely to come from discrepancies in the underlying data sources, but it was a little surprising to find a non-existent cosmic id in the cosmic section.
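For anyone who wants to flag such mismatches on the client side, here is a small sketch. It assumes the two sections differ only by an optional "COSM" prefix when they agree, which is a guess based on the response above; the function name is hypothetical:

```python
def cosmic_ids_agree(body):
    """Return True/False if both ids are present, or None if either is missing."""
    cosmic_id = body.get("cosmic", {}).get("cosmic_id")
    mutdb_id = body.get("mutdb", {}).get("cosmic_id")
    if cosmic_id is None or mutdb_id is None:
        return None

    def numeric_part(raw):
        text = str(raw).upper()
        # Assumption: the cosmic section prefixes ids with "COSM", mutdb does not.
        return text[4:] if text.startswith("COSM") else text

    return numeric_part(cosmic_id) == numeric_part(mutdb_id)


# Values copied from the response shown in this issue.
body = {
    "cosmic": {"cosmic_id": "COSM1430077"},
    "mutdb": {"cosmic_id": "85787"},
}
agree = cosmic_ids_agree(body)
```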
In the "Query Examples" of the myvariant.info home page, we currently show http://myvariant.info/v1/variant/chr1:g.35367G>A for annotation retrieval. But, that specific variant has a rather limited set of annotation sources. I'd suggest choosing another variant that better highlights as many of the annotation resources as possible.
MyVariant.info release notes are here:
http://docs.myvariant.info/en/latest/doc/release_changes.html
It would be handy to add the anchor (for the direct URL) to each release, something like this:
http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226
and even deeper into each of hg19 and hg38 release notes:
http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226-hg19
http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226-hg38
When the hash exists, it should expand the specific release note content.
The rendering of the "anchor" can be made the same as the other anchors on this page, e.g. this one:
http://docs.myvariant.info/en/latest/doc/release_changes.html#myvariant-releases
(the anchor icon will show up when mouse-over)
The same changes can be applied to docs.mygene.info and docs.mychem.info as well.
Hi,
Great work with variant info project.
In fact, I was part of the hackathon where you guys came up with this.
I am wondering how stable this is now and what your future plans are.
Any plans for integrating with mygene.info or making it a more stable service on its own?
Thanks,
Nikhil
also applies to mygene and mychem...
VEP: http://uswest.ensembl.org/info/docs/tools/vep/index.html
Similar to the SnpEff annotation we have already, VEP is a tool to compute variant impact.
Use case: try to normalize vcf before using the get_pos_start_end function.
Problem:
In the case of deletion: REF -> TTTCTTTTTCTTTTTCTTTTTCTTTCTT, ALT -> TG
_normalize_vcf would trim the first T from both REF and ALT
However, get_pos_start_end asserts the first nucleotide in both REF and ALT is the same
see: https://github.com/biothings/myvariant.info/blob/master/src/utils/hgvs.py#L150
These two functions could not be used together to handle deletion cases.
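The conflict can be reproduced with a stand-in for the trimming step. The sketch below is not the code in hgvs.py; both functions are hypothetical, and POS 100 is an arbitrary placeholder. The first mimics the behaviour described above (trimming the shared leading T), the second shows one possible alternative that keeps a shared anchor base:

```python
def trim_common_prefix(pos, ref, alt):
    """Drop leading bases shared by REF and ALT, advancing POS (hypothetical)."""
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def trim_keep_anchor(pos, ref, alt):
    """Trim only while the first TWO bases match, so one shared base remains."""
    while len(ref) > 1 and len(alt) > 1 and ref[:2] == alt[:2]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


REF = "TTTCTTTTTCTTTTTCTTTTTCTTTCTT"
ALT = "TG"

pos1, ref1, alt1 = trim_common_prefix(100, REF, ALT)
# After trimming, REF starts with T and ALT is G, so an assertion that the
# first nucleotides match would fail.
first_bases_match = ref1[0] == alt1[0]

pos2, ref2, alt2 = trim_keep_anchor(100, REF, ALT)
anchor_preserved = ref2[0] == alt2[0]
```

Whether keeping the anchor is the right fix depends on how the rest of the pipeline expects deletions and delins to be represented.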
The mapping file for ExAC contains a small problem. The ac_hom field should be put in 'ac' rather than 'hom'.
Potential solutions:
change the mapping
add an additional field called 'ac_hom' under 'hom'
Currently, the newest release of dbSNP is v152. Our latest version in MyVariant.info is v151.
We download from: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/
The last update time for the file is: 4/22/2018 (v151)
v152 is stored in: ftp://ftp.ncbi.nih.gov/snp/latest_release/VCF
Also, from v152, dbSNP provides the JSON version of the data dump:
ftp://ftp.ncbi.nih.gov/snp/latest_release/JSON
Related post regarding the change from dbSNP: https://ncbiinsights.ncbi.nlm.nih.gov/2017/07/07/dbsnp-redesign-supports-future-data-expansion/
Hi,
One task I'd like to run with myvariant.info is to return all variants in a gene. For example TP53, so I tried
http://myvariant.info/v1/query?q=TP53&fields=_id
which returns with count of 5918.
I also tried querying with the Ensembl ID
http://myvariant.info/v1/query?q=ENSG00000141510&fields=_id
which returns nothing.
Then I tried
http://myvariant.info/v1/query?q=dbnsfp.ensembl.geneid:ENSG00000141510&fields=_id
http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000141510&fields=_id
which returns 3318 and 4539.
So the question I have is: when I just search for TP53, which fields are searched exactly? It seems the default query in Elasticsearch searches the _all field? And why can't I get any results back with just the Ensembl ID? Is a range query a better way to get all variants related to a gene? Or what is the best way to do this task with the myvariant.info API?
Thank you very much
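For reference, the fielded queries above can be built programmatically. This sketch only constructs the URLs (no request is sent); the helper name is hypothetical, and the endpoint and field names are the ones used in this issue:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://myvariant.info/v1/query"


def gene_query_url(field, gene_id, fields="_id"):
    """Build a query URL that restricts the search to one field."""
    params = {"q": f"{field}:{gene_id}", "fields": fields}
    # urlencode percent-encodes the ':' in the query string.
    return f"{BASE_URL}?{urlencode(params)}"


urls = [
    gene_query_url(field, "ENSG00000141510")
    for field in ("dbnsfp.ensembl.geneid", "cadd.gene.gene_id")
]

# Round-trip check: decode the first URL back into its parameters.
decoded = parse_qs(urlsplit(urls[0]).query)
```

Since the two fields returned different counts (3318 and 4539), a caller may want to take the union of the _ids returned by both queries.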
Matt Wright and Jimmy Zhen from the ClinGen team seemed interested in this idea at the CIViC hackathon. Need to reach out to them for more info on logistics...
Suggested by Beth Pitel (I think) at the CIViC Hackathon...
I think the data is in this file http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/
Just noticed that the data from http://myvariant.info/v1/variant/chr12:g.1299226A%3EG?fields=wellderly differs from what's available from https://genomics.scripps.edu/browser/#. The allele/genotype frequencies are different, and they also separate out the illumina data from complete genomics.
Looks like the VCFs are here: https://genomics.scripps.edu/browser/files/wellderly/vcf/
In the recent release, I noticed that there are some handy new features, including the always_list and allow_null options. But when they are used in combination, the result is probably not in the nicest format. Instead of returning an empty list [] when there's no data, it returns a list containing a null object, like so: [null].
It will cause some confusion for the client side, since usually you would check if the returned list is empty, as opposed to checking each element in the list if they are empty.
A sample request to reproduce this error would be:
https://myvariant.info/v1/query?q=rs12131234&fields=dbsnp&always_list=dbsnp.gene&allow_null=dbsnp.gene
I'm wondering if it's possible to change this behaviour? Thanks.
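In the meantime, a client can normalize the value itself. A minimal sketch (the helper name is hypothetical) that turns [null] into an empty list so the usual emptiness check works:

```python
def drop_nulls(value):
    """Normalize a field to a list with no None entries."""
    if value is None:
        return []
    if isinstance(value, list):
        return [item for item in value if item is not None]
    return [value]


# Shape reported above when always_list and allow_null are combined.
genes = drop_nulls([None])
has_gene_data = bool(genes)
```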
sebastienlelong [2:51 PM]
@ChunleiWu also I see a lot of CHEBI:None in chembl: http://mychem.info/v1/drug/GWNBDVRVUYBAGA-UHFFFAOYSA-N?fields=chembl.chebi_par_id
A user reported that variant "rs281865162" is missing from MyVariant.info. The problem comes from the dbSNP parser:
In https://github.com/biothings/myvariant.info/blob/master/src/hub/dataload/sources/dbsnp/dbsnp_vcf_parser.py#L60, we specifically remove all variants which are not single nucleotide deletion. Not sure if it is on purpose.
Need @newgene to confirm.
RCV000008604, RCV000008605, RCV000008606 and RCV000008607 share one variant (ClinVar variation 8131, also called measureSet id and variant id in their xml file). The API works for RCV000008604 only, but not for any of the others. Input: mv.querymany(['RCV000008604'], scopes='clinvar.rcv_accession', fields='clinvar.clinvar_id')
https://phewascatalog.org/phewas
I think that should contain all the data in supp tables 3 and 8 in https://www.nature.com/articles/nbt.2749, but would be good to double check.
also, this is from an older 2013 paper. After this is loaded, would be good to check with the Vanderbilt team (eg Lisa Bastarache) whether there are any other relevant large-scale data available...
I would like to query the following variants using POST (i.e. on http://myvariant.info/v1/query):
q="chr1:54844G>A,chr1:61987A>G,chr1:61989G>C,chr1:86018C>G,chr1:86303G>T"
I've tried the above parameters, but it returns the following:
[
{
"query": "chr1:54844G>A",
"notfound": true
}
]
I understand that I also need to input a scope in order to make it work but I'm not sure what the scope should be in this case...
Thanks
Ismail
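One thing worth checking: the _id values shown elsewhere in these issues use the HGVS genomic form with a "g." prefix (e.g. chr1:g.35367G>A), which the ids above lack. The sketch below rewrites simple substitutions into that form. It is only a guess at the cause of the "notfound" result, the helper is hypothetical, and it does not address which scope to pass:

```python
import re

# Matches e.g. "chr1:54844G>A" (a simple substitution without the "g." prefix).
_SNV_WITHOUT_PREFIX = re.compile(r"^(chr\w+):(\d+[ACGT]>[ACGT])$")


def to_hgvs_g(variant):
    """Insert the 'g.' prefix if it is missing; leave other strings unchanged."""
    match = _SNV_WITHOUT_PREFIX.match(variant)
    if match is None:
        return variant
    return f"{match.group(1)}:g.{match.group(2)}"


raw = "chr1:54844G>A,chr1:61987A>G,chr1:61989G>C,chr1:86018C>G,chr1:86303G>T"
ids = [to_hgvs_g(item) for item in raw.split(",")]
q = ",".join(ids)
```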
The clinvar_xml_parser.py data loader is referencing a clinvar or clinvar1 import that is not listed in the requirements:
https://github.com/SuLab/myvariant.info/blob/master/src/dataload/contrib/clinvar/clinvar_xml_parser.py#L5
It's changed from clinvar to clinvar1 - is the clinvar library that does the parseString() call available from you or is it a separate 3rd party lib to be installed?
https://github.com/SuLab/myvariant.info/blob/master/src/dataload/contrib/clinvar/clinvar_xml_parser.py#L315
record_parsed = clinvar1.parseString(record, silence=1)
http://myvariant.info/v1/metadata
Looks like the version number and license info is missing for gnomAD.
From Alex Wagner, this link https://s3-us-west-2.amazonaws.com/g2p-0.10/index.html has the current release of the VICC-harmonized data (described in https://www.biorxiv.org/content/early/2018/07/11/366856). It is subject to change as that manuscript goes through peer review. But once that's done and the data set is finalized, seems like a good source to import. (obviously we already have civic data directly, but this resource will provide access to several other sources as well in a standardized format.)
cc @ahwagner
Wrap this function https://github.com/biothings/myvariant.info/blob/master/src/utils/hgvs.py#L88 as a new "compute" edge (compute a result from input data instead of looking up data from mongodb)