
myvariant.info's Issues

Which config file?

When installing myvariant and testing, it asks for a BioThings config file. Which file should we use or how should we configure it? Thanks.

Generate and store list of _id in s3

Output file: a list of all _id values for each MyVariant assembly. The feature was deactivated in 2cf6144 after switching to the cold/hot collection design.

With the cold/hot design, the full merged collection never exists in MongoDB, so the only efficient way to generate the list is to take the cold and hot cache files and sort/uniq them (some hot _ids are already in cold) to create the output file.
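A minimal sketch of the sort/uniq step, assuming each cache file can be read as an iterable of _id strings (the function name and the sample ids are made up for illustration, and this is not the hub's actual code):

```python
import heapq
from typing import Iterable, Iterator


def merge_unique_ids(cold_ids: Iterable[str], hot_ids: Iterable[str]) -> Iterator[str]:
    """Yield the sorted union of two _id streams, dropping duplicates."""
    previous = None
    # heapq.merge walks both sorted inputs in order, so duplicates
    # (within one input or across both) always arrive back to back.
    for _id in heapq.merge(sorted(cold_ids), sorted(hot_ids)):
        if _id != previous:
            yield _id
            previous = _id


# Hypothetical ids; the second hot id is already present in cold.
cold = ["chr1:g.200C>G", "chr1:g.100A>T"]
hot = ["chr1:g.300G>A", "chr1:g.100A>T"]
unique_ids = list(merge_unique_ids(cold, hot))
```

Note that sorted() loads each input into memory. For full-size cache files, the inputs would need to be pre-sorted on disk (for example with an external sort) and streamed into heapq.merge directly.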

Note: the ClinGen team uses this file to generate CAIDs for MyVariant.

CIViC auto upload

CIViC is loaded through an API query. The upload should be triggered every month.

Cosmic mutation frequency information seems limited/arbitrary

Thank you for this amazing resource!

We are in the process of adding selected relevant information from myvariant.info to CIViC (civicdb.org).

While considering options, we hoped to add the COSMIC mutation frequency. But the frequency available appears to be for a single tumor site. Is that site perhaps chosen arbitrarily from several possibilities?

Consider this example (which seems representative of other variants in myvariant.info):
http://myvariant.info/v1/variant/chr7:g.140453136A%3ET

This is BRAF V600E.
http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=476

The mutation frequency information returned for COSMIC is:
"mut_freq": 2.83,
"tumor_site": "biliary_tract"

See attached:
myvariant example

This seems odd. How is this being determined? Would it be possible to determine overall mutation frequency across all tumor_sites, and then for each tumor_site and perhaps return the top site(s) and their frequencies?
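To make the request concrete, here is a rough sketch of the aggregation being asked for. It assumes per-site counts of mutated and tested samples are available as a dict; that input layout, the function name, and the numbers are all hypothetical and do not reflect the actual COSMIC export or how MyVariant parses it.

```python
def site_frequencies(counts):
    """Return (overall_percent, ranked_sites) from per-site sample counts.

    counts maps tumor_site -> (mutated_samples, tested_samples).
    """
    total_mutated = sum(mutated for mutated, _ in counts.values())
    total_tested = sum(tested for _, tested in counts.values())
    overall = 100.0 * total_mutated / total_tested if total_tested else 0.0
    # Rank sites by their own frequency, highest first; skip untested sites.
    ranked = sorted(
        ((site, 100.0 * mutated / tested)
         for site, (mutated, tested) in counts.items() if tested),
        key=lambda item: item[1],
        reverse=True,
    )
    return overall, ranked


# Made-up counts, for illustration only.
example_counts = {"site_a": (1, 10), "site_b": (3, 10)}
overall, ranked = site_frequencies(example_counts)
```

The top site(s) would then be the first entries of the ranked list.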

Our relevant CIViC github issues are:
griffithlab/civic-server#243
griffithlab/civic-server#38

For now we will move on without using the COSMIC info but it would be great to have more options to select from here.

variant normalization

Hi,
We are wondering how variant normalization is done in myvariant.info. When you import the variants from each database, do you do any sort of internal variant normalization, or do you just take chr, pos, ref, and alt directly from the source?

Thanks

usage stats on front page not updating

minor note -- I just noticed that usage stats on the front page of mygene.info are updating, but not so for myvariant.info. (last stamp is 2016-11-15...)

snpeff ann field is sometimes a list, sometimes an object

The format of the field ann, nested in snpeff, is a list in variants like chr1:g.35367G>A, and an object in variants like chr7:g.140453136A>T. This complicates mapping the keys and values when parsing the output. Was this intended?

Thanks!
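Until the format is settled, a client-side workaround is to coerce the field to a list before parsing. A minimal sketch (the helper names are made up, and the keys inside each annotation are placeholders rather than the real snpeff fields):

```python
def as_list(value):
    """Wrap a single object in a list; pass lists through; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def snpeff_annotations(doc):
    """Return snpeff.ann as a list whether the API sent a list or one object."""
    return as_list(doc.get("snpeff", {}).get("ann"))


# Placeholder documents showing the two shapes described above.
doc_with_object = {"snpeff": {"ann": {"placeholder_key": "x"}}}
doc_with_list = {"snpeff": {"ann": [{"placeholder_key": "x"}, {"placeholder_key": "y"}]}}
```

Code that loops over snpeff_annotations(doc) then works the same way for both kinds of variants.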

facet query on cadd fails (unittest MyVariantTest.test_query_facets)

http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000113368&facets=cadd.polyphen.cat&size=0

gives

{
"success": false,
"error": "Could not execute query due to the following exception(s): ['illegal_argument_exception Fielddata is disabled on text fields by default. Set fielddata=true on [cadd.polyphen.cat] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.']"
}

Need to update the CADD mapping (and rebuild the pre-merge/cold collection)
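The error message itself suggests a keyword field. A sketch of what the changed mapping fragment could look like, written as a Python dict. The nesting follows the field path cadd.polyphen.cat, but the variable name is made up and the real MyVariant mapping may carry extra settings (analyzers, normalizers) that are not shown here, so treat this as an assumption.

```python
# Hypothetical fragment: only the "cat" leaf is taken from the error above.
cadd_mapping_fragment = {
    "cadd": {
        "properties": {
            "polyphen": {
                "properties": {
                    # "keyword" fields are not analyzed, so they can be
                    # used for facets/aggregations without fielddata.
                    "cat": {"type": "keyword"}
                }
            }
        }
    }
}
```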

clingen.caid should be indexed

We now have the ClinGen CA id loaded for the hg38 index, so the clingen.caid field should be indexed (as "string_lowercase").

Discrepancies in returned COSMIC ids

Hi!

We've noticed that there seem to be some inconsistencies with the COSMIC data being returned in the variant annotation service.

Here's an example query:

GET myvariant.info/v1/variant/chr4:g.55141036T>C?fields=cosmic,mutdb

And the response:

{
    "_id": "chr4:g.55141036T>C",
    "_version": 2,
    "cosmic": {
        "alt": "C",
        "chrom": "4",
        "cosmic_id": "COSM1430077",
        "hg19": {
            "end": 55141036,
            "start": 55141036
        },
        "mut_freq": 0.14,
        "mut_nt": "T>C",
        "ref": "T",
        "tumor_site": "large_intestine"
    },
    "mutdb": {
        "alt": "C",
        "chrom": "4",
        "cosmic_id": "85787",
        "hg19": {
            "end": 55141036,
            "start": 55141036
        },
        "mutpred_score": -1,
        "ref": "T",
        "rsid": null,
        "strand": "p"
    }
}

The cosmic id returned in the cosmic top level key (body['cosmic']['cosmic_id']) doesn't match the cosmic id returned in the mutdb top level key (body['mutdb']['cosmic_id']). Additionally, the cosmic id returned in the cosmic section isn't a valid cosmic id at all, while the one in the mutdb section appears to be the correct one for the variant in question.

I assume this is likely to come from discrepancies in the underlying data sources, but it was a little surprising to find a non-existent cosmic id in the cosmic section.
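For anyone wanting to flag such records on the client side, a small sketch of the comparison. It assumes the two sections differ only by an optional "COSM" prefix when they refer to the same mutation; the helper names are made up, and the sample document is trimmed from the response above.

```python
def cosmic_numeric_id(raw):
    """Return the numeric part of a COSMIC id such as 'COSM1430077' or '85787'."""
    text = str(raw).strip().upper()
    if text.startswith("COSM"):
        text = text[4:]
    return int(text) if text.isdigit() else None


def cosmic_ids_agree(doc):
    """True only if both sections carry a COSMIC id and the numbers match."""
    from_cosmic = cosmic_numeric_id(doc.get("cosmic", {}).get("cosmic_id"))
    from_mutdb = cosmic_numeric_id(doc.get("mutdb", {}).get("cosmic_id"))
    return from_cosmic is not None and from_cosmic == from_mutdb


# Trimmed copy of the response shown above.
body = {
    "_id": "chr4:g.55141036T>C",
    "cosmic": {"cosmic_id": "COSM1430077"},
    "mutdb": {"cosmic_id": "85787"},
}
ids_agree = cosmic_ids_agree(body)
```

For the record above the check reports a mismatch (1430077 vs 85787).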

change example query

In the "Query Examples" of the myvariant.info home page, we currently show http://myvariant.info/v1/variant/chr1:g.35367G>A for annotation retrieval. But, that specific variant has a rather limited set of annotation sources. I'd suggest choosing another variant that better highlights as many of the annotation resources as possible.

MyVariant.info release notes should have anchors for each release

MyVariant.info release notes are here:

http://docs.myvariant.info/en/latest/doc/release_changes.html

It would be handy to add the anchor (for the direct URL) to each release, something like this:

http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226

and even deeper into each of hg19 and hg38 release notes:

http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226-hg19
http://docs.myvariant.info/en/latest/doc/release_changes.html#release-20190226-hg38

When the hash exists, it should expand the specific release note content.

The rendering of the "anchor" can be made the same as the other anchors on this page, e.g. this one:

http://docs.myvariant.info/en/latest/doc/release_changes.html#myvariant-releases

(the anchor icon shows up on mouse-over)

The same changes can be applied to docs.mygene.info and docs.mychem.info as well.

Production stability

Hi,

Great work with the variant info project.
In fact, I was part of the hackathon where you came up with this.

I am wondering how stable it is now and what your future plans are.
Are there any plans to integrate it with mygene.info, or to make it a more stable service on its own?

Thanks,
Nikhil

The logic of get_pos_start_end and _normalize_vcf is conflicting

Use case: try to normalize vcf before using the get_pos_start_end function.

Problem:
In the case of deletion: REF -> TTTCTTTTTCTTTTTCTTTTTCTTTCTT, ALT -> TG
_normalize_vcf would trim the first T from both REF and ALT

However, get_pos_start_end asserts the first nucleotide in both REF and ALT is the same
see: https://github.com/biothings/myvariant.info/blob/master/src/utils/hgvs.py#L150

These two functions cannot be used together to handle deletion cases.
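The conflict can be reproduced with a small stand-in for the trimming step. This is a sketch only: trim_shared_prefix is a hypothetical helper, not the repository's _normalize_vcf, and the position 100 is made up. The keep_anchor flag shows one possible way to keep a shared first base so that a downstream same-first-nucleotide check still holds.

```python
def trim_shared_prefix(pos, ref, alt, keep_anchor=True):
    """Trim bases shared at the start of REF and ALT.

    With keep_anchor=True the last shared base is kept, so REF and ALT
    still begin with the same nucleotide after trimming.
    """
    shared = 0
    limit = min(len(ref), len(alt))
    while shared < limit and ref[shared] == alt[shared]:
        shared += 1
    if keep_anchor and shared > 0:
        shared -= 1
    return pos + shared, ref[shared:], alt[shared:]


REF = "TTTCTTTTTCTTTTTCTTTTTCTTTCTT"
ALT = "TG"

# Full trimming drops the shared leading T, so the first bases now differ.
fully_trimmed = trim_shared_prefix(100, REF, ALT, keep_anchor=False)

# Keeping one anchor base leaves REF and ALT starting with the same T.
anchored = trim_shared_prefix(100, REF, ALT, keep_anchor=True)
```

With full trimming the result starts with T in REF and G in ALT, which is exactly the state the assertion in get_pos_start_end rejects.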

ExAC mapping

The mapping file for ExAC contains a small problem. The ac_hom field should be put in 'ac' rather than 'hom'.
Potential solutions:

  • change the mapping

  • add an additional field called 'ac_hom' under 'hom'

dbSNP download site change (Maybe?)

Currently, the newest release of dbSNP is v152. Our latest version in MyVariant.info is v151.

We download from: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/
The last update time for the file is: 4/22/2018 (v151)

v152 is stored in: ftp://ftp.ncbi.nih.gov/snp/latest_release/VCF

Also, from v152, dbSNP provides the JSON version of the data dump:
ftp://ftp.ncbi.nih.gov/snp/latest_release/JSON

Related post regarding the change from dbSNP: https://ncbiinsights.ncbi.nlm.nih.gov/2017/07/07/dbsnp-redesign-supports-future-data-expansion/

query variants with genename

Hi,
One task I'd like to run with myvariant.info is to return all variants in a gene. For example TP53, so I tried
http://myvariant.info/v1/query?q=TP53&fields=_id
which returns with count of 5918.
I also tried a query with the Ensembl ID
http://myvariant.info/v1/query?q=ENSG00000141510&fields=_id
which returns nothing.
Then I tried
http://myvariant.info/v1/query?q=dbnsfp.ensembl.geneid:ENSG00000141510&fields=_id
http://myvariant.info/v1/query?q=cadd.gene.gene_id:ENSG00000141510&fields=_id
which returns 3318 and 4539.

So my question is: when I search for just TP53, which fields are searched exactly? It seems the default query in Elasticsearch searches the _all field. Why can't I get any results back with just the Ensembl ID? Is a range query a better way to get all variants related to a gene? Or what is the best way to do this task with the myvariant.info API?

Thank you very much
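For reference, the two fielded queries above can be built programmatically. A small sketch that only constructs the URLs and does not send them; the helper name is made up, and the field names are the two already used in this issue.

```python
from urllib.parse import urlencode

BASE_URL = "http://myvariant.info/v1/query"


def gene_query_url(field, value, fields="_id"):
    """Build a fielded query URL such as q=<field>:<value>."""
    params = {"q": f"{field}:{value}", "fields": fields}
    # urlencode percent-encodes the ':' separator as %3A.
    return f"{BASE_URL}?{urlencode(params)}"


dbnsfp_url = gene_query_url("dbnsfp.ensembl.geneid", "ENSG00000141510")
cadd_url = gene_query_url("cadd.gene.gene_id", "ENSG00000141510")
```

Since the two fields return different counts (3318 and 4539), a client that wants every variant in a gene would probably need to run both queries and take the union of the returned _id values.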

load data from ClinGen VCI database

Matt Wright and Jimmy Zhen from the ClinGen team seemed interested in this idea at the CIViC hackathon. Need to reach out to them for more info on logistics...

Better format when using both always_list and allow_null options?

In the recent release, I noticed that there are some handy new features, including the always_list and allow_null options. But when they are used in combination, the result is probably not in the nicest format. Instead of returning an empty list [] when there's no data, it returns a list containing a single null, like so: [null].

This will cause some confusion on the client side, since usually you would check whether the returned list is empty, as opposed to checking whether each element in the list is null.

A sample request to reproduce this error would be:

https://myvariant.info/v1/query?q=rs12131234&fields=dbsnp&always_list=dbsnp.gene&allow_null=dbsnp.gene

I'm wondering if it's possible to change this behaviour? Thanks.
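In the meantime, a client can normalize the response itself. A minimal sketch (the helper name is made up, and the sample response is a hand-written stand-in for the [null] case, not real API output):

```python
def drop_nulls(value):
    """Recursively remove None entries from lists, so [None] becomes []."""
    if isinstance(value, list):
        return [drop_nulls(item) for item in value if item is not None]
    if isinstance(value, dict):
        return {key: drop_nulls(item) for key, item in value.items()}
    return value


# Hand-written stand-in for a hit where dbsnp.gene has no data.
hit = {"dbsnp": {"gene": [None]}}
cleaned = drop_nulls(hit)
```

After this step the usual emptiness check on the list works again. Null values stored directly under a key (outside a list) are left untouched.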

live query API does not work for some ClinVar RCVs

RCV000008604, RCV000008605, RCV000008606 and RCV000008607 share one variant (ClinVar variation 8131, also called measureSet id and variant id in their XML file). The API works only for RCV000008604, not for any of the others. The call used was mv.querymany(['RCV000008604'], scopes='clinvar.rcv_accession', fields='clinvar.clinvar_id')

how to query a position with a POST

I would like to query the following variants using POST (i.e. on http://myvariant.info/v1/query):

q="chr1:54844G>A,chr1:61987A>G,chr1:61989G>C,chr1:86018C>G,chr1:86303G>T"

I've tried the above parameters, but it returns the following:

[
  {
    "query": "chr1:54844G>A",
    "notfound": true
  }
]

I understand that I also need to input a scope in order to make it work but I'm not sure what the scope should be in this case...

Thanks
Ismail
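One thing that stands out is that the ids above lack the "g." that MyVariant _ids use elsewhere in these issues (e.g. chr1:g.35367G>A). A sketch of converting them and building a form body; it assumes batch lookups by id go to the /v1/variant endpoint with an ids parameter, which should be checked against the API documentation. The helper names are made up and nothing is sent over the network here.

```python
import re
from urllib.parse import urlencode

# Matches simple substitutions written without the HGVS "g." prefix.
_SHORT_SNV = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)([ACGT])>([ACGT])$")


def to_hgvs_id(variant):
    """Turn 'chr1:54844G>A' into 'chr1:g.54844G>A'; leave other input as is."""
    match = _SHORT_SNV.match(variant)
    if match is None:
        return variant
    chrom, pos, ref, alt = match.groups()
    return f"{chrom}:g.{pos}{ref}>{alt}"


def build_post_body(variants):
    """Form-encode a comma-separated 'ids' field (assumed parameter name)."""
    return urlencode({"ids": ",".join(to_hgvs_id(v) for v in variants)})


variants = ["chr1:54844G>A", "chr1:61987A>G", "chr1:61989G>C"]
hgvs_ids = [to_hgvs_id(v) for v in variants]
post_body = build_post_body(["chr1:54844G>A"])
```

The resulting body could then be sent with any HTTP client using the application/x-www-form-urlencoded content type.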

Unable to run clinvar_xml_parser dataloader

The clinvar_xml_parser.py data loader is referencing a clinvar or clinvar1 import that is not listed in the requirements:
https://github.com/SuLab/myvariant.info/blob/master/src/dataload/contrib/clinvar/clinvar_xml_parser.py#L5

It's changed from clinvar to clinvar1 - is the clinvar library that does the parseString() call available from you or is it a separate 3rd party lib to be installed?

https://github.com/SuLab/myvariant.info/blob/master/src/dataload/contrib/clinvar/clinvar_xml_parser.py#L315
record_parsed = clinvar1.parseString(record, silence=1)

load VICC-harmonized data

From Alex Wagner, this link https://s3-us-west-2.amazonaws.com/g2p-0.10/index.html has the current release of the VICC-harmonized data (described in https://www.biorxiv.org/content/early/2018/07/11/366856). It is subject to change as that manuscript goes through peer review. But once that's done and the data set is finalized, seems like a good source to import. (obviously we already have civic data directly, but this resource will provide access to several other sources as well in a standardized format.)

cc @ahwagner
