clinAnno

clinAnno is a collection of Python scripts designed to annotate genetic variants in a .vcf file. clinAnno currently provides annotations for PS1 and PM5 variants according to criteria set forth in the ACMG recommendations. The ClinVar database is used to identify previously established, pathogenic variants.

Consider this pathogenic missense ClinVar variant in AGRN:

#CHROM POS ID REF ALT QUAL FILTER INFO
1 985955 rs199476396 G C . . NM_198576.3(AGRN):c.5125G>C (p.Gly1709Arg)

A variant is annotated with PS1 when it has the same amino acid change as an established pathogenic variant, regardless of nucleotide change.

#CHROM POS ID REF ALT QUAL FILTER INFO
1 985955 . G A . . PS1=18241;NM_198576.3(AGRN):c.5125G>A (p.Gly1709Arg)

Notice:
The ClinVar variant results in Arg from a CGG codon
The example variant results in Arg from an AGG codon
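
The PS1 comparison above can be sketched in a few lines of Python. This is only an illustration, not clinAnno's own code: the reference codon GGG is inferred from the codons listed above, and the names `CODON_TO_AA` and `substitute` are hypothetical.

```python
# Minimal codon table covering only the codons in the AGRN example.
CODON_TO_AA = {
    "GGG": "Gly",  # reference codon at p.1709 (inferred from the example)
    "CGG": "Arg",  # ClinVar variant, c.5125G>C
    "AGG": "Arg",  # example variant, c.5125G>A
}


def substitute(codon, offset, alt_base):
    """Return the codon with the base at `offset` (0-based) replaced."""
    return codon[:offset] + alt_base + codon[offset + 1:]


ref_codon = "GGG"
clinvar_codon = substitute(ref_codon, 0, "C")  # c.5125G>C
example_codon = substitute(ref_codon, 0, "A")  # c.5125G>A

# Different nucleotide changes, same amino acid: the PS1 situation.
same_aa = CODON_TO_AA[clinvar_codon] == CODON_TO_AA[example_codon]
print(clinvar_codon, example_codon, same_aa)  # CGG AGG True
```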


A variant is annotated PM5 when "a novel missense amino acid change occurs at the same position as another pathogenic missense change".

#CHROM POS ID REF ALT QUAL FILTER INFO
1 985956 . G C . . PM5=18241;NM_198576.3(AGRN):c.5126G>C (p.Gly1709Ala)

Again, notice:
The ClinVar variant results in Arg from a CGG codon
The example variant results in Ala from a GCG codon
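
Taken together, the two rules amount to a lookup on the protein position followed by a comparison of the resulting amino acid. The sketch below assumes a hypothetical lookup table keyed by (reference amino acid, position); clinAnno's actual data structure may differ, and the function does not check that the change is truly missense.

```python
def classify(ref_aa, position, alt_aa, pathogenic):
    """Return 'PS1', 'PM5', or None for an amino acid change.

    `pathogenic` is a hypothetical table mapping (ref_aa, position)
    to a dict of {alt_aa: ClinVar variation ID}.
    """
    known = pathogenic.get((ref_aa, position))
    if not known or alt_aa == ref_aa:
        return None   # no pathogenic change here, or a synonymous change
    if alt_aa in known:
        return "PS1"  # same amino acid change as a pathogenic variant
    return "PM5"      # different change at the same position


# The AGRN example: p.Gly1709Arg is the established pathogenic change.
pathogenic = {("Gly", 1709): {"Arg": 18241}}

print(classify("Gly", 1709, "Arg", pathogenic))  # PS1
print(classify("Gly", 1709, "Ala", pathogenic))  # PM5
print(classify("Gly", 1710, "Ala", pathogenic))  # None
```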

Installation

Clone from GitHub:

git clone https://github.com/arvkevi/clinAnno.git

Requirements

  1. An annotated .vcf file that contains amino acid change information in HGVS nomenclature (e.g., p.Trp26Cys), such as the output of Variant Effect Predictor.
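
To illustrate what the required annotation looks like in practice, here is a minimal sketch that pulls a three-letter HGVS protein change out of an annotation string. The regular expression and the function name `parse_protein_change` are assumptions made for this example; clinAnno may parse the field differently, and the pattern deliberately ignores frameshifts and other non-substitution notations.

```python
import re

# Three-letter reference residue, position, three-letter alternate residue.
# The negative lookahead rejects longer notations such as frameshifts.
HGVS_PROTEIN = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})(?![A-Za-z])")


def parse_protein_change(text):
    """Return (ref_aa, position, alt_aa), or None if no change is found."""
    match = HGVS_PROTEIN.search(text)
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


print(parse_protein_change("NM_198576.3(AGRN):c.5125G>C (p.Gly1709Arg)"))
# ('Gly', 1709, 'Arg')
```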

Usage

clinVar_parser.py -- parse & save clinVar

clinVar_parser.py should be executed once, before any .vcf annotations. Runtime exceeds 15 minutes.

clinAnno$ python clinVar_parser.py

It will save a snapshot of all "pathogenic" and/or "conflicting" SNVs and Indels from clinVar as a Python dictionary object (pickle file) in the current working directory.

After the initial run, you do not need to execute clinVar_parser.py again until ClinVar releases an update, though you may re-run it whenever you like.

clinAnno.py -- .vcf annotations

After executing clinVar_parser.py, the file clinVar_obj.p should be in your repo.

Now, annotate any .vcf that carries HGVS amino acid change annotations. The example, clinvar.chr1.anno.vcf, contains chromosome 1 variants from ClinVar's .vcf:

clinAnno$ python clinAnno.py --vcf_in=clinvar.chr1.anno.vcf --vcf_out=clinvar.chr1.anno.clinAnno.vcf

Processing for large .vcf files can be sped up with the --nproc parameter:

clinAnno$ python clinAnno.py --vcf_in=clinvar.chr1.anno.vcf --vcf_out=clinvar.chr1.anno.clinAnno.vcf --nproc=4

clinAnno has been tested successfully on multi-sample and gzipped .vcf files.

Results

The output of clinAnno is a copy of the original .vcf with the following changes:
If no PS1 or PM5 variants are found in clinVar, the record (variant) will be unchanged.
Otherwise, the record (variant) will have either PS1=varid(s) or PM5=varid(s) prepended to the INFO field, with multiple entries separated by ;.
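
The rule above can be sketched as a small helper. This is a hypothetical illustration rather than clinAnno's implementation; the function name `prepend_annotation` and its `sep` parameter are assumptions, with the default separator chosen to match the behaviour described above.

```python
def prepend_annotation(info, tag, variation_ids, sep=";"):
    """Prepend e.g. 'PS1=183381' to a VCF INFO string.

    Records with no matching ClinVar IDs are returned unchanged.
    """
    if not variation_ids:
        return info
    annotation = f"{tag}=" + sep.join(str(v) for v in variation_ids)
    if info in ("", "."):
        return annotation  # INFO was empty, so the annotation stands alone
    return f"{annotation};{info}"


print(prepend_annotation("RS=786201005;RSPOS=949523", "PS1", [183381]))
# PS1=183381;RS=786201005;RSPOS=949523
```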

The first two records from the clinAnno annotated .vcf:

clinAnno$ grep -v '^##' clinvar.chr1.anno.clinAnno.vcf | head -3
#CHROM POS ID REF ALT QUAL FILTER INFO
1 949523 rs786201005 C T . . PS1=183381;RS=786201005;RSPOS=949523;dbSNPBuildID=144;
1 949696 rs672601345 C CG . . RS=672601345;RSPOS=949699;dbSNPBuildID=142;

The INFO field has been truncated.
Notice that the second variant was not annotated with either PS1= or PM5=, indicating that it did not meet either criterion.

The integer after PS1= is ClinVar's unique variation identifier, which leads to the variant's landing page in ClinVar:
http://www.ncbi.nlm.nih.gov/clinvar/variation/183381/
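
As a convenience, the landing-page URL can be derived from an annotated INFO field. The sketch below is an assumption-laden example: the helper `first_variation_url` is hypothetical and only reads the first ID after a PS1= or PM5= tag, and the URL pattern is copied from the link above.

```python
import re

CLINVAR_VARIATION_URL = "http://www.ncbi.nlm.nih.gov/clinvar/variation/{}/"
ANNOTATION = re.compile(r"\b(PS1|PM5)=(\d+)")


def first_variation_url(info):
    """Return the ClinVar URL for the first PS1/PM5 ID in INFO, or None."""
    match = ANNOTATION.search(info)
    if match is None:
        return None
    return CLINVAR_VARIATION_URL.format(match.group(2))


print(first_variation_url("PS1=183381;RS=786201005;RSPOS=949523;"))
# http://www.ncbi.nlm.nih.gov/clinvar/variation/183381/
```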

Citation

Richards, Sue, et al. "Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology." Genetics in Medicine (2015).

clinanno's Issues

not vcf 4.2 compliant

When more than one ClinVar unique ID is annotated to either PS1 or PM5, the list of IDs should be separated by commas, not semicolons.

Current:
PS1=1234;9876;PM5=3333;

Proposed (vcf 4.2):
PS1=1234,9876;PM5=3333;


Also, two meta-info lines describing the annotations should be added to the header.
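
A possible shape for the proposed output is sketched below. Nothing here is taken from clinAnno itself: the helper `format_annotation` is hypothetical, and the Description strings in the two ##INFO lines are placeholder wording. The ##INFO layout (ID, Number, Type, Description) follows the VCF 4.2 meta-information format, with Number=. for a list of unknown length.

```python
def format_annotation(tag, variation_ids):
    """Join multiple ClinVar IDs with commas, as VCF 4.2 expects."""
    return f"{tag}=" + ",".join(str(v) for v in variation_ids)


# Hypothetical header lines; the Description text is illustrative only.
META_INFO_LINES = [
    '##INFO=<ID=PS1,Number=.,Type=Integer,'
    'Description="ClinVar variation IDs with the same amino acid change">',
    '##INFO=<ID=PM5,Number=.,Type=Integer,'
    'Description="ClinVar variation IDs with a different missense change '
    'at the same position">',
]

info = ";".join([
    format_annotation("PS1", [1234, 9876]),
    format_annotation("PM5", [3333]),
])
print(info)  # PS1=1234,9876;PM5=3333
```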
