Personal Cancer Genome Reporter (PCGR) - variant interpretation report for precision oncology

Overview

The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from Ensembl's Variant Effect Predictor (VEP) with oncology-relevant, up-to-date annotations retrieved flexibly through vcfanno, and produces interactive HTML reports intended for clinical interpretation (Figure 1).

PCGR overview

News

  • November 29th 2017: 0.5.3 release
    • Fixed bug with propagation of default options
  • November 23rd 2017: 0.5.2 release
  • November 15th 2017: 0.5.1 pre-release
    • Bug fixing (VCF validation)
  • November 14th 2017: 0.5.0 pre-release
    • Updated version of VEP (v90)
    • Updated versions of ClinVar, Uniprot KB, CIViC, CBMDB
    • Removal of ExAC (replaced by gnomAD), removal of COSMIC due to licensing restrictions
    • Users can analyze samples run without matching control (i.e. tumor-only)
    • PCGR pipeline is now configured through a TOML-based configuration file
    • Bug fixes / general speed improvements
    • Work in progress: Export of report data through JSON

Example reports

PCGR documentation


If you use PCGR, please cite our recent publication:

Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aasheim, Ola Myklebost, and Eivind Hovig. Personal Cancer Genome Reporter: variant interpretation report for precision oncology (2017). Bioinformatics (in press). doi:10.1093/bioinformatics/btx817

Annotation resources included in PCGR (v0.5.3)

  • VEP v90 - Variant Effect Predictor release 90 (GENCODE v27 as the gene reference dataset)
  • dbNSFP v3.4 - Database of non-synonymous functional predictions (March 2017)
  • gnomAD r1 - Germline variant frequencies exome-wide (March 2017)
  • dbSNP b147 - Database of short genetic variants (April 2016)
  • 1000 Genomes Project - phase3 - Germline variant frequencies genome-wide (May 2013)
  • TCGA release 9.0 - somatic mutations discovered across 33 tumor type cohorts (The Cancer Genome Atlas)
  • ClinVar - Database of clinically related variants (November 2017)
  • DoCM - Database of curated mutations (v3.2, April 2016)
  • CIViC - Clinical interpretations of variants in cancer (November 11th 2017)
  • CBMDB - Cancer Biomarkers database (November 11th 2017)
  • IntOGen catalog of driver mutations - (May 2016)
  • DisGeNET - Database of curated gene-tumor type associations (May 2017)
  • Cancer Hotspots - Resource for statistically significant mutations in cancer (2016)
  • UniProt/SwissProt KnowledgeBase 2017_10 - Resource on protein sequence and functional information (October 2017)
  • Pfam v31 - Database of protein families and domains (March 2017)
  • DGIdb - Database of targeted cancer drugs (v3.0, September 2017)
  • TSGene v2.0 - Tumor suppressor/oncogene database (November 2015)

Getting started

STEP 0: Python

A local installation of Python (tested with version 2.7.13) is required to run PCGR. Check that Python is installed by typing python --version in a terminal window. PCGR also needs a Python library for parsing configuration files written in TOML, which can be installed with the following command:

pip install toml
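
Whether the library is visible to the interpreter can also be checked from Python itself. The following is a minimal sketch; the helper name module_available is an illustrative assumption and not part of PCGR.

```python
import sys


def module_available(name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


# Report the interpreter version and whether the TOML parser is installed
print("Python %d.%d.%d" % sys.version_info[:3])
print("toml installed: %s" % module_available("toml"))
```

If the second line reports False, the pip command above was most likely run for a different Python installation than the one used here.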

STEP 1: Installation of Docker

  1. Install the Docker engine on your preferred platform
    • installing Docker on Linux
    • installing Docker on Mac OS
    • NOTE: We have not yet been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with PCGR (an example being mounting of data volumes)
  2. Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window
  3. Adjust the computing resources dedicated to Docker, i.e.:

STEP 2: Download PCGR

  1. Download and unpack the latest software release (0.5.3)

  2. Download and unpack the data bundle (approx. 16 GB) in the PCGR directory

    • Download the accompanying data bundle from Google Drive to ~/pcgr-X.X (replace X.X with the version number, e.g. ~/pcgr-0.5.3)
    • Unpack the data bundle, e.g. through the following Unix command: gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -

    A data/ folder within the pcgr-X.X software folder should now have been produced

  3. Pull the PCGR Docker image (0.5.3) from DockerHub (approx. 4.2 GB):

    • docker pull sigven/pcgr:0.5.3 (PCGR annotation engine)
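
For the unpacking step in item 2, Python's standard tarfile module can stand in for the gzip/tar pipeline on systems where those tools are unavailable. This is a sketch under assumptions: the function name unpack_data_bundle is hypothetical, and the check for a data/ folder simply mirrors the expectation stated above.

```python
import os
import tarfile


def unpack_data_bundle(bundle_path, pcgr_dir):
    """Extract a gzipped tar bundle into pcgr_dir and return the data folder path."""
    # "r:gz" reads a gzip-compressed tar archive (.tgz / .tar.gz)
    with tarfile.open(bundle_path, "r:gz") as archive:
        archive.extractall(pcgr_dir)
    data_dir = os.path.join(pcgr_dir, "data")
    # The bundle is expected to produce a data/ folder inside the PCGR directory
    if not os.path.isdir(data_dir):
        raise RuntimeError("No data/ folder found in %s after unpacking" % pcgr_dir)
    return data_dir
```

As with any archive extraction, this should only be run on a bundle obtained from a trusted source.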

STEP 3: Input preprocessing

The PCGR workflow accepts two types of input files:

  • An unannotated, single-sample VCF file (>= v4.2) with called somatic variants (SNVs/InDels)
  • A copy number segment file

NOTE: GRCh37 is currently the only supported reference genome build

PCGR can be run with either or both of the two input files present.

  • We strongly recommend that the input VCF is compressed and indexed using bgzip and tabix
  • If the input VCF contains multi-allelic sites, these will be subject to decomposition
  • Variants used for reporting should be designated as 'PASS' in the VCF FILTER column
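
Before running PCGR, it can be useful to see how many records in a VCF are marked 'PASS' and how many sites are multi-allelic. The sketch below reads plain VCF text lines and relies only on the standard VCF column order (ALT is column 5, FILTER is column 7); the function name summarize_vcf_records is an illustrative assumption and is not how PCGR itself validates input.

```python
def summarize_vcf_records(lines):
    """Count PASS records and multi-allelic sites in an iterable of VCF text lines."""
    summary = {"total": 0, "pass": 0, "multiallelic": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError("VCF record has fewer than 8 columns: %r" % line)
        alt, filt = fields[4], fields[6]
        summary["total"] += 1
        if filt == "PASS":
            summary["pass"] += 1
        if "," in alt:
            # several comma-separated ALT alleles mark a multi-allelic site
            summary["multiallelic"] += 1
    return summary
```

Because bgzip output is gzip-compatible, a compressed file can be passed as gzip.open(path, "rt") in Python 3.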

The tab-separated values file with copy number aberrations MUST contain the following four columns:

  • Chromosome
  • Start
  • End
  • Segment_Mean

Here, Chromosome, Start, and End denote the chromosomal segment (GRCh37), and Segment_Mean denotes the log2 ratio for a particular segment, which is a common output of somatic copy number alteration callers. The initial part of a copy number segment file that is correctly formatted according to PCGR's requirements is shown below:

Chromosome	Start	End	Segment_Mean
1	3218329	3550598	0.0024
1	3552451	4593614	0.1995
1	4593663	6433129	-1.0277
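
The column requirements above can be checked with a few lines of Python before a file is handed to PCGR. This is a minimal sketch written for Python 3; read_cna_segments is a hypothetical helper, and the rule that Start must not exceed End is an assumption added for illustration, not a documented PCGR check.

```python
import csv
import io

REQUIRED_COLUMNS = ["Chromosome", "Start", "End", "Segment_Mean"]


def read_cna_segments(handle):
    """Parse a tab-separated CNA segment file and return a list of row dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("Missing required column(s): %s" % ", ".join(missing))
    segments = []
    for row in reader:
        # A row with too few tab-separated fields yields None for the rest
        if any(row.get(c) is None for c in REQUIRED_COLUMNS):
            raise ValueError("Row is not tab-separated with four fields: %r" % row)
        start, end = int(row["Start"]), int(row["End"])
        if start > end:  # assumption: segments are given with Start <= End
            raise ValueError("Segment start exceeds end: %r" % row)
        segments.append({
            "Chromosome": row["Chromosome"],
            "Start": start,
            "End": end,
            "Segment_Mean": float(row["Segment_Mean"]),
        })
    return segments


# Usage with the example rows shown above, held in memory
example = io.StringIO(
    "Chromosome\tStart\tEnd\tSegment_Mean\n"
    "1\t3218329\t3550598\t0.0024\n"
    "1\t3552451\t4593614\t0.1995\n"
    "1\t4593663\t6433129\t-1.0277\n"
)
segments = read_cna_segments(example)
print("parsed %d segments" % len(segments))
```

A file whose fields are separated by spaces instead of tabs is rejected with a ValueError, which is the most common formatting mistake with this kind of input.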

STEP 4: Run example

A tumor sample report is generated by calling the Python script pcgr.py, which takes the following arguments and options:

usage: pcgr.py [-h] [--input_vcf INPUT_VCF] [--input_cna INPUT_CNA]
               [--force_overwrite] [--version]
               pcgr_dir output_dir configuration_file sample_id

Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
somatic nucleotide variants and copy number aberration segments

positional arguments:
  pcgr_dir              PCGR base directory with accompanying data directory,
                        e.g. ~/pcgr-0.5.3
  output_dir            Output directory
  configuration_file    PCGR configuration file (TOML format)
  sample_id             Tumor sample/cancer genome identifier - prefix for
                        output files

optional arguments:
  -h, --help            show this help message and exit
  --input_vcf INPUT_VCF
                        VCF input file with somatic query variants
                        (SNVs/InDels). Note: GRCh37 is currently the only
                        reference genome build supported (default: None)
  --input_cna INPUT_CNA
                        Somatic copy number alteration segments (tab-separated
                        values) (default: None)
  --force_overwrite     By default, the script will fail with an error if any
                        output file already exists. You can force the
                        overwrite of existing result files by using this flag
                        (default: False)
  --version             show program's version number and exit
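
When PCGR is run for many samples, the usage synopsis above can be wrapped in a small helper that assembles the argument list. The sketch below only uses the arguments and flags listed in the help text; the function name build_pcgr_command is hypothetical, and the requirement of at least one input file is an assumption based on the statement that PCGR runs with either or both inputs.

```python
import os


def build_pcgr_command(pcgr_dir, output_dir, configuration_file, sample_id,
                       input_vcf=None, input_cna=None, force_overwrite=False):
    """Assemble the argument list for pcgr.py following its usage synopsis."""
    if input_vcf is None and input_cna is None:
        # assumption: at least one of the two input files must be given
        raise ValueError("Provide at least one of input_vcf or input_cna")
    command = ["python", "pcgr.py"]
    if input_vcf is not None:
        command += ["--input_vcf", os.path.expanduser(input_vcf)]
    if input_cna is not None:
        command += ["--input_cna", os.path.expanduser(input_cna)]
    if force_overwrite:
        command.append("--force_overwrite")
    # Positional arguments follow in the order given by the usage line.
    # "~" is expanded here because no shell is involved when the list
    # is passed to the subprocess module.
    command += [
        os.path.expanduser(pcgr_dir),
        os.path.expanduser(output_dir),
        os.path.expanduser(configuration_file),
        sample_id,
    ]
    return command
```

The returned list can be passed to subprocess.check_call, with its cwd argument set to the folder that contains pcgr.py.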

The configuration file, formatted using TOML (an easy-to-read file format), enables the user to configure a number of options in the PCGR workflow, related to the following:

  • MSI prediction
  • Mutational signatures analysis
  • Coding target size - for mutational burden analysis
  • Tumor-only analysis options (i.e. exclusion of germline variants/enrichment for somatic calls)
  • VEP/vcfanno options
  • Specification of INFO tags in VCF that denote sequencing depth/allelic support of variants
  • Log-ratio thresholds for gains/losses in CNA analysis
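
To illustrate what log-ratio thresholds for gains and losses mean in practice, the sketch below labels a segment from its Segment_Mean value. The threshold values (0.8 and -0.8), the inclusive comparisons, and both function names are assumptions chosen for illustration; they are not taken from the PCGR configuration file or its defaults.

```python
def classify_segment(segment_mean, gain_threshold=0.8, loss_threshold=-0.8):
    """Label a segment as 'gain', 'loss' or 'neutral' from its log2 ratio."""
    if gain_threshold <= loss_threshold:
        raise ValueError("gain_threshold must be greater than loss_threshold")
    if segment_mean >= gain_threshold:
        return "gain"
    if segment_mean <= loss_threshold:
        return "loss"
    return "neutral"


def approximate_copy_number(segment_mean, ploidy=2):
    """Convert a log2 ratio to a copy number, assuming a pure diploid sample."""
    # log2 ratio = log2(copy_number / ploidy), so invert that relation
    return ploidy * 2.0 ** segment_mean
```

Under these assumed thresholds, the third segment of the example file in STEP 3 (Segment_Mean of -1.0277) would be labelled a loss, corresponding to roughly one remaining copy in a pure diploid sample.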

The examples folder contains input files from two tumor samples sequenced within TCGA, as well as a PCGR configuration file. A report for a colorectal tumor case can be generated by running the following command in your terminal window:

python pcgr.py --input_vcf ~/pcgr-0.5.3/examples/tumor_sample.COAD.vcf.gz --input_cna ~/pcgr-0.5.3/examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.5.3 ~/pcgr-0.5.3/examples ~/pcgr-0.5.3/examples/pcgr_configuration_examples.toml tumor_sample.COAD

This command will run the Docker-based PCGR workflow and produce the following output files in the examples folder:

  1. tumor_sample.COAD.pcgr.html - An interactive HTML report for clinical interpretation
  2. tumor_sample.COAD.pcgr.vcf.gz - VCF file with a rich set of annotations for precision oncology
  3. tumor_sample.COAD.pcgr.maf - A basic MAF file for use as input in downstream analyses with other tools (e.g. 2020plus, MutSigCV)
  4. tumor_sample.COAD.pcgr.snvs_indels.tiers.tsv - Tab-separated values file with variants organized according to tiers of functional relevance
  5. tumor_sample.COAD.pcgr.mutational_signatures.tsv - Tab-separated values file with estimated contributions by known mutational signatures and associated underlying etiologies
  6. tumor_sample.COAD.pcgr.snvs_indels.biomarkers.tsv - Tab-separated values file with clinical evidence items associated with biomarkers for diagnosis, prognosis or drug sensitivity/resistance
  7. tumor_sample.COAD.pcgr.cna_segments.tsv.gz - Tab-separated values file with annotations of gene transcripts that overlap with somatic copy number aberrations
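
After a run, the presence of these files can be verified with a short script. The sketch below derives the expected file names from the sample identifier using the suffixes listed above; the helper names are hypothetical, and it assumes that all seven files are produced, which may not hold when only one of the two input files was supplied.

```python
import os

# Suffixes taken from the list of output files above
OUTPUT_SUFFIXES = [
    "pcgr.html",
    "pcgr.vcf.gz",
    "pcgr.maf",
    "pcgr.snvs_indels.tiers.tsv",
    "pcgr.mutational_signatures.tsv",
    "pcgr.snvs_indels.biomarkers.tsv",
    "pcgr.cna_segments.tsv.gz",
]


def expected_outputs(output_dir, sample_id):
    """Return the full paths of the output files expected for a sample."""
    return [os.path.join(output_dir, "%s.%s" % (sample_id, suffix))
            for suffix in OUTPUT_SUFFIXES]


def missing_outputs(output_dir, sample_id):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(output_dir, sample_id)
            if not os.path.isfile(path)]
```

For the example run, missing_outputs(os.path.expanduser("~/pcgr-0.5.3/examples"), "tumor_sample.COAD") returns an empty list once every file is in place.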

Contact

[email protected]
