The pcgr from likelet

Personal Cancer Genome Reporter (PCGR)- variant interpretation report for precision oncology

Overview

The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package for functional annotation and translation of individual cancer genomes for precision oncology. It interprets both somatic SNVs/InDels and copy number aberrations. The software extends basic gene and variant annotations from the Ensembl’s Variant Effect Predictor (VEP) with oncology-relevant, up-to-date annotations retrieved flexibly through vcfanno, and produces interactive HTML reports intended for clinical interpretation (Figure 1).

News

November 29th 2017: 0.5.3 release
- Fixed bug with propagation of default options
November 23rd 2017: 0.5.2 release
November 15th 2017: 0.5.1 pre-release
- Bug fixing (VCF validation)
November 14th 2017: 0.5.0 pre-release
- Updated version of VEP (v90)
- Updated versions of ClinVar, Uniprot KB, CIViC, CBMDB
- Removal of ExAC (replaced by gnomAD), removal of COSMIC due to licensing restrictions
- Users can analyze samples run without matching control (i.e. tumor-only)
- PCGR pipeline is now configured through a TOML-based configuration file
- Bug fixes / general speed improvements
- Work in progress: Export of report data through JSON

Example reports

PCGR documentation

If you use PCGR, please cite our recent publication:

Sigve Nakken, Ghislain Fournous, Daniel Vodák, Lars Birger Aaasheim, Ola Myklebost, and Eivind Hovig. Personal Cancer Genome Reporter: variant interpretation report for precision oncology (2017). Bioinformatics (in press). doi:10.1093/bioinformatics/btx817

Annotation resources included in PCGR (v0.5.3)

VEP v90 - Variant Effect Predictor release 90 (GENCODE v27 as the gene reference dataset)
dBNSFP v3.4 - Database of non-synonymous functional predictions (March 2017)
gnomAD r1 - Germline variant frequencies exome-wide (March 2017)
dbSNP b147 - Database of short genetic variants (April 2016)
1000 Genomes Project - phase3 - Germline variant frequencies genome-wide (May 2013)
TCGA release 9.0 - somatic mutations discovered across 33 tumor type cohorts (The Cancer Genome Atlas)
ClinVar - Database of clinically related variants (November 2017)
DoCM - Database of curated mutations (v3.2, April 2016)
CIViC - Clinical interpretations of variants in cancer (November 11th 2017)
CBMDB - Cancer Biomarkers database (November 11th 2017)
IntOGen catalog of driver mutations - (May 2016)
DisGeNET - Database of curated gene-tumor type associations (May 2017)
Cancer Hotspots - Resource for statistically significant mutations in cancer (2016)
UniProt/SwissProt KnowledgeBase 2017_10 - Resource on protein sequence and functional information (October 2017)
Pfam v31 - Database of protein families and domains (March 2017)
DGIdb - Database of targeted cancer drugs (v3.0, September 2017)
TSGene v2.0 - Tumor suppressor/oncogene database (November 2015)

Getting started

STEP 0: Python

A local installation of Python (it has been tested with version 2.7.13) is required to run PCGR. Check that Python is installed by typing python --version in a terminal window. In addition, a Python library for parsing configuration files encoded with TOML is needed. To install, simply run the following command:

pip install toml

STEP 1: Installation of Docker

Install the Docker engine on your preferred platform
- installing Docker on Linux
- installing Docker on Mac OS
- NOTE: We have not yet been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with PCGR (an example being mounting of data volumes)
Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window
Adjust the computing resources dedicated to the Docker, i.e.:
- Memory: minimum 5GB
- CPUs: minimum 4
- How to - Mac OS X

STEP 2: Download PCGR

Download and unpack the latest software release (0.5.3)
Download and unpack the data bundle (approx. 16Gb) in the PCGR directory
- Download the accompanying data bundle from Google Drive to ~/pcgr-X.X (replace X.X with the version number, e.g ~/pcgr-0.5.3)
- Unpack the data bundle, e.g. through the following Unix command: gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -
A data/ folder within the pcgr-X.X software folder should now have been produced
Pull the PCGR Docker image (0.5.3) from DockerHub (approx 4.2Gb):
- docker pull sigven/pcgr:0.5.3 (PCGR annotation engine)

STEP 3: Input preprocessing

The PCGR workflow accepts two types of input files:

An unannotated, single-sample VCF file (>= v4.2) with called somatic variants (SNVs/InDels)
A copy number segment file

NOTE: GRCh37 is currently supported as the reference genome build

PCGR can be run with either or both of the two input files present.

We strongly recommend that the input VCF is compressed and indexed using bgzip and tabix
If the input VCF contains multi-allelic sites, these will be subject to decomposition
Variants used for reporting should be designated as 'PASS' in the VCF FILTER column

The tab-separated values file with copy number aberrations MUST contain the following four columns:

Chromosome
Start
End
Segment_Mean

Here, Chromosome, Start, and End denote the chromosomal segment (GRCh37), and Segment_Mean denotes the log(2) ratio for a particular segment, which is a common output of somatic copy number alteration callers. Below shows the initial part of a copy number segment file that is formatted correctly according to PCGR's requirements:

Chromosome	Start	End	Segment_Mean
1 3218329 3550598 0.0024
1 3552451 4593614 0.1995
1 4593663 6433129 -1.0277

STEP 4: Run example

A tumor sample report is generated by calling the Python script pcgr.py, which takes the following arguments and options:

usage: pcgr.py [-h] [--input_vcf INPUT_VCF] [--input_cna INPUT_CNA]
		 [--force_overwrite] [--version]
		 pcgr_dir output_dir configuration_file sample_id

Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of
somatic nucleotide variants and copy number aberration segments

positional arguments:
pcgr_dir              PCGR base directory with accompanying data directory,
				e.g. ~/pcgr-0.5.3
output_dir            Output directory
configuration_file    PCGR configuration file (TOML format)
sample_id             Tumor sample/cancer genome identifier - prefix for
				output files

optional arguments:
-h, --help            show this help message and exit
--input_vcf INPUT_VCF
				VCF input file with somatic query variants
				(SNVs/InDels). Note: GRCh37 is currently the only
				reference genome build supported (default: None)
--input_cna INPUT_CNA
				Somatic copy number alteration segments (tab-separated
				values) (default: None)
--force_overwrite     By default, the script will fail with an error if any
				output file already exists. You can force the
				overwrite of existing result files by using this flag
				(default: False)
--version             show program's version number and exit

The configuration file, formatted using TOML (an easy to read file format) enables the user to configure a number of options in the PCGR workflow, related to the following:

MSI prediction
Mutational signatures analysis
Coding target size - for mutational burden analysis
Tumor-only analysis options (i.e. exclusion of germline variants/enrichment for somatic calls)
VEP/vcfanno options
Specification of INFO tags in VCF that denote sequencing depth/allelic support of variants
Log-ratio thresholds for gains/losses in CNA analysis

The examples folder contain input files from two tumor samples sequenced within TCGA. It also contains a PCGR configuration file. A report for a colorectal tumor case can be generated by running the following command in your terminal window:

python pcgr.py --input_vcf ~/pcgr-0.5.3/examples/tumor_sample.COAD.vcf.gz --input_cna ~/pcgr-0.5.3/examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.5.3 ~/pcgr-0.5.3/examples ~/pcgr-0.5.3/examples/pcgr_configuration_examples.toml tumor_sample.COAD

This command will run the Docker-based PCGR workflow and produce the following output files in the examples folder:

tumor_sample.COAD.pcgr.html - An interactive HTML report for clinical interpretation
tumor_sample.COAD.pcgr.vcf.gz - VCF file with rich set of annotations for precision oncology
tumor_sample.COAD.pcgr.maf - A basic MAF file for use as input in downstream analyses with other tools (e.g. 2020plus, MutSigCV)
tumor_sample.COAD.pcgr.snvs_indels.tiers.tsv - Tab-separated values file with variants organized according to tiers of functional relevance
tumor_sample.COAD.pcgr.mutational_signatures.tsv - Tab-separated values file with estimated contributions by known mutational signatures and associated underlying etiologies
tumor_sample.COAD.pcgr.snvs_indels.biomarkers.tsv - Tab-separated values file with clinical evidence items associated with biomarkers for diagnosis, prognosis or drug sensitivity/resistance
tumor_sample.COAD.pcgr.cna_segments.tsv.gz - Tab-separated values file with annotations of gene transcripts that overlap with somatic copy number aberrations

Contact

[email protected]

likelet / pcgr Goto Github PK

pcgr's Introduction

Personal Cancer Genome Reporter (PCGR)- variant interpretation report for precision oncology

Overview

News

Example reports

PCGR documentation

Annotation resources included in PCGR (v0.5.3)

Getting started

STEP 0: Python

STEP 1: Installation of Docker

STEP 2: Download PCGR

STEP 3: Input preprocessing

STEP 4: Run example

Contact

pcgr's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent