COLOC wrapper

This pipeline facilitates easy usage of coloc (Giambartolomei et al. 2014, Wallace 2020) with GWAS and eQTL data.

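Colocalization analysis tests whether two association signals in the same genomic region, for example a GWAS trait and the expression of a nearby gene, are consistent with a shared causal variant.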

Coloc-wrapper performs genetic colocalization analysis for GWAS and eQTL datasets in a given region using the coloc.abf() function from the coloc R package. For each gene in the region, it calculates posterior probabilities for the following five hypotheses, under the assumption of a single causal variant for each trait:

H0: no association
H1: association to trait 1 only
H2: association to trait 2 only
H3: association to both traits, distinct causal variants
H4: association to both traits, shared causal variant

The posterior probability of hypothesis 4 (PP4) measures the evidence for colocalization. A commonly used threshold is PP4 > 0.8.
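
As a rough illustration of where these five posteriors come from, the sketch below combines per-SNP approximate Bayes factors (ABFs) for two traits with the priors p1, p2 and p12, following the single-causal-variant model of Giambartolomei et al. (2014). The function name `coloc_posteriors` and the ABF values are hypothetical, and the coloc package itself works on the log scale for numerical stability, so treat this as a reading aid rather than the package's implementation.

```shell
# Combine per-SNP approximate Bayes factors (ABFs) into posteriors for H0-H4.
# Input on stdin: snp, ABF for trait 1, ABF for trait 2 (whitespace separated).
# Arguments: p1 p2 p12 (prior probabilities per SNP).
coloc_posteriors() {
  awk -v p1="$1" -v p2="$2" -v p12="$3" '
    { s1 += $2; s2 += $3; s12 += $2 * $3 }
    END {
      h[0] = 1                           # H0: no association (reference)
      h[1] = p1 * s1                     # H1: trait 1 only
      h[2] = p2 * s2                     # H2: trait 2 only
      h[3] = p1 * p2 * (s1 * s2 - s12)   # H3: both traits, different SNPs
      h[4] = p12 * s12                   # H4: both traits, same SNP
      total = h[0] + h[1] + h[2] + h[3] + h[4]
      for (k = 0; k <= 4; k++) printf "PP.H%d\t%.4f\n", k, h[k] / total
    }'
}

# Toy input (invented values): one SNP with strong evidence for both traits.
posteriors=$(printf '%s\n' \
  'rs_a 1.2 0.9' \
  'rs_b 10000 20000' \
  'rs_c 3.1 1.4' | coloc_posteriors 1e-4 1e-4 5e-6)
printf '%s\n' "$posteriors"
```

In this toy input a single SNP carries almost all the evidence for both traits, so nearly all posterior mass lands on H4.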

Getting started

To get started, look at this minimal example.

I. On a local machine

  • R version >3.6.2
  • R-packages: "coloc", "data.table", "ggplot2", "optparse", "R.utils"
  • tabix

II. Using Docker

sudo docker build -t coloc-wrapper -f docker/Dockerfile .

sudo docker run -it -v /mnt/disks/1/projects/COLOC:/COLOC -w /COLOC coloc-wrapper /bin/bash

Data

The input files are the following:

1. GWAS summary statistics

2. eQTL summary statistics

eQTL data can be found here: https://www.ebi.ac.uk/eqtl/

3. Genomic region

Usage

Running coloc-wrapper involves two steps:

  1. Trimming data, both GWAS and eQTL, according to a predefined region
  2. Running coloc

1. Trimming data

  • file: path or URL of the GWAS or eQTL file
  • region: genomic region of interest, format chr:start-end
  • out: output file
Rscript extdata/step1_subset_data.R \
	--file=ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/Lepik_2017/ge/Lepik_2017_ge_blood.all.tsv.gz \
	--region="1:10565520-10965520" \
	--out=tmp.txt
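
To make the region format concrete, here is a minimal sketch of what trimming to chr:start-end amounts to. It is not the logic of step1_subset_data.R (which is not shown here and may well rely on tabix, listed among the requirements). The helper name `subset_region`, the column layout (chromosome in column 1, position in column 2, tab-separated) and the toy rows are all assumptions made for illustration.

```shell
# Split a region string of the form chr:start-end with POSIX parameter expansion.
region="1:10565520-10965520"
chr=${region%%:*}      # text before the colon
range=${region#*:}     # text after the colon
start=${range%%-*}     # text before the dash
end=${range#*-}        # text after the dash

# Keep the header plus rows whose chromosome and position fall inside the region.
# Assumes a tab-separated table: chromosome in column 1, position in column 2.
subset_region() {
  awk -F'\t' -v chr="$chr" -v start="$start" -v end="$end" '
    NR == 1 { print; next }
    $1 == chr && $2 + 0 >= start + 0 && $2 + 0 <= end + 0'
}

# Toy input (invented rows); only the second data row lies inside the region.
subset=$(printf 'chromosome\tposition\tpvalue\n1\t10000000\t0.5\n1\t10700000\t1e-8\n2\t10700000\t0.2\n' | subset_region)
printf '%s\n' "$subset"
```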

2. Running coloc

  • gwas: GWAS summary statistics file for one region
  • eqtl: eQTL summary statistics file for one region
  • header_gwas: header of GWAS file, named vector in quotes
  • header_eqtl: header of eQTL file, named vector in quotes
  • info_gwas: options for the GWAS dataset (see the coloc documentation for details)
    • type: the type of data in the dataset, either "quant" (quantitative) or "cc" (case-control)
    • s: for a case-control dataset, the proportion of samples in the dataset that are cases
    • N: number of samples in the dataset
  • info_eqtl: options for the eQTL dataset (see the coloc documentation for details)
    • type: the type of data in the dataset, either "quant" (quantitative) or "cc" (case-control)
    • sdY: for a quantitative trait, the population standard deviation of the trait. If not given, it can be estimated from the vectors of varbeta and MAF
    • N: number of samples in the dataset
  • p1: the prior probability that any random SNP in the region is associated with exactly trait 1
  • p2: the prior probability that any random SNP in the region is associated with exactly trait 2
  • p12: the prior probability that any random SNP in the region is associated with both traits
  • locuscompare_thresh: PP4 threshold for drawing locuscompare plots
  • out: output file
Rscript extdata/step2_run_coloc.R	\
	--eqtl="extdata/Lepik_2017_ge_blood_chr1_ENSG00000142655_ENSG00000130940.all.tsv" \
	--gwas="extdata/I9_VARICVE_chr1.tsv.gz" \
	--header_eqtl="c(varid = 'rsid', pvalues = 'pvalue', MAF = 'maf', gene_id = 'gene_id')" \
	--header_gwas="c(varid = 'rsids', pvalues = 'pval', MAF = 'maf')" \
	--info_gwas="list(type = 'cc', s = 11006/(11006 + 117692), N = 11006 + 117692)" \
	--info_eqtl="list(type = 'quant', sdY = 1, N = 491)" \
	--p1=1e-4 \
	--p2=1e-4 \
	--p12=5e-6 \
	--locuscompare_thresh=0.8 \
	--out="Coloc_example.txt"
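
Because s is defined above as the proportion of samples that are cases, it can be derived from the case and control counts together with N. A minimal sketch, assuming the two numbers in the example command (11006 and 117692) are the case and control counts. That reading is an inference from the N expression, not something stated here, and the variable names are hypothetical.

```shell
# Counts taken from the example command; treating them as cases and controls
# is an assumption.
n_cases=11006
n_controls=117692

# N is the total sample size; s is the share of that total made up of cases.
n_total=$((n_cases + n_controls))
s=$(awk -v a="$n_cases" -v n="$n_total" 'BEGIN { printf "%.6f", a / n }')

# Assemble the value that would be passed to --info_gwas.
info_gwas="list(type = 'cc', s = ${s}, N = ${n_total})"
printf '%s\n' "$info_gwas"
```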

Output

Text file

  • gene_id: gene identifier
  • nsnps: number of SNPs included in colocalization
  • PP.H0.abf: Posterior probability that neither trait has a genetic association in the region
  • PP.H1.abf: Posterior probability that only trait 1 has a genetic association in the region
  • PP.H2.abf: Posterior probability that only trait 2 has a genetic association in the region
  • PP.H3.abf: Posterior probability that both traits are associated, but with different causal variants
  • PP.H4.abf: Posterior probability that both traits are associated and share a single causal variant

For more details on the output columns, see the coloc package documentation.
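
A typical follow-up is to pull out the genes that pass a PP4 threshold. The sketch below assumes the output text file is tab-separated with a header row that uses the column names listed above; the delimiter is an assumption, and the helper name `filter_pp4` is hypothetical. The gene IDs are borrowed from the example eQTL file name and the numbers are invented.

```shell
# Print the header and every gene whose PP.H4.abf exceeds the threshold.
# The PP.H4.abf column is located by name, so column order does not matter.
filter_pp4() {
  awk -F'\t' -v thresh="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PP.H4.abf") col = i; print; next }
    col && $col + 0 > thresh + 0'
}

# Toy results table (numbers invented for illustration).
hits=$({
  printf 'gene_id\tnsnps\tPP.H0.abf\tPP.H1.abf\tPP.H2.abf\tPP.H3.abf\tPP.H4.abf\n'
  printf 'ENSG00000142655\t1200\t0.01\t0.02\t0.03\t0.04\t0.90\n'
  printf 'ENSG00000130940\t950\t0.50\t0.20\t0.20\t0.08\t0.02\n'
} | filter_pp4 0.8)
printf '%s\n' "$hits"
```

On a real run, the same function could read the file given to --out, for example `filter_pp4 0.8 < Coloc_example.txt`, provided the delimiter assumption holds.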

Alternative coloc-wrapper

An alternative coloc-wrapper: https://github.com/eQTL-Catalogue/colocalisation

Unit tests

Rscript -e 'testthat::test_dir("tests/testthat/")'

References

  • Original coloc paper: Giambartolomei, C., Vukcevic, D., Schadt, E.E., Franke, L., Hingorani, A.D., Wallace, C., Plagnol, V., 2014. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLOS Genetics 10, e1004383. https://doi.org/10.1371/journal.pgen.1004383
  • Updates on coloc: Wallace, C., 2020. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLOS Genetics 16, e1008720. https://doi.org/10.1371/journal.pgen.1008720
  • Importance of visualizing locus: Liu, B., Gloudemans, M.J., Rao, A.S., Ingelsson, E., Montgomery, S.B., 2019. Abundant associations with gene expression complicate GWAS follow-up. Nat Genet 51, 768–769. https://doi.org/10.1038/s41588-019-0404-0 (see also locuscomparer)

Contributors

emiliavartiainen, sinarueeger
