Giter VIP home page Giter VIP logo

ldpred-funct's Introduction

LDpred-funct

Here we present the software for the method LDpredfunct described in Marquez-Luna, et al, "Modeling functional enrichment improves polygenic prediction accuracy in UK Biobank and 23andMe data sets", Biorxiv.

Installing LDpred-funct

As with most Python packages, configurating LDpred-funct is simple. You can either use git (which is installed on most systems) and clone this repository using the following git command:

git clone https://github.com/carlaml/LDpred-funct.git

Alternatively, you can simply download the source files and place them somewhere.

You also need to install the packages h5py, scipy, and libplinkio. With pip (see https://pip.pypa.io/en/latest/quickstart.html), one can install libplinkio using the following command: pip install plinkio

Input files

  1. Plink files. One plink file (binary PED format) per chromosome from the validation (insert the character "[1:22]" instead of the chromsome numbers). These are used as LD-reference panel and later to compute PRS.
  • Example: plinkfile="my_plink_files/Final.chrom[1:22]"
  • Use flag: --gf
  1. File with functional enrichments (see Gazal et al 2017 Nat Genet and Finucane et al 2015 Nat Genet).
    • First you will need to estimate the per-SNP heritability inferred using S-LDSC (see instructions here) under the baselineLD model (you can download Baseline-LD annotations here), and use the summary statistics that will be later used as training. When running S-LDSC make sure to use the --print-coefficients flag to get the regression coefficients.
    • After running S-LDSC:
      1. Get h2g estimate from the *.log file.
      2. Get the regression coefficients from the *.results file (column 8). Divide the regression coeffients by h2g, define it as T=tau/h2g which is a vector of dimension Cx1, where C is the total number of annotations.
      3. From the baselineLD annotations downloaded from here, read the annotations file baselineLD.*.annot.gz, and only keep the annotations columns (i.e. remove first 4 columns). Call this matrix X, with dimensions MxC, where M is the number of SNPs and C is the total number of annotations.
      4. Define the expected per-SNP heritability as a Mx1 vector (sigma2_i from the LDpred-funct manuscript) as the result from multiplying the matrix X times T.
  • Format of FUNCTFILE:

    • Column 1: SNP ID
    • Column 2: per-SNP heritability
  • Use flag: --FUNCT_FILE

  1. Summary statistics file. Please check that the summary statistics contains a column for each of the following field (header is important here, important fields are highlighted in bold font, the order of the columns it is not important).
    • CHR Chromosome
    • SNP SNP ID
    • BP Physical position (base-pair)
    • A1 Minor allele name (based on whole sample)
    • A2 Major allele name
    • P Asymptotic p-value
    • BETA Effect size
    • Z Z-score (default). If instead of Z-score the Chi-square statistic is provided, use the flag --chisq, and CHISQ as column field.
  • Use flag: --ssf
  1. Phenotype file.
  • Format:

    • Column 1: FID
    • Column 2: phenotype
  • This file doesn't have a header.

  • Use flag: --pf

Input parameters:

  1. Training sample size.
  • Use flag: --N
  1. Estimated SNP Heritability (pre-compute this using your favorite method).
  • Use flag: --H2
  1. LD radius (optional). If not provided, it is computed as (1/2)*0.15% of total number of SNPs.
  • Use flag: --ld_radius

Output files

  1. Coordinated files: This is an hdf5 file that stores the coordinated genotype data with the summary statistics and functional enrichments.
  • Use flag: --coord
  • Note: the output file needs to be named differently for different runs.
  1. Posterior mean effect sizes: Estimated posterior mean effect size from LDpred-funct-inf.
  • Use flag: --posterior_means
  1. Output: Polygenic risk score for each individual in the validation.
  • Description:

    • Column 1: Sample ID
    • Column 2: True phenotype
    • Column 3: PRS using all-snps and marginal effect sizes.
    • Colunm 4: PRS obtained using LD-pred-funct-inf
    • Column 5-K: PRS(k) defined in equation 5 from Marquez-Luna, et al, Biorxiv.
  • Use flag: --out

Example

plinkfile="my_plink_files/Final.chrom[1:22]"
outCoord="my_analysis/Coord_Final"
statsfile="my_analysis/sumary_statistics.txt"
outLdpredfunct="my_analysis/ldpredfunct_posterior_means"
outValidate="my_analysis/ldpredfunct_prs"
phenotype="my_analysis/trait_1.txt"
N=training sample size
h2= pre-computed SNP heritability

optional flags:
--ld_radius= Pre-defined ld-radius
--K= Number of bins for LDpred-funct

python LDpred-funct/ldpredfunct.py --gf=${plinkfile} --pf=${phenotype} --FUNCT_FILE=${statsfile}  --coord=${outCoord} --ssf=${statsfile} --N=${N} --posterior_means=${outLdpredfunct}  --H2=${h2} --out=${outValidate} > ${outValidate}.log

Running recommendations

For UK Biobank analysis described in Marquez-Luna, et al, "Modeling functional enrichment improves polygenic prediction accuracy in UK Biobank and 23andMe data sets", Biorxiv., I requested 70G of memory and the jobs lasted approximately 10 hours per trait.

Acknowledgments

We used LDpred software developed by Bjarni Vilhjalmsson as basis for this software.

Contact

Please first submit your question in the issues section, since it can be useful for other users as well. Otherwise, you can contact me at info.

ldpred-funct's People

Contributors

carlaml avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.