Giter VIP home page Giter VIP logo

mmhe's Introduction

mmhe

Moment matching method for SNP-based heritability estimation

mmhe is an implemtation of moment-matching method for SNP-based heritability eitmation.

Getting started

For the python scripts, you will need to install python and the packages required, including argparse, numpy, os, and struct.

For the Matlab scripts, you will need to install Matlab.

You can get mmheby simply clone this repo with

git clone https://github.com/chiayenchen/mmhe.git

or download the entire repo from github website (https://github.com/chiayenchen/mmhe).

How to use mmhe single GRM version

  • Python version mmhe.py

    Usage: python ./mmhe.py --grm my_grm_prefix --pheno my.pheno --mpheno 1 --covar my.covar

    You can get the input files descrption with ./mmhe.py --help. mmhe.py can take a pre-computed gentic relationship matrix (GRM), a phenotype file with multiple phenotypes, and a covaraite file (usually contains pricipal components for ancestry adjustment). Specifications of these files are listed below.

    1. The phenotype file follows the GCTA phenotype file format (same as PLINK phenotype format). The first 2 columns are the family ID and individual ID of the subjects included. These IDs are used to match the phenotype to the GRM. Make sure these IDs correspond to the IDs in the genotype file used to calculate GRM. IF you have multiple phenotypes in the file, you can specify which phenotype to use in the current analysis by --mpheno.

    2. The covariate file follows the GCTA --qcovar file format. The first 2 columns are the family ID and individual ID of the subjects included. These IDs are used to match the phenotype to the GRM. Make sure these IDs correspond to the IDs in the genotype file used to calculate GRM. Note that all covarites in the file will to be included in the analysis and all covariates are treated as continuous variables.

    3. The GRM follows GCTA binary GRM format (PREFIX.grm.bin, PREFIX.grm.N.bin and PREFIX.grm.id). However, mmhe.py only requires PREFIX.grm.bin and PREFIX.grm.id.

    Link to GCTA: http://cnsgenomics.com/software/gcta/index.html

    Link to PLINK2: https://www.cog-genomics.org/plink2

    The output of mmhe.py is the point esitamate and standard error of SNP-based heritabilty. The computation time is also provided.

  • Matlab version mmhe.m

    Once read in the GRM, phenotype, and covariates in Matlab as matrices, mmhe.m can give the point esitamate and standard error of SNP-based heritabilty. Specifications of the input data format are listed below.

    1. y: n_subj x 1 vector of phenotype
    2. X: n_subj x n_cov matrix of covariates
    3. K: n_subj x n_subj matrix of empirical genetic similarity matrix (n_subj is the number of subjects anlyzed)

    The outputs from mmhe.m are

    1. h2: SNP heritability estimate
    2. se: standard error estimate of h2

How to use mmhe vector stacking version

  • Python version mmhe_col.py

    Usage: python ./mmhe.py --grmdir my_grm_dir my_grm_prefix --pheno my.pheno --mpheno 1 --covar my.covar

    For a dataset that has more than 100,000 subjects (or any large dataset), the mmhe_col.py can load the GRM by blockes.

    Specifications of the input data format are listed below.

    1. Phenotype file and covariate file follow the GCTA format (see description in the mmhe single GRM version).

    2. Block GRM files with PREFIX.grm.id file.

    The bock GRM files are n_subj x k matrices that are the columns of the full GRM matrix. Typically you would have n_subj/k block GRM files each with k columns of the full GRM and 1 block GRM files that will take less than k columns at the end of the GRM. These files should be plain text file and are saved as PREFIX.{block_num}.grm (e.g., PREFIX.1.grm, PREFIX.2.grm, ...) in 1 directory.

    The PREFIX.grm.id file shoud follow the format of GCTA GRM file format, with first column family ID and second column individual ID.

  • Matlab version mmhe_col.m

    For a dataset that has more than 100,000 subjects (or any large dataset), the mmhe_col.m can load the GRM by blockes.

    Specifications of the input data format are listed below.

    1. y: n_subj x 1 vector of phenotype
    2. X: n_subj x n_cov matrix of covariates
    3. grm_dir: directory where block columns of the empirical genetic similarity matrix can be found; we have assumed here that each block column variable K is save as GRM_col{col_num}.mat (e.g., GRM_col1.mat, GRM_col2.mat, ...) in this directory. blk_size: size (number of columns) of each block

    The outputs from mmhe_col.m are

    1. h2: SNP heritability estimate
    2. se: standard error estimate of h2

Support

Please contact Tian Ge ([email protected]) or Chia-Yen Chen ([email protected]) for any questions and comments.

Please cite the paper if you use this software.

Tian Ge, Chia-Yen Chen, Benjamin M. Neale, Mert R. Sabuncu, Jordan W. Smoller. Phenome-wide Heritability Analysis of the UK Biobank. PLoS Genetics 13(4): e1006711

https://doi.org/10.1371/journal.pgen.1006711

mmhe's People

Contributors

chiayenchen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.