eMAGMA-TUTORIAL

This tutorial is a step-by-step guide to using eMAGMA, an approach for conducting eQTL-informed gene-based tests by assigning SNPs to tissue-specific eGenes, as presented in Gerring et al., 2019a and Gerring et al., 2019b. Here we provide the scripts and files needed to apply the eMAGMA methodology, which generates a list of disease-associated eGenes from genome-wide summary statistics. In this tutorial, we show how to apply eMAGMA using GWAS summary statistics for Major Depressive Disorder (MDD) as example data; these summary statistics are publicly available from the Psychiatric Genomics Consortium (PGC) website.

The tutorial is divided into two parts. Part 1 conducts the eMAGMA gene-based analysis, which integrates SNP-gene associations from an eQTL reference dataset (GTEx version 8) with GWAS summary statistics. We generated annotation files in which SNPs are assigned to genes based on their association with gene expression. The SNP-gene associations are tissue specific; hence we can estimate which genes are most strongly associated with a disease at the tissue level. Part 2 conducts the eMAGMA gene-set analysis, which tests for enrichment of association in co-expression networks. The aim of this analysis is to identify modules (sets of highly correlated genes) that are strongly associated with disease risk. Tissue-specific annotation files and co-expression network files (for 48 tissues) are shared as part of this tutorial. The methods and resources used here are explained in the publication accompanying this tutorial, Gerring et al., 2019a.
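As a rough illustration of what such an annotation file holds, the sketch below counts the SNPs assigned to each gene. It assumes the standard MAGMA .genes.annot layout (gene ID, a chr:start:end location, then the SNP IDs, with comment lines starting with #); the function name snps_per_gene is hypothetical, and the eMAGMA files should be inspected before relying on this.

```shell
# Count SNPs per gene in a MAGMA-style annotation file.
# Assumed layout per line: GENE_ID  CHR:START:END  SNP1 SNP2 ...
snps_per_gene() {
    # Skip comment lines; the first two fields are the gene ID and location.
    awk '!/^#/ && NF >= 2 { print $1, NF - 2 }' "$1"
}

# Example (file name taken from emagma_annot_1.tar.gz):
# snps_per_gene Brain_Amygdala.genes.annot | sort -k2,2nr | head
```

Sorting the output by the second column, as in the commented example, gives a quick view of the genes with the most assigned SNPs in a tissue.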

Requirements

This tutorial runs in a Unix environment; we assume that users are familiar with Unix and the command line. You can type or copy and paste the commands, or restructure them as you wish. This is a hands-on tutorial with minimal theoretical explanation. It is essential to read the publications that accompany the tutorial (Gerring et al., 2019a; Gerring et al., 2019b), as they provide the theoretical background for the analyses. Knowledge of GWAS and of analyses based on GWAS summary statistics is required. We previously published a tutorial on conducting GWAS analysis in another GitHub repository, https://github.com/MareesAT/GWA_tutorial (Marees et al., 2018).


Setting Up

Start by creating an eMAGMA folder with all the files we will use throughout the tutorial.

    cd /path/to/your/working/folder
    mkdir eMAGMA
    cd eMAGMA

The analysis is done using MAGMA v1.07b (de Leeuw, Mooij, Heskes, & Posthuma, 2015). MAGMA and its auxiliary files can be downloaded from the program website: https://ctg.cncr.nl/software/magma. Two auxiliary files are required: a file with gene locations for protein-coding genes from NCBI and a genome reference file. For this tutorial we use build 37 (hg19), which matches the build of the summary data (MDD2018_excluding23andMe), and the reference file for the European population. Gene location files for builds 36, 37, and 38 are available from the MAGMA website. You can use wget or curl to download the files directly into your directory, for example:

MAGMA

    wget https://ctg.cncr.nl/software/MAGMA/prog/magma_v1.07b_static.zip

Auxiliary files for build 37 (hg19)

    wget https://ctg.cncr.nl/software/MAGMA/aux_files/NCBI37.3.zip

Reference data

    wget https://ctg.cncr.nl/software/MAGMA/ref_data/g1000_eur.zip

GWAS summary statistics: MDD2018_ex23andMe, from the PGC website

    https://www.med.unc.edu/pgc/results-and-downloads/

Note: If you are using your own data, make sure to download the auxiliary files that correspond to the genome build of your data.
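After unpacking the archives, it can help to confirm that the expected files are in place before running anything. The sketch below is one way to do that. The file names it checks (magma, NCBI37.3.gene.loc and the g1000_eur .bed/.bim/.fam set) are assumptions about the archive contents, and unpack_downloads and check_files are hypothetical helpers.

```shell
# Unpack the three archives downloaded above (requires unzip).
unpack_downloads() {
    for z in magma_v1.07b_static.zip NCBI37.3.zip g1000_eur.zip; do
        unzip -o "$z" || return 1
    done
}

# Report every expected file that is missing; returns non-zero if any is.
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed archive contents; adjust to what unzip actually produced.
# unpack_downloads
# check_files magma NCBI37.3.gene.loc g1000_eur.bed g1000_eur.bim g1000_eur.fam
```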

This tutorial provides gene annotation and co-expression networks for 48 tissues, including 13 brain tissues and whole blood. At the end of the tutorial you will be able to apply the eMAGMA approach to your own data using these files.
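To give a sense of where the tutorial is heading, the two MAGMA calls below sketch a gene-based and a gene-set run for the amygdala. This is an outline under assumptions rather than the authors' exact commands: the summary statistics file name MDD2018_ex23andMe.txt and its SNP, P and Neff column names are assumed, as is a magma binary in the current directory, and the network file may need additional --set-annot modifiers depending on its layout. The hypothetical run helper only prints each command unless DRY_RUN=0 is set.

```shell
# Print each command; execute it only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

TISSUE=Brain_Amygdala
OUT=Amygdala_emagma
SUMSTATS=MDD2018_ex23andMe.txt   # assumed name of the unpacked PGC file

# Part 1: gene-based test using the tissue-specific eMAGMA annotation.
gene_based() {
    run ./magma --bfile g1000_eur \
        --gene-annot "$TISSUE.genes.annot" \
        --pval "$SUMSTATS" use=SNP,P ncol=Neff \
        --out "$OUT"
}

# Part 2: gene-set test of co-expression modules on the Part 1 results.
gene_set() {
    run ./magma --gene-results "$OUT.genes.raw" \
        --set-annot "$TISSUE.txt" \
        --out "$OUT"
}

gene_based
gene_set
```

Printing the commands first makes it easy to check file names and column settings against your own data before committing to a full run.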


LIST OF FILES SHARED WITH THIS TUTORIAL:

eMAGMA Annotation files for 48 tissues:

emagma_annot_1.tar.gz:

    Brain_Amygdala.genes.annot
    Brain_Hippocampus.genes.annot
    Brain_Anterior_cingulate_cortex_BA24.genes.annot
    Brain_Hypothalamus.genes.annot
    Brain_Caudate_basal_ganglia.genes.annot
    Brain_Nucleus_accumbens_basal_ganglia.genes.annot
    Brain_Cerebellar_Hemisphere.genes.annot
    Brain_Putamen_basal_ganglia.genes.annot
    Brain_Cerebellum.genes.annot
    Brain_Spinal_cord_cervical_c-1.genes.annot
    Brain_Cortex.genes.annot
    Brain_Substantia_nigra.genes.annot
    Brain_Frontal_Cortex_BA9.genes.annot

emagma_annot_2.tar.gz:

    Adipose_Subcutaneous.genes.annot
    Artery_Aorta.genes.annot
    Breast_Mammary_Tissue.genes.annot
    Adipose_Visceral_Omentum.genes.annot
    Artery_Coronary.genes.annot
    Adrenal_Gland.genes.annot

emagma_annot_3.tar.gz:

    Cells_EBV-transformed_lymphocytes.genes.annot
    Esophagus_Gastroesophageal_Junction.genes.annot
    Cells_Transformed_fibroblasts.genes.annot
    Esophagus_Mucosa.genes.annot
    Colon_Sigmoid.genes.annot
    Esophagus_Muscularis.genes.annot
    Colon_Transverse.genes.annot

emagma_annot_4.tar.gz:

    Heart_Atrial_Appendage.genes.annot
    Liver.genes.annot
    Minor_Salivary_Gland.genes.annot
    Nerve_Tibial.genes.annot
    Pancreas.genes.annot
    Prostate.genes.annot

emagma_annot_5.tar.gz:

    Skin_Not_Sun_Exposed_Suprapubic.genes.annot
    Spleen.genes.annot
    Pituitary.genes.annot
    Skin_Sun_Exposed_Lower_leg.genes.annot
    Stomach.genes.annot

emagma_annot_6.tar.gz:

    Ovary.genes.annot
    Testis.genes.annot
    Thyroid.genes.annot
    Uterus.genes.annot
    Vagina.genes.annot
    Whole_Blood.genes.annot

emagma_annot_7.tar.gz:

    Artery_Tibial.genes.annot
    Heart_Left_Ventricle.genes.annot
    Lung.genes.annot
    Muscle_Skeletal.genes.annot
    Small_Intestine_Terminal_Ileum.genes.annot
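Because an interrupted or redirected download can leave a file that is not a real gzip archive, a quick integrity check before extraction can save some confusion. The following is a minimal sketch using standard gzip and tar options; extract_annot and extract_all_annot are hypothetical helper names.

```shell
# Test a .tar.gz archive with gzip -t, then extract it into the current directory.
extract_annot() {
    if gzip -t "$1" 2>/dev/null; then
        tar -xzf "$1"
    else
        echo "not a valid gzip archive, try downloading again: $1" >&2
        return 1
    fi
}

# Extract all seven annotation archives, stopping at the first bad one.
extract_all_annot() {
    for i in 1 2 3 4 5 6 7; do
        extract_annot "emagma_annot_$i.tar.gz" || return 1
    done
}
```

A file that fails the check is often an HTML page or a truncated download saved under the archive name, so looking at its size and first few bytes is a reasonable next step.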

eMAGMA Co-expression network files for 48 tissues:

network_files.zip:

    Brain_Amygdala.txt
    Brain_Anterior_cingulate_cortex_BA24.txt
    Brain_Caudate_basal_ganglia.txt
    Brain_Cerebellar_Hemisphere.txt
    Brain_Cerebellum.txt
    Brain_Cortex.txt
    Brain_Frontal_Cortex_BA9.txt
    Brain_Hippocampus.txt
    Brain_Nucleus_accumbens_basal_ganglia.txt
    Brain_Putamen_basal_ganglia.txt
    Brain_Spinal_cord_cervical_c-1.txt
    Brain_Substantia_nigra.txt

Tutorial Output files:

Amygdala_outputs: Amygdala_emagma.genes.out, Amygdala_emagma.genes.raw, Amygdala_emagma.gsa.out, Amygdala_emagma.log, Amygdala_signif_genes.txt.
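One plausible way to arrive at a list such as Amygdala_signif_genes.txt is to apply a Bonferroni threshold to the gene-based results. The sketch below assumes the usual MAGMA .genes.out layout, with a header row that includes a P column, and a threshold of 0.05 divided by the number of genes tested. The tutorial's own criterion may differ, and signif_genes is a hypothetical helper.

```shell
# Print the header plus genes whose P value passes a Bonferroni threshold
# (alpha / number of genes tested). The P column is located by name.
signif_genes() {
    awk -v alpha="${2:-0.05}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "P") p = i
            if (!p) { bad = 1; exit 1 }   # no P column in the header
            header = $0
            next
        }
        { line[++n] = $0; pval[n] = $p }
        END {
            if (bad) exit 1
            print header
            for (j = 1; j <= n; j++) if (pval[j] + 0 < alpha / n) print line[j]
        }' "$1"
}

# Example:
# signif_genes Amygdala_emagma.genes.out > Amygdala_signif_genes.txt
```

An optional second argument sets a different alpha, for example signif_genes Amygdala_emagma.genes.out 0.01.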


References

a Gerring ZF, Mina-Vargas A, Martin NG, Gamazon ER, Derks EM (2019) eMAGMA: An eQTL-informed method to identify risk genes using genome-wide association study summary statistics. https://doi.org/10.1101/854315

b Gerring ZF, Gamazon ER, Derks EM, for the Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium (2019) A gene co-expression network-based analysis of multiple brain tissues reveals novel genes and molecular pathways underlying major depression. PLOS Genetics 15(7): e1008245. https://doi.org/10.1371/journal.pgen.1008245

Marees AT, de Kluiver H, Stringer S, et al. (2018) A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res 27: e1608. https://doi.org/10.1002/mpr.1608

de Leeuw C, Mooij J, Heskes T, Posthuma D (2015) MAGMA: Generalized gene-set analysis of GWAS data. PLoS Comput Biol 11(4): e1004219. https://doi.org/10.1371/journal.pcbi.1004219

emagma-tutorial's People

Contributors

angelaminavargas, eskederks

emagma-tutorial's Issues

coexpression modules

Hello

From the results of the gene-set analysis, how can I tell which significant genes are present in the modules?

I don't know how to delete this question.

I figured out that a genes.sets file is generated if the genes are present in the module after correction for multiple testing.

Thanks

Batch Download

Hi,

I am trying to go through your tutorial, and it seems the zip folders Batch1 to Batch6 are not correctly zipped. Downloading with wget takes less than a second, and when unzipping I get the following error:

    Archive: Batch1.annot.zip
      End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
    unzip: cannot find zipfile directory in one of Batch1.annot.zip or Batch1.annot.zip.zip, and cannot find Batch1.annot.zip.ZIP, period.

Thank you very much for sharing this new tool!

Laura

e-magma annotation file error

Hi e-magma team,
Thanks for developing this tool. I've downloaded the eMAGMA annotation file emagma_annot_1.tar.gz. I tried to unpack the file using the commands below:

    tar xf emagma_annot_1.tar.gz
    tar xvzf emagma_annot_1.tar.gz

I get the following error:

    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error is not recoverable: exiting now

Could you please suggest a way to get the files unpacked?

Thanks a lot.
Best regards
Tania

About the co-expression network file

Dear authors,
If I use eMAGMA to analyze other tissues, which co-expression network file should I use, and where can I download it?

network files for tissue types other than the brain

Excuse me, I have a few questions about how to use eMAGMA:

  1. Are there pre-built co-expression network files for tissue types other than the brain (for example, blood), and where can I download them?
  2. How can I build a co-expression network file with my own data?

Reference genome build file

Dear eMAGMA team,
I've found that the link provided to download the 1000 Genomes reference file is not working. Could you please share the African ancestry 1000 Genomes reference file so that I can run eMAGMA?
I appreciate your time and help.

Kind regards
Reza
QBI, UQ

issues in eMAGMA GENE-SET ANALYSIS

Hi,

I am trying to run through the tutorial using Amygdala_emagma.genes.raw from Amygdala_outputs.zip but encountered the attached issue. Could you please help with that?
[image]

Also, the eMAGMA co-expression network files are currently only available for brain tissues. Are they available for other tissues? If not, could you please share the code that you used for the brain tissues?

My last question: I noticed in the paper that the data are from EUR. How does this work for other ethnic groups?

Thank you.

Cheers
Shirley
