qsva_brain

This repository contains the code for the SCZD case vs control differential expression analysis for the BrainSeq Phase II project (see the brainseq_phase2 repo). It was carried out by Amy Peterson as part of her JHSPH MPH project.

License

Attribution-NonCommercial: CC BY-NC

This license lets others remix, tweak, and build upon our work non-commercially as long as they acknowledge our work.

View License Deed | View Legal Code

Citation

If you use anything in this repository please cite the BrainSeq Phase II project.

Scripts

The main script directories follow this order:

  1. expr_data: scripts for integrating the degradation datasets from multiple brain regions.
  2. means: calculate the base-pair coverage mean files for the degradation data.
  3. ERs: identify expressed regions using derfinder on the degradation data. Also identify which ERs are strongly associated with degradation time.
  4. brainseq_phase2_qsv: compute the coverage matrix for the BrainSeq Phase II data, identify the quality surrogate variables (qSVs), and then perform the differential expression analysis between schizophrenia cases and non-psychiatric controls.
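
The ordering above can be made explicit with a small shell helper. This is a minimal sketch and not part of the repository: the function name `list_stage_scripts` and its optional root argument are hypothetical, and the only thing taken from this README is the four directory names and their order.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the shell
# scripts found in each stage directory, in the order documented above.

# Stage directories, in the order given in this README.
STAGES="expr_data means ERs brainseq_phase2_qsv"

# The function body runs in a subshell so its variables stay local.
list_stage_scripts() (
    # $1: repository root (defaults to the current directory)
    root="${1:-.}"
    for stage in $STAGES; do
        for script in "$root/$stage"/*.sh; do
            # Skip the unexpanded glob when a stage has no .sh files
            [ -e "$script" ] || continue
            echo "$stage/$(basename "$script")"
        done
    done
)
```

Called as `list_stage_scripts /path/to/qsva_brain`, it prints one `stage/script.sh` path per line, starting with the `expr_data` entries.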

Files in this project

  • README.md: this file
  • qsva_brain.Rproj: RStudio file for organizing the project

expr_data

  • merge_data.R: R script merging the different degradation datasets.
  • run_merge_data.sh: bash shell script for running the R script at JHPCE.
  • logs: directory with log files
  • pdf: directory with image files
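
The pairing of an R script with a bash wrapper and a logs directory can be sketched as follows. This is an illustration under stated assumptions, not the contents of run_merge_data.sh: the function name `run_with_log` and the `RUNNER` variable are hypothetical, and any JHPCE scheduler options are left out because this README does not describe them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's run_merge_data.sh): run an
# R script non-interactively and keep its output in a log directory.

# The function body runs in a subshell so its variables stay local.
run_with_log() (
    script="$1"            # e.g. merge_data.R
    log_dir="${2:-logs}"   # assumed to correspond to the logs directory
    # RUNNER is an assumption made for illustration; defaults to Rscript
    runner="${RUNNER:-Rscript}"

    mkdir -p "$log_dir"
    log_file="$log_dir/$(basename "$script" .R).log"

    # Capture stdout and stderr together so errors end up in the log
    "$runner" "$script" > "$log_file" 2>&1
    status=$?

    # Report where the log was written, then pass on the exit status
    echo "$log_file"
    exit "$status"
)
```

For example, `run_with_log merge_data.R` would write to `logs/merge_data.log` and return the exit status of the R process.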

means

  • qsva_bws.R: R script for computing the mean base-pair coverage and saving them as BigWig files. Uses the recount.bwtool R package.
  • qsva_bws.sh: bash shell script for running the previous R script at JHPCE.

ERs

  • make_ERs_stranded.R: R script for identifying the expressed regions, then identifying which ERs are strongly associated with degradation signal and saving the results for later use.
  • make_ERs_stranded.sh: bash shell script for running the previous R script at JHPCE.

brainseq_phase2_qsv

  • quantify_top1000.R and quantify_top1000.sh: R script and bash script for computing the coverage matrix.
  • explore_replicates.R: R script for exploring the coverage matrix results.
  • make_qSVs.R: R script for identifying the quality surrogate variables (qSVs) under several options: with all 900 samples, with the 712 samples left after dropping potentially confounding samples, and for each brain region individually.
  • explore_qsvs.R: perform PCA on the BrainSeq gene expression data and explore that data. Also explore the qSVs and their associations with measured covariates.
  • casectrl_HIPPO.R and casectrl_DLPFC.R: perform the case-control differential expression analysis for each brain region at the gene expression level.
  • casectrl_HIPPO_allFeatures.R and casectrl_DLPFC_allFeatures.R: similar to the above, but for exons, exon-exon junctions and transcript expression levels.
  • casectrl_HIPPO_plots.R and casectrl_DLPFC_plots.R: R scripts for making the DEqual plots that assess the performance of the qSVA framework.
  • explore_case_control.R: R script for exploring the case-control results, performing gene ontology enrichment analyses, comparing results against BrainSeq Phase I (polyA) and visualizing the top results.
  • pdf: directory with image files

Length of scripts

Number of lines in each script.

## R scripts
$ for i in */*.R; do echo $i; wc -l $i; done
ERs/make_ERs_stranded.R
     258 ERs/make_ERs_stranded.R
brainseq_phase2_qsv/casectrl_DLPFC.R
     263 brainseq_phase2_qsv/casectrl_DLPFC.R
brainseq_phase2_qsv/casectrl_DLPFC_allFeatures.R
     164 brainseq_phase2_qsv/casectrl_DLPFC_allFeatures.R
brainseq_phase2_qsv/casectrl_DLPFC_plots.R
      56 brainseq_phase2_qsv/casectrl_DLPFC_plots.R
brainseq_phase2_qsv/casectrl_HIPPO.R
     261 brainseq_phase2_qsv/casectrl_HIPPO.R
brainseq_phase2_qsv/casectrl_HIPPO_allFeatures.R
     165 brainseq_phase2_qsv/casectrl_HIPPO_allFeatures.R
brainseq_phase2_qsv/casectrl_HIPPO_plots.R
      55 brainseq_phase2_qsv/casectrl_HIPPO_plots.R
brainseq_phase2_qsv/explore_case_control.R
     690 brainseq_phase2_qsv/explore_case_control.R
brainseq_phase2_qsv/explore_qsvs.R
    1510 brainseq_phase2_qsv/explore_qsvs.R
brainseq_phase2_qsv/explore_replicates.R
     137 brainseq_phase2_qsv/explore_replicates.R
brainseq_phase2_qsv/make_qSVs.R
     318 brainseq_phase2_qsv/make_qSVs.R
brainseq_phase2_qsv/quantify_top1000.R
      97 brainseq_phase2_qsv/quantify_top1000.R
expr_data/merge_data.R
     264 expr_data/merge_data.R
means/qsva_bws.R
      29 means/qsva_bws.R
      
## bash scripts
$ for i in */*.sh; do echo $i; wc -l $i; done
ERs/make_ERs_stranded.sh
      21 ERs/make_ERs_stranded.sh
brainseq_phase2_qsv/quantify_top1000.sh
      22 brainseq_phase2_qsv/quantify_top1000.sh
expr_data/run_merge_data.sh
      23 expr_data/run_merge_data.sh
means/qsva_bws.sh
      19 means/qsva_bws.sh
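
The loops above print each file name twice, because `wc -l` already echoes the name it is given. A minimal sketch of an alternative follows; the function name `count_lines` is hypothetical and not part of the repository. It quotes file names and prints one tab-separated `count path` row per script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report the number
# of lines in each file, one "count<TAB>path" row per file.

# The function body runs in a subshell so its variables stay local.
count_lines() (
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob)
        [ -f "$f" ] || continue
        # Reading from stdin makes wc print only the number; tr removes
        # the padding that some wc implementations add
        n=$(wc -l < "$f" | tr -d '[:space:]')
        printf '%s\t%s\n' "$n" "$f"
    done
)

# Example, run from the repository root:
#   count_lines */*.R */*.sh
```

Like `wc -l`, this counts newline characters, so a file whose last line has no trailing newline is reported as one line shorter.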

LIBD internal

JHPCE location: /dcs04/lieber/lcolladotor/qSVA_LIBD3080/qsva_brain


qsva_brain's Issues

Explore qSVs

Do something like https://github.com/LieberInstitute/brainseq_phase2/blob/master/get_degradation_regions.R#L146-L197, but with different input files. For the qSVA data, load /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/degradation_rse_phase2_usingJoint_justFirst.rda (instead of the file loaded at https://github.com/LieberInstitute/brainseq_phase2/blob/master/get_degradation_regions.R#L138). For the BrainSeq Phase 2 data, load /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/rse_gene.Rdata (instead of the file loaded at https://github.com/LieberInstitute/brainseq_phase2/blob/master/get_degradation_regions.R#L146).

You might need to use some of https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L16-L22 for simplifying the sample phenotype information given that there are multiple replicates.

Compare DEqual plots from common qSVs vs region-specific qSV results

Compare the results from #5 and #6 against the DEqual plots (Andrew may already have them) made using outGene from https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L78 and https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_DLPFC.R#L80 for HIPPO and DLPFC, respectively.

We might need to think about a few more ways to compare the results from the two qSVA methods.

CMC qSVs

Eugenia informed me that the institute needs the qSVs for the CMC to try to increase reproducibility in a recent study. I had forgotten that the original qSV data works with expressed regions, so I will need to see how to compute those.

Case-control: HIPPO

Using the qSVs you defined in #2, define the differentially expressed genes by diagnosis for only the HIPPO samples.

You'll need to load the BrainSeq Phase 2 data and process it much as in https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L10-L28. Then load the qSVs from #2 and subset them to the HIPPO samples. Then build the two models, mod and modQsva, as in https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L36-L47 and https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L54.
Next, run the DE analysis as in https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L59-L69. You don't need to repeat the DE analysis without qSVs (https://github.com/LieberInstitute/brainseq_phase2/blob/master/caseControl_analysis_hippo.R#L71-L77).

Save the gene HIPPO DE data.
