hbctraining / intro-to-chipseq

Intro to ChIPseq using HPC

Home Page: https://hbctraining.github.io/Intro-to-ChIPseq/

intro-to-chipseq's Introduction

NOTE: The materials in this repository are no longer actively maintained. More recent content can be found at: https://hbctraining.github.io/Intro-to-ChIPseq-flipped/

OLD - Introduction to ChIP-seq using high performance computing

| Audience | Computational Skills | Prerequisites | Duration |
| --- | --- | --- | --- |
| Biologists | Beginner/Intermediate | None | 3-day workshop (~19.5 hours of trainer-led time) |

Description

This repository has teaching materials for a 3-day Introduction to ChIP-sequencing data analysis workshop. The workshop teaches the basic computational skills needed to use a high-performance computing environment effectively when implementing a ChIP-seq data analysis workflow. It includes an introduction to shell (bash) and shell scripting. In addition to running the ChIP-seq workflow from FASTQ files to peak calls and nearest-gene annotations, the workshop covers best-practice guidelines for ChIP-seq experimental design, data organization/management, and quality control.

These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.

Learning Objectives

Understand the necessity for, and use of, the command line interface (bash) and HPC for analyzing high-throughput sequencing data.
Understand best practices for designing a ChIP-seq experiment and analyzing the resulting data.

Lessons

Click here for links to lessons and the suggested schedule

Dataset

Installation Requirements

Download the most recent versions of R and RStudio for your laptop:

R (version 3.5.0 or above)
RStudio

NOTE: When installing the following packages, if you are asked to select (a/s/n) or (y/n), please select "a" or "y" as applicable.

(1) Install the below packages on your laptop from CRAN. You DO NOT have to go to the CRAN webpage; you can use the following function to install them:

install.packages("BiocManager")
install.packages("tidyverse")

Note that these package names are case sensitive!

(2) Install the below packages from Bioconductor. Load BiocManager, then run BiocManager's install() function once for each of the 8 packages listed below:

library(BiocManager)
install("insert_first_package_name_in_quotations")
install("insert_second_package_name_in_quotations")
# ... and so on for the remaining packages

Note that these package names are case sensitive!

ChIPQC
ChIPseeker
DiffBind
clusterProfiler
AnnotationDbi
TxDb.Hsapiens.UCSC.hg19.knownGene
EnsDb.Hsapiens.v75
org.Hs.eg.db
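
As an alternative to calling install() once per package, the whole list can be passed in a single call. The following is a minimal sketch and not part of the original workshop materials: the helper names `missing_pkgs` and `install_missing` are hypothetical, and it assumes BiocManager has already been installed in step (1).

```r
# Bioconductor packages named in this README
bioc_pkgs <- c("ChIPQC", "ChIPseeker", "DiffBind", "clusterProfiler",
               "AnnotationDbi", "TxDb.Hsapiens.UCSC.hg19.knownGene",
               "EnsDb.Hsapiens.v75", "org.Hs.eg.db")

# Hypothetical helper: return the packages that are not yet installed
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only what is missing.
# BiocManager::install() accepts a character vector of package names.
install_missing <- function(pkgs, installer = BiocManager::install) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    installer(to_install)
  }
  invisible(to_install)
}
```

Running `install_missing(bioc_pkgs)` would then install whichever of the packages are not already present. Skipping installed packages is a design choice of this sketch; it avoids re-downloading large annotation packages on repeat runs.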

NOTE: The library used for the annotations associated with genes (here we are using TxDb.Hsapiens.UCSC.hg19.knownGene and EnsDb.Hsapiens.v75) will change based on organism (e.g. if studying mouse, you would need to install and load TxDb.Mmusculus.UCSC.mm10.knownGene). The list of different organism packages is given here.

(3) Finally, please check that all the packages were installed successfully by loading them one at a time using the library() function.

library(tidyverse)
library(ChIPQC)
library(ChIPseeker)
library(DiffBind)
library(clusterProfiler)
library(AnnotationDbi)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(EnsDb.Hsapiens.v75)
library(org.Hs.eg.db)
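
If loading the packages one at a time is tedious, the same check can be sketched as a loop. This is an illustrative example rather than part of the original materials; the helper name `check_packages` is hypothetical. Note that `requireNamespace()` only tests whether a package can be loaded and does not attach it, so `library()` is still needed before using the package.

```r
# Packages installed in steps (1) and (2), apart from BiocManager itself
pkgs <- c("tidyverse", "ChIPQC", "ChIPseeker", "DiffBind", "clusterProfiler",
          "AnnotationDbi", "TxDb.Hsapiens.UCSC.hg19.knownGene",
          "EnsDb.Hsapiens.v75", "org.Hs.eg.db")

# Hypothetical helper: named logical vector, TRUE where the package's
# namespace could be loaded and FALSE where it could not
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_packages(pkgs)

# Names of any packages that failed to load (empty if everything is installed)
names(status)[!status]
```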

(4) Once all packages have been loaded, run sessionInfo().

sessionInfo()

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).


intro-to-chipseq's Issues

update unix lessons

create a new directory on O2 where raw_fastq contains the chipseq files
change mentions of rna-seq
specifically modify the data organization markdown (merge with Intro to Unix)

there is no IDR module on O2

Change the main page README links (broken)

Add a metadata section to intro/data management?

We don't currently have an extensive amount of metadata about the data as we do with RNA-seq. We can find out this information and add it in

Update automation script for duplicates information

The current script incorrectly mentions that duplicates are removed during one of the sambamba steps; this needs to be corrected.

update ChIPQC subset report link

Adding details on alignment

Lecture on alignment like we do in RNA-seq? Yes. - A shortened version (i.e. remove the suffix array stuff).
Add in BWA and ask Rory about including some benchmark results.

update ChIP QC link for full dataset report

change ChIPQC to create chipObj on O2 and then create report locally

this is how you would do it for your own dataset anyway. Or hopefully get it running on O2 (fix X11 problem)

Create new chipseq materials using ChIPpeakAnno

Add in BWA

and ask Rory about including some benchmark results

find and replace ChIP-seq or ChIP-Seq

setup repo with directory structure

add narrow peak to file formats slide deck

remove 1-3

Trimming the text on cross-correlation/phantompeakQC

change first two figures in cross-correlation lesson

I think we can either create new or find better figures to represent that

update sambamba filter to address duplicates

Check with Rory. We want to match what is happening in bcbio

Add a slide deck for workflow summary and QC steps

Links to data management website and genes Nanog and Pou5f1 are broken

Pertaining to ChIPseq lesson 01

adding details pulldown to selected lessons

This is an HTML tag that would be useful for "try it on your own" sections, where the code is hidden initially and clicking on it reveals it

```bash
$ idr --samples Pou5f1-rep1_sorted_peaks.narrowPeak Pou5f1-rep2_sorted_peaks.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--output-file Pou5f1-idr \
--plot \
--log-output-file pou5f1.idr.log
```

trim the SAM file description in alignment

Adding slides on Illumina sequencing: add the link to the YouTube Illumina video in the markdown

creating a lesson for automation

this might be good since we have removed trimming

csaw lesson?

We've been using csaw for histone modification studies as an alternative to peak-calling or focusing on specific features like TSS regions. It might be useful as a lesson.

functional analysis change biomart to annotables

add regioneR to evaluate overlaps between Nanog and Pou5f1

https://bioconductor.org/packages/release/bioc/vignettes/regioneR/inst/doc/regioneR.pdf

update long course materials for O2 specific changes

Adding more detail on the SAM file

Talk in more detail about the SAM file
What information is stored in it etc. Take from RNA-seq

move new BAM files over to groups directory

currently in my home directory

move removal of blacklisted regions to before peak calling

FileZilla screenshot in QC needs to be changed

(Host says orchestra)

more concise project management section

change this as we did with RNA-seq, which means removing the "best practices" section in the QC lesson

update long course material paths

move over chipseq lessons from long course

update cross-correlation plot in the Intro Chip-slidedeck

add a note to peak calling for other types of analysis (broad peaks, ATAC-seq)

Even the move to ChiP

Explain the sort command in visualization lesson

change the README to be updated with table format

remove trimming

remove trimming since we are soft-clipping with bowtie2?

Alignment theory lecture - small edit

When talking about local alignment discuss that this is the soft-clipping described in the QC section.

change the workflow images

Add export to use hbctraining R library

rather than have them install ChIPQC, spp, catools, and any other R packages this might be easier

add in greylisting info

From Rory:
"greylist regions are regions where the input exceeds a threshold, where peak-callers sometimes call spurious peaks. threshold is calculated by calculating depth over the input, sampling it repeatedly and estimating negative binomial parameters and then taking the threshold as the .99 quantile of the NB"

https://github.com/roryk/chipseq-greylist i just copied what the chipseqgreylist R package does
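
The thresholding described in the quote above can be sketched roughly as follows. This is an illustration on simulated data, not the code of chipseq-greylist or of any R package: the helper name `sample_threshold`, the window depths, the subsample size, the number of repeats, and the choice to average the per-sample thresholds are all assumptions made for the example.

```r
library(MASS)  # provides fitdistr(); MASS ships with standard R installations

set.seed(42)

# Simulated read depth per window over the input sample (made-up data)
input_depth <- rnbinom(10000, size = 2, mu = 20)

# Hypothetical helper: fit a negative binomial to one random subsample of the
# depths and return the p-quantile (0.99 by default) of the fitted distribution
sample_threshold <- function(depth, n = 1000, p = 0.99) {
  sub <- sample(depth, n)
  # fitdistr() can warn about NaNs while optimising; silenced for readability
  fit <- suppressWarnings(fitdistr(sub, "negative binomial"))
  qnbinom(p, size = fit$estimate[["size"]], mu = fit$estimate[["mu"]])
}

# Repeat the sampling and combine the estimates (averaging is an assumption)
thresholds <- replicate(20, sample_threshold(input_depth))
greylist_threshold <- mean(thresholds)

# Windows whose input depth exceeds the threshold would be greylisted
greylisted <- input_depth > greylist_threshold
```

In a real analysis `input_depth` would come from counting input-sample reads in windows across the genome, and the flagged windows would be merged into regions to exclude.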

update the long course materials

check everything for flow
check language

move over unix shell lessons

move the lessons from Intro to RNA-seq (O2)

Samtools and sambamba in QC lesson

Samtools is being used in QC lesson, needs more detail to introduce it.
Add a note on similarities between samtools and sambamba. Only using sambamba where necessary.