BAT - Bisulfite Analysis Toolkit

WEBSITE

Introduction

Cytosine DNA methylation is a biochemical process that has been shown to play an important role in gene expression and cell differentiation. Recently, a number of whole-genome bisulfite sequencing (WGBS) and targeted bisulfite sequencing (e.g., RRBS) protocols have made it possible to capture this major epigenetic modification precisely and accurately.

Here, a modular bisulfite analysis toolkit (BAT) is introduced. It tackles the major tasks in analyzing bisulfite sequencing data: mapping, extraction of the methylation information (referred to as methylation calling), and differential methylation analysis, as well as downstream analyses such as the integration of methylation data with annotation and gene expression data. Each part of this workflow is modular and can easily be customized or extended with other bisulfite- or NGS-related tools. It can also be used as is, with the additional benefit that the BAT modules generate many graphics automatically.

Modules

Mapping

The first module comprises read mapping, including pre- and postprocessing. For mapping, BAT_mapping employs segemehl, a performant and highly sensitive short-read aligner with a specialized bisulfite mode (link). Because BAT is modular, this step can be replaced by a different bisulfite-sensitive aligner. In addition to bisulfite-specific quality filtering, aligned reads are converted to a sorted and indexed BAM file. BAT_mapping_stat calculates basic mapping statistics: the number of mapped pairs/reads, the number of reads with a single alignment (uniquely mapped) or multiple alternative alignments (multi-mapped), the distribution of the multiplicity of read alignments, and the distribution of the edit distance of read alignments. If there are multiple datasets per sample (e.g., due to multiplexing on different lanes), all alignment files corresponding to one sample can be merged using BAT_merging, which can also add dataset-specific read group information during merging.
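
To make the statistics above concrete, the following is a minimal sketch of how unique and multi-mapped reads and the edit distance distribution can be tallied from SAM text. It is not the implementation of BAT_mapping_stat. It assumes the aligner writes the standard optional SAM tags NH (number of reported alignments for the read) and NM (edit distance); the function name mapping_stats is hypothetical.

```python
from collections import Counter


def mapping_stats(sam_lines):
    """Tally mapping statistics from an iterable of SAM text lines.

    Assumption: each alignment record carries the optional tags
    NH:i (number of reported alignments) and NM:i (edit distance).
    """
    hits_per_read = {}          # (read name, mate number) -> NH value
    edit_distances = Counter()  # NM value -> number of alignment records

    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:                # read is unmapped
            continue

        # Optional fields start after the 11 mandatory SAM columns.
        tags = {}
        for field in fields[11:]:
            parts = field.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]

        mate = 2 if flag & 0x80 else 1
        # A multi-mapped read has several records; count it only once.
        hits_per_read[(fields[0], mate)] = int(tags.get("NH", 1))
        if "NM" in tags:
            edit_distances[int(tags["NM"])] += 1

    multiplicity = Counter(hits_per_read.values())
    return {
        "mapped": len(hits_per_read),
        "unique": multiplicity.get(1, 0),
        "multi": sum(n for hits, n in multiplicity.items() if hits > 1),
        "multiplicity": dict(multiplicity),
        "edit_distance": dict(edit_distances),
    }
```

The function accepts any iterable of lines, so it can read from an open file or from the text output of a BAM-to-SAM conversion.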

Calling

Following mapping, the methylation information needs to be extracted from the alignments, a step referred to as methylation calling. First, BAT_calling takes the alignments and generates a VCF file that contains information for each cytosine, including the sequence context, the coverage, the detailed counts of covering nucleotides, and the estimated methylation rate. Second, cytosine positions can be filtered by coverage, genomic context, and methylation rate using BAT_filtering. The output is again in VCF format, but it is also provided as a bedGraph file with the estimated methylation rate in the fourth column, ready for loading into IGV or uploading to the UCSC Genome Browser. Furthermore, the coverage and methylation rate distributions for all positions and for the filtered positions are illustrated as barplots.
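
The relation between nucleotide counts, methylation rate, and the bedGraph output can be sketched as follows. This is an illustration under stated assumptions, not the code of BAT_calling or BAT_filtering: the rate is taken as the common estimate C / (C + T) for a forward-strand cytosine (unconverted reads over all informative reads), the coverage threshold of 10 is an arbitrary example value, and the function names are hypothetical.

```python
def methylation_rate(n_c, n_t):
    """Estimate the methylation rate from read counts at one cytosine.

    n_c: reads showing C (unconverted, i.e. methylated)
    n_t: reads showing T (converted, i.e. unmethylated)
    Returns None when no informative read covers the position.
    """
    covered = n_c + n_t
    return n_c / covered if covered else None


def to_bedgraph(positions, min_cov=10):
    """Filter positions by coverage and format them as bedGraph lines.

    positions: iterable of (chrom, pos, n_c, n_t) with a 1-based pos,
    as in VCF. bedGraph uses 0-based, half-open intervals, so a single
    position becomes the interval [pos - 1, pos).
    """
    lines = []
    for chrom, pos, n_c, n_t in positions:
        if n_c + n_t < min_cov:   # drop poorly covered positions
            continue
        rate = methylation_rate(n_c, n_t)
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{rate:.3f}")
    return lines


calls = [("chr1", 100, 8, 2), ("chr1", 200, 1, 2)]
for line in to_bedgraph(calls, min_cov=10):
    print(line)
```

For cytosines on the reverse strand the same estimate applies to G and A counts in the forward-strand read representation.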

Analysis

The third module covers the basic analysis of two groups, each consisting of one or more samples. First, summary, bedGraph, and bigWig files for all samples are created with BAT_summarize. Optionally, a Circos plot containing a methylation rate heatmap for each sample can be generated. Overview plots are produced as well: hierarchical clustering, boxplots of genome-wide average methylation rates, correlation plots of mean group methylation rates per position, and the distribution of position-wise group differences. For specific regions of interest or annotations, e.g., transcription factor binding sites (TFBS), CpG islands, or promoter regions, BAT_annotation can be used to gain insight into the methylation of the samples in those regions. Basic statistics, such as the length of annotation items (in nucleotides and Cs), are calculated. In addition, the distribution of the average group-wise methylation rates per annotation item, clustering heatmaps containing all samples, and boxplots of the per-sample average methylation rates per annotation item are shown.
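
The per-item averages mentioned above can be illustrated with a small sketch that assigns cytosine positions to annotation items and averages their methylation rates. It assumes BED-style, 0-based, half-open intervals and uses a simple linear scan; the function name and data layout are hypothetical and do not reflect how BAT_annotation is implemented.

```python
from statistics import mean


def mean_rate_per_region(regions, sites):
    """Average the methylation rate of all sites inside each region.

    regions: list of (chrom, start, end, name), 0-based half-open (BED)
    sites:   list of (chrom, pos0, rate), pos0 is a 0-based position
    Returns {name: (number of sites, mean rate or None if no site)}.
    """
    result = {}
    for chrom, start, end, name in regions:
        rates = [
            rate
            for site_chrom, pos0, rate in sites
            if site_chrom == chrom and start <= pos0 < end
        ]
        result[name] = (len(rates), mean(rates) if rates else None)
    return result


regions = [("chr1", 100, 200, "promoterA"), ("chr1", 300, 400, "cgi1")]
sites = [("chr1", 110, 0.2), ("chr1", 150, 0.4), ("chr1", 200, 0.9)]
print(mean_rate_per_region(regions, sites))
```

The scan is quadratic in the worst case; for genome-scale data an interval index or a dedicated tool such as bedtools would be the more practical choice.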

DMRs

Finally, the calling and analysis of differentially methylated regions (DMRs) is covered by BAT_DMRcalling and BAT_correlating. The DMR calling tool metilene identifies DMRs between two groups of one or more samples each, quickly and accurately. Subsequently, the raw metilene output can be filtered and converted to BED-like or bedGraph format. Basic DMR statistics are illustrated, including length distributions (in nucleotides and Cs), the distribution of group methylation differences, and scatterplots of the methylation means of group 1 vs. group 2 as well as of methylation difference vs. q-value of the DMRs. BAT_correlating facilitates the identification of correlating DMRs (cDMRs), i.e., DMRs where the methylation change correlates with a change in the expression of the associated genes. It is not restricted to DMRs as input but can also be used to inspect other annotation items such as promoter regions or TFBS. Linear and non-linear correlation effects are tested, and the results are reported as a text file and as a correlation plot for easy visual inspection.
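
As an illustration of testing for linear and non-linear correlation between methylation and expression, the sketch below computes the Pearson coefficient (linear) and the Spearman rank coefficient (monotonic, not necessarily linear) for one region. Which tests BAT_correlating actually applies is not stated here, so these two measures are an assumption, and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-sample values for one DMR and its associated gene.
methylation = [0.1, 0.2, 0.4, 0.8]
expression = [8.0, 4.0, 2.0, 1.0]
print(pearson(methylation, expression), spearman(methylation, expression))
```

In this made-up example expression halves from sample to sample while methylation rises, so the rank correlation is perfectly negative while the linear coefficient is weaker, which is the kind of difference a non-linear test is meant to reveal.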

Example data

The example data comprise the raw reads of one sample and the already called, but not yet filtered, methylation data of that sample and seven further samples. The samples belong to two groups of four samples each. The data are taken from a recent lymphoma publication (link). The unmapped sample consists of two sequencing runs. These reads can be mapped to a reduced genome and merged prior to methylation calling. In addition to the raw and called methylation data, a reduced reference genome, some gene annotations, and gene expression data are provided. This enables you to run the entire toolkit on a small example region.

Obtaining

To obtain the BAT scripts, please use git to download the most recent development tree. The tree is currently hosted on GitHub and can be obtained via:

git clone https://github.com/helenebioinf/BAT

Docker

If you prefer not to install all dependencies, you can use the BAT Docker image. Dependencies and scripts are already installed; simply pull the image. To test it, download the input data including the folder structure here (985 MB), then run the Docker image and the run script. For a quick start, have a look here.

Detailed Information

For more information please go to the website.

Contact

Please report any issues or questions to helene [at] bioinf [dot] uni-leipzig.de

bat's Issues

:enhancement: conda install

Would you be interested in having a conda recipe available for BAT? At our institution, users are not able to run docker commands, but they can manage a conda distribution on their network drives. I have an environment.yml file that installs most of the requirements for BAT via bioconda (https://bioconda.github.io/). It might not be much more work to build a conda recipe that includes all of the dependencies and allows for a simple install of BAT via bioconda.

# environment.yml
name: BAT
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bedtools=2.27.1=he941832_2
  - circos=0.69.6=3
  - cutadapt=1.18=py36_0
  - fastqc=0.11.8=0
  - htslib=1.9=hc238db4_4
  - libdeflate=1.0=h470a237_0
  - metilene=0.2.6=h470a237_1
  - perl-app-cpanminus=1.7044=pl526_1
  - perl-autoloader=5.74=pl526_1
  - perl-carp=1.38=pl526_1
  - perl-clone=0.41=pl526h470a237_0
  - perl-config-general=2.61=pl526_1
  - perl-constant=1.33=pl526_1
  - perl-data-dumper=2.161=pl526_2
  - perl-digest-perl-md5=1.9=pl526_1
  - perl-dynaloader=1.25=pl526_1
  - perl-encode=2.88=pl526_1
  - perl-exporter=5.72=pl526_1
  - perl-exporter-tiny=1.000000=pl526_0
  - perl-extutils-makemaker=7.34=pl526_2
  - perl-file-copy-link=0.140=pl526h470a237_1
  - perl-file-path=2.15=pl526_0
  - perl-file-spec=3.48_01=pl526_1
  - perl-file-temp=0.2304=pl526_2
  - perl-font-ttf=1.06=pl526_0
  - perl-gd=2.69=pl526he941832_0
  - perl-getopt-long=2.50=pl526_1
  - perl-io-string=1.08=pl526_3
  - perl-list-moreutils=0.428=pl526_1
  - perl-list-moreutils-xs=0.428=pl526_0
  - perl-list-util=1.38=pl526_1
  - perl-math-bezier=0.01=pl526_1
  - perl-math-round=0.07=pl526_1
  - perl-math-vecstat=0.08=pl526_1
  - perl-module-implementation=0.09=pl526_2
  - perl-module-runtime=0.016=pl526_0
  - perl-number-format=1.75=pl526_3
  - perl-params-validate=1.29=pl526h470a237_0
  - perl-parent=0.236=pl526_1
  - perl-pathtools=3.73=h470a237_2
  - perl-readonly=1.04=pl526_2
  - perl-regexp-common=2017060201=pl526_0
  - perl-scalar-list-utils=1.45=pl526h470a237_3
  - perl-set-intspan=1.19=pl526_1
  - perl-statistics-basic=1.6611=pl526_2
  - perl-svg=2.84=pl526_0
  - perl-test-more=1.001002=pl526_1
  - perl-text-format=0.59=pl526_1
  - perl-time-hires=1.9758=pl526_0
  - perl-try-tiny=0.30=pl526_0
  - perl-xml-parser=2.44=pl526h3a4f0e9_6
  - perl-xsloader=0.24=pl526_0
  - samtools=1.9=h8ee4bcc_1
  - segemehl=0.3.1=haeeaa98_0
  - xopen=0.3.5=py_0
  - bz2file=0.98=py_0
  - bzip2=1.0.6=h470a237_2
  - ca-certificates=2018.10.15=ha4d7672_0
  - cairo=1.14.12=h276e583_5
  - certifi=2018.10.15=py36_1000
  - curl=7.61.0=h93b3f91_2
  - expat=2.2.5=hfc679d8_2
  - fontconfig=2.13.1=h65d0f4c_0
  - freetype=2.9.1=h6debe1e_4
  - gettext=0.19.8.1=h5e8e0c9_1
  - giflib=5.1.4=h470a237_1
  - glib=2.56.2=h464dc38_1
  - gmp=6.1.2=hfc679d8_0
  - graphite2=1.3.12=hfc679d8_1
  - harfbuzz=1.9.0=h04dbb29_1
  - icu=58.2=hfc679d8_0
  - jpeg=9c=h470a237_1
  - krb5=1.14.6=0
  - libedit=3.1.20170329=haf1bffa_1
  - libffi=3.2.1=hfc679d8_5
  - libgcc-ng=7.2.0=hdf63c60_3
  - libgd=2.2.5=h5400f36_4
  - libgfortran=3.0.0=1
  - libiconv=1.15=h470a237_3
  - libpng=1.6.35=ha92aebf_2
  - libssh2=1.8.0=h5b517e9_3
  - libstdcxx-ng=7.2.0=hdf63c60_3
  - libtiff=4.0.9=he6b73bb_2
  - libuuid=2.32.1=h470a237_2
  - libwebp=0.5.2=7
  - libxcb=1.13=h470a237_2
  - libxml2=2.9.8=h422b904_5
  - mpfr=4.0.1=h16a7912_0
  - ncurses=6.1=hfc679d8_1
  - openjdk=11.0.1=h470a237_2
  - openjpeg=2.3.0=h0e734dc_3
  - openssl=1.0.2p=h470a237_1
  - pango=1.40.14=he752989_2
  - pcre=8.41=hfc679d8_3
  - perl=5.26.2=h470a237_0
  - pigz=2.3.4=0
  - pip=18.1=py36_1000
  - pixman=0.34.0=h470a237_3
  - poppler=0.67.0=hdf8a1b3_2
  - poppler-data=0.4.9=0
  - pthread-stubs=0.4=h470a237_1
  - python=3.6.7=h5001a0f_1
  - r-assertthat=0.2.0=r351h6115d3f_1001
  - r-base=3.5.1=h4fe35fd_1
  - r-bitops=1.0_6=r351hc070d10_2
  - r-catools=1.17.1.1=r351h9d2a408_2
  - r-cli=1.0.0=r351h6115d3f_1001
  - r-colorspace=1.3_2=r351hc070d10_2
  - r-crayon=1.3.4=r351h6115d3f_1001
  - r-digest=0.6.18=r351hc070d10_0
  - r-fansi=0.3.0=r351hc070d10_0
  - r-gdata=2.18.0=r351h6115d3f_1001
  - r-getopt=1.20.2=r351_2001
  - r-ggplot2=3.1.0=r351h6115d3f_1000
  - r-glue=1.3.0=r351h470a237_2
  - r-gplots=3.0.1=r351h6115d3f_1001
  - r-gtable=0.2.0=r351h6115d3f_1001
  - r-gtools=3.8.1=r351hc070d10_2
  - r-kernsmooth=2.23_15=r351h364d78e_2
  - r-labeling=0.3=r351h6115d3f_1001
  - r-lattice=0.20_35=r351hc070d10_0
  - r-lazyeval=0.2.1=r351hc070d10_2
  - r-magrittr=1.5=r351h6115d3f_1001
  - r-mass=7.3_50=r351hc070d10_2
  - r-matrix=1.2_14=r351hc070d10_2
  - r-mgcv=1.8_24=r351hc070d10_2
  - r-munsell=0.5.0=r351h6115d3f_1001
  - r-nlme=3.1_137=r351h364d78e_0
  - r-optparse=1.6.0=r351_1001
  - r-pillar=1.3.0=r351h6115d3f_1000
  - r-plyr=1.8.4=r351h9d2a408_2
  - r-r6=2.2.2=r351h6115d3f_1001
  - r-rcolorbrewer=1.1_2=r351h6115d3f_1001
  - r-rcpp=1.0.0=r351h9d2a408_0
  - r-reshape2=1.4.3=r351h9d2a408_2
  - r-rlang=0.3.0.1=r351h470a237_0
  - r-scales=1.0.0=r351h9d2a408_1
  - r-stringi=1.2.4=r351h9d2a408_1
  - r-stringr=1.3.1=r351h6115d3f_1001
  - r-tibble=1.4.2=r351hc070d10_2
  - r-utf8=1.1.4=r351hc070d10_0
  - r-viridislite=0.3.0=r351h6115d3f_1001
  - r-withr=2.1.2=r351h6115d3f_1000
  - readline=7.0=haf1bffa_1
  - setuptools=40.6.2=py36_0
  - six=1.11.0=py36_1001
  - sqlite=3.25.3=hb1c47c0_0
  - tectonic=0.1.11=h99d305b_0
  - tk=8.6.9=ha92aebf_0
  - wheel=0.32.3=py36_0
  - xorg-kbproto=1.0.7=h470a237_2
  - xorg-libice=1.0.9=h470a237_4
  - xorg-libsm=1.2.3=h8c8a85c_0
  - xorg-libx11=1.6.6=h470a237_0
  - xorg-libxau=1.0.8=h470a237_6
  - xorg-libxdmcp=1.1.2=h470a237_7
  - xorg-libxext=1.3.3=h470a237_4
  - xorg-libxrender=0.9.10=h470a237_2
  - xorg-libxt=1.1.5=h470a237_2
  - xorg-renderproto=0.11.1=h470a237_2
  - xorg-xextproto=7.3.0=h470a237_2
  - xorg-xproto=7.0.31=h470a237_7
  - xz=5.2.4=h470a237_1
  - zlib=1.2.11=h470a237_3
  - gsl=2.2.1=h0c605f7_3
  - libcurl=7.61.0=h1ad7b7a_0
  - pip:
    - absl-py==0.2.2
    - argcomplete==1.9.4
    - argh==0.26.2
    - bcbio-gff==0.6.4
    - geojson==2.4.0
    - grpcio==1.13.0
    - ipyvolume==0.4.5
    - ipywebrtc==0.3.0
    - jupyter-contrib-nbextensions==0.5.0
    - jupyter-latex-envs==1.4.4
    - jupyter-nbextensions-configurator==0.4.0
    - markdown==2.6.11
    - sklearn==0.0
