steinmann / peakzilla

Peakzilla is a self-learning algorithm to identify transcription factor binding sites from ChIP-seq data. I would be very happy if you tried it and gave me feedback, so that I can improve peakzilla and make you a happier user. Please feel free to send an e-mail to: [email protected]

Home Page: http://stark.imp.ac.at/data/peakzilla/

License: GNU General Public License v2.0

PEAKZILLA
---------
Peakzilla identifies sites of enrichment and transcription factor binding sites from transcription factor ChIP-seq and ChIP-exo experiments with high accuracy and resolution. It is designed to perform equally well for data from any species. All necessary parameters are estimated from the data. Peakzilla is suitable for both single- and paired-end data from any sequencing platform.

Note that peakzilla is not suited for the identification of broad regions of enrichment (e.g. ChIP-seq for histone marks); we recommend using MACS instead: Zhang et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol (2008) 9(9):R137

USAGE
-----
Options:
  -h, --help            show this help message and exit
  -m N_MODEL_PEAKS, --model_peaks=N_MODEL_PEAKS
                        number of most highly enriched regions used to
                        estimate peak size: default = 200
  -c ENRICHMENT_CUTOFF, --enrichment_cutoff=ENRICHMENT_CUTOFF
                        minimum cutoff for fold enrichment: default = 2
  -s SCORE_CUTOFF, --score_cutoff=SCORE_CUTOFF
                        minimum cutoff for peak score: default = 1
  -e, --gaussian        use empirical model estimate instead of gaussian
  -p, --bedpe           input is paired end and in BEDPE format
  -l LOG, --log=LOG     directory/filename to store log file to: default =
                        log.txt
  -n, --negative        write negative peaks to negative_peaks.tsv
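The help text above has the layout produced by Python's optparse module. As a sketch only, the parser below reproduces the listed flags and defaults; the dest names, option types, the usage string and the exact meaning of -e are assumptions, not taken from peakzilla's source. It should run on Python 2.7 and 3.

```python
from optparse import OptionParser


def build_parser():
    # Flags and defaults copied from the USAGE listing; dest names, types,
    # the usage string and the -e semantics are assumptions.
    parser = OptionParser(usage="%prog [options] chip.bed control.bed")
    parser.add_option("-m", "--model_peaks", dest="n_model_peaks", type="int",
                      default=200,
                      help="number of most highly enriched regions used to "
                           "estimate peak size: default = %default")
    parser.add_option("-c", "--enrichment_cutoff", dest="enrichment_cutoff",
                      type="float", default=2,
                      help="minimum cutoff for fold enrichment: "
                           "default = %default")
    parser.add_option("-s", "--score_cutoff", dest="score_cutoff",
                      type="float", default=1,
                      help="minimum cutoff for peak score: default = %default")
    # Assumed: passing -e switches the peak model from gaussian to empirical.
    parser.add_option("-e", "--gaussian", dest="gaussian",
                      action="store_false", default=True,
                      help="use empirical model estimate instead of gaussian")
    parser.add_option("-p", "--bedpe", dest="bedpe", action="store_true",
                      default=False,
                      help="input is paired end and in BEDPE format")
    parser.add_option("-l", "--log", dest="log", default="log.txt",
                      help="directory/filename to store log file to: "
                           "default = %default")
    parser.add_option("-n", "--negative", dest="negative",
                      action="store_true", default=False,
                      help="write negative peaks to negative_peaks.tsv")
    return parser


if __name__ == "__main__":
    options, args = build_parser().parse_args(
        ["-s", "10", "chip1.bed", "input1.bed"])
    # options.score_cutoff is now 10.0; args holds the ChIP and control files
    print("score cutoff %s, files %s" % (options.score_cutoff, args))
```

Because the dest names match the metavars in the listing (N_MODEL_PEAKS, ENRICHMENT_CUTOFF, LOG), optparse would print help text of the same shape as above.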


DEPENDENCIES
------------
* Runs on OS X and Linux-based operating systems
* Requires Python 2.5 or greater
* For 2x better performance, use PyPy instead of CPython


INPUT FORMAT
------------
Peakzilla accepts BED-formatted alignments as input.
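A minimal sketch of turning BED alignments into per-strand 5' tag positions, the kind of input a peak caller works from. It assumes 6-column, tab-delimited BED with 0-based, half-open coordinates (as written by bamToBed); the function name read_bed_tags is hypothetical and this is not peakzilla's own reader.

```python
from collections import defaultdict


def read_bed_tags(lines):
    """Collect sorted 5' read positions per strand and chromosome.

    Assumes 6-column BED (chrom, start, end, name, score, strand).
    The 5' end is `start` on '+' and `end - 1` on '-' (half-open coords).
    """
    tags = {"+": defaultdict(list), "-": defaultdict(list)}
    for line in lines:
        line = line.strip()
        # Skip blank lines and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end, strand = (fields[0], int(fields[1]),
                                     int(fields[2]), fields[5])
        pos = start if strand == "+" else end - 1
        tags[strand][chrom].append(pos)
    for strand in tags:
        for chrom in tags[strand]:
            tags[strand][chrom].sort()
    return tags
```

For example, a '+' read at 100-136 contributes position 100, while a '-' read at 180-216 contributes position 215.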

For conversion to BED format, and for working with BED files and alignments in
general, I highly recommend:

* bowtie (http://bowtie-bio.sourceforge.net/)
* SAMtools (http://samtools.sourceforge.net/)
* bedtools (http://code.google.com/p/bedtools/)
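For small files, or to see what the samtools and bamToBed steps do, a pure-Python SAM-to-BED conversion can be sketched as follows. It assumes uncompressed SAM text (e.g. from `samtools view`), computes the reference span from the CIGAR string, and writes the read name and MAPQ into the name and score columns, similar to bamToBed's default output for single-end reads. The function name sam_to_bed is hypothetical; for real data the tools above remain the practical choice.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_to_bed(sam_lines):
    """Yield 6-column BED lines from SAM text lines.

    Skips header lines and unmapped reads (FLAG bit 0x4). Strand comes
    from FLAG bit 0x10; SAM POS is 1-based, BED start is 0-based.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, rname = f[0], int(f[1]), f[2]
        pos, mapq, cigar = int(f[3]), f[4], f[5]
        if flag & 0x4 or cigar == "*":
            continue
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                      if op in REF_CONSUMING)
        start = pos - 1
        strand = "-" if flag & 0x10 else "+"
        yield "\t".join([rname, str(start), str(start + ref_len),
                         qname, mapq, strand])
```

Insertions and soft clips do not extend the BED interval, while deletions and skipped regions (D, N) do.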


WORKFLOW EXAMPLE
----------------
# use bowtie to map uniquely mappable reads to the genome
bowtie -p4 -m1 --sam genome_index input.fastq input.sam
bowtie -p4 -m1 --sam genome_index chip.fastq chip.sam

# convert to BAM format
samtools view -bS input.sam > input.bam
samtools view -bS chip.sam > chip.bam

# convert to BED format
bamToBed -i input.bam > input.bed
bamToBed -i chip.bam > chip.bed

# run peakzilla
python peakzilla.py chip.bed input.bed > chip_peaks.tsv

# Comparison of 2 datasets
#    Determine significant peaks with a score threshold of 10
python peakzilla.py -s 10 chip1.bed input1.bed > chip1_s10_peaks.tsv
#    Determine enriched regions with a score threshold of 2
python peakzilla.py -s 2 chip2.bed input2.bed > chip2_s2_peaks.tsv
#    Overlap significant peaks from chip1 with enriched regions from chip2
intersectBed -a chip1_s10_peaks.tsv -b chip2_s2_peaks.tsv > intersect_peaks.tsv
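The last step keeps chip1 peaks that overlap chip2 regions. The rule behind it can be sketched in Python: two half-open intervals overlap when each starts before the other ends. This sketch reports each A line once, like `intersectBed -u`, whereas plain `intersectBed` writes only the overlapping portion of each A feature. Function names are hypothetical, and non-data rows are skipped on the assumption that a peak table may carry a header line.

```python
def read_intervals(lines):
    """Parse (chrom, start, end, raw_line) tuples, skipping non-data rows."""
    out = []
    for line in lines:
        raw = line.rstrip("\n")
        f = raw.split("\t")
        if len(f) < 3 or not f[1].isdigit() or not f[2].isdigit():
            continue  # header or blank line
        out.append((f[0], int(f[1]), int(f[2]), raw))
    return out


def overlapping(a_lines, b_lines):
    """Return the lines of A that overlap at least one interval in B.

    [a_start, a_end) and [b_start, b_end) overlap when
    a_start < b_end and b_start < a_end. The scan is quadratic per
    chromosome: fine for illustration, not for genome-wide files.
    """
    b_by_chrom = {}
    for chrom, start, end, _ in read_intervals(b_lines):
        b_by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, start, end, raw in read_intervals(a_lines):
        for b_start, b_end in b_by_chrom.get(chrom, []):
            if start < b_end and b_start < end:
                hits.append(raw)
                break
    return hits
```

Intervals that merely touch (one ends exactly where the other starts) do not count as overlapping under half-open coordinates.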

For example datasets as well as an example of a computational pipeline for the comparative analysis of ChIP-seq datasets, please refer to our publication: Bardet AF et al. A computational pipeline for comparative ChIP-seq analyses. Nature Protocols (2011) 7(1):45-61 (http://www.starklab.org/data/bardet_natprotoc_2011/)

OPTIONS
-------
One of peakzilla's design goals is to learn all the necessary parameters
from the data, so setting these options should not normally be required.
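As an illustration of learning a parameter from the data, the sketch below estimates a strand shift from the most enriched regions: reads from the plus and minus strands pile up on either side of a binding site, so the distance between the two strand summits reflects the peak size. Only the role of n_model_peaks corresponds to the -m option; the region representation, taking the most frequent position as the summit and the median as the estimate are assumptions, and this is not peakzilla's own model. Requires Python 2.7+ for collections.Counter.

```python
from collections import Counter


def summit(positions):
    """Most frequent tag position; ties go to the smallest coordinate."""
    counts = Counter(positions)
    best = max(counts.values())
    return min(p for p, c in counts.items() if c == best)


def estimate_shift(regions, n_model_peaks=200):
    """Median distance from the plus-strand summit to the minus-strand summit.

    `regions` is a list of (plus_positions, minus_positions, enrichment)
    tuples. Only the n_model_peaks most enriched regions are used, and
    regions without tags on both strands are skipped.
    """
    ranked = sorted(regions, key=lambda r: r[2], reverse=True)[:n_model_peaks]
    shifts = sorted(summit(minus) - summit(plus)
                    for plus, minus, _ in ranked if plus and minus)
    if not shifts:
        raise ValueError("no usable regions to estimate the peak shift")
    mid = len(shifts) // 2
    if len(shifts) % 2:
        return shifts[mid]
    return (shifts[mid - 1] + shifts[mid]) / 2.0
```

The median is used here because a handful of noisy regions can pull a mean far off, while the middle shift is robust to them.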

OUTPUT FORMAT
-------------
* Results are printed as a table of tab delimited values to stdout
* Logs are appended to log.txt in the current directory, or to a custom directory/filename specified by the -l option
* Enriched regions in the control sample are written to negative_peaks.tsv when the -n option is set
* Columns represent Chromosome / Start / End / Name / Summit / Score / ChIP / Control / FoldEnrichment / DistributionScore / FDR (%)
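A sketch for loading the output table and applying stricter score or FDR cutoffs to an existing result file. Column names follow the list above, with FDR read as a percentage. Whether the file carries a header row is not stated, so the parser skips any row whose Start field is not an integer; it also assumes Start, End and Summit are integers. read_peaks is a hypothetical name.

```python
import csv

COLUMNS = ["Chromosome", "Start", "End", "Name", "Summit", "Score", "ChIP",
           "Control", "FoldEnrichment", "DistributionScore", "FDR"]
INT_COLUMNS = ("Start", "End", "Summit")
FLOAT_COLUMNS = ("Score", "ChIP", "Control", "FoldEnrichment",
                 "DistributionScore", "FDR")


def read_peaks(lines, min_score=0.0, max_fdr=100.0):
    """Parse tab-delimited peak rows into dicts, keeping those that pass
    score >= min_score and FDR (%) <= max_fdr."""
    peaks = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip short rows, comments and a possible header line.
        if len(row) < len(COLUMNS) or row[0].startswith("#") \
                or not row[1].isdigit():
            continue
        rec = dict(zip(COLUMNS, row))
        for key in INT_COLUMNS:
            rec[key] = int(rec[key])
        for key in FLOAT_COLUMNS:
            rec[key] = float(rec[key])
        if rec["Score"] >= min_score and rec["FDR"] <= max_fdr:
            peaks.append(rec)
    return peaks
```

Usage: `read_peaks(open("chip_peaks.tsv"), min_score=10)` keeps only peaks scoring at least 10.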

peakzilla's People

Contributors: bgruening, steinmann

peakzilla's Issues

bam input

It would be nice if peakzilla could directly load bam files

about ChIP-exo paired bed files

Dear Jonas,

Thank you for coding such a great tool for peak calling. It will greatly
contribute to my graduate project.

I would like to ask a couple of questions. I would like to process a public
GEO dataset that contains ChIP-exo data. Each sample contains two BED files
(for paired-end reads). Can I therefore use peakzilla to call peaks, as
advertised in the abstract of the peakzilla paper? If so, could you describe
how? The GitHub tutorial only shows a single-end run.

Best regards,

Tunc.
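If the two BED files per sample hold mate 1 and mate 2 of the same read pairs (an assumption about that GEO dataset), they could be joined by read name into BEDPE for the -p/--bedpe option listed under USAGE. A minimal sketch, assuming 6-column BED files and the bedtools BEDPE column layout; whether peakzilla expects exactly this layout is not confirmed here, and pair_to_bedpe is a hypothetical name. It holds mate 2 in memory, so very large files would need a sort-and-merge approach instead. When the original paired BAM is available, `bamToBed -bedpe` on a name-sorted BAM is the simpler route.

```python
def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so both mates share one key."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def pair_to_bedpe(mate1_lines, mate2_lines):
    """Join two 6-column BED files by read name into BEDPE lines.

    Output columns: chrom1 start1 end1 chrom2 start2 end2 name score
    strand1 strand2. Reads without a mate in the other file are dropped,
    and the score column is set to '.'.
    """
    mate2 = {}
    for line in mate2_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 6:
            mate2[strip_mate_suffix(f[3])] = f
    out = []
    for line in mate1_lines:
        f1 = line.rstrip("\n").split("\t")
        if len(f1) < 6:
            continue
        key = strip_mate_suffix(f1[3])
        f2 = mate2.get(key)
        if f2 is None:
            continue
        out.append("\t".join(f1[:3] + f2[:3] + [key, ".", f1[5], f2[5]]))
    return out
```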

Peakzilla ChIP-exo published result data (Rhee and Pugh, 2011) replication

Dear Jonas ,

Greetings for the day!

I have ChIP-exo data and am planning to use peakzilla to call peaks.
Before trying it on our lab data, my boss wants me to replicate the results published in the peakzilla paper.

I have three BAM files for three replicates of the CTCF data from Rhee and Pugh, 2011. I ran peakzilla with default parameters.

Below are the results I got:
[image]

But in the publication, you mention a 36 bp peak size and a smallest peak-to-peak distance of 60.

I think the peakzilla version I am using may differ from the one used for the publication.

Can you please explain why I am getting these results?

Command used:

python ~/Tools/peakzilla-master/peakzilla.py input.bed >$out"s1_peaks.tsv"

Best Regards,
Sudhir

peakzilla crashes

I'm happy to provide data files separately.

$ python peakzilla/peakzilla.py ip.bed bg.bed
Traceback (most recent call last):
  File "peakzilla/peakzilla.py", line 817, in <module>
    main()
  File "peakzilla/peakzilla.py", line 89, in main
    peak_model = PeakShiftModel(ip_tags, options)
  File "peakzilla/peakzilla.py", line 368, in __init__
    self.build()
  File "peakzilla/peakzilla.py", line 381, in build
    top_shifts.append(self.peak_shifts[i][1])
IndexError: list index out of range
$ uname -a
Darwin  12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64 i386 MacBookPro9,2 Darwin
$ python --version
Python 2.7.5
$ 
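The traceback ends in PeakShiftModel.build at `self.peak_shifts[i][1]`. One plausible explanation, not confirmed in this report, is that the data yielded fewer candidate peaks than the number of model peaks requested (-m, default 200). The hypothetical helper below shows the kind of bound that would avoid the IndexError; it is not a patch to peakzilla, whose loop is not shown here. With the existing code, rerunning with a smaller -m value might be worth a try.

```python
def collect_top_shifts(peak_shifts, n_model_peaks):
    """Take the second field of up to n_model_peaks entries.

    Hypothetical stand-in for the failing loop: bounding the index by
    len(peak_shifts) avoids IndexError when the data yield fewer
    candidate peaks than requested.
    """
    n = min(n_model_peaks, len(peak_shifts))
    if n == 0:
        raise ValueError("no candidate peaks found to build the peak model")
    return [peak_shifts[i][1] for i in range(n)]
```

Raising a clear error when no candidates exist at all is preferable to silently building a model from nothing.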
