Giter VIP home page Giter VIP logo

phip-stat's Introduction

Pipeline for analyzing PhIP-seq data

See Larman et. al. for more information.

The pipeline assumes that the initial data set is a single .fastq file with all the reads, including a possible barcode in the header of each read.

First start by splitting the data into smaller chunks:

fastq2parts.py -i in.fastq -o workdir/parts -p 2000000

Then align each read to the reference PhIP-seq library using bowtie (making sure to set the right queue):

bowtie_parts_with_LSF.py -i workdir/parts -o workdir/alns -x path/to/index_name.ebwt -l workdir/logs_aln -q short_serial

Then reads are reorganized according to barcode. The mapping file should be a tab-separated file with the barcode sequence as the first column and the sample name as the second column. The sample names should be something that's friendly as a UNIX filename:

parts2barcodes.py -i workdir/alns -o workdir/barcodes -m mapping.tsv

Now we must generate the counts and p-values. There are two ways to proceed:

  • Generate a single count file and a single p-value file, and have them all calculated at the same time.
  • Perform counts and p-values on each sample separately. Each p-value calculation gets dispatched as a separate job.

For the single-file method:

alns2counts.py -i workdir/barcodes -o workdir/counts.csv -r input_counts.csv
counts2pvals.py -i workdir/counts.csv -o workdir/pvals.csv

For the parallel method (make sure to set the queue):

alns2counts_separated.py -i workdir/barcodes -o workdir/counts -r input_counts.csv
counts2pvals_separated.py -i workdir/counts -o workdir/pvals -q short_serial -l logs_pvals
ls -l workdir/pvals/*.csv | awk '$5 == 0 {print $8}' | xargs rm -f  # remove empty pval files
merge_columns.py -i workdir/pvals -o workdir/pvals.csv

Note that any of these commands can be dispatched to the LSF job scheduler.

phip-stat's People

Contributors

laserson avatar

Stargazers

Jie Huang avatar

Watchers

Jie Huang avatar James Cloos avatar

Forkers

oklasoft gjx

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.