XPRESSpipe

An alignment and analysis pipeline for RNAseq data


Please refer to the documentation for more in-depth details.

Citation:

Berg JA, et al. (2019). XPRESSyourself: Enhancing and Automating the Ribosome
Profiling and RNA-Seq Analysis Toolkit. bioRxiv 704320; doi: https://doi.org/10.1101/704320

Installation:

Installing from source

The following is a short tutorial showing you how to install XPRESSpipe:
[asciicast recording: installing XPRESSpipe]

  • Make sure you let Anaconda set up the PATH info for you.
  • If the help menu is not displayed when testing, try adding the path where you installed XPRESSpipe to the system PATH:
$ echo 'export PATH=$PATH:/path/to/xpresspipe' >> ~/.bash_profile
  • If you do not have a file named ~/.bash_profile, try looking for one called ~/.profile
  • The commands used in the video above are summarized here:
$ curl -L -O https://github.com/XPRESSyourself/XPRESSpipe/archive/v0.2.3b0.zip
$ unzip v0.2.3b0.zip
$ cd XPRESSpipe-0.2.3b0/
$ conda env create -f requirements.yml
$ conda activate xpresspipe
$ python setup.py install
$ xpresspipe -h
$ xpresspipe test
  • Be sure to specify the correct release version in the first URL
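
As a minimal sketch of how the release version feeds into the commands above, the helper below derives the download URL, the zip file name, and the extracted folder name from a version string. The function name `release_paths` is hypothetical, and the naming pattern is inferred only from the single v0.2.3b0 example shown here; other releases are assumed, not confirmed, to follow it.

```python
from typing import Tuple


def release_paths(version: str) -> Tuple[str, str, str]:
    """Return (download URL, zip file name, extracted folder) for a release.

    Assumption: every release follows the pattern of the v0.2.3b0 example.
    """
    # Accept the version with or without the leading "v" used in the tag name
    tag = version if version.startswith("v") else "v" + version
    url = "https://github.com/XPRESSyourself/XPRESSpipe/archive/" + tag + ".zip"
    zip_name = tag + ".zip"
    # The extracted folder drops the leading "v" from the tag
    folder = "XPRESSpipe-" + tag[1:] + "/"
    return url, zip_name, folder


url, zip_name, folder = release_paths("0.2.3b0")
print(url)
print(zip_name)
print(folder)
```

Substituting a different version string shows which three places in the install commands need to change together.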

QuickStart:

  • Reference building
    [asciicast recording]

  • Running XPRESSpipe on sequence data
    [asciicast recording]

  • You can also use the XPRESSpipe command builder and executor for reference curation or running the pipeline by executing the following:

$ xpresspipe build

Important Notes:

Basic Starting Input

  • An input directory with raw sequence data
    • Sequence data files should be in FASTQ format, end in .fastq or .fq, and may be .zip or .gz compressed
  • An empty output directory
  • A reference directory (see documentation for curateReference for more details)
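
The requirements above can be checked before starting a run. The following is a rough sketch, not part of XPRESSpipe: the names `is_fastq_name` and `check_inputs` are hypothetical, the check treats the file endings as case-sensitive, and it assumes the output directory must already exist.

```python
from pathlib import Path

FASTQ_ENDINGS = (".fastq", ".fq")
COMPRESSED_ENDINGS = (".zip", ".gz")


def is_fastq_name(name):
    """True if the name ends in .fastq or .fq, optionally .zip or .gz compressed."""
    for ending in COMPRESSED_ENDINGS:
        if name.endswith(ending):
            # Strip the compression ending before checking the FASTQ ending
            name = name[: -len(ending)]
            break
    return name.endswith(FASTQ_ENDINGS)


def check_inputs(input_dir, output_dir):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    in_path, out_path = Path(input_dir), Path(output_dir)

    if not in_path.is_dir():
        problems.append("input directory not found: " + str(in_path))
    else:
        names = sorted(p.name for p in in_path.iterdir())
        if not names:
            problems.append("input directory is empty")
        for name in names:
            if not is_fastq_name(name):
                problems.append("not a FASTQ file name: " + name)

    if not out_path.is_dir():
        problems.append("output directory not found: " + str(out_path))
    elif any(out_path.iterdir()):
        problems.append("output directory is not empty")

    return problems
```

Calling `check_inputs("/path/to/input", "/path/to/output")` returns the list of problems, so a wrapper script can stop early when the list is not empty.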

Naming Conventions

For output to be ordered after alignment (except when generating a raw counts table), follow the recommended file naming conventions below.

  1. Download your raw sequence data and place it in a folder -- this folder should contain all the sequence data and nothing else.
  2. Make sure files follow a consistent naming pattern. For example, if you had 3 genetic backgrounds of ribosome profiling data, the naming scheme would go as follows:
ExperimentName_BackgroundA_FP.fastq(.gz)
ExperimentName_BackgroundA_RNA.fastq(.gz)
ExperimentName_BackgroundB_FP.fastq(.gz)
ExperimentName_BackgroundB_RNA.fastq(.gz)
ExperimentName_BackgroundC_FP.fastq(.gz)
ExperimentName_BackgroundC_RNA.fastq(.gz)
  3. If samples are replicates, the replicate number needs to be indicated in each file name.
  4. If you want the final count table to be in a particular order that is not alphabetical, prepend a letter to the sample name to force this ordering:
ExperimentName_a_WT.fastq(.gz)
ExperimentName_b_exType.fastq(.gz)
  5. If you have replicates:
ExperimentName_a_WT_1.fastq(.gz)
ExperimentName_a_WT_2.fastq(.gz)
ExperimentName_b_exType_1.fastq(.gz)
ExperimentName_b_exType_2.fastq(.gz)
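
The letter-prefix and replicate conventions above can be generated rather than typed by hand. This is an illustrative sketch only: `prefixed_names` is a hypothetical helper, the `.fastq.gz` ending is one of the allowed endings chosen for the example, and Python's plain string sort is used as a stand-in for the alphabetical ordering the pipeline applies.

```python
from string import ascii_lowercase


def prefixed_names(experiment, samples, replicates=1, ext=".fastq.gz"):
    """Build file names whose alphabetical order matches the order of `samples`."""
    if len(samples) > len(ascii_lowercase):
        raise ValueError("this sketch supports at most 26 samples")
    names = []
    for letter, sample in zip(ascii_lowercase, samples):
        for rep in range(1, replicates + 1):
            parts = [experiment, letter, sample]
            if replicates > 1:
                # Replicate number goes last, as in ExperimentName_a_WT_1
                parts.append(str(rep))
            names.append("_".join(parts) + ext)
    return names


names = prefixed_names("ExperimentName", ["WT", "exType"], replicates=2)
for name in names:
    print(name)
```

Because the letter comes right after the experiment name, sorting the generated names reproduces the order in which the samples were listed.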

Running a test dataset:

  • First, curate the reference:
$ xpresspipe curateReference -o /path/to/reference -f /path/to/reference/genome_fastas -g /path/to/reference/transcripts.gtf -p -t --sjdbOverhang 49
  • Then process the dataset like so:
$ xpresspipe riboseq -i /path/to/input -o /path/to/output -r /path/to/reference/ --gtf /path/to/reference/transcripts_CT.gtf -e isrib_test_study -a CTGTAGGCACCATCAAT --sjdbOverhang 49
  • The above steps are very computationally intensive, so we recommend running them on a supercomputing cluster.

  • Scripts used to analyze this data can be found here and here and here

  • Alternatively, smaller test datasets can be found within the XPRESSpipe tests folder and an outline of commands to run can be found here
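
When submitting the riboseq step from a script (for example, a cluster job), it can help to assemble the command as an argument list instead of one long string. The sketch below uses only the flags that appear in the riboseq command above; the function name `riboseq_command` and its parameter names are hypothetical, and the paths are the same placeholders used in this README.

```python
import shlex


def riboseq_command(input_dir, output_dir, reference_dir, gtf,
                    experiment, adapter, sjdb_overhang):
    """Assemble the riboseq call as an argument list (nothing is executed)."""
    return [
        "xpresspipe", "riboseq",
        "-i", input_dir,
        "-o", output_dir,
        "-r", reference_dir,
        "--gtf", gtf,
        "-e", experiment,
        "-a", adapter,
        "--sjdbOverhang", str(sjdb_overhang),
    ]


cmd = riboseq_command(
    input_dir="/path/to/input",
    output_dir="/path/to/output",
    reference_dir="/path/to/reference/",
    gtf="/path/to/reference/transcripts_CT.gtf",
    experiment="isrib_test_study",
    adapter="CTGTAGGCACCATCAAT",
    sjdb_overhang=49,
)
# Print the command for inspection; shlex.join requires Python 3.8 or later
print(shlex.join(cmd))
```

Once the printed command looks right, the same list can be passed to `subprocess.run(cmd, check=True)` inside the activated xpresspipe environment.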
