Please refer to the documentation for more in-depth details.
Berg JA, et al. (2019). XPRESSyourself: Enhancing and Automating the Ribosome
Profiling and RNA-Seq Analysis Toolkit. bioRxiv 704320; doi: https://doi.org/10.1101/704320
The following is a short tutorial showing you how to install XPRESSpipe:
- Make sure you let Anaconda set up the PATH info for you.
- If the help menu is not displayed when testing, try adding the path where you installed XPRESSpipe to the system PATH
$ echo 'export PATH=$PATH:/path/to/xpresspipe' >> ~/.bash_profile
- If you do not have a file named ~/.bash_profile, try looking for one called ~/.profile
- The commands used in the video above are summarized here:
$ curl -L -O https://github.com/XPRESSyourself/XPRESSpipe/archive/v0.2.3b0.zip
$ unzip v0.2.3b0.zip
$ cd XPRESSpipe-0.2.3b0/
$ conda env create -f requirements.yml
$ conda activate xpresspipe
$ python setup.py install
$ xpresspipe -h
$ xpresspipe test
- Be sure to specify the correct release version in the first URL
- You can also use the XPRESSpipe command builder and executor for reference curation or running the pipeline by executing the following:
$ xpresspipe build
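The installation notes above can be sketched as a small shell script that keeps the release version in one place and reports which profile file the PATH line would go into. The helper names `release_url` and `profile_file` are hypothetical and are not part of XPRESSpipe; the script only prints values and does not download or modify anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the archive URL for a given release version,
# following the URL pattern shown in the commands above.
release_url() {
    local version="$1"
    printf 'https://github.com/XPRESSyourself/XPRESSpipe/archive/v%s.zip\n' "$version"
}

# Hypothetical helper: print ~/.bash_profile if it exists, otherwise fall
# back to ~/.profile (without checking that ~/.profile exists).
# An alternative home directory can be passed as the first argument.
profile_file() {
    local home_dir="${1:-$HOME}"
    if [ -f "$home_dir/.bash_profile" ]; then
        printf '%s\n' "$home_dir/.bash_profile"
    else
        printf '%s\n' "$home_dir/.profile"
    fi
}

VERSION="0.2.3b0"   # change this to the release you want to install
echo "Archive URL:     $(release_url "$VERSION")"
echo "Profile to edit: $(profile_file)"

# To actually download the archive, uncomment the next line:
# curl -L -O "$(release_url "$VERSION")"
```

Changing `VERSION` updates the URL in one step, which helps avoid a mismatch between the downloaded archive and the directory name used in the later `unzip` and `cd` commands.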
- An input directory with raw sequence data
  - Sequence data files should be in FASTQ format, end in .fastq or .fq, and can be .zip or .gz compressed
- An empty output directory
- A reference directory (see the documentation for curateReference for more details)
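Before starting a run, the requirements above can be checked with a short script. This is a minimal sketch, not something XPRESSpipe provides: the function names are hypothetical, and the accepted suffixes (.fastq, .fq, each optionally followed by .gz or .zip) are an assumption based on the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if DIR contains at least one file and
# every entry has a FASTQ-style suffix (assumed: .fastq/.fq, optionally
# followed by .gz or .zip).
check_input_dir() {
    local dir="$1" f name status=0 found=0
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # glob matched nothing
        found=1
        name="$(basename "$f")"
        case "$name" in
            *.fastq|*.fq|*.fastq.gz|*.fq.gz|*.fastq.zip|*.fq.zip) ;;
            *) echo "unexpected file in input directory: $name" >&2; status=1 ;;
        esac
    done
    [ "$found" -eq 1 ] || { echo "no files found in $dir" >&2; status=1; }
    return "$status"
}

# Hypothetical helper: succeed only if DIR exists and is empty.
check_output_dir_empty() {
    local dir="$1"
    [ -d "$dir" ] && [ -z "$(ls -A "$dir")" ]
}
```

A possible use is `check_input_dir /path/to/input && check_output_dir_empty /path/to/output` before calling the pipeline, so that stray files are caught before a long run starts.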
For output to be ordered correctly after alignment (except for generation of a raw counts table), follow the recommended file naming conventions.
- Download your raw sequence data and place it in a folder -- this folder should contain all the sequence data and nothing else.
- Make sure files follow a consistent naming scheme. For example, if you had 3 genetic backgrounds of ribosome profiling data, the naming scheme would go as follows:
ExperimentName_BackgroundA_FP.fastq(.gz)
ExperimentName_BackgroundA_RNA.fastq(.gz)
ExperimentName_BackgroundB_FP.fastq(.gz)
ExperimentName_BackgroundB_RNA.fastq(.gz)
ExperimentName_BackgroundC_FP.fastq(.gz)
ExperimentName_BackgroundC_RNA.fastq(.gz)
- If samples are replicates, the replicate number needs to be indicated in each file name.
- If you want the final count table to be in a particular order and that order is not alphabetical, prepend a letter to the sample name to force this ordering.
ExperimentName_a_WT.fastq(.gz)
ExperimentName_b_exType.fastq(.gz)
- If you have replicates:
ExperimentName_a_WT_1.fastq(.gz)
ExperimentName_a_WT_2.fastq(.gz)
ExperimentName_b_exType_1.fastq(.gz)
ExperimentName_b_exType_2.fastq(.gz)
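To see whether the letter prefixes produce the intended order, the file names can be listed in sorted order before running the pipeline. The sketch below assumes a plain byte-wise alphabetical sort of file names; XPRESSpipe's own ordering may differ in edge cases (for example mixed upper and lower case), and the function name `preview_sample_order` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file names in DIR, one per line, sorted
# byte-wise (C locale) so the result does not depend on the system locale.
preview_sample_order() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # skip if the directory is empty
        basename "$f"
    done | LC_ALL=C sort
}
```

Running `preview_sample_order /path/to/input` and comparing the result with the order you want in the count table shows whether a prefix such as `a_` or `b_` is still needed.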
- We can run a test dataset as in the associated manuscript by downloading the FASTQ files from GSE65778 using the SRA Toolkit.
- We can curate the reference like so:
$ xpresspipe curateReference -o /path/to/reference -f /path/to/reference/genome_fastas -g /path/to/reference/transcripts.gtf -p -t --sjdbOverhang 49
- And we can process the dataset like so:
$ xpresspipe riboseq -i /path/to/input -o /path/to/output -r /path/to/reference/ --gtf /path/to/reference/transcripts_CT.gtf -e isrib_test_study -a CTGTAGGCACCATCAAT --sjdbOverhang 49
- The above steps will be very computationally intensive, so we recommend running this on a supercomputing cluster.
- Scripts used to analyze this data can be found here, here, and here.
- Alternatively, smaller test datasets can be found within the XPRESSpipe tests folder, and an outline of commands to run can be found here.
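The download step mentioned above could look roughly like the following sketch, which uses the SRA Toolkit commands `prefetch` and `fasterq-dump`. The run accessions for GSE65778 are not listed here and must be looked up on GEO/SRA; the script reads them from a text file you supply (one accession per line). The function names and the `DRY_RUN` switch are hypothetical conveniences, and by default the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: for every accession listed in LIST (one per line),
# fetch the run and convert it to FASTQ inside OUTDIR.
fetch_fastq() {
    local list="$1" outdir="$2" acc
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue        # skip blank lines
        run prefetch "$acc"
        run fasterq-dump --outdir "$outdir" "$acc"
    done < "$list"
}
```

For example, `fetch_fastq accessions.txt /path/to/input` prints the commands that would run; setting `DRY_RUN=0` beforehand executes them. The resulting files may still need renaming to match the naming scheme described earlier.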