
Peptides SpecTra Annotations (PaSTA)

Background

Unlike in genomics, most current tools and workflows for analyzing proteomics data are tied either to a specific platform, such as Galaxy, or to an operating system (OS), such as Microsoft Windows or Linux. This lack of publicly available, platform- and OS-independent, reusable proteomics tools and workflows prevents valuable public proteomic datasets, such as those in NCI’s Proteomic Data Commons, from being analyzed. This proposal is to create an analysis workflow that generates annotated peptide sequences from proteomic spectra using containerized tools.

Challenges in the field

TBD

Workflow

[Workflow diagram: Peptides SpecTra Annotations (PaSTA)]

Prerequisite

Installation

Download the git repo

git clone https://github.com/NCBI-Hackathons/Peptides-SpecTra-Annotations-PaSTA
cd Peptides-SpecTra-Annotations-PaSTA

Download the dataset used

wget --recursive --no-parent --reject="index.html*" -e robots=off https://cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML/
gzip -d cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML/TCGA-A6-3807-01A-22_W_VU_201210*.mzML.gz
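The recursive wget mirrors the server's directory tree locally, so the decompressed spectra end up several directories deep. The sketch below collects them for later steps; `DATA_DIR` simply repeats the path from the commands above, and the helper name `list_mzml` is hypothetical, not part of the PaSTA scripts.

```shell
#!/usr/bin/env bash
# Local copy of the server path created by `wget --recursive` above.
DATA_DIR="cptc-xfer.uis.georgetown.edu/publicData/Phase_II_Data/TCGA_Colorectal_Cancer_S_022/TCGA-A6-3807-01A-22_Proteome_VU_20121019/TCGA-A6-3807-01A-22_Proteome_VU_20121019_mzML"

# Hypothetical helper: print one decompressed .mzML path per line.
# Prints nothing (and still succeeds) if the directory holds no .mzML files.
list_mzml() {
  local dir="$1"
  local f
  for f in "$dir"/*.mzML; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `list_mzml "$DATA_DIR" | wc -l` reports how many files were decompressed; a count of zero usually means the download or the `gzip -d` step failed.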

Download Human reference proteome database from UniProt

wget -O AUP000005640_sp.fasta "https://www.uniprot.org/uniprot/?query=reviewed:yes+AND+proteome:UP000005640&format=fasta"
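UniProt has since moved its programmatic access to a REST API at rest.uniprot.org, so the legacy query URL above may redirect or stop working. The sketch below shows an equivalent query as we understand the current API, plus a quick sanity check on the downloaded FASTA. The exact query string and the helper names (`fetch_uniprot_fasta`, `count_fasta_entries`) are assumptions for illustration; check them against the UniProt REST documentation.

```shell
#!/usr/bin/env bash
# Assumed REST equivalent of the legacy query: reviewed (Swiss-Prot) entries
# of the human reference proteome UP000005640, returned as FASTA.
UNIPROT_REST_URL='https://rest.uniprot.org/uniprotkb/stream?format=fasta&query=(proteome:UP000005640)+AND+(reviewed:true)'

# Hypothetical helper: download the FASTA to the file named in $1.
# -f makes curl fail on HTTP errors instead of saving an error page.
fetch_uniprot_fasta() {
  curl -fsSL -o "$1" "$UNIPROT_REST_URL"
}

# Hypothetical helper: count protein entries (header lines starting with '>').
count_fasta_entries() {
  grep -c '^>' "$1"
}
```

Running `count_fasta_entries AUP000005640_sp.fasta` after either download method is a cheap check: a count of zero means the file is not FASTA (for example an HTML error page).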

Install MSGFPlus

wget https://github.com/MSGFPlus/msgfplus/releases/download/v2018.07.17/v2018.07.17.zip
mkdir -p software/MSGFPlus
unzip -d software/MSGFPlus v2018.07.17.zip
rm v2018.07.17.zip
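Once MS-GF+ is unpacked, a single database search looks roughly like the sketch below. It assumes the release zip places `MSGFPlus.jar` directly in `software/MSGFPlus/` and that Java is on `PATH`; the function name `msgf_search` and the 4 GB heap are illustrative choices, not taken from the PaSTA scripts.

```shell
#!/usr/bin/env bash
# Assumed JAR location after `unzip -d software/MSGFPlus`; override via MSGF_JAR.
MSGF_JAR="${MSGF_JAR:-software/MSGFPlus/MSGFPlus.jar}"

# Hypothetical helper: search one mzML file against a FASTA database.
# Writes <input>.mzid next to the input file.
msgf_search() {
  local mzml="$1" fasta="$2" mods="$3"
  local out="${mzml%.mzML}.mzid"
  # -tda 1: MS-GF+ builds and searches a concatenated target-decoy database.
  # -addFeatures 1: report extra PSM features that Percolator can use later.
  java -Xmx4g -jar "$MSGF_JAR" \
    -s "$mzml" \
    -d "$fasta" \
    -o "$out" \
    -tda 1 \
    -mod "$mods" \
    -addFeatures 1
}
```

Usage, assuming the default modification file described below: `msgf_search sample.mzML AUP000005640_sp.fasta software/MSGFPlus/doc/examples/Mods.txt`.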

Install Percolator

wget https://github.com/percolator/percolator/releases/download/rel-3-02-01/ubuntu64.tar.gz
tar -xvzf ubuntu64.tar.gz
sudo dpkg -i *.deb
sudo apt-get install -f
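Percolator rescores the MS-GF+ matches and re-estimates the false discovery rate. The sketch below converts a combined target-decoy `.mzid` into Percolator's input format and runs Percolator on it. Several points are assumptions to verify with `msgf2pin --help` and `percolator --help` for the installed version: that the `.deb` files above include the converters package providing `msgf2pin`, that `-P` sets the decoy prefix (MS-GF+ with `-tda 1` marks decoys with `XXX_` by default), and that `-m`/`-r` write PSM- and peptide-level tables. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rescore one MS-GF+ .mzid file with Percolator.
# Produces <base>.pin, <base>.psms.tsv and <base>.peptides.tsv.
percolator_rescore() {
  local mzid="$1"
  local base="${mzid%.mzid}"
  # Convert MS-GF+ output to a Percolator input (.pin) file, then rescore it.
  msgf2pin -o "${base}.pin" -e trypsin -P XXX_ "$mzid" &&
    percolator -m "${base}.psms.tsv" -r "${base}.peptides.tsv" "${base}.pin"
}
```

Chaining the two commands with `&&` keeps Percolator from running on a missing or partial `.pin` file if the conversion fails.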

Install Mimic

wget https://github.com/percolator/mimic/archive/rel-1-00.zip
unzip rel-1-00.zip
cd mimic-rel-1-00
cmake -DCMAKE_INSTALL_PREFIX=$(pwd)/../software/ src/ && make && make install
cd ..
rm -rf mimic-rel-1-00 rel-1-00.zip

Run the whole pipeline

bash examples/workflow_mimic_msgf_percolator.sh
bash examples/run_one_example.sh

The default Mods.txt can be found in software/MSGFPlus/doc/examples. Instructions for adding custom modifications are also included in Mods.txt.
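As an illustration of the Mods.txt format, the sketch below writes a minimal file with fixed carbamidomethylation of cysteine and variable oxidation of methionine. Each entry is composition (or mass), residues, `fix` or `opt`, position, and name, following the MS-GF+ convention. These two modifications are common examples only, not necessarily the settings used by the PaSTA scripts, and `write_mods` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal MS-GF+ modification file to $1.
write_mods() {
  cat > "$1" <<'EOF'
# Maximum number of variable (opt) modifications per peptide
NumMods=2

# Fixed: carbamidomethylation of cysteine
C2H3N1O1,C,fix,any,Carbamidomethyl

# Variable: oxidation of methionine
O1,M,opt,any,Oxidation
EOF
}
```

The resulting file is passed to MS-GF+ with its `-mod` option, e.g. `write_mods Mods.txt` followed by a search that uses `-mod Mods.txt`.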

Workflow availability on the NCI Cloud Resources

A proof of concept of this workflow has been built on the Seven Bridges Cancer Genomics Cloud with Rabix Composer, using Common Workflow Language (CWL) version 1.

Schematic of the Workflow on the Seven Bridges Cancer Genomics Cloud

Docker Instructions

A Docker image for the tools in the workflow, stevetsa/proteomics, is available. The image includes all the prerequisites and dependencies.
To run the Docker image:

docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" -it stevetsa/proteomics:latest

This mounts the current working directory at the same path inside the container, so all files and folders under the current working directory are accessible from within it.
All the executables are in /usr/bin and /usr/local/bin; the MS-GF+ JAR file is /opt/MSGFPlusv2018.07.17.jar.
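The container can also run a single command non-interactively instead of opening a shell. The sketch below runs one MS-GF+ search through the image; it assumes the image provides `java`, takes the JAR path from the note above, and uses a hypothetical helper name `docker_msgf`. Input files must sit under the current directory so they are visible through the bind mount.

```shell
#!/usr/bin/env bash
IMAGE="stevetsa/proteomics:latest"
MSGF_JAR_IN_IMAGE="/opt/MSGFPlusv2018.07.17.jar"

# Hypothetical helper: run one MS-GF+ search inside the container.
# --rm removes the container when the search finishes.
docker_msgf() {
  local mzml="$1" fasta="$2"
  docker run --rm -v "$PWD":"$PWD" -w "$PWD" "$IMAGE" \
    java -Xmx4g -jar "$MSGF_JAR_IN_IMAGE" \
    -s "$mzml" -d "$fasta" -o "${mzml%.mzML}.mzid" -tda 1
}
```

Because the host directory is mounted at the same path, the `.mzid` output appears next to the input file on the host once the container exits.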

Presentations

Resources

Future development

  • Downstream analysis, e.g. with the MEME Suite
  • Run the whole pipeline in a Docker image

Contributors

allissadillman, bunseki2, hsiaoyi0504, kaveelim, palchamy


