TPS: Temporal Pathway Synthesizer

TPS is a tool for combining time series global phosphoproteomic data and protein-protein interaction networks to reconstruct the vast signaling pathways that control post-translational modifications.

Reference

Please cite the following manuscript if you make use of the TPS software or our EGF response phosphoproteomic data:

Synthesizing Signaling Pathways from Temporal Phosphoproteomic Data. Ali Sinan Köksal, Kirsten Beck, Dylan R. Cronin, Aaron McKenna, Nathan D. Camp, Saurabh Srivastava, Matthew E. MacGilvray, Rastislav Bodík, Alejandro Wolf-Yadlin, Ernest Fraenkel, Jasmin Fisher, Anthony Gitter. Cell Reports 24(13):3607-3618 2018.

Requirements

TPS runs on both Linux and OS X. The only requirement is:

Installation and sample usage

TPS is built and run using the command-line interface. To use TPS, follow these steps:

  1. Download the code:

     git clone https://github.com/koksal/tps.git
    
  2. Browse to the root project folder:

     cd tps
    
  3. Invoke ./scripts/run. The first time this script is run, it will download sbt-extras, which is a script for running the build tool sbt. After sbt is downloaded, the script will build the code and run TPS with the given command-line arguments. To run TPS using the provided data, copy and paste the following command into the terminal:

     ./scripts/run \
       --network data/networks/input-network.tsv \
       --timeseries data/timeseries/median-time-series.tsv \
       --firstscores data/timeseries/p-values-first.tsv \
       --prevscores data/timeseries/p-values-prev.tsv \
       --partialmodel data/resources/kinase-substrate-interactions.sif \
       --peptidemap data/timeseries/peptide-mapping.tsv \
       --source EGF_HUMAN \
       --threshold 0.01
    

    This command will generate, in the current folder:

    • a network file named output.sif
    • a tab-separated file named activity-windows.tsv

    The output files are described in the Output section.

Command-line arguments

Required arguments

  • --network <file>: Input network file in TSV format, where each row defines an undirected edge.
  • --timeseries <file>: Input time series file in TSV format. The first line defines the time point labels, and each subsequent line corresponds to one time series profile.
  • --firstscores <file>: Input file that contains significance scores for each time point of a profile (except the first time point), with respect to the first time point of the profile.
  • --prevscores <file>: Similar to --firstscores, an input file that gives significance scores for each time point (except the first one), with respect to the previous time point.
  • --source <value>: Identifier for the network source node. Multiple source nodes can be provided by repeating the argument multiple times. For example, --source <node1> --source <node2> --source <node3>.
  • --threshold <value>: Threshold value for significance scores, above which measurements are considered non-significant.
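
As an illustration of how the required arguments fit together, the sketch below assembles an argument list for ./scripts/run and repeats --source once per source node. The helper name build_tps_command is hypothetical and not part of TPS; only the flag names and the sample file paths come from this README.

```python
from typing import List, Sequence


def build_tps_command(
    network: str,
    timeseries: str,
    firstscores: str,
    prevscores: str,
    sources: Sequence[str],
    threshold: float,
) -> List[str]:
    """Assemble the argument list for ./scripts/run (hypothetical helper)."""
    if not sources:
        raise ValueError("at least one source node is required")
    cmd = [
        "./scripts/run",
        "--network", network,
        "--timeseries", timeseries,
        "--firstscores", firstscores,
        "--prevscores", prevscores,
    ]
    # --source is repeated once per source node.
    for node in sources:
        cmd += ["--source", node]
    cmd += ["--threshold", str(threshold)]
    return cmd


cmd = build_tps_command(
    "data/networks/input-network.tsv",
    "data/timeseries/median-time-series.tsv",
    "data/timeseries/p-values-first.tsv",
    "data/timeseries/p-values-prev.tsv",
    sources=["EGF_HUMAN"],
    threshold=0.01,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the root project folder.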

Optional arguments

  • --partialmodel <file>: Input partial model file given as a signed directed SIF network. Each line corresponds to a directed interaction, where the relationship type can be N (directed, unsigned edge), A (directed activation edge), or I (directed inhibition edge). Multiple partial model files can be provided.
  • --peptidemap <file>: Input file in TSV format that defines a mapping between time series profile identifiers and input network node identifiers. A profile can be mapped to more than one node, in which case the second column is a pipe-separated list of node identifiers. The file begins with a header row.
  • --outlabel <value>: Prefix string to be added to all output files.
  • --outfolder <value>: Folder in which the output files should be generated. By default, output files are generated in the current directory.
  • --no-connectivity: Do not use connectivity constraints.
  • --no-temporality: Do not use temporal constraints.
  • --no-monotonicity: Do not use monotonicity constraints when inferring activity intervals for time series data.
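
To make the --peptidemap format concrete, here is a minimal sketch of a reader for it, assuming the layout described above: a header row, a profile identifier in the first column, and a pipe-separated list of node identifiers in the second. The function name read_peptide_map and the identifiers in the example are made up for illustration and are not taken from the TPS code or data.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_peptide_map(handle: TextIO) -> Dict[str, List[str]]:
    """Map each profile identifier to its list of network node identifiers."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    mapping: Dict[str, List[str]] = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        profile_id, nodes = row[0], row[1]
        # A profile may map to several nodes, separated by "|".
        mapping[profile_id] = [n for n in nodes.split("|") if n]
    return mapping


# Illustrative input only; these identifiers are hypothetical.
example = io.StringIO(
    "peptide\tprotein\n"
    "pepA\tP1_HUMAN\n"
    "pepB\tP2_HUMAN|P3_HUMAN\n"
)
peptide_map = read_peptide_map(example)
print(peptide_map)
```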

Preparing input files

We recommend the following strategies for preparing the required input files:

  • --network <file>: The network should be a subnetwork of a protein-protein interaction network that connects the phosphorylated proteins to the source node(s). The Omics Integrator implementation of the Prize-Collecting Steiner Forest algorithm can produce such a subnetwork. To generate more general subnetworks instead of tree-structured graphs, run Omics Integrator with the option to add random noise to edge weights and merge the graphs output by each randomized run. Omics Integrator writes the network in a three column tab-separated format. The second column, the interaction type, must be removed before providing the file to TPS. The scripts in the pcsf subdirectory demonstrate this process.
  • --timeseries <file>: TPS expects a single intensity for each peptide at each time point, which can be calculated by taking the median intensity over all mass spectrometry replicates. TPS allows missing data, which should be denoted by a non-numeric value such as N/A or an empty string. This file must contain a header row, which specifies the time point labels.
  • --firstscores <file>: Significance scores can be naively computed with t-tests comparing the phosphorylation intensity at each time point and the first time point. A preferable option is to account for the comparisons of multiple pairs of time points using Tukey's Honest Significant Difference test, which is implemented as TukeyHSD in R. This test compares all pairs of time points, from which the comparisons to the first time point can be extracted. This file should not contain a header row, and if a header row is provided it should be commented out with a leading # character. If there are t time points in the --timeseries <file>, this file should contain t - 1 significance score columns. Missing values and N/A are not allowed and should be replaced by placeholder scores of 1.0. If a peptide's value is missing in the --timeseries <file> at one or more time points, those time points cannot have significance scores less than the --threshold <value>.
  • --prevscores <file>: Significance scores can be computed in the same manner as the --firstscores <file> except the scores should be based on comparisons of the current time point and the preceding time point. The file format and requirements are the same as the --firstscores <file>.
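
Two of the preparation steps above are simple text transformations, sketched here. This is not the code in the pcsf subdirectory. It assumes that the Omics Integrator output has exactly three tab-separated columns with the interaction type in the middle, that the first column of a scores file is the profile identifier, and that missing scores appear as one of the strings in MISSING. The function names and the MISSING set are illustrative.

```python
from typing import Iterable, List

# Assumed spellings of a missing score; adjust to match your data.
MISSING = {"", "NA", "N/A", "NaN", "nan"}


def drop_interaction_column(lines: Iterable[str]) -> List[str]:
    """Turn 'node1<TAB>type<TAB>node2' rows into 'node1<TAB>node2' rows."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        out.append(f"{fields[0]}\t{fields[2]}")
    return out


def fill_missing_scores(lines: Iterable[str], placeholder: str = "1.0") -> List[str]:
    """Replace missing significance scores with a placeholder score."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            out.append(line)  # keep blank lines and commented-out headers
            continue
        fields = line.split("\t")
        # Assumption: first field is the profile identifier, the rest are scores.
        scores = [placeholder if f.strip() in MISSING else f for f in fields[1:]]
        out.append("\t".join([fields[0]] + scores))
    return out


print(drop_interaction_column(["A\tpp\tB\n"]))
print(fill_missing_scores(["pepA\t0.001\tN/A\t\n"]))
```

Note that this sketch does not handle the last requirement above: where a peptide's intensity is missing in the time series file, the corresponding scores still need to be checked against the threshold separately.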

Output

Summary network

TPS outputs a Simple Interaction Format (SIF) file output.sif that summarizes the valid pathway models. The SIF file can be imported into Cytoscape to visualize the network. Each line has the form:

ProteinA <relationship type> ProteinB

The TPS relationship types are:

  • A: ProteinA activates ProteinB
  • I: ProteinA inhibits ProteinB
  • N: ProteinA regulates ProteinB but the edge sign is unknown
  • U: an undirected edge between ProteinA and ProteinB
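
For downstream analysis outside Cytoscape, a summary network in this form can be read with a few lines of Python. The sketch below assumes whitespace-separated fields and exactly the four relationship types listed above. The Edge type, the parse_sif function, and the example protein names are illustrative and not part of TPS.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


VALID_RELATIONS = {"A", "I", "N", "U"}


def parse_sif(lines: Iterable[str]) -> List[Edge]:
    """Parse 'ProteinA <relationship type> ProteinB' lines into Edge tuples."""
    edges = []
    for line in lines:
        fields = line.split()  # accepts tabs or spaces
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3 or fields[1] not in VALID_RELATIONS:
            raise ValueError(f"unexpected SIF line: {line!r}")
        edges.append(Edge(*fields))
    return edges


# Illustrative input only; these protein names are hypothetical.
edges = parse_sif(["X A Y\n", "Y\tU\tZ\n", "\n", "Z I X\n"])
counts = Counter(edge.relation for edge in edges)
print(counts)
```

To read a real file, pass an open file handle, for example parse_sif(open("output.sif")).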

Activity windows

TPS also produces a tab-separated file activity-windows.tsv that lists, for each node in the expanded input network, one of four possible activity types per time point:

  • activation: the peptide may be activated at the given time point
  • inhibition: the peptide may be inhibited at the given time point
  • ambiguous: the peptide may be either activated or inhibited at the given time point
  • inactive: the peptide is inactive at the given time point
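
As a rough sketch of how this file might be consumed, the code below reports the first time point at which each node is not inactive. The column layout is an assumption, not something this README specifies: a header row whose first cell labels the node column and whose remaining cells are time point labels, followed by one row per node with one activity type per time point. Check the actual file before relying on it. The function name and the example values are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO

ACTIVITY_TYPES = {"activation", "inhibition", "ambiguous", "inactive"}


def first_active_time_point(handle: TextIO) -> Dict[str, Optional[str]]:
    """Return, per node, the first time label whose activity is not 'inactive'."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    time_labels = header[1:]  # assumed layout: node column, then time points
    result: Dict[str, Optional[str]] = {}
    for row in reader:
        if not row:
            continue
        node, labels = row[0], row[1:]
        unknown = set(labels) - ACTIVITY_TYPES
        if unknown:
            raise ValueError(f"unknown activity types for {node}: {sorted(unknown)}")
        result[node] = next(
            (t for t, a in zip(time_labels, labels) if a != "inactive"), None
        )
    return result


# Illustrative input only; node names and time labels are hypothetical.
example = io.StringIO(
    "node\t2min\t4min\t8min\n"
    "X\tinactive\tactivation\tinactive\n"
    "Y\tinactive\tinactive\tinactive\n"
    "Z\tambiguous\tinhibition\tinactive\n"
)
first_active = first_active_time_point(example)
print(first_active)
```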

Solvers

By default, TPS uses a custom solver (DataflowSolver). Historically, it also supported two symbolic solvers (NaiveSymbolicSolver and BilateralSolver) that implement the same functionality as the custom solver.

Currently, only the default solver (which is the most recent and fastest of all three) is supported.

Example data

The example dataset included with TPS is our phosphoproteomic time course of the cellular response to EGF stimulation. See the citation information above.

The example network was produced by Omics Integrator run on a network of iRefIndex and PhosphoSitePlus interactions. Please acknowledge and reference PhosphoSitePlus if you use data/resources/kinase-substrate-interactions.sif and both PhosphoSitePlus and iRefIndex if you use data/networks/phosphosite-irefindex13.0-uniprot.txt or data/networks/input-network.tsv.

The yeast osmotic stress response data and analysis are available in the separate osmotic-stress repository.

tps's Issues

Installation error

Hi There,

I tried to follow the installation guide on the front page but ran into an error.

After doing

git clone https://github.com/koksal/tps.git
cd tps
./scripts/run

I got the following error.

[info] Loading project definition from /Users/ozgun/Documents/Code/tps/project
java.lang.NullPointerException
at java.base/java.util.regex.Matcher.getTextLength(Matcher.java:1769)
at java.base/java.util.regex.Matcher.reset(Matcher.java:416)
at java.base/java.util.regex.Matcher.<init>(Matcher.java:253)
at java.base/java.util.regex.Pattern.matcher(Pattern.java:1130)
at java.base/java.util.regex.Pattern.split(Pattern.java:1249)
at java.base/java.util.regex.Pattern.split(Pattern.java:1322)
at sbt.IO$.pathSplit(IO.scala:744)
at sbt.IO$.parseClasspath(IO.scala:859)
at sbt.compiler.CompilerArguments.extClasspath(CompilerArguments.scala:62)
at sbt.compiler.AggressiveCompile.withBootclasspath(AggressiveCompile.scala:50)
at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:83)
at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:70)
at sbt.compiler.AggressiveCompile.apply(AggressiveCompile.scala:45)
at sbt.Compiler$.apply(Compiler.scala:74)
at sbt.Compiler$.apply(Compiler.scala:65)
at sbt.Defaults$.sbt$Defaults$$compileTaskImpl(Defaults.scala:789)
at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:781)
at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:781)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:844)
[error] (compile:compile) java.lang.NullPointerException

I am on macOS 10.15.5. Any idea why this happens?

Running TPS without known receptors

In #10, @ozgunbabur asked:

How can we use TPS to model proteomic differences when we don't know the exact location of the perturbation on the network? For instance, a cell type is treated with two different types of media (e.g. bovine serum vs PBS). In that case, does it still make sense to use PCST? Do you have any best practice for such cases?

I wanted to open a new issue to discuss this. There could be multiple approaches, and we may want to keep this discussion active even after #10 is resolved.

My first thought would be to use a large and general list of source proteins or receptors as the source nodes for both PCSF and TPS. I don't have experience using TPS in this manner. However, a similar approach was used with PCSF in Tuncbag et al. 2013. There they used receptors from the Human Plasma Membrane Database. My hope is that when PCSF selects a subnetwork, it will prune the receptors to a smaller set that is reasonable for TPS.

An alternative idea would be to look at your proteomic data and see which receptors are differentially phosphorylated. Those could be used as the PCSF and TPS sources.

I'm interested in supporting this use case and can comment on any preliminary results you get from PCSF or TPS with suggestions.

Statistical tests for pre-processing time series data

There are new time series hypothesis testing frameworks that would likely be better than Tukey's honestly significant difference test. I'm tracking them in this issue as external pre-processing options that TPS users may be interested in. Eventually, we could consider including one in a TPS workflow.

Temporal ordering of omics and multiomic events inferred from time-series data
https://doi.org/10.1038/s41540-020-0141-0

An empirical Bayes change-point model for transcriptome time-course data
https://doi.org/10.1214/20-AOAS1403
