pathwayforte / pathway-forte

A Python package for benchmarking pathway databases with functional enrichment and classification methods

Home Page: https://pathwayforte.readthedocs.io/

License: Apache License 2.0

pathway-analysis machine-learning bioinformatics databases benchmarking systems-biology pathway-enrichment-analysis

PathwayForte

A Python package for benchmarking pathway databases with functional enrichment and prediction tasks.

If you find pathway_forte useful for your work, please consider citing:

Installation

pathway_forte can be installed from PyPI with the following command in your terminal:

$ python3 -m pip install pathway_forte

The latest code can be installed from GitHub with:

$ python3 -m pip install git+https://github.com/pathwayforte/pathway-forte.git

For developers, the code can be installed with:

$ git clone https://github.com/pathwayforte/pathway-forte.git
$ cd pathway-forte
$ python3 -m pip install -e .

Main Commands

The table below lists the main commands of PathwayForte.

Command      Action
datasets     Lists the cancer data sets
export       Exports gene sets using ComPath
ora          Lists the ORA analyses
fcs          Lists the FCS analyses
prediction   Lists the prediction methods

Functional Enrichment Methods

  • ora. Lists the Over-Representation Analyses (e.g., the one-tailed hypergeometric test).
  • fcs. Lists the Functional Class Scoring analyses, such as GSEA and ssGSEA, run with GSEAPy.
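As an illustration of the one-tailed hypergeometric test behind ORA, the following stdlib-only sketch computes P(X >= k), the probability of seeing at least k pathway genes in a query list drawn from a gene universe. The helper names `hypergeometric_pvalue` and `ora` are hypothetical and are not part of pathway_forte's API; the p-values are uncorrected, so a real analysis would still need a multiple-testing correction.

```python
from __future__ import annotations

from math import comb


def hypergeometric_pvalue(k: int, n_query: int, n_pathway: int, n_universe: int) -> float:
    """One-tailed (over-representation) p-value P(X >= k).

    X is hypergeometric: n_query genes are drawn without replacement from
    n_universe genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_query, n_pathway)
    # Sum the probability mass of every overlap at least as large as k.
    tail = sum(
        comb(n_pathway, i) * comb(n_universe - n_pathway, n_query - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def ora(query: set[str], pathways: dict[str, set[str]], universe: set[str]) -> dict[str, float]:
    """Hypothetical helper: uncorrected ORA p-value for each pathway."""
    query = query & universe
    results = {}
    for name, genes in pathways.items():
        genes = genes & universe  # only count genes that could have been measured
        overlap = len(query & genes)
        results[name] = hypergeometric_pvalue(overlap, len(query), len(genes), len(universe))
    return results
```

For example, with a universe of 10 genes, a 3-gene pathway, and a 3-gene query that hits all three pathway genes, the p-value is 1 / C(10, 3) = 1/120.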

Prediction Methods

pathway_forte enables three prediction methods (binary classification, multi-class classification with SVMs, and survival analysis) using individualized pathway activity scores. The scores can be calculated with a variety of tools (see [1]) for any pathway database whose gene sets can be exported.
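To make the idea of an individualized pathway activity score concrete, here is a deliberately simple sketch: each gene is z-scored across samples, and a pathway's score in each sample is the mean z-score of its member genes. This mean-z-score method is a swapped-in illustration, not one of the tools evaluated in [1] (such as ssGSEA), and the function names are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev


def zscore_rows(expression: dict[str, list[float]]) -> dict[str, list[float]]:
    """Standardize each gene across samples; constant genes become all zeros."""
    z = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def pathway_activity(
    expression: dict[str, list[float]], gene_sets: dict[str, set[str]]
) -> dict[str, list[float]]:
    """Hypothetical helper: one activity score per pathway per sample."""
    z = zscore_rows(expression)
    n_samples = len(next(iter(expression.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [z[g] for g in genes if g in z]
        if not members:
            continue  # skip pathways with no measured genes
        scores[name] = [mean(row[j] for row in members) for j in range(n_samples)]
    return scores
```

The resulting pathway-by-sample matrix is the kind of input the prediction methods below expect in place of gene-level expression.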

  • binary. Trains an elastic net model for a binary classification task (e.g., tumor vs. normal patients). Training uses nested cross-validation, and the number of folds in both the inner and outer loops can be configured. The model can easily be swapped, since most scikit-learn models (scikit-learn is the machine learning library used by this package) take the same input.
  • subtype. Trains an SVM for a multi-class classification task (e.g., predicting tumor subtypes). Training uses nested cross-validation, and the number of folds in both loops can be configured. As with the binary task, other models can easily be substituted.
  • survival. Trains a Cox proportional hazards model with an elastic net penalty. Training uses nested cross-validation with a grid search in the inner loop. This analysis requires pathway activity scores, patient classes, and patient survival (lifetime) information.
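All three methods rely on nested cross-validation: hyperparameters are chosen by a grid search on inner folds that only ever see the outer training split, and the outer test split is used once to score the tuned model. The stdlib-only sketch below shows that structure; `fit` and `score` are hypothetical callables standing in for the scikit-learn models the package uses (in scikit-learn itself, the same pattern is usually written as `GridSearchCV` wrapped in `cross_val_score`).

```python
from __future__ import annotations

import random
from typing import Callable, Iterator, Sequence


def kfold(indices: Sequence[int], k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k folds of roughly equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv(
    n_samples: int,
    outer_k: int,
    inner_k: int,
    params: Sequence[object],
    fit: Callable[[list[int], object], object],
    score: Callable[[object, list[int]], float],
) -> list[tuple[object, float]]:
    """Return (best parameter, outer test score) for each outer fold."""
    results = []
    for outer_train, outer_test in kfold(range(n_samples), outer_k, seed=1):
        # Inner loop: grid search using only the outer training split.
        best_param, best_score = None, float("-inf")
        for param in params:
            inner = [score(fit(tr, param), va) for tr, va in kfold(outer_train, inner_k, seed=2)]
            mean_score = sum(inner) / len(inner)
            if mean_score > best_score:
                best_param, best_score = param, mean_score
        # Refit on the full outer training split and score once on held-out data.
        model = fit(outer_train, best_param)
        results.append((best_param, score(model, outer_test)))
    return results
```

Because the outer test indices never enter the inner search, the averaged outer scores estimate performance without the optimistic bias of tuning and evaluating on the same samples.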

Other

  • export. Exports GMT files with the current gene sets for the pathway databases included in ComPath [2].
  • datasets. Lists the TCGA data sets [3] that are ready to run in pathway_forte.
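The GMT files produced by export follow the standard tab-delimited gene set format: one gene set per line, with the set name, a description, and then the member genes. The sketch below reads and writes that format with the standard library; the function names and the example gene set are hypothetical, and the exact names and descriptions that ComPath writes may differ.

```python
from __future__ import annotations

import io
from typing import TextIO


def write_gmt(gene_sets: dict[str, tuple[str, list[str]]], handle: TextIO) -> None:
    """Write name, description and genes as one tab-separated line per set."""
    for name, (description, genes) in gene_sets.items():
        handle.write("\t".join([name, description, *genes]) + "\n")


def read_gmt(handle: TextIO) -> dict[str, tuple[str, list[str]]]:
    """Parse a GMT file into {name: (description, genes)}."""
    result = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        name, description, *genes = line.split("\t")
        result[name] = (description, [g for g in genes if g])
    return result


if __name__ == "__main__":
    buffer = io.StringIO()
    write_gmt({"example_pathway": ("hypothetical description", ["TP53", "MDM2"])}, buffer)
    buffer.seek(0)
    print(read_gmt(buffer))
```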

References

  1. Lim, S., et al. (2018). Comprehensive and critical evaluation of individualized pathway activity measurement tools on pan-cancer data. Briefings in Bioinformatics, bby125.

  2. Domingo-Fernández, D., et al. (2018). ComPath: An ecosystem for exploring, analyzing, and curating mappings across pathway databases. npj Systems Biology and Applications, 4(1), 43.

  3. Weinstein, J. N., et al. (2013). The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics, 45(10), 1113.

License

The PathwayForte logo is derived from "Muscle Fat" by Lorc, used under CC BY 3.0.

Disclaimer

PathwayForte is scientific software that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.

Contributors

cthoyt, ddomingof, sarahbeenie, vinaysb


Issues

Excise functions from CLI

Right now, all of the main functions that do the heavy lifting are implemented inside CLI functions. Make them their own functions that can be called programmatically and are then just wrapped by the CLI functions.
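The refactoring this issue asks for can be sketched as a core function that takes plain Python data and returns a result, plus a thin command-line wrapper that only parses arguments, delegates, and prints. The sketch uses the standard library's `argparse`; pathway_forte's own CLI framework may differ, and `significant_pathways` and its arguments are hypothetical.

```python
from __future__ import annotations

import argparse


def significant_pathways(pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Core logic: importable and testable without going through the CLI."""
    return sorted(name for name, p in pvalues.items() if p < alpha)


def main(argv: list[str] | None = None) -> list[str]:
    """Thin CLI wrapper: parse arguments, call the core function, print results."""
    parser = argparse.ArgumentParser(description="Filter pathways by p-value (illustrative).")
    parser.add_argument("pairs", nargs="*", help="name=pvalue pairs")
    parser.add_argument("--alpha", type=float, default=0.05)
    args = parser.parse_args(argv)
    pvalues = {}
    for pair in args.pairs:
        name, value = pair.split("=", 1)
        pvalues[name] = float(value)
    hits = significant_pathways(pvalues, args.alpha)
    for name in hits:
        print(name)
    return hits


if __name__ == "__main__":
    main()
```

With this split, scripts and tests call `significant_pathways` directly, while the command line keeps working through `main`.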

Add all modules to docs

Ensure all modules (enrichment, prediction, and their submodules) are in the documentation.

Split data pre-processing from machine learning

As I'm trying to write the implementation section for the manuscript, it's apparent that the package needs substantial restructuring before we can make claims such as its extensibility to new database types and new data sets. All of the TCGA dataset pre-processing should be separated from the database pre-processing and from the machine learning code.

Use rpy2 to make a complete BEL -> SPIA pipeline

Right now in PyBEL-Tools, we've implemented the data pre-processing that turns a BEL graph into the files that need to be fed into the SPIA R package. Since you have a bit of experience wrapping R scripts with rpy2, do you think it would be possible to do the same here?
