fungcat-function-prediction's People

Contributors

jlaw9

fungcat-function-prediction's Issues

Speed up algorithms

Our algorithms can take a long time. For example, SinkSource one-versus-none (sinksource-ovn) takes 108 CPU minutes, longer than an hour and a half, on the pathogenesis term.

CPU time (minutes) per algorithm on the 2017_10-seq-sim-x5-string network:

GO term: pathogenesis
    local-ova         3.06
    local-ovn         4.2
    fun-flow-ovn     11.8
    sinksource-ova   36.8
    sinksource-ovn  108
    genemania-ova    63.1

GO term: toxic substance binding
    local-ova         3.06
    local-ovn         4.1
    fun-flow-ovn     11.4
    sinksource-ova   12.7
    sinksource-ovn   21.4
    genemania-ova    16.9

We use a single network containing the proteins of all these species (70K nodes, 5M edges). However, if for a specific function we only report high-confidence predictions for a very small subset of proteins, the algorithm does not need to score the entire network, only that small portion.
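One way to avoid fully scoring the network is to stop iterating once the proteins we would actually report have settled. Below is a minimal pure-Python sketch, assuming a SinkSource-style update in which positives are fixed at 1, negatives at 0, and each unknown node repeatedly takes the weighted average of its neighbors' scores. The function name `sinksource_topk` and the `k`/`patience` stopping rule are hypothetical and not taken from this repository. Stopping when the top-k set stops changing is a heuristic: the scores themselves may not have converged yet.

```python
from typing import Dict, Hashable, List, Set, Tuple

# node -> {neighbor: edge weight}; every neighbor must also be a key (undirected graph).
Graph = Dict[Hashable, Dict[Hashable, float]]


def sinksource_topk(
    graph: Graph,
    positives: Set[Hashable],
    negatives: Set[Hashable],
    k: int,
    max_iters: int = 1000,
    tol: float = 1e-6,
    patience: int = 5,
) -> Tuple[List[Hashable], Dict[Hashable, float]]:
    """Hypothetical SinkSource-style propagation with a top-k early stop.

    Positives are fixed at 1, negatives at 0, and each unknown node is set to
    the weighted average of its neighbors' scores. Iteration stops when no
    score changes by more than `tol`, or when the set of top-k unknown nodes
    has been unchanged for `patience` consecutive rounds.
    """
    unknown = [u for u in graph if u not in positives and u not in negatives]
    scores = {u: (1.0 if u in positives else 0.0) for u in graph}
    prev_top = None
    stable = 0
    for _ in range(max_iters):
        # Jacobi update: every new score is computed from the previous round.
        new_scores = {}
        for u in unknown:
            total = sum(graph[u].values())
            weighted = sum(w * scores[v] for v, w in graph[u].items())
            new_scores[u] = weighted / total if total > 0 else 0.0
        max_delta = max((abs(new_scores[u] - scores[u]) for u in unknown), default=0.0)
        scores.update(new_scores)
        if max_delta < tol:
            break
        # Heuristic early stop: only the top-k ranking matters for reporting.
        top = frozenset(sorted(unknown, key=lambda u: scores[u], reverse=True)[:k])
        stable = stable + 1 if top == prev_top else 0
        prev_top = top
        if stable >= patience:
            break
    ranked = sorted(unknown, key=lambda u: scores[u], reverse=True)[:k]
    return ranked, scores
```

For example, on the path p - a - b - n with p positive and n negative, `sinksource_topk(graph, {"p"}, {"n"}, k=1)` returns `["a"]` after six rounds, even though a's score (0.656) has not yet reached its converged value of 2/3.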

Dockerizing GAIN

Gaurav requested that the developers of every prediction method also create a Docker version of their method, so that the ensemble prediction methods can try out different inputs or cross-validation schemes without relying on us to run our methods and send them the results.

Some questions to consider:

  • What will be the inputs and outputs for our image? Preferred inputs will be something like:
    1. A set of annotations in a GAF file
    2. A set of GO terms to make predictions for
    3. A network containing some combination of a sequence similarity network and a STRING network. The difficulty here is that so far we're the only ones who use these networks, so I'm not sure how they will fit into the ensemble methods' pipelines.
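As a starting point for the image's input contract, the sketch below shows how a container entrypoint might read the first two inputs. It assumes GAF 2.x formatting (tab-separated rows, `!` comment lines, DB Object ID in column 2, Qualifier in column 4, GO ID in column 5) and treats a `NOT` qualifier as a negative annotation. The flag names (`--gaf`, `--goterms`, `--net`, `--out`), the one-GO-ID-per-line term file, and the tab-separated edge-list network format are hypothetical choices, not an agreed interface; `main()` would be wired up as the image's entrypoint.

```python
import argparse
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Annotations = Dict[str, Set[str]]  # GO term -> protein IDs


def parse_gaf(lines: Iterable[str], goterms: Set[str]) -> Tuple[Annotations, Annotations]:
    """Split GAF annotations for the requested GO terms into positives and NOTs.

    GAF 2.x columns (1-based): 2 = DB Object ID, 4 = Qualifier, 5 = GO ID.
    """
    positives: Annotations = defaultdict(set)
    negatives: Annotations = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row
        protein, qualifier, term = cols[1], cols[3], cols[4]
        if term not in goterms:
            continue
        # Qualifiers are pipe-separated, e.g. "NOT|involved_in".
        target = negatives if "NOT" in qualifier.split("|") else positives
        target[term].add(protein)
    return dict(positives), dict(negatives)


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command-line contract for the container entrypoint.
    parser = argparse.ArgumentParser(description="Protein function prediction (sketch).")
    parser.add_argument("--gaf", required=True, help="annotation file in GAF format")
    parser.add_argument("--goterms", required=True, help="file with one GO ID per line")
    parser.add_argument("--net", required=True, help="edge list: protein1<TAB>protein2<TAB>weight")
    parser.add_argument("--out", required=True, help="where to write prediction scores")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    with open(args.goterms) as fh:
        goterms = {line.strip() for line in fh if line.strip()}
    with open(args.gaf) as fh:
        positives, negatives = parse_gaf(fh, goterms)
    # Loading args.net, running a method and writing args.out would go here.
    for term in sorted(goterms):
        print(term, len(positives.get(term, ())), len(negatives.get(term, ())))
```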

Update process-go to transfer NOT annotations

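The process-go tool can transfer annotations up the GO DAG and unknown annotations down the GO DAG. NOT annotations should also be transferred down the GO DAG, since a protein that does not have a function also does not have any of that function's more specific descendant functions.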
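A minimal sketch of both propagation directions, assuming the DAG is given as a hypothetical `parents` mapping from each GO ID to its parent GO IDs (for instance only `is_a` edges, already loaded from the ontology file) and annotations as protein -> GO IDs. It does not resolve conflicts where a protein ends up with both a positive and a NOT annotation for the same term.

```python
from collections import defaultdict, deque
from typing import Dict, Set, Tuple

Terms = Dict[str, Set[str]]  # GO ID -> neighboring GO IDs, or protein -> GO IDs


def reachable(start: str, edges: Terms) -> Set[str]:
    """All terms reachable from `start` (including itself) by following `edges`."""
    seen = {start}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for nxt in edges.get(term, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def propagate(parents: Terms, positives: Terms, nots: Terms) -> Tuple[Terms, Terms]:
    """Positives go up to every ancestor; NOT annotations go down to every descendant."""
    # Invert the parent links so we can walk from a term to its children.
    children: Terms = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    up = {
        prot: set().union(*(reachable(t, parents) for t in terms))
        for prot, terms in positives.items()
    }
    down = {
        prot: set().union(*(reachable(t, children) for t in terms))
        for prot, terms in nots.items()
    }
    return up, down
```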

We could also probably find another tool to transfer annotations for us.

Implement additional algorithms

Additional algorithms to implement, with a short description of each:

  • AptRank (code available in MATLAB and Julia): an adaptive PageRank model that incorporates the GO hierarchy with functional association networks.
  • deepNF (code available in Python): a network fusion method based on multimodal deep autoencoders.
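For reference, the sketch below is plain personalized PageRank (a random walk with restart to the proteins annotated to a GO term), the basic PageRank diffusion on a protein network. It is not AptRank itself, which per its description also incorporates the GO hierarchy and adapts the model. The function name and the choice to send the mass of dangling nodes back to the seeds are assumptions of this sketch.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}


def personalized_pagerank(
    graph: Graph,
    seeds: Set[Hashable],
    alpha: float = 0.85,
    tol: float = 1e-10,
    max_iters: int = 1000,
) -> Dict[Hashable, float]:
    """Random walk with restart: x = alpha * P^T x + (1 - alpha) * r.

    r is uniform over the seeds present in the graph, and P row-normalizes
    the edge weights, so the returned scores sum to 1.
    """
    nodes = list(graph)
    present = seeds & set(graph)
    restart = {u: (1.0 / len(present) if u in present else 0.0) for u in nodes}
    out_weight = {u: sum(graph[u].values()) for u in nodes}
    x = dict(restart)
    for _ in range(max_iters):
        nxt = {u: (1.0 - alpha) * restart[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling node: return its walk mass to the seeds.
                for v in nodes:
                    nxt[v] += alpha * x[u] * restart[v]
                continue
            for v, w in graph[u].items():
                nxt[v] += alpha * x[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - x[u]) for u in nodes)
        x = nxt
        if delta < tol:
            break
    return x
```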

Integrate sequence similarity and STRING networks

The Sequence Similarity (SS) network weights range from 0 to 200, while the STRING network weights range from 150 (we're currently using a cutoff of 400) to 1000.

Currently I'm using a very simple approach: multiplying the SS weights by 5, so that the maximum weight of both networks is 1000. We should instead use a smarter method that weights each network according to its contribution to the predictions.
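For reference, the current x5 approach can be written as a small function that exposes a weight per network, so that a smarter integration method only has to choose those weights. This sketch assumes undirected edges stored as `{(protein1, protein2): weight}`, that an edge present in both networks gets the sum of its (scaled) weights, and that the STRING cutoff is inclusive; the function name and the `ss_weight`/`string_weight` parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def _undirected(u: Hashable, v: Hashable) -> Edge:
    # Store each undirected edge under one canonical key.
    return (u, v) if str(u) <= str(v) else (v, u)


def combine_networks(
    ss_edges: Dict[Edge, float],
    string_edges: Dict[Edge, float],
    ss_scale: float = 5.0,
    string_cutoff: float = 400.0,
    ss_weight: float = 1.0,
    string_weight: float = 1.0,
) -> Dict[Edge, float]:
    """Rescale SS weights (0 to 200) onto STRING's scale (up to 1000) and sum the networks."""
    combined: Dict[Edge, float] = defaultdict(float)
    for (u, v), w in ss_edges.items():
        combined[_undirected(u, v)] += ss_weight * ss_scale * w
    for (u, v), w in string_edges.items():
        if w >= string_cutoff:  # drop low-confidence STRING edges
            combined[_undirected(u, v)] += string_weight * w
    return dict(combined)
```

With the defaults this reproduces the simple x5 scheme; a method that learns network weights would in effect set `ss_weight` and `string_weight` from the training labels instead of leaving them at 1.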

A Python implementation of the original GeneMANIA algorithm (2008) is available here, but we're really interested in using the Simultaneous Weights (SW) GeneMANIA method (2010) to integrate the networks, because it optimizes the network weights using the labels of all functions simultaneously. The MATLAB code is available here.
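Once the networks are combined into one weighted network W, GeneMANIA scores proteins for a single GO term by solving the linear system (I + L) f = y with L = D - W. The minimal sketch below assumes the label vector described for the 2008 method (+1 for positives, -1 for negatives, and (n+ - n-)/(n+ + n-) for unlabeled proteins) and skips any preprocessing the published code applies to W, such as normalization; the function name is hypothetical. Because I + L is strictly diagonally dominant, plain Jacobi iteration converges.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}, symmetric


def genemania_scores(
    graph: Graph,
    positives: Set[Hashable],
    negatives: Set[Hashable],
    tol: float = 1e-10,
    max_iters: int = 10000,
) -> Dict[Hashable, float]:
    """Solve (I + L) f = y with Jacobi iteration, where L = D - W."""
    n_pos = len(positives & set(graph))
    n_neg = len(negatives & set(graph))
    # Unlabeled proteins start at the label bias (n+ - n-) / (n+ + n-).
    bias = (n_pos - n_neg) / (n_pos + n_neg) if n_pos + n_neg else 0.0
    y = {u: 1.0 if u in positives else -1.0 if u in negatives else bias for u in graph}
    degree = {u: sum(graph[u].values()) for u in graph}
    f = dict(y)
    for _ in range(max_iters):
        # Row i of (I + L) f = y:  (1 + d_i) f_i - sum_j w_ij f_j = y_i
        new_f = {
            u: (y[u] + sum(w * f[v] for v, w in graph[u].items())) / (1.0 + degree[u])
            for u in graph
        }
        delta = max((abs(new_f[u] - f[u]) for u in graph), default=0.0)
        f = new_f
        if delta < tol:
            break
    return f
```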

Missing library

When I build GAIN on Ubuntu 16 and try to run it on Ubuntu 14, the library libboost_filesystem.so.1.58.0 is missing:

/home/jeffl/src/c++/biorithm/trunk/gain/gain: error while loading shared libraries: libboost_filesystem.so.1.58.0: cannot open shared object file: No such file or directory

Conversely, when I build on Ubuntu 14 and run on Ubuntu 16, libboost_filesystem.so.1.54.0 is missing:

/home/jeffl/src/c++/biorithm-ubuntu14/trunk/gain/gain: error while loading shared libraries: libboost_filesystem.so.1.54.0: cannot open shared object file: No such file or directory

On some of the lab computers, both of those libraries are missing.
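To check which shared libraries a given `gain` binary cannot find on a particular machine, one can inspect the output of `ldd`, which marks unresolved dependencies with `=> not found`. The small wrapper below is a diagnostic sketch only (the helper names are hypothetical); it does not fix the problem, which would still require, for example, linking Boost statically or building on each target system.

```python
import subprocess
from typing import List


def parse_missing(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def missing_libraries(binary: str) -> List[str]:
    # ldd prints one line per dependency, e.g. "libfoo.so.1 => not found".
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=False)
    return parse_missing(result.stdout)
```

Running `missing_libraries` on the same binary on each lab computer would list which Boost versions are absent there.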