fungcat-function-prediction's Issues
Speed up algorithms
Our algorithms can take a long time: sinksource one-versus-none (sinksource-ovn) needs more than an hour and a half of CPU time for pathogenesis.
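goterm | network | algorithm | cpu time (min) |
---|---|---|---|
pathogenesis | 2017_10-seq-sim-x5-string | local-ova | 3.06 |
pathogenesis | 2017_10-seq-sim-x5-string | local-ovn | 4.2 |
pathogenesis | 2017_10-seq-sim-x5-string | fun-flow-ovn | 11.8 |
pathogenesis | 2017_10-seq-sim-x5-string | sinksource-ova | 36.8 |
pathogenesis | 2017_10-seq-sim-x5-string | sinksource-ovn | 108 |
pathogenesis | 2017_10-seq-sim-x5-string | genemania-ova | 63.1 |
toxic substance binding | 2017_10-seq-sim-x5-string | local-ova | 3.06 |
toxic substance binding | 2017_10-seq-sim-x5-string | local-ovn | 4.1 |
toxic substance binding | 2017_10-seq-sim-x5-string | fun-flow-ovn | 11.4 |
toxic substance binding | 2017_10-seq-sim-x5-string | sinksource-ova | 12.7 |
toxic substance binding | 2017_10-seq-sim-x5-string | sinksource-ovn | 21.4 |
toxic substance binding | 2017_10-seq-sim-x5-string | genemania-ova | 16.9 |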
We use one large network covering all the proteins of all these species (70K nodes, 5M edges). If, for a specific function, we only make high-confidence predictions for a very small subset of proteins, the algorithm does not need to score the entire network, only that small portion.
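One way to explore this idea is to score only the neighborhood of the annotated proteins. The sketch below is an assumption about how that could look, not something the current code does: it takes a hypothetical adjacency dict (`adj`, mapping node to `{neighbor: weight}`) and keeps only the nodes within `k` hops of the seed proteins. Running a propagation method on the reduced network would only approximate the full-network scores, so the choice of `k` would need to be checked against the full run.

```python
from collections import deque


def k_hop_subgraph(adj, seeds, k):
    """Return the sub-network induced by all nodes within k hops of the seeds.

    adj   : dict mapping node -> {neighbor: weight} (hypothetical input format)
    seeds : iterable of nodes, e.g. proteins annotated to the GO term
    k     : maximum number of hops from any seed
    """
    # Breadth-first search from all seeds at once, tracking hop distance.
    dist = {s: 0 for s in seeds if s in adj}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj.get(u, {}):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Keep only the edges whose endpoints are both inside the neighborhood.
    return {
        u: {v: w for v, w in adj.get(u, {}).items() if v in dist}
        for u in dist
    }
```

For example, with a path network a-b-c-d and `a` as the only seed, `k_hop_subgraph(adj, ["a"], 2)` keeps a, b, and c and drops d.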
Update GAIN to output the set of negatives for each GO term
More details are outlined in the group Google Doc.
Dockerizing GAIN
Guarav requested that the team behind each prediction method also create a Docker version of its method, so that the ensemble prediction methods can try out different inputs or cross-validation methods without relying on us to run our methods and send them the results.
Some questions to consider:
- What will be the inputs and outputs for our image? Preferred inputs will be something like:
  - A set of annotations in a GAF file
  - A set of GO terms to make predictions for
  - A network containing some combination of a sequence similarity network and a STRING network. The difficulty here is that, so far, we're the only ones who use these networks, so I'm not sure how that will fit into their pipeline.
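To make the question about inputs and outputs concrete, here is a minimal sketch of a command-line entry point that a container could wrap. Every flag name and the default output file are hypothetical; nothing here reflects an agreed interface or GAIN's actual options.

```python
import argparse


def build_parser():
    """Build a parser for the three preferred inputs listed above.

    All option names are placeholders chosen for this sketch.
    """
    parser = argparse.ArgumentParser(
        description="Hypothetical entry point for a containerized prediction method"
    )
    parser.add_argument("--annotations", required=True,
                        help="annotations in a GAF file")
    parser.add_argument("--goterms", required=True,
                        help="file listing the GO terms to make predictions for")
    parser.add_argument("--network", required=True,
                        help="combined sequence similarity / STRING network file")
    parser.add_argument("--out", default="predictions.tsv",
                        help="where to write the prediction scores")
    return parser
```

A container entry point could then call `build_parser().parse_args()` and pass the resulting paths to the prediction method, with the input files mounted into the container at run time.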
Update process-go to transfer NOT annotations
The process-go tool can transfer annotations up the GO DAG, and unknown annotations down the GO DAG. NOT annotations should also be transferred down the GO DAG.
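The two directions of transfer can be sketched as follows. This is an illustration under assumed data structures, not the process-go implementation: `parents` is a hypothetical dict mapping each term to its parent terms, annotations are `(protein, term)` pairs, and the term IDs in the usage note are placeholders.

```python
def reachable(term, edges):
    """Return every term reachable from `term` by following `edges`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for nxt in edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(parents):
    """Turn a child -> parents mapping into a parent -> children mapping."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(child)
    return children


def propagate(annotations, edges):
    """Copy each (protein, term) pair to every term reachable via `edges`.

    Pass the parents mapping to move annotations up the DAG, or the
    inverted (children) mapping to move NOT annotations down the DAG.
    """
    result = set()
    for protein, term in annotations:
        result.add((protein, term))
        for other in reachable(term, edges):
            result.add((protein, other))
    return result
```

With `parents = {"GO:C": {"GO:B"}, "GO:B": {"GO:A"}}`, a positive annotation of a protein to `GO:B` propagates up to `GO:A` via `propagate(pos, parents)`, while a NOT annotation to `GO:B` propagates down to `GO:C` via `propagate(neg, invert(parents))`.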
We could also probably find another tool to transfer annotations for us.
Implement additional algorithms
Additional algorithms to implement, each with a short description:
Algorithm | Code Availability | Description |
---|---|---|
AptRank | matlab and julia | An adaptive PageRank model that incorporates the GO hierarchy with functional association networks |
deepNF | python | A network fusion method based on Multimodal Deep Autoencoders |
Integrate sequence similarity and STRING networks
The Sequence Similarity (SS) network weights range from 0 to 200, while the STRING network weights range from 150 (we're currently using a cutoff of 400) to 1000.
Currently I'm using a very simple approach and just multiplying the SS weights by 5, but we should really be using a smarter method to prioritize weights based on the contribution of these networks.
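For reference, the simple scaling described above could look like the sketch below. The factor of 5 and the STRING cutoff of 400 come from the text; the edge-dict input format, the inclusive cutoff, and taking the maximum weight when an edge appears in both networks are assumptions made for illustration.

```python
def combine_networks(ss_edges, string_edges, ss_scale=5.0, string_cutoff=400):
    """Merge two undirected edge-weight dicts keyed by (u, v) pairs.

    ss_edges      : sequence similarity weights (roughly 0 to 200)
    string_edges  : STRING weights (150 to 1000)
    ss_scale      : multiplier applied to the SS weights (5 in the text)
    string_cutoff : STRING edges below this weight are dropped (assumed inclusive)
    """
    combined = {}
    for edge, weight in ss_edges.items():
        key = tuple(sorted(edge))  # treat (u, v) and (v, u) as the same edge
        combined[key] = max(combined.get(key, 0.0), weight * ss_scale)
    for edge, weight in string_edges.items():
        if weight < string_cutoff:
            continue
        key = tuple(sorted(edge))
        # Assumption: when both networks contain an edge, keep the larger weight.
        combined[key] = max(combined.get(key, 0.0), float(weight))
    return combined
```

Multiplying by 5 puts the SS weights on the same 0 to 1000 scale as STRING, but it treats the two sources as equally informative for every function, which is the limitation a learned weighting would address.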
A python implementation of the original GeneMANIA algorithm (2008) is available here, but we're really interested in using the Simultaneous Weights (SW) GeneMANIA method (2010) to integrate the weights as it optimizes the weights based on labels of all functions simultaneously. The matlab code is available here.
Missing library
When I build GAIN on Ubuntu 16 and try to run it on Ubuntu 14, the library libboost_filesystem.so.1.58.0 is missing:
/home/jeffl/src/c++/biorithm/trunk/gain/gain: error while loading shared libraries: libboost_filesystem.so.1.58.0: cannot open shared object file: No such file or directory
When I build on Ubuntu 14 and run on Ubuntu 16, libboost_filesystem.so.1.54.0 is missing:
/home/jeffl/src/c++/biorithm-ubuntu14/trunk/gain/gain: error while loading shared libraries: libboost_filesystem.so.1.54.0: cannot open shared object file: No such file or directory
On some of the lab computers, both of those libraries are missing.