
deepnog's People

Contributors

alepfu, colligant, dependabot[bot], lokiluciferase, phyden, saper0, varir


deepnog's Issues

Update readme

The readme needs an update.
Changes:

  • Add a badge for GitHub Actions builds
  • List the available eggNOG 5 and COG 2020 models
  • Mention the upcoming paper in Bioinformatics
  • Remove requirements' versions from the readme (not tested recently)

Progress bar

Progress bars currently report minibatches per second. This is not very helpful for typical users, both because of the terminology and because it does not say how large each minibatch is.

Sequences/sec would be more informative.
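With tqdm, for instance, this amounts to constructing the bar with `unit="seq"` and calling `update(batch_size)` once per minibatch instead of `update(1)`. The rate computation itself is simple; the following stdlib-only sketch (the function name is made up) shows the conversion from minibatches to sequences:

```python
import time

def format_rate(n_sequences, elapsed_s):
    """Return a human-readable throughput string in sequences/sec."""
    rate = n_sequences / elapsed_s if elapsed_s > 0 else 0.0
    return f"{rate:.1f} seq/s"

# Simulate iterating over minibatches of known (possibly varying) size
# and report throughput in sequences rather than minibatches.
start = time.perf_counter()
n_seen = 0
for batch_size in [16, 16, 8]:   # hypothetical minibatch sizes
    n_seen += batch_size          # count sequences, not batches
elapsed = time.perf_counter() - start
print(n_seen, format_rate(n_seen, elapsed))
```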

CLI set confidence threshold

There should be an option to set the confidence threshold from the CLI, so that users can easily choose on their own.
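A minimal argparse sketch of such an option; the flag name `--confidence-threshold`, its default, and the help text are assumptions, not the actual deepnog CLI:

```python
import argparse

def build_parser():
    # Hypothetical subset of a deepnog-style CLI.
    parser = argparse.ArgumentParser(prog="deepnog")
    parser.add_argument(
        "--confidence-threshold", type=float, default=None, metavar="T",
        help="discard predictions with confidence below T (0 < T <= 1); "
             "by default, the model's stored threshold is used",
    )
    return parser

# A user overriding the threshold from the command line:
args = build_parser().parse_args(["--confidence-threshold", "0.8"])
print(args.confidence_threshold)
```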

CI with Python 3.9

Python 3.9 is available now, and all our dependencies should have been updated.
Let's add Actions for 3.9 on Linux and macOS.
We could also set up Actions for 3.9 on Windows, which would allow us to phase out AppVeyor as well.
(However, it might be better to keep it, considering the quotas of CI providers. Needs to be checked.)
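A minimal workflow fragment for such a matrix; the file path, step names, and test command below are assumptions, not the project's actual configuration:

```yaml
# Hypothetical fragment of .github/workflows/ci.yml
jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.7", "3.8", "3.9"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e . && pytest
```

Adding `windows-latest` to the `os` list would cover the AppVeyor use case.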

DeepNOG implementation doesn't use dropout

Dropout with p=0.3 is declared in the __init__ method of class DeepNOG, yet it is never called in model.forward(). I'm not sure where to add it based on the information in the paper.
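Absent guidance from the paper, a common choice is to apply dropout between the pooled representation and the final linear layer. The toy module below is a hypothetical sketch of that placement only; the layer names and sizes are made up and do not reflect DeepNOG's actual architecture:

```python
import torch
import torch.nn as nn

class TinyNOG(nn.Module):
    """Minimal stand-in for DeepNOG; the dropout placement
    (after pooling, before the classifier) is an assumption."""
    def __init__(self, n_features=64, n_classes=10, p=0.3):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.dropout = nn.Dropout(p=p)
        self.classifier = nn.Linear(n_features, n_classes)

    def forward(self, x):
        # x: (batch, channels, length)
        x = self.pool(x).squeeze(-1)
        x = self.dropout(x)   # active in train() mode, identity in eval()
        return self.classifier(x)

model = TinyNOG().eval()      # eval() disables dropout for inference
out = model(torch.randn(2, 64, 100))
print(tuple(out.shape))
```

Note that nn.Dropout is a no-op in eval() mode, so adding the call would only change training behavior, not the published inference results.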

Galaxy integration

The eggNOG integration in Galaxy is quite popular. I think it would be nice to add deepnog to Galaxy as well.

Model versioning

Currently, deepnog ships one model per eggNOG level and network architecture.
If we ever decide to retrain certain models, users would need to come up with their own strategies to tell models apart or to pin a specific model (e.g., for reproducibility), such as manually moving files around and renaming them accordingly.
Retraining, however, could sometimes make sense: for example, we might want to use different data splits, or increase the share of training sequences relative to test sequences to squeeze a little more performance out of the model.

We should at least introduce some versioning, model identifiers, etc., that are stored with the model. Could be a simple string inside the model_dict. This could even be "backported" to existing models.

Ideally, automatic model download should also be version-aware. Currently, a user that already has downloaded a model will not receive any updated model.
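A minimal, backward-compatible sketch of reading such an identifier; the key name "version", the version-string format, and the helper name are all assumptions for illustration:

```python
def model_version(model_dict):
    """Return the version stored in a model dict, treating legacy
    files (which lack the key) as 'unversioned'."""
    return model_dict.get("version", "unversioned")

# A retrained model ships with an identifier; an old download does not.
new_style = {"model_state_dict": {}, "version": "eggNOG5-2-v1.1"}
old_style = {"model_state_dict": {}}
print(model_version(new_style), model_version(old_style))
```

The downloader could then compare the local version against the one advertised on the server and re-download on mismatch.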

Error handling

The client should not throw errors, but log helpful messages for the user.
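One common pattern is to catch expected exceptions at the client boundary and log a message instead of letting a traceback reach the user. The wrapper below is a hypothetical sketch; the function name and file name are made up:

```python
import logging

logger = logging.getLogger("deepnog")

def run_client(sequence_file):
    """Hypothetical client entry point: translate an expected error
    into a logged, user-readable message instead of a traceback."""
    try:
        with open(sequence_file) as f:
            return f.read()
    except FileNotFoundError:
        logger.error("Input file not found: %s", sequence_file)
        return None

result = run_client("does_not_exist.faa")
print(result)
```

In a CLI, the `None` branch would typically translate into a nonzero exit code via `sys.exit(1)`.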

Switching to other CI providers?

We are hitting the new limits on Travis CI introduced in Nov 2020.
The macOS builds in particular are hurting us (50 credits/min compared to 10 credits/min for Linux).
We should be eligible for OSS extra credits, which I'll try to obtain.

However, in the long run, we should consider switching to other CI providers, e.g. GitHub Actions or Azure Pipelines,
depending on what they offer for public open-source projects.

Remove training with iterable datasets?

As discussed in #43, training with iterable datasets is rarely used. In particular, training without shuffling might never be useful.
Because of that, the corresponding functionality is not covered by many tests, and it was broken for some time until this was pointed out in PR #43.

While the bug is fixed for the time being, and additional tests have been put in place, we might want to remove these functions altogether. This would reduce maintenance cost and improve quality.

Travis Ubuntu builds fail

The builds fail with RuntimeError: code is too big from PyTorch.
This is not reproducible on local machines running Fedora or CentOS, and the macOS builds work, too.
For now, the Travis Ubuntu builds are allowed to fail.

Erroneous packaging in v1.2.0

There was an error in packaging deepnog 1.2.0 for PyPI, hitting the Linux/macOS wheel.
Old modules were not removed and interfered with the new package structure.
A new version 1.2.1 is available now on PyPI, which is essentially identical, but packaged correctly.

Please update to 1.2.1.

Number of threads for CPU training

We should document how to set the number of threads for training on CPUs (in case anyone would like to do that).
Basically, it's export OMP_NUM_THREADS=8 for intra-op parallelism.

Alternatively, this may be set programmatically with torch.set_num_threads() and torch.set_num_interop_threads().
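A short sketch of the programmatic route. The thread count of 8 is an arbitrary example; note that set_num_interop_threads() must be called before any inter-op parallel work has started, so it can raise in a long-running process:

```python
import torch

# Intra-op parallelism: threads used within a single op.
# Equivalent to exporting OMP_NUM_THREADS before starting Python.
torch.set_num_threads(8)

# Inter-op parallelism: how many ops may run concurrently.
# Raises RuntimeError if parallel work has already started.
try:
    torch.set_num_interop_threads(8)
except RuntimeError:
    pass  # too late to change in this process

print(torch.get_num_threads())
```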
