rohitdwivedula / enzyme-classification

Predict the enzyme class of a given FASTA sequence using deep learning methods including CNNs, LSTM, BiLSTM, GRU, and attention models along with a host of other ML methods.

License: MIT License

Python 82.76% Jupyter Notebook 17.24%
enzyme-classification bioinformatics proteins smote-sampling adasyn-sampling machine-learning deep-learning neural-networks

ABLE: Attention Based Learning for Enzyme classification

ABLE is an attention-based deep learning model that classifies a given protein sequence into one of the seven enzyme classes or a negative class. The /src folder contains the code for the machine learning and deep learning models (with cross-validation and sampling methods), along with the code for preprocessing (vectorization) and postprocessing (statistical testing and plotting). To run the machine learning models, run python3 ml_models.py in that directory. To run the DL models, use the script run_dl_models.py with a command of the form:

python3 run_dl_models.py CNN --epochs 200 --batch 256

epochs and batch are optional arguments that default to 100 and 128, respectively. The model name must be the first argument to the script and must be one of CNN, LSTM, BILSTM, GRU, or ABLE.
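
The command-line interface described above can be sketched with argparse. This is a minimal illustration, not the actual contents of run_dl_models.py: the function name build_parser and the help strings are assumptions, and only the argument names, allowed model values, and defaults are taken from the description.

```python
import argparse

# Model names accepted as the first positional argument (from the README).
MODELS = ("CNN", "LSTM", "BILSTM", "GRU", "ABLE")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented interface (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run one deep learning model.")
    # The model must come first and must be one of the documented names.
    parser.add_argument("model", choices=MODELS, help="model to train")
    # Optional arguments with the documented defaults.
    parser.add_argument("--epochs", type=int, default=100, help="training epochs")
    parser.add_argument("--batch", type=int, default=128, help="batch size")
    return parser


# Equivalent to: python3 run_dl_models.py CNN --epochs 200 --batch 256
args = build_parser().parse_args(["CNN", "--epochs", "200", "--batch", "256"])
print(args.model, args.epochs, args.batch)
```

With this layout, an unknown model name is rejected by argparse before any training code runs.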

The data is provided in data.zip (64 MB), which contains two pickle files, X.pickle and y.pickle, holding the vectorized representations of the data. X.pickle contains an array of shape (127537, 3, 100): 127,537 proteins, each represented in vectorized form with shape (3, 100). y.pickle contains the corresponding labels (numbers 0 to 7), with 0 representing the negative class. Before running any of the code in src/, extract data.zip in the root of this repository to create the data/ directory containing these two files.
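
Once data.zip is extracted, a quick sanity check of the two files might look like the sketch below. The helper name load_dataset is hypothetical, and the sketch assumes the labels are stored as a flat sequence of integers. Unpickling the real files requires NumPy to be installed if they hold NumPy arrays, and pickle files should only be loaded from a trusted source.

```python
import pickle
from pathlib import Path


def load_dataset(data_dir="data"):
    """Load X.pickle and y.pickle and run basic consistency checks.

    Hypothetical helper: assumes y holds one integer label per protein.
    """
    data_dir = Path(data_dir)
    with open(data_dir / "X.pickle", "rb") as f:
        X = pickle.load(f)
    with open(data_dir / "y.pickle", "rb") as f:
        y = pickle.load(f)

    # Every protein needs exactly one label.
    if len(X) != len(y):
        raise ValueError(f"{len(X)} samples but {len(y)} labels")

    # Labels are documented as 0 to 7, with 0 as the negative class.
    unexpected = {int(label) for label in y} - set(range(8))
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return X, y
```

For the real dataset, X.shape should then be (127537, 3, 100), as stated above.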

The results directory contains all information about the performance of the models - including runtimes, multiclass confusion matrices, f-score, precision, and recall, among other metrics, as both Python pickle files and CSVs.

  • results/dl contains the training history for each run of the models, stored as .npy files. Each filename in this directory follows the pattern {{MODEL_NAME}}_{{SAMPLING_METHOD}}_{{TESTING_FOLD}}_{{NUM_EPOCHS}}_{{BATCH_SIZE}}.npy.
  • Performance metrics of the runs are stored in the following pickle files in the results directory: ABLE_results.pickle, BILSTM_results.pickle, CNN_results.pickle, GRU_results.pickle, LSTM_results.pickle, and ML_results.pickle.
  • The Jupyter Notebook in results/postprocessing contains the scripts that save Wilcoxon signed-rank test results between all pairs of models to files, for all performance metrics (precision, recall, f-score, balanced accuracy).
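
The filename pattern used in results/dl can be split back into its fields, for example to group training histories by model or fold. The sketch below is not part of the repository: RunInfo and parse_history_filename are hypothetical names, the example filename is invented for illustration, and the code assumes the sampling method and fold fields contain no underscores.

```python
from pathlib import Path
from typing import NamedTuple


class RunInfo(NamedTuple):
    """Fields encoded in a results/dl filename (hypothetical container)."""
    model: str
    sampling: str
    fold: str
    epochs: int
    batch_size: int


def parse_history_filename(path) -> RunInfo:
    """Split MODEL_SAMPLING_FOLD_EPOCHS_BATCH.npy into its five fields."""
    stem = Path(path).stem  # drops the directory and the .npy suffix
    # Split from the right so the last four fields are always separated.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected filename format: {path}")
    model, sampling, fold, epochs, batch_size = parts
    return RunInfo(model, sampling, fold, int(epochs), int(batch_size))


# Invented example filename, following the documented pattern.
info = parse_history_filename("results/dl/ABLE_SMOTE_3_100_128.npy")
print(info.model, info.sampling, info.fold, info.epochs, info.batch_size)
```

The fold is kept as a string because the README does not say how folds are numbered.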

If you find this code or these results useful in your research, please consider citing:

@article{ABLE2021,
    title = {ABLE: Attention Based Learning for Enzyme Classification},
    journal = {Computational Biology and Chemistry},
    pages = {107558},
    year = {2021},
    issn = {1476-9271},
    doi = {https://doi.org/10.1016/j.compbiolchem.2021.107558},
    url = {https://www.sciencedirect.com/science/article/pii/S1476927121001250},
    author = {Nallapareddy Mohan Vamsi and {Rohit Dwivedula}},
}

A preprint of this article is also available on bioRxiv, though the preprint does not contain the results from ADASYN sampling or the comparisons with DeepEC (these additions were made during the peer review process).

Contributors

rohitdwivedula, vam-sin

enzyme-classification's Issues

Using with my fasta

Hi,

I read your preprint on bioRxiv and would like to apply it to my proteins to predict their EC numbers. What would be the easiest way to do that?

Kind regards,
Lucas
