
Comments (3)

r-kellerm commented on July 24, 2024

Hello,

I'm not the developer of this package but can try to answer.

  1. You can extract the embeddings and run clustering on them, e.g.

     from proteinbert import load_pretrained_model
     from proteinbert.conv_and_global_attention_model import get_model_with_hidden_layers_as_outputs

     # Placeholder inputs for illustration; substitute your own values
     seqs = ['MKTAYIAKQRQISFVKSHFSRQ']  # list of protein sequences (strings)
     seq_len = 512                      # fixed length the sequences are encoded to
     batch_size = 32

     pretrained_model_generator, input_encoder = load_pretrained_model()
     # Wrap the model so that it returns its hidden layers as outputs
     model = get_model_with_hidden_layers_as_outputs(pretrained_model_generator.create_model(seq_len))
     encoded_x = input_encoder.encode_X(seqs, seq_len)
     local_representations, global_representations = model.predict(encoded_x, batch_size=batch_size)
     # Do clustering based on local_representations, global_representations
  2. Pre-training from scratch is worthwhile if you have additional datasets or want to modify the model architecture or the training flow. For clustering or fine-tuning tasks you don't need to train from scratch.
  3. As in the snippet above: first encode the input sequences with input_encoder.encode_X(seqs, seq_len), then run inference on the encoded inputs with model.predict(encoded_x, batch_size=batch_size).
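
To make the clustering step concrete, here is a minimal sketch rather than anything from the package itself. It assumes local_representations has shape (n_proteins, seq_len, d_local) and global_representations has shape (n_proteins, d_global). The helpers pool_embeddings and kmeans are hypothetical names written for this illustration, and the arrays are synthetic stand-ins for the model.predict output, so the snippet runs without ProteinBERT installed.

```python
import numpy as np


def pool_embeddings(local_representations, global_representations):
    """Build one fixed-length feature vector per protein.

    Assumed shapes (not confirmed by the package docs):
    local  -> (n_proteins, seq_len, d_local)
    global -> (n_proteins, d_global)
    """
    # Average the per-position vectors. Simplification: padding positions
    # are included in the mean; mask them out for real data if needed.
    local_mean = np.asarray(local_representations).mean(axis=1)
    return np.concatenate([local_mean, np.asarray(global_representations)], axis=1)


def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Minimal Lloyd's k-means; returns one integer label per row."""
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct rows chosen at random.
    centers = features[rng.choice(len(features), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-ins for the model.predict output:
# two well-separated groups of five "proteins" each.
rng = np.random.default_rng(0)
local_representations = np.concatenate([
    rng.normal(0.0, 0.1, size=(5, 16, 8)),
    rng.normal(5.0, 0.1, size=(5, 16, 8)),
])
global_representations = np.concatenate([
    rng.normal(0.0, 0.1, size=(5, 4)),
    rng.normal(5.0, 0.1, size=(5, 4)),
])

features = pool_embeddings(local_representations, global_representations)
labels = kmeans(features, n_clusters=2)
print(features.shape, labels)
```

With real embeddings you would pass the arrays returned by model.predict to pool_embeddings instead of the synthetic ones. A library clusterer such as scikit-learn's KMeans could replace the hand-written kmeans, and using only the global representations is another reasonable choice.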

Good luck!

from protein_bert.

nadavbra commented on July 24, 2024

I fully endorse @r-kellerm's answer. Clustering protein sequences based on ProteinBERT's embeddings seems like a sensible thing to do if you want functionally similar proteins to cluster together (but I've never actually tried that).


ddofer commented on July 24, 2024

Indeed, take the embeddings and cluster them. I can confirm that this works really well for many tasks, even without fine-tuning. We have another paper using this approach:

Detecting Anomalous Proteins Using Deep Representations
Tomer Michael-Pitschaze, Niv Cohen, Dan Ofer, Yedid Hoshen, Michal Linial
bioRxiv 2023.04.03.535457; doi: https://doi.org/10.1101/2023.04.03.535457

https://www.biorxiv.org/content/10.1101/2023.04.03.535457v1
