Comments (2)

nadavbra commented on July 24, 2024

@r-kellerm

The sentence you quote refers to the pretraining of ProteinBERT. For fine-tuning, we did filter sequences by length:

"Throughout the entire fine-tuning process we used a sequence length of 512 tokens, except for a final epoch of 1,024 tokens which was introduced to encourage the model to generalize to different sequence lengths. For all but the last epoch, we filtered out all the proteins larger than 512 tokens from the training and validation set, and trained the model on the remaining records. For the final epoch, we trained the model on records from the training and validation set of size up to 1,024 tokens."

(see "Fine-tuning and evaluation" in the Supplementary Methods for more details)
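As a rough illustration of the schedule quoted above, here is a minimal Python sketch. It is not the actual ProteinBERT code: the helper names `filter_by_token_length` and `epoch_schedule` are hypothetical, and it assumes that the token count of a protein is its residue count plus two added tokens (START and END).

```python
# Assumption: each sequence gets one START and one END token, so a protein
# of n residues occupies n + 2 tokens.
ADDED_TOKENS = 2


def filter_by_token_length(records, seq_len):
    """Keep (sequence, label) records that fit within seq_len tokens."""
    max_residues = seq_len - ADDED_TOKENS
    return [(seq, label) for seq, label in records if len(seq) <= max_residues]


def epoch_schedule(n_epochs, base_len=512, final_len=1024):
    """Sequence length per epoch: base_len for all but the last epoch."""
    return [base_len] * (n_epochs - 1) + [final_len]


# Toy records (made-up sequences, only the lengths matter here).
records = [("M" * 100, 0), ("M" * 600, 1), ("M" * 2000, 0)]

for epoch, seq_len in enumerate(epoch_schedule(3), start=1):
    kept = filter_by_token_length(records, seq_len)
    print(f"epoch {epoch}: seq_len={seq_len}, records kept={len(kept)}")
```

With these toy records, the 600-residue protein is dropped in the 512-token epochs and only enters training in the final 1,024-token epoch, while the 2,000-residue protein is never used.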

Because ProteinBERT is pretrained on sequences of all lengths and sees more than one sequence length during fine-tuning, it can generalize to sequence lengths that aren't encountered during fine-tuning, as we show in our paper (see Figure 4).

In all cases, whether or not sequences are filtered by length, START and END tokens must be included prior to sub-sequencing.
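To make the ordering concrete, here is a small Python sketch of "add tokens first, then sub-sequence". The token strings and the `tokenize` and `subsequence` helpers are placeholders for illustration, not the library's API, and how the window offset is chosen in practice is not covered here.

```python
# Placeholder token names, assumed for illustration only.
START, END = "<START>", "<END>"


def tokenize(seq):
    """Wrap the residues of a protein with START and END tokens."""
    return [START] + list(seq) + [END]


def subsequence(tokens, seq_len, offset=0):
    """Take a window of at most seq_len tokens from an already-wrapped sequence."""
    return tokens[offset:offset + seq_len]


tokens = tokenize("MKTAYIAK")            # 8 residues become 10 tokens
head = subsequence(tokens, 5)            # window that starts with START
tail = subsequence(tokens, 5, offset=5)  # window that ends with END
```

Because the wrapping happens before the cut, a window only contains START or END when it really covers the beginning or the end of the protein; wrapping after the cut would mark every fragment as if it were a complete sequence.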

from protein_bert.

r-kellerm commented on July 24, 2024

Thank you very much for the clarification!
