nadavbra commented on June 25, 2024

Hi @dsaeedeh, can you please provide more context about your question? Which script/function of ProteinBERT are you using exactly?

davoudisaeedeh commented on June 25, 2024

Hi,
In the ModelTrainer class in pretraining.py, there is this method:

    def train_next_epoch(self, autosave = True):
        changed_episode, episode = self.epoch_generator.determine_episode_and_ready_next_epoch()
        # A new episode (e.g. a new seq_len) requires rebuilding the model for that length.
        if changed_episode:
            log('Starting a new episode with seq_len = %d.' % episode.seq_len)
            self.model_generator.dummy_epoch = self.epoch_generator.create_dummpy_epoch()[:2]
            self.model_generator.update_state(self.model)
            self.model = self.model_generator.create_model(episode.seq_len)
        # Load the samples for this epoch and train on them.
        X, Y, sample_weigths = self.epoch_generator.create_next_epoch()
        log('Epoch %d (current sample %d):' % (self.current_epoch_index, self.epoch_generator.current_sample_index))
        self.model.fit(X, Y, sample_weight = sample_weigths, batch_size = episode.batch_size, callbacks = self.fit_callbacks)

model.fit receives X and Y containing batch_size * batches_per_epoch samples. This means we only need to load that many samples into memory at a time. So, can we reduce chunk_size from 100,000 samples to that number?
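To make the arithmetic concrete, here is a small sketch. The values batch_size = 64 and batches_per_epoch = 100 are hypothetical, chosen only for illustration, and are not necessarily ProteinBERT's defaults. It computes how many samples model.fit sees per epoch and how many epochs a single 100,000-sample chunk could feed.

```python
def samples_per_epoch(batch_size: int, batches_per_epoch: int) -> int:
    """Number of samples (rows of X and Y) passed to model.fit per epoch."""
    return batch_size * batches_per_epoch


def epochs_per_chunk(chunk_size: int, batch_size: int, batches_per_epoch: int) -> float:
    """How many epochs one chunk read from storage can feed."""
    return chunk_size / samples_per_epoch(batch_size, batches_per_epoch)


# Hypothetical settings, for illustration only.
BATCH_SIZE = 64
BATCHES_PER_EPOCH = 100
CHUNK_SIZE = 100_000

print(samples_per_epoch(BATCH_SIZE, BATCHES_PER_EPOCH))               # 6400
print(epochs_per_chunk(CHUNK_SIZE, BATCH_SIZE, BATCHES_PER_EPOCH))    # 15.625
```

With these numbers, a 100,000-sample chunk covers roughly 15 epochs, so only a small fraction of it is needed by any single call to model.fit.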


nadavbra commented on June 25, 2024

What dataset are you training on? Are you using the same seq_len throughout the entire pretraining (without switching between episodes with different protein lengths)? The idea of a larger chunk_size is to make the process more efficient and faster by making fewer storage reads, but sure, you can make it smaller if you want.


davoudisaeedeh commented on June 25, 2024

My dataset is the same as yours but with a different annotation vector. I am using a fixed seq_len throughout the entire pretraining. Thanks for your reply. I agree with you; however, in terms of memory usage, I think a smaller chunk_size would be more efficient.
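The trade-off discussed above (fewer storage reads with a large chunk_size, less memory with a small one) can be illustrated with a hypothetical sketch. ChunkedEpochSource is an invented class, not ProteinBERT's epoch generator. A plain `range` stands in for the on-disk dataset, and each chunk refill counts as one storage read. The epoch size of 6,400 assumes the hypothetical batch_size = 64 and batches_per_epoch = 100.

```python
from typing import List, Sequence, Tuple


class ChunkedEpochSource:
    """Hypothetical sketch: serve fixed-size epochs from storage read in chunks."""

    def __init__(self, storage: Sequence[int], chunk_size: int, epoch_size: int):
        self.storage = storage        # stands in for an on-disk dataset
        self.chunk_size = chunk_size
        self.epoch_size = epoch_size
        self.position = 0             # next index to read from storage
        self.buffer: List[int] = []   # samples loaded but not yet served
        self.reads = 0                # number of storage reads so far
        self.peak_buffered = 0        # most samples ever held in the buffer

    def _read_chunk(self) -> None:
        # One storage read: load chunk_size samples, wrapping around the dataset.
        n = len(self.storage)
        self.buffer.extend(self.storage[(self.position + i) % n] for i in range(self.chunk_size))
        self.position = (self.position + self.chunk_size) % n
        self.reads += 1
        self.peak_buffered = max(self.peak_buffered, len(self.buffer))

    def next_epoch(self) -> List[int]:
        # Read from storage only when the buffer cannot cover a full epoch.
        while len(self.buffer) < self.epoch_size:
            self._read_chunk()
        epoch = self.buffer[:self.epoch_size]
        del self.buffer[:self.epoch_size]
        return epoch


def run(chunk_size: int, epochs: int = 20, epoch_size: int = 6_400) -> Tuple[int, int]:
    """Return (storage reads, peak buffered samples) after `epochs` epochs."""
    source = ChunkedEpochSource(range(500_000), chunk_size, epoch_size)
    for _ in range(epochs):
        source.next_epoch()
    return source.reads, source.peak_buffered


print(run(100_000))  # (2, 104000)
print(run(6_400))    # (20, 6400)
```

Over 20 epochs, the large chunk needs 2 reads but buffers up to 104,000 samples, while a chunk equal to the epoch size needs 20 reads and never buffers more than 6,400. Which setting is "more efficient" depends on whether storage latency or RAM is the tighter constraint.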
