Comments (4)
Hi @dsaeedeh, can you please provide more context about your question? Which script/function of ProteinBERT are you using exactly?
from protein_bert.
Hi,
In the ModelTrainer class in the pretraining.py file, there is a function:
def train_next_epoch(self, autosave = True):
    changed_episode, episode = self.epoch_generator.determine_episode_and_ready_next_epoch()
    if changed_episode:
        log('Starting a new episode with seq_len = %d.' % episode.seq_len)
        self.model_generator.dummy_epoch = self.epoch_generator.create_dummpy_epoch()[:2]
        self.model_generator.update_state(self.model)
        self.model = self.model_generator.create_model(episode.seq_len)
    X, Y, sample_weigths = self.epoch_generator.create_next_epoch()
    log('Epoch %d (current sample %d):' % (self.current_epoch_index, self.epoch_generator.current_sample_index))
    self.model.fit(X, Y, sample_weight = sample_weigths, batch_size = episode.batch_size, callbacks = self.fit_callbacks)
model.fit takes X and Y containing batch_size * batches_per_epoch samples, which means we only need to load that many samples into memory at a time. So, can we reduce chunk_size from 100,000 samples down to this number?
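A quick back-of-the-envelope sketch of that estimate (the numbers here are made up for illustration; the real batch_size and batches_per_epoch come from your own pretraining configuration):

```python
# Hypothetical values, not taken from the ProteinBERT defaults:
batch_size = 128
batches_per_epoch = 100

# Samples that model.fit actually consumes in one call to train_next_epoch:
samples_per_epoch = batch_size * batches_per_epoch
print(samples_per_epoch)  # 12800
```

Under these assumed numbers, an epoch only ever touches 12,800 samples, well below the 100,000-sample chunk held in memory.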
What dataset are you training on? Are you using the same seq_len throughout the entire pretraining (without switching between episodes of different protein lengths)? The idea of a larger chunk_size is to make the process more efficient and faster by issuing fewer storage reads, but sure, you can make it smaller if you want.
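For a rough sense of the tradeoff described above (the dataset size and chunk sizes here are hypothetical, not values from the repo):

```python
import math

def storage_reads(total_samples, chunk_size):
    # Number of chunk loads needed to stream total_samples from storage.
    return math.ceil(total_samples / chunk_size)

total = 1_000_000          # made-up dataset size for illustration
small, default = 12_800, 100_000
print(storage_reads(total, small))    # 79 reads with a small chunk_size
print(storage_reads(total, default))  # 10 reads with the larger chunk_size
```

So a smaller chunk_size lowers peak memory (fewer samples resident at once) at the cost of roughly proportionally more storage reads.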
My dataset is the same as yours, but with a different annotation vector, and I am using a fixed seq_len throughout the entire pre-training. Thanks for your reply. I agree with you; however, in terms of memory usage, I think a smaller chunk_size would be more efficient.