Comments (7)
+1. I have the same issue when I use the forward() and loss() functions separately in the trainer.
Hi @ferzcam (and @renzhonglu11),

this likely comes from DistMult using a regularizer by default, cf.

pykeen/src/pykeen/models/unimodal/distmult.py
Lines 85 to 86 in d1222b7

collect_regularization_term not only collects the regularization terms from the different places where they accumulate, but also releases their references. When it is not called, the term accumulates indefinitely and keeps holding references to buffers, so over time you will run out of memory.
To fix it, either
- configure the model without a regularizer, e.g., for DistMult pass regularizer=None, or
- if you want to make use of the regularizer, make sure to call collect_regularization_term on the model and include the result in your loss term.
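The collect-and-release behaviour can be sketched in plain Python. This is a toy stand-in with illustrative names, not PyKEEN's actual Regularizer class; in PyKEEN the pending terms are tensors attached to the computation graph, which is why retaining them leaks memory:

```python
# Toy sketch of the accumulate-and-release pattern behind
# collect_regularization_term(). Names are illustrative, not PyKEEN's API.

class ToyRegularizer:
    def __init__(self):
        self.pending = []  # terms accumulated during forward passes

    def update(self, term):
        # called from a representation's forward pass
        self.pending.append(term)

    def collect(self):
        # sum the accumulated terms and release the references
        total = sum(self.pending)
        self.pending.clear()
        return total


reg = ToyRegularizer()
for _ in range(3):  # three forward passes within one batch
    reg.update(0.5)

loss = 1.0 + reg.collect()  # fold the term into the loss
print(loss)                 # 2.5
print(len(reg.pending))     # 0 -> nothing carries over to the next batch
```

The important part is that collect() both sums and clears: folding the term into the loss and dropping the references happen in one step, so no batch can leave anything behind.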
@mberr your solution works. Thanks a lot!😁
You can see what the resulting structure looks like with

```python
from pykeen.datasets import get_dataset
from pykeen.models import DistMult

dataset = get_dataset(dataset="nations")
model = DistMult(triples_factory=dataset.training)
print(model)
```
```
DistMult(
  (loss): MarginRankingLoss(
    (margin_activation): ReLU()
  )
  (interaction): DistMultInteraction()
  (entity_representations): ModuleList(
    (0): Embedding(
      (_embeddings): Embedding(14, 50)
    )
  )
  (relation_representations): ModuleList(
    (0): Embedding(
      (regularizer): LpRegularizer()
      (_embeddings): Embedding(55, 50)
    )
  )
  (weight_regularizers): ModuleList()
)
```
Notice how the relation_representations have a regularizer attached. Compare this to
```python
from pykeen.datasets import get_dataset
from pykeen.models import DistMult

dataset = get_dataset(dataset="nations")
model = DistMult(triples_factory=dataset.training, regularizer=None)
print(model)
```
resulting in
```
DistMult(
  ...
  (relation_representations): ModuleList(
    (0): Embedding(
      (_embeddings): Embedding(55, 50)
    )
  )
  ...
)
```
The regularization term of the relation embedding is updated here

pykeen/src/pykeen/nn/representation.py
Lines 189 to 191 in d1222b7

i.e., in the Embedding's forward call.
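To see why this makes skipping the collection step a problem, here is a toy illustration (plain Python with hypothetical names, not PyKEEN's Embedding) of a forward call that keeps appending pending terms until someone collects them:

```python
# Toy illustration of the leak: each forward call appends a pending
# regularization term (in PyKEEN, a tensor holding graph references),
# and only an explicit collect step would release them.
# Names are illustrative, not PyKEEN's API.

class ToyEmbedding:
    def __init__(self):
        self.pending_terms = []

    def forward(self, indices):
        # the real Embedding.forward updates its regularizer here
        self.pending_terms.append(len(indices) * 0.1)
        return indices


emb = ToyEmbedding()
for batch in range(100):       # 100 batches, the terms are never collected
    emb.forward([1, 2, 3])

print(len(emb.pending_terms))  # 100 -> the backlog grows with every batch
```

Because the update happens inside forward, any training loop that calls forward() but never collects the terms will see exactly this unbounded growth.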
Nice, now it makes more sense. I also found out where PyKEEN calls collect_regularization_term(). Thanks for your explanation.
Thanks a lot. But I am still not quite sure whether this is really the reason. I took a look at PyKEEN's source code, and it seems PyKEEN calculates the loss in a single function in the trainer (

pykeen/src/pykeen/training/training_loop.py
Line 643 in d1222b7

). What I did was first call a model's forward() (e.g., DistMult's) to compute the predictions, and then call loss() to compute the loss value (similar to what @ferzcam did). Then the memory grows per epoch during training, on both GPU and CPU.
However, when I put the forward and loss calculation together in one function, just like PyKEEN does, and call that function instead, the memory no longer increases. I am confused about this. 🧐
In my KGE model:

```python
class PykeenKGE:
    def training_step(self, batch):
        x_batch, y_batch = batch
        yhat_batch = self.forward(x_batch)
        loss_batch = self.loss(yhat_batch, y_batch)
        # collecting the regularization term here also releases its references
        return loss_batch + self.model.collect_regularization_term()
```

In the trainer:

```python
batch_loss = self.model.training_step(batch)
```
Hi @mberr. Thanks for the explanation. I was able to make it work now!