Hello, thanks for your great work. I have run your training code and found somethi

questions about training about minklocmultimodal HOT 2 CLOSED

jac99 commented on May 28, 2024

questions about training

Comments (2)

jac99 commented on May 28, 2024

Hi,

yes, num_non_zero_triplets is the number of triplets with non-zero loss value, which we call active triplets in the paper.
I cannot answer the question about the differences in loss value. However, the loss value can be misleading: the loss is calculated (in the BatchHardTripletLossWithMasksHelper class) by averaging the loss over the active triplets only (triplets with a non-zero loss value). So when the number of active triplets goes down during training, the value of the training loss may increase, because only the harder active triplets remain.
For this reason, we argued in the paper that to monitor training progress and overfitting behavior, it's better to observe the number of active triplets, not the loss value. In the later phases of training, the number of active triplets can still decrease while the loss stagnates (because in our implementation the loss is calculated by taking the mean of the loss over active triplets only).
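
To make the averaging effect concrete, here is a minimal sketch in plain Python. It is not the code from BatchHardTripletLossWithMasksHelper: the function name `mean_active_triplet_loss`, the margin of 0.2, and the distance values are all made up for illustration, and a standard margin-based triplet loss is assumed.

```python
def mean_active_triplet_loss(dist_ap, dist_an, margin):
    """Hypothetical helper: mean loss over active triplets and their count.

    dist_ap / dist_an are anchor-positive and anchor-negative distances.
    A triplet is "active" when its margin-based loss is greater than zero.
    """
    losses = [max(ap - an + margin, 0.0) for ap, an in zip(dist_ap, dist_an)]
    active = [loss for loss in losses if loss > 0.0]
    if not active:
        return 0.0, 0
    return sum(active) / len(active), len(active)


# Earlier in training (made-up numbers): all four triplets are active,
# three of them only slightly violated, one hard.
early_loss, early_active = mean_active_triplet_loss(
    dist_ap=[0.5, 0.6, 0.5, 0.9],
    dist_an=[0.6, 0.7, 0.6, 0.5],
    margin=0.2,
)

# Later in training (made-up numbers): the three easy triplets are now
# satisfied, and only the hard one is still active.
late_loss, late_active = mean_active_triplet_loss(
    dist_ap=[0.3, 0.3, 0.3, 0.9],
    dist_an=[0.8, 0.8, 0.8, 0.5],
    margin=0.2,
)

print(early_active, late_active)  # number of active triplets drops
print(early_loss < late_loss)     # yet the mean over active triplets rises
```

In this toy case the model has improved on three of four triplets, but the reported mean loss goes up because the easy triplets no longer dilute the hard one. That is why the active-triplet count is the more informative number to watch.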

When auxiliary uni-modal losses are disabled (alpha=0 and beta=0 - see Table IV in the paper), there are:

- significantly fewer active triplets for images than for point clouds on the training set
- but more active triplets for images than for point clouds on the validation set

We concluded that the reason is that the image-processing sub-network was more prone to overfitting than the point cloud-processing sub-network. When constructing the final descriptor, the entire network learns to focus only on input images and mostly ignore the point clouds - because such a strategy gives good results during the training. But due to overfitting, the results on the test set are much worse.
To counteract this problem, we can increase the weight of the point cloud modality (parameter alpha). Now, during the training, the network will focus on both modalities.
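
As a rough illustration of how such weights enter the objective, the sketch below combines the three losses as a weighted sum. This is an assumption for illustration, not the repository's code: the function name `total_loss` and its arguments are hypothetical, the weighted-sum form is assumed, and treating beta as the weight of the image loss is inferred from alpha being the point cloud weight.

```python
def total_loss(loss_fused, loss_cloud, loss_image, alpha, beta):
    """Hypothetical combination of the fused and uni-modal losses.

    alpha weights the point cloud loss (as stated in the thread);
    beta is assumed here to weight the image loss.
    """
    return loss_fused + alpha * loss_cloud + beta * loss_image


# With alpha = beta = 0 only the fused descriptor is supervised directly.
only_fused = total_loss(1.0, 2.0, 3.0, alpha=0.0, beta=0.0)

# Raising alpha adds a direct penalty on the point cloud descriptor, so the
# point cloud branch cannot simply be ignored in favor of the image branch.
with_cloud = total_loss(1.0, 2.0, 3.0, alpha=1.0, beta=0.0)

print(only_fused, with_cloud)
```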

I'm not sure what's the reason.
Even if alpha or beta (or both) is zero, both uni-modal sub-networks will learn something, and the uni-modal loss (and the number of active triplets) will usually go down. This is because we minimize the final loss function, and to minimize the final loss, each uni-modal sub-network should generate discriminative descriptors. This indirectly decreases the uni-modal loss.

where2go947 commented on May 28, 2024

Sorry for the late reply, and thanks for your answer. It gives me some new perspectives on the question.
