Hi @ShengcaiLiao ,
Thank you so much for the impressive work!
I'm not familiar with person identification tasks, but I found another of your papers that says, "The detection sub-task is to determine the presence of the probe subject in the gallery, and the identification sub-task is to determine which person in the gallery has the same identity as the accepted probe." So I assume the `memory` here (in the `TransMatcher` instance initialization) should be the gallery features.
Let's look at the `forward` function of `TransMatcher`:

```python
def forward(self, features):
    score = self.decoder(self.memory, features)
    return score
```
The first input is `memory`, and the second is `features`. However, in the `TransformerDecoder` definition, the signature goes as follows:
```python
def forward(self, tgt: Tensor, memory: Tensor) -> Tensor:
    r"""Pass the inputs through the decoder layer in turn.

    Args:
        tgt: the sequence to the decoder (required).
        memory: the sequence from the last layer of the encoder (required).

    Shape:
        tgt: [q, h, w, d*n], where q is the query length, d is d_model,
            n is num_layers, and (h, w) is feature map size
        memory: [k, h, w, d*n], where k is the memory length
    """
```
The `tgt` and `memory` variables here confuse me. Which should be the probe (query) features, and which should be the gallery features?
Thank you for your reply in advance.