The original DeepInsight implementation in MXNet first prepares the image (face detection, cropping, and preprocessing), then extracts its embedding, and finally L2-normalizes it, so that comparing two features looks like this:
import numpy as np

# Squared Euclidean distance between the two embeddings
dist = np.sum(np.square(embedding1 - embedding2))
print("Distance => %s" % dist)
# Cosine similarity (valid because the embeddings are L2-normalized)
sim = np.dot(embedding1, embedding2.T)
print("Similarity => %s" % sim)
I went ahead and ran some tests with this code, and the similarity values are off the charts and plainly wrong (even with different dropout rates):
import cv2
import numpy as np
from sklearn import preprocessing

import base_server
import configs

# Load the face-describer model
model = base_server.BaseServer(model_fp=configs.face_describer_model_fp,
                               input_tensor_names=configs.face_describer_input_tensor_names,
                               output_tensor_names=configs.face_describer_output_tensor_names,
                               device=configs.face_describer_device)

# Define input tensors fed to the session graph.
# NB: feeding a non-zero dropout rate at inference time is unusual; depending on how
# the graph interprets this tensor it may expect 0.0 (drop rate) or 1.0 (keep probability).
dropout_rate = 0.1

# First image: load, resize to the 112x112 input the model expects,
# run inference, and L2-normalize the first output tensor
first_image = cv2.imread('./Images/1.jpg')
first_image = cv2.resize(first_image, (112, 112))
input_data = [np.expand_dims(first_image, axis=0), dropout_rate]
face_descriptor1 = model.inference(data=input_data)
embedding1 = preprocessing.normalize(face_descriptor1[0])

# Second image: same pipeline
second_image = cv2.imread('./Images/4.jpg')
second_image = cv2.resize(second_image, (112, 112))
input_data = [np.expand_dims(second_image, axis=0), dropout_rate]
face_descriptor2 = model.inference(data=input_data)
embedding2 = preprocessing.normalize(face_descriptor2[0])

# Compare the two normalized embeddings
dist = np.sum(np.square(embedding1 - embedding2))
print("Distance => %s" % dist)
sim = np.dot(embedding1, embedding2.T)
print("Similarity => %s" % sim)