happynear / amsoftmax
A simple yet effective loss function for face verification.
License: MIT License
Should we save the norm1 layer for deployment, or just take the output from fc5?
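For what it's worth, the two options are equivalent for verification: norm1 only L2-normalizes the fc5 output, and cosine similarity ignores feature magnitude anyway. A minimal NumPy sketch (hypothetical features, not the repository's code):

```python
import numpy as np

def l2_normalize(feat, eps=1e-10):
    # The same operation the norm1 layer applies to the fc5 output.
    return feat / (np.linalg.norm(feat) + eps)

# Hypothetical 512-d fc5 features extracted from two images.
feat_a = np.random.randn(512).astype(np.float32)
feat_b = np.random.randn(512).astype(np.float32)

# Normalizing inside the net (norm1) or outside gives the same cosine score.
score = float(np.dot(l2_normalize(feat_a), l2_normalize(feat_b)))
```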
I trained my model on WebFace with the parameters s=30, m=0.35. The result on LFW is 98.53%. I tried to change the parameters, but that gave worse results. Are these parameters the best in your tests? Can data augmentation help me improve the result? Thanks for your advice.
Hello, I see that your experiments all use the 64-layer SphereFace structure for training. Have you trained on other structures? I see that ArcFace uses your loss, and the ArcFace structure can reach 99.7~99.8% (using only VGG2 or MS).
I tried to add batch normalization to your modified ResNet-20, but the loss became 87.3365. As far as I know, BN helps the network learn more quickly. Is it possible to use batch normalization with AM-Softmax?
Here is the prototxt:
layer {
name: "input"
type: "Input"
top: "data"
input_param {
shape {
dim: 1
dim: 3
dim: 160
dim: 160
}
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1_1/bn"
type: "BatchNorm"
bottom: "conv1_1"
top: "conv1_1"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv1_1/scale"
type: "Scale"
bottom: "conv1_1"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu1_1"
type: "PReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1_2/bn"
type: "BatchNorm"
bottom: "conv1_2"
top: "conv1_2"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv1_2/scale"
type: "Scale"
bottom: "conv1_2"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu1_2"
type: "PReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "conv1_3"
type: "Convolution"
bottom: "conv1_2"
top: "conv1_3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv1_3/bn"
type: "BatchNorm"
bottom: "conv1_3"
top: "conv1_3"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv1_3/scale"
type: "Scale"
bottom: "conv1_3"
top: "conv1_3"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu1_3"
type: "PReLU"
bottom: "conv1_3"
top: "conv1_3"
}
layer {
name: "res1_3"
type: "Eltwise"
bottom: "conv1_1"
bottom: "conv1_3"
top: "res1_3"
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "res1_3"
top: "conv2_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv2_1/bn"
type: "BatchNorm"
bottom: "conv2_1"
top: "conv2_1"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv2_1/scale"
type: "Scale"
bottom: "conv2_1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu2_1"
type: "PReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv2_2/bn"
type: "BatchNorm"
bottom: "conv2_2"
top: "conv2_2"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv2_2/scale"
type: "Scale"
bottom: "conv2_2"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu2_2"
type: "PReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "conv2_3"
type: "Convolution"
bottom: "conv2_2"
top: "conv2_3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv2_3/bn"
type: "BatchNorm"
bottom: "conv2_3"
top: "conv2_3"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv2_3/scale"
type: "Scale"
bottom: "conv2_3"
top: "conv2_3"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu2_3"
type: "PReLU"
bottom: "conv2_3"
top: "conv2_3"
}
layer {
name: "res2_3"
type: "Eltwise"
bottom: "conv2_1"
bottom: "conv2_3"
top: "res2_3"
}
layer {
name: "conv2_4"
type: "Convolution"
bottom: "res2_3"
top: "conv2_4"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv2_4/bn"
type: "BatchNorm"
bottom: "conv2_4"
top: "conv2_4"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv2_4/scale"
type: "Scale"
bottom: "conv2_4"
top: "conv2_4"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu2_4"
type: "PReLU"
bottom: "conv2_4"
top: "conv2_4"
}
layer {
name: "conv2_5"
type: "Convolution"
bottom: "conv2_4"
top: "conv2_5"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv2_5/bn"
type: "BatchNorm"
bottom: "conv2_5"
top: "conv2_5"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv2_5/scale"
type: "Scale"
bottom: "conv2_5"
top: "conv2_5"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu2_5"
type: "PReLU"
bottom: "conv2_5"
top: "conv2_5"
}
layer {
name: "res2_5"
type: "Eltwise"
bottom: "res2_3"
bottom: "conv2_5"
top: "res2_5"
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "res2_5"
top: "conv3_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_1/bn"
type: "BatchNorm"
bottom: "conv3_1"
top: "conv3_1"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_1/scale"
type: "Scale"
bottom: "conv3_1"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_1"
type: "PReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_2/bn"
type: "BatchNorm"
bottom: "conv3_2"
top: "conv3_2"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_2/scale"
type: "Scale"
bottom: "conv3_2"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_2"
type: "PReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_3/bn"
type: "BatchNorm"
bottom: "conv3_3"
top: "conv3_3"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_3/scale"
type: "Scale"
bottom: "conv3_3"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_3"
type: "PReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "res3_3"
type: "Eltwise"
bottom: "conv3_1"
bottom: "conv3_3"
top: "res3_3"
}
layer {
name: "conv3_4"
type: "Convolution"
bottom: "res3_3"
top: "conv3_4"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_4/bn"
type: "BatchNorm"
bottom: "conv3_4"
top: "conv3_4"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_4/scale"
type: "Scale"
bottom: "conv3_4"
top: "conv3_4"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_4"
type: "PReLU"
bottom: "conv3_4"
top: "conv3_4"
}
layer {
name: "conv3_5"
type: "Convolution"
bottom: "conv3_4"
top: "conv3_5"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_5/bn"
type: "BatchNorm"
bottom: "conv3_5"
top: "conv3_5"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_5/scale"
type: "Scale"
bottom: "conv3_5"
top: "conv3_5"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_5"
type: "PReLU"
bottom: "conv3_5"
top: "conv3_5"
}
layer {
name: "res3_5"
type: "Eltwise"
bottom: "res3_3"
bottom: "conv3_5"
top: "res3_5"
}
layer {
name: "conv3_6"
type: "Convolution"
bottom: "res3_5"
top: "conv3_6"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_6/bn"
type: "BatchNorm"
bottom: "conv3_6"
top: "conv3_6"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_6/scale"
type: "Scale"
bottom: "conv3_6"
top: "conv3_6"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_6"
type: "PReLU"
bottom: "conv3_6"
top: "conv3_6"
}
layer {
name: "conv3_7"
type: "Convolution"
bottom: "conv3_6"
top: "conv3_7"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_7/bn"
type: "BatchNorm"
bottom: "conv3_7"
top: "conv3_7"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_7/scale"
type: "Scale"
bottom: "conv3_7"
top: "conv3_7"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_7"
type: "PReLU"
bottom: "conv3_7"
top: "conv3_7"
}
layer {
name: "res3_7"
type: "Eltwise"
bottom: "res3_5"
bottom: "conv3_7"
top: "res3_7"
}
layer {
name: "conv3_8"
type: "Convolution"
bottom: "res3_7"
top: "conv3_8"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_8/bn"
type: "BatchNorm"
bottom: "conv3_8"
top: "conv3_8"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_8/scale"
type: "Scale"
bottom: "conv3_8"
top: "conv3_8"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_8"
type: "PReLU"
bottom: "conv3_8"
top: "conv3_8"
}
layer {
name: "conv3_9"
type: "Convolution"
bottom: "conv3_8"
top: "conv3_9"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv3_9/bn"
type: "BatchNorm"
bottom: "conv3_9"
top: "conv3_9"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv3_9/scale"
type: "Scale"
bottom: "conv3_9"
top: "conv3_9"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu3_9"
type: "PReLU"
bottom: "conv3_9"
top: "conv3_9"
}
layer {
name: "res3_9"
type: "Eltwise"
bottom: "res3_7"
bottom: "conv3_9"
top: "res3_9"
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "res3_9"
top: "conv4_1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv4_1/bn"
type: "BatchNorm"
bottom: "conv4_1"
top: "conv4_1"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv4_1/scale"
type: "Scale"
bottom: "conv4_1"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu4_1"
type: "PReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv4_2/bn"
type: "BatchNorm"
bottom: "conv4_2"
top: "conv4_2"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv4_2/scale"
type: "Scale"
bottom: "conv4_2"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu4_2"
type: "PReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "conv4_3/bn"
type: "BatchNorm"
bottom: "conv4_3"
top: "conv4_3"
param {
lr_mult: 0
}
param {
lr_mult: 0
}
param {
lr_mult: 0
}
}
layer {
name: "conv4_3/scale"
type: "Scale"
bottom: "conv4_3"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 0
}
param {
lr_mult: 2
decay_mult: 0
}
scale_param {
bias_term: true
}
}
layer {
name: "relu4_3"
type: "PReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "res4_3"
type: "Eltwise"
bottom: "conv4_1"
bottom: "conv4_3"
top: "res4_3"
}
layer {
name: "fc5"
type: "InnerProduct"
bottom: "res4_3"
top: "fc5"
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
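One note on the number 87.3365 (an observation about the value, not from the original thread): it equals 126 · ln 2 ≈ −ln(FLT_MIN), the value −log(p) takes when the target-class softmax probability underflows to the smallest normal float32. A loss stuck at exactly this value typically indicates diverged logits rather than anything BN-specific. A quick check:

```python
import math

# 87.3365 == 126 * ln(2) == -ln(2**-126), i.e. -log(p) when the softmax
# probability p underflows to the smallest normal float32 value.
print(126 * math.log(2))       # ~87.3365
print(-math.log(2.0 ** -126))  # same value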
Hi, your paper shows the results of AM-Softmax w/o FN with m = 0.35 and 0.4.
(1) With FN: ψ = s · (cos θ − m), with s = 30, m = 0.35
#prototxt
layer {
name: "fc6_l2"
type: "InnerProduct"
bottom: "norm1"
top: "fc6"
param {
lr_mult: 1
}
inner_product_param{
num_output: 10516
normalize: true
weight_filler {
type: "xavier"
}
bias_term: false
}
}
layer {
name: "label_specific_margin"
type: "LabelSpecificAdd"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
label_specific_add_param {
bias: -0.35
}
}
layer {
name: "fc6_margin_scale"
type: "Scale"
bottom: "fc6_margin"
top: "fc6_margin_scale"
param {
lr_mult: 0
decay_mult: 0
}
scale_param {
filler{
type: "constant"
value: 30
}
}
}
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin_scale"
bottom: "label"
top: "softmax_loss"
loss_weight: 1
}
(2) Without FN: s is not needed, ψ = ‖x‖ · cos θ − m. Should we still use m = 0.35?
#prototxt
layer {
name: "fc6_l2"
type: "InnerProduct"
bottom: "norm1"
top: "fc6"
param {
lr_mult: 1
}
inner_product_param{
num_output: 10516
normalize: false
weight_filler {
type: "xavier"
}
bias_term: false
}
}
layer {
name: "label_specific_margin"
type: "LabelSpecificAdd"
bottom: "fc6"
bottom: "label"
top: "fc6_margin"
label_specific_add_param {
bias: -0.35
}
}
layer {
name: "softmax_loss"
type: "SoftmaxWithLoss"
bottom: "fc6_margin"
bottom: "label"
top: "softmax_loss"
loss_weight: 1
}
Can you show your prototxt and training log? Thanks.
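To make the two variants concrete, here is a minimal NumPy sketch of the logit computation described above (an illustration only, not the repository's Caffe implementation; features, weights, and labels are hypothetical inputs):

```python
import numpy as np

def am_softmax_logits(features, weights, labels, m=0.35, s=30.0, fn=True):
    """features: (N, D); weights: (D, C); labels: (N,) class indices.
    fn=True:  psi = s * (cos(theta) - m)      (with feature normalization)
    fn=False: psi = ||x|| * cos(theta) - m    (without FN; s is not used)
    """
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    rows = np.arange(len(labels))
    if fn:
        x = features / np.linalg.norm(features, axis=1, keepdims=True)
        logits = s * (x @ w)           # s * cos(theta) for every class
        logits[rows, labels] -= s * m  # target class gets s * (cos(theta) - m)
    else:
        logits = features @ w          # ||x|| * cos(theta)
        logits[rows, labels] -= m
    return logits  # feed into a standard softmax cross-entropy loss
```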
@happynear I can't get any download link for CASIA-WebFace, neither the cleaned CASIA dataset nor the original one.
The official download link and the Baidu Yun links I found on the Internet can no longer be accessed. Can you give me a download link? Thank you!
I want to test the caffemodel on a real-world problem, so I use MTCNN landmarks and align the image like below:
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

Mat transform(Mat image,            // cropped face image
              vector<Point2f> dst)  // dst are the landmarks of the face
{
    // Scale factors from the original size to the 96x112 target (width x height).
    float scale_x = 96.0f / image.cols;
    float scale_y = 112.0f / image.rows;
    for (size_t i = 0; i < dst.size(); ++i)
    {
        dst[i].x *= scale_x;  // scale the points to the new image size (96, 112)
        dst[i].y *= scale_y;
    }
    cv::resize(image, image, Size(96, 112));
    // Reference landmark positions for a 96x112 aligned face.
    vector<Point2f> src;
    src.push_back(Point2f(30.2946f, 51.6963f));
    src.push_back(Point2f(65.5318f, 51.5014f));
    src.push_back(Point2f(48.0252f, 71.7366f));
    src.push_back(Point2f(33.5493f, 92.3655f));
    src.push_back(Point2f(62.7299f, 92.2041f));
    cv::Mat R = cv::estimateRigidTransform(dst, src, false);
    Mat out;
    cv::warpAffine(image, out, R, Size(96, 112));
    return out;
}
What I got is something like the image below. As you can see, there is a black area at the top and on the right side of the image, so I'm confused: is this normal?
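One note on the black area (an aside, not from the thread): cv::warpAffine fills destination pixels that have no source counterpart with the border value, black by default, so some black region is expected whenever the estimated transform maps part of the 96x112 output outside the input image. A minimal Python/OpenCV sketch of the same alignment (the dst landmarks and the input file name are hypothetical):

```python
import cv2
import numpy as np

# Hypothetical detected landmarks (dst) and the standard 96x112 reference (src).
dst = np.array([[38.0, 55.0], [70.0, 53.0], [55.0, 75.0],
                [42.0, 95.0], [66.0, 94.0]], dtype=np.float32)
src = np.array([[30.2946, 51.6963], [65.5318, 51.5014], [48.0252, 71.7366],
                [33.5493, 92.3655], [62.7299, 92.2041]], dtype=np.float32)

img = cv2.imread('face.jpg')  # hypothetical cropped face image
M, _ = cv2.estimateAffinePartial2D(dst, src)  # similarity transform
# Destination pixels outside the source get borderValue -- the black area.
aligned = cv2.warpAffine(img, M, (96, 112),
                         borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0))
```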
I only see the layer named fc6_l2 in face_train_test.prototxt.
Hi, thanks for your AMSoftmax.
Could you tell us if the ResNet-20 is a pretrained model, and if so, whether it was pretrained with AMSoftmax?
Dear @happynear, first of all thanks for your work and the uploaded results! I would like to ask about the alignment step: is it really important for good performance? I have not tried your code (I will do it this week), but I tried ResNet-18 on VGG2 without the alignment step. Softmax and center loss both gave about 90%; center loss also provided much better localization, but surprisingly the ArcFace result was only 70%. I will try CosFace this week, but I expect more or less the same. Did you try to train anything without alignment?
Thank you! I will post my results with CosFace.
Hi, first of all thanks for the great job.
I compiled your Caffe and wanted to test the pretrained weights, so I fed an image to the model's input; it gave me a 512-float array in 0.4 seconds on a 1080 Ti GPU (is this OK?). After that I set a mini-batch as the model input, but for 10 images it takes 3.6 seconds, which is only slightly faster than feeding the images one at a time.
There is a gnap prototxt. I am not sure what it is for; is there any documentation?
Thank you.
Dear AMSoftmax team,
Thanks for the great work. I've read your paper and am preparing to try my own data with this repo, but I'm a bit confused. Would you mind telling me the relation between AMSoftmax and SphereFace? AMSoftmax seems to be the latest result of your experiments, but there are not many usage instructions. As far as I understand, is the only difference between SphereFace and AMSoftmax the prototxt file? And instead of using only the AMSoftmax repo, can I keep the SphereFace repo and follow its steps to train?
Hi, I read your paper and your target-logit-curve code, and I am puzzled. Your paper says that Wf is also called the target logit, but your target logit curves do not seem to correspond to the Wf as defined; they seem not to take f into account.
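For readers comparing the curves, a small NumPy sketch of the two target-logit definitions, as I understand them from the SphereFace and AM-Softmax papers (the curves plot ψ(θ) alone; whether ‖f‖ should be factored in is exactly the question raised above):

```python
import numpy as np

def sphereface_psi(theta, m=4):
    # A-Softmax / SphereFace: piecewise-monotonic version of cos(m * theta),
    # psi = (-1)**k * cos(m*theta) - 2k  for theta in [k*pi/m, (k+1)*pi/m].
    k = np.floor(theta * m / np.pi)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

def am_softmax_psi(theta, m=0.35):
    # AM-Softmax: additive cosine margin.
    return np.cos(theta) - m

theta = np.linspace(0.0, np.pi, 200)
curves = {'SphereFace': sphereface_psi(theta), 'AM-Softmax': am_softmax_psi(theta)}
```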
Hello. I looked online for information about the iter_size parameter in the solver configuration file, but the official Caffe documentation does not describe it. My question is: if setting this parameter is equivalent to adjusting batch_size, shouldn't the number of iterations be reduced accordingly? After I set this parameter to n, training became n times slower. Looking forward to an answer.
Does the AMS loss have a convergence curve similar to the softmax loss? In my experiments, the AMS loss (with m=1) changes little during training, even after many iterations.
I didn't find any comparison between AMSoftmax and ArcFace on the Internet. Could you tell me your results?
As I don't have enough GPU memory, I set iter_size: 8 in face_solver.prototxt and batch_size: 32 in face_train_test.prototxt. After 30000 iterations it didn't converge on CASIA-WebFace. I am wondering whether I have done anything wrong.
Thanks in advance.
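For context (a note on Caffe behavior, not from the thread): iter_size accumulates gradients over iter_size forward/backward passes before each weight update, so the effective batch size is batch_size × iter_size (here 32 × 8 = 256), and the solver's iteration count still counts weight updates; each iteration simply costs iter_size times more compute. A rough sketch of the idea, with hypothetical net methods:

```python
batch_size, iter_size = 32, 8
effective_batch = batch_size * iter_size  # 256 samples per weight update

def train_iteration(net, lr):
    grads = None
    for _ in range(iter_size):                 # accumulate over mini-batches
        batch = net.next_batch(batch_size)     # hypothetical data loader
        g = net.backward(net.forward(batch))   # hypothetical fwd/bwd pass
        grads = g if grads is None else [a + b for a, b in zip(grads, g)]
    # Caffe normalizes the accumulated gradient by iter_size before updating.
    net.apply_update([g / iter_size for g in grads], lr)
```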
Hi, thanks for your great job. I wonder how I can tune the values of the margin and scale to get better results. I used the default setting (m=0.35, scale=30) on my face recognition dataset, and the final training loss is about 3 and can't decrease further. So I came here to ask the question. Thank you!
Hello. I want to reproduce the figures of the spherical feature distributions of different loss functions on the MNIST dataset. I trained with the different loss functions and obtained the corresponding models, but I still cannot produce the plots. How was your jet.mat file generated? Thanks.
AVE 60.93%
Did I make a mistake somewhere? Why does the accuracy differ so much from what is described in your paper?
Hi @happynear,
I read the code of the inner-product layer and found that you only normalize the weights in the forward pass. Why is no corresponding backward pass needed?
Thanks.
Hi @xisi789,
Could you please share the CASIA-WebFace database or the Replay-Attack database for face anti-spoofing again? I could not download it since the shared file has expired.
Hi, I found that a model trained with AMS may give a higher similarity for a pair of abnormal enroll and probe images (where the probe is low quality, wrongly aligned, or even not a face, and the enroll is not a good ID photo). The similarity may be around 0.4 or even higher, while models trained with softmax give only around 0. Have you ever met the same problem? Is it because the margin makes the feature space much more compact than softmax does?
Thanks!
Sorry to bother you.
Never mind, the problem is solved.
Hi, thanks for your work. Did you ever try putting them together? I think the combination shouldn't make things worse, but when I used it to train LeNet (dataset: MNIST), I actually got worse results.
Thanks very much for your contribution. I am using gray images to train the model, with only about 1000 identities. How should I set m and s? Should both of them be set smaller?
I fed an image to AMSoftmax, using the net prototxt face_deploy_mirror_normalize.prototxt and your pretrained weights. After loading the weights I set an image as the net input and ran the forward() method. Then I wanted to explore how the flip layer works, but after plotting the output of the flip_data blob I saw something wrong: the flip layer has flipped the data vertically (I mean up-down)! Is that okay?
The code is something like the below:
import caffe
import numpy as np
import matplotlib.pyplot as plt

net = caffe.Net('face_deploy_mirror_normalize.prototxt',
                'face_train_test_iter_30000.caffemodel',
                caffe.TEST)

def return_layer_name(layer_name, i):
    # Note: swapaxes(0, 2) turns a (C, H, W) blob into (W, H, C), which
    # transposes the spatial axes as well as moving channels last.
    output = net.blobs[layer_name].data[i]
    output = np.swapaxes(output, 0, 2)
    return output

img = caffe.io.load_image('Anthony_Hopkins_0002.jpg')
img = caffe.io.resize(img, (96, 112))
img = np.expand_dims(img, 0)
img = np.swapaxes(img, 1, 3)
net.blobs['data'].data[...] = img
net.forward()

output = net.blobs['norm1'].data[0]
out1 = return_layer_name('data_input_0_split_0', 0)
out2 = return_layer_name('flip_data', 0)

fig = plt.figure(figsize=(15, 15))
plt.subplot(1, 2, 1)
plt.imshow(out1)
plt.subplot(1, 2, 2)
plt.imshow(out2)
plt.show()
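A likely explanation for the up-down flip (my reading of the code above, not an answer from the maintainer): np.swapaxes(output, 0, 2) maps a (C, H, W) blob to (W, H, C), which transposes the two spatial axes, so a horizontal mirror viewed through that transpose looks like a vertical flip. A channels-last conversion that preserves orientation would be:

```python
import numpy as np

def blob_to_image(blob):
    # Convert a (C, H, W) Caffe blob to (H, W, C) for plt.imshow
    # without swapping the spatial axes.
    return np.transpose(blob, (1, 2, 0))
```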
Could you give a short introduction to this https://github.com/happynear/AMSoftmax/tree/master/prototxt/auto, please?
Great work, appreciate it. Can I use the same caffe.binding project without any modification?