lightcnn's Introduction

Light CNN for Deep Face Recognition, in PyTorch

A PyTorch implementation of A Light CNN for Deep Face Representation with Noisy Labels from the paper by Xiang Wu, Ran He, Zhenan Sun and Tieniu Tan. The official and original Caffe code can be found here.

Updates

  • Feb 9, 2022
    • Light CNN v4 pretrained model is released.
  • Jan 17, 2018
    • Light CNN-29 v2 model and training code are released. The 100% - EER on LFW reaches 99.43%.
    • On set 1 of MegaFace, it achieves 76.021% rank-1 accuracy and 89.740% TPR@FAR=10^-6.
  • Sep 12, 2017
    • Light CNN-29 model and training code are released. The 100% - EER on LFW reaches 99.40%.
    • On set 1 of MegaFace, it achieves 72.704% rank-1 accuracy and 85.891% TPR@FAR=10^-6.
  • Jul 12, 2017
    • Light CNN-9 model and training code are released. The 100% - EER on LFW reaches 98.70%.
    • On set 1 of MegaFace, it achieves 65.782% rank-1 accuracy and 76.288% TPR@FAR=10^-6.
  • Jul 4, 2017
    • The repository was built.

Installation

  • Install PyTorch by following the instructions on its website.
  • Clone this repository.
    • Note: the code has currently only been run on Python 2.7.

Datasets

  • Download a face dataset such as CASIA-WebFace, VGG-Face, or MS-Celeb-1M.

  • All face images are converted to gray-scale and normalized to 144x144 according to landmarks.

  • Using the five facial points, we rotate the face so that the two eye points lie on a horizontal line, and we also fix the distance between the midpoint of the eyes and the midpoint of the mouth (ec_mc_y) and the y coordinate of the midpoint of the eyes (ec_y).

  • The aligned LFW images are uploaded on Baidu Yun.

    Dataset        size      ec_mc_y   ec_y
    Training set   144x144   48        48
    Testing set    128x128   48        40
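
The alignment described above can be sketched as a single similarity transform. This is a minimal sketch, not the authors' preprocessing code: the names `alignment_matrix` and `apply_affine`, the argument order, and the choice to place the eye midpoint at x = size / 2 are assumptions made for illustration. It returns a 2x3 affine matrix that could be handed to an image-warping routine.

```python
import math

def alignment_matrix(left_eye, right_eye, mouth_left, mouth_right,
                     size=144, ec_mc_y=48, ec_y=48):
    """Return a 2x3 similarity matrix mapping landmarks into the aligned crop.

    Assumption: the eye midpoint is placed at x = size / 2 (horizontally centered).
    """
    # Midpoints of the eyes and of the mouth corners.
    ex = (left_eye[0] + right_eye[0]) / 2.0
    ey = (left_eye[1] + right_eye[1]) / 2.0
    mx = (mouth_left[0] + mouth_right[0]) / 2.0
    my = (mouth_left[1] + mouth_right[1]) / 2.0

    # In-plane angle of the eye line; rotating by -angle makes it horizontal.
    angle = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    # Vertical eye-to-mouth distance after that rotation.
    dist = -sin_a * (mx - ex) + cos_a * (my - ey)
    if dist <= 0:
        raise ValueError("mouth midpoint must lie below the eye midpoint")
    scale = ec_mc_y / dist

    # p' = scale * R(-angle) * (p - eye_mid) + (size / 2, ec_y)
    a, b = scale * cos_a, scale * sin_a
    tx = size / 2.0 - (a * ex + b * ey)
    ty = ec_y - (-b * ex + a * ey)
    return [[a, b, tx], [-b, a, ty]]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    return (a * point[0] + b * point[1] + tx,
            c * point[0] + d * point[1] + ty)
```

With the training-set values (144, 48, 48) the eye midpoint lands at y = 48 and the mouth midpoint at y = 96; with the testing-set values (128, 48, 40) they land at y = 40 and y = 88.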

Training

  • To train Light CNN with the train script, simply specify the parameters listed in train.py as flags, or change them manually.
python train.py --root_path=/path/to/your/datasets/ \
		--train_list=/path/to/your/train/list.txt \
		--val_list=/path/to/your/val/list.txt \
		--save_path=/path/to/your/save/path/ \
		--model="LightCNN-9/LightCNN-29" --num_classes=n
  • Tips:
    • The train and val list files follow the Caffe format. The data loader is detailed in load_imglist.py. Alternatively, you can use torchvision.datasets.ImageFolder to load your datasets.
    • num_classes denotes the number of identities in your training dataset.
    • When training with PyTorch, you can set a larger learning rate than with Caffe, and Light CNN converges faster in PyTorch than in Caffe.
    • We enlarge the learning rate for the parameters of fc2, which may lead to better performance. If training collapses on your own datasets, you can decrease it.
    • We modify the implementation of SGD with momentum, since the official PyTorch implementation differs from that of Sutskever et al. The details are shown here.
    • The training datasets for LightCNN-29v2 are CASIA-WebFace and MS-Celeb-1M; therefore, num_classes is 80013.
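
The difference mentioned in the SGD tip above can be illustrated with two scalar update rules. This is a sketch for intuition only, not the modified optimizer from this repository, and the function names are hypothetical. In the Sutskever et al. formulation the learning rate scales the gradient before it enters the velocity, whereas in the PyTorch formulation it scales the whole velocity, so the two coincide only while the learning rate stays constant.

```python
def sutskever_step(p, v, grad, lr, momentum):
    """v <- momentum * v - lr * grad;  p <- p + v"""
    v = momentum * v - lr * grad
    return p + v, v

def pytorch_step(p, v, grad, lr, momentum):
    """v <- momentum * v + grad;  p <- p - lr * v"""
    v = momentum * v + grad
    return p - lr * v, v

def run(step, lrs, momentum=0.9, p=1.0):
    """Minimize f(p) = p**2 / 2 (whose gradient is p) with one lr per step."""
    v = 0.0
    for lr in lrs:
        p, v = step(p, v, p, lr, momentum)
    return p

# Same trajectory with a constant learning rate ...
constant = [0.1] * 5
same = abs(run(sutskever_step, constant) - run(pytorch_step, constant)) < 1e-12

# ... but different trajectories once the learning rate is decayed mid-run.
decayed = [0.1, 0.1, 0.01, 0.01]
differ = abs(run(sutskever_step, decayed) - run(pytorch_step, decayed)) > 1e-6
```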

Evaluation

  • To evaluate a trained network:
python extract_features.py --resume=/path/to/your/model \
			   --root_path=/path/to/your/datasets/ \
			   --img_list=/path/to/your/list.txt \
			   --save_path=/path/to/your/save/path/ \
			   --model="LightCNN-9/LightCNN-29/LightCNN-29v2" \
			   --num_classes=n
    • Set num_classes to 79077 for LightCNN-9/LightCNN-29 and to 80013 for LightCNN-29v2.
  • You can use vlfeat or sklearn to compute the ROC curve from the extracted features and obtain EER and TPR@FPR on your testing datasets.
  • The model of LightCNN-9 is released on Google Drive.
    • Note that the released model contains the whole state of the Light CNN module and the optimizer. See train.py for details on loading the model.
  • The model of LightCNN-29 is released on Google Drive.
  • The model of LightCNN-29 v2 is released on Google Drive.
  • The LFW and MegaFace features of LightCNN-9 are released.
  • The model of LightCNN v4 is released on Google Drive.
    • The detailed structure of LightCNN v4 is shown in light_cnn_v4.py
    • The input is an aligned 128*128 BGR face image.
    • The input pixel value is normalized by mean ([0.0, 0.0, 0.0]) and std ([255.0, 255.0, 255.0]).
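
The ROC-based metrics mentioned above (EER and TPR at a fixed false accept rate) can be sketched without extra dependencies. This is a stand-in for illustration, not the evaluation code used for the reported numbers; the same quantities could be derived from scikit-learn's `roc_curve`. The function names are hypothetical, and the inputs are assumed to be one similarity score per image pair plus a 0/1 label.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per distinct score threshold.

    scores: similarity per pair (higher means more likely the same identity).
    labels: 1 for a genuine pair, 0 for an impostor pair.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one genuine and one impostor pair")
    points, tp, fp, i = [], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Accept every pair whose score ties with the current threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / float(n_neg), tp / float(n_pos)))
    return points

def equal_error_rate(points):
    """Approximate EER: the point where FPR and FNR (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0

def tpr_at_far(points, far):
    """Highest TPR among thresholds whose FPR does not exceed `far`."""
    candidates = [tpr for fpr, tpr in points if fpr <= far]
    return max(candidates) if candidates else 0.0
```

The "100% - EER" figure reported in the tables would then correspond to `1.0 - equal_error_rate(points)`, expressed as a percentage.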

Performance

The Light CNN performance on LFW 6,000 pairs.

Model           100% - EER   TPR@FAR=1%   TPR@FAR=0.1%   TPR@FAR=0
LightCNN-9      98.70%       98.47%       95.13%         89.53%
LightCNN-29     99.40%       99.43%       98.67%         95.70%
LightCNN-29v2   99.43%       99.53%       99.30%         96.77%
LightCNN v4     99.67%       99.67%       99.57%         99.27%

The Light CNN performance on LFW BLUFR protocols

Model           VR@FAR=0.1%   DIR@FAR=1%
LightCNN-9      96.80%        83.06%
LightCNN-29     98.95%        91.33%
LightCNN-29v2   99.41%        94.43%

The Light CNN performance on MegaFace

Model           Rank-1    TPR@FAR=1e-6
LightCNN-9      65.782%   76.288%
LightCNN-29     72.704%   85.891%
LightCNN-29v2   76.021%   89.740%

Citation

If you use our models, please cite the following paper:

@article{wu2018light,
  title={A light CNN for deep face representation with noisy labels},
  author={Wu, Xiang and He, Ran and Sun, Zhenan and Tan, Tieniu},
  journal={IEEE Transactions on Information Forensics and Security},
  volume={13},
  number={11},
  pages={2884--2896},
  year={2018},
  publisher={IEEE}
}

lightcnn's Issues

Data preprocessing

Hi
How can I align and crop the images? There is no code for the image preprocessing.

Thanks

high memory usage low GPU util

Hello,

I ran "train.py", but I found that my GPU has high memory usage with low GPU utilization. Could you please help me with that? My worker number is 32. Thank you!

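Unable to extract the model file; is it corrupted?

Hello, after downloading the model (LightCNN_29Layers_checkpoint.pth.tar), I cannot extract it. Am I extracting it the wrong way? I tried on both Windows and Linux.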

which MS Celeb 1M dataset to use

Which MS-Celeb-1M dataset should I use: aligned/cropped or thumbnails? For which dataset did you upload the clean list of MS-Celeb-1M?

Can not extract tarred pretrained model

Hi,
I downloaded your pre-trained models from Google drive and ran the following command to extract the models. I am not able to extract the models and getting the errors as:

$ tar -xvf LightCNN_9Layers_checkpoint.pth.tar 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Could you check this issue?

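Is the PyTorch accuracy much lower than Caffe's?

Compared with your paper, the results here appear to be from the latest version trained on the Microsoft dataset, and they show that the PyTorch accuracy on LFW is slightly lower than the Caffe accuracy.

I used only CASIA-WebFace as the training set with your PyTorch training code (LightCNN-9). On the LFW 6,000-pair evaluation I could only reach 100% - EER of 97.2 and TPR@FAR=1% of 94.333, and the BLUFR results were 82.67 and 51.72. These results are far below those in your paper, yet with the same training data and the same network, Caffe reproduces results close to the paper's.

Have you evaluated on LFW with the PyTorch framework using only CASIA-WebFace as the training set? If so, could you share the results, or do you have any suggestions for improving the PyTorch accuracy?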

Pytorch code training problem

Hi!
We are trying to train LightCNN-9 or LightCNN-29 using your code (PyTorch) and your default parameters.
But the result is always NaN (from the first iteration). We tried it on the CASIA and Celeb datasets with the same results. Did you use this code for training in your paper? Or are there some tricks?

optimizer error

Thanks for your work. I'm new to PyTorch and an error comes up when I run the train.py script:

Traceback (most recent call last):
  File "train.py", line 275, in <module>
    main()
  File "train.py", line 85, in main
    weight_decay=args.weight_decay)
  File "/usr/local/lib/python2.7/dist-packages/torch/optim/sgd.py", line 56, in __init__
    super(SGD, self).__init__(params, defaults)
  File "/usr/local/lib/python2.7/dist-packages/torch/optim/optimizer.py", line 61, in __init__
    raise ValueError("can't optimize a non-leaf Variable")
ValueError: can't optimize a non-leaf Variable

It looks like the problem is related to the modified implementation of SGD. Would you mind giving me a hint on how to deal with this? Thanks~

Validation Set

Where can I get or generate the validation list required for the training? I'm trying to train the network on MS-Celeb-1M.
Also, is the pretrained model trained on the Full Image Thumbnails, FaceCropped, or FaceAligned dataset of MS-Celeb-1M?

What's the differences between /train/list.txt and /val/list.txt in content?

Hi, Alfred:
Could you please show me how the contents of /train/list.txt and /val/list.txt differ? Looking forward to your reply.

from

python train.py --root_path=/path/to/your/datasets/ \
		--train_list=/path/to/your/train/list.txt \
		--val_list=/path/to/your/val/list.txt \
		--save_path=/path/to/your/save/path/ \
		--model="LightCNN-9/LightCNN-29" --num_classes=n

lfw evaluation code

Thanks for your source code.

Could you share the LFW evaluation code that you use?

What is the typical training accuracy while training on MS-celeb-1m 70k?

Hi, I wonder what your typical accuracy is during training. I got a top-1 accuracy of 95.5%; by the way, the learning rate is 1e-5. In your experiments, did you get an obviously higher accuracy on the training set? In other words, should I keep decreasing the learning rate?
Thank you!

pytorch version

Hi AlfredXiangWu, could you tell me which PyTorch version you used for this work? I tested several versions but none of them worked.

THCudaCheck FAIL

Hi @AlfredXiangWu ,

My training data file looks like this:
/home/jonanza/datasets/casia/CASIA__144/3599667/037.png 0
/home/jonanza/datasets/casia/CASIA__144/1466221/082.png 1
/home/jonanza/datasets/casia/CASIA__144/1466221/044.png 1
...

and training ...
it broke here:
Epoch: [0][3500/3568] Time 0.108 (0.112) Data 0.000 (0.000) Loss 8.8549 (9.2667) Prec@1 0.000 (0.000) Prec@5 0.000 (0.008)

/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [19,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [20,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [21,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [22,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [23,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [24,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [25,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [26,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [27,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [28,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [30,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [31,0,0] Assertion t >= 0 && t < n_classes failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 279, in <module>
    main()
  File "train.py", line 136, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "train.py", line 175, in train
    print(loss.data[0], input.size(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.c:32

I searched for the problem, and people said it was an index out of range. I have no idea about this Orz
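
The assertion `t >= 0 && t < n_classes` in the log indicates that at least one target label falls outside the range [0, num_classes). A minimal sketch for checking a list file before training, assuming the "path label" line format shown in the post; `find_bad_labels` is a hypothetical helper and not part of this repository.

```python
def find_bad_labels(lines, num_classes):
    """Return (line_number, label) pairs whose label is outside [0, num_classes)."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        # Each line is "<image path> <integer label>"; split from the right
        # so that paths containing spaces are still handled.
        _path, label = line.rsplit(None, 1)
        label = int(label)
        if not 0 <= label < num_classes:
            bad.append((lineno, label))
    return bad
```

If the labels in the list are contiguous and start at 0, the value passed as --num_classes needs to be at least the largest label plus one.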

Some questions.

Is image normalization not needed?

I know that many face recognition frameworks use the normalization of the position of the face landmark. Does LightCNN require such a process?

In addition, what is the size of the image set? I would like to use an image set consisting of 700,000 images and 8,500 labels. Is it possible?

Sorry for the short English. Thanks for reading.

pytorch model can't convert to caffe model

I used the pytorch2caffe tool to convert the model, but it failed.

m = LightCNN_29Layers_v2(num_classes=80013)
checkpoint = torch.load('LightCNN_29Layers_V2_checkpoint.pth.tar')
m.load_state_dict(checkpoint['state_dict'])
m.eval()

input_var = Variable(torch.rand(1, 1, 128, 128))
#input = torch.zeros(1, 1, 128, 128)
#input_var = torch.autograd.Variable(input, volatile=True)
output_var = m(input_var)
plot_graph(output_var, os.path.join(os.getcwd(), 'lightcnn.png'))

pytorch2caffe(input_var, output_var,
              os.path.join(os.getcwd(), 'lightcnn_v2.prototxt'),
              os.path.join(os.getcwd(), 'lightcnn_v2.caffemodel'))

plot graph to png

plot_graph(output_var, os.path.join(os.getcwd(), 'lightcnn.png'))

The error is

plot_graph(output_var, os.path.join(os.getcwd(), 'lightcnn.png'))
File "~/my_test/pytorch2caffe/pytorch2caffe.py", line 379, in plot_graph
add_nodes(top_var.grad_fn)
AttributeError: 'tuple' object has no attribute 'grad_fn'

thanks for your attention :)

on the training

Hi, Alfred,

Regarding the training, is there a limit on the number of classes or the number of images per class when using your default settings? E.g., can I use 100 classes?

Checkpoint Loading Problem on CPU

I am using the LightCNN for feature extraction on CPU.
Initially, I used,
model_path = 'LightCNN/LightCNN_9Layers_checkpoint.pth.tar'
model = LightCNN_9Layers(num_classes=79077)
#model.eval()
checkpoint = torch.load(model_path, map_location=lambda storage, loc: storage)
model.load_state_dict(checkpoint['state_dict'])

I got the error below
RuntimeError: Error(s) in loading state_dict for network_9layers:
Missing key(s) in state_dict: "features.0.filter.bias", "features.0.filter.weight", ...

Unexpected key(s) in state_dict: "module.features.0.filter.weight", "module.features.0.filter.bias", ...

So I changed the last part of the code to
#load the model
checkpoint = torch.load(model_path, map_location=lambda storage, loc: storage)
model.load_state_dict(checkpoint['state_dict'], strict=False)

After that everything works fine. The features I extracted have the expected number of inputs. However, the result I am getting from a simple cosine similarity test on two identical images is far below expectation. Hence I began to wonder whether using load_state_dict in the manner I did loads the weights in a random manner.

Help will be appreciated. Thanks
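
The missing and unexpected keys above differ only by a leading `module.` prefix, which is what `nn.DataParallel` adds when a wrapped model is saved. With `strict=False`, keys that do not match are skipped, so the model can be left with its initial random weights. A minimal sketch of one common workaround, assuming the checkpoint layout shown in the post; `strip_module_prefix` is a hypothetical helper.

```python
from collections import OrderedDict

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned

# Intended usage (not executed here; requires torch and the checkpoint file):
#   checkpoint = torch.load(model_path, map_location=lambda storage, loc: storage)
#   model.load_state_dict(strip_module_prefix(checkpoint['state_dict']))
```

Loading with the default `strict=True` after renaming the keys has the advantage that any remaining mismatch raises an error instead of passing silently.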

The loss value does not converge

Hi @AlfredXiangWu ,

Thank you for your great work. I have tried training with your code on your MS-Celeb list from scratch. First, I used a learning rate of 0.001 and got NaN values after a few iterations. So I tried running with a smaller learning rate of 0.0001, but the loss did not decrease and kept fluctuating after 80 epochs.

Here is my configuration:

for name, value in model.named_parameters():
    if 'bias' in name:
        if 'fc2' in name:
            params += [{'params':value, 'lr': 10 * args.lr, 'weight_decay': 0}]
        else:
            params += [{'params':value, 'lr': 2 * args.lr, 'weight_decay': 0}]
    else:
        if 'fc2' in name:
            params += [{'params':value, 'lr': 10 * args.lr}]
        else:
            params += [{'params':value, 'lr': 1 * args.lr}]

Thanks,
Hai

What if image is RGB?

If I need to extract the face representation of an RGB image, how should I change the network structure for this task, and how should I preprocess the training and test data?

How can I set train.list and val.list for training?

I appreciate your code.

I downloaded the MS-Celeb-1M clean list.
How do you set train_list and val_list?
Is it all right if I split the MS-Celeb-1M clean list into train_list and val_list manually?

When did loss begin to decrease?

We are experimenting with the mfm29 architecture.
We used the Caffe implementation from the paper, and we also used your PyTorch code as is.
In both experiments, the loss stays at about 10.7 and does not decrease. I trained for more than 60 epochs, but the situation is the same.

In your experiment, I wonder when the loss starts to decrease.
I would also like to know how long it took until training completed.

thanks.

light 29V2 FC2 layer no bias

Hi, Xiang,

I noticed that you set the bias of the FC2 layer to 'False' (lightcnn 29 v2). Is there any special reason for that?

Best regards,
Ming

How do you deal with this situation?

In LFW evaluation of the 6,000 test images, how do you deal with images where faces cannot be detected by the dlib face detector?

Can you please provide the face alignment script to reproduce the evaluation results?

Validation images in batches.

Hi,

I am trying to send a batch of 64 images and extract the features all at once from the model. However, after a certain number of images I get erroneous image features. Any idea why that could be?
I modified the extract_features.py file to do it.

the 29layer converge problem

Hello Wu Xiang, from your results, the 29-layer model does a good job in face recognition. From the Python script, it seems that your 29-layer Light CNN model is made by inserting several ResNet blocks into the original 9-layer model. I used it to train a face model in Caffe, but the loss converges to about 10, so it seems that the model did not train well. My question is: with Caffe, how small can the loss get?

MFM operation implement?

In your paper, you define the MFM operation with two equations, (1) and (4).
But in your code, you only implement equation (1). Why not equation (4)? Is equation (4) not good, or is there some other reason?
If you have implemented equation (4), could you upload the training and evaluation results?
Thanks
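
For reference, the max-feature-map operation is usually described as an elementwise maximum over two halves of the channel dimension (the 2/1 variant). A minimal pure-Python sketch on nested lists, assuming a channels-first layout; it illustrates the idea only and is not the code from light_cnn.py, and `mfm_2_1` is a hypothetical name.

```python
def mfm_2_1(feature_maps):
    """Elementwise max of the first and second half of the channels.

    feature_maps: list of 2N channels, each a list of rows (H x W).
    Returns N channels of the same spatial size.
    """
    if len(feature_maps) % 2 != 0:
        raise ValueError("number of channels must be even")
    half = len(feature_maps) // 2
    out = []
    for k in range(half):
        first, second = feature_maps[k], feature_maps[k + half]
        # Channel k competes with channel k + N at every spatial position.
        out.append([[max(a, b) for a, b in zip(row_a, row_b)]
                    for row_a, row_b in zip(first, second)])
    return out
```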

duplicate between CASIA-WebFace and MS-Celeb-1M

Hi,AlfredXiangWu.
In your latest training code, the number of training classes is 80013; in the previous training code it was 79077; and CASIA has 10575. Do CASIA-WebFace and MS-Celeb-1M have about 10,000 duplicate IDs?

Different performance under the BLUFR protocol

Hi, @AlfredXiangWu
I used the features and MATLAB code you provided to test the ROC results and got exactly the same results.
But when I tested under the LFW BLUFR protocol, I got different results.
First, I found that the LFW list file you provided differs from the official list provided by BLUFR, so I extracted your features following the official list.
Then I tested VR and DIR by following the example MATLAB code provided by BLUFR, and got the following results:
VR@FAR=0.1%: 89.54%
DIR@FAR=1%: 57.03%
which are lower than yours.

May I ask what method you used to get your results?
In my case, I just loaded your features and deleted the PCA training part of the "demo_pca.m" provided by BLUFR.

restore with worse performance

I have a problem with restoring a model. When I restore a Light CNN model, the loss and accuracy of the first epoch are still good and match the performance of that model. But the performance gets much worse from the second epoch.

There are two ways to save and restore a model on the PyTorch website. I checked the code that saves and restores the model, and it uses the first way (torch.save(the_model.state_dict(), PATH)). Do you think the problem comes from this? Thank you!

Best regards,
Ming

Resume from checkpoint

Sairam.
Hi, I trained lightccnn_29_v2 on MSceleb DB for 14 epochs. lr reduced from 0.001 to 0.0004575 at step size of 10. Validation Accuracy improved from 86 to 95.95 and Avg loss reduced from 11 to 0.28 after 14 epochs.
Now, when I resume from saved model, it starts well as shown below:
Test set: Average loss: 0.28508767582333855, Accuracy: (95.95295308032168)
But the loss is not decreasing and the precision is also not improving, as shown below:
Epoch: [14][0/38671] Loss 0.1796 (0.1796) Prec@1 94.531 (94.531) Prec@5 97.656 (97.656)
Epoch: [14][100/38671] Loss 0.2190 (0.2752) Prec@1 93.750 (93.379) Prec@5 99.219 (98.337)
Epoch: [14][200/38671] Loss 0.4000 (0.3149) Prec@1 89.062 (92.405) Prec@5 96.875 (98.084)
......
Epoch: [14][6100/38671] Loss 0.8118 (0.5597) Prec@1 85.938 (86.939) Prec@5 92.188 (95.961)
Epoch: [14][6200/38671] Loss 0.7565 (0.5611) Prec@1 80.469 (86.915) Prec@5 93.750 (95.945)
Epoch: [14][6300/38671] Loss 0.6364 (0.5625) Prec@1 84.375 (86.893) Prec@5 96.875 (95.927)

Did you face the above issue? Could you help me figure out what the issue could be?
Thanks.
Darshan,SSSIHL.

About val dataset?

Hi!
When you train the model with MS-Celeb-1M, what is the validation dataset, and how do you generate it?

Is this normal?

lr: 0.0005
Epoch: [5][0/3907] Time 0.543 (0.543) Data 0.466 (0.466) Loss 6.8968 (6.8968) Prec@1 0.000 (0.000) Prec@5 1.562 (1.562)
Epoch: [5][100/3907] Time 0.182 (0.186) Data 0.000 (0.005) Loss 6.1796 (6.3176) Prec@1 0.000 (0.348) Prec@5 0.000 (2.088)
Epoch: [5][200/3907] Time 0.182 (0.185) Data 0.000 (0.003) Loss 5.9996 (6.1570) Prec@1 0.000 (0.253) Prec@5 0.000 (2.037)
Epoch: [5][300/3907] Time 0.186 (0.185) Data 0.000 (0.002) Loss 7.6430 (6.2538) Prec@1 0.000 (0.363) Prec@5 0.000 (2.370)
Epoch: [5][400/3907] Time 0.179 (0.184) Data 0.000 (0.001) Loss 5.9914 (6.2375) Prec@1 0.000 (0.273) Prec@5 0.000 (1.847)
Epoch: [5][500/3907] Time 0.184 (0.184) Data 0.000 (0.001) Loss 6.0924 (6.2430) Prec@1 0.000 (0.253) Prec@5 0.000 (1.695)
Epoch: [5][600/3907] Time 0.181 (0.184) Data 0.000 (0.001) Loss 5.2746 (6.2718) Prec@1 0.000 (0.220) Prec@5 0.000 (1.687)
Epoch: [5][700/3907] Time 0.183 (0.184) Data 0.000 (0.001) Loss 4.4259 (6.3011) Prec@1 0.000 (0.188) Prec@5 0.781 (1.493)
Epoch: [5][800/3907] Time 0.185 (0.184) Data 0.000 (0.001) Loss 6.1079 (6.3245) Prec@1 0.000 (0.165) Prec@5 0.000 (1.332)
Epoch: [5][900/3907] Time 0.188 (0.184) Data 0.000 (0.001) Loss 3.8808 (6.3316) Prec@1 0.000 (0.147) Prec@5 28.906 (1.234)
Epoch: [5][1000/3907] Time 0.186 (0.184) Data 0.000 (0.001) Loss 6.4368 (6.3144) Prec@1 0.000 (0.142) Prec@5 0.000 (1.218)
Epoch: [5][1100/3907] Time 0.182 (0.184) Data 0.000 (0.001) Loss 7.2492 (6.3498) Prec@1 0.000 (0.129) Prec@5 0.000 (1.113)
Epoch: [5][1200/3907] Time 0.183 (0.184) Data 0.000 (0.001) Loss 5.3767 (6.3607) Prec@1 0.000 (0.120) Prec@5 0.000 (1.043)
Epoch: [5][1300/3907] Time 0.182 (0.184) Data 0.000 (0.001) Loss 5.2591 (6.3527) Prec@1 0.000 (0.112) Prec@5 0.000 (0.984)
Epoch: [5][1400/3907] Time 0.182 (0.184) Data 0.000 (0.001) Loss 6.1982 (6.3612) Prec@1 0.000 (0.107) Prec@5 0.000 (0.960)
Epoch: [5][1500/3907] Time 0.183 (0.184) Data 0.000 (0.001) Loss 7.2118 (6.4134) Prec@1 0.000 (0.100) Prec@5 0.000 (0.908)

The training log after five epochs is shown above. Is this normal?

Model file issue?

I have downloaded the LightCNN_9Layers_checkpoint.pth.tar and LightCNN_29Layers_checkpoint.pth.tar files, but an error occurs when I decompress them. Could you send these files to me by email? Thank you very much!

loss change to nan when training on a new dataset

Hi, @AlfredXiangWu !

When I try to train LightCNN with my dataset, the loss becomes NaN. The logs are below:

lr: 0.01
Epoch: [0][0/2402]      Time 10.607 (10.607)    Data 0.372 (0.372)      Loss 8.5419 (8.5419)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.000)
Epoch: [0][100/2402]    Time 0.086 (0.191)      Data 0.000 (0.004)      Loss 8.6579 (8.5740)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.232)
Epoch: [0][200/2402]    Time 0.087 (0.139)      Data 0.000 (0.002)      Loss 8.4459 (8.6040)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.117)
Epoch: [0][300/2402]    Time 0.086 (0.122)      Data 0.000 (0.002)      Loss 8.7730 (8.6280)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.078)
Epoch: [0][400/2402]    Time 0.087 (0.113)      Data 0.000 (0.001)      Loss 8.6854 (8.6500)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.058)
Epoch: [0][500/2402]    Time 0.088 (0.108)      Data 0.000 (0.001)      Loss 8.6538 (8.6725)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.047)
Epoch: [0][600/2402]    Time 0.093 (0.105)      Data 0.000 (0.001)      Loss 8.6870 (8.6952)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.039)
Epoch: [0][700/2402]    Time 0.086 (0.103)      Data 0.000 (0.001)      Loss 8.8851 (8.7174)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.033)
Epoch: [0][800/2402]    Time 0.092 (0.101)      Data 0.000 (0.001)      Loss 9.0064 (8.7384)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.029)
Epoch: [0][900/2402]    Time 0.090 (0.100)      Data 0.000 (0.001)      Loss 8.7320 (8.7574)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.026)
Epoch: [0][1000/2402]   Time 0.088 (0.099)      Data 0.000 (0.001)      Loss 8.8482 (8.7760)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.023)
Epoch: [0][1100/2402]   Time 0.088 (0.098)      Data 0.000 (0.001)      Loss 8.9686 (8.7938)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.021)
Epoch: [0][1200/2402]   Time 0.092 (0.098)      Data 0.000 (0.001)      Loss 9.0300 (8.8087)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.020)
Epoch: [0][1300/2402]   Time 0.085 (0.097)      Data 0.000 (0.001)      Loss 9.0934 (8.8245)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.018)
Epoch: [0][1400/2402]   Time 0.090 (0.097)      Data 0.000 (0.001)      Loss 9.0064 (8.8394)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.017)
Epoch: [0][1500/2402]   Time 0.089 (0.096)      Data 0.000 (0.001)      Loss 9.0079 (8.8531)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.016)
Epoch: [0][1600/2402]   Time 0.089 (0.096)      Data 0.000 (0.000)      Loss 9.0939 (8.8666)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.015)
Epoch: [0][1700/2402]   Time 0.088 (0.095)      Data 0.000 (0.000)      Loss 9.2418 (8.8800)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.014)
Epoch: [0][1800/2402]   Time 0.108 (0.095)      Data 0.000 (0.000)      Loss 9.1652 (8.8923)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.013)
Epoch: [0][1900/2402]   Time 0.089 (0.095)      Data 0.000 (0.000)      Loss 9.1364 (8.9050)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.012)
Epoch: [0][2000/2402]   Time 0.091 (0.095)      Data 0.000 (0.000)      Loss 9.1717 (8.9169)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.012)
Epoch: [0][2100/2402]   Time 0.092 (0.095)      Data 0.000 (0.000)      Loss 9.0747 (8.9283)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.011)
Epoch: [0][2200/2402]   Time 0.093 (0.095)      Data 0.000 (0.000)      Loss 9.2118 (8.9398)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.011)
Epoch: [0][2300/2402]   Time 0.093 (0.094)      Data 0.000 (0.000)      Loss 9.2229 (8.9508)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.010)
Epoch: [0][2400/2402]   Time 0.095 (0.094)      Data 0.000 (0.000)      Loss 9.2780 (8.9618)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.010)
lr: 0.01
Epoch: [1][0/2402]      Time 0.493 (0.493)      Data 0.447 (0.447)      Loss 8.8280 (8.8280)    Prec@1 0.000 (0.000)    Prec@5 0.000 (0.000)
Epoch: [1][100/2402]    Time 0.092 (0.095)      Data 0.000 (0.005)      Loss nan (nan)  Prec@1 0.000 (0.000)    Prec@5 0.000 (2.498)
Epoch: [1][200/2402]    Time 0.092 (0.094)      Data 0.000 (0.002)      Loss nan (nan)  Prec@1 0.000 (0.000)    Prec@5 0.000 (1.255)
Epoch: [1][300/2402]    Time 0.094 (0.094)      Data 0.000 (0.002)      Loss nan (nan)  Prec@1 0.000 (0.000)    Prec@5 0.000 (0.838)

...

I tried a few times but always hit this problem. Do you have any ideas on how to solve it?

Thanks!

Issue loading the model

Hi!

Could you please assist me with the following issue.

I am trying to load the LightCNN_9Layers checkpoint model for inference using the following code (following the code from extract_features.py):

from LightCNN import light_cnn
model = nn.DataParallel(light_cnn.LightCNN_9Layers(num_classes=79077)).cuda()
model.load_state_dict(torch.load('LightCNN_9Layers_checkpoint.pth.tar')['state_dict'])

I get the following error

	While copying the parameter named "module.features.0.filter.weight", whose dimensions in the model are torch.Size([96, 1, 3, 3]) and whose dimensions in the checkpoint are torch.Size([96, 1, 5, 5]).

What am I doing wrong?
My guess is that kernel size is wrong in the model definition. Could it be the case?

I use torch 0.4

Thanks in advance.

Garbled output problem

Hello. When I read the .feat file of extracted features, what comes out is a mass of garbled characters:

L�A��6�sYcAЋ���W�A�i�A�����Ni�9��A�wFA��;�P�

Could this be caused by a Python 2 encoding issue?
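
A .feat file opened as text looks garbled if it stores raw binary numbers rather than characters. A minimal sketch for reading it back, under the unverified assumption that the file holds nothing but little-endian float32 values (check how extract_features.py writes the file before relying on this); `read_feat` is a hypothetical helper.

```python
import struct

def read_feat(path):
    """Read a file of raw little-endian float32 values into a list of floats."""
    with open(path, "rb") as f:  # binary mode, not text mode
        data = f.read()
    if len(data) % 4 != 0:
        raise ValueError("file size is not a multiple of 4 bytes")
    count = len(data) // 4
    return list(struct.unpack("<%df" % count, data))
```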

how to choose learning rate

Hi @AlfredXiangWu !
Sorry to bother you again Orz
I want to train on the 1M_Cele dataset. Can I just use the default scale and step of 0.457305 and 5, respectively, and change the number of epochs to 250? I am not sure whether it will converge.
Or should I change the parameters?

Thanks a lot!
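
The scale and step values in the question suggest a step-decay schedule, in which the learning rate is multiplied by scale once every step epochs. A minimal sketch under that assumption; `step_decay_lr` is a hypothetical name and the exact schedule in train.py may differ.

```python
def step_decay_lr(base_lr, epoch, scale=0.457305, step=5):
    """Learning rate at a given epoch: base_lr * scale ** (epoch // step)."""
    return base_lr * (scale ** (epoch // step))
```

Under this schedule the rate shrinks by a factor of roughly 2.2 every 5 epochs, so over 250 epochs it would be scaled 50 times and would be effectively zero for most of the run.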
