
ultra-light-fast-generic-face-detector-1mb's Introduction

English | 中文简体

Ultra-Light-Fast-Generic-Face-Detector-1MB

Ultra-lightweight face detection model

This model is a lightweight face detection model designed for edge computing devices.

Tested working environments

  • Ubuntu 16.04, Ubuntu 18.04, Windows 10 (inference)
  • Python 3.6
  • PyTorch 1.2
  • CUDA 10.0 + cuDNN 7.6

Accuracy, speed, model size comparison

The training set is a VOC-format dataset generated from the WIDER FACE dataset together with the cleaned WIDER FACE labels provided by RetinaFace (PS: the following test results were obtained by me, and may differ slightly from numbers published elsewhere).

WIDER FACE test

  • Test accuracy on the WIDER FACE val set (single-scale input resolution: 320*240, or scaled so the maximum side length is 320)
  Model                             | Easy Set | Medium Set | Hard Set
  libfacedetection v1 (caffe)       | 0.65     | 0.5        | 0.233
  libfacedetection v2 (caffe)       | 0.714    | 0.585      | 0.306
  Retinaface-Mobilenet-0.25 (Mxnet) | 0.745    | 0.553      | 0.232
  version-slim                      | 0.77     | 0.671      | 0.395
  version-RFB                       | 0.787    | 0.698      | 0.438
  • Test accuracy on the WIDER FACE val set (single-scale input resolution: VGA 640*480, or scaled so the maximum side length is 640)
  Model                             | Easy Set | Medium Set | Hard Set
  libfacedetection v1 (caffe)       | 0.741    | 0.683      | 0.421
  libfacedetection v2 (caffe)       | 0.773    | 0.718      | 0.485
  Retinaface-Mobilenet-0.25 (Mxnet) | 0.879    | 0.807      | 0.481
  version-slim                      | 0.853    | 0.819      | 0.539
  version-RFB                       | 0.855    | 0.822      | 0.579
  • This part mainly evaluates the test set at small and medium input resolutions.
  • RetinaFace-mnet (Retinaface-Mobilenet-0.25) comes from the excellent insightface project. When testing this network, the original image is scaled so that its maximum side length is 320 or 640, so faces are not deformed; the other networks use a fixed-size resize (a short sketch of both conventions follows below). For reference, RetinaFace-mnet's best single-scale val result, at a maximum side length of 1600, is 0.887 (Easy) / 0.87 (Medium) / 0.791 (Hard).
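To make the two resizing conventions concrete, here is a minimal sketch (hypothetical helper, not code from this repository):

  import cv2

  def resize_by_max_side(image, max_side=320):
      # Scale so the longer side equals max_side; the aspect ratio
      # (and therefore face shape) is preserved.
      h, w = image.shape[:2]
      scale = max_side / max(h, w)
      return cv2.resize(image, (round(w * scale), round(h * scale)))

  # A fixed-size resize, as used by the other networks in the tables,
  # can distort faces whenever the source aspect ratio is not 4:3:
  #   resized = cv2.resize(image, (320, 240))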

Terminal device inference speed

  • Raspberry Pi 4B MNN inference latency (unit: ms) (ARM / A72x4 / 1.5 GHz / input resolution: 320x240 / int8 quantization)

  Model                                       | 1 core | 2 cores | 3 cores | 4 cores
  libfacedetection v1                         | 28     | 16      | 12      | 9.7
  Official Retinaface-Mobilenet-0.25 (Mxnet)  | 46     | 25      | 18.5    | 15
  version-slim                                | 29     | 16      | 12      | 9.5
  version-RFB                                 | 35     | 19.6    | 14.8    | 11

  Model    | Inference latency (ms)
  slim-320 | 6.33
  RFB-320  | 7.8

  Model    | Inference latency (ms)
  slim-320 | 65.6
  RFB-320  | 164.8

Model size comparison

  • Comparison of several open source lightweight face detection models:
  Model                                       | Model file size (MB)
  libfacedetection v1 (caffe)                 | 2.58
  libfacedetection v2 (caffe)                 | 3.34
  Official Retinaface-Mobilenet-0.25 (Mxnet)  | 1.68
  version-slim                                | 1.04
  version-RFB                                 | 1.11

Generating the VOC-format training dataset and the training process

  1. Download the WIDER FACE dataset from the official website, or download the training set I provide, and extract it into the ./data folder:

(1) The cleaned WIDER FACE data package with 10px*10px small faces filtered out: Baidu Drive (extraction code: cbiu) / Google Drive

(2) The complete WIDER FACE data package without small faces filtered out: Baidu Drive (extraction code: ievk) / Google Drive

  2. (PS: you can skip this step if you downloaded the filtered package in (1) above.) WIDER FACE contains many small, unclear faces, which hinders the convergence of efficient models, so they need to be filtered out before training. By default, faces smaller than 10 pixels by 10 pixels are filtered out (a minimal sketch of this filtering rule appears after the directory tree below). Run ./data/wider_face_2_voc_add_landmark.py:
 python3 ./data/wider_face_2_voc_add_landmark.py

After the program finishes, the wider_face_add_lm_10_10 folder will be generated in the ./data directory. Its contents are identical to the data package in (1) after decompression. The complete directory structure is as follows:

  data/
    retinaface_labels/
      test/
      train/
      val/
    wider_face/
      WIDER_test/
      WIDER_train/
      WIDER_val/
    wider_face_add_lm_10_10/
      Annotations/
      ImageSets/
      JPEGImages/
    wider_face_2_voc_add_landmark.py
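As an aside, the 10px*10px filtering rule from step 2 can be sketched as follows (hypothetical standalone code; the real logic lives in ./data/wider_face_2_voc_add_landmark.py):

  def keep_face(box_w, box_h, min_size=10):
      # Keep a WIDER FACE box only if both sides are at least min_size px.
      return box_w >= min_size and box_h >= min_size

  boxes = [(8, 9), (12, 15), (10, 10)]
  print([b for b in boxes if keep_face(*b)])  # -> [(12, 15), (10, 10)]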
  3. At this point, the VOC training set is ready. There are two scripts in the project root: train-version-slim.sh and train-version-RFB.sh; the former trains the slim version of the model and the latter trains the RFB version. The default parameters are already set; if you need to change any, see the description of each training parameter in ./train.py.

  4. Run train-version-slim.sh or train-version-RFB.sh:

 sh train-version-slim.sh    # or: sh train-version-RFB.sh

Detection examples (input resolution: 640x480)


PS

  • If the actual production scenario involves medium distances, large faces, and a small number of faces, it is recommended to train at input_size 320 (320x240) and to use 320x240, 160x120, or 128x96 inputs for inference, for example with the provided pretrained model version-slim-320.pth or version-RFB-320.pth.
  • If the actual production scenario involves medium-to-long distances, small-to-medium faces, and a large number of faces, it is recommended to adopt:

(1) Optimal: train at input_size 640 (640x480) and use the same or a larger input size for inference, for example with the provided pretrained model version-slim-640.pth or version-RFB-640.pth; this lowers false positives.

(2) Sub-optimal: train at input_size 320 (320x240) and use 480x360 or 640x480 inputs for inference; this is more sensitive to small faces, but false positives will increase.

  • Getting the best results for a given scene requires tuning the input resolution to strike a balance between speed and accuracy.
  • An overly large input resolution improves recall on small faces, but it also raises the false-positive rate on large, close-range faces, and inference latency grows sharply.
  • An overly small input resolution speeds up inference considerably, but it greatly reduces recall on small faces.
  • The input resolution in production should match the training input resolution as closely as possible, without drifting too far in either direction. A minimal inference sketch follows below.
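A minimal ONNX inference sketch illustrating the resolution choices above, assuming onnxruntime and OpenCV are installed. The RGB conversion and the (x - 127) / 128 normalization follow the repository's default preprocessing, and the model/image file names are placeholders:

  import cv2
  import numpy as np
  import onnxruntime as ort

  session = ort.InferenceSession("version-RFB-320.onnx")  # placeholder path
  img = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
  x = cv2.resize(img, (320, 240))                 # match the training resolution
  x = (x.astype(np.float32) - 127.0) / 128.0      # normalize to roughly [-1, 1]
  x = np.transpose(x, (2, 0, 1))[np.newaxis]      # HWC -> NCHW
  outputs = session.run(None, {session.get_inputs()[0].name: x})
  # Expected outputs: per-prior class scores and box coordinates; these still
  # need confidence filtering and NMS, as in the repo's predictor classes.
  print([o.shape for o in outputs])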

TODO LIST

  • Add some test data


ultra-light-fast-generic-face-detector-1mb's People

Contributors

alonegiveup, cclauss, daquexian, deftruth, ingjieye, jackweiwang, jason9075, linzaer, ninjarz, sunnycase, themisir, zimoqingfeng


ultra-light-fast-generic-face-detector-1mb's Issues

How to get facial landmark output?

Hello, thank you so much for the awesome work. How do I get landmark points from this model?
I'm using face recognition and I need facial landmarks.

Difference between the *.onnx models and the *_ncnn.onnx / *_ncnn_slim.onnx models

Thank you very much for your work; it has helped me a lot. I noticed that the ONNX models you provide for the same input size seem to differ slightly: the .onnx and _ncnn.onnx models cannot be converted directly into ncnn models; only _ncnn_slim.onnx converts successfully and runs inference correctly. Could you explain how the _ncnn_slim.onnx model is produced?

Using the model on mobile

If I want to use this on iOS or Android, should the program be deployed on a server as a backend that mobile devices call?

Converting PyTorch models to ONNX

Hi, I'd like to ask: why does the box-generation code not need to be rewritten after converting the PyTorch model to ONNX, while ncnn requires it?

Issue with RandomSampleCrop_v2


In this function, the condition if h / w != 1 always holds (h and w are random floats that are never cast to int, so h / w == 1 is almost never satisfied), so continue executes on every iteration until the sampled mode is None (i.e., no crop is performed) and the function returns. As a result, this function never actually performs a crop.
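A quick standalone check of this claim (illustrative sketch, not repo code):

  # With continuous random floats h and w, h / w == 1 essentially never
  # holds, so the "if h / w != 1: continue" branch skips every sample.
  import random

  hits = sum(
      1 for _ in range(100000)
      if random.uniform(0.3, 1.0) / random.uniform(0.3, 1.0) == 1
  )
  print(hits)  # almost certainly 0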

About model quantization

Excellent work!
For the MNN-quantized model you list the forward-pass latency; how is the detection accuracy after quantization? Does it drop much? I also noticed the backbone resembles MobileNetV1; why not use a residual structure like V2 or V3?

Android

How should this be deployed and used on Android?

How do I run this on Windows 10?

After setting up the environment, import torch works in PyCharm but still fails in Git Bash, and the .sh scripts require Git Bash to run... Does this only work on Linux? 😭

Question about input size and RetinaFace

I'd like to check whether my understanding is correct:
1. For the WIDER FACE results, do the input sizes 320*240 and 640*480 mentioned here refer only to training, with the original image fed in directly at inference time?
2. Was the MobileNetV1-0.25 version of RetinaFace trained at 640*480 and then run on the original images to obtain the 0.879 | 0.807 | 0.481 results?

WIDER FACE test or val?

Thanks for your work! May I ask whether the tests were run on the WIDER FACE val set (whose annotation files are officially released)? The labels for the WIDER FACE test set have not been released; evaluating on it normally requires submitting txt files to the organizers.

Test results

Hi, for the results on the validation set, are you computing mAP, or precision at a score_threshold of 0.7?

About ncnn2int8 quantization

Hi, I used the quantization tool provided by ncnn to quantize the ncnn.bin / ncnn.param models you provided, but running inference with the quantized model through ncnn gives wrong results. Could you provide the quantized model? Thanks!

Minor issue feedback

Nice work 👏 I happen to be working on a face detection project and currently use RetinaFace. I saw this code yesterday and tested it on real data today. Overall the results are quite good: upright faces are essentially all detected. There are two main issues: 1. When the subject's head is tilted upward but the face is fully visible, recall is fairly low, around 0.5. 2. The test images and videos are fairly small, and the small faces in them are not detected. RetinaFace detects both of these cases on my test set. Still, it really is much faster than RetinaFace 🎉🎉🎉 Not really problems, just an experiment report 😄

Modifying anchors

Is the code below the place where the anchors are modified?
  import torch

  def generate_priors(feature_map_list, shrinkage_list, image_size, min_boxes, clamp=True) -> torch.Tensor:
      priors = []
      for index in range(0, len(feature_map_list[0])):
          scale_w = image_size[0] / shrinkage_list[0][index]
          scale_h = image_size[1] / shrinkage_list[1][index]
          for j in range(0, feature_map_list[1][index]):
              for i in range(0, feature_map_list[0][index]):
                  x_center = (i + 0.5) / scale_w
                  y_center = (j + 0.5) / scale_h
                  for min_box in min_boxes[index]:
                      w = min_box / image_size[0]
                      h = min_box / image_size[1]
                      priors.append([
                          x_center,
                          y_center,
                          w,
                          h
                      ])
      print("priors nums:{}".format(len(priors)))
      priors = torch.tensor(priors)
      if clamp:
          torch.clamp(priors, 0.0, 1.0, out=priors)
      return priors
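For context, here is a hypothetical call with a 320x240 configuration; the feature-map sizes, strides, and min_boxes are assumed from the repo's fd_config for input_size 320, and the printed count matches the "priors nums:4420" lines quoted in other issues here:

  feature_map_list = [[40, 20, 10, 5], [30, 15, 8, 4]]   # widths, heights
  shrinkage_list = [[8, 16, 32, 64], [8, 16, 32, 64]]    # strides per level
  min_boxes = [[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]]
  priors = generate_priors(feature_map_list, shrinkage_list, [320, 240], min_boxes)
  # prints "priors nums:4420": 40*30*3 + 20*15*2 + 10*8*2 + 5*4*3 = 4420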

Anchor settings

Hi, looking at the code, the anchor settings don't include any 1:2 or 2:1 h:w aspect ratios. Is that because face sizes generally have an h:w ratio of about 1:1?

Raspberry Pi 4b OS choice

Hi,
thanks for the great work. I'd like to ask which OS you use on the Raspberry Pi 4B, because a 32-bit OS like Raspbian runs into cross-compiling issues when setting up the MNN inference tool. Or did you choose a different cross-compiling toolchain that works on a 32-bit OS?

Thanks again for your answer!

ONNX to MNN conversion fails

Start to Convert Other Model Format To MNN Model...
libc++abi.dylib: terminating with uncaught exception of type Error: [19:05:03] /Users/wuzhuo/MNN/tools/converter/source/onnx/onnxConverter.cpp:27: Check failed: success ==> read onnx model failed: /Users/wuzhuo/Downloads/Mb_Tiny_RFB_FD_train_input_320.onnx
[1] 94897 abort ./MNNConvert -f ONNX --modelFile --MNNModel test.mnn --bizCode MNN

Netron also reports an error when opening the ONNX model:
File format is not onnx.ModelProto (invalid wire type 4 at offset 36) in 'Mb

failed to convert *.pth to ncnn model

Hi, I tried to convert the pretrained *.pth models: following the instructions I modified the source code and converted to ONNX, but I cannot convert that ONNX model into a new ONNX model with onnx simplifier. I'd like to confirm whether the conversion pipeline in your project is as follows:
1. Modify the source code as described and run convert_to_onnx to convert the .pth model to an ONNX model.
2. Use the onnx simplifier tool to convert the ONNX model into an onnx_sim model.
3. Build ncnn and use the conversion tool under tools/onnx in ncnn to convert the onnx_sim model into the corresponding ncnn model.

got errors when loading the weights!!! Help!!!

Good job!

But I hit an error when running detect_imgs.py on CPU. @Linzaer, can you please help address the issue?

priors nums:4420
Traceback (most recent call last):
File "run_video_face_detect.py", line 44, in
net.load(model_path)
File "D:\yyf_workspace\Ultra-Light-Fast-Generic-Face-Detector-1MB-master\vision\ssd\ssd.py", line 138, in load
self.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage))
File "C:\Users\fkt00044\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\modules\module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
Unexpected key(s) in state_dict: "base_net.0.1.num_batches_tracked", "base_net.1.1.num_batches_tracked", "base_net.1.4.num_batches_tracked", "base_net.2.1.num_batches_tracked", "base_net.2.4.num_batches_tracked", "base_net.3.1.num_batches_tracked", "base_net.3.4.num_batches_tracked", "base_net.4.1.num_batches_tracked", "base_net.4.4.num_batches_tracked", "base_net.5.1.num_batches_tracked", "base_net.5.4.num_batches_tracked", "base_net.6.1.num_batches_tracked", "base_net.6.4.num_batches_tracked", "base_net.7.branch0.0.bn.num_batches_tracked", "base_net.7.branch0.1.bn.num_batches_tracked", "base_net.7.branch0.2.bn.num_batches_tracked", "base_net.7.branch1.0.bn.num_batches_tracked", "base_net.7.branch1.1.bn.num_batches_tracked", "base_net.7.branch1.2.bn.num_batches_tracked", "base_net.7.branch2.0.bn.num_batches_tracked", "base_net.7.branch2.1.bn.num_batches_tracked", "base_net.7.branch2.2.bn.num_batches_tracked", "base_net.7.branch2.3.bn.num_batches_tracked", "base_net.7.ConvLinear.bn.num_batches_tracked", "base_net.7.shortcut.bn.num_batches_tracked", "base_net.8.1.num_batches_tracked", "base_net.8.4.num_batches_tracked", "base_net.9.1.num_batches_tracked", "base_net.9.4.num_batches_tracked", "base_net.10.1.num_batches_tracked", "base_net.10.4.num_batches_tracked", "base_net.11.1.num_batches_tracked", "base_net.11.4.num_batches_tracked", "base_net.12.1.num_batches_tracked", "base_net.12.4.num_batches_tracked".
[ WARN:0] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (674) SourceReaderCB::~SourceReaderCB terminating async callback

Inference time on PC

Hi, a question about timing on a PC: Windows 7, CPU E5-1603 2.80 GHz, net_type mb_tiny_fd, input_size 320, image size 320*240. Testing 14 images gives an average time of 203 ms. Is that expected? It feels a bit slow on CPU.

Different input preprocessing required by the .pth and ONNX models

Hi Linzaer,

Thank you very much for open-sourcing this code. I trained a network with it; the real-world results are very good, and I successfully converted it to ONNX format.

However, I found that the PyTorch model's input values are still in the 0-255 range, while the ONNX model's input must be normalized to between -1 and 1. I checked the PyTorch model's weight parameters and they are identical to the ONNX model's. Could you explain why this is?

Also, for a work task I had to convert the model to a Keras .h5 model, and I found some problems:
Structurally, the second layer of the whole model is Batch Normalization, which needs four parameters: gamma/weight, beta/bias, running_mean, and running_var; the first layer is an ordinary convolutional layer.
Since the PyTorch model's input is an image in the 0-255 range, the first convolutional layer's output (feature map) also spans a wide range of values (verified). The mean and var statistics computed from such activations should be large: the largest mean values might reach 100+, and var values often 1000+. Yet however it is trained, the mean and var read from the PyTorch model are all very small numbers between 0 and 1. By the definition of PyTorch's BN layer, such values should not be able to batch-normalize large-valued inputs, but in practice PyTorch does normalize the large-valued feature maps with these small mean and var values, while a Keras BN layer built from the extracted weights cannot... (the first convolutional layer outputs of the two model formats have been verified to match).

So I would like to ask whether PyTorch's Batch Normalization includes some special internal operation (its source code looks identical to the documented formula), or whether I have misunderstood the image preprocessing steps?

PS: Steps verified so far (comparing PyTorch and Keras): 1. The network inputs at test time are identical (same values), and the outputs after the first convolutional layer are identical. 2. Extracting the PyTorch model's BN parameters and computing the output with the BN formula from the official PyTorch documentation gives a result matching the Keras model's BN output, but not the PyTorch BN output (although, judging by the results, the PyTorch BN output is clearly the correct one, mostly hovering between -1 and 1)... 3. The PyTorch model's final output is correct and finds face boxes normally.

Sorry, this question is a bit hard to express; if possible I would appreciate the chance to ask follow-ups. Many thanks.
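For reference, the inference-time Batch Normalization formula from the PyTorch documentation that this issue compares against can be checked in isolation (standalone sketch):

  # Verify that eval-mode BatchNorm matches the documented formula
  # y = (x - running_mean) / sqrt(running_var + eps) * gamma + beta.
  import torch

  bn = torch.nn.BatchNorm2d(3).eval()
  x = torch.randn(1, 3, 4, 4) * 50 + 100  # large-valued input, as in the issue
  with torch.no_grad():
      y = (x - bn.running_mean[None, :, None, None]) \
          / torch.sqrt(bn.running_var[None, :, None, None] + bn.eps) \
          * bn.weight[None, :, None, None] + bn.bias[None, :, None, None]
      assert torch.allclose(bn(x), y, atol=1e-5)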

Error when running on Windows 10 with Python 3.7

Traceback (most recent call last):
  File "H:/demo/Ultra-Light-Fast-Generic-Face-Detector-1MB/train.py", line 347, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "H:/demo/Ultra-Light-Fast-Generic-Face-Detector-1MB/train.py", line 139, in train
    for i, data in enumerate(loader):
  File "H:\worktool\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "H:\worktool\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "H:\worktool\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "H:\worktool\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "H:\worktool\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "H:\worktool\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "H:\worktool\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'

priors nums:4420
2019-10-15 10:36:24,269 - root - INFO - Use Cuda.
Traceback (most recent call last):
File "", line 1, in
File "H:\worktool\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "H:\worktool\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Environment:
OS: Windows 10
Python 3.7
PyTorch 1.3.0

How can inference time be optimized on Android?

I compiled for the Android platform using ncnn's C++ code. Averaging over 1000 runs, a single inference takes about 42 ms on a Xiaomi Mi 8 (Snapdragon 845). How can I optimize the time further?
