
ultra-light-fast-generic-face-detector-1mb's Introduction

English | 中文简体

Ultra-Light-Fast-Generic-Face-Detector-1MB

Ultra-lightweight face detection model

This model is a lightweight face detection model designed for edge computing devices.

Tested working environments

  • Ubuntu 16.04, Ubuntu 18.04, Windows 10 (inference)
  • Python 3.6
  • PyTorch 1.2
  • CUDA 10.0 + cuDNN 7.6

Accuracy, speed, model size comparison

The training set is a VOC-format dataset generated from the WIDER FACE dataset together with the cleaned WIDER FACE labels provided by RetinaFace (PS: the following test results were obtained by me, and may differ slightly from numbers published elsewhere).

WIDER FACE test

  • Test accuracy on the WIDER FACE val set (single-scale input resolution: 320*240, or scaled so the maximum side length is 320)
  Model                             | Easy Set | Medium Set | Hard Set
  libfacedetection v1 (caffe)       | 0.65     | 0.5        | 0.233
  libfacedetection v2 (caffe)       | 0.714    | 0.585      | 0.306
  Retinaface-Mobilenet-0.25 (Mxnet) | 0.745    | 0.553      | 0.232
  version-slim                      | 0.77     | 0.671      | 0.395
  version-RFB                       | 0.787    | 0.698      | 0.438
  • Test accuracy on the WIDER FACE val set (single-scale input resolution: VGA 640*480, or scaled so the maximum side length is 640)
  Model                             | Easy Set | Medium Set | Hard Set
  libfacedetection v1 (caffe)       | 0.741    | 0.683      | 0.421
  libfacedetection v2 (caffe)       | 0.773    | 0.718      | 0.485
  Retinaface-Mobilenet-0.25 (Mxnet) | 0.879    | 0.807      | 0.481
  version-slim                      | 0.853    | 0.819      | 0.539
  version-RFB                       | 0.855    | 0.822      | 0.579
  • This part mainly evaluates the test set at small and medium input resolutions.
  • RetinaFace-mnet (Retinaface-Mobilenet-0.25) comes from the excellent insightface project. When testing this network, the original image is scaled so that its maximum side length is 320 or 640, so faces are not deformed; the other networks use a fixed-size resize (a short sketch of both conventions follows below). For reference, RetinaFace-mnet's best single-scale val result, at a maximum side length of 1600, is 0.887 (Easy) / 0.87 (Medium) / 0.791 (Hard).
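To make the two resizing conventions concrete, here is a minimal sketch (hypothetical helper, not code from this repository):

  import cv2

  def resize_by_max_side(image, max_side=320):
      # Scale so the longer side equals max_side; the aspect ratio
      # (and therefore face shape) is preserved.
      h, w = image.shape[:2]
      scale = max_side / max(h, w)
      return cv2.resize(image, (round(w * scale), round(h * scale)))

  # A fixed-size resize, as used by the other networks in the tables,
  # can distort faces whenever the source aspect ratio is not 4:3:
  #   resized = cv2.resize(image, (320, 240))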

Terminal device inference speed

  • Raspberry Pi 4B MNN inference latency (unit: ms) (ARM / A72x4 / 1.5 GHz / input resolution: 320x240 / int8 quantization)

  Model                                       | 1 core | 2 cores | 3 cores | 4 cores
  libfacedetection v1                         | 28     | 16      | 12      | 9.7
  Official Retinaface-Mobilenet-0.25 (Mxnet)  | 46     | 25      | 18.5    | 15
  version-slim                                | 29     | 16      | 12      | 9.5
  version-RFB                                 | 35     | 19.6    | 14.8    | 11

  Model    | Inference latency (ms)
  slim-320 | 6.33
  RFB-320  | 7.8

  Model    | Inference latency (ms)
  slim-320 | 65.6
  RFB-320  | 164.8

Model size comparison

  • Comparison of several open source lightweight face detection models:
  Model                                       | Model file size (MB)
  libfacedetection v1 (caffe)                 | 2.58
  libfacedetection v2 (caffe)                 | 3.34
  Official Retinaface-Mobilenet-0.25 (Mxnet)  | 1.68
  version-slim                                | 1.04
  version-RFB                                 | 1.11

Generating the VOC-format training dataset and the training process

  1. Download the WIDER FACE dataset from the official website, or download the training set I provide, and extract it into the ./data folder:

(1) The cleaned WIDER FACE data package with 10px*10px small faces filtered out: Baidu Drive (extraction code: cbiu) / Google Drive

(2) The complete WIDER FACE data package without small faces filtered out: Baidu Drive (extraction code: ievk) / Google Drive

  2. (PS: you can skip this step if you downloaded the filtered package in (1) above.) WIDER FACE contains many small, unclear faces, which hinders the convergence of efficient models, so they need to be filtered out before training. By default, faces smaller than 10 pixels by 10 pixels are filtered out (a minimal sketch of this filtering rule appears after the directory tree below). Run ./data/wider_face_2_voc_add_landmark.py:
 python3 ./data/wider_face_2_voc_add_landmark.py

After the program finishes, the wider_face_add_lm_10_10 folder will be generated in the ./data directory. Its contents are identical to the data package in (1) after decompression. The complete directory structure is as follows:

  data/
    retinaface_labels/
      test/
      train/
      val/
    wider_face/
      WIDER_test/
      WIDER_train/
      WIDER_val/
    wider_face_add_lm_10_10/
      Annotations/
      ImageSets/
      JPEGImages/
    wider_face_2_voc_add_landmark.py
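As an aside, the 10px*10px filtering rule from step 2 can be sketched as follows (hypothetical standalone code; the real logic lives in ./data/wider_face_2_voc_add_landmark.py):

  def keep_face(box_w, box_h, min_size=10):
      # Keep a WIDER FACE box only if both sides are at least min_size px.
      return box_w >= min_size and box_h >= min_size

  boxes = [(8, 9), (12, 15), (10, 10)]
  print([b for b in boxes if keep_face(*b)])  # -> [(12, 15), (10, 10)]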
  3. At this point, the VOC training set is ready. There are two scripts in the project root: train-version-slim.sh and train-version-RFB.sh; the former trains the slim version of the model and the latter trains the RFB version. The default parameters are already set; if you need to change any, see the description of each training parameter in ./train.py.

  4. Run train-version-slim.sh or train-version-RFB.sh:

 sh train-version-slim.sh    # or: sh train-version-RFB.sh

Detection examples (input resolution: 640x480)


PS

  • If the actual production scenario involves medium distances, large faces, and a small number of faces, it is recommended to train at input_size 320 (320x240) and to use 320x240, 160x120, or 128x96 inputs for inference, for example with the provided pretrained model version-slim-320.pth or version-RFB-320.pth.
  • If the actual production scenario involves medium-to-long distances, small-to-medium faces, and a large number of faces, it is recommended to adopt:

(1) Optimal: train at input_size 640 (640x480) and use the same or a larger input size for inference, for example with the provided pretrained model version-slim-640.pth or version-RFB-640.pth; this lowers false positives.

(2) Sub-optimal: train at input_size 320 (320x240) and use 480x360 or 640x480 inputs for inference; this is more sensitive to small faces, but false positives will increase.

  • Getting the best results for a given scene requires tuning the input resolution to strike a balance between speed and accuracy.
  • An overly large input resolution improves recall on small faces, but it also raises the false-positive rate on large, close-range faces, and inference latency grows sharply.
  • An overly small input resolution speeds up inference considerably, but it greatly reduces recall on small faces.
  • The input resolution in production should match the training input resolution as closely as possible, without drifting too far in either direction. A minimal inference sketch follows below.
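A minimal ONNX inference sketch illustrating the resolution choices above, assuming onnxruntime and OpenCV are installed. The RGB conversion and the (x - 127) / 128 normalization follow the repository's default preprocessing, and the model/image file names are placeholders:

  import cv2
  import numpy as np
  import onnxruntime as ort

  session = ort.InferenceSession("version-RFB-320.onnx")  # placeholder path
  img = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
  x = cv2.resize(img, (320, 240))                 # match the training resolution
  x = (x.astype(np.float32) - 127.0) / 128.0      # normalize to roughly [-1, 1]
  x = np.transpose(x, (2, 0, 1))[np.newaxis]      # HWC -> NCHW
  outputs = session.run(None, {session.get_inputs()[0].name: x})
  # Expected outputs: per-prior class scores and box coordinates; these still
  # need confidence filtering and NMS, as in the repo's predictor classes.
  print([o.shape for o in outputs])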

TODO LIST

  • Add some test data


ultra-light-fast-generic-face-detector-1mb's People

Contributors

alonegiveup, cclauss, daquexian, deftruth, ingjieye, jackweiwang, jason9075, linzaer, ninjarz, sunnycase, themisir, zimoqingfeng


ultra-light-fast-generic-face-detector-1mb's Issues

How to get facial landmark output?

Hello, thank you so much for the awesome work. How do I get landmark points from this model?
I'm using face recognition and I need facial landmarks.

Difference between the *.onnx models and the *_ncnn.onnx / *_ncnn_slim.onnx models

Thank you very much for your work; it has helped me a lot. I noticed that the ONNX models you provide for the same input size seem to differ slightly: the .onnx and _ncnn.onnx models cannot be converted directly into ncnn models; only _ncnn_slim.onnx converts successfully and runs inference correctly. Could you explain how the _ncnn_slim.onnx model is produced?

Using the model on mobile

If I want to use this on iOS or Android, should the program be deployed on a server as a backend that mobile devices call?

Converting PyTorch models to ONNX

Hi, I'd like to ask: why does the box-generation code not need to be rewritten after converting the PyTorch model to ONNX, while ncnn requires it?

Issue with RandomSampleCrop_v2


In this function, the condition if h / w != 1 always holds (h and w are random floats that are never cast to int, so h / w == 1 is almost never satisfied), so continue executes on every iteration until the sampled mode is None (i.e., no crop is performed) and the function returns. As a result, this function never actually performs a crop.
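A quick standalone check of this claim (illustrative sketch, not repo code):

  # With continuous random floats h and w, h / w == 1 essentially never
  # holds, so the "if h / w != 1: continue" branch skips every sample.
  import random

  hits = sum(
      1 for _ in range(100000)
      if random.uniform(0.3, 1.0) / random.uniform(0.3, 1.0) == 1
  )
  print(hits)  # almost certainly 0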

About model quantization

Excellent work!
For the MNN-quantized model you list the forward-pass latency; how is the detection accuracy after quantization? Does it drop much? I also noticed the backbone resembles MobileNetV1; why not use a residual structure like V2 or V3?

Android

How should this be deployed and used on Android?

How do I run this on Windows 10?

After setting up the environment, import torch works in PyCharm but still fails in Git Bash, and the .sh scripts require Git Bash to run... Does this only work on Linux? 😭

Question about input size and RetinaFace

I'd like to check whether my understanding is correct:
1. For the WIDER FACE results, do the input sizes 320*240 and 640*480 mentioned here refer only to training, with the original image fed in directly at inference time?
2. Was the MobileNetV1-0.25 version of RetinaFace trained at 640*480 and then run on the original images to obtain the 0.879 | 0.807 | 0.481 results?

WIDER FACE test or val?

Thanks for your work! May I ask whether the tests were run on the WIDER FACE val set (whose annotation files are officially released)? The labels for the WIDER FACE test set have not been released; evaluating on it normally requires submitting txt files to the organizers.

Test results

Hi, for the results on the validation set, are you computing mAP, or precision at a score_threshold of 0.7?

About ncnn2int8 quantization

Hi, I used the quantization tool provided by ncnn to quantize the ncnn.bin / ncnn.param models you provided, but running inference with the quantized model through ncnn gives wrong results. Could you provide the quantized model? Thanks!

Minor issue feedback

Nice work 👏 I happen to be working on a face detection project and currently use RetinaFace. I saw this code yesterday and tested it on real data today. Overall the results are quite good: upright faces are essentially all detected. There are two main issues: 1. When the subject's head is tilted upward but the face is fully visible, recall is fairly low, around 0.5. 2. The test images and videos are fairly small, and the small faces in them are not detected. RetinaFace detects both of these cases on my test set. Still, it really is much faster than RetinaFace 🎉🎉🎉 Not really problems, just an experiment report 😄

Modifying anchors

Is the code below the place where the anchors are modified?
  import torch

  def generate_priors(feature_map_list, shrinkage_list, image_size, min_boxes, clamp=True) -> torch.Tensor:
      priors = []
      for index in range(0, len(feature_map_list[0])):
          scale_w = image_size[0] / shrinkage_list[0][index]
          scale_h = image_size[1] / shrinkage_list[1][index]
          for j in range(0, feature_map_list[1][index]):
              for i in range(0, feature_map_list[0][index]):
                  x_center = (i + 0.5) / scale_w
                  y_center = (j + 0.5) / scale_h
                  for min_box in min_boxes[index]:
                      w = min_box / image_size[0]
                      h = min_box / image_size[1]
                      priors.append([
                          x_center,
                          y_center,
                          w,
                          h
                      ])
      print("priors nums:{}".format(len(priors)))
      priors = torch.tensor(priors)
      if clamp:
          torch.clamp(priors, 0.0, 1.0, out=priors)
      return priors
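For context, here is a hypothetical call with a 320x240 configuration; the feature-map sizes, strides, and min_boxes are assumed from the repo's fd_config for input_size 320, and the printed count matches the "priors nums:4420" lines quoted in other issues here:

  feature_map_list = [[40, 20, 10, 5], [30, 15, 8, 4]]   # widths, heights
  shrinkage_list = [[8, 16, 32, 64], [8, 16, 32, 64]]    # strides per level
  min_boxes = [[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]]
  priors = generate_priors(feature_map_list, shrinkage_list, [320, 240], min_boxes)
  # prints "priors nums:4420": 40*30*3 + 20*15*2 + 10*8*2 + 5*4*3 = 4420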

Anchor settings

Hi, looking at the code, the anchor settings don't include any 1:2 or 2:1 h:w aspect ratios. Is that because face sizes generally have an h:w ratio of about 1:1?

Raspberry Pi 4b OS choice

Hi,
thanks for the great work. I'd like to ask which OS you use on the Raspberry Pi 4B, because a 32-bit OS like Raspbian runs into cross-compiling issues when setting up the MNN inference tool. Or did you choose a different cross-compiling toolchain that works on a 32-bit OS?

Thanks again for your answer!

ONNX to MNN conversion fails

Start to Convert Other Model Format To MNN Model...
libc++abi.dylib: terminating with uncaught exception of type Error: [19:05:03] /Users/wuzhuo/MNN/tools/converter/source/onnx/onnxConverter.cpp:27: Check failed: success ==> read onnx model failed: /Users/wuzhuo/Downloads/Mb_Tiny_RFB_FD_train_input_320.onnx
[1] 94897 abort ./MNNConvert -f ONNX --modelFile --MNNModel test.mnn --bizCode MNN

Netron also reports an error when opening the ONNX model:
File format is not onnx.ModelProto (invalid wire type 4 at offset 36) in 'Mb

failed to convert *.pth to ncnn model

Hi, I tried to convert the pretrained *.pth models: following the instructions I modified the source code and converted to ONNX, but I cannot convert that ONNX model into a new ONNX model with onnx simplifier. I'd like to confirm whether the conversion pipeline in your project is as follows:
1. Modify the source code as described and run convert_to_onnx to convert the .pth model to an ONNX model.
2. Use the onnx simplifier tool to convert the ONNX model into an onnx_sim model.
3. Build ncnn and use the conversion tool under tools/onnx in ncnn to convert the onnx_sim model into the corresponding ncnn model.

got errors when loading the weights!!! Help!!!

Good job!

But I hit an error when running detect_imgs.py on CPU. @Linzaer, can you please help address the issue?

priors nums:4420
Traceback (most recent call last):
File "run_video_face_detect.py", line 44, in
net.load(model_path)
File "D:\yyf_workspace\Ultra-Light-Fast-Generic-Face-Detector-1MB-master\vision\ssd\ssd.py", line 138, in load
self.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage))
File "C:\Users\fkt00044\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\nn\modules\module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
Unexpected key(s) in state_dict: "base_net.0.1.num_batches_tracked", "base_net.1.1.num_batches_tracked", "base_net.1.4.num_batches_tracked", "base_net.2.1.num_batches_tracked", "base_net.2.4.num_batches_tracked", "base_net.3.1.num_batches_tracked", "base_net.3.4.num_batches_tracked", "base_net.4.1.num_batches_tracked", "base_net.4.4.num_batches_tracked", "base_net.5.1.num_batches_tracked", "base_net.5.4.num_batches_tracked", "base_net.6.1.num_batches_tracked", "base_net.6.4.num_batches_tracked", "base_net.7.branch0.0.bn.num_batches_tracked", "base_net.7.branch0.1.bn.num_batches_tracked", "base_net.7.branch0.2.bn.num_batches_tracked", "base_net.7.branch1.0.bn.num_batches_tracked", "base_net.7.branch1.1.bn.num_batches_tracked", "base_net.7.branch1.2.bn.num_batches_tracked", "base_net.7.branch2.0.bn.num_batches_tracked", "base_net.7.branch2.1.bn.num_batches_tracked", "base_net.7.branch2.2.bn.num_batches_tracked", "base_net.7.branch2.3.bn.num_batches_tracked", "base_net.7.ConvLinear.bn.num_batches_tracked", "base_net.7.shortcut.bn.num_batches_tracked", "base_net.8.1.num_batches_tracked", "base_net.8.4.num_batches_tracked", "base_net.9.1.num_batches_tracked", "base_net.9.4.num_batches_tracked", "base_net.10.1.num_batches_tracked", "base_net.10.4.num_batches_tracked", "base_net.11.1.num_batches_tracked", "base_net.11.4.num_batches_tracked", "base_net.12.1.num_batches_tracked", "base_net.12.4.num_batches_tracked".
[ WARN:0] global C:\projects\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (674) SourceReaderCB::~SourceReaderCB terminating async callback

Inference time on PC

Hi, a question about timing on a PC: Windows 7, CPU E5-1603 2.80 GHz, net_type mb_tiny_fd, input_size 320, image size 320*240. Testing 14 images gives an average time of 203 ms. Is that expected? It feels a bit slow on CPU.

Different input preprocessing required by the .pth and ONNX models

Hi Linzaer,

Thank you very much for open-sourcing this code. I trained a network with it; the real-world results are very good, and I successfully converted it to ONNX format.

However, I found that the PyTorch model's input values are still in the 0-255 range, while the ONNX model's input must be normalized to between -1 and 1. I checked the PyTorch model's weight parameters and they are identical to the ONNX model's. Could you explain why this is?

Also, for a work task I had to convert the model to a Keras .h5 model, and I found some problems:
Structurally, the second layer of the whole model is Batch Normalization, which needs four parameters: gamma/weight, beta/bias, running_mean, and running_var; the first layer is an ordinary convolutional layer.
Since the PyTorch model's input is an image in the 0-255 range, the first convolutional layer's output (feature map) also spans a wide range of values (verified). The mean and var statistics computed from such activations should be large: the largest mean values might reach 100+, and var values often 1000+. Yet however it is trained, the mean and var read from the PyTorch model are all very small numbers between 0 and 1. By the definition of PyTorch's BN layer, such values should not be able to batch-normalize large-valued inputs, but in practice PyTorch does normalize the large-valued feature maps with these small mean and var values, while a Keras BN layer built from the extracted weights cannot... (the first convolutional layer outputs of the two model formats have been verified to match).

So I would like to ask whether PyTorch's Batch Normalization includes some special internal operation (its source code looks identical to the documented formula), or whether I have misunderstood the image preprocessing steps?

PS: Steps verified so far (comparing PyTorch and Keras): 1. The network inputs at test time are identical (same values), and the outputs after the first convolutional layer are identical. 2. Extracting the PyTorch model's BN parameters and computing the output with the BN formula from the official PyTorch documentation gives a result matching the Keras model's BN output, but not the PyTorch BN output (although, judging by the results, the PyTorch BN output is clearly the correct one, mostly hovering between -1 and 1)... 3. The PyTorch model's final output is correct and finds face boxes normally.

Sorry, this question is a bit hard to express; if possible I would appreciate the chance to ask follow-ups. Many thanks.
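For reference, the inference-time Batch Normalization formula from the PyTorch documentation that this issue compares against can be checked in isolation (standalone sketch):

  # Verify that eval-mode BatchNorm matches the documented formula
  # y = (x - running_mean) / sqrt(running_var + eps) * gamma + beta.
  import torch

  bn = torch.nn.BatchNorm2d(3).eval()
  x = torch.randn(1, 3, 4, 4) * 50 + 100  # large-valued input, as in the issue
  with torch.no_grad():
      y = (x - bn.running_mean[None, :, None, None]) \
          / torch.sqrt(bn.running_var[None, :, None, None] + bn.eps) \
          * bn.weight[None, :, None, None] + bn.bias[None, :, None, None]
      assert torch.allclose(bn(x), y, atol=1e-5)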

Error when running on Windows 10 with Python 3.7

Traceback (most recent call last):
  File "H:/demo/Ultra-Light-Fast-Generic-Face-Detector-1MB/train.py", line 347, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "H:/demo/Ultra-Light-Fast-Generic-Face-Detector-1MB/train.py", line 139, in train
    for i, data in enumerate(loader):
  File "H:\worktool\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "H:\worktool\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "H:\worktool\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "H:\worktool\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "H:\worktool\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "H:\worktool\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "H:\worktool\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'

priors nums:4420
2019-10-15 10:36:24,269 - root - INFO - Use Cuda.
Traceback (most recent call last):
File "", line 1, in
File "H:\worktool\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "H:\worktool\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Environment:
OS: Windows 10
Python 3.7
PyTorch 1.3.0

How can inference time be optimized on Android?

I compiled for the Android platform using ncnn's C++ code. Averaging over 1000 runs, a single inference takes about 42 ms on a Xiaomi Mi 8 (Snapdragon 845). How can I optimize the time further?
