facenet_trt's Introduction

facenet_trt

NVIDIA TensorRT implementation for facenet with a pre-trained SavedModel. facenet is a face recognition project built on TensorFlow; see https://github.com/davidsandberg/facenet.

Changes

  1. facenet.py: Enable the facenet pre-trained SavedModel with TRT
  2. face.py: Add a probability threshold for returned results, change the minimum face size to 50 px, change gpu_memory_fraction to 0.3
  3. /align/detect_face.py: Enable TensorRT for PNET only; keep the RNET and ONET graphs unchanged because of a batch size warning
  4. face.py and facenet.py: Minor changes to support multi-threading
  5. face.py: Change input:0 to batch_join:0 to support both TensorRT 4 and TensorRT 5
  6. face.py: Add TRT INT8 calibration processing when INT8ENABLE=True
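
As a rough illustration of what enabling TRT for a facenet graph involves, the sketch below wraps the TF-TRT call from `tensorflow.contrib.tensorrt`, which is where the integration lives in TensorFlow 1.12. It is not the code from facenet.py. The helper names, the default workspace size, and the output node name `embeddings` are assumptions made for this example.

```python
# Sketch only: TF-TRT conversion of a frozen graph, assuming TensorFlow 1.12.
# Helper names and default values are illustrative, not taken from facenet.py.

VALID_PRECISIONS = ("FP32", "FP16", "INT8")


def build_trt_kwargs(precision="FP16", max_batch_size=1,
                     workspace_bytes=1 << 30):
    """Collect keyword arguments for trt.create_inference_graph."""
    precision = precision.upper()
    if precision not in VALID_PRECISIONS:
        raise ValueError("precision must be one of %s" % (VALID_PRECISIONS,))
    return {
        "max_batch_size": max_batch_size,
        "max_workspace_size_bytes": workspace_bytes,
        "precision_mode": precision,
    }


def convert_frozen_graph(graph_def, output_names=("embeddings",), **options):
    """Return a GraphDef in which supported subgraphs become TensorRT ops.

    "embeddings" is assumed to be the output node name of the facenet model.
    """
    # Imported lazily so build_trt_kwargs works without TensorFlow installed.
    import tensorflow.contrib.tensorrt as trt

    return trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=list(output_names),
        **build_trt_kwargs(**options)
    )
```

With a loaded `graph_def`, a call such as `convert_frozen_graph(graph_def, precision="FP16")` would return the converted graph for import into a session. INT8 additionally needs a calibration pass before the graph can be used for inference, which this sketch does not cover.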

TensorRT introduction

"NVIDIA announced the integration of our TensorRT inference optimization tool with TensorFlow. TensorRT integration will be available for use in the TensorFlow 1.7 branch. TensorFlow remains the most popular deep learning framework today while NVIDIA TensorRT speeds up deep learning inference through optimizations and high-performance runtimes for GPU-based platforms. We wish to give TensorFlow users the highest inference performance possible along with a near transparent workflow using TensorRT. The new integration provides a simple API which applies powerful FP16 and INT8 optimizations using TensorRT from within TensorFlow. TensorRT sped up TensorFlow inference by 8x for low latency runs of the ResNet-50 benchmark." - from NVIDIA website.

At the time of writing, the latest TensorRT version is 5.0.4.

See the links below for details:

https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/

https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html

Support matrix documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/index.html

TensorRT installation guide: https://developer.download.nvidia.com/compute/machine-learning/tensorrt/docs/5.0/GA_5.0.2.6/TensorRT-Installation-Guide.pdf

Usage

  1. Get GPU CUDA/cuDNN, tensorflow-gpu and TensorRT ready
  2. Set up facenet from https://github.com/davidsandberg/facenet
  3. Download facenet.py, face.py (optional) and /align/detect_face.py from this repository and replace the original files.

Setup

| HW | Ubuntu | Driver | CUDA | cuDNN | TensorRT | TensorFlow |
| --- | --- | --- | --- | --- | --- | --- |
| Tesla V100 graphic and intel x86_64 | 16.04 | 384.111 | 9.0.179 | 7.3.1 | 4.0.1.6 | 1.12 gpu |
| Quadro V100 graphic and intel x86_64 | 18.04 | 410.93 | 10.0.117 | 7.3.1 | 5.0.3 | 1.12 gpu |
| Jetson Xavier with internal GV10B GPU | 18.04 | L4T 4.1.1 | 10.0.117 | 7.3.1 | 5.0.3 | 1.12 gpu |

Result

*Note: These results compare runtime improvement only for the face identification network (Inception-ResNet-v1 SavedModel). "Xavier" means Jetson Xavier with L4T 4.1.1.

TensorRT 4 result

Face detection with MTCNN: tested 30 times with different images at different resolutions

| Detect Network | Avg Time |
| --- | --- |
| original network ckpt | 41.948318 ms |
| tensorrt network FP32 | 41.948318 ms |
| tensorrt network FP16 | 42.028268 ms |

*Note: The MTCNN network does not appear to be converted to a TensorRT network automatically; this needs more investigation, and a plugin may be tried later. Because of a batch mismatch warning, TRT conversion is currently enabled for PNET only.

Face identification with Inception-ResNet-v1: tested 27 times with different images (cropped and aligned to 160x160)

| Identify Network | Avg Time |
| --- | --- |
| original network ckpt | 13.713258 ms |
| tensorrt network FP32 | 11.296281 ms |
| tensorrt network FP16 | 10.54711 ms |
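
To make the size of the improvement explicit, the averages reported above can be turned into relative savings against the checkpoint baseline. This is a small arithmetic sketch, not part of the project. The function name `relative_improvement` is made up for this example, and the formula (time saved divided by baseline time) is one common definition that may differ from the one used elsewhere in this README.

```python
def relative_improvement(baseline_ms, optimized_ms):
    """Fraction of the baseline runtime that was saved."""
    return (baseline_ms - optimized_ms) / baseline_ms


# Average times reported for the identification network, in milliseconds.
baseline = 13.713258
results = {"FP32": 11.296281, "FP16": 10.54711}

for name, avg_ms in results.items():
    saved = 100.0 * relative_improvement(baseline, avg_ms)
    print("%s: %.1f%% less time than the checkpoint baseline" % (name, saved))
```

Under this definition the FP32 run saves about 17.6% and the FP16 run about 23.1% of the baseline time.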

*Note: INT8 is not implemented on TRT4 because calibration fails with "nvinfer1::DimsCHW nvinfer1::getCHW(const nvinfer1::Dims): Assertion `d.nbDims >= 3' failed". The failure is caused by TRT4 and does not occur with TRT5, but other issues remain; see "Issues" for details.

*Note: These results are based on the SavedModel file; a frozen graph built from checkpoints gives similar results.

TensorRT 5 result

Similar to TRT4, but the runtime improvement with the SavedModel is about 11.89% on GV100.

TensorRT 5 on Xavier result

Similar to TRT4, but the runtime improvement with the SavedModel is about 23.15% on Xavier: tested 20 times with the same image (cropped and aligned to 160x160, excluding the first, long initialization run)

| Identify Network | Avg Time |
| --- | --- |
| original network ckpt | 45.034961 ms |
| tensorrt network savedmodel FP16 | 37.567716 ms |
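
The averages above exclude the first, slow initialization run. A minimal timing helper in that spirit might look like the following. It is a sketch and not the script used to produce these numbers. The name `average_runtime_ms` and its parameters are hypothetical, and it assumes the warm-up call is made in addition to the timed runs.

```python
import time


def average_runtime_ms(fn, runs=20, skip_first=True):
    """Call fn() `runs` times and return the mean wall-clock time in ms.

    With skip_first=True one extra call is made first and left out of the
    mean, so one-off initialization cost does not skew the average.
    """
    if skip_first:
        fn()  # warm-up call, not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)
```

In practice `fn` would wrap a single inference call on a fixed 160x160 input, so that the original and TensorRT graphs are measured the same way.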

facenet_trt's People

Contributors

jerryjiagit

facenet_trt's Issues

Unresolved reference 'frozen_graph'

Can I use TensorRT to speed up facenet inference by directly replacing the facenet project's files with yours? I just started learning and I'm not sure whether that is right.
Also, there is an error at line 401 of facenet.py: Unresolved reference 'frozen_graph'.
The same error occurs at line 410.
How can I solve this problem?
Thanks in advance.

No runtime reducing with TensorRT5/CUDA10/cuDNN7.3 on Jetson Xavier

L4T 4.1.1
TensorFlow 1.12.0
TensorRT 5.0.3

TensorRT conversion can be enabled with the same code, but there is no runtime reduction with TensorRT5/CUDA10/cuDNN7.3 on Jetson Xavier.

For the Inception-ResNet-v1 SavedModel conversion:
Original: 0.0428 s
TensorRT FP16: 0.04263 s

Will try TRT5 on x86 and check with the NVIDIA TRT team later.

Dimension issue

I'm trying to load classifier.py and I get the following error:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 497, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'gradients/InceptionResnetV1/Bottleneck/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512]

ValueError: Node 'gradients/InceptionResnetV1/Bottleneck/BatchNorm/cond/FusedBatchNorm_1_grad/FusedBatchNormGrad' has an _output_shapes attribute inconsistent with the GraphDef for output #3: Dimension 0 in both shapes must be equal, but are 0 and 512. Shapes are [0] and [512].

Please help.

ready for inference model

Hi! I'm trying to build my own facial recognition system with facenet as the feature extractor. When I try to run it on the Jetson TX2, I get very large time delays. I tried to use your project to build TensorRT models, but I failed.
Could you please provide a TensorRT FP16 saved_model?

Extremely slow. Is facenet's Inception Resnet v1 supported with TensorRT?

@JerryJiaGit
Hello, I tried replacing face.py, facenet.py and detect_face.py as you advised, but when I run facenet's predict.py with the 2 frozen models from https://github.com/davidsandberg/facenet/wiki#pre-trained-models, it runs extremely slowly (it shows a bunch of output and hangs), so I had to stop it. I ran it on a Jetson Nano. Sorry for not providing the output, my bad.

I have searched and suspect that the Inception Resnet v1 network architecture has some layers and ops that are not supported by TensorRT. I'm not sure how to handle this at the moment; please give me some advice. Thank you a lot!!

run

Hi, thanks in advance.

How do I run the program? Can you provide the steps to run it?
