Comments (28)

hexerei avatar hexerei commented on May 19, 2024

Am I getting this right, that eventually this will need both a 100% Lua implementation AND a 100% Python implementation (since both could work with C++ wrappers)? Or is it desirable to figure out which single implementation would be best (less work, less installation, etc.)?

bamos avatar bamos commented on May 19, 2024

Yes, the latter. I created this issue and #3 to help me think about simplifying the semi-complex Lua and Python dependencies by only requiring users to set up a Python or a Lua environment. Would be great for users to just pip install openface or luarocks install openface. However, the more I think about it, the less feasible this seems.

I think the best solution for the near-future will be to improve how Python calls into Torch with #4.

Strateus avatar Strateus commented on May 19, 2024

Brandon,
when I tested your solution, it took roughly half a second to make a prediction on a forward pass with your demo network. I saw a time.sleep(0.5) as well, so I assume this can be reworked once the Torch model is embedded in Python. Can you please clarify what the fastest forward pass is under current conditions with the demo network you trained? Can it be lowered to, say, 50ms?
I'm also looking forward to helping with the Torch embedding; the problem is that I'm only familiar with Python and can help on that side. So if you see how I might help, please let me know.

bamos avatar bamos commented on May 19, 2024

Hi @Strateus - the only time.sleep is in the initialization of the wrapper around the Torch subprocess. This loads the ~1GB model and I added the sleep to make sure it was correctly loaded since there may be errors from Lua dependencies or elsewhere. This sleep does not occur every time an image is processed.

time.sleep(0.5)
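
For context, the general shape of such a wrapper (a Python class that starts a long-running Torch subprocess, waits for the model to load, and registers an exit handler to kill the child) might look like the sketch below. This is not the actual TorchWrap code; the Lua script name and the fixed startup delay are placeholders.

```python
import atexit
import subprocess
import time

class TorchSubprocess:
    """Minimal sketch of a wrapper around a long-running Torch/Lua process.

    Not the real OpenFace TorchWrap; 'openface_server.lua' is a placeholder.
    """

    def __init__(self, lua_script='openface_server.lua', startup_delay=0.5):
        # Start the Lua process once; it loads the large model at startup.
        self.p = subprocess.Popen(['th', lua_script],
                                  stdin=subprocess.PIPE,
                                  stdout=subprocess.PIPE)
        # Give the model time to load before the first request is sent.
        time.sleep(startup_delay)
        # Kill the child process when the Python process exits.
        atexit.register(self.close)

    def close(self):
        if self.p.poll() is None:  # child is still running
            self.p.terminate()
            self.p.wait()
```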

Strateus avatar Strateus commented on May 19, 2024

So what is the average minimum forward pass in your tests, then? I'm still building my infrastructure on Windows (with Torch) before I can make thorough tests. I'm developing under Windows for now, so I can't test it myself, but when I tested on Fedora it was around 0.5s.

bamos avatar bamos commented on May 19, 2024

Depending on the image size, detecting, aligning, and representing takes about 0.5s.

Strateus avatar Strateus commented on May 19, 2024

Is the Torch-Python bridge the bottleneck? If we embed the Torch model in Python, would that help lower it to ~50ms in order to process video in real time? I'm talking about 100x100 images, give or take.

bamos avatar bamos commented on May 19, 2024

I don't think Torch-Python is the bottleneck. I think performance improvements need algorithmic changes, like quantizing the neural net or reducing its size. Also, object tracking runs in a few hundred milliseconds and can be interleaved between frames when predicting on real-time video, so face recognition doesn't have to be run on every frame. See http://blog.dlib.net/2015/02/dlib-1813-released.html
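
As a rough illustration of that interleaving idea (not OpenFace code), dlib's correlation tracker can carry the face box across most frames while the slow detection/recognition step only runs every N frames. The frame interval, camera index, and single-face assumption below are arbitrary choices for the sketch:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
tracker = dlib.correlation_tracker()
cap = cv2.VideoCapture(0)

DETECT_EVERY = 10  # run the slow detector (and recognition) every N frames
frame_idx, tracking = 0, False

while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    if frame_idx % DETECT_EVERY == 0 or not tracking:
        faces = detector(rgb)
        if len(faces) > 0:
            # Re-seed the tracker on this frame; recognition would run here too.
            tracker.start_track(rgb, faces[0])
            tracking = True
    else:
        # Cheap tracker update on intermediate frames, no recognition.
        tracker.update(rgb)
        box = tracker.get_position()  # current face box as a dlib.drectangle

    frame_idx += 1

cap.release()
```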

Strateus avatar Strateus commented on May 19, 2024

Just tried this object tracking; it doesn't work so well on faces. Thanks anyway for your thoughts and clarifications.

Strateus avatar Strateus commented on May 19, 2024

My timing tests:

2015-10-28 18:38:46,577 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 18:38:46,579 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 2ms
2015-10-28 18:38:46,628 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 49ms
2015-10-28 18:38:49,333 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 2705ms
2015-10-28 18:38:49,866 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 18:38:49,868 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 2ms
2015-10-28 18:38:49,920 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 52ms
2015-10-28 18:38:52,762 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 2842ms
2015-10-28 18:38:53,293 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 18:38:53,295 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 2ms
2015-10-28 18:38:53,335 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 40ms
2015-10-28 18:38:56,171 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 2835ms
2015-10-28 18:38:56,701 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 18:38:56,707 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 6ms
2015-10-28 18:38:56,767 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 60ms
2015-10-28 18:38:59,481 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 2713ms
2015-10-28 18:38:59,529 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 18:38:59,531 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 2ms
2015-10-28 18:38:59,601 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 70ms
2015-10-28 18:39:02,400 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 2799ms

Roughly 2.5-3s for the network forward pass, on an 8-core AMD CPU under Fedora.
Maybe I compiled something wrong?

bamos avatar bamos commented on May 19, 2024

Interesting that the network is taking so long. Try running ./util/profile-network.lua to profile just the network. On my MBP's CPU, this gives:

openface(master)$ ./util/profile-network.lua
Single image forward pass: 75.00 ms +/- 12.89 ms
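
A comparable mean/std timing loop can also be run from the Python side. In the sketch below, forward stands in for whatever callable actually runs the network on an aligned image, and the warm-up and run counts are arbitrary:

```python
import time
import numpy as np

def profile_forward(forward, img, n_warmup=3, n_runs=20):
    """Time a forward-pass callable and report mean +/- std in milliseconds."""
    for _ in range(n_warmup):          # untimed warm-up passes
        forward(img)
    times_ms = []
    for _ in range(n_runs):
        t0 = time.time()
        forward(img)
        times_ms.append((time.time() - t0) * 1000.0)
    times_ms = np.array(times_ms)
    print("Single image forward pass: %.2f ms +/- %.2f ms"
          % (times_ms.mean(), times_ms.std()))
```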

bamos avatar bamos commented on May 19, 2024

And this number is consistent with the timing in compare.py:

openface(master)$ ./demos/compare.py images/examples/lennon-{1,2}.jpg --verbose
Argument parsing and loading libraries took 1.06411409378 seconds.
Loading the dlib and OpenFace models took 1.72060704231 seconds.
Processing images/examples/lennon-1.jpg.
  + Original size: (1050, 1400, 3)
  + Face detection took 1.44350385666 seconds.
  + Face alignment took 1.42118906975 seconds.
  + OpenFace forward pass took 0.0809760093689 seconds.
Representation:
[ 0.21  0.09  0.07  0.02  0.15 -0.   -0.05  0.05  0.02 -0.    0.02  0.05
 -0.01 -0.03  0.03  0.02  0.   -0.1  -0.19 -0.04 -0.05  0.01 -0.04  0.1
  0.07  0.    0.14  0.1  -0.07  0.04 -0.07 -0.05  0.03 -0.19 -0.23  0.03
 -0.    0.07 -0.09 -0.1   0.05 -0.2   0.12 -0.13  0.08 -0.05 -0.18  0.02
 -0.09  0.05 -0.04 -0.    0.02  0.03 -0.01 -0.07 -0.03 -0.08  0.05 -0.09
  0.12  0.04 -0.12  0.02 -0.   -0.14  0.1  -0.02  0.1   0.03 -0.13  0.01
  0.06 -0.07 -0.06  0.06  0.07 -0.05  0.03  0.   -0.06 -0.17 -0.11 -0.01
  0.09  0.18  0.05 -0.08  0.09  0.19  0.02 -0.04  0.16 -0.07 -0.02  0.02
 -0.08  0.06  0.    0.04  0.14  0.15  0.12  0.07 -0.02 -0.01  0.03 -0.03
 -0.11  0.02  0.05  0.15 -0.18  0.06  0.03  0.07 -0.13 -0.04  0.03  0.04
  0.01 -0.21 -0.05  0.07  0.01 -0.12 -0.04  0.01]
-----

Processing images/examples/lennon-2.jpg.
  + Original size: (1200, 1200, 3)
  + Face detection took 1.39798402786 seconds.
  + Face alignment took 1.39780211449 seconds.
  + OpenFace forward pass took 0.0754799842834 seconds.
Representation:
[ 0.14  0.06  0.07  0.02  0.09  0.03  0.01  0.01 -0.   -0.05  0.06  0.08
 -0.04 -0.07  0.08 -0.02  0.01 -0.03 -0.18 -0.02 -0.01 -0.02 -0.02  0.12
  0.08 -0.03  0.18  0.06  0.01 -0.    0.02 -0.03  0.06 -0.18 -0.17  0.14
  0.03  0.14 -0.06 -0.18  0.01 -0.18  0.1  -0.2   0.06 -0.03 -0.13  0.
 -0.04  0.16  0.05 -0.05  0.05  0.1  -0.04 -0.02 -0.1  -0.06  0.01 -0.07
  0.12  0.02 -0.11  0.03 -0.04 -0.17  0.07 -0.05  0.14 -0.01 -0.12  0.04
  0.06  0.01 -0.12 -0.01  0.09 -0.05 -0.07 -0.1  -0.09 -0.13 -0.1  -0.04
  0.04  0.19 -0.03 -0.06  0.05  0.16  0.06 -0.09  0.11 -0.02  0.08  0.01
 -0.03  0.01 -0.04 -0.02  0.12  0.17  0.06  0.12  0.03 -0.09 -0.    0.01
 -0.11  0.01  0.08  0.11 -0.16  0.12  0.04  0.05 -0.05 -0.02  0.05  0.06
  0.04 -0.21  0.03  0.07  0.07 -0.17 -0.01  0.06]
-----

Comparing images/examples/lennon-1.jpg with images/examples/lennon-2.jpg.
  + Squared l2 distance between representations: 0.276853047155
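
For reference, the squared L2 distance reported by compare.py is just the squared Euclidean distance between the two 128-dimensional representations; a minimal NumPy check with stand-in vectors:

```python
import numpy as np

rep1 = np.random.randn(128)  # stand-ins for the two printed representations
rep2 = np.random.randn(128)

d = rep1 - rep2
print("Squared l2 distance between representations:", np.dot(d, d))
```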

Strateus avatar Strateus commented on May 19, 2024

I used killall python, and this doesn't seem to kill the Lua processes; the system was full of them. After clearing them out, here is what the tests showed:

Single image forward pass: 496.52 ms +/- 21.57 ms

Processing images/examples/lennon-1.jpg.
  + Original size: (1050, 1400, 3)
  + Face detection took 1.44743204117 seconds.
  + Face alignment took 1.51719808578 seconds.
  + OpenFace forward pass took 0.579107046127 seconds.
Processing images/examples/lennon-2.jpg.
  + Original size: (1200, 1200, 3)
  + Face detection took 1.40661597252 seconds.
  + Face alignment took 1.43040680885 seconds.
  + OpenFace forward pass took 0.530766010284 seconds.

My tests:

2015-10-28 20:50:32,876 DEBUG Thread-2   Openface Extractor: Decoded image of shape (150, 150, 3)
2015-10-28 20:50:32,877 DEBUG Thread-2   Openface Extractor: Bounding box: [(0, 0) (150, 150)], timing: 1ms
2015-10-28 20:50:32,911 DEBUG Thread-2   Openface Extractor: Aligned image of shape (96, 96, 3), timing: 34ms
2015-10-28 20:50:33,428 DEBUG Thread-2   Openface Extractor: Net vector length: 128, timing: 517ms

During profiling only about 55% of the CPU was used, which looks like 4 out of 8 cores. Are there any parameters I can check within Lua on this topic?

bamos avatar bamos commented on May 19, 2024

You shouldn't have the stray Lua processes. The exit handler in TorchWrap should be killing them off, which it always does on my MBP, Linux machines, and Docker instances:

root@ae33cef9ed0f:/openface# ./demos/compare.py ./images/examples/clapton-{1,2}.jpg
Comparing ./images/examples/clapton-1.jpg with ./images/examples/clapton-2.jpg.
  + Squared l2 distance between representations: 0.258562088431
root@ae33cef9ed0f:/openface# ps aux | grep -i 'lua'
root        48  0.0  0.0   8868   800 ?        S+   16:57   0:00 grep --color=auto -i lua

What git commit of the code do you have checked out?

Regardless, #4 will fix this by running a separate Lua server and communicating over localhost with TCP.
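
The Python side of such a client could stay very small. The sketch below assumes a hypothetical newline-delimited protocol (send an image path, receive a JSON-encoded list of 128 floats) and an arbitrary port; it is not what #4 actually implements:

```python
import json
import socket

def get_representation(img_path, host='localhost', port=8000):
    """Ask a hypothetical Lua/Torch server for an image's representation.

    The newline-delimited request/response format and the port number are
    assumptions for illustration only.
    """
    sock = socket.create_connection((host, port))
    try:
        sock.sendall((img_path + '\n').encode('utf-8'))
        reply = sock.makefile('r').readline()
    finally:
        sock.close()
    return json.loads(reply)  # expected: a list of 128 floats
```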

Strateus avatar Strateus commented on May 19, 2024

TorchWrap in my code uses the atexit module. I also added some cleanup on kill signals, just as a backup, so it's no big deal.
Any thoughts on speeding up the Lua side, or is this pretty much it for me?

bamos avatar bamos commented on May 19, 2024

I'm not sure how to speed up Torch network executions on the CPU. Let me know if you find out anything useful. Depending on your setup, a cheap GPU may also help; it will need around 2GB of memory.

Strateus avatar Strateus commented on May 19, 2024

Tried on a VPS with 24 cores (2300 MHz each):
Single image forward pass: 461.33 ms +/- 124.23 ms

Really strange.

bamos avatar bamos commented on May 19, 2024

Closing; we're focusing on using a Lua server in a separate process in #4 that communicates with the Python code.

hbredin avatar hbredin commented on May 19, 2024

Would https://github.com/facebook/fb-caffe-exts#torch2caffe help?

Strateus avatar Strateus commented on May 19, 2024

That's awesome, thank you!

AKSHAYUBHAT avatar commented on May 19, 2024

I have a project where I am using the VGG Face detection network ported from Torch to TensorFlow. Since this issue was discussed before TensorFlow was available, it might be worth reopening it, or opening another issue. I think porting/loading the Inception network into TensorFlow should be straightforward.

https://github.com/akshayubhat/tensorface

Strateus avatar Strateus commented on May 19, 2024

AKSHAYUBHAT, why would you want to use TensorFlow instead of Caffe, Torch, Theano, Chainer, or Cuda-convnet?

AKSHAYUBHAT avatar commented on May 19, 2024

The Caffe author is a contributor to TensorFlow. It has become tedious to keep track of pull requests and each different version; e.g., there is Apollo Caffe, Caffe-Future (for semantic segmentation), and so on. At this point in time there is no single "Caffe": whether the master branch will work or not depends upon the network. There is also already Caffe 2 development underway.

Torch is great (especially with iTorch), but I find it difficult to context-switch from one language to another, and it additionally requires developing packages for all secondary tasks (serialization, servers, XML, GIS, etc.), which complicates the effort. Other things such as package management are also very difficult; e.g., there is no luarocks update command. When I tried to get OpenFace to work on an AMI, I had to keep looking for errors and update each package (nn, torch, image) until it finally worked.

Theano has delays associated with compilation, where a mistake in shape/dimension requires waiting again. Regarding the rest (Chainer, Cuda-convnet), there isn't enough momentum.

TensorFlow allows very simple integration with the rest of the Python ecosystem, which is especially helpful when you are integrating neural networks into tasks such as training on CT scans (the DICOM library for Python) or GIS, etc. There is huge momentum behind TF, with well-tested ports on Android.

Strateus avatar Strateus commented on May 19, 2024

Thank you, that really makes sense. I suffered similar problems with Caffe and was wondering why it was so hard to get up and running.
Did you consider MXNet as well, btw? I just found out about it and it seems very promising (in momentum as well), since it is made by the people behind XGBoost (the most used library on Kaggle).

from openface.

bamos avatar bamos commented on May 19, 2024

Hi all,

@AKSHAYUBHAT - can you tell a noticeable difference between Torch's and TensorFlow's performance? soumith/convnet-benchmarks#66 shows that TensorFlow's performance isn't very good, but I expect it'll greatly improve over time.

@hbredin - thanks for the torch2caffe reference!
Using it might be a little messy because I don't think Caffe's master branch supports all of the features that I'm using with Torch, and looking at the supported layers for torch2caffe in torch_layers.lua shows that some modifications will need to be made:

  • Inception layers will probably be straightforward.
  • From Caffe's docs, I don't think there's an equivalent for SpatialLPPooling.
  • L2 normalization can possibly be excluded from the Caffe network and just done in Python (see the sketch below), since it's the last layer and the gradient isn't needed if Caffe is only used for inference.
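
If the normalization layer is dropped from the converted network, it could be applied to Caffe's output in Python afterwards; a minimal sketch (inference only, no gradient):

```python
import numpy as np

def l2_normalize(rep, eps=1e-10):
    """Scale a representation to unit L2 norm outside the network."""
    return rep / (np.linalg.norm(rep) + eps)

unit_rep = l2_normalize(np.random.randn(128))  # stand-in for Caffe's output
assert abs(np.linalg.norm(unit_rep) - 1.0) < 1e-6
```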

Re-opening with a help-wanted tag in case anybody wants to help with this. It would help simplify OpenFace deployments and the Python code.

-Brandon.

Strateus avatar Strateus commented on May 19, 2024

Just in case:

  1. The Chainer Caffe loading module, i.e. you can load a VGG model and fine-tune the last layer for embeddings: http://docs.chainer.org/en/stable/reference/caffe.html
  2. The MXNet Caffe converter + model importing tool: apache/mxnet#628

Both frameworks are pure Python. I made a triplet loss for Chainer (with great help from Alfredo, the author of the Lua TripletLoss) and am still testing it. I'm also not sure whether I got the l2_norm layer right (there is no l2_norm with backprop in Chainer). Once I've finished my tests and am sure that everything is working, I will contribute the results here.
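
As a side note, one way to sanity-check a triplet-loss implementation is against a plain NumPy version of the standard hinge formulation (the margin value below is arbitrary):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin), averaged over the batch."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

# Batch of 8 unit-norm 128-d embeddings as stand-ins for network outputs.
a, p, n = [x / np.linalg.norm(x, axis=1, keepdims=True)
           for x in np.random.randn(3, 8, 128)]
print(triplet_loss(a, p, n))
```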

Btw, does anyone have ideas about which small dataset/network I could test the triplet-loss implementation with? MNIST doesn't seem like a good idea, I think.

Strateus avatar Strateus commented on May 19, 2024

Pushed TripletLoss for Chainer to https://github.com/Strateus/TripletChain.
I will add more files soon to show usage and results.

stale avatar stale commented on May 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
