
few_shot_gaze's Introduction

Updates

Release 09/22/2020

  • Incorporated (a) distributed data parallel training and (b) the fusedSGD optimizer, resulting in 2x faster training.

FAZE: Few-Shot Adaptive Gaze Estimation

This repository contains the code for training and evaluating our ICCV 2019 work, which was presented as an oral. FAZE is a framework for few-shot adaptation of gaze estimation networks, consisting of equivariance learning (via the DT-ED, or Disentangling Transforming Encoder-Decoder, architecture) and meta-learning with gaze-direction embeddings as input.

The FAZE Framework

Links

Training and Evaluation

1. Datasets

Pre-process the GazeCapture and MPIIGaze datasets using the codebase at https://github.com/swook/faze_preprocess, which is also available as a git submodule at the relative path preprocess/.

If you have already cloned this few_shot_gaze repository without pulling the submodules, please run:

git submodule update --init --recursive

Once the dataset preprocessing has been performed, we can move on to the next steps.
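
As a quick sanity check, the preprocessed HDF5 files can be inspected with h5py. This is only a sketch: the file path below is illustrative, since the exact output location and names depend on how faze_preprocess is configured on your machine.

import h5py

# Illustrative path; substitute whichever .h5 file faze_preprocess produced for you.
with h5py.File('preprocess/outputs/GazeCapture.h5', 'r') as f:
    groups = list(f.keys())
    print(len(groups), 'top-level groups, e.g.', groups[:3])
    first = f[groups[0]]
    for name, obj in first.items():
        # Datasets report a shape; nested groups are just labelled as such.
        shape = obj.shape if isinstance(obj, h5py.Dataset) else '(group)'
        print(' ', name, shape)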

2. Prerequisites

This codebase should run on most standard Linux systems. We specifically used Ubuntu for development and testing.

Please install the following prerequisites manually (as well as their dependencies) by following the instructions found below:

The remaining Python package dependencies can be installed by running:

pip3 install --user --upgrade -r requirements.txt

3. Pre-trained weights for the DT-ED architecture and MAML models

You can obtain a copy of the pre-trained weights for the Disentangling Transforming Encoder-Decoder and for the various MAML models from the following location:

cd src/
wget -N https://files.ait.ethz.ch/projects/faze/outputs_of_full_train_test_and_plot.zip
unzip -o outputs_of_full_train_test_and_plot.zip
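
To confirm that the download and extraction worked, one hedged sanity check is to locate checkpoint files in the extracted folder and load one on the CPU. The glob pattern below is an assumption about the archive layout; adjust it to what you actually find after unzipping.

import glob
import torch

# Assumed layout: checkpoints stored as .pth.tar somewhere under the extracted folder.
candidates = sorted(glob.glob('outputs_of_full_train_test_and_plot/**/*.pth.tar', recursive=True))
print(len(candidates), 'checkpoint files found, e.g.', candidates[:3])

if candidates:
    ckpt = torch.load(candidates[0], map_location='cpu')
    # A checkpoint may be a raw state_dict or a dict that wraps one; just list a few keys.
    print(list(ckpt.keys())[:5])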

4. Training, Meta-Learning, and Final Evaluation

Run the all-in-one example bash script with:

cd src/
bash full_train_test_and_plot.bash

The bash script should be self-explanatory and can be edited to replicate the final FAZE model evaluation procedure, provided the hardware requirements are satisfied (8x Tesla V100 GPUs, each with 32 GB of memory).

The pre-trained DT-ED weights should be loaded automatically by the script 1_train_dt_ed.py. Please note that this model can take a long time to train from scratch, so we recommend adjusting batch sizes and using multiple GPUs (the code is multi-GPU-ready), as sketched below.
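
For reference, the generic PyTorch multi-GPU pattern looks like the following sketch; the model class here is only a stand-in, and the actual integration point inside 1_train_dt_ed.py may differ.

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in for the DT-ED model, purely to illustrate the wrapping."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(128, 64)

    def forward(self, x):
        return self.encoder(x)

model = TinyNet()
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # DataParallel splits each batch across the visible GPUs. Note that checkpoints
    # saved from a wrapped model gain a "module." prefix on every state_dict key.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()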

The meta-learning step is also very time-consuming, particularly because it must be run for every value of k (the number of calibration samples). The code for this step is 2_meta_learning.py, and we recommend running it in parallel across values of k, as shown in full_train_test_and_plot.bash and sketched below.
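
A rough Python sketch of launching the per-k runs in parallel is shown below. The --k flag and the list of k values are placeholders; check 2_meta_learning.py and full_train_test_and_plot.bash for the actual command-line interface.

import subprocess

# Placeholder values of k and a placeholder flag name; the real interface is defined
# by 2_meta_learning.py and the loop in full_train_test_and_plot.bash.
ks = [1, 2, 4, 8, 16]
procs = [subprocess.Popen(['python3', '2_meta_learning.py', '--k', str(k)]) for k in ks]
for p in procs:
    p.wait()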

5. Outputs

When the full pipeline successfully runs, you will find some outputs in the path src/outputs_of_full_train_test_and_plot, in particular:

  • walks/: mp4 videos of latent space walks in gaze direction and head orientation
  • Zg_OLR1e-03_IN5_ILR1e-05_Net64/: outputs of the meta-learning step.
  • Zg_OLR1e-03_IN5_ILR1e-05_Net64 MAML MPIIGaze.pdf: plotted results of the few-shot learning evaluations on MPIIGaze.
  • Zg_OLR1e-03_IN5_ILR1e-05_Net64 MAML GazeCapture (test).pdf: plotted results of the few-shot learning evaluations on the GazeCapture test set.

Realtime Demo

We also provide a realtime demo that runs with live input from a webcam in the demo/ folder. Please check the separate demo instructions for details on how to set it up and run it.

Bibtex

Please cite our paper when referencing or using our code.

@inproceedings{Park2019ICCV,
  author    = {Seonwook Park and Shalini De Mello and Pavlo Molchanov and Umar Iqbal and Otmar Hilliges and Jan Kautz},
  title     = {Few-Shot Adaptive Gaze Estimation},
  year      = {2019},
  booktitle = {International Conference on Computer Vision (ICCV)},
  location  = {Seoul, Korea}
}

Acknowledgements

Seonwook Park carried out this work during his internship at NVIDIA. This work was supported in part by the ERC Grant OPTINT (StG-2016-717054).

few_shot_gaze's People

Contributors

molchanovp, shalinidemello, swook


few_shot_gaze's Issues

Mirror Calibration

The link to the mirror calibration method in the demo instructions is broken. What's the easiest way to calibrate the webcam / display extrinsics?

Thanks

"Current thread is not the object's thread" and "Segmentation fault (core dumped)" error

Hi, I'm trying to run run_demo.py, but the warning below (maybe an error) appears repeatedly:

QObject::moveToThread: Current thread (0x55af5ad9b920) is not the object's thread (0x55af5cb91700).
Cannot move to target thread (0x55af5ad9b920)

and a Segmentation fault (core dumped) error occurs when I press any key.
Those are all the errors I got in the console.
How can I fix this problem?
Thank you.

How did you get the camera parameters of GazeCapture?

As far as I know, GazeCapture doesn't provide camera parameters (intrinsic parameters and distortion parameters), but GazeCapture_supplementary.h5 does contain them. Did you look them up online via the "DeviceName" in the info.json provided with the original GazeCapture? If not, how did you obtain them?

Scaling error in `landmarks.py`

I am trying to run the demo. Camera calibration and fine-tuning worked, but the gaze prediction on the screen does not work because the detected face landmarks are scaled in a weird way.

I only ran landmarks.py and can confirm that the error is also present there; see the example below. Did anybody else experience these issues?

Thank you for your help.

(screenshots of the incorrectly scaled landmarks omitted)

How do you get the gaze direction for the GazeCapture dataset?

As far as I know, the GazeCapture dataset doesn't include a gaze direction for each sample, so I am curious how you acquire it. Can you share the specific approach used to compute the gaze direction of GazeCapture samples?

Thanks for your great work, and I'm looking forward to your reply.

Still confused about how to get the ground-truth gaze direction of the GazeCapture dataset

Firstly, thanks for your reply. I'm sorry, you might have misread my earlier question because of my unclear phrasing. I have already read the pre-processing code and the paper references, but I am still confused about how you obtain the ground-truth gaze direction for GazeCapture, since the dataset doesn't come with gaze direction as original ground-truth information. More specifically, in my understanding the GazeCapture dataset provides images and the (x, y) position of the gaze dot on phones and other portable devices, but does not provide the 3D gaze direction. So how do you compute the 3D gaze direction used as ground truth to train the model? If I have misunderstood this, please let me know.
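
For context, a generic way such a 2D-dot-to-3D-direction conversion is done in gaze-estimation preprocessing (this is only a sketch under assumed conventions, not necessarily the exact procedure in faze_preprocess): convert the on-screen gaze dot into 3D camera coordinates using the device's screen-to-camera geometry, estimate a 3D gaze origin (e.g., an eye or face centre recovered from facial landmarks and camera intrinsics), and take the normalized difference as the gaze direction.

import numpy as np

def gaze_direction(target_cam_mm, origin_cam_mm):
    # Unit 3D gaze direction from a gaze origin (eye/face centre) to the gaze target,
    # both given in millimetres in the camera coordinate system. Sign/axis conventions
    # here are assumptions and may differ from the actual preprocessing code.
    g = np.asarray(target_cam_mm, dtype=float) - np.asarray(origin_cam_mm, dtype=float)
    return g / np.linalg.norm(g)

# Toy example: gaze dot 120 mm to the right of and 60 mm below the camera, on the
# screen plane (z = 0); eye centre roughly 300 mm in front of the camera.
print(gaze_direction([120.0, 60.0, 0.0], [0.0, 0.0, 300.0]))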

IR synthetic data

Hello,
Regarding the NVGaze dataset: how do you generate the IR synthetic data?
I am using UnityEyes, but it generates RGB synthetic data.
I would appreciate it if you could guide me.
Thanks

Missing key(s) in state_dict: "encoder.initial.conv1.weight"

I'm trying to run the demo on Ubuntu (no GPU), but I'm getting this:

(venv) rchaves@rchaves-VirtualBox:~/Desktop/few_shot_gaze/demo$ python run_demo.py 
/bin/sh: 1: v4l2-ctl: not found
/bin/sh: 1: v4l2-ctl: not found
/bin/sh: 1: v4l2-ctl: not found
> Loading: demo_weights/weights_ted.pth.tar
> Loading: demo_weights/weights_maml/09.pth.tar
Traceback (most recent call last):
  File "run_demo.py", line 98, in <module>
    gaze_network.load_state_dict(ted_weights)
  File "/home/rchaves/Desktop/few_shot_gaze/demo/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DTED:
	Missing key(s) in state_dict: "encoder.initial.conv1.weight", "encoder.initial.conv2.weight", "encoder.block1.compo1.conv.weight", "encoder.block1.compo2.conv.weight", "encoder.block1.compo3.conv.weight", "encoder.block1.compo4.conv.weight", "encoder.trans1.composite.conv.weight", "encoder.block2.compo1.conv.weight", "encoder.block2.compo2.conv.weight", "encoder.block2.compo3.conv.weight", "encoder.block2.compo4.conv.weight", "encoder.trans2.composite.conv.weight", "encoder.block3.compo1.conv.weight", "encoder.block3.compo2.conv.weight", "encoder.block3.compo3.conv.weight", "encoder.block3.compo4.conv.weight", "encoder.trans3.composite.conv.weight", "encoder.block4.compo1.conv.weight", "encoder.block4.compo2.conv.weight", "encoder.block4.compo3.conv.weight", "encoder.block4.compo4.conv.weight", "decoder.block1.compo1.conv.weight", "decoder.block1.compo2.conv.weight", "decoder.block1.compo3.conv.weight", "decoder.block1.compo4.conv.weight", "decoder.trans1.conv.weight", "decoder.block2.compo1.conv.weight", "decoder.block2.compo2.conv.weight", "decoder.block2.compo3.conv.weight", "decoder.block2.compo4.conv.weight", "decoder.trans2.conv.weight", "decoder.block3.compo1.conv.weight", "decoder.block3.compo2.conv.weight", "decoder.block3.compo3.conv.weight", "decoder.block3.compo4.conv.weight", "decoder.trans3.conv.weight", "decoder.block4.compo1.conv.weight", "decoder.block4.compo2.conv.weight", "decoder.block4.compo3.conv.weight", "decoder.block4.compo4.conv.weight", "decoder.last.conv1.weight", "decoder.last.conv2.weight", "decoder.last.conv3.weight", "fc_enc.weight", "fc_enc.bias", "fc_dec.weight", "fc_dec.bias". 
	Unexpected key(s) in state_dict: "module.encoder.initial.conv1.weight", "module.encoder.initial.conv2.weight", "module.encoder.block1.compo1.conv.weight", "module.encoder.block1.compo2.conv.weight", "module.encoder.block1.compo3.conv.weight", "module.encoder.block1.compo4.conv.weight", "module.encoder.trans1.composite.conv.weight", "module.encoder.block2.compo1.conv.weight", "module.encoder.block2.compo2.conv.weight", "module.encoder.block2.compo3.conv.weight", "module.encoder.block2.compo4.conv.weight", "module.encoder.trans2.composite.conv.weight", "module.encoder.block3.compo1.conv.weight", "module.encoder.block3.compo2.conv.weight", "module.encoder.block3.compo3.conv.weight", "module.encoder.block3.compo4.conv.weight", "module.encoder.trans3.composite.conv.weight", "module.encoder.block4.compo1.conv.weight", "module.encoder.block4.compo2.conv.weight", "module.encoder.block4.compo3.conv.weight", "module.encoder.block4.compo4.conv.weight", "module.decoder.block1.compo1.conv.weight", "module.decoder.block1.compo2.conv.weight", "module.decoder.block1.compo3.conv.weight", "module.decoder.block1.compo4.conv.weight", "module.decoder.trans1.conv.weight", "module.decoder.block2.compo1.conv.weight", "module.decoder.block2.compo2.conv.weight", "module.decoder.block2.compo3.conv.weight", "module.decoder.block2.compo4.conv.weight", "module.decoder.trans2.conv.weight", "module.decoder.block3.compo1.conv.weight", "module.decoder.block3.compo2.conv.weight", "module.decoder.block3.compo3.conv.weight", "module.decoder.block3.compo4.conv.weight", "module.decoder.trans3.conv.weight", "module.decoder.block4.compo1.conv.weight", "module.decoder.block4.compo2.conv.weight", "module.decoder.block4.compo3.conv.weight", "module.decoder.block4.compo4.conv.weight", "module.decoder.last.conv1.weight", "module.decoder.last.conv2.weight", "module.decoder.last.conv3.weight", "module.fc_enc.weight", "module.fc_enc.bias", "module.fc_dec.weight", "module.fc_dec.bias", "module.gaze1.weight", "module.gaze1.bias", "module.gaze2.weight", "module.gaze2.bias". 

What am I doing wrong? Maybe it's because I don't have CUDA?
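
The missing/unexpected key names suggest the checkpoint was saved from a model wrapped in nn.DataParallel, which prefixes every state_dict key with module. (a quick fix along the same lines is mentioned in the replication issue further down). A hedged sketch of that workaround:

import torch

# Path taken from the traceback above; if the loaded object nests the weights under
# another key (e.g. a 'model_state'-style entry), unwrap that dict first.
ted_weights = torch.load('demo_weights/weights_ted.pth.tar', map_location='cpu')

# Strip the "module." prefix added by nn.DataParallel so the keys match a bare model.
ted_weights = {k[len('module.'):] if k.startswith('module.') else k: v
               for k, v in ted_weights.items()}

# Then load as in run_demo.py, e.g. gaze_network.load_state_dict(ted_weights)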

A formula error (rotation matrices) in the paper Few-Shot Adaptive Gaze Estimation

3.2.1 Architecture Overview

"The frontal orientation of eyes and heads in our setting can be represented as (0, 0) in Euler angles notation for azimuth and elevation respectively, assuming no roll and using the x-y convention. Then, the rotation of the eyes and the head from the frontal orientation can be described using (θg, φg) and (θh, φh) in Euler angles and converted to rotation matrices defined as [...]"

From this description I understood that θ and φ are yaw and pitch respectively, but in the rotation matrices it is the opposite. Could you explain this? Thanks.

Gaze normalization should use np.dot, not *

As we have seen in Xucong Zhang's work, the gaze normalization uses np.dot(R, gc). But in this paper the normalization is R*gc, which is not the same. I am not sure why you used this form.
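
For illustration (with toy values, unrelated to the actual code): in NumPy, * on plain ndarrays is element-wise with broadcasting, whereas np.dot performs matrix multiplication; only np.matrix objects overload * to mean matrix multiplication, which is one way R*gc and np.dot(R, gc) can coincide in some codebases.

import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # toy 2x2 rotation
gc = np.array([[1.0],
               [2.0]])             # toy column vector

elementwise = R * gc               # shape (2, 2): broadcast element-wise product, NOT a rotation
matmul = np.dot(R, gc)             # shape (2, 1): the intended matrix-vector product
print(elementwise.shape, matmul.shape)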

ValueError: Namespace Gdk not available

I followed the "Realtime Demo Instructions" to run the demo, but found that:
Traceback (most recent call last):
  File "run_demo.py", line 22, in <module>
    from monitor import monitor
  File "/home/rgyao/env/few_shot_gaze/demo/monitor.py", line 10, in <module>
    gi.require_version('Gdk', '3.0')
  File "/home/rgyao/anaconda3/envs/faze/lib/python3.6/site-packages/gi/__init__.py", line 126, in require_version
    raise ValueError('Namespace %s not available' % namespace)
ValueError: Namespace Gdk not available

Could not update / process submodules

If you have already cloned this few_shot_gaze repository without pulling the submodules, please run:

git submodule update --init --recursive
I cannot download the submodules.

After running python calibrate_camera.py, it keeps showing the tips and does not respond to any key press

I want to run the demo. I followed each step of the instructions up to "2. Camera and Monitor calibration" and ran python calibrate_camera.py. The terminal shows: "Calibrate camera once. Print pattern.png, paste on a clipboard, show to camera and capture non-blurry images in which points are detected well. Press s to save frame, c to continue to next frame and q to quit collecting data and proceed to calibration."
But when I press s, c, q or any other key, nothing advances to the next step, and it stays in this state for a long time until I press Ctrl+Z to stop it.
I don't know how to solve this. Please help me, thank you.

Hard to replicate because of many sys.path changes and other peculiarities

Just some comments from my attempt at getting the code working, which might be helpful for others.

There are a lot of sys.path.append calls strewn throughout run_demo.py and its imported code, and they often import identically named but different src directories, making it hard to reproduce without fiddling with many paths. In my case, Python clearly got confused about which src directory should be imported. For example:

  • sys.path.append("ext/mtcnn-pytorch/") has a src dir that is imported in face.py
  • The top dir of the repo has a src dir which will be in the path if you run code from there
  • sys.path.append("../src") occurs in some other files

It would be better if all code were executed from the top directory with no sys.path amendments.

Also, the pretrained model files have keys of the form module.<some weights>, but the scripts require the module. prefix to be removed. Quick fix: ted_weights = {k[7:]: v for k, v in ted_weights.items() if k.startswith('module.')}

Furthermore, I'm a bit surprised there is no CPU-only version that simply does inference (without adapting to a particular person), as that would be a quick test of reproducibility. Unlike NVIDIA, I don't have that many GPUs lying around ;)

Finally, I got as far as calibrating the camera and fixing some of the landmarks.py code so it ran, but that left me with a blank white screen.

At this point I unfortunately gave up. I'll probably try to implement it or a modified version from scratch.

Index out of bounds

I sometimes faced this problem while running the demo.

File "person_calibration.py", line 208, in load_fine_tune
    'image_a': img[valid_indices, :, :, :],
IndexError: index 437 is out of bounds for axis 0 with size 435

demo

How can I run the demo on a video file instead of a webcam?
Thank you.

Cannot view gaze point while running demo

After running the camera and person calibration, I can see a cv2 window with my cropped eye patch. However, I do not see anything else on the screen. I believe the demo is supposed to display the point of regard on the screen.

Could you kindly guide me on how to get this running?

Ground Truth of GazeCapture

As far as I know, the ground truth of the GazeCapture dataset is 2D gaze points. However, your project is trained using 3D vectors as labels. Is there a conversion between them?

Predicted point of regard ~10x bigger in the demo

Hello there!

For some reason the predicted PoR is way off screen. To try to debug it, on an already trained network I ran the person calibration again, then saved the gaze_n_vector variable used during training and the g_cnn variable used during prediction in frame_processor.py. If I plot them separately I get this:

(plot omitted)

Leaving the clear error aside, if I plot them together I get this:

(plot omitted)

Now if I fit a linear regression I get a coefficient of almost exactly 0.1 for both:

(plot omitted)

Now, by applying those coefficients, I get a prediction that makes more sense:

(plot omitted)

Why is that? Is some part of the calculation missing during prediction in frame_processor.py? Why is the PoR always 10x bigger?

I cannot find the detect_faces and show_bboxes code in the src folder. Is the src folder complete?

Dude, thanks for coming up with the FAZE model, which is now the most advanced gaze estimation model on the MPIIGaze dataset. I tried to run the run_demo.py program in the demo folder, but it calls frame_processor.py, and frame_processor.py calls detect_faces and show_bboxes from the src folder. However, I can't find the detect_faces and show_bboxes code in your src folder. Is the src folder you uploaded complete?
Here is the snippet from the face.py code in your demo folder that imports detect_faces and show_bboxes:

from src import detect_faces, show_bboxes
