retinaface's Introduction

Retinaface

https://habrastorage.org/webt/uj/ff/vx/ujffvxxpzixwlmae8gyh7jylftq.jpeg

This repo is built on top of https://github.com/biubug6/Pytorch_Retinaface

Differences

Train loop moved to PyTorch Lightning

This adds the following functionality:

  • Distributed training
  • fp16
  • Synchronized BatchNorm
  • Support for various loggers like W&B or Neptune.ml

Hyperparameters are defined in the config file

Hyperparameters that were scattered across the code have been moved to the config at retinaface/configs

Augmentations => Albumentations

Color transforms that were manually implemented have been replaced by the Albumentations library.

Todo:

  • Horizontal Flip is not implemented in Albumentations
  • Spatial transforms like rotations or transpose are not implemented yet.

Color transforms are defined in the config.
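
For illustration, a color-augmentation pipeline defined with Albumentations might look like the sketch below; the specific transforms and parameters are assumptions, not the project's actual config.

import albumentations as A

# A minimal sketch: color transforms combined with resize/normalize and
# bbox/keypoint parameters so boxes and landmarks are transformed together.
transform = A.Compose(
    [
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.5),
        A.LongestMaxSize(max_size=1024),
        A.Normalize(),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["category_ids"]),
    keypoint_params=A.KeypointParams(format="xy"),
)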

Added mAP calculation for validation

To track progress, the mAP metric is calculated on the validation set.

Installation

pip install -U retinaface_pytorch

Example inference

import cv2
from retinaface.pre_trained_models import get_model

image = <numpy array with shape (height, width, 3)>

model = get_model("resnet50_2020-07-20", max_size=2048)
model.eval()
annotation = model.predict_jsons(image)
  • Jupyter notebook with the example: Open In Colab
  • Jupyter notebook with the example on how to combine face detector with mask detector: Open In Colab
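
predict_jsons returns a list of dicts with "bbox", "score", and "landmarks" keys. Below is a minimal visualization sketch; draw_annotations is a hypothetical helper, not part of the package.

import cv2

# Hypothetical helper (not part of the package) that draws predict_jsons output.
def draw_annotations(image, annotations):
    result = image.copy()
    for face in annotations:
        if not face["bbox"]:  # an empty bbox means no face was detected
            continue
        x_min, y_min, x_max, y_max = (int(v) for v in face["bbox"])
        cv2.rectangle(result, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
        for x, y in face["landmarks"]:
            cv2.circle(result, (int(x), int(y)), 2, (0, 0, 255), -1)
    return result

visualized = draw_annotations(image, annotation)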

Data Preparation

The pipeline expects labels in the format:

[
  {
    "file_name": "0--Parade/0_Parade_marchingband_1_849.jpg",
    "annotations": [
      {
        "bbox": [
          449,
          330,
          571,
          720
        ],
        "landmarks": [
          [
            488.906,
            373.643
          ],
          [
            542.089,
            376.442
          ],
          [
            515.031,
            412.83
          ],
          [
            485.174,
            425.893
          ],
          [
            538.357,
            431.491
          ]
        ]
      }
    ]
  },
  ...
]

You can convert the default WIDER FACE labels to JSON in the proper format with this script.
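
A minimal sketch of reading a label file in this format ("train.json" is a placeholder path):

import json

# Count images and annotated faces in a label file of the format shown above.
with open("train.json") as f:
    labels = json.load(f)

num_faces = sum(len(sample["annotations"]) for sample in labels)
print(f"{len(labels)} images, {num_faces} annotated faces")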

Training

Install dependencies

pip install -r requirements.txt
pip install -r requirements_dev.txt

Define config

Example configs can be found at retinaface/configs

Define environmental variables

export TRAIN_IMAGE_PATH=<path to train images>
export VAL_IMAGE_PATH=<path to validation images>
export TRAIN_LABEL_PATH=<path to train annotations>
export VAL_LABEL_PATH=<path to validation annotations>

Run training script

python retinaface/train.py -h
usage: train.py [-h] -c CONFIG_PATH

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG_PATH, --config_path CONFIG_PATH
                        Path to the config.
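
For example (the config path is a placeholder):

python retinaface/train.py -c <path to config>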

Inference

python retinaface/inference.py -h
usage: inference.py [-h] -i INPUT_PATH -c CONFIG_PATH -o OUTPUT_PATH [-v]
                    [-g NUM_GPUS] [-m MAX_SIZE] [-b BATCH_SIZE]
                    [-j NUM_WORKERS]
                    [--confidence_threshold CONFIDENCE_THRESHOLD]
                    [--nms_threshold NMS_THRESHOLD] -w WEIGHT_PATH
                    [--keep_top_k KEEP_TOP_K] [--world_size WORLD_SIZE]
                    [--local_rank LOCAL_RANK] [--fp16]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_PATH, --input_path INPUT_PATH
                        Path with images.
  -c CONFIG_PATH, --config_path CONFIG_PATH
                        Path to config.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path to save jsons.
  -v, --visualize       Visualize predictions
  -g NUM_GPUS, --num_gpus NUM_GPUS
                        The number of GPUs to use.
  -m MAX_SIZE, --max_size MAX_SIZE
                        Resize the largest side to this number
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        batch_size
  -j NUM_WORKERS, --num_workers NUM_WORKERS
                        num_workers
  --confidence_threshold CONFIDENCE_THRESHOLD
                        confidence_threshold
  --nms_threshold NMS_THRESHOLD
                        nms_threshold
  -w WEIGHT_PATH, --weight_path WEIGHT_PATH
                        Path to weights.
  --keep_top_k KEEP_TOP_K
                        keep_top_k
  --world_size WORLD_SIZE
                        number of nodes for distributed training
  --local_rank LOCAL_RANK
                        node rank for distributed training
  --fp16                Use fp16
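
For example, a single-GPU run might look like this (all paths are placeholders):

python retinaface/inference.py -i <path to images> -c <path to config> -w <path to weights> -o <output path> -g 1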

To run distributed inference on multiple GPUs:

python -m torch.distributed.launch --nproc_per_node=<num_gpus> retinaface/inference.py <parameters>

Web app

https://retinaface.herokuapp.com/

Code for the web app: https://github.com/ternaus/retinaface_demo

Converting to ONNX

Inference on CPU can be sped up by converting the model to ONNX.

Example: python -m converters.to_onnx -m 1280 -o retinaface1280.onnx
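
A minimal sketch of running the exported model with onnxruntime; the fixed input shape and the dummy input are simplifying assumptions, so check the converter for the exact preprocessing:

import numpy as np
import onnxruntime as ort

# Load the exported graph and run it on a dummy input (no real preprocessing).
session = ort.InferenceSession("retinaface1280.onnx")
input_name = session.get_inputs()[0].name

dummy = np.random.rand(1, 3, 1280, 1280).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([output.shape for output in outputs])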


retinaface's Issues

Some questions about training on my custom datasets?

@ternaus Hi, I'm an undergraduate from North University of China, and I'm currently working on using a neural network to detect the armor plates in RoboMaster competitions: [3_demo.jpg](https://user-images.githubusercontent.com/53631206/141986706-3446a23c-16d6-437e-8f48-3d54e4563497.jpg)
Just like in the image above, my job is to do the same. I have tried YOLOv5 to achieve this, but I found that the bbox can't fit the armor perfectly, which affects the pose estimation with PnP in the next step. So I wonder if I can use keypoints instead of the bbox, which is what led me to your project. The main problem I have now is that I don't know how to convert my label format into one that can be trained with your model. My labels look like this:
id + x1,y1 + x2,y2 + ... + x4,y4. What should I change in your code to adapt it to my dataset? I hope you can give me some suggestions.

Bug: Inference on cuda results in fixed bounding box offset!

I have noticed that if you run inference on cuda rather than the cpu, the landmarks are still spot on, but the bounding boxes are shifted to the right significantly. Does anyone have a clue what could cause this?

Output with "cpu":
image

Output with "cuda":
image

Default parameter values

Hello,

Thank you for sharing this work as a PyPI package!

I have questions about default parameter values.

arg("-m", "--max_size", type=int, help="Resize the largest side to this number", default=960)
arg("-b", "--batch_size", type=int, help="batch_size", default=1)
arg("-j", "--num_workers", type=int, help="num_workers", default=12)
arg("--confidence_threshold", default=0.7, type=float, help="confidence_threshold")
arg("--nms_threshold", default=0.4, type=float, help="nms_threshold")

Are these values recommended based on the training dataset and procedure of RetinaFace?
Or are these values which worked best for you?

More specifically, I don't know which value for max_size to choose for my personal dataset of mixed photos and hand-drawn/computer-assisted artworks. I am analysing 300x450 images. So, maybe I should upsample them 2x, and go for 900 max size.

As for the confidence threshold, I would settle for something closer to 0.5 (or even less, to detect hand-drawn faces), and lower the NMS threshold as well (to avoid detecting too many boxes). I understand that it might not be the case for people using the detector on natural photography. Do you think it is a bad idea to lower the threshold?

How is Synchronized BatchNorm implemented?

Hi! In the README you mention that you've implemented Synchronized BatchNorm with PyTorch Lightning, but I don't see it in the code. Could you please point me to where this feature is implemented? Thanks in advance.

How to predict with batch size?

Thanks Vladimir Iglovikov! Great work, but I can only predict on a single image using model.predict_jsons(image). How can I predict on a batch of images?

Inference using GPU

Hey,

Can anyone explain how I should run retinaface inference on a large number of images on a GPU?

Fixed bug in the training code!

Possible bug in the provided training code?

I have heavily modified the training "loop" and moved the project to a more recent pytorch lightning version. But when training I noticed that it didn't learn the landmarks correctly (they would converge to the center of the bbox). After many hours of debugging I believe I found the issue in the encoding function for the landmarks in box_utils.py:

There seem to be missing parentheses in the encode_landm function: where the code is supposed to divide by the variance times the prior boxes, it divides by only the variance and then multiplies by the prior boxes.
This fix now puts everything in line with the respective landmark decoding function, which multiplies by both the priors and the variance. I am just curious how this made it into the repo, because I'm assuming that the pretrained weights were achieved with this code?

Original:

g_cxcy = matched[:, :, :2] - priors[:, :, :2]
# encode variance
g_cxcy = g_cxcy // variances[0] * priors[:, :, 2:]
# return target for smooth_l1_loss
return g_cxcy.reshape(g_cxcy.size(0), -1)

Modified:

g_cxcy = matched[:, :, :2] - priors[:, :, :2]
# encode variance
g_cxcy = g_cxcy / (variances[0] * priors[:, :, 2:])
# return target for smooth_l1_loss
return g_cxcy.reshape(g_cxcy.size(0), -1)

relevant decode_landm part:

return torch.cat(
        (
            priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:],
            priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:],
            priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:],
            priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:],
            priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:],
        ),
        dim=1,

RuntimeError during get_model

I'm facing a runtime error when loading the model, with the following parameters:

model = get_model("resnet50_2020-07-20", max_size=2048)
model.eval()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-42-993dc68920c7> in <module>
----> 1 model = get_model("resnet50_2020-07-20", max_size=2048)
      2 model.eval()

/workspace/.local/lib/python3.6/site-packages/retinaface/pre_trained_models.py in get_model(model_name, max_size, device)
     18 def get_model(model_name: str, max_size: int, device: str = "cpu") -> nn.Module:
     19     model = models[model_name].model(max_size=max_size, device=device)
---> 20     state_dict = model_zoo.load_url(models[model_name].url, progress=True, map_location="cpu")
     21 
     22     model.load_state_dict(state_dict)

/opt/conda/lib/python3.6/site-packages/torch/hub.py in load_state_dict_from_url(url, model_dir, map_location, progress, check_hash)
    507             cached_file = os.path.join(model_dir, extraced_name)
    508 
--> 509     return torch.load(cached_file, map_location=map_location)

/opt/conda/lib/python3.6/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    583     with _open_file_like(f, 'rb') as opened_file:
    584         if _is_zipfile(opened_file):
--> 585             with _open_zipfile_reader(f) as opened_zipfile:
    586                 if _is_torchscript_zip(opened_zipfile):
    587                     warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"

/opt/conda/lib/python3.6/site-packages/torch/serialization.py in __init__(self, name_or_buffer)
    243 class _open_zipfile_reader(_opener):
    244     def __init__(self, name_or_buffer):
--> 245         super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
    246 
    247 

RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at ../caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old.

I've tried upgrading torch and torchvision with !pip install --upgrade torch torchvision, as recommended by pytorch/vision#1943, but the error wasn't resolved.

Current installed versions:

torch - 1.5.0a0+8f84ded
torchvision - 0.6.0a0 

Any recommendations?

mAP

https://github.com/biubug6/Pytorch_Retinaface

When I compute the AP, incorrect detections do not seem to be counted. Is that correct?

For example, the ground truth for an image has one face. I run the network on the image and two faces are detected. When I compute the AP for that image, the AP is 1. Is that correct?

What AP do you get?

Unintuitive output of predict_jsons()

I was surprised by the output of predict_jsons() when the input image did not contain any face.

The expected behaviour would have been an empty list, so that I could find the number of detected faces as len(annotations).

Instead, there is a list with one element, with an empty bounding box and a score of -1, due to the following lines:

if boxes.shape[0] == 0:
    return [{"bbox": [], "score": -1, "landmarks": []}]

I think it is unintuitive and can lead to some confusion for the user.
A more consistent and intuitive behaviour would return annotations as initialized here:

annotations: List[Dict[str, Union[List, float]]] = []

Indeed:

  • if new faces are detected, they are appended to annotations,
  • otherwise, annotations would be returned exactly as it was initialized.

annotations += [
    {
        "bbox": bbox.tolist(),
        "score": scores[box_id],
        "landmarks": landmarks[box_id].reshape(-1, 2).tolist(),
    }
]

Validation accuracy & Mobilenet

Hi,

Thank you for the rewrite and very intuitive repo.

Have you reported validation accuracy for the Resnet50 and have you also implemented mobilenet?

Thanks,
Johannes

Does this project use focal loss?

@ternaus Hi, thank you for this great project.
May I ask why you use cross-entropy loss for the classification head rather than focal loss? Focal loss is the key feature of RetinaNet.

Negative values in the predicted annotations

(screenshot)

This is a screenshot of the image I tried to make predictions on using the model.predict_jsons function, but the resulting annotations contain negative bbox values. Two faces are detected in the image; in the order I have added the annotations below, the second predicted bbox has negative values. I checked the image, and the image itself is fine. Only the predictions of get_model with max_size=2048 result in such values.

(screenshots of the annotations)

How to train with custom dataset by using the pretrained model?

Hello,
I would like to ask a few questions about this github repo.

  1. What do I need to do to train using the pretrained model?
  2. How can I create my own custom dataset other than Wider-Face? Is there an annotation tool you recommend that does annotation in the same format?
  3. What should I do to train with the dataset I created?
  4. How can I convert the model to onnx after training?

google-colab 1.0.0 has requirement ipykernel~=4.10

Thank you for the PYPI module.

For information, people using Google Colab have to restart their runtime after they pip install the module.

ERROR: google-colab 1.0.0 has requirement ipykernel~=4.10, but you'll have ipykernel 5.3.4 which is incompatible.
WARNING: Upgrading ipython, ipykernel, tornado, prompt-toolkit or pyzmq can
cause your runtime to repeatedly crash or behave in unexpected ways and is not
recommended. If your runtime won't connect or execute code, you can reset it
with "Factory reset runtime" from the "Runtime" menu.
WARNING: The following packages were previously imported in this runtime:
  [ipykernel]
You must restart the runtime in order to use newly installed versions.

This happens because of the requirements of streamlit.

pydeck>=0.1.dev5->streamlit->retinaface_pytorch) (4.3.3)
Collecting ipykernel>=5.1.2; python_version >= "3.4"
  Downloading https://files.pythonhosted.org/packages/52/19/[...]/ipykernel-5.3.4-py3-none-any.whl (120kB)
     |████████████████████████████████| 122kB 54.6MB/s 

If I run pip install streamlit, I can see the following, which confirms my suspicion:

Requirement already satisfied: ipykernel>=5.1.2; python_version >= "3.4"
in /usr/local/lib/python3.6/dist-packages (from pydeck>=0.1.dev5->streamlit) (5.3.4)

I wonder if it would be possible to downgrade the version of ipykernel or remove the streamlit requirement altogether.

Suggest to loosen the dependency on albumentations

Hi, your project retinaface(commit id: b72317d) requires "albumentations==1.0.0" in its dependency. After analyzing the source code, we found that the following versions of albumentations can also be suitable, i.e., albumentations 1.0.1, 1.0.2, 1.0.3, since all functions that you directly (9 APIs: albumentations.augmentations.geometric.rotate.RandomRotate90.init, albumentations.core.composition.Compose.init, albumentations.core.composition.Compose.init, albumentations.augmentations.geometric.resize.LongestMaxSize.init, albumentations.core.composition.KeypointParams.init, albumentations.core.composition.Compose.init, albumentations.core.serialization.from_dict, albumentations.core.composition.BboxParams.init, albumentations.augmentations.transforms.Normalize.init) or indirectly (propagate to 10 albumentations's internal APIs and 0 outsider APIs) used from the package have not been changed in these versions, thus not affecting your usage.

Therefore, we believe that it is quite safe to loosen your dependency on albumentations from "albumentations==1.0.0" to "albumentations>=1.0.0,<=1.0.3". This will improve the applicability of retinaface and reduce the possibility of further dependency conflicts with other projects.

May I open a pull request to further loosen the dependency on albumentations?

By the way, could you please tell us whether such an automatic tool for dependency analysis may be potentially helpful for maintaining dependencies easier during your development?
