
arcface-tf2's People

Contributors

ali-fayzi, peteryuX


arcface-tf2's Issues

trouble in running dataset_cheaker.py

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __inference_Dataset_map_parse_tfrecord_98}} Feature: image/encoded (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]] [Op:IteratorGetNextSync]
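Note: this error usually means the tfrecord being read was not written with the 'image/encoded' feature the parser expects. Below is a minimal sketch of writing a compatible record; only the 'image/encoded' key is confirmed by the error message, and the 'image/source_id' label key is an assumption for illustration.

import tensorflow as tf

def make_example(img_path, source_id):
    # Store the already-encoded JPEG bytes; do not decode and re-serialize the array.
    img_bytes = open(img_path, 'rb').read()
    feature = {
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[img_bytes])),
        'image/source_id': tf.train.Feature(  # hypothetical label key
            int64_list=tf.train.Int64List(value=[source_id])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter('./data/sample.tfrecord') as writer:
    writer.write(make_example('face.jpg', 0).SerializeToString())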

bad result from lfw dataset

Alicia_Silverstone_0001
Alicia_Silverstone_0002

Hello @peteryuX, I tried to test accuracy on two images of one person from the LFW dataset.
The distance is large.

my test code:

from absl import app, flags, logging
from absl.flags import FLAGS
import cv2
import os
import numpy as np
import tensorflow as tf

from modules.evaluations import get_val_data, perform_val
from modules.models import ArcFaceModel
from modules.utils import set_memory_growth, load_yaml, l2_norm
from scipy.spatial.distance import cosine


flags.DEFINE_string('cfg_path', './configs/arc_res50.yaml', 'config file path')
flags.DEFINE_string('gpu', '0', 'which gpu to use')
flags.DEFINE_string('img_path', '', 'path to input image')


def main(_argv):
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    os.environ['CUDA_VISIBLE_DEVICES'] = FLAGS.gpu

    logger = tf.get_logger()
    logger.disabled = True
    logger.setLevel(logging.FATAL)
    set_memory_growth()

    cfg = load_yaml(FLAGS.cfg_path)

    model = ArcFaceModel(size=cfg['input_size'],
                         backbone_type=cfg['backbone_type'],
                         training=False)

    ckpt_path = tf.train.latest_checkpoint('./checkpoints/')
    if ckpt_path is not None:
        print("[*] load ckpt from {}".format(ckpt_path))
        model.load_weights(ckpt_path)
    else:
        print("[*] Cannot find ckpt from {}.".format(ckpt_path))
        exit()
    image_fol = "./tmp"
    paths = os.listdir(image_fol)
    embeds = []
    images = []
    flip_images = []
    for path in paths:
        print(path)
        img = cv2.imread(os.path.join(image_fol, path))
        img = cv2.resize(img, (cfg['input_size'], cfg['input_size']))
        img = img.astype(np.float32) / 255.
        # if len(img.shape) == 3:
        #     img = np.expand_dims(img, 0)
        # embeds.append(l2_norm(model(img)))
        images.append(img)
    images = np.array(images)
    def hflip_batch(imgs):
        # Horizontally flip a batch of NHWC images.
        assert len(imgs.shape) == 4
        return imgs[:, :, ::-1, :]

    flip_images = hflip_batch(images)
    # Fuse embeddings of the original and flipped images, then re-normalize.
    embeds = model(images) + model(flip_images)
    embeds = l2_norm(embeds)

    dist = np.sum(np.square(embeds[0]-embeds[1]))

    print("dist: ", dist)
    # diff = np.subtract([embeds[0]], [embeds[1]])
    # dist = np.sum(np.square(diff), 1)
    # print("diff: ", diff)
    # scipy's cosine() returns the cosine distance, so 1 - cosine() is the similarity.
    acc = 1 - cosine(embeds[0], embeds[1])
    print("acc: ", acc)

the result is:

dist:  1.1708782
acc:  0.4145609736442566

How can I achieve 99.35% accuracy on LFW?
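Note: one likely contributor to the large distance is channel order: cv2.imread returns BGR, while the training pipeline decodes images as RGB (see the perform_val issue below), so adding img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) after imread is worth trying. The reported benchmark also assumes aligned 112x112 crops (lfw_align_112), not raw LFW images simply resized.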

get nan result for a whole batch

Hi,
Thanks for sharing this amazing work! I downloaded your model and loaded the weights (ResNet50, ccrop=true), and I tested it with a bunch of images. Some batches work fine, with no NaNs, but other batches return all-NaN results. What might cause this?

Use pretrained ResNet50 model for Face Recognition on my own dataset

I want to build a face recognizer using the pretrained models given in the repository. Currently I am facing an issue with the distance threshold: in most cases the distances between different faces come out very small, whereas the distances between the same face come out large. My questions are:

  1. Am I calculating the embeddings correctly?
  2. How should face comparison be done in order to reach LFW-level accuracy? (See the sketch after the code below.)

I have already referenced this #6 and #8 but couldn't come up with a concrete solution.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import cv2
import numpy as np
import tensorflow as tf

from .modules.models import ArcFaceModel
from .modules.utils import set_memory_growth, load_yaml, l2_norm


class ArcFaceResNet50:
    def __init__(self):
        set_memory_growth()
        self.cfg = load_yaml(os.path.join(os.path.dirname(os.path.abspath(__file__)), \
                                     './configs/arc_res50.yaml'))

        self.model = ArcFaceModel(size=self.cfg['input_size'],
                             backbone_type=self.cfg['backbone_type'],
                             training=False)

        ckpt_path = tf.train.latest_checkpoint(os.path.join(os.path.dirname(os.path.abspath(__file__)), \
                                                            './checkpoints/' + self.cfg['sub_name']))
        if ckpt_path is not None:
            print("[*] load ckpt from {}".format(ckpt_path))
            self.model.load_weights(ckpt_path)
        else:
            print("[*] Cannot find ckpt from {}.".format(ckpt_path))
            exit()
        
    def get_embeddings(self, frame_rgb, bounding_boxes):
        faces = []
        for x1, y1, x2, y2 in bounding_boxes:
            face_patch = frame_rgb[y1:y2, x1:x2, :]
            resized = cv2.resize(face_patch,
                                 (self.cfg['input_size'], self.cfg['input_size']),
                                 interpolation=cv2.INTER_AREA)
            normalized = resized.astype(np.float32) / 255.
            faces.append(normalized)
        # np.stack over 3-D patches already yields a 4-D batch (N, H, W, C),
        # so no extra expand_dims is needed.
        faces = np.stack(faces)
        # Run prediction
        embeddings = l2_norm(self.model(faces))
        return embeddings

Thanks in advance!
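Note: a minimal sketch of the usual comparison step on top of get_embeddings above; the threshold value is illustrative, not one confirmed for this model, and should be tuned on labeled pairs.

import numpy as np

def is_same_person(emb1, emb2, threshold=1.2):
    # Embeddings are L2-normalized, so the squared L2 distance lies in [0, 4]
    # and relates to cosine similarity by dist = 2 - 2 * cos.
    dist = np.sum(np.square(emb1 - emb2))
    return dist < threshold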

Question about training epochs

A question for the author: were you able to reach the numbers in the Verification results table after training for only 5 epochs on the MS-Celeb-1M dataset?

Question regarding evaluation metric

Hi, thanks for sharing the project !

I am wondering about the metric used for the distance between embeddings.
From what I understand, you use L2 in the following code:

diff = np.subtract(embeddings1, embeddings2)
dist = np.sum(np.square(diff), 1)

Meanwhile, the official code seems to use cosine similarity in all evaluations.
For example:

https://github.com/deepinsight/insightface/blob/1af6eeffdc1fe1d81c308fafe37f28883c5cf27f/Evaluation/IJB/IJB_1N.py#L117

Am I missing something? Is this intentional?
I'd appreciate your clarification.
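Note: for L2-normalized embeddings the two metrics agree up to a monotonic transform: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(a, b). Thresholding the squared L2 distance is therefore equivalent to thresholding the cosine similarity; only the numeric threshold differs, not the resulting accuracy.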

Cannot find ckpt from None

I get this error when running python test.py --cfg_path="./configs/arc_res50.yaml". I already put the pretrained model checkpoint in the checkpoints folder, and I have read the closed issues but still can't understand it. Can you explain how to solve this problem?
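Note: tf.train.latest_checkpoint returns None when it finds no checkpoint index in the directory it is given. Judging from the snippets above, the script looks under './checkpoints/' + cfg['sub_name'] (e.g. ./checkpoints/arc_res50/), so the extracted checkpoint files need to sit in that subfolder rather than directly in ./checkpoints/. A quick sanity check:

import tensorflow as tf
# Should print a checkpoint prefix; None means the files are in the wrong place.
print(tf.train.latest_checkpoint('./checkpoints/arc_res50/'))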

good performance is not obtained

[three screenshots of training results]

Hi. I am very grateful for the help here.

I am currently training a model using your source code, but good performance is not obtained. These are the results after training for 2 epochs; I don't know whether I'm doing this correctly. I look forward to your comments.

Size of model

Thank you for this project.
I would like to know if there is a way to reduce the size of the model when using MobileNetV2 as the backbone architecture.
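Note: beyond picking backbone_type: 'MobileNetV2' in the config, post-training quantization is a common way to shrink a trained model further. A minimal sketch, assuming model is the inference-mode ArcFaceModel from the snippets above; custom layers may require extra converter settings:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_bytes = converter.convert()
open('arcface_mbv2.tflite', 'wb').write(tflite_bytes)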

Cosine Similarity, Best Threshold

I want to use this pretrained model to compare images. What is the best threshold for cosine similarity for this model?

accuracy related query

Hi friend,

I used the config below:

# general
batch_size: 128
input_size: 112
embd_shape: 512
sub_name: 'my_arc_res50_no_central_crop'
backbone_type: 'ResNet50' # 'ResNet50', 'MobileNetV2'
head_type: ArcHead # 'ArcHead', 'NormHead'
is_ccrop: False # central-cropping or not

# train
train_dataset: './data/ms1m_bin.tfrecord'
binary_img: True
num_classes: 85742
num_samples: 5822653
epochs: 10
base_lr: 0.01
w_decay: !!float 5e-4
save_steps: 1000

# test
test_dataset: 'test_dataset'

At the end of training the loss is 8.0511.

When I tried is_ccrop: True, the epochs were set to 5 at that time, and the loss was near 9.x.

Please suggest what can be done to improve this.

Thanks,
Vatsal

Question about test.py

Hi, thank you for your implementation.

I have one question about test.py. Can you explain the purpose of the output embedding vector for Brucelee.jpg?

Pre-model download

Hi, your pretrained model is on Google Drive and I cannot download it. Could you upload it to Baidu Drive or send it to my e-mail?
Thank you very much, and I wish you good health.
E-mail: [email protected]

How can I still get loss=nan

I'm using the MS-Celeb-1M dataset, downloaded from the link posted in README.md

  1. I converted the data to tfrecords following the steps provided in the documentation for binary images.

  2. My training config is:

# general
batch_size: 8
input_size: 112
embd_shape: 128
sub_name: 'arc_mbv2'
backbone_type: 'MobileNetV2' # 'ResNet50', 'MobileNetV2'
head_type: ArcHead # 'ArcHead', 'NormHead'
is_ccrop: False # central-cropping or not

# train
train_dataset: './data/imgs_full.tfrecord'
binary_img: True
num_classes: 85742
num_samples: 5822653
epochs: 100
base_lr: 0.01
w_decay: !!float 5e-4
save_steps: 100

# test
test_dataset: '.test/'

But I'm still getting loss=nan. Is this normal for the initial epochs? Is it a tfrecords error?
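Note: NaN losses this early often come from the learning rate being too high for the batch size (base_lr: 0.01 with batch_size: 8 is aggressive) or from numerical edge cases in the ArcFace logits. Two hedged mitigations to try: lower base_lr, and clip gradients. A minimal sketch of the latter; the clipnorm value is illustrative:

import tensorflow as tf

# clipnorm bounds each gradient tensor's norm before the update step.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=0.001, momentum=0.9, clipnorm=1.0)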

Asian-celeb dataset download link


[Asian-celeb dataset]

  • Training data (Asian-celeb)

The dataset consists of crawled images of celebrities on the web. The images are covered under a Creative Commons Attribution-NonCommercial 4.0 International license (please read the license terms at http://creativecommons.org/licenses/by-nc/4.0/).


[train_msra.tar.gz]

MD5:c5b668f2204c400099b14f367069aef5

Content: Train dataset called MS-Celeb-1M-v1c with 86,876 ids/3,923,399 aligned images cleaned from MS-Celeb-1M dataset.

This dataset has been excluded from both LFW and Asian-Celeb.

Format: *.jpg

Google: https://drive.google.com/file/d/1aaPdI0PkmQzRbWErazOgYtbLA1mwJIfK/view?usp=sharing

[msra_lmk.tar.gz]

MD5:7c053dd0462b4af243bb95b7b31da6e6

Content: A list of five-point landmarks for the 3,923,399 images in MS-Celeb-1M-v1c.

Format: <path> <label> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

<path> is the path of the image in the tar file train_msceleb.tar.gz.

<label> is an integer ranging from 0 to 86,875.

(x, y) is the coordinate of a key point on the aligned images. The five points are, in order:

left eye
right eye
nose tip
mouth left
mouth right

Google: https://drive.google.com/file/d/1FQ7P4ItyKCneNEvYfJhW2Kff7cOAFpgk/view?usp=sharing
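Note: a minimal sketch for reading one line of such a landmark list, assuming the whitespace-separated layout described above (image path, integer label, then five (x, y) pairs); the exact column order is an assumption inferred from that description:

def parse_landmark_line(line):
    parts = line.split()
    path = parts[0]
    label = int(parts[1])
    coords = list(map(float, parts[2:12]))
    # Pair the ten numbers as (x, y) for: left eye, right eye,
    # nose tip, mouth left, mouth right.
    landmarks = list(zip(coords[0::2], coords[1::2]))
    return path, label, landmarks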

[train_celebrity.tar.gz]

MD5:9f2e9858afb6c1032c4f9d7332a92064

Content: Train dataset called Asian-Celeb with 93,979 ids/2,830,146 aligned images.

This dataset has been excluded from both LFW and MS-Celeb-1M-v1c.

Format: *.jpg

Google: https://drive.google.com/file/d/1-p2UKlcX06MhRDJxJukSZKTz986Brk8N/view?usp=sharing

[celebrity_lmk.tar.gz]

MD5:9c0260c77c13fbb32692fc06a5dbfaf0

Content: A list of five-point landmarks for the 2,830,146 images in Asian-Celeb.

Format: <path> <label> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

<path> is the path of the image in the tar file train_celebrity.tar.gz.

<label> is an integer ranging from 86,876 to 196,319.

(x, y) is the coordinate of a key point on the aligned images. The five points are, in order:

left eye
right eye
nose tip
mouth left
mouth right

Google: https://drive.google.com/file/d/1sQVV9epoF_8jS3ge6DqbilpWk3UNE8U7/view?usp=sharing

[testdata.tar.gz]

MD5:f17c4712f7562ea6d45f0a158e59b792

Content: Test dataset with 1,862,120 aligned images.

Format: *.jpg

Google: https://drive.google.com/file/d/1ghzuEQqmUFN3nVujfrZfBx_CeGUpWzuw/view?usp=sharing

[testdata_lmk.tar]

MD5:7e4995eb9976a2cfd2b23db05d76572c

Content: A list of five-point landmarks for the 1,862,120 images in testdata.tar.gz.

Features should be extracted in the same order, and in the same quantity, as this list.

Format: <path> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

<path> is the path of the image in the tar file testdata.tar.gz.

(x, y) is the coordinate of a key point on the aligned images. The five points are, in order:

left eye
right eye
nose tip
mouth left
mouth right

Google: https://drive.google.com/file/d/1lYzqnPyHXRVgXJYbEVh6zTXn3Wq4JO-I/view?usp=sharing

[feature_tools.tar.gz]

MD5:227b069d7a83aa43b0cb738c2252dbc4

Content: Feature format transform tool and a sample feature file.

Format: We use the same format as Megaface (http://megaface.cs.washington.edu/), except that we merge all files into a single binary file.

Google: https://drive.google.com/file/d/1bjZwOonyZ9KnxecuuTPVdY95mTIXMeuP/view?usp=sharing

loss = nan..what's the problem?

I am training the model with the ms1m dataset and the Asian-celeb dataset,
but loss = nan...
The model is not trained at all.
mode = 'fit' -> loss = nan
mode = 'eager_ft' -> loss = nan
mode = 'eager_fit' -> Out of Memory error
What's the problem?
Please help me, and thank you... have a nice day.
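Note: as in the loss=nan issue above, the usual first steps are lowering base_lr, enabling gradient clipping (see the sketch there), and re-checking the generated tfrecord; the out-of-memory error in eager mode can often be worked around by reducing batch_size.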

Test script

Hi @peteryuX, thank you for your amazing work. I want to know the recommended threshold value for validating whether two images show the same person.

Learning rate and loss value for small number of epoch

Thanks for the project. During your training, did you lower the base learning rate over the course of training? How many total epochs did you train for your pretrained model, and what was the loss value at the end of training? My training loss, at around 2.1, is decreasing very slowly; do you have any suggestions?
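Note: most ArcFace recipes drop the learning rate in fixed steps rather than keeping it constant. A minimal sketch of such a schedule; the boundaries and rates are illustrative, not the author's values:

import tensorflow as tf

schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100000, 160000],   # global steps at which the rate drops
    values=[0.01, 0.001, 0.0001])  # rate before/between/after the boundaries
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)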

Fine-tuning ArcFace

Hi!
Is there a way to fine-tune your pretrained model, or does your training code not support a quick way to do it?

For example, I've noticed in other repositories that it can be achieved with this type of commands:

python -u train.py --network m1 --loss triplet --lr 0.005 --pretrained ./models/m1-softmax-emore,1

Thanks!
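Note: this repo has no --pretrained flag like the insightface command above, but a similar effect can be had by loading an existing checkpoint into the training-mode model before running the training loop, with a lowered base_lr. A minimal sketch under those assumptions, reusing the imports and cfg loading from the snippets above; the ArcFaceModel keyword names follow the config keys and are assumptions beyond those already shown:

model = ArcFaceModel(size=cfg['input_size'],
                     backbone_type=cfg['backbone_type'],
                     num_classes=cfg['num_classes'],
                     head_type=cfg['head_type'],
                     embd_shape=cfg['embd_shape'],
                     training=True)
ckpt_path = tf.train.latest_checkpoint('./checkpoints/arc_res50/')
if ckpt_path is not None:
    model.load_weights(ckpt_path)  # resume from the pretrained weights
# then continue train.py's normal loop with a smaller base_lr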

Issues with perform_val

The function perform_val (from modules.evaluations) seems to have two issues:

  • it evaluates the test data without converting from BGR to RGB: in the test data archives (lfw_align_112.zip, ...) the images are provided in BGR format, and that is what is used in evaluation. But the training procedure uses the RGB format (as obtained from tf.image.decode_jpeg). Evaluating on RGB images instead of BGR can slightly (but consistently) improve the results, e.g. with the pretrained ResNet50 from 99.35% to 99.42% for LFW and from 90.36% to 92.56% for CFP-FP (see the sketch after this list).
  • when providing the parameters is_ccrop=True, is_flip=False, center cropping is performed twice, drastically reducing the performance in that case.
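Note: a minimal sketch of the first fix, assuming the evaluation batches are float image arrays in NHWC layout with BGR channels as the issue describes; the variable names are illustrative:

# Reverse the last axis to turn a BGR batch into RGB before feeding the model.
batch_rgb = batch_bgr[..., ::-1]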

TensorBoard

Hi, great job in here! Could you please share TensorBoard logs if you still have?

visualization

Hi. Can you help me with how to visualize my outputs? Let's say I want the output label drawn on the given image while evaluating.

I can't achieve the accuracy in the benchmark, could somebody help?

[screenshot: test results at training loss 19.42]
I used the same train and test datasets as you proposed, but the best result I've got so far is what the picture shows.
I used the SGD optimizer with lr = 0.1, 0.05, 0.01, 0.0001, 0.00001, one epoch per learning rate. When I found the loss increasing rather than decreasing, I stopped training. The test result at loss 19.42 is shown in the upper picture.
Additionally, this is the test result when the train loss was 21.15, shown in the lower picture.
[screenshot: test results at training loss 21.15]

train.py AssertionError

Hello, this is very useful code,
but when I use ArcHead for training, an error like this occurs:

Traceback (most recent call last):
  File "train.py", line 83, in <module>
    train()
  File "train.py", line 55, in train
    logist = model(inputs, training=True)
  File "/home/dukim/env/tf2.1/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/dukim/env/tf2.1/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "/home/dukim/env/tf2.1/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 870, in _run_internal_graph
    assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("ArcHead/Identity:0", shape=(None, 1200), dtype=float32)
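Note: this assertion usually means the graph was built with two inputs but called with only one. When head_type is ArcHead, the model takes the ground-truth labels as a second input (ArcFace adds its margin to the target logit), so the training call should pass both, e.g. logist = model((inputs, labels), training=True); the inference model (training=False) drops the label input, as the test snippets above show.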

[BUG] lost GlobalAveragePooling

In modules/models.py, backbones are loaded without the pretrained classification head (include_top=False), and then a custom OutputLayer is added on top. Dropping the pretrained classifier also cuts off the GlobalAveragePooling layer, but OutputLayer doesn't contain one.

I propose something like this:

def OutputLayer(embd_shape, w_decay=5e-4, name='OutputLayer'):
    # Requires: from tensorflow.keras.layers import GlobalAveragePooling2D
    def output_layer(x_in):
        x = inputs = Input(x_in.shape[1:])
        x = BatchNormalization()(x)  # maybe this layer is redundant
        x = GlobalAveragePooling2D()(x)
        x = Dropout(rate=0.5)(x)
        x = Flatten()(x)  # a no-op after pooling, kept for parity with the original
        x = Dense(embd_shape, kernel_regularizer=_regularizer(w_decay))(x)
        x = BatchNormalization()(x)
        model = Model(inputs, x, name=name)
        return model(x_in)
    return output_layer

One effect of losing GlobalAveragePooling is that the MobileNetV2 backbone grows in size from 12 MB to 50 MB, though accuracy grows too; in any case, training MobileNetV2 well requires different hyperparameters, which will increase val accuracy.

How to get the classification result?

I trained a model on my own dataset and saved it as a checkpoint with the help of train.py.
Now the question is: how can I test the model on my own dataset? I tried simply using predict, but it only gives me a list of numbers. I guess these are embeddings? What I want is the classification result.

Really appreciate any help.
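Note: the inference-mode model outputs embeddings, not class scores, so a label usually comes from nearest-neighbor search against a gallery of reference embeddings (one, or an average, per known identity). A minimal sketch assuming L2-normalized embeddings; the names and the threshold are illustrative:

import numpy as np

def classify(query_emb, gallery_embs, gallery_labels, threshold=0.3):
    # With unit-norm vectors, cosine similarity is just a dot product.
    sims = gallery_embs @ query_emb
    best = int(np.argmax(sims))
    # Below the threshold, treat the face as unknown rather than forcing a label.
    return gallery_labels[best] if sims[best] > threshold else None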

Colab notebook does not work (for downloading the arc_res50.zip file)

Could you let me know how to get the zip file arc_res50.zip?
I attached the warning message.
Because of this error, I cannot create the checkpoints folder.

Downloading 1HasWQb86s4xSYy36YbmhRELg9LBmvhvt into ./arc_res50.zip... Done.
Unzipping...
/usr/local/lib/python3.7/dist-packages/google_drive_downloader/google_drive_downloader.py:78: UserWarning: Ignoring unzip since "1HasWQb86s4xSYy36YbmhRELg9LBmvhvt" does not look like a valid zip file
  warnings.warn('Ignoring unzip since "{}" does not look like a valid zip file'.format(file_id))
mv: cannot stat 'arc_res50': No such file or directory
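Note: this failure pattern is typical of Google Drive's virus-scan confirmation page for large files: the downloader saves an HTML page instead of the zip, so unzipping fails. Downloading the file manually in a browser (or with a tool that handles the confirmation token) and placing the extracted folder under ./checkpoints/ is a common workaround.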

How can i apply augmentation

I've never used tfrecords for training.

My question is: is there a way to apply augmentation, such as albumentations or imgaug, in the training pipeline?

And if so, where am I supposed to set this up: during the conversion to tfrecords, or while loading?
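Note: augmentation normally goes in the tf.data map step at load time, after decoding and before batching; the tfrecord itself stays unchanged. A minimal sketch with built-in ops; wiring in albumentations or imgaug would need tf.numpy_function and is slower:

import tensorflow as tf

def augment(img, label):
    img = tf.image.random_flip_left_right(img)
    img = tf.image.random_brightness(img, max_delta=0.1)
    img = tf.image.random_saturation(img, lower=0.8, upper=1.2)
    return img, label

# "dataset" stands for the repo's decoded (image, label) pipeline.
# dataset = dataset.map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)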

Colab notebook does not work

Traceback (most recent call last):
  File "test.py", line 77, in <module>
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "test.py", line 31, in main
    training=False)
  File "/content/arcface-tf2/modules/models.py", line 82, in ArcFaceModel
    x = Backbone(backbone_type=backbone_type, use_pretrain=use_pretrain)(x)
  File "/content/arcface-tf2/modules/models.py", line 32, in backbone
    weights=weights)(x_in)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/applications/__init__.py", line 46, in wrapper
    return base_fun(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/applications/resnet.py", line 33, in ResNet50
    return resnet.ResNet50(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras_applications/resnet_common.py", line 435, in ResNet50
    **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras_applications/resnet_common.py", line 411, in ResNet
    model.load_weights(weights_path)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/training.py", line 234, in load_weights
    return super(Model, self).load_weights(filepath, by_name, skip_mismatch)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/network.py", line 1222, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 651, in load_weights_from_hdf5_group
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
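Note: this AttributeError is a known incompatibility between older TensorFlow releases and h5py 3.x, which started returning str instead of bytes for HDF5 attributes; pinning h5py below 3.0 (pip install 'h5py<3.0.0') is the usual workaround.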
