
zhaoj9014 / face.evolve


🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥

License: MIT License

Python 100.00%
pytorch face-recognition face-detection face-alignment face-landmark-detection model-training feature-extraction fine-tuning data-augmentation deep-learning computer-vision imbalanced-learning transfer-learning hard-negative-mining supervised-learning nus tencent convolutional-neural-network machine-learning artificial-intelligence

face.evolve's Introduction

face.evoLVe: High-Performance Face Recognition Library based on PaddlePaddle & PyTorch

  • Evolve to be more comprehensive, effective and efficient for face-related analytics & applications! (WeChat News)
  • About the name:
    • "face" means this repo is dedicated to face-related analytics & applications.
    • "evolve" means unleash your greatness to be better and better. The letters "LV" are capitalized to acknowledge the nurturing of the Learning and Vision (LV) group, National University of Singapore (NUS).
  • This work was done while Jian Zhao served as a short-term "Texpert" Research Scientist at Tencent FiT DeepSea AI Lab, Shenzhen, China.
Author Jian Zhao
Homepage https://zhaoj9014.github.io

License

The code of face.evoLVe is released under the MIT License.


News

CLOSED 02 September 2021: Baidu PaddlePaddle officially merged face.evoLVe to facilitate research and applications on face-related analytics (Official Announcement).

CLOSED 03 July 2021: Training code for the PaddlePaddle framework is now provided.

CLOSED 04 July 2019: We will share several publicly available datasets on face anti-spoofing/liveness detection to facilitate related research and analytics.

CLOSED 07 June 2019: We are training a better-performing IR-152 model on MS-Celeb-1M_Align_112x112, and will release the model soon.

CLOSED 23 May 2019: We share three publicly available datasets to facilitate research on heterogeneous face recognition and analytics. Please refer to Sec. Data Zoo for details.

CLOSED 23 Jan 2019: We share the name lists and pair-wise overlapping lists of several widely-used face recognition datasets to help researchers/engineers quickly remove the overlapping parts between their own private datasets and the public datasets. Please refer to Sec. Data Zoo for details.

CLOSED 23 Jan 2019: The current distributed training schema with multiple GPUs under PyTorch and other mainstream platforms parallelizes the backbone across the GPUs while relying on a single master to compute the final bottleneck (fully-connected/softmax) layer. This is not an issue for conventional face recognition with a moderate number of identities. However, it struggles with large-scale face recognition, which requires recognizing millions of identities in the real world. The master can hardly hold the oversized final layer while the slaves still have spare computation resources, leading to small-batch training or even failed training. To address this problem, we are developing a highly elegant, effective and efficient distributed training schema with multiple GPUs under PyTorch, supporting not only the backbone but also the head with the fully-connected (softmax) layer, to facilitate high-performance large-scale face recognition. We will add this support to our repo.

CLOSED 22 Jan 2019: We have released two feature extraction APIs for extracting features from pre-trained models, implemented with PyTorch built-in functions and OpenCV, respectively. Please check ./util/extract_feature_v1.py and ./util/extract_feature_v2.py.

CLOSED 22 Jan 2019: We are fine-tuning our released IR-50 model on our private Asia face data, which will be released soon to facilitate high-performance Asia face recognition.

CLOSED 21 Jan 2019: We are training a better-performing IR-50 model on MS-Celeb-1M_Align_112x112, and will replace the current model soon.


Contents


face.evoLVe for High-Performance Face Recognition

Introduction

💁

  • This repo provides a comprehensive face recognition library for face-related analytics & applications, including face alignment (detection, landmark localization, affine transformation, etc.), data processing (e.g., augmentation, data balancing, normalization, etc.), various backbones (e.g., ResNet, IR, IR-SE, ResNeXt, SE-ResNeXt, DenseNet, LightCNN, MobileNet, ShuffleNet, DPN, etc.), various losses (e.g., Softmax, Focal, Center, SphereFace, CosFace, AmSoftmax, ArcFace, Triplet, etc.) and bags of tricks for improving performance (e.g., training refinements, model tweaks, knowledge distillation, etc.).
  • The current distributed training schema with multiple GPUs under PyTorch and other mainstream platforms parallelizes the backbone across the GPUs while relying on a single master to compute the final bottleneck (fully-connected/softmax) layer. This is not an issue for conventional face recognition with a moderate number of identities. However, it struggles with large-scale face recognition, which requires recognizing millions of identities in the real world. The master can hardly hold the oversized final layer while the slaves still have spare computation resources, leading to small-batch training or even failed training. To address this problem, this repo provides a highly elegant, effective and efficient distributed training schema with multiple GPUs under PyTorch, supporting not only the backbone but also the head with the fully-connected (softmax) layer, to facilitate high-performance large-scale face recognition (a rough arithmetic sketch follows at the end of this list).
  • All data before & after alignment, source codes and trained models are provided.
  • This repo can help researchers/engineers develop high-performance deep face recognition models and algorithms quickly for practical use and deployment.
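  • A rough back-of-the-envelope sketch of the scale problem described above (the numbers are illustrative assumptions, not measurements from this repo): with a 512-d embedding and 10 million identities, the head's weight matrix alone no longer fits comfortably on a single card.
    # illustrative assumptions: 512-d embeddings, 10 million identities, fp32 parameters
    embedding_size = 512
    num_identities = 10_000_000
    bytes_per_param = 4  # fp32

    weight_bytes = embedding_size * num_identities * bytes_per_param
    # SGD with momentum also keeps a gradient and a momentum buffer of the same shape
    training_bytes = 3 * weight_bytes

    print("head weights only : {:.1f} GB".format(weight_bytes / 1024 ** 3))   # ~19.1 GB
    print("with grad/momentum: {:.1f} GB".format(training_bytes / 1024 ** 3)) # ~57.2 GB
    Even before counting activations and the backbone, the training-time footprint of such a head exceeds the 24 GB of a single Tesla P40, which is why face.evoLVe also splits the head across GPUs instead of keeping it on one master.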

Pre-Requisites

🍰

  • Linux or macOS
  • Python 3.7 (for training & validation) and Python 2.7 (for visualization w/ tensorboardX)
  • PyTorch 1.0 (for training & validation, install w/ pip install torch torchvision)
  • MXNet 1.3.1 (optional, for data processing, install w/ pip install mxnet-cu90)
  • TensorFlow 1.12 (optional, for visualization, install w/ pip install tensorflow-gpu)
  • tensorboardX 1.6 (optional, for visualization, install w/ pip install tensorboardX)
  • OpenCV 3.4.5 (install w/ pip install opencv-python)
  • bcolz 1.2.0 (install w/ pip install bcolz)

While not required, for optimal performance it is highly recommended to run the code on a CUDA-enabled GPU. We used 4-8 NVIDIA Tesla P40 GPUs in parallel.


Usage

📙

  • Clone the repo: git clone https://github.com/ZhaoJ9014/face.evoLVe.PyTorch.git.
  • Run mkdir data checkpoint log in an appropriate directory to store your train/val/test data, checkpoints and training logs.
  • Prepare your train/val/test data (refer to Sec. Data Zoo for publicly available face-related databases), and ensure each database folder has the following structure (a quick sanity-check sketch follows after this list):
    ./data/db_name/
            -> id1/
                -> 1.jpg
                -> ...
            -> id2/
                -> 1.jpg
                -> ...
            -> ...
                -> ...
                -> ...
    
  • Refer to the code of the corresponding sections for your specific purposes.
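  • As a quick sanity check of the structure above, a minimal sketch (the db_name path is a placeholder) that walks a database folder and reports how many identities and images it contains:
    import os

    DB_ROOT = './data/db_name'  # placeholder: point this at one of your database folders

    num_ids, num_imgs = 0, 0
    for identity in sorted(os.listdir(DB_ROOT)):
        id_dir = os.path.join(DB_ROOT, identity)
        if not os.path.isdir(id_dir):
            continue  # skip stray files such as archives or name lists
        num_ids += 1
        num_imgs += len([f for f in os.listdir(id_dir) if f.lower().endswith(('.jpg', '.jpeg', '.png'))])

    print("{} identities, {} images under {}".format(num_ids, num_imgs, DB_ROOT))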

Face Alignment

📐

  • This section is based on the work of MTCNN.
  • Folder: ./align
  • Face detection, landmark localization APIs and visualization toy example with ipython notebook:
    from PIL import Image
    from detector import detect_faces
    from visualization_utils import show_results
    
    img = Image.open('some_img.jpg') # modify the image path to yours
    bounding_boxes, landmarks = detect_faces(img) # detect bboxes and landmarks for all faces in the image
    show_results(img, bounding_boxes, landmarks) # visualize the results
  • Face alignment API (perform face detection, landmark localization and alignment with affine transformations on a whole database folder source_root with the directory structure as demonstrated in Sec. Usage, and store the aligned results to a new folder dest_root with the same directory structure):
    python face_align.py -source_root [source_root] -dest_root [dest_root] -crop_size [crop_size]
    
    # python face_align.py -source_root './data/test' -dest_root './data/test_Aligned' -crop_size 112
    
  • For macOS users, there is no need to worry about *.DS_Store files, which could otherwise ruin your data: they are automatically removed when you run the scripts.
  • Key notes for customized use: 1) specify the source_root, dest_root and crop_size arguments with your own values when you run face_align.py; 2) pass your own min_face_size, thresholds and nms_thresholds values to the detect_faces function of detector.py to match your practical requirements (a hedged example follows after this list); 3) if the face alignment API feels a bit slow, you can call the face resize API to first shrink images whose smaller side is larger than a threshold (specify the source_root, dest_root and min_side arguments with your own values) before calling the face alignment API:
    python face_resize.py
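  • As referenced in the key notes above, a minimal sketch of passing custom detection parameters to detect_faces (the image path and the specific values are illustrative only; check detector.py for the actual defaults):
    from PIL import Image
    from detector import detect_faces
    from visualization_utils import show_results

    img = Image.open('some_img.jpg') # modify the image path to yours
    bounding_boxes, landmarks = detect_faces(
        img,
        min_face_size = 20.0,             # example value: ignore faces smaller than this many pixels
        thresholds = [0.6, 0.7, 0.8],     # example values: P-Net / R-Net / O-Net confidence thresholds
        nms_thresholds = [0.7, 0.7, 0.7], # example values: NMS IoU thresholds for the three stages
    )
    show_results(img, bounding_boxes, landmarks) # visualize the results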
    

Data Processing

📊

  • Folder: ./balance
  • Remove low-shot data API (remove the low-shot classes with fewer than min_num samples in the training set root with the directory structure demonstrated in Sec. Usage, for data balance and effective model training):
    python remove_lowshot.py -root [root] -min_num [min_num]
    
    # python remove_lowshot.py -root './data/train' -min_num 10
    
  • Key notes for customized use: specify the root and min_num arguments with your own values when you run remove_lowshot.py.
  • We prefer to include the other data processing tricks, e.g., augmentation (horizontal flipping, scaling hue/saturation/brightness with coefficients uniformly drawn from [0.6, 1.4], adding PCA noise with a coefficient sampled from a normal distribution N(0, 0.1), etc.), weighted random sampling, normalization, etc., in the main training script in Sec. Training and Validation, so that it stays self-contained (a hedged augmentation sketch follows below).
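  • A minimal sketch of what such an augmentation pipeline could look like with torchvision (an illustration only, not the exact transform used in train.py; ColorJitter with a factor of 0.4 approximates the [0.6, 1.4] brightness/saturation scaling, its hue argument is an additive shift rather than a scale, and PCA noise would need a custom transform):
    import torchvision.transforms as transforms

    # example values only -- tune them for your own data
    augmentation = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness = 0.4, saturation = 0.4, hue = 0.05), # brightness/saturation factors drawn from [0.6, 1.4]
        transforms.ToTensor(),
        transforms.Normalize(mean = [0.5, 0.5, 0.5], std = [0.5, 0.5, 0.5]),
    ])
    # PCA ("lighting") noise with stddev 0.1 is not built into torchvision and needs a custom transform;
    # weighted random sampling is handled by the data loader (see Sec. Training and Validation).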

Training and Validation

  • Folder: ./

  • Configuration API (configure your overall settings for training & validation) config.py:

    import torch
    
    configurations = {
        1: dict(
            SEED = 1337, # random seed for reproduce results
    
            DATA_ROOT = '/media/pc/6T/jasonjzhao/data/faces_emore', # the parent root where your train/val/test data are stored
            MODEL_ROOT = '/media/pc/6T/jasonjzhao/buffer/model', # the root to buffer your checkpoints
            LOG_ROOT = '/media/pc/6T/jasonjzhao/buffer/log', # the root to log your train/val status
            BACKBONE_RESUME_ROOT = './', # the root to resume training from a saved checkpoint
            HEAD_RESUME_ROOT = './', # the root to resume training from a saved checkpoint
    
            BACKBONE_NAME = 'IR_SE_50', # support: ['ResNet_50', 'ResNet_101', 'ResNet_152', 'IR_50', 'IR_101', 'IR_152', 'IR_SE_50', 'IR_SE_101', 'IR_SE_152']
            HEAD_NAME = 'ArcFace', # support:  ['Softmax', 'ArcFace', 'CosFace', 'SphereFace', 'Am_softmax']
            LOSS_NAME = 'Focal', # support: ['Focal', 'Softmax']
    
            INPUT_SIZE = [112, 112], # support: [112, 112] and [224, 224]
            RGB_MEAN = [0.5, 0.5, 0.5], # for normalize inputs to [-1, 1]
            RGB_STD = [0.5, 0.5, 0.5],
            EMBEDDING_SIZE = 512, # feature dimension
            BATCH_SIZE = 512,
            DROP_LAST = True, # whether drop the last batch to ensure consistent batch_norm statistics
            LR = 0.1, # initial LR
            NUM_EPOCH = 125, # total epoch number (use the first 1/25 epochs to warm up)
            WEIGHT_DECAY = 5e-4, # do not apply to batch_norm parameters
            MOMENTUM = 0.9,
            STAGES = [35, 65, 95], # epoch stages to decay learning rate
    
            DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
            MULTI_GPU = True, # flag to use multiple GPUs; if you choose to train with a single GPU, you should first run "export CUDA_VISIBLE_DEVICES=device_id" to specify the GPU card you want to use
            GPU_ID = [0, 1, 2, 3], # specify your GPU ids
            PIN_MEMORY = True,
            NUM_WORKERS = 0,
    ),
    }
  • Train & validation API (everything about training & validation, i.e., package imports, hyperparameters & data loaders, model & loss & optimizer, training & validation & checkpoint saving) train.py. Since MS-Celeb-1M serves as the ImageNet of the face recognition field, we pre-train the face.evoLVe models on MS-Celeb-1M and perform validation on LFW, CFP_FF, CFP_FP, AgeDB, CALFW, CPLFW and Vggface2_FP. Let's dive into the details together step by step.

    • Import necessary packages:
      import torch
      import torch.nn as nn
      import torch.optim as optim
      import torchvision.transforms as transforms
      import torchvision.datasets as datasets
      
      from config import configurations
      from backbone.model_resnet import ResNet_50, ResNet_101, ResNet_152
      from backbone.model_irse import IR_50, IR_101, IR_152, IR_SE_50, IR_SE_101, IR_SE_152
      from head.metrics import ArcFace, CosFace, SphereFace, Am_softmax
      from loss.focal import FocalLoss
      from util.utils import make_weights_for_balanced_classes, get_val_data, separate_irse_bn_paras, separate_resnet_bn_paras, warm_up_lr, schedule_lr, perform_val, get_time, buffer_val, AverageMeter, accuracy
      
      from tensorboardX import SummaryWriter
      from tqdm import tqdm
      import os
    • Initialize hyperparameters:
      cfg = configurations[1]
      
      SEED = cfg['SEED'] # random seed for reproduce results
      torch.manual_seed(SEED)
      
      DATA_ROOT = cfg['DATA_ROOT'] # the parent root where your train/val/test data are stored
      MODEL_ROOT = cfg['MODEL_ROOT'] # the root to buffer your checkpoints
      LOG_ROOT = cfg['LOG_ROOT'] # the root to log your train/val status
      BACKBONE_RESUME_ROOT = cfg['BACKBONE_RESUME_ROOT'] # the root to resume training from a saved checkpoint
      HEAD_RESUME_ROOT = cfg['HEAD_RESUME_ROOT']  # the root to resume training from a saved checkpoint
      
      BACKBONE_NAME = cfg['BACKBONE_NAME'] # support: ['ResNet_50', 'ResNet_101', 'ResNet_152', 'IR_50', 'IR_101', 'IR_152', 'IR_SE_50', 'IR_SE_101', 'IR_SE_152']
      HEAD_NAME = cfg['HEAD_NAME'] # support:  ['Softmax', 'ArcFace', 'CosFace', 'SphereFace', 'Am_softmax']
      LOSS_NAME = cfg['LOSS_NAME'] # support: ['Focal', 'Softmax']
      
      INPUT_SIZE = cfg['INPUT_SIZE']
      RGB_MEAN = cfg['RGB_MEAN'] # for normalize inputs
      RGB_STD = cfg['RGB_STD']
      EMBEDDING_SIZE = cfg['EMBEDDING_SIZE'] # feature dimension
      BATCH_SIZE = cfg['BATCH_SIZE']
      DROP_LAST = cfg['DROP_LAST'] # whether drop the last batch to ensure consistent batch_norm statistics
      LR = cfg['LR'] # initial LR
      NUM_EPOCH = cfg['NUM_EPOCH']
      WEIGHT_DECAY = cfg['WEIGHT_DECAY']
      MOMENTUM = cfg['MOMENTUM']
      STAGES = cfg['STAGES'] # epoch stages to decay learning rate
      
      DEVICE = cfg['DEVICE']
      MULTI_GPU = cfg['MULTI_GPU'] # flag to use multiple GPUs
      GPU_ID = cfg['GPU_ID'] # specify your GPU ids
      PIN_MEMORY = cfg['PIN_MEMORY']
      NUM_WORKERS = cfg['NUM_WORKERS']
      print("=" * 60)
      print("Overall Configurations:")
      print(cfg)
      print("=" * 60)
      
      writer = SummaryWriter(LOG_ROOT) # writer for buffering intermediate results
    • Train & validation data loaders:
      train_transform = transforms.Compose([ # refer to https://pytorch.org/docs/stable/torchvision/transforms.html for more built-in online data augmentation
          transforms.Resize([int(128 * INPUT_SIZE[0] / 112), int(128 * INPUT_SIZE[0] / 112)]), # smaller side resized
          transforms.RandomCrop([INPUT_SIZE[0], INPUT_SIZE[1]]),
          transforms.RandomHorizontalFlip(),
          transforms.ToTensor(),
          transforms.Normalize(mean = RGB_MEAN,
                               std = RGB_STD),
      ])
      
      dataset_train = datasets.ImageFolder(os.path.join(DATA_ROOT, 'imgs'), train_transform)
      
      # create a weighted random sampler to process imbalanced data
      weights = make_weights_for_balanced_classes(dataset_train.imgs, len(dataset_train.classes))
      weights = torch.DoubleTensor(weights)
      sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, len(weights))
      
      train_loader = torch.utils.data.DataLoader(
          dataset_train, batch_size = BATCH_SIZE, sampler = sampler, pin_memory = PIN_MEMORY,
          num_workers = NUM_WORKERS, drop_last = DROP_LAST
      )
      
      NUM_CLASS = len(train_loader.dataset.classes)
      print("Number of Training Classes: {}".format(NUM_CLASS))
      
      lfw, cfp_ff, cfp_fp, agedb, calfw, cplfw, vgg2_fp, lfw_issame, cfp_ff_issame, cfp_fp_issame, agedb_issame, calfw_issame, cplfw_issame, vgg2_fp_issame = get_val_data(DATA_ROOT)
    • Define and initialize model (backbone & head):
      BACKBONE_DICT = {'ResNet_50': ResNet_50(INPUT_SIZE), 
                       'ResNet_101': ResNet_101(INPUT_SIZE), 
                       'ResNet_152': ResNet_152(INPUT_SIZE),
                       'IR_50': IR_50(INPUT_SIZE), 
                       'IR_101': IR_101(INPUT_SIZE), 
                       'IR_152': IR_152(INPUT_SIZE),
                       'IR_SE_50': IR_SE_50(INPUT_SIZE), 
                       'IR_SE_101': IR_SE_101(INPUT_SIZE), 
                       'IR_SE_152': IR_SE_152(INPUT_SIZE)}
      BACKBONE = BACKBONE_DICT[BACKBONE_NAME]
      print("=" * 60)
      print(BACKBONE)
      print("{} Backbone Generated".format(BACKBONE_NAME))
      print("=" * 60)
      
      HEAD_DICT = {'ArcFace': ArcFace(in_features = EMBEDDING_SIZE, out_features = NUM_CLASS, device_id = GPU_ID),
                   'CosFace': CosFace(in_features = EMBEDDING_SIZE, out_features = NUM_CLASS, device_id = GPU_ID),
                   'SphereFace': SphereFace(in_features = EMBEDDING_SIZE, out_features = NUM_CLASS, device_id = GPU_ID),
                   'Am_softmax': Am_softmax(in_features = EMBEDDING_SIZE, out_features = NUM_CLASS, device_id = GPU_ID)}
      HEAD = HEAD_DICT[HEAD_NAME]
      print("=" * 60)
      print(HEAD)
      print("{} Head Generated".format(HEAD_NAME))
      print("=" * 60)
    • Define and initialize loss function:
      LOSS_DICT = {'Focal': FocalLoss(), 
                   'Softmax': nn.CrossEntropyLoss()}
      LOSS = LOSS_DICT[LOSS_NAME]
      print("=" * 60)
      print(LOSS)
      print("{} Loss Generated".format(LOSS_NAME))
      print("=" * 60)
    • Define and initialize optimizer:
      if BACKBONE_NAME.find("IR") >= 0:
          backbone_paras_only_bn, backbone_paras_wo_bn = separate_irse_bn_paras(BACKBONE) # separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability
          _, head_paras_wo_bn = separate_irse_bn_paras(HEAD)
      else:
          backbone_paras_only_bn, backbone_paras_wo_bn = separate_resnet_bn_paras(BACKBONE) # separate batch_norm parameters from others; do not do weight decay for batch_norm parameters to improve the generalizability
          _, head_paras_wo_bn = separate_resnet_bn_paras(HEAD)
      OPTIMIZER = optim.SGD([{'params': backbone_paras_wo_bn + head_paras_wo_bn, 'weight_decay': WEIGHT_DECAY}, {'params': backbone_paras_only_bn}], lr = LR, momentum = MOMENTUM)
      print("=" * 60)
      print(OPTIMIZER)
      print("Optimizer Generated")
      print("=" * 60)
    • Whether to resume from a checkpoint or not:
      if BACKBONE_RESUME_ROOT and HEAD_RESUME_ROOT:
          print("=" * 60)
          if os.path.isfile(BACKBONE_RESUME_ROOT) and os.path.isfile(HEAD_RESUME_ROOT):
              print("Loading Backbone Checkpoint '{}'".format(BACKBONE_RESUME_ROOT))
              BACKBONE.load_state_dict(torch.load(BACKBONE_RESUME_ROOT))
              print("Loading Head Checkpoint '{}'".format(HEAD_RESUME_ROOT))
              HEAD.load_state_dict(torch.load(HEAD_RESUME_ROOT))
          else:
              print("No Checkpoint Found at '{}' and '{}'. Please Have a Check or Continue to Train from Scratch".format(BACKBONE_RESUME_ROOT, HEAD_RESUME_ROOT))
          print("=" * 60)
    • Whether to use multi-GPU or not:
      if MULTI_GPU:
          # multi-GPU setting
          BACKBONE = nn.DataParallel(BACKBONE, device_ids = GPU_ID)
          BACKBONE = BACKBONE.to(DEVICE)
      else:
          # single-GPU setting
          BACKBONE = BACKBONE.to(DEVICE)
    • Minor settings prior to training:
      DISP_FREQ = len(train_loader) // 100 # frequency to display training loss & acc
      
      NUM_EPOCH_WARM_UP = NUM_EPOCH // 25  # use the first 1/25 epochs to warm up
      NUM_BATCH_WARM_UP = len(train_loader) * NUM_EPOCH_WARM_UP  # use the first 1/25 epochs to warm up
      batch = 0  # batch index
    • Training & validation & checkpoint saving (use the first 1/25 epochs to warm up -- gradually increase the LR to its initial value to ensure stable convergence):
      for epoch in range(NUM_EPOCH): # start training process
          
          if epoch == STAGES[0]: # adjust LR for each training stage after warm up; you can also choose to adjust the LR manually (with slight modification) once a plateau is observed
              schedule_lr(OPTIMIZER)
          if epoch == STAGES[1]:
              schedule_lr(OPTIMIZER)
          if epoch == STAGES[2]:
              schedule_lr(OPTIMIZER)
      
          BACKBONE.train()  # set to training mode
          HEAD.train()
      
          losses = AverageMeter()
          top1 = AverageMeter()
          top5 = AverageMeter()
      
          for inputs, labels in tqdm(iter(train_loader)):
      
              if (epoch + 1 <= NUM_EPOCH_WARM_UP) and (batch + 1 <= NUM_BATCH_WARM_UP): # adjust LR for each training batch during warm up
                  warm_up_lr(batch + 1, NUM_BATCH_WARM_UP, LR, OPTIMIZER)
      
              # compute output
              inputs = inputs.to(DEVICE)
              labels = labels.to(DEVICE).long()
              features = BACKBONE(inputs)
              outputs = HEAD(features, labels)
              loss = LOSS(outputs, labels)
      
              # measure accuracy and record loss
              prec1, prec5 = accuracy(outputs.data, labels, topk = (1, 5))
              losses.update(loss.data.item(), inputs.size(0))
              top1.update(prec1.data.item(), inputs.size(0))
              top5.update(prec5.data.item(), inputs.size(0))
      
              # compute gradient and do SGD step
              OPTIMIZER.zero_grad()
              loss.backward()
              OPTIMIZER.step()
              
              # display training loss & acc every DISP_FREQ batches
              if ((batch + 1) % DISP_FREQ == 0) and batch != 0:
                  print("=" * 60)
                  print('Epoch {}/{} Batch {}/{}\t'
                        'Training Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                        'Training Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                        'Training Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                      epoch + 1, NUM_EPOCH, batch + 1, len(train_loader) * NUM_EPOCH, loss = losses, top1 = top1, top5 = top5))
                  print("=" * 60)
      
              batch += 1 # batch index
      
          # training statistics per epoch (buffer for visualization)
          epoch_loss = losses.avg
          epoch_acc = top1.avg
          writer.add_scalar("Training_Loss", epoch_loss, epoch + 1)
          writer.add_scalar("Training_Accuracy", epoch_acc, epoch + 1)
          print("=" * 60)
          print('Epoch: {}/{}\t'
                'Training Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                'Training Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                'Training Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
              epoch + 1, NUM_EPOCH, loss = losses, top1 = top1, top5 = top5))
          print("=" * 60)
      
          # perform validation & save checkpoints per epoch
          # validation statistics per epoch (buffer for visualization)
          print("=" * 60)
          print("Perform Evaluation on LFW, CFP_FF, CFP_FP, AgeDB, CALFW, CPLFW and VGG2_FP, and Save Checkpoints...")
          accuracy_lfw, best_threshold_lfw, roc_curve_lfw = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, lfw, lfw_issame)
          buffer_val(writer, "LFW", accuracy_lfw, best_threshold_lfw, roc_curve_lfw, epoch + 1)
          accuracy_cfp_ff, best_threshold_cfp_ff, roc_curve_cfp_ff = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, cfp_ff, cfp_ff_issame)
          buffer_val(writer, "CFP_FF", accuracy_cfp_ff, best_threshold_cfp_ff, roc_curve_cfp_ff, epoch + 1)
          accuracy_cfp_fp, best_threshold_cfp_fp, roc_curve_cfp_fp = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, cfp_fp, cfp_fp_issame)
          buffer_val(writer, "CFP_FP", accuracy_cfp_fp, best_threshold_cfp_fp, roc_curve_cfp_fp, epoch + 1)
          accuracy_agedb, best_threshold_agedb, roc_curve_agedb = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, agedb, agedb_issame)
          buffer_val(writer, "AgeDB", accuracy_agedb, best_threshold_agedb, roc_curve_agedb, epoch + 1)
          accuracy_calfw, best_threshold_calfw, roc_curve_calfw = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, calfw, calfw_issame)
          buffer_val(writer, "CALFW", accuracy_calfw, best_threshold_calfw, roc_curve_calfw, epoch + 1)
          accuracy_cplfw, best_threshold_cplfw, roc_curve_cplfw = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, cplfw, cplfw_issame)
          buffer_val(writer, "CPLFW", accuracy_cplfw, best_threshold_cplfw, roc_curve_cplfw, epoch + 1)
          accuracy_vgg2_fp, best_threshold_vgg2_fp, roc_curve_vgg2_fp = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, vgg2_fp, vgg2_fp_issame)
          buffer_val(writer, "VGGFace2_FP", accuracy_vgg2_fp, best_threshold_vgg2_fp, roc_curve_vgg2_fp, epoch + 1)
          print("Epoch {}/{}, Evaluation: LFW Acc: {}, CFP_FF Acc: {}, CFP_FP Acc: {}, AgeDB Acc: {}, CALFW Acc: {}, CPLFW Acc: {}, VGG2_FP Acc: {}".format(epoch + 1, NUM_EPOCH, accuracy_lfw, accuracy_cfp_ff, accuracy_cfp_fp, accuracy_agedb, accuracy_calfw, accuracy_cplfw, accuracy_vgg2_fp))
          print("=" * 60)
      
          # save checkpoints per epoch
          if MULTI_GPU:
              torch.save(BACKBONE.module.state_dict(), os.path.join(MODEL_ROOT, "Backbone_{}_Epoch_{}_Batch_{}_Time_{}_checkpoint.pth".format(BACKBONE_NAME, epoch + 1, batch, get_time())))
              torch.save(HEAD.state_dict(), os.path.join(MODEL_ROOT, "Head_{}_Epoch_{}_Batch_{}_Time_{}_checkpoint.pth".format(HEAD_NAME, epoch + 1, batch, get_time())))
          else:
              torch.save(BACKBONE.state_dict(), os.path.join(MODEL_ROOT, "Backbone_{}_Epoch_{}_Batch_{}_Time_{}_checkpoint.pth".format(BACKBONE_NAME, epoch + 1, batch, get_time())))
              torch.save(HEAD.state_dict(), os.path.join(MODEL_ROOT, "Head_{}_Epoch_{}_Batch_{}_Time_{}_checkpoint.pth".format(HEAD_NAME, epoch + 1, batch, get_time())))
  • Now you can start to play with face.evoLVe and run train.py. User-friendly information will pop up in your terminal:

    • About overall configuration:

    • About number of training classes:

    • About backbone details:

    • About head details:

    • About loss details:

    • About optimizer details:

    • About resume training:

    • About training status & statistics (when the batch index reaches DISP_FREQ or at the end of each epoch):

    • About validation statistics & save checkpoints (at the end of each epoch):

  • Monitor on-the-fly GPU occupancy with watch -d -n 0.01 nvidia-smi.

  • Please refer to Sec. Model Zoo for specific model weights and corresponding performance.

  • Feature extraction API (extract features from pre-trained models): ./util/extract_feature_v1.py (implemented with PyTorch built-in functions) and ./util/extract_feature_v2.py (implemented with OpenCV). An illustrative sketch follows at the end of this section.

  • Visualize training & validation statistics with tensorboardX (see Sec. Model Zoo):

    tensorboard --logdir /media/pc/6T/jasonjzhao/buffer/log
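  • As referenced above, a minimal feature extraction sketch using only PyTorch built-ins (an illustration, not ./util/extract_feature_v1.py itself; the checkpoint and image paths are placeholders and the L2-normalization is written inline):
    import torch
    import torchvision.transforms as transforms
    from PIL import Image

    from backbone.model_irse import IR_50

    # preprocessing consistent with the settings above: resize to 128, center-crop to 112, normalize to [-1, 1]
    transform = transforms.Compose([
        transforms.Resize([128, 128]),
        transforms.CenterCrop([112, 112]),
        transforms.ToTensor(),
        transforms.Normalize(mean = [0.5, 0.5, 0.5], std = [0.5, 0.5, 0.5]),
    ])

    DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    backbone = IR_50([112, 112])
    backbone.load_state_dict(torch.load('./backbone_ir50_ms1m_epoch120.pth', map_location = DEVICE)) # placeholder checkpoint path
    backbone.to(DEVICE).eval()

    img = Image.open('some_aligned_face.jpg').convert('RGB') # placeholder: an aligned face image
    batch = transform(img).unsqueeze(0).to(DEVICE) # add the batch dimension

    with torch.no_grad():
        feature = backbone(batch) # [1, 512] embedding
        feature = feature / feature.norm(dim = 1, keepdim = True) # L2-normalize
    print(feature.shape)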
    

Data Zoo

🐯

Database Version #Identity #Image #Frame #Video Download Link
LFW Raw 5,749 13,233 - - Google Drive, Baidu Drive
LFW Align_250x250 5,749 13,233 - - Google Drive, Baidu Drive
LFW Align_112x112 5,749 13,233 - - Google Drive, Baidu Drive
CALFW Raw 4,025 12,174 - - Google Drive, Baidu Drive
CALFW Align_112x112 4,025 12,174 - - Google Drive, Baidu Drive
CPLFW Raw 3,884 11,652 - - Google Drive, Baidu Drive
CPLFW Align_112x112 3,884 11,652 - - Google Drive, Baidu Drive
CASIA-WebFace Raw_v1 10,575 494,414 - - Baidu Drive
CASIA-WebFace Raw_v2 10,575 494,414 - - Google Drive, Baidu Drive
CASIA-WebFace Clean 10,575 455,594 - - Google Drive, Baidu Drive
MS-Celeb-1M Clean 100,000 5,084,127 - - Google Drive
MS-Celeb-1M Align_112x112 85,742 5,822,653 - - Google Drive
Vggface2 Clean 8,631 3,086,894 - - Google Drive
Vggface2_FP Align_112x112 - - - - Google Drive, Baidu Drive
AgeDB Raw 570 16,488 - - Google Drive, Baidu Drive
AgeDB Align_112x112 570 16,488 - - Google Drive, Baidu Drive
IJB-A Clean 500 5,396 20,369 2,085 Google Drive, Baidu Drive
IJB-B Raw 1,845 21,798 55,026 7,011 Google Drive
CFP Raw 500 7,000 - - Google Drive, Baidu Drive
CFP Align_112x112 500 7,000 - - Google Drive, Baidu Drive
Umdfaces Align_112x112 8,277 367,888 - - Google Drive, Baidu Drive
CelebA Raw 10,177 202,599 - - Google Drive, Baidu Drive
CACD-VS Raw 2,000 163,446 - - Google Drive, Baidu Drive
YTF Align_344x344 1,595 - 3,425 621,127 Google Drive, Baidu Drive
DeepGlint Align_112x112 180,855 6,753,545 - - Google Drive
UTKFace Align_200x200 - 23,708 - - Google Drive, Baidu Drive
BUAA-VisNir Align_287x287 150 5,952 - - Baidu Drive, PW: xmbc
CASIA NIR-VIS 2.0 Align_128x128 725 17,580 - - Baidu Drive, PW: 883b
Oulu-CASIA Raw 80 65,000 - - Baidu Drive, PW: xxp5
NUAA-ImposterDB Raw 15 12,614 - - Baidu Drive, PW: if3n
CASIA-SURF Raw 1,000 - - 21,000 Baidu Drive, PW: izb3
CASIA-FASD Raw 50 - - 600 Baidu Drive, PW: h5un
CASIA-MFSD Raw 50 - - 600
Replay-Attack Raw 50 - - 1,200
WebFace260M Raw 24M 2M - https://www.face-benchmark.org/
  • Remark: unzip CASIA-WebFace clean version with
    unzip casia-maxpy-clean.zip    
    cd casia-maxpy-clean    
    zip -F CASIA-maxpy-clean.zip --out CASIA-maxpy-clean_fix.zip    
    unzip CASIA-maxpy-clean_fix.zip
    
  • Remark: after unzipping, get the image data & pair ground truths from the AgeDB, CFP, LFW and VGGFace2_FP align_112x112 versions with
    import numpy as np
    import bcolz
    import os
    
    def get_pair(root, name):
        carray = bcolz.carray(rootdir = os.path.join(root, name), mode='r')
        issame = np.load('{}/{}_list.npy'.format(root, name))
        return carray, issame
    
    def get_data(data_root):
        agedb_30, agedb_30_issame = get_pair(data_root, 'agedb_30')
        cfp_fp, cfp_fp_issame = get_pair(data_root, 'cfp_fp')
        lfw, lfw_issame = get_pair(data_root, 'lfw')
        vgg2_fp, vgg2_fp_issame = get_pair(data_root, 'vgg2_fp')
        return agedb_30, cfp_fp, lfw, vgg2_fp, agedb_30_issame, cfp_fp_issame, lfw_issame, vgg2_fp_issame
    
    agedb_30, cfp_fp, lfw, vgg2_fp, agedb_30_issame, cfp_fp_issame, lfw_issame, vgg2_fp_issame = get_data(DATA_ROOT) # DATA_ROOT: the parent folder holding the unzipped validation sets
  • Remark: We share MS-Celeb-1M_Top1M_MID2Name.tsv (Google Drive, Baidu Drive), VGGface2_ID2Name.csv (Google Drive, Baidu Drive), VGGface2_FaceScrub_Overlap.txt (Google Drive, Baidu Drive), VGGface2_LFW_Overlap.txt (Google Drive, Baidu Drive), CASIA-WebFace_ID2Name.txt (Google Drive, Baidu Drive), CASIA-WebFace_FaceScrub_Overlap.txt (Google Drive, Baidu Drive), CASIA-WebFace_LFW_Overlap.txt (Google Drive, Baidu Drive), FaceScrub_Name.txt (Google Drive, Baidu Drive), LFW_Name.txt (Google Drive, Baidu Drive), LFW_Log.txt (Google Drive, Baidu Drive) to help researchers/engineers quickly remove the overlapping parts between their own private datasets and the public datasets.
  • Due to release license issues, for other face-related databases, please contact us directly for more details.

Model Zoo

🐒

  • Model

    Backbone Head Loss Training Data Download Link
    IR-50 ArcFace Focal MS-Celeb-1M_Align_112x112 Google Drive, Baidu Drive
    • Setting

      INPUT_SIZE: [112, 112]; RGB_MEAN: [0.5, 0.5, 0.5]; RGB_STD: [0.5, 0.5, 0.5]; BATCH_SIZE: 512 (drop the last batch to ensure consistent batch_norm statistics); Initial LR: 0.1; NUM_EPOCH: 120; WEIGHT_DECAY: 5e-4 (do not apply to batch_norm parameters); MOMENTUM: 0.9; STAGES: [30, 60, 90]; Augmentation: Random Crop + Horizontal Flip; Imbalanced Data Processing: Weighted Random Sampling; Solver: SGD; GPUs: 4 NVIDIA Tesla P40 in Parallel
      
    • Training & validation statistics

    • Performance

      LFW CFP_FF CFP_FP AgeDB CALFW CPLFW Vggface2_FP
      99.78 99.69 98.14 97.53 95.87 92.45 95.22
  • Model

    Backbone Head Loss Training Data Download Link
    IR-50 ArcFace Focal Private Asia Face Data Google Drive, Baidu Drive
    • Setting

      INPUT_SIZE: [112, 112]; RGB_MEAN: [0.5, 0.5, 0.5]; RGB_STD: [0.5, 0.5, 0.5]; BATCH_SIZE: 1024 (drop the last batch to ensure consistent batch_norm statistics); Initial LR: 0.01 (initialize weights from the above model pre-trained on MS-Celeb-1M_Align_112x112); NUM_EPOCH: 80; WEIGHT_DECAY: 5e-4 (do not apply to batch_norm parameters); MOMENTUM: 0.9; STAGES: [20, 40, 60]; Augmentation: Random Crop + Horizontal Flip; Imbalanced Data Processing: Weighted Random Sampling; Solver: SGD; GPUs: 8 NVIDIA Tesla P40 in Parallel
      
    • Performance (please perform evaluation on your own Asia face benchmark dataset)

  • Model

    Backbone Head Loss Training Data Download Link
    IR-152 ArcFace Focal MS-Celeb-1M_Align_112x112 Baidu Drive, PW: b197
    • Setting

      INPUT_SIZE: [112, 112]; RGB_MEAN: [0.5, 0.5, 0.5]; RGB_STD: [0.5, 0.5, 0.5]; BATCH_SIZE: 256 (drop the last batch to ensure consistent batch_norm statistics); Initial LR: 0.01; NUM_EPOCH: 120; WEIGHT_DECAY: 5e-4 (do not apply to batch_norm parameters); MOMENTUM: 0.9; STAGES: [30, 60, 90]; Augmentation: Random Crop + Horizontal Flip; Imbalanced Data Processing: Weighted Random Sampling; Solver: SGD; GPUs: 4 NVIDIA Geforce RTX 2080 Ti in Parallel
      
    • Training & validation statistics

    • Performance

      LFW CFP_FF CFP_FP AgeDB CALFW CPLFW Vggface2_FP
      99.82 99.83 98.37 98.07 96.03 93.05 95.50
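  • A minimal sketch of re-running the reported verification benchmarks with a released checkpoint (illustrative only: the paths are placeholders and the helper calls follow the usage shown in Sec. Training and Validation):
    import torch
    from backbone.model_irse import IR_50
    from util.utils import get_val_data, perform_val

    DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    DATA_ROOT = './data/faces_emore' # placeholder: the parent folder holding the validation sets
    EMBEDDING_SIZE, BATCH_SIZE, MULTI_GPU = 512, 512, False

    BACKBONE = IR_50([112, 112])
    BACKBONE.load_state_dict(torch.load('./backbone_ir50_ms1m_epoch120.pth', map_location = DEVICE)) # placeholder checkpoint path
    BACKBONE = BACKBONE.to(DEVICE)
    BACKBONE.eval() # inference mode

    lfw, cfp_ff, cfp_fp, agedb, calfw, cplfw, vgg2_fp, lfw_issame, cfp_ff_issame, cfp_fp_issame, agedb_issame, calfw_issame, cplfw_issame, vgg2_fp_issame = get_val_data(DATA_ROOT)

    accuracy_lfw, best_threshold_lfw, roc_curve_lfw = perform_val(MULTI_GPU, DEVICE, EMBEDDING_SIZE, BATCH_SIZE, BACKBONE, lfw, lfw_issame)
    print("LFW Acc: {}".format(accuracy_lfw))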

Achievement

🎊

  • 2017 No.1 on ICCV 2017 MS-Celeb-1M Large-Scale Face Recognition Hard Set/Random Set/Low-Shot Learning Challenges. WeChat News, NUS ECE News, NUS ECE Poster, Award Certificate for Track-1, Award Certificate for Track-2, Award Ceremony.

  • 2017 No.1 on National Institute of Standards and Technology (NIST) IARPA Janus Benchmark A (IJB-A) Unconstrained Face Verification challenge and Identification challenge. WeChat News.

  • State-of-the-art performance on

    • MS-Celeb-1M (Challenge1 Hard Set Coverage@P=0.95: 79.10%; Challenge1 Random Set Coverage@P=0.95: 87.50%; Challenge2 Development Set Coverage@P=0.99: 100.00%; Challenge2 Base Set Top 1 Accuracy: 99.74%; Challenge2 Novel Set Coverage@P=0.99: 99.01%).
    • IJB-A (1:1 Verification TAR@FAR=0.1: 99.6%±0.1%; 1:1 Verification TAR@FAR=0.01: 99.1%±0.2%; 1:1 Verification TAR@FAR=0.001: 97.9%±0.4%; 1:N Identification FNIR@FPIR=0.1: 1.3%±0.3%; 1:N Identification FNIR@FPIR=0.01: 5.4%±4.7%; 1:N Identification Rank1 Accuracy: 99.2%±0.1%; 1:N Identification Rank5 Accuracy: 99.7%±0.1%; 1:N Identification Rank10 Accuracy: 99.8%±0.1%).
    • IJB-C (1:1 Verification TAR@FAR=1e-5: 82.6%).
    • Labeled Faces in the Wild (LFW) (Accuracy: 99.85%±0.217%).
    • Celebrities in Frontal-Profile (CFP) (Frontal-Profile Accuracy: 96.01%±0.84%; Frontal-Profile EER: 4.43%±1.04%; Frontal-Profile AUC: 99.00%±0.35%; Frontal-Frontal Accuracy: 99.64%±0.25%; Frontal-Frontal EER: 0.54%±0.37%; Frontal-Frontal AUC: 99.98%±0.03%).
    • CMU Multi-PIE (Rank1 Accuracy Setting-1 under ±90°: 76.12%; Rank1 Accuracy Setting-2 under ±90°: 86.73%).
    • MORPH Album2 (Rank1 Accuracy Setting-1: 99.65%; Rank1 Accuracy Setting-2: 99.26%).
    • CACD-VS (Accuracy: 99.76%).
    • FG-NET (Rank1 Accuracy: 93.20%).

Acknowledgement

👬


Citation

📑

  • Please consult and consider citing the following papers:

    @article{wu20223d,
    title={3D-Guided Frontal Face Generation for Pose-Invariant Recognition},
    author={Wu, Hao and Gu, Jianyang and Fan, Xiaojin and Li, He and Xie, Lidong and Zhao, Jian},
    journal={T-IST},
    year={2022}
    }
    
    
    @article{wang2021face,
    title={Face.evoLVe: A High-Performance Face Recognition Library},
    author={Wang, Qingzhong and Zhang, Pengfei and Xiong, Haoyi and Zhao, Jian},
    journal={arXiv preprint arXiv:2107.08621},
    year={2021}
    }
    
    
    @article{tu2021joint,
    title={Joint Face Image Restoration and Frontalization for Recognition},
    author={Tu, Xiaoguang and Zhao, Jian and Liu, Qiankun and Ai, Wenjie and Guo, Guodong and Li, Zhifeng and Liu, Wei and Feng, Jiashi},
    journal={T-CSVT},
    year={2021}
    }
    
    
    @article{zhao2020towards,
    title={Towards age-invariant face recognition},
    author={Zhao, Jian and Yan, Shuicheng and Feng, Jiashi},
    journal={T-PAMI},
    year={2020}
    }
    
    
    @article{zhao2019recognizing,
    title={Recognizing Profile Faces by Imagining Frontal View},
    author={Zhao, Jian and Xing, Junliang and Xiong, Lin and Yan, Shuicheng and Feng, Jiashi},
    journal={IJCV},
    pages={1--19},
    year={2019}
    }    
    
    
    @inproceedings{zhao2019multi,
    title={Multi-Prototype Networks for Unconstrained Set-based Face Recognition},
    author={Zhao, Jian and Li, Jianshu and Tu, Xiaoguang and Zhao, Fang and Xin, Yuan and Xing, Junliang and Liu, Hengzhu and Yan, Shuicheng and Feng, Jiashi},
    booktitle={IJCAI},
    year={2019}
    }
    
    
    @inproceedings{zhao2019look,
    title={Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition},
    author={Zhao, Jian and Cheng, Yu and Cheng, Yi and Yang, Yang and Lan, Haochong and Zhao, Fang and Xiong, Lin and Xu, Yan and Li, Jianshu and Pranata, Sugiri and others},
    booktitle={AAAI},
    year={2019}
    }
    
    
    @article{zhao20183d,
    title={3D-Aided Dual-Agent GANs for Unconstrained Face Recognition},
    author={Zhao, Jian and Xiong, Lin and Li, Jianshu and Xing, Junliang and Yan, Shuicheng and Feng, Jiashi},
    journal={T-PAMI},
    year={2018}
    }
    
    
    @inproceedings{zhao2018towards,
    title={Towards Pose Invariant Face Recognition in the Wild},
    author={Zhao, Jian and Cheng, Yu and Xu, Yan and Xiong, Lin and Li, Jianshu and Zhao, Fang and Jayashree, Karlekar and Pranata, Sugiri and Shen, Shengmei and Xing, Junliang and others},
    booktitle={CVPR},
    pages={2207--2216},
    year={2018}
    }
    
    
    @inproceedings{zhao3d,
    title={3D-Aided Deep Pose-Invariant Face Recognition},
    author={Zhao, Jian and Xiong, Lin and Cheng, Yu and Cheng, Yi and Li, Jianshu and Zhou, Li and Xu, Yan and Karlekar, Jayashree and Pranata, Sugiri and Shen, Shengmei and others},
    booktitle={IJCAI},
    pages={1184--1190},
    year={2018}
    }
    
    
    @inproceedings{zhao2018dynamic,
    title={Dynamic Conditional Networks for Few-Shot Learning},
    author={Zhao, Fang and Zhao, Jian and Yan, Shuicheng and Feng, Jiashi},
    booktitle={ECCV},
    pages={19--35},
    year={2018}
    }
    
    
    @inproceedings{zhao2017dual,
    title={Dual-agent gans for photorealistic and identity preserving profile face synthesis},
    author={Zhao, Jian and Xiong, Lin and Jayashree, Panasonic Karlekar and Li, Jianshu and Zhao, Fang and Wang, Zhecan and Pranata, Panasonic Sugiri and Shen, Panasonic Shengmei and Yan, Shuicheng and Feng, Jiashi},
    booktitle={NeurIPS},
    pages={66--76},
    year={2017}
    }
    
    
    @inproceedings{zhao2017marginalized,
    title={Marginalized cnn: Learning deep invariant representations},
    author={Zhao, Jian and Li, Jianshu and Zhao, Fang and Yan, Shuicheng and Feng, Jiashi},
    booktitle={BMVC},
    year={2017}
    }
    
    
    @inproceedings{cheng2017know,
    title={Know you at one glance: A compact vector representation for low-shot learning},
    author={Cheng, Yu and Zhao, Jian and Wang, Zhecan and Xu, Yan and Jayashree, Karlekar and Shen, Shengmei and Feng, Jiashi},
    booktitle={ICCVW},
    pages={1924--1932},
    year={2017}
    }
    
    
    @inproceedings{wangconditional,
    title={Conditional Dual-Agent GANs for Photorealistic and Annotation Preserving Image Synthesis},
    author={Wang, Zhecan and Zhao, Jian and Cheng, Yu and Xiao, Shengtao and Li, Jianshu and Zhao, Fang and Feng, Jiashi and Kassim, Ashraf},
    booktitle={BMVCW},
    }
    

face.evolve's People

Contributors

clhne, insightcs, perfectzh, reatris, verasativa, zhaoj9014, zllrunning


face.evolve's Issues

amsoftmax: the sizes do not match?

cos_theta = torch.mm(embbedings, kernel_norm)

RuntimeError: size mismatch, m1: [1 x 5994], m2: [512 x 5994]

I use a ResNeXt as the backbone and pass the output feature to Am_softmax, but I get the error above: the kernel size does not match the feature size, as if the order is not correct. Am I using it incorrectly?

Question about MegaFace results

Hi~

Thank you for your great work!
We wonder whether the validation results (or code) for MegaFace will be reported.

Thank you.

Detail about the folder structure

Hi

What should be the training data folder structure?

should it be $rootfolder/data/train, $rootfolder/data/val, $rootfolder/data/test

or $rootfolder/data/imgs

Thanks in advance.

About the model of MTCNN

Hello,
Thanks for your great work!!! My question is: what is the difference between the MTCNN model provided in this repo and the original model released by the author? Did you retrain this model on your own dataset? Thanks!

Training difficulties

Hi,

I am training on casia-clean + align, using IR_SE_50 backbone and ArcFace head. Somehow the network just isn't learning well. The loss is around 24 for the first few epochs, and even after stage 1 adjustment, it drops only to around 19.
I see during your training that the loss starts around 10. Anything I am doing wrong / missing?

Thanks

Problem with dimension when trying to extract features.

Hi.

I get this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], but got 3-dimensional input of size [3, 112, 112] instead

in this line:
features = l2_norm(backbone(images.to(device)).cpu())
When running this code:

def get_transform(input_size = [112, 112], rgb_mean = [0.5, 0.5, 0.5], rgb_std = [0.5, 0.5, 0.5]): 
  transform = transforms.Compose([
  transforms.Resize([int(128 * input_size[0] / 112), int(128 * input_size[0] / 112)]), # smaller side resized
  transforms.CenterCrop([input_size[0], input_size[1]]),
  transforms.ToTensor(),
  transforms.Normalize(mean = rgb_mean, std = rgb_std)])
  return transform

def extract_feature(images, backbone, embedding_size = 512, 
                    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")):
  batch_size = len(images)
  features = None
  print("backbone:", backbone)
  with torch.no_grad():
    features = l2_norm(backbone(images.to(device)).cpu())
  return features

if __name__ == "__main__":
  cap = cv2.VideoCapture(0)
  backbone = load_backbone(IR_50(input_size = [112, 112]), './backbone_ir50_ms1m_epoch120.pth')
  transform = get_transform(input_size = [112, 112])
  while(True):
    ret, frame = cap.read()
    #cv_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) #cv2.COLOR_BGR2GRAY)
    pil_image = Image.fromarray(frame)
    bounding_boxes, landmarks = detect_faces(pil_image)
    faces = []
    for box in bounding_boxes:
      face = pil_image.crop( ( box[0], box[1], box[2] , box[3]))
      face = transform(face)
      faces.append(face.float())
    if len(faces) > 0:
      I_ = torch.cat(faces, 0)
      I_ = Variable(I_, requires_grad=False)
      features = extract_feature(I_, backbone)
      print("features:", features)

I am beginner with pytorch.

Thanks.

Asking for meta and sizes

I am trying to train the network on LFW.

My folder structure looks like:

$RootFolder/data/train, $RootFolder/data/val, $RootFolder/data/test

but when I run train.py, I get:

FileNotFoundError: [Errno 2] No such file or directory: '/media/ryan/shakira/face.evoLVe.PyTorch/data/lfw/meta/sizes'

Am I missing something?

Thanks in advance.

Question about retraining the model

Thank you very much for your high-performance repo!
I want to improve the recognition rate on my own dataset, so I used the released "backbone_ir50_asia.pth" as the pre-trained backbone model and retrained it, updating all of the backbone's parameters, but the validation accuracy on CALFW is only 56% even after 50 epochs. So I have decided to retrain only the fc1 and fc2 parameters of the backbone; do you think that will work?
Can you give me some advice on how to improve the model's recognition rate in real scenes, and how I should fine-tune the pre-trained backbone model?
Thanks, best wishes to you!

Cannot unzip ms1m_align_112.zip

Hello!

I downloaded the file ms1m_align_112.zip from this Google Drive link. However, I am getting the following weird error while extracting the data.

   creating: imgs/67619/
  inflating: imgs/67619/4636064.jpg  
  inflating: imgs/67619/4636050.jpg  
  inflating: imgs/67619/4636028.jpg  
  inflating: imgs/67619/4636004.jpg  
  inflating: imgs/67619/4635977.jpg  
  inflating: imgs/67619/4635994.jpg  
  inflating: imgs/67619/4636076.jpg  
  inflating: imgs/67619/4635998.jpg  
  inflating: imgs/67619/4635981.jpg  
  inflating: imgs/67619/4636018.jpg  
  inflating: imgs/67619/4636027.jpg  
  inflating: imgs/67619/4636043.jpg  
imgs:  mismatching "local" filename (imgs/67619/4636066.jpg),
         continuing with "central" filename version
replace imgs? [y]es, [n]o, [A]ll, [N]one, [r]ename:

Can you kindly look into it?

Regards!

General Question

I want to perform face verification. Correct me if I'm wrong, but most of the recognition models are trained on a million images or so.

I was thinking: if I combined most of the open-source datasets, like 1millionceleb, facesemore, lfw, etc., and trained the network, would that give me a better face embedding?

What do you think?

Thanks in advance.

REFERENCE_FACIAL_POINTS

I notice that you defined REFERENCE_FACIAL_POINTS in the file align_trans.py. Could you please tell me how you calculated its values? If I want to use more landmarks (detected by other models) to do face alignment, how can I calculate those reference points myself?

There are a lot of errors in this repo; for example, utils.py has extra parentheses. After solving nearly 50 problems I got stuck here

 python train.py
============================================================
Overall Configurations:
{'SEED': 1337, 'DATA_ROOT': '/media/mustafa/ubuntu_backup/face/face.evoLVe.PyTorch/alogn_faces', 'MODEL_ROOT': '/media/mustafa/ubuntu_backup/face/face.evoLVe.PyTorch/model', 'LOG_ROOT': '/media/mustafa/ubuntu_backup/face/face.evoLVe.PyTorch/log', 'BACKBONE_RESUME_ROOT': './', 'HEAD_RESUME_ROOT': './', 'BACKBONE_NAME': 'IR_SE_50', 'HEAD_NAME': 'ArcFace', 'LOSS_NAME': 'Focal', 'INPUT_SIZE': [112, 112], 'RGB_MEAN': [0.5, 0.5, 0.5], 'RGB_STD': [0.5, 0.5, 0.5], 'EMBEDDING_SIZE': 512, 'BATCH_SIZE': 512, 'DROP_LAST': True, 'LR': 0.1, 'NUM_EPOCH': 125, 'WEIGHT_DECAY': 0.0005, 'MOMENTUM': 0.9, 'STAGES': [35, 65, 95], 'DEVICE': device(type='cuda', index=0), 'MULTI_GPU': True, 'GPU_ID': [0, 1, 2, 3], 'PIN_MEMORY': True, 'NUM_WORKERS': 0}
============================================================
Traceback (most recent call last):
  File "train.py", line 73, in <module>
    weights = make_weights_for_balanced_classes(dataset_train.imgs, len(dataset_train.classes))
  File "/media/mustafa/ubuntu_backup/face/face.evoLVe.PyTorch/util/utils.py", line 47, in make_weights_for_balanced_classes
    weight_per_class[i] = N / float(count[i])
ZeroDivisionError: float division by zero

Failed building wheel for bcolz

The build fails with:
'''
gcc -pthread -B /data/software/anaconda3/envs/evoLVe/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -DHAVE_ZSTD=1 -Ibcolz -Ic-blosc/blosc -Ic-blosc/internal-complibs/zstd-1.3.4 -Ic-blosc/internal-complibs/lz4-1.8.1.2 -Ic-blosc/internal-complibs/snappy-1.1.1 -Ic-blosc/internal-complibs/zlib-1.2.8 -Ic-blosc/internal-complibs/zstd-1.3.4/compress -Ic-blosc/internal-complibs/zstd-1.3.4/dictBuilder -Ic-blosc/internal-complibs/zstd-1.3.4/decompress -Ic-blosc/internal-complibs/zstd-1.3.4/legacy -Ic-blosc/internal-complibs/zstd-1.3.4/common -Ic-blosc/internal-complibs/zstd-1.3.4/dll -Ic-blosc/internal-complibs/zstd-1.3.4/deprecated -I/data/software/anaconda3/envs/evoLVe/lib/python3.7/site-packages/numpy/core/include -I/data/software/anaconda3/envs/evoLVe/include/python3.7m -c c-blosc/internal-complibs/snappy-1.1.1/snappy-stubs-internal.cc -o build/temp.linux-x86_64-3.7/c-blosc/internal-complibs/snappy-1.1.1/snappy-stubs-internal.o -DSHUFFLE_SSE2_ENABLED -msse2 -DSHUFFLE_AVX2_ENABLED -mavx2
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
c-blosc/internal-complibs/snappy-1.1.1/snappy-stubs-internal.cc:29:21: fatal error: algorithm: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1


Failed building wheel for bcolz
Running setup.py clean for bcolz
'''

How do I fix it?

The network detects 18 different versions of me (distance greater than 0.99) when moving my head.

Hi.

I have developed a little app that, when it detects a face, tries to find it in the database, and if it's not there, it creates a new record.
As I move my head, the app thinks I am a new user (distance >= 0.99) and creates a new entry in the database.
Up to 18 records have been created.

I was thinking of developing a professional app for controlling access to places, etc.
How can I filter out (or otherwise avoid) these extra records created by different head poses?

Thanks.

Where are the MTCNN parameters from?

Thanks for your work.
Did you train the MTCNN parameters p/o/rnet.npy yourself, or did you take them from somewhere else? I'd like to know how they were trained.

Face alignment speed up and GPU usage

To whom it may concern,

This repo provides really amazing tools. Thanks for the great work.
I tried face alignment and feature extraction using this lib. I found that face alignment can take 1.3 s to process an image. After reading the code, I realized the MTCNN is not running on the GPU, so I made a few small changes, e.g., torch.FloatTensor => torch.cuda.FloatTensor, PNet() => PNet().cuda(), etc.

This reduced the face alignment time per image from 1.3 s to 0.8 s. It works; however, the result still does not satisfy me. Is there a way to make the face detection/alignment run faster?

There is another thing that confuses me: the GPU usage is very low, 1%~2%. Please see the attachments.

screen shot 2019-02-26 at 12 22 32

I'm not sure whether this is because I didn't configure the GPU properly or whether it is just one of the advantages of this library.
The installed CUDA version is 9.2 and the cuDNN version is 7.4; the graphics card is an RTX 2070. It reports an error after I run the Python code. Can anyone tell me how to fix it?
screen shot 2019-02-26 at 12 23 37

Again, many thanks for the great work!

How to disable this user warning?

When running the detector to show the detection result, there is a warning:

/align/get_nets.py:70: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  a = F.softmax(a)
detector.py:82: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  img_boxes = Variable(torch.FloatTensor(img_boxes), volatile = True)

If I use a pretrained model do I need to train to recognize people?

Hi.

What I have understood is that a trained model will assign a vector to each face. With each vector I check in the database whether that person is already registered (distance between vectors <= 0.99).
Is this wrong? Do I have to train the network with the faces of the people I want it to recognize?

I have read the documentation several times but I don't know how to assign a vector to a face. Where can I find that code?

Thanks.

About model parallel for weight matrix in the head

Thank you very much for your high performance repo!

By splitting the large matrix (refer to here),
the memory consumption is more balanced; however, the training process does not seem to speed up. I guess the bottleneck lies in the communication cost of transferring the head matrix from device to device.

May I ask how to implement parallel_module_local_v1.py in insightface efficiently?

Looking forward to your suggestions! Thank you!

About some problems in evaluation

Thank you for your great repo! May I ask some questions about evaluation?

  • The model is trained on RGB images but evaluated on BGR images, which causes slight performance degradation; to be specific, the accuracy on cfp_fp can reach 98% if this problem is fixed.
  • May I ask why you use ccrop, i.e., first resizing 112x112 to 128x128 and then cropping to 112x112?
  • What is the difference from the validation dataset released by InsightFace_Pytorch? I notice that using your released cfp_fp yields better accuracy.

I am trying to extract faces and train

Hello,
I have live feeds streaming from stores, and I want to automatically extract each person's face and save it.
Currently I am doing that manually, cropping each face and saving it in a folder with a unique ID.
I also tried OpenCV's model, but it was not accurate; then I used the DNN detector and got decent accuracy, but I sometimes get blurry faces, maybe because people move quickly or because of camera issues. Does anyone know how I can save each detected face with a unique ID inside its own folder? The script is given below; I have searched a lot but did not find anything like automatic face extraction from video.

I am using this git project, which uses kNN, but it is not scalable.
What I am trying to do is extract faces and keep them in per-ID folders, so that each face has a unique ID (see the sketch after the code below).

My Questions Are

  1. Is there any way to solve the above problem?
  2. How many images of a person's face do I need to provide to train the model using your library?
  3. Do I need to label my data (e.g., with bounding boxes), or is keeping the photos in separate folders enough, assuming my first problem is solved?

DNN code:
# USAGE
# python object_tracker.py --prototxt deploy.prototxt --model res10_300x300_ssd_iter_140000.caffemodel

# import the necessary packages
from data.centroidtracker import CentroidTracker
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize our centroid tracker and frame dimensions
ct = CentroidTracker()
(H, W) = (None, None)

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# loop over the frames from the video stream
while True:
	# read the next frame from the video stream and resize it
	frame = vs.read()
	frame = imutils.resize(frame, width=400)

	# if the frame dimensions are None, grab them
	if W is None or H is None:
		(H, W) = frame.shape[:2]

	# construct a blob from the frame, pass it through the network,
	# obtain our output predictions, and initialize the list of
	# bounding box rectangles
	blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H),
		(104.0, 177.0, 123.0))
	net.setInput(blob)
	detections = net.forward()
	rects = []

	# loop over the detections
	for i in range(0, detections.shape[2]):
		# filter out weak detections by ensuring the predicted
		# probability is greater than a minimum threshold
		if detections[0, 0, i, 2] > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the object, then update the bounding box rectangles list
			box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
			rects.append(box.astype("int"))

			# draw a bounding box surrounding the object so we can
			# visualize it
			(startX, startY, endX, endY) = box.astype("int")
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				(0, 255, 0), 2)

	# update our centroid tracker using the computed set of bounding
	# box rectangles
	objects = ct.update(rects)

	# loop over the tracked objects
	for (objectID, centroid) in objects.items():
		# draw both the ID of the object and the centroid of the
		# object on the output frame
		text = "ID {}".format(objectID)
		cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
		cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
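As a starting point for the per-ID saving question above, here is a small hypothetical helper (the function name and folder layout are made up for illustration). It could be called wherever a detection box and its tracker ID are both available; matching boxes to the CentroidTracker IDs would still need a small extension of the tracker and is left out here:

import os
import cv2

def save_face_crop(frame, box, object_id, out_root="faces"):
    # crop one detected face and append it to a folder named after its tracker ID
    (startX, startY, endX, endY) = box
    face = frame[max(startY, 0):endY, max(startX, 0):endX]
    if face.size == 0:  # skip boxes that fall outside the frame
        return
    folder = os.path.join(out_root, "id_{:04d}".format(object_id))
    os.makedirs(folder, exist_ok=True)
    filename = "{:06d}.jpg".format(len(os.listdir(folder)))
    cv2.imwrite(os.path.join(folder, filename), face)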

Why not MxNet

@ZhaoJ9014
First, thank you for NUS-Panasonic's great work!
As for TensorFlow's dynamic computation graph approach, just try Eager Execution and TensorFlow Fold :)
Another similar question, #14:
Could you explain in detail why you did not use the MXNet framework (the ArcFace source code)?
Thanks.

Question about the required training epochs

Hi~

From the original paper, the required number of training iterations for MS1MV2 is 180K.
Converting this into training epochs gives 512 * 180K / 5.8M ≈ 16 epochs.
(If there is any error in this estimate, please correct me.)

The provided pre-trained model (IR-50) was trained for up to 120 epochs.
We would like to know how many training epochs are sufficient for training on MS1MV2.

On the other hand, we validated the pre-trained model on MegaFace (cleaned by deepinsight).
The accuracy is less than 95%.
Is this result reasonable?

Thank you.

Cannot unzip ms1m_align_112.zip

Hi~

We downloaded the dataset from Google Drive,
but we cannot unzip the file ms1m_align_112.zip (>25GB).

Could you look into this issue?
If possible, could you also provide the MD5 checksum of this dataset (or of all the datasets)?
Thank you.
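In the meantime, a small sketch (not repo code) for computing an MD5 checksum locally, so a re-downloaded archive can be compared against a published value once one is shared:

import hashlib

def md5_of(path, chunk_size=1 << 20):
    # hash a large file in 1 MB chunks so it never has to fit in memory
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of("ms1m_align_112.zip"))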

training loss nan

Thanks for sharing this fantastic repository!
However, when I train on my own dataset, the training loss becomes NaN after a few epochs.
Could you tell me how to solve this problem?

Inference code

Hi

Thanks for sharing your work,

I don't have the resources to train any models, but is there inference-only code available so that I can test on my own images?

Thanks

Performance Issue

https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/blob/master/align/detector.py#L24

# LOAD MODELS
pnet = PNet()
rnet = RNet()
onet = ONet()
onet.eval()

The models are loaded inside detect_faces().

So in face_align.py
for subfolder in tqdm(os.listdir(source_root)):
    if not os.path.isdir(os.path.join(dest_root, subfolder)):
        os.mkdir(os.path.join(dest_root, subfolder))
    for image_name in os.listdir(os.path.join(source_root, subfolder)):
        print("Processing\t{}".format(os.path.join(source_root, subfolder, image_name)))
        img = Image.open(os.path.join(source_root, subfolder, image_name))
        try: # Handle exception
            _, landmarks = detect_faces(img)

So the models are reloaded for every single image.
They should be created once, outside of the function.
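One possible shape of the fix, sketched under the assumption that detector.py is refactored so detect_faces accepts pre-built nets; the caching helper below is hypothetical, not current repo code:

from align.get_nets import PNet, RNet, ONet

_NETS = None

def get_mtcnn_nets():
    # create PNet/RNet/ONet a single time and hand back the cached instances
    global _NETS
    if _NETS is None:
        pnet, rnet, onet = PNet(), RNet(), ONet()
        onet.eval()
        _NETS = (pnet, rnet, onet)
    return _NETS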

extract_feature_v1.py: I think l2_norm is missing on line 75.

Hi.

while idx + batch_size <= len(loader.dataset):
    batch, _ = iter(loader).next()
    if tta:
        fliped = hflip_batch(batch)
        emb_batch = backbone(batch.to(device)).cpu() + backbone(fliped.to(device)).cpu()
        features[idx:idx + batch_size] = l2_norm(emb_batch)
    else:
        features[idx:idx + batch_size] = backbone(batch.to(device)).cpu()  # line 75
    idx += batch_size

if idx < len(loader.dataset):
    batch, _ = iter(loader).next()
    if tta:
        fliped = hflip_batch(batch)
        emb_batch = backbone(batch.to(device)).cpu() + backbone(fliped.to(device)).cpu()
        features[idx:] = l2_norm(emb_batch)
    else:
        features[idx:] = l2_norm(backbone(batch.to(device)).cpu())

Line 75 seems to be missing the l2_norm call that the other three branches have.

I would also appreciate an explanation of why l2_norm is necessary. If the aim is to search the database for who a person is, normalizing doesn't seem like a good idea.

Thanks.
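On the second point, a small illustration (not repo code) of why L2 normalization is commonly applied before database search: on unit-length embeddings, squared Euclidean distance and cosine similarity are directly related, so a fixed distance threshold behaves consistently no matter how large the raw embedding magnitudes are:

import torch
import torch.nn.functional as F

a = F.normalize(torch.randn(512), dim=0)
b = F.normalize(torch.randn(512), dim=0)

cosine = torch.dot(a, b)            # cosine similarity of the two unit vectors
dist_sq = torch.sum((a - b) ** 2)   # squared Euclidean distance between them
print(torch.allclose(dist_sq, 2 - 2 * cosine, atol=1e-6))  # True: d^2 = 2 - 2*cos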
