supernotman / retinaface_pytorch

Reimplement RetinaFace with Pytorch

Python 100.00%

retinaface_pytorch's Introduction

RetinaFace_Pytorch

Reimplement RetinaFace with Pytorch

Installation

Clone and install requirements

$ git clone https://github.com/supernotman/RetinaFace_Pytorch.git
$ cd RetinaFace_Pytorch/
$ sudo pip install -r requirements.txt

Pytorch version 1.1.0+ and torchvision 0.3.0+ are needed.

Data

Download widerface dataset
Download annotations (face bounding boxes & five facial landmarks) from baidu cloud or dropbox
Organise the dataset directory as follows:

  widerface/
    train/
      images/
      label.txt
    val/
      images/
      label.txt
    test/
      images/
      label.txt
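
A quick way to confirm the layout above before training is to check that each split has its images/ folder and label.txt. This is only a sketch: the helper name missing_entries is hypothetical and is not part of this repository.

```python
from pathlib import Path

# Splits and entries taken from the directory layout shown above.
SPLITS = ("train", "val", "test")


def missing_entries(root):
    """Return the expected paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for split in SPLITS:
        images = root / split / "images"
        labels = root / split / "label.txt"
        if not images.is_dir():
            missing.append(images)
        if not labels.is_file():
            missing.append(labels)
    return missing
```

Calling missing_entries("widerface") returns an empty list when the directory matches the layout.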

Train

$ train.py [-h] [--data_path DATA_PATH] [--batch BATCH]
                [--epochs EPOCHS]
                [--shuffle SHUFFLE] [--img_size IMG_SIZE]
                [--verbose VERBOSE] [--save_step SAVE_STEP]
                [--eval_step EVAL_STEP]
                [--save_path SAVE_PATH]
                [--depth DEPTH]

Example

For multi-GPU training, run:

$ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python train.py --data_path /widerface --batch 32 --save_path ./out

Training log

---- [Epoch 39/200, Batch 400/403] ----
+----------------+-----------------------+
| loss name      | value                 |
+----------------+-----------------------+
| total_loss     | 0.09969855844974518   |
| classification | 0.09288528561592102   |
| bbox           | 0.0034053439740091562 |
| landmarks      | 0.003407923271879554  |
+----------------+-----------------------+
-------- RetinaFace Pytorch --------
Evaluating epoch 39
Recall: 0.7432201780921814
Precision: 0.906913273261629

Pretrained model

You can download the model from baidu cloud or dropbox

Detect

Image

$ python detect.py --model_path model.pt --image_path 4.jpg

Video

$ python video_detect.py --model_path model.pt

Pose

I found something interesting and added it to the code: head pose estimation with Hopenet (https://github.com/natanielruiz/deep-head-pose). You can now estimate head pose with RetinaFace and Hopenet. Download the pose model, then run:

$ python pose_detect.py --f_model model.pt --p_model hopenet.pkl --image_path test.jpg

You can also run pose detection on a video:

$ python pose_detect.py --f_model model.pt --p_model hopenet.pkl --type video --video_path test.avi

Todo:


retinaface_pytorch's Issues

How should the scales be calculated, or should the default (1.0) be used?

Running the same picture at different scales gives different results. The detector does not adapt to pictures of different sizes, and it cannot detect both small and large faces. Is there a way to deal with this?
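
One common heuristic, offered here as a sketch and not as this repository's method, is to pick the scale so that the shorter image side reaches a target length while the longer side stays under a cap. The name choose_scale and the default values 640 and 1280 are assumptions for illustration.

```python
def choose_scale(height, width, target_short=640, max_long=1280):
    """Pick a resize factor from the image size (hypothetical helper).

    The shorter side is scaled to `target_short`, unless that would push
    the longer side past `max_long`, in which case the longer side is
    scaled to `max_long` instead.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = target_short / short_side
    if round(scale * long_side) > max_long:
        scale = max_long / long_side
    return scale


print(choose_scale(640, 640))   # 1.0
print(choose_scale(500, 2000))  # 0.64
```

Detections from several such scales can also be merged with non-maximum suppression, at the cost of one forward pass per scale.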

focal_loss = False

        focal_loss = False
        # focal loss
        if focal_loss:
            alpha = 0.25
            gamma = 2.0
            alpha_factor = torch.ones(targets.shape).cuda() * alpha

            alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, 1. - alpha_factor)
            focal_weight = torch.where(torch.eq(targets, 1.), 1. - classification, classification)
            focal_weight = alpha_factor * torch.pow(focal_weight, gamma)

            bce = -(targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification))

            cls_loss = focal_weight * bce

            cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, torch.zeros(cls_loss.shape).cuda())

            classification_losses.append(cls_loss.sum()/torch.clamp(num_positive_anchors.float(), min=1.0))
        else:
            if positive_indices.sum() > 0:
                classification_losses.append(positive_losses.mean() + sorted_losses.mean())
            else:
                classification_losses.append(torch.tensor(0).float().cuda())

Since focal_loss is hard-coded to False, is the focal loss branch never used?
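
For reference, the focal branch quoted above can be written out for a single anchor in plain Python. This is a scalar sketch of the same arithmetic, not the repository's tensor code; the eps clamp is an added assumption to keep the logarithm finite.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one anchor.

    p      -- predicted face probability
    target -- 1 (face), 0 (background) or -1 (ignored anchor)
    """
    if target == -1:
        return 0.0  # ignored anchors contribute nothing, as in the snippet
    p = min(max(p, eps), 1.0 - eps)  # assumption: clamp to avoid log(0)
    if target == 1:
        alpha_t, p_t = alpha, p
    else:
        alpha_t, p_t = 1.0 - alpha, 1.0 - p
    # (1 - p_t) ** gamma down-weights anchors that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confident correct prediction such as focal_loss(0.9, 1) yields a much smaller loss than a hesitant one such as focal_loss(0.6, 1), which is the intended effect of the focal weight.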

Loading the model fails at inference time

RuntimeError: Error(s) in loading state_dict for RetinaFace
Missing key(s) in state_dict: "body.conv1.weight", "body.bn1.weight", "body.bn1.bias",....
Unexpected key(s) in state_dict: "module.body.conv1.weight", "module.body.bn1.weight",...
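
The "module." prefix in the unexpected keys is what torch.nn.DataParallel adds when its state_dict is saved. A minimal sketch of one workaround is to strip the prefix before loading into an unwrapped model; strip_module_prefix is a hypothetical helper name, not part of this repository.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of `state_dict` whose keys no longer start with `prefix`."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned
```

The result can then be passed to load_state_dict on the plain model, for example after reading the checkpoint with torch.load("model.pt", map_location="cpu"). The alternative is to wrap the model in DataParallel before loading, as the training script does.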

The pretrained model

Hi, I can't reproduce the reported precision. Could you share model_epoch_200.pt? Thanks.

Unable to detect rotated faces / landmarks inaccurate

These two test images look better in other PyTorch implementations of RetinaFace, e.g.:
https://github.com/bogireddytejareddy/retinaface-pytorch/blob/master/test_results/t1.jpg
https://github.com/bogireddytejareddy/retinaface-pytorch/blob/master/test_results/t4.jpg

About context module

I think there may be a mistake in the channel counts of the context module:

x1 = self.det_conv1(x) # 256 channels
x_ = self.det_context_conv1(x) # 128 channels
x2 = self.det_context_conv2(x_) # 128 channels
x3_ = self.det_context_conv3_1(x_) # 128 channels
x3 = self.det_context_conv3_2(x3_) # 128 channels

After concatenating x1, x2, and x3 I get 512 channels, which is inconsistent with the paper (256 channels).
Am I misunderstanding something?
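
The arithmetic behind the question can be checked directly. The half/quarter/quarter split below is an assumption, one way to make the concatenation come out to 256 channels, and is not confirmed as the configuration used in the paper or in this repository.

```python
def context_module_channels(out_channels):
    """Branch widths whose concatenation equals `out_channels` (assumed split)."""
    branch_3x3 = out_channels // 2
    branch_5x5 = out_channels // 4
    branch_7x7 = out_channels // 4
    return branch_3x3, branch_5x5, branch_7x7


widths_in_issue = (256, 128, 128)         # x1, x2, x3 as listed above
print(sum(widths_in_issue))               # 512
print(context_module_channels(256))       # (128, 64, 64)
print(sum(context_module_channels(256)))  # 256
```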

Landmark won't converge

I am currently trying to train with Caffe myself, but the landmark regression is very poor. Do you have any experience or tips to share? 🙏

FPS of the model and small face

Thanks for your work!
Is there a speed test (FPS) for your model? Thanks!

Where is the dense regression loss? I cannot find it.

Where are the landmark labels?

Hi, I could not find the landmarks in your annotation data. I'm training a model with ResNet-18 and the landmark loss does not decrease. Are landmarks and bounding boxes trained separately?

element 0 of tensors does not require grad and does not have a grad_fn

Thank you for open-sourcing this. I ran into the following problem at epoch 104 of training. Can you help me? Thanks.

Traceback (most recent call last):
  File "train.py", line 156, in <module>
    main()
  File "train.py", line 111, in main
    loss.backward()
  File "/home/boyun/.conda/envs/retinaface/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/boyun/.conda/envs/retinaface/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

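Can the input image be of any size?

detect.py has a step that pads the image. Have you considered that, for images that are not 640×640, the face box and landmark positions produced by the model after padding and resizing will be offset relative to the original image? Should this offset be corrected before display?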
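
A minimal sketch of such a correction, assuming the image was first resized by a single factor and then padded on the left and top by known amounts. The actual order and sides used in detect.py may differ; to_original_coords and its parameters are hypothetical names.

```python
def to_original_coords(points, scale, pad_left=0, pad_top=0):
    """Map a flat [x0, y0, x1, y1, ...] list back to original-image pixels.

    Assumes preprocessing was: resize by `scale`, then pad by
    (`pad_left`, `pad_top`). The inverse removes the padding offset first
    and then undoes the resize.
    """
    restored = []
    for i, value in enumerate(points):
        offset = pad_left if i % 2 == 0 else pad_top  # even index = x, odd = y
        restored.append((value - offset) / scale)
    return restored


# A box predicted on the padded, half-size image:
print(to_original_coords([110, 60, 210, 160], scale=0.5, pad_left=10, pad_top=10))
# [200.0, 100.0, 400.0, 300.0]
```

The same function works for the five landmarks, since they are also stored as alternating x and y values.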

Have you tested on widerface val?

I got the following results with image size (1200, 1200):
Easy Val AP: 0.721983363755764
Medium Val AP: 0.742308954563704
Hard Val AP: 0.6196879642610857

Is there something wrong?

Why not build on the mmdetection framework?

It has DCNv2 and more.

Validation error

Hello everyone,

I need help. I get this error when I try to run train.py:
Evaluating epoch 0
0%| | 0/3226 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 151, in <module>
    main()
  File "train.py", line 136, in main
    recall, precision = eval_widerface.evaluate(dataloader_val,retinaface)
  File "C:\Desktop\RetinaFace_super\eval_widerface.py", line 74, in evaluate
    for data in tqdm(iter(val_data)):
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\tqdm\std.py", line 1099, in __iter__
    for obj in iterable:
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\Desktop\RetinaFace_super\dataloader.py", line 348, in __getitem__
    annotation[0,0] = label[0] # x1
ValueError: setting an array element with a sequence.

0%| | 0/3226 [00:00<?, ?it/s]

What is the average inference speed per image?

Has anyone tested the speed on an image with a decent GPU, such as a GTX 1080 Ti?

ModuleNotFoundError: No module named 'torchvision.models._utils'

How can I crop an image in video_detect.py?

About data augmentation

Hello,
Which data augmentation did you use in your actual training? I saw several methods that you had commented out, but I'm not sure which ones you actually used.

BTW, many of them do not work and have bugs.

For example, add this at line 297 in dataloader.py:

pad = torch.from_numpy(np.array(pad))

before this line:

padded_img = F.pad(img, pad, "constant", value=0)

Otherwise it raises:

TypeError: narrow(): argument 'start' (position 2) must be int, not numpy.int64
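
Another way to avoid this error, sketched here under assumptions, is to build the padding tuple from plain Python ints, since the message complains about numpy.int64 values. The helper square_pad and the choice to split the padding evenly on both sides are assumptions and may not match dataloader.py exactly; the tuple order (left, right, top, bottom) is the one torch.nn.functional.pad uses for the last two dimensions.

```python
def square_pad(height, width):
    """Return (left, right, top, bottom) padding that makes the image square.

    int() converts numpy integer scalars to built-in ints, which is what
    the padding call expects.
    """
    diff = abs(int(height) - int(width))
    before, after = diff // 2, diff - diff // 2
    if height <= width:
        return (0, 0, before, after)  # pad top and bottom
    return (before, after, 0, 0)      # pad left and right


print(square_pad(480, 640))  # (0, 0, 80, 80)
```

The returned tuple can be passed directly as the pad argument in the F.pad call quoted above.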

How to install cpools==0.0.0

Anaconda and pip can't install cpools. Can you help me?

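The model file hopenet_robust_alpha1.pkl cannot be downloaded

The download from Google always fails. Could you upload it to Baidu?

Is multi-class detection feasible with RetinaFace?

Hello, I am using your code for face detection. I had a sudden idea: I would like to use it to detect human bodies with body keypoints as well as faces with facial landmarks. Is this feasible?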

Allow for dynamic input sizes / anchor sizes

Currently when tracing the model, the following two warnings apply:

/d/dev/RetinaFace_Pytorch/anchors.py:27: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
image_shape = np.array(image_shape)
/d/dev/RetinaFace_Pytorch/anchors.py:40: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return torch.from_numpy(all_anchors.astype(np.float32)).cuda()

The model is then using a hardcoded 640x640 input size and anchors whereas the input size should be dynamic.
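
To make the anchors follow the input size, the feature map shapes have to be computed from the actual image height and width instead of a fixed 640. The sketch below shows that computation in plain Python; the strides (8, 16, 32), the ceiling rounding, and the function names are assumptions and are not taken from anchors.py.

```python
import math


def feature_map_shapes(height, width, strides=(8, 16, 32)):
    """Feature map (rows, cols) per pyramid level for a given input size."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


def anchor_centers(height, width, stride):
    """Centre (x, y) of every anchor cell at one pyramid level, row by row."""
    rows, cols = math.ceil(height / stride), math.ceil(width / stride)
    return [((c + 0.5) * stride, (r + 0.5) * stride)
            for r in range(rows) for c in range(cols)]


print(feature_map_shapes(640, 640))  # [(80, 80), (40, 40), (20, 20)]
print(feature_map_shapes(480, 600))  # [(60, 75), (30, 38), (15, 19)]
```

For tracing, the same computation would need to be expressed with tensor operations on the input shape so that it is recorded in the graph and not baked in as a constant.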

The annotation links are not available. Could you update them?

No prior_box?

There seems to be no prior_box part in this code. Is it unnecessary?

Question about focal loss

@supernotman Hi, thank you for this great project.
May I ask why you use cross-entropy loss for the classification head rather than focal loss? Focal loss is the key feature of RetinaNet.

Will you upload your pretrained model?

Great job! And could you upload your pretrained model?
Or could you send me by mail? Thank you!

About labels and training

Hello,
I am following your instructions to train the network. However, the label file on the WIDER FACE website does not match what you describe in the instructions. I renamed the bounding box annotation txt file to label.txt, but dataloader.py cannot read it. What is the solution?
To be clearer, the file from the WIDER FACE website looks like this:

0--Parade/0_Parade_marchingband_1_849.jpg
1
449 330 122 149 0 0 0 0 0 0
0--Parade/0_Parade_Parade_0_904.jpg
1
361 98 263 339 0 0 0 0 0 0
0--Parade/0_Parade_marchingband_1_799.jpg
21
78 221 7 8 2 0 0 0 0 0
78 238 14 17 2 0 0 0 0 0
113 212 11 15 2 0 0 0 0 0
134 260 15 15 2 0 0 0 0 0
163 250 14 17 2 0 0 0 0 0
201 218 10 12 2 0 0 0 0 0
182 266 15 17 2 0 0 0 0 0

And the output of train.py is:

Traceback (most recent call last):
  File "train.py", line 150, in <module>
    main()
  File "train.py", line 53, in main
    dataset_train = TrainDataset(train_path,transform=transforms.Compose([Resizer(),PadToSquare()]))
  File "/home/barkntuncer/RetinaFace_Pytorch/dataloader.py", line 45, in __init__
    label = [float(x) for x in line]
  File "/home/barkntuncer/RetinaFace_Pytorch/dataloader.py", line 45, in <listcomp>
    label = [float(x) for x in line]
ValueError: could not convert string to float: '0--Parade/0_Parade_marchingband_1_849.jpg'
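
The error suggests that the dataloader expects the annotation file linked in the README (boxes plus five landmarks), not the official WIDER FACE ground-truth file. For anyone who only has the official file, the format quoted above (image path, face count, then one row per face) can be read with a small parser. This is a sketch with a hypothetical function name; it keeps only x, y, w, h and does not produce landmarks.

```python
def parse_wider_annotations(lines):
    """Parse official WIDER FACE ground-truth lines.

    Returns {image_path: [[x, y, w, h], ...]}.
    """
    records = {}
    it = iter(line.strip() for line in lines)
    for path in it:
        if not path:
            continue  # skip blank lines between records
        count = int(next(it))
        # Assumption: an image with zero faces is still followed by one
        # placeholder row, so at least one row is always consumed.
        rows = [next(it).split() for _ in range(max(count, 1))]
        records[path] = [[int(v) for v in row[:4]] for row in rows[:count]]
    return records
```

With a file on disk this would be called as parse_wider_annotations(open(path)), where path points at the official ground-truth text file.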

Have you ever used another backbone?

Thanks for your great work!
I tried replacing your ResNet with MobileNet V1 0.25, but I found it really hard to get it to converge.
The loss was quite low even in the first several epochs, but it stayed at that level and never improved.
Have you tried any other lightweight backbone with your code? Could you share some details of your training?
Also, I am trying to increase the number of landmarks to 68 using the 300W dataset with your code. Have you ever tried that?
Thanks!

Out of memory

How much memory do you estimate this project needs?
I'm using a Titan V with 12GB and this goes out of memory with a batch size of 16 (default was 32), which seems quite small for WIDER face.

I had to use a batch size of 8, which used 10GB.

Evaluation problem

Hello,
I tried to run your code but hit a problem that I cannot solve.
Can you please help me?
I downloaded the WIDER FACE dataset as you explain and ran this command on Windows:
set CUDA_VISIBLE_DEVICES=0 & python train.py --data_path dataset/widerface --batch 1 --save_path ./out
but I get this error:
Namespace(batch=1, data_path='dataset/widerface', depth=50, epochs=1, eval_step=3, img_size=512, save_path='./out', save_step=10, shuffle=True, verbose=10)
Traceback (most recent call last):
  File "train.py", line 151, in <module>
    main()
  File "train.py", line 55, in main
    dataset_val = ValDataset(val_path,transform=transforms.Compose([RandomCroper()]))
  File "C:\Desktop\RetinaFace_super\dataloader.py", line 332, in __init__
    label = [float(x) for x in line]
  File "C:\Desktop\RetinaFace_super\dataloader.py", line 332, in <listcomp>
    label = [float(x) for x in line]
ValueError: could not convert string to float: '/24--Soldier_Firing/24_Soldier_Firing_Soldier_Firing_24_329.jpg'

When I replace the val images with the train images, training starts, but then I get this error:

---- [Epoch 0/1, Batch 12870/12880] ----
+----------------+---------------------+
| loss name      | value               |
+----------------+---------------------+
| total_loss     | 2.6635076999664307  |
| classification | 1.5447975397109985  |
| bbox           | 0.34370726346969604 |
| landmarks      | 0.7750030159950256  |
+----------------+---------------------+
-------- RetinaFace Pytorch --------
Evaluating epoch 0
0%| | 0/12880 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 151, in <module>
    main()
  File "train.py", line 136, in main
    recall, precision = eval_widerface.evaluate(dataloader_val,retinaface)
  File "C:\Desktop\RetinaFace_super\eval_widerface.py", line 74, in evaluate
    for data in tqdm(iter(val_data)):
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\tqdm\std.py", line 1099, in __iter__
    for obj in iterable:
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\AppData\Local\Continuum\anaconda3\envs\supernotman\lib\site-packages\torch\utils\data\_utils\worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "C:\Desktop\RetinaFace_super\dataloader.py", line 347, in __getitem__
    annotation[0,0] = label[0] # x1
ValueError: setting an array element with a sequence.

0%|

I don't know what is going on.
I would really appreciate your help.

Fine tune pre-trained model

I was trying to fine-tune a pretrained model, but I think your current code does not provide this option. I added a few lines to train.py; have a look at the following code. If you think it should be part of the project, kindly add it in the next commit. Thanks for your good work.


import argparse
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from dataloader import TrainDataset, ValDataset, collater, RandomCroper, RandomFlip, Resizer, PadToSquare
from torch.utils.data import Dataset, DataLoader
from terminaltables import AsciiTable, DoubleTable, SingleTable
from tensorboardX import SummaryWriter
from torch.optim import lr_scheduler
import torch.distributed as dist
import eval_widerface
import torchvision
import model
import os
from torch.utils.data.distributed import DistributedSampler
import torchvision_model

def get_args():
    parser = argparse.ArgumentParser(description="Train program for retinaface.")
    parser.add_argument('--data_path', type=str, help='Path for dataset,default WIDERFACE')
    parser.add_argument('--batch', type=int, default=16, help='Batch size')
    parser.add_argument('--epochs', type=int, default=200, help='Max training epochs')
    parser.add_argument('--shuffle', type=bool, default=True, help='Shuffle dataset or not')
    parser.add_argument('--img_size', type=int, default=640, help='Input image size')
    parser.add_argument('--verbose', type=int, default=10, help='Log verbose')
    parser.add_argument('--save_step', type=int, default=10, help='Save every save_step epochs')
    parser.add_argument('--eval_step', type=int, default=3, help='Evaluate every eval_step epochs')
    parser.add_argument('--save_path', type=str, default='./out', help='Model save path')
    parser.add_argument('--depth', help='Resnet depth, must be one of 18, 34, 50, 101, 152', type=int, default=50)
    parser.add_argument('--pretrained_model_path', type=str, default='./out', help='Pre-Trained Model Path')
    args = parser.parse_args()
    print(args)
    return args


def main():
    args = get_args()
    if not os.path.exists(args.save_path):
        os.mkdir(args.save_path)
    log_path = os.path.join(args.save_path,'log')
    if not os.path.exists(log_path):
        os.mkdir(log_path)

    writer = SummaryWriter(log_dir=log_path)

    data_path = args.data_path
    train_path = os.path.join(data_path,'train/label.txt')
    val_path = os.path.join(data_path,'val/label.txt')
    # dataset_train = TrainDataset(train_path,transform=transforms.Compose([RandomCroper(),RandomFlip()]))
    dataset_train = TrainDataset(train_path,transform=transforms.Compose([Resizer(),PadToSquare()]))
    dataloader_train = DataLoader(dataset_train, num_workers=8, batch_size=args.batch, collate_fn=collater,shuffle=True)
    # dataset_val = ValDataset(val_path,transform=transforms.Compose([RandomCroper()]))
    dataset_val = ValDataset(val_path,transform=transforms.Compose([Resizer(),PadToSquare()]))
    dataloader_val = DataLoader(dataset_val, num_workers=8, batch_size=args.batch, collate_fn=collater)
    
    total_batch = len(dataloader_train)

    # Create the model
    # if args.depth == 18:
    #     retinaface = model.resnet18(num_classes=2, pretrained=True)
    # elif args.depth == 34:
    #     retinaface = model.resnet34(num_classes=2, pretrained=True)
    # elif args.depth == 50:
    #     retinaface = model.resnet50(num_classes=2, pretrained=True)
    # elif args.depth == 101:
    #     retinaface = model.resnet101(num_classes=2, pretrained=True)
    # elif args.depth == 152:
    #     retinaface = model.resnet152(num_classes=2, pretrained=True)
    # else:
    #     raise ValueError('Unsupported model depth, must be one of 18, 34, 50, 101, 152')

    # Create torchvision model
    return_layers = {'layer2':1,'layer3':2,'layer4':3}
    retinaface = torchvision_model.create_retinaface(return_layers)


    retinaface = retinaface.cuda()
    retinaface = torch.nn.DataParallel(retinaface).cuda()
    retinaface.training = True
    
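    # Optionally resume from a checkpoint saved by this script. Saved keys carry
    # the "module." prefix, so load after wrapping the model in DataParallel.
    pretrained_model_path = args.pretrained_model_path
    try:
        state_dict = torch.load(pretrained_model_path)
        retinaface.load_state_dict(state_dict)
        print("Previous model loaded successfully :)")
    except Exception as err:  # fall back to training without the checkpoint
        print("Error while loading previous model :(", err)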

    optimizer = optim.Adam(retinaface.parameters(), lr=1e-3)
    # optimizer = optim.SGD(retinaface.parameters(), lr=1e-2, momentum=0.9, weight_decay=0.0005)
    # scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, verbose=True)
    # scheduler  = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    #scheduler  = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10,30,60], gamma=0.1)

    print('Start to train.')

    epoch_loss = []
    iteration = 0

    for epoch in range(args.epochs):
        retinaface.train()

        # Training
        for iter_num,data in enumerate(dataloader_train):
            optimizer.zero_grad()
            classification_loss, bbox_regression_loss,ldm_regression_loss = retinaface([data['img'].cuda().float(), data['annot']])
            classification_loss = classification_loss.mean()
            bbox_regression_loss = bbox_regression_loss.mean()
            ldm_regression_loss = ldm_regression_loss.mean()

            # loss = classification_loss + 1.0 * bbox_regression_loss + 0.5 * ldm_regression_loss
            loss = classification_loss + bbox_regression_loss + ldm_regression_loss

            loss.backward()
            optimizer.step()
            
            if iter_num % args.verbose == 0:
                log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch, args.epochs, iter_num, total_batch)
                table_data = [
                    ['loss name','value'],
                    ['total_loss',str(loss.item())],
                    ['classification',str(classification_loss.item())],
                    ['bbox',str(bbox_regression_loss.item())],
                    ['landmarks',str(ldm_regression_loss.item())]
                    ]
                table = AsciiTable(table_data)
                log_str +=table.table
                print(log_str)
                # write the log to tensorboard
                writer.add_scalar('losses:',loss.item(),iteration*args.verbose)
                writer.add_scalar('class losses:',classification_loss.item(),iteration*args.verbose)
                writer.add_scalar('box losses:',bbox_regression_loss.item(),iteration*args.verbose)
                writer.add_scalar('landmark losses:',ldm_regression_loss.item(),iteration*args.verbose)
                iteration +=1

        # Eval
        if epoch % args.eval_step == 0:
            print('-------- RetinaFace Pytorch --------')
            print ('Evaluating epoch {}'.format(epoch))
            recall, precision = eval_widerface.evaluate(dataloader_val,retinaface)
            print('Recall:',recall)
            print('Precision:',precision)

            writer.add_scalar('Recall:', recall, epoch*args.eval_step)
            writer.add_scalar('Precision:', precision, epoch*args.eval_step)

        # Save model
        if (epoch + 1) % args.save_step == 0 or iter_num>=100:
            torch.save(retinaface.state_dict(), args.save_path + '/model_epoch_{}.pt'.format(epoch + 1))

    writer.close()


if __name__=='__main__':
    main()
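
One detail in the script above: with type=bool, argparse converts any non-empty string to True, so --shuffle False still yields True. The script also passes shuffle=True to the training DataLoader directly, so the flag has no effect there. A minimal sketch of one common fix follows; str2bool is a hypothetical helper and is not part of this repository.

```python
import argparse


def str2bool(value):
    """Parse common true/false spellings for use as an argparse type."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "y"):
        return True
    if lowered in ("0", "false", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


parser = argparse.ArgumentParser()
parser.add_argument("--shuffle", type=str2bool, default=True,
                    help="Shuffle dataset or not")

print(parser.parse_args(["--shuffle", "False"]).shuffle)  # False
print(parser.parse_args([]).shuffle)                      # True
```

The parsed value would then need to be passed on as shuffle=args.shuffle when building the training DataLoader.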