mk-minchul / adaface Goto Github PK

View Code? Open in Web Editor NEW

602.0 602.0 116.0 38.92 MB

License: MIT License

Python 3.16% Shell 0.05% Jupyter Notebook 96.79%

adaface's People

Contributors

Stargazers

Watchers

Forkers

clscy felixzhang7 ilycorn keivanb chengyawlow changhy666 supersyz wolfworld6 coderwsj bill007bill peterzs bamaao liuqinglong110 jiaozhentian changgimoon yezimoshi272 aaxwaz cxz baris-unver scutzhe igiardiyanto newuserforstudy tienhoang1094 mromiario zhihong1224 mightycrane thucth-qt juruobudong skymirrora futestechcommunity tony109060581 perfyperfect ko4ro guyuex ynghnji tea2jay lem89757 parisafarmanifard mrlik rafribeiro zivzone manutdzou andreqwert thanhtung-dao kingofdogegg gk0passing seeds2002 aris992-git ryanashbaugh jh-001 aachenhang xiezixiustc jackzhousz helloqyc danial880 anilrgukt omgitsmj24 hinsjane sameetsaurav 22ema githubxuexixuexi xieshanjie 418109764 manhtr09 wangxudong-cq liruilongs wei-baldwin-zeng mitchelldehaven dao258 3dimaging shunyayamagami afm215 ayush0x00 grainw gefeishen gziren zys0119 kappabarbarosa ironicbo xuanjiawang turingvideo shell1013 mr-nobody-dey lbzero cugxchen jw-xiilab suvi-dha mertz1999 haroonhsa007 twenty4-lee ljwdust yeonju52 duclvq kungrainbow unkownworld tmq902005 sssssshf jeff-j-chen quyjleo vareto-pytorch

adaface's Issues

about the inference time

Hi,
thank you for your nice work,
I just try the inference code, but it takes too long to complete the inference. So I wonder how long the normal inference time per image is?

Is it possible to additionally train already trained model. Not to train from the start.

Is it possible to additionally train already trained model. Not to train from the start.
How do I do it?

batch is equal to 1, what to do

batch is equal to 1, what to do?????????????????????????????????????????????

Convert to Tensorrt

Hi, thank you for your excellent work!
I want to deploy this model to Tensorrt. Could you give me some guidelines

Opposite view between norm and quality CASIA-Webface

Hi, thanks for yours excellent work! It help me a lot.
I've add your AdaFace loss with PartialFC to deal with OOM problem when training (ultra) large scale dataset and get some questions:

I've analyze our Cosface model trained with PartialFC and get opposite view between norm & quality
My implementation also have a opposite with small CASIA-Webface dataset (10k ids, 0.5M images), but the accuracy for BLUR domain is still higher (I think the adaptive margin loss didn't work and the main improvement of accuracy was only given by augmentation)

(Backbone used is R50, init batch_mean = 20, batch_std = 100, t = 1.0, tested with t = 0.01 have the same result)
But for my large scale training (5M ids, 105M images), your conclusion is still working

# Our current checkpoint
Training: 2022-08-18 08:10:35,869-[lfw][210000]XNorm: 23.302152
Training: 2022-08-18 08:10:35,870-[lfw][210000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-08-18 08:10:35,870-[lfw][210000]Accuracy-Highest: 0.99800

Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]XNorm: 19.032604
Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]Accuracy-Flip: 0.96914+-0.00833
Training: 2022-08-18 08:11:33,067-[cfp_fp][210000]Accuracy-Highest: 0.96914

Training: 2022-08-18 08:12:21,623-[agedb_30][210000]XNorm: 21.835075
Training: 2022-08-18 08:12:21,624-[agedb_30][210000]Accuracy-Flip: 0.97567+-0.00782
Training: 2022-08-18 08:12:21,624-[agedb_30][210000]Accuracy-Highest: 0.97733
Training: 2022-08-18 08:12:21,624-[+][210000]Score / Score-Highest: 2.94281 / 2.94362

Have we need to re-thinking about the properties of training dataset ?

can it support batch input now??

When I was reading the paper, I had the following puzzles. Can you help me solve them?

Thank you very much for publishing such a good paper. I have some doubts when reading the paper. I hope you can help me answer it.

What problems do boundary forms and gradients solve?

Why use boundary forms and gradients?

How should I understand this formula?

Is it possible to replace EMA of feature norm calculation with a torch.nn.BatchNorm layer?

Hi,

I'm trying to implement AdaFace using MXNet and wondering if it is reasonable to replace EMA of feature norm calculation with a BatchNorm layer (without learnable parameters).

This is to say, can I replace this part

AdaFace/head.py

Lines 75 to 81 in 79af07a

 with torch.no_grad(): 

 mean = safe_norms.mean().detach() 

 std = safe_norms.std().detach() 

 self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha) * self.batch_mean 

 self.batch_std = std * self.t_alpha + (1 - self.t_alpha) * self.batch_std 

 margin_scaler = (safe_norms - self.batch_mean) / (self.batch_std+self.eps) # 66% between -1, 1

with this in constructor

self.norm_layer = torch.nn.BatchNorm1d(1, eps=self.eps, momentum=self.t_alpha, affine=False)

and this in forward()?

margin_scaler = self.norm_layer(safe_norms)

I'm not sure if with torch.no_grad() should be kept or there is something I haven't noticed.
Thank you.

how to handle 'register_buffer' with multiple gpu cards

    self.register_buffer('batch_mean', torch.ones(1)*(20))
    self.register_buffer('batch_std', torch.ones(1)*100)

Adaface was not good at masked face recogniton

I was appreciated your good job, it was a pity that Adaface loss was not fit to masked face recogniton.

About the usage of the loss function

Hi, authors. Thanks for open sourcing such great work. Here I have a question about the usage of the loss function. In the readme https://github.com/mk-minchul/AdaFace#usage, you take the embedding before normalization as input for the adaface loss. But in your implementation, https://github.com/mk-minchul/AdaFace/blob/c4052220c51167a18c35ce15a450044180cbb281/train_val.py#L54 you take the embedding after normalization as input for the adaface loss. I am confused about that. In addition, since you use the norm of the embedding feature as a metric of the image quality, do you think the normalization operation before generating the embedding feature will damage the efficiency of adaface loss. For example, CNN backbone->(1)L2 normalization->pooling->FC->embedding->(2)L2 normalization, will the first L2 normalization operation downgrade the efficiency of adaface loss?

i have an opposite conclusion about norm and quality.

hi, I'm trying to use your adaface_loss in my code, but I find a problem about that the feature with low norm has high quality, and I mean i have an opposite conclusion about norm and quality.

where can I request for IJB-S

Hello,

I wonder where can I request for IJB-S dataset. It can not be found on https://nigos.nist.gov/datasets/ (only IJB-A, IJB-B and IJB-C are available)

Hello, you can provide IR_ 50 CKPT pre training model?

Hello, you can provide IR_ 50 CKPT pre training model?I want to use IR_ 50 model to train, can you provide a pre training model?thanks

adaface_models = {
'ir_50':"pretrained/adaface_ir50_ms1mv2.ckpt",
}

Training time

Would you be so kind as to include the time taken for training with the same procedure with the paper?

RuntimeError: CUDA error: device-side assert triggered

be used to distributed training:
theta_m = torch.clip(theta + m_arc, min=self.eps, max=math.pi-self.eps)
return forward_call(*input, **kwargs)
RuntimeError: CUDA error: device-side assert triggered

adaface loss was used to distributed training

your adaface loss is effective, but it was used to insightface, it occur error.
could you modified adaface loss which input was not embeddings, but logits, as like Arcface in losses.py, linked:
https://github.com/deepinsight/insightface/blob/master/recognition/arcface_torch/losses.py
Thank you first.

What is the best similarity metric for AdaFace?

L2 or inner product/?

Large-scale datasets

Hello!
Thank you for your incredible work! Analysis of margin losses through gradient scaling term is quite eye-opening.

My questions are:

Why didn't you provide any experiments' results for some really large-scale datasets, like WebFace42M?
Any observations or thoughts about stability of AdaFace under noisy labels, compared to default losses like CosFace/ArcFace ?

About the license for this model

Thank you for sharing your great code. 😺

What is the license for this model? I'd like to cite it to the repository I'm working on if possible, but I want to post the license correctly.

https://github.com/PINTO0309/PINTO_model_zoo

Evaluation Pipeline of TinyFace

Hello,

I wonder whether you have finetuned the model on the training set of TinyFace before testing as in https://arxiv.org/pdf/1811.08965.pdf, or directly applied it on the testing set without finetuning.

Can you provide the code to transfer to onnx?

Can you provide the code to onnx?

Issue with Face Detection, MTCNN

Greetings,

I'm trying to run the 'try_mtcnn_steb_by_step' notebook, and after running it i've gotten a vastly different result than the one you have (only four faces surrounded with boxes, with the final result being only 3 out of many faces in the image).

Is there any clue on what causes these differences?

Thanks

Using it on video

First of all, great work and great paper!

Second of all, i've used the inference example and it worked as expected, however is there a way to input a video and have it recognize the faces and compare them to a dataset of images?

I think it might be possible to do so after training the model but im not certain how to do that as well.
I could use any insight you have.

Thank you so much!

Could you please provide the contact information of IJB-S Dataset?

I can't find the email address of the IJB-S author from the origin IJB-S paper and the internet, could you please leave more concrete contact information?
Thanks a lot.

About hyper-parameters of reproduction.

Hi~ It's a wonderful and excellent work and I'm interested in it.
But I got some problems when reproducing it, I cannot achieve high-enough accuracy on this task: AdaFace / ResNet100 / MS1MV2. Could you please provide its bash file, just like run.sh or hparams.yaml.
Looking forward to your reply! Thanks a lot!

有转onnx的代码吗？

想问一下有转onnx的代码吗？
感谢~

When reproducting the paper, result on IJB-B, IJB-C,Tinyface are lower than reported

I trained the model on MS1MV2, on high-quality datasets, and I got a similar result. Average 97.18(reported 97.19)
But when testing on IJB-B, IJB-C, and Tinyface, the results are lower.

Dataset	Avg on high quality	IJB-B	IJB-C	Tinyface-Rank1	Tinyface-Rank5
Reported in the paper	97.19	95.67	96.89	68.21	71.54
Reproducted	97.18	95.37	96.62	67.03	70.52

I followed the code provided and trained on 8-GPUs. And I wonder if there are some special tricks when evaluating on mixed and low-quality datasets?

And I found in the paper, Table3, (b), AdaFace trained on MS1MV2 has a better result on TinyFace than the model trained on MS1MV3. This is quite strange, MS1MV3 dataset is larger than MS1MV2, and the result should be better, but it shows a reverse result on TinyFace. Table 3 shows the trend, the model trained on MS1MV3 has better performance than the model trained on MS1MV2, but only in TinyFace, it's different.

I wonder if it's a small mistake that the result on TinyFace is reversed on MS1MV3 and MS1MV2 ?

Hope to get your reply.

about option "train_data_subset"

Hi author，
Based on the code, the class num is set to 70722 in case of train_data_subset being set to True. Just wondering how "70722" comes. In addition, what's the target of using subset of emore faces? Just to save the training time? Appreicate your suggestions. Thanks.

Table3: our evaluation of the released model

Hello,
Thank you for your great work!
In Table3, "our evaluation of the released model" refers to an Arcface Resnet100 model trained with Webface4M.
However, I cannot find the corresponding checkpoint in insightface anyhow.
Could you please share where you found the checkpoint?

If the checkpoint is actually trained by yourself, does the "Arcface Resnet100 model trained with Webface4M" checkpoint also follow exactly the same data augmentation (crop, resizing, colorjitter) as in Adaface?

Many thanks.

Question about Figure 3

Hi Minchul Kim，
Thank you for your great work!
I am impressed with Figure 3 in your paper, which is really an excellent illustration.
I wonder how to draw such figures, and would you please release the tool or code to draw them.

not the faces_emore data train

Hi,
I want to ask you a question
because faces_emore data is .rec format , your train need img by convert.py , so I fixed code to use the other img data to train(300w img) . now it's always “creating train dataset” ,not train ?

How to solve？can't find this file（mem_file.dat.conf）

I tried to train after processing the data according to the convert.py script you provided, but I couldn't find the MEM of the validation set_ File.dat.conf file

initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl
All DDP processes registered. Starting ddp with 1 processes

creating train dataset
creating val dataset
laoding validation data memfile
Traceback (most recent call last):
File "main.py", line 83, in
main(args)
File "main.py", line 55, in main
trainer.fit(trainer_mod, data_mod)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 553, in fit
self._run(model)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 865, in _run
self._call_setup_hook(model) # allow user to setup lightning_module in accelerator environment
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1169, in _call_setup_hook
self.datamodule.setup(stage=fn)
File "/my_app/anaconda3/envs/06.adaface-pytorch1.8-python3.7/lib/python3.7/site-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
fn(*args, **kwargs)
File "/work/1.model_train/6.face/AdaFace-master/data.py", line 99, in setup
self.val_dataset = val_dataset(self.data_root, self.val_data_path, self.concat_mem_file_name)
File "/work/1.model_train/6.face/AdaFace-master/data.py", line 136, in val_dataset
val_data = evaluate_utils.get_val_data(os.path.join(data_root, val_data_path))
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 12, in get_val_data
agedb_30, agedb_30_issame = get_val_pair(data_path, 'agedb_30')
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 25, in get_val_pair
np_array = read_memmap(mem_file_name)
File "/work/1.model_train/6.face/AdaFace-master/evaluate_utils.py", line 53, in read_memmap
with open(mem_file_name+'.conf', 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'faces_emore/agedb_30/memfile/mem_file.dat.conf'

Unexpected result when trained resnet50 on MS1MV3 dataset

Hi，
I trained resnet50 on MS1MV3 dataset and used the same params as given, but got 0.8882 test_acc on test dataset.

Detailed result:
{'agedb_30_num_test_samples': 12000.0,
'agedb_30_test_acc': 0.8686666488647461,
'agedb_30_test_best_threshold': 1.7070000171661377,
'cfp_fp_num_test_samples': 14000.0,
'cfp_fp_test_acc': 0.8317142724990845,
'cfp_fp_test_best_threshold': 1.7899999618530273,
'lfw_num_test_samples': 12000.0,
'lfw_test_acc': 0.9645000100135803,
'lfw_test_best_threshold': 1.5600000619888306,
'test_acc': 0.8882936239242554}

params:
--arch ir_50
--use_16bit
--batch_size 256
--num_workers 8
--epochs 50
--lr_milestones 12,20,24
--lr 0.1
--head adaface
--m 0.4
--h 0.333
--low_res_augmentation_prob 0.2
--crop_augmentation_prob 0.2
--photometric_augmentation_prob 0.2

has two values when inference?Can you explain what the second value is

Hello, I found that the output is two values when inference. Can you explain what the second value is?

About the processed emore faces

Hi author，
A naive question for you. The extracted images using:

python convert.py --rec_path <DATASET_ROOT>/faces_emore

look somehow weird. It seems that RGB channel is wrongly ordered. Any comment? Thanks.

MODEL PERFORMANCE

I reviewed your work. The results look impressive. However, in my tests, I noticed that the performance of arcface is better. Yes, adaface is ahead in benchmarks. But archface provides better recognition in large datasets. Could it be something I did wrong? Or a recommendation?

How many GPUs you use for training on Webface4M

Hi experts,
I would like to know what device for training as your code used when class number over 20k.
class_num

Train the model using my own dataset

Hi, thank you for your excellent work!
Can you give us a Readme about how to train an adaface model using our own dataset? The training part in this Readme is way too simple, it would be very nice if you can show more examples and data preparations on training :)

The performance of model in dataset glint360k

Hi ,Thank you for your work sharing. I added Adaface to project insightFace and verified the effect of Adaface in dataset MV1Mv3, but in Glint360K, the effect of Adaface is worse than that of Cosface. Have you verified it on dataset Glint360K?

dataset | method | backnone | LFW | CFPFP | AGEDB | IJBC(1E-4)
MS1MV3 | Adaface | r100 | 99.867 | 99.014 | 98.3 | 97.17
MS1MV3 | Arcface | r100 | 99.85 | 98.9 | 98.55 | 96.85
Glint360k | Adaface | r100 | 99.83 | 99.15 | 98.45 | 97.38
Glint360k | Cosface | r100 | 99.817 | 99.2 | 98.65 | 97.55

A question about the usage.

Hello,

Thank you for presenting the solid and wonderful work. I have a small question about the usage.

cosine_with_margin = adaface(embbedings, norms, labels)
loss = torch.nn.CrossEntropyLoss()(cosine_with_margin, labels)

After I read the code in head.py. I thought the return value of cosine_with_margin is the variant of the logits. In other words, it is the output of the adaptive margin function in the paper. Has it been passed into the softmax function? Why we can input it into the CrossEntropyLoss and compare it with ground truth directly?

I'm looking forward to your reply. Thank you very much.

can't run success on inference.py, The error:IndexError: too many indices for array: array is 0-dimensional, but 3 were indexed

i try run the demo of inference.py when i get code
but i get this error info:

Traceback (most recent call last):
  File "inference.py", line 41, in <module>
    input = to_input(aligned_rgb_img)
  File "inference.py", line 24, in to_input
    brg_img = ((np_img[:,:,::-1] / 255.) - 0.5) / 0.5
IndexError: too many indices for array: array is 0-dimensional, but 3 were indexed

i find get error because the face is none on the face detection and don't known why i can't get face, i only run the code in new env and don't have any change on code

i think maybe the env have different, this is my env:

Package                 Version
----------------------- -----------
absl-py                 1.1.0
aiohttp                 3.8.1
aiosignal               1.2.0
async-timeout           4.0.2
asynctest               0.13.0
attrs                   21.4.0
bcolz                   1.2.1
cachetools              5.2.0
certifi                 2022.6.15
charset-normalizer      2.0.12
cycler                  0.11.0
fonttools               4.33.3
frozenlist              1.3.0
fsspec                  2022.5.0
future                  0.18.2
google-auth             2.8.0
google-auth-oauthlib    0.4.6
graphviz                0.8.4
grpcio                  1.47.0
idna                    3.3
imageio                 2.19.3
importlib-metadata      4.12.0
joblib                  1.1.0
kiwisolver              1.4.3
Markdown                3.3.7
matplotlib              3.5.2
menpo                   0.11.0
multidict               6.0.2
mxnet                   1.9.1
networkx                2.6.3
numpy                   1.21.6
oauthlib                3.2.0
opencv-python           4.6.0.66
packaging               21.3
pandas                  1.3.5
Pillow                  9.1.1
pip                     22.1.2
prettytable             3.3.0
protobuf                3.19.4
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pyDeprecate             0.3.1
pyparsing               3.0.9
python-dateutil         2.8.2
pytorch-lightning       1.4.4
pytz                    2022.1
PyWavelets              1.3.0
PyYAML                  6.0
requests                2.28.0
requests-oauthlib       1.3.1
rsa                     4.8
scikit-image            0.19.3
scikit-learn            1.0.2
scipy                   1.7.3
setuptools              62.6.0
six                     1.16.0
tensorboard             2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
threadpoolctl           3.1.0
tifffile                2021.11.2
torch                   1.8.1+cu111
torchaudio              0.8.1
torchmetrics            0.6.0
torchvision             0.9.1+cu111
tqdm                    4.64.0
typing_extensions       4.2.0
urllib3                 1.26.9
wcwidth                 0.2.5
Werkzeug                2.1.2
wheel                   0.37.1
yarl                    1.7.2
zipp                    3.8.0

Is it possible to use without CUDA?

Is there a simple example on how to use it?

What is range of loss optimal when model reaches convergence?

thanks.

Questions about GST visualization

Hi Minchul, thx for this incredible work! I have few questions about the GST visualization in the Fig 3 of the main paper.

For the CosFace Loss(1st. column of Fig 3), it looks like GST value decreases rapidly near the boundary, how can you adjust the GST value from W_j to the boundary B_1, what is the value of s? I thought the result might be based on the last graph of Fig 1 of supplementary material, then +0.5(m=0.5) in x-axis and -1(P-1) in y-axis to get the function of GST based on cos_theta for CosFace, however it looks different in the 1st. column of Fig 3.
In the ArcFace Loss(2nd. column of Fig 3), we can see GST increases when cos_theta goes up. But according to the Eq. 15, when cos_theta goes up, |(P-1)| goes down and (cos(m)+...) goes up. How to make sure the GST value is positive correlated with cos_theta?
The idea emphasizes hard sample with high norm and easy sample with low norm. But the AdaFace Loss(7th. column of Fig 3) shows white triangle(hard sample, low norm) still has large GST value which doesn't make sense.

I'd appreciate it if you can solve my problem :p

About masked face recognition

Thank you for your excellent work! I tried your pretrained model on 1k-face recognition and it just worked very well!
Now I'm trying masked face recognition. I augmented the dataset faces_emore by putting 5 kinds of masks on the face, so the number of training imgs is 5x bigger than the original dataset. I'm wondering how many epochs I should set if I want to get a good 'ir50' model? (or maybe 'ir18' model). Do you have any suggestions on training this model? I intend to use the same parameters you show in the readme. Since the training process could take a long time, it would be very helpful if you can give me some advice to avoid several trainings.

the 18th formula with AdaFace function that are inconsistent

the 18th formula in paper is inconsistent with AdaFace code

Why do I use IR_ 18 model, the trained CKPT is 530m?

this is my config

parent_parser.add_argument('--data_root', type=str, default='')
parent_parser.add_argument('--train_data_path', type=str, default='faces_emore/imgs')
parent_parser.add_argument('--val_data_path', type=str, default='faces_emore')
parent_parser.add_argument('--train_data_subset', action='store_true')
parent_parser.add_argument('--prefix', type=str, default='default')
parent_parser.add_argument('--gpus', type=int, default=4, help='how many gpus')
parent_parser.add_argument('--distributed_backend', type=str, default='ddp', choices=('dp', 'ddp', 'ddp2'),)
parent_parser.add_argument('--use_16bit', action='store_true', help='if true uses 16 bit precision')
parent_parser.add_argument('--epochs', default=26, type=int, metavar='N', help='number of total epochs to run')
parent_parser.add_argument('--seed', type=int, default=42, help='seed for initializing training.')
parent_parser.add_argument('--batch_size', default=1024, type=int,
                           help='mini-batch size (default: 256), this is the total '
                                'batch size of all GPUs on the current node when '
                                'using Data Parallel or Distributed Data Parallel')

parent_parser.add_argument('--lr',help='learning rate',default=0.002, type=float)
parent_parser.add_argument('--lr_milestones', default='12,20,24', type=str, help='epochs for reducing LR')
parent_parser.add_argument('--lr_gamma', default=0.1, type=float, help='multiply when reducing LR')

parent_parser.add_argument('--num_workers', default=36, type=int)
parent_parser.add_argument('--fast_dev_run', dest='fast_dev_run', action='store_true')
parent_parser.add_argument('--evaluate', action='store_true', help='use with start_from_model_statedict')
parent_parser.add_argument('--resume_from_checkpoint', type=str, default='')
parent_parser.add_argument('--start_from_model_statedict', type=str, default='')
parser.add_argument('--arch', default='ir_18')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M')
parser.add_argument('--weight_decay', default=1e-4, type=float)

parser.add_argument('--head', default='adaface', type=str, choices=('adaface'))
parser.add_argument('--m', default=0.4, type=float)
parser.add_argument('--h', default=0.333, type=float)
parser.add_argument('--s', type=float, default=64.0)
parser.add_argument('--t_alpha', default=0.01, type=float)

parser.add_argument('--low_res_augmentation_prob', default=0.2, type=float)
parser.add_argument('--crop_augmentation_prob', default=0.2, type=float)
parser.add_argument('--photometric_augmentation_prob', default=0.2, type=float)

parser.add_argument('--accumulate_grad_batches', type=int, default=1)
parser.add_argument('--test_run', action='store_true')
parser.add_argument('--save_all_models', action='store_true')

	with torch.no_grad():
	mean = safe_norms.mean().detach()
	std = safe_norms.std().detach()
	self.batch_mean = mean * self.t_alpha + (1 - self.t_alpha) * self.batch_mean
	self.batch_std = std * self.t_alpha + (1 - self.t_alpha) * self.batch_std

	margin_scaler = (safe_norms - self.batch_mean) / (self.batch_std+self.eps) # 66% between -1, 1

mk-minchul / adaface Goto Github PK

adaface's People

Contributors

Stargazers

Watchers

Forkers

adaface's Issues

initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=nccl All DDP processes registered. Starting ddp with 1 processes

Recommend Projects

Recommend Topics

Recommend Org

distributed_backend=nccl
All DDP processes registered. Starting ddp with 1 processes