polimi-ispl / icpr2020dfdc

Video Face Manipulation Detection Through Ensemble of CNNs

License: GNU General Public License v3.0

Python 4.13% Shell 0.50% Jupyter Notebook 95.36%

deepfake-detection deepfake-detection-challenge dfdc ensemble face-manipulation forensics paper

icpr2020dfdc's Issues

Query

Hey. I have a question.
Have you used all the frames extracted from each video to decide whether a video is real or fake? I tried looking for it, but since I'm a beginner I am having difficulty understanding the code.
One more thing: you used the scores obtained from each frame to classify the whole video, right?
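
A minimal sketch of one common way to turn per-frame scores into a video-level decision. It assumes one raw score per extracted face and a positive score meaning "fake"; averaging and the threshold of 0 are illustrative assumptions, not necessarily what the repository does, and `video_scores` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def video_scores(frame_scores):
    """Average per-frame scores for each video.

    frame_scores: iterable of (video_id, score) pairs, one per extracted frame.
    Returns a dict mapping video_id -> mean score.
    """
    grouped = defaultdict(list)
    for video_id, score in frame_scores:
        grouped[video_id].append(score)
    return {video_id: mean(scores) for video_id, scores in grouped.items()}


frames = [("a.mp4", 2.0), ("a.mp4", 1.0), ("b.mp4", -3.0), ("b.mp4", -1.0)]
per_video = video_scores(frames)
# Assumed convention: a positive mean score is read as "fake".
decisions = {video_id: score > 0 for video_id, score in per_video.items()}
```

Other aggregations (median, or averaging after a sigmoid) fit the same shape; only the reduction inside `video_scores` changes.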

ERROR conda.core.link:_execute(507): An error occurred while installing package 'conda-forge::async_generator-1.10-py_0'

While creating the environment, this error occurs and rolls back the installation of all the other libraries. How do I get past this?
Also, if I wanted to train on my own dataset, what is the pipeline?

CELEB DF

Hello,
I'm trying to create a model for the Celeb dataset and I have a problem with creating the indexes. I'm not sure what the file "List_of_testing_videos.txt" should contain. I only gave two paths: one to the directory containing the two folders with synthesis and real videos, and the second for saving the video DataFrames.

Thank you in advance

Celebdf dataset

--traindb ff-c23-720-140-140
--valdb ff-c23-720-140-140
I am having difficulty understanding the significance of these in train_all.sh.
If I want to train on the Celeb-DF dataset, what should I replace these with?

extract_faces.py double check for video and video DataFrame

We need a double check when extracting faces from a video:

We need to verify that the video exists: currently, even if the video path is not correct, the DataFrame gets created anyway and is empty. We need a warning telling the user to provide the directory containing the videos correctly, so that it matches the index of videos_dataframe.pkl (I was thinking somewhere around line 232);
we also need to insert a check when we collate the faces DataFrames of the different videos (line 124): if the checkpoint path is not correct, the script proceeds anyway and the resulting DataFrame is empty.
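
The two checks proposed above can be sketched as follows. This is only an illustration under assumptions: `missing_videos` and `check_before_concat` are hypothetical helpers, not functions from extract_faces.py, and the error message wording is made up.

```python
from pathlib import Path


def missing_videos(source_dir, relative_paths):
    """Return the relative video paths that do not exist under source_dir."""
    root = Path(source_dir)
    return [p for p in relative_paths if not (root / p).is_file()]


def check_before_concat(frames, what):
    """Fail with a readable message instead of an opaque error on an empty list."""
    if len(frames) == 0:
        raise RuntimeError(
            f"No {what} were collected: check that the source and checkpoint "
            "directories match the index of the videos DataFrame."
        )
    return frames
```

Calling `check_before_concat(faces_dataset, "faces DataFrames")` right before the concatenation would replace the bare "No objects to concatenate" failure with a hint about the paths.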

A general question about training process

I have some real images, which are not from any videos, and I want to use them in the training phase. However, they are not associated with any deepfake frame or video. I have several questions:

Is it necessary that, for every fake frame, the original frame is available in the dataset? And the reverse: is it necessary that, for every original frame in the dataset, a fake frame is available?
How can I train a model when I have images instead of videos? Specifically, I want to know how to create the face_dataframe for my images, which are not associated with any videos.

Best regards,
Saleh

train_triplet.py --train_db --val_db argument value

Hey, I am trying to train the model using train_triplet.py. I can see that you passed the ff-c23-720-140-140 directory as the value for train_db. I cannot figure out what is in that directory or how you generated it.
I have downloaded the FFPP dataset and extracted the faces using extract_faces.py.
Thank you for your help!

Wrong output path for facesdf

Hey,
Thank you for your help!
I ran the fixed code of extract_faces.py from the master branch on some part of the dfdc dataset. Your latest commit has fixed the issue that I had raised earlier. I want to bring to your attention another small issue.

After running the latest extract_faces.py:

As expected, all the extracted faces were stored in the path given by the --facesfolder argument, which is the faces output directory.
However, the output DataFrame of faces, which was supposed to be saved in the path given by --facesdf (for me, the icpr2020dfdc/facesdf folder), was instead saved as a .pkl file in the icpr2020dfdc/ folder itself (which is not the folder given as the --facesdf argument).

Thank you

Originally posted by @chinmaynehate in #19 (comment)

Incomplete faces extraction

Sorry to trouble you again! The DFDC dataset is too big to download in one go, so I split it into 50 downloads. It hasn't finished downloading yet; about forty parts are done. When I ran extract_faces.py, I found that only the dfdc_train_part_0 folder was processed, and the remaining 30 or more folders were not. Do the complete dataset and the partial dataset need additional txt files, as Celeb-DF does? Also, could you add Celeb-DF training to train_all.sh? Thank you very much!

Facing error while running train_all.sh

about extract_faces.py

I did not change the structure of the DFDC dataset, but when I run extract_faces.py, I get the following error. Can you help me solve it?

Error while reading: /data0/DFDC/tmp/dfdc_train_part_9/zzivhztyyn.faces.pkl
rename() got an unexpected keyword argument "errors"
Error while reading: /data0/DFDC/tmp/dfdc_train_part_9/zzkzxqgbcy.faces.pkl
rename() got an unexpected keyword argument "errors"
Collecting faces results: 100%|██████████| 119154/119154 [02:40<00:00, 740.14it/s]
Saving videos DataFrame to /home//dfdc/data/dfdc_videos.pkl
Saving faces DataFrame to /data0/DFDC/faces/df_from_video_0_to_video_0.pkl
Traceback (most recent call last):
File "extract_faces.py", line 306, in <module>
main()
File "extract_faces.py", line 157, in main
df_faces = pd.concat(faces_dataset, axis=0, )
File "/home//anaconda3/envs/icpr2020/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 206, in concat
copy=copy)
File "/home/**/anaconda3/envs/icpr2020/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 239, in __init__
raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

I got an error while training Xception net on ffpp

Hello,

I used the command below to train Xception net on faceforensics++ dataset which contains "youtube" and "actors" as real videos and "DeepFakeDetection", "Deepfakes", "Face2Face", "FaceShifter" , "FaceSwap", "NeuralTextures" as fake videos.

First of all, I ran index_ffpp.py and produced ffpp_videos.pkl.
In the next step, I ran
python extract_faces.py
--source path/to/faceforensics++/dataset
--videodf ./data/ffpp_videos.pkl
--facesfolder ./output_faces
--facesdf ./output_faces_df
--checkpoint ./tmp
and two things were created: 1) the output_faces folder with all extracted frames inside; 2) a pickle file "output_faces_df_from_video_0_to_video_0.pkl" (whose name was a bit weird; it was 40 MB) inside the project folder (beside extract_faces.py). I created a folder "output_faces_df" manually and put the pickle file inside it.

In the last step I ran

python train_binclass.py --net Xception --traindb ff-c23-720-140-140 --valdb ff-c23-720-140-140 --ffpp_faces_df_path ./output_faces_df/output_faces_df_from_video_0_to_video_0.pkl --ffpp_faces_dir ./output_faces --face scale --size 224 --batch 32 --lr 1e-5 --valint 500 --patience 10 --maxiter 30000 --seed 41 --attention --device 0

However, I'm getting the error below:

/home/saleh/anaconda3/envs/icpr2020/bin/python /home/saleh/Documents/internship/icpr2020dfdc/train_binclass.py --net Xception --traindb ff-c23-720-140-140 --valdb ff-c23-720-140-140 --ffpp_faces_df_path ./output_faces_df/output_faces_df_from_video_0_to_video_0.pkl --ffpp_faces_dir ./output_faces --face scale --size 224 --batch 32 --lr 1e-5 --valint 500 --patience 10 --maxiter 30000 --seed 41 --attention --device 0
Parameters
{'face': 'scale',
'net': 'Xception',
'seed': 41,
'size': 224,
'traindb': 'ff-c23-720-140-140'}
Tag: net-Xception_traindb-ff-c23-720-140-140_face-scale_size-224_seed-41
Loading data
Traceback (most recent call last):
File "/home/saleh/anaconda3/envs/icpr2020/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'source'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/saleh/Documents/internship/icpr2020dfdc/train_binclass.py", line 460, in <module>
main()
File "/home/saleh/Documents/internship/icpr2020dfdc/train_binclass.py", line 227, in main
dbs={'train': train_datasets, 'val': val_datasets})
File "/home/saleh/Documents/internship/icpr2020dfdc/isplutils/split.py", line 107, in make_splits
split_df = get_split_df(df=full_df, dataset=split_db, split=split_name)
File "/home/saleh/Documents/internship/icpr2020dfdc/isplutils/split.py", line 57, in get_split_df
df[(df['source'] == 'youtube') & (df['quality'] == crf)]['video'].unique())
File "/home/saleh/anaconda3/envs/icpr2020/lib/python3.6/site-packages/pandas/core/frame.py", line 2906, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/saleh/anaconda3/envs/icpr2020/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
raise KeyError(key) from err
KeyError: 'source'

Process finished with exit code 1

The error comes from the DataFrame, so I tried to debug the code. The DataFrame was loaded successfully, but it doesn't have a column named 'source'.
As an example, I printed a sample row of the DataFrame, and it has the following keys:

df.iloc[19000]
Out[3]:
video 607
label False
videosubject 0
kp1x 189
kp1y 170
kp2x 332
kp2y 164
kp3x 271
kp3y 227
kp4x 271
kp4y 315
kp5x 113
kp5y 226
kp6x 398
kp6y 211
conf 0.914732
left 29
top 0
right 483
bottom 487
nfaces 1
Name: original_sequences/youtube/c23/videos/671.mp4/fr132_subj0.jpg, dtype: object

It seems that 'source' should be 'youtube'; however, we don't have such a key in the DataFrame.

So what should I do?
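
The traceback shows the split code filtering on `df['source']` and `df['quality']`, and another report in this list hits `KeyError: 'original'`. A quick way to confirm which columns a faces DataFrame lacks before launching training is sketched below. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the required set is inferred only from the errors reported in these issues, so it may be incomplete.

```python
# Columns inferred from the KeyErrors reported in these issues (an assumption,
# not an authoritative list of what the training scripts need).
REQUIRED_COLUMNS = ("source", "quality", "original")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Column names copied from the sample row printed above.
sample = ["video", "label", "videosubject", "conf", "left", "top",
          "right", "bottom", "nfaces"]
absent = missing_columns(sample)
```

With a real DataFrame the call would be `missing_columns(df.columns)`; a non-empty result suggests the faces DataFrame was built without the per-video metadata from the index step.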

There is no extract_meta_cv in the isplutils.utils

When I run index_ffpp.py, it fails with an error saying there is no definition of extract_meta_cv. I found no function called extract_meta_cv in isplutils.utils.

network doesn't generalize well

Hi, I tried a few samples from the FaceShifter paper ( https://arxiv.org/pdf/1912.13457.pdf ) and also tried Obama's deepfake video ( https://www.youtube.com/watch?v=cQ54GDm1eL0 ). The network could not detect them.

About the dataset

After I executed make_dataset.sh (only FF++ data was extracted), I executed 'python train_binclass.py --net=EfficientNetB4' and a KeyError: 'source' appeared. My DataFrame has no source, quality, or original columns, only these: ['video','label','videosubject','kp1x','kp1y','kp2x','kp2y','kp3x','kp3y','kp4x','kp4y','kp5x','kp5y','kp6x','kp6y','conf','left','top','right','bottom','nfaces']. What is the problem and how can I solve it? Looking forward to your reply.

There is an error, KeyError: 'original'

I use the FFPP dataset and get this error:

| Train EfficientNetAutoAttB4 on FFc23 |
-------------------------------------------------
Loaded pretrained weights for efficientnet-b4
Parameters
{'face': 'scale',
'net': 'EfficientNetAutoAttB4',
'seed': 41,
'size': 224,
'traindb': 'ff-c23-720-140-140'}
Tag: net-EfficientNetAutoAttB4_traindb-ff-c23-720-140-140_face-scale_size-224_seed-41
Loading data
df: video label videosubject original_x class source quality kp1x kp1y kp2x kp2y kp3x kp3y kp4x kp4y kp5x kp5y kp6x kp6y conf left top right bottom nfaces original_y facepath
original_sequences/actors/c23/videos/20__talking_against_wall.mp4/fr000_subj0.jpg 0 False 0 -1 original_sequences actors c23 189 240 300 206 260 285 276 340 142 284 368 216 0.99936 71 45 440 467 1 -1

What does the key "original" mean?
Why is it not there? What do original_x and original_y mean?

T-SNE code

Hello, can you share your feature visualization code about T-SNE?

ValueError: invalid literal for int() with base 10: 'videos'

When I run index_dfdc.py, I get this error, can anyone help me?

dataset preprocessing

Hello, I have recently been trying to reproduce your results. I ran into a problem when processing the dataset: because my computer runs Windows, running make_dataset.sh through Git reported an error. I then tried running index_celebdf.py on its own, which also failed with: pandas.errors.EmptyDataError: No columns to parse from file. Do I need to preprocess the Celeb-DF dataset before running this program? How was the List_of_testing_videos.txt file generated?

The score of the result

Hi, thanks for your work and the detailed steps.
However, I followed them and trained the model (EfficientNet) on DFDC.
Then I got a strange result such as:

The real scores are greater than 1. I also tried to predict with the notebook 'Image prediction.ipynb',
but got an error while loading the weights:

rename() got an unexpected keyword argument "errors"

Hey,

For FFPP dataset:

I ran the index_ffpp.py file from which a .pkl file was generated
Next, I ran the extract_faces.py file by providing the paths of source, videodf, facesfolder, facesdf and checkpoint directories. I got the following error. I haven't modified the code.

For the above screenshots, I have run extract_faces.py with --num 20 argument. But I got similar error when I ran the code for entire FFPP dataset.

Thank you
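
The message in this issue's title is what Python raises when a `rename` method is called with an `errors` keyword it does not accept. As far as I know, pandas added that keyword to `DataFrame.rename` in version 0.25, so an older pandas in the environment is a likely (but unconfirmed) cause, and upgrading pandas would be the direct fix. Where upgrading is not possible, a small compatibility wrapper can retry without the keyword; `rename_compat` below is a hypothetical helper, not part of the repository.

```python
def rename_compat(df, **kwargs):
    """Call df.rename, retrying without `errors` if that keyword is rejected.

    Works with any object exposing a pandas-like rename() method.
    """
    try:
        return df.rename(**kwargs)
    except TypeError:
        if "errors" not in kwargs:
            raise  # the TypeError has some other cause; do not hide it
        kwargs.pop("errors")
        return df.rename(**kwargs)
```

Dropping `errors` falls back to the older behaviour of silently ignoring missing keys, which matches `errors='ignore'` but not `errors='raise'`.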

test with single image

I downloaded the weight files and changed test_model.py to test the pre-trained models with a single image. Part of my code is:

`device = 0
net_name = 'EfficientNetAutoAttB4'
net_class = getattr(fornet, net_name)
# load model
print('Loading model...')
state_tmp = torch.load('model/ex.pth', map_location='cpu')
if 'net' not in state_tmp.keys():
state = OrderedDict({'net': OrderedDict()})
[state['net'].update({'model.{}'.format(k): v}) for k, v in state_tmp.items()]
else:
state = state_tmp
net: FeatureExtractor = net_class().eval().to(device)

incomp_keys = net.load_state_dict(state['net'], strict=True)
print(incomp_keys)
print('Model loaded!')
I = cv2.imread(img)
I=cv2.resize(I,(224,224))
I=cv2.cvtColor(I, cv2.COLOR_BGR2RGB)
I=np.array(I,dtype=np.float)
I=np.transpose(I)
batch = torch.tensor(I/255).unsqueeze(0)
with torch.no_grad():
        batch = batch.to(device, dtype=torch.float)
        output = net(batch)
        # get prediction
        score= output.cpu().numpy()[:, 0]`

I tested with many real and fake images, but the score looks random (between -3 and 3) and doesn't indicate whether the image is real or fake. What is the relation between the score and the class of the image (real or fake)? Is my input image prepared correctly? Is there any problem in my code?
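
Two points in the snippet above can be illustrated with a short sketch. First, `np.transpose` without an axes argument reverses all axes, so an H x W x C image becomes C x W x H rather than the C x H x W layout PyTorch models expect. Second, a score in roughly [-3, 3] looks like a raw logit; assuming the network outputs a logit where larger means "fake" (an assumption consistent with other issues here applying a sigmoid), a sigmoid maps it to (0, 1). Any mean/std normalization used in training is not shown and would also need to match.

```python
import numpy as np


def to_chw(image_hwc):
    """Convert an H x W x C uint8 image to a float32 C x H x W array in [0, 1]."""
    scaled = image_hwc.astype(np.float32) / 255.0
    # Explicit axes: channels first, height and width kept in order.
    return np.transpose(scaled, (2, 0, 1))


def sigmoid(logit):
    """Map a raw network output (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logit))


image = np.zeros((224, 200, 3), dtype=np.uint8)  # dummy image: height 224, width 200
chw = to_chw(image)            # shape (3, 224, 200)
swapped = np.transpose(image)  # shape (3, 200, 224): height and width swapped
```

For a square 224 x 224 input the swapped layout has the right shape, so the error is silent: the network simply sees a transposed face.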

"No validation samples" error when trying to train EfficientNetB4 on DFDC

Hi,
Thanks for your great work. Your approach seems really interesting. When I was trying to train EfficientNetB4 model on DFDC sample data set I got this error.

here is my script

Thanks in advance!

About test file

I have some questions about the test code. Can you help me?

In test_model.py, the face_policy, patch_size, net_name and model_name are obtained from the model path, but the model you provided carries no information about the patch_size.
In test_all.sh, the paths of both DFDC and FFPP are required as input. Can I run the test code on just one dataset, such as DFDC?
When generating the DFDC dataset, line 55 of index_dfdc.py reads:
df_tmp['folder'] = int(str(json_path.parts[-2]).split('_')[-1])
However, the DFDC dataset I downloaded from the website has two folders (test_videos, train_sample_videos), so I couldn't get df_tmp['folder'].
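
The quoted line takes the name of the folder that contains the metadata file and converts whatever follows the last underscore to an integer. That works for names like dfdc_train_part_9 but not for train_sample_videos, where the last token is "videos". A minimal sketch of the same parsing with an explicit error is shown below; `folder_index` is a hypothetical helper and the paths are made-up examples.

```python
from pathlib import Path


def folder_index(json_path):
    """Parse the trailing integer of the parent folder name.

    Example: .../dfdc_train_part_9/metadata.json -> 9
    """
    parent = Path(json_path).parts[-2]
    suffix = parent.split("_")[-1]
    if not suffix.isdigit():
        raise ValueError(
            f"Folder {parent!r} does not end in a numeric index; "
            "expected a name such as 'dfdc_train_part_9'."
        )
    return int(suffix)


index = folder_index("/data/dfdc_train_part_9/metadata.json")
```

This also explains the "invalid literal for int() with base 10: 'videos'" error reported for index_dfdc.py in another issue.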

training data

Thanks for sharing the code, and sorry for disturbing you.

When downloading the DFDC-preview dataset (5k videos), I could not find it or its json file, dataset.json. The name of each video I get is like pyfnfvsxez.mp4, which differs from the format 1255229_1003254_A_001.mp4. How can I download the same dataset as yours?

About the rocauc value

Hi, @CrohnEngineer
I found that the way the AUC value is calculated is a bit strange. The raw score is used here without any transformation, and some of the scores are greater than 1. Is this the right way to use this function?
rocauc = M.roc_auc_score(df_res['label'],df_res['score'])
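
ROC AUC depends only on how the scores rank the samples, so raw scores above 1 are not a problem in themselves: any strictly increasing transform, such as a sigmoid, leaves the value unchanged. The sketch below checks this with a small pairwise implementation written for illustration (the `auc` helper and the numbers are made up; it is not scikit-learn's code).

```python
import math


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [True, True, False, False, True]
logits = [2.7, -0.3, -1.9, 0.4, 5.1]  # raw scores, some greater than 1
probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]

auc_raw = auc(labels, logits)
auc_sigmoid = auc(labels, probs)
```

Both calls give the same value, because the sigmoid preserves the ordering of the scores.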

FaceForensics++DataSet

Sorry to trouble you again! I would like to ask about the FF++ dataset you are using. I downloaded it once before, and the directory structure looks like the one below, but it seems that index_ffpp.py cannot be used with it. I had always used the DFDC dataset before. I have now used the download script again to download the 130 GB compressed dataset; is it the same as the dataset you used in the code?
This is the structure of the dataset I downloaded before, about 100 GB:

This is a screenshot of the program I got by filling out the form:

How to solve the problem of overfitting in training?

Or how to select the best model as the final submission model?

I also participated in DFDC, but only placed 94th.

Thank you!

about the loss in train_binclass.py

Hi, thanks for the great work. I have some questions about the loss in train_binclass.py. From your paper, I understand that the inputs of the BCE loss are the labels and sigmoid(out), but in your code the inputs of the BCE loss are out and the labels. Could you tell me why you wrote the BCE loss code like that? Thank you very much.
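
One possible explanation, stated as an assumption because the loss class used by the script is not quoted here: if the code uses a "with logits" loss (in PyTorch, `torch.nn.BCEWithLogitsLoss`), the sigmoid is applied inside the loss, so passing `out` directly is equivalent to the formulation with sigmoid(out). The sketch below verifies that equivalence numerically in plain Python; the function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_on_probability(p, y):
    """Binary cross-entropy on a probability p in (0, 1) and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z, y):
    """The same loss computed directly from the logit z.

    Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


pairs = [(z, y) for z in (-2.0, 0.5, 3.0) for y in (0, 1)]
differences = [abs(bce_on_probability(sigmoid(z), y) - bce_with_logits(z, y))
               for z, y in pairs]
```

The logit form is usually preferred in practice because it avoids taking the logarithm of a probability that has saturated to 0 or 1.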

I got an issue while training the model

Hi,

Sorry for bothering you. When I tried to train the model with the DFDC dataset, the following error was reported. May I ask whether I missed anything?

regards,

What is the training set of the pre-trained models provided via your Dropbox link?

Hello,
As the title says, I just want to know what the training sets of the DFDC and FFPP models are.
I mean, for the training set, we have HQ (c23 compression), LQ and raw data.
I got your pre-trained models without a description of the training set.

Error in extract_faces.py

I first ran the index_dfdc.py file from which a .pkl file was generated
Next, I ran the extract_faces file by providing the paths of source, videodf, facesfolder, facesdf and checkpoint directories. I got the following error. I haven't modified the code.

Face size not consistent with the one reported in the paper

We hard-coded the dimension of the cropping for the faces at 512x512, but in the paper we used 224x224

icpr2020dfdc/extract_faces.py, line 79 in 7ae2c9b:

face_size = 512

Is that right or am I missing something?

training using the train_triplet file

Hi,
I wanted to train your model, in particular EfficientNetB4ST, so I started with the triplet file just as you mentioned in the script. However, I keep getting this message. Is there any way to fix this?
I'm using Google Colab for training.

KeyError: 'test' in dataframe of celeb-df faces

Hey,

For Celeb-DF V2 dataset:
(EfficientNetB4ST)
I first ran the index_celebdf.py and extract_faces.py and got the faces and dataframe of faces.

After that,

I made the following changes in train_triplet.py:

added arguments for celebdf_faces_dir and celebdf_faces_df_path
added arguments in the make_splits() function at line 226 of train_triplet.py corresponding to celebdf faces and dataframe of celebdf faces
Also made changes in the load_df() function at line 28 of isplutils/split.py to read the celebdf faces dataframe

After this, I ran train_triplet.py using:
(running for 2 iterations only)
python3 train_triplet.py --net EfficientNetB4 --traindb celebdf --valdb celebdf --celebdf_faces_df_path facesdf/faces_df.pkl --celebdf_faces_dir faces/ --face scale --size 224 --workers 0 --traintriplets 70 --valtriplets 20 --maxiter 2 --valint 1

Output:

Loaded pretrained weights for efficientnet-b4
Parameters
{'face': 'scale',
 'net': 'EfficientNetB4',
 'seed': 0,
 'size': 224,
 'traindb': 'celebdf'}
Tag: net-EfficientNetB4_traindb-celebdf_face-scale_size-224_seed-0
Loading data
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'test'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_triplet.py", line 473, in <module>
    main()
  File "train_triplet.py", line 241, in main
    ffpp_dir=ffpp_faces_dir, celebdf_dir=celebdf_faces_dir, dbs={'train': train_datasets, 'val': val_datasets})
  File "/home/jupyter/celebdf_detection/isplutils/split.py", line 135, in make_splits
    split_df = get_split_df(df=full_df, dataset=split_db, split=split_name)
  File "/home/jupyter/celebdf_detection/isplutils/split.py", line 94, in get_split_df
    df[(df['label'] == False) & (df['test'] == False) ]['video'].unique())
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py", line 2975, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'test'

In short:
At line 91 of isplutils/split.py I am getting a KeyError for df['test']. There is no 'test' field in the dataframe of celebdf faces.
You can verify this by running:(for celeb-df dataset)

import pandas as pd
df = pd.read_pickle("celebdf/facesdf/faces_df.pkl")
print(df['test'])

I was able to run the code(train_triplet.py) by removing (df['test'] == False) at line 91 of isplutils/split.py

I did similar changes for train_binclass.py and was able to train the model using it as well.

After this:
While running test_model.py for celebdf, again I got the KeyError for 'test' while making a test split at line 99 of isplutils/split.py

Any way to solve this while running the test_model.py code?

Thank you

about how to calculate the AUC score of the model testing on the testdataset

Hi, thanks for the great work, and thanks for answering my issue 16. I read your test_model.py code, but I couldn't find the code that calculates the AUC score of the model on the test dataset, so I wrote it myself. I downloaded your pretrained Xception_FFPP (best_val.pth) and ran the script:
python test_model.py --model_path /media/lixuan/others/dataset/icpr2020dfdc/weights/binclass/net-Xception_traindb-ff-c23-720-140-140_face-scale_size-64_seed-41/bestval.pth --testsets ff-c23-720-140-140 dfdc-35-5-10 --dfdc_faces_df_path $DFDC_FACES_DF --dfdc_faces_dir $DFDC_FACES_DIR --ffpp_faces_df_path $FFPP_FACES_DF --ffpp_faces_dir $FFPP_FACES_DIR --device $DEVICE --override
and got the result.
Then I wrote code to calculate the AUC score like this:

import pickle
import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score
import torch
res_path = 'results/net-Xception_traindb-ff-c23-720-140-140_face-scale_size-64_seed-41_bestval/ff-c23-720-140-140_test.pkl'
with open(res_path, 'rb') as res:
    result = pickle.load(res)
labels = result['label'].values
labels_bool2value = []
for i in range(labels.shape[0]):
    if labels[i]:
        labels_bool2value.append(0)
    else:
        labels_bool2value.append(1)
pred = torch.sigmoid(torch.from_numpy(result['score'].values)).numpy()
auc = roc_auc_score(np.array(labels_bool2value), pred)
print(auc)

But the AUC score I get is 0.46364417251275514, which is lower than the result reported in your paper. Could you tell me what is wrong with my code, or share your code for calculating the AUC score of the model on the test dataset? Thank you very much.
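
One thing worth checking in the snippet above is label polarity: the loop maps True to 0 and False to 1. If True marks a fake video and higher scores mean "more fake" (an assumption; the convention is not stated in this issue), the inverted labels produce 1 - AUC instead of AUC. The sketch below demonstrates the effect with a rank-sum AUC written for illustration on made-up numbers; it does not by itself explain the specific value reported.

```python
def auc_from_ranks(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {idx: r + 1 for r, idx in enumerate(order)}  # 1-based ranks
    pos = [i for i, label in enumerate(labels) if label]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    u = sum(rank[i] for i in pos) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


labels = [True, False, True, False]
scores = [1.2, -0.7, 0.3, 0.9]
flipped = [not label for label in labels]

auc_direct = auc_from_ranks(labels, scores)
auc_flipped = auc_from_ranks(flipped, scores)
```

Here `auc_direct + auc_flipped` equals 1, so an AUC slightly below 0.5 can simply be an AUC slightly above 0.5 computed with inverted labels.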

Can't find the file data_frame_df.pkl

extract_faces.py --videodf argument

What is the video dataframe that seems to be required here? Not passing anything returns None which makes it problematic for the script to run

Training with partial dataset

Hello, I have run face extraction on the DFDC and Celeb-DF datasets following your instructions. But I have some doubts about how the datasets are split in your code. Is it done by running split.py? The names of the available datasets are different from the ones I downloaded. Because the DFDC dataset is too big, I only downloaded part of it. How should I modify split.py for this?

The following is the dfdc data set I downloaded

the proportion of real and fake videos is very unbalanced

Hey, @CrohnEngineer
the proportion of real and fake videos is very unbalanced, but it seems that you split it by 1:4. Doesn't it need to be balanced?

Celeb Dataset training

You changed train_binclass.py and split.py earlier, but I notice that the two Celeb-related parameters are not passed to make_splits in split.py. I want to add training on the Celeb-DF dataset in training.py. If I add them directly to the parameter list, do I need to modify other files? Could you edit and add it for me? There is another problem: it seems that you did not add the AUC/accuracy metric in train_triplet.py, so only the loss curve appears in the resulting plots.

Visualization

Hello, sorry to bother you again! I trained several models according to the split of my dataset, but I want to inspect the training process and the loss. I think you are using tensorboardX; how do you visualize the training process?

About test in ffpp dataset

Hey Thanks for your work @CrohnEngineer
I noted that

icpr2020dfdc/isplutils/split.py, line 57 in 7ae2c9b:

random_youtube_videos = np.random.permutation(

Does this mean that only real videos are used to define the train/val/test split?

missing_keys after loading model in train_binclass.py

Hey,

For FFPP dataset:
EfficientNetB4 (Triplet)
I have kept the maxiter and valint arguments very small (just for testing).

I first ran train_triplet.py using:
python3 train_triplet.py --net EfficientNetB4 --traindb ff-c23-720-140-140 --valdb ff-c23-720-140-140 --ffpp_faces_df_path facesdf/faces_df.pkl --ffpp_faces_dir faces/ --face scale --size 224 --workers 0 --traintriplets 70 --valtriplets 20 --maxiter 2 --valint 1 --seed 0

Output:

Loaded pretrained weights for efficientnet-b4
Parameters
{'face': 'scale',
 'net': 'EfficientNetB4',
 'seed': 0,
 'size': 224,
 'traindb': 'ff-c23-720-140-140'}
Tag: net-EfficientNetB4_traindb-ff-c23-720-140-140_face-scale_size-224_seed-0
Loading data
Training triplets: 70
Validation triplets: 20
Epoch 000: 2it [00:22, 11.59s/it] Maximum number of iterations reached
Completed

As expected, a bestval.pth file was saved in weights/triplet/net-EfficientNetB4_traindb-ff-c23-720-140-140_face-scale_size-224_seed-0/ directory

After this, I ran train_binclass.py using:

python3 train_binclass.py --net EfficientNetB4ST --traindb ff-c23-720-140-140 --valdb ff-c23-720-140-140 --ffpp_faces_df_path facesdf/faces_df.pkl --ffpp_faces_dir faces/ --face scale --size 224 --workers 0 --init weights/triplet/net-EfficientNetB4_traindb-ff-c23-720-140-140_face-scale_size-224_seed-0/bestval.pth --trainsamples 50 --valsamples 20 --maxiter 2 --valint 1 --seed 0
Output:

Loaded pretrained weights for efficientnet-b4
Parameters
{'face': 'scale',
 'net': 'EfficientNetB4ST',
 'seed': 0,
 'size': 224,
 'traindb': 'ff-c23-720-140-140'}
Tag: net-EfficientNetB4ST_traindb-ff-c23-720-140-140_face-scale_size-224_seed-0
Loading model form: weights/triplet/net-EfficientNetB4_traindb-ff-c23-720-140-140_face-scale_size-224_seed-0/bestval.pth
_IncompatibleKeys(missing_keys=['classifier.0.weight', 'classifier.0.bias', 'classifier.0.running_mean', 'classifier.0.running_var', 'classifier.1.weight', 'classifier.1.bias'], unexpected_keys=[])
Loading data
Training samples: 50
Validation samples: 20
Epoch 001: 0it [00:00, ?it/s]     Maximum number of iterations reached
Completed

I am getting some incompatible keys at line 187

Is this expected?
OR
"All keys matched successfully" should have been printed?

Thank you
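
In PyTorch, `missing_keys` lists parameters that the model defines but the loaded checkpoint does not contain. All the keys reported above start with `classifier.`, which would be expected if the triplet-trained network has no classification head (an assumption about the architecture, not something confirmed in this issue): the backbone weights are loaded and the head starts from its fresh initialization. The bookkeeping can be sketched as follows; `compare_keys` and the backbone key name are hypothetical.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) in the sense used by load_state_dict.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not define
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    in_model, in_checkpoint = set(model_keys), set(checkpoint_keys)
    missing = [k for k in model_keys if k not in in_checkpoint]
    unexpected = [k for k in checkpoint_keys if k not in in_model]
    return missing, unexpected


# "backbone.conv.weight" is a made-up name standing in for the feature extractor.
model = ["backbone.conv.weight", "classifier.0.weight", "classifier.1.weight"]
checkpoint = ["backbone.conv.weight"]
missing, unexpected = compare_keys(model, checkpoint)
```

An empty `unexpected` list together with only head parameters in `missing` is the pattern one would expect when initializing a classifier from a feature-extractor checkpoint.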

problems of extract_faces.py

Hey, thanks for your work,
but I have a question about running extract_faces.py.
The faces seem to have been extracted, but the resulting list is empty:

Loading video DataFrame
Loading face extractor
<function main.. at 0x7fd95923fea0>
Extracting faces: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 76/76 [00:00<00:00, 246.73it/s]
path height width frames class label source name original nfaces
0 Celeb-synthesis/id3_id1_0001.mp4 500 528 380 Celeb-synthesis True id3_id1_0001.mp4 id3_id1_0001 -1 0
1 Celeb-synthesis/id9_id1_0008.mp4 500 944 461 Celeb-synthesis True id9_id1_0008.mp4 id9_id1_0008 -1 0
2 Celeb-synthesis/id0_id9_0006.mp4 500 944 534 Celeb-synthesis True id0_id9_0006.mp4 id0_id9_0006 -1 0
3 Celeb-synthesis/id9_id4_0007.mp4 500 944 456 Celeb-synthesis True id9_id4_0007.mp4 id9_id4_0007 -1 0
4 Celeb-synthesis/id7_id12_0005.mp4 500 944 161 Celeb-synthesis True id7_id12_0005.mp4 id7_id12_0005 -1 0
... ... ... ... ... ... ... ... ... ... ...
1198 YouTube-real/00098.mp4 500 892 453 YouTube-real False 00098.mp4 00098 -1 0
1199 YouTube-real/00009.mp4 500 892 575 YouTube-real False 00009.mp4 00009 -1 0
1200 YouTube-real/00207.mp4 500 892 126 YouTube-real False 00207.mp4 00207 -1 0
1201 YouTube-real/00174.mp4 500 892 465 YouTube-real False 00174.mp4 00174 -1 0
1202 YouTube-real/00040.mp4 500 892 381 YouTube-real False 00040.mp4 00040 -1 0

[1203 rows x 10 columns]
Collecting faces results: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1203/1203 [00:00<00:00, 12715.78it/s]
Saving videos DataFrame to data/celebdf_videos.pkl
Saving faces DataFrame to faces/celeb/output_from_video_0_to_video_0.pkl
Traceback (most recent call last):
File "/home/ubuntu/project/icpr2020dfdc-master/extract_faces.py", line 317, in <module>
main()
File "/home/ubuntu/project/icpr2020dfdc-master/extract_faces.py", line 164, in main
df_faces = pd.concat(faces_dataset, axis=0, )
File "/home/ubuntu/.conda/envs/icpr2020/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 284, in concat
sort=sort,
File "/home/ubuntu/.conda/envs/icpr2020/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

Analyze results

Hello, I followed the steps to train on the data. But when I opened the Jupyter notebook and ran Analyze results, the following error appeared.

My result folder structure is as follows

Looking at the code, it seems that the .csv file should be created automatically, so I am very confused.

ValueError: high is out of bounds for int32

Hello,
I'm getting error "ValueError: high is out of bounds for int32" after running the train_binclass.py code:

Do you know how I can fix it?

About DFDC

Hey @zhangconghhh ,

In your ICPR paper, you mention that the test result on the DFDC dataset is 0.8782. Is the test dataset in your paper the test_sample_videos part or the last 10 videos in train_sample_videos? I ask because I saw there is a separation in split.py.

As you can find in the paper, we used only the videos from the training set for training, validation and testing. In particular, we used the videos from the first 35 folders as the training set, videos from folders 35 to 40 as the validation set, and finally videos from the last 10 folders as the test set.
At the time the paper was written, the challenge had not closed yet, so we didn't have the videos from the test_sample_videos folder you cited in your comments.
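
The folder-based split described in this reply can be sketched as a small function. The exact boundaries are an interpretation: with 50 folders indexed 0 to 49, "first 35", a validation block of 5 and "last 10" are read here as 0-34, 35-39 and 40-49, which matches the 35-5-10 pattern in the dfdc-35-5-10 dataset name used elsewhere in these issues. `dfdc_split` is a hypothetical name, not the function in split.py.

```python
def dfdc_split(folder):
    """Map a DFDC training folder index (0-49) to a split name."""
    if not 0 <= folder < 50:
        raise ValueError(f"Folder index {folder} is outside the range 0-49")
    if folder < 35:
        return "train"
    if folder < 40:
        return "val"
    return "test"


counts = {"train": 0, "val": 0, "test": 0}
for folder in range(50):
    counts[dfdc_split(folder)] += 1
```

Splitting by folder rather than by video keeps all videos of a folder in the same subset.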

And when I run make_dataset.sh for the DFDC dataset, I also run into a problem.
I use the path to train_sample_videos as the DFDC path, and I get json_path as 'PosixPath('/media/disk/Backup/zhangcong/deepfake/dfdc/train_sample_videos/metadata.json')'.
I can't get an int for df_tmp['folder'].
Can you help me with that?

I'm sorry, I think I didn't understand your question. In make_dataset.sh from the last release we execute first index_dfdc.py and then extract_faces.py; in index_dfdc.py however, you just need to specify the path to the folder containing the 50 folders of videos of the DFDC training set, already unzipped.
The metadata json file the script elaborates is then taken from each one of the 50 folders and is used to create an overall Pandas Dataframe with info about the whole dataset. You don't need to specify any json path anywhere in the code. I'll recap this for you quickly.
Before launching make_dataset.sh:

You should have downloaded the DFDC dataset from Kaggle;
You should have unzipped all the 50 folders contained in the train_sample_videos folder;
You should run index_dfdc.py indicating as argument the path to the folder containing the 50 folders of the DFDC training set.

Hope this helps. Have a good weekend!
Cheers

Edoardo

Originally posted by @CrohnEngineer in #2 (comment)

It seems that the DFDC dataset is different from what you described. I downloaded the DFDC dataset and there are not 50 folders in train_sample_videos; there are just 401 mp4 files. I guess that is what puzzled the original poster.

What is the real/fake distribution of the test dataset?

For the FF++ dataset, is it balanced or imbalanced?
For the DFDC dataset, is it balanced or imbalanced?
Thank you

Build a heatmap

Hello!

I want to convert the images that show the attention masks into a heatmap on the images. Would you be able to guide me as to how I could do that?
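
A minimal sketch of one way to overlay a mask on an image as a heatmap, using only NumPy. It assumes the attention mask has already been resized to the image resolution (attention maps are usually smaller than the input and need upsampling first) and uses a simple blue-to-red colouring chosen for illustration; `overlay_heatmap` is a hypothetical helper, not code from the repository.

```python
import numpy as np


def overlay_heatmap(image, mask, alpha=0.5):
    """Blend a single-channel mask onto an RGB image as a blue-to-red heatmap.

    image: H x W x 3 uint8 array
    mask:  H x W array in any numeric range, same H and W as the image
    alpha: weight of the heatmap in the blend (0 = image only, 1 = heatmap only)
    """
    lo, hi = float(mask.min()), float(mask.max())
    if hi > lo:
        norm = (mask - lo) / (hi - lo)  # rescale to [0, 1]
    else:
        norm = np.zeros(mask.shape, dtype=np.float64)  # constant mask
    heat = np.zeros(image.shape, dtype=np.float64)
    heat[..., 0] = 255.0 * norm          # red grows with attention
    heat[..., 2] = 255.0 * (1.0 - norm)  # blue marks low attention
    blended = (1.0 - alpha) * image + alpha * heat
    return np.clip(blended, 0, 255).astype(np.uint8)


image = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.arange(16, dtype=np.float64).reshape(4, 4)
result = overlay_heatmap(image, mask)
```

A perceptual colormap from a plotting library could replace the two-colour ramp without changing the blending step.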
