
learn-to-cluster's Introduction

Learning to Cluster Faces

This repo provides an official implementation for [1, 2] and a re-implementation of [3].

Paper

  1. Learning to Cluster Faces on an Affinity Graph, CVPR 2019 (Oral) [Project Page]
  2. Learning to Cluster Faces via Confidence and Connectivity Estimation, CVPR 2020 [Project Page]
  3. Linkage-based Face Clustering via Graph Convolution Network, CVPR 2019

Requirements

Setup and get data

Install dependencies

conda install faiss-gpu -c pytorch
pip install -r requirements.txt

Datasets

Please refer to DATASET.md for data preparation.

Model zoo

Pretrained models are available in the model zoo.

Run

  1. Fetch code & create soft link

git clone git@github.com:yl-1993/learn-to-cluster.git
cd learn-to-cluster
ln -s xxx/data data

  2. Run algorithms

Follow the instructions in dsgcn, vegcn and lgcn to run algorithms.

Results on part1_test (584K)

| Method | Precision | Recall | F-score |
| --- | --- | --- | --- |
| Chinese Whispers (k=80, th=0.6, iters=20) | 55.49 | 52.46 | 53.93 |
| Approx Rank Order (k=80, th=0) | 99.77 | 7.2 | 13.42 |
| MiniBatchKMeans (ncluster=5000, bs=100) | 45.48 | 80.98 | 58.25 |
| KNN DBSCAN (k=80, th=0.7, eps=0.25, min=1) | 95.25 | 52.79 | 67.93 |
| FastHAC (dist=0.72, single) | 92.07 | 57.28 | 70.63 |
| DaskSpectral (ncluster=8573, affinity='rbf') | 78.75 | 66.59 | 72.16 |
| CDP (single model, th=0.7) | 80.19 | 70.47 | 75.02 |
| L-GCN (k_at_hop=[200, 10], active_conn=10, step=0.6, maxsz=300) | 74.38 | 83.51 | 78.68 |
| GCN-D (2 prpsls) | 95.41 | 67.77 | 79.25 |
| GCN-D (5 prpsls) | 94.62 | 72.59 | 82.15 |
| GCN-D (8 prpsls) | 94.23 | 79.69 | 86.35 |
| GCN-D (20 prpsls) | 94.54 | 81.62 | 87.61 |
| GCN-D + GCN-S (2 prpsls) | 99.07 | 67.22 | 80.1 |
| GCN-D + GCN-S (5 prpsls) | 98.84 | 72.01 | 83.31 |
| GCN-D + GCN-S (8 prpsls) | 97.93 | 78.98 | 87.44 |
| GCN-D + GCN-S (20 prpsls) | 97.91 | 80.86 | 88.57 |
| GCN-V | 92.45 | 82.42 | 87.14 |
| GCN-V + GCN-E | 92.56 | 83.74 | 87.93 |

Note that prpsls in the table above denotes the number of parameter settings used for generating proposals, rather than the actual number of proposals. For example, 2 prpsls generates 34578 proposals and 20 prpsls generates 283552 proposals.
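The precision, recall and F-score above are pairwise metrics: pairs of samples are counted as positive when both share a predicted cluster and/or a ground-truth identity. A minimal sketch of the metric for orientation only; the repo's evaluation/evaluate.py is the authoritative implementation:

```python
import numpy as np
from collections import Counter

def pairwise_fscore(gt_labels, pred_labels):
    """Pairwise precision/recall over all same-cluster sample pairs."""
    def num_pairs(counts):
        # number of unordered pairs inside each group, summed
        return sum(c * (c - 1) // 2 for c in counts)

    gt_labels = np.asarray(gt_labels).tolist()
    pred_labels = np.asarray(pred_labels).tolist()
    pred_pairs = num_pairs(Counter(pred_labels).values())   # same predicted cluster
    gt_pairs = num_pairs(Counter(gt_labels).values())       # same ground-truth class
    tp = num_pairs(Counter(zip(gt_labels, pred_labels)).values())  # both
    precision = tp / max(pred_pairs, 1)
    recall = tp / max(gt_pairs, 1)
    fscore = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, fscore
```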

Benchmarks (5.21M)

1, 3, 5, 7, 9 denote different scales of clustering. Details can be found in Face Clustering Benchmarks.

| Pairwise F-score | 1 | 3 | 5 | 7 | 9 |
| --- | --- | --- | --- | --- | --- |
| CDP (single model, th=0.7) | 75.02 | 70.75 | 69.51 | 68.62 | 68.06 |
| LGCN | 78.68 | 75.83 | 74.29 | 73.7 | 72.99 |
| GCN-D (2 prpsls) | 79.25 | 75.72 | 73.90 | 72.62 | 71.63 |
| GCN-D (5 prpsls) | 82.15 | 77.71 | 75.5 | 73.99 | 72.89 |
| GCN-D (8 prpsls) | 86.35 | 82.41 | 80.32 | 78.98 | 77.87 |
| GCN-D (20 prpsls) | 87.61 | 83.76 | 81.62 | 80.33 | 79.21 |
| GCN-V | 87.14 | 83.49 | 81.51 | 79.97 | 78.77 |
| GCN-V + GCN-E | 87.93 | 84.04 | 82.1 | 80.45 | 79.3 |

| BCubed F-score | 1 | 3 | 5 | 7 | 9 |
| --- | --- | --- | --- | --- | --- |
| CDP (single model, th=0.7) | 78.7 | 75.82 | 74.58 | 73.62 | 72.92 |
| LGCN | 84.37 | 81.61 | 80.11 | 79.33 | 78.6 |
| GCN-D (2 prpsls) | 78.89 | 76.05 | 74.65 | 73.57 | 72.77 |
| GCN-D (5 prpsls) | 82.56 | 78.33 | 76.39 | 75.02 | 74.04 |
| GCN-D (8 prpsls) | 86.73 | 83.01 | 81.1 | 79.84 | 78.86 |
| GCN-D (20 prpsls) | 87.76 | 83.99 | 82 | 80.72 | 79.71 |
| GCN-V | 85.81 | 82.63 | 81.05 | 79.92 | 79.08 |
| GCN-V + GCN-E | 86.09 | 82.84 | 81.24 | 80.09 | 79.25 |

| NMI | 1 | 3 | 5 | 7 | 9 |
| --- | --- | --- | --- | --- | --- |
| CDP (single model, th=0.7) | 94.69 | 94.62 | 94.63 | 94.62 | 94.61 |
| LGCN | 96.12 | 95.78 | 95.63 | 95.57 | 95.49 |
| GCN-D (2 prpsls) | 94.68 | 94.66 | 94.63 | 94.59 | 94.55 |
| GCN-D (5 prpsls) | 95.64 | 95.19 | 95.03 | 94.91 | 94.83 |
| GCN-D (8 prpsls) | 96.75 | 96.29 | 96.08 | 95.95 | 95.85 |
| GCN-D (20 prpsls) | 97.04 | 96.55 | 96.33 | 96.18 | 96.07 |
| GCN-V | 96.37 | 96.01 | 95.83 | 95.69 | 95.6 |
| GCN-V + GCN-E | 96.41 | 96.03 | 95.85 | 95.71 | 95.62 |

Results on YouTube-Faces

| Method | Pairwise F-score | BCubed F-score | NMI |
| --- | --- | --- | --- |
| Chinese Whispers (k=160, th=0.75, iters=20) | 72.9 | 70.55 | 93.25 |
| Approx Rank Order (k=200, th=0) | 76.45 | 75.45 | 94.34 |
| Kmeans (ncluster=1436) | 67.86 | 75.77 | 93.99 |
| KNN DBSCAN (k=160, th=0., eps=0.3, min=1) | 91.35 | 89.34 | 97.52 |
| FastHAC (dist=0.72, single) | 93.07 | 87.98 | 97.19 |
| GCN-D (4 prpsls) | 94.44 | 91.33 | 97.97 |

Results on DeepFashion

| Method | Pairwise F-score | BCubed F-score | NMI |
| --- | --- | --- | --- |
| Chinese Whispers (k=5, th=0.7, iters=20) | 31.22 | 53.25 | 89.8 |
| Approx Rank Order (k=10, th=0) | 25.04 | 52.77 | 88.71 |
| Kmeans (ncluster=3991) | 32.02 | 53.3 | 88.91 |
| KNN DBSCAN (k=4, th=0., eps=0.1, min=2) | 25.07 | 53.23 | 90.75 |
| FastHAC (dist=0.4, single) | 22.54 | 48.77 | 90.44 |
| Meanshift (bandwidth=0.5) | 31.61 | 56.73 | 89.29 |
| Spectral (ncluster=3991, affinity='rbf') | 29.6 | 47.12 | 86.95 |
| DaskSpectral (ncluster=3991, affinity='rbf') | 24.25 | 44.11 | 86.21 |
| CDP (single model, k=2, th=0.5, maxsz=200) | 28.28 | 57.83 | 90.93 |
| L-GCN (k_at_hop=[5, 5], active_conn=5, step=0.5, maxsz=50) | 30.7 | 60.13 | 90.67 |
| GCN-D (2 prpsls) | 29.14 | 59.09 | 89.48 |
| GCN-D (8 prpsls) | 32.52 | 57.52 | 89.54 |
| GCN-D (20 prpsls) | 33.25 | 56.83 | 89.36 |
| GCN-V | 33.59 | 59.41 | 90.88 |
| GCN-V + GCN-E | 38.47 | 60.06 | 90.5 |

Face Recognition

For training face recognition models and extracting features, you may use any of the frameworks below, including but not limited to:

https://github.com/yl-1993/hfsoftmax

https://github.com/XiaohangZhan/face_recognition_framework

Citation

Please cite the following papers if you use this repository in your research.

@inproceedings{yang2019learning,
  title={Learning to Cluster Faces on an Affinity Graph},
  author={Yang, Lei and Zhan, Xiaohang and Chen, Dapeng and Yan, Junjie and Loy, Chen Change and Lin, Dahua},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}
@inproceedings{yang2020learning,
  title={Learning to Cluster Faces via Confidence and Connectivity Estimation},
  author={Yang, Lei and Chen, Dapeng and Zhan, Xiaohang and Zhao, Rui and Loy, Chen Change and Lin, Dahua},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

learn-to-cluster's People

Contributors

rhxw, yl-1993


learn-to-cluster's Issues

Slightly better Batched K-means results on YouTube Faces

Hi,

Firstly, thanks for the detailed codebase and results.

I was trying to replicate your results using minibatched K-means on YouTubeFaces. I took the train/test split of YTB you have linked on the website. The test split had 140629 samples and 1436 classes.

Using a larger batch size and a different number of clusters (K), we can get better performance. Also, I get somewhat lower results than reported when using the settings from your Clustering Benchmark.

Here are the clustering results for mini-batch K-means with varying batch sizes:

| Method | Num clusters | Batch size | Precision | Recall | F-score |
| --- | --- | --- | --- | --- | --- |
| recreation | K=1500 | 100 | 50.51 | 53.11 | 51.77 |
| recreation | K=1500 | 2000 | 80.04 | 53.17 | 63.90 |
| reported (yours) | K=1500 | 100 | 76.87 | 51.86 | 61.93 |
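For orientation, a minimal sketch of this mini-batch K-means baseline with scikit-learn; the feature path and the 256-d shape are illustrative assumptions, not taken from the repo:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# load face embeddings as an (N, D) float32 array; path and dim are illustrative
feats = np.fromfile('data/features/ytb_test.bin', dtype=np.float32).reshape(-1, 256)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize

# per the table above, a larger batch_size markedly improves the F-score
km = MiniBatchKMeans(n_clusters=1500, batch_size=2000, random_state=42)
pred_labels = km.fit_predict(feats)
```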

issue on testing lgcn

Hello,
We tried to reproduce the results of your lgcn code, but ran into an issue. The details are as follows:

  1. We use pretrained_lgcn_ms1m.pth from your model zoo, your part1_test feature bin and meta, and the newest code from your GitHub.
  2. We first generate faiss_gpu_k_80.npz via scripts/tools/test_knn.sh, with --knn_method faiss_gpu.
  3. Then we run scripts/test_lgcn_ms1m.sh, with knn = 80 and knn_method = 'faiss_gpu' in cfg_test_lgcn_ms1m.py.
  4. The resulting F-pairwise = 0.6958 and F-BCubed = 0.7120, which are lower than the reported results.
  5. If we change knn to 200, the results drop to F-pairwise = 0.18 and F-BCubed = 0.18.

Thanks very much!

The error when i was running 'sh scripts/pipeline.sh'

Thanks for your paper and code, but something went wrong when I ran 'sh scripts/pipeline.sh'. The error is:

scripts/pipeline.sh: 3: scripts/pipeline.sh: : not found
scripts/pipeline.sh: 7: scripts/pipeline.sh: : not found
scripts/pipeline.sh: 15: scripts/pipeline.sh: : not found
scripts/pipeline.sh: 16: scripts/pipeline.sh: : not found
scripts/pipeline.sh: 18: scripts/pipeline.sh: : not found
scripts/pipeline.sh: 20: scripts/pipeline.sh: Syntax error: word unexpected (expecting "do")

What should I do? Could you give some suggestions? Thank you.
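For what it's worth, this error pattern (': not found' followed by 'Syntax error: word unexpected (expecting "do")') usually means the script picked up Windows CRLF line endings that /bin/sh cannot parse. A sketch of stripping them (dos2unix or sed does the same job):

```python
# rewrite scripts/pipeline.sh with Unix line endings
path = 'scripts/pipeline.sh'
with open(path, 'rb') as f:
    content = f.read().replace(b'\r\n', b'\n')  # drop carriage returns
with open(path, 'wb') as f:
    f.write(content)
```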

compare with chinese_whispers

Hi, thanks for your great work. Have you compared with the Chinese Whispers algorithm (it is also a face clustering algorithm)?

error in file: sh scripts/test_cluster_det.sh

Running sh scripts/test_cluster_det.sh gives the following errors:

trainer@f7c6756f593f:/data/raid/learn-to-cluster$ sh scripts/test_cluster_det.sh
[2019-07-26 01:13:57,415] Set random seed to 42
[./data/labels/part1_test.meta] #cls: 8573, #inst: 584013
[Time] read meta and feature consumes 1.5541 s
read proposals from folder: ./data/cluster_proposals/part1_test/faiss_k_80_th_0.7_step_0.05_minsz_3_maxsz_300_iter_0/proposals/
read proposals from folder: ./data/cluster_proposals/part1_test/faiss_k_80_th_0.75_step_0.05_minsz_3_maxsz_300_iter_0/proposals/
[Time] read proposal list consumes 0.5840 s
#cluster: 34578, #output: 1, feature shape: (584013, 256), norm_adj: True, wo_weight: False
Traceback (most recent call last):
  File "dsgcn/main.py", line 80, in <module>
    main()
  File "dsgcn/main.py", line 76, in main
    handler(model, cfg, logger)
  File "/data/raid/learn-to-cluster/dsgcn/test_cluster_det.py", line 48, in test_cluster_det
    for i, data in enumerate(data_loader):
  File "/home/trainer/anaconda3/envs/pytorch_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/home/trainer/anaconda3/envs/pytorch_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/trainer/anaconda3/envs/pytorch_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/data/raid/learn-to-cluster/dsgcn/datasets/build_dataloader.py", line 39, in collate_graphs
    pad_feat = default_collate(pad_feat)
  File "/home/trainer/anaconda3/envs/pytorch_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 137, in default_collate
    raise TypeError((error_msg.format(type(batch[0]))))
TypeError: batch must contain tensors, numbers, dicts or lists; found <class 'torch.autograd.variable.Variable'>

Re-training on clustering pseudo-labels

Hi,

Could you please share more details about the re-training on pseudo-labels (Sec 4.2.2 of your paper) ?

When you are re-training the network on the clustering labels:

  1. Do you start from the baseline recognition network (trained on labeled data) and fine-tune or start from scratch with labeled and pseudo-labeled data?
  2. The pseudo-labeled data from clustering would have a lot of noise (overestimating the number of classes, etc.) -- do you train using alternate batches of labeled + pseudo-labeled data, or just train using default options?
  3. Do you remove very small clusters, or place some threshold on the IoU or IoP of the clusters that are used as pseudo-labels (i.e., remove clusters with low IoU scores etc. because they could be noisy) ?

(nice work btw!)

Training on custom dataset

I want to train on my own dataset. Can you please tell me the steps to do so? How do I generate my own labels.meta and features.bin files?

Thanks
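For orientation, a sketch of writing the two files, assuming the format DATASET.md describes: features.bin as a raw float32 dump and labels.meta with one integer label per line. Verify against DATASET.md before relying on it:

```python
import numpy as np

# your own data: features (N, D) float32, labels (N,) int (placeholders here)
feats = np.random.rand(1000, 256).astype(np.float32)
labels = np.random.randint(0, 100, size=1000)

feats.tofile('data/features/mydata.bin')       # raw float32 binary, row-major
with open('data/labels/mydata.meta', 'w') as f:
    for lb in labels:                          # one integer label per line
        f.write('{}\n'.format(lb))
```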

Issue on setting knn

Hello Yang Lei,
How are you?
I ran into a configuration issue.
In your project, knn for the GCN-V model test on the ms1m dataset is 80, but it is 160 for the GCN-E model.
After testing the GCN-V model, I ran the test for the GCN-E model.
[image]

I assumed the two knn values should be equal and set both to 80, after which the GCN-E test worked.
Please let me know the exact solution.
Also, you trained the GCN-V model with knn=80 and the GCN-E model with knn=160 on the ms1m dataset.
I'd like to know in detail how to set knn in both the train and test configurations for the GCN-V and GCN-E models.
Thanks
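For reference, the knn settings live in the Python config files (another issue above edits cfg_test_lgcn_ms1m.py the same way). A hedged excerpt of the relevant fields; the names follow the issue texts here, so verify them against the actual cfg files:

```python
# illustrative excerpt of a vegcn test config (values mirror this issue)
knn_method = 'faiss'   # or 'faiss_gpu' / 'hnsw', per other issues in this tracker
knn = 80               # GCN-V configs reportedly use k=80; GCN-E configs use k=160
```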

Issue on use_gcn_feat

Hi Yang Lei, I ran into a configuration issue. In my project, the data to cluster is vehicle data used for a vehicle re-id task.
In cfg_test_gcnv_*.py, use_gcn_feat=True by default. After training the GCN-V model, I tested it. Here are the results:

| Model | KNN k value | Metric | Precision | Recall | Pairwise F-score |
| --- | --- | --- | --- | --- | --- |
| GCN-V | 60 | Snbr_t=0.8 | 0.8868 | 0.8041 | 0.8435 |
| GCN-V | 60 | Snbr_t=0.85 | 0.9569 | 0.7799 | 0.8594 |
| GCN-V | 60 | Snbr_t=0.9 | 0.9861 | 0.7001 | 0.8188 |
| GCN-V | 60 | SFnbr_t=0.85 | 0.6536 | 0.7563 | 0.7012 |
| GCN-V | 60 | SFnbr_t=0.9 | 0.8632 | 0.7366 | 0.7949 |
| GCN-V | 60 | SFnbr_t=0.95 | 0.9688 | 0.6581 | 0.7838 |

SFnbr (using the transformed features to rebuild the affinity graph) achieves relatively lower performance, and both similarity definitions are sensitive to the threshold.
I'd like to know in detail how to set the threshold in the test configuration for the GCN-V and GCN-E models.
Thanks.

Why not use a deeper GCN?

I see that the paper only uses a 2-layer GCN. Why not use a deeper GCN network? Would that cause any problems?

about train GCNV

@yl-1993 Hello, I want to train a GCN-V network that takes 512-dimensional input, but I am confused about one of the config parameters.
[image]
In the original config, nclass=1. I don't understand why this parameter is 1; isn't it the number of classes to predict? Please help clarify my confusion. Looking forward to your early reply.

sh scripts/pipeline.sh error

I have downloaded the part 1 testing data, but when I run sh scripts/pipeline.sh, I encounter the following error:

ave_pre: 0.9732, ave_rec: 0.5245, fscore: 0.6816
th_pos=-1.0, th_iou=1.0, pred_score=./data/work_dir/cfg_0.7_0.75/pretrained_gcn_d.npz, pred_label_fn=./data/work_dir/cfg_0.7_0.75/pretrained_gcn_d_th_iou_1.0_pos_-1.0_pred_label.txt
Traceback (most recent call last):
  File "./post_process/deoverlap.py", line 35, in <module>
    d = np.load(args.pred_score)
  File "/home/tx/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './data/work_dir/cfg_0.7_0.75/pretrained_gcn_d.npz'
Traceback (most recent call last):
  File "evaluation/evaluate.py", line 31, in <module>
    pred_labels, pred_lb_set = _read_meta(args.pred_labels)
  File "evaluation/evaluate.py", line 15, in _read_meta
    with open(fn) as f:
FileNotFoundError: [Errno 2] No such file or directory: './data/work_dir/cfg_0.7_0.75/pretrained_gcn_th_iou_1.0_pos_-1.0_pred_label.txt'

fast_knn2spmat question

Hi Lei Yang, thanks for sharing your work~!
May I ask one question: why do we have to make sure the dist is between -eps and 1+eps in the fast_knn2spmat function (here)?

I didn't encounter any errors when running the test code, but since we are using cosine similarity, we may well get scores like -0.1 in other situations.

thanks!
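If slightly negative cosine scores do trip the range check, one local workaround (a sketch only, not the repo's fast_knn2spmat, whose name and signature differ) is to clip similarities into [0, 1] while building the sparse affinity matrix:

```python
import numpy as np
import scipy.sparse as sp

def knn_to_spmat(nbrs, dists, num, th_sim=0.0):
    """Build a sparse affinity matrix from knn results (hypothetical helper).

    nbrs, dists: (N, k) neighbor indices and cosine distances (1 - sim).
    Similarities are clipped into [0, 1] so slightly negative cosine
    scores cannot trip range assertions downstream.
    """
    sims = np.clip(1.0 - dists, 0.0, 1.0)
    row = np.repeat(np.arange(num), nbrs.shape[1])
    col = nbrs.ravel()
    data = sims.ravel()
    keep = data >= th_sim                      # drop weak edges
    return sp.csr_matrix((data[keep], (row[keep], col[keep])), shape=(num, num))
```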

clustering with a strong backbone

Hi Yang Lei, may I ask another question?
The clustering problem becomes easier with a stronger backbone (the face feature extractor); with a perfect backbone, the similarity score between positive pairs would always be 1.0, and 0 for negative pairs.

So I expected the gap between the simple CW method and GCN-V to be small before training, and indeed I found the results were almost identical. To be clear, I haven't tested on a large dataset yet.

But that raises the question: what backbone and loss function did you use during training? Will the advantage of this method go away if you use a stronger backbone?

thanks!

How to get IOP score for each proposal?

How do I get the predicted IoP value for each proposal?
Does the output_prob of the model in test_cluster_det contain the IoP prediction score?
I checked the training code for GCN-D, but only found the IoU signal at line 69 of cluster_det_processor.py.

Thanks very much

fail to run clustering with chinese_whispers : 'Graph' object has no attribute 'node'

I'm using the following script to run different clustering methods over my data :
python tools/baseline_cluster.py ...

It works great with method knn_dbscan, but when I try chinese_whispers I get the following error:

#nodes: 590, #edges: 50999
[Time] create graph consumes 0.8626 s
[Time] whisper iteratively (iters=20) consumes 0.0004 s
[Time] chinese_whispers consumes 1.0397 s
Traceback (most recent call last):
  File "/Users/yossib/Dev/learn-to-cluster/tools/baseline_cluster.py", line 143, in <module>
    pred_labels = cluster_func(feats, **args.__dict__)
  File "/Users/yossib/Dev/learn-to-cluster/baseline/chinese_whispers.py", line 57, in chinese_whispers
    assigned_cluster = G.node[nbr]['cluster']
AttributeError: 'Graph' object has no attribute 'node'

Process finished with exit code 1
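This AttributeError comes from networkx 2.4+, which removed the long-deprecated Graph.node alias. Either pin networkx<2.4 or patch baseline/chinese_whispers.py along these lines:

```python
# networkx >= 2.4 removed G.node in favor of G.nodes
assigned_cluster = G.nodes[nbr]['cluster']
```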

Issue on multi-gpus train

Hi Yang Lei, I tried to add multi-GPU training, but I get an error when running the .sh file.

Traceback (most recent call last):
  File "vegcn/main.py", line 105, in <module>
    main()
  File "vegcn/main.py", line 101, in main
    handler(model, cfg, logger)
  File "/home/celin/learn-to-cluster/vegcn/train_gcn_v.py", line 37, in train_gcn_v
    _single_train(model, dataset, cfg)
  File "/home/celin/learn-to-cluster/vegcn/train_gcn_v.py", line 68, in _single_train
    runner.run(train_data, cfg.workflow, cfg.total_epochs)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/celin/learn-to-cluster/vegcn/runner/runner.py", line 17, in train_gcnv
    **kwargs)
  File "/home/celin/learn-to-cluster/vegcn/train_gcn_v.py", line 17, in batch_processor
    _, loss = model(data, return_loss=True)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 119, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 130, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 35, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    return scatter_map(inputs)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 17, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/celin/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: sparse tensors do not have strides (strides at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/ATen/SparseTensorImpl.cpp:39)
frame #0: at::native::slice(at::Tensor const&, long, long, long, long) + 0x97 (0x7f17009baa77 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #1: at::Type::slice(at::Tensor const&, long, long, long, long) const + 0x4e (0x7f1700b6914e in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #2: at::native::narrow(at::Tensor const&, long, long, long) + 0xb1 (0x7f17009bd481 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: at::Type::narrow(at::Tensor const&, long, long, long) const + 0x49 (0x7f1700b6a639 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: at::native::split(at::Tensor const&, long, long) + 0x1a7 (0x7f17009bf857 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #5: at::Type::split(at::Tensor const&, long, long) const + 0x41 (0x7f1700b68fd1 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #6: torch::autograd::VariableType::split(at::Tensor const&, long, long) const + 0x3ce (0x7f16e3a894de in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #7: at::native::chunk(at::Tensor const&, long, long) + 0x80 (0x7f17009bebe0 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #8: at::Type::chunk(at::Tensor const&, long, long) const + 0x41 (0x7f1700b6e9e1 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #9: torch::autograd::VariableType::chunk(at::Tensor const&, long, long) const + 0x183 (0x7f16e3a75de3 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #10: torch::cuda::scatter(at::Tensor const&, at::ArrayRef<long>, ...) + 0xd98 (0x7f16e3e76128 in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #11: + 0xc42a0b (0x7f16e3e7da0b in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #12: + 0x38a5cb (0x7f16e35c55cb in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: THPFunction_apply(_object*, _object*) + 0x38f (0x7f16e39a3a2f in /home/celin/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

I'd like to know how to train with multiple GPUs.
Thanks.

issue on using "faiss_gpu"

Hello!
When I run the GCN-V model with knn_method = 'faiss', the results are correct.
But when building the knn graph for GCN-V with knn_method = 'faiss_gpu', I get the following error:

Traceback (most recent call last):
  File "vegcn/main.py", line 104, in <module>
    main()
  File "vegcn/main.py", line 100, in main
    handler(model, cfg, logger)
  File "/home/deeplearn/project/test/learn-to-cluster/vegcn/test_gcn_v.py", line 51, in test_gcn_v
    dataset = build_dataset(cfg.model['type'], cfg.test_data)
  File "/home/deeplearn/project/test/learn-to-cluster/vegcn/datasets/__init__.py", line 13, in build_dataset
    return factory[model_type](cfg)
  File "/home/deeplearn/project/test/learn-to-cluster/vegcn/datasets/gcn_v_dataset.py", line 52, in __init__
    cfg.knn)
  File "/home/deeplearn/project/test/learn-to-cluster/utils/knn.py", line 210, in build_knns
    num_process=num_process)
  File "/home/deeplearn/project/test/learn-to-cluster/utils/knn.py", line 398, in __init__
    verbose=False)
  File "/home/deeplearn/project/test/learn-to-cluster/utils/faiss_search.py", line 110, in faiss_search_knn
    verbose=verbose)
  File "/home/deeplearn/project/test/learn-to-cluster/utils/faiss_search.py", line 20, in precise_dist
    verbose=verbose)
  File "/home/deeplearn/project/test/learn-to-cluster/utils/faiss_search.py", line 36, in precise_dist_share_mem
    num_per_proc = int(num / num_process) + 1
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

I then changed num_process, setting it to 1, to multiprocessing.cpu_count(), and to other values; the error above disappears, but the results become very poor.
Could you advise what might cause this?
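For what it's worth, the TypeError itself can be avoided by defaulting num_process before the division in utils/faiss_search.py, roughly as below; note this only removes the crash, and the quality drop described above is a separate problem:

```python
import multiprocessing as mp

# guard against num_process=None before computing the per-process chunk size
if num_process is None:
    num_process = mp.cpu_count()
num_per_proc = int(num / num_process) + 1
```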

MS-Celeb-1M splits filenames

Could you please share the filenames used for the train/test splits of MS-Celeb-1M?

The paper mentions (Sec 4.1) creating 10 random splits. To be consistent with your results, it would be very useful if you could give the list of files in each split.

thank you!

Question on iterations.

Hi,

Thanks for sharing your work. It has been very helpful and interesting and I have a question on your code.
Could you please explain the purpose of the "generate_iter_proposals.sh"? You mentioned in your paper that you use iterations for merging super-vertices but you don't mention this here in your pipeline description. When should someone use this script?

About MS-Celeb-1M dataset

Hi, I am interested in testing my own feature extractor, but I am unable to match the split filenames to MS-Celeb-1M. Could you please point out where to get the MS-Celeb-1M dataset that works with the split filenames?

about using vegcn with unlabeled features

@yl-1993 Hello, I am trying to cluster with the vegcn method but ran into a problem along the way. My steps: 1. extract features and dump them to a .bin file; 2. run vegcn, first extracting GCN-V features and then refining the clusters with GCN-E. However, my features have no labels, so an error occurs at runtime. Is the .meta file required, and how should I change things? Looking forward to your reply.

Making own custom training dataset

Hello
Thanks for contributing this paper and project.
I have a question.
How should I make my own custom training dataset?
Please let me know the detailed structure of each file in the training dataset.
Thanks

Questions about the GCN-S architecture

Hi, I have recently re-implemented the GCN-S part of the paper myself and have a few questions:

  1. From other issues I see that IoU and IoP can be trained separately. I changed the final 64-1 fully connected layer to 64-2, predicting IoU and IoP respectively. Is there any problem with this? Could one loss end up dominating the other?

  2. For GCN-S label generation, a random vertex of a proposal is chosen as the primary class and the remaining vertices as negatives. Regarding the "multiple iterations" mentioned in the paper: does it mean selecting one primary class per proposal per epoch, or selecting multiple primary classes per proposal within each epoch, so that one sample effectively becomes several samples in one epoch?

  3. During GCN-S inference, the paper selects the run whose predicted primary class contains the most vertices as the final result. After training, does this mean each proposal is predicted multiple times with a random seed at inference? The released code does not cover this part, and I do not quite understand how it is implemented. Could you give some guidance?

  4. For IoU training and prediction, the released code applies no sigmoid. After adding GCN-S, would it be more reasonable to apply a sigmoid to the IoP output at inference? If so, should IoU also go through a sigmoid, and conversely, should a sigmoid also be applied when training IoU and IoP, so that the distributions are consistent at inference time?

  5. After adding GCN-S, proposals are fed to GCN-S according to their IoP to obtain purified results. Do these new proposals need to be re-scored by GCN-D, given that their IoU has changed at that point?

Thanks for answering.

The way to get 256-dimension features

Hello,
I'm wondering whether you obtained the 256-dimensional features by reducing higher-dimensional features with a dimensionality-reduction algorithm, or directly as network output?
If you used a dimensionality-reduction algorithm, what is its name?
Or something else?

Thank You :)

baseline features network

Hi, could you please point to the codebase or network architecture used to compute the face features (i.e. the face recognition baseline model)?

thanks!

Issue on testing with unlabeled data

Hello
How are you?
I am going to cluster unlabeled data with the vegcn module.
How should I make a meta file?
There is no detailed explanation of this part in DATASET.md.
I set all of the labels to -1 in the meta file.
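A minimal sketch of such a placeholder meta file (one label per line, matching the instance count of the feature file; with dummy labels the reported evaluation metrics are meaningless, but clustering itself can run):

```python
num_inst = 26960  # must equal the number of vectors in your features .bin (illustrative)
with open('data/labels/mydata.meta', 'w') as f:
    f.writelines('-1\n' for _ in range(num_inst))  # dummy label per instance
```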

Nodes shared across subgraphs

Hi, the paper mentions that some nodes may be shared by several subgraphs. In the current setting, is the relation between two subgraphs that share nodes strictly containment? Following the algorithm flow in the paper, partially overlapping (crossing) subgraphs cannot occur, right? Thanks.

Large-scale data test GCN-V

@yl-1993, hello. When testing GCN-V on large-scale data I get 'CUDA out of memory', and the problem persists even after setting batch_size_per_gpu = 1. My GPU is an RTX 2080 Ti. How were the parameters set for the 5M-sample test mentioned in the paper? Is multi-GPU testing supported, and if so, where do I change it? Looking forward to your early reply.

Cluster proposal configuration for YouTube Faces

Could you add the configuration used for the results in the paper for the YouTube Faces dataset?

I can reproduce the FastHAC baseline (ave_pre: 0.9964, ave_rec: 0.8731, fscore: 0.9307) with the provided features, but I am unable to reproduce your results. Using the same configuration as provided for MS-Celeb-1M gives rather disappointing results, e.g. the 20 prpsls config using the GCN-D model in the YTBFace download: ave_pre: 0.9442, ave_rec: 0.4964, fscore: 0.6507.

Trying to increase minsz_i0 and maxsz_i0 to 9 and 900, respectively, yields slightly better but still inferior results: ave_pre: 0.9491, ave_rec: 0.8182, fscore: 0.8788

How did you achieve precision 96.75, recall 92.27, f1 score 94.46 in the paper (1904.02749 table 2)?

issue on zero classes prediction

Hello!
First of all, thank you for sharing the code of your project!
I was trying to implement it on a custom dataset and ran into the following error:

[Time] build super vertices consumes 2.7493 s
[warn] idx2lb is empty! skip write idx2lb to ./data/cluster_proposals/part0_train/hnsw_k_30_th_0.6_step_0.05_minsz_3_maxsz_300_iter_0/pred_labels.txt
[Time] dump clustering to ./data/cluster_proposals/part0_train/hnsw_k_30_th_0.6_step_0.05_minsz_3_maxsz_300_iter_0/pred_labels.txt consumes 0.0001 s
saving cluster proposals to ./data/cluster_proposals/part0_train/hnsw_k_30_th_0.6_step_0.05_minsz_3_maxsz_300_iter_0/proposals
0it [00:00, ?it/s]
k=2, th_knn=0.4, th_step=0.05, minsz=3, maxsz=500, sv_minsz=2, sv_maxsz=8, is_rebuild=False
[Time] read proposal list consumes 4.6750 s
Traceback (most recent call last):
  File "dsgcn/main.py", line 104, in <module>
    main()
  File "dsgcn/main.py", line 100, in main
    handler(model, cfg, logger)
  File "/home/username/learn-to-cluster/dsgcn/train_cluster_det.py", line 18, in train_cluster_det
    train_cluster(model, cfg, logger, batch_processor)
  File "/home/username/learn-to-cluster/dsgcn/train.py", line 16, in train_cluster
    dataset = build_dataset(cfg.train_data)
  File "/home/username/learn-to-cluster/dsgcn/datasets/__init__.py", line 13, in build_dataset
    return ClusterDataset(cfg)
  File "/home/username/learn-to-cluster/dsgcn/datasets/cluster_dataset.py", line 59, in __init__
    self._read(feat_path, label_path, proposal_folders)
  File "/home/username/learn-to-cluster/dsgcn/datasets/cluster_dataset.py", line 94, in _read
    proposal_folders = proposal_folders()
  File "/home/username/learn-to-cluster/proposals/generate_proposals.py", line 69, in generate_proposals
    **param_i1)
  File "/home/username/learn-to-cluster/proposals/generate_iter_proposals.py", line 112, in generate_iter_proposals
    raise FileNotFoundError('{} not found.'.format(sv_labels))
FileNotFoundError: ./data/cluster_proposals/part0_train/hnsw_k_30_th_0.6_step_0.05_minsz_3_maxsz_300_iter_0/pred_labels.txt not found.

Does it mean that no clusters were detected? What do you think should be done in this case?

AttributeError: 'ConfigDict' object has no attribute 'model'

Error when running sh scripts/vegcn/test_gcn_e_ms1m.sh:

Traceback (most recent call last):
  File "vegcn/main.py", line 105, in <module>
    main()
  File "vegcn/main.py", line 47, in main
    cfg = Config.fromfile(args.config)
  File "/home/celin/anaconda3/envs/gtw_center/lib/python3.6/site-packages/mmcv/utils/config.py", line 152, in fromfile
    cfg_dict, cfg_text = Config._file2dict(filename)
  File "/home/celin/anaconda3/envs/gtw_center/lib/python3.6/site-packages/mmcv/utils/config.py", line 91, in _file2dict
    mod = import_module('_tempconfig')
  File "/home/celin/anaconda3/envs/gtw_center/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/tmpwsk1pmar/_tempconfig.py", line 21, in <module>
  File "/home/celin/anaconda3/envs/gtw_center/lib/python3.6/site-packages/mmcv/utils/config.py", line 264, in __getattr__
    return getattr(self._cfg_dict, name)
  File "/home/celin/anaconda3/envs/gtw_center/lib/python3.6/site-packages/mmcv/utils/config.py", line 34, in __getattr__
    raise ex
AttributeError: 'ConfigDict' object has no attribute 'model'

Question about generating proposals

Thanks for your wonderful work.
I have a question about the cluster proposal section. In your original paper, you proposed a cluster proposal method, and I think knn is not involved in that section. However, in your code proposals/generate_proposals.py, a knn method with parameters knn_method='faiss' and k=80 is used. I am wondering what this knn method is used for and why you chose k=80.
Thanks for your time.

batch_size can't be greater than 1

Thank you for the paper and the source code!
I have run test_cluster_det.sh on the given part1_test data, but when I set the param batch_size_per_gpu greater than 1, a tensor size error arises.
Besides, I wrote training code based on the given test code (test_cluster_det.py), but the performance is far from the given fscore (0.62 compared to 0.71), and the batch_size error arises there as well. Could you please tell me how "pretrained_gcn_d.pth.tar" was obtained, e.g. the hyper-parameter settings and training strategy?

Does it support incremental data processing?

Thanks for sharing your outstanding work. It has been very helpful and interesting, and I have a question about the code: does it support incremental data processing? What should I do if I want to process data incrementally?

Merging two clusters

Hi @yl-1993 , thanks for sharing the work. Really informative and well structured.

I wanted to know: is there any way to merge two clusters into one cluster in the output? Thanks.
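Not a feature of the repo, but since the output is just one cluster id per sample, merging two predicted clusters is plain relabeling; a sketch assuming pred_labels is an (N,) array of cluster ids:

```python
import numpy as np

def merge_clusters(pred_labels, src, dst):
    """Merge cluster `src` into cluster `dst` by relabeling its members."""
    pred_labels = np.asarray(pred_labels).copy()
    pred_labels[pred_labels == src] = dst
    return pred_labels
```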

make gcn-v to very large scale dataset

Hi, Lei Yang, in your paper you write "Empirically, a 1-layer GCN takes 37G CPU Ram and 92s with 16 CPU on a graph with 5.2M vertices for inference". How do you achieve this?
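For context, 1-layer GCN inference reduces to one sparse-dense multiply over the knn affinity graph, which scipy handles on CPU even for millions of vertices. A sketch of the idea only, not the repo's exact GCN-V layer (which may differ, for instance by concatenating the input features):

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer_cpu(adj, feats, weight):
    """One GCN layer on CPU: relu((D^-1 A) X W) with a sparse adjacency.

    adj:    (N, N) row-normalized scipy CSR knn-affinity matrix
    feats:  (N, D) float32 vertex features
    weight: (D, D') float32 learned layer weights
    """
    agg = adj.dot(feats)                 # sparse x dense; no GPU memory needed
    return np.maximum(agg.dot(weight), 0.0)
```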

Code of GCN-S part

Hello, I saw that you plan to release the training code in late May. Will you also release the code for the GCN-S part at that time? Thank you.

Issue on test gcn E fashion

When I test GCN-E on the fashion dataset, I run into an issue: FileNotFoundError: [Errno 2] No such file or directory: './data/work_dir/cfg_test_gcnv_fashion/pred_confs.npz'.

The issue occurs when I use the model provided by the author. I have followed the instructions: first run "sh scripts/vegcn/test_gcn_v_ms1m.sh", then "sh scripts/vegcn/test_gcn_e_ms1m.sh".

More details:
[2020-04-28 09:18:43,381] Set random seed to 42
[./data/labels/deepfashion_test.meta] #cls: 3991, #inst: 26960
[Time] read meta and feature consumes 0.0498 s
feature shape: (26960, 512), k: 80, norm_feat: True
read knn from ./data/work_dir/cfg_test_gcnv_fashion/deepfashion_test_gcnv_k_5_th_0.0/knns/pretrained_gcn_v_fashion/faiss_k_80.npz
read estimated confidence from ./data/work_dir/cfg_test_gcnv_fashion/pred_confs.npz
[Time] read knn graph consumes 0.4678 s
Traceback (most recent call last):
  File "vegcn/main.py", line 104, in <module>
    main()
  File "vegcn/main.py", line 100, in main
    handler(model, cfg, logger)
  File "/home/by/workspace/DA/learn-to-cluster/vegcn/test_gcn_e.py", line 105, in test_gcn_e
    dataset = build_dataset(cfg.model['type'], cfg.test_data)
  File "/home/by/workspace/DA/learn-to-cluster/vegcn/datasets/__init__.py", line 13, in build_dataset
    return factory[model_type](cfg)
  File "/home/by/workspace/DA/learn-to-cluster/vegcn/datasets/gcn_e_dataset.py", line 77, in __init__
    self.confs = np.load(cfg.pred_confs)['pred_confs']
  File "/home/by/.conda/envs/torch1.0_python3.6/lib/python3.6/site-packages/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: './data/work_dir/cfg_test_gcnv_fashion/pred_confs.npz'

Questions about the calculation of IoU

hi,

I have a question about the IoU.

The GCN-D directly predicts the IoP and IoU for each proposal and does not need other information. It is understandable that the IoP can be predicted from the single proposal alone.

However, when predicting the IoU, the proposal does not contain the full information: we do not know how many nodes lie outside the proposal.

I think clustering is not the same as object detection. For detection, the network can learn prior knowledge about the shape of the object. But for clustering, the number of nodes outside the proposal is uncertain, and there is no explicit relation between the current proposal and the whole node set, including the outside nodes.

So I do not understand why the IoU can be predicted. Could you explain it?

Thank you very much.
