verg-avesta / countr

CounTR: Transformer-based Generalised Visual Counting

Home Page: https://verg-avesta.github.io/CounTR_Webpage/

License: MIT License

Python 99.12% Shell 0.88%

generalised-visual-counting vision-transformer

countr's Issues

[Reproduce] Cannot Reproduce the results.

Hello. Thank you for the great work.

However, under the given instructions with the pretrained weight, I could not reproduce the reported result on FSC-147.

In fact, because of the error regarding 'torch._six container_abcs', I could not run the evaluation code at all with torch==1.10.
(The same error occurs on a 3090Ti server.)

When I downgraded torch to 1.8, which runs without that error, the performance on the test set was "MAE : 14.98, MSE : 106.51".

Is there any way to fix this issue and reproduce the reported results?
Or, if possible, could you provide a Docker image?
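On the 'torch._six container_abcs' error: newer PyTorch releases no longer expose container_abcs from torch._six, and the failing import usually sits in the layer helpers of timm 0.3.2. A commonly used workaround is to import the standard-library module instead. The sketch below reproduces the shape of that helper without depending on torch; the names `_ntuple` and `to_2tuple` mirror timm's, but the exact file and line to patch in your installed copy is an assumption you should verify.

```python
# Workaround sketch: take container_abcs from the standard library
# instead of torch._six (assumed location: timm's layer helpers).
import collections.abc as container_abcs
from itertools import repeat


def _ntuple(n):
    """Return a parser that expands a scalar into an n-tuple."""
    def parse(x):
        # Iterables such as (16, 8) pass through unchanged.
        if isinstance(x, container_abcs.Iterable):
            return x
        return tuple(repeat(x, n))
    return parse


to_2tuple = _ntuple(2)

print(to_2tuple(16))
print(to_2tuple((16, 8)))
```

This only addresses the import error; it does not by itself explain the gap between the measured and reported metrics.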

Question about the pretrained weight

Hello, may I ask whether the pre-trained weights you provide are the MAE pre-training weights on FSC-147/CARPK or the weights after fine-tuning?

Counting Multiple object class

If my objects have different shapes, sizes and colors, do I need an exemplar bounding box for each type of object in order to count them?
Say I have 40 objects in total: 8 different types of object with 5 instances each.
Do I need at least 8 exemplar boxes, or can it be fewer than the number of object types?
@WeidiXie @Verg-Avesta

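About fine-tuning on the CARPK dataset

In the "Finetune on CARPK" part, I found that training is extremely slow, about 2 epochs per hour. At this rate it would take dozens of days to train on a 3090, which is clearly not normal. At the same time, RAM usage is very high while GPU memory usage is very low.
I tried changing the value of num_workers and the batch size, but this raises errors because of input shape problems. I don't know whether this is because deeplake is used.
Perhaps I should train on a local copy of CARPK? If I train locally, how should I modify FSC_finetune_CARPK.py? Perhaps you could give me some advice. Many thanks!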

Demo with ONNX

Hello,
Great work. This isn't exactly an issue. I have converted the FSC147.pth model to ONNX format and created a library that makes it easier to use. Would you be able to add my repo to the readme?

https://github.com/tamnguyenvan/vision-counter

Thanks,

Without Pretraining

I am just curious how the model performs without any pretraining. I only want to train the model on FSC147 data. I commented out the following line in FSC_finetune_cross.py.
misc.load_model_FSC(args=args, model_without_ddp=model_without_ddp)

Am I missing anything?

Since I'm not fine-tuning, only training on FSC147 data, do you have any idea how many epochs I should train for to get an optimal result?

Thanks!

Dataset images

Which dataset images and model should we use to run demo.py?
I was trying it out with FSC147, but the dataset I have contains density map images, not the raw jpg images.
I am unable to find the annotation file 'annotation_FSC147_384.json'.
Why do we need bounding boxes here? Could you help me understand?

TypeError: 'type' object is not subscriptable

I'm getting the following error. Has anyone else seen this before? Thanks!

Traceback (most recent call last):
  File "FSC_test_cross_few-shot.py", line 24, in <module>
    import util.misc as misc
  File "/home/jibanul/research/counting/CounTR/util/misc.py", line 407, in <module>
    def plot_counts(res_csv: Union[str, list[str]], output_dir: str, suffix: str = "", smooth: bool = False):
TypeError: 'type' object is not subscriptable
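This is the error Python 3.8 and earlier raise when a built-in type is subscripted in an annotation: `list[str]` only works from Python 3.9 onward. A minimal sketch of a backward-compatible signature uses `typing.List` instead. The function body below is a hypothetical placeholder so the signature can be exercised; it is not the repository's plotting code.

```python
from typing import List, Union


def plot_counts(res_csv: Union[str, List[str]], output_dir: str,
                suffix: str = "", smooth: bool = False) -> List[str]:
    # Placeholder body (assumption): only normalise the argument to a list.
    # The real function in util/misc.py draws plots.
    return [res_csv] if isinstance(res_csv, str) else list(res_csv)


print(plot_counts("run1.csv", "out"))
```

Alternatively, adding `from __future__ import annotations` at the top of the module defers evaluation of annotations, or the script can simply be run with Python 3.9 or newer.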

External exemplars

Hi @Verg-Avesta,
is it possible to run inference with exemplars that are not part of the input image?
That is, I would like to select k exemplars from the first of n images composing my test dataset, and then use the same k crops from the first image for all the n queries, given that the objects are of the same class across the whole dataset.

Is this possibility already available and implemented somewhere?
Thank you in advance.

Zero count with this image

Is there any reason why the model would return a zero count for this image? I marked some boxes on each row of the shelf and passed them as exemplars.
@Verg-Avesta
https://drive.google.com/file/d/1LHE8nzhVNk_e7gteL9SvBxG8TBQYcgt3/view?usp=share_link

However, if I pass the same image one shelf at a time, I get counts in line with the model's usual performance.

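About the pretrained weights

Hello, and thank you for your outstanding work!
I am trying to reproduce your code, and I found that the fine-tuned weights file for FSC147 that you provide, FSC147.pth, is 1.1G.

I trained from scratch; the weights file I obtained in the "Pretrain on FSC147" stage is 1.3G, which matches the size of the pretrain.pth file you provide (1.3G).

However, in the "Finetune on FSC147" stage:
with the mae_vit_base_patch16 option, the resulting pth file is 481.9MB;
with the mae_vit_base6_patch16 = mae_vit_base_patch16_fim6 option, the resulting weights file is 674.5MB.
Both differ too much from the 1.1G of the FSC147.pth you shared.
How can I obtain a weights file of the correct size?
Could you clear up my confusion?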

[Reproduce] Cannot reproduce the results with base MAE

Hi @Verg-Avesta, I tried to reproduce your pre-training + fine-tuning process, but my results are still different if I use the base MAE model mae_vit_base_patch16, even using the pretrained weights mentioned in issue #6, and even after the fixes suggested in issue #23: I get MAE 13.95 and RMSE 90.25.
On the other hand, if I use the large MAE model mae_vit_large_patch16 I obtain MAE 12.58 and RMSE 87.25, which are closer to the results discussed in the aforementioned issue (MAE: 12.44, RMSE: 89.86), but this isn't mentioned anywhere, as far as I know.

What makes me think this may be the reason for the difference, besides the fact that the other parameters seem to be the same as those indicated in the paper or in the readme/issues, is that the fine-tuned weights you uploaded to Drive (FSC147.pth) are 1.2GB, while my fine-tuned model is ~500MB, as already noted in issue #7, as far as I can understand using Google Translate.

Other combinations may work as well, e.g. base MAE for pre-training and large MAE for fine-tuning, but I haven't tried them yet.

Here are the parameters I used, in case I missed something

for pre-training:

{
    "lr": {
        "desc": null,
        "value": 0.000005
    },
    "blr": {
        "desc": null,
        "value": 0.001
    },
    "seed": {
        "desc": null,
        "value": 0
    },
    "team": {
        "desc": null,
        "value": "wsense"
    },
    "model": {
        "desc": null,
        "value": "mae_vit_base_patch16"
    },
    "title": {
        "desc": null,
        "value": "CounTR_pretraining_paper"
    },
    "wandb": {
        "desc": null,
        "value": "counting"
    },
    "_wandb": {
        "desc": null,
        "value": {
            "t": {
                "1": [
                    1,
                    41,
                    55,
                    63
                ],
                "2": [
                    1,
                    41,
                    55,
                    63
                ],
                "3": [
                    2,
                    13,
                    15,
                    16,
                    23
                ],
                "4": "3.9.15",
                "5": "0.13.9",
                "8": [
                    5
                ]
            },
            "framework": "torch",
            "start_time": 1674927257.097797,
            "cli_version": "0.13.9",
            "is_jupyter_run": false,
            "python_version": "3.9.15",
            "is_kaggle_kernel": false
        }
    },
    "device": {
        "desc": null,
        "value": "cuda"
    },
    "epochs": {
        "desc": null,
        "value": 300
    },
    "gt_dir": {
        "desc": null,
        "value": "gt_density_map_adaptive_384_VarV2"
    },
    "im_dir": {
        "desc": null,
        "value": "images_384_VarV2"
    },
    "min_lr": {
        "desc": null,
        "value": 0
    },
    "resume": {
        "desc": null,
        "value": "./weights/mae_pretrain_vit_base_full.pth"
    },
    "log_dir": {
        "desc": null,
        "value": "None"
    },
    "pin_mem": {
        "desc": null,
        "value": true
    },
    "dist_url": {
        "desc": null,
        "value": "env://"
    },
    "wandb_id": {
        "desc": null,
        "value": null
    },
    "anno_file": {
        "desc": null,
        "value": "annotation_FSC147_384.json"
    },
    "data_path": {
        "desc": null,
        "value": "./data/FSC147/"
    },
    "accum_iter": {
        "desc": null,
        "value": 1
    },
    "batch_size": {
        "desc": null,
        "value": 16
    },
    "local_rank": {
        "desc": null,
        "value": -1
    },
    "mask_ratio": {
        "desc": null,
        "value": 0.5
    },
    "output_dir": {
        "desc": null,
        "value": "./data/out/pretrain"
    },
    "world_size": {
        "desc": null,
        "value": 1
    },
    "dist_on_itp": {
        "desc": null,
        "value": false
    },
    "distributed": {
        "desc": null,
        "value": false
    },
    "num_workers": {
        "desc": null,
        "value": 10
    },
    "start_epoch": {
        "desc": null,
        "value": 0
    },
    "weight_decay": {
        "desc": null,
        "value": 0.05
    },
    "norm_pix_loss": {
        "desc": null,
        "value": false
    },
    "warmup_epochs": {
        "desc": null,
        "value": 10
    },
    "data_split_file": {
        "desc": null,
        "value": "Train_Test_Val_FSC_147.json"
    }
}

and fine-tuning:

{
    "lr": {
        "desc": null,
        "value": 0.00001
    },
    "blr": {
        "desc": null,
        "value": 0.001
    },
    "seed": {
        "desc": null,
        "value": 0
    },
    "team": {
        "desc": null,
        "value": "wsense"
    },
    "model": {
        "desc": null,
        "value": "mae_vit_base_patch16"
    },
    "title": {
        "desc": null,
        "value": "CounTR_finetuning_paper"
    },
    "wandb": {
        "desc": null,
        "value": "counting"
    },
    "_wandb": {
        "desc": null,
        "value": {
            "t": {
                "1": [
                    1,
                    41,
                    55,
                    63
                ],
                "2": [
                    1,
                    41,
                    55,
                    63
                ],
                "3": [
                    2,
                    13,
                    15,
                    16,
                    23
                ],
                "4": "3.9.15",
                "5": "0.13.9",
                "8": [
                    5
                ]
            },
            "framework": "torch",
            "start_time": 1674944766.966494,
            "cli_version": "0.13.9",
            "is_jupyter_run": false,
            "python_version": "3.9.15",
            "is_kaggle_kernel": false
        }
    },
    "device": {
        "desc": null,
        "value": "cuda"
    },
    "epochs": {
        "desc": null,
        "value": 1000
    },
    "gt_dir": {
        "desc": null,
        "value": "gt_density_map_adaptive_384_VarV2"
    },
    "im_dir": {
        "desc": null,
        "value": "images_384_VarV2"
    },
    "min_lr": {
        "desc": null,
        "value": 0
    },
    "resume": {
        "desc": null,
        "value": "./data/out/pretrain/checkpoint__pretraining_299.pth"
    },
    "log_dir": {
        "desc": null,
        "value": "None"
    },
    "pin_mem": {
        "desc": null,
        "value": true
    },
    "dist_url": {
        "desc": null,
        "value": "env://"
    },
    "wandb_id": {
        "desc": null,
        "value": null
    },
    "anno_file": {
        "desc": null,
        "value": "annotation_FSC147_384.json"
    },
    "data_path": {
        "desc": null,
        "value": "./data/FSC147/"
    },
    "accum_iter": {
        "desc": null,
        "value": 1
    },
    "batch_size": {
        "desc": null,
        "value": 8
    },
    "class_file": {
        "desc": null,
        "value": "ImageClasses_FSC147.txt"
    },
    "local_rank": {
        "desc": null,
        "value": -1
    },
    "mask_ratio": {
        "desc": null,
        "value": 0.5
    },
    "output_dir": {
        "desc": null,
        "value": "./data/out/finetune"
    },
    "world_size": {
        "desc": null,
        "value": 1
    },
    "dist_on_itp": {
        "desc": null,
        "value": false
    },
    "distributed": {
        "desc": null,
        "value": false
    },
    "num_workers": {
        "desc": null,
        "value": 10
    },
    "start_epoch": {
        "desc": null,
        "value": 0
    },
    "weight_decay": {
        "desc": null,
        "value": 0.05
    },
    "norm_pix_loss": {
        "desc": null,
        "value": false
    },
    "warmup_epochs": {
        "desc": null,
        "value": 10
    },
    "data_split_file": {
        "desc": null,
        "value": "Train_Test_Val_FSC_147.json"
    }
}

Does this sound reasonable? Maybe you ran the fine-tuning with the large MAE?
Thanks in advance
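Two long configuration dumps like the ones above are easier to compare once they are flattened and diffed. The sketch below assumes the wandb export layout shown in this issue, where each key maps to an object with "desc" and "value". The helper names are hypothetical, and the two inline JSON strings are abbreviated stand-ins for the full dumps.

```python
import json


def flatten_wandb_config(cfg):
    """Map {"key": {"desc": ..., "value": v}} to {"key": v}, skipping _wandb."""
    return {k: v["value"] for k, v in cfg.items() if k != "_wandb"}


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Abbreviated stand-ins for the pre-training and fine-tuning dumps.
pretrain = json.loads(
    '{"lr": {"desc": null, "value": 0.000005},'
    ' "batch_size": {"desc": null, "value": 16},'
    ' "model": {"desc": null, "value": "mae_vit_base_patch16"}}'
)
finetune = json.loads(
    '{"lr": {"desc": null, "value": 0.00001},'
    ' "batch_size": {"desc": null, "value": 8},'
    ' "model": {"desc": null, "value": "mae_vit_base_patch16"}}'
)

changed = diff_configs(flatten_wandb_config(pretrain),
                       flatten_wandb_config(finetune))
print(changed)
```

Run on the full dumps, the same diff would also surface keys present in only one stage, such as class_file, with None on the other side.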

[Reproduce] Cannot Reproduce the results.

Hello. Thank you for the great work.

However, under the given instructions with the pretrained weight, I could not reproduce the reported result on FSC-147.

I followed your instructions and my test results are "Current MAE: 23.07, RMSE: 104.73".

My results are far from the results in your paper. Are there any hyper-parameters that differ from yours?

What is the actual difference between the zero-shot and the few-shot test scripts?

Just that: what is the exact difference? I know zero-shot takes longer than few-shot, but I don't know exactly what the purpose of each one is.

Understanding PreTrain

@Verg-Avesta
Could you help me understand the different pretraining and fine-tuning versions?

Pretrain on FSC147	models_mae_noct.py	FSC_pretrain.py
Finetune on FSC147	models_mae_cross.py	FSC_finetune_cross.py
Finetune on CARPK	models_mae_cross.py	FSC_finetune_CARPK.py

What is the difference between these files: Pretrain, Finetune cross and Finetune CARPK?

List index out of range while resizing

Getting this resize issue while finetuning with FSC_finetune_cross.py on my custom dataset. Can you please help me with this?

How to test the trained model on new data without box information?

I'm very interested in your work, and I would like to know how to test the trained model on new data without box information.

Besides, I tried to run demo.py but it failed; it seems some important files are missing.

shape mismatch error

I run the demo with my test data.

cy@cy-MS-7D40:~/export/CounTR$ python3 demo.py 
Resume checkpoint ./ckpt/FSC147.pth
/home/cy/.local/lib/python3.10/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
  warnings.warn(
Traceback (most recent call last):
  File "/home/cy/export/CounTR/demo.py", line 212, in <module>
    result, elapsed_time = run_one_image(samples, boxes, pos, model)
  File "/home/cy/export/CounTR/demo.py", line 143, in run_one_image
    output, = model(samples[:, :, :, start:start + 384], boxes, 3)
  File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cy/export/CounTR/models_mae_cross.py", line 205, in forward
    latent = self.forward_encoder(imgs)
  File "/home/cy/export/CounTR/models_mae_cross.py", line 138, in forward_encoder
    x = self.patch_embed(x)
  File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cy/.local/lib/python3.10/site-packages/timm/models/layers/patch_embed.py", line 34, in forward
    x = self.proj(x).flatten(2).transpose(1, 2)
  File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 4, 384, 384] to have 3 channels, but got 4 channels instead
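The last line of the traceback says the patch embedding expects 3 input channels but received 4, which usually points to an image with an alpha channel (for example an RGBA PNG). A minimal sketch of one fix, assuming the image is already an unbatched channel-first array, is to keep only the first three channels before it reaches the model. `drop_alpha` is a hypothetical helper, not part of demo.py; it relies only on `len()` and slicing, so it should behave the same way on a nested list as on a [C, H, W] tensor.

```python
def drop_alpha(chw):
    """Keep the first three channels of an unbatched channel-first image."""
    if len(chw) == 4:
        # RGBA input: discard the alpha plane.
        return chw[:3]
    if len(chw) != 3:
        raise ValueError(f"expected 3 or 4 channels, got {len(chw)}")
    return chw


# Tiny 4-channel, 1x2 "image" built from nested lists.
rgba = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]], [[1.0, 1.0]]]
rgb = drop_alpha(rgba)
print(len(rgb))
```

If the image is loaded with Pillow, calling `.convert("RGB")` on the opened image before any transforms has a similar effect.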

Could you please upload the pre-trained weights?

    Hello. It’s the fine-tuned model weights on FSC147 and CARPK. I have renamed it to avoid ambiguity.

Originally posted by @Verg-Avesta in #3 (comment)

Could you please provide the MAE(MIM) pre-trained weight on FSC-147/CARPK to reproduce the result? Please!

License?

Could you please create a license file? Preferably MIT or BSD.

The test result from uploaded FSC147.pth does not match the result of the paper

The test result from uploaded FSC147.pth does not match the result of the paper.

The inference result on the FSC147 test data is MAE 15.71, RMSE 104.99. I used the FSC147 fine-tuned weights that you uploaded and linked in the documentation.

However, the zero shot result is similar to the result of the paper.

Few shot : 15.71 / 104.99
Zero shot: 14.70 / 106.87

I ran the evaluation code with torch 1.10 and tried both timm 0.3.2 and timm 0.4.5.

Is there any way to fix this issue?
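When comparing numbers like these it helps to be explicit about how the two metrics are computed. The sketch below assumes the usual definitions over per-image counts (mean absolute error and root mean squared error); `mae_rmse` is a hypothetical helper and the counts are made up for illustration, not taken from FSC147.

```python
import math


def mae_rmse(pred_counts, gt_counts):
    """Return (MAE, RMSE) over paired per-image counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty lists of equal length")
    errors = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Illustrative counts only.
mae, rmse = mae_rmse([10, 20], [12, 16])
print(mae, rmse)
```

Because RMSE squares each error, a handful of images with very large miscounts can push it far above MAE, so the two metrics can move differently between runs.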

Input image size doesn't match model

Hi @Verg-Avesta,
while running FSC_test_cross(zero-shot).py with images of size (690, 1280), I got the error in the title:

Traceback (most recent call last):
  File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 420, in <module>
    main(args)
  File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 318, in main
    output, = model(samples[:, :, :, start:start + 384], boxes, 0)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gficarra/Code/countr/models_mae_cross.py", line 198, in forward
    latent = self.forward_encoder(imgs)
  File "/home/gficarra/Code/countr/models_mae_cross.py", line 133, in forward_encoder
    x = self.patch_embed(x)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/timm/models/layers/patch_embed.py", line 32, in forward
    assert H == self.img_size[0] and W == self.img_size[1], \
AssertionError: Input image size (960*384) doesn't match model (384*384).

If I'm not mistaken, the sliding window moves horizontally along the width of the image, while the height is fixed to 384.
So, I changed lines 128 and 129 from

new_H = 16*int(H/16)
new_W = 16*int(W/16)

to

new_H = 384
new_W = 16 * int((W / H * 384) / 16)

like in lines 28 and 29 in demo.py.

Maybe you could check whether this makes sense, and whether it would be useful to apply this change.
Thank you.
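The effect of the change described above can be checked with the image size from this issue. The sketch wraps the two variants in hypothetical helper functions (the names are not from the repository) and evaluates them for H = 690, W = 1280.

```python
def original_size(h, w):
    """Round height and width down to multiples of 16 (original lines)."""
    return 16 * int(h / 16), 16 * int(w / 16)


def fixed_height_size(h, w, target_h=384, patch=16):
    """Fix the height to 384 and scale the width, rounded down to a multiple of 16."""
    return target_h, patch * int((w / h * target_h) / patch)


print(original_size(690, 1280))      # (688, 1280)
print(fixed_height_size(690, 1280))  # (384, 704)
```

With the original lines the height stays at 688, so a 384-pixel-wide window is taller than the 384 x 384 input the patch embedding accepts; with the proposed lines every window is 384 pixels high.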

Count in Case of Multiple categories

@Verg-Avesta so far your model is working well on our images.
I have this question:

Suppose there are multiple categories of objects present in the counting area, say Cat A: 4, Cat B: 5, Cat C: 10.
What count will the model return? Can it provide a category-wise count as shown above, or only the total count, i.e. 19 in this case?
For better counting results, do I have to give at least 1 bounding box from each category as assistance?

How to count detected objects in image

I am running demo.py and I want to print the number of objects detected in the image.

[Reproduce] Cannot reproduce the results of the pretrain stage.

Hello. Thank you for your great work.

I used the pre-trained model you provided and tried the second stage parameters provided in the readme.md and got similar results to those in the paper.

However, when I used the first stage parameters provided in the document, started training on the first stage using FSC-147, and then trained on the second stage with the parameters provided in the document, I ended up with the following results.
MAE: 23.76, RMSE: 105.93

I think I have a problem with the first stage training. Is the first stage pre-training model in the paper obtained by using the parameters in the readme.md, and also is there anything else that needs to be modified?

Package version conflict

Your requirements and readme list timm version 0.3.2, but FSC_pretrain.py has an assertion forcing timm to be a version between 0.4.5 and 0.4.9.
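A range check like the one described is easy to get subtly wrong if version strings are compared as text. The sketch below is a hypothetical stand-alone check, not the assertion from FSC_pretrain.py: it compares the numeric components, with the 0.4.5 and 0.4.9 bounds taken from this issue.

```python
import re


def parse_version(v):
    """Turn '0.4.5' into (0, 4, 5) using the first three numeric fields."""
    return tuple(int(m) for m in re.findall(r"\d+", v)[:3])


def timm_version_ok(v, low="0.4.5", high="0.4.9"):
    """Check low <= v <= high on numeric components, not on raw strings."""
    return parse_version(low) <= parse_version(v) <= parse_version(high)


print(timm_version_ok("0.4.5"))
print(timm_version_ok("0.3.2"))
```

Comparing tuples avoids the string-ordering pitfall where "0.4.12" sorts before "0.4.9".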

Other Architecture

Hi @Verg-Avesta

Which other architectures from the list below can we use with demo.py?
Do we have pretrained weights for all the architectures in the list?
Did you try ViT small, which uses a patch size of 32?


# set recommended archs
mae_vit_base_patch16 = mae_vit_base_patch16_dec512d8b  
mae_vit_base4_patch16 = mae_vit_base_patch16_fim4 # decoder: 4 blocks
mae_vit_base6_patch16 = mae_vit_base_patch16_fim6 # decoder: 6 blocks
mae_vit_large_patch16 = mae_vit_large_patch16_dec512d8b  
mae_vit_huge_patch14 = mae_vit_huge_patch14_dec512d8b

Loss scale factor

Dear @Verg-Avesta,
in your paper, you mention that "We scale the loss by a factor of 60". In fact, during training, the density map is first multiplied by 60 in the data preprocessing (https://github.com/Verg-Avesta/CounTR/blob/main/util/FSC147.py#L265), and then divided by 60 when computing MAE (https://github.com/Verg-Avesta/CounTR/blob/main/FSC_finetune_cross.py#L299).

Could you explain how you found this number? How should it be adapted when finetuning on another dataset?

Thank you very much.
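The round trip described in this issue can be illustrated on a toy density map: the target is multiplied by 60 before training, so the sum of a prediction has to be divided by 60 to get a count back. This is a sketch of that arithmetic only. The helper names are hypothetical and the 3 x 3 map is made up; the factor 60 is the one quoted from the paper.

```python
SCALE = 60  # factor quoted in the issue


def scale_density(density, scale=SCALE):
    """Multiply every cell of a 2D density map by the scale factor."""
    return [[v * scale for v in row] for row in density]


def count_from_prediction(pred, scale=SCALE):
    """Recover an object count from a scaled density map."""
    return sum(sum(row) for row in pred) / scale


gt = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # three dot annotations
target = scale_density(gt)
print(count_from_prediction(target))  # 3.0
```

Whatever factor is chosen for another dataset, the division at evaluation time has to use the same value, otherwise every count is off by a constant ratio.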

About hyper parameters

Are the batch size and learning rate used in the fine-tuning stage the same as stated in the paper? The script file sets batch size 26 and lr=2e-4, whereas the paper states batch size 8 and lr=1e-5.
These are my fine-tuning results with the provided FSC147.pth:
Batch 26, lr=2e-4 : 12.79 mae/ 86.49 rmse
Batch 8, lr=1e-5 : 13.77 mae/ 87.69 rmse

Cannot reproduce results on CARPK

I’ve tried to reproduce the results of fine-tuning on CARPK, but the training seems to make the results worse. I trained for 1000 epochs and got an MAE of 14.9 and an RMSE of 20.21. I fine-tuned the model from the FSC147.pth checkpoint; before fine-tuning I get an MAE of 10.12 and an RMSE of 12.48. I have also unfrozen the encoder (in models_mae_cross.py). Could you give more information on how you obtained your results? Thank you

Density Maps

How do we generate the density maps for training on custom data?
Are they needed?
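Another issue on this page notes that the ground-truth maps mark dots rather than Gaussian blobs. Assuming that, and assuming point annotations are (x, y) pixel coordinates, a dot-style density map for custom data could be sketched as below. `dot_density_map` is a hypothetical helper, not the repository's preprocessing, and any rescaling applied during training is left out.

```python
def dot_density_map(points, height, width):
    """Return a height x width grid with 1.0 added at each (x, y) point."""
    density = [[0.0] * width for _ in range(height)]
    for x, y in points:
        # Clamp to the image bounds so stray annotations are not lost.
        col = min(max(int(x), 0), width - 1)
        row = min(max(int(y), 0), height - 1)
        density[row][col] += 1.0
    return density


density = dot_density_map([(1, 0), (2, 2), (2, 2)], height=3, width=4)
print(sum(sum(row) for row in density))
```

The sum of the map equals the number of annotated points, which is the property a counting loss relies on.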

shape mismatch while running demo.py

Traceback (most recent call last):
  File "demo.py", line 198, in <module>
    result = run_one_image(samples, boxes, pos, model)
  File "demo.py", line 126, in run_one_image
    with torch.no_grad():
  File "/home/pdguest/environments/mmdet_new/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/pdguest/natesh/experiment/CounTR/models_mae_cross.py", line 199, in forward
    pred = self.forward_decoder(latent, boxes, shot_num) # [N, 384, 384]
  File "/home/pdguest/natesh/experiment/CounTR/models_mae_cross.py", line 169, in forward_decoder
    y = torch.cat(y1,dim=0).reshape(shot_num,N,C).to(x.device)
RuntimeError: shape '[3, 1, 512]' is invalid for input of size 1024

I've changed the image path and the box coordinates accordingly. I'm not sure what exactly is causing the error; could you please look into this? Thanks.
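The numbers in the error are consistent with a mismatch between the exemplar count and the shot number: 1024 = 2 x 512, so the decoder appears to hold features for two exemplars while being asked to reshape them for shot_num = 3. One plausible cause (an assumption, since the box list used is not shown) is that only two boxes were supplied. A hypothetical guard that fails earlier with a clearer message:

```python
def check_exemplars(boxes, shot_num):
    """Raise early if the number of exemplar boxes differs from shot_num."""
    if len(boxes) != shot_num:
        raise ValueError(
            f"got {len(boxes)} exemplar boxes but shot_num={shot_num}; "
            "pass shot_num=len(boxes) or supply the missing boxes"
        )
    return shot_num


three_boxes = [[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50]]
print(check_exemplars(three_boxes, 3))
```

Calling such a check before the model forward pass would turn the reshape error into a message that names both numbers.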

Training epochs in fine-tuning stage

Hi,
How many epochs did you train for in the fine-tuning stage? The paper only mentions 300 epochs for the pre-training stage.

[Training] how much time it takes to train from scratch

Great work! I would like to ask how long it takes to train your model from scratch on a single 3090. Thank you very much.

Results of zero-shot counting

In your paper, Table.1 presents the results of zero-shot counting:

However, this is different from Table.2:

So which results are correct?

question about density map

Hello. Thank you for the great work.
I got curious: why did you choose to mark dots on the ground truth density map instead of using the original density map with a Gaussian distribution from the FSC147 dataset? Just wondering about your thoughts on this. Thanks