CounTR: Transformer-based Generalised Visual Counting
Home Page: https://verg-avesta.github.io/CounTR_Webpage/
License: MIT License
Hello. Thank you for the great work.
However, under the given instructions with the pretrained weight, I could not reproduce the reported result on FSC-147.
In fact, due to an error involving 'torch._six container_abcs', I could not run the evaluation code with torch==1.10.
(The same error occurs on a 3090Ti server.)
When I then downgraded torch to 1.8, which is stable, the performance on the test set became "MAE: 14.98, MSE: 106.51".
Is there any way to fix this issue and reproduce the reported results?
Or, if possible, could you provide a Docker image?
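The 'torch._six container_abcs' error above typically appears when an older timm release imports `container_abcs` from a private PyTorch module that newer PyTorch versions no longer provide. A minimal sketch of the usual workaround follows, assuming the failing import is the one in timm's layer helpers (this thread does not confirm the exact file). It falls back to the standard library when the private alias is missing.

```python
import collections.abc
from itertools import repeat

try:
    # Older PyTorch releases exposed this private alias.
    from torch._six import container_abcs
except ImportError:
    # Newer PyTorch (or no PyTorch installed): use the standard library.
    container_abcs = collections.abc


def _ntuple(n):
    """Return a function that expands a scalar into an n-tuple."""
    def parse(x):
        # Iterables (e.g. an explicit (h, w) pair) are passed through unchanged.
        if isinstance(x, container_abcs.Iterable):
            return x
        return tuple(repeat(x, n))
    return parse


to_2tuple = _ntuple(2)  # e.g. a patch size of 16 becomes (16, 16)
```

Applying the same try/except to the import inside the installed package avoids pinning an old PyTorch version, although it does not by itself explain the gap in MAE reported above.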
Hello, may I ask whether the pre-trained weights you provide are the MAE pre-training weights on FSC-147/CARPK or the weights after fine-tuning?
If my objects have different shapes, sizes, and colors, do I need an exemplar bounding box for each type of object in order to count them?
Say I have 40 objects in total: 8 different types with 5 instances each.
Do I need at least 8 exemplar boxes, or can I provide fewer boxes than object types?
@WeidiXie @Verg-Avesta
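In the Finetune on CARPK stage, I found that training is extremely slow, about 2 epochs per hour. At this rate, training on a 3090 would take dozens of days, which is clearly abnormal. At the same time, RAM usage is very high while GPU memory usage is very low.
I tried changing num_workers and the batch size, but this raises errors related to the input shape. I am not sure whether this is caused by the use of deeplake.
Perhaps training should be done on a local copy of CARPK? If so, how should I modify FSC_finetune_CARPK.py? Any suggestions would be greatly appreciated!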
Hello,
Great work. This isn't exactly an issue. I have converted the FSC147.pth model to ONNX format and created a library that makes it easier to use. Would you be willing to add my repo to the readme?
https://github.com/tamnguyenvan/vision-counter
Thanks,
FSC_finetune_cross.py
.misc.load_model_FSC(args=args, model_without_ddp=model_without_ddp)
Am I missing anything?
Thanks!
Which dataset images and model should we use to run demo.py?
I tried FSC147, but the dataset I have contains density map images rather than the raw JPG images.
I am unable to find the annotation file 'annotation_FSC147_384.json'.
Why do we need bounding boxes here? Could you help me understand?
I'm getting the following error. Has anyone else seen this before? Thanks!
Traceback (most recent call last):
File "FSC_test_cross_few-shot.py", line 24, in <module>
import util.misc as misc
File "/home/jibanul/research/counting/CounTR/util/misc.py", line 407, in <module>
def plot_counts(res_csv: Union[str, list[str]], output_dir: str, suffix: str = "", smooth: bool = False):
TypeError: 'type' object is not subscriptable
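The `TypeError` above is what Python versions before 3.9 raise when a built-in type such as `list` is subscripted in an annotation. A minimal sketch of a portable alternative uses `typing.List` instead. The helper below is hypothetical and only illustrates the annotation style; it is not the repository's `plot_counts`.

```python
from typing import List, Union


def normalize_csv_paths(res_csv: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one path or several, always return a list."""
    if isinstance(res_csv, str):
        return [res_csv]
    return list(res_csv)
```

Changing `list[str]` to `List[str]` in the signature shown in the traceback, or running on Python 3.9 or later, should avoid the error.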
Hi @Verg-Avesta,
is it possible to make inference providing exemplars which are not part of the input image?
That is, I would like to select k exemplars from the first of n images composing my test dataset, and then use the same k crops from the first image for all the n queries, given that the objects are of the same class across the whole dataset.
Is this possibility already available and implemented somewhere?
Thank you in advance.
Is there any reason why the model would return a count of zero for this image? I marked some boxes in each row of the shelf and passed them as exemplars.
@Verg-Avesta
https://drive.google.com/file/d/1LHE8nzhVNk_e7gteL9SvBxG8TBQYcgt3/view?usp=share_link
When I pass the same image one shelf at a time, I get counts in line with the model's usual performance.
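Hello, and thank you for your outstanding work!
I am trying to reproduce your code. I noticed that the fine-tuned weights file you provide for FSC147, FSC147.pth, is 1.1G.
I trained from scratch, and the weights file I obtained in the Pretrain on FSC147 stage is 1.3G, which matches the size of the pretrain.pth file you provide (1.3G).
However, in the Finetune on FSC147 stage:
With the mae_vit_base_patch16 option, the resulting pth file is 481.9MB.
With the mae_vit_base6_patch16 = mae_vit_base_patch16_fim6 option, the resulting weights file is 674.5MB.
Both differ greatly from the 1.1G FSC147.pth that you shared.
How can I obtain a weights file of the correct size?
Could you help clear up my confusion?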
Hi @Verg-Avesta, I tried to reproduce your pre-training + fine-tuning process, but my results still differ when I use the base MAE model mae_vit_base_patch16, even with the pretrained weights mentioned in issue #6 and after the fixes suggested in issue #23: I get MAE 13.95 and RMSE 90.25.
On the other hand, with the large MAE model mae_vit_large_patch16 I obtain MAE 12.58 and RMSE 87.25, which is closer to the results discussed in the aforementioned issue (MAE: 12.44, RMSE: 89.86), but as far as I know this isn't mentioned anywhere.
What makes me think this may be the reason for the difference, besides the fact that the other parameters seem to match those given in the paper or in the readme/issues, is that the fine-tuned weights you uploaded to Drive (FSC147.pth) are 1.2GB, while my fine-tuned model is ~500MB, as already noted in issue #7 (as far as I can tell using Google Translate).
Other combinations may work as well, e.g. base MAE for pre-training and large MAE for fine-tuning, but I haven't tried them yet.
for pre-training:
{
"lr": {
"desc": null,
"value": 0.000005
},
"blr": {
"desc": null,
"value": 0.001
},
"seed": {
"desc": null,
"value": 0
},
"team": {
"desc": null,
"value": "wsense"
},
"model": {
"desc": null,
"value": "mae_vit_base_patch16"
},
"title": {
"desc": null,
"value": "CounTR_pretraining_paper"
},
"wandb": {
"desc": null,
"value": "counting"
},
"_wandb": {
"desc": null,
"value": {
"t": {
"1": [
1,
41,
55,
63
],
"2": [
1,
41,
55,
63
],
"3": [
2,
13,
15,
16,
23
],
"4": "3.9.15",
"5": "0.13.9",
"8": [
5
]
},
"framework": "torch",
"start_time": 1674927257.097797,
"cli_version": "0.13.9",
"is_jupyter_run": false,
"python_version": "3.9.15",
"is_kaggle_kernel": false
}
},
"device": {
"desc": null,
"value": "cuda"
},
"epochs": {
"desc": null,
"value": 300
},
"gt_dir": {
"desc": null,
"value": "gt_density_map_adaptive_384_VarV2"
},
"im_dir": {
"desc": null,
"value": "images_384_VarV2"
},
"min_lr": {
"desc": null,
"value": 0
},
"resume": {
"desc": null,
"value": "./weights/mae_pretrain_vit_base_full.pth"
},
"log_dir": {
"desc": null,
"value": "None"
},
"pin_mem": {
"desc": null,
"value": true
},
"dist_url": {
"desc": null,
"value": "env://"
},
"wandb_id": {
"desc": null,
"value": null
},
"anno_file": {
"desc": null,
"value": "annotation_FSC147_384.json"
},
"data_path": {
"desc": null,
"value": "./data/FSC147/"
},
"accum_iter": {
"desc": null,
"value": 1
},
"batch_size": {
"desc": null,
"value": 16
},
"local_rank": {
"desc": null,
"value": -1
},
"mask_ratio": {
"desc": null,
"value": 0.5
},
"output_dir": {
"desc": null,
"value": "./data/out/pretrain"
},
"world_size": {
"desc": null,
"value": 1
},
"dist_on_itp": {
"desc": null,
"value": false
},
"distributed": {
"desc": null,
"value": false
},
"num_workers": {
"desc": null,
"value": 10
},
"start_epoch": {
"desc": null,
"value": 0
},
"weight_decay": {
"desc": null,
"value": 0.05
},
"norm_pix_loss": {
"desc": null,
"value": false
},
"warmup_epochs": {
"desc": null,
"value": 10
},
"data_split_file": {
"desc": null,
"value": "Train_Test_Val_FSC_147.json"
}
}
and fine-tuning:
{
"lr": {
"desc": null,
"value": 0.00001
},
"blr": {
"desc": null,
"value": 0.001
},
"seed": {
"desc": null,
"value": 0
},
"team": {
"desc": null,
"value": "wsense"
},
"model": {
"desc": null,
"value": "mae_vit_base_patch16"
},
"title": {
"desc": null,
"value": "CounTR_finetuning_paper"
},
"wandb": {
"desc": null,
"value": "counting"
},
"_wandb": {
"desc": null,
"value": {
"t": {
"1": [
1,
41,
55,
63
],
"2": [
1,
41,
55,
63
],
"3": [
2,
13,
15,
16,
23
],
"4": "3.9.15",
"5": "0.13.9",
"8": [
5
]
},
"framework": "torch",
"start_time": 1674944766.966494,
"cli_version": "0.13.9",
"is_jupyter_run": false,
"python_version": "3.9.15",
"is_kaggle_kernel": false
}
},
"device": {
"desc": null,
"value": "cuda"
},
"epochs": {
"desc": null,
"value": 1000
},
"gt_dir": {
"desc": null,
"value": "gt_density_map_adaptive_384_VarV2"
},
"im_dir": {
"desc": null,
"value": "images_384_VarV2"
},
"min_lr": {
"desc": null,
"value": 0
},
"resume": {
"desc": null,
"value": "./data/out/pretrain/checkpoint__pretraining_299.pth"
},
"log_dir": {
"desc": null,
"value": "None"
},
"pin_mem": {
"desc": null,
"value": true
},
"dist_url": {
"desc": null,
"value": "env://"
},
"wandb_id": {
"desc": null,
"value": null
},
"anno_file": {
"desc": null,
"value": "annotation_FSC147_384.json"
},
"data_path": {
"desc": null,
"value": "./data/FSC147/"
},
"accum_iter": {
"desc": null,
"value": 1
},
"batch_size": {
"desc": null,
"value": 8
},
"class_file": {
"desc": null,
"value": "ImageClasses_FSC147.txt"
},
"local_rank": {
"desc": null,
"value": -1
},
"mask_ratio": {
"desc": null,
"value": 0.5
},
"output_dir": {
"desc": null,
"value": "./data/out/finetune"
},
"world_size": {
"desc": null,
"value": 1
},
"dist_on_itp": {
"desc": null,
"value": false
},
"distributed": {
"desc": null,
"value": false
},
"num_workers": {
"desc": null,
"value": 10
},
"start_epoch": {
"desc": null,
"value": 0
},
"weight_decay": {
"desc": null,
"value": 0.05
},
"norm_pix_loss": {
"desc": null,
"value": false
},
"warmup_epochs": {
"desc": null,
"value": 10
},
"data_split_file": {
"desc": null,
"value": "Train_Test_Val_FSC_147.json"
}
}
Does this sound reasonable? Did you perhaps run the fine-tuning with the large MAE?
Thanks in advance
Hello. Thank you for the great work.
However, under the given instructions with the pretrained weight, I could not reproduce the reported result on FSC-147.
I followed your instructions and my test results are "Current MAE: 23.07, RMSE: 104.73".
My results are far from those in the paper. Are there any hyper-parameters that differ from yours?
What exactly is the difference? I know zero-shot takes longer than few-shot, but I don't know exactly what the objective of each one is.
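Several posts here compare MAE and RMSE values. As a reference for what those numbers measure, here is a minimal sketch that computes both from per-image predicted and ground-truth counts. The function name and the plain-list inputs are assumptions for illustration, not the repository's evaluation code.

```python
import math


def counting_errors(pred_counts, gt_counts):
    """Return (MAE, RMSE) over per-image counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - g for p, g in zip(pred_counts, gt_counts)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, rmse
```

Because RMSE squares each error before averaging, a few images with very large counting errors can dominate it even when MAE changes little.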
@Verg-Avesta
Could you help me understand the pretraining and fine-tuning variants?
Stage | Model file | Training script |
---|---|---|
Pretrain on FSC147 | models_mae_noct.py | FSC_pretrain.py |
Finetune on FSC147 | models_mae_cross.py | FSC_finetune_cross.py |
Finetune on CARPK | models_mae_cross.py | FSC_finetune_CARPK.py |
What is the difference between these files for pretraining, fine-tuning (cross), and fine-tuning on CARPK?
I'm very interested in your work, and I want to know how to test the trained model on new data without box information.
Besides, I tried to run demo.py but it failed; it seems some important files are missing.
I ran the demo with my own test data.
cy@cy-MS-7D40:~/export/CounTR$ python3 demo.py
Resume checkpoint ./ckpt/FSC147.pth
/home/cy/.local/lib/python3.10/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
warnings.warn(
Traceback (most recent call last):
File "/home/cy/export/CounTR/demo.py", line 212, in <module>
result, elapsed_time = run_one_image(samples, boxes, pos, model)
File "/home/cy/export/CounTR/demo.py", line 143, in run_one_image
output, = model(samples[:, :, :, start:start + 384], boxes, 3)
File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cy/export/CounTR/models_mae_cross.py", line 205, in forward
latent = self.forward_encoder(imgs)
File "/home/cy/export/CounTR/models_mae_cross.py", line 138, in forward_encoder
x = self.patch_embed(x)
File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cy/.local/lib/python3.10/site-packages/timm/models/layers/patch_embed.py", line 34, in forward
x = self.proj(x).flatten(2).transpose(1, 2)
File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/cy/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [768, 3, 16, 16], expected input[1, 4, 384, 384] to have 3 channels, but got 4 channels instead
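The `RuntimeError` above says the patch embedding expects 3 input channels but received 4, which is what happens when an image with an alpha channel (for example an RGBA PNG) is loaded without conversion. A minimal sketch of the usual fix, assuming Pillow is used for loading and that the input really is RGBA (the thread does not confirm either); `load_rgb` is a hypothetical helper.

```python
from PIL import Image


def load_rgb(path_or_image):
    """Open an image (or take an already opened one) and force 3-channel RGB."""
    if isinstance(path_or_image, Image.Image):
        img = path_or_image
    else:
        img = Image.open(path_or_image)
    # convert("RGB") drops the alpha channel and also expands grayscale images.
    return img.convert("RGB")
```

Converting before the tensor transform keeps the input at `[1, 3, 384, 384]` per window, matching the `[768, 3, 16, 16]` weight reported in the error.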
Hello. These are the fine-tuned model weights on FSC147 and CARPK. I have renamed them to avoid ambiguity.
Originally posted by @Verg-Avesta in #3 (comment)
Could you please provide the MAE(MIM) pre-trained weight on FSC-147/CARPK to reproduce the result? Please!
Could you please create the license file? Preferably MIT or BSD.
The test result from uploaded FSC147.pth does not match the result of the paper.
The inference result on the FSC147 test data is MAE 15.71, RMSE 104.99. I used the FSC147 fine-tuned weights that you uploaded and linked in the documentation.
However, the zero shot result is similar to the result of the paper.
Few shot : 15.71 / 104.99
Zero shot: 14.70 / 106.87
I ran the evaluation code with torch 1.10 and tried both timm 0.3.2 and timm 0.4.5.
Is there any way to fix this issue?
Hi @Verg-Avesta,
while running FSC_test_cross(zero-shot).py with images of size (690, 1280), I got the error in the title:
Traceback (most recent call last):
File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 420, in <module>
main(args)
File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 318, in main
output, = model(samples[:, :, :, start:start + 384], boxes, 0)
File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/gficarra/Code/countr/models_mae_cross.py", line 198, in forward
latent = self.forward_encoder(imgs)
File "/home/gficarra/Code/countr/models_mae_cross.py", line 133, in forward_encoder
x = self.patch_embed(x)
File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/timm/models/layers/patch_embed.py", line 32, in forward
assert H == self.img_size[0] and W == self.img_size[1], \
AssertionError: Input image size (960*384) doesn't match model (384*384).
If I'm not mistaken, the sliding window moves horizontally along the width of the image, while the height is fixed to 384.
So, I changed lines 128 and 129 from
new_H = 16*int(H/16)
new_W = 16*int(W/16)
to
new_H = 384
new_W = 16 * int((W / H * 384) / 16)
like in lines 28 and 29 in demo.py.
Maybe you could check if it makes sense, and if would be useful to apply this change.
Thank you.
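The proposed change can be checked with a few lines of arithmetic. The sketch below restates the two formulas from the post as a function (the name `resized_shape` and its default arguments are assumptions for illustration): the height is fixed to the window size and the width is scaled by the aspect ratio, then rounded down to a multiple of the patch size.

```python
def resized_shape(height, width, target_h=384, patch=16):
    """Return (new_H, new_W) with height fixed and width a multiple of `patch`."""
    new_h = target_h
    # Scale the width by the same factor as the height, then round down
    # to a multiple of the patch size.
    new_w = patch * int((width / height * target_h) / patch)
    return new_h, new_w


print(resized_shape(690, 1280))  # (384, 704)
```

For the (690, 1280) images in the report this gives 384 x 704, so every 384-pixel-wide window has the height the model expects. Note that images taller than they are wide would end up narrower than 384 pixels and would need padding, which this sketch does not handle.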
@Verg-Avesta so far your model is working well on our images.
I have a question.
Suppose there are multiple categories of objects in the counting area, say Cat A: 4, Cat B: 5, Cat C: 10.
What count will the model return? Can it provide a per-category count as shown above, or only the total count, i.e. 19 in this case?
For better counting results, do I have to provide at least one bounding box from each category as an exemplar?
I am running demo.py and I want to print the number of objects detected in the image.
Hello. Thank you for your great work.
I used the pre-trained model you provided and tried the second stage parameters provided in the readme.md and got similar results to those in the paper.
However, when I used the first stage parameters provided in the document, started training on the first stage using FSC-147, and then trained on the second stage with the parameters provided in the document, I ended up with the following results.
MAE: 23.76, RMSE: 105.93
I think I have a problem with the first stage training. Is the first stage pre-training model in the paper obtained by using the parameters in the readme.md, and also is there anything else that needs to be modified?
Your requirements and readme list timm version 0.3.2, but the FSC_pretrain.py has assertion forcing timm to have a version between 0.4.5 and 0.4.9
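To see which timm releases fall inside the range described above, here is a minimal sketch of a numeric version check. The function names are hypothetical, and the bounds 0.4.5 and 0.4.9 are taken from the post rather than verified against the script.

```python
def parse_version(version):
    """Turn '0.4.5' into (0, 4, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def timm_version_ok(version, low="0.4.5", high="0.4.9"):
    """True if `version` lies in the inclusive range [low, high]."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```

Comparing parsed tuples rather than raw strings matters for releases such as 0.4.12, which a plain string comparison would wrongly place below 0.4.9.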
Hi @Verg-Avesta
# set recommended archs
mae_vit_base_patch16 = mae_vit_base_patch16_dec512d8b
mae_vit_base4_patch16 = mae_vit_base_patch16_fim4 # decoder: 4 blocks
mae_vit_base6_patch16 = mae_vit_base_patch16_fim6 # decoder: 6 blocks
mae_vit_large_patch16 = mae_vit_large_patch16_dec512d8b
mae_vit_huge_patch14 = mae_vit_huge_patch14_dec512d8b
Dear @Verg-Avesta,
in your paper, you mention that "We scale the loss by a factor of 60". In fact, during training, the density map is first multiplied by 60 in the data preprocessing (https://github.com/Verg-Avesta/CounTR/blob/main/util/FSC147.py#L265), and then divided by 60 when computing MAE (https://github.com/Verg-Avesta/CounTR/blob/main/FSC_finetune_cross.py#L299).
Could you explain how you found this number? How should it be adapted when finetuning on another dataset?
Thank you very much.
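The scale-then-unscale pattern described in this question can be illustrated without the model. The sketch below uses nested lists as a stand-in for a density map; the function names are hypothetical and only the factor 60 comes from the post.

```python
SCALE = 60  # factor quoted from the paper in the post above


def scale_density(density, scale=SCALE):
    """Multiply every density value, as done in data preprocessing."""
    return [[value * scale for value in row] for row in density]


def count_from_density(scaled_density, scale=SCALE):
    """Sum the (scaled) map and divide by the factor to recover the count."""
    return sum(sum(row) for row in scaled_density) / scale


density = [[0.0, 1.0], [0.5, 0.5]]  # integrates to 2 objects
print(count_from_density(scale_density(density)))  # 2.0
```

The recovered count does not depend on the factor as long as the same value is used in both places. The factor only changes the magnitude of the regression targets, and therefore of the loss, which is why it may need retuning on another dataset.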
Are the batch size and learning rate used in the fine-tuning stage the same as stated in the paper? The script sets batch size 26 and lr=2e-4, whereas the paper states batch size 8 and lr=1e-5.
These are the results of fine-tuning with the provided FSC147.pth.
Batch 26, lr=2e-4 : 12.79 mae/ 86.49 rmse
Batch 8, lr=1e-5 : 13.77 mae/ 87.69 rmse
I've tried to reproduce the results of fine-tuning on CARPK, but the training seems to degrade the results. I trained for 1000 epochs and got an MAE of 14.9 and an RMSE of 20.21. I fine-tuned the model from the FSC147.pth checkpoint; before fine-tuning I get an MAE of 10.12 and an RMSE of 12.48. I have also unfrozen the encoder (in model_mae_cross). Could you give more information on how you obtained your results? Thank you.
How do we generate the density maps for training on custom data?
Are they needed?
Traceback (most recent call last):
File "demo.py", line 198, in <module>
result = run_one_image(samples, boxes, pos, model)
File "demo.py", line 126, in run_one_image
with torch.no_grad():
File "/home/pdguest/environments/mmdet_new/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/pdguest/natesh/experiment/CounTR/models_mae_cross.py", line 199, in forward
pred = self.forward_decoder(latent, boxes, shot_num) # [N, 384, 384]
File "/home/pdguest/natesh/experiment/CounTR/models_mae_cross.py", line 169, in forward_decoder
y = torch.cat(y1,dim=0).reshape(shot_num,N,C).to(x.device)
RuntimeError: shape '[3, 1, 512]' is invalid for input of size 1024
I've changed the path to my image and updated the box coordinates accordingly. I'm not sure what exactly is causing the error. Could you please look into this? Thanks.
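The numbers in the error message hint at the cause. Reshaping to `[3, 1, 512]` needs 3 x 1 x 512 = 1536 values, but only 1024 = 2 x 512 are present, which would be consistent with two exemplar boxes being supplied while the shot number is 3. This is an inference from the arithmetic, not something confirmed in the thread. The hypothetical helper below only reproduces that calculation.

```python
def exemplar_count(numel, batch_size, channels):
    """Number of exemplar features packed in a flat buffer of `numel` values."""
    per_exemplar = batch_size * channels
    if numel % per_exemplar != 0:
        raise ValueError("buffer size is not a whole number of exemplars")
    return numel // per_exemplar


print(exemplar_count(1024, batch_size=1, channels=512))  # 2
```

If that reading is right, passing a shot number equal to the number of boxes, or supplying three boxes, should make the shapes agree.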
Hi,
How many epochs did you train for in the fine-tuning stage? The paper only mentions 300 epochs for the pre-training stage.
Great work! I would like to ask how long it takes to train your model from scratch on a single 3090. Thank you very much.
Hello. Thank you for the great work.
I got curious. Why did you choose to mark dots on the ground-truth density map instead of using the original density map with Gaussian distribution from the FSC147 dataset? Just wondering about your thoughts on this. Thanks.