hustvl / yolos
[NeurIPS 2021] You Only Look at One Sequence
Home Page: https://arxiv.org/abs/2106.00666
License: MIT License
Thanks for your great work and releasing the code!
I noticed that the AMP-related code in engine.py is commented out. Can I use AMP in this project? Would it speed up training, and would it hurt performance?
Your paper says that the pre-trained models are available in this repo, but I can't find them. I have searched everywhere and can't find them anywhere else either. I don't have time to train them myself, so I need your help!
If you can share them privately, please send them to [email protected].
Thanks!
Hi, I tested YOLOS on Pascal VOC 2007 with the default parameters, but I can't get a satisfactory result. Here is my result:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.276
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.497
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.274
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.085
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.390
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.297
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.433
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.490
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.289
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.628
I wonder whether something went wrong. Could you give me some advice?
We are loading our own pretrained ViT-Base model, trained with the MAE method, and we hit a size mismatch for pos_embed. Is there a solution to this problem?
RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
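The two token counts in the error above can be read as patch grids. The sketch below is only an illustration: it assumes a square patch grid plus at most two extra tokens (for example a class token and a distillation token), and the helper name `candidate_grids` is hypothetical, not part of the YOLOS code.

```python
import math

def candidate_grids(num_tokens, max_extra_tokens=2):
    """Return (extra_tokens, grid_side) pairs such that
    num_tokens == extra_tokens + grid_side ** 2.

    Assumption: the patch grid is square and there are at most
    `max_extra_tokens` non-patch tokens (e.g. class / distillation).
    """
    matches = []
    for extra in range(max_extra_tokens + 1):
        patches = num_tokens - extra
        if patches <= 0:
            continue
        side = math.isqrt(patches)
        if side * side == patches:
            matches.append((extra, side))
    return matches

print(candidate_grids(785))  # length of pos_embed in the checkpoint
print(candidate_grids(578))  # length of pos_embed in the current model
```

Under these assumptions the checkpoint corresponds to a 28x28 grid with one extra token and the current model to a 24x24 grid with two extra tokens, so both the grid size and the number of extra tokens differ. A common remedy in ViT code bases is to interpolate the patch part of the positional embedding to the new grid; whether that suits this checkpoint is not confirmed by the source.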
Using the default FeatureExtractor settings for the HuggingFace port of YOLOS, I am consistently running into CUDA OOM errors on a 16GB V100 (even with a training batch size of 1).
I would like to train YOLOS on PubLayNet and ideally use 4-8 V100s.
Is there a way to lower CUDA memory usage while training YOLOS, other than reducing the batch size, while preserving accuracy and leveraging the pretrained models?
I see that other models (e.g. DiT) use image sizes of 224x224. However, is it fair to assume that such a small image size would not be appropriate for object detection because too much information is lost? In the DiT case, document image classification was the objective.
Hi, what do eval_size, init_pe_size, and mid_pe_size mean in the code? Thanks for your answer.
How much GPU memory does training need? It always raises an out-of-memory error for me.
Thank you for your great work!
I have a question: how do I train the model to detect my own custom objects?
Hi, thanks for open-sourcing the code base; it is a good way to learn transformers. I have a few queries.
Thanks in advance
Thank you for your great work examining transformers in object detection. My question is: why do we start with such a small learning rate, 2.5e-5, given that the paper gives no rationale for it? My first guess is that you inherited the setting from the DETR framework.
Have you tried larger learning rates? To speed up training with more GPUs, is there a rule for scaling up the learning rate for YOLOS without losing performance?
Many thanks.
To train on a custom dataset, is it possible to just change the path from COCO to my own dataset?
As the title suggested
Hi, is there a limit somewhere on how many bounding boxes are predicted? I have a very crowded image, and when I run the models they consistently predict only 100 bounding boxes. Is this a limit that I can change, or is it something else that I am missing?
models\layers\helper.py, line 6, raises an import error in my environment:
torch 1.9.0+cu111
torch-tb-profiler 0.2.0
torchaudio 0.9.0
torchvision 0.10.0+cu111
The error message is:
from torch._six import container_abcs
ImportError: cannot import name 'container_abcs' from 'torch._six' (C:\python39\lib\site-packages\torch\_six.py)
This commit fixes it:
huggingface/pytorch-image-models@94ca140#diff-c7abf83bc43184f6101237b08d7c489c361f3d57b3538d633f6f01d35254b73c
Thanks for your code.
Hi, what is the order of magnitude of the YOLOS models' parameter counts?
PS: Do you have an official implementation of the article 'Benchmarking Detection Transfer Learning with Vision Transformers'? I use your MIMDet model, but my GPU is an RTX 3070 with 8 GB and can't run it.
Hello! I wonder if anyone else is getting GPU out-of-memory errors even with the small model (yolos_small)?
I am on a 4-GPU node with GeForce GTX 1080 Ti cards, 11 GB of memory each. I use batch size 1 as recommended. Both the distributed and non-distributed versions throw the same error.
The tiny model trains smoothly without any trouble.
Any tips for reducing memory usage would be great as well!
Congratulations on publishing good work.
How does its performance compare with YOLOv5 and the rest of the YOLO series, and where does it stand on object detection leaderboards?
Why does the learning rate scheduler step after each epoch instead of after each batch in main.py?
Won't the learning rate change too slowly (and inconsistently across dataset sizes)?
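The difference the question points at can be sketched with a plain cosine schedule. This is an illustration only: the formula is a generic cosine decay, not necessarily what the repo's 'warmupcos' scheduler computes, and the function names are hypothetical. The default values lr=1e-4, min_lr=1e-7 and epochs=150 are taken from the argument dump elsewhere in this thread.

```python
import math

def cosine_lr(progress, base_lr=1e-4, min_lr=1e-7):
    """Generic cosine decay from base_lr to min_lr; progress is in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def lr_per_epoch(epoch, iteration, iters_per_epoch, epochs=150):
    """Scheduler stepped once per epoch: constant within an epoch."""
    return cosine_lr(epoch / epochs)

def lr_per_iteration(epoch, iteration, iters_per_epoch, epochs=150):
    """Scheduler stepped once per batch: changes smoothly inside an epoch."""
    step = epoch * iters_per_epoch + iteration
    return cosine_lr(step / (epochs * iters_per_epoch))

# Compare the first and last iteration of epoch 3 with 1000 iterations per epoch.
for it in (0, 999):
    print(it, lr_per_epoch(3, it, 1000), lr_per_iteration(3, it, 1000))
```

With per-epoch stepping the learning rate stays flat inside each epoch regardless of dataset size, while per-iteration stepping spreads the same decay over every batch.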
Hi, thanks for the excellent work!
I followed the instructions in the README to evaluate the models provided in your repo. However, the APs I got for yolos_ti.pth, yolos_s_200_pre.pth, yolos_s_300_pre.pth, yolos_s_dWr.pth, and yolos_base.pth are 28.7, 12.5, 12.7, 13.2, and 13.8, respectively. While yolos_ti.pth matches the performance in your paper and log, the other four models are significantly lower than expected.
Any idea why this would happen? Thanks in advance!
For example, when evaluating the base model, I ran
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/coco --batch_size 2 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume ../trained_weights/yolos/yolos_base.pth
and expected to obtain 42.0 AP, as shown in your paper and log. However, the result is only 13.8 AP.
The complete evaluation output is shown below.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 0): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 6): env://
| distributed init (rank 5): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
Namespace(backbone_name='base', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='../data/coco', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_backend='nccl', dist_url='env://', distributed=True, eos_coef=0.1, epochs=150, eval=True, eval_size=800, giou_loss_coef=2, gpu=0, init_pe_size=[800, 1344], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 1344], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', rank=0, remove_difficult=False, resume='../trained_weights/yolos/yolos_base.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=8)
Has mid pe
number of params: 127798368
loading annotations into memory...
Done (t=23.52s)
creating index...
index created!
800
loading annotations into memory...
Done (t=3.00s)
creating index...
index created!
Test: [ 0/313] eta: 0:39:39 class_error: 29.21 loss: 2.1542 (2.1542) loss_bbox: 0.4245 (0.4245) loss_ce: 0.7761 (0.7761) loss_giou: 0.9535 (0.9535) cardinality_error_unscaled: 5.3750 (5.3750) class_error_unscaled: 29.2100 (29.2100) loss_bbox_unscaled: 0.0849 (0.0849) loss_ce_unscaled: 0.7761 (0.7761) loss_giou_unscaled: 0.4768 (0.4768) time: 7.6030 data: 0.5298 max mem: 3963
Test: [256/313] eta: 0:00:26 class_error: 17.22 loss: 2.5668 (2.6435) loss_bbox: 0.5639 (0.5792) loss_ce: 0.8598 (0.8386) loss_giou: 1.1904 (1.2257) cardinality_error_unscaled: 3.8750 (4.2398) class_error_unscaled: 28.7817 (28.6160) loss_bbox_unscaled: 0.1128 (0.1158) loss_ce_unscaled: 0.8598 (0.8386) loss_giou_unscaled: 0.5952 (0.6129) time: 0.4406 data: 0.0137 max mem: 10417
Test: [312/313] eta: 0:00:00 class_error: 16.29 loss: 2.8745 (2.6626) loss_bbox: 0.5974 (0.5833) loss_ce: 0.8791 (0.8461) loss_giou: 1.3012 (1.2332) cardinality_error_unscaled: 3.8750 (4.2370) class_error_unscaled: 26.2946 (28.7748) loss_bbox_unscaled: 0.1195 (0.1167) loss_ce_unscaled: 0.8791 (0.8461) loss_giou_unscaled: 0.6506 (0.6166) time: 0.4251 data: 0.0134 max mem: 10417
Test: Total time: 0:02:25 (0.4663 s / it)
Averaged stats: class_error: 16.29 loss: 2.8745 (2.6626) loss_bbox: 0.5974 (0.5833) loss_ce: 0.8791 (0.8461) loss_giou: 1.3012 (1.2332) cardinality_error_unscaled: 3.8750 (4.2370) class_error_unscaled: 26.2946 (28.7748) loss_bbox_unscaled: 0.1195 (0.1167) loss_ce_unscaled: 0.8791 (0.8461) loss_giou_unscaled: 0.6506 (0.6166)
Accumulating evaluation results...
DONE (t=15.78s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.13810
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.26766
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.11832
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05146
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.13066
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.23324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.18115
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.29001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.31740
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.12520
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.31154
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.49446
I tried something like this:
python demo.py --resume weights/yolos_s_dWr.pth --data_file ../yolov7/images/COCO_val2014_000000001856.jpg --mid_pe_size 800 864 --init_pe_size 800 864
Not using distributed mode
Namespace(backbone_name='small_dWr', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path=None, data_file='../yolo/images/COCO_val2014_000000001856.jpg', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_url='env://', distributed=False, eos_coef=0.1, epochs=150, eval=False, eval_size=800, giou_loss_coef=2, init_pe_size=[800, 864], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 864], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', remove_difficult=False, resume='weights/yolos_s_dWr.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=1)
Got:
torch1.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Detector:
size mismatch for backbone.pos_embed: copying a param with shape torch.Size([1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([1, 2801, 330]).
size mismatch for backbone.mid_pos_embed: copying a param with shape torch.Size([13, 1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([13, 1, 2801, 330]).
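The sequence lengths in this error can be reproduced with a little arithmetic. The sketch below assumes 16x16 patches and one class token, neither of which is stated in the log; the 100 detection tokens come from det_token_num=100 in the argument dump. The helper name `pos_embed_len` is hypothetical.

```python
def pos_embed_len(pe_height, pe_width, patch_size=16, det_tokens=100, cls_tokens=1):
    """Expected positional-embedding length for a given --init_pe_size.

    Assumptions: square 16x16 patches and a single class token;
    det_tokens mirrors det_token_num=100 from the argument dump.
    """
    patches = (pe_height // patch_size) * (pe_width // patch_size)
    return cls_tokens + patches + det_tokens

print(pos_embed_len(800, 864))  # size passed on the command line
print(pos_embed_len(512, 864))  # one size that matches the checkpoint length
```

Under these assumptions, 800 864 gives 2801 (the "current model" shape) and 512 864 gives 1829 (the checkpoint shape). Other height and width pairs also yield 1728 patches, so 512 864 is only one candidate; the size the checkpoint was actually trained with is not confirmed by the text above.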
I converted the PASCAL VOC dataset to COCO format, but when I trained yolos-tiny for 150 epochs with the pre-trained weights, the results were very bad. I have no idea why.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.039
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.084
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.032
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.005
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.059
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.124
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.234
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.066
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.402
How do I run inference on a single image and visualize the result?
From your code in main.py, I only see the learning rate updated after every epoch:
Line 217 in 2e10dc4
Looking at your logs, it also seems to confirm that.
Did you use a warm-up learning rate scheduler for the first few iterations?
Can we export YOLOS models to ONNX format?
I want to deploy a YOLOS model on ONNX Runtime to reduce deployment cost and run it via Docker on the NVIDIA Jetson series.
Hi YOLOS team :)
I've implemented YOLOS as a fork of 🤗 HuggingFace Transformers, and I'm going to add it soon to the library (see huggingface/transformers#16848). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/18ti9HrRoVE6d0vGBtnaeq93Tau3EYqOK?usp=sharing
The reason I'm adding YOLOS is because I really like the simplicity of it, compared to very complex frameworks such as Faster R-CNN and Mask R-CNN. I've added DETR previously also because it simplifies the task of object detection a lot.
As you may or may not know, any model on the HuggingFace hub has its own Git repository. E.g. the YOLOS-small checkpoint can be found here: https://huggingface.co/nielsr/yolos-s. If you check the "files and versions" tab, it includes the weights. The model hub uses git-LFS (large file storage) to use Git with large files such as model weights. This means that any model has its own Git commit history!
A model card can also be added to the repo, which is just a README.
Are you interested in creating an organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?
Let me know!
Kind regards,
Niels
ML Engineer @ HuggingFace
Hi,
While reading your code, I found "Unresolved reference 'sigmoid_focal_loss'" and "Unresolved reference 'dice_loss'" in the 'model/detector.py' file, and I could not find any code defining these two functions. Did you perhaps forget to upload their definitions?
Looking forward to your reply.
Hello, thank you for this great contribution. I would like to know whether, with your architecture, we can control which patches are fed to YOLOS. I already have the RoI (region of interest) of each image in my dataset, and I want to train the model only on these regions of the image. Can we do this by controlling the patches?
Thanks.