hustvl / yolos

[NeurIPS 2021] You Only Look at One Sequence

Home Page: https://arxiv.org/abs/2106.00666

License: MIT License

Python 28.36% Jupyter Notebook 71.64%

vision-transformer transformer object-detection computer-vision

yolos's Issues

AMP Support?

Thanks for your great work and for releasing the code!

I noticed that the AMP-related code in engine.py is commented out, and I am wondering whether I can use AMP in this project. Would it speed up training, and would it hurt performance?

Where are the pre-trained models?

In your paper, you say that the pre-trained models are in this repo, but I can't find them, and I have searched everywhere else without finding them either. I don't have time to train them myself, so I need your help!
If you can give them to me privately, you can send them to [email protected].
Thanks!

How is the performance on Pascal VOC?

Hi, I tested YOLOS on PASCAL VOC 2007 with the default parameters and couldn't get a satisfactory result. Here are my results:

Is something going wrong? Could you give me some advice?

Size mismatch error for pos_embed

We loaded our own ViT-Base model pre-trained with MAE and ran into a size mismatch for pos_embed. Is there a way to solve this?

RuntimeError: Error(s) in loading state_dict for VisionTransformer: size mismatch for pos_embed: copying a param with shape torch.Size([1, 785, 768]) from checkpoint, the shape in current model is torch.Size([1, 578, 768]).
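One way to read the two shapes in this error is to split each token count into a few extra tokens plus a square patch grid. The sketch below assumes a square grid and one or two extra tokens; the helper `split_tokens` is hypothetical. It suggests 785 = 1 + 28×28 and 578 = 2 + 24×24, i.e. with 16×16 patches a 448×448 grid in the checkpoint versus a 384×384 grid with two extra tokens (which would fit a DeiT-style [CLS] + distillation layout) in the model. If that reading is right, the grid part would need resizing (for example 2D interpolation, as DeiT and MAE do when changing resolution) rather than a direct copy.

```python
import math

def split_tokens(n_tokens, max_extra=2):
    """Return (extra_tokens, grid_side) pairs such that
    n_tokens == extra_tokens + grid_side ** 2 (square patch grid assumed)."""
    candidates = []
    for extra in range(1, max_extra + 1):
        rest = n_tokens - extra
        if rest < 0:
            continue
        side = math.isqrt(rest)
        if side * side == rest:
            candidates.append((extra, side))
    return candidates

# Token counts taken from the error message above.
print(split_tokens(785))  # checkpoint: [(1, 28)]
print(split_tokens(578))  # current model: [(2, 24)]
```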

CUDA out-of-memory errors with a batch size of 1 on a 16GB V100

Using the default FeatureExtractor settings for the HuggingFace port of YOLOS, I am consistently running into CUDA OOM errors on a 16GB V100 (even with a training batch size of 1).

I would like to train YOLOS on PubLayNet, ideally using 4-8 V100s.

Is there a way to lower CUDA memory usage while training YOLOS, other than reducing the batch size, while preserving accuracy and still leveraging the pretrained models?

I see that other models (e.g. DiT) use an image size of 224x224. However, is it fair to assume that such a small image size would not be appropriate for object detection, because too much information is lost? In the DiT case, the objective was document image classification.
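As a rough illustration of why memory grows so quickly with input size: a ViT-style detector with 16×16 patches sees about (H/16)·(W/16) patch tokens plus extra tokens, and each self-attention map is quadratic in the token count. This back-of-the-envelope sketch assumes 16-pixel patches, 100 detection tokens (det_token_num in the namespaces printed elsewhere on this page), 6 heads (DeiT-S size), and fp32, and counts only one layer's attention maps, so it shows one contributor to memory, not the total.

```python
import math

PATCH = 16        # assumed ViT patch size
DET_TOKENS = 100  # det_token_num in the namespaces shown on this page

def num_tokens(h, w, patch=PATCH, det_tokens=DET_TOKENS):
    """[CLS] + patch tokens + detection tokens for an h x w input."""
    return 1 + math.ceil(h / patch) * math.ceil(w / patch) + det_tokens

def attn_mib(h, w, heads=6, bytes_per_el=4):
    """Memory of ONE layer's attention maps (all heads), in MiB.
    heads=6 assumes a DeiT-S-sized model; activations kept for the
    backward pass across all layers multiply this further."""
    n = num_tokens(h, w)
    return n * n * heads * bytes_per_el / 2**20

for h, w in [(224, 224), (512, 864), (800, 1344)]:
    print(f"{h}x{w}: {num_tokens(h, w)} tokens, "
          f"{attn_mib(h, w):.1f} MiB of attention maps per layer")
```

Because the cost is quadratic in tokens, halving both image sides cuts this term by roughly 16×, which is the trade-off against small-object detail raised in the question.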

How can I get the ImageNet pre-trained model?

❔Question

Confusion about the PE (position embedding) sizes

❔Question

Hi, what do eval_size, init_pe_size, and mid_pe_size mean in the code? Thanks for your answer.

How much GPU memory does training need?

How much GPU memory does training need? I keep getting out-of-memory errors.

How to train a model to detect my custom objects

❔Question

Thank you for your great work!
I have a question: how do I train the model to detect my own custom objects?

Implementation queries

❔Question

Hi, thanks for open-sourcing the code base; it is a good way to learn about transformers. I have a few questions:

Which function in coco.py loads the dataset, given that "ConvertCocoPolysToMask" does not seem to be called explicitly anywhere?
Each epoch, train_one_epoch() runs over the whole training set and computes the losses, and then the output is evaluated; this is repeated for 300 epochs. What is the idea behind this training scheme?
Does YOLOS also provide panoptic segmentation? Can we get a pretrained model for it?

Thanks in advance
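On the first question: in DETR's datasets/coco.py, which YOLOS's data code appears to be derived from, `ConvertCocoPolysToMask` is not called directly. An instance is stored as `self.prepare` and applied inside `CocoDetection.__getitem__`, so it runs whenever the DataLoader indexes the dataset. A minimal pure-Python sketch of that pattern; `ToyCocoDataset` and `ConvertToTarget` are hypothetical stand-ins, not the repo's classes.

```python
class ConvertToTarget:
    """Stand-in for a ConvertCocoPolysToMask-style callable: turns raw COCO
    annotations (boxes as [x, y, w, h]) into a target with [x0, y0, x1, y1]."""
    def __call__(self, image, target):
        anns = target["annotations"]
        boxes = [[x, y, x + w, y + h] for x, y, w, h in (a["bbox"] for a in anns)]
        labels = [a["category_id"] for a in anns]
        return image, {"image_id": target["image_id"], "boxes": boxes, "labels": labels}

class ToyCocoDataset:
    def __init__(self, images, annotations):
        self.images = images              # image_id -> image (any object here)
        self.annotations = annotations    # image_id -> list of COCO annotation dicts
        self.ids = sorted(images)
        self.prepare = ConvertToTarget()  # created once, applied to every item

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        image_id = self.ids[idx]
        target = {"image_id": image_id, "annotations": self.annotations[image_id]}
        # The conversion happens implicitly each time the dataset is indexed.
        return self.prepare(self.images[image_id], target)

ds = ToyCocoDataset({7: "img"}, {7: [{"bbox": [10, 20, 30, 40], "category_id": 3}]})
print(ds[0])
```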

Small learning rate value

❔Question

Thank you for your great work examining transformers for object detection. My question is: why do you start with a very small learning rate of 2.5e-5, given that the paper gives no hint? My first guess is that you inherited the settings from the DETR framework.

Have you tried larger learning rates? To speed up training with more GPUs, is there a rule you found for scaling up the learning rate for YOLOS without losing performance?
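A common heuristic, not something the YOLOS authors state, is to scale the learning rate with the total batch size: linearly, or by the square root of the ratio, which is sometimes preferred for Adam-style optimizers. The sketch assumes a reference point of lr = 2.5e-5 at a total batch of 8 (8 GPUs × 1 image); that reference batch is an assumption to check against the actual training command, and `scale_lr` is a hypothetical helper.

```python
def scale_lr(base_lr, base_batch, new_batch, rule="linear"):
    """Scale a learning rate tuned at base_batch to new_batch.
    'linear' -> lr * k, 'sqrt' -> lr * sqrt(k), where k = new_batch / base_batch."""
    k = new_batch / base_batch
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * k ** 0.5
    raise ValueError(f"unknown rule: {rule}")

# Assumed reference point: 2.5e-5 at a total batch of 8 (8 GPUs x 1 image).
print(scale_lr(2.5e-5, 8, 32))          # 4x more images per step, linear rule
print(scale_lr(2.5e-5, 8, 32, "sqrt"))  # same, square-root rule
```

Either rule is only a starting point; larger learning rates are often paired with a warmup phase, and the result still needs validating on the target dataset.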

Many thanks.

Train with custom dataset

❔Question

To train on a custom dataset, is it possible to just change the path from COCO to my own dataset?
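If the code keeps DETR's dataset layout (an assumption worth checking in datasets/coco.py), a custom dataset would need to mimic COCO's structure rather than just live at a different path: a root with `train2017/`, `val2017/`, and `annotations/instances_{train,val}2017.json`. A minimal standard-library sketch that writes such an annotation file; the `write_coco_annotations` helper, the folder names, and the class name are illustrative.

```python
import json
import tempfile
from pathlib import Path

def write_coco_annotations(out_file, images, boxes, class_names):
    """images: list of (image_id, file_name, width, height)
    boxes: list of (image_id, class_name, x0, y0, x1, y1) in pixels
    Writes a minimal COCO-style detection JSON (boxes stored as [x, y, w, h])."""
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}  # ids start at 1
    data = {
        "images": [{"id": i, "file_name": f, "width": w, "height": h}
                   for i, f, w, h in images],
        "annotations": [],
        "categories": [{"id": cid, "name": n} for n, cid in cat_ids.items()],
    }
    for ann_id, (img_id, name, x0, y0, x1, y1) in enumerate(boxes, start=1):
        w, h = x1 - x0, y1 - y0
        data["annotations"].append({
            "id": ann_id, "image_id": img_id, "category_id": cat_ids[name],
            "bbox": [x0, y0, w, h], "area": w * h, "iscrowd": 0,
        })
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(data))
    return data

# Assumed layout: my_dataset/train2017/*.jpg + my_dataset/annotations/instances_train2017.json
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"
    data = write_coco_annotations(root / "annotations" / "instances_train2017.json",
                                  images=[(1, "0001.jpg", 640, 480)],
                                  boxes=[(1, "widget", 10, 20, 110, 220)],
                                  class_names=["widget"])
    print(data["annotations"][0]["bbox"])
```

The number of classes the detector head is built with presumably also needs to cover your category ids.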

Can you explain why YOLOS-Small has 30 million parameters while DeiT-S has 22 million?

As the title says.

Limit on output bounding boxes

Hi, is there a limit somewhere on how many bounding boxes are predicted? I have a very crowded image, and when I run the models I notice that they consistently predict only 100 bboxes. Is this a limit that I can change, or is it something else that I am missing?
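The namespace dumps printed elsewhere on this page include det_token_num=100. Assuming, as in DETR-style set prediction, that each detection token produces exactly one box, the output count is fixed at det_token_num before any score filtering, so 100 would be a hard ceiling; raising it would presumably require retraining, since the detection tokens are learned. A toy sketch of such post-processing (`postprocess` is hypothetical):

```python
def postprocess(scores, boxes, threshold=0.5):
    """scores: one confidence per detection token; boxes: one box per token.
    Returns (score, box) pairs above the threshold, highest score first."""
    assert len(scores) == len(boxes), "one prediction per detection token"
    keep = [(s, b) for s, b in zip(scores, boxes) if s >= threshold]
    keep.sort(key=lambda sb: sb[0], reverse=True)
    return keep

det_token_num = 100                     # value from the printed Namespace
scores = [0.9] * det_token_num          # every token confident (a crowded image)
boxes = [(i, i, i + 1, i + 1) for i in range(det_token_num)]
print(len(postprocess(scores, boxes)))  # 100: capped by det_token_num
```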

ImportError: cannot import name 'container_abcs' from 'torch._six'

models\layers\helper.py, line 6, raises an import error in my environment:
torch 1.9.0+cu111
torch-tb-profiler 0.2.0
torchaudio 0.9.0
torchvision 0.10.0+cu111
The error message is:
from torch._six import container_abcs
ImportError: cannot import name 'container_abcs' from 'torch._six' (C:\python39\lib\site-packages\torch\_six.py)

This commit fixes it:
huggingface/pytorch-image-models@94ca140#diff-c7abf83bc43184f6101237b08d7c489c361f3d57b3538d633f6f01d35254b73c
Thanks for your code!

Hi, what is the order of magnitude of the model's parameter count?

❔Question

Hi, what is the order of magnitude of the YOLOS models' parameter counts?

Additional context

PS: Do you have an official implementation of the paper 'Benchmarking Detection Transfer Learning with Vision Transformers'? I am using your MIMDet model, but my GPU is an RTX 3070 with 8 GB and it can't run it.

Anyone else getting memory issues?

❔Question

Hello! Is anyone else getting GPU memory errors even with the small model (yolos_small)?

Additional context

I am on a node with 4 GeForce GTX 1080 Ti GPUs, 11 GB of memory each. I use a batch size of 1, as recommended. Both the distributed and non-distributed versions throw the same error.

The tiny model trains smoothly without any trouble.

Any tips for reducing memory usage would be awesome as well!

Object detection leaderboard

❔Question

Congratulations on publishing great work.
How does the performance compare with YOLOv5 and other YOLO-series models, and where does it stand on object detection leaderboards?

About Learning Rate Scheduler

❔Question

Why is the learning rate scheduler stepped after each epoch instead of after each batch in main.py?

Won't the learning rate then change too slowly (and inconsistently across datasets of different sizes)?
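The difference the question is about can be seen by evaluating one warmup-plus-cosine curve at integer epochs versus fractional epochs. The formula below is a generic linear-warmup, cosine-decay schedule, an assumption rather than a copy of the scheduler behind sched='warmupcos'; its defaults mirror the namespace printed elsewhere on this page (lr=1e-4, min_lr=1e-7, warmup_lr=1e-6, warmup_epochs=0, epochs=150), and 1000 iterations per epoch is arbitrary.

```python
import math

def warmup_cosine(t, epochs=150, lr=1e-4, min_lr=1e-7,
                  warmup_epochs=0, warmup_lr=1e-6):
    """Learning rate at (possibly fractional) epoch t."""
    if t < warmup_epochs:
        # Linear warmup from warmup_lr to lr.
        return warmup_lr + (lr - warmup_lr) * t / warmup_epochs
    progress = (t - warmup_epochs) / max(1e-8, epochs - warmup_epochs)
    # Cosine decay from lr down to min_lr.
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

iters_per_epoch = 1000
# Per-epoch stepping: the rate only changes at epoch boundaries.
per_epoch = [warmup_cosine(it // iters_per_epoch) for it in range(3 * iters_per_epoch)]
# Per-iteration stepping: the same curve, sampled at fractional epochs.
per_iter = [warmup_cosine(it / iters_per_epoch) for it in range(3 * iters_per_epoch)]
print(len(set(per_epoch)), len(set(per_iter)))  # distinct values over 3 epochs
```

With a smooth decay over 150 epochs the two variants stay close after warmup; the gap matters most during a short warmup, where per-epoch stepping moves in coarse jumps.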

[URGENT] Eval results are much lower than what's reported

Hi, thanks for the excellent work!

I followed the instructions in the README to evaluate the models provided in your repo. However, the APs I got for yolos_ti.pth, yolos_s_200_pre.pth, yolos_s_300_pre.pth, yolos_s_dWr.pth, and yolos_base.pth are 28.7, 12.5, 12.7, 13.2, and 13.8, respectively. While yolos_ti.pth matches the performance in your paper and log, the other four models are significantly lower than expected.
Any idea why this would happen? Thanks in advance!

For example, when evaluating the base model, I ran

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/coco --batch_size 2 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume ../trained_weights/yolos/yolos_base.pth

and expected to obtain 42.0 AP, as shown in your paper and log. However, the result is only 13.8 AP.

The complete evaluation output is shown below.

*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 0): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 6): env://
| distributed init (rank 5): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
Namespace(backbone_name='base', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='../data/coco', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_backend='nccl', dist_url='env://', distributed=True, eos_coef=0.1, epochs=150, eval=True, eval_size=800, giou_loss_coef=2, gpu=0, init_pe_size=[800, 1344], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 1344], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', rank=0, remove_difficult=False, resume='../trained_weights/yolos/yolos_base.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=8)
Has mid pe
number of params: 127798368
loading annotations into memory...
Done (t=23.52s)
creating index...
index created!
800
loading annotations into memory...
Done (t=3.00s)
creating index...
index created!
Test:  [  0/313]  eta: 0:39:39  class_error: 29.21  loss: 2.1542 (2.1542)  loss_bbox: 0.4245 (0.4245)  loss_ce: 0.7761 (0.7761)  loss_giou: 0.9535 (0.9535)  cardinality_error_unscaled: 5.3750 (5.3750)  class_error_unscaled: 29.2100 (29.2100)  loss_bbox_unscaled: 0.0849 (0.0849)  loss_ce_unscaled: 0.7761 (0.7761)  loss_giou_unscaled: 0.4768 (0.4768)  time: 7.6030  data: 0.5298  max mem: 3963
Test:  [256/313]  eta: 0:00:26  class_error: 17.22  loss: 2.5668 (2.6435)  loss_bbox: 0.5639 (0.5792)  loss_ce: 0.8598 (0.8386)  loss_giou: 1.1904 (1.2257)  cardinality_error_unscaled: 3.8750 (4.2398)  class_error_unscaled: 28.7817 (28.6160)  loss_bbox_unscaled: 0.1128 (0.1158)  loss_ce_unscaled: 0.8598 (0.8386)  loss_giou_unscaled: 0.5952 (0.6129)  time: 0.4406  data: 0.0137  max mem: 10417
Test:  [312/313]  eta: 0:00:00  class_error: 16.29  loss: 2.8745 (2.6626)  loss_bbox: 0.5974 (0.5833)  loss_ce: 0.8791 (0.8461)  loss_giou: 1.3012 (1.2332)  cardinality_error_unscaled: 3.8750 (4.2370)  class_error_unscaled: 26.2946 (28.7748)  loss_bbox_unscaled: 0.1195 (0.1167)  loss_ce_unscaled: 0.8791 (0.8461)  loss_giou_unscaled: 0.6506 (0.6166)  time: 0.4251  data: 0.0134  max mem: 10417
Test: Total time: 0:02:25 (0.4663 s / it)
Averaged stats: class_error: 16.29  loss: 2.8745 (2.6626)  loss_bbox: 0.5974 (0.5833)  loss_ce: 0.8791 (0.8461)  loss_giou: 1.3012 (1.2332)  cardinality_error_unscaled: 3.8750 (4.2370)  class_error_unscaled: 26.2946 (28.7748)  loss_bbox_unscaled: 0.1195 (0.1167)  loss_ce_unscaled: 0.8791 (0.8461)  loss_giou_unscaled: 0.6506 (0.6166)
Accumulating evaluation results...
DONE (t=15.78s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.13810
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.26766
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.11832
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05146
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.13066
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.23324
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.18115
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.29001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.31740
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.12520
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.31154
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.49446

Where are the pre-trained models?

In your paper, you say that the pre-trained models are in this repo, but I can't find them, and I haven't found them anywhere else either. I don't have time to train them myself.
If you can give them to me privately, you can send them to [email protected].
Thanks!

Input size cannot be dynamic?

I tried something like this:

 python demo.py --resume weights/yolos_s_dWr.pth --data_file ../yolov7/images/COCO_val2014_000000001856.jpg --mid_pe_size 800 864 --init_pe_size 800 864
Not using distributed mode
Namespace(backbone_name='small_dWr', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path=None, data_file='../yolo/images/COCO_val2014_000000001856.jpg', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_url='env://', distributed=False, eos_coef=0.1, epochs=150, eval=False, eval_size=800, giou_loss_coef=2, init_pe_size=[800, 864], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 864], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', remove_difficult=False, resume='weights/yolos_s_dWr.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=1)

Got:

torch1.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Detector:
	size mismatch for backbone.pos_embed: copying a param with shape torch.Size([1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([1, 2801, 330]).
	size mismatch for backbone.mid_pos_embed: copying a param with shape torch.Size([13, 1, 1829, 330]) from checkpoint, the shape in current model is torch.Size([13, 1, 2801, 330]).
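The numbers in this error are consistent with a position embedding that holds one [CLS] token, one token per 16×16 patch at init_pe_size, and the 100 detection tokens (det_token_num=100 in the namespace above). Under that assumption, 2801 = 1 + 50·54 + 100 matches the 800×864 requested on the command line, while 1829 = 1 + 32·54 + 100 matches 512×864, which suggests the checkpoint was created with init_pe_size/mid_pe_size of 512 864. A sketch of the arithmetic; `pe_tokens` and `find_pe_size` are hypothetical helpers, and the latter assumes the width (864) is unchanged.

```python
PATCH = 16        # assumed patch size
DET_TOKENS = 100  # det_token_num from the Namespace above

def pe_tokens(h, w, patch=PATCH, det_tokens=DET_TOKENS):
    """[CLS] + (h/patch) * (w/patch) patch tokens + detection tokens."""
    return 1 + (h // patch) * (w // patch) + det_tokens

def find_pe_size(n_tokens, width, patch=PATCH, det_tokens=DET_TOKENS):
    """Given a token count and a known width, recover the matching height (or None)."""
    grid = n_tokens - 1 - det_tokens
    cols = width // patch
    return grid // cols * patch if grid % cols == 0 else None

print(pe_tokens(800, 864))      # 2801, the shape the current model expects
print(find_pe_size(1829, 864))  # 512, the height the checkpoint seems to use
```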

Training problem with VOC

❔Question

I converted the PASCAL VOC dataset to COCO format, but when I trained yolos-tiny for 150 epochs with the pre-trained weights, the results were very poor. I have no idea why.
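One place a VOC-to-COCO conversion can go silently wrong is the box format. A sketch of converting the objects in one VOC XML file to COCO-style annotations with the standard library; it assumes the usual VOC fields (`name`, `difficult`, `bndbox`), the standard 20-class list, and 1-based inclusive pixel corners, and `voc_objects_to_coco` is a hypothetical helper. Whether to skip difficult objects is a choice to make consistently between training and evaluation.

```python
import xml.etree.ElementTree as ET

VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
               "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
               "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
CAT_ID = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}  # 1-based category ids

def voc_objects_to_coco(xml_text, image_id, first_ann_id=1, skip_difficult=True):
    """Parse one VOC annotation file and return COCO-style annotation dicts."""
    root = ET.fromstring(xml_text)
    anns = []
    for obj in root.iter("object"):
        if skip_difficult and obj.findtext("difficult", "0").strip() == "1":
            continue
        bb = obj.find("bndbox")
        # VOC corners are treated as 1-based and inclusive; convert to 0-based [x, y, w, h].
        x0 = float(bb.findtext("xmin")) - 1
        y0 = float(bb.findtext("ymin")) - 1
        x1 = float(bb.findtext("xmax"))
        y1 = float(bb.findtext("ymax"))
        w, h = x1 - x0, y1 - y0
        anns.append({"id": first_ann_id + len(anns), "image_id": image_id,
                     "category_id": CAT_ID[obj.findtext("name").strip()],
                     "bbox": [x0, y0, w, h], "area": w * h, "iscrowd": 0})
    return anns

sample = """<annotation><object><name>dog</name><difficult>0</difficult>
<bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
</object></annotation>"""
print(voc_objects_to_coco(sample, image_id=1)[0]["bbox"])  # [47.0, 239.0, 148.0, 132.0]
```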

Visualization demo?

How do I run inference on a single image and visualize the result?
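Much of single-image visualization is converting the model's outputs back to pixel boxes. This sketch assumes YOLOS follows DETR's conventions (normalized (cx, cy, w, h) boxes in [0, 1], and a last "no object" class in the per-token probabilities), which is worth confirming against the repo's post-processing; both helpers are hypothetical. The resulting boxes can then be drawn with any plotting library.

```python
def cxcywh_to_xyxy_pixels(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)

def top_detections(probs, boxes, img_w, img_h, threshold=0.7):
    """probs: per-token class probabilities whose last entry is assumed to be
    'no object'; boxes: per-token normalized boxes."""
    results = []
    for p, b in zip(probs, boxes):
        scores = p[:-1]  # drop the assumed no-object entry
        label = max(range(len(scores)), key=scores.__getitem__)
        if scores[label] >= threshold:
            results.append((label, scores[label], cxcywh_to_xyxy_pixels(b, img_w, img_h)))
    return results

print(cxcywh_to_xyxy_pixels((0.5, 0.5, 0.2, 0.4), 640, 480))
```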

About learning rate scheduler

❔Question

From your main.py, I only see the learning rate being updated after every epoch (YOLOS/main.py, line 217 at commit 2e10dc4):

lr_scheduler.step(epoch)

Your logs also seem to confirm that.
Did you use a warm-up learning rate scheduler for the first few iterations?

ONNX Export

❔Question

Can we export YOLOS models to ONNX format?

Additional context

I want to deploy the YOLOS model with onnxruntime to save deployment costs, running it via Docker on NVIDIA Jetson series devices.

Adding YOLOS to HuggingFace Transformers

Hi YOLOS team :)

I've implemented YOLOS as a fork of 🤗 HuggingFace Transformers, and I'm going to add it soon to the library (see huggingface/transformers#16848). Here's a notebook that illustrates inference with it: https://colab.research.google.com/drive/18ti9HrRoVE6d0vGBtnaeq93Tau3EYqOK?usp=sharing

The reason I'm adding YOLOS is that I really like its simplicity compared to very complex frameworks such as Faster R-CNN and Mask R-CNN. I previously added DETR for the same reason: it simplifies the task of object detection a lot.

As you may or may not know, every model on the HuggingFace hub has its own Git repository. E.g. the YOLOS-small checkpoint can be found here: https://huggingface.co/nielsr/yolos-s. If you check the "files and versions" tab, you'll see it includes the weights. The model hub uses Git LFS (Large File Storage) so that Git can handle large files such as model weights. This means that every model has its own Git commit history!

A model card can also be added to the repo, which is just a README.

Are you interested in creating an organization on the hub, such that we can store all model checkpoints there (rather than under my user name)?

Let me know!

Kind regards,

Niels
ML Engineer @ HuggingFace

The definitions of sigmoid_focal_loss and dice_loss are not found in model/detector.py

Hi,

While reading your code, I got "Unresolved reference 'sigmoid_focal_loss'" and "Unresolved reference 'dice_loss'" in the 'model/detector.py' file, and I could not find any code related to these two functions. Did you forget to upload their definitions?

Looking forward to your reply.
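In DETR, which this code base builds on, these two functions live in models/segmentation.py and are only used by the mask loss, which box-only detection never calls; that may be why the references are unresolved without breaking training. For reference, a plain-Python sketch of the per-mask formulas as DETR writes them (sigmoid focal loss with alpha = 0.25 and gamma = 2, and a +1-smoothed dice loss), operating on flat lists of logits and 0/1 targets instead of tensors and without DETR's normalization by the number of boxes.

```python
import math

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dice_loss(logits, targets):
    """1 - (2 * sum(p * t) + 1) / (sum(p) + sum(t) + 1) for one flattened mask."""
    p = [_sigmoid(x) for x in logits]
    inter = sum(pi * ti for pi, ti in zip(p, targets))
    return 1 - (2 * inter + 1) / (sum(p) + sum(targets) + 1)

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Mean focal loss over the pixels of one flattened mask."""
    total = 0.0
    for x, t in zip(logits, targets):
        p = _sigmoid(x)
        # Stable binary cross-entropy with logits.
        ce = max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
        p_t = p * t + (1 - p) * (1 - t)
        alpha_t = alpha * t + (1 - alpha) * (1 - t)
        total += alpha_t * ce * (1 - p_t) ** gamma
    return total / len(logits)

print(round(dice_loss([0.0, 0.0], [1, 0]), 4))  # 0.3333
```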

Control the patches

❔Question

Hello, thank you for this great contribution. With your architecture, can we control which patches are fed to YOLOS? I already have the RoI (region of interest) for each image in my dataset, and I want to train the model only on these regions. Can this be done by controlling the patches?

Thanks.
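One way to think about "controlling the patches" is to work out which patch-grid cells overlap each RoI. A minimal sketch, assuming 16×16 non-overlapping patches in row-major order as in a standard ViT and a half-open RoI (x0, y0, x1, y1) in pixels; `patches_in_roi` is hypothetical. Whether YOLOS can then be fed only those tokens, given that its position embeddings are defined on a full grid, is left open here; simply cropping images to the RoI may be the easier route.

```python
def patches_in_roi(roi, img_w, img_h, patch=16):
    """Row-major indices of the patches that overlap roi = (x0, y0, x1, y1) in pixels."""
    cols = img_w // patch
    rows = img_h // patch
    x0, y0, x1, y1 = roi
    c0, c1 = max(0, int(x0 // patch)), min(cols - 1, int((x1 - 1) // patch))
    r0, r1 = max(0, int(y0 // patch)), min(rows - 1, int((y1 - 1) // patch))
    return [r * cols + c for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

print(patches_in_roi((16, 16, 48, 32), img_w=64, img_h=64))  # [5, 6]
```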
