
Deformable DETR

By Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

This repository is an official implementation of the paper Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Introduction

TL; DR. Deformable DETR is an efficient and fast-converging end-to-end object detector. It mitigates the high complexity and slow convergence issues of DETR via a novel sampling-based efficient attention mechanism.

[figure: deformable_detr]

Abstract. DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10× fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach.
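For reference, the core (single-scale) deformable attention module from the paper can be written as

$$\mathrm{DeformAttn}(\boldsymbol{z}_q, \boldsymbol{p}_q, \boldsymbol{x}) = \sum_{m=1}^{M} \boldsymbol{W}_m \Big[ \sum_{k=1}^{K} A_{mqk} \cdot \boldsymbol{W}'_m \, \boldsymbol{x}(\boldsymbol{p}_q + \Delta \boldsymbol{p}_{mqk}) \Big],$$

where $\boldsymbol{z}_q$ is the query feature, $\boldsymbol{p}_q$ its 2-d reference point, $m$ indexes the $M$ attention heads, $k$ indexes the $K$ sampled keys, and both the sampling offsets $\Delta \boldsymbol{p}_{mqk}$ and the attention weights $A_{mqk}$ (normalized so that $\sum_k A_{mqk} = 1$) are predicted from $\boldsymbol{z}_q$ by linear projections. Since $K \ll HW$, the cost no longer scales with the full spatial size of the feature map.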

License

This project is released under the Apache 2.0 license.

Changelog

See changelog.md for detailed logs of major changes.

Citing Deformable DETR

If you find Deformable DETR useful in your research, please consider citing:

@article{zhu2020deformable,
  title={Deformable DETR: Deformable Transformers for End-to-End Object Detection},
  author={Zhu, Xizhou and Su, Weijie and Lu, Lewei and Li, Bin and Wang, Xiaogang and Dai, Jifeng},
  journal={arXiv preprint arXiv:2010.04159},
  year={2020}
}

Main Results

| Method | Epochs | AP | AP_S | AP_M | AP_L | params (M) | FLOPs (G) | Total Train Time (GPU hours) | Train Speed (GPU hours/epoch) | Infer Speed (FPS) | Batch Infer Speed (FPS) | URL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN + FPN | 109 | 42.0 | 26.6 | 45.4 | 53.4 | 42 | 180 | 380 | 3.5 | 25.6 | 28.0 | - |
| DETR | 500 | 42.0 | 20.5 | 45.8 | 61.1 | 41 | 86 | 2000 | 4.0 | 27.0 | 38.3 | - |
| DETR-DC5 | 500 | 43.3 | 22.5 | 47.3 | 61.1 | 41 | 187 | 7000 | 14.0 | 11.4 | 12.4 | - |
| DETR-DC5 | 50 | 35.3 | 15.2 | 37.5 | 53.6 | 41 | 187 | 700 | 14.0 | 11.4 | 12.4 | - |
| DETR-DC5+ | 50 | 36.2 | 16.3 | 39.2 | 53.9 | 41 | 187 | 700 | 14.0 | 11.4 | 12.4 | - |
| Deformable DETR (single scale) | 50 | 39.4 | 20.6 | 43.0 | 55.5 | 34 | 78 | 160 | 3.2 | 27.0 | 42.4 | config / log / model |
| Deformable DETR (single scale, DC5) | 50 | 41.5 | 24.1 | 45.3 | 56.0 | 34 | 128 | 215 | 4.3 | 22.1 | 29.4 | config / log / model |
| Deformable DETR | 50 | 44.5 | 27.1 | 47.6 | 59.6 | 40 | 173 | 325 | 6.5 | 15.0 | 19.4 | config / log / model |
| + iterative bounding box refinement | 50 | 46.2 | 28.3 | 49.2 | 61.5 | 41 | 173 | 325 | 6.5 | 15.0 | 19.4 | config / log / model |
| ++ two-stage Deformable DETR | 50 | 46.9 | 29.6 | 50.1 | 61.6 | 41 | 173 | 340 | 6.8 | 14.5 | 18.8 | config / log / model |

Note:

  1. All Deformable DETR models are trained with a total batch size of 32.
  2. Training and inference speed are measured on an NVIDIA Tesla V100 GPU.
  3. "Deformable DETR (single scale)" means using only the res5 feature map (of stride 32) as the input feature map for the Deformable Transformer encoder.
  4. "DC5" means removing the stride in the C5 stage of ResNet and adding a dilation of 2 instead.
  5. "DETR-DC5+" indicates DETR-DC5 with some modifications, including using Focal Loss for bounding box classification and increasing the number of object queries to 300.
  6. "Batch Infer Speed" refers to inference with batch size 4 to maximize GPU utilization.
  7. The original implementation is based on our internal codebase. There are slight differences in final accuracy and running time due to the many details involved in switching platforms.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend using Anaconda to create a conda environment:

    conda create -n deformable_detr python=3.7 pip

    Then, activate the environment:

    conda activate deformable_detr
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here)

    For example, if your CUDA version is 9.2, you could install pytorch and torchvision as follows:

    conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=9.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download the COCO 2017 dataset and organize it as follows:

code_root/
└── data/
    └── coco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Training

Training on single node

For example, the command for training Deformable DETR on 8 GPUs is as follows:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh

Training on multiple nodes

For example, the command for training Deformable DETR on 2 nodes with 8 GPUs each is as follows:

On node 1:

MASTER_ADDR=<IP address of node 1> NODE_RANK=0 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/r50_deformable_detr.sh

On node 2:

MASTER_ADDR=<IP address of node 1> NODE_RANK=1 GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 16 ./configs/r50_deformable_detr.sh

Training on slurm cluster

If you are using a Slurm cluster, you can simply run the following command to train on 1 node with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deformable_detr 8 configs/r50_deformable_detr.sh

Or on 2 nodes with 8 GPUs each:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deformable_detr 16 configs/r50_deformable_detr.sh

Some tips to speed-up training

  • If your file system is slow to read images, consider enabling the --cache_mode option to load the whole dataset into memory at the beginning of training.
  • You may increase the batch size to maximize GPU utilization, according to your GPU memory, e.g., set --batch_size 3 or --batch_size 4 (see the combined example below).
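For example, combining both tips (the config scripts forward extra arguments to main.py, as in the evaluation commands shown further below):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh --batch_size 4 --cache_mode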

Evaluation

You can get the config file and pretrained model of Deformable DETR (the links are in the "Main Results" section), then run the following command to evaluate it on the COCO 2017 validation set:

<path to config file> --resume <path to pre-trained model> --eval

You can also run distributed evaluation by using ./tools/run_dist_launch.sh or ./tools/run_dist_slurm.sh.


deformable-detr's Issues

Deformable DETR not fully end-to-end because of missing no_object class?

Hello,

I have another question. First, I want to know if I understood the method for suppressing multiple detections and the used label loss correctly. In contrast to the original DETR, Deformable DETR doesn't output a no_object class. That's why, in training, the target for all unmatched detections is not a high score for the no_object class but a low score for every class, and that's why you use focal loss with binary cross-entropy for the label loss.

My question now is whether Deformable DETR is still fully end-to-end, since I have to set a score threshold to filter out no_object predictions. The original DETR already outputs the correct classes without post-processing.

What does samples.mask do?

def forward(self, samples: NestedTensor):
    """ The forward expects a NestedTensor, which consists of:
           - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
           - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels

        It returns a dict with the following elements:
           - "pred_logits": the classification logits (including no-object) for all queries.
                            Shape = [batch_size x num_queries x (num_classes + 1)]
           - "pred_boxes": the normalized box coordinates for all queries, represented as
                           (center_x, center_y, height, width). These values are normalized in [0, 1],
                           relative to the size of each individual image (disregarding possible padding).
                           See PostProcess for information on how to retrieve the unnormalized bounding box.
           - "aux_outputs": optional, only returned when auxiliary losses are activated. It is a list of
                            dictionaries containing the two above keys for each decoder layer.
    """

As described in models/deformable_detr.py, what does samples.mask do here?
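Not an authoritative answer, but a minimal sketch of how such a mask is typically built when batching images of different sizes (the function below is illustrative, not the repo's own util): the batch is zero-padded to the largest H and W, and the mask records which pixels are padding so downstream attention can ignore them.

import torch

def batch_images(images):
    """Pad a list of [3, H, W] tensors to a common size and build the padding
    mask: True (1) marks padded pixels, False marks real image content."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = torch.zeros(len(images), 3, max_h, max_w)
    mask = torch.ones(len(images), max_h, max_w, dtype=torch.bool)
    for i, img in enumerate(images):
        _, h, w = img.shape
        batch[i, :, :h, :w] = img   # copy the real image into the top-left corner
        mask[i, :h, :w] = False     # these pixels are valid, not padding
    return batch, mask

imgs = [torch.randn(3, 480, 640), torch.randn(3, 600, 800)]
tensors, mask = batch_images(imgs)
print(tensors.shape, mask.shape)  # torch.Size([2, 3, 600, 800]) torch.Size([2, 600, 800])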

Initialization for bias parameters of the attention weights seems to be wrong

Referring to the paper's description:
'Bias parameters of the linear projection are initialized to make A_mqk = 1/(LK).'

But in the code implementation:

constant_(self.attention_weights.bias.data, 0.)

we can see it was actually initialized to 0!

Since the weight parameters are also initialized to 0, this would make the final output become all zero.
Am I wrong? So confused~
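For what it's worth, a zero bias seems consistent with the paper's statement: the attention weights are produced by a softmax over the L·K logits, and a softmax over all-zero logits is exactly the uniform distribution 1/(L·K). A quick standalone check in plain PyTorch:

import torch

L, K = 4, 4                   # feature levels and sampling points per level
logits = torch.zeros(L * K)   # what a zero-weight, zero-bias linear layer outputs
weights = logits.softmax(-1)  # softmax of equal logits is uniform
assert torch.allclose(weights, torch.full((L * K,), 1.0 / (L * K)))
print(weights[0])             # tensor(0.0625) == 1 / 16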

Why is sampling_offsets in `MSDeformAttn` normalized by n_points?

As shown here,

sampling_locations = reference_points[:, :, None, :, None, :2] \
                                 + sampling_offsets / self.n_points * reference_points[:, :, None, :, None, 2:] * 0.5

I find it strange to normalize the sampling_offsets by n_points.

In my opinion, the predicted vector sampling_offsets indicates the sampling location relative to the reference point, which is unrelated to the total number of sampling points, so normalizing the offsets by the number of sampling points doesn't seem to make sense.

Could anyone give some explanation? Thanks!
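Written out, the branch above computes, for a reference box $(x_q, y_q, w_q, h_q)$,

$$\boldsymbol{p}_{mqk} = (x_q, y_q) + \frac{\Delta \boldsymbol{p}_{mqk}}{K} \cdot \frac{(w_q, h_q)}{2},$$

so offsets are expressed relative to half the box size and scaled down by the number of points $K$. One plausible reading (an interpretation, not an official answer) is that this pairs with the sampling-offset bias initialization, whose magnitude grows with the point index up to $K$, so that at initialization the $K$ points are spread across the reference box rather than collapsing onto the reference point.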

FLOP count

Hi,

Could you please share how you calculated the FLOPs of your MultiscaleDeformableAttention CUDA module?

Some question about the batch size

Hi, I have some questions about the batch size.

First, can we run the code on 2080Ti cards with 2 images per card?
I ran the command "python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py" following DETR, but it raised out of memory.
I am not sure whether the command "GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh" runs smoothly, because I have met some problems with it.
When I run "GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh", it raises
"bash: ./tools/run_dist_launch.sh: Permission denied".
And when I try "GPUS_PER_NODE=8 sh ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh", it raises
"+ GPUS=8
./tools/run_dist_launch.sh: 11: ./tools/run_dist_launch.sh: Bad substitution"

Second, will the learning rate be auto-scaled based on the batch size?
Is the default setting a batch size of 32 with a base learning rate of 2e-4?
So if I use 8 GPUs with 1 image per card, should I change the base learning rate to 5e-5?
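On the second question, a common convention (an assumption here, not something the repo is documented to do automatically) is the linear scaling rule: scale the learning rate with the total batch size. As a quick sanity check of the arithmetic:

# linear scaling rule sketch; values are illustrative
base_lr = 2e-4          # default lr, tuned for a total batch size of 32
base_batch_size = 32
my_batch_size = 8       # e.g. 8 GPUs x 1 image per GPU
lr = base_lr * my_batch_size / base_batch_size
print(lr)               # 5e-05, matching the value guessed above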

lower AP for Deformable DETR

I just followed the instructions to re-run the code for Deformable DETR, but got a lower AP.

When using a batch size 16x2, I get an AP of 44.1;

When using a batch size 32x1, I get an AP of 44.3;

When using a batch size 16x4, I get an AP of 43.5;

Also, for Deformable DETR with Iterative Refinement, I get an AP of 45.7 with batch size 16x2 (vs the released AP of 46.2).

Is this in the normal fluctuation range?

Moreover, it seems that the batch size influences the final performance when the learning rate is not adjusted.

Thanks for your response in advance.

Mismatch in loading model

Hi,
@jackroos I am trying to run the code using the provided model weights. I used r50_deformable_detr_plus_iterative_bbox_refinement_plus_plus_two_stage-checkpoint.pth and the respective config.
When loading the model, it reports a shape mismatch in the transformer.
Is the checkpoint correct?

Visualize model predictions (get scores and boxes)

Hello,

thank you for your great work!

I want to test the performance of my network on some test images. For this I visualize the predicted boxes and scores on the images. I got everything working, since I could reuse my code from the original DETR, but I was wondering how to get the correct scores and labels.

For DETR I did:

# keep only predictions with 0.7+ confidence
probas = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.7

# convert boxes from [0; 1] to image scales
bboxes_scaled = rescale_bboxes(outputs['pred_boxes'][0, keep], im.size)

scores, boxes = probas[keep], bboxes_scaled

Since you use the sigmoid function for Deformable DETR, I replaced these lines with the following (heavily inspired by the PostProcess class from deformable_detr.py 😄):

 prob = out_logits.sigmoid()
 topk_values, topk_indexes = torch.topk(prob.view(out_logits.shape[0], -1), 100, dim=1)
 scores = topk_values
 topk_boxes = topk_indexes // out_logits.shape[2]
 labels = topk_indexes % out_logits.shape[2]
 boxes = box_ops.box_cxcywh_to_xyxy(out_bbox)
 boxes = torch.gather(boxes, 1, topk_boxes.unsqueeze(-1).repeat(1, 1, 4))

 # and from relative [0, 1] to absolute [0, height] coordinates
 img_w, img_h = im.size  # PIL's Image.size is (width, height)
 img_w = torch.tensor(img_w, device=boxes.device)
 img_h = torch.tensor(img_h, device=boxes.device)
 scale_fct = torch.stack([img_w, img_h, img_w, img_h], 0).unsqueeze(0)
 boxes = boxes * scale_fct[:, None, :]

With this I get a lot of false positives. The scores are pretty low compared to the softmax scores, so which threshold would you recommend to get rid of the false positives?

Something is wrong: AP = 0

After training with the unmodified code for 6 epochs, following the README, the AP is close to 0.
Does anyone know what's wrong with it, and how to solve it?
Any help will be appreciated!

Concatenation or summation?

Hi, @jackroos

I am a beginner in attention.
I wonder why you express multi-head attention in the paper as a sum over heads (the equation is rendered below) rather than using concatenation and an output projection Wo. Is there any particular purpose to this?

Same question for the Deformable Attention and MS Deformable Attention.

Thanks in advance.
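Not an official answer, but the two forms are equivalent. Writing multi-head attention as a sum of per-head terms,

$$\mathrm{MultiHeadAttn}(\boldsymbol{z}_q, \boldsymbol{x}) = \sum_{m=1}^{M} \boldsymbol{W}_m \Big[ \sum_{k} A_{mqk} \cdot \boldsymbol{W}'_m \boldsymbol{x}_k \Big],$$

is the same computation as concatenating the $M$ head outputs and applying one output projection $\boldsymbol{W}_o$: splitting $\boldsymbol{W}_o$ into $M$ column blocks gives $\boldsymbol{W}_o \, \mathrm{concat}(\boldsymbol{h}_1, \ldots, \boldsymbol{h}_M) = \sum_{m} \boldsymbol{W}_o^{(m)} \boldsymbol{h}_m$, so each $\boldsymbol{W}_m$ plays the role of the $m$-th column block of $\boldsymbol{W}_o$.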

Attempt to Reproduce the Results

Recently I attempted to train the Deformable DETR model on an 8-GPU machine (8x TITAN RTX) using the command:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/r50_deformable_detr.sh

The results I got are shown below:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.625
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.472
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.256
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.467
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.575
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.578
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.620
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.394
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.665
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.808
Training time 2 days, 6:18:48

which is around 1 AP lower than the reported result:

| Method | Epochs | AP | AP_S | AP_M | AP_L |
|---|---|---|---|---|---|
| Deformable DETR | 50 | 44.5 | 27.1 | 47.6 | 59.6 |

I wonder if the results I obtained are reasonable? As shown in #1, it seems the pre-trained checkpoints were trained with batch size 32 (i.e., 2 nodes, 8 GPUs per node, 2 images per GPU). Is the performance gap due to the batch size (or the environment settings)? Thanks for your great help!

DETR-DC5+ 500 epoch

Hi! Great method! Do you have the numbers for a DETR-DC5+ 500-epoch experiment? I can't find it in the paper, but I think it is important for comparisons (and I don't have the resources to produce that result myself).

Training/ finetuning Deformable-DETR on custom dataset?

Thanks for sharing the Deformable DETR code.
Can you clarify your recommendations for training on a custom dataset?
Should we train a model from scratch, or is it better to fine-tune a full COCO-pretrained model and adjust the linear layer to the desired class count?
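Not official guidance, but a common recipe is to fine-tune from a COCO-pretrained checkpoint and drop the classification-head weights so they are re-initialized for the custom class count. A minimal sketch (the "class_embed" key prefix is an assumption based on DETR-style checkpoints; verify against your state dict):

import torch

ckpt = torch.load("r50_deformable_detr-checkpoint.pth", map_location="cpu")
state_dict = ckpt["model"]

# drop the classification head so it is re-initialized for the custom
# number of classes; check your checkpoint's keys before relying on this
state_dict = {k: v for k, v in state_dict.items() if "class_embed" not in k}

# model = build_model(args)  # built with your custom num_classes
# missing, unexpected = model.load_state_dict(state_dict, strict=False)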

Slurm training command not working

Hi,

I am using a Slurm system, and when I run the training command as mentioned in the README file, I get an error.

The file cannot be found, but it is present.

Logs :

bash-4.2$ GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh ml deformable_detr 8 ./configs/r50_deformable_detr.sh
: No such file or directory

Please help.

Questions about several implementation details

Thanks for this amazing work and the published code. I am wondering about the reasons for some implementation details and whether they make a big difference to the final performance.

  1. In https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/deformable_detr.py#L100:

nn.init.constant_(self.bbox_embed[0].layers[-1].bias.data[2:], -2.0)

Why is the box regression bias initialized to -2.0?

  2. In https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/ops/modules/ms_deform_attn.py#L105 and https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/ops/modules/ms_deform_attn.py#L108:
if reference_points.shape[-1] == 2:
     offset_normalizer = torch.stack([input_spatial_shapes[..., 1], input_spatial_shapes[..., 0]], -1)
     sampling_locations = reference_points[:, :, None, :, None, :] \
                                 + sampling_offsets / offset_normalizer[None, None, None, :, None, :]
elif reference_points.shape[-1] == 4:
     sampling_locations = reference_points[:, :, None, :, None, :2] \
                                 + sampling_offsets / self.n_points * reference_points[:, :, None, :, None, 2:] * 0.5

It seems the scaling of sampling points is implemented differently for (x, y) reference points and (x, y, w, h) reference boxes. Is there a specific reason for this?

Need clarification on the implementation of attention weights.

In the code implementation, it seems that only the query is used to compute the attention weights, via a linear projection (line 99 in ms_deform_attn.py). This is different from the attention in the Transformer, which computes the inner product between query and key. But in your VALSE lecture, you mentioned that when K = HW, this deformable attention becomes Transformer attention. I am confused about how deformable attention becomes Transformer attention when you use a different way to compute the attention weights (without referring to the key). Could you please clarify this?

Thank you very much!
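For context, the two ways of producing attention weights can be written side by side: standard Transformer attention computes $A_{qk} = \mathrm{softmax}_k\big(\boldsymbol{q}^\top \boldsymbol{k} / \sqrt{d}\big)$ from a query-key dot product, while deformable attention predicts $A_{mqk} = \mathrm{softmax}_k\big(\mathrm{Linear}(\boldsymbol{z}_q)\big)$ directly from the query, as the paper states. So even with K = HW the weight computation differs; any claimed equivalence would concern the form of the aggregation rather than how $A$ is computed (a reading of the question, not an authoritative answer).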

valid_ratio bug here

reference_points = reference_points[:, :, None] * valid_ratios[:, None]

The valid ratio seems to have no effect. I consider it a bug. Please check it and fix it if possible.

# use valid ratio
(Pdb) reference_points[0, :118, 0, 0]
tensor([0.0043, 0.0128, 0.0214, 0.0299, 0.0385, 0.0470, 0.0556, 0.0641, 0.0726,
        0.0812, 0.0897, 0.0983, 0.1068, 0.1154, 0.1239, 0.1325, 0.1410, 0.1496,
        0.1581, 0.1667, 0.1752, 0.1838, 0.1923, 0.2009, 0.2094, 0.2179, 0.2265,
        0.2350, 0.2436, 0.2521, 0.2607, 0.2692, 0.2778, 0.2863, 0.2949, 0.3034,
        0.3120, 0.3205, 0.3291, 0.3376, 0.3462, 0.3547, 0.3632, 0.3718, 0.3803,
        0.3889, 0.3974, 0.4060, 0.4145, 0.4231, 0.4316, 0.4402, 0.4487, 0.4573,
        0.4658, 0.4744, 0.4829, 0.4915, 0.5000, 0.5085, 0.5171, 0.5256, 0.5342,
        0.5427, 0.5513, 0.5598, 0.5684, 0.5769, 0.5855, 0.5940, 0.6026, 0.6111,
        0.6197, 0.6282, 0.6368, 0.6453, 0.6538, 0.6624, 0.6709, 0.6795, 0.6880,
        0.6966, 0.7051, 0.7137, 0.7222, 0.7308, 0.7393, 0.7479, 0.7564, 0.7650,
        0.7735, 0.7821, 0.7906, 0.7991, 0.8077, 0.8162, 0.8248, 0.8333, 0.8419,
        0.8504, 0.8590, 0.8675, 0.8761, 0.8846, 0.8932, 0.9017, 0.9103, 0.9188,
        0.9274, 0.9359, 0.9444, 0.9530, 0.9615, 0.9701, 0.9786, 0.9872, 0.9957,
        0.0043], device='cuda:0')
# not use valid_ratio
(Pdb) myreference_points[0, :118, 0, 0]
tensor([0.0043, 0.0128, 0.0214, 0.0299, 0.0385, 0.0470, 0.0556, 0.0641, 0.0726,
        0.0812, 0.0897, 0.0983, 0.1068, 0.1154, 0.1239, 0.1325, 0.1410, 0.1496,
        0.1581, 0.1667, 0.1752, 0.1838, 0.1923, 0.2009, 0.2094, 0.2179, 0.2265,
        0.2350, 0.2436, 0.2521, 0.2607, 0.2692, 0.2778, 0.2863, 0.2949, 0.3034,
        0.3120, 0.3205, 0.3291, 0.3376, 0.3462, 0.3547, 0.3632, 0.3718, 0.3803,
        0.3889, 0.3974, 0.4060, 0.4145, 0.4231, 0.4316, 0.4402, 0.4487, 0.4573,
        0.4658, 0.4744, 0.4829, 0.4915, 0.5000, 0.5085, 0.5171, 0.5256, 0.5342,
        0.5427, 0.5513, 0.5598, 0.5684, 0.5769, 0.5855, 0.5940, 0.6026, 0.6111,
        0.6197, 0.6282, 0.6368, 0.6453, 0.6538, 0.6624, 0.6709, 0.6795, 0.6880,
        0.6966, 0.7051, 0.7137, 0.7222, 0.7308, 0.7393, 0.7479, 0.7564, 0.7650,
        0.7735, 0.7821, 0.7906, 0.7991, 0.8077, 0.8162, 0.8248, 0.8333, 0.8419,
        0.8504, 0.8590, 0.8675, 0.8761, 0.8846, 0.8932, 0.9017, 0.9103, 0.9188,
        0.9274, 0.9359, 0.9444, 0.9530, 0.9615, 0.9701, 0.9786, 0.9872, 0.9957,
        0.0043], device='cuda:0')
# code
def get_reference_points(spatial_shapes, valid_ratios, device):
        """Core implementation of Deformable Transformer.
        """
        reference_points_list = []
        myreference_points_list = []
        for lvl, (H_, W_) in enumerate(spatial_shapes):
            ref_y, ref_x = torch.meshgrid(torch.linspace(0.5, H_ - 0.5, H_, dtype=torch.float32, device=device),
                                          torch.linspace(0.5, W_ - 0.5, W_, dtype=torch.float32, device=device))

            # repo path: additionally divide by the per-level valid ratio
            _ref_y = ref_y.reshape(-1)[None] / (valid_ratios[:, None, lvl, 1] * H_)
            _ref_x = ref_x.reshape(-1)[None] / (valid_ratios[:, None, lvl, 0] * W_)
            ref = torch.stack((_ref_x, _ref_y), -1)
            reference_points_list.append(ref)

            # my path: divide by the full feature map size only
            ref_y = ref_y.reshape(-1)[None] / H_
            ref_x = ref_x.reshape(-1)[None] / W_
            ref = torch.stack((ref_x, ref_y), -1)
            myreference_points_list.append(ref)

        reference_points = torch.cat(reference_points_list, 1)
        myreference_points = torch.cat(myreference_points_list, 1)

        reference_points = reference_points[:, :, None] * valid_ratios[:, None]
        myreference_points = myreference_points[:, :, None]
        diff = myreference_points - reference_points
        import pdb
        pdb.set_trace()
        return reference_points

RuntimeError: Error compiling objects for extension

Hi, could someone help me?

When I compile the CUDA operators, this error happens:

(deformable_detr) zyi@ZHU:~/Documents/Project/Deformable-DETR/models/ops$ sh ./make.sh
running build
running build_py
running build_ext
building 'MultiScaleDeformableAttention' extension
Emitting ninja build file /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda-10.2/bin/nvcc -DWITH_CUDA -I/home/zyi/Documents/Project/Deformable-DETR/models/ops/src -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/TH -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/zyi/anaconda3/envs/deformable_detr/include/python3.7m -c -c /home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.cu -o /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
FAILED: /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o
/usr/local/cuda-10.2/bin/nvcc -DWITH_CUDA -I/home/zyi/Documents/Project/Deformable-DETR/models/ops/src -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/TH -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/zyi/anaconda3/envs/deformable_detr/include/python3.7m -c -c /home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.cu -o /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/bin/sh: 1: /usr/local/cuda-10.2/bin/nvcc: not found
[2/2] c++ -MMD -MF /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o.d -pthread -B /home/zyi/anaconda3/envs/deformable_detr/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/zyi/Documents/Project/Deformable-DETR/models/ops/src -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/TH -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/zyi/anaconda3/envs/deformable_detr/include/python3.7m -c -c /home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp -o /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o
c++ -MMD -MF /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o.d -pthread -B /home/zyi/anaconda3/envs/deformable_detr/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/zyi/Documents/Project/Deformable-DETR/models/ops/src -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/TH -I/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/zyi/anaconda3/envs/deformable_detr/include/python3.7m -c -c /home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp -o /home/zyi/Documents/Project/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/zyi/Documents/Project/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp:14:0:
/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1423, in _run_ninja_build
check=True)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "setup.py", line 70, in
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 603, in build_extensions
build_ext.build_extensions(self)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
depends=ext.depends)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 437, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1163, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/zyi/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1436, in _run_ninja_build
raise RuntimeError(message)
RuntimeError: Error compiling objects for extension

My CUDA version is 10.1, with pytorch==1.5.1 and torchvision==0.6.1 installed.

Import Error after Compilation

I'm using an anaconda env with PyTorch 1.5.1 as suggested and tried both CUDA 10.1 and 9.2, but keep getting the same error after compilation. Could you suggest how to solve this problem? Thanks.

import MultiScaleDeformableAttention as MSDA
ImportError: /home/xxx/anaconda3/envs/deform/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c1011CPUTensorIdEv

About the train/test_cardinality_error

Hi,
I added your focal loss method alone to the original DETR, and I found the result is not as good as that of the original DETR. From the logging information, I found the training and testing cardinality errors are much higher than those of DETR. And after reading your logging records, I found that Deformable DETR seems to have the same issue. Could you explain this part?

Thanks a lot.

Inference Batchsize Influence the mAP

I downloaded the official model r50_deformable_detr-checkpoint.pth and ran the following command:

bash configs/r50_deformable_detr.sh  --resume modelzoo/r50_deformable_detr-checkpoint.pth --eval --batch_size=2

With --batch_size=2 I get following result:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.379
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.602
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.402
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.201
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.408
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.317
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.519
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.771

When I change to --batch_size=8, I get:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.419
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.457
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.244
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.453
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.571
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.338
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.560
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.601
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.802

It seems the batch size influences the mAP severely.
Is that normal? Why can the inference batch size influence the evaluation result? Could you give some explanation? Thanks!

Reproduce Table 3

Hi,

First, I want to thank you to provide your code despite the migration from your internal codebase.

I am wondering what I need to do to reproduce the state-of-the-art results in Table 3. Either I missed something or it is not explained in the README.

It would be awesome if you could give some pointer and maybe even provide a pre-trained model.

Thanks,
Simon

CUDA memory issue

Hi, thanks for your great work! But I found many problems related to CUDA memory usage.

  • The memory consumption across GPUs is not balanced.

    The memory consumption difference between GPUs can be higher than 3 GB (I only have 11 GB of memory per card).

  • There seems to be a memory leak.
    As training goes on, the CUDA memory consumption becomes higher, yet the memory allocated by PyTorch is stable. Although the used CUDA memory is larger than 8 GB, the memory allocated by PyTorch is only 2757 MB, as reported by:

    memory=torch.cuda.max_memory_allocated() / MB))

Why not calculate the loss per batch?

Hi, guys,
I am learning about Deformable DETR these days, and I am curious about the implementation of HungarianMatcher.

  1. Why not match the bboxes per batch? torch.cdist can calculate distances batch-wise, so outputs["pred_boxes"].flatten(0, 1) seems unnecessary.
  2. Matching the bboxes regardless of the batch index seems unreasonable (or maybe I don't really understand the intrinsic nature of your implementation?).

Any answer or idea will be appreciated!
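For what it's worth, DETR-style matchers usually flatten only to compute one big cost matrix efficiently, then split it back per image before the Hungarian assignment, so boxes are never matched across batch indices. A condensed sketch of that pattern (simplified to an L1 box cost only):

import torch
from scipy.optimize import linear_sum_assignment

def match(pred_boxes, target_boxes_per_image):
    """pred_boxes: [bs, num_queries, 4]; target_boxes_per_image: list of [n_i, 4] tensors."""
    bs, num_queries, _ = pred_boxes.shape
    tgt = torch.cat(target_boxes_per_image)                 # [sum(n_i), 4]
    # one big cost matrix over the flattened batch...
    cost = torch.cdist(pred_boxes.flatten(0, 1), tgt, p=1)  # [bs*num_queries, sum(n_i)]
    cost = cost.view(bs, num_queries, -1).cpu()
    sizes = [len(t) for t in target_boxes_per_image]
    # ...split per image, so each image's queries only see its own targets
    return [linear_sum_assignment(c[i]) for i, c in enumerate(cost.split(sizes, -1))]

indices = match(torch.rand(2, 5, 4), [torch.rand(3, 4), torch.rand(2, 4)])

Each image's sub-matrix c[i] has shape [num_queries, n_i], so the assignment stays within that image.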

A little problem may occur in DeformableDETR's forward function

If we set return_intermediate=False, the shape of 'hs' returned by the Transformer will be (bs, n_query, hidden_dim), which is incompatible with the following part:

for lvl in range(hs.shape[0]):

because hs.shape[0] is the batch size, not the number of decoder layers.

Hence, when setting return_intermediate=False, we should replace the following code:

return output, reference_points

with 'return [output], [reference_points]'.

Although this is not a significant issue, I still mention it here to let you know that the problem truly exists.
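An alternative one-line fix that keeps the returned shapes consistent with the return_intermediate=True path (a sketch, not a tested patch):

# give the single-layer output a leading "decoder layers" dimension of size 1,
# so hs.shape[0] iterates over decoder layers in both code paths
return output.unsqueeze(0), reference_points.unsqueeze(0)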

The seed of training model

I have been following your work recently. When training the Deformable DETR model reported at 44.5 mAP, I cannot reproduce the same log as your model on GitHub with two V100 GPUs, the pretrained ResNet-50, and the following parameters.

(aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='../datasets/coco', dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, distributed=False, dropout=0.1, enc_layers=6, enc_n_points=4, epochs=50, eval=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=40, lr_drop_epochs=None, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_loss_coef=1, masks=False, nheads=8, num_feature_levels=4, num_queries=300, num_workers=2, output_dir='', position_embedding='sine', position_embedding_scale=6.283185307179586, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, sgd=False, start_epoch=0, two_stage=False, weight_decay=0.0001, with_box_refine=False)

Bigger batch size with less memory consumption

Hello,

I am fine-tuning on a custom dataset and want to try whether a bigger batch size results in better performance. But my graphics card does not have enough memory for a batch size bigger than 1. So I looked up a solution (gradient accumulation, below), but the final performance drops and the network converges more slowly with higher batch sizes (the higher the batch size, the slower the convergence).

My code is printed below. I know that the training time will be nearly the same with my code, but by performance I mean better Average Precision. I read that this approach is exactly the same as a real bigger batch size, except for batch normalization layers. Could the problem be the batch normalization layers in the ResNet backbone, or would that have another impact?

I changed these lines:

for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
    outputs = model(samples)
    loss_dict = criterion(outputs, targets)
    weight_dict = criterion.weight_dict
    losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)

    # reduce losses over all GPUs for logging purposes
    loss_dict_reduced = utils.reduce_dict(loss_dict)
    loss_dict_reduced_unscaled = {f'{k}_unscaled': v
                                  for k, v in loss_dict_reduced.items()}
    loss_dict_reduced_scaled = {k: v * weight_dict[k]
                                for k, v in loss_dict_reduced.items() if k in weight_dict}
    losses_reduced_scaled = sum(loss_dict_reduced_scaled.values())

    loss_value = losses_reduced_scaled.item()

    if not math.isfinite(loss_value):
        print("Loss is {}, stopping training".format(loss_value))
        print(loss_dict_reduced)
        sys.exit(1)

    optimizer.zero_grad()
    losses.backward()

    if max_norm > 0:
        grad_total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    else:
        grad_total_norm = utils.get_total_grad_norm(model.parameters(), max_norm)
    optimizer.step()

To:

optimizer.zero_grad()
for i, (samples, targets) in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
    outputs = model(samples)
    loss_dict = criterion(outputs, targets)
    weight_dict = criterion.weight_dict
    losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)

    # reduce losses over all GPUs for logging purposes
    loss_dict_reduced = utils.reduce_dict(loss_dict)
    loss_dict_reduced_unscaled = {f'{k}_unscaled': v
                                  for k, v in loss_dict_reduced.items()}
    loss_dict_reduced_scaled = {k: v * weight_dict[k]
                                for k, v in loss_dict_reduced.items() if k in weight_dict}
    losses_reduced_scaled = sum(loss_dict_reduced_scaled.values())

    loss_value = losses_reduced_scaled.item()

    if not math.isfinite(loss_value):
        print("Loss is {}, stopping training".format(loss_value))
        print(loss_dict_reduced)
        sys.exit(1)

    # optimizer.zero_grad()
    losses.backward()

    if (i+1)%mini_batchsize == 0:
        if max_norm > 0:
            grad_total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        else:
            grad_total_norm = utils.get_total_grad_norm(model.parameters(), max_norm)
        optimizer.step()
        optimizer.zero_grad()

Thank you in advance!
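One general note on gradient accumulation (standard practice, not specific to this repo): to approximate a real larger batch, the loss is usually divided by the number of accumulated mini-batches before calling backward; otherwise the accumulated gradient is effectively mini_batchsize times larger, which interacts with the learning rate and with gradient clipping:

# scale the loss so the accumulated gradient matches one real batch
# (variable names follow the snippet above)
(losses / mini_batchsize).backward()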

slow inference speed

Hello,

I am getting something close to 1 FPS inference speed with Deformable DETR (single scale) on an RTX 3090. I imagine I won't get the 27 FPS of the V100, but I guess it should be close.
I am using model.eval() and torch.no_grad().

thanks in advance

Edit: CUDA 11.1, PyTorch 1.7.1

Why does it save parameters?

For example, the parameter count of Deformable DETR is 34M while DETR's is 41M, when both use a single-level feature.

Didn't find the backbone of ResNeXt-101 + DCN

Hi guys,
I searched for the word "resnext", but it seems there is nothing related to ResNeXt.
I want to learn about the ResNeXt-101 + DCN backbone, so could you please tell me where the code lies?

Your answer will be appreciated!

make.sh

When I run sh ./make.sh, I get this message:

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] :/home1/hli/cuda10.2/bin/nvcc -DWITH_CUDA -I/home1/hli/Deformable-DETR/models/ops/src -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/TH -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/THC -I:/home1/hli/cuda10.2/include -I/home1/hli/anaconda3/envs/detr/include/python3.7m -c -c /home1/hli/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.cu -o /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++14
FAILED: /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o
:/home1/hli/cuda10.2/bin/nvcc -DWITH_CUDA -I/home1/hli/Deformable-DETR/models/ops/src -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/TH -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/THC -I:/home1/hli/cuda10.2/include -I/home1/hli/anaconda3/envs/detr/include/python3.7m -c -c /home1/hli/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.cu -o /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cuda/ms_deform_attn_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++14
/bin/sh: 1: :/home1/hli/cuda10.2/bin/nvcc: not found
[2/3] c++ -MMD -MF /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o.d -pthread -B /home1/hli/anaconda3/envs/detr/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home1/hli/Deformable-DETR/models/ops/src -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/TH -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/THC -I:/home1/hli/cuda10.2/include -I/home1/hli/anaconda3/envs/detr/include/python3.7m -c -c /home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp -o /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
FAILED: /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o
c++ -MMD -MF /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o.d -pthread -B /home1/hli/anaconda3/envs/detr/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home1/hli/Deformable-DETR/models/ops/src -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/TH -I/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/THC -I:/home1/hli/cuda10.2/include -I/home1/hli/anaconda3/envs/detr/include/python3.7m -c -c /home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp -o /home1/hli/Deformable-DETR/models/ops/build/temp.linux-x86_64-3.7/home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=MultiScaleDeformableAttention -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home1/hli/Deformable-DETR/models/ops/src/cpu/ms_deform_attn_cpu.cpp:14:0:
/home1/hli/anaconda3/envs/detr/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:5:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.

My CUDA version is 10.2.89, and the other settings are the same as yours (torch 1.5.1, Python 3.7). Can you help me? Thanks.

Problem when installing MultiScaleDeformableAttention

I encountered a problem when running test.py after ./make.sh.
The error is:
ImportError: /envs/anaconda3/envs/venv/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: cudaSetupArgument

What causes this problem?

Possible Normalization Bug in Deformable Attention Coordinates

This pertains to the cross-attention between queries and encoder feature maps in deformable_transformer.py. The reference_points for the deformable attention operation appear to be interpreted as (x, y) coordinates, since they are multiplied by src_valid_ratios in line 339, which has a (width, height) format (as evident from the get_valid_ratio method, line 125).

However, the src_spatial_shapes tensor contains dimensions in (height, width) format (see lines 138 and 139). Then, in ms_deform_attn.py, the reference points and offsets are combined as follows:

sampling_locations = reference_points[:, :, None, :, None, :] \
                       + sampling_offsets / input_spatial_shapes[None, None, None, :, None, :]

So it appears that the sampling_offsets / input_spatial_shapes[None, None, None, :, None, :] part assumes coordinates to be in (y,x) format whereas reference_points[:, :, None, :, None, :] appears to be in (x,y) format.

Of course I'm not sure if I followed the code correctly here (I could have missed something), but just wanted to bring this to the authors' attention in case they did not know already.

ImportError: MultiScaleDeformableAttention undefined symbol

Hi,
I'm having trouble importing the MultiScaleDeformableAttention module.
I followed the instructions; I have PyTorch 1.5.1 and CUDA 9.2.
Thanks.

ImportError: .conda/envs/deformable_detr/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexINS2_4HalfEEEEEPKNS_6detail12TypeMetaDataEv
