gcnet's People

Contributors

caoyue10, cclauss, chensnathan, donnyyou, hellock, innerlee, libuyu, lindahua, liushuchun, luxiin, lyuwenyu, lzhbrian, myownskyw7, oceanpang, ozps, patrick-llgc, slidelucask, sovrasov, stupidzz, sty-yyj, thangvubk, tjsongzw, xvjiarui, ychfan, yhcao6, youkaichao, zehaos, zhangtemplar, zhihuagao, zhijl

gcnet's Issues

change mask-rcnn to faster-rcnn?

Hello, can I switch the model to Faster R-CNN (with a ResNet backbone) instead of Mask R-CNN just by changing this config?
# model settings
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        gcb=dict(ratio=1. / 4.),
        stage_with_gcb=(False, True, True, True),
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

Library            OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine           -                       0.x
MMCV               1.x                     2.x
MMDetection        0.x, 1.x, 2.x           3.x
MMAction2          0.x                     1.x
MMClassification   0.x                     1.x
MMSegmentation     0.x                     1.x
MMDetection3D      0.x                     1.x
MMEditing          0.x                     1.x
MMPose             0.x                     1.x
MMDeploy           0.x                     1.x
MMTracking         0.x                     1.x
MMOCR              0.x                     1.x
MMRazor            0.x                     1.x
MMSelfSup          0.x                     1.x
MMRotate           1.x                     1.x
MMYOLO             -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Code for Kinetics dataset

Hi, thanks for releasing your code.

Do you have plans to release your training code for Kinetics dataset anytime soon?

Zero init of conv layers before addition

Thanks for sharing your excellent work! I haven't run the code yet, but I am curious about the implementation of GCNet. Is the Wv2 conv layer before the addition initialized to zero, so that the block does not affect the initial behaviour of the original backbone? In the Non-local net this was achieved by setting the scale of the BN to zero.
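
A minimal sketch (assuming PyTorch; not the repo's exact code) of the two initialization strategies being compared here:

import torch.nn as nn

def zero_init_last_conv(transform: nn.Sequential):
    # GCNet-style: zero the last 1x1 conv of the transform, so the block
    # initially adds nothing at the residual addition.
    nn.init.constant_(transform[-1].weight, 0)
    if transform[-1].bias is not None:
        nn.init.constant_(transform[-1].bias, 0)

def zero_init_bn_scale(bn: nn.BatchNorm2d):
    # Non-local-style: zero the BN scale (and shift) after the last conv.
    nn.init.constant_(bn.weight, 0)
    nn.init.constant_(bn.bias, 0)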

Does GCNet have a 1D version?

Hello, thank you very much for your work. I noticed that your GCNet is 2D, while non-local blocks range from 1D to 3D. Do you have 1D code?
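
Not something the repo ships, as far as this issue goes, but here is a hedged 1D sketch that keeps the GC structure (attention pooling for context modeling, then a bottleneck transform) for [N, C, L] sequence input:

import torch
from torch import nn

class ContextBlock1d(nn.Module):
    # Hypothetical 1D variant of the GC block; not the authors' code.
    def __init__(self, inplanes, ratio=1. / 16.):
        super().__init__()
        planes = int(inplanes * ratio)
        self.conv_mask = nn.Conv1d(inplanes, 1, kernel_size=1)  # attention logits
        self.softmax = nn.Softmax(dim=2)
        self.channel_add_conv = nn.Sequential(
            nn.Conv1d(inplanes, planes, kernel_size=1),
            nn.LayerNorm([planes, 1]),
            nn.ReLU(inplace=True),
            nn.Conv1d(planes, inplanes, kernel_size=1))

    def forward(self, x):
        # x: [N, C, L]; attn: [N, 1, L], softmax over positions
        attn = self.softmax(self.conv_mask(x))
        # weighted sum over positions -> global context [N, C, 1]
        context = torch.einsum('ncl,nol->nco', x, attn)
        return x + self.channel_add_conv(context)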

AP, AR = -1 during evaluation at the end of each epoch

Hello, I was trying to run GCNet from the MMDetection repository. I wanted to train GCNet on my custom dataset, in which each image is 800x800 and all annotations are in proper COCO format. However, my annotations are for boxes alone and nothing else.
I gave the respective paths and then ran the following command:

./dist_train.sh ../configs/gcnet/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco.py 2

When I ran this, training began. Here's a small part of my log file from when I was trying to reproduce the error:

'''

loading annotations into memory...
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
Done (t=0.78s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:17:35,421 - mmdet - INFO - Start running, host: user@c6e7e60caee9, work_dir: /mnt/user/mmdetection/tools/work_dirs/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco
2020-05-23 03:17:35,421 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:18:11,654 - mmdet - INFO - Epoch [1][50/1730] lr: 0.00198, eta: 4:09:57, time: 0.724, data_time: 0.267, memory: 4932, loss_rpn_cls: 0.5057, loss_rpn_bbox: 0.2989, loss_cls: 1.0971, acc:
85.3574, loss_bbox: 0.0454, loss_mask: 0.4885, loss: 2.4356
2020-05-23 03:18:39,866 - mmdet - INFO - Epoch [1][100/1730] lr: 0.00398, eta: 3:41:52, time: 0.565, data_time: 0.101, memory: 4932, loss_rpn_cls: 0.2872, loss_rpn_bbox: 0.2404, loss_cls: 0.4054, acc:
93.0947, loss_bbox: 0.1764, loss_mask: 0.3700, loss: 1.4794
2020-05-23 03:19:08,571 - mmdet - INFO - Epoch [1][150/1730] lr: 0.00597, eta: 3:33:17, time: 0.574, data_time: 0.101, memory: 5062, loss_rpn_cls: 0.1821, loss_rpn_bbox: 0.2431, loss_cls: 0.4640, acc:
90.5537, loss_bbox: 0.2871, loss_mask: 0.3372, loss: 1.5135
2020-05-23 03:19:37,701 - mmdet - INFO - Epoch [1][200/1730] lr: 0.00797, eta: 3:29:28, time: 0.582, data_time: 0.104, memory: 5271, loss_rpn_cls: 0.1201, loss_rpn_bbox: 0.2288, loss_cls: 0.4637, acc:
87.8535, loss_bbox: 0.3915, loss_mask: 0.3074, loss: 1.5114
2020-05-23 03:20:07,551 - mmdet - INFO - Epoch [1][250/1730] lr: 0.00997, eta: 3:27:59, time: 0.597, data_time: 0.105, memory: 5422, loss_rpn_cls: 0.1004, loss_rpn_bbox: 0.2055, loss_cls: 0.3916, acc:
87.1963, loss_bbox: 0.5066, loss_mask: 0.2894, loss: 1.4935
2020-05-23 03:20:37,901 - mmdet - INFO - Epoch [1][300/1730] lr: 0.01197, eta: 3:27:24, time: 0.607, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0506, loss_rpn_bbox: 0.1710, loss_cls: 0.3173, acc:
88.8096, loss_bbox: 0.5628, loss_mask: 0.2750, loss: 1.3767
2020-05-23 03:21:08,620 - mmdet - INFO - Epoch [1][350/1730] lr: 0.01397, eta: 3:27:11, time: 0.614, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0498, loss_rpn_bbox: 0.1539, loss_cls: 0.2873, acc:
89.3711, loss_bbox: 0.5508, loss_mask: 0.2680, loss: 1.3098
2020-05-23 03:21:39,345 - mmdet - INFO - Epoch [1][400/1730] lr: 0.01596, eta: 3:26:55, time: 0.615, data_time: 0.104, memory: 5422, loss_rpn_cls: 0.0801, loss_rpn_bbox: 0.1680, loss_cls: 0.2922, acc:
89.8096, loss_bbox: 0.5162, loss_mask: 0.2573, loss: 1.3137
2020-05-23 03:22:10,246 - mmdet - INFO - Epoch [1][450/1730] lr: 0.01796, eta: 3:26:43, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0481, loss_rpn_bbox: 0.1469, loss_cls: 0.2658, acc:
90.3389, loss_bbox: 0.5173, loss_mask: 0.2487, loss: 1.2268
2020-05-23 03:22:41,038 - mmdet - INFO - Epoch [1][500/1730] lr: 0.01996, eta: 3:26:23, time: 0.616, data_time: 0.106, memory: 5422, loss_rpn_cls: 0.0358, loss_rpn_bbox: 0.1340, loss_cls: 0.2562, acc:
90.1816, loss_bbox: 0.5243, loss_mask: 0.2481, loss: 1.1984
2020-05-23 03:23:11,924 - mmdet - INFO - Epoch [1][550/1730] lr: 0.02000, eta: 3:26:04, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0413, loss_rpn_bbox: 0.1406, loss_cls: 0.2616, acc:
90.0938, loss_bbox: 0.5202, loss_mask: 0.2372, loss: 1.2007
2020-05-23 03:23:42,898 - mmdet - INFO - Epoch [1][600/1730] lr: 0.02000, eta: 3:25:46, time: 0.619, data_time: 0.109, memory: 5422, loss_rpn_cls: 0.0717, loss_rpn_bbox: 0.1463, loss_cls: 0.2397, acc:
90.8945, loss_bbox: 0.4870, loss_mask: 0.2516, loss: 1.1963
2020-05-23 03:24:14,103 - mmdet - INFO - Epoch [1][650/1730] lr: 0.02000, eta: 3:25:34, time: 0.624, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0434, loss_rpn_bbox: 0.1369, loss_cls: 0.2699, acc:
89.6836, loss_bbox: 0.5065, loss_mask: 0.2456, loss: 1.2023

'''
The above log is from when I was trying to reproduce the error. When I originally hit it, I got AP = AR = -1 for all values after the first epoch.
If you observe, the bbox loss is not really changing, and the learning rate quickly came up to 0.02.
Can someone please explain what the issue is here?
I am training on two GPUs. I also wish to include validation at the end of each epoch (see the config sketch after the environment details).

ENVIRONMENT:
Python 3.6.9
CUDA 10.1
Using venv with Ubuntu 18.04.
Thanks!!
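
For the per-epoch validation part, a hedged config sketch (assuming mmdetection 2.x conventions, which the dist_train.sh command above suggests):

# Evaluate bbox mAP at the end of every epoch.
evaluation = dict(interval=1, metric='bbox')
# One plausible cause of the -1 values: a Mask R-CNN config (note
# loss_mask in the logs) trained on box-only annotations, leaving the
# segm evaluation nothing to score. A Faster R-CNN (bbox-only) GCNet
# config sidesteps this.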

understanding of distance of feature map

I am confused about the following words (last paragraph of Section 3.2 in the paper):

But the values of cosine distance in ‘output’ are quite small, indicating that global context features modeled by the non-local block are almost the same for different query positions.

In my opinion, a smaller distance only shows that the feature vectors at arbitrary positions are closer to each other than the input feature vectors are.
Why does it show that the global context features are the same for different locations?
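
For intuition: the paper's statistic is computed between the block's outputs at different query positions, not between output and input. A hedged sketch of that kind of measurement (not the authors' code):

import torch
import torch.nn.functional as F

def avg_pairwise_cosine_distance(feat):
    # feat: [C, H, W]; one vector per spatial (query) position -> [H*W, C]
    v = F.normalize(feat.flatten(1).t(), dim=1)
    sim = v @ v.t()                          # pairwise cosine similarity
    n = v.size(0)
    mean_sim = (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
    return 1 - mean_sim                      # near 0 => outputs nearly identical

If this average distance over the non-local block's output is near zero, the global context each query position receives is essentially the same, which is what the quoted sentence claims.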

runtime increase of about 15 ms?

Hi, I use GCNet with the 'resnet50-fpn + c3~c5 r16' setting, but runtime increases by about 15 ms. Could you tell me the reason?

import torch
from torch import nn
import torch.nn.functional as F

def kaiming_init(module,
                 a=0,
                 mode='fan_out',
                 nonlinearity='relu',
                 bias=0,
                 distribution='normal'):
    assert distribution in ['uniform', 'normal']
    if distribution == 'uniform':
        nn.init.kaiming_uniform_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    else:
        nn.init.kaiming_normal_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def constant_init(module, val, bias=0):
    nn.init.constant_(module.weight, val)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def last_zero_init(m):
    if isinstance(m, nn.Sequential):
        constant_init(m[-1], val=0)
        m[-1].inited = True
    else:
        constant_init(m, val=0)
        m.inited = True


class ContextBlock2d(nn.Module):

    def __init__(self, inplanes, ratio=1. / 16.):
        super(ContextBlock2d, self).__init__()
        self.inplanes = inplanes
        self.planes = int(self.inplanes * ratio)
        self.channel_add_conv = nn.Sequential(
            nn.Conv2d(self.inplanes, self.planes, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(self.planes, self.inplanes, kernel_size=1))
        self.reset_parameters()

    def reset_parameters(self):
        last_zero_init(self.channel_add_conv)

    def spatial_pool(self, x):
        batch, channel, height, width = x.size()
        # [N, C, 1, 1]; note this uses global average pooling rather than
        # the attention pooling of the full GC block
        context = F.avg_pool2d(x, (height, width))
        return context

    def forward(self, x):
        # [N, C, 1, 1]
        context = self.spatial_pool(x)
        # [N, C, 1, 1]
        channel_add_term = self.channel_add_conv(context)
        out = x + channel_add_term
        return out
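
One thing worth ruling out before blaming the block itself (a hedged measurement sketch, assuming the module and input are already on the GPU): CUDA execution is asynchronous, so wall-clock timing without synchronization can misattribute latency.

import torch

def time_module(m, x, iters=100):
    # Warm up, then time with CUDA events so async kernels are counted.
    for _ in range(10):
        m(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        m(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # milliseconds per forward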

Visualization code wanted

Hi, Thanks for your great work!
In your paper, attention maps of particular query points are shown. Could you share this visualization code?
More specifically, did you implement an interactive interface to do the visualization?

Many thanks again.
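
Not the authors' visualization tool, but a minimal sketch of how the spatial attention of the GC-style block could be extracted for heat-map overlays (it assumes the block exposes a conv_mask 1x1 conv producing one attention logit per position; since the GC attention map is query-independent, no interactive query selection is strictly needed):

import torch
import torch.nn.functional as F

@torch.no_grad()
def attention_heatmap(conv_mask, x):
    # x: [N, C, H, W]; conv_mask: 1x1 conv, C -> 1 channel of logits
    n, _, h, w = x.shape
    logits = conv_mask(x).view(n, -1)              # [N, H*W]
    return F.softmax(logits, dim=1).view(n, h, w)  # spatial attention map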

Does anyone have problems in training from scratch with GCNet on ImageNet?

Using the best setting of GC-ResNet50 and training it from scratch on ImageNet, I found it gets stuck at a high loss in the early epochs before the training loss starts to decline normally. As a result, the final result is much lower than the original ResNet50. Note that one difference from the original paper is that the GC modules are embedded in each bottleneck exactly as SE does, for a fair comparison.

Does anyone have the same problem?

This may be because the authors report their ImageNet results via a finetuning setting, which is not very common when validating models on ImageNet benchmarks. At least all the other modules (SE, SK, BAM, CBAM, AA) follow a training-from-scratch setting.

When taking videos as input

When taking video input, the feature maps in each layer have four dimensions, i.e., T*H*W*C. Are the attention maps still query-independent? Could you please give more details? Thanks a lot.

Train on custom data

Hey,

I am trying to train on custom data using GCNet. I have the data in COCO format, and I want to know the exact procedure for training, because just running the train.sh script gives me an IndexError.

I have been changing the config file to make it work, but without any luck so far. Please let me know the fields that should be changed (see the sketch below).

Thanks.
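
Not an official recipe, but a hedged checklist of the config fields that typically need changing for a COCO-format custom dataset (the exact keys follow this repo's mmdetection version and are assumptions):

# Hypothetical edits for a dataset with 5 foreground classes:
data_root = 'data/my_dataset/'
data = dict(
    train=dict(
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'images/train/'),
    val=dict(
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val/'))
# In the model dict, every bbox_head (and mask_head) needs
# num_classes=6 (5 classes + background in this mmdetection version).
# A mismatch between num_classes and the annotation category ids is a
# common cause of IndexError during training.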

Mask for training

Suppose I want to benchmark GCNet for object detection only on my custom dataset: do I need masks for training, or are bounding boxes enough?

single GPU training

If I only have a single GPU, how much will performance drop without SyncBatchNorm? Simply put, can training on a single GPU achieve similar performance, within a +-1% error, or not?
Should I adjust the learning rate to 1/8 of the original lr and train for more epochs to get results close to the original implementation?
Thanks!!!
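
For the learning-rate part, a hedged sketch of the linear scaling rule behind the 1/8 intuition (assuming the reference schedule in this repo's configs, lr=0.02 for 8 GPUs x 2 imgs/GPU):

# Linear scaling rule: keep lr proportional to the total batch size.
base_lr, base_batch = 0.02, 16            # 8 GPUs * 2 imgs/gpu
gpus, imgs_per_gpu = 1, 2
lr = base_lr * (gpus * imgs_per_gpu) / base_batch   # = 0.0025, i.e. 1/8
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)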

some of the problems

in the mmdet/ops/gcb/ContextBlock

line 28

self.planes = int(inplanes * ratio)

Should this * be changed to // ?
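
A quick check suggests the * is intentional, since ratio is a fraction (1/16 or 1/4 in the configs), not an integer divisor:

inplanes, ratio = 2048, 1. / 16.
planes = int(inplanes * ratio)   # 128 -- the multiplication already reduces
# inplanes // ratio would give 32768.0; // would only make sense if
# ratio were the integer 16 rather than the fraction 1/16.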

Add location based on yolov7

First of all, thank you very much for sharing. I want to load this module into yolov7. How should I modify the yolov7.yaml configuration file?

Zero mAP without mask

Hello.
Thank you for the nice work. I'm trying to use non-local nets (GCNet) in practice.
This config is DCN + GCNet r4 + scale augmentation, without mask, on a cascade Faster R-CNN.
mAP = 0.
Reading the log, it's strange that acc = 97.6621 from beginning to end; maybe it converged to the trivial solution of always predicting background.

# model settings
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch',
        ct=dict(
            insert_pos='after_1x1',
            ratio=1./4.,
        ),
        stage_with_ct=(False, True, True, True),
        dcn=dict(
            modulated=False,
            groups=32,
            deformable_groups=1,
            fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True),
        normalize=dict(type='SyncBN', frozen=False),
        norm_eval=False,
    ),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True)
    ])
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
    keep_all_stages=False)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/COCO/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        img_scale=[(1600, 400), (1600, 1400)],
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = '/media/HD2/nsergievskiy/models/cascde_gcnet_r50'
load_from = None
resume_from = None
workflow = [('train', 1)]

20190514_202710.log

Config files

Hi,
I am trying to use "X-101-FPN | DCN Cascade Mask | GC(c3-c5, r4)", but I am getting errors. I think that's due to using a wrong config file. Would you let me know which config file corresponds to that model?

Thanks!

possible replacements for layernorm

Hello, the performance of layernorm might not be optimized on some inference platforms for edge devices.
Have you tried replacements for layernorm? Replacing it with BN might not be a good idea, so how about simply removing the layernorm layer?
Thanks!
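
For concreteness, a hedged sketch of the GC transform bottleneck with the LayerNorm dropped, as proposed above (whether accuracy holds up would need re-validation):

import torch.nn as nn

inplanes, planes = 256, 64   # e.g. ratio = 1/4
channel_add_conv = nn.Sequential(
    nn.Conv2d(inplanes, planes, kernel_size=1),
    # nn.LayerNorm([planes, 1, 1]),   # the GC block's LN normally sits here
    nn.ReLU(inplace=True),
    nn.Conv2d(planes, inplanes, kernel_size=1))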

GCNet with pretrained model on COCO detection?

Hello, I'm not sure whether GCNet in mmdetection uses an ImageNet pretrained model (specifically a GCNet + ResNet pretrained model), because the GCNet configs only use the standard 'torchvision://resnet50' in official mmdetection and 'modelzoo://resnet50' in this repo as the pretrained model, whereas in the original paper the authors first trained on ImageNet and then transferred to the COCO detection task.

Could you clarify whether mmdetection uses a (plain ResNet) pretrained model or a (ResNet+GC) pretrained model? If only the plain ResNet pretrained model is used, why is the improvement on COCO so large?

Why does the non-local block learn a query-independent attention map in the detection task?

Hi, can you explain why the non-local block learns a query-independent attention map in the object detection task? In segmentation, both OCNet and DANet have shown that a spatial attention module (the same as the non-local block) can learn attention maps concentrated on pixels of the same category as the query, rather than query-independent ones.

how to set the "lr" when using the "ap.SyncBatchNorm"?

hi @xvjiarui
I moved the code into maskrcnn-benchmark and ran the config mask_rcnn_r16_ct_c3-c5_r50_sbn_fpn_1x with these settings: 16 images / 8 GPUs, lr=0.02, using ap.SyncBatchNorm. It encounters NaN in the first few iterations, and it seems to use more GPU memory than mask_rcnn_r50_fpn_1x.
When I set the lr to 0.0025, training runs successfully. Can you give me some tips on how to set the lr when using ap.SyncBatchNorm?

What makes your attention different from DANet's?

First, your work brings a new view of NLNet and creates the GC block with much less computation than the non-local block, which is useful for my work.
But I have read a paper named Dual Attention Network for Scene Segmentation; the spatial attention it uses is similar to the non-local block. Its network architecture is shown below:

[figure: DANet architecture]

But the visualization of its attention maps is different from what you give in your paper:

[figure: DANet attention maps]

What causes the effect in that paper (Dual Attention Network for Scene Segmentation) to differ from yours?

Why is it so hard to find the GC Block?

Couldn't the code structure be made clearer? Isn't the GC attention module the main point of this paper? Why is it so hard to find? Couldn't it be released in a plainly visible, standalone form?

error: undefined symbol: __cudaPopCallConfiguration

After setting up the code, I tried to train the model and got this error:
anaconda3/lib/python3.7/site-packages/mmdet-0.6.0+a132aab-py3.7.egg/mmdet/ops/dcn/deform_conv_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

How to fix it?

Environment

  • PyTorch version 1.1.0
  • CUDA version 10.0

Simplified NL

Hi,

Thank you for your interesting papers and sharing the code.
I have a question about the paper, in particular the simplified NL module.

If I understand correctly, you use the self-attention in order to get a context feature, which then lets you reweight the different channels of your input, right?

So if I wanted to code it from your code, it would give:

context = self.spatial_pool(x)   # context: [N, C, 1, 1]
output = conv2d(context)         # 1x1 conv with C input and C output channels; output: [N, C, 1, 1]

return x + output

Is that right?

Could it be used in 3D data?

Have you verified your idea in a 3D model? Or, for 3D data, is the attention value at every point still the same?

How can I use gc block in resnet18?

Considering the limitations of hardware performance, I am trying to train a lightweight model, so I use resnet18.
But resnet18 uses BasicBlock instead of Bottleneck, and it seems you haven't implemented the GC block for BasicBlock.
If I want to use the GC block in BasicBlock, how should I design it?
Thank you!
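
Not something the repo provides, but a hedged sketch of one way to do it: insert the GC block after the second conv of BasicBlock, before the residual addition, mirroring the Bottleneck placement. ContextBlock2d here stands for whatever GC block implementation is used (e.g. the one in mmdet/ops/gcb):

import torch.nn as nn

class GCBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 gcb_ratio=1. / 4.):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.gcb = ContextBlock2d(planes, ratio=gcb_ratio)  # assumed GC impl
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.gcb(out)   # global context before the residual add
        return self.relu(out + identity)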

Questions about training

I found that the loss grows even when finetuning from an ImageNet pretrained model. I used ResNet-vd as the baseline arch and added GC blocks. I also double-checked the performance of the baseline (w/o GC block), which is normal (~79% top-1). But when training (finetuning) starts, the loss grows quickly (~6.xxx), and the val top-1 acc is below 10% after the first epoch. After 20 epochs of training, top-1 acc is near 55%. Is something wrong?

Question about the paper

I have a question: even if the attention map is the same at every position, why does what is finally learned become a channel vector rather than a (spatial) map?

different between your GCNet and GENet (gather-excite network) in NIPS2018

In your paper, your analysis of the non-local block is really impressive. But you finally conclude that the current attention mechanism consists of three steps:

  1. gather global context;
  2. transform the global context to capture channel-wise relationship;
  3. merge the global context to original feature

The above three steps are very similar to the formulation of the "Gather-Excite Network", which also divides the attention mechanism into a gathering and an exciting step.

May I ask what the difference is between your GCNet and the gather-excite network? Thanks!

What do the values of the transform module mean?

I used GCNet in my model and it works very well. But I want to know what the values of the transform module mean. Do they indicate the importance of each channel, with lower values meaning less important? Hoping for your answer, thanks!

Should GCNet always use pretrained backbone and finetune with GCNet?

Hi, I've noticed that the ImageNet training is a two-step process in which ResNet without GC is trained first and then finetuned further with the GC block.

You describe that this was to speed up the experiment. Similarly, Kinetics used pretrained ResNet on ImageNet to inflate the Slow-Only model.

Section 4.1 of the SGENet paper (https://arxiv.org/pdf/1905.09646.pdf) also notes that it was difficult to train GCNet from the beginning. Have you experimented with training GCNet from scratch? Is there a reason you chose not to train from scratch with the GC module attached?

gcnet performs not good on segmentation tasks.

I am working on a text semantic segmentation task.
I tried to introduce the GC block into resnet.
I ran experiments on both resnet18 and resnet50, and I found the GC block even made the model worse.
How should I solve this?
Thank you!

got an unexpected keyword argument 'ct'

python tools/train.py /media/ices18/Data/sms/competition/JD/model/GCNet/configs/cascade_mask_rcnn_r4_ct_dconv_c3-c5_x101_32x4d_sbn_fpn_1x.py
2019-05-07 16:18:55,643 - INFO - Distributed training: False
Traceback (most recent call last):
File "tools/train.py", line 90, in
main()
File "tools/train.py", line 77, in main
cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 51, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/detectors/cascade_rcnn.py", line 34, in init
self.backbone = builder.build_backbone(backbone)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 35, in build_backbone
return build(cfg, BACKBONES)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/backbones/resnext.py", line 177, in init
super(ResNeXt, self).init(**kwargs)
TypeError: init() got an unexpected keyword argument 'ct'
