gcnet's People

Contributors

caoyue10, cclauss, chensnathan, donnyyou, hellock, innerlee, libuyu, lindahua, liushuchun, luxiin, lyuwenyu, lzhbrian, myownskyw7, oceanpang, ozps, patrick-llgc, slidelucask, sovrasov, stupidzz, sty-yyj, thangvubk, tjsongzw, xvjiarui, ychfan, yhcao6, youkaichao, zehaos, zhangtemplar, zhihuagao, zhijl

gcnet's Issues

change mask-rcnn to faster-rcnn?

Hello, can I switch the model to Faster R-CNN (with a ResNet backbone) instead of Mask R-CNN just by changing this config?
# model settings
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        gcb=dict(ratio=1. / 4.),
        stage_with_gcb=(False, True, True, True),
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repo branches:

Library            OpenMMLab 1.0 branch    OpenMMLab 2.0 branch
MMEngine           -                       0.x
MMCV               1.x                     2.x
MMDetection        0.x, 1.x, 2.x           3.x
MMAction2          0.x                     1.x
MMClassification   0.x                     1.x
MMSegmentation     0.x                     1.x
MMDetection3D      0.x                     1.x
MMEditing          0.x                     1.x
MMPose             0.x                     1.x
MMDeploy           0.x                     1.x
MMTracking         0.x                     1.x
MMOCR              0.x                     1.x
MMRazor            0.x                     1.x
MMSelfSup          0.x                     1.x
MMRotate           1.x                     1.x
MMYOLO             -                       0.x

Attention: please create a new virtual environment for OpenMMLab 2.0.

Code for Kinetics dataset

Hi, thanks for releasing your code.

Do you have plans to release your training code for Kinetics dataset anytime soon?

Zero init of conv layers before addition

Thanks for sharing your excellent work! I haven't run the code yet, but I am curious about the implementation of GCNet. Is the Wv2 conv layer before the addition initialized to zero, so that the block does not affect the initial behaviour of the original backbone? In the Non-local net this was achieved by setting the scale of the BN to zero.
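
A minimal sketch (assuming PyTorch; not the repo's exact code) of the two initialization strategies being compared here:

import torch.nn as nn

def zero_init_last_conv(transform: nn.Sequential):
    # GCNet-style: zero the last 1x1 conv of the transform, so the block
    # initially adds nothing at the residual addition.
    nn.init.constant_(transform[-1].weight, 0)
    if transform[-1].bias is not None:
        nn.init.constant_(transform[-1].bias, 0)

def zero_init_bn_scale(bn: nn.BatchNorm2d):
    # Non-local-style: zero the BN scale (and shift) after the last conv.
    nn.init.constant_(bn.weight, 0)
    nn.init.constant_(bn.bias, 0)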

Does GCNet have a 1D version?

Hello, thank you very much for your work. I noticed that your GCNet is 2D, while non-local blocks range from 1D to 3D. Do you have 1D code?
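
Not something the repo ships, as far as this issue goes, but here is a hedged 1D sketch that keeps the GC structure (attention pooling for context modeling, then a bottleneck transform) for [N, C, L] sequence input:

import torch
from torch import nn

class ContextBlock1d(nn.Module):
    # Hypothetical 1D variant of the GC block; not the authors' code.
    def __init__(self, inplanes, ratio=1. / 16.):
        super().__init__()
        planes = int(inplanes * ratio)
        self.conv_mask = nn.Conv1d(inplanes, 1, kernel_size=1)  # attention logits
        self.softmax = nn.Softmax(dim=2)
        self.channel_add_conv = nn.Sequential(
            nn.Conv1d(inplanes, planes, kernel_size=1),
            nn.LayerNorm([planes, 1]),
            nn.ReLU(inplace=True),
            nn.Conv1d(planes, inplanes, kernel_size=1))

    def forward(self, x):
        # x: [N, C, L]; attn: [N, 1, L], softmax over positions
        attn = self.softmax(self.conv_mask(x))
        # weighted sum over positions -> global context [N, C, 1]
        context = torch.einsum('ncl,nol->nco', x, attn)
        return x + self.channel_add_conv(context)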

AP, AR = -1 during evaluation at the end of each epoch

Hello, I was trying to run GCNet from the MMDetection repository. I wanted to train GCNet on my custom dataset, in which each image is 800x800 and all annotations are in proper COCO format. However, my annotations are for boxes alone and nothing else.
I gave the respective paths and then ran the following command:

./dist_train.sh ../configs/gcnet/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco.py 2

When I ran this, training began. Here's a small part of my log file from when I was trying to reproduce the error:

'''

loading annotations into memory...
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
Done (t=0.78s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:17:35,421 - mmdet - INFO - Start running, host: user@c6e7e60caee9, work_dir: /mnt/user/mmdetection/tools/work_dirs/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco
2020-05-23 03:17:35,421 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:18:11,654 - mmdet - INFO - Epoch [1][50/1730] lr: 0.00198, eta: 4:09:57, time: 0.724, data_time: 0.267, memory: 4932, loss_rpn_cls: 0.5057, loss_rpn_bbox: 0.2989, loss_cls: 1.0971, acc:
85.3574, loss_bbox: 0.0454, loss_mask: 0.4885, loss: 2.4356
2020-05-23 03:18:39,866 - mmdet - INFO - Epoch [1][100/1730] lr: 0.00398, eta: 3:41:52, time: 0.565, data_time: 0.101, memory: 4932, loss_rpn_cls: 0.2872, loss_rpn_bbox: 0.2404, loss_cls: 0.4054, acc:
93.0947, loss_bbox: 0.1764, loss_mask: 0.3700, loss: 1.4794
2020-05-23 03:19:08,571 - mmdet - INFO - Epoch [1][150/1730] lr: 0.00597, eta: 3:33:17, time: 0.574, data_time: 0.101, memory: 5062, loss_rpn_cls: 0.1821, loss_rpn_bbox: 0.2431, loss_cls: 0.4640, acc:
90.5537, loss_bbox: 0.2871, loss_mask: 0.3372, loss: 1.5135
2020-05-23 03:19:37,701 - mmdet - INFO - Epoch [1][200/1730] lr: 0.00797, eta: 3:29:28, time: 0.582, data_time: 0.104, memory: 5271, loss_rpn_cls: 0.1201, loss_rpn_bbox: 0.2288, loss_cls: 0.4637, acc:
87.8535, loss_bbox: 0.3915, loss_mask: 0.3074, loss: 1.5114
2020-05-23 03:20:07,551 - mmdet - INFO - Epoch [1][250/1730] lr: 0.00997, eta: 3:27:59, time: 0.597, data_time: 0.105, memory: 5422, loss_rpn_cls: 0.1004, loss_rpn_bbox: 0.2055, loss_cls: 0.3916, acc:
87.1963, loss_bbox: 0.5066, loss_mask: 0.2894, loss: 1.4935
2020-05-23 03:20:37,901 - mmdet - INFO - Epoch [1][300/1730] lr: 0.01197, eta: 3:27:24, time: 0.607, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0506, loss_rpn_bbox: 0.1710, loss_cls: 0.3173, acc:
88.8096, loss_bbox: 0.5628, loss_mask: 0.2750, loss: 1.3767
2020-05-23 03:21:08,620 - mmdet - INFO - Epoch [1][350/1730] lr: 0.01397, eta: 3:27:11, time: 0.614, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0498, loss_rpn_bbox: 0.1539, loss_cls: 0.2873, acc:
89.3711, loss_bbox: 0.5508, loss_mask: 0.2680, loss: 1.3098
2020-05-23 03:21:39,345 - mmdet - INFO - Epoch [1][400/1730] lr: 0.01596, eta: 3:26:55, time: 0.615, data_time: 0.104, memory: 5422, loss_rpn_cls: 0.0801, loss_rpn_bbox: 0.1680, loss_cls: 0.2922, acc:
89.8096, loss_bbox: 0.5162, loss_mask: 0.2573, loss: 1.3137
2020-05-23 03:22:10,246 - mmdet - INFO - Epoch [1][450/1730] lr: 0.01796, eta: 3:26:43, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0481, loss_rpn_bbox: 0.1469, loss_cls: 0.2658, acc:
90.3389, loss_bbox: 0.5173, loss_mask: 0.2487, loss: 1.2268
2020-05-23 03:22:41,038 - mmdet - INFO - Epoch [1][500/1730] lr: 0.01996, eta: 3:26:23, time: 0.616, data_time: 0.106, memory: 5422, loss_rpn_cls: 0.0358, loss_rpn_bbox: 0.1340, loss_cls: 0.2562, acc:
90.1816, loss_bbox: 0.5243, loss_mask: 0.2481, loss: 1.1984
2020-05-23 03:23:11,924 - mmdet - INFO - Epoch [1][550/1730] lr: 0.02000, eta: 3:26:04, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0413, loss_rpn_bbox: 0.1406, loss_cls: 0.2616, acc:
90.0938, loss_bbox: 0.5202, loss_mask: 0.2372, loss: 1.2007
2020-05-23 03:23:42,898 - mmdet - INFO - Epoch [1][600/1730] lr: 0.02000, eta: 3:25:46, time: 0.619, data_time: 0.109, memory: 5422, loss_rpn_cls: 0.0717, loss_rpn_bbox: 0.1463, loss_cls: 0.2397, acc:
90.8945, loss_bbox: 0.4870, loss_mask: 0.2516, loss: 1.1963
2020-05-23 03:24:14,103 - mmdet - INFO - Epoch [1][650/1730] lr: 0.02000, eta: 3:25:34, time: 0.624, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0434, loss_rpn_bbox: 0.1369, loss_cls: 0.2699, acc:
89.6836, loss_bbox: 0.5065, loss_mask: 0.2456, loss: 1.2023

'''
The above log is from when I was trying to reproduce the error. When I originally hit it, I got AP = AR = -1 for all values after the first epoch.
If you observe, the bbox loss is not really changing, and the learning rate quickly came up to 0.02.
Can someone please explain what the issue is here?
I am training on two GPUs. I also wish to include validation at the end of each epoch (see the config sketch after the environment details).

ENVIRONMENT:
Python 3.6.9
CUDA 10.1
Using venv with Ubuntu 18.04.
Thanks!!
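
For the per-epoch validation part, a hedged config sketch (assuming mmdetection 2.x conventions, which the dist_train.sh command above suggests):

# Evaluate bbox mAP at the end of every epoch.
evaluation = dict(interval=1, metric='bbox')
# One plausible cause of the -1 values: a Mask R-CNN config (note
# loss_mask in the logs) trained on box-only annotations, leaving the
# segm evaluation nothing to score. A Faster R-CNN (bbox-only) GCNet
# config sidesteps this.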

understanding of distance of feature map

I am confused about the following words (last paragraph of Section 3.2 in the paper):

But the values of cosine distance in ‘output’ are quite small, indicating that global context features modeled by the non-local block are almost the same for different query positions.

In my opinion, a smaller distance only shows that the feature vectors at arbitrary positions are closer to each other than the input feature vectors are.
Why does it show that the global context features are the same for different locations?
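
For intuition: the paper's statistic is computed between the block's outputs at different query positions, not between output and input. A hedged sketch of that kind of measurement (not the authors' code):

import torch
import torch.nn.functional as F

def avg_pairwise_cosine_distance(feat):
    # feat: [C, H, W]; one vector per spatial (query) position -> [H*W, C]
    v = F.normalize(feat.flatten(1).t(), dim=1)
    sim = v @ v.t()                          # pairwise cosine similarity
    n = v.size(0)
    mean_sim = (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
    return 1 - mean_sim                      # near 0 => outputs nearly identical

If this average distance over the non-local block's output is near zero, the global context each query position receives is essentially the same, which is what the quoted sentence claims.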

runtime increase of about 15 ms?

Hi, I use GCNet with the 'resnet50-fpn + c3~c5 r16' setting, but runtime increases by about 15 ms. Could you tell me the reason?

import torch
from torch import nn
import torch.nn.functional as F

def kaiming_init(module,
                 a=0,
                 mode='fan_out',
                 nonlinearity='relu',
                 bias=0,
                 distribution='normal'):
    assert distribution in ['uniform', 'normal']
    if distribution == 'uniform':
        nn.init.kaiming_uniform_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    else:
        nn.init.kaiming_normal_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def constant_init(module, val, bias=0):
    nn.init.constant_(module.weight, val)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def last_zero_init(m):
    if isinstance(m, nn.Sequential):
        constant_init(m[-1], val=0)
        m[-1].inited = True
    else:
        constant_init(m, val=0)
        m.inited = True


class ContextBlock2d(nn.Module):

    def __init__(self, inplanes, ratio=1. / 16.):
        super(ContextBlock2d, self).__init__()
        self.inplanes = inplanes
        self.planes = int(self.inplanes * ratio)
        self.channel_add_conv = nn.Sequential(
            nn.Conv2d(self.inplanes, self.planes, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(self.planes, self.inplanes, kernel_size=1))
        self.reset_parameters()

    def reset_parameters(self):
        last_zero_init(self.channel_add_conv)

    def spatial_pool(self, x):
        batch, channel, height, width = x.size()
        # [N, C, 1, 1]; note this uses global average pooling rather than
        # the attention pooling of the full GC block
        context = F.avg_pool2d(x, (height, width))
        return context

    def forward(self, x):
        # [N, C, 1, 1]
        context = self.spatial_pool(x)
        # [N, C, 1, 1]
        channel_add_term = self.channel_add_conv(context)
        out = x + channel_add_term
        return out
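
One thing worth ruling out before blaming the block itself (a hedged measurement sketch, assuming the module and input are already on the GPU): CUDA execution is asynchronous, so wall-clock timing without synchronization can misattribute latency.

import torch

def time_module(m, x, iters=100):
    # Warm up, then time with CUDA events so async kernels are counted.
    for _ in range(10):
        m(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        m(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # milliseconds per forward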

Visualization code wanted

Hi, Thanks for your great work!
In your paper, attention maps of particular query points are shown. Could you share this visualization code?
More specifically, did you implement an interactive interface to do the visualization?

Many thanks again.
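
Not the authors' visualization tool, but a minimal sketch of how the spatial attention of the GC-style block could be extracted for heat-map overlays (it assumes the block exposes a conv_mask 1x1 conv producing one attention logit per position; since the GC attention map is query-independent, no interactive query selection is strictly needed):

import torch
import torch.nn.functional as F

@torch.no_grad()
def attention_heatmap(conv_mask, x):
    # x: [N, C, H, W]; conv_mask: 1x1 conv, C -> 1 channel of logits
    n, _, h, w = x.shape
    logits = conv_mask(x).view(n, -1)              # [N, H*W]
    return F.softmax(logits, dim=1).view(n, h, w)  # spatial attention map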

Does anyone have problems in training from scratch with GCNet on ImageNet?

Using the best setting of GC-ResNet50 and training it from scratch on ImageNet, I found it gets stuck at a high loss in the early epochs before the training loss starts to decline normally. As a result, the final result is much lower than the original ResNet50. Note that one difference from the original paper is that the GC modules are embedded in each bottleneck exactly as SE does, for a fair comparison.

Does anyone have the same problem?

This may be because the authors report their ImageNet results via a finetuning setting, which is not very common when validating models on ImageNet benchmarks. At least all the other modules (SE, SK, BAM, CBAM, AA) follow a training-from-scratch setting.

When taking videos as input

When taking video input, the feature maps in each layer have four dimensions, i.e., T*H*W*C. Are the attention maps still query-independent? Could you please give more details? Thanks a lot.

Train on custom data

Hey,

I am trying to train on custom data using GCNet. I have the data in COCO format, and I want to know the exact procedure for training, because just running the train.sh script gives me an IndexError.

I have been changing the config file to make it work, but without any luck so far. Please let me know the fields that should be changed (see the sketch below).

Thanks.
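
Not an official recipe, but a hedged checklist of the config fields that typically need changing for a COCO-format custom dataset (the exact keys follow this repo's mmdetection version and are assumptions):

# Hypothetical edits for a dataset with 5 foreground classes:
data_root = 'data/my_dataset/'
data = dict(
    train=dict(
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'images/train/'),
    val=dict(
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val/'))
# In the model dict, every bbox_head (and mask_head) needs
# num_classes=6 (5 classes + background in this mmdetection version).
# A mismatch between num_classes and the annotation category ids is a
# common cause of IndexError during training.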

Mask for training

Suppose I want to benchmark GCNet for object detection only on my custom dataset: do I need masks for training, or are bounding boxes enough?

single GPU training

If I only have a single GPU, how much will performance drop without SyncBatchNorm? Simply put, can training on a single GPU achieve similar performance, within a +-1% error, or not?
Should I adjust the learning rate to 1/8 of the original lr and train for more epochs to get results close to the original implementation?
Thanks!!!
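
For the learning-rate part, a hedged sketch of the linear scaling rule behind the 1/8 intuition (assuming the reference schedule in this repo's configs, lr=0.02 for 8 GPUs x 2 imgs/GPU):

# Linear scaling rule: keep lr proportional to the total batch size.
base_lr, base_batch = 0.02, 16            # 8 GPUs * 2 imgs/gpu
gpus, imgs_per_gpu = 1, 2
lr = base_lr * (gpus * imgs_per_gpu) / base_batch   # = 0.0025, i.e. 1/8
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)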

some of the problems

in the mmdet/ops/gcb/ContextBlock

line 28

self.planes = int(inplanes * ratio)

Should this * be changed to // ?
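
A quick check suggests the * is intentional, since ratio is a fraction (1/16 or 1/4 in the configs), not an integer divisor:

inplanes, ratio = 2048, 1. / 16.
planes = int(inplanes * ratio)   # 128 -- the multiplication already reduces
# inplanes // ratio would give 32768.0; // would only make sense if
# ratio were the integer 16 rather than the fraction 1/16.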

Add location based on yolov7

First of all, thank you very much for sharing. I want to load this module into yolov7. How should I modify the yolov7.yaml configuration file?

Zero mAP without mask

Hello.
Thank you for the nice work. I'm trying to use non-local nets (GCNet) in practice.
This config is DCN + GCNet r4 + scale augmentation, without mask, on a cascade Faster R-CNN.
mAP = 0.
Reading the log, it's strange that acc = 97.6621 from beginning to end; maybe it converged to the trivial solution of always predicting background.

# model settings
model = dict(
    type='CascadeRCNN',
    num_stages=3,
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch',
        ct=dict(
            insert_pos='after_1x1',
            ratio=1./4.,
        ),
        stage_with_ct=(False, True, True, True),
        dcn=dict(
            modulated=False,
            groups=32,
            deformable_groups=1,
            fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True),
        normalize=dict(type='SyncBN', frozen=False),
        norm_eval=False,
    ),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=81,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True)
    ])
# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
    keep_all_stages=False)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/COCO/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        img_scale=[(1600, 400), (1600, 1400)],
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = '/media/HD2/nsergievskiy/models/cascde_gcnet_r50'
load_from = None
resume_from = None
workflow = [('train', 1)]

20190514_202710.log

Config files

Hi,
I am trying to use "X-101-FPN | DCN Cascade Mask | GC(c3-c5, r4)", but I am getting errors. I think that's due to using a wrong config file. Would you let me know which config file corresponds to that model?

Thanks!

possible replacements for layernorm

Hello, the performance of layernorm might not be optimized on some inference platforms for edge devices.
Have you tried replacements for layernorm? Replacing it with BN might not be a good idea, so how about simply removing the layernorm layer?
Thanks!
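
For concreteness, a hedged sketch of the GC transform bottleneck with the LayerNorm dropped, as proposed above (whether accuracy holds up would need re-validation):

import torch.nn as nn

inplanes, planes = 256, 64   # e.g. ratio = 1/4
channel_add_conv = nn.Sequential(
    nn.Conv2d(inplanes, planes, kernel_size=1),
    # nn.LayerNorm([planes, 1, 1]),   # the GC block's LN normally sits here
    nn.ReLU(inplace=True),
    nn.Conv2d(planes, inplanes, kernel_size=1))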

GCNet with pretrained model on COCO detection?

Hello, I'm not sure whether GCNet in mmdetection uses an ImageNet pretrained model (specifically a GCNet + ResNet pretrained model), because the GCNet configs only use the standard 'torchvision://resnet50' in official mmdetection and 'modelzoo://resnet50' in this repo as the pretrained model, whereas in the original paper the authors first trained on ImageNet and then transferred to the COCO detection task.

Could you clarify whether mmdetection uses a (plain ResNet) pretrained model or a (ResNet+GC) pretrained model? If only the plain ResNet pretrained model is used, why is the improvement on COCO so large?

Why does the non-local block learn a query-independent attention map in the detection task?

Hi, can you explain why the non-local block learns a query-independent attention map in the object detection task? In segmentation, both OCNet and DANet have shown that a spatial attention module (the same as the non-local block) can learn attention maps concentrated on pixels of the same category as the query, rather than query-independent ones.

how to set the "lr" when using the "ap.SyncBatchNorm"?

hi @xvjiarui
I moved the code into maskrcnn-benchmark and ran the config mask_rcnn_r16_ct_c3-c5_r50_sbn_fpn_1x with these settings: 16 images / 8 GPUs, lr=0.02, using ap.SyncBatchNorm. It encounters NaN in the first few iterations, and it seems to use more GPU memory than mask_rcnn_r50_fpn_1x.
When I set the lr to 0.0025, training runs successfully. Can you give me some tips on how to set the lr when using ap.SyncBatchNorm?

What makes your attention different from DANet's?

First, your work brings a new view of NLNet and creates the GC block with much less computation than the non-local block, which is useful for my work.
But I have read a paper named Dual Attention Network for Scene Segmentation; the spatial attention it uses is similar to the non-local block. Its network architecture is shown below:

[figure: DANet architecture]

But the visualization of its attention maps is different from what you give in your paper:

[figure: DANet attention maps]

What causes the effect in that paper (Dual Attention Network for Scene Segmentation) to differ from yours?

Why is it so hard to find the GC Block?

Couldn't the code structure be made clearer? Isn't the GC attention module the main point of this paper? Why is it so hard to find? Couldn't it be released in a plainly visible, standalone form?

error: undefined symbol: __cudaPopCallConfiguration

After setting up the code, I tried to train the model and got this error:
anaconda3/lib/python3.7/site-packages/mmdet-0.6.0+a132aab-py3.7.egg/mmdet/ops/dcn/deform_conv_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

How to fix it?

Environment

  • PyTorch version 1.1.0
  • CUDA version 10.0

Simplified NL

Hi,

Thank you for your interesting papers and sharing the code.
I have a question about the paper, in particular the simplified NL module.

If I understand correctly, you use the self-attention in order to get a context feature, which then lets you reweight the different channels of your input, right?

So if I wanted to code it from your code, it would give:

context = self.spatial_pool(x)   # context: [N, C, 1, 1]
output = conv2d(context)         # 1x1 conv with C input and C output channels; output: [N, C, 1, 1]

return x + output

Is that right?

Could it be used in 3D data?

Have you verified your idea in a 3D model? Or, for 3D data, is the attention value at every point still the same?

How can I use gc block in resnet18?

Considering the limitations of hardware performance, I am trying to train a lightweight model, so I use resnet18.
But resnet18 uses BasicBlock instead of Bottleneck, and it seems you haven't implemented the GC block for BasicBlock.
If I want to use the GC block in BasicBlock, how should I design it?
Thank you!
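
Not something the repo provides, but a hedged sketch of one way to do it: insert the GC block after the second conv of BasicBlock, before the residual addition, mirroring the Bottleneck placement. ContextBlock2d here stands for whatever GC block implementation is used (e.g. the one in mmdet/ops/gcb):

import torch.nn as nn

class GCBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None,
                 gcb_ratio=1. / 4.):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(planes, planes, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.gcb = ContextBlock2d(planes, ratio=gcb_ratio)  # assumed GC impl
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.gcb(out)   # global context before the residual add
        return self.relu(out + identity)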

Questions about training

I found that the loss grows even when finetuning from an ImageNet pretrained model. I used ResNet-vd as the baseline arch and added GC blocks. I also double-checked the performance of the baseline (w/o GC block), which is normal (~79% top-1). But when training (finetuning) starts, the loss grows quickly (~6.xxx), and the val top-1 acc is below 10% after the first epoch. After 20 epochs of training, top-1 acc is near 55%. Is something wrong?

Question about the paper

I have a question: even if the attention map is the same at every position, why does what is finally learned become a channel vector rather than a (spatial) map?

different between your GCNet and GENet (gather-excite network) in NIPS2018

In your paper, your analysis of the non-local block is really impressive. But you finally conclude that the current attention mechanism consists of three steps:

  1. gather global context;
  2. transform the global context to capture channel-wise relationship;
  3. merge the global context to original feature

The above three steps are very similar to the formulation of the "Gather-Excite Network", which also divides the attention mechanism into a gathering and an exciting step.

May I ask what the difference is between your GCNet and the gather-excite network? Thanks!

What do the values of the transform module mean?

I used GCNet in my model and it works very well. But I want to know what the values of the transform module mean. Do they indicate the importance of each channel, with lower values meaning less important? Hoping for your answer, thanks!

Should GCNet always use pretrained backbone and finetune with GCNet?

Hi, I've noticed that the ImageNet training is a two-step process in which ResNet without GC is trained first and then finetuned further with the GC block.

You describe that this was to speed up the experiment. Similarly, Kinetics used pretrained ResNet on ImageNet to inflate the Slow-Only model.

Section 4.1 of the SGENet paper (https://arxiv.org/pdf/1905.09646.pdf) also notes that it was difficult to train GCNet from the beginning. Have you experimented with training GCNet from scratch? Is there a reason you chose not to train from scratch with the GC module attached?

gcnet performs not good on segmentation tasks.

I am working on a text semantic segmentation task.
I tried to introduce the GC block into resnet.
I ran experiments on both resnet18 and resnet50, and I found the GC block even made the model worse.
How should I solve this?
Thank you!

got an unexpected keyword argument 'ct'

python tools/train.py /media/ices18/Data/sms/competition/JD/model/GCNet/configs/cascade_mask_rcnn_r4_ct_dconv_c3-c5_x101_32x4d_sbn_fpn_1x.py
2019-05-07 16:18:55,643 - INFO - Distributed training: False
Traceback (most recent call last):
File "tools/train.py", line 90, in
main()
File "tools/train.py", line 77, in main
cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 51, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/detectors/cascade_rcnn.py", line 34, in init
self.backbone = builder.build_backbone(backbone)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 35, in build_backbone
return build(cfg, BACKBONES)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/backbones/resnext.py", line 177, in init
super(ResNeXt, self).init(**kwargs)
TypeError: init() got an unexpected keyword argument 'ct'
