xvjiarui / gcnet
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
License: Apache License 2.0
Hello, can I switch the model from Mask R-CNN to Faster R-CNN with a ResNet backbone just by changing this config?
# model settings
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        gcb=dict(ratio=1. / 4., ),
        stage_with_gcb=(False, True, True, True),
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the branches of the OpenMMLab 2.0 repos:
Repo | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch
---|---|---
MMEngine | | 0.x
MMCV | 1.x | 2.x
MMDetection | 0.x, 1.x, 2.x | 3.x
MMAction2 | 0.x | 1.x
MMClassification | 0.x | 1.x
MMSegmentation | 0.x | 1.x
MMDetection3D | 0.x | 1.x
MMEditing | 0.x | 1.x
MMPose | 0.x | 1.x
MMDeploy | 0.x | 1.x
MMTracking | 0.x | 1.x
MMOCR | 0.x | 1.x
MMRazor | 0.x | 1.x
MMSelfSup | 0.x | 1.x
MMRotate | 1.x | 1.x
MMYOLO | | 0.x
Attention: please create a new virtual environment for OpenMMLab 2.0.
I have 4 Tesla V100 GPUs. Can you recommend a script to reproduce the results?
Hi, thanks for releasing your code.
Do you have plans to release your training code for the Kinetics dataset anytime soon?
Thanks for sharing your excellent work! I haven't run the code, but I am curious about the implementation of GCNet. Is the last conv layer before the addition (Wv2) initialized to zero so that it does not affect the initial behaviour of the original backbone? The Non-local network does this by setting the BN scale to zero.
Hello, thank you very much for your work. I noticed that your GCNet is 2D, but the non-local block covers 1D to 3D. Do you have 1D code?
Hello, I was trying to run GCNet from the MMDetection repository. I want to train GCNet on my custom dataset, in which each image is 800x800 and all the annotations are in proper COCO format. However, my annotations contain bounding boxes only and nothing else.
I set the respective paths and then ran the following command:
./dist_train.sh ../configs/gcnet/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco.py 2
Training started. Here is a small part of my log file from when I was trying to reproduce the error:
'''
loading annotations into memory...
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
Done (t=0.78s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:17:35,421 - mmdet - INFO - Start running, host: user@c6e7e60caee9, work_dir: /mnt/user
2.log
/mmdetection/tools/work_dirs/mask_rcnn_r101_fpn_r4_gcb_c3-c5_1x_coco
2020-05-23 03:17:35,421 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-05-23 03:18:11,654 - mmdet - INFO - Epoch [1][50/1730] lr: 0.00198, eta: 4:09:57, time: 0.724, data_time: 0.267, memory: 4932, loss_rpn_cls: 0.5057, loss_rpn_bbox: 0.2989, loss_cls: 1.0971, acc: 85.3574, loss_bbox: 0.0454, loss_mask: 0.4885, loss: 2.4356
2020-05-23 03:18:39,866 - mmdet - INFO - Epoch [1][100/1730] lr: 0.00398, eta: 3:41:52, time: 0.565, data_time: 0.101, memory: 4932, loss_rpn_cls: 0.2872, loss_rpn_bbox: 0.2404, loss_cls: 0.4054, acc: 93.0947, loss_bbox: 0.1764, loss_mask: 0.3700, loss: 1.4794
2020-05-23 03:19:08,571 - mmdet - INFO - Epoch [1][150/1730] lr: 0.00597, eta: 3:33:17, time: 0.574, data_time: 0.101, memory: 5062, loss_rpn_cls: 0.1821, loss_rpn_bbox: 0.2431, loss_cls: 0.4640, acc: 90.5537, loss_bbox: 0.2871, loss_mask: 0.3372, loss: 1.5135
2020-05-23 03:19:37,701 - mmdet - INFO - Epoch [1][200/1730] lr: 0.00797, eta: 3:29:28, time: 0.582, data_time: 0.104, memory: 5271, loss_rpn_cls: 0.1201, loss_rpn_bbox: 0.2288, loss_cls: 0.4637, acc: 87.8535, loss_bbox: 0.3915, loss_mask: 0.3074, loss: 1.5114
2020-05-23 03:20:07,551 - mmdet - INFO - Epoch [1][250/1730] lr: 0.00997, eta: 3:27:59, time: 0.597, data_time: 0.105, memory: 5422, loss_rpn_cls: 0.1004, loss_rpn_bbox: 0.2055, loss_cls: 0.3916, acc: 87.1963, loss_bbox: 0.5066, loss_mask: 0.2894, loss: 1.4935
2020-05-23 03:20:37,901 - mmdet - INFO - Epoch [1][300/1730] lr: 0.01197, eta: 3:27:24, time: 0.607, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0506, loss_rpn_bbox: 0.1710, loss_cls: 0.3173, acc: 88.8096, loss_bbox: 0.5628, loss_mask: 0.2750, loss: 1.3767
2020-05-23 03:21:08,620 - mmdet - INFO - Epoch [1][350/1730] lr: 0.01397, eta: 3:27:11, time: 0.614, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0498, loss_rpn_bbox: 0.1539, loss_cls: 0.2873, acc: 89.3711, loss_bbox: 0.5508, loss_mask: 0.2680, loss: 1.3098
2020-05-23 03:21:39,345 - mmdet - INFO - Epoch [1][400/1730] lr: 0.01596, eta: 3:26:55, time: 0.615, data_time: 0.104, memory: 5422, loss_rpn_cls: 0.0801, loss_rpn_bbox: 0.1680, loss_cls: 0.2922, acc: 89.8096, loss_bbox: 0.5162, loss_mask: 0.2573, loss: 1.3137
2020-05-23 03:22:10,246 - mmdet - INFO - Epoch [1][450/1730] lr: 0.01796, eta: 3:26:43, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0481, loss_rpn_bbox: 0.1469, loss_cls: 0.2658, acc: 90.3389, loss_bbox: 0.5173, loss_mask: 0.2487, loss: 1.2268
2020-05-23 03:22:41,038 - mmdet - INFO - Epoch [1][500/1730] lr: 0.01996, eta: 3:26:23, time: 0.616, data_time: 0.106, memory: 5422, loss_rpn_cls: 0.0358, loss_rpn_bbox: 0.1340, loss_cls: 0.2562, acc: 90.1816, loss_bbox: 0.5243, loss_mask: 0.2481, loss: 1.1984
2020-05-23 03:23:11,924 - mmdet - INFO - Epoch [1][550/1730] lr: 0.02000, eta: 3:26:04, time: 0.618, data_time: 0.107, memory: 5422, loss_rpn_cls: 0.0413, loss_rpn_bbox: 0.1406, loss_cls: 0.2616, acc: 90.0938, loss_bbox: 0.5202, loss_mask: 0.2372, loss: 1.2007
2020-05-23 03:23:42,898 - mmdet - INFO - Epoch [1][600/1730] lr: 0.02000, eta: 3:25:46, time: 0.619, data_time: 0.109, memory: 5422, loss_rpn_cls: 0.0717, loss_rpn_bbox: 0.1463, loss_cls: 0.2397, acc: 90.8945, loss_bbox: 0.4870, loss_mask: 0.2516, loss: 1.1963
2020-05-23 03:24:14,103 - mmdet - INFO - Epoch [1][650/1730] lr: 0.02000, eta: 3:25:34, time: 0.624, data_time: 0.108, memory: 5422, loss_rpn_cls: 0.0434, loss_rpn_bbox: 0.1369, loss_cls: 0.2699, acc: 89.6836, loss_bbox: 0.5065, loss_mask: 0.2456, loss: 1.2023
'''
The above log is from when I was trying to reproduce the error. When I originally got the error, AP and AR were -1 for all values after the first epoch.
Note that the bbox loss is not really decreasing, and the learning rate quickly rose to 0.02.
Can someone please explain what the issue is here?
I am training on two GPUs. I also wish to include validation at the end of each epoch.
ENVIRONMENT:
Python 3.6.9
CUDA 10.1
Using venv with Ubuntu 18.04.
Thanks!!
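On the learning-rate observation: the values in the log are consistent with an ordinary linear warmup. Below is a minimal sketch, assuming a base lr of 0.02, 500 warmup iterations and a warmup ratio of 0.001. These three values are inferred from the logged numbers, not read from the config, and the function name is only illustrative.

```python
def linear_warmup_lr(cur_iter, base_lr=0.02, warmup_iters=500, warmup_ratio=0.001):
    """Learning rate at zero-based iteration cur_iter under linear warmup.

    All defaults are assumptions inferred from the log above.
    """
    if cur_iter >= warmup_iters:
        return base_lr
    # k shrinks linearly from (1 - warmup_ratio) to 0 over the warmup period.
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


# The log prints a one-based counter every 50 iterations,
# so logged iteration 50 corresponds to cur_iter = 49.
for logged_iter in (50, 100, 500, 550):
    print(logged_iter, round(linear_warmup_lr(logged_iter - 1), 5))
```

Under these assumptions the sketch gives 0.00198, 0.00398 and 0.01996 at logged iterations 50, 100 and 500, which matches the log, so the lr reaching 0.02 after roughly 500 iterations looks like expected warmup behaviour. It does not by itself explain AP = AR = -1.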
Did anyone use GCNet on optical flow features? Does it work on optical flow?
I am confused about the following sentence (last paragraph of Section 3.2 in the paper):
But the values of cosine distance in ‘output’ are quite small, indicating that global context features modeled by the non-local block are almost the same for different query positions.
In my opinion, a smaller distance only shows that the feature vectors at arbitrary positions are closer to each other than the input feature vectors are.
Why does it show that the global context features are the same for different locations?
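One way to read the quoted sentence is that the distance is measured between the vectors at different positions, not between output and input. Here is a small sketch, assuming the distance is (1 - cos) / 2 averaged over all pairs of positions; that is my reading of the paper's definition, and the toy vectors are made up.

```python
import math


def cosine_distance(u, v):
    """(1 - cos(u, v)) / 2, which lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return (1 - dot / (norm_u * norm_v)) / 2


def average_pairwise_distance(vectors):
    """Mean cosine distance over all ordered pairs of positions."""
    n = len(vectors)
    total = sum(cosine_distance(u, v) for u in vectors for v in vectors)
    return total / (n * n)


# Toy data: one vector per query position.
varied = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
nearly_same = [[1.0, 0.50], [1.0, 0.51], [1.0, 0.49]]

print(average_pairwise_distance(varied))       # large: positions disagree
print(average_pairwise_distance(nearly_same))  # close to 0: positions agree
```

If the context vector produced for every query position is almost the same vector, every pairwise term is close to zero, so the average is close to zero as well. A small average therefore says the per-position vectors barely differ from one another.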
How much does performance improve with SyncBN compared to without it?
Hi, I use GCNet with the 'resnet50-fpn + c3~c5 r16' setting, but the runtime increases by about 15 ms. Could you tell me the reason?
import torch
from torch import nn
import torch.nn.functional as F


def kaiming_init(module,
                 a=0,
                 mode='fan_out',
                 nonlinearity='relu',
                 bias=0,
                 distribution='normal'):
    assert distribution in ['uniform', 'normal']
    if distribution == 'uniform':
        nn.init.kaiming_uniform_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    else:
        nn.init.kaiming_normal_(
            module.weight, a=a, mode=mode, nonlinearity=nonlinearity)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def constant_init(module, val, bias=0):
    nn.init.constant_(module.weight, val)
    if hasattr(module, 'bias') and module.bias is not None:
        nn.init.constant_(module.bias, bias)


def last_zero_init(m):
    # Zero the last layer so the block starts out as an identity mapping.
    if isinstance(m, nn.Sequential):
        constant_init(m[-1], val=0)
        m[-1].inited = True
    else:
        constant_init(m, val=0)
        m.inited = True


class ContextBlock2d(nn.Module):

    def __init__(self, inplanes, ratio=1. / 16.):
        super(ContextBlock2d, self).__init__()
        self.inplanes = inplanes
        self.planes = int(self.inplanes * ratio)
        # Bottleneck transform: C -> C * ratio -> C
        self.channel_add_conv = nn.Sequential(
            nn.Conv2d(self.inplanes, self.planes, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(self.planes, self.inplanes, kernel_size=1)
        )
        self.reset_parameters()

    def reset_parameters(self):
        last_zero_init(self.channel_add_conv)

    def spatial_pool(self, x):
        batch, channel, height, width = x.size()
        # Global average pooling: [N, C, 1, 1]
        context = F.avg_pool2d(x, (height, width))
        return context

    def forward(self, x):
        # [N, C, 1, 1]
        context = self.spatial_pool(x)
        # [N, C, 1, 1]
        channel_add_term = self.channel_add_conv(context)
        # Broadcast-add the context term to every spatial position.
        out = x + channel_add_term
        return out
Hi, Thanks for your great work!
In your paper, attention maps of particular query points are shown. Could you share this visualization code?
More specifically, did you implement an interactive interface to do the visualization?
Many thanks again.
Using the best setting of GC-ResNet50 and training it from scratch on ImageNet, I found that training gets stuck at a high loss in the early epochs before the training loss begins to decline normally. As a result, the final accuracy is much lower than that of the original ResNet50. Note that one difference from the original paper is that the GC modules are embedded in each bottleneck exactly as SE does, for a fair comparison.
Does anyone have the same problem?
This may be the case since the authors report the ImageNet results via a finetuning setting, which is not very common when validating models on ImageNet Benchmarks. At least all other modules (SE, SK, BAM, CBAM, AA) are following a training-from-scratch setting.
When taking video input, the feature maps in each layer have four dimensions, i.e., THW*C. Are the attention maps still query-independent? Could you please give more details? Thanks a lot.
The vector that represents position is very confusing. How is it expressed in the code?
Hey,
I am trying to train on custom data using GCNet. I have the data in COCO format. I want to know the exact procedure to train on it, because just running the train.sh script gives me an index error.
I have been changing the config file to make it work, but have had no luck so far. Please let me know which fields should be changed to make it work.
Thanks.
Suppose I want to benchmark GCNet for object detection only on my custom dataset. Do I need masks for training, or are bounding boxes enough?
I want to train GCNet on ImageNet, but after the linear warmup finished, the loss hardly drops. I set the learning rate to 0.1 and T_max to 10 with CosineAnnealingLR in PyTorch. Is anything wrong? Thanks.
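To see what T_max = 10 implies, here is a small sketch of the closed-form cosine annealing expression, assuming the scheduler follows that closed form with eta_min = 0 and that `step` counts calls to the scheduler. Whether those calls happen per epoch or per iteration is not stated in the question.

```python
import math


def cosine_annealing_lr(step, base_lr=0.1, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing; `step` counts scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))


for step in (0, 5, 10, 15, 20):
    print(step, round(cosine_annealing_lr(step), 4))
```

With these values the lr falls from 0.1 to 0 by step 10 and climbs back to 0.1 by step 20. If the run is longer than 10 scheduler steps, or the scheduler is stepped every iteration, the lr would oscillate instead of decaying once. Setting T_max to the total number of scheduler steps is a common choice worth checking.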
If I only have a single GPU, how much will performance drop without SyncBatchNorm? Simply put, can training on a single GPU achieve similar performance, within ±1%, or not?
Should I reduce the learning rate to 1/8 of the original lr and train for more epochs to get results close to the original implementation?
thanks!!!
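The 1/8 figure corresponds to the common linear scaling heuristic, in which the lr is scaled in proportion to the total batch size. A minimal sketch, assuming the reference setting quoted elsewhere on this page (lr=0.02 for 16 images on 8 GPUs) and 2 images on a single GPU. `scaled_lr` is an illustrative helper, not a function from the repository.

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling heuristic: lr proportional to the total batch size."""
    return base_lr * batch_size / base_batch_size


# Reference: lr=0.02 for 16 images in total (8 GPUs x 2 images per GPU).
single_gpu_lr = scaled_lr(0.02, base_batch_size=16, batch_size=2)
print(single_gpu_lr)  # about 0.0025
```

This only addresses the learning rate. It says nothing about how much accuracy is lost without SyncBatchNorm, which has to be measured.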
In mmdet/ops/gcb/ContextBlock, line 28:
self.planes = int(inplanes * ratio)
Should this * be changed to //?
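A quick check of the two operators, assuming `ratio` is a fraction such as 1/16 or 1/4, as in the configs and code quoted on this page. The integer `reduction` variable at the end is a hypothetical alternative convention, not something the repository uses.

```python
inplanes = 256
ratio = 1. / 16.  # fractional ratio, as in the configs on this page

print(int(inplanes * ratio))  # 16: the intended bottleneck width
print(inplanes // ratio)      # 4096.0: floor division by a fraction enlarges it

# Floor division only gives the same width if the ratio is stored as an
# integer reduction factor instead (hypothetical alternative convention).
reduction = 16
print(inplanes // reduction)  # 16
```

So with a fractional ratio, the multiplication followed by `int(...)` is what yields a reduced channel count.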
First of all, thank you very much for sharing. I want to load this module into yolov7. How can I modify the yolov7.yaml configuration file?
Hello.
Thank you for the nice work. I am trying to use non-local nets (GCNet) in practice.
This config is DCN + GCNet r4 + scale augmentation, without mask: Faster R-CNN (cascade).
mAP = 0
I read the log and, strangely, acc = 97.6621 from beginning to end. Maybe it is a trivial solution that always predicts 0.
# model settings
model = dict(
type='CascadeRCNN',
num_stages=3,
pretrained='modelzoo://resnet50',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
style='pytorch',
ct=dict(
insert_pos='after_1x1',
ratio=1./4.,
),
stage_with_ct=(False, True, True, True),
dcn=dict(
modulated=False,
groups=32,
deformable_groups=1,
fallback_on_stride=False),
stage_with_dcn=(False, True, True, True),
normalize=dict(type='SyncBN', frozen=False),
norm_eval=False,
),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_scales=[8],
anchor_ratios=[0.5, 1.0, 2.0],
anchor_strides=[4, 8, 16, 32, 64],
target_means=[.0, .0, .0, .0],
target_stds=[1.0, 1.0, 1.0, 1.0],
use_sigmoid_cls=True),
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=[
dict(
type='SharedFCBBoxHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.1, 0.1, 0.2, 0.2],
reg_class_agnostic=True),
dict(
type='SharedFCBBoxHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.05, 0.05, 0.1, 0.1],
reg_class_agnostic=True),
dict(
type='SharedFCBBoxHead',
num_fcs=2,
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=81,
target_means=[0., 0., 0., 0.],
target_stds=[0.033, 0.033, 0.067, 0.067],
reg_class_agnostic=True)
])
# model training and testing settings
train_cfg = dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
smoothl1_beta=1 / 9.0,
debug=False),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.6,
neg_iou_thr=0.6,
min_pos_iou=0.6,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.7,
min_pos_iou=0.7,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)
],
stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
rpn=dict(
nms_across_levels=False,
nms_pre=2000,
nms_post=2000,
max_num=2000,
nms_thr=0.7,
min_bbox_size=0),
rcnn=dict(
score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100),
keep_all_stages=False)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/COCO/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
imgs_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
img_scale=[(1600, 400), (1600, 1400)],
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0.5,
with_mask=True,
with_crowd=True,
with_label=True),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0,
with_mask=True,
with_crowd=True,
with_label=True),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
img_scale=(1333, 800),
img_norm_cfg=img_norm_cfg,
size_divisor=32,
flip_ratio=0,
with_mask=False,
with_label=False,
test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook')
])
# yapf:enable
# runtime settings
total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = '/media/HD2/nsergievskiy/models/cascde_gcnet_r50'
load_from = None
resume_from = None
workflow = [('train', 1)]
I only have class labels and bounding-box labels in my training dataset. Can I use this framework for the object detection task?
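For box-only data, the usual route is a detector without a mask branch and a dataset config that does not request masks. The sketch below shows the idea on a plain dict. `to_bbox_only` is a hypothetical helper, and the key names (`mask_roi_extractor`, `mask_head`, `with_mask`) are assumed to follow the old-style configs quoted on this page, so they should be checked against the mmdetection version in use.

```python
import copy


def to_bbox_only(cfg):
    """Return a copy of an old-style config dict without mask supervision.

    Hypothetical helper; key names are assumptions based on old-style configs.
    """
    cfg = copy.deepcopy(cfg)
    model = cfg["model"]
    # A box-only detector has no use for the mask branch.
    for key in ("mask_roi_extractor", "mask_head"):
        model.pop(key, None)
    if model.get("type") == "MaskRCNN":
        model["type"] = "FasterRCNN"
    # Stop the dataset from loading mask annotations.
    for split in ("train", "val", "test"):
        if split in cfg.get("data", {}):
            cfg["data"][split]["with_mask"] = False
    return cfg


# Toy config with only the fields relevant to this illustration.
mask_cfg = {
    "model": {"type": "MaskRCNN", "bbox_head": {}, "mask_head": {}},
    "data": {"train": {"with_mask": True}, "val": {"with_mask": True}},
}
bbox_cfg = to_bbox_only(mask_cfg)
print(bbox_cfg["model"]["type"], sorted(bbox_cfg["model"]))
```

The original dict is left untouched, which makes it easy to compare the two variants side by side.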
Hi,
I am trying to use "X-101-FPN | DCN Cascade Mask | GC(c3-c5, r4)", but I am getting errors. I think that's due to using a wrong config file. Would you let me know which config file corresponds to that model?
Thanks!
Hello, LayerNorm might not be well optimized on some inference platforms for edge devices.
Have you tried other replacements for LayerNorm? Replacing it with BN might not be a good idea, so how about simply removing the LayerNorm layer?
Thanks!
Hi @xvjiarui,
Thanks for sharing your code. Can the GC block be used to refine networks for regression tasks, for example hand joint coordinate estimation?
Where is the code of the "Global context (GC) block"?
Hello, I'm not sure whether GCNet in mmdetection uses an ImageNet-pretrained model (specifically a GCNet + ResNet pretrained model), because in the GCNet configs I only see the standard "torchvision://resnet50" in official mmdetection and "modelzoo://resnet50" in this repo as the pretrained model. In the original paper, however, the authors first trained on ImageNet and then transferred to the COCO detection task.
Could you explain whether mmdetection uses the plain ResNet pretrained model or a ResNet+GC pretrained model? If only the ResNet pretrained model is used, why is the improvement on COCO so large?
Thanks!
Hi, can you explain why non-local block learns a query-independent attention map in object detection task? Since in segmentation task, both OCNet and DANet have shown that spatial attention module (same as non-local block) can learn attention maps that concentrated on pixels with the same category as the query one, rather than query-independent.
Hi @xvjiarui,
I moved the code into maskrcnn-benchmark and ran the config mask_rcnn_r16_ct_c3-c5_r50_sbn_fpn_1x with these settings: 16 images / 8 GPUs, lr=0.02, and using ap.SyncBatchNorm. It encounters NaN in the first few iterations, and it seems to use more GPU memory than mask_rcnn_r50_fpn_1x.
When I set lr to 0.0025, the training runs successfully. Can you give me some tips on how to set the lr when using ap.SyncBatchNorm?
First, your work brings a new view of NLNet and creates the GC block with much less computation than the non-local block, which is useful for my work.
However, I have read a paper named Dual Attention Network for Scene Segmentation. The spatial attention used in that paper is similar to the non-local block; the network architecture is as below:
But the visualization of its attention maps is different from what you gave in your paper:
What causes the results of that paper (Dual Attention Network for Scene Segmentation) to differ from yours?
Hello, in your paper you first visualize the attention maps for different query positions. Could you give me some help? I do not know how to visualize the attention maps for different query positions.
thank you!
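Could the code structure be made clearer? Isn't the GC attention module the focus of this paper? Why is it so hard to find? Can't it simply be laid out plainly?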
After setting up the code, I tried to train the model and got this error:
anaconda3/lib/python3.7/site-packages/mmdet-0.6.0+a132aab-py3.7.egg/mmdet/ops/dcn/deform_conv_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration
How to fix it?
Environment
Hi,
Thank you for your interesting papers and sharing the code.
I have a question about the paper, in particular the simplified NL module.
If I understand correctly, you use self-attention to obtain features that are then used to weight the different channels of the input, right?
So if I were to code it based on your code, it would be:
context = self.spatial_pool(x) # dim context: NxCx1x1
output = conv2D(context) # conv1x1 with C input channels and C output channels , dim output NxCx1x1
return x + output
is that right ?
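As a rough illustration of the block described in this question, here is a dependency-free sketch of a simplified non-local computation: one softmax attention map shared by all positions, a single context vector, a transform, and a broadcast addition. Plain lists stand in for tensors, and `w_k` and `w_v` are illustrative names for the attention and transform weights, not identifiers from the repository.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simplified_nl_block(x, w_k, w_v):
    """x: list of N position vectors, each of length C.
    w_k: length-C weights giving one attention logit per position.
    w_v: C x C matrix applied to the pooled context vector.
    """
    channels = len(x[0])
    # One logit per position; the resulting map is shared by every query.
    logits = [sum(wk * xc for wk, xc in zip(w_k, pos)) for pos in x]
    attn = softmax(logits)
    # Global context: attention-weighted sum over positions, a single C-vector.
    context = [sum(a * pos[c] for a, pos in zip(attn, x)) for c in range(channels)]
    # Transform the context vector.
    delta = [sum(w_v[o][c] * context[c] for c in range(channels)) for o in range(channels)]
    # Broadcast-add the same vector to every position.
    return [[pos[c] + delta[c] for c in range(channels)] for pos in x]


x = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

print(simplified_nl_block(x, [0.0, 0.0], identity))  # uniform attention: adds the mean
print(simplified_nl_block(x, [0.5, -0.5], zeros))    # zero transform: returns x unchanged
```

Because the attention map does not depend on the query, pooling collapses the spatial dimension into one channel vector, and that vector is added at every position. With a zero transform the block is an identity mapping, which is the effect a zero-initialized last layer has at the start of training.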
Have you verified your idea on a 3D model? In 3D data, is the attention value of every point still the same?
Considering the limitations of hardware performance, I tried to train a lightweight model.
So I use ResNet18.
But ResNet18 uses BasicBlock instead of Bottleneck, and it seems that you haven't implemented it for BasicBlock.
If I want to use the GC block in BasicBlock, how should I design it?
Thank you !
I could not find corresponding codes in this repository.
Hello,
Is it possible to view Attention maps at different query positions ?
If yes, how can we do it?
I found that even when finetuning from an ImageNet-pretrained model, the loss goes up. I used ResNet-vd as the baseline architecture and added the GC block. I also double-checked the performance of the baseline (w/o GC block), which is normal (~79% top-1). But when training (finetuning) starts, the loss rises quickly (~6.xxx) and the val top-1 accuracy is below 10% after the first epoch. After 20 epochs of training, top-1 accuracy is near 55%. Is something wrong?
I have a question: even if the attention map is the same for every point, why does what is finally learned become a channel vector rather than a map?
In your paper, your analysis of the non-local block is really impressive. But in the end you summarize the current attention mechanism as three steps:
The above three steps are very similar to the formulation of "Gather-Excite Network", which also divides the attention mechanism into a gathering step and an exciting step.
May I ask what the difference is between your GCNet and the gather-excite network? Thanks!
I used GCNet in my model and it works very well. But I want to know what the values of the transform module mean. Do they indicate the importance of each channel, so that the lower the value, the less important the channel? Hoping for your answer, thanks!
Hi, I've noticed that ImageNet training is a two-step process in which ResNet without GC is trained first and then finetuned with the GC block.
You describe that this was to speed up the experiment. Similarly, Kinetics used a ResNet pretrained on ImageNet to inflate the Slow-Only model.
Section 4.1 of the SGENet paper (https://arxiv.org/pdf/1905.09646.pdf) also notes that it was difficult to train GCNet from the beginning. Have you experimented with training GCNet from scratch? Is there a reason why you chose not to train from scratch with the GC module attached?
I am working on a text semantic segmentation task.
I tried to introduce the GC block into ResNet.
I performed experiments on both ResNet18 and ResNet50, and I found that the GC block even made the model worse.
How should I solve this?
Thank you!
python tools/train.py /media/ices18/Data/sms/competition/JD/model/GCNet/configs/cascade_mask_rcnn_r4_ct_dconv_c3-c5_x101_32x4d_sbn_fpn_1x.py
2019-05-07 16:18:55,643 - INFO - Distributed training: False
Traceback (most recent call last):
File "tools/train.py", line 90, in <module>
main()
File "tools/train.py", line 77, in main
cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 51, in build_detector
return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/detectors/cascade_rcnn.py", line 34, in __init__
self.backbone = builder.build_backbone(backbone)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 35, in build_backbone
return build(cfg, BACKBONES)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 31, in build
return _build_module(cfg, registry, default_args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/builder.py", line 23, in _build_module
return obj_type(**args)
File "/home/ices18/.local/lib/python3.7/site-packages/mmdet-0.6rc0+21a6d41-py3.7.egg/mmdet/models/backbones/resnext.py", line 177, in __init__
super(ResNeXt, self).__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'ct'
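The traceback shows the backbone being loaded from an mmdet egg under site-packages, so one possible cause is that the installed package is a build whose backbones do not know the `ct` option. The sketch below reproduces that kind of failure with a stand-in class and shows a way to check whether a constructor accepts a given keyword. `Backbone` and `accepts_argument` are illustrative names, not part of mmdetection.

```python
import inspect


class Backbone:
    """Stand-in for a backbone whose constructor predates the `ct` option."""

    def __init__(self, depth, style="pytorch"):
        self.depth = depth
        self.style = style


def accepts_argument(cls, name):
    """True if cls.__init__ declares `name` or takes arbitrary **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    takes_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    return name in params or takes_kwargs


print(accepts_argument(Backbone, "ct"))

try:
    # Same pattern as the config: an extra keyword forwarded to the constructor.
    Backbone(depth=50, ct=dict(ratio=1. / 4.))
except TypeError as err:
    print(err)
```

Running the same check against the backbone class that Python actually imports (for example by printing the module's `__file__`) would show whether the GCNet version of the code or another mmdet installation is being picked up.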