nas_fpn_tensorflow's Introduction

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Abstract

This repo is based on FPN, and completed by YangXue.

Train on COCO train2017 and test on COCO val2017 (coco minival).

COCO

| Model | Backbone | Train Schedule | GPU | Image/GPU | FP16 | Box AP |
|---|---|---|---|---|---|---|
| Faster (Face++ & Detectron) | R50v1-FPN | 1X | 8X TITAN Xp | 2 | no | 36.4 |
| Faster (SimpleDet) | R50v1-FPN | 1X | 8X 1080Ti | 2 | no | 36.5 |
| Faster (ours) | R50v1-FPN | 1X | 1X TITAN Xp | 1 | no | 36.1 |
| Faster (ours) | R50v1-FPN | 1X | 4X TITAN Xp | 1 | no | 36.1 |

| Model | Backbone | Pyramid method | Train Schedule | GPU | Image/GPU | Stacks | Dimension | 3x3 relu | Box AP |
|---|---|---|---|---|---|---|---|---|---|
| Faster (ours) | R50v1 | FPN | 1X | 4X TITAN Xp | 1 | 0 | 256 | no | 36.1 |
| Faster (ours) | R50v1 | FPN | 1X | 8X 2080Ti | 1 | 3 | 256 | yes | 35.8 |
| Faster (ours) | R50v1 | NAS-FPN | 1X | 8X 2080Ti | 1 | 3 | 256 | yes | 37.9 |
| Faster (ours) | R50v1 | NAS-FPN | 1X | 8X 2080Ti | 1 | 7 | 256 | yes | 38.1 |
| Faster (ours) | R50v1 | NAS-FPN | 1X | 8X 2080Ti | 1 | 7 | 384 | yes | 38.9 |

My Development Environment

1、python3.5 (Anaconda recommended)
2、cuda9.0 (If you want to use cuda8, please set CUDA9 = False in the cfgs.py file.)
3、opencv(cv2)
4、tfplot (optional)
5、tensorflow == 1.12

Download Model

Pretrain weights

1、Please download the resnet50_v1 and resnet101_v1 models pre-trained on ImageNet and put them in data/pretrained_weights.
2、(Recommended) Alternatively, you can use a better backbone; refer to gluon2TF.

Trained weights

Select a configuration file in the folder ($PATH_ROOT/libs/configs/) and copy its contents into cfgs.py, then download the corresponding weights.

Others

1、COCO dataset related

Compile

cd $PATH_ROOT/libs/box_utils/cython_utils
python setup.py build_ext --inplace

Train

1、If you want to train your own data, please note:

(1) Modify parameters (such as CLASS_NUM, DATASET_NAME, VERSION, etc.) in $PATH_ROOT/libs/configs/cfgs.py
(2) Add category information in $PATH_ROOT/libs/label_name_dict/lable_dict.py     
(3) Add data_name to $PATH_ROOT/data/io/read_tfrecord.py 

2、make tfrecord

cd $PATH_ROOT/data/io/
python convert_data_to_tfrecord_coco.py --VOC_dir='/PATH/TO/JSON/FILE/' \
                                        --save_name='train' \
                                        --dataset='coco'

3、multi-gpu train

cd $PATH_ROOT/tools
python multi_gpu_train.py
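
For step 1 above, the following is a minimal sketch of the kind of values involved, assuming a hypothetical two-class dataset called `my_dataset`. The names `CLASS_NUM`, `DATASET_NAME`, and `VERSION` come from the list above; `NAME_LABEL_MAP`, `LABEL_NAME_MAP`, `CLASS_NAMES`, and the class names themselves are assumptions for illustration and may not match the actual layout of cfgs.py or lable_dict.py.

```python
# Hypothetical values for illustration; adapt them to your own dataset.
DATASET_NAME = 'my_dataset'        # assumed name, also to be registered in read_tfrecord.py
VERSION = 'my_dataset_nas_fpn_v1'  # assumed tag identifying this experiment

# Assumed layout of the category table: background takes label 0,
# foreground classes are numbered from 1.
CLASS_NAMES = ['cat', 'dog']
NAME_LABEL_MAP = {'back_ground': 0}
for index, name in enumerate(CLASS_NAMES, start=1):
    NAME_LABEL_MAP[name] = index

# Number of foreground classes (background excluded in this sketch).
CLASS_NUM = len(CLASS_NAMES)

# Reverse lookup from label id to name, useful when drawing detections.
LABEL_NAME_MAP = {label: name for name, label in NAME_LABEL_MAP.items()}
```

Whether `CLASS_NUM` should include the background class depends on how the repo uses it, so check the shipped configuration files before relying on this convention.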

Eval

cd $PATH_ROOT/tools
python eval_coco.py --eval_data='/PATH/TO/IMAGES/' \
                    --eval_gt='/PATH/TO/TEST/ANNOTATION/' \
                    --GPU='0'

Tensorboard

cd $PATH_ROOT/output/summary
tensorboard --logdir=.

Reference

1、https://github.com/endernewton/tf-faster-rcnn
2、https://github.com/zengarden/light_head_rcnn
3、https://github.com/tensorflow/models/tree/master/research/object_detection
4、https://github.com/CharlesShang/FastMaskRCNN
5、https://github.com/matterport/Mask_RCNN
6、https://github.com/msracver/Deformable-ConvNets

nas_fpn_tensorflow's Issues

mAP question

Training on my own dataset achieves 0.95 mAP with FPN. Why does NAS-FPN only reach 0.3?

Questions

Please clarify whether this is based on ResNet or NAS.

  1. What is the accuracy difference between nas_fpn and resnet_fpn?
  2. Can this repo be used for rotated output?

Link failure

Select a configuration file in the folder ($PATH_ROOT/libs/configs/) and copy its contents into cfgs.py, then download the corresponding weights.
Hello, the link no longer works. Could you share it again? Thank you!

Is the code really slow?

I rebuilt the code following the repo and found that nas_fpn is much slower than the standard FPN method (about 10 times slower). Has anyone else reached a similar conclusion?

What does the NUM_FPN parameter mean?

In the code:
def resnet_base(img_batch, scope_name, is_training=True):
    if scope_name.endswith('b'):
        get_resnet_fn = get_resnet_v1_b_base
    elif scope_name.endswith('d'):
        get_resnet_fn = get_resnet_v1_d_base
    else:
        raise ValueError("scope Name erro....")

    _, feature_dict = get_resnet_fn(input_x=img_batch, scope=scope_name,
                                    bottleneck_nums=BottleNeck_NUM_DICT[scope_name],
                                    base_channels=BASE_CHANNELS_DICT[scope_name],
                                    is_training=is_training, freeze_norm=True,
                                    freeze=cfgs.FREEZE_BLOCKS)

    pyramid_dict = {}
    with tf.variable_scope('build_pyramid'):
        with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(cfgs.WEIGHT_DECAY),
                            activation_fn=None, normalizer_fn=None):

            P5 = slim.conv2d(feature_dict['C5'],
                             num_outputs=cfgs.FPN_CHANNEL,
                             kernel_size=[1, 1],
                             stride=1, scope='build_P5')

            pyramid_dict['P5'] = P5

            for level in range(4, 1, -1):  # build [P4, P3, P2]

                pyramid_dict['P%d' % level] = fusion_two_layer(C_i=feature_dict["C%d" % level],
                                                               P_j=pyramid_dict["P%d" % (level + 1)],
                                                               scope='build_P%d' % level)
            for level in range(5, 1, -1):
                pyramid_dict['P%d' % level] = slim.conv2d(pyramid_dict['P%d' % level],
                                                          num_outputs=cfgs.FPN_CHANNEL, kernel_size=[3, 3],
                                                          padding="SAME", stride=1, scope="fuse_P%d" % level,
                                                          activation_fn=tf.nn.relu if cfgs.USE_RELU else None)
            if "P6" in cfgs.LEVLES:
                P6 = slim.avg_pool2d(P5, kernel_size=[1, 1], stride=2, scope='build_P6')
                pyramid_dict['P6'] = P6

    # for level in range(5, 1, -1):
    #     add_heatmap(pyramid_dict['P%d' % level], name='Layer%d/P%d_heat' % (level, level))

    for i in range(cfgs.NUM_FPN):
        pyramid_dict = fpn(pyramid_dict, i)

    for i in range(cfgs.NUM_NAS_FPN):
        pyramid_dict = nas_fpn(pyramid_dict, i)

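In cfgs.py you set its default value to 0. Why is this fpn function run again after the FPN network has already been built?
My understanding is that the nas_fpn function searches for the FPN, so what is the purpose of the earlier part, and of this parameter?
Thank you!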
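
One way to read the two loops in the pasted code is sketched below. It only mirrors the control flow: `fake_fpn` and `fake_nas_fpn` are hypothetical stand-ins that record which module would be applied, not the repo's `fpn` and `nas_fpn` functions. Under this reading, NUM_FPN and NUM_NAS_FPN appear to be stack counts applied on top of the base pyramid, so a value of 0 simply skips that loop.

```python
def stack_modules(pyramid, num_fpn, num_nas_fpn, fpn_fn, nas_fpn_fn):
    """Apply num_fpn extra FPN stacks, then num_nas_fpn NAS-FPN stacks."""
    for i in range(num_fpn):
        pyramid = fpn_fn(pyramid, i)
    for i in range(num_nas_fpn):
        pyramid = nas_fpn_fn(pyramid, i)
    return pyramid


# Hypothetical stand-ins: each one appends a tag instead of building layers.
def fake_fpn(pyramid, i):
    return pyramid + ['fpn_%d' % i]


def fake_nas_fpn(pyramid, i):
    return pyramid + ['nas_fpn_%d' % i]


base = ['base_fpn']  # the pyramid built before the two loops

# NUM_FPN = 0: range(0) is empty, so no extra FPN stack is added.
only_nas = stack_modules(base, 0, 3, fake_fpn, fake_nas_fpn)
print(only_nas)  # ['base_fpn', 'nas_fpn_0', 'nas_fpn_1', 'nas_fpn_2']

# NUM_NAS_FPN = 0: only plain FPN stacks are added on top of the base.
only_fpn = stack_modules(base, 3, 0, fake_fpn, fake_nas_fpn)
print(only_fpn)  # ['base_fpn', 'fpn_0', 'fpn_1', 'fpn_2']
```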

Question about BN in nas_fpn

Hi, thanks for your code.
I am trying to implement NAS-FPN in PyTorch, but there seems to be a problem with my BN layers (both one-stage (RetinaNet) and two-stage). I freeze the BN layers in the R50 backbone and fine-tune the BN layers in NAS-FPN (initialized with weight=1, bias=0). Is that right?
The log looks like this: training seems OK, but the eval result is always 0 (0.001).

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
2019-08-08 01:31:15,314 - INFO - Epoch [1][7330/7330] lr: 0.02000, bbox_mAP: 0.0000, bbox_mAP_50: 0.0000, bbox_mAP_75: 0.0000, bbox_mAP_s: 0.0000, bbox_mAP_m: 0.0000, bbox_mAP_l: 0.0000, bbox_mAP_copypaste: 0.000 0.000 0.000 0.000 0.000 0.000
2019-08-08 01:31:39,196 - INFO - Epoch [2][50/7330] lr: 0.02000, eta: 8:01:08, time: 0.477, data_time: 0.070, memory: 4057, loss_rpn_cls: 0.0640, loss_rpn_bbox: 0.0223, loss_cls: 0.1917, acc: 96.7520, loss_bbox: 0.0561, loss: 0.3341
2019-08-08 01:31:56,286 - INFO - Epoch [2][100/7330] lr: 0.02000, eta: 8:00:42, time: 0.342, data_time: 0.007, memory: 4057, loss_rpn_cls: 0.0718, loss_rpn_bbox: 0.0185, loss_cls: 0.2137, acc: 96.5977, loss_bbox: 0.0619, loss: 0.3659

ValueError: Dimensions must be equal, but are 256 and 384 for 'tower_0/build_pyramid/build_P4/add' (op: 'Add') with input shapes: [1,?,?,256], [1,?,?,384].

I am trying to train on my own dataset and followed the three steps in README.md. I use the pretrained weights resnet_v1_50.ckpt, and I modified $PATH_ROOT/libs/configs/cfgs.py, $PATH_ROOT/libs/label_name_dict/lable_dict.py, and $PATH_ROOT/data/io/read_tfrecord.py. NET_NAME has been changed to "resnet_v1_50". However, after running "python ./tools/multi_gpu_train.py", I get this error message:
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
./NAS_FPN_Tensorflow-master
WARNING:tensorflow:From multi_gpu_train.py:123: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
tfrecord path is --> ./NAS_FPN_Tensorflow-master/data/tfrecord/occlusion_recognition_train*
Traceback (most recent call last):
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1589, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 256 and 384 for 'tower_0/build_pyramid/build_P4/add' (op: 'Add') with input shapes: [1,?,?,256], [1,?,?,384].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "multi_gpu_train.py", line 352, in <module>
train()
File "multi_gpu_train.py", line 212, in train
gtboxes_batch=gtboxes_and_label)
File "../libs/networks/build_whole_network.py", line 386, in build_whole_detection_network
P_list = self.build_base_network(input_img_batch)
File "../libs/networks/build_whole_network.py", line 35, in build_base_network
return resnet.resnet_base(input_img_batch, scope_name=self.base_network_name, is_training=self.is_training)
File "../libs/networks/resnet.py", line 189, in resnet_base
scope='build_P%d' % level)
File "../libs/networks/resnet.py", line 63, in fusion_two_layer
add_f = 0.5*upsample_p + 0.5*reduce_dim_c
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 847, in binary_op_wrapper
return func(x, y, name=name)
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 297, in add
"Add", x=x, y=y, name=name)
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "./anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1756, in __init__
control_input_ops)
File "./envs/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1592, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 256 and 384 for 'tower_0/build_pyramid/build_P4/add' (op: 'Add') with input shapes: [1,?,?,256], [1,?,?,384].
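
The traceback shows the element-wise addition in fusion_two_layer failing because one operand has 256 channels and the other 384. The sketch below reproduces only that shape rule in plain Python; `add_shapes` is a hypothetical helper, not a TensorFlow function, and `None` stands for a dimension that is unknown at graph-build time (`?` in the log). Which layer produces which width is not visible in the traceback. One possible cause, stated here as an assumption, is that FPN_CHANNEL in cfgs.py is 384 while another layer feeding the addition still outputs 256 channels.

```python
def add_shapes(shape_a, shape_b):
    """Return the result shape of a broadcasting add, or raise on mismatch."""
    if len(shape_a) != len(shape_b):
        raise ValueError('rank mismatch: %r vs %r' % (shape_a, shape_b))
    result = []
    for a, b in zip(shape_a, shape_b):
        if a is None or b is None:
            # Unknown dimension: keep whichever side is known, if any.
            result.append(b if a is None else a)
        elif a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(
                'Dimensions must be equal, but are %d and %d' % (a, b))
    return tuple(result)


# Matching channel widths: the addition is well defined.
ok_shape = add_shapes((1, None, None, 256), (1, None, None, 256))

# The shapes from the error message: 256 vs 384 channels cannot be added.
try:
    add_shapes((1, None, None, 256), (1, None, None, 384))
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)  # Dimensions must be equal, but are 256 and 384
```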

Detection results have very low confidence

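@yangJirui @yangxue0827 I would appreciate your help: is there a serious flaw in my parameter settings or training?
I trained on DOTA_H with nas=2 and FPN_CHANNEL=256. The dataset consists of 10428 randomly sampled images cropped to a size of 800, and the model is resnet50_v1d.
The learning rate settings are: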
IMGNUM_IN_DATASET = 10428
EPSILON = 1e-5
MOMENTUM = 0.9
BATCH_SIZE = 1
WARM_SETP = 4*IMGNUM_IN_DATASET
#make lr bigger
LR = 5e-3 * 2 * len(GPU_GROUP.strip().split(',')) * BATCH_SIZE
MAX_ITERATION = 20
SAVE_WEIGHTS_INTE
DECAY_STEP = [13*IMGNUM_IN_DATASET, 20*IMGNUM_IN_DATASET] # 50000, 70000
MAX_ITERATION = 20*IMGNUM_IN_DATASET
The anchor settings are:
USE_CENTER_OFFSET = True
LEVLES = ['P2', 'P3', 'P4', 'P5', 'P6']
BASE_ANCHOR_SIZE_LIST = [16, 32, 64, 128, 256]
ANCHOR_STRIDE_LIST = [4, 8, 16, 32, 64]
ANCHOR_SCALES = [1.0]
ANCHOR_RATIOS = [1, 1 / 2, 2., 1 / 3., 3., 5., 1 / 4., 4., 1 / 5., 6., 1 / 6., 7., 1 / 7.]
ROI_SCALE_FACTORS = [10., 10., 5.0, 5.0, 5.0]
ANCHOR_SCALE_FACTORS = None
The FPN settings are:
SHARE_HEADS = True
KERNEL_SIZE = 3
RPN_IOU_POSITIVE_THRESHOLD = 0.7
RPN_IOU_NEGATIVE_THRESHOLD = 0.3
TRAIN_RPN_CLOOBER_POSITIVES = False

RPN_MINIBATCH_SIZE = 256
RPN_POSITIVE_RATE = 0.5
RPN_NMS_IOU_THRESHOLD = 0.7
RPN_TOP_K_NMS_TRAIN = 12000
RPN_MAXIMUM_PROPOSAL_TARIN = 2000

RPN_TOP_K_NMS_TEST = 6000
RPN_MAXIMUM_PROPOSAL_TEST = 1000
The NAS_FPN and Fast R-CNN settings are:
--------------------------------------------NAS FPN config
NUM_FPN = 0
NUM_NAS_FPN = 2
USE_RELU = True
#FPN_CHANNEL = 384
FPN_CHANNEL = 256
-------------------------------------------Fast-RCNN config
ROI_SIZE = 14
ROI_POOL_KERNEL_SIZE = 2
USE_DROPOUT = False
KEEP_PROB = 1.0
SHOW_SCORE_THRSHOLD = 0.6 # only show in tensorboard
#R2CNN CONFIG
FAST_RCNN_NMS_IOU_THRESHOLD = 0.1 # 0.6
FAST_RCNN_NMS_MAX_BOXES_PER_CLASS = 150
FAST_RCNN_IOU_POSITIVE_THRESHOLD = 0.4
FAST_RCNN_IOU_NEGATIVE_THRESHOLD = 0.0 # 0.1 < IOU < 0.5 is negative
FAST_RCNN_MINIBATCH_SIZE = 256 # if is -1, that is train with OHEM
FAST_RCNN_POSITIVE_RATE = 0.35
ADD_GTBOXES_TO_TRAIN = True
USE_ATTENTION = False
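During training, the detections shown in TensorBoard looked acceptable. However, when I exported the ckpt and ran detection, the confidence for all 15 classes was very low, and once the display threshold was applied there was nothing left to show.
For example, these are the detection results for ship: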
P0484_0000_0000 0.046 563.9 113.7 784.6 328.9
P0484_0000_0000 0.046 563.9 433.7 784.6 648.9
P0484_0000_0000 0.032 666.6 26.5 776.4 127.9
P0484_0000_0000 0.032 23.8 16.0 133.8 139.7
P0484_0000_0000 0.031 665.6 341.6 776.1 449.2
P0484_0000_0000 0.031 665.6 629.6 776.1 737.2
P0484_0000_0000 0.031 121.0 16.2 229.1 138.9
P0484_0000_0000 0.031 537.1 16.3 645.1 138.9
P0484_0000_0000 0.031 409.1 16.3 517.1 138.9
P0484_0000_0000 0.031 313.1 16.3 421.1 138.9
P0484_0000_0000 0.031 217.1 16.3 325.1 138.9
P0484_0000_0000 0.031 772.2 738.8 798.9 786.4
P0484_0000_0000 0.030 55.8 657.7 171.0 777.6
P0484_0000_0000 0.030 0.9 741.6 55.0 798.4
P0484_0000_0000 0.030 215.6 657.5 330.7 777.6
After setting the box display threshold to 0.03, all the boxes have the same shape:
[image]
[image]
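
To make the posted schedule concrete, the sketch below evaluates the quoted expressions in plain Python. It assumes a single GPU (`GPU_GROUP = '0'`, which the post does not show) and reads the decay steps as multiples of IMGNUM_IN_DATASET; `anchors_per_location` is a name introduced here for illustration.

```python
IMGNUM_IN_DATASET = 10428
BATCH_SIZE = 1
GPU_GROUP = '0'  # assumption: one GPU; this value is not given in the post

num_gpu = len(GPU_GROUP.strip().split(','))
LR = 5e-3 * 2 * num_gpu * BATCH_SIZE
DECAY_STEP = [13 * IMGNUM_IN_DATASET, 20 * IMGNUM_IN_DATASET]
MAX_ITERATION = 20 * IMGNUM_IN_DATASET

ANCHOR_SCALES = [1.0]
ANCHOR_RATIOS = [1, 1 / 2, 2., 1 / 3., 3., 5., 1 / 4., 4.,
                 1 / 5., 6., 1 / 6., 7., 1 / 7.]
# One anchor per (scale, ratio) pair at every feature-map location.
anchors_per_location = len(ANCHOR_SCALES) * len(ANCHOR_RATIOS)

print(DECAY_STEP)            # [135564, 208560]
print(MAX_ITERATION)         # 208560
print(anchors_per_location)  # 13
```

With these numbers the second decay step coincides with MAX_ITERATION, so under this reading only the first learning-rate decay takes effect before training ends.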

Multi gpu training

Hi~
I ran into a problem when using the pre-trained model you supplied, 'resnet50_v1d.ckpt'.

The error is like this:

NotFoundError (see above for traceback): Tensor name "resnet50_v1d/C2/bottleneck_0/conv0/BatchNorm/beta" not found in checkpoint files /home/DATA3/user-work/NAS_FPN_Tensorflow/data/pretrained_weights/resnet50_v1d.ckpt
[[Node: save/RestoreV2_15 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_15/tensor_names, save/RestoreV2_15/shape_and_slices)]]

So, is the pre-trained model correct? How can I fix this?
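
A NotFoundError like the one above means the graph asks for a variable name that the checkpoint does not contain. A minimal way to narrow this down is to compare the two name lists, as sketched below. `missing_variables` is a hypothetical helper, and the checkpoint listing is made up for illustration. In practice the real names could be obtained with `tf.train.list_variables(checkpoint_path)`, which returns `(name, shape)` pairs.

```python
def missing_variables(expected_names, checkpoint_names):
    """Return the expected variable names that the checkpoint lacks."""
    available = set(checkpoint_names)
    return sorted(name for name in expected_names if name not in available)


# Made-up checkpoint listing for illustration only.
checkpoint_names = [
    'resnet50_v1d/conv0/weights',
    'resnet50_v1d/C2/bottleneck_0/conv0/weights',
]

# Names the graph wants to restore; the second one is from the error message.
expected_names = [
    'resnet50_v1d/C2/bottleneck_0/conv0/weights',
    'resnet50_v1d/C2/bottleneck_0/conv0/BatchNorm/beta',
]

missing = missing_variables(expected_names, checkpoint_names)
print(missing)  # ['resnet50_v1d/C2/bottleneck_0/conv0/BatchNorm/beta']
```

If the missing names differ from the checkpoint names only by a scope prefix or a layer-naming scheme, the checkpoint probably comes from a different model definition than the one being built.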

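NAS + Cascade: loss does not decrease

Hello,

I connected your NAS and Cascade together, but the loss never goes down.

What I actually did was put nas_fpn.py into Cascade and then make small changes to resnet.py there.

But the loss stays at around 3 and does not decrease. Any advice would be appreciated. Thank you.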

The correctness of the way to reduce or unify the numbers of channels of feature maps from backbone

Good work!
The FPN module adds two feature maps with different numbers of channels, so their channel counts must be unified first, but I could not find in the paper how this is done. The original FPN design uses 1*1 convs to reduce the number of channels. However, I notice that your code uses 3*3 2D convs to handle the inconsistent channel counts. Is this design from the paper or your own?
Thanks!
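
For readers unfamiliar with the 1*1 conv mentioned in this question, the sketch below shows in plain Python why it can unify channel counts: it is a per-pixel matrix multiplication that maps C_in channels to C_out channels without looking at neighbouring pixels. `conv1x1` is a hypothetical helper written for illustration and is not code from the repo or the paper.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution without bias.

    feature_map: nested lists of shape H x W x C_in.
    weights: nested lists of shape C_in x C_out.
    """
    c_out = len(weights[0])
    return [
        [
            [sum(pixel[k] * weights[k][j] for k in range(len(pixel)))
             for j in range(c_out)]
            for pixel in row
        ]
        for row in feature_map
    ]


# A 1 x 2 feature map with 3 channels, reduced to 2 channels.
feature_map = [[[1, 2, 3], [0, 1, 0]]]
weights = [[1, 0],
           [0, 1],
           [1, 1]]

reduced = conv1x1(feature_map, weights)
print(reduced)  # [[[4, 5], [0, 1]]]
```

A 3*3 conv can also change the channel count, but it additionally mixes each pixel with its neighbours and has nine times as many weights per input/output channel pair.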

resnet101_v1d model load error

I want to use resnet101_v1d, but when I try to train the model on my dataset it fails because the model cannot be restored. How should I use it? Thanks.

GPU is not utilized and no progress happening

I have modified the code for Resnet_152_v1 and ran python multi_gpu_train.py. Only 1234 MB of GPU memory is used and the GPU is not utilized (I have set GPU_GROUP = "0"). CPU utilization is above 230%, but no steps/epochs are shown. I waited for 1 hour and there was still no progress (see the screenshot below). Could you suggest what is missing in my code and config?

[Screenshot: Screen Shot 2021-01-20 at 1 24 07 AM]

Have the search code ?

Is the search code available? I think the most important part of NAS-FPN is the search method. Could you update the repo with the code for searching the FPN?

How do you know the positions of two feature maps in Global pooling operations?

Thanks for your work!

In the Global pooling operation, the roles of the two input feature maps are not equivalent. Given only Figure 6 and Figure 7 in the paper, I cannot tell which feature map plays the role of channel attention, and I could not find any underlying rule in your implementation. How do you determine the positions of the two feature maps at each Global pooling operation? Did you ask the authors?

Another small question: why do you take the mean value in the Global pooling operation, when the paper uses max pooling?
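
The asymmetry raised in this question can be illustrated with a simplified sketch in plain Python. It implements only a channel-attention gate: one input is pooled per channel, passed through a sigmoid, and used to scale the other input. This is one plausible reading for illustration, not the exact operation from the paper or the repo, and any resizing or residual addition is omitted. `global_pool_gate` and its `reduce` argument are hypothetical names.

```python
import math


def global_pool_gate(attention_source, target, reduce='max'):
    """Scale each channel of target by sigmoid(pool(attention_source channel)).

    Both inputs are nested lists of shape C x H x W with the same C.
    reduce selects global 'max' or 'mean' pooling over each channel.
    """
    output = []
    for source_channel, target_channel in zip(attention_source, target):
        values = [v for row in source_channel for v in row]
        if reduce == 'max':
            pooled = max(values)
        else:
            pooled = sum(values) / len(values)
        gate = 1.0 / (1.0 + math.exp(-pooled))
        output.append([[gate * v for v in row] for row in target_channel])
    return output


# Two single-channel 1 x 2 feature maps.
a = [[[0.0, 2.0]]]
b = [[[1.0, 1.0]]]

a_gates_b = global_pool_gate(a, b, reduce='max')   # b scaled by sigmoid(2)
b_gates_a = global_pool_gate(b, a, reduce='max')   # a scaled by sigmoid(1)
mean_variant = global_pool_gate(a, b, reduce='mean')  # b scaled by sigmoid(1)
```

Swapping the two inputs changes the result, which is why the order matters, and max versus mean pooling gives a different gate whenever a channel is not constant.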
