
vitae-transformer / vitae-transformer-remote-sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

remote-sensing deep-learning change-detection classification object-detection self-supervised-learning semantic-segmentation transfer-learning vision-transformer

vitae-transformer-remote-sensing's Introduction

⏰ The repo of the paper "An Empirical Study of Remote Sensing Pretraining" has been moved to RSP

Remote Sensing

This repo contains a comprehensive list of our research works related to Remote Sensing. For any related questions, please contact Di Wang at [email protected] or [email protected].

Overview

1. An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

2. Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

3. SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [NeurIPS-2023]

4. MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [arXiv-2024]

Projects

📘 An Empirical Study of Remote Sensing Pretraining [TGRS-2022]

Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao

Paper | Github Code | BibTex

We train different networks from scratch on MillionAID, the largest remote sensing scene recognition dataset to date, to obtain remote sensing pretrained backbones, including both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. We then investigate the impact of ImageNet pretraining (IMP) and remote sensing pretraining (RSP) on a series of downstream tasks, including scene recognition, semantic segmentation, object detection, and change detection, using these CNN and vision transformer backbones.


📘 Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model [TGRS-2022]

Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

We resort to plain vision transformers with about 100M parameters and make the first attempt to build large vision models customized for RS tasks. We propose a new rotated varied-size window attention (RVSA) mechanism to replace the original full attention, in order to handle the large image sizes and arbitrarily oriented objects in RS images. RVSA significantly reduces the computational cost and memory footprint while learning better object representations by extracting rich context from the generated diverse windows.


📘 SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model [NeurIPS-2023]

Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope it could facilitate research in RS segmentation, particularly in large model pre-training.
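The core of the pipeline is prompting SAM with the boxes already annotated in RS detection datasets. Below is a minimal editorial sketch of that idea using the public segment-anything API; the checkpoint path, image array, and box values are placeholders, not the authors' exact setup.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM backbone; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# image: HxWx3 uint8 RGB patch from an RS detection dataset (stand-in array here).
image = np.zeros((1024, 1024, 3), dtype=np.uint8)
predictor.set_image(image)

# Use an existing detection box (x1, y1, x2, y2) as the prompt.
box = np.array([100, 150, 400, 480])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
# masks[0] is a binary mask that can be stored as a segmentation label.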


📘 MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [arXiv-2024]

Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao and Liangpei Zhang.

Paper | Github Code | BibTex

In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. We hope this research encourages further exploration of RS foundation models and anticipate the widespread application of these models across diverse fields of RS image interpretation.

vitae-transformer-remote-sensing's People

Contributors

dotwang


vitae-transformer-remote-sensing's Issues

About downloading the pretrained model for change detection

Hello, I can't find the download link for pretrained models such as:
RS_CLS_finetune/output/resnet_50_224/epoch120/millionAID_224_None/0.0005_0.05_192/resnet/100/ckpt.pth
Swin-Transformer-main/output/swin_tiny_patch4_window7_224/epoch120/swin_tiny_patch4_window7_224/default/ckpt.pth
...

Couldn't reproduce the semantic segmentation experiment

I tried to reproduce it by simply evaluating the model, as the notebook 'Semantic Segmentation/demo/inference_demo.ipynb' does, but using the uploaded model instead of ResNet.
When I provide the config and the RSP-ViTAEv2-S-E100 model in the notebook, init_segmentor does not work, because the config refers to data outside the repo.

I would be very happy if you could provide a notebook to reproduce this. :)

Best regards
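For reference, here is a minimal standalone inference sketch based on the mmsegmentation 0.x API that the demo notebook uses; the config, checkpoint, and image paths below are placeholders, not files shipped with the repo.

from mmseg.apis import init_segmentor, inference_segmentor, show_result_pyplot

config_file = 'configs/vitae_win/upernet_vitae_win_window7_512x512_80k_potsdam_epoch100.py'  # placeholder
checkpoint_file = 'rsp-vitaev2-s-e100-potsdam.pth'  # placeholder checkpoint path
img = 'demo/demo.png'  # placeholder image

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)  # list containing one HxW label map
show_result_pyplot(model, img, result)    # overlay the prediction on the image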

questions about exp. of semantic seg.

Hi, thanks for your great work and codebase.

The batch size is 8 in the paper, and 4 in the config of Swin-T-IMP+UperNet.
And I did not find any description of the number of GPUs in the semantic segmentation subsection.
In the README.md for semantic segmentation, the command is:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

This seems to set num_gpus_per_node to 1. Or is your command meant for 2 single-GPU nodes with a batch size of 4 each (2 x 4)?
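As a hedged editorial note (not from the authors): in mmseg-style configs the effective batch size is samples_per_gpu multiplied by the total number of GPUs, so matching the paper's batch size of 8 with samples_per_gpu=4 would imply two GPUs, e.g.:

# Minimal sketch of the relevant data settings (values are illustrative).
data = dict(
    samples_per_gpu=4,   # per-GPU batch size from the released config
    workers_per_gpu=4,
)
# effective batch size = samples_per_gpu * num_gpus = 4 * 2 = 8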

Reproduce the SeCo DOTA result.

Hi, I have recently been working on some comparison experiments. When I fine-tune the official SeCo pretrained model (SeCo-1M) on the DOTA object detection task, the test set mAP is much lower than the paper's (Table VIII, 70.07).

I strictly followed the experimental setup in the paper, except that I used mmrotate instead of OBBDetection; I think that difference should not matter much.

Do you have any suggestions for reproducing the result? Thanks!

The mmrotate config I used is given below:

angle_version = 'le90'
dataset_type = 'DOTADataset'
data_root = '../DOTA'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le90'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le90'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='DOTADataset',
        ann_file=data_root + "/trainVal/annfiles",
        img_prefix=data_root + "/trainVal/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='DOTADataset',
        ann_file=data_root + "/test/annfiles",
        img_prefix=data_root + "/test/images",
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(
    type='Fp16OptimizerHook',
    distributed=False,
    grad_clip=dict(max_norm=35.0, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=50,
    hooks=[dict(type='TextLoggerHook'),
           dict(type='TensorboardLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(
            type='Pretrained',
            checkpoint='../pretrain_checkpoint/SeCo1m.pth')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version='le90',
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range='le90',
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range='le90',
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
work_dir = './seco_result'
auto_resume = False
gpu_ids = [0]

Problem with reading labels

When testing and training on the DOTA 1.0 data, what exactly should be written in the json and pkl files? Could you give an example? When I run testing, it reports that split_config_json and ori_annfile_pkl are missing from the label folder. Could you reply soon? Many thanks.

Model training problem

The data was preprocessed with mmseg's native potsdam.py.
Following #9, potsdam_ori.py was used.
Config: upernet_swin_tiny_patch4_window7_512x512_80k_potsdam, parameters unchanged.
batch_size = 8
model = dict(
    pretrained='checkpoint/upernet-rsp-swin-t-potsdam-latest.pth')
In both the pipeline and the data settings, reduce_zero_label = True.

2023-03-10 00:13:21,947 - mmseg - INFO - Iter [80000/80000] lr: 7.500e-10, eta: 0:00:00, time: 0.378, data_time: 0.003, memory: 15450, decode.loss_ce: 0.1454, decode.acc_seg: 74.9332, aux.loss_ce: 0.0674, aux.acc_seg: 74.0879, loss: 0.2129

+--------------------+-------+-------+--------+-----------+--------+
| Class | IoU | Acc | Fscore | Precision | Recall |
+--------------------+-------+-------+--------+-----------+--------+
| impervious_surface | 82.19 | 92.91 | 90.23 | 87.69 | 92.91 |
| building | 91.04 | 97.03 | 95.31 | 93.64 | 97.03 |
| low_vegetation | 70.99 | 88.3 | 83.03 | 78.36 | 88.3 |
| tree | 73.7 | 83.01 | 84.86 | 86.8 | 83.01 |
| car | 81.17 | 89.05 | 89.61 | 90.17 | 89.05 |
| clutter | 0.0 | 0.0 | nan | nan | 0.0 |
+--------------------+-------+-------+--------+-----------+--------+
2023-03-10 00:13:54,226 - mmseg - INFO - Summary:
2023-03-10 00:13:54,226 - mmseg - INFO -
+-------+-------+-------+---------+------------+---------+
| aAcc | mIoU | mAcc | mFscore | mPrecision | mRecall |
+-------+-------+-------+---------+------------+---------+
| 86.86 | 66.52 | 75.05 | 88.61 | 87.33 | 75.05 |
+-------+-------+-------+---------+------------+---------+

F1 88.61
Could you tell me where things went wrong?
Thanks :)

Where can the pretrained model weights be downloaded?

Hi! During training I get an error that the weight file VitAE_window/output/ViTAE_Window_NoShift_12_basic_stages4_14_224/epoch100/ViTAE_Window_NoShift_12_basic_stages4_14/default/ckpt.pth cannot be found.

DIOR-R Benchmark question.

In your paper, the performance of your model on the DIOR-R dataset is shown in a table.
However, there is no information on whether this result uses single-scale or multi-scale testing.
I would like to know whether the reported DIOR-R performance was obtained in a single-scale or multi-scale setting.
Thank you.

(screenshot of the paper's table attached)

ModuleNotFoundError: No module named 'mmdet.version'

Traceback (most recent call last):
  File "/home/dgx/workspace/cui/ViTAE/tools/train.py", line 13, in <module>
    from mmdet.apis import set_random_seed, train_detector
  File "/home/dgx/workspace/cui/ViTAE/mmdet/__init__.py", line 1, in <module>
    from .version import __version__, short_version
ModuleNotFoundError: No module named 'mmdet.version'

In mmdet/__init__.py, I found the code written like this:

from .version import __version__, short_version

__all__ = ['__version__', 'short_version']

But .version is not a Python file; the .version file contains only a single line:
2.2.0

Semantic Segmentation: large gap when reproducing performance on the Potsdam dataset

Without modifying the parameters of 'upernet_vitae_win_window7_512x512_80k_potsdam_epoch100.py', I reproduced the experiment with both RGB + Label_all and IRRG + Label_all.

I processed the dataset with the tools/convert_datasets/potsdam.py script.

(result screenshots attached)

(screenshot of the config and model attached)

My resulting log differs considerably from the released log.

Is the way I processed the Potsdam dataset incorrect?

change detection

When I run the example

python eval.py --backbone 'swin' --dataset 'levir' --mode 'rsp_300' --path [model path]

I get the following error:

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2

Which PyTorch version did you use? For change detection I set up the environment following
https://github.com/likyoo/Siam-NestedUNet/blob/master/README.md

Requirements:
Python 3.6
PyTorch 1.4
torchvision 0.5.0
other packages needed: pip install opencv-python tqdm tensorboardX sklearn

Could you help me analyze this? Thanks.
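As a hedged editorial note (not from the authors): this error usually means the checkpoint was saved with PyTorch >= 1.6, whose zip-based serialization PyTorch 1.4 cannot read. One workaround is to re-save the file once under a newer PyTorch in the legacy format; the filenames below are placeholders.

import torch  # run this step with PyTorch >= 1.6

ckpt = torch.load('rsp_300_swin_checkpoint.pth', map_location='cpu')  # placeholder path
torch.save(ckpt, 'rsp_300_swin_checkpoint_legacy.pth',               # placeholder path
           _use_new_zipfile_serialization=False)  # readable by PyTorch 1.4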

About Labels of Million-AID Dataset

The original split of MillionAID is used for recognition. Our study is about pretraining, so we resplit the training and testing sets. The obtained training set is relatively large for transferring the pretrained weights to downstream tasks. All RSP pretrained weights are available at https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing/blob/main/README.md

Thank you very much. Your work is very inspiring to us. We are doing research on remote sensing image pretraining and would like to use your work as a baseline. However, we found that the open-source MillionAID data has less annotated data than reported in your paper, so we raise this issue.

Originally posted by @pUmpKin-Co in #3 (comment)

Hello,
I wasn't able to understand what the conclusion is.

we resplit the training and testing sets.

I think annotations for all images are needed in order to re-split them. Are labels provided for the test data? In my understanding, we need both "train_label.txt" and "valid_label.txt" to use MillionAID, but I don't know where to download them. I appreciate your help.

KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'"

Hello, while trying to reproduce the Potsdam experiments from your paper, I ran the following command:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=40001 tools/train.py configs/vitae_win/upernet_vitae_win_imp_window7_512x512_80k_potsdam.py --launcher 'pytorch'
but got KeyError: "EncoderDecoder: 'ViTAE_Window_NoShift_basic is not in the models registry'". Why does this happen, and how can it be fixed?
I would also like to ask how to correctly reproduce your experiments.
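As a hedged editorial note (not from the authors): errors like this typically mean the custom backbone was never imported, so it never registered itself with mmseg's BACKBONES registry; the repo's bundled mmseg (under the 'Semantic Segmentation' directory) needs to be the one on the Python path. A minimal, hypothetical sketch of the registration pattern, using a stand-in class name rather than the real backbone:

import torch.nn as nn
from mmseg.models.builder import BACKBONES  # mmseg 0.x registry


@BACKBONES.register_module()
class ViTAE_Window_NoShift_basic_demo(nn.Module):  # hypothetical stand-in name
    """Placeholder backbone that only illustrates the registration pattern."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)

    def forward(self, x):
        # mmseg backbones return a tuple/list of multi-scale feature maps.
        return [self.stem(x)]


# Once this module has been imported, a config entry such as
# backbone=dict(type='ViTAE_Window_NoShift_basic_demo') can be resolved.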

Reproduction problem with Swin-T in scene classification

Hi, I am trying to follow your hyperparameters to reproduce the classification results in mmclassification. I train on AID (2:8 split) with max_epochs=200, base_lr=5e-4, and the other settings below:
_base_ = [
    # '../_base_/models/swin_transformer/base_224.py',
    # "../_base_/datasets/ucmerced_landuse_bs64_swin_224.py",
    "../_base_/datasets/aid_bs64_autoaug.py",
    "../_base_/schedules/imagenet_bs64_adamw_swin.py",
    "../_base_/default_runtime.py",
]

# refer to SimMIM paper
ADJUST_FACTOR = 1.0
BATCH_SIZE = 64
BASE_LR = 5e-4 * ADJUST_FACTOR  # todo: adjust.
WARMUP_LR = 5e-7 * ADJUST_FACTOR
MIN_LR = 5e-6 * ADJUST_FACTOR
NUM_GPUS = 1
DROP_PATH_RATE = 0.2
SCALE_FACTOR = 512.0
MAX_EPOCHS = 200

# model settings
model = dict(
    type="ImageClassifier",
    backbone=dict(
        type="SwinTransformer",
        # arch="base",
        arch="tiny",
        img_size=224,
        # drop_path_rate=0.1,  # DROP_PATH_RATE
        drop_path_rate=DROP_PATH_RATE,
    ),
    neck=dict(type="GlobalAveragePooling"),
    head=dict(
        type="LinearClsHead",
        num_classes=21,
        # in_channels=1024,
        in_channels=768,
        init_cfg=None,  # suppress the default init_cfg of LinearClsHead.
        loss=dict(type="LabelSmoothLoss", label_smooth_val=0.1, mode="original"),
        cal_acc=False,
    ),
    init_cfg=[
        dict(type="TruncNormal", layer="Linear", std=0.02, bias=0.0),
        dict(type="Constant", layer="LayerNorm", val=1.0, bias=0.0),
    ],
    train_cfg=dict(
        augments=[
            dict(type="BatchMixup", alpha=0.8, num_classes=21, prob=0.5),
            dict(type="BatchCutMix", alpha=1.0, num_classes=21, prob=0.5),
        ]
    ),
)

# optimizer
paramwise_cfg = dict(
    norm_decay_mult=0.0,
    bias_decay_mult=0.0,
    custom_keys={
        ".absolute_pos_embed": dict(decay_mult=0.0),
        ".relative_position_bias_table": dict(decay_mult=0.0),
    },
)

optimizer = dict(
    type="AdamW",
    # lr=1e-3 * 64 / 256,  # 5e-4 * 64 / 512,  # 1e-3 * 64 / 256,
    # lr=1.25e-3 * 96 * 1 / 512.0,
    # BASE_LR * BATCH_SIZE * NUM_GPUS / 512.0,  # 1e-3 * 64 / 256,
    lr=BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR,
    weight_decay=0.05,
    eps=1e-8,
    betas=(0.9, 0.999),
    paramwise_cfg=paramwise_cfg,
)
optimizer_config = dict(grad_clip=dict(max_norm=5.0))

# learning policy
lr_config = dict(
    policy="CosineAnnealing",
    # min_lr=2.5e-7,
    # by_epoch=False,  # todo: try
    by_epoch=False,
    # min_lr_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-2,
    min_lr_ratio=(MIN_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # min_lr=2.5e-7,  # MIN_LR,
    warmup="linear",
    # warmup_ratio=(2.5e-7 * 96 * 1 / 512.0) / (1.25e-3 * 96 * 1 / 512.0),  # 1e-3,
    warmup_ratio=(WARMUP_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)
    / (BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR),
    # warmup_lr=2.5e-7,  # WARMUP_LR,
    warmup_iters=20,  # todo: 0
    warmup_by_epoch=True,
)

checkpoint_config = dict(interval=MAX_EPOCHS // 10)
evaluation = dict(
    interval=MAX_EPOCHS // 10, metric="accuracy", save_best="auto"
)  # save the checkpoint with highest accuracy
runner = dict(type="EpochBasedRunner", max_epochs=MAX_EPOCHS)

# data = dict(samples_per_gpu=96, workers_per_gpu=8,)
data = dict(samples_per_gpu=BATCH_SIZE, workers_per_gpu=8,)

# fp16 settings
fp16 = dict(loss_scale="dynamic")

So could you help me with this, or provide your training log?
Thanks!
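As a quick editorial sanity check on the config above (not part of the original issue), the scaled learning rate it produces is:

BASE_LR, BATCH_SIZE, NUM_GPUS, SCALE_FACTOR = 5e-4, 64, 1, 512.0
print(BASE_LR * BATCH_SIZE * NUM_GPUS / SCALE_FACTOR)  # 6.25e-05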

Where is the train_labels_{}_{}.txt for scene recognition?

I am running the code of scene recognition and I have downloaded the AID, UCM, and NWPU datasets from their official webpages.

But there is no train_labels_{}_{}.txt in these datasets. Where can I find the train_labels_{}_{}.txt files used in your code?

(screenshot attached)

About Reproducing Training of Remote Sensing Semantic Segmentation Models

Hello, I am interested in the remote sensing semantic segmentation part of this project. I have downloaded all the relevant Potsdam data and configured the runtime environment, but the steps for reproducing the full training are still unclear to me. Does the downloaded public dataset require preprocessing, i.e., do the images and label images need to be cut into small patches? If I do not use distributed training, can I simply remove the distributed settings and train on a single card? Could you provide a more detailed description of the training steps and of the training config file, so that we can reproduce the model training? Thank you.
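As a hedged editorial sketch (not the authors' documented procedure), single-card, non-distributed training can be driven directly through the mmsegmentation 0.x Python API; the config path and work directory below are placeholders.

from mmcv import Config
from mmseg.apis import train_segmentor
from mmseg.datasets import build_dataset
from mmseg.models import build_segmentor

cfg = Config.fromfile(
    'configs/vitae_win/upernet_vitae_win_window7_512x512_80k_potsdam_epoch100.py')  # placeholder
cfg.work_dir = './work_dirs/potsdam_single_gpu'  # placeholder output directory
cfg.gpu_ids = [0]
cfg.seed = None          # train_segmentor reads these attributes
cfg.device = 'cuda'      # needed by some mmseg 0.x versions

datasets = [build_dataset(cfg.data.train)]
model = build_segmentor(cfg.model)
model.CLASSES = datasets[0].CLASSES  # attach class names for evaluation

train_segmentor(model, datasets, cfg, distributed=False, validate=True)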

Model registration problem

KeyError: 'swin is not in the models registry'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/user/volume/PycharmProjects/ViTAE-Transformer-Remote-Sensing-main/Semantic Segmentation/tools/train.py", line 234, in <module>
    ...
    return build_from_cfg(cfg, registry, default_args)
  File "/home/user/.conda/envs/pytorch/lib/python3.10/site-packages/mmcv/utils/registry.py", line 61, in build_from_cfg
    raise KeyError(
KeyError: 'swin is not in the models registry'

How should I debug this?
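As a hedged editorial note (not from the authors): this usually means the installed mmseg is being imported instead of the repo's modified copy, so its custom 'swin' backbone is never registered. Running tools/train.py from the repo's 'Semantic Segmentation' directory is the first thing to check; alternatively, an mmcv config can pull in the registering module explicitly (the module path below is hypothetical).

# Added at the top of the training config (mmcv >= 1.3 syntax).
custom_imports = dict(
    imports=['mmseg.models.backbones.swin_transformer'],  # hypothetical module path
    allow_failed_imports=False,
)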

mmcv version problem

Thank you for proposing such an excellent model! While trying to use your framework, I found that the code no longer matches the current mmcv 2.x releases. Could you tell me which mmcv version you used at the time?

About the IMP weights on change detection

Hello, @DotWang. Your work is great. The results in your paper show that BIT with the IMP-ViTAEv2-S weights performs best. So I wonder whether the IMP pretrained weights used for change detection will be released. Thank you very much.

About MillionAID dataset

I downloaded the MillionAID dataset from the official homepage: MillionAID. The training set was found to have only 10K images. The test set is not labeled. May I know how the pre-training data in the paper was obtained?

Using a single image for testing

When I use the provided ViTAEv2 model for semantic segmentation testing, the given image fails: the problem is the assert N == H*W in ReductionCell.py. Testing the same image with the other two provided weights produces results normally.
1. Does this ViTAEv2 model only support image sizes that are multiples of 2?
2. When I predict on a 1024x512 png image, the problem still occurs; a size mismatch also appears at outs.append(x.view(b,wh,wh,-1).permute(0,3,1,2)) in ViTAE_Window_NoShift/base_model.py.

The weights I used are shown in the screenshot below.
(screenshot attached)

Thank you very much if you have time to answer.

Dataset processing problem

Thank you for proposing such an excellent model! When training on my own data with your framework, the test result for the 'forest land' class is IoU = 0, Acc = 0, Fscore = nan, Precision = nan, which affects my overall results. After debugging, it seems this class was never written into the labels during label preparation. How should I fix this?
