vitae-transformer / samrs


The official repo for [NeurIPS'23] "SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model"

Python 77.79% Shell 0.06% C++ 2.39% Cuda 19.75%

dataset deep-learning pre-training remote-sensing sam segment-anything-model semantic-segmentation transfer-learning

samrs's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement |

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2021

The pretrained models for ViTAE on matting and remote sensing are released! Please try and have fun!

24/03/2021

The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

The code is released!

19/10/2021

The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE contains several reduction cells (RC) and normal cells (NC) to introduce scale-invariance and locality into vision transformers. In ViTAEv2, we explore the usage of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RC and NC in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}


samrs's Issues

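Hello, how well does the vib model generalize? If it is trained on your dataset and then applied to land-cover segmentation scenes, what accuracy can be expected?

Baidu Netdisk

Could you provide a Baidu Netdisk link? Downloading from OneDrive is slow and unstable. Thanks.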

FileNotFoundError: [Errno 2] No such file or directory: '/root/dw/pretrn/sam_vit_h_4b8939.pth'

Hi, I got an error while deploying the project locally and cannot find the cause at all. Any help would be appreciated. FileNotFoundError: [Errno 2] No such file or directory: '/root/dw/pretrn/sam_vit_h_4b8939.pth'
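The error above says that the path handed to the loader does not exist on the local machine. A minimal sketch for checking this before a run starts, assuming the file in question is the SAM ViT-H checkpoint named in the message; the helper name `require_checkpoint` is hypothetical and not part of the repository:

```python
from pathlib import Path


def require_checkpoint(path):
    """Return the resolved checkpoint path, or fail with a message that says what to fix.

    Hypothetical helper for illustration; it is not part of the SAMRS code.
    """
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(
            f"Checkpoint not found: {p}. Download sam_vit_h_4b8939.pth and "
            "point the script at its actual location."
        )
    return p.resolve()


if __name__ == "__main__":
    try:
        # Path copied from the error message; replace it with your own location.
        require_checkpoint("/root/dw/pretrn/sam_vit_h_4b8939.pth")
    except FileNotFoundError as err:
        print(err)
```

Calling such a check at the top of a script turns a failure deep inside model loading into an immediate, readable message.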

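Question about processing the Potsdam dataset with mmseg

Hi, I have read through past issues repeatedly but still do not understand the purpose of processing the Potsdam dataset with mmseg. I have finished pre-training the model following the README and now need to fine-tune it on the Potsdam dataset. Following past issues, I chose 3_Ortho_IRRG.zip and 5_Labels_all.zip. I removed all tfw files from 3_Ortho_IRRG.zip and kept only the tif files; likewise, only the tif files from 5_Labels_all.zip are kept. The resulting folder structure is shown in the figure below. I do not understand what still needs to be processed with mmseg: is mmseg used for training, or for cropping the images (because the 6000*6000 images are too large)? Also, why not use already-cropped RGB images directly? (I am honestly a bit confused and would appreciate help. Once I have the whole project working, I would like to record a video on reproducing it from scratch, which I hope will help others who study it later.)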

Looking forward to the code release

Looking forward to the code release, thanks.

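main_sam_hbox_semantic.py on the DOTA dataset

Hi, could you tell me what code changes are needed for main_sam_hbox_semantic.py to process the DOTA dataset and generate the dataset? I see that the default paths in the code are for processing the DIOR dataset.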

How to deal with PKL files in an ins folder?

[{'mask': {'size': [1024, 1024], 'counts': 'cRYf07fo04M3M2O2O01O1N2O0O3M2LbmZ9'}, 'bbox': array([714., 76., 726., 95.]), 'category': 'small-vehicle', 'size': 159, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'kRjf06fo05M2N3N10000O2O01N1O1N3L4M^mg8'}, 'bbox': array([731., 82., 744., 101.]), 'category': 'small-vehicle', 'size': 199, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'YWWd02lo04L4L3L4N1O2N1000O100O1N3N1N3N3LQiW;'}, 'bbox': array([648., 222., 667., 242.]), 'category': 'small-vehicle', 'size': 250, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'nSjf01no02N2N2O001N1000010O0010O001O10O01O1O100Onk8'}, 'bbox': array([733., 122., 754., 136.]), 'category': 'small-vehicle', 'size': 152, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']TQh02lo04M2O001O000010O0010O0010O001N3Mak]7'}, 'bbox': array([770., 137., 788., 151.]), 'category': 'small-vehicle', 'size': 136, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'lTUi06io02O1O001O00000001O0010O010OO101N2NQkX6'}, 'bbox': array([806., 153., 827., 168.]), 'category': 'small-vehicle', 'size': 153, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'UTce05io03L4M2O1O101O0000O2O0O2I9KQln9'}, 'bbox': array([691., 122., 705., 141.]), 'category': 'small-vehicle', 'size': 175, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'YTke02lo03M3L4M2O10000001O0O1O2N2KRlf9'}, 'bbox': array([700., 126., 714., 145.]), 'category': 'small-vehicle', 'size': 155, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'dTQf05io03J6N1O2N1000O101N1O2M2M4Mek9'}, 'bbox': array([706., 140., 718., 157.]), 'category': 'small-vehicle', 'size': 182, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'fSel05io0201O001O0010O010O01O01O010O001O1O0O2NVlf2'}, 'bbox': array([917., 117., 938., 131.]), 'category': 'small-vehicle', 'size': 166, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']Skl05jo02N101O1O000010O01O10O001O001O1O001O1O0O_l_2'}, 'bbox': array([924., 106., 945., 122.]), 'category': 
'small-vehicle', 'size': 188, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'kRXm05jo02O001O001O010O00010O01O010O001N102LQmT2'}, 'bbox': array([935., 88., 955., 104.]), 'category': 'small-vehicle', 'size': 145, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'TYaj07go03L3N2N3N1O1O101O00O2O0O1M3N3M2N4KWgl4'}, 'bbox': array([849., 282., 864., 303.]), 'category': 'small-vehicle', 'size': 281, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'V[_h05io03M3M2N3N2O1N11O01O0001M3E[eR7'}, 'bbox': array([785., 349., 796., 367.]), 'category': 'small-vehicle', 'size': 212, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': '\[hh04jo03K5N1M3N2000O100O2M2M4MPej6'}, 'bbox': array([793., 354., 805., 370.]), 'category': 'small-vehicle', 'size': 168, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'c[i05ho04M2N2N2N3O00O100O1O1N3N2L5LfdP6'}, 'bbox': array([817., 361., 830., 376.]), 'category': 'small-vehicle', 'size': 188, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'WRVo05ho04M3M2N200O1000O1O2N1N4KSn<'}, 'bbox': array([ 999., 62., 1011., 80.]), 'category': 'small-vehicle', 'size': 167, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']Yec01no02M2HMPO6_o06010O01O01O0O2O0O2M2Nmfk;'}, 'bbox': array([631., 291., 646., 308.]), 'category': 'small-vehicle', 'size': 156, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'VY_c06ho04L2N3N100000000001O001N2O1M3M2O2Mmfn;'}, 'bbox': array([623., 287., 636., 304.]), 'category': 'small-vehicle', 'size': 216, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': '^Zmb02ko05K4N1N2N20000000O101M2N3N1Nmec<'}, 'bbox': array([607., 324., 621., 340.]), 'category': 'small-vehicle', 'size': 176, 'label': 9}]

Great work! I have two questions: 1. What exactly is 'counts'? 2. Are there any polygons for instance segmentation? Thank you
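On the first question: the `counts` strings above appear to follow the COCO compressed run-length encoding (RLE), in which a mask is stored as alternating runs of 0s and 1s in column-major order. Assuming that format, `pycocotools.mask.decode` is the usual way to turn such a dict into a binary array. The sketch below re-implements the string decoding in plain Python so the format is visible; the function names are hypothetical, and NumPy is needed only for the final array step.

```python
def decode_rle_counts(counts_str):
    """Decode a COCO-style compressed RLE string into a list of run lengths.

    Each value is stored in 5-bit chunks (character code minus 48), with bit
    0x20 as a continuation flag and bit 0x10 of the last chunk as the sign.
    """
    counts = []
    pos = 0
    while pos < len(counts_str):
        value, chunk, more = 0, 0, True
        while more:
            c = ord(counts_str[pos]) - 48
            value |= (c & 0x1F) << (5 * chunk)
            more = bool(c & 0x20)
            pos += 1
            chunk += 1
            if not more and (c & 0x10):
                value |= -1 << (5 * chunk)  # sign-extend a negative delta
        if len(counts) > 2:
            value += counts[-2]  # later runs are stored as differences
        counts.append(value)
    return counts


def rle_to_mask(rle):
    """Expand {'size': [h, w], 'counts': str} into an (h, w) uint8 array."""
    import numpy as np  # imported here so the decoder above stays dependency-free

    h, w = rle["size"]
    flat = np.zeros(h * w, dtype=np.uint8)
    pos = 0
    for i, run in enumerate(decode_rle_counts(rle["counts"])):
        if i % 2 == 1:  # odd-indexed runs are foreground pixels
            flat[pos:pos + run] = 1
        pos += run
    return flat.reshape((h, w), order="F")  # RLE runs go down columns first


# First entry of the list above.
sample = {"size": [1024, 1024], "counts": "cRYf07fo04M3M2O2O01O1N2O0O3M2LbmZ9"}
runs = decode_rle_counts(sample["counts"])
print(sum(runs[1::2]))  # number of foreground pixels
```

For this first entry the foreground runs add up to 159 pixels, which equals the entry's `size` field, and all runs together cover the full 1024 x 1024 image.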

ModuleNotFoundError: No module named 'DCNv3'

Thank you for open-sourcing the code. I tried to run test_gpu.py but got an error. Have you run into this problem?

(oneformer) lscsc@lscsc-System-Product-Name:SAMRS/Pretraining and Finetuning/End_to_End$ CUDA_VISIBLE_DEVICES=0 python test_gpu.py --backbone 'vit_b' --dataset 'potsdam' --ms 'False' --mode 'test' --resume /media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/weight/vit_b_samrs_mae_clip_checkpoint-1599.pth --save_path /media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/Encoder_Decoder/output
Traceback (most recent call last):
  File "test_gpu.py", line 14, in <module>
    from models import SemsegFinetuneFramework
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/models.py", line 9, in <module>
    from backbone.intern_image import InternImage
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/intern_image.py", line 18, in <module>
    from .ops_dcnv3 import modules as opsm
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/modules/__init__.py", line 7, in <module>
    from .dcnv3 import DCNv3, DCNv3_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/modules/dcnv3.py", line 16, in <module>
    from ..functions import DCNv3Function, dcnv3_core_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/functions/__init__.py", line 7, in <module>
    from .dcnv3_func import DCNv3Function, dcnv3_core_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/functions/dcnv3_func.py", line 16, in <module>
    import DCNv3
ModuleNotFoundError: No module named 'DCNv3'
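The last frame of the traceback is a plain `import DCNv3`, so the error means that no module with that name is installed in the active environment. In InternImage-style code this module is normally a compiled CUDA extension that has to be built before use; whether the `ops_dcnv3` folder in this repository ships a build script is an assumption to verify locally. A small sketch for checking importability before launching a run (the helper name is hypothetical):

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not is_importable("DCNv3"):
        print("DCNv3 is not installed in this environment; "
              "the extension probably needs to be compiled first.")
```

Running the check inside the same conda environment used for `test_gpu.py` shows whether the problem lies in the build step or elsewhere.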

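Question about discrepancies between the results in the two papers

First of all, thank you for your work! While reading SAMRS and RSP (An Empirical Study of Remote Sensing Pretraining), I noticed that the rsp-r50 segmentation results reported in the two papers differ.
Specifically:
(1) Table 3 of SAMRS reports rsp-r50 on Potsdam as OA=90.49, mF1=90.97.

Table 6 of RSP reports rsp-r50 as OA=90.61, mF1=89.94.

(2) Table 4 of SAMRS reports rsp-r50 on iSAID as mIoU=32.97.

Table 7 of RSP reports rsp-r50 as mIoU=61.6.

I am not sure what causes this difference. The gap on iSAID is especially large, and I am unsure which paper's results to rely on. Looking forward to your reply.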

main_finetune.py: error: unrecognized arguments: --load network

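Hi, when I run main_finetune.py under encoder_and_decoder, the error log shows that the script main_finetune.py does not recognize the --load argument. I looked through main_finetune.py and could not find any code that parses a load argument either.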

finetune on a custom dataset

I would like to ask how to generate a custom dataset with this code, or how to fine-tune the model on a custom dataset.

OneDrive link is broken

When I open the OneDrive link, I get this page.

If possible, could you update the dataset download link? Thanks.

OneDrive downloading advice

Not really an issue, more of a PSA:
For anyone downloading the data through the OneDrive link: I found that going into the dataset folders and downloading each file individually (e.g. inside the SIOR directory, download the isinlabels.zip, samlabels.zip, test_images.zip, trainval_images.zip, train.txt and val.txt files) gives much higher speeds. For me it went from < 1MB/s to a peak of 21MB/s, and the zips unpack correctly. I had issues unpacking the SIOR.zip file, but none with the individually downloaded files.

@DotWang you could possibly add this recommendation as a note in the readme.

AttributeError: 'Namespace' object has no attribute 'background'

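Hi, I hit this error when running main_pretrain.py under encoder_and_decoder. It is raised at line 91 of datasets.py: accessing self.args.background fails with an AttributeError because the background attribute is not defined on the args object. However, the command for running main_pretrain.py does not set any background argument. I do not understand what went wrong and would appreciate your help.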
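This is the error `argparse` produces when code reads an attribute that no `add_argument` call ever defined. A minimal sketch of the mechanism and of two common ways around it; the parser below, the `--background` flag and its default value are illustrative assumptions, not the repository's actual options:

```python
import argparse

# Hypothetical parser that, like the one in the report, defines no --background option.
parser = argparse.ArgumentParser()
parser.add_argument("--backbone", default="resnet50")
args = parser.parse_args([])

# Reading an option that was never defined reproduces the reported error.
try:
    args.background
except AttributeError as err:
    print(err)

# Option 1: read defensively with a fallback value.
background = getattr(args, "background", None)

# Option 2: define the option so every consumer sees a default
# (the default 'False' is an assumption made for this sketch).
parser.add_argument("--background", default="False")
args = parser.parse_args([])
print(args.background)
```

Which default is appropriate depends on what `datasets.py` does with the value, so the second option needs checking against the dataset code.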

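Source links for the HRSC2016, DOTA, DIOR, and FAIR1M datasets

Hi, the datasets I downloaded online have a different structure from yours, so pre-training cannot run properly. Could you share the original links for the datasets you used, or upload the datasets? Thank you very much!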

Amazing Work!

First, congratulations on your excellent work!
Is there any plan to release the pre-training and fine-tuning code?

hyperparams for SAM

Hi,

Thanks for the great work! I was wondering if you could provide the SAM hyperparameter settings used to generate the dataset.

Can I use this code to get masks for other datasets?

Nice work. I saw in your paper that you obtained segmentation results for DOTA version 2.0. Can I use this code to obtain segmentation results for DOTA 1.0?
Looking forward to your reply.
Thank you very much.

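Question about testing

Hello, I have recently been trying to reproduce your paper and have run into a few problems:
1. Can the Segmentation Pretrained Models provided at https://github.com/ViTAE-Transformer/SAMRS/tree/main/Pretraining%20and%20Finetuning be used directly to test on the ISPRS Potsdam dataset (I cropped the images to 512)?
2. When I test on the ISPRS Potsdam dataset with the resnet50_upernet_imp_sep_model.pth weights, the measured accuracy is very low.
The command I used is: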
CUDA_VISIBLE_DEVICES=0 python Pretraining_and_Finetuning/Encoder_Decoder/test_gpu.py --backbone 'resnet50' --decoder 'upernet' --dataset 'potsdam' --ms 'False' --mode 'test' --resume resnet50_upernet_imp_sep_model.pth --save_path ./save

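The following appears when the weights are loaded. Could it prevent the weights from being loaded correctly?
The final log file is:

What do you think could be affecting the test accuracy? I will check for it.
3. Could you provide a Baidu Cloud link for downloading the datasets? The OneDrive download keeps getting interrupted.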

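Fine-tuned weights

Do you have weights that have been fine-tuned on remote sensing datasets?

When fine-tuning the vit_b+rvsa+upernet model, I get an error that the module layer_decay_optimizer_constructor_vitae does not exist. Looking at the code, this file seems to be missing?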

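Question about dataset image size

Do H and W of the dataset images have to be equal? For images whose height and width differ, how can the corresponding (H, W) mask be obtained?

Environment setup

With CUDA 12.2 I keep failing to set up the environment in conda. Which versions are required?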

About your data for semantic segmentation

Hello,

After going through your data, I see that you only labeled objects that have boxes. Background such as sky or water is not labeled.

Therefore, I am curious how your data can be used for semantic segmentation as you claimed in your conclusion on page 6.
Thanks,

What is the difference between End to End and Encoder Decoder?

As shown in the figure, Encoder Decoder runs, but End to End does not.

custom dataset Question

What source code should I run to create an automatic label dataset?

Could the geographic coordinates of the image be obtained?

about data labels

Hello,

We are wondering: if a picture has multiple objects, you just use gray labels, so I am curious how I can distinguish different objects in a picture. Is it possible to share the code that generates pictures like Figure 6 in your SAMRS paper?

Thanks,
Liya

Dataset Copyright

Is the dataset copyright free of charge?

I'm trying to participate in a certain competition, and I'm asking because I'm trying to learn with that dataset.

Thanks

Is there a fine-tuned pretrained model?

Hello, do you have a model obtained by fine-tuning SAM directly on a large remote sensing dataset?

ModuleNotFoundError: No module named 'DCNv3'

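How did everyone who ran into this DCNv3 module problem solve it? (I am a complete beginner who has not yet started graduate school.)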

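Question about the dataset annotation format

Hello, I would like to ask what format the instance segmentation annotations in your SAMRS dataset use. Taking SIOR as an example, parsing the pkl files in the ins annotation folder gives the following information:

I want to visualize the instance segmentation mask annotations, but the mask label here is a string. Is there something wrong with the way I am parsing it?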

AttributeError: 'SemsegPretrnFramework' object has no attribute 'decoder'

Hi, for pre-training I chose --decoder 'upernet' following the example command on your GitHub. However, the error indicates that the SemsegPretrnFramework class has no handling logic for the 'upernet' decoder, and I could not find any in model.py either, so I would like to ask about it. Here is the command I ran: CUDA_VISIBLE_DEVICES=0 python main_pretrain.py --backbone 'resnet50' --decoder 'upernet' --datasets 'sota' 'sior' 'fast' --batch_size 12 --batch_size_val 12 --workers 8 --save_path '/root/autodl-tmp/SAMRs/save' --distributed 'False' --end_iter 80000 --image_size 224 --init_backbone 'imp'

ZeroDivisionError: float division by zero

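Hi, I got an error when running main_finetune.py: a division by zero occurs while the F1 score is computed. I suspect the structure of my Potsdam dataset is wrong. My folder structure is shown in the figure below. The images folder holds the RGB images (jpg format), train.txt and valid.txt hold the names of the training and validation images respectively, and the lables folder holds the segmentation maps for the same images as in the images folder.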
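A division by zero in an F1 computation typically happens when a class has no predicted pixels or no ground-truth pixels, so that precision or recall has a zero denominator; whether that is the cause here depends on the labels that were actually loaded. A minimal sketch of a guarded per-class F1, with hypothetical names (this is not the code in main_finetune.py):

```python
def safe_f1(tp, fp, fn):
    """Per-class F1 from true positives, false positives and false negatives.

    Returns 0.0 instead of raising when a denominator is zero, for example
    when a class never appears in either predictions or labels.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(safe_f1(0, 0, 0))  # a class that is absent everywhere
    print(safe_f1(5, 0, 0))  # a class that is predicted perfectly
```

If every class ends up with zero counts, the label values read from disk probably do not match the class indices the evaluation expects, which is worth checking before changing the metric.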

Question about the distributed training command

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8
--nnodes=1 --master_port=10001 --master_addr = [server ip] main_pretrain.py
--backbone 'resnet50' --decoder 'upernet'
--datasets 'sota' 'sior' 'fast'
--batch_size 12 --batch_size_val 12 --workers 8
--save_path '[SEP model save path]'
--distributed 'True' --end_iter 80000
--image_size 224 --init_backbone 'imp'
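Hi, is this distributed pre-training script meant for a single machine with multiple GPUs or for multiple machines with multiple GPUs? Can I run it on just one GPU? Would main_pretrain.py need to be modified for that?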

NotImplementedError

Thank you very much for sharing your research code.

I have one question, and I hope you can help me.

In the files main_finetune.py, main_pretrain.py, and test_gpu, the datasets are set to POTSDAM, VAIHNGEN, and ISAID, instead of the FAST, SOTA, and SIOR datasets that you uploaded to OneDrive. Should I modify the code to use the FAST, SOTA, and SIOR datasets instead?

Looking forward to your response.
Thank you so much.