samrs's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Updates | Introduction | Statement |

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2021

24/03/2021

  • The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

  • The code is released!

19/10/2021

  • The paper is accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

  • The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE uses several reduction cells (RC) and normal cells (NC) to introduce scale-invariance and locality into vision transformers. In ViTAEv2, we explore window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RC and NC in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

Other Links

Image Classification: See ViTAE for Image Classification

Object Detection: See ViTAE for Object Detection.

Semantic Segmentation: See ViTAE for Semantic Segmentation.

Animal Pose Estimation: See ViTAE for Animal Pose Estimation.

Matting: See ViTAE for Matting.

Remote Sensing: See ViTAE for Remote Sensing.

samrs's Issues

Baidu Netdisk

Could you provide a Baidu Netdisk link? Downloading from OneDrive is slow and unstable. Thanks.

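Question about processing the Potsdam dataset with mmseg

Dear author, I have read through the past issues repeatedly, but I still do not understand the purpose of processing the Potsdam dataset with mmseg. I have finished pretraining the model following the README and now need to fine-tune it on the Potsdam dataset. Following earlier issues, I chose 3_Ortho_IRRG.zip and 5_Labels_all.zip. I removed all the tfw files from 3_Ortho_IRRG.zip and kept only the tif files. Likewise, I kept only the tif files from 5_Labels_all.zip. The resulting folder structure is shown in the image below. I do not understand what still needs to be processed with mmseg: is mmseg used for training, or for cropping the images (because the 6000*6000 images are too large)? Also, why not use pre-cropped RGB images directly? (I am honestly a bit confused and hope you can help. After finishing the whole project, I would like to record a video on reproducing it from scratch, which I hope will help people who study it later.)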
image

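main_sam_hbox_semantic.py for the DOTA dataset

Hello, what changes to the code are needed for main_sam_hbox_semantic.py to process the DOTA dataset and generate the dataset? I ask because the code defaults to the processing paths for the DIOR dataset.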

How to deal with PKL files in an ins folder?

[{'mask': {'size': [1024, 1024], 'counts': 'cRYf07fo04M3M2O2O01O1N2O0O3M2LbmZ9'}, 'bbox': array([714., 76., 726., 95.]), 'category': 'small-vehicle', 'size': 159, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'kRjf06fo05M2N3N10000O2O01N1O1N3L4M^mg8'}, 'bbox': array([731., 82., 744., 101.]), 'category': 'small-vehicle', 'size': 199, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'YWWd02lo04L4L3L4N1O2N1000O100O1N3N1N3N3LQiW;'}, 'bbox': array([648., 222., 667., 242.]), 'category': 'small-vehicle', 'size': 250, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'nSjf01no02N2N2O001N1000010O0010O001O10O01O1O100Onk8'}, 'bbox': array([733., 122., 754., 136.]), 'category': 'small-vehicle', 'size': 152, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']TQh02lo04M2O001O000010O0010O0010O001N3Mak]7'}, 'bbox': array([770., 137., 788., 151.]), 'category': 'small-vehicle', 'size': 136, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'lTUi06io02O1O001O00000001O0010O010OO101N2NQkX6'}, 'bbox': array([806., 153., 827., 168.]), 'category': 'small-vehicle', 'size': 153, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'UTce05io03L4M2O1O101O0000O2O0O2I9KQln9'}, 'bbox': array([691., 122., 705., 141.]), 'category': 'small-vehicle', 'size': 175, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'YTke02lo03M3L4M2O10000001O0O1O2N2KRlf9'}, 'bbox': array([700., 126., 714., 145.]), 'category': 'small-vehicle', 'size': 155, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'dTQf05io03J6N1O2N1000O101N1O2M2M4Mek9'}, 'bbox': array([706., 140., 718., 157.]), 'category': 'small-vehicle', 'size': 182, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'fSel05io0201O001O0010O010O01O01O010O001O1O0O2NVlf2'}, 'bbox': array([917., 117., 938., 131.]), 'category': 'small-vehicle', 'size': 166, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']Skl05jo02N101O1O000010O01O10O001O001O1O001O1O0O_l_2'}, 'bbox': array([924., 106., 945., 122.]), 'category': 
'small-vehicle', 'size': 188, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'kRXm05jo02O001O001O010O00010O01O010O001N102LQmT2'}, 'bbox': array([935., 88., 955., 104.]), 'category': 'small-vehicle', 'size': 145, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'TYaj07go03L3N2N3N1O1O101O00O2O0O1M3N3M2N4KWgl4'}, 'bbox': array([849., 282., 864., 303.]), 'category': 'small-vehicle', 'size': 281, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'V[_h05io03M3M2N3N2O1N11O01O0001M3E[eR7'}, 'bbox': array([785., 349., 796., 367.]), 'category': 'small-vehicle', 'size': 212, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': '\[hh04jo03K5N1M3N2000O100O2M2M4MPej6'}, 'bbox': array([793., 354., 805., 370.]), 'category': 'small-vehicle', 'size': 168, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'c[i05ho04M2N2N2N3O00O100O1O1N3N2L5LfdP6'}, 'bbox': array([817., 361., 830., 376.]), 'category': 'small-vehicle', 'size': 188, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'WRVo05ho04M3M2N200O1000O1O2N1N4KSn<'}, 'bbox': array([ 999., 62., 1011., 80.]), 'category': 'small-vehicle', 'size': 167, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': ']Yec01no02M2HMPO6_o06010O01O01O0O2O0O2M2Nmfk;'}, 'bbox': array([631., 291., 646., 308.]), 'category': 'small-vehicle', 'size': 156, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': 'VY_c06ho04L2N3N100000000001O001N2O1M3M2O2Mmfn;'}, 'bbox': array([623., 287., 636., 304.]), 'category': 'small-vehicle', 'size': 216, 'label': 9}, {'mask': {'size': [1024, 1024], 'counts': '^Zmb02ko05K4N1N2N20000000O101M2N3N1Nmec<'}, 'bbox': array([607., 324., 621., 340.]), 'category': 'small-vehicle', 'size': 176, 'label': 9}]

Great work! I have two questions: 1. What exactly is 'counts'? 2. Are there any polygons for instance segmentation? Thank you
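
The 'counts' strings above have the shape of COCO-style compressed run-length encoding (RLE), the format produced by pycocotools. This is an inference from the sample, not something the thread confirms. Under that assumption, a minimal pure-Python sketch of the decoding looks like this (the function names are illustrative and not part of the repository):

```python
def decode_rle_counts(counts: str) -> list:
    """Decode a COCO-style compressed RLE string into a list of run lengths.

    Runs alternate background, foreground, background, ... in column-major order.
    """
    runs = []
    pos = 0
    while pos < len(counts):
        value, k, more = 0, 0, True
        while more:
            c = ord(counts[pos]) - 48          # each character carries 6 bits
            value |= (c & 0x1F) << (5 * k)     # low 5 bits are payload
            more = bool(c & 0x20)              # bit 5 marks a continuation
            pos += 1
            k += 1
            if not more and (c & 0x10):        # sign-extend negative values
                value |= -1 << (5 * k)
        if len(runs) > 2:                      # later runs are stored as deltas
            value += runs[-2]
        runs.append(value)
    return runs


def foreground_pixels(runs, height):
    """Yield (row, col) for every foreground pixel described by the runs."""
    index = 0
    for i, run in enumerate(runs):
        if i % 2 == 1:                         # odd-numbered runs are foreground
            for p in range(index, index + run):
                yield p % height, p // height
        index += run


# First record from the dump above; its 'size' field is [1024, 1024].
sample = "cRYf07fo04M3M2O2O01O1N2O0O3M2LbmZ9"
runs = decode_rle_counts(sample)
area = sum(runs[1::2])                         # number of foreground pixels
first_pixel = next(foreground_pixels(runs, 1024))
```

For this record the decoded foreground area is 159 pixels, which matches the record's 'size' field, and the runs sum to 1024 × 1024. In practice the mask utilities in pycocotools are the usual way to turn such records into arrays. The sample itself contains only masks and boxes, with no polygon field.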

ModuleNotFoundError: No module named 'DCNv3'

Thank you for open-sourcing the code. I tried to run test_gpu.py but got an error. Have you run into this problem?

(oneformer) lscsc@lscsc-System-Product-Name:SAMRS/Pretraining and Finetuning/End_to_End$ CUDA_VISIBLE_DEVICES=0 python test_gpu.py --backbone 'vit_b' --dataset 'potsdam' --ms 'False' --mode 'test' --resume /media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/weight/vit_b_samrs_mae_clip_checkpoint-1599.pth --save_path /media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/Encoder_Decoder/output
Traceback (most recent call last):
  File "test_gpu.py", line 14, in <module>
    from models import SemsegFinetuneFramework
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/models.py", line 9, in <module>
    from backbone.intern_image import InternImage
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/intern_image.py", line 18, in <module>
    from .ops_dcnv3 import modules as opsm
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/modules/__init__.py", line 7, in <module>
    from .dcnv3 import DCNv3, DCNv3_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/modules/dcnv3.py", line 16, in <module>
    from ..functions import DCNv3Function, dcnv3_core_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/functions/__init__.py", line 7, in <module>
    from .dcnv3_func import DCNv3Function, dcnv3_core_pytorch
  File "/media/lscsc/nas/yihan/SegAN/SAMRS/Pretraining and Finetuning/End_to_End/backbone/ops_dcnv3/functions/dcnv3_func.py", line 16, in <module>
    import DCNv3
ModuleNotFoundError: No module named 'DCNv3'
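
The traceback ends at `import DCNv3`, reached through an import of the InternImage backbone in models.py even though the command asks for 'vit_b'. A likely reading, which the thread does not confirm, is that DCNv3 is a compiled extension that has not been built or installed in this environment. A small sketch for checking whether a module is importable before relying on it (the helper name `has_module` is hypothetical):

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Assumption: 'DCNv3' is the name of the compiled extension from the traceback.
dcnv3_available = has_module("DCNv3")
if not dcnv3_available:
    print("DCNv3 extension not found; InternImage backbones will not import.")
```

A check like this could guard the InternImage import so that other backbones still load when the extension is missing; whether the repository's code is structured to allow that is not stated in the thread.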

Question about discrepancies between the results in the two papers

First of all, thank you for your work! While reading SAMRS and RSP (An Empirical Study of Remote Sensing Pretraining), I noticed that the rsp-r50 segmentation results reported in the two papers differ.
Specifically:
(1) In Table 3 of SAMRS, rsp-r50 on Potsdam is reported as OA=90.49, mF1=90.97.
In Table 6 of RSP, rsp-r50 is reported as OA=90.61, mF1=89.94.
(2) In Table 4 of SAMRS, rsp-r50 on iSAID is reported as mIoU=32.97.
In Table 7 of RSP, rsp-r50 is reported as mIoU=61.6.
I am not sure what causes these differences. The gap on iSAID is especially large, and I am unsure which paper's results to rely on. I look forward to your reply.

finetune on a custom dataset

How can I generate a custom dataset with this code, or fine-tune the model on a custom dataset?

OneDrive link is broken

When I open the OneDrive link, I get the page shown below.

image

If possible, could you please update the dataset download link?
Thanks.

OneDrive downloading advice

Not really an issue, more of a PSA:
For anyone downloading the data through the OneDrive link: I have found that going into the dataset folders and downloading each file individually (e.g. inside the SIOR directory, download the isinlabels.zip, samlabels.zip, test_images.zip, trainval_images.zip, train.txt and val.txt files) gives much higher speeds. For me it went from under 1 MB/s to a peak of 21 MB/s when downloading individually, and the zips unpack correctly. I had issues unpacking the SIOR.zip file, but none with the individually downloaded files.

@DotWang you could possibly add this recommendation as a note in the readme.

AttributeError: 'Namespace' object has no attribute 'background'

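Dear author, I hit this error when running main_pretrain.py under encoder_and_decoder. It points to line 91 of datasets.py, where accessing self.args.background raises an AttributeError because the background attribute is not defined on the args object. However, the command for running main_pretrain.py does not set any background argument. I do not understand where the problem is and would appreciate your help.
Screenshot 2024-08-04 135923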
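
This error is what argparse produces when code reads an attribute that was never registered on the parser. The sketch below reproduces the situation and shows two generic ways around it. The `--background` flag's type and default are hypothetical here, since the thread does not say what the repository expects:

```python
import argparse


def build_parser(with_background: bool) -> argparse.ArgumentParser:
    """Build a tiny parser, optionally registering a --background flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backbone", type=str, default="resnet50")
    if with_background:
        # Hypothetical flag: the real type and default are not given in the thread.
        parser.add_argument("--background", type=str, default="True")
    return parser


# Without the flag registered, args.background would raise AttributeError.
args_missing = build_parser(False).parse_args([])
# Option 1: read the attribute defensively with a fallback value.
background_fallback = getattr(args_missing, "background", None)

# Option 2: register the flag so the attribute always exists.
args_present = build_parser(True).parse_args([])
```

Registering the flag on the parser is usually the cleaner fix, because every script that shares the dataset class then sees the same default.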

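Source links for the HRSC2016, DOTA, DIOR, and FAIR1M datasets

Dear author, the datasets I downloaded online are structured differently from yours, so pretraining cannot run properly. Could you share the original links to the datasets you used, or upload the datasets? Thank you very much!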

Amazing Work!

First, congratulations on your excellent work!
Is there any plan to release the pre-training and fine-tuning code?

hyperparams for SAM

Hi,

Thanks for the great work! I was wondering if you could provide the SAM hyperparameter settings used to generate the dataset.

Can I use this code to get other dataset masks

Great work. I saw in your paper that you obtained segmentation results for DOTA 2.0. Can I use this code to obtain segmentation results for DOTA 1.0?
I look forward to your reply.
Thank you very much.

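Questions about testing

Hello, I have recently been trying to reproduce your paper and ran into a few problems:
1. Can the Segmentation Pretrained Models provided at https://github.com/ViTAE-Transformer/SAMRS/tree/main/Pretraining%20and%20Finetuning be used directly for testing on the ISPRS Potsdam dataset (I cropped the images to 512)?
2. When I test on the ISPRS Potsdam dataset with the resnet50_upernet_imp_sep_model.pth weights, the accuracy is very low.
The command I used is:
CUDA_VISIBLE_DEVICES=0 python Pretraining_and_Finetuning/Encoder_Decoder/test_gpu.py --backbone 'resnet50' --decoder 'upernet' --dataset 'potsdam' --ms 'False' --mode 'test' --resume resnet50_upernet_imp_sep_model.pth --save_path ./save
image
This message appears when the weights are loaded. Could it prevent the weights from being loaded correctly?
The final log file is:
image
What might be affecting the test accuracy? I would like to check for it.
3. Could you provide a Baidu Cloud link for downloading the dataset? The OneDrive download keeps getting interrupted.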

Fine-tuned weights

Do you have weights that have already been fine-tuned on remote sensing datasets?

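Question about dataset image size

Must the H and W of the dataset images be equal? For images whose height and width differ, how can I obtain a mask of the corresponding (H, W)?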

Environment setup

With CUDA 12.2, I keep failing to set up the required environment in conda. Which versions are needed?

About your data for semantic segmentation

Hello,

After going through your data, I see that you only labeled objects that have boxes. Background such as sky or water is not labeled.

Therefore, I am curious how your data can be used for semantic segmentation, as you claimed in your conclusion on page 6.
Thanks,

about data labels

Hello,

We noticed that when a picture has multiple objects, you just use gray labels. Therefore, I am curious how I can distinguish different objects in a picture. Is it possible to share the code used to generate pictures like Figure 6 in your SAMRS paper?

Thanks,
Liya
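
If the gray labels are single-channel maps of class indices (an assumption; the thread does not describe the file format), a color rendering only needs a lookup from class id to RGB. A minimal sketch, with a made-up palette and plain nested lists standing in for an image array:

```python
# Hypothetical palette: class id -> (R, G, B). The real class ids and colors
# used for the paper's figures are not given in the thread.
PALETTE = {
    0: (0, 0, 0),        # background
    1: (255, 0, 0),
    2: (0, 255, 0),
}


def colorize(label_map, palette, default=(255, 255, 255)):
    """Map every class id in a 2-D label map to an RGB tuple."""
    return [[palette.get(value, default) for value in row] for row in label_map]


label_map = [
    [0, 1],
    [2, 9],   # 9 is not in the palette and falls back to the default color
]
color_image = colorize(label_map, PALETTE)
```

This separates classes, not individual objects of the same class. Telling instances apart would need per-instance annotations, such as the per-object mask records quoted in the PKL issue above.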

Dataset Copyright

Is the dataset copyright free of charge?

I'm trying to participate in a certain competition, and I'm asking because I'm trying to learn with that dataset.

Thanks

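Question about the dataset annotation format

Hello, I would like to ask about the format of the instance segmentation annotations in the SAMRS datasets you generated. Taking SIOR as an example, parsing the pkl files in the ins annotation folder gives the following:
Screenshot 2024-01-16 104509
I want to visualize the instance segmentation mask annotations, but the mask label here is a string. Is there something wrong with how I am parsing it?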

AttributeError: 'SemsegPretrnFramework' object has no attribute 'decoder'

Dear author, during pretraining I selected --decoder 'upernet' following the example command on your GitHub. However, the error indicates that the SemsegPretrnFramework class has no handling logic for the 'upernet' decoder, and I could not find any in model.py either, so I would like to ask you about it. Here is the command I ran: CUDA_VISIBLE_DEVICES=0 python main_pretrain.py --backbone 'resnet50' --decoder 'upernet' --datasets 'sota' 'sior' 'fast' --batch_size 12 --batch_size_val 12 --workers 8 --save_path '/root/autodl-tmp/SAMRs/save' --distributed 'False' --end_iter 80000 --image_size 224 --init_backbone 'imp'
Screenshot 2024-08-03 180430

ZeroDivisionError: float division by zero

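Dear author, I hit an error when running main_finetune.py. Specifically, a division by zero occurs when the F1 score is computed. Could the structure of my Potsdam dataset be the problem? The image below shows my folder structure. The images folder holds the RGB images (jpg format), train.txt and valid.txt hold the names of the training and validation images respectively, and the lables folder holds the segmentation maps for the same images in the images folder.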
image
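
A division by zero in an F1 computation typically means that, for some class, the true-positive, false-positive, and false-negative counts are all zero, for example when label values in the files do not match the class ids the code expects. That cause is a guess; the thread does not confirm it. A guarded per-class F1, as a minimal sketch (the function is illustrative, not the repository's implementation):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from confusion counts, returning 0.0 for an empty class."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # The class never appears in predictions or labels.
        return 0.0
    return 2 * tp / denominator


scores = [f1_score(5, 0, 0), f1_score(2, 1, 1), f1_score(0, 0, 0)]
```

Guarding the denominator only hides the symptom. If every class ends up empty, it is worth checking that the label images contain the expected class indices rather than, say, RGB color codes.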

Question about the distributed training command

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8
--nnodes=1 --master_port=10001 --master_addr = [server ip] main_pretrain.py
--backbone 'resnet50' --decoder 'upernet'
--datasets 'sota' 'sior' 'fast'
--batch_size 12 --batch_size_val 12 --workers 8
--save_path '[SEP model save path]'
--distributed 'True' --end_iter 80000
--image_size 224 --init_backbone 'imp'
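Dear author, is this distributed pretraining script meant for a single machine with multiple GPUs, or for multiple machines with multiple GPUs? Can I run it on just a single GPU? Would main_pretrain.py need to be modified?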

NotImplementedError

Thank you very much for sharing your research code.

I have one question, and I hope you can help me.

In the files main_finetune.py, main_pretrain.py, and test_gpu, the datasets are set to POTSDAM, VAIHNGEN, and ISAID, instead of the FAST, SOTA, and SIOR datasets that you uploaded to OneDrive. Should I modify the code to use the FAST, SOTA, and SIOR datasets instead?

Looking forward to your response.
Thank you so much.
