
The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"

License: MIT License



MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao and Liangpei Zhang

Update | Overview | Datasets and Models | Usage | Statement


🚩 Current applications

Remote Sensing Related Works: Please see Remote Sensing;

Remote Sensing Supervised Pretraining Foundation Model: Please see RSP;

100M-parameter Remote Sensing Unsupervised Pretraining Foundation Model: Please see RVSA;

Large-Scale RS Segmentation Pretraining Dataset: Please see SAMRS;

Other applications: ViTAE | VSA | QFormer | ViTPose | Matting | Scene Text Spotting | Video Object Segmentation

🔥 Update

2024.05.24

  • Accepted by the IEEE JSTARS Special Issue on "Large-Scale Pretraining for Interpretation Promotion in Remote Sensing Domain"

2024.03.30

  • The codes, configs and logs are released!

2024.03.29

  • The change detection finetuned models are released!

2024.03.29

  • The semantic segmentation finetuned models are released!

2024.03.28

  • The rotated object detection finetuned models are released!

2024.03.28

  • The horizontal object detection finetuned models are released!

2024.03.27

  • The scene classification finetuned models are released!

2024.03.26

  • The pretrained models are released!

2024.03.25

  • The SOTA-RBB set of the pretraining dataset is uploaded to OneDrive and Baidu!

2024.03.21

  • The paper is posted on arXiv!

🌞 Overview

This is the official repository of the paper: MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

Figure 1: The overall pipeline of MTP.

In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. We hope this research encourages further exploration of RS foundation models and anticipate the widespread application of these models across diverse fields of RS image interpretation.
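Concretely, the design is a single backbone whose features feed several task heads; each pretraining iteration routes a batch through the head for its task and the task losses are combined. Below is a minimal PyTorch sketch of this shared-encoder / task-specific-decoder layout; the class names are illustrative placeholders, not the repo's actual modules, though the "ss" / "is" / "rd" task flags mirror those used by main_pretrain.py.

```python
# Minimal sketch of a shared encoder with task-specific decoders.
# The classes are illustrative placeholders, not the repo's modules.
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, encoder: nn.Module, decoders: dict):
        super().__init__()
        self.encoder = encoder                   # shared RS backbone
        self.decoders = nn.ModuleDict(decoders)  # one decoder per task

    def forward(self, x, task: str):
        feats = self.encoder(x)        # features shared across all tasks
        return self.decoders[task](feats)

# Hypothetical usage: per-iteration loss summed over all pretraining tasks.
# model = MultiTaskModel(backbone, {"ss": seg_head, "is": inst_head, "rd": rot_head})
# loss = sum(criteria[t](model(batch[t], t), target[t]) for t in ("ss", "is", "rd"))
```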

📖 Datasets and Models

Pretraining Dataset

We clip the rotated-bounding-box version of DOTA-2.0 and produce segmentation labels with SAM, obtaining SOTA-RBB (the original SAMRS uses the horizontal-bounding-box version of DOTA-2.0).

SOTA-RBB, together with the SIOR and FAST sets of the original SAMRS, is used for implementing MTP.

We have uploaded SOTA-RBB to OneDrive and Baidu.

Pretrained Models

| Pretrain | Pretraining Dataset | Backbone | Backbone Weights | Model Weights |
| :--- | :--- | :--- | :--- | :--- |
| MAE | Million-AID | ViT-L | Baidu & OneDrive | - |
| MAE + MTP | SAMRS | ViT-B+RVSA | Baidu & OneDrive | Baidu & OneDrive |
| MAE + MTP | SAMRS | ViT-L+RVSA | Baidu & OneDrive | Baidu & OneDrive |
| IMP + MTP | SAMRS | InternImage-XL | Baidu & OneDrive | Baidu & OneDrive |

Finetuned Models

Scene Classification

| Pretrain | Dataset | Backbone | OA | Config | Log | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MAE + MTP | EuroSAT | ViT-B+RVSA | 98.76 | Config | Log | Baidu & OneDrive |
| MAE + MTP | EuroSAT | ViT-L+RVSA | 98.78 | Config | Log | Baidu & OneDrive |
| IMP + MTP | EuroSAT | InternImage-XL | 99.24 | Config | Log | Baidu & OneDrive |
| MAE + MTP | RESISC-45 | ViT-B+RVSA | 95.57 | Config | Log | Baidu & OneDrive |
| MAE + MTP | RESISC-45 | ViT-L+RVSA | 95.88 | Config | Log | Baidu & OneDrive |
| IMP + MTP | RESISC-45 | InternImage-XL | 96.27 | Config | Log | Baidu & OneDrive |

Horizontal Object Detection

| Pretrain | Dataset | Backbone | Method | AP50 | Config | Log | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MAE + MTP | Xview | ViT-B+RVSA | RetinaNet | 16.40 | Config | Log | Baidu & OneDrive |
| MAE + MTP | Xview | ViT-L+RVSA | RetinaNet | 19.40 | Config | Log | Baidu & OneDrive |
| IMP + MTP | Xview | InternImage-XL | RetinaNet | 18.20 | Config | Log | Baidu & OneDrive |
| MAE + MTP | DIOR | ViT-B+RVSA | Faster-RCNN | 79.00 | Config | Log | Baidu & OneDrive |
| MAE + MTP | DIOR | ViT-L+RVSA | Faster-RCNN | 81.70 | Config | Log | Baidu & OneDrive |
| IMP + MTP | DIOR | InternImage-XL | Faster-RCNN | 78.30 | Config | Log | Baidu & OneDrive |

Rotated Object Detection (SLURM part)

| Pretrain | Dataset | Backbone | Method | mAP | Config | Log | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MAE + MTP | DIOR-R | ViT-B+RVSA | Oriented-RCNN | 71.29 | Config | Log | Baidu & OneDrive |
| MAE + MTP | DIOR-R | ViT-L+RVSA | Oriented-RCNN | 74.54 | Config | Log | Baidu & OneDrive |
| IMP + MTP | DIOR-R | InternImage-XL | Oriented-RCNN | 72.17 | Config | Log | Baidu & OneDrive |
| MAE + MTP | FAIR1M-2.0 | ViT-B+RVSA | Oriented-RCNN | 51.92 | Config | Log | Baidu & OneDrive |
| MAE + MTP | FAIR1M-2.0 | ViT-L+RVSA | Oriented-RCNN | 53.00 | Config | Log | Baidu & OneDrive |
| IMP + MTP | FAIR1M-2.0 | InternImage-XL | Oriented-RCNN | 50.93 | Config | Log | Baidu & OneDrive |

Semantic Segmentation

| Pretrain | Dataset | Backbone | Method | mIoU | Config | Log | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MAE + MTP | SpaceNetv1 | ViT-B+RVSA | UperNet | 79.63 | Config | Log | Baidu & OneDrive |
| MAE + MTP | SpaceNetv1 | ViT-L+RVSA | UperNet | 79.54 | Config | Log | Baidu & OneDrive |
| IMP + MTP | SpaceNetv1 | InternImage-XL | UperNet | 79.16 | Config | Log | Baidu & OneDrive |
| MAE + MTP | LoveDA | ViT-B+RVSA | UperNet | 52.39 | Config | Log | Baidu & OneDrive |
| MAE + MTP | LoveDA | ViT-L+RVSA | UperNet | 54.17 | Config | Log | Baidu & OneDrive |
| IMP + MTP | LoveDA | InternImage-XL | UperNet | 54.17 | Config | Log | Baidu & OneDrive |

Change Detection

| Pretrain | Dataset | Backbone | Method | F1 | Config | Log | Weights |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| MAE + MTP | OSCD | ViT-B+RVSA | UNet | 53.36 | Config | Log | Baidu & OneDrive |
| MAE + MTP | OSCD | ViT-L+RVSA | UNet | 55.92 | Config | Log | Baidu & OneDrive |
| IMP + MTP | OSCD | InternImage-XL | UNet | 55.61 | Config | Log | Baidu & OneDrive |
| MAE + MTP | WHU | ViT-B+RVSA | UNet | 94.32 | Config | Log | Baidu & OneDrive |
| MAE + MTP | WHU | ViT-L+RVSA | UNet | 94.75 | Config | Log | Baidu & OneDrive |
| IMP + MTP | WHU | InternImage-XL | UNet | 95.59 | Config | Log | Baidu & OneDrive |
| MAE + MTP | LEVIR | ViT-B+RVSA | UNet | 92.22 | Config | Log | Baidu & OneDrive |
| MAE + MTP | LEVIR | ViT-L+RVSA | UNet | 92.67 | Config | Log | Baidu & OneDrive |
| IMP + MTP | LEVIR | InternImage-XL | UNet | 92.54 | Config | Log | Baidu & OneDrive |
| MAE + MTP | SVCD/CDD | ViT-B+RVSA | UNet | 97.87 | Config | Log | Baidu & OneDrive |
| MAE + MTP | SVCD/CDD | ViT-L+RVSA | UNet | 97.98 | Config | Log | Baidu & OneDrive |
| IMP + MTP | SVCD/CDD | InternImage-XL | UNet | 98.33 | Config | Log | Baidu & OneDrive |

🔧 Usage

Environment

This environment adopts newer versions of the OpenMMLab series to support multi-task pretraining and finetuning on various RS tasks.

| Package | Version | Package | Version | Package | Version | Package | Version |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Python | 3.8.17 | timm | 0.9.5 | MMEngine | 0.8.4 | MMDetection | 3.1.0 |
| PyTorch | 1.10.0 | OpenCV | 4.8.0 | MMPretrain | 1.2.0 | MMRotate | 1.0.0rc1 |
| Torchvision | 0.10.0 | MMCV | 2.0.0 | MMSegmentation | 1.0.0 | Open-CD | 1.1.0 |

❗❗❗ We also configure an environment for MMRotate 0.3.4

| Package | Version | Package | Version | Package | Version |
| :--- | :--- | :--- | :--- | :--- | :--- |
| Python | 3.8.0 | timm | 0.9.2 | MMEngine | 0.10.3 |
| PyTorch | 1.10.0 | OpenCV | 4.7.0 | MMDetection | 2.28.2 |
| Torchvision | 0.10.0 | MMCV-full | 1.6.1 | MMRotate | 0.3.4 |

This environment is used for multi-scale prediction of FAIR1M-2.0 and DOTA-V1.0.
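To quickly verify that an installed environment matches the first table above, a minimal check (a convenience sketch, not part of the repo) is:

```python
# Convenience sketch: print installed versions to compare against the table.
import mmcv
import mmengine
import torch
import torchvision

print("PyTorch    :", torch.__version__)        # expect 1.10.0
print("Torchvision:", torchvision.__version__)  # expect 0.10.0
print("MMCV       :", mmcv.__version__)         # expect 2.0.0
print("MMEngine   :", mmengine.__version__)     # expect 0.8.4
```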

Preparing Pretraining Dataset

  1. Download SOTA-RBB and the SIOR and FAST sets of the SAMRS dataset.

  2. Transform the *.pkl annotations in the SAMRS dataset to COCO *.json:

    python scripts/convert_pkl_json.py
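
For orientation, the sketch below shows the general shape of such a conversion (pickle records to COCO-style JSON). The record keys used here ("filename", "labels", "bboxes", etc.) are assumptions for illustration, not the verified SAMRS pkl layout; scripts/convert_pkl_json.py is the authoritative implementation.

```python
# Hedged sketch of a pkl -> COCO json conversion; the SAMRS record keys
# below are assumed for illustration and may differ from the real format.
import json
import pickle

def pkl_to_coco(pkl_path, json_path):
    with open(pkl_path, "rb") as f:
        records = pickle.load(f)  # assumed: a list of per-image dicts

    coco = {"images": [], "annotations": [], "categories": []}
    ann_id = 0
    for img_id, rec in enumerate(records):
        coco["images"].append({"id": img_id, "file_name": rec["filename"],
                               "width": rec["width"], "height": rec["height"]})
        for label, bbox in zip(rec["labels"], rec["bboxes"]):  # assumed keys
            coco["annotations"].append({"id": ann_id, "image_id": img_id,
                                        "category_id": int(label),
                                        "bbox": [float(v) for v in bbox],
                                        "iscrowd": 0})
            ann_id += 1

    with open(json_path, "w") as f:
        json.dump(coco, f)
```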
    

Performing Multi-Task Pretraining

We conduct MTP with SLURM. Here is an example of pretraining ViT-L + RVSA:

srun -J mtp -p gpu --gres=dcu:4 --ntasks=32 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python main_pretrain.py \
    --backbone 'vit_l_rvsa' --tasks 'ss' 'is' 'rd' \
    --datasets 'sota' 'sior' 'fast' \
    --batch_size 3 --batch_size_val 3 --workers 8 \
    --save_path [folder path of saved model] \
    --distributed 'True' --end_iter 80000 \
    --image_size 448 --init_backbone 'mae' --port '16003' --batch_mode 'avg' --background 'True' --use_ckpt 'True' --interval 5000

Training can be resumed by setting --ft and --resume:

--ft 'True' --resume [path of saved multi-task pretrained model]

Preparing Finetuning Dataset

For Xview: use scripts/prepare_xview_dataset.py, which provides the following functions (a sketch of the final conversion step follows the list):

  • Transform the geojson file into YOLO-format labels
  • Divide the training and testing sets
  • Clip images and YOLO-format labels
  • Transform YOLO-format labels into COCO-format *.json
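
As referenced above, here is a hedged sketch of that final step (YOLO txt labels to COCO *.json). It assumes the standard conventions (YOLO rows are "cls cx cy w h" normalized to [0, 1]; COCO boxes are absolute [x, y, width, height]) and fixed-size clipped images; the repo's scripts/prepare_xview_dataset.py may organize files and categories differently.

```python
# Hedged sketch: YOLO-format labels -> COCO-format json, assuming every
# clipped image has the same size. File layout and category ids are
# illustrative assumptions, not the repo's verified conventions.
import json
from pathlib import Path

def yolo_to_coco(label_dir, img_w, img_h, out_json):
    coco = {"images": [], "annotations": [], "categories": []}
    ann_id = 0
    for img_id, txt in enumerate(sorted(Path(label_dir).glob("*.txt"))):
        coco["images"].append({"id": img_id, "file_name": txt.stem + ".png",
                               "width": img_w, "height": img_h})
        for line in txt.read_text().splitlines():
            cls, cx, cy, w, h = map(float, line.split())
            # YOLO center-normalized box -> COCO absolute [x, y, w, h]
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id, "category_id": int(cls),
                "bbox": [(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                         w * img_w, h * img_h],
                "area": w * img_w * h * img_h, "iscrowd": 0})
            ann_id += 1
    Path(out_json).write_text(json.dumps(coco))
```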

For DIOR: transform the *.xml annotations to COCO *.json format for feeding into MMDetection:

python scripts/dior_h_2_coco.py

For FAIR1M: transform the DOTA-format *.txt results into the *.xml required for submission:

python scripts/dota_submit_txt_to_fair1m_xml.py --txt_dir [path of *.txt]

For SpaceNetv1: extract segmentation masks from geojson:

python scripts/process_spacenet.py
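
Conceptually, this rasterizes the building polygons onto the image grid. Below is a hedged sketch assuming geopandas and rasterio are available; scripts/process_spacenet.py is the authoritative implementation and paths here are placeholders.

```python
# Hedged sketch: rasterize geojson building footprints into a binary mask
# aligned with the source image grid. Paths are placeholders.
import geopandas as gpd
import numpy as np
import rasterio
from rasterio import features

def geojson_to_mask(geojson_path, image_path, out_path):
    gdf = gpd.read_file(geojson_path)
    with rasterio.open(image_path) as src:
        meta = src.meta.copy()
        out_shape = (src.height, src.width)
        transform = src.transform

    if len(gdf) == 0:  # images with no buildings get an all-zero mask
        mask = np.zeros(out_shape, dtype=np.uint8)
    else:
        mask = features.rasterize(((geom, 1) for geom in gdf.geometry),
                                  out_shape=out_shape, transform=transform,
                                  fill=0, dtype="uint8")

    meta.update(count=1, dtype="uint8")
    with rasterio.open(out_path, "w", **meta) as dst:
        dst.write(mask, 1)
```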

Finetuning on Various RS tasks

Except for rotated object detection, we perform finetuning on SLURM. Here are examples:

Scene Classification (using MMPretrain)

Training and Validation on EuroSAT using MAE + MTP pretrained ViT-L + RVSA:

srun -J mmpretrn -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/train.py configs/mtp/vit-rvsa-l-224-mae-mtp_eurosat.py \
--work-dir=/diwang/work_dir/multitask_pretrain/finetune/classification/eurosat/vit-rvsa-l-224-mae-mtp_eurosat \
--launcher="slurm" --cfg-options 'find_unused_parameters'=True

Horizontal Object Detection (using MMDetection)

Training on DIOR using Faster-RCNN with a backbone network of MAE + MTP pretrained ViT-L + RVSA:

srun -J mmdet -p gpu --gres=dcu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/train.py configs/mtp/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior.py \
--work-dir=/diwang/work_dir/multitask_pretrain/finetune/Horizontal_Detection/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior \
--launcher="slurm" 

Then test and generate detection results:

srun -J mmdet -p gpu --gres=dcu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior.py \
/diwang/work_dir/multitask_pretrain/finetune/Horizontal_Detection/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior/epoch_12.pth \
--work-dir=/diwang/work_dir/multitask_pretrain/finetune/Horizontal_Detection/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior/predict \
--show-dir=/diwang/work_dir/multitask_pretrain/finetune/Horizontal_Detection/dior/faster_rcnn_rvsa_l_800_mae_mtp_dior/predict/show \
--launcher="slurm" --cfg-options val_cfg=None val_dataloader=None val_evaluator=None

Rotated Object Detection (using MMRotate, running on both SLURM and GPU server)

1. Running on SLURM:

(Using MMRotate 1.0.0rc1) Training on DIOR-R using Oriented-RCNN with a backbone network of MAE + MTP pretrained ViT-L + RVSA:

srun -J mmrot -p gpu --gres=dcu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/train.py configs/mtp/diorr/oriented_rcnn_rvsa_l_800_mae_mtp_diorr.py \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/diorr/oriented_rcnn_rvsa_l_800_mae_mtp_diorr \
--launcher="slurm"

(Using MMRotate 1.0.0rc1) Testing on DIOR-R for evaluation and visualizing detection maps:

srun -J mmrot -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/oriented_rcnn_rvsa_l_800_mae_mtp_diorr.py \
/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/diorr/oriented_rcnn_rvsa_l_800_mae_mtp_diorr/epoch_12.pth \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/diorr/oriented_rcnn_rvsa_l_800_mae_mtp_diorr/predict \
--show-dir=/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/diorr/oriented_rcnn_rvsa_l_800_mae_mtp_diorr/predict/show \
--launcher="slurm" --cfg-options val_cfg=None val_dataloader=None val_evaluator=None

(Using MMRotate 0.3.4) If the dataset is evaluated online, we use --format-only. Here is an example of testing on FAIR1M-2.0 for submitting results and visualizing detection maps:

srun -J mmrot -p gpu --gres=dcu:4 --ntasks=16 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/fair1m/oriented_rcnn_rvsa_l_800_mae_mtp_fair1m20.py \
/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/fair1mv2/oriented_rcnn_rvsa_l_800_mae_mtp_fair1m20/epoch_12.pth --format-only \
--show-dir=/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/fair1mv2/oriented_rcnn_rvsa_l_800_mae_mtp_fair1m20/predict/show \
--eval-options submission_dir=/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/fair1mv2/oriented_rcnn_rvsa_l_800_mae_mtp_fair1m20/predict/submit \
--launcher="slurm"

2. Running on GPU server:

(Using MMRotate 1.0.0rc1) Training on DOTA-2.0 using Oriented-RCNN with a backbone network of MAE + MTP pretrained ViT-L + RVSA:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_port=40002 --master_addr=1.2.3.4 \
tools/train.py configs/mtp/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20.py \
--work-dir=/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20

(Using MMRotate 1.0.0rc1) Single-scale testing on DOTA-2.0 for submitting online evaluation results and visualizing detection maps:

CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/mtp/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20.py \
/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20/epoch_40.pth \
--work-dir=/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20/test \
--show-dir=/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav2/oriented_rcnn_rvsa_l_1024_mae_mtp_dota20/test/vis

(Using MMRotate 0.3.4) Multi-scale testing on DOTA-V1.0 for submitting online evaluation results and visualizing detection maps:

CUDA_VISIBLE_DEVICES=0 python tools/test.py configs/mtp/dotav1/oriented_rcnn_rvsa_l_1024_mae_mtp_dota10.py \
/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav1/oriented_rcnn_rvsa_l_1024_mae_mtp_dota10/epoch_12.pth --format-only \
--show-dir=/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav1/oriented_rcnn_rvsa_l_1024_mae_mtp_dota10/predict/show \
--eval-options submission_dir=/data/diwang22/work_dir/multitask_pretrain/finetune/Rotated_Detection/dotav1/oriented_rcnn_rvsa_l_1024_mae_mtp_dota10/predict/submit

Semantic Segmentation (using MMSegmentation)

Training on SpaceNetv1 using UperNet with a backbone network of MAE + MTP pretrained ViT-L + RVSA:

srun -J mmseg -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/train.py configs/mtp/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1.py \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1 \
--launcher="slurm" --cfg-options 'find_unused_parameters'=True

Testing on SpaceNetv1 for accuracy evaluation and generating prediction maps:

srun -J mmseg -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1.py \
/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1/iter_80000.pth \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1/predict \
--show-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/spacenetv1/rvsa-l-upernet-384-mae-mtp-spacenetv1/predict/show \
--launcher="slurm" --cfg-options val_cfg=None val_dataloader=None val_evaluator=None

Online Evaluation: Testing on LoveDA for submitting online evaluation results and generating prediction maps:

srun -J mmseg -p gpu --gres=dcu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/loveda/rvsa-l-upernet-512-mae-mtp-loveda.py \
/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/loveda/rvsa-l-upernet-512-mae-mtp-loveda/iter_80000.pth \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/loveda/rvsa-l-upernet-512-mae-mtp-loveda/predict \
--out=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/loveda/rvsa-l-upernet-512-mae-mtp-loveda/predict/submit \
--show-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/loveda/rvsa-l-upernet-512-mae-mtp-loveda/predict/show \
--launcher="slurm" --cfg-options val_cfg=None val_dataloader=None val_evaluator=None

Note: after inference, the LoveDA predictions need to be manually decreased by 1 to meet the requirements of the evaluation site:

python scripts/change_loveda_label.py
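
The operation itself is a per-pixel shift of the predicted label maps. A hedged sketch of what the script presumably does (it assumes the predictions are PNG label maps whose class ids start at 1; the directory is a placeholder):

```python
# Hedged sketch: shift LoveDA prediction labels down by 1 for the
# evaluation server. Assumes every pixel value is >= 1 before the shift.
from pathlib import Path

import numpy as np
from PIL import Image

pred_dir = Path("predict/submit")  # placeholder: your prediction folder
for png in sorted(pred_dir.glob("*.png")):
    arr = np.array(Image.open(png), dtype=np.uint8)
    Image.fromarray(arr - 1).save(png)  # labels 1..N -> 0..N-1
```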

Change Detection (using Open-CD)

Training on WHU using UNet with a backbone network of MAE + MTP pretrained ViT-L + RVSA:

srun -J opencd -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/train.py configs/mtp/whu/rvsa-l-unet-256-mae-mtp_whu.py \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Change_Detection/whu/rvsa-l-unet-256-mae-mtp_whu \
--launcher="slurm" --cfg-options 'find_unused_parameters'=True

Testing for accuracy evaluation and generating prediction maps:

srun -J opencd -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
python -u tools/test.py configs/mtp/whu/rvsa-l-unet-256-mae-mtp_whu.py \
/diwang22/work_dir/multitask_pretrain/finetune/Change_Detection/whu/rvsa-l-unet-256-mae-mtp_whu/epoch_200.pth \
--work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Change_Detection/whu/rvsa-l-unet-256-mae-mtp_whu/predict \
--show-dir=/diwang22/work_dir/multitask_pretrain/finetune/Change_Detection/whu/rvsa-l-unet-256-mae-mtp_whu/predict/show \
--launcher="slurm" --cfg-options val_cfg=None val_dataloader=None val_evaluator=None

Decoder Parameter Reusing

Take reusing the segmentation decoder during finetuning as an example:

  1. Change the keys of the MTP-saved weights (see the sketch at the end of this subsection):

    python scripts/change_ckpt.py
    
  2. Then train with the revised weights:

    srun -J mmseg -p gpu --gres=dcu:4 --ntasks=8 --ntasks-per-node=4 --cpus-per-task=8 --kill-on-bad-exit=1 \
    python -u tools/train.py configs/mtp/spacenetv1/rvsa-b-upernet-384-mae-mtp-spacenetv1.py \
    --work-dir=/diwang22/work_dir/multitask_pretrain/finetune/Semantic_Segmentation/spacenetv1/rvsa-b-upernet-384-mae-mtp-spacenetv1_reuse_decoder \
    --launcher="slurm" \
    --cfg-options 'find_unused_parameters'=True load_from=[path of the revised weights]
    

The remaining steps are the same as regular testing.
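
As referenced in step 1, here is a hedged sketch of the key renaming. The prefix strings ("decoder.ss." and "decode_head.") are assumptions for illustration; scripts/change_ckpt.py is the authoritative implementation.

```python
# Hedged sketch: rename checkpoint keys so the MTP segmentation decoder
# weights load into the finetuning model. Prefixes below are assumed.
import torch

ckpt = torch.load("mtp_pretrained.pth", map_location="cpu")  # placeholder
state = ckpt.get("state_dict", ckpt)

new_state = {k.replace("decoder.ss.", "decode_head."): v  # assumed prefixes
             for k, v in state.items()}

torch.save({"state_dict": new_state}, "mtp_pretrained_reuse_decoder.pth")
```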

🎵 Citation

If you find MTP helpful, please consider giving this repo a ⭐ and citing:

@ARTICLE{MTP,
  author={Wang, Di and Zhang, Jing and Xu, Minqiang and Liu, Lin and Wang, Dongsheng and Gao, Erzhong and Han, Chengxi and Guo, Haonan and Du, Bo and Tao, Dacheng and Zhang, Liangpei},
  journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing}, 
  title={MTP: Advancing Remote Sensing Foundation Model Via Multi-Task Pretraining}, 
  year={2024},
  volume={},
  number={},
  pages={1-24},
  doi={10.1109/JSTARS.2024.3408154}}

🎺 Statement

This project is for research purposes only. For any other questions, please contact di.wang at gmail.com or whu.edu.cn.

💖 Thanks

💡 Relevant Projects

[1] An Empirical Study of Remote Sensing Pretraining, IEEE TGRS, 2022 | Paper | Github
     Di Wang, Jing Zhang, Bo Du, Gui-Song Xia and Dacheng Tao

[2] Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model, IEEE TGRS, 2022 | Paper | Github
     Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, Dacheng Tao and Liangpei Zhang

[3] SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model, NeurIPS Datasets and Benchmarks Track, 2023 | Paper | Github
     Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao and Liangpei Zhang

mtp's People

Contributors

dotwang


mtp's Issues

Environment setting question

Hello, I'm a beginner with OpenMMLab.

I have cloned the MMEngine, MMPretrain, MMSegmentation, MMDetection, and MMRotate repositories and saved them into the MTP folder.

To install the correct version for my environment, should I use a pip command like pip install MMEngine==0.8.4? Or is there a way to set the version directly from within the repository? I'm curious to know.

On finetuning for the semantic segmentation task

Hello, your MTP foundation model is finetuned on semantic segmentation datasets. During finetuning, do you update all of MTP's parameters, or is the MTP encoder frozen? And are 80k iterations sufficient?

How to inference on custom dataset?

Hello, thanks for your amazing work.
I want to use the semantic segmentation model finetuned on LoveDA, but there seems to be no script provided here for inference on a custom dataset.
How should I do it?

Multichannel images

Hello,

It seems that the examples focus on 3-channel images.

Is semantic segmentation expected to work on multichannel images, and is there an example on that?

I know that mmsegmentation has LoadSingleRSImageFromFile, but I wonder whether there is any specific limitation associated with MTP, and how to handle the pretrained weights?

Thanks!

Dataset Question

Why did you crop the LEVIR-CD dataset to 256×256 when performing change detection?

How to pretrain on a single machine (without using SLURM)

Thank you for this amazing project.

I tried to perform pretraining on a single machine, with an NVIDIA A100 GPU or just a CPU, but could not get it to work.

It seems the script file main_pretrain.py needs to be modified somehow.

Could you offer help in detail on this matter?

Thanks in advance.

Question about the environment setting

Dear author, I have a question about the environment setting.

There seem to be some conflicts in your given Python library setting. For example, PyTorch 1.10.0 does not match Torchvision 0.10.0, and MMCV 2.0.0 cannot work with PyTorch 1.10.0.

Can you re-check your environment?
Thanks a lot~

Data Question

How can I train on TIF files for change detection? The directory configuration is the same, but my files are TIF, not PNG.

Testing

How should I test with your trained models? I could not find tool/test.py. Can it run on Windows?

Installing the repo and using MMSeg

Dear authors, thank you for your work and for open-sourcing it. I have a question about using this code.

I am not familiar with MMSeg, so this is probably a stupid question. I have installed it using pip, but this does not install the tools folder needed to run the commands in the README. Should I have used the MMSeg repository instead of the pip package? If so, how do I use your repo with respect to the MMSeg repo? It seems that MMSeg does not work like the libraries I am used to, so I am a bit confused.

Thank you in advance and good luck with your future works.

Environment question

I'm trying to run MMDetection; which libraries do I need?
My environment is as below. How should I match the MMCV version to CUDA 12.5?

Application

Hello, can I directly use your released models to estimate the number of targets in images from my own dataset? What would be the most convenient way to do this? Thanks!

Dataset question

Hello, may I know the name of the dataset used for the backbone network in change detection?

Backbone question

What source code should I use to train only the backbone network separately?

On rotated object detection

Hello author, I would like to use your pretrained model to train on my own dataset, and I have two questions:

1. Since my dataset involves fine-grained object detection, I want to train with your FAIR1M config. Is it enough to download the pretrained model below and train from it?

pretrained ='/work/share/achk2o1zg1/diwang22/work_dir/multitask_pretrain/pretrain/avg/with_background/vit_b_rvsa_224_mae_samrs_mtp_three/last_vit_b_rvsa_ss_is_rd_pretrn_model_encoder.pth'

2. Your paper does not mention multi-scale training. Do I need multi-scale cropping, e.g. [0.5, 1.0, 1.5], or is single-scale cropping sufficient?

Hoping for your reply, thank you!

Test Question

Hello. I used the finetuned model levir-rvsa-l-mae-mtp-epoch_150.pth on the LEVIR-CD test set, but the results were very low, and I would like to understand what went wrong.

Due to tensor input errors, I resized the LEVIR-CD dataset to 256 before testing.
The performance results are as follows:

06/19 13:54:35 - mmengine - INFO - per class results:
06/19 13:54:35 - mmengine - INFO -
+-----------+--------+-----------+--------+-------+-------+
| Class | Fscore | Precision | Recall | IoU | Acc |
+-----------+--------+-----------+--------+-------+-------+
| unchanged | 97.61 | 95.59 | 99.73 | 95.34 | 99.73 |
| changed | 23.47 | 73.48 | 13.97 | 13.3 | 13.97 |
+-----------+--------+-----------+--------+-------+-------+
06/19 13:54:35 - mmengine - INFO - Epoch(test) [128/128] aAcc: 95.3700 mFscore: 60.5400 mPrecision: 84.5300 mRecall: 56.8500 mIoU: 54.3200 mAcc: 56.8500 data_time: 0.0341 time: 0.1686

The performance for the "changed" class is very low. Why is this happening?
