
Comments (15)

DotWang commented on May 30, 2024

@Li-Qingyun

The accuracies of "impervious_surface" in your table are None, which is obviously wrong.

In addition, the categories and their corresponding colors are officially defined, so I suggest that you do not change them.

We transform the labels of '5_Labels_all.zip' by direct mapping, since the "Undefined" category does not exist there. Here is our code; note we use skimage.io to load images:

import numpy as np
from skimage import io

palette = {0: (255, 255, 255),  # Impervious surfaces (white)
           1: (0, 0, 255),      # Buildings (blue)
           2: (0, 255, 255),    # Low vegetation (cyan)
           3: (0, 255, 0),      # Trees (green)
           4: (255, 255, 0),    # Cars (yellow)
           5: (255, 0, 0),      # Clutter (red)
           6: (0, 0, 0)}        # Undefined (black)

invert_palette = {v: k for k, v in palette.items()}

def convert_from_color(arr_3d, palette=invert_palette):
    """ RGB-color encoding to grayscale labels """
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)

    for c, i in palette.items():
        m = np.all(arr_3d == np.array(c).reshape(1, 1, 3), axis=2)
        arr_2d[m] = i

    return arr_2d

def load_img(imgPath):
    """
    Load image
    :param imgPath: path of the image to load
    :return: numpy array of the image
    """
    if imgPath.endswith('.tif'):
        img = io.imread(imgPath)
    else:
        raise ValueError(f'Unsupported image format: {imgPath}')
    return img
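
For reference, a minimal usage sketch of the two functions above (the tile paths are illustrative, not the dataset's exact filenames):

# Hypothetical tile path, for illustration only
rgb_label = load_img('top_potsdam_2_10_label.tif')  # (H, W, 3) RGB label image
label = convert_from_color(rgb_label)               # (H, W) uint8 map with values in 0-6
io.imsave('top_potsdam_2_10_label.png', label)      # save as a single-channel PNG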

Thus we set reduce_zero_label=False, num_classes=5, and ignore_index=5 to ignore the "Clutter" category.
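
For illustration, a minimal sketch of where these settings would sit in an mmseg-style config (field placement and file paths are assumptions, not necessarily the repo's exact files):

# Hypothetical placement; the actual config files may differ.
# configs/_base_/datasets/potsdam.py -- annotation loading in the pipeline:
dict(type='LoadAnnotations', reduce_zero_label=False)

# configs/swin/<your_config>.py -- both heads ignore the clutter class:
model = dict(
    decode_head=dict(num_classes=5, ignore_index=5),
    auxiliary_head=dict(num_classes=5, ignore_index=5))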

The corresponding transformation in mmseg is

    if to_label:
        color_map = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0],
                              [255, 255, 0], [0, 255, 0], [0, 255, 255],
                              [0, 0, 255]])

Note: the channel order is reversed (BGR) since mmcv uses OpenCV to read images.

If you use this function, since '5_Labels_all.zip' doesn't have the "black boundary", the labels will be transformed to 1-6 (here, clutter = 6).

(Correspondingly, '5_Labels_noBoundary.zip' will be transformed to 0-6.)

In this case, reduce_zero_label should be set to True (1-6 -> 0-5); then set num_classes=5 and ignore_index=5.
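
A quick numeric check of this mapping, mirroring mmseg's reduce-zero-label logic as a standalone sketch:

import numpy as np

seg = np.array([1, 2, 3, 4, 5, 6], dtype=np.uint8)  # converted labels from '5_Labels_all.zip'
seg[seg == 0] = 255   # avoid uint8 underflow (a no-op for this input)
seg = seg - 1         # reduce_zero_label: 1-6 -> 0-5
seg[seg == 254] = 255
print(seg)            # [0 1 2 3 4 5]; clutter is now 5, matching ignore_index=5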


DotWang commented on May 30, 2024

batchsize = samples_per_gpu * gpu_number

samples_per_gpu is set in configs/_base_/datasets/xxxx.py (xxxx is the dataset you use),

gpu_number is controlled by CUDA_VISIBLE_DEVICES,

and --nproc_per_node = gpu_number.

For example, samples_per_gpu in potsdam.py has been set to 8,

so if you want batchsize=16 and to run on GPUs 0 and 1, you can use

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=40001 tools/train.py \
    configs/upernet/upernet_our_r50_512x512_80k_potsdam_epoch300.py \
    --launcher 'pytorch'

In configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py

We reset samples_per_gpu=4

Thus, if you need batchsize=8, please use

CUDA_VISIBLE_DEVICES=x,y python -m torch.distributed.launch --nproc_per_node=2 --master_port=xxxxx tools/train.py \
    configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py \
    --launcher 'pytorch'


Li-Qingyun commented on May 30, 2024

@DotWang Thanks for your reply~

I have trained with configs/swin/upernet_swin_tiny_patch4_window7_512x512_80k_potsdam.py; both 1x8 and 2x4 strategies were tested, but only achieved the following results.

The eval results of each eval_interval are as follows:

| iter  | aAcc  | mFscore | mIoU  |
|-------|-------|---------|-------|
| 8000  | 80.81 | /       | 59.73 |
| 16000 | 82.03 | /       | 61.41 |
| 24000 | 82.52 | /       | 61.96 |
| 32000 | 82.72 | /       | 62.41 |
| 40000 | 83.23 | /       | 62.97 |
| 48000 | 83.0  | 75.03   | 62.63 |
| 56000 | 82.69 | 74.63   | 62.27 |
| 64000 | 82.88 | 75.19   | 62.7  |
| 72000 | 83.35 | 75.57   | 63.29 |
| 80000 | 83.3  | 75.58   | 63.23 |

Hence I opened this issue to ask.
I'd appreciate your assistance!


DotWang commented on May 30, 2024

@Li-Qingyun How did you prepare the potsdam dataset?

This dataset has two image versions, RGB and IR-R-G,

and the labels also have two versions: with or without boundary.

In our implementation, we use '3_Ortho_IRRG.zip' and '5_Labels_all.zip'.

In addition, you can check the labels.

In our experiment, the labels in '5_Labels_all.zip' range from 0-5, so we directly ignore class 5.

The other kind of label additionally includes an undefined category.

Note we don't use the transformation function provided by mmsegmentation.

If you use it, you may need to adjust the corresponding settings,

such as

whether to set reduce_zero_label in configs/_base_/datasets/potsdam.py;

the settings of num_classes and ignore_index in configs/swin/your config file;

and the dataset file in mmseg/datasets/your dataset file.

We have not described these in the readme.md since they are highly customized.


Li-Qingyun commented on May 30, 2024

@DotWang Thanks for your support!
I followed mmseg's official guide for preparing datasets, in which '2_Ortho_RGB.zip' and '5_Labels_all_noBoundary.zip' are required.
What a huge difference between the segmentation results of RGB w/o boundary and IRRG w/ boundary.


Li-Qingyun commented on May 30, 2024

Oh, the zips actually used were '4_Ortho_RGBIR.zip' and '5_Labels_for_participants_no_Boundary.zip'.


Li-Qingyun commented on May 30, 2024

@DotWang Why does each image in RGBIR contain 3 channels?


DotWang commented on May 30, 2024

@Li-Qingyun An RGBIR image contains 4 channels: R, G, B, NIR.

You can use skimage to read it.

But since we use ordinary deep models that process 3-channel images, the RGBIR version is usually not used.
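
A minimal reading sketch with skimage (the tile name follows the dataset's naming scheme but is illustrative):

from skimage import io

img = io.imread('top_potsdam_2_10_RGBIR.tif')  # hypothetical tile name
print(img.shape)  # expected (6000, 6000, 4): R, G, B, NIR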


Li-Qingyun commented on May 30, 2024

@DotWang Thanks, I used cv2.imread, which reads images in 3-channel BGR mode by default.
I know too little about this dataset; thank you for your support. I will adjust the data and rerun the experiment.
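
For anyone hitting the same pitfall, a sketch of the difference (the tile name is illustrative):

import cv2

# Default flag is IMREAD_COLOR, which forces a 3-channel BGR image
img_bgr = cv2.imread('top_potsdam_2_10_RGBIR.tif')
# IMREAD_UNCHANGED should keep the file's original channels, including NIR
img_all = cv2.imread('top_potsdam_2_10_RGBIR.tif', cv2.IMREAD_UNCHANGED)
print(img_bgr.shape, img_all.shape)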


Li-Qingyun commented on May 30, 2024

@DotWang Hi, about the metrics provided by mmseg:

Does 'mFscore' correspond to mF1,
and 'aAcc' to OA?


DotWang commented on May 30, 2024

@Li-Qingyun yes


Li-Qingyun commented on May 30, 2024

> The other kind of label additionally includes an undefined category.
> Note we don't use the transformation function provided by mmsegmentation.
> If you use it, you may need to adjust the corresponding settings,
> such as
> whether to set reduce_zero_label in configs/_base_/datasets/potsdam.py;
> the settings of num_classes and ignore_index in configs/swin/your config file;
> and the dataset file in mmseg/datasets/your dataset file.
> We have not described these in the readme.md since they are highly customized.

I followed the custom potsdam.py in the repo, setting reduce_zero_label=False and ignore_index=5. The dataset was prepared with 'Semantic Segmentation/tools/convert_datasets/potsdam.py'; '3_Ortho_IRRG.zip' and '5_Labels_all.zip' were adopted.

My training achieved the OA (91.22) of UperNet + Swin-T-IMP, but only about 88.69 mFscore.


I think my setting of reduce_zero_label and ignore_index might be wrong.

I wrote a script to read the annotations (prepared by tools/convert_datasets/potsdam.py)
and found that, for '5_Labels_all.zip', the converted labels fall in 1~5.

[images: multiclass_mask_all, bin_mask_all]

For '5_Labels_noBoundary.zip', the converted labels fall in 0~5, in which 0 seems to be the boundary.

[images: multiclass_mask_noboundary, bin_mask_noboundary]

the IRRG image is:
[image: 2_10_0_0_512_512_IRRG]

And the CLASSES in both potsdam.py and potsdam_ori.py are:

CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
           'car', 'clutter')

PALETTE = [[255, 255, 255], [0, 0, 255], [0, 255, 255], [0, 255, 0],
           [255, 255, 0], [255, 0, 0]]

I think the class that should be ignored is 'clutter', isn't it?

And the docstring of PotsdamDataset:

@DATASETS.register_module()
class PotsdamDataset(CustomDataset):
    """ISPRS Potsdam dataset.

    In segmentation map annotation for Potsdam dataset, 0 is the ignore index.
    ``reduce_zero_label`` should be set to True. The ``img_suffix`` and
    ``seg_map_suffix`` are both fixed to '.png'.
    """

says 0 is the ignore index, and `reduce_zero_label` should be set to True.

I'm still confused about the dataset preparation and the correct usage needed to reproduce the reported results as a baseline for my work.
I'd appreciate your help and would be willing to open a pull request for the dataset preparation.

Thanks for your quick replies.

The script is as follows:

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from mmsegmentation.configs_rs._base_.potsdam import data as RGB_data
from mmsegmentation.configs_rs._base_.potsdam_IRRG import data as IRRG_data

from mmseg.datasets import build_dataset
RGB_trainset = build_dataset(RGB_data['train'])
IRRG_trainset = build_dataset(IRRG_data['train'])

palette_map = {
    '[255, 255, 255]': 'white',
    '[0, 0, 255]': 'blue',
    '[0, 255, 0]': 'green',
    '[255, 0, 0]': 'red',
    '[255, 255, 0]': 'yellow',
    '[255, 0, 255]': 'magenta',
    '[0, 255, 255]': 'cyan',
}

# CLASSES = ('impervious_surface', 'building', 'low_vegetation', 'tree',
#            'car', 'clutter')
CLASSES = ('clutter', 'impervious_surface', 'building', 'low_vegetation',
           'tree', 'car')

palette = RGB_trainset.PALETTE
print({k: palette_map[str(v)] for k, v in enumerate(palette)})


def save_ann_with_custom_palette(ann_path, output_path, ann_name):
    ann = Image.open(ann_path)
    ann_array = np.array(ann)
    print(f'{ann_name}: {np.unique(ann_array)}')
    save_bin_mask(ann_array, ann_name, output_path)
    h, w = ann_array.shape
    classes = np.unique(ann_array)
    out_ann = np.zeros((h, w, 3), dtype=np.uint8)  # uint8 so imshow treats values as 0-255
    for cls in classes:
        indices = np.nonzero(ann_array == cls)
        out_ann[indices] = palette[cls]
    plt.figure()
    plt.title(ann_name)
    plt.imshow(out_ann)
    plt.savefig(output_path + f'{ann_name}.png')


def save_bin_mask(ann_array: np.ndarray, remark: str, output_path):
    plt.figure()
    plt.suptitle(f'label {remark}')
    classes = np.unique(ann_array)
    _len = len(classes)
    subplot_w = int(np.ceil(np.sqrt(_len)))
    subplot_h = int(np.ceil(_len / subplot_w))
    gs = gridspec.GridSpec(subplot_h, subplot_w * 2)
    gs.update(wspace=0.8)
    for i, cls in enumerate(np.unique(ann_array)):
        bin_mask = (ann_array == cls).astype(np.float32)
        if _len - i >= subplot_w or _len % 2 == 0:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2: i % subplot_w * 2 + 2])
        else:
            plt.subplot(
                gs[i // subplot_w, i % subplot_w * 2 + 1: i % subplot_w * 2 + 3])
        plt.title(f'{cls}-{CLASSES[cls]}')
        # plt.title(f'class {cls} ({remark})')
        plt.imshow(bin_mask)
    plt.savefig(output_path + f'bin_mask {remark}')


ann0_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_noboundary/train/2_10_0_0_512_512.png'
ann1_path = f'/home/lqy/Desktop/DINO_semantic_seg/mmsegmentation/data' \
            f'/potsdam/ann_all/train/2_10_0_0_512_512.png'
output_path = '/home/lqy/Desktop/DINO_semantic_seg/develop/dataset/'
save_ann_with_custom_palette(ann0_path, output_path, 'noboundary')
save_ann_with_custom_palette(ann1_path, output_path, 'all')


Li-Qingyun commented on May 30, 2024

@DotWang Thanks for your replies.

I did not find your dataset-preparation script in the repo at first; hence, I followed the official instructions of mmseg, which seem mismatched with the PotsdamDataset class in this repo's potsdam.py. The potsdam_ori.py is the one that should be used.

I searched the reduce_zero_label parameter globally and tried to understand how it takes effect. The core logic is as follows:

if self.reduce_zero_label:
    # avoid using underflow conversion
    gt_semantic_seg[gt_semantic_seg == 0] = 255
    gt_semantic_seg = gt_semantic_seg - 1
    gt_semantic_seg[gt_semantic_seg == 254] = 255

which relabels the zero (background) class to 255.

And there seem to be two places where reduce_zero_label takes effect:

  1. LoadAnnotations in the pipeline
  2. CustomDataset, when calculating eval metrics

They both do the same thing, which makes one wonder whether the action could be repeated.
I think the train annotations are converted by the one in LoadAnnotations and the val annotations by the one in CustomDataset. It seems we hardly ever call the eval function to verify the segmentation performance of the model on the training set; otherwise, the reduce operation would likely be performed twice.

Back to the topic: if the user follows mmseg's official dataset preparation, the labels seem to go through the following mapping process (colors are in RGB format):

{0 : (255, 255, 255),  # Impervious surfaces (white)
 1 : (0, 0, 255),     # Buildings (blue)
 2 : (0, 255, 255),   # Low vegetation (cyan)
 3 : (0, 255, 0),     # Trees (green)
 4 : (255, 255, 0),   # Cars (yellow)
 5 : (255, 0, 0),     # Clutter (red)
 6 : (0, 0, 0)}       # Undefined (black)
                  ↓   transformation in `convert_datasets/potsdam.py`
{0 : (0, 0, 0), # Undefined (black)
 1 : (255, 255, 255),  # Impervious surfaces (white)
 2 : (0, 0, 255),     # Buildings (blue)
 3 : (0, 255, 255),   # Low vegetation (cyan)
 4 : (0, 255, 0),     # Trees (green)
 5 : (255, 255, 0),   # Cars (yellow)
 6 : (255, 0, 0)}     # Clutter (red)
                  ↓   reduce_zero_label in `LoadAnnotations`
{0 : (255, 255, 255),  # Impervious surfaces (white)
 1 : (0, 0, 255),     # Buildings (blue)
 2 : (0, 255, 255),   # Low vegetation (cyan)
 3 : (0, 255, 0),     # Trees (green)
 4 : (255, 255, 0),   # Cars (yellow)
 5 : (255, 0, 0),     # Clutter (red)
 255 : (0, 0, 0)}       # Undefined (black)

Hence, in the official potsdam_ori.py, it is ignore_index=255.
It uses '5_Labels_noBoundary.zip', whose converted labels (the second mapping above) are [0 1 2 3 4 5 6].
When setting reduce_zero_label=True, the labels become [255 0 1 2 3 4 5], hence ignore_index was set to 255, which makes the Undefined background the ignored category. However, actually both 5 and 255 should be ignored, shouldn't they?

In the ViTAE-RS repo, '5_Labels_all.zip' is used, whose converted labels are [1 2 3 4 5 6].
When setting reduce_zero_label=True, the labels become [0 1 2 3 4 5], hence ignore_index was set to 5.
If reduce_zero_label=False, the labels stay [1 2 3 4 5 6] and the index to ignore would be 6, but then 0 becomes an extra, unused label. Hence, the +1 shift of the transformation should be dropped in this repo to keep the original [0 1 2 3 4 5], where 5 is the ignored Clutter class; the Undefined background class does not occur in '5_Labels_all.zip', so ignore_index=5.


DotWang commented on May 30, 2024

@Li-Qingyun
Haha, the script for preparing the Potsdam dataset was used in our previous projects, so we adopted it instead of the mmseg transformation in this work. We did not upload the script since we think mmseg is already highly customized for users.

Most of your understanding is right. The transformation convert_datasets/potsdam.py exists in the original mmseg; we didn't even use this folder and directly uploaded it as-is.

The mIoUs shown on the mmseg site include all categories except "Undefined". However, in the RS literature, "Clutter" is also considered background and does not take part in the metric calculation. In fact, whether or not to mask this category during training is OK either way. For convenience, we also ignore it when training models.


Li-Qingyun commented on May 30, 2024

@DotWang Thank you very much for your help and your detailed, patient explanations. I finally achieved the results in the paper and can focus on my own research. Wish you all the best with your research. Thank you!

