
daformer's Introduction

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

by Lukas Hoyer, Dengxin Dai, and Luc Van Gool

[CVPR22 Paper] [Extension Paper]

🔔 News:

  • [2023-09-26] We are happy to announce that our Extension Paper on domain generalization and clear-to-adverse-weather UDA was accepted by PAMI.
  • [2023-08-25] We are happy to announce that our follow-up work EDAPS on panoptic segmentation UDA was accepted at ICCV23.
  • [2023-04-23] We further extend DAFormer to domain generalization and clear-to-adverse-weather UDA in the Extension Paper.
  • [2023-02-28] We are happy to announce that our follow-up work MIC on context-enhanced UDA was accepted at CVPR23.
  • [2022-07-06] We are happy to announce that our follow-up work HRDA on high-resolution UDA was accepted at ECCV22.
  • [2022-03-09] We are happy to announce that DAFormer was accepted at CVPR22.

Overview

As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA).

Even though a large number of methods propose new UDA strategies, they are mostly based on outdated network architectures. In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a network architecture tailored for UDA. It consists of a Transformer encoder and a multi-level context-aware feature fusion decoder.

DAFormer is enabled by three simple but crucial training strategies to stabilize training and to avoid overfitting to the source domain: while Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a Learning Rate Warmup promote feature transfer from ImageNet pretraining.
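
As a rough illustration of how Rare Class Sampling picks the source sample for each iteration, the class sampling probabilities follow P(c) ∝ exp((1 − f_c)/T), where f_c is the pixel frequency of class c and T a temperature. The sketch below is only a minimal, self-contained approximation; the names (rcs_class_probs, pixels_per_class) are illustrative and not part of the repository's API:

import numpy as np

def rcs_class_probs(pixels_per_class, temperature=0.01):
    # Rare Class Sampling: P(c) ~ exp((1 - f_c) / T), normalized over classes.
    # Rarer classes (small frequency f_c) receive a higher sampling probability.
    freq = pixels_per_class / np.sum(pixels_per_class)  # f_c
    weights = np.exp((1.0 - freq) / temperature)
    return weights / weights.sum()

# Example: class 2 is rare, so it is sampled most often.
probs = rcs_class_probs(np.array([1e6, 5e5, 1e4]), temperature=0.01)
rng = np.random.default_rng(0)
sampled_class = rng.choice(len(probs), p=probs)

A smaller temperature concentrates the sampling more strongly on rare classes such as train, bus, and truck.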

DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes and enables learning even difficult classes such as train, bus, and truck well.

UDA over time

The strengths of DAFormer, compared to the previous state-of-the-art UDA method ProDA, can also be observed in qualitative examples from the Cityscapes validation set.

Demo Color Palette

DAFormer can be further extended to domain generalization, which lifts the requirement of access to target images. Also in domain generalization, DAFormer significantly improves the state-of-the-art performance by +6.5 mIoU.

For more information on DAFormer, please check our [CVPR Paper] and the [Extension Paper].

If you find this project useful in your research, please consider citing:

@InProceedings{hoyer2022daformer,
  title={{DAFormer}: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={9924--9935},
  year={2022}
}

@Article{hoyer2024domain,
  title={Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)}, 
  year={2024},
  volume={46},
  number={1},
  pages={220-235},
  doi={10.1109/TPAMI.2023.3320613}
}

Comparison with State-of-the-Art UDA

DAFormer significantly outperforms previous works on several UDA benchmarks. This includes synthetic-to-real adaptation on GTA→Cityscapes and Synthia→Cityscapes as well as clear-to-adverse-weather adaptation on Cityscapes→ACDC and Cityscapes→DarkZurich.

UDA Method         GTA→CS(val)   Synthia→CS(val)   CS→ACDC(test)   CS→DarkZurich(test)
ADVENT [1]         45.5          41.2              32.7            29.7
BDL [2]            48.5          --                37.7            30.8
FDA [3]            50.5          --                45.7            --
DACS [4]           52.1          48.3              --              --
ProDA [5]          57.5          55.5              --              --
MGCDA [6]          --            --                48.7            42.5
DANNet [7]         --            --                50.0            45.2
DAFormer (Ours)    68.3          60.9              55.4*           53.8*

* New results of our extension paper

References:

  1. Vu et al. "Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation" in CVPR 2019.
  2. Li et al. "Bidirectional learning for domain adaptation of semantic segmentation" in CVPR 2019.
  3. Yang et al. "FDA: Fourier domain adaptation for semantic segmentation" in CVPR 2020.
  4. Tranheden et al. "DACS: Domain adaptation via cross-domain mixed sampling" in WACV 2021.
  5. Zhang et al. "Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation" in CVPR 2021.
  6. Sakaridis et al. "Map-guided curriculum domain adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation" in TPAMI, 2020.
  7. Wu et al. "DANNet: A one-stage domain adaptation network for unsupervised nighttime semantic segmentation" in CVPR, 2021.

Comparison with State-of-the-Art Domain Generalization (DG)

DAFormer significantly outperforms previous works on domain generalization from GTA to real street scenes.

DG Method          Cityscapes   BDD100K   Mapillary   Avg.
IBN-Net [1,5]      37.37        34.21     36.81       36.13
DRPC [2]           42.53        38.72     38.05       39.77
ISW [3,5]          37.20        33.36     35.57       35.38
SAN-SAW [4]        45.33        41.18     40.77       42.43
SHADE [5]          46.66        43.66     45.50       45.27
DAFormer (Ours)    52.65*       47.89*    54.66*      51.73*

* New results of our extension paper

References:

  1. Pan et al. "Two at once: Enhancing learning and generalization capacities via IBN-Net" in ECCV, 2018.
  2. Yue et al. "Domain randomization and pyramid consistency: Simulation-to-real generalization without accessing target domain data" in ICCV, 2019.
  3. Choi et al. "RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening" in CVPR, 2021.
  4. Peng et al. "Semantic-aware domain generalized segmentation" in CVPR, 2022.
  5. Zhao et al. "Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation" in ECCV, 2022.

Setup Environment

For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/daformer
source ~/venv/daformer/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7  # requires the other packages to be installed first

Please, download the MiT ImageNet weights (b3-b5) provided by SegFormer from their OneDrive and put them in the folder pretrained/. Further, download the checkpoint of DAFormer on GTA→Cityscapes and extract it to the folder work_dirs/.

All experiments were executed on an NVIDIA RTX 2080 Ti.

Inference Demo

Already at this point, the provided DAFormer model can be applied to a demo image:

python -m demo.image_demo demo/demo.png work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth

When judging the predictions, please keep in mind that DAFormer had no access to real-world labels during training.

Setup Datasets

Cityscapes: Please, download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA: Please, download all image and label packages from here and extract them to data/gta.

Synthia (Optional): Please, download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

ACDC (Optional): Please, download rgb_anon_trainvaltest.zip and gt_trainval.zip from here and extract them to data/acdc. Further, please restructure the folders from condition/split/sequence/ to split/ using the following commands:

rsync -a data/acdc/rgb_anon/*/train/*/* data/acdc/rgb_anon/train/
rsync -a data/acdc/rgb_anon/*/val/*/* data/acdc/rgb_anon/val/
rsync -a data/acdc/gt/*/train/*/*_labelTrainIds.png data/acdc/gt/train/
rsync -a data/acdc/gt/*/val/*/*_labelTrainIds.png data/acdc/gt/val/

Dark Zurich (Optional): Please, download the Dark_Zurich_train_anon.zip and Dark_Zurich_val_anon.zip from here and extract them to data/dark_zurich.

The final folder structure should look like this:

DAFormer
├── ...
├── data
│   ├── acdc (optional)
│   │   ├── gt
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── rgb_anon
│   │   │   ├── train
│   │   │   ├── val
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── dark_zurich (optional)
│   │   ├── gt
│   │   │   ├── val
│   │   ├── rgb_anon
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── synthia (optional)
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...

Data Preprocessing: Finally, please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS:

python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
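
For intuition, the label-ID to train-ID conversion maps the Cityscapes label IDs onto the 19 train IDs used for training, with everything else set to 255 (ignore). The following is only a minimal sketch under that assumption, not the repository's implementation (which is performed by the tools above); the mapping excerpt and the helper name labelids_to_trainids are illustrative:

import numpy as np

# Hypothetical excerpt of the Cityscapes labelId -> trainId mapping
# (7=road->0, 8=sidewalk->1, 26=car->13); all other IDs map to 255 (ignore).
LABEL_TO_TRAIN = {7: 0, 8: 1, 26: 13}

def labelids_to_trainids(label_map):
    # label_map: 2D array of Cityscapes label IDs.
    train_map = np.full_like(label_map, 255)
    for label_id, train_id in LABEL_TO_TRAIN.items():
        train_map[label_map == label_id] = train_id
    return train_map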

Training

For convenience, we provide an annotated config file of the final DAFormer. A training job can be launched using:

python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py

For the experiments in our paper (e.g. network architecture comparison, component ablations, ...), we use a system to automatically generate and train the configs:

python run_experiments.py --exp <ID>

More information about the available experiments and their assigned IDs can be found in experiments.py. The generated configs will be stored in configs/generated/.

Testing & Predictions

The provided DAFormer checkpoint trained on GTA→Cityscapes (already downloaded by tools/download_checkpoints.sh) can be tested on the Cityscapes validation set using:

sh test.sh work_dirs/211108_1622_gta2cs_daformer_s0_7f24c

The predictions are saved for inspection to work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/preds and the mIoU of the model is printed to the console. The provided checkpoint should achieve 68.85 mIoU. Refer to the end of work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/20211108_164105.log for more information such as the class-wise IoU.

Similarly, other models can be tested after training has finished:

sh test.sh path/to/checkpoint_directory

When evaluating a model trained on Synthia→Cityscapes, please note that the evaluation script calculates the mIoU for all 19 Cityscapes classes. However, Synthia contains labels for only 16 of these classes. Therefore, it is common practice in UDA to report the mIoU for Synthia→Cityscapes only on these 16 classes. As the IoU for the 3 missing classes is 0, you can do the conversion mIoU16 = mIoU19 * 19 / 16.
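
For example, the conversion is a simple rescaling (the value below is purely illustrative):

# Convert the 19-class mIoU from the logs to the 16-class Synthia metric.
# The 3 classes missing in Synthia have IoU 0, so they only dilute the mean.
miou19 = 55.0                      # example value read from a training log
miou16 = miou19 * 19 / 16
print(f"mIoU16 = {miou16:.2f}")    # mIoU16 = 65.31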

The results for Cityscapes→ACDC and Cityscapes→DarkZurich are reported on the test split of the target dataset. To generate the predictions for the test set, please run:

python -m tools.test path/to/config_file path/to/checkpoint_file --test-set --format-only --eval-option imgfile_prefix=labelTrainIds to_label_id=False

The predictions can be submitted to the public evaluation server of the respective dataset to obtain the test score.

Domain Generalization

For the domain generalization extension of DAFormer, please refer to the DG branch of the HRDA repository: https://github.com/lhoyer/HRDA/tree/dg

Checkpoints

Below, we provide checkpoints of DAFormer for different benchmarks. As the results in the paper are provided as the mean over three random seeds, we provide the checkpoint with the median validation performance here.

The checkpoints come with the training logs. Please note that:

  • The logs provide the mIoU for 19 classes. For Synthia→Cityscapes, it is necessary to convert the mIoU to the 16 valid classes. Please, read the section above for converting the mIoU.
  • The logs provide the mIoU on the validation set. For Cityscapes→ACDC and Cityscapes→DarkZurich the results reported in the paper are calculated on the test split. For DarkZurich, the performance significantly differs between validation and test split. Please, read the section above on how to obtain the test mIoU.

Framework Structure

This project is based on mmsegmentation version 0.16.0. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.

The most relevant files for DAFormer are:

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publicly available.

License

This project is released under the Apache License 2.0, while some specific features in this repository are under other licenses. Please refer to LICENSES.md and check carefully if you are using our code for commercial purposes.

daformer's Issues

Problem with 'def get_rare_class_sample'

Thanks for your wonderful work; Rare Class Sampling is a very good idea. But in get_rare_class_sample, the index i1 is not updated in the for loop, so what is the purpose of the 10 iterations? Is the code that samples a new random crop missing from the loop?

I would appreciate it if you could answer it.

def get_rare_class_sample(self):
    c = np.random.choice(self.rcs_classes, p=self.rcs_classprob)
    f1 = np.random.choice(self.samples_with_class[c])
    i1 = self.file_to_idx[f1]
    s1 = self.source[i1]
    if self.rcs_min_crop_ratio > 0:
        for j in range(10):
            n_class = torch.sum(s1['gt_semantic_seg'].data == c)
            # mmcv.print_log(f'{j}: {n_class}', 'mmseg')
            if n_class > self.rcs_min_pixels * self.rcs_min_crop_ratio:
                break
            # Sample a new random crop from source image i1.
            # Please note, that self.source.__getitem__(idx) applies the
            # preprocessing pipeline to the loaded image, which includes
            # RandomCrop, and results in a new crop of the image.
            s1 = self.source[i1]
    i2 = np.random.choice(range(len(self.target)))
    s2 = self.target[i2]
    return {
        **s1, 'target_img_metas': s2['img_metas'],
        'target_img': s2['img']
    }

Figure 1 of "the Progress of UDA over time on GTAV→Cityscapes"

Dear Lukas,

Thanks for your great work on DAFormer; I have really enjoyed it!

Could you please share the code used to draw Figure 1, "the Progress of UDA over time on GTAV→Cityscapes"?

I was wondering how to draw the curve with the arrows pointing to the circles in Figure 1.

Thank you in advance!

Best wishes to you!

Unreproducible results

Dear author, thanks for your amazing contributions to the DA segmentation task. While running the code, I found that I cannot get the same result even when setting the same random seed. There are only slight differences (in acc_seg and decode loss) early in training; however, after thousands of iterations, the difference is no longer negligible.
Could you help me figure out how to fix this problem? Thank you very much!
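
Not an official answer, but a common first step for debugging such run-to-run differences is to enforce deterministic behavior in PyTorch; small numerical differences can accumulate over thousands of iterations when non-deterministic CUDA kernels or cuDNN autotuning are enabled. The sketch below uses standard PyTorch flags and is not specific to this repository; whether it fully removes the divergence here is not guaranteed:

import random
import numpy as np
import torch

def set_deterministic(seed=0):
    # Seed all relevant random number generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility: disable cuDNN autotuning and
    # prefer deterministic kernels where they are available.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

set_deterministic(0)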

About RandomCrop on target image

Hi @lhoyer. Thank you for your wonderful work and detailed code.

Here I have some questions about one of the data transforms. For the target image, RandomCrop is applied as follows:

if self.cat_max_ratio < 1.:
    # Repeat 10 times
    for _ in range(10):
        seg_temp = self.crop(results['gt_semantic_seg'], crop_bbox)
        labels, cnt = np.unique(seg_temp, return_counts=True)
        cnt = cnt[labels != self.ignore_index]
        if len(cnt) > 1 and np.max(cnt) / np.sum(
                cnt) < self.cat_max_ratio:
            break
        crop_bbox = self.get_crop_bbox(img)

I find that when cat_max_ratio < 1, the ground-truth label of the target image is also used. However, target ground-truth labels are unavailable in the UDA setting.

Could you please help me out? Thanks again!

Question about loss

I found that clean_loss and mix_loss are defined in dacs.py. As the paper says, these two losses are added and back-propagated, but I haven't found the corresponding file or functions that do this. I'd be grateful for your help.

Questions about the data stream

Thanks for your solid and excellent work! But I have some questions. Under the mmseg framework, where is mmseg/models/uda/dacs.py called, and are RCS as well as the whole backbone training process all implemented in dacs.py? I'm looking forward to your reply.

About 'sample_class_stats.json'

Hello! Thank you for your wonderful work.

I'm having problems with missing files when reproducing.

Where can I get the 'sample_class_stats.json' please?

Compatible with earlier MMSEG versions

My version of CUDA (11.3) does not allow me to install lower versions of MMCV (1.3.7), and this code is not compatible with higher versions of mmsegmentation. How do I handle this?

KeyError: Caught KeyError in DataLoader worker process 3.

Thanks for your wonderful work, but I have a question for you. I would appreciate it if you could answer it.

When I run "python run_experiments.py --7" on Ubuntu 18.04, I get this problem: KeyError: Caught KeyError in DataLoader worker process 3. The error is as follows:

2022-06-11 10:04:42,150 - mmseg - INFO - Iter [16550/40000] lr: 3.518e-05, eta: 10:05:26, time: 1.454, data_time: 0.029, memory: 9792, decode.loss_seg: 0.1154, decode.acc_seg: 90.7496, src.loss_imnet_feat_dist: 0.1078, mix.decode.loss_seg: 0.1369, mix.decode.acc_seg: 89.0204
2022-06-11 10:05:54,290 - mmseg - INFO - Iter [16600/40000] lr: 3.510e-05, eta: 10:04:01, time: 1.443, data_time: 0.029, memory: 9792, decode.loss_seg: 0.1193, decode.acc_seg: 90.7338, src.loss_imnet_feat_dist: 0.1099, mix.decode.loss_seg: 0.1370, mix.decode.acc_seg: 89.7331
Traceback (most recent call last):
File "run_experiments.py", line 101, in
train.main([config_files[i]])
File "/home/duguangxing/DAFormer/tools/train.py", line 166, in main
train_segmentor(
File "/home/duguangxing/DAFormer/mmseg/apis/train.py", line 131, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/mmcv-1.3.7-py3.8.egg/mmcv/runner/iter_based_runner.py", line 131, in run
iter_runner(iter_loaders[i], **kwargs)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/mmcv-1.3.7-py3.8.egg/mmcv/runner/iter_based_runner.py", line 58, in train
data_batch = next(data_loader)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/mmcv-1.3.7-py3.8.egg/mmcv/runner/iter_based_runner.py", line 32, in next
data = next(self.iter_loader)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in next
data = self._next_data()
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
KeyError: Caught KeyError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/duguangxing/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/duguangxing/DAFormer/mmseg/datasets/uda_dataset.py", line 111, in getitem
return self.get_rare_class_sample()
File "/home/duguangxing/DAFormer/mmseg/datasets/uda_dataset.py", line 88, in get_rare_class_sample
i1 = self.file_to_idx[f1]
KeyError: '0005460_labelTrainIds.png'

python run_experiments.py --exp 101 does not launch the experiments

Hi!

I need to run the source-only experiment with your model. However, running python run_experiments.py --exp 101 does not launch the training procedure. Here is what it prints:
Run job 220315_1238_zerowaste2zerowaste_source-only_segformer_mitb5_poly10warm_s0_9e0a0
/projectnb2/ivc-ml/dbash/code/DAFormer/mmseg/models/backbones/mix_transformer.py:214: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
warnings.warn('DeprecationWarning: pretrained is a deprecated, '
/projectnb/ivc-ml/dbash/.conda/envs/daformer/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py:118: UserWarning: init_cfg without layer key, if you do not define override key either, this init_cfg will do nothing
warnings.warn(

After that, it just exits. I was wondering if I did something wrong here and if you could help me with this issue. The only change I've made is commenting out L421-428 in experiments.py.

Thank you for your time!

GPU Out of memory errors with RTX 3080

Hi, thanks for your excellent work.

I see that your experiments were run on an RTX 2080 Ti (11 GB?). I am having the following error with an RTX 3080 (10 GB); I wonder if this is expected or not, and whether you have any tips on reducing GPU memory usage.

Full terminal output:

Run job sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0
2021-12-23 22:42:52,516 - mmseg - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.0+cu111
OpenCV: 4.4.0
MMCV: 1.3.7
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.2
MMSegmentation: 0.16.0+21c5499
------------------------------------------------------------

2021-12-23 22:42:52,516 - mmseg - INFO - Distributed training: False
2021-12-23 22:42:52,984 - mmseg - INFO - Config:
log_config = dict(
    interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
cudnn_benchmark = True
norm_cfg = dict(type='BN', requires_grad=True)
find_unused_parameters = True
model = dict(
    type='EncoderDecoder',
    pretrained='pretrained/mit_b5.pth',
    backbone=dict(type='mit_b5', style='pytorch'),
    decode_head=dict(
        type='DAFormerHead',
        in_channels=[64, 128, 320, 512],
        in_index=[0, 1, 2, 3],
        channels=256,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=dict(type='BN', requires_grad=True),
        align_corners=False,
        decoder_params=dict(
            embed_dims=256,
            embed_cfg=dict(type='mlp', act_cfg=None, norm_cfg=None),
            embed_neck_cfg=dict(type='mlp', act_cfg=None, norm_cfg=None),
            fusion_cfg=dict(
                type='aspp',
                sep=True,
                dilations=(1, 6, 12, 18),
                pool=False,
                act_cfg=dict(type='ReLU'),
                norm_cfg=dict(type='BN', requires_grad=True))),
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    train_cfg=dict(
        work_dir=
        'work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
    ),
    test_cfg=dict(mode='whole'))
dataset_type = 'ForestRealDataset'
data_root = 'data/forest/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
sim_train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(640, 480)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
forest_train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(640, 480)),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(640, 480),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='UDADataset',
        source=dict(
            type='ForestSimDataset',
            data_root='data/sim/',
            img_dir='images',
            ann_dir='labels',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(type='Resize', img_scale=(640, 480)),
                dict(
                    type='RandomCrop',
                    crop_size=(512, 512),
                    cat_max_ratio=0.75),
                dict(type='RandomFlip', prob=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_semantic_seg'])
            ]),
        target=dict(
            type='ForestRealDataset',
            data_root='data/forest/',
            img_dir='images',
            ann_dir='labels',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(type='Resize', img_scale=(640, 480)),
                dict(
                    type='RandomCrop',
                    crop_size=(512, 512),
                    cat_max_ratio=0.75),
                dict(type='RandomFlip', prob=0.5),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
                dict(type='DefaultFormatBundle'),
                dict(type='Collect', keys=['img', 'gt_semantic_seg'])
            ]),
        rare_class_sampling=dict(
            min_pixels=3000, class_temp=0.01, min_crop_ratio=0.5)),
    val=dict(
        type='ForestRealDataset',
        data_root='data/forest/',
        img_dir='images',
        ann_dir='labels',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(640, 480),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='ForestRealDataset',
        data_root='data/forest/',
        img_dir='images',
        ann_dir='labels',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(640, 480),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
uda = dict(
    type='DACS',
    alpha=0.999,
    pseudo_threshold=0.968,
    pseudo_weight_ignore_top=15,
    pseudo_weight_ignore_bottom=120,
    imnet_feature_dist_lambda=0.005,
    imnet_feature_dist_classes=[6, 7, 11, 12, 13, 14, 15, 16, 17, 18],
    imnet_feature_dist_scale_min_ratio=0.75,
    mix='class',
    blur=True,
    color_jitter_strength=0.2,
    color_jitter_probability=0.2,
    debug_img_interval=1000,
    print_grad_magnitude=False)
use_ddp_wrapper = True
optimizer = dict(
    type='AdamW',
    lr=6e-05,
    betas=(0.9, 0.999),
    weight_decay=0.01,
    paramwise_cfg=dict(
        custom_keys=dict(
            head=dict(lr_mult=10.0),
            pos_block=dict(decay_mult=0.0),
            norm=dict(decay_mult=0.0))))
optimizer_config = None
lr_config = dict(
    policy='poly',
    warmup='linear',
    warmup_iters=1500,
    warmup_ratio=1e-06,
    power=1.0,
    min_lr=0.0,
    by_epoch=False)
seed = 0
n_gpus = 1
runner = dict(type='IterBasedRunner', max_iters=40000)
checkpoint_config = dict(by_epoch=False, interval=40000, max_keep_ckpts=1)
evaluation = dict(interval=4000, metric='mIoU')
name = '211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
exp = 'basic'
name_dataset = 'sim2forest'
name_architecture = 'daformer_sepaspp_mitb5'
name_encoder = 'mitb5'
name_decoder = 'daformer_sepaspp'
name_uda = 'dacs_a999_fd_things_rcs0.01_cpl'
name_opt = 'adamw_6e-05_pmTrue_poly10warm_1x2_40k'
work_dir = 'work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
git_rev = '21c5499f0ee1ea0ecd991003ba4598782d42ec04'
gpu_ids = range(0, 1)

2021-12-23 22:42:52,984 - mmseg - INFO - Set random seed to 0, deterministic: False
/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/backbones/mix_transformer.py:214: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is a deprecated, '
2021-12-23 22:42:54,161 - mmseg - INFO - Load mit checkpoint.
2021-12-23 22:42:54,161 - mmseg - INFO - Use load_from_local loader
/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py:118: UserWarning: init_cfg without layer key, if you do not define override key either, this init_cfg will do nothing
  warnings.warn(
2021-12-23 22:42:54,332 - mmseg - INFO - Load mit checkpoint.
2021-12-23 22:42:54,332 - mmseg - INFO - Use load_from_local loader
2021-12-23 22:42:54,469 - mmseg - INFO - Load mit checkpoint.
2021-12-23 22:42:54,470 - mmseg - INFO - Use load_from_local loader
2021-12-23 22:42:54,606 - mmseg - INFO - DACS(
  (model): EncoderDecoder(
    (backbone): mit_b5(
      (patch_embed1): OverlapPatchEmbed(
        (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
        (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed2): OverlapPatchEmbed(
        (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed3): OverlapPatchEmbed(
        (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed4): OverlapPatchEmbed(
        (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
      (block1): ModuleList(
        (0): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
      (block2): ModuleList(
        (0): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
      (block3): ModuleList(
        (0): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (12): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (13): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (14): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (15): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (16): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (17): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (18): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (19): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (20): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (21): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (22): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (23): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (24): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (25): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (26): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (27): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (28): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (29): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (30): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (31): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (32): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (33): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (34): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (35): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (36): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (37): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (38): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (39): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
      (block4): ModuleList(
        (0): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
    )
    (decode_head): DAFormerHead(
      input_transform=multiple_select, ignore_index=255, align_corners=False
      (loss_decode): CrossEntropyLoss()
      (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
      (dropout): Dropout2d(p=0.1, inplace=False)
      (embed_layers): ModuleDict(
        (0): MLP(
          (proj): Linear(in_features=64, out_features=256, bias=True)
        )
        (1): MLP(
          (proj): Linear(in_features=128, out_features=256, bias=True)
        )
        (2): MLP(
          (proj): Linear(in_features=320, out_features=256, bias=True)
        )
        (3): MLP(
          (proj): Linear(in_features=512, out_features=256, bias=True)
        )
      )
      (fuse_layer): ASPPWrapper(
        (aspp_modules): DepthwiseSeparableASPPModule(
          (0): ConvModule(
            (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (2): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (3): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (bottleneck): ConvModule(
          (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
    )
    init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
  )
  (ema_model): EncoderDecoder(
    (backbone): mit_b5(
      (patch_embed1): OverlapPatchEmbed(
        (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
        (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed2): OverlapPatchEmbed(
        (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed3): OverlapPatchEmbed(
        (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed4): OverlapPatchEmbed(
        (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
      (block1): ModuleList(
        (0): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
      (block2): ModuleList(
        (0): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
      (block3): ModuleList(
        (0): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (12): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (13): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (14): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (15): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (16): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (17): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (18): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (19): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (20): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (21): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (22): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (23): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (24): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (25): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (26): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (27): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (28): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (29): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (30): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (31): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (32): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (33): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (34): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (35): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (36): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (37): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (38): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (39): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
      (block4): ModuleList(
        (0): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
    )
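
Every `Block` printed above repeats the same pattern of the MiT-B5 encoder: `norm1`, an efficient self-attention whose `sr` convolution spatially reduces the keys/values (by 8, 4, 2, and 1 in stages 1 to 4, which is why stage 3 shows `Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))`), then `norm2` and a Mix-FFN `Mlp` whose 3x3 depth-wise convolution mixes neighbouring tokens between the `fc1` expansion and the `fc2` projection. The following is a minimal, self-contained sketch of that attention pattern for orientation only; it is not the repository's implementation, and the `num_heads`/`sr_ratio` arguments are illustrative.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Sketch of the printed Attention module: queries attend to keys/values
    computed on a feature map that the `sr` convolution downsamples first."""

    def __init__(self, dim, num_heads, sr_ratio):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)        # e.g. Linear(320, 320) in stage 3
        self.kv = nn.Linear(dim, dim * 2)   # e.g. Linear(320, 640) in stage 3
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # e.g. Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2)) in stage 3
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads)
        q = q.permute(0, 2, 1, 3)
        if self.sr_ratio > 1:
            # reduce the spatial resolution before computing keys/values
            x_ = x.permute(0, 2, 1).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads)
        kv = kv.permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```
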
    (decode_head): DAFormerHead(
      input_transform=multiple_select, ignore_index=255, align_corners=False
      (loss_decode): CrossEntropyLoss()
      (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
      (dropout): Dropout2d(p=0.1, inplace=False)
      (embed_layers): ModuleDict(
        (0): MLP(
          (proj): Linear(in_features=64, out_features=256, bias=True)
        )
        (1): MLP(
          (proj): Linear(in_features=128, out_features=256, bias=True)
        )
        (2): MLP(
          (proj): Linear(in_features=320, out_features=256, bias=True)
        )
        (3): MLP(
          (proj): Linear(in_features=512, out_features=256, bias=True)
        )
      )
      (fuse_layer): ASPPWrapper(
        (aspp_modules): DepthwiseSeparableASPPModule(
          (0): ConvModule(
            (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (2): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (3): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (bottleneck): ConvModule(
          (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
    )
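
The `DAFormerHead` printed above performs the multi-level feature fusion: each of the four encoder stages (64/128/320/512 channels) is projected to 256 channels by its `MLP` embed layer, all levels are upsampled to the highest resolution, concatenated (4 x 256 = 1024 channels), fused by the depthwise-separable ASPP `fuse_layer` with dilations 6/12/18, and classified into the 19 Cityscapes classes by the 1x1 `conv_seg`. Below is a hedged sketch of that data flow; the helper name `daformer_head_forward_sketch`, its arguments, and the placement of the flattening are illustrative, not the repository's API.

```python
import torch
import torch.nn.functional as F

def daformer_head_forward_sketch(feats, embed_layers, fuse_layer, dropout, conv_seg):
    """Illustrative forward pass for the printed decode_head (assumptions noted above)."""
    target_size = feats[0].shape[2:]                    # 1/4-resolution stage
    embedded = []
    for i, feat in enumerate(feats):                    # channels: 64, 128, 320, 512
        b, c, h, w = feat.shape
        # per-stage MLP embedding to 256 channels on flattened tokens
        x = embed_layers[str(i)](feat.flatten(2).transpose(1, 2))
        x = x.transpose(1, 2).reshape(b, -1, h, w)
        embedded.append(F.interpolate(x, size=target_size,
                                      mode='bilinear', align_corners=False))
    fused = fuse_layer(torch.cat(embedded, dim=1))      # ASPPWrapper: 1024 -> 256
    return conv_seg(dropout(fused))                     # Conv2d(256, 19, kernel_size=1)
```
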
    init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
  )
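
The `imnet_model` printed next is a second `EncoderDecoder` with the identical MiT-B5 backbone. It keeps the original ImageNet-pretrained weights frozen and only provides the reference features for the ImageNet Feature Distance regularization; it is not updated during UDA training. A rough, hedged sketch of that regularization idea follows; the tensor names and the masking granularity are simplifications, not the repository's exact code.

```python
import torch
import torch.nn.functional as F

def feature_distance_sketch(student_feat, imnet_feat, thing_mask):
    """Per-pixel L2 distance between the trained encoder's features and the frozen
    ImageNet encoder's features, averaged over thing-class regions only.
    (Illustrative; the exact masking and feature level are simplifications.)"""
    dist = torch.norm(student_feat - imnet_feat, p=2, dim=1)   # (B, H, W)
    mask = F.interpolate(thing_mask.float(), size=dist.shape[-2:], mode='nearest')[:, 0]
    return (dist * mask).sum() / mask.sum().clamp(min=1)
```
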
  (imnet_model): EncoderDecoder(
    (backbone): mit_b5(
      (patch_embed1): OverlapPatchEmbed(
        (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
        (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed2): OverlapPatchEmbed(
        (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed3): OverlapPatchEmbed(
        (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
      )
      (patch_embed4): OverlapPatchEmbed(
        (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      )
      (block1): ModuleList(
        (0): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=64, out_features=64, bias=True)
            (kv): Linear(in_features=64, out_features=128, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=64, out_features=64, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=64, out_features=256, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
            )
            (act): GELU()
            (fc2): Linear(in_features=256, out_features=64, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
      (block2): ModuleList(
        (0): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=128, out_features=128, bias=True)
            (kv): Linear(in_features=128, out_features=256, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=128, out_features=128, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=128, out_features=512, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
            )
            (act): GELU()
            (fc2): Linear(in_features=512, out_features=128, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
      (block3): ModuleList(
        (0): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (12): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (13): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (14): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (15): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (16): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (17): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (18): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (19): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (20): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (21): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (22): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (23): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (24): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (25): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (26): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (27): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (28): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (29): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (30): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (31): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (32): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (33): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (34): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (35): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (36): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (37): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (38): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (39): Block(
          (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=320, out_features=320, bias=True)
            (kv): Linear(in_features=320, out_features=640, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=320, out_features=320, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=320, out_features=1280, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
            )
            (act): GELU()
            (fc2): Linear(in_features=1280, out_features=320, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
      (block4): ModuleList(
        (0): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (q): Linear(in_features=512, out_features=512, bias=True)
            (kv): Linear(in_features=512, out_features=1024, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=512, out_features=512, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=512, out_features=2048, bias=True)
            (dwconv): DWConv(
              (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
            )
            (act): GELU()
            (fc2): Linear(in_features=2048, out_features=512, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
    )
    (decode_head): DAFormerHead(
      input_transform=multiple_select, ignore_index=255, align_corners=False
      (loss_decode): CrossEntropyLoss()
      (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
      (dropout): Dropout2d(p=0.1, inplace=False)
      (embed_layers): ModuleDict(
        (0): MLP(
          (proj): Linear(in_features=64, out_features=256, bias=True)
        )
        (1): MLP(
          (proj): Linear(in_features=128, out_features=256, bias=True)
        )
        (2): MLP(
          (proj): Linear(in_features=320, out_features=256, bias=True)
        )
        (3): MLP(
          (proj): Linear(in_features=512, out_features=256, bias=True)
        )
      )
      (fuse_layer): ASPPWrapper(
        (aspp_modules): DepthwiseSeparableASPPModule(
          (0): ConvModule(
            (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (2): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
          (3): DepthwiseSeparableConvModule(
            (depthwise_conv): ConvModule(
              (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
              (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
            (pointwise_conv): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        (bottleneck): ConvModule(
          (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activate): ReLU(inplace=True)
        )
      )
    )
    init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
  )
)
2021-12-23 22:42:54,654 - mmseg - INFO - Loaded 1001 images from data/sim/images
2021-12-23 22:42:54,664 - mmseg - INFO - Loaded 1001 images from data/forest/images
2021-12-23 22:42:54,666 - mmseg - INFO - RCS Classes: [6, 5, 1, 0, 2, 4, 3]
2021-12-23 22:42:54,666 - mmseg - INFO - RCS ClassProb: [3.9898804e-01 3.4685844e-01 2.4818756e-01 5.9657078e-03 1.8850398e-07
 4.6506953e-16 2.8859502e-18]
2021-12-23 22:42:58,769 - mmseg - INFO - Loaded 1001 images from data/forest/images
2021-12-23 22:42:58,769 - mmseg - INFO - Start running, host: hans@hans-3080-desktop, work_dir: /home/hans/Documents/part3project/Models/DAFormer/work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9
2021-12-23 22:42:58,769 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters
Traceback (most recent call last):
  File "run_experiments.py", line 101, in <module>
    train.main([config_files[i]])
  File "/home/hans/Documents/part3project/Models/DAFormer/tools/train.py", line 166, in main
    train_segmentor(
  File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/apis/train.py", line 131, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/uda/dacs.py", line 138, in train_step
    log_vars = self(**data_batch)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/segmentors/base.py", line 109, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/uda/dacs.py", line 232, in forward_train
    clean_loss.backward(retain_graph=self.enable_fdist)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 624.00 MiB (GPU 0; 9.78 GiB total capacity; 6.61 GiB already allocated; 624.69 MiB free; 6.79 GiB reserved in total by PyTorch)
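
For anyone hitting the same out-of-memory error: the usual levers in MMSegmentation-style configs are the per-GPU batch size and the crop size. The snippet below is only a hedged sketch of those standard config keys (samples_per_gpu, crop_size, and optionally fp16); the exact keys and values in the DAFormer configs may differ, and smaller crops or batches are not guaranteed to preserve the reported accuracy.

# Hedged sketch of standard MMSegmentation config knobs that control GPU memory.
# Key names follow common mmseg conventions; treat this as an illustration of
# where to look, not as a drop-in patch for this repository's configs.
crop_size = (512, 512)     # smaller crops reduce activation memory

data = dict(
    samples_per_gpu=1,     # halving the batch size roughly halves activation memory
    workers_per_gpu=2,
)

# Mixed-precision training is another common mitigation in mmcv-based repos,
# if the installed mmcv build supports it:
# fp16 = dict(loss_scale=512.0)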

The get_class_masks function

def get_class_masks(labels):
    class_masks = []
    for label in labels:
        classes = torch.unique(labels)
        nclasses = classes.shape[0]
        class_choice = np.random.choice(
            nclasses, int((nclasses + nclasses % 2) / 2), replace=False)
        classes = classes[torch.Tensor(class_choice).long()]
        class_masks.append(generate_class_mask(label, classes).unsqueeze(0))
    return class_masks

Should classes = torch.unique(labels) be classes = torch.unique(label)?
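
For comparison, a per-sample variant, which draws the class subset from each individual label map instead of from the whole batch, would look like the sketch below. Whether the batch-level torch.unique in the released code is intentional is exactly what this issue asks, so the snippet is illustration only; generate_class_mask_stub is a simplified stand-in for the repository's generate_class_mask helper.

import numpy as np
import torch

def generate_class_mask_stub(label, classes):
    # Simplified stand-in for the repository's generate_class_mask helper:
    # a binary mask that is 1 where the pixel label is one of the sampled classes.
    return torch.stack([label == c for c in classes]).any(dim=0).long()

def get_class_masks_per_sample(labels):
    # Same structure as get_class_masks above, but the class set is taken
    # from each individual label map rather than from the whole batch.
    class_masks = []
    for label in labels:
        classes = torch.unique(label)  # per-sample instead of per-batch
        nclasses = classes.shape[0]
        class_choice = np.random.choice(
            nclasses, int((nclasses + nclasses % 2) / 2), replace=False)
        classes = classes[torch.Tensor(class_choice).long()]
        class_masks.append(generate_class_mask_stub(label, classes).unsqueeze(0))
    return class_masks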

Multi GPU training

How can I train on multiple GPUs? Setting n_gpus=2 in the config file and CUDA_VISIBLE_DEVICES=0,1 didn't work for me;
the training still runs only on GPU 0.

Thanks
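
Not an authoritative answer, but for context: in mmcv/mmseg-based repos, multi-GPU training normally goes through a distributed launcher rather than an n_gpus setting alone, and whether this repository's UDA pipeline supports that is precisely what the issue asks. The sketch below only shows the generic mmcv pattern (init_dist plus MMDistributedDataParallel) and assumes nothing about this repository's train.py; it has to be started via a launcher such as python -m torch.distributed.launch --nproc_per_node=2 <script>.

import torch
from mmcv.parallel import MMDistributedDataParallel
from mmcv.runner import init_dist

def wrap_for_multi_gpu(model):
    # Generic mmcv distributed setup; reads the environment variables set by
    # the distributed launcher (RANK, WORLD_SIZE, ...). This is a hedged
    # sketch, not a confirmed recipe for DAFormer's UDA training.
    init_dist('pytorch')
    model = model.cuda()
    return MMDistributedDataParallel(
        model,
        device_ids=[torch.cuda.current_device()],
        broadcast_buffers=False)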

Invitation of contributing to MMSegmentation.

Hi, thanks for your excellent work.

We are members of OpenMMLab, whose codebase MMSegmentation is closely related to your excellent work. Would you like to join us in making a PR for your repo? Currently, our supported tasks are all fully supervised segmentation. I think that if DAFormer were supported, more researchers could use and cite this method.

Looking forward to your reply!

Best,

Can DACS be reproduced with this repository?

Dear author, I want to combine the implementations of DAFormer and DACS for my own study. Can DACS be reproduced with this repository? If yes, I would greatly appreciate your help in constructing the config file.

where's the process of ema updating?

Hello, thanks for your excellent work! I want to know where the EMA update happens. I found it at the beginning of the function "forward_train" in dacs.py, but does that mean a new model is created and the EMA update is applied every time "forward_train" is called?
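
For context, the standard mean-teacher EMA update that this kind of code usually implements looks roughly like the sketch below; the teacher model is created once and then updated in place at every step rather than being re-created per call (function name and alpha value are illustrative assumptions, not the repo's exact code):

import torch

def update_ema(teacher: torch.nn.Module, student: torch.nn.Module, alpha: float = 0.999):
    # Mean-teacher rule: theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)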

src.loss_imnet_feat_dist is nan

Hi, thanks for your excellent work!
However, when I tried to train DAFormer on the GTA5-to-Cityscapes benchmark, I found that src.loss_imnet_feat_dist is nan during training. Since the final result is as expected (68.3 mIoU), I'm a little confused.

Difference to DACS

Hi,

Thanks for the amazing work! I am just wondering what the difference is between the experiment in the 9th row of Tab. 5 (in which you removed everything and adopted DeepLabV2) and the original DACS. Is it the ST strategy, i.e., the mean teacher? In that case, shouldn't it generally work better than the DACS result reported in Tab. 6, which does not apply this more advanced ST technique?

Thanks!

'src.loss_imnet_feat_dist' is nan

Thanks for your great work.
During training, src.loss_imnet_feat_dist is nan at the beginning. Is this expected?

2022-02-19 08:15:52,120 - mmseg - INFO - Iter [50/40000] lr: 1.958e-06, eta: 1 day, 2:59:33, time: 2.432, data_time: 0.081, memory: 9808,
decode.loss_seg: 2.6839, decode.acc_seg: 10.5387, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.4047, mix.decode.acc_seg: 19.1339
2022-02-19 08:17:38,012 - mmseg - INFO - Iter [100/40000] lr: 3.950e-06, eta: 1 day, 1:12:58, time: 2.118, data_time: 0.033, memory: 9808,
decode.loss_seg: 2.3862, decode.acc_seg: 47.4850, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.3233, mix.decode.acc_seg: 41.3826
2022-02-19 08:19:28,666 - mmseg - INFO - Iter [150/40000] lr: 5.938e-06, eta: 1 day, 0:57:19, time: 2.213, data_time: 0.034, memory: 9808,
decode.loss_seg: 2.0347, decode.acc_seg: 62.5967, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.0585, mix.decode.acc_seg: 59.3449
2022-02-19 08:21:15,993 - mmseg - INFO - Iter [200/40000] lr: 7.920e-06, eta: 1 day, 0:37:33, time: 2.147, data_time: 0.033, memory: 9808,
decode.loss_seg: 1.6078, decode.acc_seg: 68.1829, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.7838, mix.decode.acc_seg: 68.8032
2022-02-19 08:23:03,135 - mmseg - INFO - Iter [250/40000] lr: 9.898e-06, eta: 1 day, 0:24:29, time: 2.143, data_time: 0.032, memory: 9808,
decode.loss_seg: 1.3028, decode.acc_seg: 68.6837, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.6529, mix.decode.acc_seg: 70.9704
2022-02-19 08:24:50,133 - mmseg - INFO - Iter [300/40000] lr: 1.187e-05, eta: 1 day, 0:14:51, time: 2.140, data_time: 0.034, memory: 9808,
decode.loss_seg: 1.0986, decode.acc_seg: 70.4091, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.5765, mix.decode.acc_seg: 72.7845
2022-02-19 08:26:36,420 - mmseg - INFO - Iter [350/40000] lr: 1.384e-05, eta: 1 day, 0:06:07, time: 2.126, data_time: 0.031, memory: 9808,
decode.loss_seg: 0.9639, decode.acc_seg: 71.2223, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.5049, mix.decode.acc_seg: 75.3486

Looking forward to your reply!

Performance gap on Synthia to Cityscapes

Hi,

I tried reproducing the results for adaptation from Synthia to Cityscapes by running python run_experiments.py --exp 7. The obtained result was ~51 mIoU (87 aAcc, 62 mAcc), which falls short of the reported 60.9 mIoU. I used a single GeForce RTX 2080 Ti GPU to run the experiment.

I am trying to match the reported results and seek your help on this.

Thank you!

feat_loss.backward() and Thing-Class ImageNet Feature Distance (FD)

Hello!

  • In feat_loss, feat_log = self.calc_feat_dist(img, gt_semantic_seg, src_feat), the line feat_imnet = [f.detach() for f in feat_imnet] detaches feat_imnet from the computation graph.
  • feat_dist = self.masked_feat_dist(feat[lay], feat_imnet[lay], fdist_mask) computes the Feature Distance. Then, when feat_loss.backward() computes gradients during backpropagation, the backbone of self.imnet_model receives no gradients, so self.imnet_model is not trained and its parameters are not updated?
  • Finally, the Feature Distance is only used to train the backbone of the student network, so only the parameters of the student backbone are updated?

thanks!

            feat_loss, feat_log = self.calc_feat_dist(img, gt_semantic_seg,
                                                      src_feat)
            feat_loss.backward()
    def calc_feat_dist(self, img, gt, feat=None):
        assert self.enable_fdist
        with torch.no_grad():
            self.get_imnet_model().eval()
            feat_imnet = self.get_imnet_model().extract_feat(img)

            feat_imnet = [f.detach() for f in feat_imnet]  # ?

        lay = -1
        if self.fdist_classes is not None:
            fdclasses = torch.tensor(self.fdist_classes, device=gt.device)
            scale_factor = gt.shape[-1] // feat[lay].shape[-1]
            gt_rescaled = downscale_label_ratio(gt, scale_factor,
                                                self.fdist_scale_min_ratio,
                                                self.num_classes,
                                                255).long().detach()
            fdist_mask = torch.any(gt_rescaled[..., None] == fdclasses, -1)

            # ?
            feat_dist = self.masked_feat_dist(feat[lay], feat_imnet[lay],
                                              fdist_mask)

        feat_dist = self.fdist_lambda * feat_dist
        feat_loss, feat_log = self._parse_losses(
            {'loss_imnet_feat_dist': feat_dist})
        feat_log.pop('loss', None)
        return feat_loss, feat_log

    def masked_feat_dist(self, f1, f2, mask=None):
        feat_diff = f1 - f2
        pw_feat_dist = torch.norm(feat_diff, dim=1, p=2)  # f1, f2?
        if mask is not None:
            pw_feat_dist = pw_feat_dist[mask.squeeze(1)]
        return torch.mean(pw_feat_dist)
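
As a minimal illustration of the gradient flow the question asks about (a toy example, not repo code): tensors computed under torch.no_grad() or detached do not receive gradients, so only the student features are trained by the feature-distance loss.

import torch

student_feat = torch.randn(4, requires_grad=True)   # stands in for the student backbone features
with torch.no_grad():
    imnet_feat = torch.randn(4)                      # stands in for the frozen ImageNet features

loss = torch.norm(student_feat - imnet_feat, p=2)
loss.backward()
print(student_feat.grad is not None)  # True: the student side receives gradients
print(imnet_feat.requires_grad)       # False: the frozen ImageNet branch does not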

About accuracy

Hi, thank you for your wonderful work, but I have a question for you. I would appreciate it if you could answer it.

When I run “python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py”, I find that the best mIoU in the console output is 66.21%. Is this normal? If not, how can I reach the mIoU reported in the paper (68.3%)?

Another question: when evaluating performance, do you use the teacher network or the student network?

loss = Nan

Thank you for your excellent work.
When I run the code, I encounter the following problem. How can I fix it?

2022-05-07 17:02:57,696 - mmseg - INFO - Iter [50/40000] lr: 1.958e-06, eta: 1 day, 0:29:42, time: 2.207, data_time: 0.029, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 32.2611, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 54.3771
2022-05-07 17:04:45,225 - mmseg - INFO - Iter [100/40000] lr: 3.950e-06, eta: 1 day, 0:08:59, time: 2.151, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 41.2013, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 69.9605
2022-05-07 17:06:32,717 - mmseg - INFO - Iter [150/40000] lr: 5.938e-06, eta: 1 day, 0:00:44, time: 2.150, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 41.6272, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 69.3706
2022-05-07 17:08:19,966 - mmseg - INFO - Iter [200/40000] lr: 7.920e-06, eta: 23:54:54, time: 2.145, data_time: 0.017, memory: 9801, decode.loss_seg: nan, decode.acc_seg: 42.4247, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: nan, mix.decode.acc_seg: 71.1068

The code in the channel alignment section seems inconsistent with the paper.

In 3.2 of the paper:

Before the feature fusion, we embed each $F_i$ to the same number of channels $C_e$ by a 1×1 convolution, bilinearly upsample the features to the size of $F_1$, and concatenate them.

However, the model built from configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py uses MLP layers instead of 1×1 convolutions:

...
(decode_head): DAFormerHead(
      input_transform=multiple_select, ignore_index=255, align_corners=False
      (loss_decode): CrossEntropyLoss()
      (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
      (dropout): Dropout2d(p=0.1, inplace=False)
      (embed_layers): ModuleDict(
        (0): MLP(
          (proj): Linear(in_features=64, out_features=256, bias=True)
        )
        (1): MLP(
          (proj): Linear(in_features=128, out_features=256, bias=True)
        )
        (2): MLP(
          (proj): Linear(in_features=320, out_features=256, bias=True)
        )
        (3): MLP(
          (proj): Linear(in_features=512, out_features=256, bias=True)
        )
      )
      (fuse_layer) ...

I'll be grateful for your help.
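
For reference, a Linear layer applied per pixel computes the same thing as a 1×1 convolution, so the printed MLP embed_layers are functionally the 1×1 channel embedding described in Sec. 3.2. A small self-contained check (toy shapes, not repo code):

import torch
import torch.nn as nn

x = torch.randn(2, 64, 16, 16)                     # B, C_in, H, W
linear = nn.Linear(64, 256)
conv = nn.Conv2d(64, 256, kernel_size=1)

# Copy the Linear parameters into the equivalent 1x1 convolution.
with torch.no_grad():
    conv.weight.copy_(linear.weight[..., None, None])
    conv.bias.copy_(linear.bias)

# Linear expects channels last: flatten the spatial dims, project, reshape back.
y_linear = linear(x.flatten(2).transpose(1, 2)).transpose(1, 2).reshape(2, 256, 16, 16)
y_conv = conv(x)
print(torch.allclose(y_linear, y_conv, atol=1e-5))  # True up to floating-point error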

Rare Class Sampling (RCS)

RCS is a very good idea. Apart from your paper, are there any other articles you would recommend on this idea?
Thank you!
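
For readers looking for the mechanism itself: the paper samples source images containing class c with a probability that decreases with the class frequency f_c, controlled by a temperature T. A rough sketch with toy numbers (the temperature value here is illustrative, not taken from the paper tables):

import numpy as np

def rcs_class_probs(freqs, temperature=0.01):
    # P(c) proportional to exp((1 - f_c) / T): rarer classes get a higher sampling probability.
    logits = (1.0 - np.asarray(freqs, dtype=np.float64)) / temperature
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

freqs = [0.35, 0.10, 0.01]            # toy per-class pixel frequencies
p = rcs_class_probs(freqs)
c = np.random.choice(len(freqs), p=p) # sample a class, then a source image containing it
print(p, c)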

How to tune hyper-parameters?

It's a really awesome Transformer-based UDA work; thanks for sharing your code.

I want to ask an open question about UDA.

Considering that the labels of the target domain are not available, it's impossible to directly evaluate model performance on the target domain, so how do you tune hyper-parameters? I mean that the validation set of the target domain cannot be used for tuning hyper-parameters.

Looking forward to your reply. :-)

KeyError: 'gta\\labels\\13432_labelTrainIds.png'

Hi, thank you for your wonderful work, but I have a question for you. I would appreciate it if you could answer it.

When I run “python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py” on Windows, I get this problem: KeyError: 'gta\labels\13438_labelTrainIds.png'.
The error is as follows:

Traceback (most recent call last):
  File "run_experiments.py", line 103, in <module>
    train.main([config_files[i]])
  File "E:\Project\DAFormer-master\tools\train.py", line 166, in main
    train_segmentor(
  File "E:\Project\DAFormer-master\mmseg\apis\train.py", line 131, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 58, in train
    data_batch = next(data_loader)
  File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 32, in __next__
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\ProgramData\Anaconda3\envs\torch3.8\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\Project\DAFormer-master\mmseg\datasets\uda_dataset.py", line 113, in __getitem__
    return self.get_rare_class_sample()
  File "E:\Project\DAFormer-master\mmseg\datasets\uda_dataset.py", line 90, in get_rare_class_sample
    i1 = self.file_to_idx[f1]
KeyError: 'gta\labels\13432_labelTrainIds.png'

Crop PL

Thanks for your detailed experiments, but I am not sure about the explanation of "Crop PL" in Tab. 5. Is there any reference for it in the paper?

KeyError: 'data_time'

Thanks for your detailed code.
I am (just) able to fit DAFormer into 10 GiB of GPU memory by disabling FD and reducing the crop size from 512x512 to 480x480.
However, when training reached iteration [4000/40000], a KeyError occurred.
Looking forward to your reply!

Best.

2022-01-09 23:01:50,074 - mmseg - INFO - Iter [3950/40000]	lr: 5.408e-05, eta: 11:05:38, time: 1.110, data_time: 0.014, memory: 8163, decode.loss_seg: 0.2515, decode.acc_seg: 86.1694, mix.decode.loss_seg: 0.2489, mix.decode.acc_seg: 86.0454
2022-01-09 23:02:46,390 - mmseg - INFO - Exp name: 220109_2148_gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_33f34
2022-01-09 23:02:46,390 - mmseg - INFO - Iter [4000/40000]	lr: 5.400e-05, eta: 11:04:51, time: 1.127, data_time: 0.015, memory: 8163, decode.loss_seg: 0.2198, decode.acc_seg: 85.9952, mix.decode.loss_seg: 0.2331, mix.decode.acc_seg: 86.3791
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 500/500, 7.2 task/s, elapsed: 70s, ETA:     0s
2022-01-09 23:04:41,465 - mmseg - INFO - per class results:
2022-01-09 23:04:41,468 - mmseg - INFO - 
+---------------+-------+-------+
|     Class     |  IoU  |  Acc  |
+---------------+-------+-------+
|      road     | 88.63 | 92.44 |
|    sidewalk   | 46.11 | 71.19 |
|    building   | 83.71 | 94.21 |
|      wall     | 27.07 | 33.25 |
|     fence     |  7.09 |  7.36 |
|      pole     | 29.68 | 32.26 |
| traffic light | 37.11 | 50.64 |
|  traffic sign |  26.5 | 27.16 |
|   vegetation  | 87.65 | 94.58 |
|    terrain    | 44.26 | 52.82 |
|      sky      | 85.91 | 97.84 |
|     person    | 62.97 | 84.29 |
|     rider     | 34.35 | 54.13 |
|      car      | 85.22 | 93.03 |
|     truck     | 48.87 | 65.72 |
|      bus      | 47.98 | 77.11 |
|     train     | 16.64 | 17.55 |
|   motorcycle  | 40.64 | 61.24 |
|    bicycle    | 47.22 | 51.02 |
+---------------+-------+-------+
2022-01-09 23:04:41,468 - mmseg - INFO - Summary:
2022-01-09 23:04:41,468 - mmseg - INFO - 
+-------+-------+-------+
|  aAcc |  mIoU |  mAcc |
+-------+-------+-------+
| 88.67 | 49.87 | 60.94 |
+-------+-------+-------+

2022-01-09 23:04:41,562 - mmseg - INFO - Exp name: 220109_2148_gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_33f34
Traceback (most recent call last):
  File "run_experiments.py", line 104, in <module>
    train.main([config_files[i]])
  File "/home/data/liuhao/experiments/DAFormer-master/tools/train.py", line 173, in main
    meta=meta)
  File "/home/data/liuhao/experiments/DAFormer-master/mmseg/apis/train.py", line 131, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
    self.call_hook('after_train_iter')
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/base.py", line 152, in after_train_iter
    self.log(runner)
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/text.py", line 234, in log
    self._log_info(log_dict, runner)
  File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/text.py", line 153, in _log_info
    log_str += f'time: {log_dict["time"]:.3f}, ' \
KeyError: 'data_time'

How to train on multi-target dataset

Hi, your work is brilliant! I want to use it on my task, which has multiple target datasets. I tried adding code under mmseg/datasets and its __init__.py, and I think I then need to modify the code under configs/_base_/datasets. For example, in uda_gta_to_cityscapes_512x512.py, I find that data is defined as a dict containing source (dict) ... and target (dict) ... I wonder how I can add multiple target datasets?

Performance gap between training DACS with SwinTransformer and SegFormer

Hi, sorry to disturb you again. I want to ask some questions about training with Swin Transformer; the title may not be appropriate.

I have successfully reproduced the UDA result of SegFormer in Table 1, which finally is 58.82 and close to your reported result.

Meanwhile, I did the same experiment with Swin-B; however, the result was worse than with SegFormer. The best performance is 48.1 at 24000 iters (the training was stopped unexpectedly), whereas the best performance for SegFormer at 24000 iters is 53.81. The dataset, training, and other parameters are the same as for SegFormer; the only modification is the model.

The training log is here: gta2cs_dacs_swin_base_poly10warm_s0.log

Did you experiment with Swin Transformer before? Why is there such a big performance gap? Can you share your thoughts on this?

Looking forward to your reply. :-)

optimizer and backward

Hello!

  • In train_step of dacs.py, optimizer.step() is called to update the model parameters.

  • Is this because the backward passes in forward_train, i.e. clean_loss.backward(retain_graph=self.enable_fdist), feat_loss.backward(), and mix_loss.backward(), cannot use the runner.outputs['loss'].backward() provided by after_train_iter in mmcv/runner/hooks/optimizer.py for backpropagation? (See the sketch after the quoted hook below.)

  • https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py

    def after_train_iter(self, runner):
        runner.optimizer.zero_grad()
        if self.detect_anomalous_params:
            self.detect_anomalous_parameters(runner.outputs['loss'], runner)
        runner.outputs['loss'].backward()

        if self.grad_clip is not None:
            grad_norm = self.clip_grads(runner.model.parameters())
            if grad_norm is not None:
                # Add grad norm to the logger
                runner.log_buffer.update({'grad_norm': float(grad_norm)},
                                         runner.outputs['num_samples'])
        runner.optimizer.step()

Thanks!
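
For illustration, a simplified, self-contained sketch of the pattern the question refers to (not the exact repo code): the forward pass backpropagates each loss itself, train_step zero-grads before and steps the optimizer after, and no 'loss' tensor is handed back for mmcv's OptimizerHook to backpropagate.

import torch
import torch.nn as nn

class ToyUDAModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(4, 1)

    def forward_train(self, x, y):
        # Each loss is backpropagated separately inside the forward pass.
        clean_loss = nn.functional.mse_loss(self.net(x), y)
        clean_loss.backward()
        mix_loss = nn.functional.l1_loss(self.net(x), y)
        mix_loss.backward()
        return {'clean_loss': clean_loss.item(), 'mix_loss': mix_loss.item()}

    def train_step(self, batch, optimizer):
        optimizer.zero_grad()
        log_vars = self.forward_train(**batch)
        optimizer.step()   # applies the gradients accumulated by all backward calls
        return dict(log_vars=log_vars, num_samples=batch['x'].shape[0])

model = ToyUDAModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
print(model.train_step({'x': torch.randn(8, 4), 'y': torch.randn(8, 1)}, opt)['log_vars'])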

MiT-B3 is much better than MiT-B4

Dear authors, thank you for your outstanding work. I have encountered puzzling results when reproducing your work: in the GTAV->Cityscapes experiment, with the MiT-B5 backbone I get results similar to the paper (68.3); with MiT-B4 I get an mIoU of 66.69; with MiT-B3 I get an mIoU of 67.91. I am confused as to why MiT-B3 is so much better than MiT-B4. Have you conducted similar experiments, and what were the results?

performance without mix augmentation

Thanks for your detailed code. Is there any ablation study of the performance without DACS augmentation? I wonder if online pseudo-label generation is very dependent on the augmentation. I also conducted the experiment without DACS augmentation; the performance is only 37 mIoU.

Losses are backpropagated separately

Dear authors, thank you for your outstanding work. While reading the code, I found that the losses are backpropagated separately. In many other works, the losses are accumulated and then backpropagated together. What is the difference between the two? Looking forward to your reply.
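
As a small illustration of why the two are mathematically equivalent (toy example, not repo code): gradients accumulate across backward() calls, so calling backward() per loss and calling it once on the summed loss yield the same gradients, as long as the optimizer steps only after all backward passes. Backpropagating each loss separately mainly allows its computation graph to be freed earlier, which lowers peak GPU memory.

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def grads(separate):
    model.zero_grad()
    loss_a = nn.functional.mse_loss(model(x), y)
    loss_b = nn.functional.l1_loss(model(x), y)
    if separate:
        loss_a.backward()
        loss_b.backward()
    else:
        (loss_a + loss_b).backward()
    return [p.grad.clone() for p in model.parameters()]

g_sep, g_sum = grads(True), grads(False)
print(all(torch.allclose(a, b) for a, b in zip(g_sep, g_sum)))  # True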

How to use own dataset?

I have my own dataset with:

  1. Images from domain A
  2. Segmentations from domain A
  3. Images from domain B

How can I apply your code to train the model on this to predict segmentations for the images in domain B?

How to count the class statistics in Figure S1 of the supplementary material?

Dear Lukas,

Thanks for your great work, DAFormer, and I am very interested in Figure S1 of the supplementary material, which depicts the class statistics of the corresponding dataset for 10k samples.

However, I have problems modifying mmseg/datasets/uda_dataset.py. Could you please tell me how to change the code to count the class statistics in Figure S1?
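
For illustration, a rough way to gather such per-class statistics outside of uda_dataset.py (the data path and Cityscapes-style labelTrainIds layout are assumptions):

import glob
import numpy as np
from PIL import Image

N = 10000                                   # number of samples to scan
counts = np.zeros(19, dtype=np.int64)       # 19 Cityscapes train classes

for path in sorted(glob.glob('data/gta/labels/*_labelTrainIds.png'))[:N]:
    ids = np.unique(np.array(Image.open(path)))
    for c in ids:
        if c < 19:                          # skip the 255 ignore label
            counts[c] += 1

for c, n in enumerate(counts):
    print(f'class {c:2d}: present in {n} / {N} samples')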

Thank you in advance!
I have also read your excellent paper HRDA; I really hope it will be accepted at ECCV 2022. Good luck to you!

Best Regards,

t-SNE visualization

Nice work!

Can you provide the code for the t-SNE visualization? I really need it.

Thank you for your reply!
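
For illustration, a minimal t-SNE plot with scikit-learn (not the authors' code); random vectors stand in for per-image encoder features, e.g. globally average-pooled outputs of extract_feat:

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(200, 256))  # placeholder source features
target_feats = rng.normal(0.5, 1.0, size=(200, 256))  # placeholder target features
feats = np.concatenate([source_feats, target_feats])
domain = np.array([0] * 200 + [1] * 200)               # 0 = source, 1 = target

emb = TSNE(n_components=2, init='pca', perplexity=30).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=domain, s=5, cmap='coolwarm')
plt.savefig('tsne.png')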
