jingyunliang / swinir

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

License: Apache License 2.0

Python 97.87% Shell 2.13%
image-super-resolution image-denoising compression-artifact-reduction image-deblocking transformer real-world-image-super-resolution lightweight-image-super-resolution image-restoration low-level-vision vision-transformer image-sr restoration super-resolution denoising deblocking decompression

swinir's Introduction

SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

Computer Vision Lab, ETH Zurich



This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Swin Transformer (arXiv, supp, pretrained models, visual results). SwinIR achieves state-of-the-art performance in

  • bicubic/lightweight/real-world image SR
  • grayscale/color image denoising
  • grayscale/color JPEG compression artifact reduction

🚀 🚀 🚀 News:

  • Aug. 16, 2022: Add a PlayTorch demo for running the real-world image SR model on mobile devices.
  • Aug. 01, 2022: Add pretrained models and results on JPEG compression artifact reduction for color images.
  • Jun. 10, 2022: See our work on video restoration 🔥🔥🔥 VRT: A Video Restoration Transformer and RVRT: Recurrent Video Restoration Transformer, for video SR, video deblurring, video denoising, video frame interpolation and space-time video SR.
  • Sep. 07, 2021: We provide an interactive online Colab demo for real-world image SR 🔥, for comparison with the first practical degradation model BSRGAN (ICCV2021) and the recent model Real-ESRGAN. Try to super-resolve your own images on Colab!

[Visual comparison on real-world images (x4): LR input | BSRGAN (ICCV2021) | Real-ESRGAN | SwinIR (ours) | SwinIR-Large (ours)]

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45dB, while the total number of parameters can be reduced by up to 67%.
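As a rough illustration of the three-part design described above, here is a minimal, hypothetical PyTorch sketch of the SwinIR skeleton. The module names and sizes are ours, and plain convolutions stand in for the attention-based RSTB/STL blocks; see models/network_swinir.py for the real implementation.

import torch
import torch.nn as nn

class TinySwinIRSkeleton(nn.Module):
    """Illustrative skeleton only: shallow conv -> residual body -> reconstruction."""
    def __init__(self, embed_dim=60, num_blocks=4, scale=2):
        super().__init__()
        self.conv_first = nn.Conv2d(3, embed_dim, 3, 1, 1)  # shallow feature extraction
        # stand-in for residual Swin Transformer blocks (RSTB); the real blocks use
        # window/shifted-window attention layers (STL) plus a conv and a residual
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(embed_dim, embed_dim, 3, 1, 1), nn.LeakyReLU(0.2, True))
            for _ in range(num_blocks)])
        self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
        self.upsample = nn.Sequential(  # high-quality image reconstruction
            nn.Conv2d(embed_dim, embed_dim * scale ** 2, 3, 1, 1),
            nn.PixelShuffle(scale),
            nn.Conv2d(embed_dim, 3, 3, 1, 1))

    def forward(self, x):
        shallow = self.conv_first(x)
        deep = self.conv_after_body(self.body(shallow)) + shallow  # global residual connection
        return self.upsample(deep)

out = TinySwinIRSkeleton()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 96, 96)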

Contents

  1. Training
  2. Testing
  3. Results
  4. Citation
  5. License and Acknowledgement

Training

The training and testing sets used can be downloaded as follows:

| Task | Training Set | Testing Set | Visual Results |
| :--- | :--- | :--- | :---: |
| classical/lightweight image SR | DIV2K (800 training images) or DIV2K + Flickr2K (2650 images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 (download all) | here |
| real-world image SR | SwinIR-M (middle size): DIV2K (800 training images) + Flickr2K (2650 images) + OST (alternative link; 10324 images of sky, water, grass, mountain, building, plant and animal). SwinIR-L (large size): DIV2K + Flickr2K + OST + WED (4744 images) + FFHQ (first 2000 images, face) + Manga109 (manga) + SCUT-CTW1500 (first 100 training images, texts). *We use the pioneering practical degradation model from BSRGAN (ICCV2021). | RealSRSet+5images | here |
| color/grayscale image denoising | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images). *BSD68/BSD100 images are not used in training. | grayscale: Set12 + BSD68 + Urban100; color: CBSD68 + Kodak24 + McMaster + Urban100 (download all) | here |
| grayscale/color JPEG compression artifact reduction | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images) | grayscale: Classic5 + LIVE1 (download all) | here |

The training code is at KAIR.

Testing (without preparing datasets)

For your convenience, we provide some example datasets (~20MB) in /testsets. If you just want the code, downloading models/network_swinir.py, utils/util_calculate_psnr_ssim.py and main_test_swinir.py is enough. The following commands will download the pretrained models automatically and put them in model_zoo/swinir. All visual results of SwinIR can be downloaded here.

We also provide an online Colab demo for real-world image SR, for comparison with the first practical degradation model BSRGAN (ICCV2021) and the recent model Real-ESRGAN. Try to test your own images on Colab!

We also provide a PlayTorch demo for real-world image SR, showcasing how to run the SwinIR model in a mobile application built with React Native.

# 001 Classical Image Super-Resolution (middle size)
# Note that --training_patch_size is just used to differentiate two different settings in Table 2 of the paper. Images are NOT tested patch by patch.
# (setting1: when model is trained on DIV2K and with training_patch_size=48)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR

# (setting2: when model is trained on DIV2K+Flickr2K and with training_patch_size=64)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR


# 002 Lightweight Image Super-Resolution (small size)
python main_test_swinir.py --task lightweight_sr --scale 2 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 3 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 4 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR


# 003 Real-World Image Super-Resolution (use --tile 400 if you run out-of-memory)
# (middle size)
python main_test_swinir.py --task real_sr --scale 4 --model_path model_zoo/swinir/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth --folder_lq testsets/RealSRSet+5images

# (larger size + trained on more datasets)
python main_test_swinir.py --task real_sr --scale 4 --large_model --model_path model_zoo/swinir/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth --folder_lq testsets/RealSRSet+5images
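If you still hit out-of-memory errors on large inputs, the idea behind tiled testing is to run the model on overlapping crops and average the stitched outputs. A minimal sketch of the pattern (ours, not the exact implementation in main_test_swinir.py; the tile and overlap values are illustrative):

import torch

def tiled_forward(model, img, scale=4, tile=400, overlap=32):
    # run the model on overlapping tiles of img (N, C, H, W) and blend the results
    _, c, h, w = img.shape
    out = torch.zeros(img.shape[0], c, h * scale, w * scale, device=img.device)
    weight = torch.zeros_like(out)
    stride = tile - overlap
    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            with torch.no_grad():
                sr = model(img[..., top:bottom, left:right])
            out[..., top * scale:bottom * scale, left * scale:right * scale] += sr
            weight[..., top * scale:bottom * scale, left * scale:right * scale] += 1
    return out / weight  # every output pixel is covered at least once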


# 004 Grayscale Image Denoising (middle size)
python main_test_swinir.py --task gray_dn --noise 15 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 25 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 50 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/Set12


# 005 Color Image Denoising (middle size)
python main_test_swinir.py --task color_dn --noise 15 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 25 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 50 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/McMaster


# 006 JPEG Compression Artifact Reduction (middle size, using window_size=7 because JPEG encoding uses 8x8 blocks)
# grayscale
python main_test_swinir.py --task jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/classic5

# color
python main_test_swinir.py --task color_jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/LIVE1

Results

We achieved state-of-the-art performance on classical/lightweight/real-world image SR, grayscale/color image denoising and JPEG compression artifact reduction. Detailed results can be found in the paper. All visual results of SwinIR can be downloaded here.

Classical Image Super-Resolution

  • More detailed comparison between SwinIR and a representative CNN-based model RCAN (classical image SR, X4):

| Method | Training Set | Training time (8× GeForce RTX 2080 Ti, batch=32, iter=500k) | Y-PSNR/Y-SSIM on Manga109 | Runtime (1× GeForce RTX 2080 Ti, 256×256 LR image)* | #Params | #FLOPs | Testing memory |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| RCAN | DIV2K | 1.6 days | 31.22/0.9173 | 0.180s | 15.6M | 850.6G | 593.1M |
| SwinIR | DIV2K | 1.8 days | 31.67/0.9226 | 0.539s | 11.9M | 788.6G | 986.8M |

* We re-tested the runtime when the GPU was idle. We refer to the evaluation code here.

  • Results on DIV2K-validation (100 images):

| Training Set | Scale factor | PSNR (RGB) | PSNR (Y) | SSIM (RGB) | SSIM (Y) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| DIV2K (800 images) | 2 | 35.25 | 36.77 | 0.9423 | 0.9500 |
| DIV2K + Flickr2K (2650 images) | 2 | 35.34 | 36.86 | 0.9430 | 0.9507 |
| DIV2K (800 images) | 3 | 31.50 | 32.97 | 0.8832 | 0.8965 |
| DIV2K + Flickr2K (2650 images) | 3 | 31.63 | 33.10 | 0.8854 | 0.8985 |
| DIV2K (800 images) | 4 | 29.48 | 30.94 | 0.8311 | 0.8492 |
| DIV2K + Flickr2K (2650 images) | 4 | 29.63 | 31.08 | 0.8347 | 0.8523 |
Lightweight Image Super-Resolution

Real-World Image Super-Resolution

Grayscale Image Denoising

Color Image Denoising

JPEG Compression Artifact Reduction

on grayscale images

on color images

| Testing Set | Quality factor | PSNR (RGB) | PSNR-B (RGB) | SSIM (RGB) |
| :--- | :---: | :---: | :---: | :---: |
| LIVE1 | 10 | 28.06 | 27.76 | 0.8089 |
| LIVE1 | 20 | 30.45 | 29.97 | 0.8741 |
| LIVE1 | 30 | 31.82 | 31.24 | 0.9018 |
| LIVE1 | 40 | 32.75 | 32.12 | 0.9174 |

Citation

@article{liang2021swinir,
  title={SwinIR: Image Restoration Using Swin Transformer},
  author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
  journal={arXiv preprint arXiv:2108.10257},
  year={2021}
}

License and Acknowledgement

This project is released under the Apache 2.0 license. The code is based on Swin Transformer and KAIR; please also follow their licenses. Thanks for their awesome work.

swinir's People

Contributors

ak391, chenxwh, jingyunliang, liuyinglao


swinir's Issues

one question

Hi, this is great work. I want to use this network for single image deraining; which parts of the code should I modify? Or do you have any good suggestions? Thanks!

Can the GAN be fine-tuned on my own dataset?

When I set the pretrained model path (003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth) in KAIR's training config:

, "path": {
"root": "superresolution" // "denoising" | "superresolution" | "dejpeg"
, "pretrained_netG": null // path of pretrained model
, "pretrained_netD": null // path of pretrained model
, "pretrained_netE": null // path of pretrained model
}

it starts to train from scratch anyway. And when I copy the model from model_zoo right into /superresolution/swinir_sr_realworld_x4_gan/models/, that's not working either.

Residual connection resulting in bad result

Thanks for sharing your work. I tried to add residual connections in the RSTB block and the STL layer, but got a bad result. The residual connections were added as in figures (1) and (2).
My questions are:

  • I added residual connections between the RSTB and STL (adding them only in the RSTB, or only in the STL, was also tried, but the result was bad either way). The figure showed the result with adaptive-parameter residual connections only in the RSTB (the red line; the blue one is the original SwinIR network from your paper). Your paper has only one global residual connection per RSTB (from the RSTB input to its output) but gets an awesome result. So I want to know: have you ever tried adding more residual connections like this and also gotten bad results?

  • If you have tried it but got impressive results, could you tell me what you did?
    Thanks ~
    :-D

A question about the framework

Hi, @JingyunLiang

I appreciate your fabulous work, but I have a question about the framework. Did you ever try a UNet-like (encoder-decoder) design for the deep feature extraction block (the whole transformer body)? Since your framework stacks identical RSTB blocks, I am wondering whether the encoder-decoder idea would help the performance.

Thank you very much.

Problem when saving the model

Hi, thanks for the training code.
I have a problem when the iteration count reaches 5000 and the model is saved:

File "/KAIR/models/network_swinir.py", line 254, in forward
    x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C
File "/KAIR/models/network_swinir.py", line 42, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 111, 8, 143, 8, 180]' is invalid for input of size 184459500

May I know how to fix it?
Thanks.
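This shape error typically means the height or width of the evaluation image is not a multiple of the window size (8). A sketch of the usual fix (ours, similar in spirit to the padding done in main_test_swinir.py): pad the input before the forward pass and crop the output back afterwards:

import torch.nn.functional as F

def pad_to_window_multiple(img_lq, window_size=8):
    # pad (N, C, H, W) so H and W become multiples of window_size;
    # remember (h, w) so the output can be cropped back afterwards
    _, _, h, w = img_lq.shape
    pad_h = (window_size - h % window_size) % window_size
    pad_w = (window_size - w % window_size) % window_size
    return F.pad(img_lq, (0, pad_w, 0, pad_h), mode='reflect'), (h, w)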

Training question

Thanks for your amazing work! I have a question regarding your training:
How many GPUs did you use for parallel training? How many hours does it need before early stopping?

testing dataset downsampled image

Thanks for releasing the wonderful code and datasets.
I am encountering one problem while testing: while Set5 and Set14 have x2, x3 and x4 downsampled images, other datasets (Urban100, Manga109 and BSDS100) do not. Would it be possible for you to share the downsampled images for these datasets? I can downsample them myself, but the results would probably differ from what is mentioned in the paper.

About drop path rate

I notice that you use the same parameters as Swin Transformer and set the drop path rate to 0.1. Does super-resolution really need drop path?

High CPU Usage

Thanks for sharing your work. I have a problem: the CPU usage is too high. When I set H_size > 64 (e.g., 96 or 128), the CPU usage is about 500%. I want to know why, and what type of GPU was used in the experiments in your paper. I also wonder if this problem is caused by the weak computing power of my GPU (an NVIDIA RTX 2080 Ti).
Thanks~

swin layer

In the Swin Transformer, the self-attention module has two variants: plain window self-attention and shifted-window self-attention. In that code, the plain window self-attention has no attn_mask, and only the shifted-window attention has a mask. But in your code, it seems every self-attention layer has an attn_mask. Does that mean no layer uses plain window self-attention, and every layer uses shifted windows instead? Thank you.

PSNR and SSIM do not converge to the performance reported in the paper

Hi author,

Thanks for your excellent work. I have read the SwinIR paper and starred this repository. When I test on Set5 with the pretrained model you provide, and when I train classical image SR (x2) on the DIV2K and Flickr2K datasets, the PSNR does not reach the value reported in the paper. May I ask whether this is due to my hyperparameter settings, or whether there are some training tricks?

Testing on Set5 with the official pretrained model (001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth):

Average PSNR: 36.21 dB.

Training classical image SR (x2) on DIV2K and Flickr2K with the training code provided at https://github.com/cszn/KAIR:

Average PSNR converges to 36.15 dB, which does not reach the performance in the paper.

Thanks!

Problem about testing my trained model

Thanks for the training code.
I trained a classical SR model and got 500000_optimizerG.pth, 500000_G.pth and 500000_E.pth.
May I know which .pth I should use during testing? When I run

python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path superresolution/swinir_sr_classical_patch64_x4_l1/models/500000_G.pth --folder_lq testsets/real3wx4/test_LR_crop

it cannot load the model:

model.load_state_dict(torch.load(args.model_path)['params'], strict=True)
KeyError: 'params'

May I know how to solve this?
Thanks.
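A minimal workaround sketch (ours, not an official fix): KAIR checkpoints such as *_G.pth (the generator) and *_E.pth (its exponential-moving-average copy) store the state_dict directly, while the released model_zoo files wrap it under a 'params' key, so fall back when the key is absent:

import torch

state = torch.load(args.model_path, map_location='cpu')
state = state.get('params', state)  # released models wrap weights under 'params'; KAIR checkpoints do not
model.load_state_dict(state, strict=True)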

Cuda out of memory for photos larger than 640x480px on RTX 3060 12GB

Thanks for this great code, released to the public for free. I tried the real-world 4x large model; the results are great, even better than known commercial products.
Real-ESRGAN, for example, has a --tile option for CUDA out-of-memory errors. With your code I can't upscale photos larger than VGA resolution with an RTX 3060 12GB. Please tell me, is there a way to tile, or do I need to change img_size, and to what values?
Thank you very much in advance!

train code

When will you release the training code? I would like it soon. Thank you.

About ape

Thanks for your code first!
After reading your code, I want to know why APE (absolute position embedding) is not used in this code; I saw the option is False by default.
I also want to confirm: if I train the model with APE on 128-size patches, can I change the image size when I evaluate the model? I thought the length of the position embedding is related to the number of patches, and the number of patches is related to the image size.
Hope you can solve my problem!

About use_checkpoint

The code path for use_checkpoint is missing one parameter:

The original code in network_swinir.py line 399:

x = checkpoint.checkpoint(blk, x)

Should be:

x = checkpoint.checkpoint(blk, x, x_size)

About #Parameters in the model

Thanks for providing the code for SwinIR!

I calculated the #Params and #FLOPs for the lightweight SwinIR model using KAIR. However, I'm not able to reproduce the numbers in Table 3 of the paper.
For example, I get 910.2K #Params instead of the 878K in the table, and the same happens with #FLOPs. Could you please guide me on how to reproduce the results? Thanks!

FLOPs?

Hi,

Thanks for this great work! Could you provide the FLOPs/MACs of your SwinIR model?
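For anyone who wants to measure this themselves, here is a minimal sketch (ours). The constructor arguments are a lightweight-style configuration, and thop is a third-party package that counts multiply-accumulates, which some papers double and report as FLOPs; custom attention ops may be undercounted, so treat the number as an estimate:

import torch
from thop import profile  # pip install thop
from models.network_swinir import SwinIR  # from this repository

model = SwinIR(upscale=2, img_size=64, window_size=8, img_range=1.,
               depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')

print('#Params: %.1fK' % (sum(p.numel() for p in model.parameters()) / 1e3))
macs, _ = profile(model, inputs=(torch.randn(1, 3, 64, 64),))
print('MACs on a 64x64 LR input: %.2fG' % (macs / 1e9))  # estimate only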

About training code

Hi there. Thanks for your amazing work, but I have some questions about the training code.

  1. Do we need to modify main_train_psnr.py (KAIR) to set the number of training iterations to 500K? It's set to 1M in the original file.

  2. I ran training with python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json --dist True on 8 RTX 3090 GPUs, and the dataset is the DIV2K train split (default X2). The estimated training time for 500K iterations is ~3.5 days (1 min per 100 iterations), much longer than your 1.8 days on 8 2080 Ti GPUs. Do you have any idea why?

Issues about the patch embedding.

Hi, thanks very much for sharing this wonderful work. According to the definition of PatchEmbed(nn.Module), it seems that parameters such as patch_size and img_size are not used. It seems that the performance improvements of SwinIR come from the MSA and MLP layers; of course, the multiple skip connections in the RSTB and STL are also helpful. I am curious why SwinIR does not form patches from multiple pixels, as in, for example, the PatchEmbed method used in "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".

patch_size

I found that the network uses the initial patch_size of 1, so each pixel becomes a token. What is the reason for not using an image block (e.g., 4x4) as the token?

Illustration in the README

For info, you have linked to the wrong image for your result with SwinIR-Large:

|Real-World Image (x4)|[BSRGAN, ICCV2021](https://github.com/cszn/BSRGAN)|[Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)|SwinIR (ours)|SwinIR-Large (ours)|

|       :---       |     :---:        |        :-----:         |        :-----:         |        :-----:         | 

|<img width="200" src="figs/ETH_LR.png">|<img width="200" src="figs/ETH_BSRGAN.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

|<img width="200" src="figs/OST_009_crop_LR.png">|<img width="200" src="figs/OST_009_crop_BSRGAN.png">|<img width="200" src="figs/OST_009_crop_realESRGAN.png">|<img width="200" src="figs/OST_009_crop_SwinIR.png">|<img width="200" src="figs/OST_009_crop_SwinIR-L.png">|

That is because there is an extra column in the row with the building. Check the end of the row:

|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

Training dataset - patch creation

It would be really helpful if you could point out how to create patches when the image size is less than 128 x 128 (the patch size mentioned in the training settings). Should we handle such images with zero padding, or exclude them, like the 120 x 80 images present in the BSD500 dataset?

JSONDecodeError when training swinir

Hi @JingyunLiang, I use the training code main_train_psnr.py in KAIR, and I only changed the dataroot and other necessary settings. The training command is python main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json, and my environment is CUDA 10.1 + PyTorch 1.7.1 + Python 3.7. When training the SwinIR model, I got a JSONDecodeError.

So I searched for it and changed json_path='options/train_msrresnet_psnr.json' to json_path="options/train_msrresnet_psnr.json" in main_train_psnr.py (line 34 is the json_path).

But I still get the error over and over again. Could you please provide some suggestions? Thanks a lot.
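For context, KAIR's .json option files contain // comments, so a plain json.load() raises JSONDecodeError on them; KAIR strips the comments before parsing (see utils/utils_option.py there). A minimal sketch of the same idea (ours), if you need to load such a file directly:

import json

def load_commented_json(path):
    # drop everything after // on each line (KAIR-style option files);
    # note: naive, it breaks if '//' appears inside a string value
    with open(path) as f:
        return json.loads('\n'.join(line.split('//')[0] for line in f))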

[SUGGESTION] Optimized version for videos

Hi there, SwinIR is really cool!

Since it has been "ported" to VapourSynth (thanks to @HolyWu), some interesting discussions about its effectiveness on videos, with tests too, have started.

It seems that the main issue is processing speed; some argue that the algorithm is not (yet?) optimized for videos.

About speed: @xinntao may help implement an NCNN-Vulkan version (as already done for Real-ESRGAN).

About video optimizations: a collaboration with @ding3820 of the MIMO-VRN project may help.

Hope that inspires.

Training settings for SwinIR light

Thanks for sharing your work.

I notice that the training config file for the lightweight model in KAIR may not be consistent with the statement in the paper. Could you double-check?
In particular, both the batch size and patch size are set to 64, and embed_dim is 180. Is this the correct setting?

network interpolation

When denoising images, I need more noise levels, such as 20 and 35. I think a network interpolation function could produce an approximate model.
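To make the idea concrete, network interpolation would linearly blend the weights of two trained denoisers to approximate an intermediate noise level. A hedged sketch (ours; the file names are the released checkpoints, and how accurate the blend is for SwinIR is untested):

import torch

w15 = torch.load('model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise15.pth', map_location='cpu')
w25 = torch.load('model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth', map_location='cpu')
w15, w25 = w15.get('params', w15), w25.get('params', w25)  # unwrap if wrapped

alpha = 0.5  # 1.0 -> pure noise-15 model, 0.0 -> pure noise-25 model
w20 = {k: alpha * w15[k] + (1 - alpha) * w25[k] for k in w15}
torch.save({'params': w20}, '004_grayDN_interp_noise20.pth')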

About Resi-connection

Hi there, thanks for the amazing work!
The section 'Impact of residual connection and convolution layer in RSTB' of the paper says that a 1x1 conv or 3x3 conv is added at the residual connection, and the results show that 3x3 is better than 1x1, and also than the inverted-bottleneck 3x3 variant.

Back to the code itself: when I first read the 'resi_connection' argument in the SwinIR class, I thought '1conv' meant a 1x1 conv and '3conv' meant a 3x3 conv.
After reading more of the code, I realized that '1conv' actually means one 3x3 conv and '3conv' means a three-layer conv block:

# build the last conv layer in deep feature extraction
if resi_connection == '1conv':
    self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
elif resi_connection == '3conv':
    # to save parameters and memory
    self.conv_after_body = nn.Sequential(nn.Conv2d(embed_dim, embed_dim // 4, 3, 1, 1),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim // 4, 1, 1, 0),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim, 3, 1, 1))

I think this naming can be a little confusing; it might be better to call them '1conv3' and '3conv3', or something else.

Just wanted to point out the confusing part.
Thanks.

denosing training code

I haven't found the training code for the denoising task in KAIR; hasn't it been released yet?

about test

In the SwinIR model there is an img_size parameter, e.g., 128, so each Swin layer has input_resolution=(128, 128). At test time, if my input image is not (128, 128), the attention computation branches: if self.input_resolution == x_size the precomputed mask is used; otherwise attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device)). Could you explain what the mask argument is when the image size does not equal self.input_resolution=(128, 128)?
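For background: the mask in question is the standard shifted-window attention mask. After the feature map is cyclically shifted, border windows contain tokens from disconnected regions, and the mask adds -100 to attention logits between tokens from different regions. A rough sketch of how such a mask is built for an arbitrary x_size (ours, modeled on Swin-style code, not a verbatim copy of calculate_mask; h and w are assumed to be multiples of window_size):

import torch

def calculate_attn_mask(h, w, window_size=8, shift_size=4):
    # label the 9 regions created by the cyclic shift
    img_mask = torch.zeros(1, h, w, 1)
    cnt = 0
    for hs in (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None)):
        for ws in (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None)):
            img_mask[:, hs, ws, :] = cnt
            cnt += 1
    # partition into windows, then compare region labels pairwise
    windows = img_mask.view(1, h // window_size, window_size, w // window_size, window_size, 1)
    windows = windows.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size)
    attn_mask = windows.unsqueeze(1) - windows.unsqueeze(2)  # (nW, ws*ws, ws*ws)
    return attn_mask.masked_fill(attn_mask != 0, -100.0)  # block cross-region attention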

about charbonnierloss

The Charbonnier loss has an extra parameter eps. In the paper eps is 1e-3 and it is used squared, i.e., (1e-3)^2, but your code may not apply the squaring. Is this difference important?
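To make the two variants concrete, a minimal sketch (ours) of the Charbonnier loss with the squared-eps form from the paper next to the plain-eps form the issue describes:

import torch

def charbonnier_loss(pred, target, eps=1e-3, square_eps=True):
    # paper form: sqrt(diff^2 + eps^2); plain form: sqrt(diff^2 + eps)
    term = eps ** 2 if square_eps else eps
    return torch.sqrt((pred - target) ** 2 + term).mean()

# with eps=1e-3 the two differ mostly near diff = 0, where the added constant
# (1e-6 vs 1e-3) changes how strongly tiny errors are penalized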

Supplements

Hello, this great paper says some details are given in the supplementary material. Where can I find the supp? Thank you.

Trained model from KAIR (40000_optimizerG.pth) gives error on testing

Instead of using the pre-trained models, I trained with the KAIR code and used the generated model for testing.

The KAIR training code produced 3 files:
40000_optimizerG.pth
40000_G.pth
40000_E.pth

Using these models for testing with the code in this repository, I am getting an error:

(pytorch-gpu) C:\Users\Downloads\SwinIR-main>python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
loading model from model_zoo/swinir/40000_optimizerG.pth
Traceback (most recent call last):
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 253, in <module>
    main()
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 42, in main
    model = define_model(args)
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 174, in define_model
    model.load_state_dict(pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model, strict=True)
  File "C:\Users\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SwinIR:
        Missing key(s) in state_dict: "conv_first.weight", "conv_first.bias", "patch_embed.norm.weight", "patch_embed.norm.bias", "layers.0.residual_group.blocks.0.norm1.weight", "layers.0.residual_group.blocks.0.norm1.bias", "layers.0.residual_group.blocks.0.attn.relative_position_bias_table",

40000_G.pth and 40000_E.pth test fine.

The input size during test

Hi Jingyun, nice work! I just wonder why the SwinIR constructor needs img_size. It is somewhat inconvenient, especially for testing, since we usually want to test on images of different sizes. Is there a particular reason for this? Swin Transformer does not need it because it uses padding operations. Besides, are there any requirements on the input size, i.e., must it be a multiple of some number? Thanks.

About the FLOPs of SwinIR

Hi,

Thanks for sharing the code of this interesting work. Would you mind providing the FLOPs cost of SwinIR, e.g., FLOPs for 256x256x3 images? Thanks!

About test part in training

Thanks for your code first!
I ran the super-resolution lightweight part of the code, and there is an error in the testing part of training:

Traceback (most recent call last):
  File "main_train_psnr.py", line 291, in <module>
    main()
  File "main_train_psnr.py", line 190, in main
    current_psnr = util.calculate_psnr(E_img, H_img, border=border)
  File "/home/ET/huiyuxiang/KAIR/utils/utils_image.py", line 632, in calculate_psnr
    raise ValueError('Input images must have the same dimensions.')
ValueError: Input images must have the same dimensions.

So I printed the shapes and found that the LR image is padded to a multiple of 8 but the HR image is not.
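A minimal workaround sketch (ours): crop the network output back to the ground-truth size before computing PSNR, undoing the window-size padding:

# E_img: network output (H, W, C) after padding; H_img: ground truth
h, w = H_img.shape[:2]
E_img = E_img[:h, :w, ...]  # drop rows/cols introduced by the padding
current_psnr = util.calculate_psnr(E_img, H_img, border=border)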

About training

Thank you for your work. I tried to train SwinIR, and while training the small SwinIR is smooth, the loss often suddenly doubles when the dim is changed to 180. Because of memory limits, my batch_size=16 and lr=1e-4. Are there any special tricks to make the training stable?

runtimeerror when using other dataset?

Hi,
I want to train the model with my own dataset. However, it keeps reporting:

RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 256, 252] at entry 1

Do I have a wrong setting? Thanks. (A cropping sketch follows the config below.)

The relevant part of the JSON:
"datasets": {
"train": {
"name": "train_dataset" // just name
, "dataset_type": "sr" // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
, "dataroot_H": "HR" // path of H training dataset. DIV2K (800 training images)
, "dataroot_L": "LR" // path of L training dataset

  , "H_size": 256                   // 96/144|192/384 | 128/192/256/512. LR patch size is set to 48 or 64 when compared with RCAN or RRDB.

  , "dataloader_shuffle": true
  , "dataloader_num_workers": 16
  , "dataloader_batch_size": 8      // batch size 1 | 16 | 32 | 48 | 64 | 128. Total batch size =4x8=32 in SwinIR
}
, "test": {
  "name": "test_dataset"            // just name
  , "dataset_type": "sr"         // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
  , "dataroot_H": "testsets/Set5/HR"  // path of H testing dataset
  , "dataroot_L": "testsets/Set5/LR_bicubic/X4"              // path of L testing dataset

}

}

, "netG": {
"net_type": "swinir"
, "upscale": 4 // 2 | 3 | 4 | 8
, "in_chans": 3
, "img_size": 64 // For fair comparison, LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
, "window_size": 8
, "img_range": 1.0
, "depths": [6, 6, 6, 6, 6, 6]
, "embed_dim": 180
, "num_heads": [6, 6, 6, 6, 6, 6]
, "mlp_ratio": 2
, "upsampler": "pixelshuffle" // "pixelshuffle" | "pixelshuffledirect" | "nearest+conv" | null
, "resi_connection": "1conv" // "1conv" | "3conv"

, "init_type": "default"

}
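As referenced above, the stack error means the dataloader received patches of different sizes: one training image was smaller than H_size, so its crop came out 256x252. A minimal sketch (ours) of guarding the dataset's crop so every sample has a fixed size:

import random

def random_fixed_crop(img, size=256):
    # random crop of exactly size x size from an (H, W, C) array;
    # images smaller than `size` must be padded or skipped beforehand,
    # otherwise torch.stack sees mismatched shapes like [3, 256, 252]
    h, w = img.shape[:2]
    assert h >= size and w >= size, 'image %dx%d smaller than crop size %d' % (h, w, size)
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    return img[top:top + size, left:left + size, ...]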

Comparison with IPT

Hi,

Thanks for sharing this interesting work. Table 6 (CBSD68, sigma=50) shows that IPT achieves 28.39 PSNR. However, the original IPT paper shows that it achieves 29.88 (their Table 2). Is there any difference between these two settings?

GPU numbers

Dear author:
I want to know the number of GPUs you used when training the SwinIR network.
Thank you.

IndexError: index 2080 is out of bounds for dimension 2 with size 2080

Hey, thanks for this awesome code.
It always worked great for me, but now I'm getting this error regardless of which image I try in the Colab. In step 3, Inference:

/content/Real-ESRGAN/BSRGAN
LogHandlers setup!
21-10-08 17:53:58.872 : Model Name : BSRGAN
21-10-08 17:53:58.873 : GPU ID : 0
[3, 3, 64, 23, 32, 4]
21-10-08 17:54:01.995 : Input Path : testsets/RealSRSet
21-10-08 17:54:01.995 : Output Path : testsets/RealSRSet_results_x4
21-10-08 17:54:01.996 : ---1 --> BSRGAN --> x4--> adsads.png
/content/Real-ESRGAN
Testing 0 adsads
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, :H*self.upscale, :W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080

The same traceback, ending in the same IndexError, occurs when loading experiments/pretrained_models/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth.

Training efficiency

Hi

Thanks for the great work again.

When training SwinIR with the KAIR toolbox, I found that CPU utilization was particularly high while the GPU was often idle, and training was particularly inefficient. Would you be so kind as to tell me the GPU and CPU configurations you used, and the training time?
