jingyunliang / swinir

SwinIR: Image Restoration Using Swin Transformer (official repository)

Home Page: https://arxiv.org/abs/2108.10257

License: Apache License 2.0

Python 97.87% Shell 2.13%
image-super-resolution image-denoising compression-artifact-reduction image-deblocking transformer real-world-image-super-resolution lightweight-image-super-resolution image-restoration low-level-vision vision-transformer image-sr restoration super-resolution denoising deblocking decompression

swinir's Introduction

SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

Computer Vision Lab, ETH Zurich



This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Swin Transformer (arXiv, supp, pretrained models, visual results). SwinIR achieves state-of-the-art performance in

  • bicubic/lightweight/real-world image SR
  • grayscale/color image denoising
  • grayscale/color JPEG compression artifact reduction

🚀 🚀 🚀 News:

  • Aug. 16, 2022: Add a PlayTorch demo for running the real-world image SR model on mobile devices.
  • Aug. 01, 2022: Add pretrained models and results on JPEG compression artifact reduction for color images.
  • Jun. 10, 2022: See our work on video restoration 🔥🔥🔥 VRT: A Video Restoration Transformer and RVRT: Recurrent Video Restoration Transformer, for video SR, video deblurring, video denoising, video frame interpolation and space-time video SR.
  • Sep. 07, 2021: We provide an interactive online Colab demo for real-world image SR 🔥, for comparison with the first practical degradation model BSRGAN (ICCV2021) and the recent model Real-ESRGAN. Try to super-resolve your own images on Colab!

[Visual comparison on real-world images (x4): LR input | BSRGAN (ICCV2021) | Real-ESRGAN | SwinIR (ours) | SwinIR-Large (ours)]

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45dB, while the total number of parameters can be reduced by up to 67%.
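As a rough illustration of the three-part design described above, here is a minimal, hypothetical PyTorch sketch of the SwinIR skeleton. The module names and sizes are ours, and plain convolutions stand in for the attention-based RSTB/STL blocks; see models/network_swinir.py for the real implementation.

import torch
import torch.nn as nn

class TinySwinIRSkeleton(nn.Module):
    """Illustrative skeleton only: shallow conv -> residual body -> reconstruction."""
    def __init__(self, embed_dim=60, num_blocks=4, scale=2):
        super().__init__()
        self.conv_first = nn.Conv2d(3, embed_dim, 3, 1, 1)  # shallow feature extraction
        # stand-in for residual Swin Transformer blocks (RSTB); the real blocks use
        # window/shifted-window attention layers (STL) plus a conv and a residual
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(embed_dim, embed_dim, 3, 1, 1), nn.LeakyReLU(0.2, True))
            for _ in range(num_blocks)])
        self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
        self.upsample = nn.Sequential(  # high-quality image reconstruction
            nn.Conv2d(embed_dim, embed_dim * scale ** 2, 3, 1, 1),
            nn.PixelShuffle(scale),
            nn.Conv2d(embed_dim, 3, 3, 1, 1))

    def forward(self, x):
        shallow = self.conv_first(x)
        deep = self.conv_after_body(self.body(shallow)) + shallow  # global residual connection
        return self.upsample(deep)

out = TinySwinIRSkeleton()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 96, 96)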

Contents

  1. Training
  2. Testing
  3. Results
  4. Citation
  5. License and Acknowledgement

Training

The training and testing sets used can be downloaded as follows:

| Task | Training Set | Testing Set | Visual Results |
| :--- | :--- | :--- | :---: |
| classical/lightweight image SR | DIV2K (800 training images) or DIV2K + Flickr2K (2650 images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 (download all) | here |
| real-world image SR | SwinIR-M (middle size): DIV2K (800 training images) + Flickr2K (2650 images) + OST (alternative link; 10324 images of sky, water, grass, mountain, building, plant and animal). SwinIR-L (large size): DIV2K + Flickr2K + OST + WED (4744 images) + FFHQ (first 2000 images, face) + Manga109 (manga) + SCUT-CTW1500 (first 100 training images, texts). *We use the pioneering practical degradation model from BSRGAN (ICCV2021). | RealSRSet+5images | here |
| color/grayscale image denoising | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images). *BSD68/BSD100 images are not used in training. | grayscale: Set12 + BSD68 + Urban100; color: CBSD68 + Kodak24 + McMaster + Urban100 (download all) | here |
| grayscale/color JPEG compression artifact reduction | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images) | grayscale: Classic5 + LIVE1 (download all) | here |

The training code is at KAIR.

Testing (without preparing datasets)

For your convenience, we provide some example datasets (~20MB) in /testsets. If you just want the code, downloading models/network_swinir.py, utils/util_calculate_psnr_ssim.py and main_test_swinir.py is enough. The following commands will download the pretrained models automatically and put them in model_zoo/swinir. All visual results of SwinIR can be downloaded here.

We also provide an online Colab demo for real-world image SR, for comparison with the first practical degradation model BSRGAN (ICCV2021) and the recent model Real-ESRGAN. Try to test your own images on Colab!

We also provide a PlayTorch demo for real-world image SR, showcasing how to run the SwinIR model in a mobile application built with React Native.

# 001 Classical Image Super-Resolution (middle size)
# Note that --training_patch_size is just used to differentiate two different settings in Table 2 of the paper. Images are NOT tested patch by patch.
# (setting1: when model is trained on DIV2K and with training_patch_size=48)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR

# (setting2: when model is trained on DIV2K+Flickr2K and with training_patch_size=64)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR


# 002 Lightweight Image Super-Resolution (small size)
python main_test_swinir.py --task lightweight_sr --scale 2 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 3 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 4 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR


# 003 Real-World Image Super-Resolution (use --tile 400 if you run out-of-memory)
# (middle size)
python main_test_swinir.py --task real_sr --scale 4 --model_path model_zoo/swinir/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth --folder_lq testsets/RealSRSet+5images

# (larger size + trained on more datasets)
python main_test_swinir.py --task real_sr --scale 4 --large_model --model_path model_zoo/swinir/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth --folder_lq testsets/RealSRSet+5images
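If you still hit out-of-memory errors on large inputs, the idea behind tiled testing is to run the model on overlapping crops and average the stitched outputs. A minimal sketch of the pattern (ours, not the exact implementation in main_test_swinir.py; the tile and overlap values are illustrative):

import torch

def tiled_forward(model, img, scale=4, tile=400, overlap=32):
    # run the model on overlapping tiles of img (N, C, H, W) and blend the results
    _, c, h, w = img.shape
    out = torch.zeros(img.shape[0], c, h * scale, w * scale, device=img.device)
    weight = torch.zeros_like(out)
    stride = tile - overlap
    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            with torch.no_grad():
                sr = model(img[..., top:bottom, left:right])
            out[..., top * scale:bottom * scale, left * scale:right * scale] += sr
            weight[..., top * scale:bottom * scale, left * scale:right * scale] += 1
    return out / weight  # every output pixel is covered at least once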


# 004 Grayscale Image Denoising (middle size)
python main_test_swinir.py --task gray_dn --noise 15 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 25 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 50 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/Set12


# 005 Color Image Denoising (middle size)
python main_test_swinir.py --task color_dn --noise 15 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 25 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 50 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/McMaster


# 006 JPEG Compression Artifact Reduction (middle size, using window_size=7 because JPEG encoding uses 8x8 blocks)
# grayscale
python main_test_swinir.py --task jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/classic5

# color
python main_test_swinir.py --task color_jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/LIVE1

Results

We achieved state-of-the-art performance on classical/lightweight/real-world image SR, grayscale/color image denoising and JPEG compression artifact reduction. Detailed results can be found in the paper. All visual results of SwinIR can be downloaded here.

Classical Image Super-Resolution

  • More detailed comparison between SwinIR and a representative CNN-based model RCAN (classical image SR, X4):

| Method | Training Set | Training time (8× GeForce RTX 2080 Ti, batch=32, iter=500k) | Y-PSNR/Y-SSIM on Manga109 | Runtime (1× GeForce RTX 2080 Ti, 256×256 LR image)* | #Params | #FLOPs | Testing memory |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| RCAN | DIV2K | 1.6 days | 31.22/0.9173 | 0.180s | 15.6M | 850.6G | 593.1M |
| SwinIR | DIV2K | 1.8 days | 31.67/0.9226 | 0.539s | 11.9M | 788.6G | 986.8M |

* We re-tested the runtime when the GPU was idle. We refer to the evaluation code here.

  • Results on DIV2K-validation (100 images):

| Training Set | Scale factor | PSNR (RGB) | PSNR (Y) | SSIM (RGB) | SSIM (Y) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| DIV2K (800 images) | 2 | 35.25 | 36.77 | 0.9423 | 0.9500 |
| DIV2K + Flickr2K (2650 images) | 2 | 35.34 | 36.86 | 0.9430 | 0.9507 |
| DIV2K (800 images) | 3 | 31.50 | 32.97 | 0.8832 | 0.8965 |
| DIV2K + Flickr2K (2650 images) | 3 | 31.63 | 33.10 | 0.8854 | 0.8985 |
| DIV2K (800 images) | 4 | 29.48 | 30.94 | 0.8311 | 0.8492 |
| DIV2K + Flickr2K (2650 images) | 4 | 29.63 | 31.08 | 0.8347 | 0.8523 |
Lightweight Image Super-Resolution

Real-World Image Super-Resolution

Grayscale Image Denoising

Color Image Denoising

JPEG Compression Artifact Reduction

on grayscale images

on color images

| Testing Set | Quality factor | PSNR (RGB) | PSNR-B (RGB) | SSIM (RGB) |
| :--- | :---: | :---: | :---: | :---: |
| LIVE1 | 10 | 28.06 | 27.76 | 0.8089 |
| LIVE1 | 20 | 30.45 | 29.97 | 0.8741 |
| LIVE1 | 30 | 31.82 | 31.24 | 0.9018 |
| LIVE1 | 40 | 32.75 | 32.12 | 0.9174 |

Citation

@article{liang2021swinir,
  title={SwinIR: Image Restoration Using Swin Transformer},
  author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
  journal={arXiv preprint arXiv:2108.10257},
  year={2021}
}

License and Acknowledgement

This project is released under the Apache 2.0 license. The code is based on Swin Transformer and KAIR; please also follow their licenses. Thanks for their awesome work.

swinir's People

Contributors

ak391, chenxwh, jingyunliang, liuyinglao


swinir's Issues

one question

Hi, this is great work. I want to use this network for single image deraining; which parts of the code should I modify? Or do you have any good suggestions? Thanks!

Can the GAN be fine-tuned on my own dataset?

When I set the pretrained model path (003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth) in KAIR's training config:

, "path": {
"root": "superresolution" // "denoising" | "superresolution" | "dejpeg"
, "pretrained_netG": null // path of pretrained model
, "pretrained_netD": null // path of pretrained model
, "pretrained_netE": null // path of pretrained model
}

it starts to train from scratch anyway. And when I copy the model from model_zoo right into /superresolution/swinir_sr_realworld_x4_gan/models/, that's not working either.

Residual connection resulting in bad result

Thanks for sharing your work. I tried to add residual connections in the RSTB block and the STL layer, but got a bad result. The residual connections were added as in figures (1) and (2).
My questions are:

  • I added residual connections between the RSTB and STL (adding them only in the RSTB, or only in the STL, was also tried, but the result was bad either way). The figure showed the result with adaptive-parameter residual connections only in the RSTB (the red line; the blue one is the original SwinIR network from your paper). Your paper has only one global residual connection per RSTB (from the RSTB input to its output) but gets an awesome result. So I want to know: have you ever tried adding more residual connections like this and also gotten bad results?

  • If you have tried it but got impressive results, could you tell me what you did?
    Thanks ~
    :-D

A question about the framework

Hi, @JingyunLiang

I appreciate your fabulous work, but I have a question about the framework. Did you ever try a UNet-like (encoder-decoder) design for the deep feature extraction block (the whole transformer body)? Since your framework stacks identical RSTB blocks, I am wondering whether the encoder-decoder idea would help the performance.

Thank you very much.

Problem when saving the model

Hi, thanks for the training code.
I have a problem when the iteration count reaches 5000 and the model is saved:

File "/KAIR/models/network_swinir.py", line 254, in forward
    x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C
File "/KAIR/models/network_swinir.py", line 42, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 111, 8, 143, 8, 180]' is invalid for input of size 184459500

May I know how to fix it?
Thanks.
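This shape error typically means the height or width of the evaluation image is not a multiple of the window size (8). A sketch of the usual fix (ours, similar in spirit to the padding done in main_test_swinir.py): pad the input before the forward pass and crop the output back afterwards:

import torch.nn.functional as F

def pad_to_window_multiple(img_lq, window_size=8):
    # pad (N, C, H, W) so H and W become multiples of window_size;
    # remember (h, w) so the output can be cropped back afterwards
    _, _, h, w = img_lq.shape
    pad_h = (window_size - h % window_size) % window_size
    pad_w = (window_size - w % window_size) % window_size
    return F.pad(img_lq, (0, pad_w, 0, pad_h), mode='reflect'), (h, w)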

Training question

Thanks for your amazing work! I have a question regarding your training:
How many GPUs did you use for parallel training? How many hours does it need before early stopping?

testing dataset downsampled image

Thanks for releasing the wonderful code and datasets.
I am encountering one problem while testing: while Set5 and Set14 have x2, x3 and x4 downsampled images, other datasets (Urban100, Manga109 and BSDS100) do not. Would it be possible for you to share the downsampled images for these datasets? I can downsample them myself, but the results would probably differ from what is mentioned in the paper.

About drop path rate

I notice that you use the same parameters as Swin Transformer and set the drop path rate to 0.1. Does super-resolution really need drop path?

High CPU Usage

Thanks for sharing your work. I have a problem: the CPU usage is too high. When I set H_size > 64 (e.g., 96 or 128), the CPU usage is about 500%. I want to know why, and what type of GPU was used in the experiments in your paper. I also wonder if this problem is caused by the weak computing power of my GPU (an NVIDIA RTX 2080 Ti).
Thanks~

swin layer

In the Swin Transformer, the self-attention module has two variants: plain window self-attention and shifted-window self-attention. In that code, the plain window self-attention has no attn_mask, and only the shifted-window attention has a mask. But in your code, it seems every self-attention layer has an attn_mask. Does that mean no layer uses plain window self-attention, and every layer uses shifted windows instead? Thank you.

PSNR and SSIM do not converge to the performance reported in the paper

Hi author,

Thanks for your excellent work. I have read the SwinIR paper and starred this repository. When I test on Set5 with the pretrained model you provide, and when I train classical image SR (x2) on the DIV2K and Flickr2K datasets, the PSNR does not reach the value reported in the paper. May I ask whether this is due to my hyperparameter settings, or whether there are some training tricks?

Testing on Set5 with the official pretrained model (001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth):

Average PSNR: 36.21 dB.

Training classical image SR (x2) on DIV2K and Flickr2K with the training code provided at https://github.com/cszn/KAIR:

Average PSNR converges to 36.15 dB, which does not reach the performance in the paper.

Thanks!

Problem about testing my trained model

Thanks for the training code.
I trained a classical SR model and got 500000_optimizerG.pth, 500000_G.pth and 500000_E.pth.
May I know which .pth I should use during testing? When I run

python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path superresolution/swinir_sr_classical_patch64_x4_l1/models/500000_G.pth --folder_lq testsets/real3wx4/test_LR_crop

it cannot load the model:

model.load_state_dict(torch.load(args.model_path)['params'], strict=True)
KeyError: 'params'

May I know how to solve this?
Thanks.
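A minimal workaround sketch (ours, not an official fix): KAIR checkpoints such as *_G.pth (the generator) and *_E.pth (its exponential-moving-average copy) store the state_dict directly, while the released model_zoo files wrap it under a 'params' key, so fall back when the key is absent:

import torch

state = torch.load(args.model_path, map_location='cpu')
state = state.get('params', state)  # released models wrap weights under 'params'; KAIR checkpoints do not
model.load_state_dict(state, strict=True)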

Cuda out of memory for photos larger than 640x480px on RTX 3060 12GB

Thanks for this great code, released to the public for free. I tried the real-world 4x large model; the results are great, even better than known commercial products.
Real-ESRGAN, for example, has a --tile option for CUDA out-of-memory errors. With your code I can't upscale photos larger than VGA resolution with an RTX 3060 12GB. Please tell me, is there a way to tile, or do I need to change img_size, and to what values?
Thank you very much in advance!

train code

When will you release the training code? I would like it soon. Thank you.

About ape

Thanks for your code first!
After reading your code, I want to know why APE (absolute position embedding) is not used in this code; I saw the option is False by default.
I also want to confirm: if I train the model with APE on 128-size patches, can I change the image size when I evaluate the model? I thought the length of the position embedding is related to the number of patches, and the number of patches is related to the image size.
Hope you can solve my problem!

About use_checkpoint

The code path for use_checkpoint is missing one parameter:

The original code in network_swinir.py line 399:

x = checkpoint.checkpoint(blk, x)

Should be:

x = checkpoint.checkpoint(blk, x, x_size)

About #Parameters in the model

Thanks for providing the code for SwinIR!

I calculated the #Params and #FLOPs for the lightweight SwinIR model using KAIR. However, I'm not able to reproduce the numbers in Table 3 of the paper.
For example, I get 910.2K #Params instead of the 878K in the table, and the same happens with #FLOPs. Could you please guide me on how to reproduce the results? Thanks!

FLOPs?

Hi,

Thanks for this great work! Could you provide the FLOPs/MACs of your SwinIR model?
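For anyone who wants to measure this themselves, here is a minimal sketch (ours). The constructor arguments are a lightweight-style configuration, and thop is a third-party package that counts multiply-accumulates, which some papers double and report as FLOPs; custom attention ops may be undercounted, so treat the number as an estimate:

import torch
from thop import profile  # pip install thop
from models.network_swinir import SwinIR  # from this repository

model = SwinIR(upscale=2, img_size=64, window_size=8, img_range=1.,
               depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')

print('#Params: %.1fK' % (sum(p.numel() for p in model.parameters()) / 1e3))
macs, _ = profile(model, inputs=(torch.randn(1, 3, 64, 64),))
print('MACs on a 64x64 LR input: %.2fG' % (macs / 1e9))  # estimate only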

About training code

Hi there. Thanks for your amazing work, but I have some questions about the training code.

  1. Do we need to modify main_train_psnr.py (KAIR) to set the number of training iterations to 500K? It's set to 1M in the original file.

  2. I ran training with python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json --dist True on 8 RTX 3090 GPUs, and the dataset is the DIV2K train split (default X2). The estimated training time for 500K iterations is ~3.5 days (1 min per 100 iterations), much longer than your 1.8 days on 8 2080 Ti GPUs. Do you have any idea why?

Issues about the patch embedding.

Hi, thanks very much for sharing this wonderful work. According to the definition of PatchEmbed(nn.Module), it seems that parameters such as patch_size and img_size are not used. It seems that the performance improvements of SwinIR come from the MSA and MLP layers; of course, the multiple skip connections in the RSTB and STL are also helpful. I am curious why SwinIR does not form patches from multiple pixels, as in, for example, the PatchEmbed method used in "Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions".

patch_size

I found that the network uses the initial patch_size of 1, so each pixel becomes a token. What is the reason for not using an image block (e.g., 4x4) as the token?

Illustration in the README

For info, you have linked to the wrong image for your result with SwinIR-Large:

|Real-World Image (x4)|[BSRGAN, ICCV2021](https://github.com/cszn/BSRGAN)|[Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)|SwinIR (ours)|SwinIR-Large (ours)|

|       :---       |     :---:        |        :-----:         |        :-----:         |        :-----:         | 

|<img width="200" src="figs/ETH_LR.png">|<img width="200" src="figs/ETH_BSRGAN.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

|<img width="200" src="figs/OST_009_crop_LR.png">|<img width="200" src="figs/OST_009_crop_BSRGAN.png">|<img width="200" src="figs/OST_009_crop_realESRGAN.png">|<img width="200" src="figs/OST_009_crop_SwinIR.png">|<img width="200" src="figs/OST_009_crop_SwinIR-L.png">|

That is because there is an extra column in the row with the building. Check the end of the row:

|<img width="200" src="figs/ETH_SwinIR.png">|<img width="200" src="figs/ETH_realESRGAN.jpg">|<img width="200" src="figs/ETH_SwinIR-L.png">|

Training dataset - patch creation

It would be really helpful if you could point out how to create patches when the image size is less than 128 x 128 (the patch size mentioned in the training settings). Should we handle such images with zero padding, or exclude them, like the 120 x 80 images present in the BSD500 dataset?

JSONDecodeError when training swinir

Hi @JingyunLiang, I use the training code main_train_psnr.py in KAIR, and I only changed the dataroot and other necessary settings. The training command is python main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json, and my environment is CUDA 10.1 + PyTorch 1.7.1 + Python 3.7. When training the SwinIR model, I got a JSONDecodeError.

So I searched for it and changed json_path='options/train_msrresnet_psnr.json' to json_path="options/train_msrresnet_psnr.json" in main_train_psnr.py (line 34 is the json_path).

But I still get the error over and over again. Could you please provide some suggestions? Thanks a lot.
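For context, KAIR's .json option files contain // comments, so a plain json.load() raises JSONDecodeError on them; KAIR strips the comments before parsing (see utils/utils_option.py there). A minimal sketch of the same idea (ours), if you need to load such a file directly:

import json

def load_commented_json(path):
    # drop everything after // on each line (KAIR-style option files);
    # note: naive, it breaks if '//' appears inside a string value
    with open(path) as f:
        return json.loads('\n'.join(line.split('//')[0] for line in f))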

[SUGGESTION] Optimized version for videos

Hi there, SwinIR is really cool!

Since it has been "ported" to VapourSynth (thanks to @HolyWu), some interesting discussions about its effectiveness on videos, with tests too, have started.

It seems that the main issue is processing speed; some argue that the algorithm is not (yet?) optimized for videos.

About speed: @xinntao may help implement an NCNN-Vulkan version (as already done for Real-ESRGAN).

About video optimizations: a collaboration with @ding3820 of the MIMO-VRN project may help.

Hope that inspires.

Training settings for SwinIR light

Thanks for sharing your work.

I notice that the training config file for the lightweight model in KAIR may not be consistent with the statement in the paper. Could you double-check?
In particular, both the batch size and patch size are set to 64, and embed_dim is 180. Is this the correct setting?

network interpolation

When denoising images, I need more noise levels, such as 20 and 35. I think a network interpolation function could produce an approximate model.
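To make the idea concrete, network interpolation would linearly blend the weights of two trained denoisers to approximate an intermediate noise level. A hedged sketch (ours; the file names are the released checkpoints, and how accurate the blend is for SwinIR is untested):

import torch

w15 = torch.load('model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise15.pth', map_location='cpu')
w25 = torch.load('model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth', map_location='cpu')
w15, w25 = w15.get('params', w15), w25.get('params', w25)  # unwrap if wrapped

alpha = 0.5  # 1.0 -> pure noise-15 model, 0.0 -> pure noise-25 model
w20 = {k: alpha * w15[k] + (1 - alpha) * w25[k] for k in w15}
torch.save({'params': w20}, '004_grayDN_interp_noise20.pth')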

About Resi-connection

Hi there, thanks for the amazing work!
The section 'Impact of residual connection and convolution layer in RSTB' of the paper says that a 1x1 conv or 3x3 conv is added at the residual connection, and the results show that 3x3 is better than 1x1, and also than the inverted-bottleneck 3x3 variant.

Back to the code itself: when I first read the 'resi_connection' argument in the SwinIR class, I thought '1conv' meant a 1x1 conv and '3conv' meant a 3x3 conv.
After reading more of the code, I realized that '1conv' actually means one 3x3 conv and '3conv' means a three-layer conv block:

# build the last conv layer in deep feature extraction
if resi_connection == '1conv':
    self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, 1, 1)
elif resi_connection == '3conv':
    # to save parameters and memory
    self.conv_after_body = nn.Sequential(nn.Conv2d(embed_dim, embed_dim // 4, 3, 1, 1),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim // 4, 1, 1, 0),
                                         nn.LeakyReLU(negative_slope=0.2, inplace=True),
                                         nn.Conv2d(embed_dim // 4, embed_dim, 3, 1, 1))

I think this naming can be a little confusing; it might be better to call them '1conv3' and '3conv3', or something else.

Just wanted to point out the confusing part.
Thanks.

denosing training code

I haven't found the training code for the denoising task in KAIR; hasn't it been released yet?

about test

In the SwinIR model there is an img_size parameter, e.g., 128, so each Swin layer has input_resolution=(128, 128). At test time, if my input image is not (128, 128), the attention computation branches: if self.input_resolution == x_size the precomputed mask is used; otherwise attn_windows = self.attn(x_windows, mask=self.calculate_mask(x_size).to(x.device)). Could you explain what the mask argument is when the image size does not equal self.input_resolution=(128, 128)?
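For background: the mask in question is the standard shifted-window attention mask. After the feature map is cyclically shifted, border windows contain tokens from disconnected regions, and the mask adds -100 to attention logits between tokens from different regions. A rough sketch of how such a mask is built for an arbitrary x_size (ours, modeled on Swin-style code, not a verbatim copy of calculate_mask; h and w are assumed to be multiples of window_size):

import torch

def calculate_attn_mask(h, w, window_size=8, shift_size=4):
    # label the 9 regions created by the cyclic shift
    img_mask = torch.zeros(1, h, w, 1)
    cnt = 0
    for hs in (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None)):
        for ws in (slice(0, -window_size), slice(-window_size, -shift_size), slice(-shift_size, None)):
            img_mask[:, hs, ws, :] = cnt
            cnt += 1
    # partition into windows, then compare region labels pairwise
    windows = img_mask.view(1, h // window_size, window_size, w // window_size, window_size, 1)
    windows = windows.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size)
    attn_mask = windows.unsqueeze(1) - windows.unsqueeze(2)  # (nW, ws*ws, ws*ws)
    return attn_mask.masked_fill(attn_mask != 0, -100.0)  # block cross-region attention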

about charbonnierloss

The Charbonnier loss has an extra parameter eps. In the paper eps is 1e-3 and it is used squared, i.e., (1e-3)^2, but your code may not apply the squaring. Is this difference important?
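To make the two variants concrete, a minimal sketch (ours) of the Charbonnier loss with the squared-eps form from the paper next to the plain-eps form the issue describes:

import torch

def charbonnier_loss(pred, target, eps=1e-3, square_eps=True):
    # paper form: sqrt(diff^2 + eps^2); plain form: sqrt(diff^2 + eps)
    term = eps ** 2 if square_eps else eps
    return torch.sqrt((pred - target) ** 2 + term).mean()

# with eps=1e-3 the two differ mostly near diff = 0, where the added constant
# (1e-6 vs 1e-3) changes how strongly tiny errors are penalized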

Supplements

Hello, this great paper says some details are given in the supplementary material. Where can I find the supp? Thank you.

Trained model from KAIR (40000_optimizerG.pth) gives error on testing

Instead of using the pre-trained models, I trained with the KAIR code and used the generated model for testing.

The KAIR training code produced 3 files:
40000_optimizerG.pth
40000_G.pth
40000_E.pth

Using these models for testing with the code in this repository, I am getting an error:

(pytorch-gpu) C:\Users\Downloads\SwinIR-main>python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
loading model from model_zoo/swinir/40000_optimizerG.pth
Traceback (most recent call last):
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 253, in <module>
    main()
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 42, in main
    model = define_model(args)
  File "C:\Users\Downloads\SwinIR-main\main_test_swinir.py", line 174, in define_model
    model.load_state_dict(pretrained_model[param_key_g] if param_key_g in pretrained_model.keys() else pretrained_model, strict=True)
  File "C:\Users\anaconda3\envs\pytorch-gpu\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SwinIR:
        Missing key(s) in state_dict: "conv_first.weight", "conv_first.bias", "patch_embed.norm.weight", "patch_embed.norm.bias", "layers.0.residual_group.blocks.0.norm1.weight", "layers.0.residual_group.blocks.0.norm1.bias", "layers.0.residual_group.blocks.0.attn.relative_position_bias_table",

40000_G.pth and 40000_E.pth test fine.

The input size during test

Hi Jingyun, nice work! I just wonder why the SwinIR constructor needs img_size. It is somewhat inconvenient, especially for testing, since we usually want to test on images of different sizes. Is there a particular reason for this? Swin Transformer does not need it because it uses padding operations. Besides, are there any requirements on the input size, i.e., must it be a multiple of some number? Thanks.

About the FLOPs of SwinIR

Hi,

Thanks for sharing the code of this interesting work. Would you mind providing the FLOPs cost of SwinIR, e.g., FLOPs for 256x256x3 images? Thanks!

About test part in training

Thanks for your code first!
I ran the super-resolution lightweight part of the code, and there is an error in the testing part of training:

Traceback (most recent call last):
  File "main_train_psnr.py", line 291, in <module>
    main()
  File "main_train_psnr.py", line 190, in main
    current_psnr = util.calculate_psnr(E_img, H_img, border=border)
  File "/home/ET/huiyuxiang/KAIR/utils/utils_image.py", line 632, in calculate_psnr
    raise ValueError('Input images must have the same dimensions.')
ValueError: Input images must have the same dimensions.

So I printed the shapes and found that the LR image is padded to a multiple of 8 but the HR image is not.
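A minimal workaround sketch (ours): crop the network output back to the ground-truth size before computing PSNR, undoing the window-size padding:

# E_img: network output (H, W, C) after padding; H_img: ground truth
h, w = H_img.shape[:2]
E_img = E_img[:h, :w, ...]  # drop rows/cols introduced by the padding
current_psnr = util.calculate_psnr(E_img, H_img, border=border)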

About training

Thank you for your work. I tried to train SwinIR, and while training the small SwinIR is smooth, the loss often suddenly doubles when the dim is changed to 180. Because of memory limits, my batch_size=16 and lr=1e-4. Are there any special tricks to make the training stable?

runtimeerror when using other dataset?

Hi,
I want to train the model with my own dataset. However, it keeps reporting:

RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 256] at entry 0 and [3, 256, 252] at entry 1

Do I have a wrong setting? Thanks. (A cropping sketch follows the config below.)

The relevant part of the JSON:
"datasets": {
"train": {
"name": "train_dataset" // just name
, "dataset_type": "sr" // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
, "dataroot_H": "HR" // path of H training dataset. DIV2K (800 training images)
, "dataroot_L": "LR" // path of L training dataset

  , "H_size": 256                   // 96/144|192/384 | 128/192/256/512. LR patch size is set to 48 or 64 when compared with RCAN or RRDB.

  , "dataloader_shuffle": true
  , "dataloader_num_workers": 16
  , "dataloader_batch_size": 8      // batch size 1 | 16 | 32 | 48 | 64 | 128. Total batch size =4x8=32 in SwinIR
}
, "test": {
  "name": "test_dataset"            // just name
  , "dataset_type": "sr"         // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
  , "dataroot_H": "testsets/Set5/HR"  // path of H testing dataset
  , "dataroot_L": "testsets/Set5/LR_bicubic/X4"              // path of L testing dataset

}

}

, "netG": {
"net_type": "swinir"
, "upscale": 4 // 2 | 3 | 4 | 8
, "in_chans": 3
, "img_size": 64 // For fair comparison, LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
, "window_size": 8
, "img_range": 1.0
, "depths": [6, 6, 6, 6, 6, 6]
, "embed_dim": 180
, "num_heads": [6, 6, 6, 6, 6, 6]
, "mlp_ratio": 2
, "upsampler": "pixelshuffle" // "pixelshuffle" | "pixelshuffledirect" | "nearest+conv" | null
, "resi_connection": "1conv" // "1conv" | "3conv"

, "init_type": "default"

}
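As referenced above, the stack error means the dataloader received patches of different sizes: one training image was smaller than H_size, so its crop came out 256x252. A minimal sketch (ours) of guarding the dataset's crop so every sample has a fixed size:

import random

def random_fixed_crop(img, size=256):
    # random crop of exactly size x size from an (H, W, C) array;
    # images smaller than `size` must be padded or skipped beforehand,
    # otherwise torch.stack sees mismatched shapes like [3, 256, 252]
    h, w = img.shape[:2]
    assert h >= size and w >= size, 'image %dx%d smaller than crop size %d' % (h, w, size)
    top, left = random.randint(0, h - size), random.randint(0, w - size)
    return img[top:top + size, left:left + size, ...]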

Comparison with IPT

Hi,

Thanks for sharing this interesting work. Table 6 (CBSD68, sigma=50) shows that IPT achieves 28.39 PSNR. However, the original IPT paper shows that it achieves 29.88 (their Table 2). Is there any difference between these two settings?

GPU numbers

Dear author:
I want to know the number of GPUs you used when training the SwinIR network.
Thank you.

IndexError: index 2080 is out of bounds for dimension 2 with size 2080

Hey, thanks for this awesome code.
It always worked great for me, but now I'm getting this error regardless of which image I try in the Colab. In step 3, Inference:

/content/Real-ESRGAN/BSRGAN
LogHandlers setup!
21-10-08 17:53:58.872 : Model Name : BSRGAN
21-10-08 17:53:58.873 : GPU ID : 0
[3, 3, 64, 23, 32, 4]
21-10-08 17:54:01.995 : Input Path : testsets/RealSRSet
21-10-08 17:54:01.995 : Output Path : testsets/RealSRSet_results_x4
21-10-08 17:54:01.996 : ---1 --> BSRGAN --> x4--> adsads.png
/content/Real-ESRGAN
Testing 0 adsads
loading model from experiments/pretrained_models/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth
Traceback (most recent call last):
  File "SwinIR/main_test_swinir.py", line 287, in <module>
    main()
  File "SwinIR/main_test_swinir.py", line 73, in main
    output = test(img_lq, model, args, window_size)
  File "SwinIR/main_test_swinir.py", line 259, in test
    output = model(img_lq)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/Real-ESRGAN/SwinIR/models/network_swinir.py", line 839, in forward
    return x[:, :, :H*self.upscale, :W*self.upscale]
IndexError: index 2080 is out of bounds for dimension 2 with size 2080

The same traceback, ending in the same IndexError, occurs when loading experiments/pretrained_models/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth.

Training efficiency

Hi

Thanks for the great work again.

When training SwinIR with the KAIR toolbox, I found that CPU utilization was particularly high while the GPU was often idle, and training was particularly inefficient. Would you be so kind as to tell me the GPU and CPU configurations you used, and the training time?
