swz30 / restormer

[CVPR 2022--Oral] Restormer: Efficient Transformer for High-Resolution Image Restoration. SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

License: Other

Python 97.18% MATLAB 2.78% Shell 0.04%
image-restoration image-deraining image-deblurring defocus-deblurring motion-deblurring transformer pytorch low-level-vision cvpr2022 high-resolution

restormer's Introduction

Restormer: Efficient Transformer for High-Resolution Image Restoration (CVPR 2022 -- Oral)

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang

Paper | Supplement | Video | Slides | Summary

News

  • April 4, 2022: Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces
  • March 30, 2022: Added Colab Demo. Open In Colab
  • March 29, 2022: Restormer is selected for an ORAL presentation at CVPR 2022 💫
  • March 10, 2022: Training codes are released 🔥
  • March 3, 2022: Paper accepted at CVPR 2022 🎉
  • Nov 21, 2021: Testing codes and pre-trained models are released!

Abstract: Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising).


Network Architecture

Installation

See INSTALL.md for the installation of dependencies required to run Restormer.

Demo

To test the pre-trained Restormer models for Deraining, Motion Deblurring, Defocus Deblurring, and Denoising on your own images, you can either use Google Colab (Open In Colab) or the command line as follows:

python demo.py --task Task_Name --input_dir path_to_images --result_dir save_images_here

Example usage to perform Defocus Deblurring on a directory of images:

python demo.py --task Single_Image_Defocus_Deblurring --input_dir './demo/degraded/' --result_dir './demo/restored/'

Example usage to perform Defocus Deblurring on an image directly:

python demo.py --task Single_Image_Defocus_Deblurring --input_dir './demo/degraded/portrait.jpg' --result_dir './demo/restored/'

Training and Evaluation

Training and Testing instructions for Deraining, Motion Deblurring, Defocus Deblurring, and Denoising are provided in their respective directories. Here is a summary table containing hyperlinks for easy navigation:

Task | Training Instructions | Testing Instructions | Restormer's Visual Results
Deraining | Link | Link | Download
Motion Deblurring | Link | Link | Download
Defocus Deblurring | Link | Link | Download
Gaussian Denoising | Link | Link | Download
Real Denoising | Link | Link | Download

Results

Experiments are performed on several image processing tasks, including image deraining, single-image motion deblurring, defocus deblurring (on both single-image and dual-pixel data), and image denoising (on both Gaussian and real data).

Image Deraining
Single-Image Motion Deblurring

Defocus Deblurring

S: single-image defocus deblurring. D: dual-pixel defocus deblurring.

Gaussian Image Denoising

Top super-row: learning a single model to handle various noise levels. Bottom super-row: training a separate model for each noise level.

Grayscale

Color

Real Image Denoising

Citation

If you use Restormer, please consider citing:

@inproceedings{Zamir2021Restormer,
    title={Restormer: Efficient Transformer for High-Resolution Image Restoration}, 
    author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat 
            and Fahad Shahbaz Khan and Ming-Hsuan Yang},
    booktitle={CVPR},
    year={2022}
}

Contact

Should you have any questions, please contact [email protected]

Acknowledgment: This code is based on the BasicSR toolbox and HINet.

Our Related Works

  • Learning Enriched Features for Fast Image Restoration and Enhancement, TPAMI 2022. Paper | Code
  • Multi-Stage Progressive Image Restoration, CVPR 2021. Paper | Code
  • Learning Enriched Features for Real Image Restoration and Enhancement, ECCV 2020. Paper | Code
  • CycleISP: Real Image Restoration via Improved Data Synthesis, CVPR 2020. Paper | Code

restormer's People

Contributors

adityac8, swz30


restormer's Issues

Does not correspond to paper MPRNet

Figure 3 in this paper and Figure 5 in the authors' previous paper MPRNet use the same test image and report the SAME PSNR, but the visualized results are very different. May I therefore question the authenticity and rigor of the paper's results?

The number of channels does not match before/after pixel unshuffle/shuffle?

I would expect the number of channels to become 4x (1/4x) after applying pixel unshuffle (pixel shuffle) with a downscale (upscale) factor of 2. But Fig. 1 shows the number of channels only doubling (halving) across the pixel unshuffle (pixel shuffle).

[My question]
Is there another layer (such as a 1x1 conv) next to the upsampling/downsampling operator that adjusts the number of channels but is not shown in Fig. 1? Am I right? Thank you for your time!

Here is my understanding of pixel shuffle: the number of channels changes by a factor of r*r.
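For reference, a minimal PyTorch sketch of the channel arithmetic; the Downsample module below is illustrative (an assumption about how a conv next to the shuffle would resolve the mismatch), not code copied from the repo:

import torch
import torch.nn as nn

x = torch.randn(1, 48, 64, 64)
# PixelUnshuffle(2) alone multiplies channels by 4; PixelShuffle(2) divides by 4.
print(nn.PixelUnshuffle(2)(x).shape)  # torch.Size([1, 192, 32, 32])

# Pairing it with a 3x3 conv that first halves the channels yields the
# overall 2x channel growth shown in Fig. 1.
class Downsample(nn.Module):
    def __init__(self, n_feat):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feat, n_feat // 2, kernel_size=3, padding=1, bias=False),
            nn.PixelUnshuffle(2),
        )

    def forward(self, x):
        return self.body(x)

print(Downsample(48)(x).shape)  # torch.Size([1, 96, 32, 32]): 48 -> 96 channels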

Train from a pre-trained model (e.g., ImageNet-1K) or from random initialization?

Hello,

When training Restormer, do you train it from random initialization, or do you pre-train it on ImageNet-1K/21K as the ResNeXt series does? My understanding is that it is trained without additional data; for example, for the deraining task, only Rain13K is used to train a randomly initialized model for 300K iterations. Am I right? Thanks!

DeamNet and AINDNet results in Real image denoising

Hello, I suspect the results shown for real image denoising are wrong.
DeamNet and AINDNet are evaluated on the SIDD benchmark, yet in your paper their SIDD benchmark numbers are compared against results obtained on the SIDD validation set; MIRNet and MPRNet have the same problem.
So why not compare directly on the benchmark dataset? As with the DND dataset, those results can be obtained from the website.

Question about the ablation study

Thanks for your outstanding work. I have some confusion about the implementation of groups (b) and (e) in the ablation study in Sec. 4.5. For group (b), with MTA + standard FN, should I break the feature map into patches before passing it to the fully-connected layer, and reshape it back to its original size after the FC? If so, what patch size is used, and is dropout applied here? And for group (e), what is the structure of the DFN once the gating mechanism is removed? Is it just a sequential LayerNorm and 1x1 convolution?

add model to Huggingface

Hi, would you be interested in adding Restormer to the Hugging Face Hub? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. We can set up an organization or a user account under which Restormer can be added, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

About the training

How can I resolve the error about create_dataloader and create_dataset (imported from __init__.py) when running train.py?
Also, what is the difference between training via the generic basicsr scripts and training on specific tasks (e.g., Deraining)?

train code

Hello, dear authors. I want to train a Restormer model myself, but I can't find the training code. Could you please write a README document for training the Restormer model?

GPU

For GoPro deblurring, I want to know how many GPUs were used and how long training took with [patch size 128, batch size 64].

About inference on Huggingface

I'm really impressed by the inference on Hugging Face. I ran the Motion Deblurring task on my own CPU (i7) and got ~50 s inference runtime. Could I convert the .pth model to ONNX to get a better runtime?
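For what it's worth, a generic export sketch using torch.onnx.export, assuming a Restormer instance is already loaded as model; whether every op exports cleanly at a given opset is not verified here:

import torch

# dynamic_axes lets the exported graph accept variable image sizes.
dummy = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    model, dummy, "restormer.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {2: "height", 3: "width"},
                  "output": {2: "height", 3: "width"}},
)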

Pretrained model for Defocus Deblurring links to wrong file

Restormer / Defocus_Deblurring / Pretrained_Models / README.MD
contains a link to download the pretrained model for defocus deblurring.

However, this link appears to lead to the pre-trained model for Deraining.
It is the same URL as is found in Restormer / Deraining / Pretrained_Models / README.MD

I assume this is an error, rather than the same model being intended for both tasks. If so, there is currently no pre-trained model available for Defocus Deblurring.

An updated link would be appreciated.

GPU

Hello, my GPU has relatively little memory. How can I specify which GPU to use for training?
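A generic workaround (standard CUDA/PyTorch behavior, not a repo-specific flag) is to restrict the visible devices when launching training, and set the GPU count in the YAML to match, e.g.

CUDA_VISIBLE_DEVICES=0 python basicsr/train.py -opt Deraining/Options/Deraining_Restormer.yml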

Question about the baseline model

Dear Authors,
Congratulations on the paper being accepted as an Oral at CVPR 2022.
I have a question about the code.
In Section 4.5 'Ablation Studies', Table 7 compares the performance of the baseline, multi-head attention, FFN, and the overall model. The baseline is a UNet with Resblocks (EDSR-style). Is the number of blocks [4, 6, 6, 8] in the overall network the same as in the baseline model? And where can I find the baseline model code? Is it https://github.com/xinntao/BasicSR/blob/master/basicsr/archs/edsr_arch.py?

Problems about training Deraining

Hi, congratulations on the great work!
I changed the number of GPUs in train.sh and Deraining_Restormer.yml to 4, since I only have 4 GPUs, but I still can't train the Deraining code due to GPU memory limitations. The program runs if I make batch_size_per_gpu smaller, but then the batch size no longer matches the experimental settings.
So what can I do to reproduce the settings in your experiment (i.e., for progressive learning, training starts with patch size 128x128 and batch size 64, and the patch size and batch size pairs are updated to [(160^2,40), (192^2,32), (256^2,16), (320^2,8), (384^2,8)] at iterations [92K, 156K, 204K, 240K, 276K])?
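One generic way to approximate a large effective batch size on limited memory is gradient accumulation; a minimal sketch, assuming model, optimizer, criterion, and loader are already set up (this is not the repo's training loop, and it only approximates the paper's schedule):

# Accumulate gradients over several small batches before stepping, so e.g.
# batch 16 x 4 accumulation steps behaves like an effective batch of 64.
accum_steps = 4
optimizer.zero_grad()
for i, (lq, gt) in enumerate(loader):
    loss = criterion(model(lq), gt) / accum_steps  # scale so gradients average
    loss.backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()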

denoising training dataset

Well done! Can you tell me which datasets you used for denoising, i.e., the real-noise training dataset and the Gaussian denoising dataset? Thank you very much!

Cuda out Of memory error

For the real denoising task, on photos bigger than 4 Mpx the code returns this error:
RuntimeError: CUDA out of memory. Tried to allocate 7.62 GiB (GPU 0; 12.00 GiB total capacity; 5.33 GiB already allocated; 4.64 GiB free; 5.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there any trick to get it working properly with an RTX 3060?

Thanks
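A common workaround for large inputs is tiled inference: restore overlapping crops and blend them, which bounds peak memory by the tile size. A rough sketch (a hypothetical helper, not part of the repo; tile sizes may need to respect the model's divisibility requirements):

import torch

def restore_tiled(model, img, tile=512, overlap=64):
    # img: (1, C, H, W). Restore overlapping tiles, then average the overlaps.
    _, _, h, w = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    stride = tile - overlap
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            with torch.no_grad():
                out[..., top:bottom, left:right] += model(img[..., top:bottom, left:right])
            weight[..., top:bottom, left:right] += 1
    return out / weight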

some problems

Since no training code was given, I wrote my own training program for Restormer. However, due to GPU memory limitations I could only set the batch size to 48. I found that the loss hardly decreased during the first 10,000 to 20,000 iterations, and correspondingly the PSNR remained unchanged at about 26.2. Is training just slow at the start, or is something wrong? If possible, I would also like to know the upward trend of the validation PSNR and the downward trend of the loss during your training.

image resolution

I found that performance declined considerably when the pretrained model was applied to a custom test set with a larger resolution (4k x 3k). Do you have any suggestions for bridging the resolution gap between the training set and the test set?

colab?

I am pleased with your work; the level of completeness is really professional!
Do you have any plans to release code for Google Colab?
Unfortunately, I can't run the code on my local machine due to hardware limitations.

Question about result.

Thanks for your outstanding work. I have a question about your deraining results. I tested the deraining.pth weights you provided, and the results are much lower than those in your paper. Can you explain this and show more details of your deraining results?
Thank you!

a question about ".state" file

Hello, I trained a denoising model, which generated a 4000.state file. I want to ask what it is used for, and where I can see how the loss changes during the training period.
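A quick way to inspect such a file, assuming it is an ordinary torch checkpoint (as BasicSR-style resume states typically are):

import torch

# Listing the keys shows what the resume state stores (typically the
# epoch/iteration counters plus optimizer and scheduler state, not weights).
state = torch.load("4000.state", map_location="cpu")
print(state.keys())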

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

Hi,

I've been trying to train the deraining model on your datasets for the last week, but every time I run the train.sh script, the data loaders get created and then I get the following error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 23058) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

basicsr/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-17_01:06:49
host : instance-1.c.cs4705-hw4.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 23058)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I've tried training this on Colab, GCP, and my local machine, and the only time it runs is when I train with num_gpus=1, at which point the ETA for a single epoch is 2 days.

Any help would be greatly appreciated.

Thanks!

How many GPUs are used when training

Hi, could you tell me the detailed information about the GPUs (type and number) used in the experiments, and the loss used for motion deblurring? I could not find these details in the paper. Thanks so much.

Pretrained models

Do you plan to train Restormer on RealBlur and release the pretrained models?

Training code

Your work is wonderful. Could you please provide the training scripts?

Loss function and supp

In the paper I can't find the definition of the loss function; could you tell me which loss function Restormer uses?
Also, where can I find the supplement of this paper?
Thank you.

Comparing with SwinIR

Thanks for your great work, which inspires me a lot.
In your paper (page 8), compared to SwinIR, your model has fewer FLOPs and runs 13x faster, and this held true in my own testing. So what is the key point that makes it faster?
Also, shifted windows are not used in your model; is that the key to the reduced computation compared to SwinIR?

Runtime environment

Could you please tell me the environment this code runs in, the GPU usage, and the PyTorch version?

About layer normalization code

There are some discrepancies between layer normalization as I understand it and the implementation in the code. I hope someone can resolve my doubts.

Layer normalization as I understand it:
input X of shape B C H W; for each sample in B, compute the mean and variance, and multiply by a weight and bias of matching size.

Layer normalization in the Restormer code:
input X of shape B C H W; compute the mean and variance over the channel dimension at each of the B H*W positions, and multiply by a weight and bias of length C.
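A minimal sketch of the channel-wise variant described above (illustrative, not the repo's exact module):

import torch
import torch.nn as nn

class ChannelLayerNorm(nn.Module):
    # Normalizes over C at each (b, h, w) position; affine params of length C.
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(channels))
        self.bias = nn.Parameter(torch.zeros(channels))
        self.eps = eps

    def forward(self, x):  # x: (B, C, H, W)
        mu = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, keepdim=True, unbiased=False)
        x = (x - mu) / torch.sqrt(var + self.eps)
        return x * self.weight[None, :, None, None] + self.bias[None, :, None, None]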

Problems about training dual_pixel defocus deblur

I ran the DualPixel training code following your instructions, and there is an error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2200930) of binary: (my python path)
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish

This error seems to be related to distributed training. I used two GPUs for distributed training and modified [Restormer/train.sh] and [DefocusDeblur_Single_8bit_Restormer.yml].

About metrics of matlab version

Hello, thanks for your work. I tested my resulting images using your MATLAB script in the motion deblurring folder, but obtained a somewhat lower PSNR than the PyTorch version using skimage.metrics.
Did you encounter this issue? Thanks very much.
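For reference, a small synthetic experiment showing one common source of such gaps: computing PSNR on float arrays versus on images quantized to uint8, as they are when saved to disk (another frequent cause is RGB vs. Y-channel evaluation):

import numpy as np
from skimage.metrics import peak_signal_noise_ratio as psnr

ref = np.random.rand(64, 64, 3)
out = np.clip(ref + 0.02 * np.random.randn(64, 64, 3), 0, 1)

# Float pipeline vs. uint8 pipeline: rounding to 8 bits shifts the score slightly.
print(psnr(ref, out, data_range=1.0))
print(psnr((ref * 255).round().astype(np.uint8),
           (out * 255).round().astype(np.uint8), data_range=255))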

how to measure FLOPs?

In Table 7 and Table 8, the parameters and FLOPs of the different network settings are listed for an input image of size 256x256. I'd like to know how to obtain the FLOPs. I have tried thop, an open-source package (https://github.com/sovrasov/flops-counter.pytorch), but it doesn't work well: the accurate FLOPs for the Transformer block, merge block, and other layers cannot be derived directly.
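For what it's worth, the basic entry point of the flops-counter.pytorch (ptflops) package looks like this; modules the counter cannot analyze may need custom hooks, so treat the output as an approximation:

from ptflops import get_model_complexity_info

# Counts MACs/params for a 3x256x256 input; `model` is the network under test.
macs, params = get_model_complexity_info(
    model, (3, 256, 256), as_strings=True, print_per_layer_stat=False)
print(macs, params)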

About the training?

Hello.

In 'Restormer-main/basicsr/data/paired_image_dataset.py', some of the functions used are not defined. Why?

Network is not converging using custom dataset.

Thank you for the great work!!

Here is my question: I would like to train the Restormer on my own dataset with "Deraining" setup.
However, after about 50,000 iterations the loss is not descending at all, and the val PSNR has hardly changed since the start of training.

I've confirmed that the data flow is correct and that the input and target images are matched.
Is this because Restormer needs a longer time to warm up, or am I doing something wrong?

By the way, I am using 2 GPUs for training, so I changed the YAML file and train.sh accordingly.

I'm looking forward to your reply. Thank you in advance!

MDTA

Hi @swz30,

Thanks for the great work!

The multi-Dconv head transposed attention (MDTA) you propose is not a spatial attention mechanism, right? So the efficiency is attained by switching the attention from the spatial dimension to the channel dimension, whose size is fixed regardless of image size. If this holds true, then unlike the attention in ViT, MDTA cannot perform long-range (global) interaction across the spatial dimension, correct?
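For illustration, a minimal sketch of transposed (channel) attention, showing why the attention map's size is independent of spatial resolution; the paper's version also includes a learnable temperature and depth-wise convolutions, which are omitted here:

import torch
import torch.nn.functional as F

def transposed_attention(q, k, v):
    # q, k, v: (B, C, HW). Attention is computed across channels, so the
    # attention map is (B, C, C) regardless of the spatial size H*W.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # (B, C, C)
    return attn @ v                                    # (B, C, HW)

x = torch.randn(2, 48, 128 * 128)
print(transposed_attention(x, x, x).shape)  # torch.Size([2, 48, 16384])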
