swz30 / restormer

[CVPR 2022--Oral] Restormer: Efficient Transformer for High-Resolution Image Restoration. SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

License: Other

Python 97.18% MATLAB 2.78% Shell 0.04%
image-restoration image-deraining image-deblurring defocus-deblurring motion-deblurring transformer pytorch low-level-vision cvpr2022 high-resolution

restormer's Introduction

Restormer: Efficient Transformer for High-Resolution Image Restoration (CVPR 2022 -- Oral)

Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang

Paper | Supplement | Video | Slides | Summary

News

  • April 4, 2022: Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces
  • March 30, 2022: Added Colab Demo. Open In Colab
  • March 29, 2022: Restormer is selected for an ORAL presentation at CVPR 2022 💫
  • March 10, 2022: Training codes are released 🔥
  • March 3, 2022: Paper accepted at CVPR 2022 🎉
  • Nov 21, 2021: Testing codes and pre-trained models are released!

Abstract: Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising).


Network Architecture

Installation

See INSTALL.md for the installation of dependencies required to run Restormer.

Demo

To test the pre-trained Restormer models for Deraining, Motion Deblurring, Defocus Deblurring, and Denoising on your own images, you can either use Google Colab (Open In Colab) or the command line as follows:

python demo.py --task Task_Name --input_dir path_to_images --result_dir save_images_here

Example usage to perform Defocus Deblurring on a directory of images:

python demo.py --task Single_Image_Defocus_Deblurring --input_dir './demo/degraded/' --result_dir './demo/restored/'

Example usage to perform Defocus Deblurring on an image directly:

python demo.py --task Single_Image_Defocus_Deblurring --input_dir './demo/degraded/portrait.jpg' --result_dir './demo/restored/'

Training and Evaluation

Training and Testing instructions for Deraining, Motion Deblurring, Defocus Deblurring, and Denoising are provided in their respective directories. Here is a summary table containing hyperlinks for easy navigation:

Task | Training Instructions | Testing Instructions | Restormer's Visual Results
Deraining | Link | Link | Download
Motion Deblurring | Link | Link | Download
Defocus Deblurring | Link | Link | Download
Gaussian Denoising | Link | Link | Download
Real Denoising | Link | Link | Download

Results

Experiments are performed on several image processing tasks, including image deraining, single-image motion deblurring, defocus deblurring (on both single-image and dual-pixel data), and image denoising (on both Gaussian and real data).

Image Deraining
Single-Image Motion Deblurring

Defocus Deblurring

S: single-image defocus deblurring. D: dual-pixel defocus deblurring.

Gaussian Image Denoising

Top super-row: learning a single model to handle various noise levels. Bottom super-row: training a separate model for each noise level.

Grayscale

Color

Real Image Denoising

Citation

If you use Restormer, please consider citing:

@inproceedings{Zamir2021Restormer,
    title={Restormer: Efficient Transformer for High-Resolution Image Restoration}, 
    author={Syed Waqas Zamir and Aditya Arora and Salman Khan and Munawar Hayat 
            and Fahad Shahbaz Khan and Ming-Hsuan Yang},
    booktitle={CVPR},
    year={2022}
}

Contact

Should you have any questions, please contact [email protected]

Acknowledgment: This code is based on the BasicSR toolbox and HINet.

Our Related Works

  • Learning Enriched Features for Fast Image Restoration and Enhancement, TPAMI 2022. Paper | Code
  • Multi-Stage Progressive Image Restoration, CVPR 2021. Paper | Code
  • Learning Enriched Features for Real Image Restoration and Enhancement, ECCV 2020. Paper | Code
  • CycleISP: Real Image Restoration via Improved Data Synthesis, CVPR 2020. Paper | Code

restormer's People

Contributors

adityac8, swz30


restormer's Issues

Does not correspond to paper MPRNet

Figure 3 in this paper and Figure 5 in the authors' previous paper MPRNet use the same test image and report the SAME PSNR, but the visualized results are very different. May I therefore question the authenticity and rigor of the paper's results?

The number of channels does not match before/after pixel unshuffle/shuffle?

I would expect the number of channels to become 4x (1/4x) after applying pixel unshuffle (pixel shuffle) with a downscale (upscale) factor of 2. But Fig. 1 shows the number of channels only doubling (halving) across the pixel unshuffle (pixel shuffle).

[My question]
Is there another layer (such as a 1x1 conv) next to the upsampling/downsampling operator that adjusts the number of channels but is not shown in Fig. 1? Am I right? Thank you for your time!

Here is my understanding of pixel shuffle: the number of channels changes by a factor of r*r.
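For reference, a minimal PyTorch sketch of the channel arithmetic; the Downsample module below is illustrative (an assumption about how a conv next to the shuffle would resolve the mismatch), not code copied from the repo:

import torch
import torch.nn as nn

x = torch.randn(1, 48, 64, 64)
# PixelUnshuffle(2) alone multiplies channels by 4; PixelShuffle(2) divides by 4.
print(nn.PixelUnshuffle(2)(x).shape)  # torch.Size([1, 192, 32, 32])

# Pairing it with a 3x3 conv that first halves the channels yields the
# overall 2x channel growth shown in Fig. 1.
class Downsample(nn.Module):
    def __init__(self, n_feat):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feat, n_feat // 2, kernel_size=3, padding=1, bias=False),
            nn.PixelUnshuffle(2),
        )

    def forward(self, x):
        return self.body(x)

print(Downsample(48)(x).shape)  # torch.Size([1, 96, 32, 32]): 48 -> 96 channels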

Train from a pre-trained model (e.g., ImageNet-1K) or from random initialization?

Hello,

When training Restormer, do you train it from random initialization, or do you pre-train it on ImageNet-1K/21K as the ResNeXt series does? My understanding is that it is trained without additional data; for example, for the deraining task, only Rain13K is used to train a randomly initialized model for 300K iterations. Am I right? Thanks!

DeamNet and AINDNet results in Real image denoising

Hello, I suspect the results shown for real image denoising are wrong.
DeamNet and AINDNet are evaluated on the SIDD benchmark, yet in your paper their SIDD benchmark numbers are compared against results obtained on the SIDD validation set; MIRNet and MPRNet have the same problem.
So why not compare directly on the benchmark dataset? As with the DND dataset, those results can be obtained from the website.

Question about the ablation study

Thanks for your outstanding work. I have some confusion about the implementation of groups (b) and (e) in the ablation study in Sec. 4.5. For group (b), with MTA + standard FN, should I break the feature map into patches before passing it to the fully-connected layer, and reshape it back to its original size after the FC? If so, what patch size is used, and is dropout applied here? And for group (e), what is the structure of the DFN once the gating mechanism is removed? Is it just a sequential LayerNorm and 1x1 convolution?

add model to Huggingface

Hi, would you be interested in adding Restormer to the Hugging Face Hub? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. We can set up an organization or a user account under which Restormer can be added, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/akhaliq/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

About the training

How can I resolve the error about create_dataloader and create_dataset (imported from __init__.py) when running train.py?
Also, what is the difference between training via the generic basicsr scripts and training on specific tasks (e.g., Deraining)?

train code

Hello, dear authors. I want to train a Restormer model myself, but I can't find the training code. Could you please write a README document for training the Restormer model?

GPU

For GoPro deblurring, I want to know how many GPUs were used and how long training took with [patch size 128, batch size 64].

About inference on Huggingface

I'm really impressed by the inference on Hugging Face. I ran the Motion Deblurring task on my own CPU (i7) and got ~50 s inference runtime. Could I convert the .pth model to ONNX to get a better runtime?
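For what it's worth, a generic export sketch using torch.onnx.export, assuming a Restormer instance is already loaded as model; whether every op exports cleanly at a given opset is not verified here:

import torch

# dynamic_axes lets the exported graph accept variable image sizes.
dummy = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    model, dummy, "restormer.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {2: "height", 3: "width"},
                  "output": {2: "height", 3: "width"}},
)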

Pretrained model for Defocus Deblurring links to wrong file

Restormer / Defocus_Deblurring / Pretrained_Models / README.MD
contains a link to download the pretrained model for defocus deblurring.

However, this link appears to lead to the pre-trained model for Deraining.
It is the same URL as is found in Restormer / Deraining / Pretrained_Models / README.MD

I assume this is an error, rather than the same model being intended for both tasks. If so, there is currently no pre-trained model available for Defocus Deblurring.

An updated link would be appreciated.

GPU

Hello, my GPU has relatively little memory. How can I specify which GPU to use for training?
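A generic workaround (standard CUDA/PyTorch behavior, not a repo-specific flag) is to restrict the visible devices when launching training, and set the GPU count in the YAML to match, e.g.

CUDA_VISIBLE_DEVICES=0 python basicsr/train.py -opt Deraining/Options/Deraining_Restormer.yml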

Question about the baseline model

Dear Authors,
Congratulations on the paper being accepted as an Oral at CVPR 2022.
I have a question about the code.
In Section 4.5 'Ablation Studies', Table 7 compares the performance of the baseline, multi-head attention, FFN, and the overall model. The baseline is a UNet with Resblocks (EDSR-style). Is the number of blocks [4, 6, 6, 8] in the overall network the same as in the baseline model? And where can I find the baseline model code? Is it https://github.com/xinntao/BasicSR/blob/master/basicsr/archs/edsr_arch.py?

Problems about training Deraining

Hi, congratulations on the great work!
I changed the number of GPUs in train.sh and Deraining_Restormer.yml to 4, since I only have 4 GPUs, but I still can't train the Deraining code due to GPU memory limitations. The program runs if I make batch_size_per_gpu smaller, but then the batch size no longer matches the experimental settings.
So what can I do to reproduce the settings in your experiment (i.e., for progressive learning, training starts with patch size 128x128 and batch size 64, and the patch size and batch size pairs are updated to [(160^2,40), (192^2,32), (256^2,16), (320^2,8), (384^2,8)] at iterations [92K, 156K, 204K, 240K, 276K])?
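One generic way to approximate a large effective batch size on limited memory is gradient accumulation; a minimal sketch, assuming model, optimizer, criterion, and loader are already set up (this is not the repo's training loop, and it only approximates the paper's schedule):

# Accumulate gradients over several small batches before stepping, so e.g.
# batch 16 x 4 accumulation steps behaves like an effective batch of 64.
accum_steps = 4
optimizer.zero_grad()
for i, (lq, gt) in enumerate(loader):
    loss = criterion(model(lq), gt) / accum_steps  # scale so gradients average
    loss.backward()
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()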

denoising training dataset

Well done! Can you tell me which datasets you used for denoising, i.e., the real-noise training dataset and the Gaussian denoising dataset? Thank you very much!

Cuda out Of memory error

For the real denoising task, on photos bigger than 4 Mpx the code returns this error:
RuntimeError: CUDA out of memory. Tried to allocate 7.62 GiB (GPU 0; 12.00 GiB total capacity; 5.33 GiB already allocated; 4.64 GiB free; 5.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there any trick to get it working properly with an RTX 3060?

Thanks
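A common workaround for large inputs is tiled inference: restore overlapping crops and blend them, which bounds peak memory by the tile size. A rough sketch (a hypothetical helper, not part of the repo; tile sizes may need to respect the model's divisibility requirements):

import torch

def restore_tiled(model, img, tile=512, overlap=64):
    # img: (1, C, H, W). Restore overlapping tiles, then average the overlaps.
    _, _, h, w = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    stride = tile - overlap
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            with torch.no_grad():
                out[..., top:bottom, left:right] += model(img[..., top:bottom, left:right])
            weight[..., top:bottom, left:right] += 1
    return out / weight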

some problems

Since no training code was given, I wrote my own training program for Restormer. However, due to GPU memory limitations I could only set the batch size to 48. I found that the loss hardly decreased during the first 10,000 to 20,000 iterations, and correspondingly the PSNR remained unchanged at about 26.2. Is training just slow at the start, or is something wrong? If possible, I would also like to know the upward trend of the validation PSNR and the downward trend of the loss during your training.

image resolution

I found that performance declined considerably when the pretrained model was applied to a custom test set with a larger resolution (4k x 3k). Do you have any suggestions for bridging the resolution gap between the training set and the test set?

colab?

I am pleased with your work; the level of completeness is really professional!
Do you have any plans to release code for Google Colab?
Unfortunately, I can't run the code on my local machine due to hardware limitations.

Question about result.

Thanks for your outstanding work. I have a question about your deraining results. I tested the deraining.pth weights you provided, and the results are much lower than those in your paper. Can you explain this and show more details of your deraining results?
Thank you!

a question about ".state" file

Hello, I trained a denoising model, which generated a 4000.state file. I want to ask what it is used for, and where I can see how the loss changes during the training period.
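A quick way to inspect such a file, assuming it is an ordinary torch checkpoint (as BasicSR-style resume states typically are):

import torch

# Listing the keys shows what the resume state stores (typically the
# epoch/iteration counters plus optimizer and scheduler state, not weights).
state = torch.load("4000.state", map_location="cpu")
print(state.keys())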

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

Hi,

I've been trying to train the deraining model on your datasets for the last week, but every time I run the train.sh script, the data loaders get created and then I get the following error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 23058) of binary: /opt/conda/bin/python
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

basicsr/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-04-17_01:06:49
host : instance-1.c.cs4705-hw4.internal
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 23058)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I've tried training this on Colab, GCP, and my local machine, and the only time it runs is when I train with num_gpus=1, at which point the ETA for a single epoch is 2 days.

Any help would be greatly appreciated.

Thanks!

How many GPUs are used when training

Hi, could you tell me the detailed information about the GPUs (type and number) used in the experiments, and the loss used for motion deblurring? I could not find these details in the paper. Thanks so much.

Pretrained models

Do you plan to train Restormer on RealBlur and release the pretrained models?

Training code

Your work is wonderful. Could you please provide the training scripts?

Loss function and supp

In the paper I can't find the definition of the loss function; could you tell me which loss function Restormer uses?
Also, where can I find the supplement of this paper?
Thank you.

Comparing with SwinIR

Thanks for your great work, which inspires me a lot.
In your paper (page 8), compared to SwinIR, your model has fewer FLOPs and runs 13x faster, and this held true in my own testing. So what is the key point that makes it faster?
Also, shifted windows are not used in your model; is that the key to the reduced computation compared to SwinIR?

Runtime environment

Could you please tell me the environment this code runs in, the GPU usage, and the PyTorch version?

About layer normalization code

There are some discrepancies between layer normalization as I understand it and the implementation in the code. I hope someone can resolve my doubts.

Layer normalization as I understand it:
input X of shape B C H W; for each sample in B, compute the mean and variance, and multiply by a weight and bias of matching size.

Layer normalization in the Restormer code:
input X of shape B C H W; compute the mean and variance over the channel dimension at each of the B H*W positions, and multiply by a weight and bias of length C.
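A minimal sketch of the channel-wise variant described above (illustrative, not the repo's exact module):

import torch
import torch.nn as nn

class ChannelLayerNorm(nn.Module):
    # Normalizes over C at each (b, h, w) position; affine params of length C.
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(channels))
        self.bias = nn.Parameter(torch.zeros(channels))
        self.eps = eps

    def forward(self, x):  # x: (B, C, H, W)
        mu = x.mean(dim=1, keepdim=True)
        var = x.var(dim=1, keepdim=True, unbiased=False)
        x = (x - mu) / torch.sqrt(var + self.eps)
        return x * self.weight[None, :, None, None] + self.bias[None, :, None, None]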

Problems about training dual_pixel defocus deblur

I ran the DualPixel training code following your instructions, and there is an error:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2200930) of binary: (my python path)
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish

This error seems to be related to distributed training. I used two GPUs for distributed training and modified [Restormer/train.sh] and [DefocusDeblur_Single_8bit_Restormer.yml].

About metrics of matlab version

Hello, thanks for your work. I tested my resulting images using your MATLAB script in the motion deblurring folder, but obtained a somewhat lower PSNR than the PyTorch version using skimage.metrics.
Did you encounter this issue? Thanks very much.
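For reference, a small synthetic experiment showing one common source of such gaps: computing PSNR on float arrays versus on images quantized to uint8, as they are when saved to disk (another frequent cause is RGB vs. Y-channel evaluation):

import numpy as np
from skimage.metrics import peak_signal_noise_ratio as psnr

ref = np.random.rand(64, 64, 3)
out = np.clip(ref + 0.02 * np.random.randn(64, 64, 3), 0, 1)

# Float pipeline vs. uint8 pipeline: rounding to 8 bits shifts the score slightly.
print(psnr(ref, out, data_range=1.0))
print(psnr((ref * 255).round().astype(np.uint8),
           (out * 255).round().astype(np.uint8), data_range=255))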

how to measure FLOPs?

In Table 7 and Table 8, the parameters and FLOPs of the different network settings are listed for an input image of size 256x256. I'd like to know how to obtain the FLOPs. I have tried thop, an open-source package (https://github.com/sovrasov/flops-counter.pytorch), but it doesn't work well: the accurate FLOPs for the Transformer block, merge block, and other layers cannot be derived directly.
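For what it's worth, the basic entry point of the flops-counter.pytorch (ptflops) package looks like this; modules the counter cannot analyze may need custom hooks, so treat the output as an approximation:

from ptflops import get_model_complexity_info

# Counts MACs/params for a 3x256x256 input; `model` is the network under test.
macs, params = get_model_complexity_info(
    model, (3, 256, 256), as_strings=True, print_per_layer_stat=False)
print(macs, params)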

About the training?

Hello.

In 'Restormer-main/basicsr/data/paired_image_dataset.py', some of the functions used are not defined. Why?

Network is not converging using custom dataset.

Thank you for the great work!!

Here is my question: I would like to train the Restormer on my own dataset with "Deraining" setup.
However, after about 50,000 iterations the loss is not descending at all, and the val PSNR has hardly changed since the start of training.

I've confirmed that the data flow is correct and that the input and target images are matched.
Is this because Restormer needs a longer time to warm up, or am I doing something wrong?

By the way, I am using 2 GPUs for training, so I changed the YAML file and train.sh accordingly.

I'm looking forward to your reply. Thank you in advance!

MDTA

Hi @swz30,

Thanks for the great work!

The multi-Dconv head transposed attention (MDTA) you propose is not a spatial attention mechanism, right? So the efficiency is attained by switching the attention from the spatial dimension to the channel dimension, whose size is fixed regardless of image size. If this holds true, then unlike the attention in ViT, MDTA cannot perform long-range (global) interaction across the spatial dimension, correct?
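For illustration, a minimal sketch of transposed (channel) attention, showing why the attention map's size is independent of spatial resolution; the paper's version also includes a learnable temperature and depth-wise convolutions, which are omitted here:

import torch
import torch.nn.functional as F

def transposed_attention(q, k, v):
    # q, k, v: (B, C, HW). Attention is computed across channels, so the
    # attention map is (B, C, C) regardless of the spatial size H*W.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)  # (B, C, C)
    return attn @ v                                    # (B, C, HW)

x = torch.randn(2, 48, 128 * 128)
print(transposed_attention(x, x, x).shape)  # torch.Size([2, 48, 16384])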
