idkiro / starenhancer

[ICCV 2021 Oral] StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement

License: MIT License

Python 100.00%

pytorch deep-learning image-enhancement image-processing

starenhancer's Introduction

StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement

Abstract: Image enhancement is a subjective process whose targets vary with user preferences. In this paper, we propose a deep learning-based image enhancement method that covers multiple tonal styles using a single model, dubbed StarEnhancer. It can transform an image from one tonal style to another, even if that style is unseen. With a simple one-time setting, users can customize the model so that the enhanced images better match their aesthetics. To make the method more practical, we propose a well-designed enhancer that processes 4K-resolution images at over 200 FPS while surpassing contemporaneous single-style image enhancement methods in terms of PSNR, SSIM, and LPIPS. Finally, our proposed enhancement method offers good interactivity, allowing the user to fine-tune the enhanced image using intuitive options.

Getting started

Install

We tested the code with PyTorch 1.8.1 + CUDA 11.1 + cuDNN 8.0.5; nearby versions also work fine.

Install PyTorch and torchvision from http://pytorch.org.
Install other requirements:

pip install -r requirements.txt

We mainly trained the model on 4 × RTX 2080 Ti GPUs, but a smaller mini-batch size also works.

Prepare

You can generate your own dataset, or download the one we generated.

The final directory layout should look like this:

┬─ save_model
│   ├─ stylish.pth.tar
│   └─ ... (model & embedding)
└─ data
    ├─ train
    │   ├─ 01-Experts-A
    │   │   ├─ a0001.jpg
    │   │   └─ ... (id.jpg)
    │   └─ ... (style folder)
    ├─ valid
    │   └─ ... (style folder)
    └─ test
        └─ ... (style folder)
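
Before training or testing, it can help to confirm that the folders are where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name check_layout and the exact checks are assumptions based only on the tree above (in particular, that save_model/stylish.pth.tar is already present and that every style folder holds .jpg files).

```python
from pathlib import Path

# Expected paths, taken from the directory tree above.
REQUIRED_FILES = ["save_model/stylish.pth.tar"]
REQUIRED_SPLITS = ["data/train", "data/valid", "data/test"]


def check_layout(root):
    """Return a list of problems found under `root`; an empty list means the layout looks fine."""
    root = Path(root)
    problems = []

    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")

    for rel in REQUIRED_SPLITS:
        split = root / rel
        if not split.is_dir():
            problems.append(f"missing folder: {rel}")
            continue
        # Each split should contain at least one style folder (e.g. 01-Experts-A).
        style_dirs = sorted(p for p in split.iterdir() if p.is_dir())
        if not style_dirs:
            problems.append(f"no style folders in: {rel}")
        # Each style folder should contain at least one JPEG (e.g. a0001.jpg).
        for style in style_dirs:
            if not any(style.glob("*.jpg")):
                problems.append(f"no .jpg images in: {rel}/{style.name}")

    return problems
```

Calling check_layout(".") from the repository root returns an empty list when nothing is missing. If you train the style encoder yourself rather than downloading the pretrained models, the stylish.pth.tar entry will presumably be reported as missing until that step has run.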

Download

Data and pretrained models are available on Google Drive or BaiduPan (extraction code: jvyf).

Generate

Download the raw data from the MIT-Adobe FiveK Dataset.
Download the modified Lightroom database fivek.lrcat and replace the original database with it.
Generate the dataset in JPEG format at quality 100; see this issue for details.
Run generate_dataset.py in the data folder to generate the dataset.

Train

Firstly, train the style encoder:

python train_stylish.py

Secondly, fetch the style embedding for each sample in the train set:

python fetch_embedding.py

Lastly, train the curve encoder and mapping network:

python train_enhancer.py
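
Because each stage depends on the output of the previous one, the three commands can be chained so that a failure stops the run early. This is a small sketch under assumptions: the helper name run_stages is hypothetical, and it only assumes that the three scripts above run without arguments from the repository root.

```python
import subprocess
import sys

# The three stages listed above, in the order they must run.
STAGES = [
    ("style encoder", "train_stylish.py"),
    ("style embeddings", "fetch_embedding.py"),
    ("curve encoder and mapping network", "train_enhancer.py"),
]


def run_stages(stages=STAGES, runner=subprocess.run, python=sys.executable):
    """Run each stage in order and stop at the first failure.

    `runner` is injectable so the sequencing can be checked without training anything.
    """
    completed = []
    for name, script in stages:
        print(f"[stage] {name}: {script}")
        # check=True makes subprocess.run raise CalledProcessError on a non-zero
        # exit code, so a later stage never starts after a failed earlier one.
        runner([python, script], check=True)
        completed.append(script)
    return completed
```

Calling run_stages() from the repository root runs the full pipeline with the current Python interpreter.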

Test

Just run:

python test.py

Computing LPIPS requires about 10 GB of GPU memory. If an out-of-memory (OOM) error occurs, replace the following line

lpips_val = loss_fn_alex(output * 2 - 1, target_img * 2 - 1).item()

with

lpips_val = 0
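
An alternative to editing the line by hand is to fall back to 0 only when an OOM actually happens. The sketch below is an assumption-laden illustration rather than code from test.py: the helper name metric_or_fallback is hypothetical, and it relies on PyTorch reporting CUDA OOM as a RuntimeError (or a subclass) whose message contains "out of memory".

```python
def metric_or_fallback(compute, fallback=0.0):
    """Call `compute()` and return its value, or `fallback` if the GPU runs out of memory."""
    try:
        return compute()
    except RuntimeError as err:
        # Only swallow out-of-memory errors; any other RuntimeError is a real
        # problem and is re-raised unchanged.
        if "out of memory" not in str(err).lower():
            raise
        return fallback
```

With this helper, the line in question could be written as lpips_val = metric_or_fallback(lambda: loss_fn_alex(output * 2 - 1, target_img * 2 - 1).item()). As with the manual replacement, any image that falls back to 0 makes the averaged LPIPS unreliable, so the reported value should be ignored in that case.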

Notes

Due to licensing restrictions, we are unable to release part of the source code. This repository provides a pure Python implementation for research use. It differs from the paper in the following ways:

The repository uses a ResNet-18 without BN as the backbone of the curve encoder, whereas the paper uses a more lightweight model.
The paper implements the color transform function in CUDA, whereas the repository implements it with torch.gather.
The repository removes some tricks used for training lightweight models.

Overall, this repository can achieve higher performance, but will be slightly slower.
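
To give a feel for how a curve-based color transform can be expressed with torch.gather, here is a minimal sketch. It is not the repository's implementation: the function name apply_curves, the tensor shapes, and the nearest-level lookup (no interpolation between curve points) are all assumptions made for illustration.

```python
import torch


def apply_curves(img, curves):
    """Apply a per-channel lookup curve to an image batch with torch.gather.

    img:    (B, C, H, W) float tensor with values in [0, 1]
    curves: (B, C, K) float tensor; curves[b, c, k] is the output for input level k
    """
    b, c, h, w = img.shape
    k = curves.shape[-1]
    # Quantize each pixel to the nearest of the K curve levels.
    index = (img.clamp(0, 1) * (k - 1)).round().long().reshape(b, c, h * w)
    # Gather along the level axis: out[b, c, n] = curves[b, c, index[b, c, n]].
    out = torch.gather(curves, 2, index)
    return out.reshape(b, c, h, w)
```

In this sketch gradients flow to curves but not to img, because the rounding step is not differentiable. With an identity curve (K evenly spaced values from 0 to 1) the output equals the input up to quantization error.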

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{song2021starenhancer,
  title={StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement},
  author={Song, Yuda and Qian, Hui and Du, Xin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={4126--4135},
  year={2021}
}

starenhancer's Issues

Have you tried image retouching with StarEnhancer?

That is, plain image retouching rather than style transfer.

The results are not the same as the paper

I am the author.

Some peers have emailed me to ask why the performance of the open-source model does not match the results in the paper.
As stated in the README, the released model is not the model from the paper, but its performance is similar.
The exact results should be:
PSNR: 25.41, SSIM: 0.942, LPIPS: 0.085

If your results differ, the JPEG codec may be different, which depends on the OpenCV version and how it was installed.

Uninstall OpenCV (whether it was installed with pip or conda) and reinstall it with pip. It must be pip, because conda installs a different JPEG codec:

pip install opencv-python==4.5.5.62
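
To see which OpenCV build pip knows about before reinstalling, something like the following can be used. It is only a sketch: the helper name find_pip_opencv is hypothetical, and the list of distribution names is an assumption about how OpenCV is commonly published on PyPI.

```python
from importlib import metadata

# Version recommended in the post above.
EXPECTED = "4.5.5.62"
# Distribution names under which OpenCV is commonly published on PyPI (assumed list).
CANDIDATES = ["opencv-python", "opencv-python-headless", "opencv-contrib-python"]


def find_pip_opencv(candidates=CANDIDATES):
    """Return {distribution name: version} for every listed distribution that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this name; try the next candidate.
            pass
    return found


def matches_expected(found, expected=EXPECTED):
    """True if opencv-python is installed at exactly the expected version."""
    return found.get("opencv-python") == expected
```

An empty result from find_pip_opencv() while import cv2 still succeeds suggests that OpenCV was installed by some other means (for example conda), which is the situation described above.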

Is it feasible to use GAN for unpaired training?

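Thank you very much for open-sourcing the code!
Have you run any experiments on unpaired training with a GAN? Would that be feasible, and what are your thoughts on it?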

How to test with my own style

Hi!
If I want to test my images with your pretrained model, where should I put my images?

How can I use style encoder to train classification on my dataset?

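Regarding the style_encoder part, how can I train classification on my own dataset? Also, I am not clear about the meaning of the proxy part and would appreciate any guidance!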

One question about the code under the setting of single-style enhancement

I am testing your code on single-style enhancement. However, in lines 103-104 of the file 'loader.py', I find that images of different styles are randomly chosen as the source image and target image during the training and validation stages, which may make the optimization process unstable. Would it be better to fix the source style and target style?
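
The difference between the two sampling strategies in question can be illustrated as follows. This is a hypothetical sketch and not the code in loader.py: the function name sample_style_pair and its fixed parameter are invented for illustration.

```python
import random


def sample_style_pair(styles, rng=random, fixed=None):
    """Pick (source_style, target_style) for one training sample.

    fixed=None       -> both styles are drawn at random (multi-style training)
    fixed=(src, tgt) -> always return that pair (single-style enhancement)
    """
    if fixed is not None:
        src, tgt = fixed
        if src not in styles or tgt not in styles:
            raise ValueError(f"unknown style in fixed pair: {fixed}")
        return src, tgt
    # Independent draws, so source and target may occasionally coincide.
    return rng.choice(styles), rng.choice(styles)
```

Passing fixed=(source_style, target_style) pins both ends of the mapping for every sample, which is the behaviour the question asks about.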

Multi-style, unpaired setting

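Hello. In the multi-style, unpaired setting, would it be possible to swap the positions of source and target, and then pass the resulting output_A and output_B through the enhancer again to obtain recover_A and recover_B? Finally, compute l1_loss(source, recover_A) and l1_loss(target, recover_B), as well as Triplet_loss(output_A, target, source) and Triplet_loss(output_B, source, target).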

import torch

# AverageMeter, Feature_Extract_Model and args are not defined in this snippet;
# they are assumed to come from the surrounding training script.


def train(train_loader, mapping, enhancer, criterion, optimizer):
    losses = AverageMeter()
    # TripletMarginLoss takes (anchor, positive, negative).
    criterionTriplet = torch.nn.TripletMarginLoss(margin=1.0, p=2)
    FEModel = Feature_Extract_Model().cuda()

    mapping.train()
    enhancer.train()

    for (source_img, source_center, target_img, target_center) in train_loader:
        source_img = source_img.cuda(non_blocking=True)
        source_center = source_center.cuda(non_blocking=True)
        target_img = target_img.cuda(non_blocking=True)
        target_center = target_center.cuda(non_blocking=True)

        # Map the style centers of both images to style codes.
        style_A = mapping(source_center)
        style_B = mapping(target_center)

        # Forward translation (A -> B and B -> A), then translate back.
        output_A = enhancer(source_img, style_A, style_B)
        output_B = enhancer(target_img, style_B, style_A)
        recoverA = enhancer(output_A, style_B, style_A)
        recoverB = enhancer(output_B, style_A, style_B)

        source_img_feature = FEModel(source_img)
        target_img_feature = FEModel(target_img)
        output_A_feature = FEModel(output_A)
        output_B_feature = FEModel(output_B)

        # Cycle reconstruction: the recovered images should match the inputs.
        loss_l1 = criterion(recoverA, source_img) + criterion(recoverB, target_img)
        # Triplet terms: anchor = translated image, positive = image in the style
        # it was translated to, negative = image in the style it came from.
        loss_triplet = criterionTriplet(output_B_feature, source_img_feature, target_img_feature) + \
                       criterionTriplet(output_A_feature, target_img_feature, source_img_feature)
        loss = loss_l1 + loss_triplet

        losses.update(loss.item(), args.t_batch_size)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return losses.avg