neuralneighborstyletransfer's Introduction

License: MIT

NNST

Repo for the algorithm NNST-Opt, described in the preprint "Neural Neighbor Style Transfer". Please feel free to email any questions to [email protected]. Paper Link: https://ttic.uchicago.edu/~nickkolkin/Paper/NNST_Preprint.pdf

Web Demo

Try the Replicate web demo: Replicate

Dependencies

Tested With:

  • Python 3.7.7
  • PyTorch 1.5.0
  • Imageio 2.8.0
  • Numpy 1.18.1

Example Output

Example output produced using included files with the command:

python styleTransfer.py --content_path inputs/content/C1.png --style_path inputs/style/S4.jpg --output_path ./output.jpg

Example Output

To produce an output without color correction, use the command:

python styleTransfer.py --content_path inputs/content/C1.png --style_path inputs/style/S4.jpg --output_path ./output.jpg --dont_colorize

Example Output w/o Colorization

Examples of using NNST to generate keyframes for video stylization

https://home.ttic.edu/~nickkolkin/nnst_video_supp.mp4

Hardware Requirements

Primarily tested in GPU mode with NVIDIA GPUs using CUDA. CPU mode is implemented but not extensively tested (and is very slow).
Generating 512x512 outputs requires ~6GB of memory; generating 1024x1024 outputs requires ~12GB of memory.
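As a rough rule of thumb, the two figures above (~6GB at 512px, ~12GB at 1024px) suggest memory grows roughly linearly with the output's side length. A back-of-the-envelope helper (an extrapolation for planning purposes, not a measurement from this repo) might look like:

```python
# Rough VRAM rule of thumb interpolated from the figures above
# (~6GB at 512px, ~12GB at 1024px), i.e. roughly linear in side length.
# This is an extrapolation, not a measured number from this repo.
def estimated_vram_gb(side_px: int) -> float:
    return 6.0 * (side_px / 512)

print(estimated_vram_gb(768))  # ~9GB for a 768x768 output
```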

Usage

Default Settings (512x512 Output):

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT

(Optional Flag) Producing 1024x1024 Output.

N.B.: to get the most out of this setting, use styles that are at least 1024 pixels on the long side; the included styles are too small (512 pixels on the long side):

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --high_res

(Optional Flag) Set alpha; it must be between 0.0 and 1.0. Alpha=1.0 corresponds to maximum content preservation, Alpha=0.0 to maximum stylization (default is 0.75):

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --alpha ALPHA_VALUE

(Optional Flag) Augment the style image with rotations. This slows down the algorithm and increases the memory requirement. It generally improves content preservation but slightly hurts stylization:

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --do_flip

(Optional Flag) CPU mode; this takes tens of minutes even for a 512x512 output:

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --cpu

(Optional Flag) Use the experimental content loss. The most common failure mode of our method is that colors shift within an object, creating highly visible artifacts; if that happens, this flag can usually fix it. It currently has some drawbacks, which is why it isn't enabled by default (see below for details). One advantage of using this flag is that alpha can typically be set all the way to 0.0 and the content will remain recognizable:

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --content_loss

Optional flags can be combined, for example:

python styleTransfer.py --content_path PATH_TO_CONTENT_IMAGE --style_path PATH_TO_STYLE_IMAGE --output_path PATH_TO_OUTPUT --high_res --do_flip

Experimental Content Loss

Because our method doesn't use a content loss by default, it sometimes destroys important content details, especially if alpha is below 0.5. I'm currently working on a minimally invasive content loss based on the self-similarity matrix of the downsampled content image. So far it seems to reliably ensure the content is preserved, but it has two main flaws.

The first is that it causes periodic artifacts (isolated bright or dark pixels at regular intervals), a result of using bilinear downsampling. I'm working on a modified downsampler that randomizes the blur kernel, which should fix this; I'll update this repo when it's ready.

The second is that for styles with limited palettes (typically drawing-based styles), the content loss will cause unfaithful colors that interpolate between the limited palette. I'm still thinking of a way to address this.
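To make the idea concrete, here is a minimal NumPy sketch of a self-similarity content loss (all names hypothetical; the repo's actual loss operates on the downsampled content image and is not shown here):

```python
import numpy as np

def self_similarity(feats: np.ndarray) -> np.ndarray:
    """Cosine self-similarity matrix of an (N, C) array of feature vectors."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return normed @ normed.T

def self_sim_loss(content: np.ndarray, output: np.ndarray) -> float:
    """Penalize the output when its internal pairwise structure drifts
    from the content's, without pinning down any absolute colors."""
    diff = self_similarity(content) - self_similarity(output)
    return float(np.mean(diff ** 2))
```

Because only pairwise similarities are compared, the output remains free to adopt the style's palette as long as the relative relationships between regions are preserved, which is what makes such a loss minimally invasive.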

Included Example Inputs

The most important consideration when choosing input style and content images is ensuring they are at least as high resolution as the output you want (this matters most for the style image, but is very helpful for the content image as well).

Generally I've found that style images which work well for one image tend to work well for many images; I've included some examples of such images in ./inputs/style/. If a style image consists of large visual elements (for example, large shapes in a cubist painting), our method is less likely to capture it. Sometimes setting alpha near 1.0 will work, but this isn't guaranteed.

The content images that work best are ones with a single large object. The smaller or lower-contrast an aspect of the content image is, the more likely it is to be lost in the final output. I've included some examples of content images that work well in ./inputs/content/

neuralneighborstyletransfer's People

Contributors

chenxwh, nkolkin13

neuralneighborstyletransfer's Issues

colorization

Is it possible to retain the colors of the content image? Color preservation only works sometimes.

Deployed on gradio

Hello,
First thank you for your work and sharing it !

I tinkered a bit with your project yesterday and deployed it to Gradio. I can open a PR if you want me to; I thought you might like it.

I have to admit that it's a bit hard to find textures that work well. Here is an example I managed to make work.

Do you have the texture that you used for your video example?

doodle mode

Thanks for the new work & paper! Will a guided style transfer be added?

Question on Inference performance

Hi,

I see that it takes around 10s on an A100 to generate a 512 stylized image. (Without knowing the technical details of the architecture, I would imagine style transfer would be faster than generating images from scratch with models like GANs or text-2-img models, which can produce new images in a couple of seconds or even less; I apologize if this is a less informed comment from me.)

I am just curious: is there anything that can be warmed up or tweaked to bring the inference time down to a couple of seconds? I see that the VGG model gets loaded every time, but that's inconsequential compared to the overall time.

Thanks.

Works on RTX 3080 (mobile)

Just wanted to report this works on an RTX 3080 mobile.

Tested yesterday on Ubuntu 20.04.4 LTS with NVIDIA Driver Version 510.54 & CUDA Version 11.6, plus:

Python 3.8.10
PyTorch (torch) 1.11.0+cu113
torchvision 0.12.0+cu113
Imageio 2.16.1
Numpy 1.22.3

To install dependencies, I used:

pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113

Everything worked fine.

This is cool af BTW

Improving inference time

We are looking to run this model and reduce the processing time. Is there any way to tune hyperparameters and/or parallelise inference so that we can leverage more compute?

Would using a model different than VGG be a good idea?

I have a 4 GB GTX 970, but I want to generate 1k images using NNST, so I am thinking about how I could do that. My first idea would be to implement checkpointing, but I realized that VGG is narrow at the bottom and wide at the top, so rather than a 4x reduction in memory use, it would be more like 2x, which is not enough to get me to 1k. I haven't tried your method yet, but I saw a YouTube video which said it needed 12 GB for 1024x1024 images. It also said that 6 GB are required for 512x512, which is probably an error; more likely it needs only 3 GB. Is that right?
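The "narrow at the bottom, wide at the top" point can be made concrete with a rough activation-memory tally for standard VGG-19 conv features at a 512x512 input (float32, batch 1, conv activations only; these layer shapes are standard VGG-19, not taken from this repo):

```python
# Per-conv activation shapes of VGG-19 features at a 512x512 input:
# (channels, spatial side). Standard VGG-19 layout, not repo-specific.
layers = [
    (64, 512), (64, 512),
    (128, 256), (128, 256),
    (256, 128), (256, 128), (256, 128), (256, 128),
    (512, 64), (512, 64), (512, 64), (512, 64),
    (512, 32), (512, 32), (512, 32), (512, 32),
]
BYTES_PER_FLOAT32 = 4
mem_mb = [c * s * s * BYTES_PER_FLOAT32 / 2**20 for c, s in layers]
total_mb = sum(mem_mb)  # ~296 MB of activations

# The four earliest (narrow but high-resolution) layers already hold
# more activation memory than the twelve wider layers that follow.
early, late = sum(mem_mb[:4]), sum(mem_mb[4:])
```

This is why checkpointing mainly the later layers buys roughly a 2x rather than a 4x saving, as the comment above observes.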

Anyway, VGG is an old and big architecture. I am looking at more recent ones like EfficientNet and ConvNeXt, which should have a better computational design, and am considering adapting NNST to them, as well as potentially implementing checkpointing.

Before I undertake that kind of work, I just wanted to ask if you've tried NNST with different models. If you have and found that VGG is better, I'd like to hear about it. If not, it might be worth trying NNST with a ConvNeXt.

CPU mode not working without an Nvidia card?

I'm trying to install NNST on an ARM Ubuntu 22.04 virtual machine.

Seems like Torch doesn't work on that setup?

python styleTransfer.py --content_path inputs/content/C1.png --style_path inputs/style/S4.jpg --output_path ./output.jpg --cpu
/home/jan/nnst/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 
  warn(f"Failed to load image Python extension: {e}")
Traceback (most recent call last):
  File "/home/jan/Dokumente/NeuralNeighborStyleTransfer/styleTransfer.py", line 63, in <module>
    torch.cuda.synchronize()
  File "/home/jan/nnst/lib/python3.9/site-packages/torch/cuda/__init__.py", line 493, in synchronize
    _lazy_init()
  File "/home/jan/nnst/lib/python3.9/site-packages/torch/cuda/__init__.py", line 210, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
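One plausible fix (not from the repo) is to guard the failing torch.cuda.synchronize() call in styleTransfer.py so it is skipped when CUDA isn't available; a sketch:

```python
# Hypothetical guard around the failing torch.cuda.synchronize() call,
# so --cpu mode also runs on machines without an NVIDIA GPU / CUDA build.
try:
    import torch
    _HAVE_TORCH = True
except ImportError:  # torch not installed at all
    _HAVE_TORCH = False

def maybe_synchronize() -> None:
    """Synchronize the GPU only when a CUDA device is actually usable."""
    if _HAVE_TORCH and torch.cuda.is_available():
        torch.cuda.synchronize()
```

With a guard like this, the script would degrade gracefully to CPU execution instead of raising the AssertionError above.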

Some very specific image resolutions lead to errors

For instance, with an 850x756 content image:

  File "X:\Avirtual\NeuralNeighborStyleTransfer-main\styleTransfer.py", line 70, in <module>
    output = produce_stylization(content_im_orig, style_im_orig, phi,
  File "X:\Avirtual\NeuralNeighborStyleTransfer-main\utils\stylize.py", line 122, in produce_stylization
    s_pyr = optimize_output_im(s_pyr, c_pyr, content_im, style_im_tmp,
  File "X:\Avirtual\NeuralNeighborStyleTransfer-main\utils\stylize.py", line 311, in optimize_output_im
    s_col_samp = cs_tmp.contiguous().view(1, chans, -1)
RuntimeError: shape '[1, 512, -1]' is invalid for input of size 3673856

With 850x272:

File "X:\Avirtual\NeuralNeighborStyleTransfer-main\utils\imagePyramid.py", line 20, in dec_lap_pyr
    x_small = F.interpolate(cur, (h // 2, w // 2), mode='bilinear')
  File "X:\avirtual\.main\lib\site-packages\torch\nn\functional.py", line 3919, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: Input and output sizes should be greater than 0, but got input (H: 1, W: 3) output (H: 0, W: 1)
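Both failures look like the Laplacian pyramid repeatedly halving a dimension until an intermediate size degenerates or no longer matches. A common workaround, not part of this repo, is to snap each image dimension down to a multiple of 2**levels before stylizing (the helper name and the depth of 5 levels are assumptions):

```python
# Hypothetical pre-processing fix: snap a dimension down to the nearest
# multiple of 2**levels so repeated halving in an image pyramid never
# yields a zero-sized or mismatched intermediate. levels=5 is an assumed
# pyramid depth, not a value taken from this repo.
def snap_to_multiple(size: int, levels: int = 5) -> int:
    m = 2 ** levels
    return max(m, (size // m) * m)
```

For the 850x756 example above this gives 832x736, and for 850x272 it gives 832x256, both of which halve cleanly five times.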
