FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Linjiang Huang [1,3]*, Rongyao Fang [1]*, Aiping Zhang [4], Guanglu Song [5], Si Liu [6], Yu Liu [5], Hongsheng Li [1,2,3] ✉️

[1] CUHK MMLab, [2] Shanghai AI Laboratory
[3] Centre for Perceptual and Interactive Intelligence
[4] Sun Yat-Sen University, [5] SenseTime Research, [6] Beihang University
* Equal contribution, ✉️ Corresponding author

🔥🔥🔥 We have released the code, cheers!

⭐ If FouriScale is helpful for you, please help star this repo. Thanks! 🤗

🆕 Update

  • 2024.07.27: The code for ControlNet is added 🔥
  • 2024.07.01: FouriScale is accepted by ECCV2024 🔥🔥🔥
  • 2024.03.25: The code is released 🔥
  • 2024.03.20: 🎉 FouriScale has been selected as 🤗 Hugging Face Daily Papers 🔥
  • 2024.03.19: This repo is released 🔥

⌛ TODO

  • Release Code 💻
  • Update Code for ControlNet 💻
  • Update links to project page 🔗
  • Provide Hugging Face demo 📺

🎆 Abstract

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions. To address these issues, we introduce FouriScale, an innovative, training-free approach grounded in frequency-domain analysis. We replace the original convolutional layers in pre-trained diffusion models with layers that incorporate a dilation technique along with a low-pass operation, intended to achieve structural consistency and scale consistency across resolutions, respectively. Further enhanced by a padding-then-crop strategy, our method can flexibly handle text-to-image generation at various aspect ratios. By using FouriScale as guidance, our method balances the structural integrity and fidelity of generated images, achieving a remarkable capacity for arbitrary-size, high-resolution, and high-quality generation. With its simplicity and compatibility, our method can provide valuable insights for future explorations into the synthesis of ultra-high-resolution images.
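As a rough illustration of the two operations named in the abstract, the sketch below applies an ideal low-pass filter in the Fourier domain to a single-channel feature map and then convolves it with a dilated kernel. It is a simplified NumPy sketch under our own assumptions (one channel, an ideal box-shaped frequency mask, a dilation factor equal to the per-axis scale, zero padding), not the FouriScale implementation; the function names `low_pass`, `dilate_kernel`, `conv2d_same` and `dilated_lowpass_conv` are hypothetical.

```python
import numpy as np


def low_pass(x: np.ndarray, scale: int) -> np.ndarray:
    """Keep only the central 1/scale of the frequency band along each axis."""
    h, w = x.shape
    # Shift the spectrum so the zero frequency sits at the array centre.
    spec = np.fft.fftshift(np.fft.fft2(x))
    mask = np.zeros((h, w))
    ch, cw = h // 2, w // 2
    rh, rw = max(h // (2 * scale), 1), max(w // (2 * scale), 1)
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = 1.0
    # Undo the shift, invert the FFT and drop the small imaginary residue.
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))


def dilate_kernel(k: np.ndarray, d: int) -> np.ndarray:
    """Insert d - 1 zeros between neighbouring kernel taps."""
    kh, kw = k.shape
    out = np.zeros(((kh - 1) * d + 1, (kw - 1) * d + 1))
    out[::d, ::d] = k
    return out


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Zero-padded, same-size 2D cross-correlation (what conv layers compute)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def dilated_lowpass_conv(x: np.ndarray, k: np.ndarray, scale: int) -> np.ndarray:
    """Low-pass the input, then apply the kernel dilated by the same factor."""
    return conv2d_same(low_pass(x, scale), dilate_kernel(k, scale))
```

In the abstract's terms, the dilation targets structural consistency (a 3x3 kernel dilated by 2 spans the same relative neighbourhood on a map twice as large), while the low-pass step targets scale consistency.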

👀 Visual Results

Visual comparisons

⭐ Visual comparisons between ① ours, ② ScaleCrafter and ③ Attn-Entro, under settings of 4× (default height×2, default width×2), 8× (default height×2, default width×4), and 16× (default height×4, default width×4), employing three distinct pre-trained diffusion models: SD 1.5, SD 2.1, and SDXL 1.0.

Visual results with LoRAs

⭐ Visualization of the high-resolution images generated by SD 2.1 integrated with customized LoRAs (images in red rectangle) and images generated by a personalized diffusion model, AnimeArtXL.

Visual results with more resolutions

⚙️ Setup

conda create -n fouriscale python=3.8
conda activate fouriscale
pip install -r requirements.txt

⭐ We highly recommend following the provided environment requirements, especially the diffusers version, as its code changes significantly between versions.
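If you are reusing an existing environment, a small check like the one below can compare the exact `name==version` pins in requirements.txt with what is installed. `check_pins` is a hypothetical helper of our own; it uses only the standard-library `importlib.metadata` module (available from Python 3.8) and skips any line that is not an exact `==` pin.

```python
from importlib import metadata
from typing import Callable, Dict, Iterable, Optional, Tuple


def check_pins(
    lines: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (pinned, installed)} for every exact pin that does not match."""
    problems: Dict[str, Tuple[str, Optional[str]]] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # only exact pins are checked
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed: Optional[str] = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems
```

For example, `with open("requirements.txt") as f: print(check_pins(f))` prints an empty dict when every pin matches.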

💫 Inference

Text-to-image higher-resolution generation with diffusers script

stable-diffusion xl v1.0 base

# 2048x2048 (4x) generation
accelerate launch --num_processes 1 \
text2image_xl.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
  --validation_prompt 'Polenta Fritters with Asparagus & Eggs' \
  --seed 23 \
  --config ./configs/sdxl_2048x2048.yaml \
  --logging_dir ${your-logging-dir}

To generate at other resolutions, set --config to one of the following:

  • 2048x2048: ./configs/sdxl_2048x2048.yaml
  • 2560x2560: ./configs/sdxl_2560x2560.yaml
  • 4096x2048: ./configs/sdxl_4096x2048.yaml
  • 4096x4096: ./configs/sdxl_4096x4096.yaml
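The "4x" in the example command refers to the pixel count relative to SDXL's default 1024x1024 output, so each config's factor can be read off its file name. The hypothetical helper below does that arithmetic; it assumes the file name ends in `<A>x<B>.yaml` (the product is the same whichever number is the height).

```python
import re


def scale_from_config(path: str, base: int = 1024) -> float:
    """Pixel-count ratio of an '<A>x<B>.yaml' config relative to a base x base default."""
    match = re.search(r"(\d+)x(\d+)\.yaml$", path)
    if match is None:
        raise ValueError(f"no '<A>x<B>.yaml' resolution in {path!r}")
    a, b = int(match.group(1)), int(match.group(2))
    return (a * b) / (base * base)


for cfg in ["sdxl_2048x2048.yaml", "sdxl_2560x2560.yaml",
            "sdxl_4096x2048.yaml", "sdxl_4096x4096.yaml"]:
    print(cfg, f"{scale_from_config(cfg):g}x")  # 4x, 6.25x, 8x, 16x
```

Passing `base=512` gives the corresponding factors for the SD 1.5/2.1 configs.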

Generated images will be saved to the directory given by --logging_dir (${your-logging-dir} above). You can use custom prompts by setting --validation_prompt to either a prompt string or the path to a .txt file. If you use a .txt prompt file, put each prompt on a separate line.
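A prompt file is plain text with one prompt per line. The sketch below writes such a file and reads it back the way we assume a script would, skipping blank lines; `load_prompts` is a hypothetical helper, not a function from text2image_xl.py.

```python
import tempfile
from pathlib import Path
from typing import List


def load_prompts(value: str) -> List[str]:
    """Read one prompt per line if `value` is an existing .txt file, else treat it as one prompt."""
    path = Path(value)
    if path.suffix == ".txt" and path.is_file():
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [value]


# Usage: write a two-prompt file (with a blank line in between) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "prompts.txt"
    prompt_file.write_text(
        "Polenta Fritters with Asparagus & Eggs\n\nGirl with Pearl Earring\n",
        encoding="utf-8",
    )
    prompts = load_prompts(str(prompt_file))
print(prompts)
```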

--pretrained_model_name_or_path specifies the pretrained model to use. You can provide either a Hugging Face repo name (the model will be downloaded from Hugging Face first) or a local directory containing the model checkpoint.

To create a custom resolution setting, write a .yaml configuration file that specifies the layers to which our method is applied. See ./assets/layer_settings/sdxl.txt for an example.

If the Stable Diffusion XL model generates a blurry image with your custom prompt, try --amp_guidance for stronger guidance.

stable-diffusion v1.5 and stable-diffusion v2.1

# sd v1.5 1024x1024 (4x) generation
accelerate launch --num_processes 1 \
text2image.py \
--pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 \
--validation_prompt "Polenta Fritters with Asparagus & Eggs" \
--seed 23 \
--config ./configs/sd1.5_1024x1024.yaml \
--logging_dir ${your-logging-dir}

# sd v2.1 1024x1024 (4x) generation
accelerate launch --num_processes 1 \
text2image.py \
--pretrained_model_name_or_path stabilityai/stable-diffusion-2-1-base \
--validation_prompt "Polenta Fritters with Asparagus & Eggs" \
--seed 23 \
--config ./configs/sd2.1_1024x1024.yaml \
--logging_dir ${your-logging-dir}

To generate at other resolutions, use the following config files (SD 1.5 / SD 2.1):

  • 1024x1024: ./configs/sd1.5_1024x1024.yaml / ./configs/sd2.1_1024x1024.yaml
  • 1280x1280: ./configs/sd1.5_1280x1280.yaml / ./configs/sd2.1_1280x1280.yaml
  • 2048x1024: ./configs/sd1.5_2048x1024.yaml / ./configs/sd2.1_2048x1024.yaml
  • 2048x2048: ./configs/sd1.5_2048x2048.yaml / ./configs/sd2.1_2048x2048.yaml
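SD 1.5 and SD 2.1-base generate at 512x512 by default, and their VAE encodes images into 4-channel latents downsampled 8x per side, so each config also fixes the latent size the UNet sees. The hypothetical helpers below show that arithmetic; they assume the config names read width x height.

```python
from typing import Tuple

VAE_FACTOR = 8       # SD 1.x/2.x VAE downsamples each side by 8
LATENT_CHANNELS = 4  # SD 1.x/2.x latents have 4 channels
BASE = 512           # default resolution of SD 1.5 and SD 2.1-base


def latent_shape(width: int, height: int) -> Tuple[int, int, int]:
    """(channels, height, width) of the latent for a width x height image."""
    if width % VAE_FACTOR or height % VAE_FACTOR:
        raise ValueError("width and height must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_FACTOR, width // VAE_FACTOR)


def axis_factors(width: int, height: int) -> Tuple[float, float]:
    """Per-axis upscaling factors (height, width) relative to the 512x512 default."""
    return (height / BASE, width / BASE)


for w, h in [(1024, 1024), (1280, 1280), (2048, 1024), (2048, 2048)]:
    print(f"{w}x{h}: latent {latent_shape(w, h)}, factors {axis_factors(w, h)}")
```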

Higher-resolution generation with ControlNet

We now provide ControlNet support for SDXL; the code can be modified similarly for SD 1.5/2.1.

# 2048x2048 (4x) generation
accelerate launch --num_processes 1 \
text2image_xl_controlnet.py \
   --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
   --controlnet_model_name_or_path lllyasviel/sd-controlnet-canny \
   --image_path ${your-control-image-dir} \
   --validation_prompt "Girl with Pearl Earring, highly detailed, sharp focus, ultra sharpness, high contrast" \
   --seed 1 \
   --config ./configs/sdxl_2048x2048.yaml \
   --logging_dir ${your-logging-dir}

See the instructions above for using custom text prompts.

😃 Citation

Please cite us if our work is useful for your research.

@article{2024fouriscale,
  author    = {Linjiang Huang and Rongyao Fang and Aiping Zhang and Guanglu Song and Si Liu and Yu Liu and Hongsheng Li},
  title     = {FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis},
  journal   = {arxiv},
  year      = {2024},
}

📓 License

This project is released under the Apache 2.0 license.

💡 Acknowledgement

We appreciate ScaleCrafter for their awesome work and open-source code.

✉️ Contact

If you have any questions, please feel free to contact [email protected].

fouriscale's Issues

Problem introducing ControlNet to your model

Hi,

I added a ControlNet module to your model, but the width and height of the latent passed as input to ControlNet are the same as the target image's width and height.

This conflicts with your setting, where latent height (or width) = target height (or width) // 8.

Error message under Windows

Clone the repo.
Set up a new virtual environment.
Install the requirements.
Install the GPU build of torch.
Run this command (note that on Windows the single quotes had to be changed to double quotes):

accelerate launch --num_processes 1 text2image_xl.py --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 --validation_prompt "Polenta Fritters with Asparagus & Eggs" --seed 23 --config ./configs/sdxl_2048x2048.yaml --logging_dir ./logs

This produces the following error dialog:

---------------------------
python.exe - Entry Point Not Found
---------------------------
The procedure entry point
?requires_grad@TensorOptions@c10@@QEBA?AU12@V?$optional@_N@2@@Z could not be located in the dynamic link library
D:\Tests\FouriScale\voc_fouriscale\Lib\site-packages\xformers\_C.pyd.
---------------------------
OK   
---------------------------

After clicking OK the rest of the script does appear to run OK and the image is created/saved.

Error when setting guidance_scale=0.0 in text2image_xl.py#L205

Error message as follows:
Traceback (most recent call last):
  File "text2image_xl.py", line 597, in <module>
    main()
  File "text2image_xl.py", line 576, in main
    images = pipeline.forward(
  File "fouriscale/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "text2image_xl.py", line 358, in forward
    added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
  File "fouriscale/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "fouriscale/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 930, in forward
    sample, res_samples = downsample_block(
  File "fouriscale/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "fouriscale/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 1053, in forward
    hidden_states = attn(
  File "fouriscale/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "fouriscale/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 309, in forward
    hidden_states = block(
  File "fouriscale/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "fouriscale/lib/python3.8/site-packages/diffusers/models/attention.py", line 194, in forward
    attn_output = self.attn1(
  File "fouriscale/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "fouriscale/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 322, in forward
    return self.processor(
  File "text2image_xl.py", line 156, in __call__
    key = key.view(3, batch_size_prompt, attn.heads, -1, head_dim)
RuntimeError: shape '[3, 0, 10, -1, 64]' is invalid for input of size 4915200
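The last frame shows batch_size_prompt evaluating to 0; a target shape with a zero-size dimension holds no elements, so it cannot be filled from a non-empty tensor and the view fails. The NumPy sketch below reproduces that shape arithmetic under an unverified assumption: that the batch is expected to stack three branches, so batch_size_prompt behaves like the batch size divided by 3 and becomes 0 whenever fewer than three entries arrive (which may be what happens with guidance_scale=0.0). `split_key` is a hypothetical name, not code from text2image_xl.py.

```python
from typing import Optional, Tuple

import numpy as np


def split_key(num_elements: int, batch: int, heads: int = 10, head_dim: int = 64,
              branches: int = 3) -> Optional[Tuple[int, ...]]:
    """Mimic key.view(branches, batch // branches, heads, -1, head_dim); None if invalid."""
    batch_size_prompt = batch // branches  # assumption: batch stacks `branches` copies
    key = np.empty(num_elements, dtype=np.uint8)
    try:
        return key.reshape(branches, batch_size_prompt, heads, -1, head_dim).shape
    except ValueError:
        return None  # NumPy's counterpart of the RuntimeError above


print(split_key(4915200, batch=1))  # batch // 3 == 0, so the reshape is rejected
print(split_key(15360, batch=3))    # 3 * 1 * 10 * 8 * 64 elements fit
```

Whether the batch really shrinks when guidance is disabled depends on the pipeline code around line 156 of text2image_xl.py, which this sketch does not model.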

Evaluation data

The LAION-5B dataset is too large to download. Could you please share the LAION data you used for evaluation? Thank you very much. Also, do you have any suggestions on how to download a subset of LAION?
