
stylecrafter's Introduction

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

                 

Gongye Liu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing,
Xintao Wang, Yujiu Yang*, Ying Shan


(* corresponding authors)

From Tsinghua University and Tencent AI Lab.

🔆 Introduction

TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation.

1. ⭐⭐ Style-Guided Text-to-Video Generation.

Style-guided text-to-video results. Resolution: 320 x 512; Frames: 16. (Compressed)

2. Style-Guided Text-to-Image Generation.

Style-guided text-to-image results. Resolution: 512 x 512. (Compressed)

📝 Changelog

  • [2023.12.08]: 🔥🔥 Release the Huggingface online demo.
  • [2023.12.05]: 🔥🔥 Release the code and checkpoint.
  • [2023.11.30]: 🔥🔥 Release the project page.

⏳ TODO

  • Remove the video watermark (a result of training on WebVid10M).

🧰 Models

Model         Resolution  Checkpoint
StyleCrafter  320x512     Hugging Face

On a single NVIDIA A100 (40G) GPU, generating a 512×512 image takes approximately 5 seconds, and generating a 320×512 video with 16 frames takes approximately 85 seconds. Inference requires a GPU with at least 16G of memory.
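
As a rough planning aid, the timings above can be turned into a batch estimate. This is only a sketch: it assumes samples are generated one after another, that time scales linearly with the number of samples, and that model loading is ignored. The function name `estimate_seconds` is hypothetical and not part of the repository.

```python
# Timings quoted above for a single NVIDIA A100 (40G); other GPUs will differ.
SECONDS_PER_IMAGE = 5    # one 512x512 image
SECONDS_PER_VIDEO = 85   # one 320x512 video with 16 frames
FRAMES_PER_VIDEO = 16


def estimate_seconds(num_images: int = 0, num_videos: int = 0) -> int:
    """Estimate sequential generation time in seconds (assumes linear scaling)."""
    if num_images < 0 or num_videos < 0:
        raise ValueError("counts must be non-negative")
    return num_images * SECONDS_PER_IMAGE + num_videos * SECONDS_PER_VIDEO


# 10 images and 4 videos: 10 * 5 + 4 * 85 = 390 seconds (6.5 minutes).
print(estimate_seconds(num_images=10, num_videos=4))  # 390

# Average cost of one video frame: 85 / 16 seconds.
print(SECONDS_PER_VIDEO / FRAMES_PER_VIDEO)  # 5.3125
```

By these figures a video frame costs about as much as a standalone 512×512 image, although the video frames are smaller (320×512).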

⚙️ Setup

conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt

💫 Inference

  1. Download all checkpoints according to the instructions
  2. Run the following commands in a terminal.
# style-guided text-to-image generation
sh scripts/run_infer_image.sh

# style-guided text-to-video generation
sh scripts/run_infer_video.sh
  3. (Optional) Run inference on your own data according to the instructions
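
If you prefer to launch the two scripts from Python (for example, from a larger pipeline), a thin wrapper could look like the sketch below. It assumes it is run from the repository root and that the checkpoints are already in place. The helper names `build_command` and `run_inference` are hypothetical and not part of the repository; only the script paths come from the steps above.

```python
import subprocess
from pathlib import Path
from typing import List

# Script paths as given in the steps above, relative to the repository root.
SCRIPTS = {
    "image": "scripts/run_infer_image.sh",
    "video": "scripts/run_infer_video.sh",
}


def build_command(mode: str) -> List[str]:
    """Return the shell command for the requested mode ('image' or 'video')."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["sh", SCRIPTS[mode]]


def run_inference(mode: str, repo_root: str = ".") -> int:
    """Run one inference script from repo_root and return its exit code."""
    command = build_command(mode)
    script = Path(repo_root) / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is repo_root the repository root?")
    # check=False: hand the exit code back to the caller instead of raising.
    return subprocess.run(command, cwd=repo_root, check=False).returncode


# Example (run from the repository root):
#   exit_code = run_inference("image")
#   exit_code = run_inference("video")
```

Any inference settings stay inside the shell scripts, so the wrapper does nothing beyond choosing which script to run.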

👨‍👩‍👧‍👦 Crafter Family

VideoCrafter1: Framework for high-quality text-to-video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

LongerCrafter: Tuning-free method for longer high-quality video generation.

DynamiCrafter: Animate open-domain still images to high-quality videos.

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.


🙏 Acknowledgements

We would like to thank AK (@_akhaliq) for help setting up the online demo.

📭 Contact

If you have any comments or questions, feel free to contact [email protected]

stylecrafter's People

Contributors

gongyeliu


stylecrafter's Issues

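How can I train a model that takes only text + a style reference image and produces a stylized image?

  1. I would like a model that takes text plus a style reference image and produces a stylized image. Is there a good project I could refer to for training? I noticed that the base model loaded by this project's code is the text-to-video model videocrafter_t2v_320_512. If I only want to generate stylized images, is it unnecessary to use a video generation model?
  2. I also noticed that the stylization here differs from IP-Adapter (https://github.com/tencent-ailab/IP-Adapter): this project mostly extracts the style of the reference image and relies on the text prompt for content, whereas IP-Adapter tends to use the reference image as the base image to draw from.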

Style Embedding Extraction

Can you explain in detail how "Style Embedding Extraction" is trained?

  1. In particular, how is the trainable "Q-Former" trained?

  2. Can you publish the training code?

Thank you.

Caption Preprocessing with regular expressions

Dear author,

I would like to know in detail about the BLIP2 caption preprocessing method used in the paper.

  1. Which BLIP2 checkpoint was used? Was there no difference in performance between the checkpoints?
  2. What regular expressions were used?
  3. Were the captions processed by BLIP-2 itself or by separate Python code?

Your paper inspires me. Thank you in advance.

How to remove watermark?

Hi, thanks for sharing this for free.

While looking at your work, I was wondering about the TODO list.
How can the watermark be removed?

Thank you.
