lumina-t2x's Introduction

Alpha-VLLM Lab Website

This is the website of our academic research group at Shanghai AI Lab.

This website is powered by Jekyll, with some Bootstrap and Bootswatch.

Copyright Alpha-VLLM team. Code released under the MIT License.

lumina-t2x's People

Contributors

artanic30, chrisliu6, eltociear, feifeieiar, frankluox, gaopengpjlab, hafred, kamisatokanade, linziyi96, npjd, pommespeter, poppuppy, rongjiehuang, yuxumin, zhuole1025

lumina-t2x's Issues

ComfyUI support?

Is there a way to use this in ComfyUI? I'm really impressed with the prompt following in the examples a user shared in a Discord channel.

Also, can LoRAs be created for it? Can it be trained further?

ControlNet support?

Can Lumina be trained with conditional input images and generate images from image conditions, like ControlNet?

🤩 [User Study] Report your bad examples!

Hi all,

Thank you for being so interested in Lumina-T2X. If you encounter images of poor quality, please feel free to report them in this issue to help us improve the model.

You can directly add a comment on this issue and use the template below:

prompt: <copy-paste prompt here>
image: <copy-paste image here>

You can also optionally report the hyperparameters you used if they influence the quality.

Generated images are not impressive enough

Why are the images generated by your model not more impressive? They even seem worse than PixArt-Sigma. Is it because the amount of training data is insufficient, or because your model expects prompts to follow your own instruction format?

Question about lognorm

Hi, great work! The paper mentions lognorm, but I couldn't find the implementation. Could you let me know if it's used in the code? If so, please tell me where I can find it. Thank you very much!
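
For context, here is a minimal sketch of what logit-normal ("lognorm") timestep sampling usually looks like in flow-matching training; the function name and defaults are assumptions, not the repository's actual implementation:

import torch

def sample_lognorm_timesteps(batch_size: int, mean: float = 0.0, std: float = 1.0) -> torch.Tensor:
    # Hypothetical logit-normal timestep sampler (assumption, not the repo's code).
    # Draws u ~ Normal(mean, std) and maps it through a sigmoid, so t lies in
    # (0, 1) and is concentrated around the middle of the trajectory.
    u = mean + std * torch.randn(batch_size)
    return torch.sigmoid(u)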

Showcase lumina-t2x on Hugging Face Spaces with a free GPU grant (A100s)

Congratulations on your amazing project and the successful live Gradio demo! It would also be great to have the demo available on Hugging Face Spaces. This could help with more community engagement and drive more visibility to the project. We at Hugging Face also provide free GPU grants through the ZeroGPU program, which includes free A100s, and we would be happy to extend a grant to your application.

Here are some useful links to help you get started on Spaces:

Please let us know if you need any further assistance or support in integrating your project with Spaces or any other relevant Hugging Face offerings.

About time shifting factor.

For Lumina-T2I, it seems that time_shifting_factor is only implemented in the ODE integrator, not in the SDE integrator. Does this factor have a big impact? Which is more recommended, SDE or ODE? Thanks!
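
For reference, a minimal sketch of how a time-shifting factor is commonly applied to an ODE sampler's step schedule; the exact formula and names below are assumptions rather than Lumina-T2I's actual code:

import torch

def shift_timesteps(t: torch.Tensor, shift: float = 4.0) -> torch.Tensor:
    # Hypothetical time-shifted schedule (assumption, not the repo's implementation).
    # Remaps uniform timesteps in [0, 1] so more integration steps are spent near
    # one end of the trajectory, which is what a larger time_shifting_factor
    # typically does for high-resolution sampling.
    return shift * t / (1 + (shift - 1) * t)

# Example: reshape a uniform 60-step schedule before ODE integration.
steps = torch.linspace(0, 1, 60)
shifted_steps = shift_timesteps(steps, shift=4.0)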

Cannot generate the same image as the demo

I downloaded the models to my computer and ran them locally, but my generated images do not match the images from your demo, even with the same settings, e.g., seed and sample_steps. Do you use a default negative prompt in your demo?
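
For what it's worth, a minimal sketch of fixing the random state locally before sampling, in case the mismatch comes from unseeded generators rather than a hidden negative prompt (how the demo seeds its runs is unknown to me):

import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Fix the common sources of randomness for a reproducibility check.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Even with identical seeds, results can still differ across GPU models,
    # driver versions, and non-deterministic kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False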

Training details about the t2v model.

Hi, I am currently testing the lumina-t2v model on a single A100 40GB. May I ask which GPU type was used to train the T2V model? I also wonder how many frames were used.

My implementation follows these steps:

  1. Following the paper, I added flatten and unflatten operations along the frame dimension (see the sketch after this list).
  2. To save time, I did the preprocessing (LLaMA text features and VAE latents) separately before starting training. But the VAE is identical to the one used in T2I, so I worry it might not capture enough temporal consistency.
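
A minimal sketch of the flatten/unflatten along the frame dimension mentioned in step 1; the tensor layout and einops patterns are my assumptions about how this could be wired, not the official implementation:

import torch
from einops import rearrange

# Assume a latent video of shape (batch, frames, channels, height, width).
x = torch.randn(4, 8, 4, 32, 32)

# Flatten: fold the frame dimension into the batch so every frame passes
# through the (image-pretrained) spatial blocks independently.
x_flat = rearrange(x, "b f c h w -> (b f) c h w")

# ... spatial transformer blocks would operate on x_flat here ...

# Unflatten: restore the frame dimension so temporal blocks can attend
# across frames of the same video.
x_video = rearrange(x_flat, "(b f) c h w -> b f c h w", b=4, f=8)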

In my testing, the video tensor is limited to b=4, f=8, c=4, h=32, w=32 (after embedding) due to out-of-memory issues, so it is nearly impossible to run even small-scale tests to verify your temporal-spatial merging method.

I am really interested in reading your training details and the comparison between the temporal-spatial dividing and merging strategies. Your insights would be greatly appreciated.

t2v timing

Hi

I've implemented the Lumina-T2V model and am training it on the Panda dataset. The paper mentions that initial training uses 8 GPUs; I assume they are 8x A100 80GB (which is what I'm using). May I know how long training takes (in terms of GPU hours)?

About synthetic dataset of T2I

Hi, thank you for your great work!
I'm curious whether, in the synthetic T2I dataset, both the captions and the images are synthetic, or only one of them.

I can't make it work in Colab

Hi, I'm trying to run Lumina-T2X in Google Colab, but I'm encountering an error when trying to import what I believe is a necessary function.

Here's the error message I'm getting:

ERROR: Could not find a version that satisfies the requirement lumina_next_sft (from versions: none)
ERROR: No matching distribution found for lumina_next_sft

ModuleNotFoundError Traceback (most recent call last)
in <cell line: 11>()
9 import matplotlib.pyplot as plt
10 import gradio as gr
---> 11 from lumina_next_sft import generate_text as lumina_generate_text

ModuleNotFoundError: No module named 'lumina_next_sft'


NOTE: If my import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

I'm assuming "lumina_next_sft" is part of the Lumina-T2X codebase, but I haven't been able to figure out the correct way to import the generate_text function.
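
One possible workaround, purely as a sketch: lumina_next_sft does not appear to be published on PyPI, so pip cannot resolve it; cloning the repository and putting the relevant directory on sys.path might work instead. The directory name below, and whether a generate_text function exists at all, are assumptions:

import subprocess
import sys

# Clone the official repository inside the Colab runtime.
subprocess.run(
    ["git", "clone", "https://github.com/Alpha-VLLM/Lumina-T2X.git"],
    check=True,
)

# Make the sub-package importable; the directory name is a guess based on the
# repository layout, not a documented install path.
sys.path.insert(0, "Lumina-T2X/lumina_next_t2i")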

Could you please provide some guidance on how to resolve this issue or point me to any resources that might help?

Thanks in advance!

error in Next-DiT

When I run Next-DiT following the README, I get the following error when loading the DiT model. How can I solve it?
KeyError: 'NextDiT_2B_GQA_patch2'
[screenshot of the error attached]
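
A small debugging sketch for this kind of KeyError, assuming the model constructors are exposed as attributes of the local models package (the module path and naming scheme are assumptions):

# Hypothetical check: list the NextDiT variants that are actually registered
# in the installed code, then match the name in the config/CLI against them
# (or pull a newer commit if the variant is missing).
import models

available = [name for name in dir(models) if name.startswith("NextDiT")]
print(available)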

Batch Generation

Hello, Lumina-T2I currently only supports the web demo and the CLI. I would like to ask how to achieve batch generation, i.e., generating images for multiple prompts in one run. Looking forward to your reply, thank you.
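
For context, a minimal sketch of one way to batch over prompts by looping around the existing CLI; the script name, flags, and checkpoint path below are assumptions rather than the documented interface:

import subprocess

# Hypothetical prompt list; in practice this could be read from a text file.
prompts = [
    "A black Honda motorcycle parked in front of a garage.",
    "A watercolor painting of a lighthouse at dawn.",
]

for i, prompt in enumerate(prompts):
    # The flags below (--ckpt, --prompt, --output) are guesses about the CLI
    # surface, not the actual documented arguments.
    subprocess.run(
        [
            "python", "-u", "sample.py",
            "--ckpt", "/path/to/checkpoint",
            "--prompt", prompt,
            "--output", f"sample_{i}.png",
        ],
        check=True,
    )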

Great work

This is great work for T2X. Could you update the checkpoint URL for the T2I model? The old URL is invalid.

Support for low-resolution generation

Hi, thanks for the release! Is there any plan to support good-quality low-resolution image generation, e.g., 256x256 or 128x128? Directly changing the resolution in the config file doesn't work:

resolution: "1024x1024" # option: ["1024x1024", "512x2048", "2048x512", "(Extrapolation) 1664x1664", "(Extrapolation) 1024x2048", "(Extrapolation) 2048x1024"]

For example, for the prompt "A black Honda motorcycle parked in front of a garage." (from COCO) with 60 sampling steps, the results at 256x256 are pretty bad:

[256x256 sample attached: 000000179765_1]

whereas at 1024x1024 they look better:
[1024x1024 sample attached: 000000179765]

Inquiry on training data and setup for T2I training

Firstly, I would like to express my gratitude and respect for the remarkable work you’ve done by open-sourcing the T2I model, which is a significant contribution to the community.

I have two questions:

  1. I have gone through the associated paper but was unable to find specific details on the datasets used for training the T2I model. Could you please confirm if this information is available elsewhere or if I may have overlooked it in the paper? Any details you could share would be greatly appreciated.

  2. While I have read about the training section you’ve shared for the T2I model, there seems to be a lack of information regarding the training data setup. I am particularly interested in the data structure and how to properly organize it for training. Additionally, it would be extremely helpful if you could provide an example of a toy dataset, similar to the one shown in Pixart-Sigma, and instructions to verify if the training CLI is functioning as intended.

I understand that providing this detailed information might be demanding, but I believe that such transparency would greatly benefit the wider adoption of the Lumina project within the open-source community.

Thank you for considering my request. I look forward to your response and any guidance you can provide.

Which LLaMA 7B version was used?

Did you use LLaMA 7B together with InternViT-6B during training?
And is there any plan to release a technical report?

Got Unpickling error when sampling lumina_next_t2i

Hi!

According to https://github.com/Alpha-VLLM/Lumina-T2X/tree/main/lumina_next_t2i#inference,

  • I downloaded Lumina-Next-T2I from huggingface
  • and tried python -u sample.py --ckpt {my_ckpt_path}, but got the following error.
FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
/home/Lumina-T2X/lumina_next_t2i/models/components.py:9: UserWarning: Cannot import apex RMSNorm, switch to vanilla implementation
  warnings.warn("Cannot import apex RMSNorm, switch to vanilla implementation")
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "/home/Lumina-T2X/lumina_next_t2i/sample.py", line 326, in <module>
    main(args, 0, master_port)
  File "/home/Lumina-T2X/lumina_next_t2i/sample.py", line 95, in main
    train_args = torch.load(os.path.join(args.ckpt, "model_args.pth"))
  File "/home/.conda/envs/lumina_t2x/lib/python3.10/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/.conda/envs/lumina_t2x/lib/python3.10/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
  • I checked that the model_args.pth file exists in ckpt_path.
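
For reference, _pickle.UnpicklingError: invalid load key, 'v'. very often means the downloaded file is a Git LFS pointer (a small text file starting with "version https://git-lfs...") rather than the real checkpoint. A quick check, with a placeholder path:

# If the "checkpoint" is actually a Git LFS pointer, its first bytes are ASCII
# text starting with "version", which is exactly what produces the
# UnpicklingError with load key 'v'.
ckpt_path = "/path/to/ckpt/model_args.pth"  # placeholder path

with open(ckpt_path, "rb") as f:
    head = f.read(64)
print(head)
# If this prints b'version https://git-lfs.github.com/spec/v1 ...', re-download
# the file, e.g. with git lfs pull or the huggingface_hub client.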

[Feedback] Share your good examples!

If you've generated some great images, we'd love to see your prompts, hyper-parameters and hear about your experience!

Let's discuss how to generate a perfect image!

Questions about next line / next frame token

Hi. Thanks for sharing great work!

  • In the case of video generation, how are the next-line / next-frame tokens attached to the latent frames?
    [figure from the paper attached]

    • It seems the next-line token is attached at the end of every row (along the height), and the next-frame token at the very end of each latent frame. Is this right? (A concrete sketch of this layout follows after this list.)
    • Since videos have different resolutions and durations, how is this handled within one batch of different videos? Are learnable PAD tokens appended so sequences have the same length inside a batch?
  • Do you have any plan to release the training code (especially for T2V) and the dataset?
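
To make the first question concrete, here is a small sketch of the layout described above (a next-line token after every row, a next-frame token after every frame); the shapes and the use of learnable embeddings are my assumptions about the paper's scheme, not the released code:

import torch
import torch.nn as nn

b, f, h, w, d = 2, 4, 8, 8, 16  # batch, frames, latent height, latent width, dim
tokens = torch.randn(b, f, h, w, d)

# Hypothetical learnable special tokens (assumed, not confirmed by the repo).
nextline = nn.Parameter(torch.randn(d))
nextframe = nn.Parameter(torch.randn(d))

# Append a next-line token at the end of every row of every frame.
nl = nextline.expand(b, f, h, 1, d)
tokens = torch.cat([tokens, nl], dim=3)           # (b, f, h, w + 1, d)

# Flatten rows, then append a next-frame token at the end of each frame.
tokens = tokens.reshape(b, f, h * (w + 1), d)
nf = nextframe.expand(b, f, 1, d)
tokens = torch.cat([tokens, nf], dim=2)           # (b, f, h * (w + 1) + 1, d)

# Finally flatten frames into one token sequence per video.
seq = tokens.reshape(b, f * (h * (w + 1) + 1), d)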

Thank you!

How to try the SD3 sample_sd3.py script?

I saw you made changes yesterday involving ODEs, and I'd love to try them out in a simple Diffusers Colab notebook. Is that possible? Does this change involve taking the transformer and VAE from SD3 and using them with Lumina? Thanks in advance!
