👋 Join our WeChat community
The MiniSora open-source community is a grassroots community organized voluntarily by its members (completely free, with no fees and no exploitation). MiniSora aims to explore how Sora might be implemented and where the field goes next:
- Hold regular Sora roundtables to explore possibilities together with the community
- Discuss the existing technical paths for video generation
- GPU-friendly: ideally low requirements on GPU memory and GPU count, e.g. trainable and runnable on 8x A100, 4x A6000, or a single RTX 4090
- Training-time friendly: good results without excessively long training
- Modest output requirements: generated video length and resolution need not be high, e.g. 3-10 s at 480p is acceptable
There are three main candidate papers to reproduce as baselines for the subsequent Sora reproduction. The community has already (02/29) forked the OpenDiT and SiT code into the codes folder; contributors are welcome to submit PRs migrating the baseline code to the Sora reproduction effort.
- DiT with OpenDiT
- SiT
- W.A.L.T (not yet released)
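The DiT and SiT baselines above both tokenize an image (or video) latent into a sequence of patches before the transformer. A minimal NumPy sketch of that patchify/unpatchify step, as an illustration only; the patch size and latent shape here are assumptions, not values taken from any of the repos above:

```python
import numpy as np

def patchify(x, patch_size=2):
    """Split a (C, H, W) latent into a (num_patches, patch_dim) token sequence."""
    c, h, w = x.shape
    p = patch_size
    assert h % p == 0 and w % p == 0
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p) -> (N, C*p*p)
    x = x.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape((h // p) * (w // p), c * p * p)

def unpatchify(tokens, c, h, w, patch_size=2):
    """Inverse of patchify: rebuild the (C, H, W) latent from tokens."""
    p = patch_size
    x = tokens.reshape(h // p, w // p, c, p, p)
    x = x.transpose(2, 0, 3, 1, 4)
    return x.reshape(c, h, w)

latent = np.random.randn(4, 32, 32)      # hypothetical 4-channel 32x32 VAE latent
tokens = patchify(latent)                # (256, 16) token sequence
restored = unpatchify(tokens, 4, 32, 32)
assert np.allclose(latent, restored)     # the round trip is lossless
```

The transformer then attends over the 256 tokens; for video, the same idea extends to spatio-temporal patches.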
After submitting a PR or an Issue, you can apply to join the MiniSora contributor community and the Sora paper-reproduction group!
Speaker: Zhen Xing, Ph.D. student at the Vision and Learning Lab, Fudan University
Highlights: fundamentals of image-generation diffusion models / the development of text-to-video diffusion models / a brief look at the technology behind Sora and the challenges of reproducing it
Livestream time: 02/28 20:00-21:00
Replay: search WeChat Channels for "聊聊 Sora 之 Video Diffusion 综述"
PPT: Feishu download link
- Sora: Creating video from text. Technical report: Video generation models as world simulators
- DiT: Scalable Diffusion Models with Transformers
- Latte: Latent Diffusion Transformer for Video Generation | latte论文精读翻译.pdf (annotated Chinese translation), Latte paper walkthrough
- Updating...
| Paper | Link |
| --- | --- |
| 1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | NeurIPS 21 Paper, Github |
| 2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | CVPR 22 Paper, Github |
| 3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | NeurIPS 22 Paper, Github |
| 4) DDPM: Denoising Diffusion Probabilistic Models | NeurIPS 20 Paper, Github |
| 5) DDIM: Denoising Diffusion Implicit Models | ICLR 21 Paper, Github |
| 6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | ICLR 21 Paper, Github, Blog |
| 7) Stable Cascade: Würstchen: An efficient architecture for large-scale text-to-image diffusion models | ICLR 24 Paper, Github, Blog |
| 8) Diffusion Models in Vision: A Survey | TPAMI 23 Paper, Github |
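As a pointer into the DDPM entry above: its forward (noising) process has a closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε, so any noise level can be sampled in one step. A minimal NumPy sketch, assuming the linear beta schedule from the DDPM paper:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear schedule from the DDPM paper
alphas_bar = np.cumprod(1.0 - betas)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form, no iteration over steps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 32, 32))
xt, eps = q_sample(x0, t=999, rng=rng)
# at the final step alphas_bar is tiny, so x_t is close to pure noise
print(float(alphas_bar[999]))
```

Training then amounts to predicting `eps` from `xt` and `t`; DDIM reuses the same `alphas_bar` for deterministic sampling.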
| Paper | Link |
| --- | --- |
| 1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | CVPR 23 Paper, Github, ModelScope |
| 2) DiT: Scalable Diffusion Models with Transformers | ICCV 23 Paper, Github, ModelScope |
| 3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Paper, Github, ModelScope |
| 4) FiT: Flexible Vision Transformer for Diffusion Model | Paper, Github |
| 5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Paper, Github |
| 6) OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference | Github |
| 7) Large-DiT: Large Diffusion Transformer | Github |
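The diffusion-transformer models above all condition on the diffusion timestep, typically by turning the scalar timestep into a sinusoidal vector before feeding it to conditioning layers. A minimal NumPy sketch of the standard sinusoidal timestep embedding; the dimension and frequency base are conventional choices, not values from any specific repo:

```python
import numpy as np

def timestep_embedding(t, dim=256, max_period=10000.0):
    """Map a scalar timestep t to a `dim`-dimensional sinusoidal vector."""
    half = dim // 2
    # geometric frequency ladder, as in the Transformer positional encoding
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

emb = timestep_embedding(500)
print(emb.shape)  # (256,)
```

The resulting vector is typically passed through a small MLP and used to modulate the transformer blocks (e.g. via adaLN in DiT).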
| Paper | Link |
| --- | --- |
| 1) Animatediff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | ICLR 24 Paper, Github, ModelScope |
| 2) I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Paper, Github, ModelScope |
| 3) Imagen Video: High Definition Video Generation with Diffusion Models | Paper |
| 4) MoCoGAN: Decomposing Motion and Content for Video Generation | CVPR 18 Paper |
| 5) Adversarial Video Generation on Complex Datasets | Paper |
| 6) W.A.L.T: Photorealistic Video Generation with Diffusion Models | Paper, Project |
| 7) VideoGPT: Video Generation using VQ-VAE and Transformers | Paper, Github |
| 8) Video Diffusion Models | Paper, Github, Project |
| 9) MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | NeurIPS 22 Paper, Github, Project, Blog |
| 10) VideoPoet: A Large Language Model for Zero-Shot Video Generation | Paper |
| 11) MAGVIT: Masked Generative Video Transformer | CVPR 23 Paper, Github, Project, Colab |
| 12) EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Paper, Github, Project |
| 13) SimDA: Simple Diffusion Adapter for Efficient Video Generation | Paper, Github, Project |
| 14) StableVideo: Text-driven Consistency-aware Diffusion Video Editing | ICCV 23 Paper, Github, Project |
| 15) SVD: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Paper, Github |
| 16) ADD: Adversarial Diffusion Distillation | Paper, Github |
| Paper | Link |
| --- | --- |
| 1) World Model on Million-Length Video And Language With RingAttention | Paper, Github |
| 2) Ring Attention with Blockwise Transformers for Near-Infinite Context | Paper, Github |
| 3) Extending LLMs' Context Window with 100 Samples | Paper, Github |
| 4) Efficient Streaming Language Models with Attention Sinks | ICLR 24 Paper, Github |
| 5) The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey | Paper |
| 6) MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | CVPR 24 Paper, Github, Project |
| Paper | Link |
| --- | --- |
| 1) ViViT: A Video Vision Transformer | ICCV 21 Paper, Github |
| 2) VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | CVPR 23 Paper |
| 3) LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation | Paper, Github |
| 4) LFDM: Conditional Image-to-Video Generation with Latent Flow Diffusion Models | CVPR 23 Paper, Github |
| 5) MotionDirector: Motion Customization of Text-to-Video Diffusion Models | Paper, Github |
| Paper | Link |
| --- | --- |
| 1) Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion | Link |
| Paper | Link |
| --- | --- |
| 1) Layered Neural Atlases for Consistent Video Editing | TOG 21 Paper, Github, Project |
| 2) StableVideo: Text-driven Consistency-aware Diffusion Video Editing | ICCV 23 Paper, Github, Project |
| 3) CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Paper, Github, Project |
| Resource | Link |
| --- | --- |
| 1) Datawhale - AI Video Generation Learning | Feishu doc |
| 2) A Survey on Generative Diffusion Model | TKDE 24 Paper, Github |
| 3) Awesome-Video-Diffusion-Models: A Survey on Video Diffusion Models | Paper, Github |
| 4) Awesome-Text-To-Video: A Survey on Text-to-Video Generation/Synthesis | Github |
| 5) video-generation-survey: A reading list of video generation | Github |
| 6) Awesome-Video-Diffusion | Github |
| 7) Video Generation Task in Papers With Code | Link |
| 8) Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Paper, Github |
| 9) Open-Sora-Plan (PKU-YuanGroup) | Github |
We sincerely hope you will contribute to the MiniSora open-source community and help us make it even better!
See the contribution guide for details.