Diffusion_Journey 🔫

This document mainly serves as a categorized paper list 🐱 Our notes on papers we have read are linked beside each entry, which helps us recall a paper's main idea more quickly.

🎫 Note that
Paper information is listed in the following format:
"Paper Name" Conference/Journal/Arxiv, year month, MethodsAbbreviation
Authors(optional)
[paper link]() [code link]() [paper website link]()
[the Note link, which we made as a summary based on our understanding]()
short description(optional)
If only the paper website is listed, the paper link and code link can be found on that website.

Papers in each category are ordered first by importance (with respect to our task) and then by release time.

Emoji meanings:
- ⚠️ : no official code
- 🚧 : code is obscure
- 🗽 / ⭐ : canonical paper
- 💡 : novel thoughts
- 👍 : recommended to read first
- 🐤 : only skimmed through
GPU comparison website

CCF Rec. Conference Deadlines

get stamp for github stars

https://papers.cool/arxiv/cs.CV

Here is the Table of Contents! 📖

[TOC]

Old photo restoration

"Bringing Old Photos Back to Life" CVPR oral, 2020 Apr ⭐ paper(CVPR version) paper(TPAMI version) code website note

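Synthesizes noise on Pascal VOC (data augmentation & noise templates: 62 collected scratch texture images and 55 paper texture images); a useful reference for methods that close the domain gap between synthetic and real data.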

The Face Enhancement module uses FFHQ data.

"Time-Travel Rephotography" SIGGRAPH, 2020 Dec ⭐ paper website code pdf talk 👍

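Unsupervised! Restores grayscale face photos into color >> the "sibling" concept: exploit the good color features of a pretrained StyleGAN by first generating a similar face with StyleGAN (colors OK, identity not matching), then train a separate decoder to generate a face whose structure matches the original; trained with color transfer and contextual losses. Unsupervised training: simulate an old camera's degradation by converting RGB to grayscale and compute a reconstruction loss against the input (this assumes the dataset itself has little noise and few distortions).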
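A minimal sketch of the unsupervised reconstruction loss described above. Assumptions not taken from the paper: images are flat lists of `(r, g, b)` tuples in [0, 1], the "old camera" is modeled only as an RGB-to-grayscale projection with BT.601 luma weights, and the helper names are hypothetical.

```python
# Hypothetical helpers illustrating "simulate the old camera, then compare".
# Assumption: the degradation is a plain BT.601 luma projection; the paper's
# camera model is more elaborate.

def simulate_old_camera(rgb_pixels):
    """Project colour pixels (r, g, b) in [0, 1] to grayscale intensities."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


def l1_reconstruction_loss(predicted_rgb, observed_gray):
    """Degrade the predicted colour image and compare it with the grayscale input."""
    degraded = simulate_old_camera(predicted_rgb)
    return sum(abs(p - o) for p, o in zip(degraded, observed_gray)) / len(observed_gray)


observed = simulate_old_camera([(1.0, 0.0, 0.0)])  # a red pixel seen by the old camera
print(l1_reconstruction_loss([(1.0, 0.0, 0.0)], observed))        # red: zero loss
print(l1_reconstruction_loss([(0.299, 0.299, 0.299)], observed))  # gray with same luma: ~zero loss
```

The second call shows why this loss alone cannot fix the colors: any color with the same luma reconstructs the grayscale input equally well, which is why the StyleGAN "sibling" prior is needed.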

"Pik-Fix: Restoring and Colorizing Old Photo" WACV, 2022 May paper code pdf

Dataset available: email the authors to get a Google Drive download link >> Runsheng Xu

RealOld: 200 old photos, with GT restored by experts!
"Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer" CVPR, 2023 Apr, MROPM paper code website note

Approaches the task from a style-transfer angle: after scratch removal the photos still look old, so their style is modified.

Dataset: artifact photos collected from 3 museums in Korea; the photographic style is old, but there are no obvious scratches.
"Self-Prior Guided Pixel Adversarial Networks for Blind Image Inpainting" TPAMI, 2023 June paper pdf
"Focusing on Persons: Colorizing Old Images Learning from Modern Historical Movies" 2021 Aug, HistoryNet paper code
"DeOldify: A Review and Implementation of an Automatic Colorization Method" IPOL, 2022 Apr, DeOldify paper code pdf
"Towards Robust Blind Face Restoration with Codebook Lookup Transformer" NeurIPS, 2022 Jun, CodeFormer 🗽 paper code website

Blind face restoration SOTA; old photo restoration.

Degrade Region 🦀

Predicting scratch and rain regions; handling artifacts.

"DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models" ICML, 2023 Jul paper code blog_explanation ⚠️

Addresses artifacts of GAN-based SR. Analysis: L1 loss makes details look overly abrupt, while GAN loss tends to produce artifacts but natural-looking details; how to fuse the two losses could be a paper on its own.

"Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks" ICCV, 2023 Aug paper code pdf note Authors: Sixiang Chen, Tian Ye, Jinbin Bai, Erkang Chen, Jun Shi, Lei Zhu
"Restoring Degraded Old Films with Recursive Recurrent Transformer Networks" code
"Sakuga-42M Dataset: Scaling Up Cartoon Research"
"CLIP-DINOiser: Teaching CLIP a few DINO tricks" paper code note

CLIP's lack of spatial awareness makes it unsuitable for dense computer vision tasks, while self-supervised representation methods have demonstrated good localization properties

Takes the best of both worlds and proposes a zero-shot open-vocabulary semantic segmentation method that does not require any annotations

Old video restoration 🔥

🎯 Current Working Direction!

"DeOldify" open-sourced toolbox to restore image and video code

strong baseline in multiple papers 👍

Analog Video Restoration 🔥

paper with code: VHS (old videotape) restoration

"BasicVSR++: Improving video super-resolution with enhanced propagation and alignment" CVPR, 2021 Apr 🗿 paper code note
"Memory-Augmented Non-Local Attention for Video Super-Resolution" CVPR, 2021 Aug, MANA paper code
"Multi-Scale Memory-Based Video Deblurring" CVPR, 2022 Apr paper code
"Restoration of Analog Videos Using Swin-UNet" ACM-ICM, 2022 Oct paper ACM-paper code
"Reference-based Restoration of Digitized Analog Videotapes" WACV, 2023 Oct, TAPE paper code note Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

Video Diffusion

paper with code searching 'diffusion video' 👍

Awesome Video Diffusion

"A Survey on Video Diffusion Models" paper code

survey_video_LDM.md

text2video task: proposes a temporal adapter and an attention adapter to adapt image SD into video SD

"CoDeF: Content Deformation Fields for Temporally Consistent Video Processing" Arxiv, 2023 Aug ⭐ paper code website note

Temporally consistent video editing with very good results! Proposed as a new type of video representation, which consists of a canonical content field and a temporal deformation field
"FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling" Arxiv, 2023 Oct paper code website note
"VideoCrafter1: Open Diffusion Models for High-Quality Video Generation" Arxiv, 2023 Oct paper code note
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets" Arxiv, 2023 Nov 25, SVD paper code pdf note Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach
"MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model" CVPR, 2023 Nov 🗽 paper code website note

human image animation task, which aims to generate a video of a certain reference identity
"VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models" paper website
"VideoBooth: Diffusion-based Video Generation with Image Prompts" CVPR, 2023 Dec paper code website note
"Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution" CVPR, 2023 Dec, Upscale-A-Video paper code website note

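Splits the whole video into clips of 8 frames each and, following the SD x4 upscaler, adds noise to the LR input and uses it as the SD latent-space feature. The UNet is modified with a few temporal layers and fine-tuned, then z0 is propagated across the different clips. The updated features are fed into the VAE decoder to obtain the x4 HR output; this VAE decoder has conv3d layers added and is fine-tuned.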
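The clip splitting step can be sketched as below. `split_into_clips` is a hypothetical helper; letting the last clip be shorter is our assumption, since the note does not say how leftover frames are handled.

```python
def split_into_clips(frames, clip_len=8):
    """Split a frame sequence into consecutive clips of `clip_len` frames.

    Assumption: the last clip may be shorter; the paper may pad or overlap
    clips instead.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


clips = split_into_clips(list(range(20)))
print([len(c) for c in clips])  # [8, 8, 4]
```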

Image2Video

"Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models" Arxiv, 2023 May ⭐ paper code [website](https://controlavideo.github.io/) note
"VideoComposer: Compositional Video Synthesis with Motion Controllability" Arxiv, 2023 Jun, VideoComposer arXiv Website note
"I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models" Arxiv, 2023 Nov 7 paper code pdf note Authors: Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, Jingren Zhou
"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets" Arxiv, 2023 Nov 25 paper code pdf note Authors: Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach
"Pix2Gif: Motion-Guided Diffusion for GIF Generation" Arxiv, 2024 Mar 7 paper code pdf note Authors: Hitesh Kandala, Jianfeng Gao, Jianwei Yang
"Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model" Arxiv, 2024 Apr 15 paper code pdf note Authors: Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal
"ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation" Arxiv, 2024 Feb 6 paper code website pdf note Authors: Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen

talking video

"Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement" Arxiv, 2024 Jun 12 paper code pdf note Authors: Runyi Yu, Tianyu He, Ailing Zeng, Yuchi Wang, Junliang Guo, Xu Tan, Chang Liu, Jie Chen, Jiang Bian

Diffusion related

Paper list repos: Awesome-Diffusion-Models (contains introductory lectures for canonical papers! 👨‍🏫), awesome-diffusion-low-level-vision, image-to-image-papers

VAE blog: an excellent example of combining probabilistic graphical models with deep learning; code

Professors lujianqing, zhangmingxuan, chengqifeng, zhenglei: low level

Daniel Cohen-Or

Diffusion Blog

Su Jianlin's blog posts on understanding DDPM

Diffusion basics

"Understanding Diffusion Models: A Unified Perspective" Arxiv, 2022 Aug paper [note](./2022_08_Arxiv_Understanding Diffusion Models-A Unified Perspective_Note.md)

the basic math for diffusion model

"Denoising Diffusion Implicit Models" ICLR, 2020 Oct 6, DDIM paper code pdf note Authors: Jiaming Song, Chenlin Meng, Stefano Ermon
"Progressive Distillation for Fast Sampling of Diffusion Models" ICLR, 2022 Feb 1, v-prediction paper code pdf note Authors: Tim Salimans, Jonathan Ho

milestone 🗿

"Image-to-Image Translation with Conditional Adversarial Networks" CVPR, 2016 Nov, Pix2pix 🗿 paper code website
"Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" ICCV, 2017 Mar 30, CycleGAN ⭐ paper code website pdf note blog Authors: Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros

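Trains on unpaired data in an unsupervised way, using two generator-discriminator pairs that form a cycle; a cycle-consistency loss (L1 is sufficient) on the round trip keeps the content consistent; an identity loss keeps inputs that need no translation unchanged.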
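A sketch of the two consistency terms, using toy generators on plain lists (hypothetical stand-ins for the two CNN generators; the adversarial losses and the loss weights are omitted).

```python
def l1(a, b):
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, real_x, real_y):
    """Forward cycle x -> G(x) -> F(G(x)) and backward cycle y -> F(y) -> G(F(y))."""
    return l1(F(G(real_x)), real_x) + l1(G(F(real_y)), real_y)


def identity_loss(G, F, real_x, real_y):
    """G maps X -> Y, so it should leave a sample already in Y unchanged (F likewise for X)."""
    return l1(G(real_y), real_y) + l1(F(real_x), real_x)


def G(v):
    """Toy generator X -> Y: doubles every value."""
    return [2 * t for t in v]


def F(v):
    """Toy generator Y -> X: halves every value (exact inverse of G)."""
    return [t / 2 for t in v]


x, y = [2.0, 4.0], [1.0, 3.0]
print(cycle_consistency_loss(G, F, x, y))  # 0.0
print(identity_loss(G, F, x, y))           # 3.5
```

The toy pair inverts each other exactly, so the cycle loss is zero, yet the identity loss still penalizes them for altering inputs that are already in the target domain.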

Acceleration

"Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference" paper website Authors: Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao
"Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling" Arxiv, 2023 May paper note
"LCM-LoRA: A Universal Stable-Diffusion Acceleration Module" Arxiv, 2023 Nov paper code pdf note Authors: Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao

Stable Diffusion acceleration

- "Fast Diffusion EM: a diffusion model for blind inverse problems with application to deconvolution" code
- "Rerender A Video"; "Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling" code
"Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning" NeurIPS, 2023 Jun paper code note

Proposes a memory-efficient fine-tuning method that uses much less GPU memory than LoRA

"PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU" code

Runs inference of a 17B model on a single 4090

"Distribution-Aware Prompt Tuning for Vision-Language Models" ICCV, 2023 Sep paper code
"Nested Diffusion Processes for Anytime Image Generation" Arxiv, 2023 May paper code

Shows that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image

"Adversarial Diffusion Distillation" Arxiv, 2023 Nov 28, SD-Turbo ⭐ paper code pdf note Authors: Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach

Proposes the ADD distillation method; distilling SD-v2.1 with it yields SD-Turbo

"One-Step Image Translation with Text-to-Image Models" Arxiv, 2024 Mar 18 paper code pdf note Authors: Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu

Verifies that the one-step SD-Turbo (from Adversarial Diffusion Distillation) is capable enough for image synthesis tasks && in low-level tasks, adding encoder features into the VAE decoder alleviates information loss

"SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions" Arxiv, 2024 Mar 25 paper code pdf note Authors: Yuda Song, Zehao Sun, Xuanwu Yin

findings

"DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data" NeurIPS, 2022 Nov ⭐ paper

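Finds that as the amount of training data decreases, FID gets much worse, and the gap between the discriminator's gradients on real and generated images widens; a discriminator regularization is designed accordingly (observe a lot during experiments), identifying one cause of unstable training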

To improve the training of GANs with limited data, it is natural to reduce the DIG. We propose to use Eq. (2) as a regularizer so as to control the DIG during training. In turn, this aids to balance the discriminator’s learning speed.

When results are poor after training, investigate the possible causes!

"FreeU: Free Lunch in Diffusion U-Net" CVPR, 2023 Sep paper

improves diffusion model sample quality at no costs: no training, no additional parameter introduced, and no increase in memory or sampling time.

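Visualization shows that the U-Net encoder's skip (residual) features are mainly high-frequency and contain a lot of noise. So the UNet decoder (backbone) features are multiplied by a coefficient (increasing their weight), and the skip features are modulated in the frequency domain via FFT and IFFT before concatenation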

"FreeInit: Bridging Initialization Gap in Video Diffusion Models" CVPR, 2023 Dec paper code

The low-frequency part of the Video Diffusion noise map $z_t$ maintains the temporal consistency of the video.

How to add temporal layers

Findings

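text2video inference starts from randomly sampled Gaussian noise, whose high-frequency content is messy and causes inconsistent generation. So the trained text2video model is first used to obtain an updated z0 feature map (also treated as a kind of noise), its good low-frequency features are kept, the high-frequency part is replaced with fresh Gaussian noise to refine the initial noise, and denoising is run again.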

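Decomposing the Video Diffusion noise $z_T$ into low- and high-frequency components with FFT and progressively removing the high-frequency part, the generated videos keep a similar subject: the temporal consistency of the generated content is determined by the low-frequency information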
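A 1D sketch of the noise re-initialization step. Assumptions: the latent is a short real sequence, an ideal low-pass mask stands in for the spatio-temporal filter FreeInit applies to 3D video latents, and `renoised_latent` is the z0 estimate already diffused back to the starting noise level.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part (spectra built here are conjugate-symmetric)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def low_pass_mask(n, cutoff):
    """Ideal low-pass mask: keep frequency k if its wrapped distance to 0 is <= cutoff."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


def reinit_noise(renoised_latent, fresh_noise, cutoff):
    """Low frequencies from the re-noised latent, high frequencies from fresh noise."""
    mask = low_pass_mask(len(renoised_latent), cutoff)
    z_freq, e_freq = dft(renoised_latent), dft(fresh_noise)
    mixed = [m * z + (1.0 - m) * e for m, z, e in zip(mask, z_freq, e_freq)]
    return idft(mixed)


latent = [1.0] * 8                                     # toy latent with only a DC component
noise = [0.5, -0.5, 1.0, -1.0, 0.25, -0.25, 0.0, 0.0]  # zero-mean stand-in for fresh noise
mixed = reinit_noise(latent, noise, cutoff=0)
```

With `cutoff=0` only the mean of `latent` survives, so `mixed` equals `1.0 + noise` up to round-off; raising the cutoff keeps more of the latent's structure.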

Framework

"The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing" ICLR, 2023 Nov 2 paper code web pdf note Authors: Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li

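Finds that in the diffusion denoising process, editing the latent changes the feature distribution, which degrades the editing results; earlier ODE-based methods assumed it was still the same distribution and ignored this. Mathematically, the noise in SDE denoising turns out to be beneficial: it gradually pulls the distribution of the edited features closer to the distribution of the original image space; ODE denoising instead preserves the distribution, so if the distribution of xT has shifted, the gap cannot be closed;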
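To see where the SDE's randomness enters a sampler, here is the generalized DDIM update (DDIM is listed under Diffusion basics): η = 0 is deterministic (ODE-like), η = 1 adds fresh Gaussian noise at every step (SDE-like, DDPM). This only illustrates the two regimes, not the editing sampler proposed in this paper; the scalar-list inputs and schedule values are arbitrary.

```python
import math
import random


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev, eta, rng):
    """One generalized DDIM update x_t -> x_{t-1}; alpha_bar is the cumulative schedule.

    eta = 0: deterministic (ODE-style) sampler.
    eta = 1: DDPM-style stochastic sampler that injects fresh noise each step.
    """
    sigma = eta * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t)) * math.sqrt(
        1 - alpha_bar_t / alpha_bar_prev
    )
    # Guard against tiny negative values from floating-point round-off.
    dir_coef = math.sqrt(max(0.0, 1 - alpha_bar_prev - sigma ** 2))
    out = []
    for x, e in zip(x_t, eps_pred):
        x0_hat = (x - math.sqrt(1 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)  # predicted clean sample
        out.append(math.sqrt(alpha_bar_prev) * x0_hat + dir_coef * e + sigma * rng.gauss(0.0, 1.0))
    return out


x_t, eps = [0.3, -0.2], [0.1, 0.4]
ode_a = ddim_step(x_t, eps, 0.5, 0.8, eta=0.0, rng=random.Random(0))
ode_b = ddim_step(x_t, eps, 0.5, 0.8, eta=0.0, rng=random.Random(1))
sde_a = ddim_step(x_t, eps, 0.5, 0.8, eta=1.0, rng=random.Random(0))
sde_b = ddim_step(x_t, eps, 0.5, 0.8, eta=1.0, rng=random.Random(1))
print(ode_a == ode_b, sde_a == sde_b)  # True False
```

With η = 0 the seed is irrelevant, so an edited latent is carried along deterministically; with η = 1 every step re-injects noise, the ingredient the paper argues helps pull an edited latent back toward the data distribution.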

Generative Prior

Get prior information from large-scale models: Kelvin C.K. Chan, Yuval Alaluf

"Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation" ECCV oral&PAMI, 2020 Mar, DGP(Deep Generative Prior) 🗽 🐤 paper video 👍
DGP exploits the image prior of an off-the-shelf GAN for various image restoration and manipulation tasks. DGP is an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. We allow the generator to be fine-tuned on-the-fly in a progressive manner.

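Due to limits such as model capacity, GAN inversion can only recover an approximate latent code, and the generated result is poor; as in the Bringing Old Photos paper, the GT and GAN-generated data distributions are similar but still apart.
1. Since the generator is fine-tuned, using MSE + perceptual loss wipes out the image prior, and the results are poor.
2. **Use a discriminator loss to measure the distance to the GT distribution.** Fine-tuning the whole generator at once causes an "information lingering" artifact (colorized regions do not match the objects); the analysis is that the deep layers start aligning high-level colors before the low-level details have been handled.
Proposes Progressive Reconstruction, a fine-tuning strategy >> unfreeze the generator from shallow to deep layers and fine-tune them in turn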
- Experiment
  
  BigGAN on ImageNet is used for fine-tuning. Tasks: colorization, inpainting, SR
  
  Remove most adversarial perturbation (adversarial defense)
  
  Edits by mapping into the latent space: besides restoration, adding random noise gives jittering, and image morphing (blending two images, similar to interpolation)
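The progressive reconstruction schedule can be sketched as a generator of trainable block indices. Assumptions: block 0 is the shallowest generator block (closest to the latent), each stage adds one deeper block, and the stage length is a free hyperparameter; the joint optimization of the latent code that DGP also performs is omitted.

```python
def progressive_unfreeze_schedule(num_blocks, iters_per_stage):
    """Yield (iteration, trainable_block_ids) for shallow-to-deep progressive fine-tuning.

    Block 0 is assumed to be the shallowest generator block; each stage unfreezes
    one more, deeper block while keeping the earlier ones trainable.
    """
    iteration = 0
    for stage in range(1, num_blocks + 1):
        trainable = list(range(stage))
        for _ in range(iters_per_stage):
            yield iteration, trainable
            iteration += 1


for it, blocks in progressive_unfreeze_schedule(num_blocks=3, iters_per_stage=2):
    print(it, blocks)
```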
"PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models" CVPR, 2020 Mar 🗽 paper code blog_explanation
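Super-resolves an LR image: given a set of HR images (a manifold), any HR image whose downsampled version approximates the input LR image is taken as the SR result. The authors approximate this manifold with the latent space of a pretrained generative model $G$ (StyleGAN in this paper), turning the problem into searching the latent space for a latent code that matches the LR image. PULSE seeks a latent vector $z \in \mathcal{L}$ (the latent space) that minimizes the downscaling loss $\lVert DS(G(z)) - I_{LR} \rVert_p^p$ until it falls below $\epsilon = 10^{-3}$, where $I_{SR} = G(z)$ is the generator output and $DS$ denotes downsampling.
- Drawback: inference is slow, since the latent space must be searched iteratively for a suitable latent code.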
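A toy version of the latent search, assuming a hypothetical 2D-latent linear "generator", average pooling as $DS$, $p = 2$, and finite-difference gradient descent. Real PULSE constrains $z$ to a hypersphere and uses a pretrained StyleGAN; both are left out here.

```python
def downscale(hr, factor=2):
    """DS(.): average-pool a 1D 'image' by `factor`."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]


def toy_generator(z):
    """Hypothetical stand-in for a pretrained G: 2 latent dims -> 4 HR pixels."""
    return [z[0], z[0] + z[1], z[1], z[1] - z[0]]


def downscaling_loss(z, lr_image, p=2):
    """||DS(G(z)) - I_LR||_p^p."""
    return sum(abs(a - b) ** p for a, b in zip(downscale(toy_generator(z)), lr_image))


def latent_search(lr_image, steps=2000, step_size=0.05, eps=1e-3, h=1e-6):
    """Gradient descent on z (finite-difference gradients) until the loss drops below eps."""
    z = [0.0, 0.0]
    for _ in range(steps):
        loss = downscaling_loss(z, lr_image)
        if loss < eps:
            break
        grad = []
        for i in range(len(z)):
            z_shift = list(z)
            z_shift[i] += h
            grad.append((downscaling_loss(z_shift, lr_image) - loss) / h)
        z = [zi - step_size * g for zi, g in zip(z, grad)]
    return z, toy_generator(z)  # (latent, I_SR = G(z))


z, sr = latent_search([1.0, 0.5])
print(downscaling_loss(z, [1.0, 0.5]) < 1e-3)  # True
```

The loop also illustrates the drawback noted above: every SR result costs an iterative optimization rather than a single forward pass.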
"Blind Face Restoration via Deep Multi-scale Component Dictionaries"
"PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior" Arxiv, 2021 Jun paper website
"Diffusion models as plug-and-play priors" NeurIPS, 2022 Jun paper code
"GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond" TPAMI, 2022 Jul ⭐ paper code note

Uses the prior of a large StyleGAN model, extracting some of its features to assist SR. The same approach can be applied to diffusion.

"Adaptive Diffusion Priors for Accelerated MRI Reconstruction" Arxiv, 2022 Jul paper
"ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal" CVPR, 2022 Dec ⚠️ paper code
"Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models" paper
"CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection" ICCV, 2023 Jan paper code
"Generative Diffusion Prior for Unified Image Restoration and Enhancement" CVPR, 2023 Apr paper website

Reference for how to use degradation information as a prior
"Learning a Diffusion Prior for NeRFs" Arxiv, 2023 Apr paper
"Exploiting Diffusion Prior for Real-World Image Super-Resolution" Arxiv, 2023 May paper website code note
"Hierarchical Integration Diffusion Model for Realistic Image Deblurring" NIPS-spotlight, 2023 May paper code note

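Uses an encoder-decoder backbone (Restormer) and injects diffusion prior features at the start of each scale as KV (the proposed HIM block); two-stage training: stage 1 trains the latent encoder (LE encoder) used for diffusion, without training the diffusion model, feeding its feature z into the backbone and supervising in image space; stage 2 freezes the zT encoder and trains the condition encoder + diffusion + HIM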

"ConceptLab: Creative Generation using Diffusion Prior Constraints" Arxiv, 2023 Aug paper website
"DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior" Arxiv, 2023 Aug 🗽 paper code website note

Uses a diffusion prior for real-world restoration
"Are Diffusion Models Vision-And-Language Reasoners" code

Uses a pretrained diffusion model and designs an image-text matching module that handles most image-text matching tasks 👍
"Learning Dual Memory Dictionaries for Blind Face Restoration" paper code
"DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior" CVPR, 2023 Oct paper website note

Trains a novel-view-synthesis diffusion model whose outputs assist 3D model generation; uses Stable Diffusion with a VSD loss to refine details?
"Text-to-Image Generation for Abstract Concepts" AAAI, 2023 Sep paper note

text2image for abstract concepts: decomposes understanding into levels (object, form) to optimize the prompt

edit

"Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance" 2022 Oct paper code
"Localizing Object-level Shape Variations with Text-to-Image Diffusion Models" Arxiv, 2023 Mar paper code note pdf Authors: Or Patashnik, Daniel Garibi, Idan Azuri, Hadar Averbuch-Elor, Daniel Cohen-Or

By adjusting the denoising steps, edits a specified object without changing the other objects
"Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition" WACV, 2023 Mar paper code note

Reference for how to handle blurry text in image restoration ⭐
"LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On" ACMM, 2023 May paper code

Preserves the background outside the edited region. See also "Improving Diffusion Models for Virtual Try-on"

"TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" ICCV, 2023 Jul paper code website blog

Diffusion-based, training-free cross-domain image composition
"Editing Implicit Assumptions in Text-to-Image Diffusion Models" CVPR, 2023 Aug, TIME paper code note

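Takes the original prompt and a prompt with an added editing attribute (e.g. an extra adjective) and performs the edit by modifying Stable Diffusion's cross-attention projection matrices, with a loss function constraining the projections of the two text embeddings to be close.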

This editing adjusts the projection matrices that SD applies to the original text prompt, in order to remove biases in the training data

For example, in the original SD training data "A CEO" is almost always male; editing toward "A female CEO" adjusts the projection matrices and reduces the bias

Takeaway: derive a closed-form global optimum of the loss, so no further training is needed
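A worked form of the closed-form update, as we understand the TIME formulation (symbols are ours): each token embedding $c_i$ of the source prompt should be mapped by the edited projection $W$ to a target $v_i^{*} = W_{\text{old}}\, c_i^{*}$, where $c_i^{*}$ is the embedding of the corresponding token in the edited (e.g. "female CEO") prompt, and $\lambda$ keeps $W$ close to the original matrix. The objective is a ridge regression, so setting its gradient to zero gives the optimum directly.

```latex
\begin{aligned}
&\min_{W}\; \sum_{i} \bigl\lVert W c_i - v_i^{*} \bigr\rVert_2^2
  + \lambda \bigl\lVert W - W_{\text{old}} \bigr\rVert_F^2 \\
&\nabla_W:\; 2\sum_i \bigl(W c_i - v_i^{*}\bigr) c_i^{\top} + 2\lambda \bigl(W - W_{\text{old}}\bigr) = 0 \\
&\Rightarrow\; W = \Bigl(\lambda W_{\text{old}} + \sum_i v_i^{*} c_i^{\top}\Bigr)
  \Bigl(\lambda I + \sum_i c_i c_i^{\top}\Bigr)^{-1}
\end{aligned}
```

For $\lambda > 0$ the matrix $\lambda I + \sum_i c_i c_i^{\top}$ is positive definite, so the inverse always exists and the edit is a single matrix computation per projection.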

"Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting" Arxiv, 2023 Nov paper code note
"DiffiT: Diffusion Vision Transformers for Image Generation" CVPR, 2023 Dec 🐤 paper code

Introduces a new time-dependent self-attention module that lets attention layers efficiently adapt their behavior to the different stages of the denoising process
"Reference-based Image Composition with Sketch via Structure-aware Diffusion Model" Arxiv, 2023 Mar paper code pdf note Authors: Kangyeol Kim, Sunghyun Park, Junsoo Lee, Jaegul Choo
"Ablating Concepts in Text-to-Image Diffusion Models" paper

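Content generated by large models raises copyright issues, e.g. generating Snoopy. The goal is to remove such copyrighted concepts or images without retraining the model from scratch.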

Image motion editing

"The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing" ICLR, 2023 Nov 2 paper code web pdf note Authors: Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li

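Analyzes the diffusion denoising formulations (SDE, ODE) and verifies that **editing the zt features can change their distribution, and SDE, thanks to its noise, can close this distribution gap during denoising!** ODE keeps the distribution the same before and after noising/denoising, so once editing shifts the feature distribution, the denoised distribution is shifted too, and the image degrades;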

"COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing" Arxiv, 2024 Jun 13 paper code pdf note Authors: Jiangshan Wang, Yue Ma, Jiayi Guo, Yicheng Xiao, Gao Huang, Xiu Li

ID

"PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding" Arxiv, 2023 Dec, PhotoMaker paper code note pdf Authors: (TencentARC) Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

light

"DiFaReli: Diffusion Face Relighting" ICCV, 2023 Apr paper website code pdf note Authors: Puntawat Ponglertnapakorn, Nontawat Tritrong, Supasorn Suwajanakorn

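DiffAE + DDIM decouple an image into a high-level feature $z_{sem}$ and a low-level feature xT obtained deterministically from the image (a DDIM property: an image maps uniquely to xT), giving excellent reconstruction
For editing tasks with little data, train with feature decomposition + self-reconstruction, then edit the features at test time (how is this editing done?)
Similar to StyleGAN style features: does the 1x512 feature from the Semantic Encoder contain enough information?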

The reverse process to obtain xT is key to reproducing high-frequency details from the input image ⭐

The condition only plays an auxiliary role; xT must retain a lot of low-level information, so xT is the key to reconstruction quality!
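A sketch of why xT preserves low-level detail: deterministic DDIM inversion followed by DDIM sampling returns the input. `constant_eps` is a hypothetical noise predictor that ignores its input, which makes the round trip exact; with a real network the inversion is approximate because it reuses the noise estimate from the previous step. DiffAE's extra conditioning on $z_{sem}$ is omitted.

```python
import math


def _x0_hat(x, eps, alpha_bar):
    """Predicted clean sample from x_t and the noise estimate."""
    return [(xi - math.sqrt(1 - alpha_bar) * ei) / math.sqrt(alpha_bar) for xi, ei in zip(x, eps)]


def ddim_invert(x0, eps_model, alpha_bars):
    """Deterministic DDIM inversion x_0 -> x_T; alpha_bars runs from 1.0 down toward 0."""
    x = list(x0)
    for a_cur, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        eps = eps_model(x)
        x = [math.sqrt(a_next) * h + math.sqrt(1 - a_next) * e
             for h, e in zip(_x0_hat(x, eps, a_cur), eps)]
    return x


def ddim_sample(x_T, eps_model, alpha_bars):
    """Deterministic DDIM sampling (eta = 0) x_T -> x_0 over the same schedule."""
    x = list(x_T)
    for a_cur, a_prev in zip(reversed(alpha_bars[1:]), reversed(alpha_bars[:-1])):
        eps = eps_model(x)
        x = [math.sqrt(a_prev) * h + math.sqrt(1 - a_prev) * e
             for h, e in zip(_x0_hat(x, eps, a_cur), eps)]
    return x


def constant_eps(x):
    """Hypothetical noise predictor that ignores its input (makes inversion exact)."""
    return [0.3, -0.2, 0.1]


schedule = [1.0, 0.9, 0.6, 0.3]
x0 = [0.5, -1.0, 2.0]
x_T = ddim_invert(x0, constant_eps, schedule)
print(ddim_sample(x_T, constant_eps, schedule))  # ≈ [0.5, -1.0, 2.0]
```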
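Conditioning methods:
1. Extract effective image features with a pretrained model ⭐
2. Use a ControlNet-like approach (copy the UNet encoder) to predict a weight that is multiplied onto the res-block output (AdaIN-style)
3. Directly concat, using an MLP + SiLU combination to extract the feature vector: OK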

restoration

"SketchFFusion: Sketch-guided image editing with diffusion model" CVPR, 2023 Apr paper
"SinDDM: A Single Image Denoising Diffusion Model" ICML, 2022 Nov paper code

Multi-scale DDPM denoising

DA

"Effective Data Augmentation With Diffusion Models" NIPS, 2023 Feb paper code note

Img2Video

"Understanding Object Dynamics for Interactive Image-to-Video Synthesis" Arxiv, 2021 Jun paper code
"iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis" Arxiv, 2021 Jul paper code
"Conditional Image-to-Video Generation with Latent Flow Diffusion Models" CVPR, 2023 Mar paper note

latent flow diffusion models (LFDM)
I2VGen-XL (image-to-video / video-to-video)
"DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion" 2023 Apr note
"VideoCrafter1: Open Diffusion Models for High-Quality Video Generation" Arxiv, 2023 Oct paper code note
"Any-to-Any Generation via Composable Diffusion" NIPS, 2023 May paper website

Converts between multiple modalities

3D

"Adding 3D Geometry Control to Diffusion Models" Arxiv, 2023 Jun paper
Goal: understanding the underlying 3D world of 2D images. Existing challenges:
1. inability to control the 3D properties of the object
2. difficulty in obtaining ground-truth 3D annotations of objects
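Uses an edge map as the visual condition; for text, uses tags and categories, enriching the category labels with LLM-generated descriptions before use. ControlNet is used for guidance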

Text specific

"Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition" WACV, 2023 Mar paper code note

Reference for how to handle blurry text in image restoration ⭐
"AnyText: Multilingual Visual Text Generation And Editing" ICLR-Spotlight, 2023 Nov paper code pdf note

Authors: (Alibaba Group) Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie

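A reference for designing task-specific modules: to generate text images, first lay the text out directly as an image;
1. A task-specific pretrained encoder (OCR encoder) plus an MLP is fused with the original text features; it can also be seen as replacing some poor features in the original embedding with these better ones! ⭐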
  
  pre-trained visual model, specifically the recognition model of PP-OCRv3 Li et al.
2. ControlNet's initial conditions are limited: add extra conditions (text position mask, simple text layout)!
Design losses for the task!

GAN

"WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images" Arxiv, 2017 Feb, WaterGAN paper

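Addresses the color cast of underwater images (color correction); combines a camera model to train a GAN generator that synthesizes underwater-style images (green background), while a small number of real underwater photos are used only for testing; first predicts image depth to get RGB-D images, then trains a UNet for color correction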

"MSG-GAN: Multi-Scale Gradients for Generative Adversarial Networks" CVPR, 2020 paper

Extracts GAN generator features at multiple scales to enhance the details of generated images

stable-training

genetic algorithm

"EvoGAN: An Evolutionary Computation Assisted GAN" paper
"Evolutionary Generative Adversarial Networks", E-GAN paper
"Annealing Genetic GAN for Minority Oversampling" paper
"CDE-GAN: Cooperative dual evolution-based generative adversarial network" website

Mamba 🐍

mamba_note_collection

Image restoration

Awesome-diffusion-model-for-image-processing Awesome-CVPR2023-Low-Level-Vision

"Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)" TMI, 2017 Feb 🗿 paper code

Medical CT denoising (noisy-GT pairs); the model structure is very simple
"Deep Image Prior" CVPR, 2017 Nov 29, DIP paper code website pdf note blog Authors: Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

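Unsupervised: when a NN fits a single low-quality image, intermediate iterations can output something close to the restored image; the noisier the image, the harder it is for the NN to fit (impedance to noise)

"Double-DIP: Unsupervised Image Decomposition via Coupled Deep-Image-Priors" CVPR, 2018 Dec 2, Double-DIP paper code website pdf note Authors: Yossi Gandelsman, Assaf Shocher, Michal Irani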

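DIP proposes using the prior inherent in the NN itself during training: fitting only a single low-quality image is enough for image restoration, and the method applies to super-resolution, inpainting and various other tasks;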

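DoubleDIP treats various low-level tasks as layer decomposition: an image is seen as a superposition of multiple layers, each learned with DIP. It applies to many downstream tasks, e.g. dehazing decomposes an image into a clear layer and a haze layer; decomposing video transitions; video segmentation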
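The layer model behind Double-DIP, written out for flattened images. In Double-DIP each of the two layers and the mask comes from its own DIP network; here they are passed in directly. The dehazing case plugs in the standard atmospheric scattering model $I = tJ + (1 - t)A$ (the same model used by DCP), which is the two-layer form with a constant airlight layer.

```python
def compose_layers(layer1, layer2, mask):
    """I = m * y1 + (1 - m) * y2, applied pixel-wise (flattened images)."""
    return [m * a + (1.0 - m) * b for a, b, m in zip(layer1, layer2, mask)]


def hazy_image(clear, airlight, transmission):
    """Haze model I = t * J + (1 - t) * A: the second layer is a constant airlight A."""
    return compose_layers(clear, [airlight] * len(clear), transmission)


print(hazy_image([0.2, 0.4], airlight=1.0, transmission=[1.0, 0.5]))
```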

findings

"Deep Image Prior" CVPR, 2017 Nov 29 ⭐ paper code pdf note blog Authors: Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky

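Fits a single low-quality image with a randomly initialized NN and finds that the network's own prior during iteration yields a good restoration as long as the number of iterations is controlled (the output starts as noise; around 100 iterations it is close to a restored image; by 1k iterations it has fit too well and outputs the noisy original);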

Colorization

"Deep Exemplar-based Colorization" SIGGRAPH, 2018 Jul 🗽 paper code
"DeOldify: A Review and Implementation of an Automatic Colorization Method" IPOL, 2022 Apr, DeOldify paper
"DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders" ICCV, 2022 Dec, DDColor paper code note

Unsupervised

"Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network" TIP, 2020 Dec, UEGAN paper note

unsupervised image enhancement GAN

Reference for how to design the encoder and decoder
"Time-Travel Rephotography" SIGGRAPH, 2020 Dec ⭐ paper website code talk 👍 pdf
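Unsupervised! Restores grayscale face photos into color >> the "sibling" concept: exploit the good color features of a pretrained StyleGAN by first generating a similar face with StyleGAN (colors OK, identity not matching), then train a separate decoder to generate a face whose structure matches the original; trained with color transfer and contextual losses. Unsupervised training: simulate an old camera's degradation by converting RGB to grayscale and compute a reconstruction loss against the input (this assumes the dataset itself has little noise and few distortions)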

HWFD dataset: 100+ grayscale photos of celebrity faces, downloadable
- ❓ Color Transfer Loss
"RefineDNet: A Weakly Supervised Refinement Framework for Single Image Dehazing" TIP, 2021 Mar, RefineDNet paper code pdf note Authors: Shiyu Zhao; Lin Zhang; Ying Shen; Yicong Zhou

Incorporates perception into image fusion; a reference for designing feature fusion

"Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer" CVPR, 2023 Apr, MROPM paper code website note

Approaches the task from a style-transfer angle: after scratch removal the photos still look old, so their style is modified.

Dataset: artifact photos collected from 3 museums in Korea; the photographic style is old, but there are no obvious scratches.

Plug-and-Play

"Denoising Diffusion Models for Plug-and-Play Image Restoration" CVPRW, 2023 May, DiffPIR⭐ paper code website note

Multi-task Restoration
"Deep Optimal Transport: A Practical Algorithm for Photo-realistic Image Restoration" Arxiv, 2023 Jun, DOT-Dmax paper code

Post-processing method that further improves metrics: control the perceptual quality and/or the mean square error (MSE) of any pre-trained model, trading one for the other at test time

Blind-Restoration

Survey

"Survey on Deep Face Restoration: From Non-blind to Blind and Beyond" 2023 Sep paper local_pdf

"Blind Image Super-Resolution: A Survey and Beyond" paper

"Blind Face Restoration via Deep Multi-scale Component Dictionaries" ECCV, 2020 Aug paper
"Towards Robust Blind Face Restoration with Codebook Lookup Transformer" NeurIPS, 2022 Jun, CodeFormer 🗽 paper code website

Blind face restoration SOTA; old photo restoration.
"CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior" Arxiv, 2023 Jan paper website
"RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors" CVPR, 2023 Apr ⭐ paper
"RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs" TPAMI, 2023 Aug code
"Context-Aware Pretraining for Efficient Blind Image Decomposition" CVPR, 2023 Jun, CPNet ⚠️ paper code note
1. To avoid information leakage, the GT pretext branch uses a masked noisy image instead of the GT image

inpainting

"Rethinking Image Inpainting via a Mutual Encoder-Decoder with Feature Equalizations" ECCV oral, 2020 Jul paper code

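Shallow layers extract detail features (texture); deeper layers, with growing receptive fields, mainly extract semantic information. A Transformer-like feature fusion module fuses the detail and semantic features from the encoder. Pay attention to the encoder and decoder design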
"SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations" 2021 Aug paper

deblur

"Restormer: Efficient transformer for high-resolution image restoration" CVPR, 2021 Nov, Restormer 🐤 paper

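Keeps the UNet structure and replaces every block with a Transformer block. Two attention-like modules: the first replaces the MLP with depth-wise convolution, said to process channels separately and enrich the spatial HxW features; the second is roughly a CBAM-like channel/spatial attention.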
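For reference, Restormer's attention module (MDTA) computes attention across channels instead of pixels, so its attention map is C x C and the cost grows linearly with image size. A minimal sketch of that transposed attention on plain lists, assuming Q, K, V are already computed; the paper's depth-wise convolutions, L2 normalization of Q and K, multi-head split and learnable temperature are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transposed_attention(q, k, v, temperature=1.0):
    """Channel ('transposed') attention: q, k, v are C lists of N pixel values.

    The attention map is C x C (channels attend to channels), so its size does
    not depend on the number of pixels N.
    """
    attn = [softmax([sum(a * b for a, b in zip(q_c, k_d)) / temperature for k_d in k]) for q_c in q]
    n_pixels = len(v[0])
    return [[sum(w * v_d[n] for w, v_d in zip(row, v)) for n in range(n_pixels)] for row in attn]


q = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # 2 channels, 3 pixels
k = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = transposed_attention(q, k, v)
print(len(out), len(out[0]))  # 2 3
```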

"Stripformer: Strip transformer for fast image deblurring" ECCV, 2022 Apr, 🐤 paper

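Attention in pixel space within vertical & horizontal strips, pixel by pixel (intra-strip), then across vertical & horizontal strips, strip by strip (inter-strip)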
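A toy sketch of the two attention stages on a grid of scalar pixels: intra-strip (each pixel attends along its row and its column) followed by inter-strip (whole rows attend to each other as tokens). Simplifications: no learned projections, averaging instead of Stripformer's channel split between directions, and only horizontal strips at the inter-strip stage.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Toy self-attention over scalar tokens (query = key = value = the token)."""
    return [sum(w * v for w, v in zip(softmax([q * k for k in tokens]), tokens)) for q in tokens]


def intra_strip_attention(grid):
    """Pixels attend within their row (horizontal strip) and within their column (vertical strip)."""
    horizontal = [attend(row) for row in grid]
    vertical = [list(r) for r in zip(*[attend(list(col)) for col in zip(*grid)])]
    # Stripformer splits channels between the two directions; with scalar pixels we average.
    return [[(h + v) / 2 for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


def inter_strip_attention(grid):
    """Each whole row is one token; rows attend to each other (the vertical case is symmetric)."""
    weights = [softmax([sum(a * b for a, b in zip(r, s)) for s in grid]) for r in grid]
    width = len(grid[0])
    return [[sum(w * s[j] for w, s in zip(ws, grid)) for j in range(width)] for ws in weights]


grid = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
out = inter_strip_attention(intra_strip_attention(grid))
print(len(out), len(out[0]))  # 3 4
```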

"Hierarchical Integration Diffusion Model for Realistic Image Deblurring" NIPS-spotlight, 2023 May paper code note

dehaze

AwesomeDehazing

"Single image haze removal using dark channel prior" CVPRBestPaper&TPAMI, 2009, DCP paper blog code
"Aerial Image Dehazing with Attentive Deformable Transformers" WACV, 2023 ⭐ paper code

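Each of the self-attention Q, K, V features passes separately through SE (spatial) attention + deformable offsets (computed on its own); compares different types of deformable operations; slightly adapting the deformable op to each task gains about 1 dB PSNR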

"RefineDNet: A Weakly Supervised Refinement Framework for Single Image Dehazing" TIP, 2021 Mar paper code note

Incorporates perception into image fusion; a reference for designing feature fusion

"RIDCP: Revitalizing Real Image Dehazing via High-Quality Codebook Priors" CVPR, 2023 Apr ⭐ paper

reference

"RealFill: Reference-Driven Generation for Authentic Image Completion" Arxiv, 2023 Sep 28 paper code pdf note Authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein

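Like DreamBooth, fine-tunes a diffusion model with a few images to learn the scene of the target image; the reference images & target image are masked to fine-tune the diffusion model; regions of the diffusion output that should match the original come out blurry, so the mask is blurred and alpha is used to blend the generated result with the original; diffusion outputs vary a lot across random seeds at inference, so a pretrained dense-correspondence model is used to select the better generations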
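The mask blur plus alpha blending step can be sketched as below. A box blur stands in for whatever blur kernel RealFill actually uses, and the helper names are hypothetical; `mask` is 1 where content is generated.

```python
def box_blur(mask, radius):
    """Average each mask value over a (2r+1) x (2r+1) window clipped at the borders."""
    h, w = len(mask), len(mask[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def alpha_composite(generated, original, mask, radius=1):
    """Blur the fill mask, then alpha-blend the generated image over the original."""
    alpha = box_blur(mask, radius)
    return [
        [a * g + (1.0 - a) * o for a, g, o in zip(a_row, g_row, o_row)]
        for a_row, g_row, o_row in zip(alpha, generated, original)
    ]


mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
generated = [[1.0] * 5 for _ in range(5)]
original = [[0.0] * 5 for _ in range(5)]
out = alpha_composite(generated, original, mask)
print(out[2][2], out[0][0])  # 1.0 0.25
```

Inside the filled region the generated pixels are kept as-is, while near the mask boundary the two images are blended, hiding the seam.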

"AnyDoor: Zero-shot Object-level Image Customization" CVPR, 2023 Jul 18 paper code pdf note Authors: Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

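Uses a pretrained DINOv2 to provide detail features; DINOv2 has both global and patch features, and concatenating them and passing them through a learnable MLP aligns them with the UNet feature space ⭐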

When pasting the object, use high-frequency features rather than the image itself, to avoid generated images that do not fit

Even with all the tricks, detail consistency is still insufficient, e.g. text gets distorted

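DINO or CLIP features are important as the foundation for generating the object: without them the object comes out completely different; detail inconsistency needs an additional constraint from high-frequency features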

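Finds that using more multi-pose objects from videos early in training improves the detail consistency of generated objects and alleviates color shift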

Comparing DINO and CLIP for extracting object features

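DINO features capture object details better than CLIP features, but they need preprocessing: only by first extracting the object with a segmentation map and then extracting features do the results come close to the original object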

CLIP features give rather poor results, possibly due to heavy background interference

"Zero-shot Image Editing with Reference Imitation" Arxiv, 2024 Jun 11, MimicBrush paper code pdf note Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

ControlNet can learn dense correspondence; inpainting is done based on that correspondence

Image Control/Edit

Image SR

"Image Super-Resolution Using Very Deep Residual Channel Attention Networks" ECCV, 2018 Jul, RCAN 🗽 paper
"SRDiff: Single image super-resolution with diffusion probabilistic models" Neurocomputing, 2021 Apr paper code
"OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer" CVPR, 2023 Feb paper code

Deformable attention for image SR
"DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models" ICML, 2023 Jul paper code blog_explanation

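Addresses artifacts of GAN-based SR. Analysis: L1 loss makes details look overly abrupt, while GAN loss tends to produce artifacts but natural-looking details; how to fuse the two losses could be a paper on its own.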

"Dual Aggregation Transformer for Image Super-Resolution" ICCV, 2023 Aug paper code

block-based

"MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" ICML, 2023 Feb 16 paper code pdf note Authors: Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel
"Mixture of diffusers for scene composition and high resolution image generation"
"Orthogonal Adaptation for Modular Customization of Diffusion Models" CVPR, 2023 Dec paper
"Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis" CVPR, 2023 Unknown paper
"Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer" Arxiv, 2024 May 7 paper code pdf note Authors: Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

Addresses arbitrary resolutions and consistency across multiple patches

RealSR

"Exploiting Diffusion Prior for Real-World Image Super-Resolution" Arxiv, 2023 May, StableSR paper code website pdf
"Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization" CVPR, 2023 Aug, PASD paper code note
"SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution" Arxiv, 2023 Nov ⭐ paper code note

Fine-tunes Stable Diffusion
"Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild" 24.01
"Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution" 24.01
"Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models" 24.04 paper
"AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation" 24.05.23
"CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution" paper

Video Editing ✂️

"Layered Neural Atlases for Consistent Video Editing" SIGGRAPH, 2021 Sep 🗽 paper website

NeRF-like representation for video. See also "Blind Video Deflickering by Neural Filtering with a Flawed Atlas" (video deflickering)
"Stitch it in Time: GAN-Based Facial Editing of Real Videos" SIGGRAPH, 2022 Jan, STIT paper code website note
"Pix2Video: Video Editing using Image Diffusion" Arxiv, 2023 Mar ⚠️ paper code website
"ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing" Arxiv, 2023 May paper code website
"Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer" Arxiv, 2023 May ⚠️ paper code
"TokenFlow: Consistent Diffusion Features for Consistent Video Editing" Arxiv, 2023 Jul ⭐ ⚠️ paper code website

Generated videos are temporally consistent and the results are very good; uses a video atlas.
"StableVideo: Text-driven Consistency-aware Diffusion Video Editing" ICCV, 2023 Aug paper code
"CoDeF: Content Deformation Fields for Temporally Consistent Video Processing" Arxiv, 2023 Aug ⭐ paper code website note

Consistent video editing with excellent results! Proposed as a new type of video representation, which consists of a canonical content field
"Generative Image Dynamics" Arxiv, 2023 Sep paper website

Uses an LDM to simulate the physical motion of objects in an image in an interactive way.

Video Inpainting 😷

"Learning Joint Spatial-Temporal Transformations for Video Inpainting" ECCV, 2020 Jul, STTN 🗽 🐤 paper code pdf note

The first work to apply Transformers to video inpainting; it builds a spatial-temporal Transformer to complete videos.
"Decoupled Spatial-Temporal Transformer for Video Inpainting" Arxiv, 2021 Apr, DSTT paper code
"FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting" ICCV, 2021 Sep 🗽 🐤 paper code pdf
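1. FuseFormer fuses information at the patch level by extracting overlapping patches. Its main contribution is the SoftSplit + SoftComposite (SS + SC) operation, which replaces the Transformer's standard feed-forward network. Figure 1 shows that overlapping patch extraction (SS + SC) fuses neighboring patches well: each overlapped region aggregates information from many tokens, which helps produce smooth boundaries and enlarges the receptive field. SS splits the features into overlapping patches; SC directly sums the features of the originally overlapping regions (normalization is required).
2. The encoder-decoder and the discriminator follow STTN; the difference is a few extra convolution layers between the encoder and the first Transformer block.
3. Testing at arbitrary resolutions is an open issue.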
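A minimal NumPy sketch of the SoftSplit / SoftComposite idea above, assuming a single-channel 2D map, square patches, and a stride smaller than the patch size so that patches overlap. `soft_split` and `soft_composite` are hypothetical names; FuseFormer itself operates on multi-channel tokens with learned linear layers around these ops (in PyTorch the same pattern maps onto `unfold` / `fold`). Dividing by the per-pixel overlap count is the normalization the note mentions.

```python
import numpy as np

def soft_split(x, patch, stride):
    """SS: split a 2D map into overlapping patches. Returns (num_patches, patch * patch)."""
    h, w = x.shape
    patches = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            patches.append(x[i:i + patch, j:j + patch].reshape(-1))
    return np.stack(patches)

def soft_composite(patches, shape, patch, stride, normalize=True):
    """SC: sum overlapping patches back into a 2D map, then divide by the overlap count."""
    h, w = shape
    out = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    k = 0
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            out[i:i + patch, j:j + patch] += patches[k].reshape(patch, patch)
            count[i:i + patch, j:j + patch] += 1.0
            k += 1
    if normalize:
        out = out / np.maximum(count, 1.0)
    return out, count

x = np.arange(36, dtype=float).reshape(6, 6)
p = soft_split(x, patch=4, stride=2)                  # 4 overlapping 4x4 patches
rec, count = soft_composite(p, x.shape, patch=4, stride=2)
```

Without `normalize=True`, a pixel covered by k patches comes back k times too large, which is why SC needs the normalization.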
"Towards An End-to-End Framework for Flow-Guided Video Inpainting" CVPR, 2022 Apr, E2FGVI 🗽 ⭐ paper code note

End2End: optical-flow completion + feature propagation + content hallucination (implemented with a Focal Transformer)
"PS-NeRV: Patch-wise Stylized Neural Representations for Videos" ICIP, 2022 Aug paper
"Deficiency-Aware Masked Transformer for Video Inpainting" Arxiv, 2023 Jul 🗽⚠️ paper code
"Hierarchical Masked 3D Diffusion Model for Video Outpainting" Arxiv, 2023 Sep paper website note
"ProPainter: Improving Propagation and Transformer for Video Inpainting" ICCV, 2023 Sep 🗽 paper code pdf
- Encoder: We use an image encoder with the same structure as previous works (E2FGVI, FuseFormer)
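- Feature propagation: adds a consistency filtering mechanism; only the regions that pass the check are warped, and the remaining regions keep their original features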
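A NumPy sketch of consistency-filtered propagation, assuming the check is a forward-backward flow check (the backward flow sampled at the target should cancel the forward flow) and using nearest-neighbor sampling instead of bilinear warping for brevity. The function names and the `thresh=1.0` pixel threshold are hypothetical, not ProPainter's actual code. Flows are stored as (dx, dy).

```python
import numpy as np

def sample_nearest(img, flow):
    """Sample img (H, W, C) at p + flow(p), rounding to the nearest pixel and clamping at borders."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    y2 = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    x2 = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    return img[y2, x2]

def consistency_mask(flow_fw, flow_bw, thresh=1.0):
    """Forward-backward check: flow_fw(p) + flow_bw(p + flow_fw(p)) should be close to 0."""
    bw_at_target = sample_nearest(flow_bw, flow_fw)
    err = np.linalg.norm(flow_fw + bw_at_target, axis=-1)
    return err < thresh

def propagate(feat_cur, feat_nbr, flow_fw, flow_bw, thresh=1.0):
    """Warp neighbor features only where the flows agree; keep the current features elsewhere."""
    warped = sample_nearest(feat_nbr, flow_fw)
    valid = consistency_mask(flow_fw, flow_bw, thresh)
    return np.where(valid[..., None], warped, feat_cur), valid
```

Keeping the original features where the check fails avoids pasting content from occluded or badly estimated regions.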
"CIRI: Curricular Inactivation for Residue-aware One-shot Video Inpainting" ICCV, 2023 ⚠️ paper code
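One-shot inpainting (the region to be masked is given only in the first frame); propagates the initial target to the other frames
1. Curricular inactivation replaces the hard masking mechanism and predicts the mask dynamically
  
  The predicted mask regions are inconsistent in detail across frames, so a contextual loss is used as a constraint :star:
2. For the problem that only part of the target region gets inpainted, proposes an online residue removal method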
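A NumPy sketch of the contextual loss (Mechrez et al., 2018) in its commonly used form, assuming features are given as (N, C) arrays of feature vectors. The bandwidth `h=0.5` and `eps=1e-5` are assumed defaults; CIRI's exact variant and feature extractor may differ. The key property for misaligned mask regions is that each target feature is matched to its most similar source feature, regardless of spatial position.

```python
import numpy as np

def contextual_loss(x, y, h=0.5, eps=1e-5):
    """x: (N, C) source features, y: (M, C) target features. Returns -log CX(x, y)."""
    mu = y.mean(axis=0, keepdims=True)              # center both sets on the target mean
    xc, yc = x - mu, y - mu
    xn = xc / (np.linalg.norm(xc, axis=1, keepdims=True) + eps)
    yn = yc / (np.linalg.norm(yc, axis=1, keepdims=True) + eps)
    d = 1.0 - xn @ yn.T                             # cosine distance d_ij, shape (N, M)
    d_rel = d / (d.min(axis=1, keepdims=True) + eps)   # relative to the closest y for each x_i
    w = np.exp((1.0 - d_rel) / h)
    cx = w / w.sum(axis=1, keepdims=True)           # CX_ij, normalized over j
    score = cx.max(axis=0).mean()                   # best-matching x_i for each y_j, averaged
    return -np.log(score + eps)
```

Because of the max over source features, permuting the target features leaves the loss unchanged, which is what makes it tolerant of detail misalignment across frames.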

Video Interpolation

"XVFI: eXtreme Video Frame Interpolation" ICCV Oral, 2021 Mar 🗽 paper code

optical-flow based VFI methods
"LDMVFI: Video Frame Interpolation with Latent Diffusion Models" Arxiv, 2023 Mar 👍 LDMVFI paper code note

Video interpolation: the first work to use diffusion models for video frame interpolation
"Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation" Arxiv, 2023 Nov paper website note
SparseCtrl
DynamiCrafter

Video generation

"Towards Smooth Video Composition" Arxiv, 2022 Dec, paper code website note

Video Restoration 💧

Video Denoising 🚱

Awesome-Deblurring paper with code

"FastDVDnet: Towards Real-Time Deep Video Denoising Without Flow Estimation" CVPR, 2019 Jul paper code
"Recurrent Video Restoration Transformer with Guided Deformable Attention" NeurIPS, 2022 Jun, RVRT 🗽 paper code note
"Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising" 2022 Aug, ReViD ⚠️ paper
"Real-time Controllable Denoising for Image and Video" CVPR, 2023 Mar paper website code

video/image Denoising!
"A Differentiable Two-stage Alignment Scheme for Burst Image Reconstruction with Large Shift" CVPR, 2022 Mar paper code
"Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time" CVPR, 2023 Mar ⭐ paper code

Reference for how to fuse multiple frames.

Video Colorization 🎨

https://github.com/MarkMoHR/Awesome-Image-Colorization ⭐

"Deep Exemplar-based Video Colorization" CVPR, 2019 Jun paper code note

Bring old films: Prof. Bo Zhang's work, Self-augmented Unpaired Image

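Frame t and the selected reference image are fed into VGG19 to extract a correlation matrix. The colorization module also takes the output at time t-1, which achieves temporal consistency.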
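A NumPy sketch of the correlation-and-warp step, assuming (N, C) feature arrays in place of VGG19 feature maps and Lab ab chrominance for the reference colors. The mean-centering, cosine normalization, and softmax temperature `tau=0.01` are assumptions in the spirit of exemplar-based colorization, not the paper's exact settings; the t-1 recurrence and the colorization network are omitted.

```python
import numpy as np

def correlation_warp(feat_t, feat_ref, color_ref, tau=0.01):
    """feat_t: (N, C) features of frame t; feat_ref: (M, C) reference features;
    color_ref: (M, 2) reference chrominance. Returns warped colors (N, 2) and confidence (N,)."""
    a = feat_t - feat_t.mean(axis=0, keepdims=True)
    b = feat_ref - feat_ref.mean(axis=0, keepdims=True)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    corr = a @ b.T                                   # (N, M) cosine correlation matrix
    logits = corr / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)    # soft correspondence for each target position
    warped = attn @ color_ref                        # reference colors pulled to frame t
    confidence = corr.max(axis=1)                    # high where a good match exists
    return warped, confidence
```

The confidence map is useful downstream: positions with no good match in the reference can fall back on the previous frame's colors.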
"Video Colorization with Pre-trained Text-to-Image Diffusion Models" Arxiv, 2023 Jun ⭐

website
"Temporal Consistent Automatic Video Colorization via Semantic Correspondence" CVPR, 2023 May paper
Interactive Deep Colorization

https://github.com/junyanz/interactive-deep-colorization
Improved Diffusion-based Image Colorization via Piggybacked Models Apr 2023

https://piggyback-color.github.io/

Video SR 🔍

mmedit model-zoo paper with code: VSR

"Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution" CVPR, 2023 Dec, Upscale-A-Video paper code website note

event camera

"EvTexture: Event-driven Texture Enhancement for Video Super-Resolution" Arxiv, 2024 Jun 19 paper code pdf note Authors: Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun

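Optical-flow propagation similar to BasicVSR; adds extra event signals with a dedicated propagation branch for them, yielding propagated event features;

For each frame, the propagated optical-flow feature $f_t^C$ and event-signal feature $f_t^T$ are concatenated

Event signals contain more detailed texture features, but require an event camera
Bad cases are picked out and their metrics computed separately, haha
Reference for experiment design ⭐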

Video Understanding 🤔

3D ResNets for Action Recognition

Reference for converting a 2D CNN into a 3D CNN.
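The note does not say how the conversion is done. One common way to turn a 2D CNN into a 3D one while reusing 2D weights is I3D-style kernel inflation (Carreira & Zisserman, 2017), a swapped-in technique shown here as a NumPy sketch; the 3D ResNets repo may instead train 3D kernels from scratch. Each 2D kernel is copied along time and divided by the temporal size, so a static video gives the same response as the 2D network on a single frame.

```python
import numpy as np

def inflate_2d_to_3d(w2d, t):
    """(C_out, C_in, kH, kW) -> (C_out, C_in, t, kH, kW): replicate over time, divide by t."""
    return np.repeat(w2d[:, :, None], t, axis=2) / t

def conv3d_valid(video, w3d):
    """Naive 'valid' 3D cross-correlation. video: (C_in, T, H, W)."""
    c_out, c_in, kt, kh, kw = w3d.shape
    _, T, H, W = video.shape
    out = np.zeros((c_out, T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[1]):
        for y in range(out.shape[2]):
            for x in range(out.shape[3]):
                patch = video[:, t:t + kt, y:y + kh, x:x + kw]
                out[:, t, y, x] = np.tensordot(w3d, patch, axes=([1, 2, 3, 4], [0, 1, 2, 3]))
    return out

def conv2d_valid(img, w2d):
    """Naive 'valid' 2D cross-correlation. img: (C_in, H, W)."""
    c_out, c_in, kh, kw = w2d.shape
    _, H, W = img.shape
    out = np.zeros((c_out, H - kh + 1, W - kw + 1))
    for y in range(out.shape[1]):
        for x in range(out.shape[2]):
            out[:, y, x] = np.tensordot(w2d, img[:, y:y + kh, x:x + kw], axes=([1, 2, 3], [0, 1, 2]))
    return out
```

Dividing by `t` is the design choice that matters: it keeps activation magnitudes unchanged, so pretrained BatchNorm statistics and later layers remain valid right after inflation.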

Video Swin Transformer: survey_note

"Long-Term Feature Banks for Detailed Video Understanding" CVPR, 2018 Dec paper code

use 3D volumes to solve long-video understanding
"Learning to Cut by Watching Movies" ICCV, 2021 Aug paper code website pdf
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale" CVPR, 2022 Nov, EVA-CLIP paper code

feature extractor
"BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models" Arxiv, 2023 Jan paper code

feature extractor Qformer
"Siamese Masked Autoencoders" NeurIPS, 2023 May paper website
"Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models" Arxiv, 2023 Jun, Video-ChatGPT 🗽 paper code

memory

"MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition" CVPR oral, 2022 Jan ⭐ 🐤 paper code paper_local_pdf
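Processes long videos efficiently: at each step, K and V are compressed (by a learnable layer) and stored in memory (a list of tensors); the memory is concatenated with the current K and V, and attention is computed together with the current features as Q
- Tip: to keep the memory from being tied to the gradients of previous iterations >> detach
- The code is worth studying
  
  How videos are read: perform sequential reading of consecutive chunks of frames (clips) to process videos in an online fashion
  
  Learnable pooling reduces the dimensionality of the memory
  
  relative positional embedding
- Long-video (15 min) dataset: AVA spatiotemporal action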
- PySlowFast: provides state-of-the-art video classification models with efficient training
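A single-head NumPy sketch of the MeMViT-style memory attention described above. Average pooling stands in for the learnable pooling, and `max_mem` (how many past clips to keep) and `stride` are hypothetical parameters; MeMViT's real implementation is multi-head, multi-scale, and uses relative positional embeddings. The comment marks where `.detach()` goes in a PyTorch version.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_tokens(x, stride):
    """Stand-in for learnable pooling: average every `stride` consecutive tokens."""
    n, c = x.shape
    n_keep = n // stride
    return x[: n_keep * stride].reshape(n_keep, stride, c).mean(axis=1)

class MemoryAttention:
    def __init__(self, max_mem=2, stride=4):
        self.mem_k, self.mem_v = [], []          # lists of cached, compressed tensors
        self.max_mem, self.stride = max_mem, stride

    def __call__(self, q, k, v):
        # Concatenate the cached memory with the current clip's keys/values.
        k_all = np.concatenate(self.mem_k + [k], axis=0)
        v_all = np.concatenate(self.mem_v + [v], axis=0)
        out = softmax(q @ k_all.T / np.sqrt(q.shape[-1])) @ v_all
        # Compress the current K/V and push them into memory. In PyTorch, call
        # .detach() here so gradients never flow back into earlier iterations.
        self.mem_k.append(pool_tokens(k, self.stride))
        self.mem_v.append(pool_tokens(v, self.stride))
        self.mem_k = self.mem_k[-self.max_mem:]
        self.mem_v = self.mem_v[-self.max_mem:]
        return out, k_all.shape[0]
```

The cost per clip stays bounded: the query length is fixed, and the key/value length grows only by the compressed memory, capped at `max_mem` entries.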
"Pin the Memory: Learning to Generalize Semantic Segmentation" CVPR, 2022 Apr paper code
"Multi-Scale Memory-Based Video Deblurring" CVPR, 2022 Oct ⭐ code

Multi-scale
"Make-A-Story: Visual Memory Conditioned Consistent Story Generation" CVPR, 2022 Nov 🐤 🚧 paper code note

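Generates images from story text: adds a memory-attention layer after the cross-attention in the Stable Diffusion U-Net, which uses previously generated results in place of the QKV inputs (latent codes as V, ...), improving the consistency of LDM generations.

The vanilla LDM already works well, so the dataset is made harder by replacing character mentions with referring expressions, which demonstrates the effectiveness of the memory mechanism
"MovieChat: From Dense Token to Sparse Memory for Long Video Understanding" Arxiv, 2023 Jul, MovieChat 🗽 paper code pdf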
designed for ultra-long videos (>10K frames) understanding through interactive dialogue with the user
- frame-wise visual feature extractor, memory mechanism, projection layer, LLM
- Feature extraction in a sliding window: EVA-CLIP + Q-Former
"Memory-and-Anticipation Transformer for Online Action Understanding" ICCV, 2023 Aug paper website
"Memory-Aided Contrastive Consensus Learning for Co-salient Object Detection" AAAI, 2023 Feb paper
"Memory-guided Image De-raining Using Time-Lapse Data"
"Memory Encoding Model" code

match attention 🕸️

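Feature alignment across video frames: deformable convolution >> Transformer. Since frames differ from one another, a plain CNN is applied directly; a deformable Transformer fuses misaligned features.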

"Cross Attention-guided Dense Network for Images Fusion" Arxiv, 2021 Sep paper code
"TransforMatcher: Match-to-Match Attention for Semantic Correspondence" CVPR, 2022 May paper code
"Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence" NeurIPS, 2022 Oct paper code website note

Uses an implicit neural representation (INR) for feature matching; SOTA, but inference takes 8-9 s per image
"DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data" Arxiv, 2023 Jun paper code website
"Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes" CVPR, 2023 Apr paper code

Fuses multiple cues while removing interference from some of the features
"DiffMatch: Diffusion Model for Dense Matching" Arxiv, 2023 May ⚠️ paper website

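From the same group as Neural Matching Fields
"GMFlow: Learning Optical Flow via Global Matching" CVPR oral, 2021 Nov

paper code
An optical flow network more efficient than RAFT; relevant to optical flow estimation & feature alignment
- Forward and backward flows need only a single forward pass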
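A NumPy sketch of global-matching flow in the spirit of GMFlow, assuming (H, W, C) feature maps and omitting the Transformer feature enhancement and the flow propagation/refinement stages. A single correlation matrix is computed once; a softmax over it gives a matching distribution, and the flow is the expected matched coordinate minus the pixel's own coordinate. The backward flow reuses the transposed matrix, which is why both directions come from one pass.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_matching_flow(f1, f2):
    """f1, f2: (H, W, C) features. Returns forward (1->2) and backward (2->1) flow, each (H, W, 2) as (dx, dy)."""
    h, w, c = f1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(float)   # (HW, 2)
    corr = f1.reshape(-1, c) @ f2.reshape(-1, c).T / np.sqrt(c)         # (HW, HW), computed once
    # Forward: distribution over frame-2 positions; flow = expected match - own position.
    fwd = softmax(corr, axis=1) @ coords - coords
    # Backward: same correlation, transposed.
    bwd = softmax(corr.T, axis=1) @ coords - coords
    return fwd.reshape(h, w, 2), bwd.reshape(h, w, 2)
```

The (HW, HW) matrix is the cost of this design: memory grows quadratically with resolution, so it is typically applied to downsampled features.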

Reference SR

Spatial-Temporal

"STDAN: Deformable Attention Network for Space-Time Video Super-Resolution" TNNLS, 2023 Feb 🗽 paper code note

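Deformable-attention video SR: each frame is fused with multiple reference frames by weighted averaging (at the pixel level, is it reasonable to weight by the similarity from the QK product? :question:). With 12 frames, GPU memory use is only 8 GB, but adding the subsequent Residual Swin Transformer Blocks pushes it straight to 20 GB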
"Store and Fetch Immediately: Everything Is All You Need for Space-Time Video Super-resolution" AAAI, 2023 Jun paper note

Foundation Model

"InternVideo: General Video Foundation Models via Generative and Discriminative Learning" Arxiv, 2022 Dec paper code note
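A large video foundation model: SOTA on 39 datasets, 6B parameters.
1. Verifies that masked video learning (VideoMAE) and video-language contrastive modeling are effective for downstream tasks; uses a two-branch encoder
2. Efficient training: only 23% of the power cost of the previous CoCa
3. Cannot handle long videos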
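For the video-language contrastive branch, a symmetric InfoNCE loss is the standard form; this NumPy sketch assumes paired (B, D) video and text embeddings and a CLIP-style temperature of 0.07. InternVideo's exact loss, temperature, and encoders may differ.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def video_text_contrastive_loss(v, t, temperature=0.07):
    """v, t: (B, D) video / text embeddings; row i of v is paired with row i of t."""
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    logits = v @ t.T / temperature                              # (B, B) similarities
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()    # video -> text
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()    # text -> video
    return 0.5 * (loss_v2t + loss_t2v)
```

Every other item in the batch acts as a negative, so larger batches give a harder, more informative contrastive task.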
"VideoChat: Chat-Centric Video Understanding" Arxiv, 2023 May ⭐ paper code

Builds on InternVideo for long-video understanding, with ChatGPT-like interaction

feature alignment

researcher: Prune Truong working on dense correspondence

Video Grounding

"Knowing Where to Focus: Event-aware Transformer for Video Grounding" ICCV, 2023 Aug paper code

Prompt 📚

"Align and Prompt: Video-and-Language Pre-training with Entity Prompts" CVPR, 2021 Dec paper code note
"Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP" CVPR, 2022 Oct paper code note
"Iterative Prompt Learning for Unsupervised Backlit Image Enhancement" ICCV, 2023 Mar paper code website

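Backlit image enhancement: CLIP is frozen and the prompts are initialized; a loss is computed between the prompts and the images, and its gradient is backpropagated to update the prompts. The updated prompts are then used to update the enhancement network.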
"PromptIR: Prompting for All-in-One Blind Image Restoration" Arxiv, 2023 Jul paper code

Blind Restoration

HDR,LLIE 🔅

HDR(High-Dynamic Range), LLIE(Low-Light Image Enhancement) paper with code rank

Trick

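When the model output lies in [-1, 1], the decoder can also output directly from a conv layer without tanh

Whether the bias after the decoder's last conv layer is useful needs to be verified experimentally

The bias may learn a prior of the training set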

"Opening the Black Box of Deep Neural Networks via Information" paper

The Information Bottleneck (IB) trade-off deepens the understanding of DNNs
"Half Wavelet Attention on M-Net+ for Low-Light Image Enhancement" paper

Concatenating the features from every layer of the U-Net decoder with the final result for further processing improves metrics considerably
"Learning Enriched Features for Fast Image Restoration and Enhancement" TPAMI, 2022 May, MIRNetv2 🗽 paper code note

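Various low-level tricks

Integrates many tricks: encoder design, (multi-scale) feature fusion, contextual information fusion, and training strategies; the ablation studies are very valuable as a reference :moneybag: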
"Deep Learning Tricks links repo" code
"A ConvNet for the 2020s" paper

Use as a convolution reference
"Learning to Upsample by Learning to Sample" ICCV, 2023 Aug paper code

A way to upsample features; previous methods mostly use bilinear + conv, or PixelShuffle
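For reference, the PixelShuffle baseline mentioned above is a pure rearrangement of channels into space (depth-to-space). This NumPy sketch follows the channel ordering documented for `torch.nn.PixelShuffle` (input channel `c*r*r + i*r + j` goes to output pixel offset `(i, j)`); it illustrates the baseline only, not this paper's learned sampling.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into (C, r_y, r_x)
    x = x.transpose(0, 3, 1, 4, 2)      # -> (C, H, r_y, W, r_x)
    return x.reshape(c, h * r, w * r)   # interleave sub-pixels into the spatial grid
```

Since it has no parameters, all the learning happens in the conv that produces the C*r*r channels beforehand.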
"Editing Implicit Assumptions in Text-to-Image Diffusion Models" CVPR, 2023 Aug, TIME paper code note

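Differentiate the loss function with respect to the target variable and check whether a closed-form solution exists (i.e., one that directly sets the derivative to 0); if so, no training is needed!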
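A worked example of this trick, using the ridge-regression form that, as far as I understand, TIME applies to a projection matrix: minimize $\sum_i \|W c_i - v_i^*\|^2 + \lambda\|W - W_{old}\|_F^2$. Setting the gradient to zero gives $W = (\lambda W_{old} + \sum_i v_i^* c_i^T)(\lambda I + \sum_i c_i c_i^T)^{-1}$, so no training loop is needed. The NumPy code assumes the inputs $c_i$ and targets $v_i^*$ are already computed; `lam=0.1` is an arbitrary illustrative value.

```python
import numpy as np

def closed_form_edit(W_old, C, V, lam=0.1):
    """Minimize sum_i ||W c_i - v_i||^2 + lam * ||W - W_old||_F^2.
    W_old: (d_out, d_in); C: (d_in, n) with columns c_i; V: (d_out, n) with columns v_i.
    Zero gradient gives W (C C^T + lam I) = V C^T + lam W_old."""
    d_in = W_old.shape[1]
    A = C @ C.T + lam * np.eye(d_in)
    B = V @ C.T + lam * W_old
    # Solve W A = B via A^T W^T = B^T (A is symmetric), without forming A^{-1}.
    return np.linalg.solve(A, B.T).T
```

The regularizer is what keeps the edit local: a large `lam` pulls W back toward `W_old`, a small one fits the new targets more aggressively.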

Model Architecture Design

paper-list: Awesome-Segment-Anything

https://www.sainingxie.com/pdf/CVPR_t4vworkshop_clean.pdf A summary of foundation models

"Densely Connected Convolutional Networks" CVPRBestPaper, 2016 Aug 25 paper code pdf note blog Authors: Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger

"Shunted Self-Attention via Multi-Scale Token Aggregation" CVPR, 2021 Nov ⭐ paper code 8.2

Backbone; downsamples K and V multiple times (at different rates)
"DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution" Arxiv, 2023 Jan paper 8.2
"Deep Discriminative Spatial and Temporal Network for Efficient Video Deblurring" CVPR, 2023 paper code 8.2
"Learning A Sparse Transformer Network for Effective Image Deraining" CVPR, 2023 Mar paper code 8.2
"Simple but Effective: CLIP Embeddings for Embodied AI" CVPR, 2021 Nov paper code
"Rethinking Breast Lesion Segmentation in Ultrasound: A New Video Dataset and A Baseline Network" paper
ultrasound video segmentation
1. propose a dynamic selection scheme to effectively sample the most relevant frames from all the past frames

Attention:moyai:

Awesome-Transformer-Attention Flighting-CV attention_usage GNN survey

"Attention Is All You Need" NIPS, 2017 Jun 12 ⭐⭐ paper code pdf note Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

$$ \mathrm{WindowAttention}(Q,K,V)=\mathrm{Dropout}\Big(\mathrm{Softmax}\big(\frac{QK^T}{\sqrt{d_k}} + Pos + Mask\big)\Big)V $$

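Q: What is $\sqrt{d_k}$ for?

$d_k$ is the number of channels (dimension) of the Q and K features ⭐. It is used for scaling: without it, the dot products become very large, and after the softmax the gradients become too small

We suspect that for large values of dk, the dot products grow large in magnitude, pushing the softmax function into regions where it has extremely small gradients

Multi-head attention: after the initial linear (MLP) projection, C is split into (nH, C/nH) and attention is computed separately for each head; afterwards the result is reshaped back. Each head is thought to correspond to a different representation subspace, which yields richer features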
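A NumPy sketch of multi-head scaled dot-product attention as described above: project, split C into (nH, C/nH), attend per head with the $1/\sqrt{d_k}$ scaling, and merge the heads back. The additive `mask` plays the role of the mask term in the window-attention formula above; the relative position bias and dropout are omitted, and the square (C, C) projection matrices are a simplifying assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads, mask=None):
    """x: (N, C). Wq/Wk/Wv/Wo: (C, C). mask: optional (N, N) additive mask (0 or -inf)."""
    n, c = x.shape
    d_k = c // n_heads
    # Project, then split C into (nH, C/nH): each becomes (nH, N, d_k).
    q = (x @ Wq).reshape(n, n_heads, d_k).transpose(1, 0, 2)
    k = (x @ Wk).reshape(n, n_heads, d_k).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, n_heads, d_k).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # scaling keeps logits O(1)
    if mask is not None:
        scores = scores + mask
    attn = softmax(scores, axis=-1)                    # (nH, N, N), one map per head
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)  # merge heads back to (N, C)
    return out @ Wo
```

Splitting C across heads keeps the total cost about the same as single-head attention, while letting each head produce its own attention map.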
"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" ICCV_best_paper, 2021 Mar paper code note

Video Transformer

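In self-attention, the Q, K, and V features each pass separately through SE / spatial attention plus a deformable offset (with offsets computed by the module itself); different deformable variants are compared, and slightly adapting the deformable design to each task can raise PSNR by about 1 dB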

"Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" ICCV, 2023 Jul paper code
"Revisiting Deformable Convolution for Depth Completion" IROS, 2023 Aug paper code
- Motivation: most of the propagation refinement methods require several iterations and suffer from a fixed receptive field, which may contain irrelevant and useless information
  
  Addresses these two challenges simultaneously by revisiting the idea of deformable convolution: enlarging the receptive field and reducing the number of iterations
Studies its best usage on depth completion with very sparse depth maps: first generate a coarse depth map $\hat{D}$ from the backbone, then pass it through the deformable refinement module.

Efficient-Backbone

github repo

"MnasNet: Platform-Aware Neural Architecture Search for Mobile" CVPR, 2018 Jul paper

A network found by NAS; the searched CNN blocks are worth borrowing: 3x3 conv blocks combined with 5x5 conv blocks that include SE

"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" ICML, 2019 May, EfficientNet paper code pdf note Authors: Mingxing Tan, Quoc V. Le

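Shows that jointly scaling depth, width (channels), and resolution works better than scaling any single one; scaling each dimension by only about 1.3x is enough to beat scaling a single dimension by 4x
The scaling factor for each dimension is found by a NAS-style search, giving close to 4% accuracy improvement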
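A worked example of EfficientNet's compound scaling rule: depth, width, and resolution are scaled by $\alpha^\phi$, $\beta^\phi$, $\gamma^\phi$ under the constraint $\alpha\beta^2\gamma^2 \approx 2$, so FLOPs grow roughly as $2^\phi$. The constants $\alpha=1.2$, $\beta=1.1$, $\gamma=1.15$ are the values reported in the paper for EfficientNet-B0 (found by a small grid search); rounding to real layer and channel counts is left out.

```python
def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return depth, width, resolution multipliers and the approximate FLOPs factor.
    FLOPs scale linearly with depth and quadratically with width and resolution."""
    depth, width, res = alpha ** phi, beta ** phi, gamma ** phi
    flops_factor = depth * width ** 2 * res ** 2
    return depth, width, res, flops_factor

for phi in range(4):
    d, w, r, f = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because $\alpha\beta^2\gamma^2$ is about 1.92, each increment of $\phi$ roughly doubles the compute budget, spread across all three dimensions.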

"An Image Patch is a Wave: Phase-Aware Vision MLP" CVPR, 2022 Nov paper code note

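Methods such as ViT split the image into patches and map every patch with the same MLP, ignoring what is specific to each patch. This work aims to improve how tokens are represented so that they can be aggregated dynamically according to their semantic content. Compared with ViT-L, it has about half the parameters and roughly 1/4 of the FLOPs; compared with Swin-T, it achieves higher accuracy at the same FLOPs. Using a wave function (amplitude represents intensity, phase represents the relative position within a wave), it decomposes the image into amplitude (the features) and phase (which balances the relation between tokens and the MLP weights);

Proposes PATM for aggregating tokens, which fuses amplitude and phase (unlike a weighted average, the phase $\theta_k$ adjusts dynamically according to the semantic content); compared with the variant without phase (weighted averaging), it gains 2% Acc. A channel-FC is used to obtain the phase information:
$$ \begin{aligned}\boldsymbol{o}_j&=\sum_k W_{jk}^t\boldsymbol{z}_k\odot\cos\boldsymbol{\theta}_k+W_{jk}^i\boldsymbol{z}_k\odot\sin\boldsymbol{\theta}_k,\\ j&=1,2,\cdots,n,\end{aligned} $$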
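A NumPy sketch of the PATM equation above, assuming dense (n, n) token-mixing matrices $W^t$, $W^i$ and a single (C, C) channel-FC that estimates $\theta$ from the token content; the actual Wave-MLP layer may mix tokens differently (e.g., locally) and adds further branches. With $\theta = 0$ it reduces to a plain weighted average of tokens, the baseline the note compares against.

```python
import numpy as np

def patm(Z, W_t, W_i, W_theta):
    """Z: (n, C) token amplitudes z_k. W_t, W_i: (n, n) token-mixing weights.
    W_theta: (C, C) channel-FC estimating a phase per token and channel."""
    theta = Z @ W_theta                    # theta_k depends on each token's content
    # o_j = sum_k W^t_jk (z_k * cos theta_k) + W^i_jk (z_k * sin theta_k)
    return W_t @ (Z * np.cos(theta)) + W_i @ (Z * np.sin(theta))
```

The phase lets two tokens with similar amplitudes interfere constructively or destructively depending on content, which a fixed weighted average cannot do.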

"EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention" CVPR, 2023 May paper note

SAM

"Segment Anything" Arxiv, 2023 May, SAM paper code note
"Fast Segment Anything" Arxiv, 2023 Jun 21 paper code

25 FPS!
"Faster Segment Anything: Towards Lightweight SAM for Mobile Applications" Arxiv, 2023 Jun 25, MobileSAM paper code blog
"Segment and Track Anything" Arxiv, 2023 May, SAM-Track code

Video instance segmentation; can be combined with E2FGVI to achieve object removal
"Segment Anything Meets Point Tracking" Arxiv, 2023 Jul, VideoSAM paper code blog

Self-/Semi-Supervised Learning

Self-Supervised Learning awesome-self-supervised-learning

"EMP-SSL: Towards Self-Supervised Learning in One Training Epoch" Arxiv, 2023 Apr, EMP-SSL ⭐ paper code blog_explanation note

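Uses a single loss to improve the efficiency of self-supervised learning, reaching SOTA in 30 epochs; the proposed TCR loss constrains the feature representations, pulling similar features closer and avoiding noise interference:
$$ \begin{aligned}\text{Loss} &= \max \frac{1}{n}\sum_{i=1}^{n}\big(R(Z_i) + \lambda\cdot D(Z_i, \bar{Z})\big),\\ \bar{Z} &= \frac{1}{n}\sum_{i=1}^{n} Z_i,\end{aligned} $$
where $n$ is the number of augmented results and $\bar{Z}$ is the mean of the representations of the different augmented patches. In the TCR loss, $\lambda$ is set to 200.0 and $\epsilon^2$ is set to 0.2 (experimental setting).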
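A NumPy sketch of the objective, assuming the total coding rate has the MCR²-style form $R(Z) = \tfrac{1}{2}\log\det(I + \tfrac{d}{b\epsilon^2} Z^T Z)$ for a (b, d) batch of L2-normalized embeddings, and that $D$ is the mean cosine similarity to $\bar{Z}$. λ = 200 and ε² = 0.2 come from the note; the function names are hypothetical. The value is maximized, so a training loop would minimize its negative.

```python
import numpy as np

def total_coding_rate(Z, eps_sq=0.2):
    """Z: (b, d) batch of embeddings. R(Z) = 1/2 * logdet(I + d / (b * eps^2) * Z^T Z)."""
    b, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (b * eps_sq)) * Z.T @ Z)
    return 0.5 * logdet

def emp_ssl_objective(Zs, lam=200.0, eps_sq=0.2):
    """Zs: list of n arrays (b, d), one per augmented patch. Returns the value to MAXIMIZE."""
    Zs = [z / np.linalg.norm(z, axis=1, keepdims=True) for z in Zs]
    Z_bar = np.mean(Zs, axis=0)
    Z_bar = Z_bar / np.linalg.norm(Z_bar, axis=1, keepdims=True)
    terms = []
    for Z in Zs:
        cos_sim = np.sum(Z * Z_bar, axis=1).mean()      # D(Z_i, Z_bar): mean cosine similarity
        terms.append(total_coding_rate(Z, eps_sq) + lam * cos_sim)
    return np.mean(terms)
```

The two terms pull in opposite directions: $R$ rewards embeddings that spread over many directions (preventing collapse), while $D$ pulls the augmented views of the same image toward their mean.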

Siamese Masked Autoencoders

Masked Siamese Networks for Label-Efficient Learning https://github.com/facebookresearch/msn

MixMask: Revisiting Masking Strategy for Siamese ConvNets https://github.com/LightnessOfBeing/MixMask These semi-/self-supervised works are very interesting; everyone should read them carefully
SimMIM: a Simple Framework for Masked Image Modeling

Can be applied to video MAE
Hard Patches Mining for Masked Image Modeling https://mp.weixin.qq.com/s/YJFDjcTqtX_hzy-FXt-F6w
Masked-Siamese-Networks-for-Label-Efficient-Learning
"Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" ICCV, 2017 Mar, CycleGAN 🗿 paper code website
"Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance" paper code
"RefineDNet: A Weakly Supervised Refinement Framework for Single Image Dehazing" TIP, 2021 Mar paper code note

Self-supervised dehazing; multiple predictions are combined through perceptual fusion

"Multi-view Self-supervised Disentanglement for General Image Denoising" ICCV, 2023 Sep paper code website note

NLP & Multimodal

Multimodal Prompting with Missing Modalities for Visual Recognition https://github.com/YiLunLee/Missing_aware_prompts Handles cases where modalities are incomplete during training or testing
Is GPT-4 a Good Data Analyst https://github.com/damo-nlp-sg/gpt4-as-dataanalyst

Contrastive Learning

CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer https://github.com/JarrentWu1031/CCPL

NeRF

VALSE conference report: Nerf summary
2023_CVPR_Inverting-the-Imaging-Process-by-Learning-an-Implicit-Camera-Model_Note.md
Neural Volume Super Resolution https://github.com/princeton-computational-imaging/Neural-Volume-Super-Resolution NeRF+SR
LERF: Language Embedded Radiance Fields https://github.com/kerrj/lerf NeRF + 3D CLIP
iNeRF: Inverting Neural Radiance Fields for Pose Estimation
ViP-NeRF: Visibility Prior for Sparse Input Neural Radiance Fields
AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields
2022_CVPR_Aug-NeRF--Training-Stronger-Neural-Radiance-Fields-with-Triple-Level-Physically-Grounded-Augmentations >> adds perturbations to the inputs
"Anything-3D: Towards Single-view Anything Reconstruction in the Wild" code

Combines SAM, BLIP, Stable Diffusion, and NeRF

Implicit Neural Network

Implicit Neural Representation blog explanation

Invertible Network

"Invertible Image Rescaling" ECCV, 2020 May paper code
"Enhanced Invertible Encoding for Learned Image Compression" ACMM, 2021 Aug paper code

The invertible network Zhangyang is currently using
"reversible ViT"

For boosting benchmark performance

Neural Operators

Factorized Fourier Neural Operators https://github.com/alasdairtran/fourierflow
Super-Resolution Neural Operator https://github.com/2y7c3/Super-Resolution-Neural-Operator
Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers https://github.com/NVlabs/AFNO-transformer
Fourier Neural Operator for Parametric Partial Differential Equations https://github.com/neuraloperator/neuraloperator

IQA

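❔ What is IQA? (CVPR IQA blog) IQA (image quality assessment) task target: quantifying human perception of image quality

Applications: assess whether an image meets the needs of a particular vision task, e.g., quality assessment for face recognition, deciding whether an image should be rejected or fed into the face recognition system; texture classification; texture retrieval (texture similarity); texture recovery

For downstream image tasks such as denoising, deblurring, super-resolution, and compression, IQA can help improve image quality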

Full Reference, No-reference

"Image Quality Assessment: Unifying Structure and Texture Similarity" TPAMI, 2020 Dec, DISTS paper note

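For source images with prominent textures, the model scores a JPEG-compressed version and a resampled version (to the eye, the JPEG one actually looks blurrier); previous methods wrongly score the JPEG image higher than the resampled one.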

"Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild" CVPR, 2023 Apr paper our noted pdf
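An NR-IQA algorithm based on contrastive learning: two ResNet-50s learn content features and image-quality-aware features, and a regressor on top outputs image quality scores. For the quality-feature branch, it follows MoCoV2 but changes how positive and negative samples are constructed.
- Full-reference IQA methods are of limited use for images in the wild, where no reference is available
  
  FR-IQA needs both the (undistorted) reference image and the distorted image to output a score.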
- high-level content representation using MoCoV2
  
  2 crops from same image -> similar scores, but not the case for some human viewers.
"Half of an image is enough for quality assessment"
"MaxVQA"
- FastIQA extracts video VQA features but does not consider distortion information
  
  FAST-VQA-and-FasterVQA
"REQA: Coarse-to-fine Assessment of Image Quality to Alleviate the Range Effect" CVPR&IVP, 2022 Sep paper code
Blind image quality assessment (BIQA) of User Generated Content (UGC) suffers from the range effect. Finding: over the overall quality range, the mean opinion score (MOS) and predicted MOS (pMOS) are well correlated, while when focusing on a particular range, the correlation is lower
1. utilize global context features and local detailed features for the multi-scale distortion perception
2. Feedback Mechanism
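Statistics show that the MOS distribution has particular characteristics, so a curriculum learning scheme is designed accordingly to improve performance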

Impressive Blog

Clandestine 📫

Works not yet classified or read are put below.

Problem formulation: reading a paper just to get inspiration for ideas is quite slow. As a result, we cannot read many papers in one field to get the whole picture, and we forget a previous paper's ideas after 1-2 weeks. The inability to generate ideas comes from too little accumulation. Some modules in a paper are proposed just to make up two novelties and may offer little inspiration for our work. In that case, it is not worth spending much time reading the paper only to find it unhelpful once finished.

To solve this problem, we should skim a paper in at most 30 minutes on the first read; it is fine not to understand every detail the first time! In this section, we record the meaningful papers and their corresponding problems, to remind us to work through problems we meet later.

Also, we should read papers with a purpose: e.g., when we need to solve the scratch detection problem, we search for papers with that objective. First read the collected papers coarsely and understand the method (the whole pipeline) at a minimum level. If it looks helpful, check the code and read it in detail, then quickly apply the idea to our framework, which is the objective and the most important thing! 💰 If it offers little inspiration, quickly move on to other papers.

However, some papers contain basic knowledge and formulations, such as DDPM, or are papers we need to modify further. For these, it is worth spending 1-2 days to understand every detail and line of code.

Image pasting (compositing)

"BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion" Arxiv, 2024 Mar 11 paper code pdf note Authors: Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu
"AnyText: Multilingual Visual Text Generation And Editing" ICLR-Spotlight, 2023 Nov paper code pdf note
"RMT: Retentive Networks Meet Vision Transformers"

blog
3.22
- "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis"

Specify regions && control the generated content

enables precise position control while ensuring the correctness of various attributes

"LocalMamba: Visual State Space Model with Windowed Selective Scan"
"VmambaIR: Visual State Space Model for Image Restoration"
"FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation" ⭐

code

Video style transfer; addresses the consistency between the generated content and the input
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
"One-Step Image Translation with Text-to-Image Models" Arxiv, 2024 Mar 18 paper code pdf note Authors: Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu
"SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions" Arxiv, 2024 Mar 25 paper code pdf note Authors: Yuda Song, Zehao Sun, Xuanwu Yin
"UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing"

4.1

"Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance" Arxiv, 2024 Mar 26 paper code pdf note Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

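Uses diffusion for image deblurring and inpainting (find a way to bring it to video to improve the diffusion condition)
Learn how the feature visualization for inspecting diffusion generation results is done 👍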

"EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba" code

Learn how others further optimize the Mamba block
"LITA: Language Instructed Temporal-Localization Assistant" Learn video-text association: how to localize a specific frame (to find clean frames)
"Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs" Arxiv, 2024 Jan 22, RPG paper code pdf note Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

VideoCaption && consistency across different patches in diffusion
"PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance" paper
"Iterative Token Evaluation and Refinement for Real-World Super-Resolution" paper
"Beyond Text: Frozen Large Language Models in Visual Signal Comprehension" Arxiv, 2024 Mar 12, V2T-Tokenizer paper code pdf note Authors: Lei Zhu, Fangyun Wei, Yanye Lu

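Represents images with LLM tokens and finds that this gives low-level restoration ability && needs no fine-tuning; supports multiple downstream tasks: caption, VQA, denoising; learns a codebook;

For low-level tasks, given a complete face that is only shifted or rotated, the restored face output is very poor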

LLM gains the ability not only for visual comprehension but also for image denoising and restoration in an auto-regressive fashion

"The Hidden Attention of Mamba Models" paper

Visualizes how Mamba performs attention

"Multi-granularity Correspondence Learning from Long-term Noisy Videos" Arxiv, 2024 Jan 30 paper code pdf note Authors: Yijie Lin, Jie Zhang, Zhenyu Huang, Jia Liu, Zujie Wen, Xi Peng
"Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models" paper
"Adversarial Diffusion Distillation" Arxiv, 2023 Nov 28, SD-Turbo ⭐ paper code pdf note Authors: Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach
"MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies" code

Reads the text written in an image

4.20

"Magic Clothing: Controllable Garment-Driven Image Synthesis"

paper

Proposes an alignment loss

OmniParser
"Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs"
"State Space Model for New-Generation Network Alternative to Transformers: A Survey" paper

4.26

"QLoRA: Efficient Finetuning of Quantized LLMs"

Fine-tunes LLMs

"SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation" paper
"Improving Diffusion Models for Virtual Try-on" paper

BrushNet

"ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback" Arxiv, 2024 Apr 11 paper code pdf note Authors: Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

5.1

Diffusion detail completion

"MultiBooth: Towards Generating All Your Concepts in an Image from Text" paper
"ID-Animator: Zero-Shot Identity-Preserving Human Video Generation" paper
"ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving" Arxiv, 2024 Apr 25 paper code pdf note Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang
"Efficient Multimodal Learning from Data-centric Perspective" paper

"StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation" Arxiv, 2024 May 2 paper code pdf note Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

Zero-shot; keeps consistency across a batch
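A NumPy sketch of the batch-level idea behind StoryDiffusion's consistent self-attention: each image's queries attend to its own tokens plus tokens randomly sampled from the other images in the batch, which ties their appearance together without training. It assumes Q = K = V = X (no learned projections) and hypothetical `sample_rate` / `seed` parameters; in StoryDiffusion this happens inside the U-Net self-attention layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def consistent_self_attention(X, sample_rate=0.5, seed=0):
    """X: (B, N, C) tokens of B >= 2 images generated in one batch.
    Each image attends to its own tokens plus a random subset of the other images' tokens."""
    rng = np.random.default_rng(seed)
    B, N, C = X.shape
    out = np.empty_like(X)
    for i in range(B):
        others = np.concatenate([X[j] for j in range(B) if j != i], axis=0)   # ((B-1)*N, C)
        n_pick = int(sample_rate * len(others))
        pick = rng.choice(len(others), size=n_pick, replace=False)
        kv = np.concatenate([X[i], others[pick]], axis=0)                    # (N + n_pick, C)
        attn = softmax(X[i] @ kv.T / np.sqrt(C))
        out[i] = attn @ kv
    return out
```

With `sample_rate=0` this is ordinary per-image self-attention; raising it shares more appearance information across the batch at a higher attention cost.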

"DoRA: Weight-Decomposed Low-Rank Adaptation"
"KAN: Kolmogorov-Arnold Networks" code
OpenSoRA
"Factorized Diffusion: Perceptual Illusions by Noise Decomposition"

5.11

"ImageInWords: Unlocking Hyper-Detailed Image Descriptions" Arxiv, 2024 May 5 paper code pdf note Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

Detailed prompt descriptions improve detail generation

"Improving Diffusion Models for Virtual Try-on" Arxiv, 2024 Mar 8 paper code pdf note Authors: Yisol Choi, Sangkyung Kwak, Kyungmin Lee, Hyungwon Choi, Jinwoo Shin

Keeps details consistent with the reference when alignment is needed

"ID-Animator: Zero-Shot Identity-Preserving Human Video Generation" Arxiv, 2024 Apr 23 paper code pdf note Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Man Zhou, Jie Zhang

Identity preservation

Basics

How does SDXL work?
How to add nodes to ComfyUI?
SD-webui
"Vision Mamba: A Comprehensive Survey and Taxonomy" Arxiv, 2024 May 7 paper code pdf note Authors: Xiao Liu, Chenxu Zhang, Lei Zhang
"SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap" Arxiv, 2024 Apr 17 paper code pdf note Authors: Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir Mohammad Mansourian, Xin Zhou, Shohreh Kasaei, Bernard Ghanem, Alexandre Alahi, Marc Van Droogenbroeck, Christophe De Vleeschouwer
"Emergent Correspondence from Image Diffusion" NIPS, 2023, DIFT code

Diffusion features for keypoint matching, dense correspondence

DragonDiffusion
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
Q: What is DDIM inversion?
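A: DDIM inversion runs the deterministic (η = 0) DDIM update in reverse, from the clean image toward noise, so that sampling from the resulting $x_T$ reproduces the input. Since $\epsilon_\theta(x_t, t)$ is unknown while going up, the standard approximation uses the noise predicted at $x_{t-1}$. In this NumPy sketch `eps_model` is a hypothetical stand-in for the denoiser and `alphas_bar` holds the cumulative $\bar\alpha$ values from clean (1.0) to noisy.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM (eta = 0) move between cumulative alphas a_from -> a_to.
    a_to > a_from is a denoising (sampling) step; a_to < a_from is an inversion step."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alphas_bar):
    """alphas_bar: cumulative alphas ordered clean -> noisy (decreasing). Returns x_T."""
    x = x0
    for t in range(1, len(alphas_bar)):
        # Approximation: the noise predicted at x_{t-1} stands in for the unknown eps(x_t, t).
        eps = eps_model(x, t - 1)
        x = ddim_step(x, eps, alphas_bar[t - 1], alphas_bar[t])
    return x

def ddim_sample(xT, eps_model, alphas_bar):
    """Run the same schedule back from noisy to clean."""
    x = xT
    for t in range(len(alphas_bar) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step(x, eps, alphas_bar[t], alphas_bar[t - 1])
    return x
```

With a noise predictor that does not depend on $x$, the two loops are exact inverses; with a real network the approximation error accumulates, which is why reconstructions from DDIM inversion are only approximate in practice.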

"Denoising Diffusion Implicit Models", Arxiv, 2020 Oct, DDIM paper code

"Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration" Arxiv, 2024 Mar 30 paper code pdf note Authors: Shihao Zhou, Jinshan Pan, Jinglei Shi, Duosheng Chen, Lishen Qu, Jufeng Yang
"Emergent Correspondence from Image Diffusion" NIPS, 2023 Jun 6 paper code pdf note Authors: Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, Bharath Hariharan
"Quality-Aware Image-Text Alignment for Real-World Image Quality Assessment" Arxiv, 2024 Mar 17 paper code pdf note Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini
"Lumiere: A Space-Time Diffusion Model for Video Generation" paper
"LayoutGPT: Compositional Visual Planning and Generation with Large Language Models" paper
"Muse Pose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation."

24.06.06

"MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model" Arxiv, 2024 May 30 paper code pdf note Authors: Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng
"ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation" paper
"VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers" Arxiv, 2024 May 28 paper code pdf note Authors: Jun Zheng, Fuwei Zhao, Youjiang Xu, Xin Dong, Xiaodan Liang
"LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model" Arxiv, 2023 May 19 paper code pdf note Authors: Chenjie Cao, Yunuo Cai, Qiaole Dong, Yikai Wang, Yanwei Fu
"EchoReel: Enhancing Action Generation of Existing Video Diffusion Models" paper

Generates animations based on reference videos

Consistency

"Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence" paper
"Looking Backward: Streaming Video-to-Video Translation with Feature Banks" paper
"Training-Free Consistent Text-to-Image Generation" Arxiv, 2024 Feb 5 paper code pdf note Authors: Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, Yuval Atzmon
"Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence" NIPS, 2023 May 23 paper code pdf note Authors: Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell
"EchoReel: Enhancing Action Generation of Existing Video Diffusion Models" Arxiv, 2024 Mar 18 paper code pdf note Authors: Jianzhi liu, Junchen Zhu, Lianli Gao, Jingkuan Song
"Video Interpolation with Diffusion Models"

24.06.11

"DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis" paper
"StableVideo: Text-driven Consistency-aware Diffusion Video Editing" ICCV, 2023 Aug 18 paper code pdf note Authors: Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
"ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation" Arxiv, 2024 Feb 6 paper code website pdf note Authors: Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen
"AdaptBIR: Adaptive Blind Image Restoration with latent diffusion prior for higher fidelity" paper Authors: Chao Dong
"Temporally consistent video colorization with deep feature propagation and self-regularization learning" 03 January 2024 paper
"Misalignment-Robust Frequency Distribution Loss for Image Transformation"
"The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing" ICLR, 2023 Nov 2 paper code pdf note Authors: Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li
"COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing" Arxiv, 2024 Jun 13 paper code pdf note Authors: Jiangshan Wang, Yue Ma, Jiayi Guo, Yicheng Xiao, Gao Huang, Xiu Li

Incorporates correspondence

"OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation" paper
"Zero-shot Image Editing with Reference Imitation" MimicBrush ⭐ paper code

Matching is achieved by fusing reference features through concatenated K/V in cross-attention (cross-attn KV concat).
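A minimal NumPy sketch of the KV-concat idea, under our own assumptions (single head, no learned projections, hypothetical function name `kv_concat_attention`). Target queries attend over the target's own keys/values concatenated with the reference's, so each target token can "match" and copy from reference tokens. This illustrates the mechanism only; it is not MimicBrush's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kv_concat_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Target queries attend over [target tokens; reference tokens].

    q_tgt, k_tgt, v_tgt: (N_tgt, d); k_ref, v_ref: (N_ref, d).
    Returns the attended output (N_tgt, d) and the attention map (N_tgt, N_tgt + N_ref).
    """
    k = np.concatenate([k_tgt, k_ref], axis=0)  # keys from both images
    v = np.concatenate([v_tgt, v_ref], axis=0)  # values from both images
    d = q_tgt.shape[-1]
    attn = softmax(q_tgt @ k.T / np.sqrt(d), axis=-1)
    return attn @ v, attn
```

The columns of `attn` beyond the first `N_tgt` show how strongly each target token draws on each reference token, which is the implicit correspondence.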

24.06.18

"One-Step Effective Diffusion Network for Real-World Image Super-Resolution" paper
Stable Diffusion 3 Medium

"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" Arxiv, 2024 Mar 5, SD3 paper code weights pdf note Authors: Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach
"I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models"
"Generative Image Dynamics" CVPR Best Paper paper
"ToonCrafter: Generative Cartoon Interpolation" Arxiv, 2024 May 28 paper code pdf note Authors: Jinbo Xing, Hanyuan Liu, Menghan Xia, Yong Zhang, Xintao Wang, Ying Shan, Tien-Tsin Wong
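For the SD3 entry in this list, a sketch of the rectified-flow objective, assuming the forward process z_t = (1 - t) * x0 + t * eps with velocity target v = eps - x0, and logit-normal timestep sampling t = sigmoid(u), u ~ N(m, s^2). The defaults `m = 0`, `s = 1` are assumptions (the paper compares several timestep samplers), and the function names are hypothetical. `euler_sample` integrates a velocity field from noise (t = 1) back to data (t = 0).

```python
import numpy as np

def rf_training_pair(x0, rng, m=0.0, s=1.0):
    """Build (z_t, t, v_target) for one rectified-flow training step."""
    u = rng.normal(m, s)
    t = 1.0 / (1.0 + np.exp(-u))        # logit-normal sample in (0, 1)
    eps = rng.standard_normal(x0.shape)
    z_t = (1.0 - t) * x0 + t * eps      # straight line from data (t=0) to noise (t=1)
    v_target = eps - x0                 # constant velocity dz_t/dt along that line
    return z_t, t, v_target

def euler_sample(velocity_fn, shape, rng, steps=28):
    """Integrate dz/dt = velocity_fn(z, t) from t=1 (noise) down to t=0 (data)."""
    z = rng.standard_normal(shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        z = z + (t_next - t_cur) * velocity_fn(z, t_cur)  # dt is negative
    return z
```

Because the path is a straight line, a perfect velocity predictor lets even a few Euler steps land on the data, which is the motivation for rectified flow.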

24.06.25

"Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer" Arxiv, 2024 May 7 paper code pdf note Authors: Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

Super-resolution at ultra-high resolutions; models the correlation between patches.

"MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling" paper

Frame interpolation

"Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing" paper

Video motion editing

"OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model" paper

RealSR

"Autoregressive Image Generation without Vector Quantization" paper
"Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding" paper
"Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%"

Scales up the VQGAN codebook; worth checking how that affects results.
"VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Useful for obtaining text descriptions of videos!
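For the VQGAN codebook-size entry above, the "utilization rate" is the fraction of codebook entries that the quantizer actually selects. A minimal NumPy sketch of nearest-neighbour quantization plus that metric (hypothetical helper names; this computes the metric only and is not that paper's method for reaching 99%):

```python
import numpy as np

def quantize(z, codebook):
    """Nearest-neighbour vector quantization.

    z: (N, d) encoder outputs; codebook: (K, d). Returns code indices (N,).
    """
    # Squared L2 distance expanded as ||z||^2 - 2 z.c + ||c||^2, shape (N, K)
    d2 = (z ** 2).sum(axis=1, keepdims=True) - 2.0 * z @ codebook.T + (codebook ** 2).sum(axis=1)
    return d2.argmin(axis=1)

def utilization_rate(indices, K):
    """Fraction of the K codebook entries used at least once."""
    return np.unique(indices).size / K
```

In practice the indices would be accumulated over a whole validation set before computing the rate, since a single batch underestimates it.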

24.06.28

https://arxiv.org/pdf/2312.10240
https://arxiv.org/abs/2405.17421
2309.07906
2406.16863
"Zero-shot Image Editing with Reference Imitation" Arxiv, 2024 Jun 11, MimicBrush paper code pdf note Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

ControlNet is able to learn dense correspondence; inpainting is then performed based on that correspondence.

"ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning"

Proposes a method (https://arxiv.org/pdf/2406.14130) that extends the temporal length current video synthesis models can generate, with lower GPU memory usage.

"We propose a novel post-tuning methodology for video synthesis models, called ExVideo. This approach is designed to enhance the capability of current video synthesis models, allowing them to produce content over extended temporal durations while incurring lower training expenditures."

"AnyDoor: Zero-shot Object-level Image Customization" CVPR, 2023 Jul 18 paper code pdf note Authors: Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

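Uses pretrained DINOv2 to provide detail features. DINOv2 gives both a global token and patch tokens; concatenating them and passing them through a learnable MLP aligns them with the UNet feature space ⭐

When pasting the object, high-frequency features are used instead of the raw image, which avoids generated images that look inharmonious.

Even with all these tricks, detail consistency is still insufficient, e.g. text gets distorted.

DINO or CLIP features are crucial: they are the foundation for generating the object, and without them the generated object looks completely different. Detail inconsistency then has to be further constrained with high-frequency features.

Training early on with more multi-pose objects taken from videos improves the detail consistency of generated objects and alleviates color shift.

Comparing DINO and CLIP for extracting object features

DINO features capture object details better than CLIP features, but DINO features need some processing to work well: the object has to be segmented first, and features extracted from the segmented object, to get results close to the original object.

CLIP features give rather poor results, possibly because of strong background interference.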
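A rough sketch of a high-frequency map for the AnyDoor notes above, assuming (as we recall from the paper) Sobel filtering of the object image, multiplied by the image itself and by an eroded object mask so the mask border does not produce spurious edges. Using absolute Sobel responses, a grayscale input and a 3×3 erosion are our assumptions, and all helper names are hypothetical; the exact AnyDoor formula may differ.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation with edge padding; img is a 2-D float array."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + h, j:j + w]
    return out

def erode(mask, iters=1):
    """Binary erosion with a 3x3 square; pixels outside the image count as background."""
    m = mask.astype(bool)
    h, w = m.shape
    for _ in range(iters):
        p = np.pad(m, 1, mode="constant", constant_values=False)
        acc = np.ones((h, w), dtype=bool)
        for i in range(3):
            for j in range(3):
                acc &= p[i:i + h, j:j + w]
        m = acc
    return m

def high_freq_map(gray, mask, erode_iters=1):
    """Edge strength x image x eroded object mask (AnyDoor-like variant, assumed)."""
    edges = np.abs(filter3x3(gray, SOBEL_X)) + np.abs(filter3x3(gray, SOBEL_Y))
    return edges * gray * erode(mask, erode_iters)
```

Multiplying by the segmentation mask also matches the observation above that object features are best extracted after removing the background.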

"ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning" Arxiv, 2024 Jun 20 paper code pdf note Authors: Zhongjie Duan, Wenmeng Zhou, Cen Chen, Yaliang Li, Weining Qian
"EvTexture: Event-driven Texture Enhancement for Video Super-Resolution" Arxiv, 2024 Jun 19 paper code pdf note Authors: Dachun Kai, Jiayao Lu, Yueyi Zhang, Xiaoyan Sun

Mask generation

"Masked Autoencoders Are Scalable Vision Learners" CVPR, 2021 Nov, MAE paper
"SimMIM: A Simple Framework for Masked Image Modeling" CVPR, 2021 Nov 18 paper code pdf note Authors: Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu
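For mask generation, a minimal sketch of MAE-style random patch masking (shuffle patch indices with random noise and keep the first `num_keep` visible), plus a SimMIM-style step that masks a coarse grid and repeats it onto the finer model-patch grid. The ratios used in the test are what we recall from the papers (MAE 75%; SimMIM 0.6 with 32-pixel mask patches over 4-pixel model patches on 192-pixel inputs) and should be double-checked; function names are hypothetical.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """MAE-style random masking over a flat list of patches.

    Returns a boolean array where True marks a masked (hidden) patch.
    """
    num_keep = int(num_patches * (1 - mask_ratio))
    ids_shuffle = np.argsort(rng.random(num_patches))  # random permutation via noise
    mask = np.ones(num_patches, dtype=bool)
    mask[ids_shuffle[:num_keep]] = False               # first num_keep patches stay visible
    return mask

def upsample_mask(mask_grid, scale):
    """SimMIM-style: each coarse (h, w) cell covers a scale x scale block of model patches."""
    return np.repeat(np.repeat(mask_grid, scale, axis=0), scale, axis=1)
```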

24.07.08

Kuaishou: Kolors (可图) and Kling (keling)

https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
"Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis" Arxiv, 2024 Jul 7 paper code pdf note website Authors: Kolors Team
"LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control" Arxiv, 2024 Jul 3 paper code pdf note Authors: Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

Fancy Stuff, efficiency

"Learning to (Learn at Test Time): RNNs with Expressive Hidden States" paper
"StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control" paper

24.07.15

"Explore the Limits of Omni-modal Pretraining at Scale" Arxiv, 2024 Jun 13, MiCo paper code pdf note Authors: Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

Multimodal

"MambaVision: A Hybrid Mamba-Transformer Vision Backbone" Arxiv, 2024 Jul 10, MambaVision paper code pdf note Authors: Ali Hatamizadeh, Jan Kautz

Backbone

"SEED-Story: Multimodal Long Story Generation with Large Language Model"

https://arxiv.org/abs/2407.08683v1
"MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds" https://arxiv.org/pdf/2405.17421

Goes directly from (casual) video to 4D.

"FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds" paper

Generates audio from video.

"Image Neural Field Diffusion Models" paper

Outputs realistic details at high resolution.

"A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights" paper
"Video Diffusion Alignment via Reward Gradients" Arxiv, 2024 Jul 11 paper code website pdf note Authors: Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak

24.07.21

"Visual Geometry Grounded Deep Structure From Motion" CVPR, 2023 Dec 7 paper code pdf note Authors: Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny

predict camera pose

"BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering" Arxiv, 2024 Mar 10 paper code pdf note Authors: Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Pingyu Wang, Xuecheng Nie
"LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models" Arxiv, 2024 Jul 12 paper code pdf note Authors: Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

Research Note

Paper-Writing-Note blog

Research Suggestion

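Paper Notes

Focus on understanding the method; summarize what you learned (the innovations, how they could be applied to your own task, typical methods mentioned, ...).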

Key-point
- Task
- Background
- 🏷️ Label:
Contributions

Related Work

methods

Experiment

Ablation study: check which modules are effective and summarize.

Limitations

Summary 🌟

learn what & how to apply to our task
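Paper Reading Suggestions

Read 5 papers in depth per week. A paper needs to be read several times; it's fine not to understand it the first time, but note down what is unclear and revisit it later! Always go through and organize the code, and write a summary ⭐⭐

https://gaplab.cuhk.edu.cn/cvpapers/#home organizes and categorizes papers and code from major computer vision conferences (CVPR, ICCV, ECCV, NeurIPS, ICLR) of recent years; worth browsing regularly.

https://openaccess.thecvf.com/menu is the official CVF site, with all papers, supplementary material, etc. for several major computer vision conferences (CVPR, ICCV, WACV).

https://www.ecva.net/index.php is the official ECCV site, with papers and supplementary material from all past years.

Read more papers from these conferences (CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, IJCAI, ACM MM, etc.) and from important journals (T-PAMI, T-IP, TOG, TVCG, IJCV, T-MM, T-CSVT, etc.). Skim every paper on the same or a closely related task at least once, then selectively read some in depth. Learn to use Google Scholar and GitHub to find useful material.
When reproducing a method, check correctness: first verify that the data input is correct (dataloader, learning rate; batch size should not be 1), then look at the framework.

Come up with at least 2 innovations; run experiments to see why something doesn't work, analyze the problem, and read the literature.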

Possible direction
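- Stabilize or accelerate diffusion training
- ControlNet >> can it be borrowed for video editing? ⭐
- Problems that existed in GANs: can some of them (and their fixes) carry over to diffusion?
  - Mode collapse: poor diversity, only a few categories are generated, though quality is relatively good
  - Limited Data: see **
  - wavelet diffusion models
- Rado, Prof. Lei Zhang's group >> diffusion models in low-level vision
- https://orpatashnik.github.io/ check out this group's work >> StyleCLIP, StyleGAN-NADA; Daniel Cohen-Or Blog
Focus on your own research direction as the main thread: diffusion models for old film restoration. If this week's reading quota is not met, prioritize papers and project progress in the main direction!
Read papers in the main direction and in video-related directions, but learn to skim some of them. Don't get stuck on technical details; identify what can be skipped for now and note it down for later.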

Low-level All-stars

Chao Dong, Chinese Academy of Sciences (SIAT)
Xintao Wang, Kuaishou
Chen Change Loy, NTU
Zhangkai Ni, TJ (Tongji University)

Q&A

Q: How does DDPM sampling work?
Q: How does DDIM differ from DDPM?
Q: How does ControlNet reference-only handle the residual?
Q: Write out the self-attention formula and code by hand ⭐
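Sketch answers for the first two questions, assuming the standard linear beta schedule (1e-4 to 0.02, T = 1000) and an epsilon-prediction network whose output is passed in as `eps` (function names are hypothetical). DDPM takes one stochastic ancestral step x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t * z, here with sigma_t^2 = beta_t. DDIM with eta = 0 first predicts x0 and then moves deterministically to any earlier timestep, which is why it can skip steps.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = prod_{s<=t} alpha_s
    return betas, alphas, alpha_bars

def ddpm_step(x_t, eps, t, betas, alphas, alpha_bars, rng):
    """One ancestral DDPM step t -> t-1 (stochastic, must visit every t)."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                                  # no noise on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

def ddim_step(x_t, eps, t, t_prev, alpha_bars):
    """Deterministic DDIM step (eta = 0) from t to any t_prev < t; t_prev = -1 means x0."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
```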

"Attention Is All You Need" NIPS, 2017 Jun 12 paper code pdf note Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
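For the hand-written self-attention question, the scaled dot-product formula from this paper is Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. A single-head NumPy sketch with an optional causal mask (the projection matrices `w_q`, `w_k`, `w_v` are plain arrays here, an assumption for brevity; multi-head attention runs several of these in parallel and concatenates the outputs):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, causal=False):
    """x: (N, d_model); w_q, w_k, w_v: (d_model, d_k). Returns (N, d_k)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (N, N), scaled by sqrt(d_k)
    if causal:
        n = x.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # positions j > i
        scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ v
```

The sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into saturated regions with tiny gradients.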

Current Progress 🎯

my_own_work_journey

Dataset

youtube-dl

youtube-dl is a command-line program to download videos from YouTube.com

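10downloader: a tool to download YouTube videos

FFHQ (Flickr-Faces-High-Quality)

FFHQ introduction blog

FFHQ is a high-quality face dataset of 70,000 PNG face images at 1024×1024 resolution. It is rich and clearly varied in age, ethnicity, and image background, and covers a wide range of facial attributes (age, gender, ethnicity, skin tone, expression, face shape, hairstyle, head pose, etc.), including accessories such as regular glasses, sunglasses, hats, hair accessories, and scarves. It can therefore also be used to develop face attribute classification or face semantic segmentation models. (Used for face image restoration.)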
YouTube-VOS

The first large-scale dataset for video object segmentation. Most videos show a human interacting with one object.

Provides segmentation maps.

Several video super-resolution, deblurring, and denoising datasets, such as REDS [49], DVD [69], GoPro [50], DAVIS [35], and Set8 [72] (quoted from "ReBotNet: Fast Real-time Video Enhancement").

Low-level Dataset

DPDD (Dual-Pixel Defocus Deblurring)

350 images for training, 74 images for validation and 76 images for testing. Each scene has 2 blurry images and one all-in-focus image.

video dataset

The Densely Annotated VIdeo Segmentation dataset (DAVIS)

dataset

There are 50 video sequences with 3455 densely annotated frames at pixel level. Usable for video inpainting, denoising, and interactive segmentation tasks.
Talking-Heads (video deblur, heads video)

"One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing" CVPR, 2021 Apr paper code:unofficial

public dataset that uses Youtube videos and processes them using face detectors to obtain just the face.
GoPro (video deblur)

"Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring"

paper blog

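The authors shot 240 fps videos with a GoPro HERO4 Black camera, then averaged 7 to 13 consecutive frames to obtain images with varying degrees of blur. Each sharp frame corresponds to a 1/240 s shutter, so, for example, averaging 15 frames is equivalent to a blurry image taken with a 1/16 s shutter. The sharp image paired with each blurry image is defined as the middle frame of the averaged window. In total, 3214 blurry-sharp image pairs were generated at 1280×720 resolution.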
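A minimal sketch of the GoPro-style blur synthesis described above: average `n` consecutive high-fps frames and take the middle frame as the sharp target. To our understanding the original pipeline also compensates for the camera response (averaging in an approximately linear intensity space); this sketch skips that and averages pixel values directly. Function names are hypothetical.

```python
import numpy as np

def synthesize_blur(frames, start, n):
    """Average n consecutive sharp frames to emulate a long exposure.

    frames: list of equally shaped arrays. Returns (blurry, sharp), where
    sharp is the middle frame of the averaged window.
    """
    window = np.stack(frames[start:start + n]).astype(np.float64)
    blurry = window.mean(axis=0)
    sharp = window[n // 2]
    return blurry, sharp

def effective_exposure(n, fps=240):
    """Approximate exposure time in seconds of n averaged frames."""
    return n / fps
```

The same averaging idea underlies the "synthesized long exposure" used for DVD below.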
DVD (video deblur)

"Deep Video Deblurring for Hand-held Cameras" CVPR 2017

paper code dataset website

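Captured 71 videos at 240 FPS covering multiple scenes; blurry videos are obtained by synthesizing long exposures, which finally yields 2 million training patches.
"ReBotNet: Fast Real-time Video Enhancement" builds two datasets for video conferencing by processing the YouTube Talking-Head dataset; not open-sourced (video deblur, heads video)
- PortraitVideo focuses on the face region: face-region videos are filtered and cropped out, with the faces at a resolution of 384 × 384. The videos are processed at 30 frames per second (FPS) with a total of 150 frames per video, and degradations are then added.
- FullVideo focuses on the speaker's body and the surrounding scene: videos are filtered from the Talking-Head dataset without using Talking-Head's preprocessing. 132 training videos and 20 testing videos, all 720 × 1280, 30 FPS and 128 frames long. Similar degradations as in PortraitVideo are applied.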
Set8 (usually used as test set)

Set8 is composed of 8 sequences: 4 sequences from the Derf 480p testset ("tractor", "touchdown", "park_joy", "sunflower") plus other 4 540p sequences. You can find these under the test_sequences folder here.
Vimeo-90K

"Video Enhancement with Task-Oriented Flow" IJCV 2019 website sample-clip-from-vimeo-90K

Builds a large-scale, high-quality video dataset, Vimeo90K. This dataset consists of 89,800 video clips downloaded from vimeo.com, which cover a large variety of scenes and actions. It is designed for the following four video processing tasks: temporal frame interpolation, video denoising, video deblocking, and video super-resolution.
Youtube-VOS

link
WebVid10M subset code

335K video-text pairs, 336×596

Download tool: video2dataset
YouHQ

animate

"Sakuga-42M Dataset: Scaling Up Cartoon Research" Arxiv, 2024 May 13 ⚠️ (Withdraw) paper code pdf note Authors: Zhenglin Pan

Old photos Dataset

Old photo restoration center: https://www.ancientfaces.com/photo/george-roberts/1328388 old photos textures

"Time-Travel Rephotography" SIGGRAPH, 2021 Dec ⭐

HWFD dataset: 100+ face photos of famous people, downloadable.
"Bringing Old Photos Back to Life" CVPR oral, 2020 Apr ⭐

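Photos synthesized on Pascal VOC and DIV2K (see the paper for the DA pipeline; the degradation templates are not provided and have to be found online). Real old photos are used only for testing (ancient face).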
"Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer" CVPR, 2023 Apr

644 old color photos produced in the 20th century >> archival images collected from 3 Korean museums
"Pik-Fix: Restoring and Colorizing Old Photo" WACV, 2023

Div2K [1], Pascal [17], and RealOld
- hired Photoshop experts to mimic the degradation patterns in real old photos (but not from our newly created RealOld dataset) on images from the Pascal dataset, using alpha compositing with randomized transparency levels, thereby generating synthetic old photos.
- Real-World Old Photos (RealOld)
  
  we collected digitized copies of 200 real old black & white photographs. Each of these photos was digitally manually restored and colorized by Photoshop experts.
  
  the first real-world old photo dataset that has aligned “ground truth”
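A minimal sketch of the Pik-Fix-style synthetic degradation above: alpha-composite a degradation texture (scratches, paper texture) onto a clean image with a randomized transparency level. The `alpha_range` default is a hypothetical choice, not a value from the paper, and a single global alpha per image is our simplification.

```python
import numpy as np

def composite_degradation(img, texture, rng, alpha_range=(0.2, 0.8)):
    """Overlay a degradation texture with a random global alpha.

    img, texture: float arrays in [0, 1] with the same shape.
    Returns (degraded image, alpha used).
    """
    alpha = rng.uniform(*alpha_range)               # randomized transparency level
    out = alpha * texture + (1.0 - alpha) * img     # standard "over" blend with constant alpha
    return np.clip(out, 0.0, 1.0), alpha
```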

Old Films Dataset

Both old movies and old cartoons work! How Old Movies Are Professionally Restored | Movies Insider; 电影修复馆 (film restoration studio): restoration workflow of Baidu's 智感超清 (intelligent super-resolution) service

Search online for noise templates with the keyword "film noise".

Example: 35mm film strip

Commercial old films: https://www.britishpathe.com/ old film store, £75 per download.
YouTube: Denis Shiryaev. The YouTuber permits others to use his videos for research (stated in his video comments) and gives the source video YouTube URLs.

[4k, 60 fps] A Trip Through New York City in 1911, already restored by several algorithms ⚠️; [4k, 60 fps] San Francisco, a Trip down Market Street, April 14, 1906 >> states which methods were used for restoration
Youtube guy jones 1902c - Crowd Ice Skating in Montreal (stabilized w/ added sound)
Youtube Old cartoon

[A Day at the Zoo (1939)](https://www.youtube.com/watch?v=RtblQQvT2Nk&list=PL-F4vmhdMdiXIXZEDNQ3UFLXmQqjHguBA)
Youtube GHWTVideos
Search Youku for old movies

Convert Youku .kux files to mp4
Bilibili uploaders

https://www.bilibili.com/video/BV1dT411u7Hu/?spm_id_from=333.788&vd_source=eee3c3d2035e37c90bb007ff46c6e881 https://www.bilibili.com/video/BV1oG41187Rp/?spm_id_from=333.999.0.0&vd_source=eee3c3d2035e37c90bb007ff46c6e881

https://github.com/leiurayer/downkyi Bilibili video download tool

The Dataset that used in old video restoration related paper

Deepremaster 👍

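The authors selected 1569 YouTube videos suitable for synthesis from the YouTube-8M dataset and provide the video URLs and degradation templates (found online with the keyword "film noise").

Video frames are degraded following the DA pipeline.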
"Bringing Old Films Back to Life" CVPR, 2022 Mar

Crop 256×256 patches from the REDS dataset and apply the proposed video degradation model (DA & noise template) on the fly.

REDS sharp data: 240 training videos, each with 100 frames
"Blind flickering" provides its own constructed flickering data (Link to paper info)
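- Real data (evaluation)
  
  60 old_movie clips stored as %05d.jpg frames; most have about 350 frames, i.e. roughly 10-14 s of video at fps=25.
  
  21 old_cartoon clips stored as image frames; most have 50-100 frames, i.e. roughly 1-4 s of video.
- Synthetic data (training)
  
  Videos the authors restored themselves with software.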
"DSTT-MARB: Multi-scale Attention Based Spatio-Temporal Transformers for Old Film Restoration" Master Thesis

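No URL for releasing the data was found in the thesis.

Following Deepremaster, synthetic data is used: a subset of 103845 images is selected from the YouTube-VOS Dataset.
- Collected more noise templates, increasing them to 5770
- Noise template preprocessing: several templates are fused together
- Noise level: the original image and a noise template are blended with graphics blending methods such as Grain-merge
- Frame level >> pdf Page 51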
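For the Grain-merge blending mentioned above, a minimal sketch of the GIMP-style layer mode on images normalized to [0, 1]: result = base + layer - 0.5, clipped (the 8-bit form is base + layer - 128). A mid-gray layer leaves the base unchanged, so a noise template stored around mid-gray only adds its deviations. `grain_extract` is the inverse operation, shown for pulling a grain layer out of a reference; both function names are ours, and this is not claimed to be the thesis's exact pipeline.

```python
import numpy as np

def grain_merge(base, layer):
    """GIMP-style grain merge on [0, 1] images: base + layer - 0.5, clipped."""
    return np.clip(base + layer - 0.5, 0.0, 1.0)

def grain_extract(image, base):
    """Inverse mode: the layer that grain-merges `base` back into `image`."""
    return np.clip(image - base + 0.5, 0.0, 1.0)
```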
"RTTLC: Video Colorization with Restored Transformer and Test-time Local" CVPR, 2023 Mar

LDV Dataset contains 240 high-quality videos and exhibits a high degree of diversity. Specifically, we select 200 color videos with a resolution of 960×536 as the training set. The validation set contains 15 videos

trained for 4 days on four NVIDIA GeForce RTX 3090 GPUs.

DeOldify [1], RTN [23] and BasicVSR++

Old-VHS-recording-dataset

VHS recording videos provided by Mr. Jorvan, contacted under the blog post: Can I upload a dataset of old VHS recordings of music videos? You'll probably need to do some trimming and velocity adjustments here and there, and some videos don't have audio for some reason.

What is VHS? VHS (Video Home System) is a standard for consumer-level analog video recording on tape cassettes invented in 1976 by the Victor Company of Japan and was the competitor to the ill-fated Sony Betamax system.

No GT, but similar (clean) versions exist?

How the 90s VHS look works

How to Convert Your Old VHS Tapes to DVD with honestech VHS to DVD 4.0 Deluxe

How to synthesize old VHS video: vaporwave, which explores the possibilities of abstract virtual-signal imagery and recreates the video style of the 1990s.

iOS app

accept file transfer
❓ have old music videos (with blur, noise, artifacts, etc.) that nowadays are on youtube in HD

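At least some of them have GT!
- Determine the video degradation types: Sade - Is It A Crime (Live Video from San Diego); Mariah Carey - Emotions
Similar approaches can be used to find old video data! Look for old videotapes in video shops. How to make images look like VHS tape footage?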

Synthetic VHS

PR (Premiere Pro) software VHS template video1

PR software download

image VHS like vaporwave

video VHS

Creating faux analogue video imagery with python ntscqt 👍 python rewrite of https://github.com/joncampbell123/composite-video-simulator

The ultimate goal is to reproduce all of the artifacts described here https://bavc.github.io/avaa/tags.html#video ⭐ >> community standard for video artifacts
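For synthetic VHS data, a deliberately crude NumPy sketch of three VHS-like artifacts: horizontal chroma bleeding (box-blurring U/V in a BT.601 YUV space), per-scanline horizontal jitter of luma, and additive luma noise. This is our own approximation for quick experiments, not the algorithm of ntscqt or composite-video-simulator; all parameter defaults and function names are hypothetical.

```python
import numpy as np

RGB2YUV = np.array([[0.299, 0.587, 0.114],
                    [-0.14713, -0.28886, 0.436],
                    [0.615, -0.51499, -0.10001]])
YUV2RGB = np.linalg.inv(RGB2YUV)  # exact inverse of the matrix above

def horizontal_box_blur(ch, width):
    """Blur each row of a 2-D array with a 1-D box filter (edge padding keeps the width)."""
    kernel = np.ones(width) / width
    pad = width // 2
    p = np.pad(ch, ((0, 0), (pad, width - 1 - pad)), mode="edge")
    return np.stack([np.convolve(row, kernel, mode="valid") for row in p])

def fake_vhs(img, rng, chroma_blur=9, max_jitter=2, noise_std=0.02):
    """img: (H, W, 3) RGB in [0, 1]. Returns a crude VHS-like version in [0, 1]."""
    yuv = img @ RGB2YUV.T
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]
    u = horizontal_box_blur(u, chroma_blur)       # low chroma bandwidth -> color bleeding
    v = horizontal_box_blur(v, chroma_blur)
    shifts = rng.integers(-max_jitter, max_jitter + 1, size=y.shape[0])
    y = np.stack([np.roll(row, s) for row, s in zip(y, shifts)])  # scanline jitter
    y = y + rng.normal(0.0, noise_std, size=y.shape)              # luma noise
    out = np.stack([y, u, v], axis=-1) @ YUV2RGB.T
    return np.clip(out, 0.0, 1.0)
```

Blurring only the chroma channels mimics the fact that analog tape stores color at much lower bandwidth than brightness; the full artifact list above covers many more effects.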

How to compile using Makefile.am
