Hi there 👋🏻,
🔭 I am currently focused on developing cutting-edge multi-modality models capable of generating and understanding text and images natively.
My specific areas of interest include:
- Denoising
Auto-regressive GenerationDiffusion
PS: I recognize that all diffusion and next-token generation tasks are inherently denoising tasks. (2024/05)
Document Understanding & Layout AnalysisOptical Character RecognitionObject Detection
PS: Looks like all these tasks can be regarded as next-token generation tasks. (2023/12)
In addition to my current work, I have prior experience in Robotics Perception from my Master's studies. I hope my work can be helpful to you.
Feel free to reach out if you have any questions or if there's anything I can assist with!