Name: Xinyu Huang
Type: User
Company: Fudan University
Bio: Ph.D. Student at Fudan University, homepage: xinyu1205.github.io
Location: Shanghai, China
Blog: https://xinyu1205.github.io
Xinyu Huang's Projects
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
Code for ALBEF: a new vision-language pre-training method
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Contrastive Language-Image Pretraining
One-click "Ping An Fudan" (平安复旦) script that automatically and quickly submits the daily epidemic report
Marrying Grounding DINO with Segment Anything & Tag2Text & Stable Diffusion & BLIP & Whisper - Automatically Recognize, Detect, Segment and Generate Anything with Image, Text, and Speech Inputs
The official implementation of "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Code for paper: IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training [ACM MM2022]
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
Object Detection Metrics
Official implementation of paper "Query2Label: A Simple Transformer Way to Multi-Label Classification".
Open-source and strong foundation image recognition models.
Code for paper: Simple and Robust Loss Design for Multi-Label Learning with Missing Labels
Code implementation for paper "On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals".
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.