Topic: vision-and-language Goto Github
Something interesting about vision-and-language
vision-and-language,My Reading Lists of Deep Learning and Natural Language Processing
User: 26hzhang
vision-and-language,A one stop repository for generative AI research updates, interview resources, notebooks and much more!
User: aishwaryanr
Home Page: https://www.linkedin.com/in/areganti/
vision-and-language,Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
User: chenrocks
Home Page: https://arxiv.org/abs/1909.11740
vision-and-language,Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
User: dandelin
vision-and-language,A curated list of research papers in Vision-Language Navigation (VLN)
User: daqingliu
vision-and-language,A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Organization: eric-ai-lab
Home Page: https://arxiv.org/abs/2203.12667
vision-and-language,[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
User: fuxiaoliu
Home Page: https://fuxiaoliu.github.io/LRV/
vision-and-language,A curated list of awesome vision and language resources for earth observation.
Organization: geoaigroup
Home Page: https://geogroup.ai/
vision-and-language,Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
Organization: google-research-datasets
vision-and-language,[ECCV 2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
User: haiyang-w
Home Page: https://arxiv.org/abs/2403.09394
vision-and-language,Awesome Resources for Advanced Computer Vision Topics
User: haofanwang
vision-and-language,HPT - Open Multimodal LLMs from HyperGAI
Organization: hypergai
Home Page: https://www.hypergai.com/
vision-and-language,PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
User: j-min
Home Page: https://arxiv.org/abs/2205.13115
vision-and-language,PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
User: j-min
Home Page: https://arxiv.org/abs/2102.02779
vision-and-language,Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
User: jackroos
vision-and-language,[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
User: jayleicn
Home Page: https://arxiv.org/abs/2102.06183
vision-and-language,Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Organization: jdai-cv
vision-and-language,This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
User: jindonggu
Home Page: https://arxiv.org/abs/2307.12980
vision-and-language,Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
User: linjieli222
Home Page: https://arxiv.org/abs/2005.00200
vision-and-language,日本語LLMまとめ - Overview of Japanese LLMs
Organization: llm-jp
Home Page: https://llm-jp.github.io/awesome-japanese-llm
vision-and-language,[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
User: marsaki
vision-and-language,[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Organization: mbzuai-oryx
Home Page: https://grounding-anything.com
vision-and-language,CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
User: mees
Home Page: http://calvin.cs.uni-freiburg.de
vision-and-language,Oscar and VinVL
Organization: microsoft
vision-and-language,[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Organization: nvlabs
Home Page: https://arxiv.org/abs/2402.09353
vision-and-language,The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Organization: nvlabs
Home Page: https://shikun.io/projects/prismer
vision-and-language,A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Organization: ofa-sys
vision-and-language,Real-time and accurate open-vocabulary end-to-end object detection
Organization: om-ai-lab
vision-and-language,RS5M: a large-scale vision language dataset for remote sensing
Organization: om-ai-lab
vision-and-language,Multimodal-GPT
Organization: open-mmlab
vision-and-language,[ECCV 2024 Oral] PointLLM: Empowering Large Language Models to Understand Point Clouds
Organization: openrobotlab
Home Page: https://runsenxu.com/projects/PointLLM
vision-and-language,The Paper List of Large Multi-Modality Model, Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
User: paranioar
vision-and-language,Pathology Language and Image Pre-Training (PLIP) is the first vision and language foundation model for Pathology AI (Nature Medicine). PLIP is a large-scale pre-trained model that can be used to extract visual and language features from pathology images and text descriptions. The model is a fine-tuned version of the original CLIP model.
Organization: pathologyfoundation
vision-and-language,AI Research Platform for Reinforcement Learning from Real Panoramic Images.
User: peteanderson80
vision-and-language,Recent Advances in Vision and Language Pre-training (VLP)
User: phellonchen
vision-and-language,Code for ALBEF: a new vision-language pre-training method
Organization: salesforce
vision-and-language,Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Organization: salesforce
vision-and-language,LAVIS - A One-stop Library for Language-Vision Intelligence
Organization: salesforce
vision-and-language,Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Organization: salt-nlp
Home Page: https://llavar.github.io/
vision-and-language,A curated list of awesome vision and language resources (still under construction... stay tuned!)
User: sangminwoo
vision-and-language,This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]
User: skalskip
vision-and-language,This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
User: skalskip
vision-and-language,[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
User: sunzey
Home Page: https://aleafy.github.io/alpha-clip
vision-and-language,A Gradio demo of MGIE
User: tsujuifu
vision-and-language,code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Organization: uta-smile
vision-and-language,Creating software for automatic monitoring in online proctoring
User: vardanagarwal
vision-and-language,X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
User: yehli
vision-and-language,PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
User: ylsung
vision-and-language,Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
User: yuewang-cuhk
vision-and-language,X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
User: zengyan-97