Comments (3)
@tsb0601 ??
from mmvp.
I think the code is set up for image size 224, so CLIP and DINO each produce 256 patches, which sums to 512, as shown in the table above.
The table below shows results for image size 336. There, CLIP and DINO each produce 576 patches, which sums to 1152.
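The arithmetic behind those counts can be checked with a short sketch. It assumes both encoders are ViTs with patch size 14 (as the name clip-vit-large-patch14 suggests for CLIP; for DINO this is an assumption), that the class token is not counted, and that the two token sets are simply added together. The helper name `num_patches` is hypothetical and not taken from the repository.

```python
def num_patches(image_size: int, patch_size: int = 14) -> int:
    """Count the non-overlapping square patches a ViT cuts from a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size  # patches along one edge
    return per_side * per_side


for size in (224, 336):
    per_encoder = num_patches(size)  # assumed identical for CLIP and DINO
    total = 2 * per_encoder          # both encoders' tokens together
    print(f"image size {size}: {per_encoder} per encoder, {total} total")
```

Under these assumptions, 224 gives a 16 x 16 grid and 336 gives a 24 x 24 grid, which matches the 256/512 and 576/1152 figures quoted above.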
@tsb0601 Is clip-vit-large-patch14 used in LLaVA-1.5 at image size 224?