Comments (3)
Hi @Jayce1kk,
Thank you for your interest in our work. To assist further, could you provide the image, the prompt used, and a screenshot highlighting the differences between the model outputs? We will try to replicate and address the issue. If necessary, we can also suggest an alternative checkpoint for you to try. Thank you.
from groundinglmm.
![ballon](https://private-user-images.githubusercontent.com/113454324/317103155-e89db1f5-08fa-4f15-80d8-d708449710b6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIwMDkyOTcsIm5iZiI6MTcyMjAwODk5NywicGF0aCI6Ii8xMTM0NTQzMjQvMzE3MTAzMTU1LWU4OWRiMWY1LTA4ZmEtNGYxNS04MGQ4LWQ3MDg0NDk3MTBiNi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNlQxNTQ5NTdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wYzQ1ZjU5MmYxMTA2NTQwMzQ5OGUzOTM4NzkzMmVkOTcxZDFjODFhNTM0YzYyODY1YmM4ZGNhNjc2Nzk1OGM1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.yybfdpUTC6uYyyw0h0APwrXcRSa2scNBk4BbYuZtEYc)
Hi @Jayce1kk,
Thank you for providing more details about your setup. The difference you're noticing comes from the checkpoints and model versions used in our live demo versus the local setup: our demo model is built on LLaVA 1.0, whereas the released code and models, including `MBZUAI/GLaMM-FullScope`, are built on LLaVA 1.5.
To achieve results similar to our live demo, please use this checkpoint: `GLaMM-FullScope_v0`. You'll want to make a few adjustments:
- Change the `--vision_tower` from `openai/clip-vit-large-patch14-336` to `openai/clip-vit-large-patch14`. This modification is needed because the input size of the global image encoder (the CLIP image encoder) is 224 instead of 336. You'll also need to adjust the token lengths in these lines (here and here) from 575 to 255, reflecting the change in input dimensions (224/14 = 16, resulting in 16 × 16 = 256 tokens).
- Update the V-L projection layer (LLaVA 1.0 uses a single linear layer): replace
  `self.mm_projector = nn.Sequential(*modules)`
  with
  `self.mm_projector = nn.Linear(config.mm_hidden_size, config.hidden_size)`.
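As a quick sanity check of the token-length arithmetic in the first step, here is a minimal sketch in plain Python (no project code; the helper name `num_patch_tokens` is ours). The lengths 575 and 255 used in the repository are each one less than the raw patch counts computed below:

```python
# Patch-token count for a square image fed to a ViT with 14-pixel patches.
def num_patch_tokens(image_size: int, patch_size: int = 14) -> int:
    side = image_size // patch_size  # patches along one side
    return side * side               # total patches in the grid

print(num_patch_tokens(336))  # 24 * 24 = 576 -> length 575 in the code
print(num_patch_tokens(224))  # 16 * 16 = 256 -> length 255 in the code
```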
We hope these adjustments help you in reproducing the demo results locally. Please don't hesitate to reach out if you encounter any issues or have further questions.
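For reference, a hypothetical sketch of the two projector variants discussed above, assuming PyTorch and LLaVA-style config names (`mm_hidden_size`, `hidden_size`); the dimensions below are illustrative, not read from the actual GLaMM config:

```python
import torch
import torch.nn as nn

# Illustrative dimensions only (not from the real config).
mm_hidden_size = 1024  # CLIP ViT-L/14 output feature dim
hidden_size = 4096     # language-model hidden dim (e.g. a 7B LLM)

# LLaVA 1.5 style projector: a two-layer MLP assembled from a module list,
# as in `self.mm_projector = nn.Sequential(*modules)`.
modules = [
    nn.Linear(mm_hidden_size, hidden_size),
    nn.GELU(),
    nn.Linear(hidden_size, hidden_size),
]
projector_v15 = nn.Sequential(*modules)

# LLaVA 1.0 style projector: a single linear layer, which is what the
# GLaMM-FullScope_v0 checkpoint expects.
projector_v10 = nn.Linear(mm_hidden_size, hidden_size)

# Both map image features to the LLM hidden size; only the depth differs.
feats = torch.randn(1, 255, mm_hidden_size)  # 255 image tokens at 224 px
print(projector_v10(feats).shape)  # torch.Size([1, 255, 4096])
print(projector_v15(feats).shape)  # torch.Size([1, 255, 4096])
```

Loading the v0 checkpoint into the 1.5-style MLP (or vice versa) will fail with a state-dict shape/key mismatch, which is why the replacement above is required.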
Related Issues (20)
- About region caption
- may i ask your total parameter?
- Some bugs in the GranD_ReferringSegm_ds.py
- Online Demo Down
- Fine-tuning Grounded Conversation Generation (GCG) Task
- token_positives
- assertion error cur_len == total_len
- can not install mmcv
- Can not find file for glamm_conda_env.zip in the given Google Drive Link
- Training on New Data
- training V-L and L-P projection layer
- Can not download the train.json file for visual genome
- How can I let the model receive multiple images at once
- How should I train on the GranD dataset
- How can I finetune on combined tasks?
- Confusing referring segmentation results.
- mmcv failed to install
- AssertionError when running a demo
- Offline demo error
- Why is it that during the computation of segmentation results, the model() function is used instead of model.generate()? Wouldn't this mean that when predicting the next token, the information viewed is from the actual token rather than the predicted one?