
Comments (3)

hanoonaR commented on August 10, 2024

Hi @Jayce1kk,

Thank you for your interest in our work. To help us assist you, could you provide the image, the prompt used, and a screenshot highlighting the differences between the models? We will try to replicate and address the issue. If necessary, we can also suggest an alternative checkpoint for you to try. Thank you.

from groundinglmm.

Jayce1kk commented on August 10, 2024

[image: ballon]

Hi, thank you very much for your reply! I'm using the example given. The code settings in app.py are unchanged; I've only changed the model path to a local path.


hanoonaR commented on August 10, 2024

Hi @Jayce1kk,

Thank you for providing more details about your setup. The difference you're noticing comes from the different checkpoints and model versions used in our live demo and in the local setup. Our demo model is built on LLaVA 1.0, whereas the released code and models, including MBZUAI/GLaMM-FullScope, are built on LLaVA 1.5.

To achieve similar results to our live demo, please use this checkpoint: GLaMM-FullScope_v0. You'll want to make a few adjustments:

  1. Change the --vision tower from openai/clip-vit-large-patch14-336 to openai/clip-vit-large-patch14. This is needed because the global image encoder (CLIP image encoder) for this checkpoint takes a 224x224 input instead of 336x336. You'll also need to adjust the token lengths in these lines (here and here) from 575 to 255 to reflect the smaller input: 224/14 = 16 patches per side, so 16*16 = 256 tokens, compared with 24*24 = 576 tokens at 336.

  2. Update the V-L projection layer (LLaVA 1.0 uses a single linear layer):
    Replace self.mm_projector = nn.Sequential(*modules) with self.mm_projector = nn.Linear(config.mm_hidden_size, config.hidden_size)
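The token arithmetic in step 1 can be checked with a short sketch. The helper name clip_patch_tokens is hypothetical and is not part of the GLaMM code; it only reproduces the patch-count calculation for a square input with a patch size of 14. The values 575 and 255 quoted above happen to be one less than the patch counts it returns; the reason for that offset lives in the linked lines and is not reproduced here.

```python
def clip_patch_tokens(image_size: int, patch_size: int = 14) -> int:
    """Return the number of patch tokens for a square image.

    Hypothetical helper for illustration only; not taken from the GLaMM repo.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


if __name__ == "__main__":
    # Compare the 336-pixel encoder (LLaVA 1.5 setup) with the
    # 224-pixel encoder (LLaVA 1.0 setup) described in step 1.
    for size in (336, 224):
        tokens = clip_patch_tokens(size)
        # The second number is the patch count minus one, which matches
        # the 575 and 255 values mentioned in the comment.
        print(size, tokens, tokens - 1)
```

Running it prints 336 576 575 and 224 256 255, which matches the change from 575 to 255 described above.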

We hope these adjustments help you in reproducing the demo results locally. Please don't hesitate to reach out if you encounter any issues or have further questions.

