
Comments (4)

hanoonaR commented on August 10, 2024

Hi @HumbleBone,

Yes, the model can process multiple regions within a single image. To do so, use a structured query such as "Can you please describe region1 <bbox> and region2 <bbox>?". In this query, the <bbox> tokens are placeholders for the representations of the regions you want described. The model replaces these tokens sequentially with the representations of the corresponding regions, following the order of the box prompts that you provide.

For example, you would get a response like:
[image: example response]
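The query format described above can be sketched as a small helper. This is only an illustration of the prompt layout, not code from the repository: the function name `build_region_query` and the `(x1, y1, x2, y2)` box convention are assumptions, and passing the boxes to the model is left out.

```python
from typing import List, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


def build_region_query(boxes: List[Box]) -> str:
    """Build a query with one <bbox> placeholder per box, in box order.

    Hypothetical helper: the i-th <bbox> token is meant to be filled with
    the representation of the i-th box prompt.
    """
    if not boxes:
        raise ValueError("at least one box is required")
    parts = [f"region{i} <bbox>" for i in range(1, len(boxes) + 1)]
    if len(parts) == 1:
        joined = parts[0]
    else:
        joined = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"Can you please describe {joined}?"


# Two example boxes (made-up coordinates); order matters.
boxes = [(40, 60, 200, 380), (180, 220, 520, 400)]
query = build_region_query(boxes)
print(query)
```

The number of `<bbox>` tokens in the query should match the number of box prompts, since the tokens are replaced in order.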

I hope this clarifies your query. If you have any further questions or need more detailed guidance, feel free to ask!

from groundinglmm.

HumbleBone commented on August 10, 2024

Hi @hanoonaR,
Thank you for your reply! In this example, does the model generate only one caption covering both region1 and region2? Can it generate a separate caption for each region instead, e.g., "a man in black" for region1 and "a motorcycle" for region2?


hanoonaR commented on August 10, 2024

Hi @HumbleBone,

Thank you for the clarification. In the scenario you've described, the model generates a single, combined caption that covers both region1 and region2 in the same response, because it is trained to identify how different objects relate to each other. The current model setup does not support separate, distinct captions for each region in a single query. To obtain individual captions, you would need to run separate inferences, one for each region. However, the model can also be fine-tuned to give separate responses.
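The one-inference-per-region approach could look roughly like the sketch below. The `run_inference` callable is a hypothetical stand-in for whatever function runs the model on an image, a query, and a list of boxes; it is not an API from the repository, and the stub used here only returns placeholder text so the loop can be run.

```python
from typing import Callable, Dict, List, Tuple

# Assumed box convention for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]

# Hypothetical signature: (image_path, query, boxes) -> caption text.
InferenceFn = Callable[[str, str, List[Box]], str]


def caption_each_region(
    image_path: str, boxes: List[Box], run_inference: InferenceFn
) -> Dict[int, str]:
    """Run one inference per box so each region gets its own caption."""
    query = "Can you please describe region1 <bbox>?"  # single-region query
    captions: Dict[int, str] = {}
    for index, box in enumerate(boxes, start=1):
        # Each call sees exactly one box, so the single <bbox> token
        # refers to that region only.
        captions[index] = run_inference(image_path, query, [box])
    return captions


def stub_inference(image_path: str, query: str, boxes: List[Box]) -> str:
    """Placeholder for the real model call; returns dummy text."""
    return f"caption for box {boxes[0]}"


result = caption_each_region(
    "example.jpg", [(40, 60, 200, 380), (180, 220, 520, 400)], stub_inference
)
for region_index, caption in result.items():
    print(f"region{region_index}: {caption}")
```

The trade-off is one forward pass per region, and each caption is produced without the other regions as context.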

I hope this helps to clarify the model's capabilities. Thank you.


HumbleBone commented on August 10, 2024

OK, I got it. Thank you very much!

