
For region level captioning, does the model support multi-region inputs? (groundinglmm, closed)

HumbleBone commented on August 10, 2024

For region level captioning, does the model support multi-region inputs?

from groundinglmm.

Comments (4)

hanoonaR commented on August 10, 2024

Hi @HumbleBone ,

Yes, the model can process multiple regions within a single image. To do this, use a structured query format. For instance, you can phrase your request as "Can you please describe region1 <bbox> and region2 <bbox>?". In this query, the <bbox> tokens are placeholders for the representations of the regions you want described. The model replaces these tokens sequentially with the representations of the corresponding regions, following the order of the box prompts that you provide.

For example, you would get a response like:

I hope this clarifies your query. If you have any further questions or need more detailed guidance, feel free to ask!
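The query format described above can be sketched as a small helper that emits one <bbox> placeholder per box and returns the boxes in the same order. This is only an illustration of the prompt layout: `build_region_query`, the `Box` type, and the (x1, y1, x2, y2) coordinate convention are assumptions made for this example, not part of the groundinglmm code.

```python
from typing import List, Tuple

# Hypothetical box type: (x1, y1, x2, y2). The coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


def build_region_query(boxes: List[Box]) -> Tuple[str, List[Box]]:
    """Build a query with one <bbox> placeholder per box, in box order."""
    if not boxes:
        raise ValueError("at least one box is required")
    # region1 <bbox>, region2 <bbox>, ... numbered in the order the boxes are given
    parts = [f"region{i} <bbox>" for i in range(1, len(boxes) + 1)]
    if len(parts) == 1:
        joined = parts[0]
    else:
        joined = ", ".join(parts[:-1]) + " and " + parts[-1]
    # The boxes are returned unchanged so that the n-th <bbox> token
    # lines up with the n-th box prompt.
    return f"Can you please describe {joined}?", list(boxes)


query, ordered_boxes = build_region_query(
    [(10, 20, 110, 220), (150, 40, 300, 260)]
)
print(query)
```

The point of returning the boxes alongside the query is to keep the placeholder order and the box order in sync, since the reply above says the tokens are filled sequentially.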


HumbleBone commented on August 10, 2024

Hi @hanoonaR
Thank you for your reply! In this example, does the model generate only one caption covering both region1 and region2? If I want a separate caption for each region, e.g., "a man in black" for region1 and "a motorcycle" for region2, can the model do that?


hanoonaR commented on August 10, 2024

Hi @HumbleBone,

Thank you for the clarification. In the scenario you've described, the model generates a single, combined caption that covers both region1 and region2 in the same response, because it is trained to describe how different objects relate to each other. If you want separate, distinct captions for each region, the current model setup does not support this in a single query. To obtain individual captions, you would need to run separate inferences, one per region. However, the model can also be tuned to give separate responses.

I hope this helps to clarify the model's capabilities. Thank you.
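The "one inference per region" workaround could look roughly like the sketch below. It assumes a callable `run_inference(image, query, boxes)` that wraps whatever inference entry point you use. That callable, its signature, and the single-region query wording are hypothetical, so a stand-in function is used here to keep the example runnable.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical box type: (x1, y1, x2, y2). The coordinate convention is an assumption.
Box = Tuple[float, float, float, float]

# Assumed single-region prompt, modeled on the multi-region query in this thread.
SINGLE_REGION_QUERY = "Can you please describe region1 <bbox>?"


def caption_regions_separately(
    image: Any,
    boxes: List[Box],
    run_inference: Callable[[Any, str, List[Box]], str],
) -> List[str]:
    """Run one inference per box so that each region gets its own caption."""
    return [run_inference(image, SINGLE_REGION_QUERY, [box]) for box in boxes]


def fake_inference(image: Any, query: str, boxes: List[Box]) -> str:
    """Stand-in for a real model call; it only echoes the box it was given."""
    return f"caption for box {boxes[0]}"


captions = caption_regions_separately(
    image=None,  # placeholder; a real call would pass the loaded image
    boxes=[(10, 20, 110, 220), (150, 40, 300, 260)],
    run_inference=fake_inference,
)
print(captions)
```

Each call sees only one box, so the captions are independent of each other. The trade-off is that the model no longer describes how the regions relate.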


HumbleBone commented on August 10, 2024

OK, I got it. Thank you very much!

