As mentioned in the title, Section 4.2, "Relationships and Landmarks," presents some p


How are the relationships formed using objects from Level 1? (groundinglmm, CLOSED)

mbzuai-oryx commented on August 10, 2024

How are the relationships formed using objects from Level 1?

from groundinglmm.

Comments (1)

hanoonaR commented on August 10, 2024

Hi @peiliu0408,

Thank you for your interest in our work. Here are the clarifications to your questions:

Object Relationships in Level 2: In Level 1, our primary focus is precise object identification within images. In Level 2, we aim to understand the interactions between these identified objects. We use BLIP-2 and LLaVA-1.5 to generate short captions, then apply spaCy for phrase extraction and MDETR for phrase grounding, as detailed in Section A.4. For example, suppose Level 1 identifies the objects "hot-air balloon" and "river," and Level 2 generates the caption "A hot air balloon flying over a river with a view of the cityscape." The extracted phrase is "hot air balloon flying over a river." MDETR grounds this phrase to two bounding boxes: one for the "hot air balloon" and another for the "river." These groundings are then matched with the objects identified in Level 1. In simplified form, phrase grounding establishes that the two objects are related, and the phrase itself describes their relationship.
Introduction of Landmark Category: The landmark category in Level 2 significantly enhances scene comprehension. After object relationships are established, the scene is categorized to add context. This is particularly important for Level 3’s dense captioning with Vicuna-v1.5, a language-only model that relies entirely on this textual context. By incorporating detailed information about relationships and scene characteristics in Level 2, we lay a foundation that allows Vicuna-v1.5 to generate robust and informative dense captions. For additional details, please refer to Table 7 in the appendix.
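
To make the matching step in the first point more concrete, here is a minimal sketch of how grounded phrase boxes could be linked back to Level-1 objects. It is an illustration only, not the pipeline's actual code: the `DetectedObject` structure, the `relate` helper, the use of IoU as the matching criterion, and the 0.5 threshold are all assumptions made for this example. The boxes are made-up values standing in for MDETR output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class DetectedObject:
    """Hypothetical container for a Level-1 object: a label and its box."""
    label: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_box(box: Box, objects: List[DetectedObject],
              threshold: float = 0.5) -> Optional[DetectedObject]:
    """Return the Level-1 object overlapping `box` the most, or None if
    the best overlap is below `threshold` (an assumed cutoff)."""
    best = max(objects, key=lambda o: iou(box, o.box), default=None)
    if best is None or iou(box, best.box) < threshold:
        return None
    return best


def relate(phrase: str, grounded_boxes: List[Box],
           objects: List[DetectedObject],
           threshold: float = 0.5) -> Optional[Dict[str, object]]:
    """Link the Level-1 objects hit by a grounded phrase. The phrase itself
    serves as the relationship description. Returns None if fewer than two
    distinct objects are matched."""
    matched: List[DetectedObject] = []
    for box in grounded_boxes:
        obj = match_box(box, objects, threshold)
        if obj is not None and obj not in matched:
            matched.append(obj)
    if len(matched) < 2:
        return None
    return {"phrase": phrase, "objects": [o.label for o in matched]}


# Level-1 objects (illustrative values only).
level1 = [
    DetectedObject("hot-air balloon", (100, 50, 200, 180)),
    DetectedObject("river", (0, 300, 640, 480)),
    DetectedObject("building", (400, 150, 500, 300)),
]
# Boxes a phrase-grounding model might return for the extracted phrase.
grounded = [(105, 55, 195, 175), (0, 310, 640, 480)]

relationship = relate("hot air balloon flying over a river", grounded, level1)
print(relationship)
```

In this sketch the "building" object is left out of the relationship because neither grounded box overlaps it, while the balloon and the river are linked through the phrase.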

