Visual-Query

(The sample output QA data is under the folder "QA_data". The code for generating the data is under "QA_automation".) View sample generated QA data without downloading the JSON: http://jsonblob.com/1117586296569872384

Goal: Query a visual object.

{"key": "visual_object", "value": "timestamp_last_appear"}

QA Data Automation

Current ChatGPT prompt description

Identify the objects that appear in the text and ask when each object appears in the video.

input: the annotation/summary of a single clip from the video, with its description, start time, and end time.

output: Q: When does object X appear? A: start time - end time.
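The input and output described above can be sketched as follows. This is a minimal illustration, not the code in "QA_automation": the `Clip` class, its field names, the prompt wording, and the helper names `build_prompt` and `format_qa` are all assumptions made for this example, and the call to the GPT model itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One annotated clip (hypothetical structure, field names assumed)."""
    description: str
    start_time: float
    end_time: float


def build_prompt(clip: Clip) -> str:
    """Assemble a prompt for a single clip; the wording is illustrative only."""
    return (
        "Identify the physical objects that appear in the text below. "
        "For each object, ask when that object appears in the video.\n"
        f"Text: {clip.description}\n"
        f"Start time: {clip.start_time}\n"
        f"End time: {clip.end_time}\n"
        "Answer format: Q: When does object X appear? A: start time - end time."
    )


def format_qa(obj: str, clip: Clip) -> dict:
    """Build one QA pair; the answer is the clip's own time interval."""
    return {
        "question": f"When does {obj} appear?",
        "answer": f"{clip.start_time} - {clip.end_time}",
    }
```

Because the answer is copied from the clip's start and end time rather than produced by the model, the model only has to name the objects in the description.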

Problem with current policy

  • The JSON file contains annotations and summaries. The summaries largely contain physical objects, but the annotations mostly contain movements that are not descriptive, e.g. "X moves his head around". These non-descriptive annotations make GPT ask questions that do not make sense. Should we ignore annotations to avoid noise in the data?

  • The current policy identifies each object and the timestamp at which it shows up in the text. For example, if a yellow key appears in two separate time intervals of one clip, the data will contain two different timestamps for that single object. However, we want to identify the time when the object last appears. Should we use another policy that takes the descriptions of all clips in the entire video and asks GPT "when does that object last appear?"

    • Pros for current design:

      • It is easy for the GPT model to identify objects from a single text description (compared to being given the whole context of a video).
      • The answer is more likely to be correct because the timestamp for that specific text description is given in the prompt (that is, the job of the GPT model is not to find a timestamp but to identify physical objects in the description).
    • Cons for current design:

      • The data will contain multiple timestamps for a single object if that object appears more than once (may not be a problem for our purpose?).
    • Pros for the other design:

      • The data will only have a timestamp for the last occurrence of an object.
    • Cons for the other design:

      • Hard to verify the correctness of the data.
      • Hard for the GPT model to output the correct answer given the entire video narration as context, since it must (1) identify the objects that appear and (2) find the last occurrence of each object.
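One way to picture the gap between the two designs: if per-clip records from the current design are available, the last appearance of each object could in principle be derived afterwards without asking the model for it. The sketch below is only an illustration of that idea, not something the project does. The function name `last_appearance` and the `(object, start, end)` record layout are assumptions, and it further assumes that the same object is named identically across clips, which may not hold in practice.

```python
def last_appearance(records):
    """Reduce per-clip records to the last interval in which each object appears.

    records: iterable of (object_name, start_time, end_time) tuples
             (hypothetical layout; the real QA data may be shaped differently).
    Returns a dict mapping object_name -> (start_time, end_time).
    """
    last = {}
    for obj, start, end in records:
        # Keep the interval with the latest end time seen so far for this object.
        if obj not in last or end > last[obj][1]:
            last[obj] = (start, end)
    return last


records = [
    ("yellow key", 3.0, 5.0),
    ("cup", 4.0, 6.0),
    ("yellow key", 20.0, 24.0),
]
print(last_appearance(records))
```

Under these assumptions the yellow key maps to its later interval only, which matches the {"key": "visual_object", "value": "timestamp_last_appear"} goal while keeping the per-clip prompt.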

Question: Does including the timestamps of every occurrence of an object affect the legitimacy of the data? If the data lets our model capture every occurrence of an object rather than only the last one, does that affect the model's ability to perform visual queries? We can train two models on the different datasets for comparison.

Contributors

jackgeng19, taixil
