
Comments (13)

Andy1621 commented on May 23, 2024

Good question!

  1. Fine-grained Pose: We manually group similar poses and randomly sample the candidates from each group, e.g., drop (5), pick up (6), sit down (8), stand up (9), hopping (26), jump up (27), squat down (80).
  2. Scene Transition: We generate the candidates from the scene annotation, then use ChatGPT to produce the distractors with a prompt like: Based on 'From the courtroom to the prison.', please create three similar sentences with different places.
  3. Unexpected Action: We use prompts similar to those in the original paper to generate the options, as follows:
"You are now an assistant for data augmentation. You have extensive experience in video understanding and have mastered this skill. I will provide you with a 'question' and 'answer' regarding a counter-intuitive video.\n" + \
"Your task is to help me understand the content of this paragraph and generate one English question-answer pair from it. The generated question should be closely related to the provided answer.\n" + \
"The format will be multiple choice, where each question has four options - one correct answer and three distractors.\n" + \
f'Question: "{question.strip()}"\n' + \
f'Answer: "{answer.strip()}"\n' + \
"To avoid cheating, the lengths of the correct answer and other distractors MUST be similar.\n" + \
"You need to ONLY return the generated QA in JSON like {'question': '', 'options': [], 'answer': ''}"

We check each option's length; if the length difference is large, we regenerate the QA (see the sketch after this list).

  4. Egocentric Navigation: We randomly sample the candidates from move forward, stop, turn left and move forward, and turn right and move forward.
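
To make steps 1, 3, and 4 concrete, here is a minimal sketch of the distractor sampling and the length check described above. It is an illustration under stated assumptions, not the released pipeline: query_chatgpt, MAX_LENGTH_RATIO, and max_retries are hypothetical names and thresholds (the thread only says the difference must not be "large").

import json
import random

MAX_LENGTH_RATIO = 2.0  # hypothetical threshold; the thread only says "large"

NAV_ACTIONS = [
    "move forward", "stop",
    "turn left and move forward", "turn right and move forward",
]

def sample_distractors(correct, pool, k=3):
    # Randomly pick k wrong candidates, e.g. for the Pose and Navigation tasks:
    # candidates = [correct] + sample_distractors(correct, NAV_ACTIONS)
    return random.sample([c for c in pool if c != correct], k)

def options_balanced(options, ratio=MAX_LENGTH_RATIO):
    # Reject option sets whose character lengths differ too much.
    lengths = [len(o) for o in options]
    return max(lengths) <= ratio * min(lengths)

def generate_unexpected_action_qa(question, answer, query_chatgpt, max_retries=5):
    # Query ChatGPT with the prompt above; regenerate while the
    # option lengths are unbalanced.
    prompt = build_unexpected_action_prompt(question, answer)
    for _ in range(max_retries):
        qa = json.loads(query_chatgpt(prompt))  # {'question', 'options', 'answer'}
        if options_balanced(qa["options"]):
            return qa
    raise RuntimeError("could not generate length-balanced options")

Note that json.loads will choke on the single-quoted pseudo-JSON the prompt asks for; in practice the parsing step may need ast.literal_eval or a more forgiving parser.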


cmh1027 commented on May 23, 2024

@Andy1621 I can't find the question annotations or the options you mentioned. I looked through the datasets, but the question annotations were not in the originals. Fine-grained Pose at least seems to have a limited set of questions (like "Which one of these descriptions correctly matches the actions in the video?"), but the others (MovieNet, VLN-CE) don't.
Also, there's no annotation for NTU RGB+D at all. Did you annotate the data by watching the videos yourself?


Andy1621 commented on May 23, 2024

For the questions, we generate them with ChatGPT~


Andy1621 commented on May 23, 2024

You can check our appendix for more details.


cmh1027 commented on May 23, 2024

@Andy1621 Could you provide the prompts for MovieNet and VLN-CE?


Andy1621 commented on May 23, 2024

I didn't save the specific prompt... I remember that I asked ChatGPT to generate some basic questions.


cmh1027 commented on May 23, 2024

For example, in scene_transition:

        "video": "Top006_08310.mp4",
        "question": "Which choice matches the scene changes in the video?",
        "candidates": [
            "From the kitchen to the dining room.",
            "From the staircase to the gangway.",
            "From the bedroom to the bathroom.",
            "From the classroom to the library."
        ],
        "answer": "From the staircase to the gangway."

I wonder where the answer "From the staircase to the gangway." came from.
There's no related data in the original MovieNet dataset.


cmh1027 commented on May 23, 2024

In addition, it seems that MovieNet currently does not provide the video data due to copyright issues. Where did you get videos like Top006_08310.mp4?


Andy1621 commented on May 23, 2024

Good question! That's a wrong citation made when preparing the paper. We actually use the videos from MoVQA; we will fix it later.


cmh1027 commented on May 23, 2024

I see. It seems the dataset hasn't been released yet. Is there any plan to release it? Or could you provide me with the dataset?


Andy1621 commented on May 23, 2024

Since I'm not an author of MoVQA, please email its authors for more details.


cmh1027 commented on May 23, 2024

In the Action Sequence task of the STAR dataset, the paper states that it directly adopts the QA pairs of the original dataset. However, the annotation is quite different from the original one. In the MVBench annotation, more than half of the questions start with "What happened after ~?", but I can't find corresponding questions starting with such a sentence in the STAR annotation. Can you clarify this discrepancy?


Andy1621 commented on May 23, 2024

Please check the Sequence data in STAR. We do not use the QA pairs containing "which", since those are about objects.
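
As a rough illustration of that filtering, here is a minimal sketch. It assumes the STAR annotations are loaded as a list of dicts with "question_id" and "question" fields, and that Sequence question IDs carry a "Sequence" prefix; the field names, the prefix, and the filename are assumptions drawn from the public STAR release, not from this thread.

import json

# Hypothetical filename; substitute the actual STAR annotation file.
with open("STAR_train.json") as f:
    star_qa = json.load(f)

# Keep Sequence-type questions and drop the object-centric ones
# that contain "which".
sequence_qa = [
    qa for qa in star_qa
    if qa["question_id"].startswith("Sequence")
    and "which" not in qa["question"].lower()
]
print(f"kept {len(sequence_qa)} of {len(star_qa)} QA pairs")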

