Comments (13)
Good question!
- Fine-grained Pose: We manually group similar poses and randomly generate the candidates from them, e.g. drop(5), pick up(6), sit down(8), stand up(9), hopping(26), jump up(27), squat down(80).
- Scene Transition: We generate the candidates based on the scene annotation, then use ChatGPT to create distractors with a prompt like: Based on 'From the courtroom to the prison.', please create three similar sentences with different places.
- Unexpected Action: We use prompts similar to those in the original paper to generate the options, as follows:
"You are now an assistant for data augmentation. You have extensive experience in video understanding and have mastered this skill. I will provide you with a 'question' and 'answer' regarding a counter-intuitive video.\n" + \
"Your task is to help me understand the content of this paragraph and generate one English question-answer pair from it. The generated question should be closely related to the provided answer.\n" + \
"The format will be multiple choice, where each question has four options - one correct answer and three distractors.\n" + \
f'Question: "{question.strip()}"\n' + \
f'Answer: "{answer.strip()}"\n' + \
"To avoid cheating, the lengths of the correct answer and other distractors MUST be similar.\n" + \
"You need to ONLY return the generated QA in JSON like {'question': '', 'options': [], 'answer': ''}"
We then check the options' lengths; if the length difference is large, we regenerate them.
- Egocentric Navigation: We randomly generate the candidates from: move forward, stop, turn left and move forward, turn right and move forward.
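Putting the steps above together, the random candidate sampling and the length check could look like the following sketch. The function names, the grouping of pose labels, and the length-ratio threshold are my own assumptions for illustration, not the authors' actual code:

```python
import random

# Assumed grouping of similar NTU RGB+D pose labels (ids in parentheses as
# in the comment above); the real grouping was done manually by the authors.
POSE_GROUPS = [
    ["drop(5)", "pick up(6)"],
    ["sit down(8)", "stand up(9)"],
    ["hopping(26)", "jump up(27)", "squat down(80)"],
]

# Fixed action set for Egocentric Navigation, as listed above.
NAV_ACTIONS = [
    "move forward",
    "stop",
    "turn left and move forward",
    "turn right and move forward",
]

def sample_candidates(answer, pool, num_distractors=3):
    """Randomly sample distractors from a label pool, then shuffle them in
    with the correct answer to form the multiple-choice options."""
    distractors = random.sample([p for p in pool if p != answer], num_distractors)
    options = distractors + [answer]
    random.shuffle(options)
    return options

def lengths_similar(options, max_ratio=2.0):
    """Anti-cheating check: reject option sets whose string lengths differ
    too much, so the correct answer cannot be spotted by length alone."""
    lens = [len(o) for o in options]
    return max(lens) / max(min(lens), 1) <= max_ratio

# Egocentric Navigation: candidates come straight from the fixed action set.
options = sample_candidates("turn left and move forward", NAV_ACTIONS)
```

For ChatGPT-generated options (Unexpected Action), `lengths_similar` would gate the result, and generation would be retried until the check passes.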
from ask-anything.
@Andy1621 I can't find the question annotations or the options you mentioned. I navigated the datasets, but the question annotations were not in the original datasets. At least it seems that Fine-grained Pose has a limited set of questions (like "Which one of these descriptions correctly matches the actions in the video?"), but the others (MovieNet, VLN-CE) don't.
Also, there's no annotation for NTU RGB+D at all. Did you annotate the data by watching the videos yourself?
For the questions, we generate them with ChatGPT~
You can check our appendix for more details.
@Andy1621 Could you provide the prompts for MovieNet and VLN-CE?
I didn't save the specific prompts... I remember that I asked ChatGPT to generate some basic questions.
For example, in scene_transition:
{
    "video": "Top006_08310.mp4",
    "question": "Which choice matches the scene changes in the video?",
    "candidates": [
        "From the kitchen to the dining room.",
        "From the staircase to the gangway.",
        "From the bedroom to the bathroom.",
        "From the classroom to the library."
    ],
    "answer": "From the staircase to the gangway."
}
I wonder where the answer "From the staircase to the gangway." came from. There's no related data in the original MovieNet dataset.
In addition, it seems that MovieNet currently does not provide video data due to copyright issues. Where did you get videos like Top006_08310.mp4?
Good question! It's an incorrect citation introduced when preparing the paper. We actually use the videos in MoVQA; we will fix it later.
I see. It seems the dataset hasn't been released yet. Is there any plan to release it? Or could you provide me with the dataset?
Since I'm not the author, you can email the authors for more details.
In the action sequence task of the STAR dataset, the paper states that it directly adopts the QA of the original dataset. However, the annotation is quite different from the original one. In the MVBench annotation, more than half of the questions start with "What happened after ~?", but I can't find corresponding questions starting with such a sentence in the STAR dataset annotation. Can you clarify this discrepancy?
Please check the Sequence data in STAR. We do not use the QA pairs that are about objects.
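As a sketch of that filtering: the field names below ("question_id", "question") follow the public STAR annotation format, where question IDs are prefixed by task type, but the keyword-based rule for dropping object-centric questions is my own guess, not the authors' exact criterion:

```python
def select_sequence_qa(annotations):
    """Keep Sequence-task QA whose question is about actions, not objects.
    The 'object' keyword filter is an assumed heuristic for illustration."""
    kept = []
    for qa in annotations:
        if not qa["question_id"].startswith("Sequence"):
            continue  # only the Sequence task is used for action sequence
        if "object" in qa["question"].lower():
            continue  # skip object-centric questions
        kept.append(qa)
    return kept

# Hypothetical entries mimicking the STAR annotation layout.
sample = [
    {"question_id": "Sequence_T1_0",
     "question": "What happened after the person sat down?"},
    {"question_id": "Sequence_T2_1",
     "question": "Which object did the person take after standing up?"},
    {"question_id": "Interaction_T1_2",
     "question": "What did the person do with the cup?"},
]
print([qa["question_id"] for qa in select_sequence_qa(sample)])
# → ['Sequence_T1_0']
```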