Query regarding Pretrained model about vl-bert HOT 3 CLOSED

Comments (3)

jackroos commented on May 29, 2024

1. Actually, there are only two pre-training tasks: Masked Language Modeling with Visual Clues and Masked RoI Classification with Linguistic Clues. The pre-trained models provided in this repo have already been pre-trained on these tasks, so you don't need to perform pre-training yourself.
2. We use Conceptual Captions for both tasks, and English Wikipedia & BookCorpus only for the Masked Language Modeling task.
3. Could you provide a link to the exact line of code? I am not sure which one you mean.

Thanks!

prajjwal1 commented on May 29, 2024

Thanks for the reply.

For (3), I meant this line of code. Is it only partially loading the weights?

I have some more questions.

4. Could you also share the weights for the Fast RCNN module? It seems that the pretrained model is meant for ResNetVLBert only.
5. Do I need to train Fast RCNN on Visual Genome for my task?
6. I want to port Fast RCNN to torchvision's Faster RCNN. What does obj_reps actually represent here? Are these the box predictions coming out of the RoI head? Also, you seem to be using the 'cnn_regularization' loss, which is a cross-entropy loss over classes. What about an MSE loss for bounding-box regression? I think an MSE loss would be necessary for improving region proposals.
7. If I were to port to torchvision's faster_rcnn implementation (please see here), would the detections from self.transform.postprocess be 'obj_reps', and would the cross-entropy loss be the 'cnn_regularization' loss?

jackroos commented on May 29, 2024

@prajjwal1
3. The loading is partial because the downstream task and the pre-training task may have different prediction heads, so only the common weights (including Fast RCNN, VL-BERT, and possibly some common heads) are loaded.
4. Actually, the weights of Fast RCNN are included:

VL-BERT/pretrain/modules/resnet_vlbert_for_pretraining_multitask.py

Line 19 in 28cbde9

self.image_feature_extractor = FastRCNN(config,

5/6/7. I think there is some misunderstanding about the Fast RCNN in our model: it is not Faster RCNN, so there is no RPN in it.
Our actual workflow is:

(Offline) Use the pre-trained Faster RCNN to extract bounding boxes and store them (refer to

VL-BERT/data/conceptual-captions/ReadMe.txt

Line 59 in 28cbde9

 9. python ./tools/generate_tsv_v2.py --gpu 0,1,2,3,4,5,6,7 --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --split conceptual_captions_train --data_root {Conceptual_Captions_Root} --out {Conceptual_Captions_Root}/train_frcnn/ 

);

Use only the backbone (ResNet) and RoI head of the pre-trained Faster RCNN to initialize our Fast RCNN;
(Online) Treat the precomputed boxes as region proposals, and use Fast RCNN to extract their features.
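
To make the online step concrete, here is a toy sketch in plain Python, not the repository's code. It assumes the boxes were already produced offline and stored, and it replaces the real RoI head with simple average pooling over a tiny single-channel feature map. The names `roi_average_pool`, `precomputed_boxes`, and `feature_map` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in feature-map cells; x2 and y2 are exclusive.
Box = Tuple[int, int, int, int]


def roi_average_pool(feature_map: List[List[float]], boxes: List[Box]) -> List[float]:
    """Return one pooled value per box (toy stand-in for an RoI head)."""
    pooled = []
    for x1, y1, x2, y2 in boxes:
        cells = [feature_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        pooled.append(sum(cells) / len(cells))
    return pooled


# Offline (assumed already done): a separate detector produced and stored these boxes.
precomputed_boxes = [(0, 0, 2, 2), (1, 1, 3, 3)]

# Online: the backbone yields a feature map, and the stored boxes act as region
# proposals, so no region proposal network is needed at this stage.
feature_map = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
roi_features = roi_average_pool(feature_map, precomputed_boxes)
```

The point of the sketch is only the data flow: proposals come from storage rather than from an RPN, and the model's job is to turn each stored box into a feature.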

P.S.
obj_reps refers to the visual feature of each RoI used in VL-BERT, and obj_reps_raw refers to the RoI features coming out of the RoI head. cnn_regularization is deprecated; we don't use it in either pre-training or fine-tuning.
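
As an illustration of the partial loading described in point 3, the following is a minimal sketch in plain Python, not the repository's loader. It assumes a checkpoint can be viewed as a mapping from parameter name to shape; the function `select_common_weights` and all parameter names below are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def select_common_weights(
    checkpoint: Dict[str, Shape], model_state: Dict[str, Shape]
) -> Tuple[Dict[str, Shape], List[str]]:
    """Keep checkpoint entries whose name and shape both match the target model."""
    common: Dict[str, Shape] = {}
    skipped: List[str] = []
    for name, shape in checkpoint.items():
        if model_state.get(name) == shape:
            common[name] = shape
        else:
            skipped.append(name)
    return common, skipped


# Hypothetical pre-training checkpoint: shared modules plus a pre-training head.
pretrained = {
    "image_feature_extractor.backbone.weight": (64, 3, 7, 7),
    "vlbert.encoder.weight": (768, 768),
    "mlm_head.weight": (30522, 768),
}

# Hypothetical downstream model: same shared modules, different prediction head.
downstream = {
    "image_feature_extractor.backbone.weight": (64, 3, 7, 7),
    "vlbert.encoder.weight": (768, 768),
    "final_mlp.weight": (2, 768),
}

common, skipped = select_common_weights(pretrained, downstream)
```

Here the shared feature extractor and encoder entries are kept, while the pre-training head is skipped and the downstream head keeps its fresh initialization. In PyTorch, a filtered dictionary of real tensors could be passed to `load_state_dict` with `strict=False`.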
