Comments (3)
- Actually, there are only two pre-training tasks: Masked Language Modeling with Visual Clues and Masked RoI Classification with Linguistic Clues. The pre-trained models provided in this repo have already been trained on these tasks, so you don't need to run pre-training yourself.
- We use Conceptual Captions for both tasks, and English Wikipedia & BookCorpus only for the Masked Language Modeling task.
- Could you provide a link to the exact line of code? I am not sure which one you mean.
Thanks!
from vl-bert.
Thanks for the reply.
- For (3), I meant this line of code. Is it only partially loading the weights?
I have some more questions.
- Could you also share the weights for the Fast RCNN module? It seems that the pre-trained model is meant for ResNetVLBert only.
- Do I need to train Fast RCNN on Visual Genome for my task?
- I want to port Fast RCNN to torchvision's Faster RCNN. What does obj_reps actually represent here? Are these the box predictions coming out of the RoI head? You also seem to use a 'cnn_regularization' loss, which is a cross-entropy loss over classes. What about an MSE loss for bounding-box regression? I think an MSE loss would be necessary for improving the region proposals.
- If I were to port to torchvision's faster_rcnn implementation (please see here), will the detections from self.transform.postprocess be 'obj_reps', and the cross-entropy loss be the 'cnn_regularization' loss?
from vl-bert.
@prajjwal1
3. The weights are loaded partially because the downstream task and the pre-training task may have different prediction heads, so only the common weights (including the Fast RCNN, VL-BERT, and maybe some common heads) are loaded.
4. Actually, the weights of Fast RCNN are included:
5/6/7. I think there is a misunderstanding about the Fast RCNN in our model: it is not Faster RCNN, so there is no RPN in it.
Actually, our workflow is:
- (Offline) Use the pre-trained Faster RCNN to extract bounding boxes and store them (refer to );
- Use only the backbone (ResNet) and RoI head of pre-trained Faster RCNN to initialize our Fast RCNN;
- (Online) Treat the precomputed boxes as region proposals, and use Fast RCNN to extract their features.
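The online step can be pictured as pooling one feature per stored box, with no RPN involved. The following is only a dependency-free toy sketch, not the code in this repo: it assumes a single-channel feature map given as a nested list, a backbone stride of 16, and plain average pooling standing in for a real RoI head. The function name `roi_average_pool` and all values are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def roi_average_pool(feature_map: Sequence[Sequence[float]],
                     boxes: Sequence[Box],
                     stride: int = 16) -> List[float]:
    """Average the feature-map cells covered by each precomputed box.

    Toy stand-in for an RoI head: one scalar per box instead of a
    learned feature vector. The stride of 16 is an assumption.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for x1, y1, x2, y2 in boxes:
        # Map image coordinates to feature-map cell indices, clamped to bounds.
        c0 = min(max(int(x1 // stride), 0), width - 1)
        r0 = min(max(int(y1 // stride), 0), height - 1)
        c1 = min(max(int(x2 // stride), c0), width - 1)
        r1 = min(max(int(y2 // stride), r0), height - 1)
        cells = [feature_map[r][c]
                 for r in range(r0, r1 + 1)
                 for c in range(c0, c1 + 1)]
        pooled.append(sum(cells) / len(cells))
    return pooled


# Boxes come from the offline step; they are used directly as proposals.
backbone_features = [[1.0, 2.0],
                     [3.0, 4.0]]           # 2x2 map for a 32x32 image
precomputed_boxes = [(0, 0, 15, 15), (0, 0, 31, 31)]
features = roi_average_pool(backbone_features, precomputed_boxes)
```

In a port to torchvision, an operator such as `torchvision.ops.roi_align` applied to the backbone output with the stored boxes would take the place of this toy pooling.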
P.S.
obj_reps refers to the visual feature of each RoI used in VL-BERT, and obj_reps_raw refers to the RoI features coming out of the RoI head. cnn_regularization is deprecated; we use it in neither pre-training nor fine-tuning.
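For point 3 above, partial loading can be sketched as keeping only the checkpoint entries whose name and shape both match the target model. This is a minimal illustration under assumptions, not the repo's loading code: `FakeTensor`, `select_common_weights`, and every parameter name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass
class FakeTensor:
    """Stand-in for a framework tensor; only the shape matters here."""
    shape: Tuple[int, ...]


def select_common_weights(checkpoint: Mapping[str, FakeTensor],
                          model_state: Mapping[str, FakeTensor]
                          ) -> Dict[str, FakeTensor]:
    """Keep checkpoint entries whose name and shape both match the model."""
    common = {}
    for name, weight in checkpoint.items():
        target = model_state.get(name)
        if target is not None and tuple(weight.shape) == tuple(target.shape):
            common[name] = weight
    return common


# Hypothetical parameter names, for illustration only.
pretrained = {
    "backbone.conv1.weight": FakeTensor((64, 3, 7, 7)),
    "vlbert.embeddings.weight": FakeTensor((30522, 768)),
    "pretrain_head.weight": FakeTensor((30522, 768)),
}
downstream = {
    "backbone.conv1.weight": FakeTensor((64, 3, 7, 7)),
    "vlbert.embeddings.weight": FakeTensor((30522, 768)),
    "task_head.weight": FakeTensor((2, 768)),
}

loaded = select_common_weights(pretrained, downstream)
skipped = sorted(set(pretrained) - set(loaded))  # pre-training-only heads
```

With PyTorch, the same filter could be applied to a loaded checkpoint and `model.state_dict()`, followed by `model.load_state_dict(loaded, strict=False)` so that the task-specific head keeps its fresh initialization.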
from vl-bert.