GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
License: MIT License
Hello, could you please provide a tutorial on evaluating the model, and explain which evaluation metrics are available?
I want to get output like the image in the repo, and I want to use the text results in my downstream work. How can I achieve this?
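For reference, a minimal sketch of collecting the text results from the predictor output. The field names below (pred_boxes, scores, pred_object_descriptions) are assumptions modeled on GRiT's demo output, not a verified API; a stub class stands in for detectron2's Instances so the sketch is self-contained:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MockInstances:
    """Stand-in for a detectron2-style Instances object holding GRiT outputs.

    Field names are assumptions based on the demo; adapt them to whatever
    your predictor actually returns.
    """
    pred_boxes: List[List[float]] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)
    pred_object_descriptions: List[str] = field(default_factory=list)

def extract_text_results(instances) -> List[dict]:
    # Zip the parallel per-region lists into plain dicts for downstream use.
    return [
        {"box": b, "score": s, "description": d}
        for b, s, d in zip(instances.pred_boxes,
                           instances.scores,
                           instances.pred_object_descriptions)
    ]

inst = MockInstances(
    pred_boxes=[[10.0, 20.0, 110.0, 220.0]],
    scores=[0.93],
    pred_object_descriptions=["a red car parked on the street"],
)
print(extract_text_results(inst))
```

The resulting list of dicts can be serialized to JSON or fed directly into whatever downstream step consumes the captions.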
Hello,
I am currently trying to reproduce GRiT's dense captioning results. I trained the model with the default settings and obtained a checkpoint, then ran inference on the VG test set and got the JSON results with:
python train_net.py --num-gpus-per-machine 8 --config-file configs/GRiT_B_DenseCap.yaml --output-dir-name ./output/grit_b_densecap --eval-only MODEL.WEIGHTS models/grit_b_densecap.pth
However, when installing the DenseCap environment, I got stuck installing Torch on my GPU machine, which has CUDA 12.0. I always hit this error:
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
linked by target "THC" in directory /root/torch/extra/cutorch/lib/THC
Could you tell me what platform you use to install DenseCap and perform evaluation?
Thanks a lot!
Hello,
I am trying to use the results produced by the provided dense captioning checkpoint to evaluate on VG. In densecap/eval_utils.lua, I replaced
logprobs (confidence/score), boxes, captions
in addResult(), as well as idx_to_token and vocab_size
in the model,
and got an mAP of 0.000609. I found that the number of 'ok=1' cases is very small, meaning few ground-truth boxes are matched to predictions, so it seems I did something wrong.
I combined GRiT's boxes, descriptions, and score predictions for each image and fed them into addResult() per image in DenseCap, but I got a relatively low mAP, and the IoU between ground-truth and predicted boxes was very small. Could you please tell me what I am doing wrong? Thank you!
Here is the process of replacement:
` while true do
  ------- single image -------
  counter = counter + 1
  -- Grab a batch of data and convert it to the right dtype (batch_size = 1)
  local loader_kwargs = {split=split, iterate=true}
  local img, gt_boxes, gt_labels, info, _ = loader:getBatch(loader_kwargs)
  info = info[1]
  -- Find the index of the corresponding predictions; image_id, box, score,
  -- and descriptions share the same index for a given image
  local index_
  for index, v in ipairs(my_results.image_id) do
    if tostring(v) == string.gsub(info.filename, '.jpg', '') then
      index_ = index
      print(index_)
    end
  end
  assert(string.gsub(info.filename, '.jpg', '') == tostring(my_results.image_id[index_]))
  -- Replace these with the predictions of the corresponding image from GRiT
  local boxes = torch.Tensor(my_results.box[index_])
  local logprobs = torch.Tensor(my_results.score[index_])
  local captions = my_results.descriptions[index_]
  local gt_captions = model.nets.language_model:decodeSequence(gt_labels[1]) -- seq: tensor of shape N x T, id_to_tokens, bs = 1
  evaluator:addResult(logprobs, boxes, captions, gt_boxes[1], gt_captions)`
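One common cause of near-zero IoU in this kind of setup is a box-format mismatch: GRiT's detectron2-style outputs are (x1, y1, x2, y2), while DenseCap-style code often works with (x, y, w, h) boxes. Whether that is the cause here is an assumption, but it is cheap to check with a quick conversion and IoU sanity check, sketched in plain Python:

```python
def xyxy_to_xywh(box):
    """Convert an (x1, y1, x2, y2) box to (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

# Sanity check: a prediction and ground truth that visibly overlap should
# have a healthy IoU when both are interpreted as xyxy. If one side is
# secretly xywh, overlaps collapse toward zero, matching the symptom above.
pred = [50.0, 50.0, 150.0, 150.0]   # xyxy
gt = [60.0, 60.0, 160.0, 160.0]     # xyxy
print(iou_xyxy(pred, gt))           # healthy overlap in the correct format
print(xyxy_to_xywh(pred))           # what xywh-based code would expect
```

Comparing IoUs computed on raw GRiT boxes versus converted boxes against a few ground-truth boxes should quickly reveal which format the evaluator is actually receiving.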
I got this error:
`
File "/public/Medical_image_segmentation/lixi/detectron2/detectron2/data/common.py", line 90, in __getitem__
data = self._map_func(self._dataset[cur_idx])
File "/public/Medical_image_segmentation/lixi/GRiT-4/grit/data/custom_dataset_mapper.py", line 53, in __call__
dataset_dict_out = self.prepare_data(dataset_dict)
File "/public/Medical_image_segmentation/lixi/GRiT-4/grit/data/custom_dataset_mapper.py", line 99, in prepare_data
object_descriptions = [an['object_description'] for an in dataset_dict["annotations"]]
File "/public/Medical_image_segmentation/lixi/GRiT-4/grit/data/custom_dataset_mapper.py", line 99, in <listcomp>
object_descriptions = [an['object_description'] for an in dataset_dict["annotations"]]
KeyError: 'object_description'
`
What is going wrong?
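The KeyError suggests the annotations being loaded lack the 'object_description' field that GRiT's mapper reads. A hedged sketch of a preprocessing pass that back-fills a placeholder description on any annotation missing the key; both the placeholder text and the decision to back-fill at all are assumptions, and the proper fix may instead be to regenerate the annotation JSON in GRiT's expected format:

```python
def backfill_object_descriptions(dataset_dicts, placeholder="object"):
    """Ensure every annotation carries an 'object_description' key.

    GRiT's custom_dataset_mapper reads an['object_description'] for each
    annotation; plain detection annotations (e.g. vanilla COCO) lack it.
    Returns the number of annotations patched.
    """
    patched = 0
    for record in dataset_dicts:
        for an in record.get("annotations", []):
            if "object_description" not in an:
                an["object_description"] = placeholder
                patched += 1
    return patched

data = [{"annotations": [{"bbox": [0, 0, 10, 10]},
                         {"bbox": [5, 5, 20, 20],
                          "object_description": "a dog"}]}]
print(backfill_object_descriptions(data))  # count of annotations patched
```

Running a pass like this over the loaded dataset dicts (or checking that the count is zero) quickly confirms whether missing keys are the culprit.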
The installation instructions say to clone the detectron2 repo and install from the clone:
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
git checkout cc87e7ec
pip install -e .
However, detectron2 is already vendored inside the GRiT repo as third_party/CenterNet2, and pip install -e .
should be run from there.
Perhaps the instructions were written before the library was included.
Maybe the install instructions should be something like this:
conda create --name grit python=3.8 -y
conda activate grit
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
cd ..
git clone https://github.com/JialianW/GRiT.git
cd GRiT
pip install -r requirements.txt
cd third_party/CenterNet2
git checkout cc87e7ec
pip install -e .
If PR #16 is merged, then installation is automated and this may become obsolete.
Hello, Jialian.
I am currently training the model on 4 RTX 3090 GPUs and find that the batch size is small.
I changed SOLVER.IMS_PER_BATCH from 64 to 128 in config/base.yaml, but memory consumption doesn't seem to grow.
Could you please tell me how to increase it?
Thanks a lot.
Hey,
Thanks for your inspiring work.
How long will the training take with 8 x A100 GPUs?
Hi.
I notice that you present results in Table 1 of the paper, but those results are based on the bounding boxes predicted by your foreground object extractor, which might lead to error propagation.
As a result, the real performance of the text decoder is unclear and possibly underestimated. So I'm curious about the text decoder's true performance: can you provide results based on the GT boxes?
Hello. Could you please tell me how to use demo.py to generate captions for boxes I supply? I don't need it to do object detection.
Hi, @JialianW! Thanks for your wonderful work!
I tried to run GRiT on 4 nodes (32 GPUs) with the following command:
python train_deepspeed.py --num-machines 4 --num-gpus-per-machine 8 --config-file configs/GRiT_B_ObjectDet.yaml --output-dir-name ./output/grit_b_objectdet
However, I notice that only one GPU is used on each node.
In the implementation, there is no mp.spawn
in the multi-node deepspeed launcher. Is this the reason, and is there any plan to fix it?
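The usual pattern in a multi-node launcher is to spawn one worker process per local GPU on each node (e.g. via torch.multiprocessing.spawn with nprocs equal to the local GPU count) and derive each worker's global rank from the machine rank. A minimal sketch of that rank arithmetic; the names here are illustrative, not GRiT's actual launcher code:

```python
def global_rank(machine_rank: int, num_gpus_per_machine: int, local_rank: int) -> int:
    """Rank of a worker in the world, given its node index and local GPU index."""
    return machine_rank * num_gpus_per_machine + local_rank

# 4 machines x 8 GPUs = world size 32. Each node must spawn 8 workers;
# if it spawns only one, only one GPU per node does any work, which
# matches the symptom described above.
world = [global_rank(m, 8, l) for m in range(4) for l in range(8)]
print(len(world), world[0], world[-1])
```

Checking how many worker processes each node actually launches (and the ranks they report) should confirm whether the launcher is the bottleneck.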
Despite the fact that I cloned the detectron2 repository, I get:
Traceback (most recent call last):
File "D:\Python\VisionGRIT\GRiT\demo.py", line 9, in
from detectron2.config import get_cfg
ModuleNotFoundError: No module named 'detectron2'
Hi, Jialian,
When I ran train_net.py with the default settings and reached 22,000 iterations, it reported a "CUDA out of memory" error. May I ask what the reason is? (Running on a single 48 GB GPU.)
How much data is required to fine-tune GRiT?
Hi,
Thanks for your amazing work. I tried to retrain the model on VG; however, there seems to be a corner case that raises an error:
[01/16 12:04:41 d2.utils.events]: eta: 1 day, 11:49:23 iter: 1360 total_loss: 2.975 loss_box_reg_stage0: 0.2477 loss_box_reg_stage1: 0.3255 loss_box_reg_stage2: 0.2068 loss_centernet_agn_neg: 0.0414 loss_centernet_agn_pos: 0.1851 loss_centernet_loc: 0.3947 loss_cls_stage0: 0.2062 loss_cls_stage1: 0.1867 loss_cls_stage2: 0.1439 loss_mask: 0.3913 text_decoder_loss: 0.6096 time: 0.7084 data_time: 0.0160 lr: 7.7501e-07 max_mem: 21398M
[01/16 12:04:42] grit.modeling.roi_heads.grit_roi_heads INFO: all proposals are background at stage 2
Traceback (most recent call last):
File "train_deepspeed.py", line 263, in <module>
launch_deepspeed(
File "/nvme/xxxxx/GRiT/lauch_deepspeed.py", line 67, in launch_deepspeed
mp.spawn(
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/nvme/xxxxx/GRiT/lauch_deepspeed.py", line 133, in _distributed_worker
main_func(*args)
File "/nvme/xxxxx/GRiT/train_deepspeed.py", line 251, in main
do_train(cfg, model, resume=args.resume, train_batch_size=train_batch_size)
File "/nvme/xxxxx/GRiT/train_deepspeed.py", line 175, in do_train
loss_dict = model(data)
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1656, in forward
loss = self.module(*inputs, **kwargs)
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/nvme/xxxxx/GRiT/grit/modeling/meta_arch/grit.py", line 59, in forward
proposals, roihead_textdecoder_losses = self.roi_heads(
File "/nvme/xxxxx/anaconda3/envs/grit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/nvme/xxxxx/GRiT/grit/modeling/roi_heads/grit_roi_heads.py", line 302, in forward
losses = self._forward_box(features, proposals, targets, task=targets_task)
File "/nvme/xxxxx/GRiT/grit/modeling/roi_heads/grit_roi_heads.py", line 173, in _forward_box
proposals = self.check_if_all_background(proposals, targets, k)
File "/nvme/xxxxx/GRiT/grit/modeling/roi_heads/grit_roi_heads.py", line 142, in check_if_all_background
proposals[0].proposal_boxes.tensor[0, :] = targets[0].gt_boxes.tensor[0, :]
IndexError: index 0 is out of bounds for dimension 0 with size 0
The error seems to indicate there are no proposals at all for this batch, and it can be easily reproduced with single-node training at around iteration 1360.
Would you mind checking it? I'm not familiar enough with this repo.
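From the traceback, check_if_all_background writes the first ground-truth box into proposal slot 0, which fails when the proposal tensor is empty. A hedged sketch of the kind of guard that would sidestep the crash, using plain Python lists as stand-ins for the tensors; this is a workaround sketch, not a verified patch to grit_roi_heads.py:

```python
def patch_all_background(proposal_boxes, gt_boxes):
    """If every proposal is background, inject the first GT box.

    Mirrors the apparent intent of check_if_all_background: guarantee at
    least one foreground proposal. Unlike unconditionally overwriting
    index 0, this handles the empty-proposal corner case that triggers
    the IndexError at iteration ~1360.
    """
    if not gt_boxes:
        return proposal_boxes           # nothing to inject
    if not proposal_boxes:
        return [list(gt_boxes[0])]      # empty: append instead of indexing
    proposal_boxes[0] = list(gt_boxes[0])  # non-empty: original behavior
    return proposal_boxes

print(patch_all_background([], [[0, 0, 10, 10]]))             # empty case survives
print(patch_all_background([[1, 1, 2, 2]], [[0, 0, 10, 10]]))  # normal case
```

The same shape-check-before-index idea translates directly to the tensor code: test the proposal tensor's first dimension before assigning to row 0, and concatenate the GT box when it is zero.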
Under third_party/CenterNet2 there is detectron2, and yet another project/CenterNet2.
The installation, however, requires installing detectron2 separately.
It is a little confusing what is needed and what is not in the git submodule.
Thank you for the nice work!
Is it possible to use a larger ViT backbone for dense captioning?
Is there a reason that only the ViT-B backbone is provided for dense captioning?
Thank you.
Hi,
I need to perform batch inference for my use case. I followed this thread that extends the DefaultPredictor
class to enable batched inputs, but I end up with this error:
grit/modeling/roi_heads/grit_roi_heads.py:230, in GRiTROIHeadsAndTextDecoder._forward_box(self, features, proposals, targets, task)
227 predictor, predictions, proposals = head_outputs[-1]
228 boxes = predictor.predict_boxes(
229 (predictions[0], predictions[1]), proposals)
--> 230 assert len(boxes) == 1
231 pred_instances, _ = self.fast_rcnn_inference_GRiT(
232 boxes,
233 scores,
(...)
239 self.soft_nms_enabled,
240 )
242 assert len(pred_instances) == 1, "Only support one image"
AssertionError:
Commenting out the assertion doesn't help either.
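Given the explicit "Only support one image" assertion in _forward_box, a pragmatic workaround is to keep the model single-image and batch at the caller: run the predictor once per image and collect the outputs. A sketch with a stub predictor standing in for the real model; the stub is purely illustrative, substitute your actual predictor:

```python
from typing import Any, Callable, List

def batched_inference(predictor: Callable[[Any], Any], images: List[Any]) -> List[Any]:
    """Emulate batch inference over a single-image-only predictor.

    GRiT's ROI heads assert len(boxes) == 1, so instead of feeding a
    batched input, loop over images, trading some throughput for
    correctness.
    """
    return [predictor(img) for img in images]

# Stub predictor for illustration only: "describes" an image by its size.
stub = lambda img: f"{img[0]}x{img[1]} image"
print(batched_inference(stub, [(640, 480), (800, 600)]))
```

This sacrifices the GPU-level batching the thread was after, but it works with the model as shipped; true batched support would require reworking the single-image assumptions inside the ROI heads.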
Sorry to bother you, but it seems the official website is down, and I need the official dense captioning annotations. I'm hoping you can share them; that would be very useful. Thanks a lot!