yuwenmichael / grounding-dino-batch-inference
Support batch inference of Grounding DINO. "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
I noticed some strange behavior: the model's output logits vary slightly between runs, so the results are non-deterministic. This does not happen with batch size = 1. Do you know what the reason could be? The difference between the logits (comparing two separate script executions) grows as the batch size increases.
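One possible contributor, offered as an assumption and not confirmed for this repository, is that floating-point addition is not associative. Parallel reductions that accumulate in a different order from run to run can then produce slightly different sums, and larger batches give more room for reordering. A minimal pure-Python illustration of the underlying effect:

```python
# Floating-point addition is not associative: grouping the same three
# terms differently changes the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

order_matters = left_to_right != right_to_left
difference = abs(left_to_right - right_to_left)
print(order_matters, difference)
```

If this is what is happening, comparing logits across runs with a small tolerance is more meaningful than exact equality. PyTorch's torch.use_deterministic_algorithms(True) is the usual switch to try, though whether it removes the variation in this case is untested here.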
Hi everyone, I discovered this strange behavior.
I have 4 images (img_1, img_2, img_3, img_4).
If I run GroundingDINO with the following prompts:
["distance . mountains . valley . view .", "man . snow board . trick", "lunch . pizza . they .", "kitchen . refrigerator . "]
then I obtain the following probabilities per class:
{0: {'distance': 0.19821932911872864, 'mountains': 0.7135314345359802, 'valley': 0.42435237765312195, 'view': 0.38242971897125244}, 1: {'man': 0.3701115548610687, 'snow board': 0.31612950563430786, 'trick': 0.21027937531471252}, 2: {'lunch': 0.38231441378593445, 'pizza': 0.7270074486732483, 'they': 0.19436779618263245}, 3: {'kitchen': 0.6813028454780579, 'refrigerator': 0.5736187100410461}}
but if I add the class "pineapple" to the third prompt:
["distance . mountains . valley . view .", "man . snow board . trick", "lunch . pizza . they . pineapple . ", "kitchen . refrigerator . "]
then the probabilities associated with other elements in the batch also change.
{0: {'distance': 0.22729776799678802, 'mountains': 0.7141298651695251, 'valley': 0.43764322996139526, 'view': 0.367383748292923}, 1: {'man': 0.3758210241794586, 'snow board': 0.3222990036010742, 'trick': 0.21733753383159637}, 2: {'lunch': 0.3865318298339844, 'pineapple': 0.040617868304252625, 'pizza': 0.6494675278663635, 'they': 0.21683959662914276}, 3: {'kitchen': 0.6852126717567444, 'refrigerator': 0.5792219042778015}}
It seems the samples in the batch are not processed independently...
Has anyone encountered the same problem, or does anyone have suggestions for fixing it?
Thanks in advance
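To quantify how much the untouched prompts move, the two result dictionaries above can be compared directly. This is a small sketch in plain Python; largest_shift is a hypothetical helper name, and the numbers are copied from the two runs reported in this post.

```python
# Probabilities from the first run (no "pineapple" in the third prompt).
before = {
    0: {'distance': 0.19821932911872864, 'mountains': 0.7135314345359802,
        'valley': 0.42435237765312195, 'view': 0.38242971897125244},
    1: {'man': 0.3701115548610687, 'snow board': 0.31612950563430786,
        'trick': 0.21027937531471252},
    2: {'lunch': 0.38231441378593445, 'pizza': 0.7270074486732483,
        'they': 0.19436779618263245},
    3: {'kitchen': 0.6813028454780579, 'refrigerator': 0.5736187100410461},
}
# Probabilities from the second run ("pineapple" added to the third prompt).
after = {
    0: {'distance': 0.22729776799678802, 'mountains': 0.7141298651695251,
        'valley': 0.43764322996139526, 'view': 0.367383748292923},
    1: {'man': 0.3758210241794586, 'snow board': 0.3222990036010742,
        'trick': 0.21733753383159637},
    2: {'lunch': 0.3865318298339844, 'pineapple': 0.040617868304252625,
        'pizza': 0.6494675278663635, 'they': 0.21683959662914276},
    3: {'kitchen': 0.6852126717567444, 'refrigerator': 0.5792219042778015},
}


def largest_shift(before, after):
    """Per image, return (class_name, abs_change) for the class whose
    probability changed most, using only classes present in both runs."""
    result = {}
    for idx, probs in before.items():
        shared = [name for name in probs if name in after[idx]]
        name = max(shared, key=lambda n: abs(after[idx][n] - probs[n]))
        result[idx] = (name, abs(after[idx][name] - probs[name]))
    return result


shifts = largest_shift(before, after)
for idx, (name, delta) in shifts.items():
    print(idx, name, round(delta, 4))
```

On these numbers the images whose prompts were not edited (0, 1 and 3) still shift, by up to roughly 0.03, which supports the observation that the batch elements influence each other.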
The predict_batch function seems to only process the first image in a batch when generating predictions. Specifically, the lines:
prediction_logits = outputs["pred_logits"].cpu().sigmoid()[0]
prediction_boxes = outputs["pred_boxes"].cpu()[0]
These lines appear to only handle the logits and boxes for the first image in the batch, ignoring the rest.
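A sketch of the per-image handling this report implies, assuming the outputs carry a leading batch dimension. Nested Python lists stand in for the tensors so the example runs without the model; split_batch_outputs and sigmoid are hypothetical helper names, not functions from this repository.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def split_batch_outputs(pred_logits, pred_boxes):
    """Return one (probabilities, boxes) pair per image.

    Assumed shapes: pred_logits is [batch][queries][tokens] and
    pred_boxes is [batch][queries][4]. Indexing with [0] would keep
    only the first pair; iterating keeps every image in the batch.
    """
    results = []
    for logits_i, boxes_i in zip(pred_logits, pred_boxes):
        probs_i = [[sigmoid(v) for v in query] for query in logits_i]
        results.append((probs_i, boxes_i))
    return results


# Two images, one query each, two text tokens per query.
example_logits = [[[0.0, 2.0]], [[-2.0, 0.0]]]
example_boxes = [[[0.1, 0.1, 0.2, 0.2]], [[0.5, 0.5, 0.3, 0.3]]]
per_image = split_batch_outputs(example_logits, example_boxes)
print(len(per_image))
```

With real tensors the equivalent change would be to drop the trailing [0] and loop over the first dimension, applying the existing thresholding to each image in turn.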
Hi, does this work with CUDA? I'm using:
and getting the following:
label-studio-ml-backend | /app/Grounding-DINO-Batch-Inference/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
label-studio-ml-backend | warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
and
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined
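The warning and the NameError together suggest the compiled extension was never imported. A quick way to confirm that inside the container is to try the import directly. This is a minimal sketch: can_import is a hypothetical helper, and the module name groundingdino._C is an assumption inferred from the traceback path above.

```python
import importlib


def can_import(module_name):
    """Try to import a module; return (ok, reason_if_failed)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, missing shared libraries, etc.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""


# Assumed module name for the compiled ops, based on the traceback.
ok, reason = can_import("groundingdino._C")
print("custom ops available:", ok, reason)
```

If the import fails, the extension was most likely not built when the package was installed. A commonly reported remedy is to reinstall GroundingDINO in an environment where the CUDA toolkit is visible at build time, but that has not been verified for this setup.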
I ran:
!python3 inference_gdino.py
and got this message:
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
loading image list file from: image_paths.txt
total images:13, need detect: 0, skip images: 13
detect: : : 0it [00:00, ?it/s]
image_paths.txt:
/workspace/data/dog.jpeg
/workspace/data/dog-2.jpeg
/workspace/data/dog-3.jpeg
/workspace/data/dog-4.jpeg
/workspace/data/dog-5.jpeg
/workspace/data/dog-6.jpeg
/workspace/data/dog-7.jpeg
/workspace/data/dog-8.jpeg
/workspace/data/dogs.jpg
/workspace/data/fox.jpg
/workspace/data/frog.jpg
/workspace/data/panda.jpg
/workspace/data/seal.jpg
I did not use multiple directories as in the example you shared.
Let me know if you need any further information from my side.
Best,
Andy
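The log reports that all 13 images were skipped, and the post does not show why. One thing worth ruling out is that the listed paths are not visible to the process running the script. Another possibility, purely an assumption, is that the script skips images it considers already processed. The first can be checked with a few lines; missing_paths is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def missing_paths(list_file):
    """Return the entries of list_file that are not existing files.

    Blank lines are ignored; each remaining line is treated as one path.
    """
    missing = []
    for line in Path(list_file).read_text().splitlines():
        entry = line.strip()
        if entry and not Path(entry).is_file():
            missing.append(entry)
    return missing


# Example usage, run from the directory that contains the list file:
# print(missing_paths("image_paths.txt"))
```

An empty result means every listed file exists, in which case the skip logic in the script itself is the next place to look.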