
bottom-up-attention.pytorch's Introduction

bottom-up-attention.pytorch

This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.

We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which can extract the same visual features as the original model (with deviation < 0.01).

Some example object and attribute predictions for salient image regions are illustrated below. The script to obtain the following visualizations can be found here.

example-image

Table of Contents

  1. Prerequisites
  2. Training
  3. Testing
  4. Feature Extraction
  5. Pre-trained models

Prerequisites

Requirements

Note that most of the requirements above are needed for Detectron2.

Installation

  1. Clone the project, including the required version (v0.2.1) of Detectron2. Note that using another version may cause compatibility problems.

    # clone the repository including Detectron2 (@be792b9)
    $ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
  2. Install Detectron2

    $ cd detectron2
    $ pip install -e .
    $ cd ..

    We recommend using Detectron2 v0.2.1 (@be792b9) as the backend for this project, which is the version cloned in step 1. We believe newer Detectron2 versions are also compatible with this project unless their interfaces have changed (we have tested v0.3 with PyTorch 1.5).

  3. Compile the remaining tools using the following script:

    # install apex
    $ git clone https://github.com/NVIDIA/apex.git
    $ cd apex
    $ python setup.py install
    $ cd ..
    # install the rest modules
    $ python setup.py build develop
    $ pip install ray
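
After installation, a quick import check can confirm that both Detectron2 and this project's compiled extensions are usable. The sketch below is optional and assumes it is run from the repository root; the models.bua._C import mirrors the one used by this repository's NMS layer, and a failure there usually means the `python setup.py build develop` step did not complete.

    import torch
    import detectron2

    # Basic environment information.
    print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
    print("detectron2:", detectron2.__version__)

    # The custom CUDA/C++ ops compiled by `python setup.py build develop`;
    # an ImportError for '_C' means the extension was not built.
    from models.bua import _C  # noqa: F401
    print("bottom-up-attention custom ops compiled successfully")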

Setup

If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

Download the original VG images (part1 and part2), unzip them into a single folder, and place that folder inside the datasets folder.

The annotation files generated by the original repository need to be converted to the COCO data format required by Detectron2. The preprocessed annotation files can be downloaded here and unzipped into the datasets folder.

Finally, the datasets folder will have the following structure:

|-- datasets
   |-- visual_genome
   |  |-- images
   |  |  |-- 1.jpg
   |  |  |-- 2.jpg
   |  |  |-- ...
   |  |-- annotations
   |  |  |-- visual_genome_train.json
   |  |  |-- visual_genome_test.json
   |  |  |-- visual_genome_val.json
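
As an optional sanity check before training, the annotation files can be loaded as ordinary COCO-style JSON. The sketch below assumes the standard COCO keys ('images', 'annotations', 'categories') and the file names shown above; adjust the paths if your layout differs.

    import json
    import os

    root = "datasets/visual_genome"
    for split in ("train", "val", "test"):
        path = os.path.join(root, "annotations", f"visual_genome_{split}.json")
        with open(path) as f:
            data = json.load(f)
        # COCO-style files keep image metadata and box annotations in separate lists.
        print(f"{split}: {len(data.get('images', []))} images, "
              f"{len(data.get('annotations', []))} annotations, "
              f"{len(data.get('categories', []))} categories")
        # Spot-check that the first referenced image is actually on disk.
        if data.get("images"):
            img_path = os.path.join(root, "images", data["images"][0]["file_name"])
            print("  first image found:", os.path.exists(img_path))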

Training

The following script will train a bottom-up-attention model on the train split of VG.

$ python3 train_net.py --mode d2 \
         --config-file configs/d2/train-d2-r101.yaml \
         --resume
  1. mode = 'd2' refers to training a model with the Detectron2 backend, which is inspired by grid-feats-vqa. We think it is unnecessary to train a new model in the caffe mode; the pre-trained Caffe models are provided for testing and feature extraction.

  2. config-file specifies the configuration file of the model.

  3. resume is a flag for resuming training from a specific checkpoint.

Testing

Given the trained model, the following script will test the performance on the val split of VG:

$ python3 train_net.py --mode caffe \
         --config-file configs/caffe/test-caffe-r101.yaml \
         --eval-only
  1. mode = {'caffe', 'd2'} refers to the mode to use. For the model converted from Caffe, use the caffe mode; for other models trained with Detectron2, use the d2 mode.

  2. config-file specifies the configuration file of the model, which also includes the path to the model weights.

  3. eval-only is a flag that switches the script to the testing phase.

Feature Extraction

Given the trained model, the following script will extract the bottom-up-attention visual features. Both single-GPU and multi-GPU extraction are supported.

$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode roi_feats \
         --min-max-boxes '10,100' \
         --config-file configs/caffe/test-caffe-r101.yaml \
         --image-dir <image_dir> --bbox-dir <out_dir> --out-dir <out_dir> \
         --fastmode
  1. mode = {'caffe', 'd2'} refers to the mode to use. For the model converted from Caffe, use the caffe mode; for other models trained with Detectron2, use the d2 mode. 'caffe' is the default value. Note that the d2 mode needs to run with Ray.

  2. num-cpus refers to the number of CPU cores used to accelerate the CPU computation. 0 means using all available CPUs; 1 is the default value.

  3. gpus refers to the IDs of the GPUs to use. '0' is the default value. If more than one GPU is given, e.g. '0,1,2,3', the script uses the Ray library for parallelization.

  4. config-file specifies the configuration file of the model, which also includes the path to the model weights.

  5. extract-mode refers to the feature extraction mode, one of {roi_feats, bboxes, bbox_feats}.

  6. min-max-boxes refers to the minimum and maximum number of features (boxes) to extract. Note that the d2 mode only supports '100,100' to get exactly 100 boxes per image; other values yield roughly 50-60 boxes per image.

  7. image-dir refers to the input image directory.

  8. bbox-dir refers to the directory of pre-extracted bboxes. It is only used when extract-mode is set to 'bbox_feats'.

  9. out-dir refers to the output feature directory.

  10. fastmode enables a faster version (about 2x faster on a workstation with 4 Titan-V GPUs and 32 CPU cores), at the risk of a memory leak if the computing capabilities of the GPUs and CPUs are mismatched. More details and some matched examples can be found here.
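
The extracted features are written as one compressed .npz archive per image (the issue reports below also refer to the generated npz files). The exact array names are not documented here, so the sketch below first lists the keys stored in each archive and then prints the shape of every array it finds; the hints in the comments (e.g. a num_boxes x 2048 feature matrix for the R101 backbone) are assumptions to verify, not a guaranteed schema.

    import glob
    import numpy as np

    out_dir = "features_out"  # hypothetical directory passed as --out-dir above

    for path in sorted(glob.glob(f"{out_dir}/*.npz"))[:3]:
        data = np.load(path, allow_pickle=True)
        print(path, "keys:", data.files)  # inspect what was actually saved
        for key in data.files:
            arr = np.asarray(data[key])
            print(f"  {key}: shape={arr.shape}, dtype={arr.dtype}")
        # Typically one array holds the per-region features (num_boxes x 2048 for
        # the R101 backbone) and another the corresponding boxes (num_boxes x 4).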

Using the same pre-trained model, we also provide an alternative two-stage strategy for extracting visual features. This results in (slightly) more accurate bounding boxes and visual features, at the expense of more time overhead:

# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode bboxes \
         --config-file configs/caffe/test-caffe-r101.yaml \
         --image-dir <image_dir> --out-dir <out_dir>  --resume 

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode bbox_feats \
         --config-file configs/caffe/test-caffe-r101.yaml \
         --image-dir <image_dir> --bbox-dir <bbox_dir> --out-dir <out_dir>  --resume 
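
Before running the second stage, it can be worth checking that every input image got a bbox file in the first pass; otherwise the bbox_feats stage has nothing to look up for that image. A minimal sketch, assuming the first stage writes one .npz file per image named after the image stem (verify the naming convention against your own output):

    import os

    image_dir = "images"  # hypothetical --image-dir used in stage 1
    bbox_dir = "bboxes"   # hypothetical --out-dir of stage 1, i.e. --bbox-dir of stage 2

    missing = []
    for name in os.listdir(image_dir):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in (".jpg", ".jpeg", ".png"):
            continue  # skip non-image files
        if not os.path.exists(os.path.join(bbox_dir, stem + ".npz")):
            missing.append(name)

    print(f"{len(missing)} images have no extracted bboxes")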

Pre-trained models

We provide the following pre-trained models, including models trained in both the caffe and d2 modes.

For the caffe-mode models, R101-k36 and R101-k10-100 refer to the fix36 model and the dynamic 10-100 model provided in the original bottom-up-attention repository. We additionally provide an R152 model which outperforms the two counterparts above.

For the d2-mode models, we follow the configurations and implementations in grid-feats-vqa and trained three models using the training script in this repository, namely R50, R101, and X152.

name          mode    objects mAP@0.5    weighted objects mAP@0.5    download
R101-k36      caffe   9.3                14.0                        model
R101-k10-100  caffe   10.2               15.1                        model
R152          caffe   11.1               15.7                        model
R50           d2      8.2                14.9                        model
R101          d2      9.2                15.9                        model
X152          d2      10.7               17.7                        model

License

This project is released under the Apache 2.0 license.

Contact

This repository is currently maintained by Zhou Yu (@yuzcccc), Tongan Luo (@Zoroaster97), and Jing Li (@J1mL3e_).

Citation

If this repository is helpful for your research or you want to refer to the provided pre-trained models, please cite the work using the following BibTeX entry:

@misc{yu2020buapt,
  author = {Yu, Zhou and Li, Jing and Luo, Tongan and Yu, Jun},
  title = {A PyTorch Implementation of Bottom-Up-Attention},
  howpublished = {\url{https://github.com/MILVLG/bottom-up-attention.pytorch}},
  year = {2020}
}

bottom-up-attention.pytorch's People

Contributors

1219521375, hyperdenton, jhliu17, jimlee4530, mil-vlg, nbgao, zoroaster97


bottom-up-attention.pytorch's Issues

RuntimeError when install Detectron2 (recommended version in this repo)

I followed the instructions to install.
However, I got errors when installing detectron2.
Part of the error traceback follows:

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "$root/bottom-up-attention.pytorch/detectron2/setup.py", line 138, in <module>
        cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
      File "$venv/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "$venv/lib/python3.6/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "$venv/lib/python3.6/site-packages/setuptools/command/develop.py", line 136, in install_for_development
        self.run_command('build_ext')
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "$venv/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "$venv/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
        self.build_extensions()
      File "$venv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 649, in build_extensions
        build_ext.build_extensions(self)
      File "$venv/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
        self.build_extension(ext)
      File "$venv/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
        _build_ext.build_extension(self, ext)
      File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
        depends=ext.depends)
      File "$venv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 478, in unix_wrap_ninja_compile
        with_cuda=with_cuda)
      File "$venv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1233, in _write_ninja_file_and_compile_objects
        error_prefix='Error compiling objects for extension')
      File "$venv/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1529, in _run_ninja_build
        raise RuntimeError(message)
    RuntimeError: Error compiling objects for extension
    ----------------------------------------
ERROR: Command errored out with exit status 1: $venv/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'$root/bottom-up-attention.pytorch/detectron2/setup.py'"'"'; __file__='"'"'$root/bottom-up-attention.pytorch/detectron2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

Then I installed the latest detectron2 and it worked!

But when I run extract_features.py, I still got errors:

Traceback (most recent call last):
  File "$root/bottom-up-attention.pytorch/extract_features.py", line 133, in <module>
    main()
  File "$root/bottom-up-attention.pytorch/extract_features.py", line 81, in main
    model = DefaultTrainer.build_model(cfg)
  File "$root/bottom-up-attention.pytorch/detectron2/detectron2/engine/defaults.py", line 419, in build_model
    model = build_model(cfg)
  File "$root/bottom-up-attention.pytorch/detectron2/detectron2/modeling/meta_arch/build.py", line 21, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "$root/bottom-up-attention.pytorch/models/bua/rcnn.py", line 40, in __init__
    self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape())
  File "$root/bottom-up-attention.pytorch/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 43, in build_roi_heads
    return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape)
  File "$root/bottom-up-attention.pytorch/models/bua/roi_heads.py", line 51, in __init__
    super().__init__(cfg, input_shape)
  File "$root/bottom-up-attention.pytorch/detectron2/detectron2/config/config.py", line 149, in wrapped
    explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
  File "$root/bottom-up-attention.pytorch/detectron2/detectron2/config/config.py", line 182, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
TypeError: from_config() takes 2 positional arguments but 3 were given

It seems to be caused by a detectron2 version mismatch, so I just replaced the newer detectron2 files with the older ones and everything works fine now 😂

simultaneous box and feature extraction

Hello,

in the feature extraction section you note that there is a two-stage procedure: first, bounding boxes are extracted; then, their visual features are extracted.

Would it be possible to run the extraction script only once, so that features for identified bounding boxes are extracted in a single run, given that I do not have ground-truth bounding boxes pre-extracted? Or is it necessary to provide the --gt-bbox-dir argument when extracting features?

Download pre-trained model

I wanted to make a Docker container that uses this model, and would need to download the model file from SharePoint. However, it looks like neither curl nor wget can download the pre-trained model files. Is there a way to download those files from the command line rather than by hand?

hi

Why do you use RoIPooling rather than RoIAlign?

hope to get your reply.

Thank you

Attention visualization

Thanks for sharing the code. Can you please explain how to visualize the attention/extracted features on top of the image? It would be helpful if you could share any notebooks that you have.

extract-bua-caffe-r101.yaml

When running extract_features.py, I get: "Config 'configs/bua-caffe/extract-bua-caffe-r101.yaml' has no VERSION. Assuming it to be compatible with latest v2."
What can I do?

Error about feature extraction

Traceback (most recent call last):
  File "extract_features.py", line 24, in <module>
    from utils.utils import mkdir, save_features
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/utils/__init__.py", line 1, in <module>
    from .utils import save_features
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/utils/utils.py", line 7, in <module>
    from models.bua.layers.nms import nms
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/models/__init__.py", line 1, in <module>
    from .bua import add_bottom_up_attention_config
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/models/bua/__init__.py", line 4, in <module>
    from .roi_heads import BUACaffeRes5ROIHeads
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/models/bua/roi_heads.py", line 17, in <module>
    from .fast_rcnn import BUACaffeFastRCNNOutputs, BUACaffeFastRCNNOutputLayers, BUADetection2FastRCNNOutputs, BUADetectron2FastRCNNOutputLayers
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/models/bua/fast_rcnn.py", line 15, in <module>
    from .layers.nms import batched_nms
  File "/mnt/disk1/ly/workspace/bottom-up-attention.pytorch/models/bua/layers/nms.py", line 3, in <module>
    from models.bua import _C
ImportError: cannot import name '_C'

How are the attributes predicted?

Hi, thanks for sharing again.

I'm trying to reproduce the results. During the implementation I have a question about how the attributes of detected objects are predicted; it confuses me a little bit. Could you maybe elaborate further?

If I understand correctly, the attributes are predicted if the attribute flag is toggled on and BUADetectron2Res5ROIHeads (or perhaps BUACaffeRes5ROIHeads) is used. If that is correct, my questions are:

  • Attribute inference of detected classes (i.e., boxes)
    How are the attributes predicted or selected for the boxes and classes that are inferred with the NMS threshold in the inference function fast_rcnn_inference_single_image? In this function the candidate boxes are selected by the NMS operation as well as the class scores. There is no further code in the repository (either here or in the original Anderson repo) responsible for filtering the candidate attributes of the predicted classes. If I missed something here, please give me a clue.

  • Again, there is no code showing how the attributes are selected for the classes predicted in the previous step, and I'm wondering how the demo.ipynb case works. The indexes of predicted classes and attributes do not match: the former has e.g. 100 entries in the test case, while the latter has not been processed further for inference. I guess the demo code is out of date (or the repository as well), since the inference returns a list, not a tuple as the demo shows. I also tried to use filter_ids in the inference functions to select the attributes of the filtered boxes, but the class and attribute dimensions differ, so it does not work. Please give me some information about how the attributes of predicted objects are selected; in other words, after the NMS operation, how do I pick the attributes of the NMS-ed objects?

  • Parameter Optimization
    I tried to fine-tune the network parameters without using the pre-trained weights from the repository. The loss decreases but fluctuates during training and does not converge no matter how many iterations I run. What kind of learning-rate scheduler did you use to obtain the provided pre-trained weights? Did you use a pre-trained model or any other special training strategy for fine-tuning?

Thanks for any inputs in advance.
Jian

Location of the output feature layer

Hello,
After extracting features following your method and applying them to another method, the results were not good. After checking, I found that the other method uses the pool5_flat layer of the original Caffe network as the output feature layer, i.e.: layer { bottom: "pool5" top: "pool5_flat" name: "pool5_flat" type: "Flatten" flatten_param { axis: 1 } }. In your two-stage feature extraction method, the extracted features come from the res4 layer, and I am not sure whether the two are the same. If they are not, how should I modify OUT_FEATURES in the yaml file, and which other parameters might need to be changed?

Hope to get your help! Thank you!

Missing R_101_detectron2_with_attributes.pth file

Thanks for sharing your code. It's interesting, but I have some problems with it.

  1. I don't see any R_101_detectron2_with_attributes.pth in your repo. If possible, please share it.
  2. I downloaded bua-caffe-frcn-r101_with_attributes.pth and bua-caffe-frcn-r152_with_attributes.pth and tested object prediction with attributes. But the attribute confidence at inference is too low (I tested many images), so the result only contains the class name.

if attr_conf[i] > attr_thresh:
    cls = attributes[attr[i]+1] + " " + cls

I think the bua-caffe-frcn-r101_with_attributes.pth weights are not the best weights from your training. Could you please provide the best pre-trained weights?

Thanks a lot,

BInh.

Feature Extraction Error

When I run the feature extraction script:
File "extract_features.py", line 128, in
main()
File "extract_features.py", line 77, in main
model = DefaultTrainer.build_model(cfg)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/detectron2/detectron2/engine/defaults.py", line 418, in build_model
model = build_model(cfg)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/detectron2/detectron2/modeling/meta_arch/build.py", line 21, in build_model
model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/bottom-up-attention.pytorch/models/bua/rcnn.py", line 37, in init
self.roi_heads = build_roi_heads(cfg, self.backbone.output_shape())
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/detectron2/detectron2/modeling/roi_heads/roi_heads.py", line 42, in build_roi_heads
return ROI_HEADS_REGISTRY.get(name)(cfg, input_shape)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/bottom-up-attention.pytorch/models/bua/roi_heads.py", line 53, in init
super().init(cfg, input_shape)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/detectron2/detectron2/config/config.py", line 153, in wrapped
explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
File "/home/hadoop-mt-ocr/cephfs/data/weiran06/detectron2/detectron2/config/config.py", line 200, in _get_args_from_config
ret = from_config_func(*args, **kwargs)
TypeError: from_config() takes 2 positional arguments but 3 were given

train-bua-caffe-r101.yaml can't be found in configs/bua-caffe/

When I try to run

$ python3 train_net.py --mode detectron2 \
         --config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
         --resume

I found that there is no train-bua-caffe-r101.yaml at that path.
Where can I get it, or how can I create the file myself?

How to generate the pre-processed bboxes?

From the README.md, if I want to extract features in the bbox/bbox_feats mode, I need to provide the pre-extracted bbox files. My question is how to generate these boxes. Many thanks!

Is it this command, with the boxes saved in out_dir?

python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpu '0,1,2,3' \
         --extract-mode bboxes \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --out-dir <out_dir> --resume

ImportError: bottom-up-attention.pytorch/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

Hi, thanks for your work.
I met this error when trying to extract features from my own pictures:

Traceback (most recent call last):
  File "extract_features_faster.py", line 20, in <module>
    from detectron2.data import build_detection_test_loader, build_detection_train_loader
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/data/__init__.py", line 5, in <module>
    from .build import (
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/data/build.py", line 13, in <module>
    from detectron2.structures import BoxMode
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/structures/__init__.py", line 6, in <module>
    from .keypoints import Keypoints, heatmaps_to_keypoints
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/structures/keypoints.py", line 6, in <module>
    from detectron2.layers import interpolate
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/layers/__init__.py", line 4, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/layers/deform_conv.py", line 11, in <module>
    from detectron2 import _C
ImportError: /disks/sdb/home/jingyuan_wen/bottom-up-attention.pytorch/detectron2/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass

train-bua-caffe-r101.yaml failed to load

Thanks for sharing such compact and excellent source code based on Detectron. I tried to do the same thing, but have not finished yet.

After configuring the necessary libraries, I cleaned up all errors (mostly related to building Detectron). As far as I can tell, the configuration file for training the model is not there.

It would be much appreciated if this missing file could somehow be made available.

Thanks again for your sharing. Jian

About feature extraction

Hi, thanks for your code, I have a question: in this part of feature extraction,
https://github.com/MILVLG/bottom-up-attention.pytorch#feature-extraction

I want to use the two-stage strategy for extracting visual features, which results in (slightly) more accurate bboxes and visual features, but when I run this piece of code,

python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpu '0,1,2,3' \
         --extract-mode bboxes \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --out-dir <out_dir> --resume

I don't understand what --image-dir <image_dir> means.
If I use MSCOCO, does that mean I should include the Train2014 and Val2014 folders?
Looking forward to your reply.

How to plug the feature extraction model into other models for downstream tasks?

I tried:

cfg = setup(args)
model = DefaultTrainer.build_model(cfg)
DetectionCheckpointer(model).resume_or_load(
        cfg.MODEL.WEIGHTS, resume="")

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.obj_detector = model

    def forward(self, x):
        with torch.set_grad_enabled(True):  # the code seems to work if I set torch.set_grad_enabled(False) and model.eval()
            boxes, scores, features_pooled, attr_scores = self.obj_detector(x)

using the same config file and input parameters as specified in the Readme for feature extraction.

But I got:

Traceback (most recent call last):
  File "engine.py", line 19, in <module>
    pred_y = model(x)
  File "/shared/nas/data/users/yyy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/shared/nas/data/users/yyy/testing/model.py", line 115, in forward
    boxes, scores, features_pooled, attr_scores = self.obj_detector(x_im)
  File "/shared/nas/data/users/yyy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/shared/nas/data/users/yyy/bottom-up-attention.pytorch/models/bua/rcnn.py", line 92, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "/shared/nas/data/users/yyy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/shared/nas/data/users/yyy/bottom-up-attention.pytorch/models/bua/rpn.py", line 149, in forward
    losses = {k: v * self.loss_weight for k, v in outputs.losses().items()}
  File "/shared/nas/data/users/yyy/bottom-up-attention.pytorch/models/bua/rpn_outputs.py", line 243, in losses
    gt_objectness_logits, gt_anchor_deltas = self._get_ground_truth()
  File "/shared/nas/data/users/yyy/bottom-up-attention.pytorch/models/bua/rpn_outputs.py", line 188, in _get_ground_truth
    for image_size_i, anchors_i, gt_boxes_i in zip(self.image_sizes, anchors, self.gt_boxes):
TypeError: 'NoneType' object is not iterable

Can someone help me please? Thank you so much!
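
Based on the comment in the snippet above ("the code seems to work if I set torch.set_grad_enabled(False) and model.eval()"), a minimal sketch of an inference-only wrapper is shown below. The traceback indicates the detector was still in training mode, where the RPN expects ground-truth boxes; this sketch is an assumption about how to avoid that, not an official recommendation, and it does not cover back-propagating through the detector.

    import torch
    import torch.nn as nn

    class FrozenFeatureExtractor(nn.Module):
        """Use the bottom-up-attention detector as a fixed, inference-only feature extractor."""

        def __init__(self, detector):
            super().__init__()
            self.obj_detector = detector
            self.obj_detector.eval()  # eval mode so no ground-truth instances are required

        @torch.no_grad()  # gradients disabled: the detector is used as a frozen extractor
        def forward(self, batched_inputs):
            # In eval mode the detector returns boxes, scores, pooled features and attribute scores.
            return self.obj_detector(batched_inputs)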

I have a KeyError: 'attributes'

When I run: python train_net.py --config-file configs/bua-caffe/train-rcnn101.yaml --resume
I get this error:
File "./bottom-up-attention.pytorch/dataloader/detection_utils.py", line 75, in
attributes = [obj["attributes"] for obj in annos]
KeyError: 'attributes'

Please help me!
Thank you!

Working with batch_size>1

Thanks for sharing your code.

I just wanted to know whether your code supports multiple images in a single batch during inference. If not, is there any specific reason why it works with batch_size=1 only?

I tried extracting features using extract_features.py. The code works fine and uses around 3.2 GB of GPU RAM. I checked the configs/bua-caffe/extract-bua-caffe-r101.yaml file and saw that it sets IMS_PER_BATCH: 1. I changed it to IMS_PER_BATCH: 3, but the code still uses a single image per batch.

Thanks again.

Library versions

Dear all,

I had some trouble installing detectron2 for this code, and I've just figured out it was due to a version mismatch between the libraries the author used and my environment. I'm posting the PyTorch-related library versions that work for the code here in case you have the same problem:

For CUDA 10.1:
torch==1.6.0+cu101
torchvision==0.7.0+cu101

Thanks,

fatal error: cublas_v2.h: No such file or directory when installing Detectron2

Hello,

I keep receiving the following error when trying to install Detectron2. I am not sure what to do about it, since the official Detectron2 repo does not provide any information on this problem.

Does it have something to do with Caffe? Do I also need to install it? Maybe I can get some advice here...

    In file included from /home/xilini/detectron2/detectron2/layers/csrc/ROIAlignRotated/ROIAlignRotated_cuda.cu:3:0:
    /home/xilini/anaconda3/envs/vis-par/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7:10: fatal error: cublas_v2.h: No such file or directory
     #include <cublas_v2.h>
              ^~~~~~~~~~~~~
    compilation terminated.
    error: command '/usr/local/cuda-10.1/bin/nvcc' failed with exit status 1

Thanks.

VG Preprocessed Annotation Files

Thank you for your great work!
I noticed that you have 97,224 images in 'train.json' and 4,949 images in 'val.json'. According to Peter Anderson's work, I believe you keep about 5k images for testing.
If so, could you please provide 'test.json' to me? That would help me a lot!

Thanks again for your work and maintaining!

from_config() takes 2 positional arguments but 3 were given

I noticed similar issues reported before, and your answer to them was to make sure to use the correct version. However, I am using the correct version of detectron2 and its demo works properly:

git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .

Would you please guide me on how to fix this issue? Thank you.

~/img_cap/bottom-up-attention.pytorch/detectron2/detectron2/config/config.py in _get_args_from_config(from_config_func, *args, **kwargs)
180 if name not in supported_arg_names:
181 extra_kwargs[name] = kwargs.pop(name)
--> 182 ret = from_config_func(*args, **kwargs)
183 # forward the other arguments to init
184 ret.update(extra_kwargs)
TypeError: from_config() takes 2 positional arguments but 3 were given

TypeError: from_config() takes 2 positional arguments but 3 were given

Thanks for this project for bottom-up-attention. I run the code below:

python3 train_net.py --mode caffe --config-file configs/bua-caffe/test-bua-caffe-r101.yaml --eval-only

I can install detectron2 successfully, but the error still occurs. Could you tell me which argument is redundant, please?

the problem of feature extraction

Which directory should I put the pre-trained models in before running extract_features.py?

And when running without the pre-trained models, it outputs an ImportError: cannot import name '_C' (from models.bua import _C). I wonder whether it can be solved by using the pre-trained model, or whether it is another problem.

running error:
python3 extract_features.py --mode caffe --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml --image-dir /home/yhb/yianhang/bottom-up-attention.pytorch-master/new_dataset_release/new_dataset_release/images --gt-bbox-dir /home/yhb/yianhang/bottom-up-attention.pytorch-master/new_dataset_release/new_dataset_release/feature --out-dir /home/yhb/yianhang/bottom-up-attention.pytorch-master/new_dataset_release/new_dataset_release/featue --resume

Failed to load OpenCL runtime
Traceback (most recent call last):
  File "extract_features.py", line 24, in <module>
    from utils.utils import mkdir, save_features
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/utils/__init__.py", line 1, in <module>
    from .utils import save_features
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/utils/utils.py", line 7, in <module>
    from models.bua.layers.nms import nms
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/models/__init__.py", line 1, in <module>
    from .bua import add_bottom_up_attention_config
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/models/bua/__init__.py", line 4, in <module>
    from .roi_heads import BUACaffeRes5ROIHeads
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/models/bua/roi_heads.py", line 17, in <module>
    from .fast_rcnn import BUACaffeFastRCNNOutputs, BUACaffeFastRCNNOutputLayers, BUADetection2FastRCNNOutputs, BUADetectron2FastRCNNOutputLayers
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/models/bua/fast_rcnn.py", line 15, in <module>
    from .layers.nms import batched_nms
  File "/home/yhb/yianhang/bottom-up-attention.pytorch-master/models/bua/layers/nms.py", line 3, in <module>
    from models.bua import _C
ImportError: cannot import name '_C'

The following is my installation of apex; are there any problems with it?
python setup.py install:
Processing dependencies for apex==0.1
Finished processing dependencies for apex==0.1

python setup.py build develop:
torch.__version__ = 1.4.0

setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
running build
running build_py
running develop
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
running build_ext
Creating /home/yhb/anaconda3/envs/bottom-up-attention/lib/python3.6/site-packages/apex.egg-link (link to .)
Removing apex 0.1 from easy-install.pth file
Adding apex 0.1 to easy-install.pth file

Installed /home/yhb/apex
Processing dependencies for apex==0.1
Finished processing dependencies for apex==0.1

Missing train.yaml file

When I try to train the net, I get an error:
AssertionError: Config file 'configs/bua-caffe/train-bua-caffe-r101.yaml' does not exist!
How can I handle this issue? Thank you!

ROI feature extraction: features_pooled outside the --min-max-boxes '10,100' specification

Is this normal?

I've been getting features in the hundreds, e.g. torch.Size([223, 2048]) in features_pooled.
I just want the features corresponding to each bounding box.
Am I doing this correctly by running the default command in the readme?

python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode roi_feats \
         --min-max-boxes '10,100' \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \ 
         --image-dir <image_dir> --bbox-dir <out_dir> --out-dir <out_dir>

Can someone help me please, thanks!

the error when feature extraction

Hi, the code stops at this point:
[06/13 21:54:52 detectron2]: Full config saved to ./output/config.yaml
[06/13 21:54:52 d2.utils.env]: Using a generated random seed 52939937
Number of images: 10.
2021-06-13 21:54:56,515 INFO services.py:1274 -- View the Ray dashboard at http://127.0.0.1:8266
Number of GPUs: 3.
0%| | 0/10 [00:00<?, ?it/s]
(pid=8018) Number of images on split2: 3.
(pid=8014) Number of images on split0: 4.
(pid=8013) Number of images on split1: 3.

The output bbox seems inconsistent with the original caffe repository

Hi @MIL-VLG @Zoroaster97, thanks for your released repository.
The output bboxes seem inconsistent with the original Caffe repository.
For example, the output bbox of image COCO_val2014_000000349021.jpg (https://cocodataset.org/#explore?id=349021) in the original pre-extracted file is:

[[ 72.63866 323.50052 638.9267 477.20337 ]
[ 0. 0. 531.0018 160.37456 ]
[196.17493 202.92207 582.0358 392.5355 ]
[298.6358 5.273723 638.9267 147.76785 ]
[489.84888 322.74255 638.9267 466.3207 ]
[184.39835 30.255976 362.12308 214.42981 ]
[454.17618 277.0955 473.7012 312.25125 ]
[ 0. 0. 247.79265 249.6724 ]
[ 0. 381.52252 302.8443 477.20337 ]
[246.22423 258.31384 283.56186 318.43063 ]
[334.2851 338.47977 392.62964 392.6544 ]
[ 47.213596 182.49136 227.34457 248.61078 ]
[164.66795 0. 588.44946 261.22583 ]
[246.60812 93.04805 618.67426 248.01872 ]
[400.15796 336.73657 450.4243 380.4387 ]
[306.83762 328.26505 519.74115 393.41925 ]
[178.46791 20.020086 638.9267 154.33356 ]
[197.05843 52.80718 334.35858 147.37617 ]
[533.7555 229.08055 561.8308 268.14777 ]
[455.75137 327.14612 504.37048 371.77405 ]
[ 59.182163 282.64435 505.19406 421.88565 ]
[ 26.859505 388.26898 237.05745 461.26172 ]
[312.04608 0. 567.1143 198.15587 ]
[205.47354 25.19839 337.72504 236.4498 ]
[238.22566 407.087 471.11194 473.29214 ]]

while the output bbox in this repository is:

[[ 91.14841461, 255.71485901, 638.92669678, 477.20336914],
[ 71.61338043, 323.10354614, 638.92669678, 477.20336914],
[ 0. , 0. , 499.94268799, 170.06887817],
[ 202.8966217 , 205.51268005, 638.92669678, 446.46264648],
[ 0. , 16.99241638, 533.19714355, 417.59436035],
[ 298.3203125 , 2.68718195, 638.92669678, 149.85836792],
[ 190.87086487, 0. , 638.92669678, 238.85952759],
[ 454.37680054, 278.24752808, 473.9914856 , 312.40264893],
[ 0. , 339.8447876 , 376.48498535, 450.60623169],
[ 513.50311279, 349.7003479 , 631.64971924, 468.46160889],
[ 22.46875763, 228.85710144, 462.55206299, 368.88565063],
[ 162.73419189, 40.60558701, 343.64959717, 231.47537231],
[ 246.15264893, 258.07592773, 284.59838867, 318.48400879],
[ 0. , 0. , 216.33488464, 186.10873413],
[ 299.89395142, 72.79099274, 638.92669678, 251.35600281],
[ 202.20649719, 17.4803524 , 638.92669678, 141.70742798],
[ 45.16207123, 182.70654297, 229.69351196, 251.06326294],
[ 399.81082153, 337.11459351, 450.15582275, 380.50622559],
[ 308.40762329, 329.45181274, 522.77636719, 393.75430298],
[ 176.18048096, 57.12338257, 368.40402222, 249.69836426],
[ 535.86181641, 233.0916748 , 565.06640625, 285.69943237],
[ 101.26475525, 0. , 337.56744385, 227.04043579],
[ 0. , 312.76931763, 325.76217651, 437.45617676],
[ 460.67947388, 339.67019653, 638.92669678, 477.20336914],
[ 515.121521 , 160.04045105, 638.92669678, 345.72409058],
[ 318.59381104, 324.57019043, 407.98672485, 406.04110718],
[ 454.62234497, 327.0267334 , 506.03155518, 372.97406006],
[ 197.41650391, 52.59553146, 333.95153809, 145.55369568],
[ 314.03543091, 0. , 571.31677246, 201.28834534]]

Object and Attribute Vocabularies

Hi :-)

First of all: Thank you for your awesome work! It saves me massive amounts of time and is really useful for my research!

I have a question regarding the object and attribute vocabularies: where do they come from? On the VG dataset website, I can only find very different information (see VG Objects and VG Attributes). And VG is the dataset the models are trained on, right?
Also, on peteanderson80/bottom-up-attention, I can't find anything that would answer my question...

Thanks in advance!

Config files for training missing

Hi, thanks for this project - it would be great to get rid of the awkward caffe dependency of the original project. :)

Would it be possible for you to share the training config file?
configs/bua-caffe/train-bua-caffe-r101.yaml

I realize it says you're working on reproducing the same results, but are the current results very far from the original results or somewhat close?

/ Annika

Question about new version of Detectron2

Thanks very much for the project! Great work!

May I ask if you have plans to integrate with the latest version of detectron2? It seems the current version does not compile correctly with CUDA 11 and the latest version of detectron2. It would be great if it were possible to integrate with the newer version of detectron2. Thanks for the help!

A problem met when extract the features

Number of images: 5.
2021-05-06 18:01:20,386 INFO services.py:1267 -- View the Ray dashboard at http://127.0.0.1:8265
Number of GPUs: 1.
0%| | 0/5 [00:00<?, ?it/s]
/home/wwy/anaconda3/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via pip install 'ray[default]'. Please update your install command.
(raylet) warnings.warn(
(pid=17184) Number of images on split0: 5.
(pid=17184) ResNet.make_stage(first_stride=) is deprecated! Use 'stride_per_block' or 'stride' instead.
(pid=17184) ResNet.make_stage(first_stride=) is deprecated! Use 'stride_per_block' or 'stride' instead.
(pid=17184) ResNet.make_stage(first_stride=) is deprecated! Use 'stride_per_block' or 'stride' instead.
(pid=17184)
2021-05-06 18:01:30,167 ERROR worker.py:1056 -- Possible unhandled error from worker: ray::extract_feat() (pid=17184, ip=114.212.115.213)
File "python/ray/_raylet.pyx", line 505, in ray._raylet.execute_task
File "/home/wwy/PycharmProjects/pythonProject/bottom-up-attention.pytorch/extract_features.py", line 89, in extract_feat
DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
File "/home/wwy/anaconda3/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 215, in resume_or_load
return self.load(path, checkpointables=[])
File "/home/wwy/anaconda3/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 141, in load
assert os.path.isfile(path), "Checkpoint {} not found!".format(path)
AssertionError: Checkpoint bua-caffe-frcn-r101_with_attributes.pth not found!

Feature extractions aren't being saved anywhere

Hey! Thanks for this gem of a repo!

I'm trying to extract bboxes and features using extract_features.py (I also tested extract_features_faster.py, but the behaviour is the same). The loop runs fine, but the generated npz files don't seem to be saved in the specified args directory, nor can I find them anywhere else. The command I'm running:

python3 extract_features.py \
--image-dir /data/lama/mscoco/images/val2014/ \
--bbox-dir data/bbox/ \
--out-dir data/roi/ \
--extract-mode bbox_feats

resnet152 two-stage features extraction

Can I merely change the "MODE" from "1" to "2" when "bbox-only" and "3" when "gt-bbox", and change the "PROPOSAL_GENERATOR" into "PrecomputedProposals" when "gt-bbox" to create a two-stage features extraction configuration in the file "extract-bua-caffe-r152.yaml"? What's more, you mentioned the two-stage procedure outperforms slightly in README.md, so does the resnet152 case?

Does it support CPU only?

Hi, thanks for releasing this awesome work. It looks like it requires at least one GPU for feature extraction. I'm wondering whether it can also support a pure CPU environment? Thanks!

A question about feature extraction

Hello, and thank you for the PyTorch version of the code.
I now have a question: if I want to extract the features of specified proposal boxes (the boxes are already given), I see that the two-stage strategy in your code is very similar to this, but I do not fully understand the second-stage feature extraction. Does it use the existing boxes to guide the generation of features for new boxes, or does it simply extract the features of the input boxes?
Hope to get your guidance.

ImportError: cannot import name '_C'

I run extract_features.py and get an error:
Traceback (most recent call last):
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/extract_features.py", line 24, in <module>
    from utils.utils import mkdir, save_features
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/utils/__init__.py", line 1, in <module>
    from .utils import save_features
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/utils/utils.py", line 7, in <module>
    from models.bua.layers.nms import nms
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/models/__init__.py", line 1, in <module>
    from .bua import add_bottom_up_attention_config
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/models/bua/__init__.py", line 4, in <module>
    from .roi_heads import BUACaffeRes5ROIHeads
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/models/bua/roi_heads.py", line 17, in <module>
    from .fast_rcnn import BUACaffeFastRCNNOutputs, BUACaffeFastRCNNOutputLayers, BUADetection2FastRCNNOutputs, BUADetectron2FastRCNNOutputLayers
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/models/bua/fast_rcnn.py", line 15, in <module>
    from .layers.nms import batched_nms
  File "/home/amax/renpengzhen/21Cross-Modal Retrieval/RefCode/bottom-up-attention.pytorch-master/models/bua/layers/nms.py", line 3, in <module>
    from models.bua import _C
ImportError: cannot import name '_C'
