Giter VIP home page Giter VIP logo

Comments (9)

Hakuyume avatar Hakuyume commented on July 27, 2024

When number of bounding boxes per image is varying, use (R', 4) and (R',).

Let me confirm. (R', 4) is the coordinates of bounding boxes and (R') is the indices of corresponding image in input batch. Is it right?

I think this is a trade-off problem between simplicity of APIs and efficiency.
If we use list of (R', 4) format for batched images, the APIs will be more simple because (R', 4) and (R',) format is very different from other two. We can convert it to (R', 4) and (R',) format internally if needed. As you pointed out, this conversion will cause an overhead. Actually, I don't know how large the overhead will be. But I guess the overhead will not be so large and we can choose simplicity of APIs.

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

Let me confirm. (R', 4) is the coordinates of bounding boxes and (R') is the indices of corresponding image in input batch. Is it right?

Yes. Thanks for confirming.

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

As I pointed out in the initial post, the convention needs to be carried on when bbox comes with other data type.
This means that variable sized batch of images need to be represented by the same convention used for bbox.
Concatenating list of batched images may introduce non-negligible overheads.

from chainercv.

Hakuyume avatar Hakuyume commented on July 27, 2024

Concatenating list of batched images may introduce non-negligible overheads.

Perhaps, I am misunderstanding. Concatenating list of batched images is list of (B, C, H, W) -> (N, B, C, H, W)?

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

Concatenating list of batched images means [(R_1, C, H, W) ... , (R_B, C, H, W)] -> (R_1 + ... + R_B, C, H, W).

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

In terms of simplicity, I am not sure if list of bbox is better.
Since you need to convert batch array representation back into list representation whenever exposing values publicly, extra conversion code becomes necessary making internal code less obvious.

For example, I tried this convention in a building block of Faster RCNN, and it required me some extra conversion codes that do not look nice to me.
In the code, a function takes (B, C, H, W) and list of bboxes as inputs and returns list of (R, C).

https://github.com/yuyu2172/chainercv/blob/4e25beb116336c4c7c1462b752f38937cae1a2db/chainercv/links/model/faster_rcnn/faster_rcnn_vgg.py#L94

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

@Hakuyume

What do you think?

from chainercv.

Hakuyume avatar Hakuyume commented on July 27, 2024

Since you need to convert batch array representation back into list representation whenever exposing values publicly, extra conversion code becomes necessary making internal code less obvious.

What I was talking about is simplicity of APIs. I thought less representations are better. As you pointed out, internal conversion code makes the code a little complicated but I think it is acceptable.

from chainercv.

yuyu2172 avatar yuyu2172 commented on July 27, 2024

Here are rules we will be using.

  • bb is a (R,)
  • bbox is a (R, 4)
  • bboxes is a (B, R, 4) or list of (R_i, 4) i=1, ..., B.
  • rois is a (R', 4) which consists of bounding boxes for multiple images. Assuming that there are B images each containing R_i bounding boxes, R' = \sum R_i. rois comes together with a (R',) array called batch_indices, which contains batch indices of images to which bounding boxes correspond to.

from chainercv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.