
Comments (1)

DjuLee commented on August 16, 2024

The AAAI paper might help answer both your questions, but here are some hopefully clarifying points:

  1. There is no explicit tracking in the traditional sense, where you track an object via its bounding-box coordinates. There are no white-circle coordinates to track; the network simply does pixel-wise occupancy prediction (occupancy is rendered in white). The "tracking" works as follows: at training time the network learns to capture patterns of occupied cells over several frames (i.e. the dynamics of occupancy), and it is then able to predict pixel-wise occupancy into the future, or through occlusion, given those learned patterns. Again, there is no explicit understanding of objects, but the network captures patterns of pixels moving together, and although it performs pixel-wise prediction (rather than object prediction in the form of bounding boxes), it produces coherent occupancy grids because that is the easiest way for it to make sense of the visible input. (A toy sketch of this grid encoding follows the list.)

  2. We wish to predict pixel-wise occupancy, which is either 0 or 1. The output is binary, so a natural distribution for it is the Bernoulli distribution. If we had ground truth for the entire output occupancy, we would use the Binary Cross-Entropy (BCE) loss as provided by Torch, calculated for every output pixel against the ground-truth occupancy. However, we do not have ground truth for the entire output, because we only have partial observations of the scene: we do not know what happens inside natural occlusions, so we cannot compute a loss on occluded pixels. To avoid penalising the network for its predictions on occluded cells/pixels, we ignore those cells when calculating the BCE loss. We do this by masking the output prediction with the visibility grid, which assigns 1 to pixels that are visible (either occupied or free) and 0 to those that are occluded. In doing so, we only accumulate loss from the visible cells; this is why the file is named WeightedBCE. If you look at line 27, that is the BCE loss, where input is the network prediction and target is the visible part of the ground truth. The masking of the prediction with the visibility mask happens on lines 29 and 31, where the input (i.e. the network prediction) is element-wise multiplied by the visibility mask (weights). The small eps (epsilon) value ensures we never take the log of zero; it is there purely for numerical stability. A similar masking of the gradients occurs in the updateGradInput function. (A minimal sketch of this masked loss follows the list.)
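To make point 1 concrete, here is a toy sketch (in Python/NumPy) of the kind of grid encoding involved. The array names and the two-channel visible-occupied / visible-free layout are my illustration of the idea, not necessarily the repo's exact input format:

```python
import numpy as np

# Toy 5x5 scene: 1 = occupied, 0 = free.
occupancy = np.zeros((5, 5))
occupancy[2, 2] = 1.0

# Visibility grid: 1 where the sensor can see a cell, 0 where it is occluded
# (here, the cells behind the object are hidden from the sensor).
visibility = np.ones((5, 5))
visibility[2, 3:] = 0.0

# One input frame: which cells are visibly occupied and which are visibly
# free. Occluded cells are 0 in both channels -- "unknown".
frame = np.stack([visibility * occupancy,           # visible and occupied
                  visibility * (1.0 - occupancy)])  # visible and free

# Over a sequence of such frames the network picks up occupancy dynamics,
# and at test time it predicts a full occupancy grid, including cells that
# are currently occluded.
```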
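Similarly for point 2, a minimal NumPy sketch of the masked loss described above. The function names, the eps default, and the normalisation by the number of visible cells are my choices for illustration; the actual WeightedBCE in the repo is written in Torch/Lua:

```python
import numpy as np

def weighted_bce(pred, target, weights, eps=1e-12):
    """Masked binary cross-entropy.

    pred    -- network output in (0, 1), shape (H, W)
    target  -- ground-truth occupancy in {0, 1}, shape (H, W)
    weights -- visibility mask: 1 = visible cell, 0 = occluded cell
    eps     -- small constant so we never take log(0)
    """
    # Occluded cells have weight 0, so they contribute nothing to the loss.
    pixel_loss = -(target * np.log(pred + eps)
                   + (1.0 - target) * np.log(1.0 - pred + eps))
    return (weights * pixel_loss).sum() / max(weights.sum(), 1.0)

def weighted_bce_grad(pred, target, weights, eps=1e-12):
    """Gradient of the masked BCE w.r.t. pred: zero on occluded cells,
    mirroring the masking in updateGradInput."""
    grad = -target / (pred + eps) + (1.0 - target) / (1.0 - pred + eps)
    return weights * grad / max(weights.sum(), 1.0)
```

Because occluded cells carry zero weight, they contribute neither loss nor gradient: the network is free to predict anything there, and only its predictions on visible cells are supervised.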

from deeptracking.
