Giter VIP home page Giter VIP logo

procamsolver's Introduction

#ProCamSolver 0.1

This is a C++ library for reconstructing a 3D scene from a number of 2D images of the scene. Using the library, the 3D coordinates of points in the scene can be calculated as well as the position/orientation/intrinsics of each camera. It is assumed that a set of pixel correspondences has already been calculated (that is, we have pairs of pixels - one from each of two different images - that correspond to the same 3D point).

If there are enough 2D images of the same 3D scene then a set of pixel correspondences define a 3D reconstruction of the scene (the 'shape') and the position, orientation and intrinsics of each camera (the 'motion') up to a projective transofrmation. This library provides various methods of finding these and a convenient platform for developing new methods. It also provides methods of finding the projective transofrmation that leads to a Euclidean (zero skew) set of cameras.

API documentation can be found here.

The code is pretty much working at the moment but isn't finished, I'll be tidying it up, tesing it, adding more documentation and more functionality over the next few weeks

Introduction

Quick mathematical background

For the mathematically minded, the problem can be expressed most succinctly in terms of three matrices: The 'shape matrix', the 'motion matrix' and the 'measurement matrix'. The shape matrix represents the 3D coordinates of the points in the scene in a 4xN matrix whose columns are of the form (x_n,y_n,z_n,1)^T, where N is the number of 3D points. The motion matrix consists of the camera projection matrices of each view placed one on top of the other in a 3Mx4 matrix, where M is the number of views. The measurement matrix is a representation of the pixel correspondences in a 3MxN matrix where each column represents to the correspondences of one 3D point, with the 2D pixel coordinates of each view placed one on top of the other. The pixel coordinates are of the form s(x,y,1)^T where s is a scalar called the 'projective depth'.

In this form, these matrices have the simple relationship:

E = P * S

where

E = measurement matrix

P = motion matrix

S = shape matrix

The difficulty is that we don't know P, S or the projective depths of E. However, because P and S have only 4 columns/rows respectively, we know that E must be a rank 4 matrix, and this is often enough to give a unique solution.up to a projective transformation.

Overview of methodology

  1. Use the Measurement Matrix to find minimum Spanning Tree of views and create Fundamental Matrices for branches of the tree. Use Martinec Pajda's SVD method to approximate the Motion Matrix.
  2. Use the Motion Matrix from step 1 to fill in occlusions in the Measurement Matrix. Perform standard SVD on the Measurement Matrix to get the second level approximation of the Motion Matrix. 2b. Euclidean Lift the Motion Matrix.
  3. Use Levenberg-Marquardt to minimise the Reprojection Error which gives the final Motion Matrix.
  4. Use KR Decomposition to recover Intrinsics and Extrinsics.

Usage

  1. Fill the Measurement Matrix (e.g. with the CorrespenceSet class, the first set of indices are for cameras, whilst remaining indices are projectors).
  2. Use MotionMatrix::svd_solve(const MeasurementMatrix &) to gain the first level approximation by minimising the Frobenius Norm. This step performs 4 things (as described here). This stage gives you a solution with an arbitrary projective transformation.
  3. Use MotionMatrix::diac_euclidean_lift(int N), where N is the number of views which are cameras (i.e. not projectors), this transforms our matrices in the Motion Matrix into Euclidean transforms. Optionally this can also be performed with MotionMatrix::diac_euclidean_lift(int N, ImageTransform<F> &) where we input F known intrinsics (i.e. for F cameras).
  4. Perform Bundle Adjustment using MotionMatrix::bundle_adjust(int f, const MeasurementMatrix &), this is a levmar fit to minimise Reprojection Error where f is the number of cameras. N.B. code also available to perform this stage with known intrinsics.
  5. (Optional) To create a 3D scan, create a ShapeMatrix and use ShapeMatrix::solve(MeasurementMatrix &, MotionMatrix &).
  6. Use MotionMatrix::KR_decompose(ImageTransform<V>& K, ImageTransform<V>& R) to get Intrinsics (K) and Extrinsics (R) where V is view count, and use MotionMatrix::T(i) to get the translation in projection space, use K.view(i).inverse() * R.view(i).inverse() * MotionMatrix::T(i) to recover the world translation of the view.

Further work

RANSAC implementation

Implement RANSAC into FundamentalMatrix class which will solve against subsets of the MeasurementMatrix, and then mask the MeasurementMatrix (by deletion of datapoints) for further calculation downstream.

RANSAC is best performed on step 1 (Fundamental Matrix) of here. Svoboda et al (A Convenient Multi-Camera Self-Calibration for Virtual Environment) also perform this on the other steps

Better sparse matrix methods

With large datasets, we should be selecting a good subset to work with which is an involved process. A strategy would need to be made to find the best way of summarising the data without losing important features.

Use the sparse matrix in the bundle adjustment

Currently we're filling occlusions before this stage by simulating the 3d points.

Standalone app

Since we template against number of views, we could create version of the classes for 3-10 views for example (3 views is the minimum requirement).

procamsolver's People

Contributors

elliotwoods avatar danftang avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.