
question about camera intrinsics about tsdf-fusion HOT 7 CLOSED


Comments (7)

andyzeng commented on May 18, 2024

Hello!

For the camera intrinsics, yes - the camera intrinsics differ between the data in the old and new versions because I used different cameras. The old version uses a sequence of depth frames I captured with a RealSense F200, while the new version uses a sequence of depth frames from the Microsoft 7-Scenes dataset, which was captured with a Kinect. Camera intrinsics are usually unique to each camera. However, the default Kinect intrinsics, principal point (320,240) and focal length (585,585) in pixels, are usually a good enough approximation, so you can start with those.

But if you care about achieving maximum accuracy for your 3D data (which is important in some applications), you can run this calibration procedure with OpenNI and a checkerboard to estimate more accurate intrinsics for your camera.

The camera pose (extrinsics) is a rigid transformation (consisting of a rotation matrix and translation vector) that describes the camera’s location in the world. Here is a pretty solid introduction to extrinsic camera matrices. They are usually estimated from SLAM, SfM, or other camera localization and reconstruction algorithms.
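To make the "rotation plus translation" idea concrete, here is a minimal sketch of applying a 4x4 rigid transform to a 3D point. It assumes a row-major matrix that maps camera coordinates to world coordinates (conventions vary between tools, so check which direction your pose files use). The names Mat4, P3, and TransformPoint are made up for this illustration and are not part of the repository.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // row-major 4x4 matrix (assumed layout)

struct P3 { float x, y, z; };

// Apply a rigid transform [R | t; 0 0 0 1] to a point: p' = R * p + t.
P3 TransformPoint(const Mat4& T, const P3& p) {
  // The bottom row of a rigid transform is expected to be [0 0 0 1].
  assert(T[12] == 0.0f && T[13] == 0.0f && T[14] == 0.0f && T[15] == 1.0f);
  return P3{T[0] * p.x + T[1] * p.y + T[2] * p.z + T[3],
            T[4] * p.x + T[5] * p.y + T[6] * p.z + T[7],
            T[8] * p.x + T[9] * p.y + T[10] * p.z + T[11]};
}
```

With the identity matrix, the point is returned unchanged, which corresponds to a camera sitting at the world origin with no rotation.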

When changing the number of frames from 50 to 1, the voxel volume is still generated. You're not seeing a point cloud because of the call to SaveVoxelGrid2SurfacePointCloud. By default, the last parameter weight_thresh is set to 1, which tells the function to only visualize voxels with weight values greater than 1 (a voxel's weight is the number of frames in which it was observed), so with a single frame no voxel passes the threshold. After changing num_frames = 1;, try replacing lines 188-190 of demo.cu with the following:

SaveVoxelGrid2SurfacePointCloud("tsdf.ply", voxel_grid_dim_x, voxel_grid_dim_y, voxel_grid_dim_z, 
                                  voxel_size, voxel_grid_origin_x, voxel_grid_origin_y, voxel_grid_origin_z,
                                  voxel_grid_TSDF, voxel_grid_weight, 0.2f, 0.0f);

And the code will produce a point cloud visualization of the voxel volume from just one depth frame.
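As a rough illustration of why the threshold matters, the sketch below counts the voxels that would survive such a filter. It is not the repository's function: CountSurfaceVoxels and tsdf_thresh are hypothetical names, and it assumes the second-to-last argument (0.2f above) is a bound on the absolute TSDF value and that the weight comparison is strictly "greater than".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Count voxels that would be kept as surface points: close to the zero
// crossing (|tsdf| < tsdf_thresh) and observed often enough
// (weight > weight_thresh, strict comparison assumed).
std::size_t CountSurfaceVoxels(const std::vector<float>& tsdf,
                               const std::vector<float>& weight,
                               float tsdf_thresh, float weight_thresh) {
  assert(tsdf.size() == weight.size());
  std::size_t kept = 0;
  for (std::size_t i = 0; i < tsdf.size(); ++i) {
    if (std::fabs(tsdf[i]) < tsdf_thresh && weight[i] > weight_thresh) ++kept;
  }
  return kept;
}
```

After a single frame every observed voxel has weight 1, so a threshold of 1 keeps nothing, while a threshold of 0 keeps every observed voxel near the surface.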

from tsdf-fusion.

Azpril45 commented on May 18, 2024

Dear @andyzeng

Thank you for your answer.

In my own project, the depth image is like this:

I looked through the information you pointed me to and ended up with the camera intrinsic and extrinsic matrices like this:
intrinsic matrix
[585   0 320
   0 585 240
   0   0   1]
extrinsic matrix
[1 0 0 0
 0 1 0 0
 0 0 1 0
 0 0 0 1]
I changed the call to SaveVoxelGrid2SurfacePointCloud as you said, and that worked. But when I tried to generate the 3D volume data using the image above as input, nothing was generated. I thought the reason might be the format of the input data, so I opened the depth images in MATLAB and found that the data you provided are uint16 but mine are uint8.
I am wondering whether the format of the depth image is what causes the problem, or whether there are more parameters (like voxel_grid_origin_xyz) that I have to change in order to generate my own 3D volume data.

Thank you
Sincerely yours
Tony


andyzeng commented on May 18, 2024

The format of the depth image is likely the problem. You will have to modify the function ReadDepth in utils.hpp to load in the correct depth values from your depth image format.
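A sketch of the kind of conversion involved is shown below. It is not the code in utils.hpp. It assumes the 16-bit images store depth in millimeters (common for Kinect-style data, but check your own files) and that an 8-bit image encodes depth linearly between a known near and far distance, which is something only you can confirm for your data. DepthU16ToMeters, DepthU8ToMeters, near_m, and far_m are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert 16-bit depth, assumed to be in millimeters, to meters.
// A raw value of 0 (no measurement) stays 0.
std::vector<float> DepthU16ToMeters(const std::vector<std::uint16_t>& raw) {
  std::vector<float> depth(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    depth[i] = static_cast<float>(raw[i]) / 1000.0f;
  }
  return depth;
}

// Convert 8-bit depth to meters, assuming raw values 1..255 map linearly
// onto [near_m, far_m] and 0 marks an invalid pixel.
std::vector<float> DepthU8ToMeters(const std::vector<std::uint8_t>& raw,
                                   float near_m, float far_m) {
  assert(near_m >= 0.0f && far_m > near_m);
  std::vector<float> depth(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == 0) {
      depth[i] = 0.0f;  // invalid pixel
      continue;
    }
    const float t = static_cast<float>(raw[i] - 1) / 254.0f;  // 0..1
    depth[i] = near_m + t * (far_m - near_m);
  }
  return depth;
}
```

Whatever mapping applies to your images, the goal is the same: the integration step needs metric depth per pixel, so the loader has to undo the encoding used when the image was saved.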


Azpril45 commented on May 18, 2024

Dear @andyzeng

Thank you for your kind answer.
It turned out the problem was caused by the format of the depth image. After some preprocessing of the depth image, the 3D voxel data was successfully generated. Thank you for all your help.
But some parameters still confuse me.
The first is the voxel grid origin. The comment in the code says that this is the location of the voxel grid origin in base frame camera coordinates.
float voxel_grid_origin_x = -1.5f; // Location of voxel grid origin in base frame camera coordinates
float voxel_grid_origin_y = -1.5f;
float voxel_grid_origin_z = 0.5f;
I tried changing these parameters (for example, setting all of them to 0.0f), and this generated totally different volume data. I'm wondering why you set the voxel grid origin like this.
The second is trunc_margin. I would like to know why you set this parameter to voxel_size * 5 and how it is used in the TSDF computation.
The last is voxel_grid_dim. As I understand it, this is the resolution of the TSDF volume. Does that mean I can generate TSDF volume data at different resolutions?
Suppose I have a depth image like this:

It is a cropped depth image (128x128) taken from the original depth image.
I want to generate a TSDF volume with a resolution of 32x32x32. How should I set these parameters?

Thank you
Sincerely yours
Tony


andyzeng commented on May 18, 2024

Re voxel grid origin: if you imagine your TSDF volume as a 3D box that is axis-aligned in 3D camera space, the voxel grid origin defines the location of the origin corner of the volume. By setting the voxel grid origin to (0,0,0) in camera coordinates, you're translating the 3D box so that the origin corner lies on the camera location - hence giving you different volumetric data. Moving the voxel grid origin moves your 3D box.

Re trunc_margin: a regular distance field would have values from 0 (close to surface) all the way to infinity (far from surface). trunc_margin defines when to cut the distance field (hence the term "truncated") so that you don't integrate distance values too far away from the surface. For more information on volumetric integration, I would recommend taking a look at this.
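A minimal sketch of one common truncation scheme follows, to show where trunc_margin enters. It is meant as an illustration under stated assumptions rather than a copy of the kernel in demo.cu: distances are measured along the camera z axis, normalized by trunc_margin, clamped to 1 in front of the surface, and skipped entirely far behind it. Voxel and IntegrateObservation are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Running TSDF state of a single voxel.
struct Voxel {
  float tsdf = 1.0f;    // 1 = free space far in front of any surface
  float weight = 0.0f;  // number of observations folded in so far
};

// Fold one depth observation into a voxel. Returns true if the voxel changed.
// surface_depth: measured depth at the pixel the voxel projects to (meters).
// voxel_depth:   camera-space z of the voxel center (meters).
bool IntegrateObservation(Voxel& v, float surface_depth, float voxel_depth,
                          float trunc_margin) {
  assert(trunc_margin > 0.0f);
  const float diff = surface_depth - voxel_depth;  // > 0 in front of surface
  if (diff <= -trunc_margin) return false;         // too far behind: skip
  const float dist = std::min(1.0f, diff / trunc_margin);  // truncate at 1
  const float new_weight = v.weight + 1.0f;
  v.tsdf = (v.tsdf * v.weight + dist) / new_weight;  // running average
  v.weight = new_weight;
  return true;
}
```

In this sketch a larger trunc_margin gives a thicker band of voxels around the surface that carry meaningful distance values, at the cost of blurring thin structures. A multiple of voxel_size keeps that band a fixed number of voxels wide regardless of resolution.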

Re TSDF volume resolution: voxel_grid_dim is the number of voxels that make up your volume. On the other hand, voxel_size determines the size of your voxels. You will need to change both voxel_grid_dim and voxel_size to generate TSDF volume data with different resolutions.

For your hand example, project the depth data of the hand into camera coordinates, and find a reasonable location to define the voxel grid origin. You can change voxel_grid_dim to be 32x32x32, and then change voxel_size so that the voxel volume encompasses the whole hand.
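The relationship between the two parameters is simple arithmetic: the physical side length of the volume is the number of voxels times the voxel size. The helpers below are a sketch with hypothetical names (GridExtent, VoxelSizeForExtent), not functions from the repository.

```cpp
#include <cassert>

// Physical side length (meters) covered by `dim` voxels of size `voxel_size`.
float GridExtent(int dim, float voxel_size) {
  assert(dim > 0 && voxel_size > 0.0f);
  return static_cast<float>(dim) * voxel_size;
}

// Voxel size (meters) needed for `dim` voxels to span `extent` meters.
float VoxelSizeForExtent(int dim, float extent) {
  assert(dim > 0 && extent > 0.0f);
  return extent / static_cast<float>(dim);
}
```

For example, if the hand fits in a cube of roughly 0.32 m per side (an assumed figure for illustration), a 32x32x32 grid would need a voxel_size of about 0.01 m.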


Azpril45 commented on May 18, 2024

Dear @andyzeng

Thank you for your kindness.
In my case, I first segmented the hand as foreground from a depth image (640x480) and cropped the hand region to 128x128. Then I calculated the center of mass (COM). I want to align this COM with the voxel grid origin and compute a TSDF with a resolution of 60x60x60 voxels. Each voxel represents a space of 5x5x5 mm, so the whole TSDF spans a space of 300x300x300 mm.
I changed the parameters like this:
int im_width = 128;
int im_height = 128;
float voxel_grid_origin_x = -1.5f;
float voxel_grid_origin_y = -1.5f;
float voxel_grid_origin_z = 0.5f;
float voxel_size = 0.005f;
float trunc_margin = voxel_size * 10;
int voxel_grid_dim_x = 60;
int voxel_grid_dim_y = 60;
int voxel_grid_dim_z = 60;
But it failed. However, if I keep all the other parameters the same and simply change the voxel grid dim to 500x500x500, the TSDF is generated successfully. I don't know why. Does the voxel grid origin cause the problem?
So now I am still confused about how to set the voxel grid origin.
I have already calculated the center of mass in image coordinates, which is (64,64). According to what you said, the voxel grid origin defines the location of the origin corner of the volume. Why did you set the voxel grid origin to (-1.5, -1.5, 0.5)? It seems like this voxel grid origin works well if the input depth image is 640x480.
According to what you told me last time, I should project the depth data of the hand into camera coordinates. I don't understand what this means. Could you explain it in more detail?

Thank you
Sincerely yours
Tony


andyzeng commented on May 18, 2024

Re camera coordinates: in order to create a 3D point cloud using a depth image (like this piece of code), you typically use the camera intrinsics in order to project the depth values into 3D coordinate space. This 3D coordinate space is called the camera coordinate space.
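For reference, the standard pinhole back-projection can be sketched as follows. Point3 and BackProject are hypothetical names for this illustration. fx and fy are the focal lengths and (cx, cy) is the principal point, all in pixels, and z is the metric depth at pixel (u, v).

```cpp
#include <cassert>

struct Point3 { float x, y, z; };

// Back-project pixel (u, v) with depth z (meters) into camera coordinates
// using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
Point3 BackProject(float u, float v, float z,
                   float fx, float fy, float cx, float cy) {
  assert(fx > 0.0f && fy > 0.0f);
  return Point3{(u - cx) * z / fx, (v - cy) * z / fy, z};
}
```

One thing to watch with a cropped image: pixel coordinates in the crop are offset from those in the original image, so either add the crop's top-left offset back to (u, v) or subtract it from (cx, cy) before back-projecting. Otherwise the points land in the wrong place in camera space.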

Now back to the voxel grid origin: keep in mind that the voxel grid origin is not the center of the voxel volume. If your TSDF voxel grid is 60x60x60 voxels (where voxel coordinates range from (0,0,0) to (59,59,59)), the voxel grid origin is the 3D location of voxel (0,0,0) in camera coordinate space. In other words, it is a "corner" of the voxel grid, not the "middle" of it.

For your problem: yes, you will need to set the right voxel grid origin. After using the camera intrinsics to project the depth data of the hand into camera coordinates, you should have a 3D camera coordinate location for each pixel. Your voxel grid will be a 3D bounding box around the 3D locations of the pixels that represent the hand. Take the minimum x, y, and z over all of those 3D points (the minimum corner of the bounding box), and that will be the location of your desired voxel grid origin.
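The bounding-box step could look roughly like the sketch below. Vec3, MinCorner, and OriginCenteredOn are hypothetical names. The second helper is an alternative I am adding for the case where you want a fixed-size grid centered on a 3D center of mass; it is not something the repository provides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Minimum corner of the axis-aligned bounding box around `points`
// (points are assumed to be in camera coordinates, in meters).
Vec3 MinCorner(const std::vector<Vec3>& points) {
  assert(!points.empty());
  Vec3 lo = points.front();
  for (const Vec3& p : points) {
    lo.x = std::min(lo.x, p.x);
    lo.y = std::min(lo.y, p.y);
    lo.z = std::min(lo.z, p.z);
  }
  return lo;
}

// Alternative: origin that centers a dim x dim x dim grid on `center`.
Vec3 OriginCenteredOn(const Vec3& center, int dim, float voxel_size) {
  assert(dim > 0 && voxel_size > 0.0f);
  const float half = 0.5f * static_cast<float>(dim) * voxel_size;
  return Vec3{center.x - half, center.y - half, center.z - half};
}
```

This also suggests why the 60x60x60 attempt quoted earlier in the thread produced nothing: starting at x = -1.5 with 0.005 m voxels, 60 voxels only reach x = -1.2, which is far from anything near the optical axis, whereas 500 voxels reach x = 1.0 and happen to cover the hand.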

With that said, I highly recommend searching online for some academic resources that can introduce you to the basic concepts of 3D vision (such as this or this). The computer vision course that I TA'd at Princeton also has some nice introductory content for 3D vision (see course slides here). You will need a good understanding of these topics before understanding what the code in this repository does.

