Thanks for this work.
My question might be slightly off-topic.
I have created a simulated knee environment in Blender, and for every frame I have the depth map, the camera pose, and the rendered image. With these three ingredients I should be able to reconstruct a 3D point cloud and mesh of the environment with other approaches such as TSDF fusion (https://github.com/andyzeng/tsdf-fusion-python), but that is not happening, even though I believe I am using the correct camera intrinsic matrix.
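For reference, this is roughly the single-frame back-projection check I would expect to work if the depth and intrinsics are consistent (a minimal sketch; the `fx`, `fy`, `cx`, `cy` values are placeholders, not my actual camera):

```python
import numpy as np

# Placeholder pinhole intrinsics; replace with the values exported from Blender.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

def backproject(depth):
    """Back-project a single HxW metric depth map into camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # N x 3 camera-frame points; rows with zero/invalid depth can be masked out.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```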
To cross-check the Blender camera pose information, I used only the depth images and Kinect Fusion to estimate the camera poses, then fed the estimated poses and the depth images into TSDF, and it still did not work. I tried the same approach (estimating the camera poses with Kinect Fusion) on the TSDF demo data and it worked (Kinect Fusion-estimated poses + ground-truth depth). That means Kinect Fusion can produce camera poses as good as the ground truth for use with TSDF, which narrows my issue down to the ground-truth depth that Blender produces.
My question is: do I need to apply any transformation to the depth images that Blender produces?
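For example, I understand that depending on the render engine/version, Blender's Z pass can store the distance along the camera ray rather than the planar Z depth. If that is the case here, would a conversion like the following be needed before TSDF? (a sketch assuming standard pinhole intrinsics `fx`, `fy`, `cx`, `cy`):

```python
import numpy as np

def ray_depth_to_planar_z(dist, fx, fy, cx, cy):
    """Convert per-pixel distance along the camera ray into planar Z depth.

    `dist` is an HxW array of ray lengths; intrinsics follow the usual
    pinhole convention.
    """
    h, w = dist.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Norm of the un-normalized ray direction ((u-cx)/fx, (v-cy)/fy, 1).
    ray_norm = np.sqrt(((u - cx) / fx) ** 2 + ((v - cy) / fy) ** 2 + 1.0)
    return dist / ray_norm
```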
I see you mention a conversion between the camera pose and DSO. I wonder if I might need to do something similar to get the TSDF algorithm working; see the sketch below for the kind of conversion I mean.
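This is the kind of axis-convention flip I suspect might be required on the Blender camera-to-world matrices, assuming Blender's -Z-forward / +Y-up camera frame and an OpenCV-style +Z-forward / +Y-down convention on the TSDF side (a sketch, not something I have confirmed):

```python
import numpy as np

# Flip the Y and Z axes of the camera frame: Blender cameras look down -Z
# with +Y up, while OpenCV-style pipelines expect +Z forward with +Y down.
BLENDER_TO_CV = np.diag([1.0, -1.0, -1.0, 1.0])

def blender_pose_to_cv(cam_to_world_blender):
    """Convert a 4x4 Blender camera-to-world matrix to OpenCV convention."""
    return cam_to_world_blender @ BLENDER_TO_CV
```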