3D Human Pose Estimation in RGBD Images for Robotic Task Learning
Home Page: https://lmb.informatik.uni-freiburg.de/projects/rgbd-pose3d/
Hi,
I am trying to run your 3D human detection with an RGB-D camera (Intel RealSense R200) and would like to know what the hardware requirements are. I have two GPUs detected by TensorFlow (an Nvidia Quadro P4000 with 8 GB of memory and an Nvidia GeForce GTX 1050 with 4 GB), and I managed to run forward_pass.py on one and ROSNode.py on the other, but I am getting the warning: "Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available." I could live with that if I got any meaningful result, but right now the TF tree of a human doesn't look correct, and I am not sure whether that is because of the lack of memory or something else is wrong.
(btw I can run OpenPose 2D detection on three cameras at the same time with this GPU configuration)
Thank you in advance and looking forward to your response!
Hi there
Amazing work. We've started using your model on the output of an Azure Kinect DK, because the model included in their SDK does not handle occlusions very well.
One odd thing we noticed is that the 2D keypoints, when visualized on a video, don't move smoothly along with the real joints. They take discrete jumps of a certain number of pixels, as if the model output were constrained to a low-resolution grid.
Is this to be expected?
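For what it's worth, jumps like this are consistent with taking a hard argmax over a low-resolution score map. A soft-argmax over the map recovers sub-pixel coordinates; here is a minimal numpy sketch (hypothetical post-processing, not code from this repo):

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Sub-pixel keypoint location as the softmax-weighted average
    of the heatmap's pixel grid."""
    h, w = heatmap.shape
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((probs * xs).sum()), float((probs * ys).sum())

# two equally strong peaks in adjacent cells yield a fractional x
hm = np.zeros((8, 8))
hm[3, 4] = 20.0
hm[3, 5] = 20.0
x, y = soft_argmax_2d(hm)
print(round(x, 2), round(y, 2))  # 4.5 3.0
```

If the network outputs a score map, applying something like this instead of a hard argmax would smooth the trajectories without retraining.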
Thank you!
Hey there,
Thanks for your work!
I'm currently facing a problem aligning the depth map and RGB frame from your dataset.
My idea is to project the depth map into the camera frame, transform it to world coordinates, and then project it back onto the color image plane. For the sake of testing, I've tried doing this without paying much attention to depth, and using the API you've provided as much as possible.
So here are the steps I'm doing right now:
import numpy as np

def project_from_view(depth, camid, calib_data):
    """ Back-project every depth pixel into the camera's 3D coordinate frame. """
    depth_map = np.asarray(depth, dtype=np.float64)
    K = np.asarray(calib_data[camid][0])            # 3x3 intrinsic matrix
    v, u = np.mgrid[0:depth_map.shape[0], 0:depth_map.shape[1]]
    # homogeneous pixel coordinates: x is the column index, y the row index
    pix = np.vstack((u.ravel(), v.ravel(), np.ones(depth_map.size)))
    rays = np.linalg.inv(K).dot(pix)                # viewing rays with z = 1
    pts = rays * depth_map.ravel()                  # scale each ray by its depth
    return pts.T
Projecting to world coordinate frame with the function from your API:
trafo_cam2world(pts_cam_d, depth_cam_id, calib_frame)
Projecting from world to image plane with the function from your API:
project_from_world_to_view(pts_world, color_cam_id, calib_frame)
After applying those steps I get the following map:
The depth map at the bottom right is shifted, even though I'm sure I've used the correct intrinsics and extrinsics.
Moreover, I've tested my projection function separately and don't think the issue is there.
Is this normal behavior? It's hard to believe that reprojection gives such a big displacement while the 3D points are perfectly mapped and aligned.
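For comparison, here is the whole chain (back-projection, rigid transform, projection) in plain numpy; the intrinsics, rotation, and baseline below are made-up values for illustration, not your dataset's calibration:

```python
import numpy as np

# hypothetical calibration: identity rotation, 5 cm baseline along x
K_depth = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
K_color = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])

def reproject(u, v, z):
    """Lift a depth pixel to 3D, move it into the color frame, project it."""
    p_cam = z * np.linalg.inv(K_depth).dot(np.array([u, v, 1.0]))
    p_color = R.dot(p_cam) + t
    q = K_color.dot(p_color)
    return q[:2] / q[2]

print(reproject(320, 240, 2.0))  # [335. 240.]
```

Checking a few hand-picked pixels this way against the API's output may reveal whether the shift comes from the intrinsics, the extrinsics, or the depth scaling.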
Hello,
I'm currently downloading the weights and the repo (it's going to take a while). I want to try running the network on a low-powered robot (quad-core 1.6 GHz Atom, no dedicated GPU). I saw in the comments of forward_pass.py that it reaches 1-2 Hz on beefy GPUs. Do you think it is worth testing for my use case?
Thank you.
Hi, thank you for your work!
I wonder how to train VoxelPoseNet. Are its inputs 2D ground truth, or 2D results predicted by the 2D network? And is it possible to cascade the 2D and 3D networks into an end-to-end network?
I'd very much appreciate your help and a quick reply~
There is another question about the warped depth image. As your article mentions, the depth map is transformed into the color frame using the camera calibration.
The intrinsic calibration data in forward_pass.py seems to be the product of the depth camera's intrinsic parameters and the resolution scaling ratios.
Why not use the color camera's intrinsic calibration directly when the depth map is already in the color space?
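To make the question concrete, scaling intrinsics to a different image resolution multiplies the focal lengths and principal point by the per-axis resolution ratios; a sketch with made-up Kinect-like numbers (the resolutions and intrinsics are assumptions):

```python
import numpy as np

# hypothetical depth intrinsics at 512x424, resampled to 1920x1080
K_depth = np.array([[365., 0., 256.], [0., 365., 212.], [0., 0., 1.]])
sx, sy = 1920 / 512, 1080 / 424
S = np.diag([sx, sy, 1.0])
K_scaled = S.dot(K_depth)  # fx, cx scale by sx; fy, cy scale by sy
print(K_scaled[0, 0], round(K_scaled[1, 1], 2))  # 1368.75 929.72
```

Such a scaled depth-camera matrix is generally not identical to the color camera's own intrinsics, which is what I find confusing.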
Hello, I am interested in your work and have a small question. In your paper you compare against a method called 'Naive Lifting', but you don't say whose work 'Naive Lifting' is. I want to study it further. Can you tell me whose work it is? Thank you.
Hello, I am very interested in your work. Can you provide the code for training? Looking forward to your reply.
Thanks for your work!
I followed your VoxelPoseNet and rewrote it using TensorLayer.
The 3D network's results on my own data are not good; the channel order may be the reason.
The docstring of PoseNet3D._detect_scorevol says the score volume tensor is [1, D, H, W, C], but I didn't find any channel rearrangement in the subsequent code, while the final result uses xyz order.
Any answers? Thanks a lot.
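To illustrate what I mean by the ordering: an argmax over a [1, D, H, W, C] volume yields indices in (d, h, w) order, which must be flipped to obtain xyz (a hypothetical sketch, not your code):

```python
import numpy as np

# hypothetical score volume shaped [1, D, H, W, C]
scorevol = np.zeros((1, 8, 8, 8, 2))
scorevol[0, 2, 5, 7, 0] = 1.0  # channel 0 peak at d=2, h=5, w=7

flat = np.argmax(scorevol[0, :, :, :, 0])
d, h, w = np.unravel_index(flat, scorevol.shape[1:4])
xyz = np.array([w, h, d])  # x = width index, y = height index, z = depth index
print(xyz)  # [7 5 2]
```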
Hi, thank you so much for your work. We are interested in generating a pedestrian walking dataset using a similar approach. However due to the limited resources we have we want to first consult on how others have done it. Would you mind sharing with us a brief overview of your Kinect calibration method? Did you use any special technique to make it work?
Thanks for the code!
But I simply can't draw a 3D plot with forward_pass.py. I tried to call the 3D limb COCO function, but the result doesn't look like the teaser, and hand_plot_3d simply doesn't work. What is the right way to get a plot like the teaser?
I found that this repo contains only test code. Can you release your training code?
Hi,
I am trying to run this code with Ubuntu 18.04 and CUDA 11.
More info about my driver and card:
NVIDIA-SMI 460.80
Driver Version: 460.80
CUDA Version: 11.2
I am encountering a series of problems with my recent configuration.
I tried to downgrade to CUDA 10 but nvidia drivers 418 (required for CUDA 10) are not compatible with my graphic card.
Are you planning on updating this code with more recent libraries?
Do you have any suggestions about how to use your module with more recent hardware configurations?
Thanks
First, it is very simple for everyone to obtain the 3D coordinates of the surface points of humans or other objects with a depth camera; secondly, as I understand it, OpenCV SfM is a method to recover surface points from multi-view cameras; so I'm confused and would like to know why you use SfM instead of a depth camera. Are there particular reasons?
Hi, thanks for sharing the results of your research! When trying out your ROS node, I noticed there is sometimes quite a bit of latency even though you set queue_size to 1 on both image subscribers. This is usually caused by a too-small buff_size. Basically, the trick is that whenever you subscribe to large messages (such as images) and want to achieve low latency by setting a small queue_size, you should also supply the keyword argument buff_size to the constructor and set it to queue_size * avg_msg_size_in_bytes.
In your code example, I set buff_size to 10000000 (~10 MB), so that an average image message now easily fits in there. This way, the messages won't queue up anymore in the operating system's buffer that induces the latency (and you may not even need a separate processing thread anymore).
This issue is only relevant for rospy, not roscpp. See ros/ros_comm#536 for further details.
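The sizing rule can be sketched as follows; the image resolution and header allowance below are assumptions for illustration:

```python
# hypothetical 1280x720 RGB8 image stream
width, height, channels = 1280, 720, 3
header_overhead = 4096                 # rough allowance for the message header
avg_msg_size = width * height * channels + header_overhead
queue_size = 1
buff_size = queue_size * avg_msg_size  # pass this to rospy.Subscriber(...)
print(buff_size)  # 2768896
```

Rounding buff_size up to a fixed 10 MB, as I did above, is simply a comfortable upper bound over this estimate.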
Hi, thank you for releasing the code. I just had a question about the way the results are reported in your paper. Though you compare the results with Tome et al., the comparison is only visual. Is there any additional resource online where you compare the results in terms of mean per joint position error (MPJPE) as well?
Thanks!
Thank you for your great work.
I want to reproduce your results, but I can't find the dataset for training and testing.
Is the dataset published? If so, can you tell me where I can find it?
Thanks in advance.
Can you offer me the conversion code (ground-truth 3D keypoints to voxel volume V)?
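I don't have the original code, but here is a minimal sketch of what such a conversion might look like, assuming a cube of side 2 m centred on a root joint and one-hot occupancy (the grid size, side length, and centring are my assumptions):

```python
import numpy as np

def keypoint_to_voxel(kp_xyz, root_xyz, grid=64, side_m=2.0):
    """Place one 3D ground-truth keypoint into a grid^3 voxel volume V."""
    V = np.zeros((grid, grid, grid))
    voxel_size = side_m / grid          # metres per voxel
    idx = np.round((kp_xyz - root_xyz) / voxel_size).astype(int) + grid // 2
    if np.all((idx >= 0) & (idx < grid)):
        V[tuple(idx)] = 1.0             # one-hot occupancy at the keypoint
    return V

V = keypoint_to_voxel(np.array([0.1, -0.2, 0.05]), np.zeros(3))
print(np.argwhere(V == 1.0))  # [[35 26 34]]
```

In practice a small Gaussian blob around the occupied voxel is often used as the training target instead of a hard one-hot volume.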