3D Human Pose Estimation in RGBD Images for Robotic Task Learning
Home Page: https://lmb.informatik.uni-freiburg.de/projects/rgbd-pose3d/
Hi,
I am trying to run your 3D human detection with an RGB-D camera (Intel RealSense R200) and would like to know what the hardware requirements are. I have two GPUs detected by TensorFlow (an Nvidia Quadro P4000 with 8 GB of memory and an Nvidia GeForce GTX 1050 with 4 GB), and I managed to run forward_pass.py on one and ROSNode.py on the other, but I am getting the warning: "Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.40GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available." I could live with that if I got any meaningful result, but right now the TF tree of a human doesn't look correct, and I am not sure whether that is because of the lack of memory or something else is wrong.
(btw I can run OpenPose 2D detection on three cameras at the same time with this GPU configuration)
Thank you in advance and looking forward to your response!
Hi there
Amazing work. We've started using your model on the output of an Azure Kinect DK, because the model included in their SDK does not handle occlusions very well.
One odd thing we noticed is that the 2D keypoints, when visualized on a video, don't move smoothly along with the real joints. They take discrete jumps of a certain number of pixels, as if the model output were constrained to a low-resolution grid.
Is this to be expected?
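For what it's worth, jumps like this are consistent with taking a hard argmax over a low-resolution score map. A soft-argmax over the map recovers sub-pixel coordinates; here is a minimal numpy sketch (hypothetical post-processing, not code from this repo):

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Sub-pixel keypoint location as the softmax-weighted average
    of the heatmap's pixel grid."""
    h, w = heatmap.shape
    probs = np.exp(heatmap - heatmap.max())
    probs /= probs.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((probs * xs).sum()), float((probs * ys).sum())

# two equally strong peaks in adjacent cells yield a fractional x
hm = np.zeros((8, 8))
hm[3, 4] = 20.0
hm[3, 5] = 20.0
x, y = soft_argmax_2d(hm)
print(round(x, 2), round(y, 2))  # 4.5 3.0
```

If the network outputs a score map, applying something like this instead of a hard argmax would smooth the trajectories without retraining.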
Thank you!
Hey there,
Thanks for your work!
I'm currently facing a problem aligning the depth map and RGB frame from your dataset.
My idea is to project the depth map into the camera frame, transform it to world coordinates, and then project it back onto the color image plane. For the sake of testing, I've tried doing this without paying much attention to depth, and using the API you've provided as much as possible.
So here are the steps I'm doing right now:
import numpy as np

def project_from_view(depth, camid, calib_data):
    """ Back-project every depth pixel into the camera's 3D coordinate frame. """
    depth_map = np.asarray(depth, dtype=np.float64)
    K = np.asarray(calib_data[camid][0])            # 3x3 intrinsic matrix
    v, u = np.mgrid[0:depth_map.shape[0], 0:depth_map.shape[1]]
    # homogeneous pixel coordinates: x is the column index, y the row index
    pix = np.vstack((u.ravel(), v.ravel(), np.ones(depth_map.size)))
    rays = np.linalg.inv(K).dot(pix)                # viewing rays with z = 1
    pts = rays * depth_map.ravel()                  # scale each ray by its depth
    return pts.T
Projecting to world coordinate frame with the function from your API:
trafo_cam2world(pts_cam_d, depth_cam_id, calib_frame)
Projecting from world to image plane with the function from your API:
project_from_world_to_view(pts_world, color_cam_id, calib_frame)
After applying those steps I get the following map:
The depth map at the bottom right is shifted, even though I'm sure I've used the correct intrinsics and extrinsics.
Moreover, I've tested my projection function separately and don't think the issue is there.
Is this normal behavior? It's hard to believe that reprojection gives such a big displacement while the 3D points are perfectly mapped and aligned.
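For comparison, here is the whole chain (back-projection, rigid transform, projection) in plain numpy; the intrinsics, rotation, and baseline below are made-up values for illustration, not your dataset's calibration:

```python
import numpy as np

# hypothetical calibration: identity rotation, 5 cm baseline along x
K_depth = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
K_color = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])

def reproject(u, v, z):
    """Lift a depth pixel to 3D, move it into the color frame, project it."""
    p_cam = z * np.linalg.inv(K_depth).dot(np.array([u, v, 1.0]))
    p_color = R.dot(p_cam) + t
    q = K_color.dot(p_color)
    return q[:2] / q[2]

print(reproject(320, 240, 2.0))  # [335. 240.]
```

Checking a few hand-picked pixels this way against the API's output may reveal whether the shift comes from the intrinsics, the extrinsics, or the depth scaling.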
Hello,
I'm currently downloading the weights and the repo (it's going to take a while). I want to try running the network on a low-powered robot (quad-core 1.6 GHz Atom, no dedicated GPU). I saw in the comments of forward_pass.py that it reaches 1-2 Hz on beefy GPUs. Do you think it is worth testing for my use case?
Thank you.
Hi, thank you for your work!
I wonder how to train VoxelPoseNet. Are its inputs 2D ground truth, or 2D results predicted by the 2D network? And is it possible to cascade the 2D and 3D networks into an end-to-end network?
I'd very much appreciate your help and a quick reply~
There is another question about the warped depth image. As your article mentions, the depth map is transformed into the color frame using the camera calibration.
The intrinsic calibration data in forward_pass.py seems to be the product of the depth camera's intrinsic parameters and the resolution scaling ratios.
Why not use the color camera's intrinsic calibration directly when the depth map is already in the color space?
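To make the question concrete, scaling intrinsics to a different image resolution multiplies the focal lengths and principal point by the per-axis resolution ratios; a sketch with made-up Kinect-like numbers (the resolutions and intrinsics are assumptions):

```python
import numpy as np

# hypothetical depth intrinsics at 512x424, resampled to 1920x1080
K_depth = np.array([[365., 0., 256.], [0., 365., 212.], [0., 0., 1.]])
sx, sy = 1920 / 512, 1080 / 424
S = np.diag([sx, sy, 1.0])
K_scaled = S.dot(K_depth)  # fx, cx scale by sx; fy, cy scale by sy
print(K_scaled[0, 0], round(K_scaled[1, 1], 2))  # 1368.75 929.72
```

Such a scaled depth-camera matrix is generally not identical to the color camera's own intrinsics, which is what I find confusing.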
Hello, I am interested in your work and have a small question. In your paper you compare against a method called 'Naive Lifting', but you don't say whose work 'Naive Lifting' is. I want to study it further. Can you tell me whose work it is? Thank you.
Hello, I am very interested in your work. Can you provide the code for training? Looking forward to your reply.
Thanks for your work!
I followed your VoxelPoseNet and rewrote it using TensorLayer.
The 3D network's results on my own data are not good; the channel order may be the reason.
The docstring of PoseNet3D._detect_scorevol says the score volume tensor is [1, D, H, W, C], but I didn't find any channel rearrangement in the subsequent code, while the final result uses xyz order.
Any answers? Thanks a lot.
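To illustrate what I mean by the ordering: an argmax over a [1, D, H, W, C] volume yields indices in (d, h, w) order, which must be flipped to obtain xyz (a hypothetical sketch, not your code):

```python
import numpy as np

# hypothetical score volume shaped [1, D, H, W, C]
scorevol = np.zeros((1, 8, 8, 8, 2))
scorevol[0, 2, 5, 7, 0] = 1.0  # channel 0 peak at d=2, h=5, w=7

flat = np.argmax(scorevol[0, :, :, :, 0])
d, h, w = np.unravel_index(flat, scorevol.shape[1:4])
xyz = np.array([w, h, d])  # x = width index, y = height index, z = depth index
print(xyz)  # [7 5 2]
```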
Hi, thank you so much for your work. We are interested in generating a pedestrian walking dataset using a similar approach. However due to the limited resources we have we want to first consult on how others have done it. Would you mind sharing with us a brief overview of your Kinect calibration method? Did you use any special technique to make it work?
Thanks for the code!
But I simply can't draw a 3D plot with forward_pass.py. I tried to call the 3D limb COCO function, but the result doesn't look like the teaser, and hand_plot_3d simply doesn't work. What is the right way to get a plot like the teaser?
I found that this repo contains only test code. Can you release your training code?
Hi,
I am trying to run this code with Ubuntu 18.04 and CUDA 11.
More info about my driver and card:
NVIDIA-SMI 460.80
Driver Version: 460.80
CUDA Version: 11.2
I am encountering a series of problems with my recent configuration.
I tried to downgrade to CUDA 10 but nvidia drivers 418 (required for CUDA 10) are not compatible with my graphic card.
Are you planning on updating this code with more recent libraries?
Do you have any suggestions about how to use your module with more recent hardware configurations?
Thanks
First, it is very simple for everyone to obtain the 3D coordinates of the surface points of humans or other objects with a depth camera; secondly, as I understand it, OpenCV SfM is a method to recover surface points from multi-view cameras; so I'm confused and would like to know why you use SfM instead of a depth camera. Are there particular reasons?
Hi, thanks for sharing the results of your research! When trying out your ROS node, I noticed there is sometimes quite a bit of latency even though you set queue_size to 1 on both image subscribers. This is usually caused by a too-small buff_size. Basically, the trick is that whenever you subscribe to large messages (such as images) and want to achieve low latency by setting a small queue_size, you should also supply the keyword argument buff_size to the constructor and set it to queue_size * avg_msg_size_in_bytes.
In your code example, I set buff_size to 10000000 (~10 MB), so that an average image message now easily fits in there. This way, the messages won't queue up anymore in the operating system's buffer that induces the latency (and you may not even need a separate processing thread anymore).
This issue is only relevant for rospy, not roscpp. See ros/ros_comm#536 for further details.
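The sizing rule can be sketched as follows; the image resolution and header allowance below are assumptions for illustration:

```python
# hypothetical 1280x720 RGB8 image stream
width, height, channels = 1280, 720, 3
header_overhead = 4096                 # rough allowance for the message header
avg_msg_size = width * height * channels + header_overhead
queue_size = 1
buff_size = queue_size * avg_msg_size  # pass this to rospy.Subscriber(...)
print(buff_size)  # 2768896
```

Rounding buff_size up to a fixed 10 MB, as I did above, is simply a comfortable upper bound over this estimate.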
Hi, thank you for releasing the code. I just had a question about the way the results are reported in your paper. Though you compare the results with Tome et al., the comparison is only visual. Is there any additional resource online where you compare the results in terms of mean per joint position error (MPJPE) as well?
Thanks!
Thank you for your great work.
I want to reproduce your results, but I can't find the dataset for training and testing.
Is the dataset published? If so, can you tell me where I can find it?
Thanks in advance.
Can you offer me the conversion code (ground-truth 3D keypoints to voxel volume V)?
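I don't have the original code, but here is a minimal sketch of what such a conversion might look like, assuming a cube of side 2 m centred on a root joint and one-hot occupancy (the grid size, side length, and centring are my assumptions):

```python
import numpy as np

def keypoint_to_voxel(kp_xyz, root_xyz, grid=64, side_m=2.0):
    """Place one 3D ground-truth keypoint into a grid^3 voxel volume V."""
    V = np.zeros((grid, grid, grid))
    voxel_size = side_m / grid          # metres per voxel
    idx = np.round((kp_xyz - root_xyz) / voxel_size).astype(int) + grid // 2
    if np.all((idx >= 0) & (idx < grid)):
        V[tuple(idx)] = 1.0             # one-hot occupancy at the keypoint
    return V

V = keypoint_to_voxel(np.array([0.1, -0.2, 0.05]), np.zeros(3))
print(np.argwhere(V == 1.0))  # [[35 26 34]]
```

In practice a small Gaussian blob around the occupied voxel is often used as the training target instead of a hard one-hot volume.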