
Comments (13)

mzucker commented on August 17, 2024

Ok, commit 1d5d313 should have all the functionality you want. Once you get a detection, you can call Detector.detection_pose(), which returns a 4x4 rigid transformation matrix as well as some information about goodness of fit that you can discard if you don't care about it.

The apriltag.py demo code demonstrates this; see README.md for details on how to enable pose detection from the command line. You'll need to know some basic parameters of your camera, as well as the physical dimensions of the tag.
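In code, the call should look roughly like the sketch below. The intrinsics, the file name, and the exact argument order are placeholders/assumptions; check the docstrings and README before relying on them.

```python
import apriltag
import cv2

# Placeholder intrinsics; use the values from your own camera calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
camera_params = (fx, fy, cx, cy)   # assumed ordering; check the wrapper's docstring
tag_size = 0.127                   # black-frame side length in meters (5" tags)

detector = apriltag.Detector()
gray = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)

for detection in detector.detect(gray):
    # detection_pose() returns the 4x4 rigid transform plus the two fit
    # errors discussed later in this thread.
    pose, init_error, final_error = detector.detection_pose(
        detection, camera_params, tag_size)
    print(detection.tag_id, pose)
```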

One known problem with fiducial markers like AprilTags is that the orientation detected from a single tag in an image can be really noisy if there are no strong perspective cues (translation is not affected as badly). This is the computer vision version of the Necker cube ambiguity. There's no magic way to avoid it, but combining information from three or more tags can help (essentially you throw away the orientation of each individual tag and extract the orientation from the combined tag centers).
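A bare-bones numpy sketch of that multi-tag idea, assuming you know the tag centers' layout in some shared board frame and take the corresponding centers in the camera frame from the per-tag translations; the function is illustrative, not part of the library:

```python
import numpy as np

def fit_board_rotation(board_pts, camera_pts):
    """Best-fit rotation taking board-frame tag centers to camera-frame tag
    centers (Kabsch / orthogonal Procrustes). Both inputs are Nx3 arrays of
    at least three non-collinear centers."""
    p = board_pts - board_pts.mean(axis=0)
    q = camera_pts - camera_pts.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation: board -> camera
```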

Please let me know if the new demo code works for you!

mzucker commented on August 17, 2024

Is the Python wrapper for OpenCV available to you? If so, it should be possible to call cv2.solvePnP with the right arguments to get the 3D pose; otherwise, there is a C function homography_to_pose in the apriltag library that I can wrap in Python (see line 284 of homography.c in master).
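Roughly, the solvePnP route would look something like the sketch below (untested; the corner ordering in obj_pts is an assumption you'd need to check against the detector's output):

```python
import cv2
import numpy as np

def tag_pose_from_solvepnp(detection, camera_matrix, dist_coeffs, tag_size):
    s = tag_size / 2.0
    # Tag corners in the tag's own frame (Z = 0 plane); ordering assumed.
    obj_pts = np.array([[-s, -s, 0], [ s, -s, 0],
                        [ s,  s, 0], [-s,  s, 0]], dtype=np.float64)
    img_pts = np.asarray(detection.corners, dtype=np.float64).reshape(4, 2)
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)        # rotation vector -> 3x3 rotation matrix
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = tvec.ravel()        # translation in the same units as tag_size
    return pose
```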

Either way, I can add a pose reconstruction Python example to the repository in the next day or two, please stand by...

ginsi commented on August 17, 2024

Yes, I have Python-OpenCV (cv2) and I am going to look at solvePnP (and report the results here). I can't use the C function because I am far from fluent in C; anyway, I had found homography_to_pose and tried (just to test my understanding) to translate it into Matlab (I have a very old version of the base package). One problem is that I could not find any solid way to compare my findings with reference results. I am trying to reach results comparable with the output of the (C++) program apriltags_demo: it gives x, y, z, yaw, pitch, roll of each tag, but I could not find out what its reference frame is (I find it strange that, from some practical tests, the pitch axis appears to be vertical and the yaw axis horizontal!). I think that if you add that function to the wrapper it will be very useful to me and probably to many others. In the meantime, thank you very much for your attention and advice. I will keep my eye here :-)

ginsi commented on August 17, 2024

I have had a look at solvePnP, but this led me to think I must have seriously misunderstood something basic, so please bear with me while I try to summarize:

  • my problem, end to end, is: I want software that takes a picture containing one or more AprilTags and tells me where each tag is, and how it is oriented, with respect to a camera-fixed reference frame; this is something that apriltag_demo gives me (if I give it one focal length in pixels and the tag side in meters), but it has two drawbacks: (a, most important) being a demo, it is not very suitable for direct integration into my software; (b, secondary) the orientation of the tags is expressed in Euler angles, while having a roto-translation matrix instead would really be better (I could use it directly to compute some tag-to-tag transforms that are very useful to me); in my understanding that roto-translation matrix IS the pose matrix. Computing a roto-translation matrix from a translation and Euler angles is easy, if one knows which particular set of Euler angles has been chosen
  • I found your python wrapper of the same library (it is the same, isn't it?) which seemed perfect for me, but...
  • ...I realized that your wrapper does not give a roto-translation/pose; it gives a homography (3x3)
  • I read that the homography is related to the pose matrix via the camera matrix; if the homography were 4x4, the equation H = CM * PM could easily be inverted by finding the (left) inverse of CM: CMinv * H = CMinv * CM * PM => CMinv * H = PM; since the homography from the wrapper is only 3x3, this probably means that we should either work entirely in 3x3 (is that possible?) or go to rectangular CM and PM and use a left inverse of CM (is that possible?)
  • I was surprised that the solvePnP you suggested goes back to requiring "image points" and "object points" as calling parameters, which in my view belong to an already-solved part of the problem; this in particular led me to believe I am probably misunderstanding some very basic things!
  • your other suggestion, which to me appears more in the right direction for my needs, is the homography_to_pose function; I am not able to understand all the whats and whys of the maths behind it, but I think that, if necessary, I may be able to rewrite it in Python (with numpy? OpenCV?) with a little help (I have put a tentative sketch right after this list); in particular I do not understand what line 335 does; as for line 332, I think it does an SVD decomposition, for which I could use cv2.SVDecomp(src) → w, u, vt, but what is (line 335) R = matd_op("M*M'", svd.U, svd.V)? perhaps the product between U and adjoint(V) (where adjoint == transpose_conj)?
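Just to show where I stand, here is my tentative numpy sketch of what I think homography_to_pose does (names and details are my own guesses, not a line-by-line port of the C code). If my reading is right, the matd_op("M*M'", svd.U, svd.V) line is U times the transpose of V, i.e. it replaces the estimated rotation with the nearest true rotation matrix:

```python
import numpy as np

def homography_to_pose_guess(H, fx, fy, cx, cy):
    # Remove the camera intrinsics from the homography.
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    M = np.linalg.inv(K) @ H
    r1, r2, t = M[:, 0], M[:, 1], M[:, 2]

    # The first two columns should be rotation columns, so rescale to unit norm.
    scale = np.sqrt(np.linalg.norm(r1) * np.linalg.norm(r2))
    if t[2] < 0:
        scale = -scale                  # keep the tag in front of the camera
    r1, r2, t = r1 / scale, r2 / scale, t / scale

    # Third rotation column from the cross product, then snap to the nearest
    # rotation matrix via SVD (the step I was asking about above).
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    u, _, vt = np.linalg.svd(R)
    R = u @ vt

    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = t
    return pose
```

With two such 4x4 poses, the tag-to-tag transform I mentioned would then simply be np.linalg.inv(pose_a) @ pose_b.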

ginsi commented on August 17, 2024

Wonderful!

mzucker commented on August 17, 2024

Hi, sorry to say that between getting back up to speed on the math and my day job being a little more demanding than expected, I'm not as far along as I had hoped. I still expect to have something in the next few days, please stand by.

ginsi commented on August 17, 2024

No problem, of course... in the meantime I am trying to improve my knowledge of these things and the maths behind them...

ginsi commented on August 17, 2024

I'm pretty sure this is all I needed and much more (but also very useful). Unfortunately (for me) my laptop has just died and I need a couple of days to recover before I am able to test. I will report as soon as I can. In the meantime, thank you for all this effort and for the added explanations.

ginsi commented on August 17, 2024

Recovery of the laptop took a bit longer than expected...
I have run the new version of apriltag.py as suggested at the end of the README.md and seen that each Detection now includes a Pose, which is exactly what I need. I think I also succeeded in better understanding the units involved for the translation part (as far as I understand, the first three elements in the 4th column of the pose are in the same unit used to express the side length of the black frames; beyond that, the software does not really care what they are; can you confirm, please? I guess that the example uses meters, so .127 m = 5").
I imagine this is not the right place to ask about the meaning of Goodness, Decision Margin, Init Error, Final Error, but maybe you could point me somewhere for this?
I thank you again for the assistance, and in general for making the wrapper available to the community.
I am not sure whether the "netiquette" of issues expects me to close mine now, so I will not do it immediately; could you please tell me whether I should? Cheers...

mzucker commented on August 17, 2024

It's not 100% accurate to say that a Detection includes a pose; rather, you can compute a pose from a Detection.

As I began to type this reply up, I realized I had a bug in 1d5d313 that made the scales in the translation vectors off by a factor of two, which I then fixed in aaa47b4 (tested by calibrating my new iPhone camera and photographing a tag using a meter stick to separate the two).

Now that that is fixed, yes, you are exactly correct about the units on the pose. I believe the tags used for the mapping example were 5" = .127 m, so with the bug fixed, the units on the translation vector (the top three elements of the rightmost column of the pose matrix) should be correctly expressed in meters relative to the camera frame (with X right, Y down, and Z pointing out the lens, origin at the camera center).
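Concretely, reading the translation off the returned matrix looks like this (assuming pose is the 4x4 result of detection_pose() and tag_size was given in meters):

```python
import numpy as np

t = pose[:3, 3]                   # tag position in the camera frame, in meters
x_right, y_down, z_forward = t    # camera frame: X right, Y down, Z out of the lens
distance = np.linalg.norm(t)      # straight-line distance from camera to tag center
```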

Goodness and Decision Margin are from the part of the code that I didn't write, so I will refer you to the original apriltag paper for details, but in short I believe "goodness" refers to the pixel-wise intensity contrast around the perimeter of the quad, whereas "decision margin" refers to the contrast within the quad itself. It's not clear to me that goodness is used much in the current code version (i.e. it appears to be zero all the time). In general, higher decision margin is better (i.e. means more contrast within the tag). Those two quantities have nothing to do with pose detection other than their correlation to the overall quality of the photo (better photos = better pose detection).
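If you want to use decision margin as a rough quality gate, something like the snippet below should work. The attribute name follows the demo output, and the threshold is an arbitrary placeholder to tune for your own images:

```python
detections = detector.detect(gray)   # detector and gray as in the earlier sketch

# Keep only detections with reasonably high contrast inside the tag;
# 30.0 is a placeholder, not a value from the library.
good = [d for d in detections if d.decision_margin > 30.0]
```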

Init Error and Final Error refer to the reprojection error associated with a given tag. Basically, there is a closed-form solution to use linear algebra to estimate pose from point correspondences alone, but it yields a biased estimate that is not the most accurate, especially if the locations of the quad corners are subject to lots of noise. You can then use an iterative optimization technique (I chose Levenberg-Marquardt) to refine this estimate. These reprojection errors are measured in pixels squared (i.e. they are the sum of squared distances, measured in pixel coordinates). In general we want these errors to be very low (1-2 pixels for small tags in the image, tens of pixels for big tags) relative to the tag size. We always expect the final error to be lower than the initial (i.e., the refinement process should do no harm).
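If you ever want to sanity-check these numbers yourself, the quantity being reported is roughly the following (an OpenCV-based sketch of the same idea, not the library's own code; the corner ordering in obj_pts must match the detected corners):

```python
import cv2
import numpy as np

def reprojection_error(pose, corners, camera_matrix, dist_coeffs, tag_size):
    """Sum of squared pixel distances between detected corners and the
    corners reprojected through the estimated pose."""
    s = tag_size / 2.0
    obj_pts = np.array([[-s, -s, 0], [ s, -s, 0],
                        [ s,  s, 0], [-s,  s, 0]], dtype=np.float64)
    rvec, _ = cv2.Rodrigues(np.ascontiguousarray(pose[:3, :3]))
    tvec = pose[:3, 3]
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, camera_matrix, dist_coeffs)
    diff = proj.reshape(4, 2) - np.asarray(corners, dtype=np.float64).reshape(4, 2)
    return float(np.sum(diff ** 2))   # pixels squared: sum of squared distances
```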

Anecdotally, it seems like my new quad detection algorithm (run demos using the -c option) generally has better (lower) reprojection error than the old one, which is nice to see.

I'm going to leave this issue open for a little while longer and if you have any questions or problems in the next few weeks, just keep replying here. If you let me know everything is working great, or if I don't hear from you for a few weeks, I'll close the issue at a later date.

Glad to help!

ginsi commented on August 17, 2024

Hi!
After some fighting with the Xubuntu installation on my new laptop (apparently it broke my ability to use the DNS service provided by my router, which matters a lot on the LAN, but that is another story), I could eventually do some more testing of the apriltag library and wrapper. I am really impressed: the accuracy of distance measurements in the world seems to be better than 1 mm at 1 m (I am not able to make better measurements), and the inter-tag distances are also very good. I have not yet studied enough to know what accuracy figures can be expected, and how. I am working with a 5 Mpix Pi camera on a Raspberry Pi, after having calibrated it to get the camera matrix and the distortion parameters, so I feed the rectified image and the new matrix to the apriltag software. For the time being I have used apriltag.py as it is and have not yet called the wrapper from my own software, but a look at the apriltag.py code makes me very confident that everything will go like a charm.
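For the record, my rectification step looks roughly like the sketch below; the calibration numbers are placeholders, the real ones come from my earlier cv2.calibrateCamera run:

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion from a previous calibration run.
K = np.array([[610.0,   0.0, 320.0],
              [  0.0, 610.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([0.10, -0.25, 0.0, 0.0, 0.0])

raw = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
h, w = raw.shape
new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 0)
rectified = cv2.undistort(raw, K, dist, None, new_K)

# These are the intrinsics I then pass on for pose estimation.
fx, fy, cx, cy = new_K[0, 0], new_K[1, 1], new_K[0, 2], new_K[1, 2]
```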
Of course you are right that a Detection does not include a pose! My first look at the code was too quick!
Thank you so much also for the clarifications you have been so kind to give me (about the errors etc.).
Cheers

mzucker commented on August 17, 2024

Great, seems like you know your business with calibration and rectification, glad you're getting good results.

I was planning on adding pose information some day; your feature request just prompted me to go ahead and finally do it.

I'm closing this issue now – best of luck using the code.
