
Comments (15)

nileshkulkarni commented on June 20, 2024

Hi,
Thank you for your interest in our work. Here are the answers to your questions.

  1. We parameterize the rigid transform for every part as an axis-angle transformation. The axis of each part is a parameter of the network and is not predicted per image. We predict a rotation angle (theta) corresponding to every part, so you can interpret the rigid transform as an axis (x0, y0, z0) plus an angle. Here are more details:
    We can convert this axis-angle representation to a rotation matrix; let's call it R_{0} for part 0.

  2. Given R_{0} as the rotation matrix for the zeroth part, we can transform all of its vertices by computing p_new = R_{0} * p_old. See #1 (comment) for more details on how to blend vertices to get the final mesh.

  3. We learn the angle by using a ResNet to encode the image and predicting the angle corresponding to each part of the mesh.
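Points 1 and 2 above can be sketched numerically. This is a minimal numpy illustration using Rodrigues' formula, not the actual acsm code:

```python
import numpy as np

def axang_to_rotmat(axis, theta):
    """Rodrigues' formula: rotation matrix from a unit axis and an angle."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])  # cross-product (skew) matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Rotating p_old = (1, 0, 0) by 90 degrees about the y-axis gives (0, 0, -1).
R0 = axang_to_rotmat(np.array([0.0, 1.0, 0.0]), np.pi / 2)
p_new = R0 @ np.array([1.0, 0.0, 0.0])
```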

I hope this helps!

Best,
Nilesh

from acsm.

anslt commented on June 20, 2024

Thanks for your reply; your answer made the model much easier to understand!

So the axis is fixed for each category, and we need to find the axis before training.
Is my understanding correct?

nileshkulkarni commented on June 20, 2024

Yes, it is a learned parameter in the network: we initialize it to be the y-axis, and then it gets learned from there.

anslt commented on June 20, 2024

Hello Nilesh,
I would like to ask further about the translation in the part transform.
How is the translation applied in the part transformation?
Is a different translation applied to each part, or is one translation applied to the whole object?
If we apply a translation to each part, could it make the part leave the main body?

Furthermore, you suggest applying regularization losses (entropy) on the transformation.
Thus, we need a probability for the 8 different transformations. How do we obtain this probability?

Thank you for your reply.

nileshkulkarni commented on June 20, 2024

There is a translation prediction for every part. You can consider the transformation (R, T), which is applied as
new_verts = R * verts + T
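For example, in plain numpy (with an arbitrary R and T, not values from the model), applying (R, T) to a part's vertices looks like:

```python
import numpy as np

R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])   # 90-degree rotation about the z-axis
T = np.array([0.1, 0.0, 0.0])
verts = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])  # one row per vertex

# Row-vector form of new_verts = R * verts + T.
new_verts = verts @ R.T + T
```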

The entropy regularization is applied by predicting a probability associated with each camera pose prediction. You can refer to this for more details: https://github.com/nileshkulkarni/csm/blob/848fa12039551de6c7ba796685f568a8bba65ab2/csm/nnutils/icn_net.py#L206-L230
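As a hedged sketch of the probability computation (plain numpy, not the linked implementation; the sign and weight of the entropy term should be taken from the linked code), the network scores each pose hypothesis and a softmax turns the scores into the required probabilities:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def pose_entropy(scores):
    """Entropy of the distribution over camera-pose hypotheses."""
    p = softmax(scores)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Uniform scores over 8 hypotheses give the maximal entropy, log(8);
# a strongly peaked distribution gives entropy near zero.
uniform = pose_entropy(np.zeros((1, 8)))
peaked = pose_entropy(np.array([[100.0, 0, 0, 0, 0, 0, 0, 0]]))
```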

Best,
Nilesh

anslt commented on June 20, 2024

Do we use a ResNet to learn the translation, or does it act as a learned bias, like the axis?
If there are problems learning the translation, could one of the parts, say, leave the body?

nileshkulkarni commented on June 20, 2024

We use a ResNet to learn the translation, as every image might require a different one. It doesn't happen that the parts leave the body: a) the mask doesn't allow things to move in an arbitrary manner, and b) we also have an L2 regularization loss on the predicted translation, which prevents it from predicting high values that might make the shape look unreasonable.

anslt commented on June 20, 2024

Thanks for your reply.
So, in my understanding, in the articulation part you use a ResNet to learn the translation and the rotation angle?
If there are multiple cameras, do you also need a probability for each articulation?

nileshkulkarni commented on June 20, 2024

The articulation prediction is a single prediction; there are no multiple predictions for it in our model. So if "horse" has 8 parts, we will predict 8 angles and 8 translations.

Camera pose prediction, in contrast, is done by predicting multiple cameras, and hence this prediction has the probability term, as in the Canonical Surface Mapping paper.
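As a shape sketch only (the feature size and the linear head here are hypothetical, not taken from the acsm code), predicting 8 angles and 8 translations from a single image feature could look like:

```python
import numpy as np

num_parts, nz_feat = 8, 256
rng = np.random.default_rng(0)

feat = rng.standard_normal(nz_feat)                 # ResNet image feature
W = rng.standard_normal((num_parts * 4, nz_feat)) * 0.01  # linear head weights

out = (W @ feat).reshape(num_parts, 4)              # one row per part
angles = out[:, 0]                                  # 8 rotation angles
translations = out[:, 1:]                           # 8 translations (x, y, z)
```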

anslt commented on June 20, 2024

Thanks for your reply. Do you use the same ResNet for the part transformation and the camera prediction?
Also, since the axis (x0, y0, z0) must have unit norm, how do you optimize the axis under this constraint?

nileshkulkarni commented on June 20, 2024

Yes, we use the same ResNet for the transformation and the camera prediction.

This is the module that predicts the rotation as an axis-angle prediction. Note that it renormalizes the learned axis to unit length on every forward pass, which is how the unit-norm constraint is handled.


import torch
import torch.nn as nn


class QuatPredictorSingleAxis(nn.Module):
    def __init__(self, nz_feat, nz_rot=2):
        super(QuatPredictorSingleAxis, self).__init__()
        # Predicts a 2-vector per image; normalized, it encodes (cos theta, sin theta).
        self.pred_layer = nn.Linear(nz_feat, nz_rot)
        # The rotation axis is a learned parameter shared across images.
        self.axis = nn.Parameter(torch.FloatTensor([1, 0, 0]))

    def forward(self, feat):
        vec = self.pred_layer.forward(feat)
        vec = torch.nn.functional.normalize(vec)
        angle = torch.atan2(vec[:, 1], vec[:, 0]).unsqueeze(-1)
        # Re-project the learned axis onto the unit sphere after every update.
        self.axis.data = torch.nn.functional.normalize(self.axis.unsqueeze(0)).squeeze(0).data
        axis = self.axis.unsqueeze(0).repeat(len(angle), 1)
        quat = axang2quat(angle, axis)
        return quat


def axang2quat(angle, axis):
    # Quaternion (w, x, y, z) from an angle and a unit axis.
    cangle = torch.cos(angle / 2)
    sangle = torch.sin(angle / 2)
    qw = cangle
    qx = axis[..., None, 0] * sangle
    qy = axis[..., None, 1] * sangle
    qz = axis[..., None, 2] * sangle
    quat = torch.cat([qw, qx, qy, qz], dim=-1)
    return quat
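The (cos, sin) → atan2 construction in the forward pass can be reproduced in plain numpy: the network predicts an unconstrained 2-vector, which is normalized onto the unit circle, and atan2 recovers the angle.

```python
import numpy as np

# A raw 2-vector (e.g. a linear-layer output); any nonzero values work.
vec = np.array([3.0, 3.0])
vec = vec / np.linalg.norm(vec)      # now (cos(theta), sin(theta)) on the unit circle
theta = np.arctan2(vec[1], vec[0])   # recovered angle, here pi / 4
```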

anslt commented on June 20, 2024

Thanks for your prompt answer!

I saw your answer in #1 to my last question, about the rotation center.
Assume we rotate the body of the "horse" about axis = [0, 1, 0] by angle = pi / 2, with no translation.
Then every point except the rotation center of the body part has moved.
Based on this, we next want to rotate the neck. Is the rotation center for the neck changed by the rotation of the body?

nileshkulkarni commented on June 20, 2024

Hi @anslt,
I think I said something incorrect earlier: we have 8 different transform predictors, one corresponding to each camera pose prediction (#2 (comment)). So there are multiple transform predictions from our model.

Answering your other question:
The rotations are applied in a bottom-up fashion: you first apply the neck transformation and then the body transformation, so the rotation center does not change. You apply the rotation for the children first and then for the parent.
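A minimal numpy sketch of this children-first ordering (the part names, centers, and angles here are illustrative, not the acsm implementation): because the child is articulated in the rest pose before the parent transform is applied, every rotation center is used at its rest-pose location.

```python
import numpy as np

def rotmat_y(theta):
    """Rotation matrix about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotate_about(points, R, center):
    """Rotate row-vector points about a given center."""
    return (points - center) @ R.T + center

neck_verts = np.array([[0.2, 1.0, 0.0]])   # one neck vertex (rest pose)
neck_center = np.array([0.0, 0.5, 0.0])    # neck joint (rest pose)
body_center = np.zeros(3)                  # body joint (rest pose)

# 1) Child first: articulate the neck about its rest-pose center.
v = rotate_about(neck_verts, rotmat_y(np.pi / 2), neck_center)
# 2) Then the parent: rotate everything about the body's center.
v = rotate_about(v, rotmat_y(np.pi / 2), body_center)
```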

anslt commented on June 20, 2024

Thanks, I have finished this part.
My next question is about the weight of the regularization loss on the translation in the part transformation.

nileshkulkarni commented on June 20, 2024

Hi @anslt,

Not sure what you mean; if you are referring to the lambda corresponding to the translation regularization, then in my case the value was 10.0.

Best,
Nilesh
