
Comments (5)

shyu4184 commented on August 20, 2024

@lionlai1989

Sorry for the late reply.
The authors said they generated a filter with an intensity value of 1 at the shifted location and 0 everywhere else. That is, if you shift by (x=1, y=2), the ground-truth filter has the value 1 at location (1, 2) and zeros everywhere else, like a shifted delta filter. As far as I understand, the ground-truth and trained filters in this work are treated as 2-dimensional convolution filters, and if the filter size is k*k, there are k^2 possible translations (classes).
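
For what it's worth, a minimal sketch of that construction in Python/NumPy (the convention that zero shift maps to the filter centre is my assumption, not something stated above):

    import numpy as np

    def ground_truth_filter(dx, dy, k=9):
        """Return a k x k delta filter: 1 at the shifted location, 0 elsewhere."""
        f = np.zeros((k, k), dtype=np.float32)
        c = k // 2  # assumption: zero shift maps to the filter centre
        f[c + dy, c + dx] = 1.0  # row offset = y shift, column offset = x shift
        return f

    # A shift of (x=1, y=2) puts a single 1 at row c+2, column c+1.
    gt = ground_truth_filter(dx=1, dy=2, k=9)
    # Flattened, this is a one-hot vector over the k**2 translation classes.
    assert gt.flatten().sum() == 1.0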


shyu4184 commented on August 20, 2024

@lionlai1989
Hello,
Did you solve this problem?
I'm implementing RegNet in PyTorch with my own dataset.
In my case, I translated the feature maps by a randomly selected integer amount and generated the ground-truth filters as described in the paper.
However, the filters I obtain are very different from the ground-truth ones.

Would you share your approach?


lionlai1989 commented on August 20, 2024

Hello @shyu4184
Right now I am using Arthur's TensorFlow code with my Sentinel-2 (LR) and SPOT (HR) dataset. I tried to avoid modifying the network code itself and instead just feed my own dataset into the training and testing process.

  1. In network.py, there are four lines of code whose inputs I don't know where to get. All I have is the input (LR) and the ground truth (HR). For me, the mask is not necessary because none of my images have clouds, and I don't know what the other three inputs are.
    #self.mask_y=tf.placeholder('float32',shape=[None,1,None,None,1],name='mask_y')
    #self.y_filters=tf.placeholder('float32',shape=[None,None,self.dyn_filter_size**2],name='y_filters')
    #self.fill_coeff=tf.placeholder(tf.float32,shape=[None,self.T_in,self.T_in,None,None,1],name='fill_coeff')
    #self.norm_baseline=tf.placeholder('float32',shape=[None,1],name='norm_baseline')
  2. Back to your questions: you said you translated the feature maps by a randomly selected integer translation. I think the feature maps here are the output of SISRNet, right? What I don't understand is why we can shift the feature maps with respect to the first one by a random integer number of pixels. This shift becomes the ground truth when pre-training the network, so how can we choose it randomly?

  3. I also don't understand how the registered shift relates to the one-hot softmax layer. The shift values are 1, 2, 3, ..., so how are they converted into the k*k (filter size) classes?

Sorry for bringing up more questions. I would appreciate any help in understanding the implementation.


shyu4184 commented on August 20, 2024

@lionlai1989
Since I am only trying to implement RegNet, I can't answer the usage of the four variables in your first question. Yes, you're right: the feature maps are the output of SISRNet, and the first one serves as the reference for the rest. Since the role of RegNet is to align the misaligned super-resolved feature maps, we need to translate the feature maps in order to train it. PyTorch has a function to shift a feature map, torch.roll; I'm not sure, but it may be tf.roll in TensorFlow.
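
A minimal sketch of that shifting step in PyTorch (the shapes, the shift range tied to the filter size, and all variable names are my assumptions):

    import torch

    feats = torch.randn(4, 64, 32, 32)  # (N, C, H, W) feature maps from SISRNet

    k = 9  # dynamic filter size (assumed)
    # Randomly pick an integer shift; restricting it to what a k x k
    # filter can represent is an assumption.
    dy = torch.randint(-(k // 2), k // 2 + 1, (1,)).item()
    dx = torch.randint(-(k // 2), k // 2 + 1, (1,)).item()

    # torch.roll shifts with wrap-around along height (dim 2) and width (dim 3).
    shifted = torch.roll(feats, shifts=(dy, dx), dims=(2, 3))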

I was also confused about how they generate the ground-truth filter. As mentioned in the paper, they synthesize it as a convolution filter whose intensity is 1 at the shifted pixel and 0 everywhere else, like a delta filter. Although I followed their explanation, I'm not sure whether my implementation is correct.


lionlai1989 commented on August 20, 2024

@shyu4184
Thank you for your feedback. It feels good to have someone to talk to about this topic.
My question is: how do we generate the ground truth for RegNet when pre-training?

For example, if an image's shift is (x=1, y=2) pixels (the paper says it has to be an integer) with respect to the reference image, what should the ground-truth vector look like? And how does it relate to the filter size k*k? That is the part that really confuses me.
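
For what it's worth, here is how the mapping to k*k classes might work, assuming the centre-as-zero-shift convention from the sketch earlier in the thread (the convention itself is an assumption):

    k = 9            # filter size (assumed), so k**2 = 81 classes
    dx, dy = 1, 2    # integer shift w.r.t. the reference image
    c = k // 2
    # Row-major index of the single 1 in the flattened k x k delta filter:
    class_index = (c + dy) * k + (c + dx)  # (4+2)*9 + (4+1) = 59
    # The one-hot ground-truth vector has its 1 at position 59 of 81,
    # which is what the softmax output should ideally match.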

