Comments (4)

DotWang commented on May 30, 2024

The IMP weights have been released in the ViTAE-Transformer repo; please see https://github.com/ViTAE-Transformer/ViTAE-Transformer/tree/main/Image-Classification

from vitae-transformer-remote-sensing.

lauraset commented on May 30, 2024

Hi, @DotWang. Thank you very much, I got it. I still have a question about the lower performance of RSP compared to IMP in semantic segmentation and change detection. You mentioned two reasons: the dataset volume and the task granularity. Intuitively, however, the data distribution of MillionAID is closer to that of the remote sensing datasets used for evaluation (e.g., Potsdam). Is the lower performance of RSP related to the transformer's heavy reliance on large numbers of samples? On the other hand, the task granularity gap exists for both RSP and IMP, so I feel this reason may not explain the lower performance of RSP. Instead, the results seem to show that RSP may be effective only in the classification task and cannot generalize well to the segmentation task. There may be some errors in my statement, so I would like to know your opinion. Thank you very much.

DotWang commented on May 30, 2024

I will first explain the task granularity. We evaluate IMP and RSP on four tasks: classification, detection, segmentation, and change detection (CD). The experimental results show that RSP performs better than IMP on the first two tasks, not only on classification. Intuitively, classification and detection operate at the scene level and the object level, respectively, so the features they require are close, which makes transferring the RSP weights easier. Segmentation operates at the pixel level; compared with detection, it requires more detailed semantic information. By task definition, CD may lie between detection and segmentation.

As for the data volume: here, volume means not only the number of images but also the number of categories. Our pretraining dataset, MillionAID, has only 51 classes, far fewer than ImageNet-1K. Having few categories reduces the dataset complexity, which restricts model performance. As you can see, our pretraining accuracy can reach 98%, which is almost impossible in ImageNet-1K training. In this case, the model may not learn representations as universal and detailed as IMP does. In my opinion, RSP may perform better than IMP when the pretraining dataset becomes more challenging.

I also noticed that you mentioned the Potsdam dataset. For this dataset, spectral differences may also affect performance. Since we use the IR-R-G channels, the gap between evaluation and pretraining is larger than for other RS datasets.
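To make the spectral gap concrete, here is a minimal sketch (not the authors' code) of how an IR-R-G composite differs from an RGB input. It assumes a hypothetical four-band tile stored as a `(4, H, W)` array in R, G, B, IR order and assumes the pretraining images are ordinary RGB; the function name `to_irrg` and the band order are illustrative assumptions, so check the band layout of your own copy of the data.

```python
import numpy as np


def to_irrg(bands: np.ndarray) -> np.ndarray:
    """Build an IR-R-G composite from a (4, H, W) array.

    Assumption: the input bands are ordered R, G, B, IR.
    """
    if bands.ndim != 3 or bands.shape[0] != 4:
        raise ValueError("expected an array of shape (4, H, W)")
    r, g, _b, ir = bands
    # Channel 0 now carries infrared, where an RGB-pretrained model expects red.
    return np.stack([ir, r, g], axis=0)


# Synthetic tile standing in for a real image (random values, not real data).
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(4, 8, 8), dtype=np.uint8)

irrg = to_irrg(tile)  # what the segmentation model sees at evaluation time
rgb = tile[:3]        # what an RGB-pretrained backbone was trained on
```

Every input channel of the composite holds a different band than the corresponding RGB channel, so the first-layer filters learned during pretraining are applied to inputs they were not fitted to.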

In summary:

  • Task granularity: segmentation requires more detailed semantic information.

  • Data volume: the current pretraining data keeps the model from reaching its potential.

  • Spectral differences further deepen the domain gap for the Potsdam dataset.

lauraset commented on May 30, 2024

Hello, @DotWang. Thank you for your detailed explanation. I got it.
