
mft's Introduction

Swalpa Kumar Roy, Ankur Deria, Danfeng Hong, Behnood Rasti, Antonio Plaza, and Jocelyn Chanussot

Sample Dataset


Get the disjoint dataset (Trento11x11 folder) from Google Drive.

Get the disjoint dataset (Houston11x11 folder) from Google Drive.

Get the disjoint dataset (MUUFL11x11 folder) from Google Drive.


This repository contains the implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.


Dataset

  • Trento: AISA Eagle sensors were used to collect HSI data over rural regions in the south of Trento, Italy, while Optech ALTM 3100EA sensors collected the LiDAR data. Each HSI has 63 bands with wavelengths ranging from 0.42 to 0.99 μm, and the LiDAR data contains 1 raster providing elevation information. The spectral resolution is 9.2 nm, and the spatial resolution is 1 meter per pixel. The scene comprises 6 mutually exclusive vegetation land-cover classes and measures 600 × 166 pixels.

  • MUUFL: The MUUFL Gulfport scene was collected over the campus of the University of Southern Mississippi in November 2010 using the Reflective Optics System Imaging Spectrometer (ROSIS) sensor. The HSI of this dataset has 325 × 220 pixels with 72 spectral bands, and the LiDAR image contains 2 elevation rasters. The 8 initial and final bands were removed due to noise, leaving a total of 64 bands. The data depicts 11 urban land-cover classes with 53687 ground-truth pixels.

  • Houston: This scene was acquired by the ITRES CASI-1500 sensor over the University of Houston campus, TX, USA, in June 2012. The dataset was originally released for the 2013 IEEE GRSS Data Fusion Contest and has been widely used to evaluate land-cover classification performance. The original image is 349 × 1905 pixels recorded in 144 bands ranging from 0.364 to 1.046 μm.

  • Augsburg: The Augsburg scene contains three types of data: an HSI, a dual-Pol SAR image, and a DSM image. The SAR data are collected from the Sentinel-1 platform, while the HS and DSM data are captured by DAS-EOC, DLR over the city of Augsburg, Germany. The collection is done by the HySpex sensor, the Sentinel-1 sensor, and the DLR 3K system, respectively. All images are down-sampled to a unified spatial resolution of 30 m ground sampling distance (GSD) to adequately manage the multimodal fusion. The HSI has 332 × 485 pixels and 180 spectral bands ranging between 0.4 and 2.5 μm. The DSM image has a single band, whereas the SAR image has 4 bands, which indicate VV intensity, VH intensity, and the real and imaginary components of the off-diagonal element of the PolSAR covariance matrix.
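As a minimal illustration of the data shapes described above (array sizes taken from the Trento description; random data stands in for the real cubes), the two modalities line up spatially and differ only in their band count. Note that naive band stacking, shown here only as a shape sanity check, is not how MFT fuses modalities (MFT uses a transformer):

```python
import numpy as np

# Illustrative shapes from the Trento description: a 600 x 166 scene with
# 63 HSI bands and a single LiDAR elevation raster (random stand-in data).
hsi = np.random.rand(600, 166, 63).astype(np.float32)
lidar = np.random.rand(600, 166, 1).astype(np.float32)

# Naive early fusion: concatenate modalities along the band axis.
fused = np.concatenate([hsi, lidar], axis=-1)
print(fused.shape)  # (600, 166, 64)
```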

Models

The following traditional machine learning methods will be available:

The following deep learning methods will be available:

The following transformer-based deep learning methods will be available:

Appreciation from Geoscience and Remote Sensing Society (GRSS)

Citation

Please cite the paper if this code is useful and helpful for your research.

@article{roy2022multimodal,
  title={Multimodal Fusion Transformer for Remote Sensing Image Classification},
  author={Roy, Swalpa Kumar and Deria, Ankur and Hong, Danfeng and Rasti, Behnood and Plaza, Antonio and Chanussot, Jocelyn},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  volume={61},
  year={2023},
  doi={10.1109/TGRS.2023.3286826}
}

Unofficial Implementation

Thanks to Srinadh Reddy for the re-implementation of the MFT paper (https://github.com/srinadh99/Transformer-Models-for-Multimodal-Remote-Sensing-Data).

mft's People

Contributors

ankurderia, swalpa

mft's Issues

Preprocessing of data

Please share the code for preprocessing the data: how is the data normalized and separated by class indexes, and how are the patches created?
Also, when the model output is predicted (as in the classification map), is it on the same normalized data used in training?
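For readers with the same question: per-band min-max normalization and grouping pixel coordinates by class index are common preprocessing steps in HSI pipelines. The repository's exact scheme is what this issue asks about, so treat the following as a hedged sketch with illustrative names, not the authors' code:

```python
import numpy as np

def minmax_per_band(cube):
    # cube: (H, W, B) hyperspectral array; scale each band to [0, 1].
    flat = cube.reshape(-1, cube.shape[-1])
    lo, hi = flat.min(axis=0), flat.max(axis=0)
    return (cube - lo) / (hi - lo + 1e-12)

def class_indices(gt):
    # gt: (H, W) ground-truth map, 0 = unlabeled background.
    # Returns {class: (n, 2) array of pixel coordinates}.
    return {c: np.argwhere(gt == c) for c in np.unique(gt) if c != 0}

cube = np.random.rand(10, 10, 5) * 100
gt = np.random.randint(0, 4, size=(10, 10))
norm = minmax_per_band(cube)
idx = class_indices(gt)
print(norm.min(), norm.max())  # 0.0 and a value just below 1.0
```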

Issue with Replicating Results

Hello! I am currently attempting to replicate your study "Multimodal Fusion Transformer for Remote Sensing Image Classification". First, I would like to express my gratitude for your contribution to this field and for making your code publicly available. While replicating your work, I strictly followed the methodology described in your paper and used the open-source code. However, the results I obtained are somewhat different from those reported in your paper. Could you provide some details about the experimental setup, such as the versions of Python and PyTorch? Are there any hyperparameters that I might need to adjust to align with your experimental settings? In your opinion, what other factors might contribute to the discrepancy in replication results? Thank you once again for your work, and I look forward to your response.

patch 11x11 code script

Hello, thank you for your great work.
I would like to ask a question. How can I partition a .mat image file into patches of size 11x11? Could you please provide the code script?
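A common recipe, offered here as a hedged sketch rather than the repository's exact script: mirror-pad the cube, then take an 11x11 window centered on each labeled pixel (all names below are illustrative):

```python
import numpy as np

def extract_patches(cube, gt, patch=11):
    # cube: (H, W, B) image array; gt: (H, W) labels, 0 = unlabeled.
    pad = patch // 2
    padded = np.pad(cube, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    coords = np.argwhere(gt > 0)  # labeled pixels only
    # Window starting at (r, c) in the padded image is centered on (r, c)
    # in the original image.
    patches = np.stack([padded[r:r + patch, c:c + patch] for r, c in coords])
    labels = gt[coords[:, 0], coords[:, 1]]
    return patches, labels

cube = np.random.rand(20, 30, 8)
gt = np.random.randint(0, 3, size=(20, 30))
p, y = extract_patches(cube, gt)
print(p.shape)  # (number_of_labeled_pixels, 11, 11, 8)
```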

Classification map visualization

Thank you for your prompt response. Unfortunately, the solution provided in issue #7 does not align with the visualization results in Figure 9 of your article.
I would greatly appreciate it if you could provide your own code that generates results similar to those shown in Figure 9. Your assistance would be invaluable, as I have been struggling with this visualization issue for several weeks.
My classification map result for your MFT looks like this:
Classification_map_CNN

Please send the code to my email address: [email protected].
It would be a great favor.
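As a general hedged sketch (not the authors' Figure 9 code), a classification map is typically rendered by mapping each predicted class label to an RGB color; the palette below is arbitrary:

```python
import numpy as np

def colorize(pred_map, palette):
    # pred_map: (H, W) integer class labels, 0 = background/unlabeled.
    # Unlisted classes (including 0) stay black.
    rgb = np.zeros(pred_map.shape + (3,), dtype=np.uint8)
    for cls, color in palette.items():
        rgb[pred_map == cls] = color
    return rgb

palette = {1: (0, 128, 0), 2: (255, 215, 0), 3: (178, 34, 34)}
pred = np.random.randint(0, 4, size=(50, 50))
rgb = colorize(pred, palette)
print(rgb.shape, rgb.dtype)  # (50, 50, 3) uint8
```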

final results visualization

Can you tell me how the final visualization was done, similar to the final visualization in Figure 9? Thank you very much! Could you provide me with your code? I would greatly appreciate it!

image

Augsburg hyperspectral data set

Hello, thank you for your work. I don't see a download link for the Augsburg dataset in your GitHub repository. Could you please provide it? It is very important to me.

Not sure how to build the training set and the test set

I am very interested in your research and want to test your model on some other hyperspectral datasets, but I encountered some problems with data preprocessing. When I get the original data of other datasets, I don't know how to process it. Could you describe your data preprocessing scheme in detail? For example, I have the Houston2013 data and do not know how to use the original data to build the training set and the test set. Can you help me? I would greatly appreciate your help.

The original data looks like this:
image
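A common way to build disjoint train/test sets from a ground-truth map, sketched here under the assumption of a per-class random split (the repository instead ships fixed disjoint splits in the 11x11 folders linked above, so this is illustrative only):

```python
import numpy as np

def split_gt(gt, train_frac=0.1, seed=0):
    # gt: (H, W) ground-truth labels, 0 = unlabeled background.
    # Returns disjoint (n, 2) coordinate arrays for train and test pixels.
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(gt):
        if c == 0:
            continue
        idx = np.argwhere(gt == c)
        rng.shuffle(idx)                          # shuffle rows in place
        n_train = max(1, int(len(idx) * train_frac))
        train.append(idx[:n_train])
        test.append(idx[n_train:])
    return np.concatenate(train), np.concatenate(test)

gt = np.random.randint(0, 5, size=(40, 40))
tr, te = split_gt(gt)
print(len(tr) + len(te) == int((gt > 0).sum()))  # True
```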

Classification map generation with both HSI and Lidar Data provided to MFT model for Trento

Hi, I am really grateful for the prompt responses to the last issues. I tried to generate a classification map with your model for the Trento data. The classification map looks like below. I am unable to understand why it is not the same as shown in your article. The figure below shows a map of only a few classes, which is also not correct, since the model is multimodal and needs two inputs for prediction: the complete HSI (including background pixels) as well as the LiDAR.
I hope you will help in this matter also.

classification_map_MF_Trento
classification_map_MFT_minmax_spectral

Classification map visualization issues

Hi, your work is awesome. I tried to run your code over the disjoint datasets, which worked great. The only thing is: how do I create classification map visualizations for these disjoint datasets like those shown in your article? Please help; it would be a great favor. The link you shared in other issues for guidance is for single-modal data, but for a multimodal model like yours, please share the code.

Original dataset of Trento and MUUFL

Hello, thanks for sharing. Could you share with me your original datasets for Trento and MUUFL? I would be grateful, because I only found the original dataset for Houston.
