

Saliency Unification through Mamba for Visual Attention Modeling

Paper · Project Page



Visual attention modeling, important for interpreting and prioritizing visual stimuli, plays a significant role in applications such as marketing, multimedia, and robotics. Traditional saliency prediction models, especially those based on Convolutional Neural Networks (CNNs) or Transformers, achieve notable success by leveraging large-scale annotated datasets. However, the current state-of-the-art (SOTA) models that use Transformers are computationally expensive. Additionally, separate models are often required for each image type, lacking a unified approach. In this paper, we propose Saliency Unification through Mamba (SUM), a novel approach that integrates the efficient long-range dependency modeling of Mamba with U-Net to provide a unified model for diverse image types. Using a novel Conditional Visual State Space (C-VSS) block, SUM dynamically adapts to various image types, including natural scenes, web pages, and commercial imagery, ensuring universal applicability across different data types. Our comprehensive evaluations across five benchmarks demonstrate that SUM seamlessly adapts to different visual characteristics and consistently outperforms existing models. These results position SUM as a versatile and powerful tool for advancing visual attention modeling, offering a robust solution universally applicable across different types of visual content.

Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, Babak Taati


(a) Overview of the SUM model, (b) the conditional U-Net-based model for saliency prediction, and (c) the proposed C-VSS module.

Installation

Ensure you have Python >= 3.10 installed on your system. Then, install the required libraries and dependencies.
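Before installing anything, it can save time to confirm the interpreter meets the version requirement. A minimal sketch (the helper name is ours, not part of this repository):

```python
import sys

def meets_python_requirement(version_info=sys.version_info, minimum=(3, 10)):
    """Return True when the interpreter satisfies the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    if not meets_python_requirement():
        raise SystemExit(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    print("Python version OK")
```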

Requirements

Install PyTorch and other necessary libraries:

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
  • If you encounter NVCC problems during installation, see: NVCC Issue.

Pre-trained Weights

Download the SUM model weights from the provided Google Drive link and move them to the specified directory.

Usage

Inference

To generate saliency maps, use the inference.py script:

python inference.py --img_path /path/to/your/image.jpg --condition [0, 1, 2, 3] --output_path /path/to/output --heat_map_type [HOT, Overlay]

Parameters:

  • --img_path: Path to the input image for which you want to generate the saliency map.
  • --condition: Condition index for generating the saliency map. Each number corresponds to a specific type of visual content:
    • 0: Natural scenes, based on the SALICON dataset (mouse-tracking data).
    • 1: Natural scenes (Eye-tracking data).
    • 2: E-Commercial images.
    • 3: User Interface (UI) images.
  • --output_path: Path to the folder where the output saliency map will be saved.
  • --heat_map_type: Type of heatmap to generate. Choose either HOT for a standalone heatmap or Overlay to overlay the heatmap on the original image.
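The condition table above can be restated as a small lookup, useful when wrapping inference.py in your own scripts. This helper is a hypothetical sketch, not part of the repository:

```python
# Maps the --condition index of inference.py to the content type it selects
# (restating the table above; the helper itself is an assumption, not repo code).
CONDITIONS = {
    0: "Natural scenes (SALICON, mouse data)",
    1: "Natural scenes (eye-tracking data)",
    2: "E-Commercial images",
    3: "User Interface (UI) images",
}

def describe_condition(index: int) -> str:
    """Return a human-readable label for a condition index, or raise ValueError."""
    try:
        return CONDITIONS[index]
    except KeyError:
        raise ValueError(f"--condition must be one of {sorted(CONDITIONS)}, got {index}")
```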

Examples

Generate a standalone HOT heatmap for natural-scene images:

python inference.py --img_path input_image.jpg --condition 1 --output_path output_results --heat_map_type HOT

Overlay the heatmap on the original image for e-commerce images:

python inference.py --img_path input_image.jpg --condition 2 --output_path output_results --heat_map_type Overlay
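Since inference.py processes one image per invocation, a whole folder can be handled with a short wrapper. A sketch under the assumption that the CLI accepts exactly the flags shown above:

```python
import subprocess
from pathlib import Path

def build_inference_cmd(img_path, output_path, condition=1, heat_map_type="HOT"):
    """Assemble the inference.py invocation for a single image,
    mirroring the CLI flags documented above."""
    return [
        "python", "inference.py",
        "--img_path", str(img_path),
        "--condition", str(condition),
        "--output_path", str(output_path),
        "--heat_map_type", heat_map_type,
    ]

def run_folder(folder, output_path="output_results", condition=1, heat_map_type="HOT"):
    """Run inference.py once per .jpg in `folder`, sequentially."""
    for img in sorted(Path(folder).glob("*.jpg")):
        subprocess.run(
            build_inference_cmd(img, output_path, condition, heat_map_type),
            check=True,
        )
```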

Example Images

Side-by-side examples: original input images alongside the saliency maps produced by SUM.

Training

To train the model, first download the necessary pre-trained weights and datasets:

  1. Pretrained Encoder Weights: Download from the VMamba GitHub or Google Drive and move the file to net/pre_trained_weights/vssmsmall_dp03_ckpt_epoch_238.pth.
  2. Datasets: Download the datasets (seven sets in total) from the provided Google Drive link. The zip file contains 256×256 images of stimuli, saliency maps, fixation maps, and ID CSVs for the SALICON, MIT1003, CAT2000, SALECI, UEYE, and FIWI datasets.

Run the training process:

python train.py

  • To run training in Google Colab under tighter resource constraints, reduce the batch size or use the alternative script:

python train_colab.py
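Reducing the batch size lowers peak GPU memory at the cost of more optimizer steps per epoch. The trade-off is simple arithmetic, sketched here (the helper is illustrative, not part of the training code):

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Optimizer steps per epoch for a given batch size.

    Halving the batch size roughly doubles the step count (and wall-clock
    time per epoch) while lowering peak GPU memory -- the usual trade-off
    when training on a constrained Colab instance.
    """
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)
```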

Validation

To validate the model on a dataset's validation split, download the datasets as described above, then run the validation script:

python validation.py
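Saliency validation is typically reported with standard metrics such as Pearson's correlation coefficient (CC) between predicted and ground-truth maps. Whether validation.py reports exactly this metric is an assumption; the reference implementation below is a self-contained sketch:

```python
import math

def pearson_cc(pred, gt):
    """Pearson correlation coefficient between a predicted and a ground-truth
    saliency map, each given as a flat sequence of floats. A common saliency
    evaluation metric; its use here is illustrative, not a claim about
    validation.py's exact output."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gt)
    return cov / math.sqrt(var_p * var_g)
```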

Acknowledgment

We would like to thank the authors and contributors of VMamba, VM-UNet, and TranSalNet for their open-source code, which significantly aided this project.

Citation

@article{hosseini2024sum,
  title={SUM: Saliency Unification through Mamba for Visual Attention Modeling},
  author={Hosseini, Alireza and Kazerouni, Amirhossein and Akhavan, Saeed and Brudno, Michael and Taati, Babak},
  journal={arXiv preprint arXiv:2406.17815},
  year={2024}
}
