
[AAAI 2024] DiAD

DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection

Haoyang He1#, Jiangning Zhang1,2#, Hongxu Chen1, Xuhai Chen1, Zhishan Li1, Xu Chen2, Yabiao Wang2, Chengjie Wang2, Lei Xie1*

(#Equal contribution, *Corresponding author)

1College of Control Science and Engineering, Zhejiang University, 2Youtu Lab, Tencent

[Paper] [Project Page]

Our DiAD will also be supported in ADer

Abstract

Reconstruction-based approaches have achieved remarkable outcomes in anomaly detection. The exceptional image reconstruction capabilities of recently popular diffusion models have sparked research efforts to utilize them for enhanced reconstruction of anomalous images. Nonetheless, these methods might face challenges related to the preservation of image categories and pixel-wise structural integrity in the more practical multi-class setting. To solve the above problems, we propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection, which consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. First, the SG network is proposed for reconstructing anomalous regions while preserving the original image's semantic information. Second, we introduce a Spatial-aware Feature Fusion (SFF) block to maximize reconstruction accuracy when dealing with extensively reconstructed areas. Third, the input and reconstructed images are processed by a pre-trained feature extractor to generate anomaly maps based on features extracted at different scales. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach which surpasses the state-of-the-art methods, e.g., achieving 96.8/52.6 and 97.2/99.0 (AUROC/AP) for localization and detection respectively on the multi-class MVTec-AD dataset.

1. Installation

First create a new conda environment

conda env create -f environment.yaml
conda activate diad
pip3 install timm==0.8.15dev0 mmselfsup pandas transformers openpyxl imgaug numba numpy tensorboard fvcore accimage Ninja

2. Dataset

2.1 MVTec-AD

  • Create the MVTec-AD dataset directory. Download the MVTec-AD dataset from MVTec-AD. Unzip the file and move it to ./training/MVTec-AD/. The MVTec-AD dataset directory should be as follows.
|-- training
    |-- MVTec-AD
        |-- mvtec_anomaly_detection
            |-- bottle
                |-- ground_truth
                    |-- broken_large
                        |-- 000_mask.png
                    |-- broken_small
                        |-- 000_mask.png
                    |-- contamination
                        |-- 000_mask.png
                |-- test
                    |-- broken_large
                        |-- 000.png
                    |-- broken_small
                        |-- 000.png
                    |-- contamination
                        |-- 000.png
                    |-- good
                        |-- 000.png
                |-- train
                    |-- good
                        |-- 000.png
        |-- train.json
        |-- test.json
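A quick way to catch path mistakes before training is to verify the layout above programmatically. The following is a minimal sketch (not part of the repo); the paths follow the tree shown here, and `bottle` is just one example category — the check function name is ours.

```python
import os

def check_mvtec_layout(root="./training/MVTec-AD"):
    """Return the list of expected paths (per the README tree) that are missing.

    Only spot-checks one example category ("bottle"); extend the list
    for a full check over all MVTec-AD categories.
    """
    expected = [
        "mvtec_anomaly_detection/bottle/train/good",
        "mvtec_anomaly_detection/bottle/test/good",
        "mvtec_anomaly_detection/bottle/ground_truth",
        "train.json",
        "test.json",
    ]
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = check_mvtec_layout()
    print("Layout looks OK." if not missing else f"Missing paths: {missing}")
```

An empty return value means the tree matches the layout above; anything returned points at a misplaced or missing directory.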

2.2 VisA

  • Create the VisA dataset directory. Download the VisA dataset from VisA_20220922.tar. Unzip the file and move it to ./training/VisA/. The VisA dataset directory should be as follows.
|-- training
    |-- VisA
        |-- visa
            |-- candle
                |-- Data
                    |-- Images
                        |-- Anomaly
                            |-- 000.JPG
                        |-- Normal
                            |-- 0000.JPG
                    |-- Masks
                        |-- Anomaly
                            |-- 000.png
        |-- visa.csv
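Note that in the VisA layout, each anomaly image under `Images/Anomaly` has a corresponding mask under `Masks/Anomaly` with the same stem but a `.png` extension. A minimal sketch of pairing them by walking the tree above (our own helper, not repo code; `candle` is one example category):

```python
import glob
import os

def visa_pairs(root="./training/VisA/visa", category="candle"):
    """Pair each anomaly image with its mask, following the README tree.

    Returns (image_path, mask_path_or_None) tuples; a None mask means
    the expected .png counterpart was not found.
    """
    img_dir = os.path.join(root, category, "Data", "Images", "Anomaly")
    mask_dir = os.path.join(root, category, "Data", "Masks", "Anomaly")
    pairs = []
    for img in sorted(glob.glob(os.path.join(img_dir, "*.JPG"))):
        stem = os.path.splitext(os.path.basename(img))[0]
        mask = os.path.join(mask_dir, stem + ".png")
        pairs.append((img, mask if os.path.exists(mask) else None))
    return pairs
```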

3. Finetune the Autoencoders

  • Download the pretrained autoencoder from kl-f8.zip and move it to ./models/autoencoders.ckpt. Then finetune the model by running

python finetune_autoencoder.py

  • Once finetuning finishes, the model is saved under ./lightning_logs/version_x/checkpoints/epoch=xxx-step=xxx.ckpt. Move it to ./models/ and rename it to mvtec_ae.ckpt. Repeat the same finetuning process for the VisA dataset.
  • If you use one of the pretrained autoencoder models below, you can skip to step 4 to build the model.
| Autoencoder | Pretrained Model |
| --- | --- |
| MVTec First Stage Autoencoder | mvtecad_fs |
| VisA First Stage Autoencoder | visa_fs |
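The move-and-rename step above can be automated: pick the most recently written checkpoint under `lightning_logs` and copy it to the name the later steps expect. A minimal sketch (our own helper, not repo code; paths are the ones stated above):

```python
import glob
import os
import shutil

def collect_checkpoint(logs="./lightning_logs", dest="./models/mvtec_ae.ckpt"):
    """Copy the newest Lightning checkpoint to the path later steps expect.

    Searches lightning_logs/version_*/checkpoints/*.ckpt (the default
    Lightning layout mentioned in the README) and returns the source path.
    """
    ckpts = glob.glob(os.path.join(logs, "version_*", "checkpoints", "*.ckpt"))
    if not ckpts:
        raise FileNotFoundError("no checkpoints found under " + logs)
    latest = max(ckpts, key=os.path.getmtime)  # most recently written file
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(latest, dest)
    return latest
```

For the VisA run, point `dest` at `./models/visa_ae.ckpt` (or whatever name your build step expects) instead.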

4. Build the model

  • We use the pre-trained stable diffusion v1.5, the finetuned autoencoders and the Semantic-Guided Network to build the full model needed for training. Stable diffusion v1.5 can be downloaded from "v1-5-pruned.ckpt". Move it to ./models/v1-5-pruned.ckpt. Then run the following code to get the output model ./models/diad.ckpt.

python build_model.py

5. Train

  • Train the model by simply running

python train.py

  • Batch size, learning rate, data path, GPUs, and resume path can be edited directly in train.py.

6. Test

Checkpoints are saved under ./val_ckpt/epoch=xxx-step=xxx.ckpt. For evaluation and visualization, set the checkpoint path with --resume_path and run the following code:

python test.py --resume_path ./val_ckpt/epoch=xxx-step=xxx.ckpt

The images are saved under ./log_image/, where

  • xxx-input.jpg is the input image.
  • xxx-reconstruction.jpg is the image reconstructed by the autoencoder alone, without the diffusion model.
  • xxx-features.jpg is the feature map of the anomaly score.
  • xxx-samples.jpg is the reconstructed image through the autoencoder and diffusion model.
  • xxx-heatmap.png is the heatmap of the anomaly score.
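The heatmap and feature-map outputs come from comparing input and reconstruction features at multiple scales, as described in the abstract. The sketch below illustrates the general scheme only (it is our assumption of a typical multi-scale fusion, not the exact DiAD implementation): per-location cosine distance at each scale, upsampled to the input resolution and averaged.

```python
import numpy as np

def anomaly_map(feats_in, feats_rec, out_size=224):
    """Fuse multi-scale feature differences into one anomaly map.

    feats_in / feats_rec: lists of (C, H, W) arrays for the input and the
    reconstructed image; each H, W must divide out_size evenly here.
    """
    fused = np.zeros((out_size, out_size))
    for f_in, f_rec in zip(feats_in, feats_rec):
        c, h, w = f_in.shape
        a = f_in.reshape(c, -1)
        b = f_rec.reshape(c, -1)
        # cosine distance per spatial location: high where features disagree
        cos = (a * b).sum(0) / (
            np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-8
        )
        dist = (1.0 - cos).reshape(h, w)
        # nearest-neighbour upsampling to the output resolution
        up = np.repeat(np.repeat(dist, out_size // h, axis=0),
                       out_size // w, axis=1)
        fused += up
    return fused / len(feats_in)
```

When the reconstruction matches the input, the map is near zero everywhere; anomalous regions that the diffusion model "repairs" show up as high-distance areas.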

Citation

If you find this code useful, don't forget to star the repo and cite the paper:

@misc{he2023diad,
      title={DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection},
      author={Haoyang He and Jiangning Zhang and Hongxu Chen and Xuhai Chen and Zhishan Li and Xu Chen and Yabiao Wang and Chengjie Wang and Lei Xie},
      year={2023},
      eprint={2312.06607},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

We thank the great works UniAD, LDM and ControlNet for providing assistance for our research.
