
mood's Introduction

MOOD

• 🤗 Model • 🐱 Code • 📃 MOODv1 • 📃 MOODv2

[framework figure]

MOODv1: Rethinking Out-of-Distribution Detection: Masked Image Modeling is All You Need (CVPR 2023)

The core of out-of-distribution (OOD) detection is to learn an in-distribution (ID) representation that is distinguishable from OOD samples. Previous work applied recognition-based methods to learn the ID features, which tend to learn shortcuts instead of comprehensive representations. In this work, we find, surprisingly, that simply using reconstruction-based methods can boost the performance of OOD detection significantly. We explore the main contributors to OOD detection in depth and find that reconstruction-based pretext tasks have the potential to provide a generally applicable and efficacious prior, which helps the model learn the intrinsic data distribution of the ID dataset. Specifically, we take Masked Image Modeling as the pretext task for our OOD detection framework (MOOD). Without bells and whistles, MOOD outperforms the previous SOTA on one-class OOD detection by 5.7%, on multi-class OOD detection by 3.0%, and on near-distribution OOD detection by 2.1%. It even surpasses 10-shot-per-class outlier-exposure OOD detection, although we do not use any OOD samples for our detection.

[MOODv1 figure]
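
As a quick illustration of the masked-image-modeling pretext task described above, here is a toy sketch (not the paper's training code; the shapes and masking ratio are illustrative): random patches are masked, the model reconstructs them, and the loss is measured only on the masked positions.

  # Toy masked-image-modeling step: mask random patches, then score
  # reconstruction on masked positions only. Shapes and the 40% masking
  # ratio are illustrative, not the paper's settings.
  import numpy as np

  rng = np.random.default_rng(0)
  patches = rng.normal(size=(196, 768))   # 14x14 grid of patch embeddings
  mask = rng.random(196) < 0.4            # mask roughly 40% of patches
  corrupted = patches.copy()
  corrupted[mask] = 0.0                   # replace masked patches with a mask token

  reconstruction = corrupted              # stand-in for the model's output
  loss = ((reconstruction[mask] - patches[mask]) ** 2).mean()  # loss on masked patches only
  print(loss)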

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation that is distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning and lacked comprehensive representations. In our study, we conducted a comprehensive analysis across distinct pre-training tasks and various OOD score functions. The results highlight that feature representations pre-trained through reconstruction yield a notable enhancement and narrow the performance gap among score functions, suggesting that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Because reconstruction-based pretext tasks adapt well to various score functions, they hold promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 improves AUROC by 14.30% to reach 95.68% on ImageNet and achieves 99.98% on CIFAR-10.

[results table]
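
To make the term "score function" concrete, here is a minimal sketch of the maximum softmax probability (MSP) score, one of the simplest scores the abstract's claim covers; the logits are stand-in values, not output from this repo.

  # Minimal sketch of the maximum softmax probability (MSP) OOD score.
  # Illustrative only: the logits are made-up values for one image.
  import numpy as np

  def msp_score(logits):
      # Softmax over class logits; the max probability serves as the ID-ness score.
      z = logits - logits.max()
      p = np.exp(z) / np.exp(z).sum()
      return p.max()

  logits = np.array([2.0, 0.5, -1.0])  # stand-in classifier logits
  print(msp_score(logits))             # low values suggest the input may be OOD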

mood's People

Contributors

harshnandwana, julietlewis0110, julietljy


mood's Issues

MoCov3 checkpoint

Thanks for your excellent work! Would it be possible to release the MoCov3 checkpoint pre-trained on ImageNet-22k reported in Table 1? In the original MoCo paper, the checkpoints are obtained by pre-training on ImageNet-1k.

Problem with visualizer package

Hi, I tried to run the model but got an error related to the visualizer package, which is not in the requirements file. I installed it using pip but get this error:

ImportError: cannot import name 'get_local' from 'visualizer' (/opt/conda/lib/python3.8/site-packages/visualizer.py)

Is there a specific version of visualizer that needs to be installed?

Thanks for your help.
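
The get_local symbol in this traceback matches the attention-visualization helper from the luo3300612/Visualizer project on GitHub, not the unrelated visualizer package on PyPI; that attribution is an assumption, not confirmed by the repo. A minimal usage sketch under that assumption:

  # Sketch assuming visualizer is https://github.com/luo3300612/Visualizer,
  # installed from source (not from PyPI); toy_attention is a hypothetical example.
  import numpy as np
  from visualizer import get_local

  get_local.activate()  # must run before the decorated code is imported

  @get_local('attention_map')  # caches the local variable 'attention_map'
  def toy_attention(q, k):
      attention_map = q @ k.T  # toy stand-in for an attention computation
      return attention_map

  toy_attention(np.eye(2), np.eye(2))
  print(get_local.cache)  # cached locals, keyed by function/variable name

If that is the right package, installing it from the GitHub source rather than from PyPI should resolve the ImportError.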

The results of ID CIFAR-100 -> OOD CIFAR-10

After fine-tuning on CIFAR-100 with the model that was self-supervised pre-trained and then intermediate fine-tuned on ImageNet-22k, I got 87.169 AUROC for ID CIFAR-100 -> OOD CIFAR-10, which differs significantly from the 98.3 AUROC reported in the paper. How can I reproduce the reported results?
The command line I ran and the results are shown below:

Command line:

OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=2 run_class_finetuning.py --model beit_base_patch16_224 --data_path /home/ubuntu/code/open-set/MOOD --data_set cifar100 --nb_classes 100 --disable_eval_during_finetuning --finetune /home/ubuntu/code/open-set/MOOD/beit_base_patch16_224_pt22k_ft22k.pth --output_dir logs_cifar100_test --batch_size 128 --lr 1.5e-3 --update_freq 1 --warmup_epochs 5 --epochs 90 --layer_decay 0.65 --drop_path 0.2 --weight_decay 0.05 --layer_scale_init_value 0.1 --clip_grad 3.0

Results:

[results screenshot]

What is the difference between "fine-tuning" and "intermediate fine-tuning"?

Q1: Are the following statements right?

[1] Intermediate fine-tuning is performed on ImageNet-21k?

[2] Fine-tuning is performed on the one-class ID dataset?

Q2: Where is the related code?

The paper says:

  1. Pre-train the Masked Image Modeling ViT on ImageNet-21k.
  2. Apply intermediate fine-tuning of the ViT on ImageNet-21k.
  3. Apply fine-tuning of the pre-trained ViT on the ID dataset.

The code contains:

a. Pretrained models
b. Fine-tuning on In-Distribution Dataset

In my view, 'a' corresponds to '1' and 'b' corresponds to '3'. Where is the code for '2'?
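
For what it's worth, the checkpoint name used elsewhere in these issues, beit_base_patch16_224_pt22k_ft22k.pth, suggests that step 2 is already baked into the released weights: pt22k plausibly marks MIM pre-training on ImageNet-22k (step 1) and ft22k the supervised intermediate fine-tuning on ImageNet-22k (step 2), leaving only step 3 to run in this repo. That reading is an assumption. A stub-level sketch of the three stages, with every function hypothetical:

  # Hypothetical three-stage pipeline; the stage functions are illustrative
  # stubs, not the repo's actual API.
  def pretrain_mim(model, dataset):
      # Stage 1: self-supervised masked image modeling pre-training.
      return model  # stub

  def finetune_cls(model, dataset):
      # Supervised classification fine-tuning (used for stages 2 and 3).
      return model  # stub

  vit = object()                           # placeholder for a ViT backbone
  vit = pretrain_mim(vit, "imagenet21k")   # 1. MIM pre-training
  vit = finetune_cls(vit, "imagenet21k")   # 2. intermediate fine-tuning
  vit = finetune_cls(vit, "cifar100")      # 3. fine-tuning on the ID dataset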

Can't find training script for MOODv2

Hey team,
I can't find the method to train or fine-tune the model. I also want to use this model as an API: if I pass an image, the model should tell whether it is OOD or not. Is there a prediction script that reports whether an input is an outlier?
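
The src/demo.py command shown in the next issue is the closest thing in the repo to a single-image prediction entry point. As a conceptual illustration only, an OOD decision usually reduces to thresholding a score; the following sketch uses a nearest-neighbor cosine score with made-up data and a made-up threshold:

  # Conceptual sketch: flag an image as OOD by thresholding a score.
  # The cosine nearest-neighbor score and the threshold are illustrative.
  import numpy as np

  def is_ood(feature, id_bank, threshold):
      f = feature / np.linalg.norm(feature)
      bank = id_bank / np.linalg.norm(id_bank, axis=1, keepdims=True)
      score = float((bank @ f).max())  # similarity to the closest ID feature
      return score < threshold         # low similarity -> likely an outlier

  rng = np.random.default_rng(0)
  bank = rng.normal(size=(100, 768))   # stand-in ID training features
  query = rng.normal(size=768)         # stand-in feature of a test image
  print(is_ood(query, bank, threshold=0.5))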

Mahalanobis distance calculation is not working and throws these errors

python src/demo.py --img_path 1000_F_26800115_YlmErNLIVZeNZXPzUc3z4GAD1gGkVABu.jpg --cfg /app/mmpretrain/work_dirs/beitv2_beit-base-p16_8xb256-amp-coslr-300e_in1k/beitv2_beit-base-p16_8xb256-amp-coslr-300e_in1k.py --checkpoint pretrain/epoch_19.pth --fc outputs/fc.pkl --id_train_feature outputs/imagenet_train.pkl --id_val_feature outputs/imagenet_train.pkl --methods Mahalanobis
=> Loading model
/root/miniconda3/envs/open-mmlab/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1639180588308/work/aten/src/ATen/native/TensorShape.cpp:2157.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Loads checkpoint by local backend from path: pretrain/epoch_19.pth
=> Loading image
=> Extracting feature
Extracted Feature: (1, 768)
w.shape=(8192, 768), b.shape=(8192,)
image path: 1000_F_26800115_YlmErNLIVZeNZXPzUc3z4GAD1gGkVABu.jpg
=> Loading features
feature_id_train.shape=(91710, 768), feature_id_val.shape=(91710, 768)
=> Computing logits...
=> Computing softmax...
Computing classwise mean feature: 0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "src/demo.py", line 305, in
main()
File "src/demo.py", line 263, in main
fs = feature_id_train[train_labels == i]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 91710 but corresponding boolean dimension is 200000
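
One plausible reading of this traceback (an assumption, not a confirmed diagnosis): the label list has 200,000 entries, matching the datalists/imagenet2012_train_random_200k.txt list used in the next issue, while only 91,710 features were extracted, so the boolean mask and the feature array disagree along dimension 0. Note also that the command passes the same outputs/imagenet_train.pkl for both --id_train_feature and --id_val_feature. A sketch of the failing step with a consistency check, using the shapes from the traceback:

  # Sketch of the class-wise mean step where the IndexError occurs; with the
  # traceback's shapes the assertion below fires, which is the diagnostic.
  import numpy as np

  feature_id_train = np.zeros((91710, 768))   # extracted ID training features
  train_labels = np.zeros(200000, dtype=int)  # labels parsed from the image list

  assert train_labels.shape[0] == feature_id_train.shape[0], (
      "feature/label count mismatch: re-extract features with the same "
      "--img_list that produced train_labels")

  # Reached only when the counts match: per-class means for Mahalanobis.
  means = [feature_id_train[train_labels == i].mean(axis=0) for i in range(1000)]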

Problems while running src/extract_feature_vit.py

If I run either of these commands:

  1. python3 src/extract_feature_vit.py $IMAGENET_PATH --cfg configs/beit-base-p16_224px.py --checkpoint pretrain/epoch_19.pth --fc_save_path output/fc.pkl
  2. python src/extract_feature_vit.py $IMAGENET_PATH --out_file outputs/imagenet_train.pkl --cfg configs/beit-base-p16_224px.py --checkpoint pretrain/beitv2-base.pth --img_list datalists/imagenet2012_train_random_200k.txt

I get the error below:

Traceback (most recent call last):
File "src/extract_feature_vit.py", line 12, in
from .list_dataset import ImageFilelist
ImportError: attempted relative import with no known parent package
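
This ImportError is standard Python behavior rather than a repo-specific bug: a file executed directly (python src/extract_feature_vit.py) has no parent package, so the relative import from .list_dataset fails. Two usual workarounds, assuming list_dataset.py sits next to extract_feature_vit.py in src/:

  # Workaround 1: make the import absolute inside src/extract_feature_vit.py.
  # When the script runs directly, src/ is on sys.path, so this resolves.
  from list_dataset import ImageFilelist  # instead of: from .list_dataset import ImageFilelist

Workaround 2 is to keep the relative import, add an __init__.py to src/, and run the file as a module from the repo root, e.g. python -m src.extract_feature_vit $IMAGENET_PATH with the same flags as above.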
