Benchmarking Generalized Out-of-Distribution Detection
Hi,
This is more of a minor issue, but when using multiple GPUs, if the experiment directory already exists the program expects user input, which triggers an EOFError:
Traceback (most recent call last):
File "main.py", line 36, in <module>
launch(
File "/home/jz288/OpenOOD/openood/utils/launch.py", line 69, in launch
mp.spawn(
File "/home/jz288/anaconda3/envs/openood/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/jz288/anaconda3/envs/openood/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/jz288/anaconda3/envs/openood/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/jz288/anaconda3/envs/openood/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/jz288/OpenOOD/openood/utils/launch.py", line 132, in _distributed_worker
main_func(*args)
File "/home/jz288/OpenOOD/main.py", line 24, in main
pipeline.run()
File "/home/jz288/OpenOOD/openood/pipelines/train_pipeline.py", line 16, in run
setup_logger(self.config)
File "/home/jz288/OpenOOD/openood/utils/logger.py", line 86, in setup_logger
ans = input('Exp dir already exists, merge it? (y/n)')
EOFError: EOF when reading a line
From here it seems that getting input() to work with multiprocessing needs some extra effort.
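A minimal workaround sketch (my own idea, not OpenOOD's actual logger code), assuming it is acceptable to merge silently when no interactive terminal is attached:

```python
# Workaround sketch (not OpenOOD's actual code): spawned workers have
# no usable stdin, so input() raises EOFError. Guard the prompt.
import os
import sys

def confirm_exp_dir(exp_dir: str) -> None:
    if not os.path.exists(exp_dir):
        os.makedirs(exp_dir, exist_ok=True)
        return
    # Only prompt when an interactive terminal is attached; mp.spawn
    # workers fall through and merge silently.
    if sys.stdin is not None and sys.stdin.isatty():
        ans = input('Exp dir already exists, merge it? (y/n) ')
        if ans.lower() != 'y':
            sys.exit(1)
```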
Hi,
Thanks for your great work. I'd like to run the script, but I found that the environment.yml cannot be directly installed on many GPU machines. The problem might be caused by the version of Cython.
Best Regards,
Hongxin
Hi,
Thanks for open-sourcing this project; I can see how much effort went into it. I got the following error when downloading the datasets:
Access denied with the following error:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
You may still be able to access the file from the browser:
https://drive.google.com/uc?id=1PGKheHUsf29leJPPGuXqzLBMwl8qMF8_
I could manually download these files by opening the link in the browser, but that's definitely sub-optimal. Is there anything wrong on my side, or has anyone else had this problem too? Would appreciate any clarification/suggestion. Thanks!
Hi,
I'm wondering if there could be (or already is) versioning for OpenOOD. Say, for example, OpenOOD v0.1 includes N methods and v0.2 includes N+5 methods, where the 5 new methods improve the SOTA over the N methods in v0.1. In that case, versioning would make comparisons of results much clearer.
Best,
Jingyang
Dear authors, I was running the script to perform OOD detection with the NIPS'18 MDS method on CIFAR-10 (ID) and its corresponding OOD datasets. I found that at the first step, when loading the pre-trained model on CIFAR-10, the test accuracy on CIFAR-10 is not correct. I was using the ResNet-18 checkpoint with 94.3% test accuracy, yet the log shows an accuracy of only 40.77%. The results are bad accordingly:
| dataset | FPR@95 | AUROC | AUPR_IN | AUPR_OUT | CCR_4 | CCR_3 | CCR_2 | CCR_1 | ACC |
|---|---|---|---|---|---|---|---|---|---|
| cifar100 | 85.87 | 61.48 | 58.65 | 62.44 | 0.00 | 0.04 | 0.71 | 7.26 | 40.77 |
| tin | 86.49 | 59.69 | 57.08 | 60.70 | 0.00 | 0.10 | 0.62 | 6.13 | 40.77 |
| nearood | 86.18 | 60.58 | 57.87 | 61.57 | 0.00 | 0.07 | 0.67 | 6.69 | 40.77 |
| mnist | 0.00 | 98.71 | 98.51 | 99.19 | 39.36 | 39.68 | 39.87 | 40.04 | 40.77 |
| svhn | 94.62 | 65.20 | 50.56 | 80.11 | 0.47 | 2.47 | 6.89 | 15.17 | 40.77 |
| texture | 64.41 | 78.05 | 83.66 | 71.90 | 0.22 | 0.92 | 4.59 | 16.86 | 40.77 |
| place365 | 93.60 | 52.33 | 21.88 | 80.93 | 0.00 | 0.10 | 0.59 | 5.36 | 40.77 |
| farood | 63.16 | 73.57 | 63.65 | 83.03 | 10.01 | 10.79 | 12.98 | 19.36 | 40.77 |
I tested other OOD methods such as ICML'22 KNN, ICLR'18 ODIN, and NIPS'20 EBO, which all show the correct test accuracy (94.3%). Any suggestions and help would be greatly appreciated!
Too many links are broken, showing Page Not Found after clicking them, such as the links in the OOD Benchmark Table. Hope you can fix them.
Hi, I tried to train ResNet-50 with the SimCLR loss on CIFAR-10 from scratch. The script is as follows:
python main.py --config configs/datasets/cifar10/cifar10.yml configs/preprocessors/base_preprocessor.yml configs/networks/simclr.yml configs/pipelines/train/?.yml
I am wondering which trainer from openood/trainers/ I should choose.
Besides, I am wondering if there are pre-trained SimCLR ResNet models for Cifar-10/100 available. Thank you so much.
In the README, why is MOS listed under Open Set when it has OOD in its title?
Hi, thanks for your excellent work. When I train the network on my GPUs, I found that it can only train on a single GPU. Do you know how to train on multiple machines?
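For reference, here is a generic multi-node setup sketch with torch.distributed; this is not OpenOOD's own launcher (which wraps mp.spawn in openood/utils/launch.py), and the address and port are placeholders:

```python
# Generic multi-node initialization sketch (not OpenOOD's launcher).
# Run one process per GPU on every machine; ranks span all machines.
import os
import torch
import torch.distributed as dist

def setup_distributed(rank: int, world_size: int) -> None:
    os.environ.setdefault('MASTER_ADDR', '10.0.0.1')  # placeholder: node 0
    os.environ.setdefault('MASTER_PORT', '29500')     # placeholder port
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank % torch.cuda.device_count())
```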
Dear @Jingkang50
Are you planning to include the Robin dataset presented at ECCV 2022?
Thanks for sharing the great work. I may have overlooked something, but I was wondering if you could clarify my understanding. According to the official CSI repository, there is an evaluation method 'baseline_marginalized' that uses an ensembled score with rotation augmentation. I cannot find the relevant script in /evaluator. Am I missing something? Thanks.
How do I start executing this project via the command line? Could you give an example?
Hello, first of all thank you very much for providing such an excellent code base.
But I personally encountered a problem in the process of running:
I used the pre-trained model you provided and tested it under the MSP method; the result of the first test was normal. But I ran the same test a second time without any changes, and this time I found the FPR above 90 and the AUROC very low. I also saw this problem after changing the graphics card: the first test was fine, the second test was obviously wrong.
So I would like to ask whether your code base uses some custom-written CUDA operators such that the test_ood code can only be run once.
Thank you very much.
We plan to write a short summary for every implemented method in our wiki space.
Please follow the template below and write the draft here.
Paper Title
Method Description
- Overview: A one-sentence summary
- Model Architecture: special design for network architectures
- Training: special design for training pipeline
- Inference: special design for inference pipeline
- Comments: say something more if any
Implementation (List all the related python files)
xxx.py:
the function of this python file.
Script
# how to run the code
Thanks for creating this very useful benchmark!
I think it could be even more valuable by including more real-world-like datasets, e.g. https://arxiv.org/abs/1911.11132 or https://arxiv.org/abs/2209.11960.
Hi, sorry to bother you.
I encountered some errors. I would like to know why the config is not found:
main.py: error: the following arguments are required: --config
scripts/0_basics/mnist_train.sh: 15: --config: not found
scripts/0_basics/mnist_train.sh: 16: --optimizer.num_epochs: not found
scripts/0_basics/mnist_train.sh: 17: --num_workers: not found
Watermarking (post-hoc, NeurIPS'22)
Paper: https://openreview.net/forum?id=6rhl2k1SUGs
Code: https://github.com/QizhouWang/watermarking
SHE (post-hoc, ICLR'23)
Paper: https://openreview.net/forum?id=KkazG4lgKL
Code: https://github.com/zjs975584714/SHE_ood_detection
CIDER (training, ICLR'23)
Paper: https://openreview.net/forum?id=aEFaE0W5pAd
Code: https://github.com/deeplearning-wisc/cider
NPOS (training, ICLR'23)
Paper: https://openreview.net/forum?id=JHklpEZqduQ
Code: https://github.com/deeplearning-wisc/npos
Hello, thanks a lot for developing this library!
It seems that there is an issue in the OoD configuration for CIFAR10: in configs/datasets/cifar10/cifar10_ood.yml, shouldn't the validation set point to cifar10 rather than cifar100? Thanks
I ran OpenGAN's code and found poor results. Then I checked the paper as well as the Excel table you published, and the results are the same. I read the code and found that it is missing two important parts.
I really appreciate your work.
Config is a class, but what is it used for? And what is the meaning of the code config = [Config(path) for path in opt.config]?
Dear authors, I have a follow-up about the "extremely good" result using the MDS method with large noise for input processing in #154. I was running with CIFAR-10 as ID. I was testing different noise values using the postprocessor_sweep in configs/postprocessors/mds.yml:
postprocessor:
name: mds
APS_mode: True
postprocessor_args:
noise: 0.0014
...
postprocessor_sweep:
noise_list: [0, 0.0005, 0.001, 0.0014, 0.002, 0.0024, 0.005, 0.01, 0.05, 0.1, 0.2,0.3]
The script surprisingly picks noise = 0.3 and claims that it has the best AUROC on the validation datasets. This does not make sense, as from my perspective a noise of 0.3 will no doubt distort any information in the original image. I would expect all ID/OOD samples to have extremely high scores and be indistinguishable. But here are the log and results:
Performing inference on cifar10 dataset...
Starting automatic parameter search...
Hyperparam:[0], auroc:0.6337179444444444
Hyperparam:[0.0005], auroc:0.6298636666666666
Hyperparam:[0.001], auroc:0.625976388888889
Hyperparam:[0.0014], auroc:0.6228961111111111
Hyperparam:[0.002], auroc:0.6182671111111111
Hyperparam:[0.0024], auroc:0.6152154999999999
Hyperparam:[0.005], auroc:0.5956825
Hyperparam:[0.01], auroc:0.5598402222222223
Hyperparam:[0.05], auroc:0.3970671111111111
Hyperparam:[0.1], auroc:0.48044600000000004
Hyperparam:[0.2], auroc:0.8543432777777779
Hyperparam:[0.3], auroc:0.9787509999999999
Final hyperparam: 0.3
ood.csv outputs:
I also visualize the scores using histogram and boxplot:
I then think what happens is that only the ID samples are processed with the large noise of 0.3, while OOD samples are processed with small noise. But what causes this in the code? I think the issue is in configs/postprocessors/mds.yml: when we specify the default noise, postprocessor.postprocessor_args.noise, it isn't overwritten for processing OOD samples even after the hyperparameter search. Therefore, the code uses 0.3 for ID and 0.0014 for OOD input perturbation in this case. To verify, I set postprocessor.postprocessor_args.noise = 0.3, expecting all the ID/OOD scores to be pushed very high this time:
postprocessor:
name: mds
APS_mode: True
postprocessor_args:
noise: 0.3
...
postprocessor_sweep:
noise_list: [0.3]
The log gives:
Starting automatic parameter search...
Hyperparam:[0.3], auroc:0.500437
Final hyperparam: 0.3
And boxplots show that both ID/OOD scores are indistinguishable (except for MNIST):
But now I am confused. It looks like the ID dataset (CIFAR-10) is not pushed toward zero as it was in the previous case, although the noise is the same. The result kind of makes sense, though, because the scores are indistinguishable. It is hard for me to tell whether the code is "correct" at this point. I am not sure if I set the config file correctly, but any feedback from you would be greatly appreciated. I am more than happy to provide more details if needed.
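To make the suspected bug concrete, here is a sketch of the write-back step that seems to be missing; the helper and attribute names are my assumptions, not OpenOOD's actual API:

```python
# Sketch of a sweep that ends with a write-back. `evaluate_auroc` is a
# hypothetical callable; the final assignment is the step that appears
# to be missing, without which later OOD inference silently falls back
# to the yml default noise (0.0014 here).
def sweep_noise(postprocessor, noise_list, evaluate_auroc):
    best_noise, best_auroc = None, -1.0
    for noise in noise_list:
        postprocessor.noise = noise
        auroc = evaluate_auroc(postprocessor)
        if auroc > best_auroc:
            best_noise, best_auroc = noise, auroc
    postprocessor.noise = best_noise  # keep the winner for ID *and* OOD
    return best_noise, best_auroc
```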
Hello,
There are several serious issues with the use of ViT.
Perhaps the easiest to fix: currently, with the example config file, images are center-cropped to 384 (to fit the input size of ViT), while pre_size is left at its default, e.g. 256 for ImageNet or even less for other datasets.
This means the preprocessor resizes to 256 and then pads to 384, leading to bad performance (~70% accuracy instead of the expected ~84%). The config file should include config.dataset.pre_size = 384 (or 400 for a slight center crop).
But a more serious issue is that the current implementation cannot possibly work, because the current wrapper for ViT, ImageClassifierWithReturnFeature, does not handle the 'return_loss' kwarg of its parent class (ImageClassifier from mmcls). By default, 'return_loss' is set to True, which returns the loss; it must be set to False to get the output. The wrapper must either set return_loss to False, or directly compute the output using extract_feat() and head.layers(). Additionally, the wrapper does not yet support return_feature_list, which is required for certain methods (e.g. Mahalanobis). This can be done by setting model.backbone.out_indices = range(len(self.backbone.layers)).
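To illustrate, a rough sketch of a fixed wrapper; the mmcls attribute names (extract_feat, head.layers, backbone.out_indices) follow the description above and are not verified against mmcls:

```python
# Rough sketch: bypass the return_loss branch by computing logits via
# extract_feat() and head.layers(), and expose all transformer blocks
# so return_feature_list can work. Names are assumptions from the post.
import torch.nn as nn

class ViTWithFeatureList(nn.Module):
    def __init__(self, classifier):  # classifier: mmcls ImageClassifier
        super().__init__()
        self.classifier = classifier
        # return features from every transformer block, as suggested
        self.classifier.backbone.out_indices = tuple(
            range(len(self.classifier.backbone.layers)))

    def forward(self, x, return_feature_list=False):
        feats = self.classifier.extract_feat(x)  # tuple of block outputs
        logits = self.classifier.head.layers(feats[-1])
        return (logits, list(feats)) if return_feature_list else logits
```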
Hello, first of all, thank you very much for the code you put together, but I found a problem while using it. When I tested the nearood and farood OOD data in various ways, I found that the accuracy values for the OOD datasets did not change under any method. For example, when I run sh scripts/ood/msp/mnist_test_ood_msp.sh, the accuracy for every dataset is 98.88.
I find this very strange, so I'm really looking forward to your reply.
When I ran imagenet_test_ood_mds.sh,
python main.py \
--config configs/datasets/imagenet/imagenet.yml \
configs/datasets/imagenet/imagenet_ood.yml \
configs/networks/resnet50.yml \
configs/pipelines/test/test_ood.yml \
configs/preprocessors/base_preprocessor.yml \
configs/postprocessors/mds.yml \
--num_workers 4 \
--ood_dataset.image_size 256 \
--dataset.test.batch_size 256 \
--dataset.val.batch_size 256 \
--network.pretrained True \
--network.checkpoint 'results/checkpoints/imagenet_res50_acc76.10.pth' \
--merge_option merge
I received an error saying that
FileNotFoundError: [Errno 2] No such file or directory: './data/images_largescale/imagenet_1k/train/n02113978/n02113978_2340.JPEG'
I assume that I need to download the train set. If so, could you provide a link for me to download it?
Thanks.
Hi, there.
Many thanks for your excellent open-source repo.
I found that the pretrained MNIST model you provided has an accuracy of 99.60% (which is different from the number you report).
So where can I download the checkpoint for MNIST with an accuracy of 98.50%?
Hi,
The paper has results for ensemble-based methods like MC Dropout and Deep Ensemble. These methods require re-training the classifier, but the released checkpoints do not include them. Could you please provide checkpoints for the ensemble-based methods?
By the way, does scripts/uncertainty/ensemble indicate Deep Ensemble? If so, how do I train the models used for OOD testing on CIFAR-10?
Thank you.
Thanks for developing and sharing OpenOOD. But the pinned environment is too old for the source code: for example, with torchvision 0.9.1 there is no torchvision.models.vision_transformer. There are other similar problems.
Hi,
In the paper, the reported results of OE (both OOD performance and ID accuracy) are significantly WORSE than those of other methods, which goes against the observations of the OE paper and my own experience. After looking at the code, I identified two issues in OpenOOD's current implementation.
First, the outlier data and ID data are passed through the network in separate forward passes, while in OE's official implementation they are concatenated into a single batch and passed through the network together. The current implementation likely causes unstable estimation of BN statistics due to the difference between ID and outlier data.
OpenOOD/openood/trainers/oe_trainer.py
Line 57 in 539cf43
Second, there seems to be something wrong with the SoftCrossEntropy loss defined here. With this loss the model accuracy is rather low, while simply replacing it with the loss from the official implementation fixes the issue.
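For concreteness, here is a sketch of the corrected training step following OE's official recipe; the variable names are illustrative, not OpenOOD's exact code:

```python
# Sketch of the fix: ID and outlier samples pass through the network as
# one concatenated batch (stable BN statistics), and the OE term pushes
# outlier predictions toward the uniform distribution, as in the
# official implementation.
import torch
import torch.nn.functional as F

def oe_training_step(net, id_x, id_y, ood_x, oe_lambda=0.5):
    x = torch.cat([id_x, ood_x], dim=0)
    logits = net(x)
    id_logits, ood_logits = logits[:len(id_x)], logits[len(id_x):]
    loss = F.cross_entropy(id_logits, id_y)
    # cross-entropy to the uniform distribution, up to a constant
    loss = loss + oe_lambda * -ood_logits.log_softmax(1).mean(1).mean()
    return loss
```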
Overall, after fixing the above two issues by following OE's official implementation, I get these numbers on CIFAR-10:
| | acc | nearood | farood |
|---|---|---|---|
| paper | not reported | 76.4 | 75.2 |
| fixed | 94.89 | 93.51 | 95.40 |
The results after fixing the bugs make much more sense to me.
Hi,
When evaluating gradnorm/mls/vim, there is a KeyError: 'APS_mode' when executing ood_evaluator.py. It seems this is because the APS_mode field is simply missing from gradnorm/mls/vim.yml.
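Presumably the field can be added to those yml files the same way mds.yml declares it; a sketch (the False value is my assumption, since these methods have no sweep configured):

```yaml
postprocessor:
  name: mls        # likewise for gradnorm and vim
  APS_mode: False  # the missing field; value assumed, mirroring mds.yml
```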
Thanks for providing this repo!
I downloaded this repo and tried to run the following script, as indicated by the readme file:
sh scripts/basics/mnist/train_mnist.sh
However, I got 11.35% accuracy on the test dataset and the training loss is not decreasing!
ERROR:root:[Errno 2] No such file or directory: './data/images_largescale/imagenet_1k/train/n09421951/n09421951_14600.JPEG'
Hi, I can only find the validation set of ImageNet on OneDrive. What should I do?
Thanks for your great work.
I am trying to learn how this project works, but I cannot tell whether it uses the CPU and GPU together or not.
If it does, where does the code show that the GPU is used? I mean, people usually use the GPU with something like loss_fn = loss_fn.cuda().
Besides, if I want to use two GPUs on one machine, do I only need to change num_gpu=2?
Thank you.
Hi,
There is a "split" argument in the PixMixPreprocessor init function (which is actually never used).
OpenOOD/openood/preprocessors/pixmix_preprocessor.py
Lines 22 to 23 in 539cf43
However, when fetching the preprocessor, the "split" argument is not passed, and therefore there will be TypeError: __init__() missing 1 required positional argument: 'split'. This error is likely to happen for other preprocessors as well.
OpenOOD/openood/preprocessors/utils.py
Lines 23 to 26 in 539cf43
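A sketch of the kind of fix implied; the factory shape and names are assumptions based on the error, not copied from utils.py:

```python
# Hypothetical sketch: forward `split` from the fetching code to every
# preprocessor constructor; the registry argument and class names are
# assumptions, not utils.py's actual contents.
def get_preprocessor(config, split, registry):
    preprocessor_cls = registry[config.preprocessor.name]
    # passing `split` avoids: TypeError: __init__() missing 1 required
    # positional argument: 'split'
    return preprocessor_cls(config, split)
```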
Hi,
When I'm running the following provided CIFAR-10 script on one Quadro RTX 6000 24GB GPU,
CUDA_VISIBLE_DEVICES="7"
python main.py \
--config configs/datasets/cifar10/cifar10.yml \
configs/preprocessors/base_preprocessor.yml \
configs/networks/resnet18_32x32.yml \
configs/pipelines/train/baseline.yml \
I got a CUDA out of memory error, as follows, which is pretty weird.
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 23.65 GiB total capacity; 412.68 MiB already allocated; 8.56 MiB free; 442.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
There should be nothing wrong with my installed openood conda environment, as I can successfully run my own scripts in it. So I have no idea what's going on here. Would appreciate some help!
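For what it's worth, the error text itself suggests trying max_split_size_mb; an untested sketch of that workaround (it addresses fragmentation, not the oddly small amount of free memory):

```python
# Untested workaround taken from the error message's own suggestion.
# Set before the first CUDA allocation so the caching allocator sees it.
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
```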
Thanks
Hello,
Thank you for maintaining this OOD repo!
I tried to run FSOOD but got this FileNotFoundError:
FileNotFoundError: [Errno 2] No such file or directory: './data/imglist/digits/val_notmnist.txt'
Could you check this issue or tell me how to handle this? I put all the data from this repo so I was able to run other ood tests.
Hi @Jingkang50, thanks for your great work. It seems that no default preprocessor is given during testing.
PYTHONPATH='.':$PYTHONPATH \
python main.py \
--config configs/datasets/digits/mnist.yml \
configs/networks/lenet.yml \
configs/pipelines/test/test_acc.yml \
--dataset.image_size 28 \
--network.name lenet \
--network.checkpoint ./results/mnist_lenet_base_e100_lr0.1/best.ckpt \
--num_workers 4
# get the error:
# return dict.__getitem__(sub_cfg, sub_key)
# KeyError: 'preprocessor'
PYTHONPATH='.':$PYTHONPATH \
python main.py \
--config configs/datasets/digits/mnist.yml \
configs/networks/lenet.yml \
configs/pipelines/test/test_acc.yml \
configs/preprocessors/base_preprocessor.yml \ # fixed by adding this preprocessor
--dataset.image_size 28 \
--network.name lenet \
--network.checkpoint ./results/mnist_lenet_base_e100_lr0.1/best_epoch87_acc0.9950.ckpt \
--num_workers 4
Maybe we need to fix this bug in the scripts.
Hi,
I want to use default settings to replace the command-line arguments, like
parser.add_argument('--config', dest='config', nargs='+', required=True, default = ['configs/datasets/digits/mnist.yml', 'configs/networks/lenet.yml', 'configs/pipelines/train/baseline.yml'])
but an error occurred. How should I deal with such problems? Thank you very much~
FileNotFoundError: [Errno 2] No such file or directory: 'configs/datasets/digits/mnist.yml configs/networks/lenet.yml configs/pipelines/train/baseline.yml
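If it helps, my understanding is that with nargs='+' the default must be a list of separate paths (a single space-joined string is treated as one bogus path, which matches the error above), and required=True prevents the default from ever applying. A sketch:

```python
import argparse

parser = argparse.ArgumentParser()
# Drop required=True so the default can apply, and keep each path as a
# separate list element rather than one space-joined string.
parser.add_argument(
    '--config', dest='config', nargs='+',
    default=[
        'configs/datasets/digits/mnist.yml',
        'configs/networks/lenet.yml',
        'configs/pipelines/train/baseline.yml',
    ])
opt = parser.parse_args()
print(opt.config)  # three separate paths
```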
Thank you for the great work. I'm trying to train a model using MOS, but the template requires the path imglist_pth: ./data/benchmark_imglist/cifar10/train_cifar10_mos.txt. This file doesn't seem to be present in the SharePoint. Is there a way to resolve this? Thanks for the help!
I found that the dataloader of CIDER lacks the TwoCropTransform:
data = torch.cat([data[0], data[1]], dim=0).cuda()
leading to
data.shape= [batchSize, 3, 32, 32] ==> data.shape=[6, 32, 32]
In theory, it should be
data.shape= [2, batchSize, 3, 32, 32] ==> data.shape=[2 * batchSize, 3, 32, 32]
Is this a bug in the code?
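For reference, the standard two-crop wrapper from SupCon-style reference code looks like this (a minimal sketch):

```python
# Minimal two-crop wrapper as in SupCon-style reference code: returns
# two independently augmented views, so the batch collates to a pair
# of [batchSize, 3, 32, 32] tensors that torch.cat merges along dim=0.
class TwoCropTransform:
    def __init__(self, transform):
        self.transform = transform

    def __call__(self, x):
        return [self.transform(x), self.transform(x)]
```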