
cdfsl-benchmark's Introduction

Cross-Domain Few-Shot Learning (CD-FSL) Benchmark

Website

LeaderBoard

Paper

Please cite the following paper when using this evaluation framework: https://arxiv.org/pdf/1912.07200.pdf

@inproceedings{guo2020broader,
  title={A broader study of cross-domain few-shot learning},
  author={Guo, Yunhui and Codella, Noel C and Karlinsky, Leonid and Codella, James V and Smith, John R and Saenko, Kate and Rosing, Tajana and Feris, Rogerio},
  year={2020},
  booktitle={ECCV}
}

Introduction

The Cross-Domain Few-Shot Learning (CD-FSL) challenge benchmark includes data from the CropDiseases [1], EuroSAT [2], ISIC2018 [3-4], and ChestX [5] datasets, which cover plant disease images, satellite images, dermoscopic images of skin lesions, and X-ray images, respectively. The selected datasets reflect real-world use cases for few-shot learning, since collecting enough examples from the above domains is often difficult, expensive, or in some cases impossible. In addition, they span a spectrum of readily quantifiable domain shifts from ImageNet: 1) CropDiseases images are the most similar, as they are perspective color images of natural elements, but are more specialized than anything available in ImageNet; 2) EuroSAT images are less similar, as they lack perspective distortion but are still color images of natural scenes; 3) ISIC2018 images are even less similar, as they lack perspective distortion and no longer represent natural scenes; and 4) ChestX images are the most dissimilar, as they lack perspective distortion and color and do not represent natural scenes.

Datasets

The following datasets are used for evaluation in this challenge:

Source domain:

  • miniImageNet

Additional source datasets (CUB, Caltech256, CIFAR100, DTD) are used only for replicating the results of the multi-model selection in the paper; for the challenge, only models pre-trained on miniImageNet are allowed.

Target domains:

General information

  • No meta-learning in-domain

  • Only ImageNet-based models or meta-learning are allowed.

  • 5-way classification

  • n-shot, for varying n per dataset

  • 600 randomly selected few-shot 5-way trials up to 50-shot (scripts provided to generate the trials)

  • Average accuracy across all trials reported for evaluation.

  • For generating the trials for evaluation, please refer to finetune.py and the examples below (a minimal sketch of the trial protocol follows this list)
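A minimal sketch of what one such 5-way trial looks like (illustrative only; the official trial generation is implemented in finetune.py and the dataset loaders):

        # Illustrative sketch of one 5-way n-shot trial; not the repository's exact code.
        import numpy as np

        def sample_trial(labels, n_way=5, n_support=5, n_query=15, rng=None):
            """Pick n_way classes, then n_support + n_query image indices per class."""
            rng = rng if rng is not None else np.random.default_rng()
            classes = rng.choice(np.unique(labels), size=n_way, replace=False)
            support_idx, query_idx = [], []
            for c in classes:
                idx = rng.permutation(np.where(labels == c)[0])[:n_support + n_query]
                support_idx.append(idx[:n_support])
                query_idx.append(idx[n_support:])
            return np.array(support_idx), np.array(query_idx)

        # The benchmark averages accuracy over 600 such trials per dataset and shot count.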

Specific Tasks:

EuroSAT

• Shots: n = {5, 20, 50}

ISIC2018

• Shots: n = {5, 20, 50}

Plant Disease

• Shots: n = {5, 20, 50}

ChestX-Ray8

• Shots: n = {5, 20, 50}

Unsupervised Track

An optional second track has been included in this challenge that allows the use of a subset of unlabeled images from each dataset for study of un/self/semi-supervised learning methods. For learning and evaluation within each dataset, the images listed in text files contained in the unsupervised-track subfolder specific to each dataset may be used for such learning methods. Please see the website for additional information.
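As a rough sketch of how those lists might be consumed (the file and folder names below are assumptions, not the repository's actual layout; use the text files actually shipped in the unsupervised-track subfolder):

        # Hypothetical helper: read one dataset's unlabeled-image list for Track 2.
        from pathlib import Path

        def load_unlabeled_paths(list_file, image_root):
            """Return the image paths permitted for un/self/semi-supervised learning."""
            names = Path(list_file).read_text().splitlines()
            return [str(Path(image_root) / name) for name in names if name.strip()]

        # e.g. load_unlabeled_paths('unsupervised-track/EuroSAT_unlabeled.txt', '/path/to/EuroSAT')
        # (the file name above is illustrative)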

Environment

Python 3.5.5

Pytorch 0.4.1

h5py 2.9.0

Steps

  1. Download the datasets for evaluation (EuroSAT, ISIC2018, Plant Disease, ChestX-Ray8) using the above links.

  2. Download miniImageNet using https://drive.google.com/file/d/1uxpnJ3Pmmwl-6779qiVJ5JpWwOGl48xt/view?usp=sharing

  3. Download CUB if multi-model selection is used.

    Change directory to ./filelists/CUB
    run source ./download_CUB.sh
  4. Change configuration file ./configs.py to reflect the correct paths to each dataset. Please see the existing example paths for information on which subfolders these paths should point to.
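    Illustrative example of the path entries (the variable names below are assumptions; keep whatever names configs.py already uses):

        # ./configs.py -- each path should point at the subfolder holding that dataset's images
        miniImageNet_path = '/path/to/miniImageNet'
        CropDisease_path  = '/path/to/CropDiseases'
        EuroSAT_path      = '/path/to/EuroSAT'
        ISIC_path         = '/path/to/ISIC2018'
        ChestX_path       = '/path/to/ChestX-Ray8'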

  5. Train base models on miniImageNet

    Standard supervised learning on miniImageNet

        python ./train.py --dataset miniImageNet --model ResNet10  --method baseline --train_aug

    Train meta-learning method (protonet) on miniImageNet

        python ./train.py --dataset miniImageNet --model ResNet10  --method protonet --n_shot 5 --train_aug
  6. Save features for evaluation (optional, if there is no need to adapt the features during testing)

    Save features for testing

        python save_features.py --model ResNet10 --method baseline --dataset CropDisease --n_shot 5 --train_aug
  7. Test with saved features (optional, if there is no need to adapt the features during testing)

        python test_with_saved_features.py --model ResNet10 --method baseline --dataset CropDisease --n_shot 5 --train_aug
  8. Test

    Finetune with frozen model backbone:

        python finetune.py --model ResNet10 --method baseline  --train_aug --n_shot 5 --freeze_backbone

    Finetune (backbone not frozen):

        python finetune.py --model ResNet10 --method baseline  --train_aug --n_shot 5 

    Example output: 600 Test Acc = 49.91% +- 0.44%
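    The reported number is the mean accuracy over the 600 trials with a 95% confidence interval. A minimal sketch of that summary (illustrative; the provided scripts already print it in this format):

        import numpy as np

        def summarize(acc_per_trial):
            """acc_per_trial: per-trial accuracies in percent, e.g. 600 values."""
            acc = np.asarray(acc_per_trial)
            ci95 = 1.96 * acc.std() / np.sqrt(len(acc))  # half-width of the 95% interval
            print('%d Test Acc = %4.2f%% +- %4.2f%%' % (len(acc), acc.mean(), ci95))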

  9. Test with Multi-model selection (make sure you have trained models on all the source domains: miniImageNet, CUB, Caltech256, CIFAR100, DTD)

    Test Multi-model selection without fine-tuning:

       python model_selection.py --model ResNet10 --method baseline  --train_aug --n_shot 5 

    Test Multi-model selection with fine-tuning all models:

      python model_selection.py --model ResNet10 --method baseline  --train_aug --n_shot 5 --fine_tune_all_models
  10. For testing your own methods, simply replace the function finetune() in finetune.py with your own method. Your method should accept at least the following arguments (a bare-bones skeleton follows this list):

    novel_loader: data loader for the corresponding dataset (EuroSAT, ISIC2018, Plant Disease, ChestX-Ray8)

    n_query: number of query images per class

    n_way: number of classes (ways) per trial

    n_support: number of support images per class
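    A bare-bones skeleton of such a replacement (the signature follows the arguments above; the episode loop assumes novel_loader yields one trial's images and labels at a time, as in the provided finetune.py, and the body is a placeholder for your own adaptation and prediction logic):

        import numpy as np

        def finetune(novel_loader, n_query=15, n_way=5, n_support=5, **kwargs):
            """Evaluate a custom method over the few-shot trials yielded by novel_loader."""
            acc_all = []
            for x, _ in novel_loader:
                # x holds one trial: n_way classes, each with n_support + n_query images.
                # 1) adapt your model on the support images
                # 2) predict the query images
                # 3) append this trial's accuracy (in percent)
                acc_all.append(0.0)  # placeholder -- replace with the real trial accuracy
            acc_all = np.asarray(acc_all)
            print('%d Test Acc = %4.2f%% +- %4.2f%%' %
                  (len(acc_all), acc_all.mean(), 1.96 * acc_all.std() / np.sqrt(len(acc_all))))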

References

[1] Sharada P. Mohanty, David P. Hughes, and Marcel Salathé. Using deep learning for image-based plant disease detection. Frontiers in Plant Science, 7:1419, 2016.

[2] Patrick Helber, Benjamin Bischke, Andreas Dengel, and Damian Borth. EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019.

[3] Philipp Tschandl, Cliff Rosendahl, and Harald Kittler. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5:180161, 2018.

[4] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.

[5] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M. Summers. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2097–2106, 2017.

cdfsl-benchmark's People

Contributors

gyhui14, lovian14, stevemar, yunhuiguo


cdfsl-benchmark's Issues

MAML implementation

Could you provide your MAML implementation or the repo you used to get the reported MAML results? Thanks.

Question about arxiv link

Hi,
Submitting the paper on arxiv takes a while to get the arxiv link. And our current state on arxiv is on hold. I'm afraid we won't be able to get the link by the deadline.
So if we don't get the arxiv link by the deadline, can we just upload the PDF of our paper on CMT3?

Can't get paper scores using code

I'm running this command that is provided in README to train a proto net model:

    python ./train.py --dataset miniImageNet --model ResNet10  --method protonet --n_shot 5 --train_aug

I'm getting these scores in the test phase:

chestX CropDisease ISIC EuroSAT
22.77% +- 0.41% 61.43% +- 0.79% 36.74% +- 0.53% 66.57% +- 0.72%

which are lower than the scores reported in the paper:

chestX CropDisease ISIC EuroSAT
24.05% ± 1.01% 79.72% ± 0.67% 39.57% ± 0.57% 73.29% ± 0.71%

What setting should I use to get scores close to the ones reported in the paper?

Some questions about this challenge

With such large domain differences (miniImageNet --> ChestX), does domain generalization make sense? Will there be such a big domain difference in the real world? In this case there seems to be no valuable information that can be transferred from the source domain to the target domain.

dataset version for ChestX

Could you please clarify which version of the ChestX dataset you tested in your original paper? In the original CVPR 2017 paper, the authors used ChestX-ray8, which contained 108,948 images. However, the data currently provided by the Kaggle link is ChestX-ray14, which is a modified version. Should we use ChestX-ray8 to reproduce your results?

Question about track 2: unlabelled data

For track 2, unlabelled data is available. Is it still allowed to use mini-ImageNet for training or can training only be performed on that unlabelled set?

Question about the arXiv paper

Hi @yunhuiguo,

sorry, I have one more question about the arXiv paper.

Since we can't finish our paper before your workshop deadline, we will put it on arXiv as the rules require.
I would like to know whether it is allowed to submit the arXiv paper to other conferences (if the conference accepts arXiv submissions).

We will extend the paper with extra experiments, but most of the content will be the same as the arXiv paper.

Thank you for your help.

Further adaptation of meta learning methods on the support set

Hi,

Thanks for sharing the code. I have two questions:

  1. Can finetune.py also finetune features learned by meta learning methods (e.g. ProtoNet)?

  2. I notice there is a command option --adaptation to adapt meta-learned features on the support set. And when I run:

        python test_with_saved_features.py --method protonet --dataset EuroSAT --n_shot 50 --train_aug --adaptation
    

I get 88.82% +- 0.46% which is much higher than what is reported in Table 1 ( 80.48% ± 0.57%). Further adaptation also brings improvement to ProtoNet on other datasets. Have you tried this option?

Thanks.

Leaderboard/ invitations

Hi,

I was just wondering if the leaderboard/invitations will be sent out today, as it is 27 May? Thanks.

Where is the code for the core part of Incremental Multi-model Selection?

I think this is an interesting and inspiring work, but in your released code I can't find the code for "Fine-tuning last-k", "Transductive fine-tuning", and "Transfer from Multiple Pretrained Models" as shown in your paper. I hope you can release these parts soon. I also want to know the start date of the CVPR2020-VL3-Challenge. Looking forward to your reply!

RAM utilization

Hi,

I was wondering why the code seems to have RAM usage blowing up (>25 GB) in the data loading process. Do we need to force the code to use disk space as RAM if our RAM limit is 25 GB?

Thanks.

Questions about Cross-Domain Few-Shot Learning (CD-FSL) Challenge

Hello, I see in your paper that the results of transfer-learning-based approaches are clearly better than those of meta-learning-based methods. I wonder whether the challenge will evaluate the results of the two kinds of methods separately.
Also, could we use a different backbone network, such as ResNet18?

Question about the pre-trained data?

Is the model pre-trained on the entire miniImageNet (100 classes) or on the meta-training set (64 base classes)?
Besides, why is the parameter --num_classes set to 200 in io_utils.py?
Thank you very much!

Is each query image independent or not?

Hi @yunhuiguo,

Is it allowed to have any operation between query images if we make sure there is no assumption about their classes? (distance calculation, etc.)
Or should we keep the query images independent?

If we can have those operations, is it allowed to check the predicted classes of query images from our model? (e.g., when predicting the second query image, can we check the predicted class of the first image? We won't assume that the number of query images predicted for each class should be the same.)

Thank you!

Use of Ensemble Boosting

Hi,

I was wondering if we are allowed to use ensemble boosting by training multiple neural networks and reporting the results of the ensemble? Thanks.

Some question about CDFSL

Hello, regarding this challenge: can the 15 query samples of each category be used for unsupervised training in each episode?

Some question about this project.

I have some questions about this challenge:

  1. In the setting where we use miniImageNet as the source domain and CropDiseases, EuroSAT, ISIC2018, and ChestX as the target domains, if we use the base classes (64) of miniImageNet as the training classes, which classes do we use as the validation classes, and do we use all classes in the target domain as test classes?

  2. As we can see, ImageNet data (155G) needs to be downloaded to construct the miniImageNet dataset. So can we use a simpler way, just like in CrossDomainFewShot?

  3. Can you provide data processing scripts for the four datasets (CropDiseases, EuroSAT, ISIC2018, and ChestX)? I only see the ones for the miniImageNet and CUB datasets.

Implicit use of unlabelled information in Transductive fine-tuning

Hi,

I was just wondering if there was an implicit use of unlabelled data in the process of transductive fine-tuning in the baseline paper. This is because transductive fine-tuning uses the batch norm statistics to fine-tune, and thus we are not predicting each query image in the batch independently of other query images in the batch.

This is an issue because, based on the above explanation, transductive fine-tuning should only be allowed for Track 2 and not for Track 1 results. An alternative is for the transductive fine-tuning to only use the statistics of the support images; then this should be permitted in Track 1. May I kindly ask which version this paper has implemented?
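For concreteness, a minimal PyTorch sketch of the two behaviours being contrasted (illustrative only, not the baseline paper's actual code):

        import torch

        def predict_queries(model, x_support, x_query, transductive=False):
            """Contrast transductive vs. support-only batch-norm statistics."""
            if transductive:
                model.train()                # BN uses statistics of the query batch itself
            else:
                model.train()
                with torch.no_grad():
                    model(x_support)         # BN running stats are updated on support only
                model.eval()                 # queries are then scored with frozen statistics
            with torch.no_grad():
                return model(x_query)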

Question about Meta Learning architecture that used in the paper.

Hello, I saw that you use RelationNet as one of the comparison models in your paper.

In the paper you use ResNet10 as the backbone of the RelationNet encoder; however, there is no information about the relation module.

Could I get the configuration of the relation module that you used for your experiments?

Modification of the evaluation code

Hi,

Is it alright to make slight modifications of the evaluation code to make it easier to interface with the models we have? We will make sure that the evaluation protocol is unchanged (which you can verify by checking the code).

GPUs for 5-way 50-shot

Hi, could you please tell us how many GPUs we need to train a few-shot model (ResNet-12 + any classifier) in the 5-way 50-shot setting? Also, is it necessary? I mean, 50 samples may not be considered "few-shot".

Questions about the rules

Hi, thanks for organizing the challenge!!

We have several questions about the rules:

  1. Are we allowed to use ImageNet validation or test set?
  2. Should we only use ResNet-10 as the backbone?
  3. Do we need to respect the preprocessing in the provided code, especially the image resolution? (Many academic papers work at lower resolutions.)

Thanks and looking forward to your reply.

Questions about submitting files

We used some of the same techniques in the two technical report PDFs submitted for Track 1 and Track 2. Can the two PDFs use the same or similar descriptions (or figures) in the shared parts? And can we submit the Track 1 and Track 2 documents in the same submission?

Saved model for all the experiments

Hi, could you provide the links to the saved models: MatchingNet, MatchingNet+FWT, MAML, ProtoNet, ProtoNet+FWT, RelationNet, RelationNet+FWT, MetaOpt for the cross-domain experiments in Table 1.

Evaluation episodes

On the challenge website it states: "For each evaluation, the same 600 randomly sampled few-shot episodes should be used for consistency, ... . Participants must employ the supplied code in order to conduct their evaluations." (https://www.learning-with-limited-labels.com/challenge#h.p_h74e3HJnMMqn)

As far as I can see, the supplied code in finetune.py does not guarantee that always the same 600 episodes will be sampled. The only control over randomness is line 164, np.random.seed(10), but this does not seem to be enough.

Could you please clarify whether one should use this exact code, or whether one should instead take additional measures to guarantee that always the same 600 episodes are sampled?

Thank you for your help!
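One additional measure that would pin the trials (a sketch of a possible approach, not necessarily what the organizers intend) is to derive the randomness for trial i from a fixed per-trial seed:

        import numpy as np

        def trial_rng(trial_idx, base_seed=10):
            """Deterministic random generator for trial number trial_idx."""
            return np.random.default_rng(base_seed + trial_idx)

        # Drawing all class/image choices for trial i from trial_rng(i) makes the
        # 600 trials identical across runs and machines.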

Easier way to setup Chest X-rays?

The current way requires downloading the entire NIH Chest X-ray Dataset of 42 GB... This will not be easy in China. o(╥﹏╥)o
We can infer from the "./datasets/Chest?_few_shot.py" file that not all the images in this dataset need to be used, so could you provide a simplified version of the dataset and annotations via a Google Drive link?

Implicit transduction learning

I would like to check whether implicit transductive learning is allowed in Track 1 or not.
We followed the instructions in your paper and other issues.
For the current setting, we use implicit transductive learning in our result for Track 1.

#2
#8

But I just noticed that in issue 22 you mention that the query samples are not predicted independently if implicit transductive learning is applied.

So it's forbidden in Track 1?
We may not have enough time to change the result now...

Question about cdfsl challenge

  1. Can we use our own processing of the datasets, i.e., normalizing the size of all pictures in a dataset and packaging them into a large tensor for storage?

  2. Can Transductive Inference [1] be used in Track 1? Or is implicit transductive learning (i.e., MAML using batch norm to share information during test inference) OK?

[1] Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning, ICCV2019

Question about the track submission

Hello, we would like to check whether we can submit to both Track 1 and Track 2 without using unlabeled data, or whether we are only supposed to submit to Track 1.

unable to reproduce the result in the paper

Hi, I want to reproduce the results in Table 1 of your paper with the ProtoNet provided in your code, but I am not able to. The 5-way 5-shot classification accuracy I get is 69.23 +- 0.75, which is also 10% lower than reported. Any suggestions?

ISIC 2018 dataset Links

It seems that the direct link for the ISIC 2018 dataset is not working. Could you update it with a working link?

Questions about the Competition

Hi @yunhuiguo .

Thank you for hosting this competition, it's really interesting and challenging!
After carefully reading the rules on your website and in your paper, I still have some questions, please correct me if I misunderstood.

  1. Is it allowed to use the paper submitted to your workshop as the paper required in this competition?
  2. Is it allowed to modify the training hyper-parameters? (e.g., epochs, n_query)
    Intuitively, training a model the normal way will give better performance than training it the meta-learning way in the current setting (the baseline sees more images).

In one epoch:
baseline will see: 16 (batch size) * 2400 (dataloader length) = 38400 images
protonet will see: (5+16)(n_support + n_query) * 5 (n_way) * 100 (dataloader length) = 10500 images

  3. During testing (in finetune.py), it seems that only the trained backbone is concatenated with a learnable classifier.
    The other mechanisms of the model will not take effect, right? (e.g., prototype computation in ProtoNet, the relation module in RelationNet)
    Or can we modify the code to let those other mechanisms also work during testing?

Looking forward to your reply, thank you so much.

Other Models listed in paper

Hello, I've noticed that there is only code for the Baseline and ProtoNet. Could you please provide the code for the other models used in the experiments?

Question regarding the rules

Hi, I have a question regarding the competition rules.

For the transductive setting, are we allowed to use intra-class relationships of the data? For example, if we have a similarity matrix (shape 100 * 100 in the 5-way 5-shot 15-query setting, i.e., 5*(5+15) rows and columns) and we want to initialize it, can we assume the entries for queries from the same class to have similarity 1? That is, we don't know which class the current query belongs to, but we do know it belongs to the same class as some other query.

Thanks!

Why was MAML excluded from the results?

The paper says that it failed to operate on the entire benchmark and I was curious what that meant and why.

Thanks in advance for the response and thanks for sharing this awesome benchmark for meta-learning and AGI.

Refactoring dataset module

Thank you for sharing the code for the CDFSL benchmark.

I'm currently working on a research project based on the code you've provided, but I noticed that there is a lot of duplicate code in the data and datasets folders. This makes it harder to modify the code, e.g., to customize the transformations across all the datasets.

Would you consider accepting a PR if I work on some refactorizations to remove these duplicates?

Thanks.

Question regarding the environment

Do we have to use the Python and PyTorch versions below for the competition?

Python 3.5.5

Pytorch 0.4.1

h5py 2.9.0

Thanks

Question regarding the training set (mini-ImageNet train)

Hi, I visualized some images in the aircraft carrier category and compared them with the original mini-ImageNet dataset provided by Matching Networks on this website. The image id I checked is n02687172_5357.JPEG (from the dataset of this competition), which could not be found in the txt file. Then I checked the category names: only 4 categories from the competition dataset match the list in the txt file provided by Matching Networks. Since the training set is 64 categories chosen from the 100 categories of ImageNet, the number of matched categories is surprisingly low. I wonder whether you used a different subset of ImageNet 2012 or modified something else.
