Comments (10)
Thanks for your interest in the library. Here are the steps to use your folder-based dataset:
- Use torchvision.datasets.ImageFolder. This assumes images are organized into folders, where each folder is a class, as in the layout sketched below.
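Something like the following, where the class and file names are just placeholders:

```
/path/to/your/dataset/
    class_a/
        img_001.jpg
        img_002.jpg
    class_b/
        img_003.jpg
        img_004.jpg
```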
- Write a wrapper class around the ImageFolder object:
```python
from torch.utils.data import Dataset
from torchvision import datasets
import numpy as np


class YourDataset(Dataset):
    def __init__(self, root, transform=None, download=False):
        self.dataset = datasets.ImageFolder(root, transform=transform)
        # These look useless, but they are required by powerful-benchmarker.
        self.labels = np.array([b for (a, b) in self.dataset.imgs])
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx]
```
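Before wiring this into the benchmarker, a quick sanity check can confirm the wrapper loads your images (the path here is a placeholder for your own dataset):

```python
dataset = YourDataset("/path/to/your/dataset")
print(len(dataset))         # number of images found by ImageFolder
print(dataset.labels[:10])  # integer class labels used by powerful-benchmarker
img, label = dataset[0]     # ImageFolder items are (image, label) pairs
```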
- Register the dataset at the bottom of run.py by replacing the last 2 lines with:

```python
r = runner(**(args.__dict__))
from your_custom_modules import your_dataset
r.register("dataset", your_dataset)
r.run()
```
In the above, `your_custom_modules` should be a folder that contains an empty `__init__.py` file, as well as `your_dataset.py`, which contains your custom dataset class (see the sketch below). You can have multiple dataset classes in there, and they will all get registered. See the documentation for details.
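Concretely, the module folder would look like this:

```
your_custom_modules/
    __init__.py      # empty; makes the folder a Python package
    your_dataset.py  # contains the YourDataset class above
```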
- Add the following flags to the run command:

```bash
python run.py \
<all your other flags> \
--dataset~OVERRIDE~ {YourDataset: {root: /path/to/your/dataset}} \
--split_manager~APPLY~2 {data_and_label_getter_keys: null}
```
You can also make these changes directly in the config files. To do that, download the files to some location (it's probably easiest to just download this whole repo and then move the configs folder). Then set the `--root_config_folder` flag to that location, either at the command line or in run.py. These will now be considered the default config files, so you can change them however you like to minimize the number of command-line flags. For example, you can make YourDataset the default instead of CUB200 by changing `default.yaml` in the `config_dataset` folder, roughly as sketched below.
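For illustration, the edit to `default.yaml` might look something like this. The key structure is an assumption inferred from the `--dataset~OVERRIDE~` syntax above, not a verified excerpt, so compare against the shipped file before editing:

```yaml
# Hypothetical sketch of config_dataset/default.yaml;
# verify the actual key names against the shipped file.
dataset:
  YourDataset:
    root: /path/to/your/dataset
```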
Let me know if that helps!
This answered my question. Thank you very much!! :)
Hello, I was able to successfully train/validate according to your detailed instructions. Unfortunately, I ran into an error during testing using the `--splits_to_eval [test]` method described in the documentation. Any help or input would be valuable. Thank you!
```
INFO:root:Getting split: Test50_50_Partitions4_1 / train / length 11900 / using train transform
INFO:root:Creating end_of_epoch_hook
INFO:root:Getting split: Test50_50_Partitions4_1 / test / length 16100 / using eval transform
Traceback (most recent call last):
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 27, in run_train_or_eval
    self.run_for_each_split_scheme()
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 50, in run_for_each_split_scheme
    self.eval()
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 66, in eval
    self.setup_eval_and_run(load_best_model=True, use_input_embedder=False)
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 70, in setup_eval_and_run
    eval_dict = self.get_eval_dict(load_best_model, untrained, untrained, use_input_embedder=use_input_embedder)
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 148, in get_eval_dict
    best_epoch, _ = pml_cf.latest_version(self.model_folder, best=True)
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/pytorch_metric_learning/utils/common_functions.py", line 327, in latest_version
    resume_epoch = max(version)
ValueError: max() arg is an empty sequence

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 46, in <module>
    r.run()
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/runners/single_experiment_runner.py", line 18, in run
    return self.run_new_experiment_or_resume(self.YR)
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/runners/single_experiment_runner.py", line 33, in run_new_experiment_or_resume
    return self.start_experiment(args)
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/runners/single_experiment_runner.py", line 22, in start_experiment
    run_output = api_parser.run()
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 17, in run
    return self.run_train_or_eval()
  File "/home/hyunseung/anaconda3/envs/dota/lib/python3.7/site-packages/powerful_benchmarker/api_parsers/base_api_parser.py", line 36, in run_train_or_eval
    raise ValueError
ValueError
```
Can you go to `<experiment_folder>/Test50_50_Partitions4_1/saved_models` and see if there are any models saved there? If yes, can you paste their names here or take a screenshot? If no, can you check one of the other folders, like `<experiment_folder>/Test50_50_Partitions4_0/saved_models`?
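For instance, from a shell (substituting your actual experiment folder for the placeholder):

```bash
ls <experiment_folder>/Test50_50_Partitions4_1/saved_models
```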
As you noted, I think the problem is that my 4_1 partition does not contain the "best" model files for some reason. I checked 4_0, 4_2, and 4_3, and they all contain the required files.
Hmm, that's strange. Can you open `Test50_50_Partitions4_1/saved_csvs/accuracies_normalized_compared_to_self_GlobalEmbeddingSpaceTester_level_0_VAL.csv` and paste the contents here?
Sure, this is what I got for partition 4_1.
It's strange, because I just ran train/val/test with triplet loss using the same dataset, and it worked fine.
Ok, I found the issue. The bug occurs when accuracy never improves over the initial model. (Epoch -1 is the initial trunk model, and epoch 0 is the initial trunk + embedder.) I've made a separate issue to address this: #49.
In the meantime, just try to find hyperparameters that actually improve accuracy over the 0th model, and testing should work. Alternatively, you can set `check_untrained_accuracy` to `False`, and that'll also work. But improving over the 0th model is probably what you want to do, and keeping the flag set to `True` allows you to check that.
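To illustrate why the traceback ends in "max() arg is an empty sequence": a "best" checkpoint is only written when validation accuracy improves, so if it never improves, the eval step has no best epoch to load. This is a minimal conceptual sketch of that failure mode, not the library's actual code, and the checkpoint file-name pattern is hypothetical:

```python
import glob
import os
import re

def latest_best_epoch(model_folder):
    # Conceptual sketch of latest_version(..., best=True):
    # collect the epoch numbers of all saved "best" checkpoints.
    epochs = []
    for path in glob.glob(os.path.join(model_folder, "*best*.pth")):
        match = re.search(r"best(\d+)", os.path.basename(path))
        if match:
            epochs.append(int(match.group(1)))
    # If training never beat the initial model, no "best" checkpoint
    # was written, epochs is empty, and max() raises
    # "ValueError: max() arg is an empty sequence".
    return max(epochs)
```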
Yes I will try to improve accuracy using different hyperparameters. Thanks again for everything!