Comments (11)
yes, that's what I commented earlier:
could this be related to your path system? I noticed that your downloading of CIFAR-100 goes to ../data\cifar100\cifar-100-python.tar.gz, which uses both / and \. You can change the data path in dataset_config.py, and the results path in main_incremental.py with the --results-path argument.
Seems like it might be an issue with Windows? The path seems to be wrong. But also seems like the system doesn't complain, so maybe it loads an empty dataset? Which you could check with this:
the dataset is empty? It seems that the vector of targets contains a single value instead of the LongTensor that is expected by the CE loss. You could check that by setting a break point on line 170 of incremental_learning.py. Check if the 'targets' variable has a list of labels for the batch.
Could you try it and let me know what you get? Also, --network resnet34 should be --network resnet32 if you use small input datasets such as CIFAR-100.
from facil.
Yeah, in Linux it should work fine. Answering your questions:
Are the exemplars defined as the number of data used to retrain the model during rehearsal?
Yes, the exemplars are the number of data/images that will be used during rehearsal.
Is the exemplar size enforced during initial training for the encountered labels?
I'm not sure if I understand the question. The exemplars are selected from the training data of that task at the end of its training session.
For fixed memory, will the number of data per class depend on how many labels have been encountered or all the labels that will ever be encountered?
It depends on the labels encountered. The framework's main comparison strength is to enforce that the incremental learning is done without knowledge of or access to future tasks/labels, as in a realistic scenario. In the case of fixed memory, you have a buffer of X images that is updated after learning each task. Since it is fixed, as more classes are learned, fewer exemplars per class are available.
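To make the fixed-memory behaviour concrete, here is a minimal sketch of how the per-class budget shrinks as classes accumulate. It assumes the buffer is split evenly across the classes seen so far using floor division; the function name, the buffer size of 2000 and the class counts are illustrative, and the framework's actual allocation may differ.

```python
def exemplars_per_class(memory_size: int, classes_seen: int) -> int:
    """Evenly split a fixed exemplar buffer over the classes seen so far.

    Assumption: equal share per class, rounded down. This is an
    illustration, not the framework's own selection code.
    """
    if classes_seen <= 0:
        raise ValueError("classes_seen must be positive")
    return memory_size // classes_seen


memory_size = 2000  # hypothetical fixed buffer of X images
# Classes seen after each task in a 50-10-10-10-10-10 split.
seen_after_each_task = [50, 60, 70, 80, 90, 100]

budget = [exemplars_per_class(memory_size, n) for n in seen_after_each_task]
print(budget)  # [40, 33, 28, 25, 22, 20]
```

With growing memory the per-class number stays constant instead, so the total buffer grows with the number of classes seen.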
How do you set the initial number of classes and step size between tasks?
short answer: Since most scenarios divide the number of classes equally among tasks, that is the default setting.
longer answer: The arg --nc-first-task allows you to define a larger first task. And providing a list of datasets allows each of them to be learned one after the other. If you want another partition of a dataset, you could either define the parts as separate datasets of the desired length, or modify the corresponding dataset code. I recommend the first, since it can be defined entirely in dataset_config.py by making use of the class_order entry.
Could you elaborate on the role of grid search?
GridSearch was the name we gave it at the beginning, and later we adapted it to the Continual Hyperparameter Search defined in "Class-incremental learning: survey and performance evaluation on image classification" and in "A continual learning survey: Defying forgetting in classification tasks". We plan on changing the naming since I agree that it is confusing. In short, it allows choosing the main hyperparameter related to stability-plasticity (aka intransigence-forgetting) at each task without knowledge of future tasks.
How can one set the scenario in which for cifar100 we start with 50 classes, have a step size of 10 and there is either fixed or growing memory, or there is access to the full data set during rehearsal (i.e. retraining)?
You would use --datasets cifar100 with --nc-first-task 50 --num-tasks 6 (instead of steps you define the number of tasks: 50-10-10-10-10-10). For fixed memory you would use --num-exemplars X and for growing memory --num-exemplars-per-class X. To have access to all data, you can check the joint.py baseline.
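As a rough illustration of how those two arguments translate into task sizes, the sketch below computes the number of classes per task. The function name is hypothetical, and giving leftover classes to the earliest tasks when the division is not exact is an assumption, not necessarily what the framework does.

```python
from typing import List, Optional


def classes_per_task(num_classes: int, num_tasks: int,
                     nc_first_task: Optional[int] = None) -> List[int]:
    """Return the number of classes assigned to each task.

    Assumption: classes are divided equally; any remainder goes to the
    earliest tasks. The real framework may handle remainders differently.
    """
    if nc_first_task is None:
        base, rem = divmod(num_classes, num_tasks)
        return [base + (1 if i < rem else 0) for i in range(num_tasks)]

    later_tasks = num_tasks - 1
    remaining = num_classes - nc_first_task
    if later_tasks <= 0 or remaining < later_tasks:
        raise ValueError("not enough classes left for the remaining tasks")
    base, rem = divmod(remaining, later_tasks)
    return [nc_first_task] + [base + (1 if i < rem else 0)
                              for i in range(later_tasks)]


default_split = classes_per_task(100, 4)
large_first_split = classes_per_task(100, 6, nc_first_task=50)
print(default_split)      # [25, 25, 25, 25]
print(large_first_split)  # [50, 10, 10, 10, 10, 10]
```

The first result matches the default 4-task run in the logs, where each task gets 25 classes.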
from facil.
Thanks for your answers
from facil.
I am successfully able to run your code on WSL2 on Windows 11. Thanks!
from facil.
In case this is useful to anyone, I found the same error on windows and it was fixed by forcing the targets to be ".long()"
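For anyone wanting to see what that cast does, here is a minimal sketch with a made-up batch. It assumes PyTorch is installed; the tensor shapes and label values are illustrative and are not taken from the repository. One possible explanation, not confirmed in this thread, is that the labels arrive as 32-bit integers on Windows, while the cross-entropy loss expects 64-bit (Long) class indices.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 4 samples scored over 25 classes
# (the size of one task head in the default 4-task CIFAR-100 split).
outputs = torch.randn(4, 25)

# Labels stored as 32-bit integers, mimicking the dtype from the error message.
targets = torch.tensor([3, 7, 0, 24], dtype=torch.int32)

# Cast to int64 (Long), the dtype cross_entropy expects for class indices.
targets = targets.long()

loss = F.cross_entropy(outputs, targets)
print(targets.dtype, loss.item())
```

In the traceback from this issue, the equivalent place for the cast would be wherever targets is passed to the criterion, but where exactly to put it is a judgment call.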
from facil.
There is no line 63 in finetuning.py (see code), and there is no final_target variable in the corresponding criterion() function. I would need more insight into which modifications you have added to be able to guess where the error comes from.
from facil.
The two lines and 'final_target' were added for debugging purposes only.
I have set up my virtual environment and cloned the repo again, and the error still persists. Please find the attached JPG.
from facil.
I just cloned the repository from scratch on a fresh machine and didn't get that error (output pasted below). So I'm not sure how to reproduce your error. Some possibilities that come to mind could be:
- the dataset is empty? It seems that the vector of targets contains a single value instead of the LongTensor that is expected by the CE loss. You could check that by setting a break point on line 170 of incremental_learning.py. Check if the 'targets' variable has a list of labels for the batch.
- could this be related to your path system? I noticed that your downloading of CIFAR-100 goes to ../data\cifar100\cifar-100-python.tar.gz, which uses both / and \. You can change the data path in dataset_config.py, and the results path in main_incremental.py with the --results-path argument.
- if it is something else, you could still force targets to be of the correct type by using targets.long()?
(base) mmasana@XXX:~/libraries$ git clone https://github.com/mmasana/FACIL.git
Cloning into 'FACIL'...
remote: Enumerating objects: 101, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 101 (delta 0), reused 0 (delta 0), pack-reused 98
Receiving objects: 100% (101/101), 7.62 MiB | 14.58 MiB/s, done.
Resolving deltas: 100% (29/29), done.
(base) mmasana@XXX:~/libraries$ cd FACIL/
(base) mmasana@XXX:~/libraries/FACIL$ ls
docs environment.yml LICENSE README.md requirements.txt scripts src
(base) mmasana@XXX:~/libraries/FACIL$ python3 -u src/main_incremental.py
=========================================================
Arguments =
approach: finetuning
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet32
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
==========================================================
Approach arguments =
all_outputs: False
==========================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
==========================================================
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data/cifar100/cifar-100-python.tar.gz
100%|#########9| 168468480/169001437 [00:16<00:00, 6700375.69it/s]Extracting ../data/cifar100/cifar-100-python.tar.gz to ../data/cifar100
Files already downloaded and verified
[(0, 25), (1, 25), (2, 25), (3, 25)]
************************************************************************************************************
Task 0
************************************************************************************************************
| Epoch 1, time= 6.3s | Train: skip eval | Valid: time= 0.5s loss=2.688, TAw acc= 19.4% | *
_(program continues on until completion)_
from facil.
I am getting the same error. All I did was:
python -u src/main_incremental.py --approach finetuning --network resnet34
I am running this on windows 10 powershell.
Output:
(LIGN_test) PS F:\dev\Projects\LIGN\.rug\FACIL> python -u src/main_incremental.py --approach finetuning --network resnet34
============================================================================================================
Arguments =
approach: finetuning
batch_size: 64
clipping: 10000
datasets: ['cifar100']
eval_on_train: False
exp_name: None
fix_bn: False
gpu: 0
gridsearch_tasks: -1
keep_existing_head: False
last_layer_analysis: False
log: ['disk']
lr: 0.1
lr_factor: 3
lr_min: 0.0001
lr_patience: 5
momentum: 0.0
multi_softmax: False
nc_first_task: None
nepochs: 200
network: resnet34
no_cudnn_deterministic: False
num_tasks: 4
num_workers: 4
pin_memory: False
pretrained: False
results_path: ../results
save_models: False
seed: 0
stop_at_task: 0
use_valid_only: False
warmup_lr_factor: 1.0
warmup_nepochs: 0
weight_decay: 0.0
============================================================================================================
Approach arguments =
all_outputs: False
============================================================================================================
Exemplars dataset arguments =
exemplar_selection: random
num_exemplars: 0
num_exemplars_per_class: 0
============================================================================================================
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data\cifar100\cifar-100-python.tar.gz
100.0%
Extracting ../data\cifar100\cifar-100-python.tar.gz to ../data\cifar100
Files already downloaded and verified
[(0, 25), (1, 25), (2, 25), (3, 25)]
************************************************************************************************************
Task 0
************************************************************************************************************
C:\Users\josue\anaconda3\envs\LIGN_test\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
File "src/main_incremental.py", line 316, in <module>
main()
File "src/main_incremental.py", line 264, in main
appr.train(t, trn_loader[t], val_loader[t])
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 56, in train
self.train_loop(t, trn_loader, val_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\finetuning.py", line 52, in train_loop
super().train_loop(t, trn_loader, val_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 111, in train_loop
self.train_epoch(t, trn_loader)
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\incremental_learning.py", line 171, in train_epoch
loss = self.criterion(t, outputs, targets.to(self.device))
File "F:\dev\Projects\LIGN\.rug\FACIL\src\approach\finetuning.py", line 61, in criterion
return torch.nn.functional.cross_entropy(outputs[t], targets - self.model.task_offset[t])
File "C:\Users\josue\anaconda3\envs\LIGN_test\lib\site-packages\torch\nn\functional.py", line 2824, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward
from facil.
One difference I noticed between your output and mine is the following:
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data/cifar100/cifar-100-python.tar.gz
vs
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ../data\cifar100\cifar-100-python.tar.gz
File path format is different.
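The difference can be reproduced on any OS with the standard library, which is a quick way to see where the mixed separators come from. This is a sketch that assumes the path is built by joining a root written with a forward slash; it does not show how the repository actually builds it. Windows generally accepts both separators, so the mixed form on its own may well be harmless.

```python
import ntpath      # Windows path rules, usable on any OS
import posixpath   # Linux/macOS path rules

root = "../data"  # data root written with a forward slash, as in the logs
parts = ("cifar100", "cifar-100-python.tar.gz")

# Joining with Windows rules keeps the existing "/" and adds "\" separators.
windows_path = ntpath.join(root, *parts)
# Joining with POSIX rules uses "/" throughout.
linux_path = posixpath.join(root, *parts)
# normpath with Windows rules converts every separator to "\".
normalized = ntpath.normpath(windows_path)

print(windows_path)  # ../data\cifar100\cifar-100-python.tar.gz
print(linux_path)    # ../data/cifar100/cifar-100-python.tar.gz
print(normalized)    # ..\data\cifar100\cifar-100-python.tar.gz
```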
from facil.
Got it working in Linux with no issues. I will check in windows in the future and let you know.
Btw I had a few questions about your implementations:
- Are the exemplars defined as the number of data used to retrain the model during rehearsal?
- Is the exemplar size enforced during initial training for the encountered labels?
- For fixed memory, will the number of data per class depend on how many labels have been encountered or all the labels that will ever be encountered?
- How do you set the initial number of classes and step size between tasks?
- Could you elaborate on the role of grid search?
- How can one set the scenario in which for cifar100 we start with 50 classes, have a step size of 10 and there is either fixed or growing memory, or there is access to the full data set during rehearsal (i.e. retraining)?
Feel free to let me know if you would like me to elaborate on any of the questions
from facil.