brjathu / iTAML
Official implementation of "iTAML: An Incremental Task-Agnostic Meta-learning Approach", CVPR 2020.
Hi Jathushan,
Thanks for your awesome work.
Though I have a question about your paper. On page 3, in the paragraph titled "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is only updated for the i-th task. You mention earlier that you split the model into two parts: theta corresponds to the part that produces the feature vector v, while psi corresponds to the part that produces the predictions p. But in my understanding, these two parts form a single network, and when we train it, the backward pass updates both simultaneously. So I am confused about how you manage the separate updates; looking at Algorithm 1 on page 3 and the train function in the code, I can't see where this happens.
I wonder if I am missing something again; could you explain it for me, please?
Best.
C
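For what it's worth, one common way to get such a separate update inside a single backward pass is to mask the classifier's gradients so that only the rows belonging to the current task can change. The sketch below is an illustration under assumed shapes (a 2-task toy model with a `TASK_SLICES` mapping I made up), not the authors' actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical 2-task setup: a shared backbone (theta) and one classifier whose
# rows are partitioned per task (psi = [psi_1; psi_2], 2 classes each).
TASK_SLICES = {0: slice(0, 2), 1: slice(2, 4)}
backbone = nn.Linear(8, 16)      # theta: shared feature extractor
classifier = nn.Linear(16, 4)    # psi:   all task heads stacked row-wise

x = torch.randn(5, 8)
y = torch.randint(0, 2, (5,))    # task-0 labels (global classes 0 and 1)
task_id = 0

loss = F.cross_entropy(classifier(torch.relu(backbone(x))), y)
loss.backward()

# theta keeps its full gradient; psi's gradient is masked so that only the
# rows belonging to `task_id` can be updated by the optimizer step.
rows = TASK_SLICES[task_id]
w_mask = torch.zeros_like(classifier.weight.grad)
w_mask[rows] = 1.0
classifier.weight.grad *= w_mask
b_mask = torch.zeros_like(classifier.bias.grad)
b_mask[rows] = 1.0
classifier.bias.grad *= b_mask
```

After the masking, an ordinary `optimizer.step()` touches all of theta but only psi_i, even though the backward pass itself was run over the whole network.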
Hi Jathushan,
Thanks for your awesome work.
Though I have a question about your paper. In Algorithm 1 on page 3, line 3, we do e iterations over tasks [1, t] to update the model parameters.
This confuses me because, under the lifelong (incremental) learning setting, old-task data, except the exemplars saved in memory, must not be used again once we train the model on a new task. E.g., when task k is about to be trained, data from tasks 1 to k-1 should never be used again, except the exemplars stored in memory.
If we can iterate over old tasks using data other than the exemplars, the training method is joint training, not lifelong learning anymore.
I wonder if I am missing something; could you explain it for me, please?
Best.
C
I am curious how you compute the sum of the parameters of each task's network. Did you apply any normalization?
Each task seems to be a multi-class classification problem, so why not use nn.CrossEntropyLoss?
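As an illustration of the first question only (this is a generic technique, not necessarily what the repo does): per-task copies of a network are often combined as a weighted average of their `state_dict`s, with weights summing to 1 so the merged parameters stay on the same scale as each copy:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical task-adapted copies of the same architecture.
net_a, net_b = nn.Linear(4, 3), nn.Linear(4, 3)
weights = {"a": 0.5, "b": 0.5}  # normalized combination weights (assumption)

# Element-wise weighted average over every parameter tensor.
avg_state = {
    k: weights["a"] * net_a.state_dict()[k] + weights["b"] * net_b.state_dict()[k]
    for k in net_a.state_dict()
}
merged = nn.Linear(4, 3)
merged.load_state_dict(avg_state)
```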
Hi @brjathu, thanks for sharing your work! In Table 2 you report experiments on ImageNet-100 and ImageNet-1000. However, I cannot find details about how you construct the ImageNet-100 dataset for training, nor a training script like train_imagenet.py in the repo. Here are two questions:
Could you provide the training script for the ImageNet dataset (both for ImageNet-100 and ImageNet-1000) and also the details about how you construct the ImageNet-100 dataset?
Is there any difference between ImageNet-100 in your paper and the popular MiniImageNet as mentioned in Few-Shot Class-Incremental Learning (CVPR 2020)?
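On the first question, a typical way other class-incremental works build ImageNet-100 is to fix a random class order with a seed and keep 100 of the 1000 synsets. The sketch below uses placeholder synset ids and an assumed seed and selection rule; the paper's actual split may differ:

```python
import random

# Placeholder synset ids standing in for the real 1000 ImageNet class folders.
all_synsets = [f"n{idx:08d}" for idx in range(1000)]

# Reproducible 100-class subset (seed and rule are assumptions for illustration).
rng = random.Random(1993)
imagenet100 = sorted(rng.sample(all_synsets, 100))
```

The key property is reproducibility: anyone re-running the script with the same seed and the same class list recovers the same 100-class subset.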
I got one question, I want to ask if I understood it correctly.
During task inference, you need a continuum of samples assumed to belong to the same class. Doesn't this make the comparison with other methods unfair? What is your opinion?
The paper is awesome. By any chance, would you be able to provide the implementations of the other methods you used for benchmarking?
Thanks for the code, and for sharing the requirements and dependencies to make it easy for the community to run the model. I was able to run the code on a GPU-enabled Ubuntu 18.04 machine, but could not run it without a GPU. Is it possible to run the model without a GPU?
Thanks
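In general, the standard way to make a PyTorch script CPU-runnable is device-agnostic code: pick the device once and move everything with `.to(device)` instead of hard-coded `.cuda()` calls. A minimal sketch (whether the repo's scripts need further changes is an assumption; expect CPU training to be slow):

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move both the model and every input batch to the chosen device.
model = torch.nn.Linear(10, 2).to(device)
x = torch.randn(4, 10, device=device)
out = model(x)
```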
Thanks for open-sourcing your great work. I'm confused while reading the source code: the model returns two identical outputs, x1 and x2.
x1 = self.fc(x)
x2 = self.fc(x)
return x1, x2
In Learner, sometimes x1 is used, sometimes x2, and sometimes both, e.g.:
outputs2, outputs = model(inputs)
outputs2, _ = model(inputs)
_, outputs = meta_model(inputs)
What is the purpose of such a design?
Dear author:
I have read your paper and am interested in your task-agnostic setting. I see PHI = {theta, phi} where phi = [phi_1^T phi_2^T ...]^T and PHI_i = {theta, phi_i}, but when reading your code I could not find (maybe I missed it) where the parameter phi is separated into phi_1, phi_2, ... to create PHI_i for training the task-specific model in the inner loop.
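One possible answer (a sketch of a common implementation pattern, not a claim about this repo): phi for all tasks can live in a single linear layer, so phi_i exists only implicitly as a block of rows of the weight matrix, and PHI_i never appears as a separate module in the code:

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 3 tasks, 2 classes per task, 16-d features from theta.
classes_per_task, num_tasks = 2, 3
fc = nn.Linear(16, classes_per_task * num_tasks)  # phi = [phi_1^T phi_2^T phi_3^T]^T

def phi_i(i):
    """Return the weight rows and bias entries that play the role of phi_i."""
    rows = slice(i * classes_per_task, (i + 1) * classes_per_task)
    return fc.weight[rows], fc.bias[rows]

w1, b1 = phi_i(1)  # parameters of the second task's head
```

Under this view, "separating phi into phi_1, phi_2, ..." is just indexing the output units of one classifier, which is easy to miss when searching the code for explicit per-task parameter objects.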
Hello,
Thanks for the amazing paper and your contribution to share your code.
I have an issue and would appreciate your help to solve it:
When running the 'train_cifar' code, I get an "index out of range" error at line 97:
start_sess=int(sys.argv[1])
When checking, len(sys.argv) == 1, which is why sys.argv[1] is out of range.
I changed that same line to sys.argv[0] and faced another error:
"invalid literal for int() with base 10: '/codepath/train_cifar.py'"
I have been stuck trying to solve this error and would appreciate your suggestions.
I am using Ubuntu 18.04 - nvidia gpu
Thank you
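The direct cause is that the script was launched without the session argument: it expects something like `python train_cifar.py 0`, and `sys.argv[0]` is always the script path itself, which is why `int(sys.argv[0])` fails. A defensive guard with a default (my own hypothetical helper, not part of the repo) avoids both crashes:

```python
import sys

def parse_start_sess(argv, default=0):
    """Read the session index from argv[1], falling back to a default."""
    if len(argv) < 2:
        print(f"no session index given, defaulting to {default}")
        return default
    return int(argv[1])

no_arg = parse_start_sess(["train_cifar.py"])         # missing argument case
with_arg = parse_start_sess(["train_cifar.py", "3"])  # normal invocation
```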
Hello,
When I run python3 train_cifar.py 0, it raises an error: "AttributeError: type object 'args' has no attribute 'overflow'"
Thanks to anyone who can help me.
The function 'add_classes' in basic_net.py is never called in the code, so is the number of final classification nodes (10 or 100) of the network fixed? Is there any other place in the code where the number of classifier nodes increases? Thanks!
Fig. 7 shows interesting task-accuracy and class-accuracy performance. I am wondering where to check those numbers, and where the size of the data continuum is set. Sorry, I couldn't find the corresponding part in the code; I might have missed something. Thanks.
Thanks for your work; I think it is a milestone in lifelong machine learning.
But when I run your code, the program does not seem to run correctly.
And the log file is here:
session_0
session_1
session_2
session_3
session_4
The only change I made to the program is start_sess = 0.
So why does the program behave like this?
Hello Jathushan Rajasegaran, thanks for your nice work. I have some questions about the implementation of the meta-train and meta-test parts of the model.
In the paper, the model consists of a meta-training process and an inference process, where the inference process consists of task prediction and class prediction. In the code, does the meta_test() function correspond to the inference process?
The meta_test() function consists of three parts. I guess the meta-training inside meta_test() is the adaptation process mentioned under inference, and the meta test without task knowledge is the task prediction. But why does the task prediction happen after the adaptation in the code? It seems that you directly use the task_id to do the adaptation and class prediction.
Command: python train_mnist 0 (start sess 0)
In my case, training goes well until Sess 2; the Sess 2 best accuracy was 81.06450013224115.
Here are my logs for Sess 3:
[2, 2, 2, 2, 2]
{'min_class': 6, 'max_class': 8, 'task': 3, 'max_task': 5, 'n_train_data': 14181, 'n_test_data': 8017}
{0: 2115, 1: 2042, 2: 1874, 3: 1986}
Epoch: [1 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0897 | top1: 52.8383 | top5: 52.8383
Processing |################################| (32/32) Total: 0:00:08 | Loss: 1.9010 | top1: 60.0225 | top1_task: 60.0225
50.638297872340424
46.86581782566112
83.51120597652081
61.37965760322256
{1: 993, 0: 78, 2: 492, 3: 465, 5: 777, 4: 788, 6: 298, 7: 921}
Epoch: [2 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0334 | top1: 57.3866 | top5: 57.3866
Processing |################################| (32/32) Total: 0:00:08 | Loss: 1.7808 | top1: 72.2714 | top1_task: 72.2714
74.18439716312056
67.58080313418218
76.46744930629669
71.09768378650554
{1: 1001, 0: 568, 3: 947, 2: 433, 4: 934, 5: 499, 6: 509, 7: 903}
Epoch: [3 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0134 | top1: 57.7886 | top5: 57.7886
Processing |################################| (32/32) Total: 0:00:08 | Loss: 2.3776 | top1: 50.8544 | top1_task: 50.8544
47.84869976359338
63.07541625857003
40.821771611526145
50.95669687814703
{1: 977, 0: 35, 3: 993, 2: 295, 4: 765, 7: 626, 6: 386}
Epoch: [4 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: nan | top1: 37.5855 | top5: 37.5855
Processing |###### | (6/32) Total: 0:00:02 | Loss: nan | top1: 45.8333 | top1_task: 45.8333^CTraceback (most recent call last):
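The loss turning nan between epochs 3 and 4 looks like divergence at LR 0.1 rather than a logic bug. Two generic mitigations worth trying (these are standard PyTorch techniques, not the authors' fix) are clipping the gradient norm and skipping any optimizer step whose loss is no longer finite:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and batch; in the real script this would be one training step.
model = nn.Linear(8, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))

opt.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
if torch.isfinite(loss):
    # Cap the total gradient norm before the update to stop it exploding.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
# else: drop this batch's update entirely instead of poisoning the weights.
```

Lowering the learning rate for later sessions is another common workaround for the same symptom.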
in incremental_dataloader.py:
class iCIFAR10(DataHandler):
base_dataset = datasets.cifar.CIFAR10
train_transforms = [
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=63 / 255),
transforms.ToTensor(),
]
common_transforms = [
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
]
train_transforms has no Normalize step, so training inputs follow a different distribution from the normalized test inputs; this will break testing.