brjathu / iTAML
Official implementation of "iTAML: An Incremental Task-Agnostic Meta-learning Approach", CVPR 2020.
Hi Jathushan,
Thanks for your awesome work.
Though I have a question about your paper. On page 3, in the paragraph titled "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is only updated for the i-th task. You mention earlier that you split the model into two parts: theta corresponds to the part that produces the feature vector v, while psi corresponds to the part that produces the predictions p. But in my understanding, these two parts form a single network, and when we train it, the backward pass updates both simultaneously. So I am confused about how you manage the separate updates; looking at Algorithm 1 on page 3 and the train function in the code, I can't see where this happens.
I wonder if I am missing something again; could you explain it for me, please?
Best.
C
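For what it's worth, one common way to get such a separate update inside a single backward pass is to mask the classifier's gradients so that only the rows belonging to the current task can change. The sketch below is an illustration under assumed shapes (a 2-task toy model with a `TASK_SLICES` mapping I made up), not the authors' actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical 2-task setup: a shared backbone (theta) and one classifier whose
# rows are partitioned per task (psi = [psi_1; psi_2], 2 classes each).
TASK_SLICES = {0: slice(0, 2), 1: slice(2, 4)}
backbone = nn.Linear(8, 16)      # theta: shared feature extractor
classifier = nn.Linear(16, 4)    # psi:   all task heads stacked row-wise

x = torch.randn(5, 8)
y = torch.randint(0, 2, (5,))    # task-0 labels (global classes 0 and 1)
task_id = 0

loss = F.cross_entropy(classifier(torch.relu(backbone(x))), y)
loss.backward()

# theta keeps its full gradient; psi's gradient is masked so that only the
# rows belonging to `task_id` can be updated by the optimizer step.
rows = TASK_SLICES[task_id]
w_mask = torch.zeros_like(classifier.weight.grad)
w_mask[rows] = 1.0
classifier.weight.grad *= w_mask
b_mask = torch.zeros_like(classifier.bias.grad)
b_mask[rows] = 1.0
classifier.bias.grad *= b_mask
```

After the masking, an ordinary `optimizer.step()` touches all of theta but only psi_i, even though the backward pass itself was run over the whole network.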
Hi Jathushan,
Thanks for your awesome work.
Though I have a question about your paper. In Algorithm 1 on page 3, line 3, we do e iterations over tasks [1, t] to update the model parameters.
This confuses me because, under the lifelong (incremental) learning setting, old-task data, except the exemplars saved in memory, must not be used again once we train the model on a new task. E.g., when task k is about to be trained, data from tasks 1 to k-1 should never be used again, except the exemplars stored in memory.
If we can iterate over old tasks using data other than the exemplars, the training method is joint training, not lifelong learning anymore.
I wonder if I am missing something; could you explain it for me, please?
Best.
C
I am curious how you compute the sum of the parameters of each task's network. Did you apply any normalization?
Each task seems to be a multi-class classification problem, so why not use nn.CrossEntropyLoss?
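As an illustration of the first question only (this is a generic technique, not necessarily what the repo does): per-task copies of a network are often combined as a weighted average of their `state_dict`s, with weights summing to 1 so the merged parameters stay on the same scale as each copy:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical task-adapted copies of the same architecture.
net_a, net_b = nn.Linear(4, 3), nn.Linear(4, 3)
weights = {"a": 0.5, "b": 0.5}  # normalized combination weights (assumption)

# Element-wise weighted average over every parameter tensor.
avg_state = {
    k: weights["a"] * net_a.state_dict()[k] + weights["b"] * net_b.state_dict()[k]
    for k in net_a.state_dict()
}
merged = nn.Linear(4, 3)
merged.load_state_dict(avg_state)
```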
Hi @brjathu, thanks for sharing your work! In Table 2 you report experiments on ImageNet-100 and ImageNet-1000. However, I cannot find details about how you construct the ImageNet-100 dataset for training, nor a training script like train_imagenet.py in the repo. Here are two questions:
Could you provide the training script for the ImageNet dataset (both for ImageNet-100 and ImageNet-1000) and also the details about how you construct the ImageNet-100 dataset?
Is there any difference between ImageNet-100 in your paper and the popular MiniImageNet as mentioned in Few-Shot Class-Incremental Learning (CVPR 2020)?
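On the first question, a typical way other class-incremental works build ImageNet-100 is to fix a random class order with a seed and keep 100 of the 1000 synsets. The sketch below uses placeholder synset ids and an assumed seed and selection rule; the paper's actual split may differ:

```python
import random

# Placeholder synset ids standing in for the real 1000 ImageNet class folders.
all_synsets = [f"n{idx:08d}" for idx in range(1000)]

# Reproducible 100-class subset (seed and rule are assumptions for illustration).
rng = random.Random(1993)
imagenet100 = sorted(rng.sample(all_synsets, 100))
```

The key property is reproducibility: anyone re-running the script with the same seed and the same class list recovers the same 100-class subset.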
I got one question, I want to ask if I understood it correctly.
During task inference, you need a continuum of samples assumed to belong to the same class. Doesn't this make the comparison with other methods unfair? What is your opinion?
The paper is awesome. By any chance, would you be able to provide the implementations of the other methods you used for benchmarking?
Thanks for the code, and for sharing the requirements and dependencies to make it easy for the community to run the model. I was able to run the code on a GPU-enabled Ubuntu 18.04 machine, but could not run it without a GPU. Is it possible to run the model without a GPU?
Thanks
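In general, the standard way to make a PyTorch script CPU-runnable is device-agnostic code: pick the device once and move everything with `.to(device)` instead of hard-coded `.cuda()` calls. A minimal sketch (whether the repo's scripts need further changes is an assumption; expect CPU training to be slow):

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move both the model and every input batch to the chosen device.
model = torch.nn.Linear(10, 2).to(device)
x = torch.randn(4, 10, device=device)
out = model(x)
```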
Thanks for open-sourcing your great work. I'm confused while reading the source code: the model returns two identical outputs, x1 and x2.
x1 = self.fc(x)
x2 = self.fc(x)
return x1, x2
In Learner, sometimes x1 is used, sometimes x2, and sometimes both, e.g.:
outputs2, outputs = model(inputs)
outputs2, _ = model(inputs)
_, outputs = meta_model(inputs)
What is the purpose of such a design?
Dear author:
I have read your paper and am interested in your task-agnostic setting. I see PHI = {theta, phi} where phi = [phi_1^T phi_2^T ...]^T and PHI_i = {theta, phi_i}, but when reading your code I could not find (maybe I missed it) where the parameter phi is separated into phi_1, phi_2, ... to create PHI_i for training the task-specific model in the inner loop.
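One possible answer (a sketch of a common implementation pattern, not a claim about this repo): phi for all tasks can live in a single linear layer, so phi_i exists only implicitly as a block of rows of the weight matrix, and PHI_i never appears as a separate module in the code:

```python
import torch
import torch.nn as nn

# Assumed toy sizes: 3 tasks, 2 classes per task, 16-d features from theta.
classes_per_task, num_tasks = 2, 3
fc = nn.Linear(16, classes_per_task * num_tasks)  # phi = [phi_1^T phi_2^T phi_3^T]^T

def phi_i(i):
    """Return the weight rows and bias entries that play the role of phi_i."""
    rows = slice(i * classes_per_task, (i + 1) * classes_per_task)
    return fc.weight[rows], fc.bias[rows]

w1, b1 = phi_i(1)  # parameters of the second task's head
```

Under this view, "separating phi into phi_1, phi_2, ..." is just indexing the output units of one classifier, which is easy to miss when searching the code for explicit per-task parameter objects.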
Hello,
Thanks for the amazing paper and your contribution to share your code.
I have an issue and would appreciate your help to solve it:
When running the 'train_cifar' code, I get an "index out of range" error at line 97:
start_sess=int(sys.argv[1])
When checking, len(sys.argv) == 1, which is why sys.argv[1] is out of range.
I changed that same line to sys.argv[0] and faced another error:
"invalid literal for int() with base 10: '/codepath/train_cifar.py'"
I have been stuck trying to solve this error and would appreciate your suggestions.
I am using Ubuntu 18.04 - nvidia gpu
Thank you
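The direct cause is that the script was launched without the session argument: it expects something like `python train_cifar.py 0`, and `sys.argv[0]` is always the script path itself, which is why `int(sys.argv[0])` fails. A defensive guard with a default (my own hypothetical helper, not part of the repo) avoids both crashes:

```python
import sys

def parse_start_sess(argv, default=0):
    """Read the session index from argv[1], falling back to a default."""
    if len(argv) < 2:
        print(f"no session index given, defaulting to {default}")
        return default
    return int(argv[1])

no_arg = parse_start_sess(["train_cifar.py"])         # missing argument case
with_arg = parse_start_sess(["train_cifar.py", "3"])  # normal invocation
```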
Hello,
When I run python3 train_cifar.py 0, it raises an error: "AttributeError: type object 'args' has no attribute 'overflow'"
Thanks to anyone who can help me.
The function 'add_classes' in basic_net.py is never called in the code, so is the number of final classification nodes (10 or 100) of the network fixed? Is there any other place in the code where the number of classifier nodes increases? Thanks!
Fig. 7 shows interesting task-accuracy and class-accuracy performance. I am wondering where to check those numbers, and where the size of the data continuum is set. Sorry, I couldn't find the corresponding part in the code; I might have missed something. Thanks.
Thanks for your work; I think it is a milestone in lifelong machine learning.
But when I run your code, the program does not seem to run correctly.
And the log file is here:
session_0
session_1
session_2
session_3
session_4
The only change I made to the program is start_sess = 0.
So why does the program behave like this?
Hello Jathushan Rajasegaran, thanks for your nice work. I have some questions about the implementation of the meta-train and meta-test parts of the model.
In the paper, the model consists of a meta-training process and an inference process, where the inference process consists of task prediction and class prediction. In the code, does the meta_test() function correspond to the inference process?
The meta_test() function consists of three parts. I guess the meta-training inside meta_test() is the adaptation process mentioned under inference, and the meta test without task knowledge is the task prediction. But why does the task prediction happen after the adaptation in the code? It seems that you directly use the task_id to do the adaptation and class prediction.
Command: python train_mnist 0 (start sess 0)
In my case, training goes well until Sess 2; the Sess 2 best accuracy was 81.06450013224115.
Here are my logs for Sess 3:
[2, 2, 2, 2, 2]
{'min_class': 6, 'max_class': 8, 'task': 3, 'max_task': 5, 'n_train_data': 14181, 'n_test_data': 8017}
{0: 2115, 1: 2042, 2: 1874, 3: 1986}
Epoch: [1 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0897 | top1: 52.8383 | top5: 52.8383
Processing |################################| (32/32) Total: 0:00:08 | Loss: 1.9010 | top1: 60.0225 | top1_task: 60.0225
50.638297872340424
46.86581782566112
83.51120597652081
61.37965760322256
{1: 993, 0: 78, 2: 492, 3: 465, 5: 777, 4: 788, 6: 298, 7: 921}
Epoch: [2 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0334 | top1: 57.3866 | top5: 57.3866
Processing |################################| (32/32) Total: 0:00:08 | Loss: 1.7808 | top1: 72.2714 | top1_task: 72.2714
74.18439716312056
67.58080313418218
76.46744930629669
71.09768378650554
{1: 1001, 0: 568, 3: 947, 2: 433, 4: 934, 5: 499, 6: 509, 7: 903}
Epoch: [3 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: 0.0134 | top1: 57.7886 | top5: 57.7886
Processing |################################| (32/32) Total: 0:00:08 | Loss: 2.3776 | top1: 50.8544 | top1_task: 50.8544
47.84869976359338
63.07541625857003
40.821771611526145
50.95669687814703
{1: 977, 0: 35, 3: 993, 2: 295, 4: 765, 7: 626, 6: 386}
Epoch: [4 | 20] LR: 0.100000 Sess: 3
Processing |################################| (56/56) | Total: 0:00:03 | Loss: nan | top1: 37.5855 | top5: 37.5855
Processing |###### | (6/32) Total: 0:00:02 | Loss: nan | top1: 45.8333 | top1_task: 45.8333^CTraceback (most recent call last):
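The loss turning nan between epochs 3 and 4 looks like divergence at LR 0.1 rather than a logic bug. Two generic mitigations worth trying (these are standard PyTorch techniques, not the authors' fix) are clipping the gradient norm and skipping any optimizer step whose loss is no longer finite:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and batch; in the real script this would be one training step.
model = nn.Linear(8, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))

opt.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
if torch.isfinite(loss):
    # Cap the total gradient norm before the update to stop it exploding.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
# else: drop this batch's update entirely instead of poisoning the weights.
```

Lowering the learning rate for later sessions is another common workaround for the same symptom.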
in incremental_dataloader.py:
class iCIFAR10(DataHandler):
base_dataset = datasets.cifar.CIFAR10
train_transforms = [
transforms.RandomCrop(32, padding=4),
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ColorJitter(brightness=63 / 255),
transforms.ToTensor(),
]
common_transforms = [
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
]
train_transforms has no Normalize step, so training inputs follow a different distribution from the normalized test inputs; this will break testing.