I am trying to reimplement MAML myself, but I found that implementing batch norm is tricky. Since the statistics of the data can change across inner-loop iterations as SGD updates the weights, it is hard to decide how beta and gamma (gamma is not used in this case because of the ReLU nonlinearity) should be updated. Furthermore, it is unclear which means and variances should be tracked for validation. Should the overall mean and variance be tracked across tasks and inner iterations? Or should the statistics be measured before the model is tuned to each task?
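To make the two candidate tracking strategies concrete, here is a minimal NumPy sketch (function name, momentum value, and shapes are my own, not from the MAML code):

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, x, momentum=0.9):
    """One exponential-moving-average update of batch-norm statistics
    from a batch x of shape (batch, features)."""
    moving_mean = momentum * moving_mean + (1 - momentum) * x.mean(axis=0)
    moving_var = momentum * moving_var + (1 - momentum) * x.var(axis=0)
    return moving_mean, moving_var

# Strategy A: call this only on the forward pass of the meta-initialization,
# i.e. track statistics *before* the model is tuned to each task.
# Strategy B: also call it once per inner-loop step on every task,
# i.e. track overall statistics across tasks and inner iterations.
mm, mv = update_moving_stats(np.zeros(3), np.ones(3), np.ones((4, 3)))
```

Either placement changes what "validation statistics" means, which is exactly the ambiguity above.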
I tried some of the possible strategies, but the results did not match the reported performance. The best performance was observed without batch norm, and even that was still only about 95% on the Omniglot dataset for the 5-way, 1-shot problem.
With this in mind, I looked into the code and found that the moving means and variances are not saved and restored. Inspecting the restored variables confirms this:
Restoring model weights from logs/omniglot5way//cls_5.mbs_32.ubs_1.numstep1.updatelr0.4batchnorm/model2000
[array([-0.01856588, -0.07139841, 0.00564138, 0.01990231, -0.00762643,
-0.05611767, 0.07031356, -0.03621985, 0.05920702, 0.12788662,
0.04555263, -0.06217157, -0.07977205, 0.01632672, -0.03578645,
0.10676555, -0.04455299, -0.0573478 , 0.52247691, -0.05695038,
0.14302482, -0.07892933, -0.02123305, 0.01870824, 0.01471483,
-0.06067625, 0.097821 , -0.05786318, 0.03801388, -0.04843186,
-0.01786073, 0.0293963 , 0.56441385, 0.07509601, 0.11491237,
0.01052142, 0.23142786, 0.03433308, -0.05783347, -0.0444839 ,
0.02227049, -0.02804896, -0.04594825, 0.05347209, -0.0399643 ,
0.02923759, 0.1299762 , -0.02817831, -0.0735756 , -0.0284342 ,
-0.04498725, -0.05203079, -0.04267518, -0.03341504, -0.05648317,
-0.02747083, -0.03525382, 0.34740165, -0.00822794, 0.03952603,
0.03410957, 0.29954502, -0.01362322, -0.04790628], dtype=float32), array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32), array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)]
It looks like only the beta variables are updated (the moving mean is all zeros and the moving variance is all ones), so it seems the batch norm layers effectively only shift the activations.
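For reference, the behaviour consistent with those untouched moving averages is batch norm that always normalizes with the current batch's statistics (the moving mean and variance never leave their 0/1 initialization). A minimal NumPy sketch, with names and shapes of my own choosing:

```python
import numpy as np

def batch_norm_batch_stats(x, beta, eps=1e-5):
    """Batch norm using only the current batch's statistics (no moving
    averages), with gamma unused: normalize to zero mean / unit variance,
    then shift by the learned beta. x: (batch, features); beta: (features,)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return x_hat + beta

# After normalization, the per-feature mean equals beta and the variance is ~1.
x = np.random.randn(8, 4) * 3.0 + 2.0
beta = np.array([0.5, -0.5, 0.0, 1.0])
out = batch_norm_batch_stats(x, beta)
```

Under this scheme the shifting-only observation makes sense: with ReLU networks gamma is dropped, so beta is the only learnable batch norm parameter left.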