cbfinn / maml
Code for "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
License: MIT License
Hello, Chelsea.
It seems that when training MAML on Omniglot and miniImagenet, each batch contains meta_batch_size tasks. These tasks share the same network parameters, and each task has labels 0~4 (5-way), but the classes differ between tasks. My question is: why does it make sense to do this?
For the regression task, it seems that MAML method has 20 data points for training but baseline method only has 10 in one task:
self.pretrain_op = tf.train.AdamOptimizer(self.meta_lr).minimize(total_loss1)
self.metatrain_op = optimizer.apply_gradients(gvs)
MAML uses both inputa and inputb for training, but baseline method only used inputa.
Hi, I'm working on the source code of MAML, Thanks for your sharing.
# to initialize the batch norm variables, might want to combine this, and not run idx 0 twice.
unused = meta_task((self.support_x[0], self.query_x[0], self.support_y[0], self.query_y[0]), False)
I don't understand why this line is needed, since its result is unused.
task_outputa = self.forward(inputa, weights, reuse=reuse) # only reuse on the first iter
This seems to be the only line that does not reuse variables. I don't understand why this line should NOT reuse them.
Thanks.
Online MAML (FTML) code coming up?
Hello,
I am trying to reimplement MAML myself, but I found that implementing batch norm is tricky. Since the statistics of the data can change over the internal iterations (inside the inner loop) under SGD, it is hard to decide how beta and gamma (gamma is not used in this case because of the ReLU nonlinearity) should be updated. Furthermore, it is unclear which means and variances should be tracked for validation. Should the overall mean and variance be tracked across tasks and internal iterations? Or should the statistics be measured before the model is tuned to each task?
I tried several of the possible strategies, but the results did not match the reported performance. The best performance was observed without batch norm, but it was still only 95% on the Omniglot dataset for the 5-way, 1-shot problem.
With this in mind, I looked at the code and found that the moving means and variances are not saved and restored. I could not find:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
so, I checked the saved model with the code:
# from main.py
if FLAGS.test_iter > 0:
model_file = model_file[:model_file.index('model')] + 'model' + str(FLAGS.test_iter)
if model_file:
ind1 = model_file.index('model')
resume_itr = int(model_file[ind1+5:])
print("Restoring model weights from " + model_file)
with tf.variable_scope("", reuse=True):
b = tf.get_variable("model/0/beta",)
mm = tf.get_variable("model/0/moving_mean",)
mv = tf.get_variable("model/0/moving_variance")
print( sess.run([b,mm,mv]) )
and the result was
Restoring model weights from logs/omniglot5way//cls_5.mbs_32.ubs_1.numstep1.updatelr0.4batchnorm/model2000
[array([-0.01856588, -0.07139841, 0.00564138, 0.01990231, -0.00762643,
-0.05611767, 0.07031356, -0.03621985, 0.05920702, 0.12788662,
0.04555263, -0.06217157, -0.07977205, 0.01632672, -0.03578645,
0.10676555, -0.04455299, -0.0573478 , 0.52247691, -0.05695038,
0.14302482, -0.07892933, -0.02123305, 0.01870824, 0.01471483,
-0.06067625, 0.097821 , -0.05786318, 0.03801388, -0.04843186,
-0.01786073, 0.0293963 , 0.56441385, 0.07509601, 0.11491237,
0.01052142, 0.23142786, 0.03433308, -0.05783347, -0.0444839 ,
0.02227049, -0.02804896, -0.04594825, 0.05347209, -0.0399643 ,
0.02923759, 0.1299762 , -0.02817831, -0.0735756 , -0.0284342 ,
-0.04498725, -0.05203079, -0.04267518, -0.03341504, -0.05648317,
-0.02747083, -0.03525382, 0.34740165, -0.00822794, 0.03952603,
0.03410957, 0.29954502, -0.01362322, -0.04790628], dtype=float32), array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32), array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)]
It looks like only the beta variables are updated, so it seems that all the batchnorm layers do is shift.
Any thoughts?
Thanks!
When I run the testing code after training as follows:
python main.py --datasource=miniimagenet --metatrain_iterations=60000 --meta_batch_size=4 --update_batch_size=1 --update_lr=0.01 --num_updates=5 --num_classes=5 --logdir=log/miniimagenet1shot/ --num_filters=32 --max_pool=True
python main.py --datasource=miniimagenet --metatrain_iterations=60000 --meta_batch_size=2 --update_batch_size=5 --update_lr=0.01 --num_updates=5 --num_classes=5 --logdir=logs/miniimagenet5shot/ --num_filters=32 --max_pool=True
I encountered several problems:
1). It seems that lines 194-197 in main.py need indentation, and L290 in main.py needs to be commented out for testing.
2). I followed the default code and settings, but only get 46.90% and 61.03% for 1-shot and 5-shot on the test set, rather than the 48.70% and 63.11% reported in the paper. Am I missing any training or testing tricks?
I ran on the miniImagenet dataset with TensorFlow v1.2 and encountered this problem:
File "main.py", line 33, in <module>
from maml import MAML
File "/home/yanbin/maml/maml.py", line 3, in <module>
import special_grads
File "/home/yanbin/maml/special_grads.py", line 6, in <module>
@ops.RegisterGradient("MaxPoolGrad")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1750, in __call__
_gradient_registry.register(f, self._op_type)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/registry.py", line 62, in register
(self._name, name, function_name, filename, line_number))
KeyError: "Registering two gradient with name 'MaxPoolGrad' !(Previous registration was in <module> /usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py:77)"
Does this mean the tensorflow v1.2 already implemented MaxPoolGrad?
Hi, I am a little confused about the image.zip that is mentioned in https://github.com/cbfinn/maml/blob/master/data/miniImagenet/proc_images.py#L4. Is it from ILSVRC2012_img_train.tar?
many thanks,
Max
Hi, I was trying to test the model on Omniglot dataset, I downloaded and processed the dataset as instructed, but when I run main.py using the Usage Instruction for 5-way 1-shot omniglot and 20-way 1-shot omniglot , I encounter the following error message:
Generating filenames
Generating image processing ops
Batching images
Manipulating image data to be right shape
Generating filenames
Traceback (most recent call last):
File "main.py", line 350, in <module>
main()
File "main.py", line 280, in main
image_tensor, label_tensor = data_generator.make_data_tensor(train=False)
File "/home/rvl224/maml/data_generator.py", line 99, in make_data_tensor
sampled_character_folders = random.sample(folders, self.num_classes)
File "/home/rvl224/anaconda2/lib/python2.7/random.py", line 323, in sample
raise ValueError("sample larger than population")
ValueError: sample larger than population
It seems that the folders list in
sampled_character_folders = random.sample(folders, self.num_classes)
is empty when generating the metaval_input_tensors in main.py.
I am using Ubuntu 14.04, Anaconda Python 2.7*, and TF 1.0, with the latest numpy.
I really appreciate you open-sourcing this amazing work,
Max.
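The ValueError in the traceback above is what random.sample raises whenever the requested sample size exceeds the population, so an empty folders list would reproduce it. Below is a minimal sketch of a guard that fails with a more descriptive message. The helper name sample_class_folders and its error text are hypothetical and not part of the repository, and the suggestion that the dataset path contains no class folders is only an assumption about the cause.

```python
import random


def sample_class_folders(folders, num_classes):
    """Pick num_classes distinct folders, failing with a clearer message.

    Hypothetical helper for illustration; not part of the MAML repository.
    """
    if len(folders) < num_classes:
        raise ValueError(
            "found %d class folders but need %d; check the dataset path"
            % (len(folders), num_classes)
        )
    return random.sample(folders, num_classes)


# random.sample itself raises ValueError when the population is too small
# (the exact message varies between Python versions).
try:
    random.sample([], 5)
except ValueError as err:
    print("random.sample failed:", err)

print(sample_class_folders(["a", "b", "c", "d", "e", "f"], 5))
```

Printing len(folders) just before the sampling call in data_generator.py would confirm whether the validation folder list is really empty.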
Where can I get the .csv files for miniImagenet?
Hi,
I'm training a model with MAML,
but an error occurs:
No gradient defined for operation 'tower0/gradients/tower0/roi_align/crop_and_resize/CropAndResize_grad/CropAndResizeGradImage' (op type: CropAndResizeGradImage)
I noticed that in special_grads.py
you wrote
@ops.RegisterGradient("MaxPoolGrad")
def _MaxPoolGradGrad(op, grad):
to register a gradient. I'm not sure how it works, but it seems related to my error,
which only occurs when I do second-order differentiation.
I'd like to know how I can write one for CropAndResizeGradImage
for my model.
I have been searching the internet and haven't found a solution.
Please help.
thank you
The overall training speed seems a little slow.
Any tips to speed it up, such as increasing the number of tasks?
For classification, why did you update theta' multiple times for each task?
Hi, could you tell me what result[0] and result[1] are?
And what is the inner gradient?
Hi Chelsea,
I am new to the field and after reading your paper, I have a few questions regarding that.
In section 2.2, at the point where the algorithm suggests computing the adapted parameters with gradient descent, the paper states that it is possible to use multiple gradient updates rather than just one. How can this be extended to multiple gradient updates? If I understand correctly, you mean doing several updates (more than one) to compute theta_i' with gradient descent and then going to the meta-update stage? If so, will the meta-update stage need to calculate gradients of 3rd order or higher?
What do you consider a task in the classification setting? Is it the case that each class represents a task? In the regression setting, does the distribution over tasks correspond to the joint distribution of amplitude and phase?
How "similar" should the tasks in the distribution be for the MAML algorithm to be effective? Is there a measure of similarity?
How did you apply the first-order approximation of MAML? The paper states that in that case "the second derivatives are omitted. Note that the resulting method still computes the meta-gradient at the post-update parameter values theta_i', which provides for effective meta-learning." So, what is the difference from the original algorithm? Which step changes?
I can see your algorithm as a way of trying to predict (using gradient through a gradient) what are the best directions to move now such that the model will be close to the optimal parameters of the future (optimality is defined depending on which tasks will be shown at test time). Is this a correct intuition?
Thank you in advance for your time.
Aris Papadopoulos
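On the multi-step and first-order questions, a worked toy example may help. Below is a minimal sketch on a one-dimensional, non-quadratic loss, 0.5*(theta - c)^2 + 0.25*theta^4, which is an assumption chosen only so the derivatives can be written by hand. The different targets for the support and query losses stand in for different samples of one task, and all function names are hypothetical. With K inner steps, the chain rule gives d(theta_K)/d(theta_0) as a product of (1 - alpha * Hessian) factors, one per step, so only second derivatives of the inner loss appear. The first-order variant keeps the outer gradient evaluated at theta' and drops that product.

```python
def loss(theta, c):
    """Toy task loss; c plays the role of the task's data."""
    return 0.5 * (theta - c) ** 2 + 0.25 * theta ** 4


def grad(theta, c):
    """First derivative of the toy loss."""
    return (theta - c) + theta ** 3


def hess(theta):
    """Second derivative of the toy loss."""
    return 1.0 + 3.0 * theta ** 2


def adapt(theta, c, alpha, num_steps):
    """Run num_steps inner gradient steps; also return d(theta_K)/d(theta_0)."""
    jac = 1.0
    for _ in range(num_steps):
        jac *= 1.0 - alpha * hess(theta)  # chain rule: only Hessians appear
        theta = theta - alpha * grad(theta, c)
    return theta, jac


def meta_gradients(theta, c_support, c_query, alpha, num_steps):
    """Return (full meta-gradient, first-order approximation)."""
    theta_k, jac = adapt(theta, c_support, alpha, num_steps)
    outer = grad(theta_k, c_query)  # outer gradient evaluated at theta'
    return outer * jac, outer


theta0, alpha, steps = 0.3, 0.05, 3
full, first_order = meta_gradients(theta0, 1.0, 1.2, alpha, steps)


def post_update_loss(t):
    """Query loss after adapting from t, used for a finite-difference check."""
    return loss(adapt(t, 1.0, alpha, steps)[0], 1.2)


eps = 1e-6
numeric = (post_update_loss(theta0 + eps) - post_update_loss(theta0 - eps)) / (2 * eps)
print(full, first_order, numeric)
```

The finite-difference value should agree with the full meta-gradient, while the first-order value differs by the dropped Jacobian factor. Judging from the commands quoted elsewhere on this page, the repository appears to expose this choice through the --stop_grad flag, but that mapping is an inference rather than something confirmed here.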
Hi Chelsea! In your paper, during the meta-training process, the inner gradients (theta') are calculated independently for the different tasks within one batch. Then the updated parameters are applied to new samples to calculate the outer gradients (theta). However, in task_metalearn() in your maml.py, I didn't see an independent calculation of the inner gradients for the different tasks within a batch. Can I ask why?
Dear Chelsea,
Could you please try to clarify to me:
When we use first-order gradient approximation for meta update, we sample for a particular task source='inputa' and target='inputb'. Then what's the difference between:
Thanks in advance for your reply.
Hi Chelsea,
I'm reimplementing MAML for few-shot classification, and I have troubles understanding how exactly you use batch norm. I was hoping you can help me out and clarify so I can understand this better. Below I'm assuming N=5-way k=1-shot learning and 4 tasks per meta-update.
During training, in the inner loop update, how do you compute the mean/var for batch normalisation? Do you
Also, does this differ depending on how many gradient update steps you do? And when you're evaluating, will you use the same procedure?
Thanks a lot in advance!
Hi Chelsea,
I'm trying to use MAML to train a convolutional autoencoder, which should learn to encode robot motor currents. (There are several traces from different robots, so every robot is a task and the traces are the samples of the task.)
I got this working in general, but it seems that MAML drives the weights in a direction such that the network produces a completely straight line at the baseline of the amplitude. (This probably makes sense in general, because from there it is rather easy to train toward new motor currents?) (See figure below)
The problem seems to be that when I try to fine-tune afterward for one robot, the optimization doesn't guide the weights out of this straight-line equilibrium, so it doesn't actually work. I tried adding some noise to the weights; then it does get out, but the MAML pretraining is not preserved very well...
This graph shows the output of the autoencoder after MAML training
This graph (the green one) shows the real motor current. MAML seems to find a straight line at the base of the amplitude, but when trying to finetune to this task, it doesn't get out, anymore.
Hi Chelsea,
As far as I have understood by studying your paper, during test time, for example in the classification setting, you get N unseen classes and K instances for each one of them, you run some steps using gradient descent on these N*K examples and then you evaluate the model's ability to classify new instances in these N classes. If what I have written is correct, I have the following questions:
Thank you in advance for your time.
What's the main difference between the 3 branches?
Why don't you update the batch normalization parameters (beta and gamma) in the inner training loop (for j in range(num_updates - 1)) and while finetuning on a test task, like you do it for the other network parameters (weights and biases) ?
PS: I don't mean the batch normalization statistics (moving mean and variance). In your case you only use beta, because you use this function https://www.tensorflow.org/versions/r1.5/api_docs/python/tf/contrib/layers/batch_norm
Thanks in advance!
When I run python main.py --datasource=omniglot --metatrain_iterations=40000 --meta_batch_size=16 --update_batch_size=5 --num_classes=20 --update_lr=0.1 --num_updates=5 --logdir=logs/omniglot20way5shot/, it raised a ValueError: Cannot create a tensor proto whose content is larger than 2GB. I'd like to ask how to train the model on omniglot, 20-way, 5-shot?
I can only see the code for the regression/classification experiments. What about the code for the RL experiments?
Dear all:
I try to use the moving version of batch norm:
# # add batch_norm ops before meta_op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
# update theta
self.meta_op = optimizer.apply_gradients(gvs)
and set the flags of batch_normalization properly:
x = tf.layers.batch_normalization(x, training=training, name=scope + '_bn', reuse=tf.AUTO_REUSE, fused=True)
However, it fails when running:
'gradients/truediv_7_grad/Neg' has inputs from different frames. The input 'map_fn/while/MAML_39/conv3_bn/AssignMovingAvg_1' is in frame 'map_fn/while/while_context'. The input 'Sum_5' is in frame ''.
I searched Google and have not found an effective solution yet.
Has anyone succeeded in using the moving version of batch_norm? Please help me or share part of your code.
What method do you use to fit the sinusoid curve once you have the predicted values y corresponding to x, with 10 (x, y) points in total? Linear regression?
Hi, the variables in the 4 conv layers and 1 dense layer are shared across different tasks and different train/test modes through the tf.get_variable function.
However, I noticed that you use tf.Variable for all bias variables. Will these lines create a new bias variable each time they are called?
weights['conv1'] = tf.get_variable('conv1', [k, k, self.channels, self.dim_hidden], initializer=conv_initializer, dtype=dtype)
weights['b1'] = tf.Variable(tf.zeros([self.dim_hidden]))
Hi Chelsea, could you provide the argument values for the fine-tuning baseline results in mini-ImageNet, as shown in the bottom plot of Table 1? One small problem that I found is that when metatrain_iterations = 0, because of lines 239-241 in main.py, inputb becomes an empty tensor (line 269 in main.py). However, inputb is used in the function task_metalearn in maml.py on line 95.
Hi, thanks for sharing the project. @cbfinn
After training the model with the script python main.py --datasource=miniimagenet --metatrain_iterations=60000 --meta_batch_size=4 --update_batch_size=1 --update_lr=0.01 --num_updates=5 --num_classes=5 --logdir=logs/miniimagenet1shot/ --num_filters=32 --max_pool=True, I use the script python main.py --datasource=miniimagenet --train=False --test_set=True --logdir=logs/miniimagenet1shot --meta_batch_size=1 --num_classes=5 --num_filters=32 --max_pool=True --update_batch_size=1 to test the model.
However, the terminal output looks like the following picture. It stays on this screen and no results are returned. Could you tell me what's wrong with my script? Thanks very much!
I got the error below. My current workaround is to comment out special_grads.py.
(tf2) ~/3rdparty/maml$ pip3 list | grep tensorflow
tensorflow (1.2.1)
(tf2) ~/3rdparty/maml$ python3 main.py --train=False --datasource=miniimagenet --metatrain_iterations=60000 --meta_batch_size=4 --update_batch_size=5 --update_lr=0.01 --num_updates=5 --num_classes=5 --logdir=logs/miniimagenet5shot/ --num_filters=32 --max_pool=True --stop_grad=True
Traceback (most recent call last):
File "main.py", line 33, in <module>
from maml import MAML
File "/Users/winstonq/3rdparty/maml/maml.py", line 3, in <module>
import special_grads
File "/Users/winstonq/3rdparty/maml/special_grads.py", line 6, in <module>
@ops.RegisterGradient("MaxPoolGrad")
File "/Users/winstonq/.tox/tf2/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1750, in __call__
_gradient_registry.register(f, self._op_type)
File "/Users/winstonq/.tox/tf2/lib/python3.5/site-packages/tensorflow/python/framework/registry.py", line 62, in register
(self._name, name, function_name, filename, line_number))
KeyError: "Registering two gradient with name 'MaxPoolGrad' !(Previous registration was in _find_and_load_unlocked <frozen importlib._bootstrap>:958)"
Hi,
Each time I run main.py, it takes a long time to generate the filenames. Could you please fix this?
Hi, I was thinking about using a model trained on ImageNet as a feature extractor, then randomly initializing and re-training the last few layers using the method you proposed. Does this idea make sense to you?
Where can I download the miniImagenet dataset?
Hello, may I ask a question? When reading your code in main.py, I was confused about the meaning of the parameter test_num_updates. Could you explain it to me? This may be a silly question. I look forward to your reply.
Dear,
In my project for predicting stock prices, I only have input data (n features) and labels (price).
What should I do about the amp and phase?
I just want to use my own data for training.
Thanks.
Hello, I'm new to this area. After reading your MAML paper, I am confused about why you update theta in the inner loop to get theta', and then in the outer loop use theta' to calculate the final loss.
In my opinion, if I consider all the tasks as a batch, use the original theta to calculate each task's loss, and then sum the losses to do gradient descent, it is also reasonable to find a theta that suits all the tasks, because it minimizes the sum of all the task losses. So I don't understand the reason for the update in the inner loop.
I would like to know the insight behind your algorithm and its advantages compared to my proposal.
Thanks a lot. Maybe the question seems silly. Looking forward to your reply!
Hi Chelsea,
I am new to the meta-learning problem and after reading your paper, I have some questions. For the pseudocode of Algorithm 1, you wrote "while not done do". Does it mean that there are many iterations, and in each iteration you sample K samples from each task, update the parameters theta' for each task, sample new samples using the new theta', and finally update the network parameters theta based on all tasks? Is the stopping criterion the number of iterations set beforehand? By the way, the results in your work seem wonderful, but it's rather hard for me to figure out the insights behind MAML. Why does such an update strategy achieve such good performance? Is there any related work you can suggest? Anyway, thanks for your idea and have a nice day!
Hi,
Would it be possible for you to share the meta-learners trained weights for 5 way imagenet?
Also, do you have the download link for the mini-imagenet files?
Thanks
When I run 20-way 1-shot like so, everything works fine:
sh-log FOML-O520 python main.py --datasource=omniglot --metatrain_iterations=40000 --meta_batch_size=16 --update_batch_size=1 --num_classes=20 --update_lr=0.1 --num_updates=5 --logdir=logs/omniglot20way/ --stop_grad=True
However, when I change --update_batch_size to 5, I see this error:
Generating filenames
Traceback (most recent call last):
File "main.py", line 346, in <module>
main()
File "main.py", line 267, in main
image_tensor, label_tensor = data_generator.make_data_tensor()
File "/root/code/maml/data_generator.py", line 104, in make_data_tensor
filename_queue = tf.train.string_input_producer(tf.convert_to_tensor(all_filenames), shuffle=False)
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 932, in convert_to_tensor
as_ref=False)
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1022, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 233, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 218, in constant
name=name).outputs[0]
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1569, in __init__
"Cannot create a tensor proto whose content is larger than 2GB.")
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
This is with tensorflow-gpu version 1.5.0. Any pointers would be great!
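The traceback shows the whole all_filenames list being converted into a single constant tensor, so the error would appear once that list of strings exceeds the 2GB proto limit. Below is a back-of-envelope sketch of that size. The number of pre-sampled tasks, the 2 * K images per class, and the average path length are all assumptions for illustration, not values read from the repository, and filename_constant_bytes is a hypothetical helper.

```python
TENSOR_PROTO_LIMIT_BYTES = 2 * 1024 ** 3  # the 2GB limit named in the error


def filename_constant_bytes(num_task_batches, num_classes,
                            samples_per_class, avg_path_bytes):
    """Rough size of one string constant holding every sampled filename."""
    num_filenames = num_task_batches * num_classes * samples_per_class
    return num_filenames * avg_path_bytes


# All values below are illustrative assumptions, not read from the code.
for shots in (1, 5):
    size = filename_constant_bytes(
        num_task_batches=200000,      # assumed number of pre-sampled tasks
        num_classes=20,
        samples_per_class=2 * shots,  # assumed: K support + K query images
        avg_path_bytes=60,            # assumed average path length
    )
    status = "over limit" if size > TENSOR_PROTO_LIMIT_BYTES else "ok"
    print(shots, "shot:", round(size / 1024 ** 3, 2), "GB", status)
```

If the real numbers are in this ballpark, pre-sampling fewer tasks or feeding the filenames through a placeholder instead of a graph constant would be the directions to try.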
Hi Chelsea,
In Figure 3, when compared with pretrained models, MAML demonstrates superior effectiveness at fast learning with a few gradient steps. I am wondering whether it is also worthwhile to compare with the pretrained model when it is more thoroughly fine-tuned, say after 100 fine-tuning steps during meta-testing?
Thank you!
Hi, according to the batch_norm TensorFlow API doc:
Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op. For example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.minimize(loss)
In your MAML implementation:
tf_layers.batch_norm(inp, activation_fn=activation, reuse=reuse, scope=scope)
and you seemingly did NOT train update_ops.
Could you help me ?
I noticed that the new version of the MAML paper says second derivatives have little influence on performance.
So I ran the code with --stop_grad=True and all other settings at their defaults.
The results seem to be 2% lower than with --stop_grad=False.
My training command is as follows:
python main.py --datasource=miniimagenet --metatrain_iterations=60000 --meta_batch_size=4 --update_batch_size=1 --update_lr=0.01 --num_updates=5 --num_classes=5 --logdir=logs_scope/miniimagenet1shot/ --num_filters=32 --max_pool=True --stop_grad=True
Am I missing anything needed to stop the second derivatives?
First of all, congratz for the paper and code. The idea behind MAML is elegantly simple yet powerful. In fact the paper is very easy to read, except for (and this is my question about) the part where you guys talk about its concrete application in Classification (section 5.2). Specifically, I don't understand how tasks are formally defined in this particular use case. In fact, since MAML trains an optimal parameter initialization theta, this theta should be the starting point for the parameters of all tasks' models. This entails that the number of parameters of theta should be the same number of parameters used for all tasks' models. However, in the context of classification, the usual approach is to use a softmax layer with as many output units as classes, and it might happen that different classification tasks involve 1) different classes and/or 2) different NUMBER of classes. If two tasks have a different number of classes, the softmax layer in each network would be different and the number of parameters would be different too. Therefore, there CANNOT BE a general theta that fits the number of parameters of all possible tasks (unless you enforce that each task should have a constant number of classes defined beforehand, which would become a limitation). How do you guys deal with this? I hope my question was clear enough.
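One common way to keep the output layer the same size across tasks is to fix N and treat the label index as local to each task, which matches the observation elsewhere on this page that every task uses labels 0~4 while the underlying classes differ. Below is a minimal sketch of such N-way, K-shot task sampling. The function sample_task, its arguments, and the toy dataset are hypothetical and are not taken from the repository.

```python
import random


def sample_task(examples_by_class, n_way, k_shot, k_query, rng=random):
    """Build one N-way task: pick n_way classes and relabel them 0..n_way-1."""
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        items = rng.sample(examples_by_class[cls], k_shot + k_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


# Toy dataset: 8 classes with 4 examples each (strings stand in for images).
data = {"class%d" % c: ["img%d_%d" % (c, i) for i in range(4)] for c in range(8)}
support, query = sample_task(data, n_way=5, k_shot=1, k_query=2)
print(sorted(label for _, label in support))
print(len(support), len(query))
```

Under this setup, output unit 0 means "the first class of this task" rather than a fixed category, so a single theta with N output units can serve as the starting point for every task. Handling a varying number of classes would need something beyond this sketch.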
I am studying the code. I found the parameter FLAGS.metatrain_iterations, which is set to 70000 for the sinusoid demonstration. In each of the 70000 iterations, new training data for 25 tasks is produced, which amounts to a large number of examples for learning.
How should this parameter be understood? Does it mean that meta-learning still needs a large number of training examples? Many thanks.
Guys, if you have any questions about the paper, you can just email the corresponding author. GitHub Issues is meant for reporting bugs in the project. Please read this before opening a new issue. The authors are very kind to share their work with us; don't annoy them into stopping. A healthy open source environment needs to be protected by all of us.
May I get the data and the csv file ? Thank you very much!
why instead of
unused = task_metalearn((self.inputa[0], self.inputb[0], self.labela[0], self.labelb[0]), False)
you haven't used the following:
unused = self.forward(self.inputa[0], weights, reuse=reuse)
?
Hello, Chelsea.
The batch normalization documentation says that these ops are not attached to the TensorFlow graph by default. So there are two ways to force the updates during training: use tf.GraphKeys.UPDATE_OPS, or set the updates_collections parameter of batch_norm to None. I don't see either of those in the code. Maybe I'm missing something.
I haven't been able to make the first way work due to the while loop in the map_fn function. But the second modification is easy and seems to work, although I'm not sure I see any difference in performance.
In my opinion, part a is used to update the initial weights and part b is used for inner learning.
So during evaluation, part a can be used for evaluation because meta-learning is not training. But part b is used for model training, so the accuracy should not include the part b accuracy.
Hi, Chelsea:
I retrained with your source code without any modification and got 46.40% accuracy. Although that is very close to the 48.7% in your paper, I would be glad to hear any advice on reaching 48.7%. Since I'm doing research on meta-learning, this small difference is important for later algorithm comparisons.
Here is my final retraining performance.
Mean validation accuracy/loss, stddev, and confidence intervals
(array([0.2019989 , 0.36166546, 0.40066576, 0.44499955, 0.45466623,
0.4573329 , 0.45799953, 0.46166632, 0.46266633, 0.4639997 ,
0.46366638], dtype=float32), array([0.03259854, 0.16752282, 0.1966368 , 0.21332692, 0.22017805,
0.2248985 , 0.22855242, 0.22861326, 0.22760995, 0.22782418,
0.22864777], dtype=float32), array([0.00260843, 0.01340462, 0.01573422, 0.01706971, 0.01761791,
0.01799563, 0.018288 , 0.01829287, 0.01821259, 0.01822973,
0.01829563], dtype=float32))
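The three arrays above read as the per-update-step mean accuracy, standard deviation, and 95% interval half-width. The printed numbers are consistent with half-width = 1.96 * std / sqrt(600), where the figure of 600 test tasks is inferred from the values and not stated in the post. Below is a minimal numpy sketch under that assumption. The function name summarize is hypothetical and the accuracy matrix is synthetic stand-in data.

```python
import numpy as np


def summarize(accuracies, z=1.96):
    """Mean, stddev and z * stddev / sqrt(n) half-width over test tasks.

    accuracies has shape (num_tasks, num_update_steps + 1).
    """
    accuracies = np.asarray(accuracies, dtype=np.float64)
    means = accuracies.mean(axis=0)
    stds = accuracies.std(axis=0)
    ci95 = z * stds / np.sqrt(accuracies.shape[0])
    return means, stds, ci95


# Consistency check against the last column printed above, assuming 600 tasks.
half_width = 1.96 * 0.22864777 / np.sqrt(600)
print(half_width)

rng = np.random.default_rng(0)
fake_acc = rng.uniform(0.2, 0.7, size=(600, 11))  # synthetic stand-in data
means, stds, ci95 = summarize(fake_acc)
print(means.shape, stds.shape, ci95.shape)
```

On this reading, the final entry corresponds to roughly 46.4% with a half-width of about 1.8 percentage points.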
Hello,
I read the paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" and am not clear about how the comparison between MAML and a pretrained network was done. Could you clarify my doubts?
It says "P(T) is continuous, where the amplitude varies within [0.1, 5.0]". Does this mean the amplitude follows a uniform distribution? I am also not clear about how the MAML model was applied at test time after training. According to the described algorithm, MAML needs a batch of tasks to adjust its parameters, but when doing the comparison there is only one sinusoid curve.
Thank you!
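A small sketch of how sinusoid tasks could be sampled may make the setup concrete. The amplitude range [0.1, 5.0] comes from the quote above. Uniform sampling, the phase range [0, pi], and the input range [-5, 5] are my reading of the paper's regression setup and should be treated as assumptions. The function names are hypothetical.

```python
import numpy as np


def sample_sinusoid_task(rng, amp_range=(0.1, 5.0), phase_range=(0.0, np.pi)):
    """Draw one task: a sine wave with uniformly sampled amplitude and phase."""
    amp = rng.uniform(*amp_range)
    phase = rng.uniform(*phase_range)
    return lambda x: amp * np.sin(x - phase)


def sample_points(task, rng, k, x_range=(-5.0, 5.0)):
    """Draw k (x, y) pairs from a task."""
    x = rng.uniform(*x_range, size=k)
    return x, task(x)


rng = np.random.default_rng(0)
task = sample_sinusoid_task(rng)  # a single held-out task at test time
x_support, y_support = sample_points(task, rng, k=10)
x_query, y_query = sample_points(task, rng, k=100)
print(x_support.shape, y_query.shape)
```

A batch of tasks is only needed during meta-training. At test time a single new curve is enough: the learned initialization is fine-tuned on the K support points with ordinary gradient steps and then evaluated on other points from the same curve.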