This is a great library first of all, so kudos to the developers! My question is


Question: Connection MLE "parametrized" GP in infinite Width Limit vs minimizing MSE "parametrized" Kernel in infinite Width (neural-tangents issue, closed)

yCobanoglu commented on August 17, 2024

Question: Connection MLE "parametrized" GP in infinite Width Limit vs minimizing MSE "parametrized" Kernel in infinite Width

from neural-tangents.

Comments (4)

romanngg commented on August 17, 2024

Sorry for the delay!

[11] referred to https://arxiv.org/abs/1902.06720, sorry for the confusion - at some point we replaced references with inline links but missed this one, will fix.
Yes, absolutely, you can parameterize your kernels and backprop through nt.predict functions, and, in general, most other nt functions.
Same for NTK, with one caveat that interpreting NTK inference as a GP is a bit nuanced. Running gradient descent on an infinitely wide neural network yields a multivariate normal distribution as in eq. 16 of https://arxiv.org/pdf/1902.06720.pdf (the one you get by passing get="ntk" to predict_fn, defined at line 635 of neural-tangents/neural_tangents/_src/predict.py in commit 928b0bc), which is not the same as a GP posterior using an NTK kernel (the one you get by passing get="ntkgp"). I recommend @bobby-he's paper https://proceedings.neurips.cc/paper/2020/file/0b1ec366924b26fc98fa7b71a9c249cf-Paper.pdf for details on this.
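To make the get="ntk" versus get="ntkgp" distinction concrete, here is a minimal pure-JAX sketch that does not call neural-tangents at all. It assumes a one-hidden-layer erf network in the NTK parameterization, whose NNGP and NTK kernels have closed forms, and compares the infinite-training-time covariance of eq. 16 in https://arxiv.org/abs/1902.06720 with the GP posterior covariance you get by plugging the NTK in as a GP kernel. The helper names (erf_kernels, ntk_ensemble_cov, ntkgp_cov), the toy data, and the values of sigma_w, sigma_b and reg are illustrative assumptions, not library API.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # float64 keeps the linear solves stable


def erf_kernels(x1, x2, sigma_w=1.5, sigma_b=0.1):
    """Closed-form NNGP and NTK of a one-hidden-layer erf network (assumed setup)."""
    d = x1.shape[-1]
    # Covariances of the first-layer pre-activations.
    k12 = sigma_w**2 * (x1 @ x2.T) / d + sigma_b**2
    k11 = sigma_w**2 * jnp.sum(x1**2, axis=-1) / d + sigma_b**2
    k22 = sigma_w**2 * jnp.sum(x2**2, axis=-1) / d + sigma_b**2
    scale = (1.0 + 2.0 * k11)[:, None] * (1.0 + 2.0 * k22)[None, :]
    # E[erf(u) erf(v)] and E[erf'(u) erf'(v)] for (u, v) jointly Gaussian.
    t = (2.0 / jnp.pi) * jnp.arcsin(2.0 * k12 / jnp.sqrt(scale))
    t_dot = (4.0 / jnp.pi) / jnp.sqrt(scale - 4.0 * k12**2)
    nngp = sigma_w**2 * t + sigma_b**2
    ntk = nngp + sigma_w**2 * k12 * t_dot
    return nngp, ntk


def ntk_ensemble_cov(k_tt, k_td, k_dd, th_td, th_dd, reg=1e-6):
    """Covariance of the trained ensemble as t -> infinity (the get="ntk" case)."""
    n = th_dd.shape[0]
    a = jnp.linalg.solve(th_dd + reg * jnp.eye(n), th_td.T).T  # Theta_td Theta_dd^{-1}
    cross = a @ k_td.T
    return k_tt + a @ k_dd @ a.T - (cross + cross.T)


def ntkgp_cov(th_tt, th_td, th_dd, reg=1e-6):
    """GP posterior covariance with the NTK used as the kernel (the "ntkgp" case)."""
    n = th_dd.shape[0]
    a = jnp.linalg.solve(th_dd + reg * jnp.eye(n), th_td.T).T
    return th_tt - a @ th_td.T


key_train, key_test = jax.random.split(jax.random.PRNGKey(0))
x_train = jax.random.normal(key_train, (8, 3))
x_test = jax.random.normal(key_test, (4, 3))

k_dd, th_dd = erf_kernels(x_train, x_train)
k_td, th_td = erf_kernels(x_test, x_train)
k_tt, th_tt = erf_kernels(x_test, x_test)

cov_ntk = ntk_ensemble_cov(k_tt, k_td, k_dd, th_td, th_dd)
cov_ntkgp = ntkgp_cov(th_tt, th_td, th_dd)
max_gap = float(jnp.max(jnp.abs(cov_ntk - cov_ntkgp)))
print("largest entrywise difference between the two covariances:", max_gap)
```

Both matrices are valid covariances over the same test points, but they are built from different combinations of the NNGP and NTK kernels, which is why the two get values should not be treated as interchangeable.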

Re MLE for NTK:

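If you want to optimize the marginal likelihood of your training set, then it is the same for both NNGP and NTK: at initialization, before training, the outputs of the infinitely wide network are distributed exactly as the NNGP, whichever of the two you use for inference afterwards.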
If you want to optimize the validation loss or the predictive likelihood on your validation data given training data, then NNGP and NTK will be different - see equations (13 - nngp) and (16 - ntk) in https://arxiv.org/abs/1902.06720. But both are Gaussians (with different means and covariances that you can get through https://neural-tangents.readthedocs.io/en/latest/_autosummary/neural_tangents.predict.gradient_descent_mse_ensemble.html#neural_tangents.predict.gradient_descent_mse_ensemble) and you can backprop through their validation loss / likelihood.
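As a hedged sketch of what "optimizing the marginal likelihood" can look like in code, the example below fits two kernel parameters by differentiating the GP negative log marginal likelihood with jax.grad. It is pure JAX and uses a hand-written closed-form NNGP kernel of a one-hidden-layer erf network as a stand-in for a neural-tangents kernel_fn. The function names, the choice of sigma_w and sigma_b as the tunable parameters, the noise level, the toy data, and the simple backtracking loop are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)


def erf_nngp(x1, x2, sigma_w, sigma_b):
    """Closed-form NNGP kernel of a one-hidden-layer erf network (assumed setup)."""
    d = x1.shape[-1]
    k12 = sigma_w**2 * (x1 @ x2.T) / d + sigma_b**2
    k11 = sigma_w**2 * jnp.sum(x1**2, axis=-1) / d + sigma_b**2
    k22 = sigma_w**2 * jnp.sum(x2**2, axis=-1) / d + sigma_b**2
    scale = (1.0 + 2.0 * k11)[:, None] * (1.0 + 2.0 * k22)[None, :]
    t = (2.0 / jnp.pi) * jnp.arcsin(2.0 * k12 / jnp.sqrt(scale))
    return sigma_w**2 * t + sigma_b**2


def neg_log_marginal_likelihood(log_params, x, y, noise=1e-2):
    """-log p(y | x) under a zero-mean GP; parameters live in log space."""
    sigma_w, sigma_b = jnp.exp(log_params[0]), jnp.exp(log_params[1])
    n = x.shape[0]
    k = erf_nngp(x, x, sigma_w, sigma_b) + noise * jnp.eye(n)
    _, logdet = jnp.linalg.slogdet(k)
    alpha = jnp.linalg.solve(k, y)
    return 0.5 * (y @ alpha + logdet + n * jnp.log(2.0 * jnp.pi))


x_train = jax.random.normal(jax.random.PRNGKey(0), (20, 2))
y_train = jnp.sin(x_train[:, 0]) + 0.5 * x_train[:, 1]


def loss_fn(log_params):
    return neg_log_marginal_likelihood(log_params, x_train, y_train)


loss_and_grad = jax.jit(jax.value_and_grad(loss_fn))

log_params = jnp.log(jnp.array([1.0, 0.5]))
initial_loss = float(loss_fn(log_params))

# Gradient descent with a crude backtracking rule: only accept steps that
# lower the loss, otherwise halve the step size.
step = 0.1
for _ in range(50):
    loss, grads = loss_and_grad(log_params)
    candidate = log_params - step * grads
    if float(loss_fn(candidate)) < float(loss):
        log_params, step = candidate, step * 1.5
    else:
        step = step * 0.5

final_loss = float(loss_fn(log_params))
sigma_w_opt, sigma_b_opt = [float(v) for v in jnp.exp(log_params)]
print(initial_loss, final_loss, sigma_w_opt, sigma_b_opt)
```

The same pattern should carry over to a validation objective: replace the marginal likelihood with the Gaussian log-likelihood of held-out targets under the predictive mean and covariance, and differentiate that instead.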

Btw https://arxiv.org/abs/2012.09943 seems to try something related - tune neural network parameters by optimizing the marginal likelihood of the respective NNGP.

Lmk if this helps, I'm not sure if I answered / understood everything correctly!


yCobanoglu commented on August 17, 2024

Hi, thanks for the detailed answer, I will look into the material. A quick follow-up: why don't we optimize $\sigma_{w}^{2}$ and $\sigma_{b}^{2}$? These are the variances of the weights and biases at initialization (which are set in advance, for example referred to as Standard NTK Parametrization in the code).


romanngg commented on August 17, 2024

I think https://arxiv.org/abs/2012.09943 looks into those. You're right that it's possible to optimize these parameters, and some time ago I experimented with it a bit, but just didn't see much generalization improvement. I think the reason is that these are only 2 scalar parameters per layer, and good defaults for them have already been studied quite a lot (e.g. https://arxiv.org/pdf/1611.01232.pdf style papers), so I just don't think they provide enough flexibility to get a noticeable improvement. But if you parameterize your kernel with lots of parameters, I think it's much more promising. I believe this paper https://arxiv.org/abs/2102.03909 got improvement in meta-learning settings from optimizing the initialization parameters of a neural network by tuning them via the empirical (https://neural-tangents.readthedocs.io/en/latest/empirical.html) NTK, so in this setting parameters of the kernel = parameters of a finite width neural network.
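As a rough illustration of "parameters of the kernel = parameters of a finite width neural network", the following pure-JAX sketch builds the empirical NTK of a tiny tanh MLP from its Jacobians and backpropagates a kernel-regression validation loss into the network's initial parameters. It is not the method of https://arxiv.org/abs/2102.03909 and it does not use the neural-tangents empirical API; the function names, the architecture, the toy data, and reg are assumptions made for the example. For simplicity the prediction also ignores the network's own output at initialization.

```python
import jax
import jax.numpy as jnp


def init_params(key, sizes=(2, 16, 1)):
    """Random weights and zero biases for a small MLP (hypothetical helper)."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [
        (jax.random.normal(k, (n_in, n_out)) / jnp.sqrt(n_in), jnp.zeros(n_out))
        for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """tanh MLP returning one scalar output per input row."""
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[..., 0]


def flat_jacobian(params, x):
    """Jacobian of the outputs w.r.t. all parameters, shape (n_points, n_params)."""
    jac = jax.jacobian(mlp)(params, x)
    leaves = jax.tree_util.tree_leaves(jac)
    return jnp.concatenate([leaf.reshape(x.shape[0], -1) for leaf in leaves], axis=1)


def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = J(x1) J(x2)^T at the given parameters."""
    return flat_jacobian(params, x1) @ flat_jacobian(params, x2).T


def validation_loss(params, x_tr, y_tr, x_val, y_val, reg=1e-2):
    """MSE of kernel regression with the empirical NTK on held-out points."""
    th_dd = empirical_ntk(params, x_tr, x_tr)
    th_vd = empirical_ntk(params, x_val, x_tr)
    coeffs = jnp.linalg.solve(th_dd + reg * jnp.eye(x_tr.shape[0]), y_tr)
    return jnp.mean((th_vd @ coeffs - y_val) ** 2)


key_p, key_tr, key_val = jax.random.split(jax.random.PRNGKey(0), 3)
params = init_params(key_p)
x_tr = jax.random.normal(key_tr, (10, 2))
x_val = jax.random.normal(key_val, (5, 2))
y_tr, y_val = jnp.sin(x_tr[:, 0]), jnp.sin(x_val[:, 0])

ntk_train = empirical_ntk(params, x_tr, x_tr)
# Gradient of the validation loss w.r.t. the network's initial parameters,
# i.e. w.r.t. the parameters of the kernel itself.
grads = jax.grad(validation_loss)(params, x_tr, y_tr, x_val, y_val)
```

The gradient pytree has the same structure as params, so it can be fed to any optimizer to adjust the initialization.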


yCobanoglu commented on August 17, 2024

Thanks a lot, your answer was very helpful!

