
optim's People

Contributors

clementfarabet, jucor, koraykv

optim's Issues

luarocks install fails

Trying to install with "luarocks --local install" fails.

It seems to be caused by a missing file: if I create a file "dokmedia/optim/optim" and then run "luarocks --local make optim-1.0.1-0.rockspec", the install succeeds.

ASGD has weight decay built in?

The averaged SGD function implements

x := (1 - lambda * eta_t) * x - eta_t * df/dx(z, x)

which includes L2 weight decay with decay constant lambda. The same lambda also appears in the learning-rate decay function

eta_t = eta0 / (1 + lambda * eta0 * t)^0.75
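
For concreteness, here is a condensed sketch of how those two formulas would look as a torch update step; the variable names are illustrative, not the verbatim asgd.lua source:

    -- condensed sketch: x is the flat parameter tensor, dfdx the gradient,
    -- state.t the iteration counter; hyper-parameters live in state
    x:mul(1 - state.lambda * state.eta_t)   -- weight decay: x := (1 - lambda*eta_t) * x
    x:add(-state.eta_t, dfdx)               -- gradient step: x := x - eta_t * df/dx(z,x)
    -- the same lambda reappears in the learning-rate schedule:
    state.eta_t = state.eta0 / math.pow(1 + state.lambda * state.eta0 * state.t, 0.75)

Written this way, the coupling is visible: the same state.lambda drives both the decay of x and the eta_t schedule.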

The ASGD papers I've read don't seem to require L2 weight decay. Moreover, Xu (2010), "Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent", seems to imply that the lambda term in the learning-rate decay function should be a multiple of the smallest eigenvalue of the Hessian. Looking at Bottou's SGD code, weight decay does not appear in the CRF example, although it does appear in the readme file, from which the torch implementation seems to be derived.

Is the L2 weight decay an essential part of the ASGD implementation? Why is the weight decay constant tied to the learning rate decay function?

Thanks,
Jason

fista.lua: line search condition: possible error?

Hi,

I am currently working my way through the FISTA paper and your implementation, and I noticed a difference in the line-search condition.
By the FISTA paper, I mean the one you cite in your implementation: http://goo.gl/bSuKQ

On page 12 (printed page number 194), in the box describing the FISTA algorithm, the condition is stated as:

F(p_L(y)) <= Q_L(p_L(y),y)

Note the upper-case F, which is defined on page 6 as F(x) = f(x) + g(x).

If I am not mistaken, you only use the lower-case f(x) on line 109 of fista.lua:

if fply <= Q then

There is a comment on that line that I don't quite understand; maybe it explains why g(x) is omitted.
Is this an error, or is there a reason for omitting g(x)?
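
Writing out the definitions, I suspect the two checks might actually be equivalent, since the g term appears on both sides; maybe that is what the comment means:

    % Definitions from the paper (p. 6 and p. 12):
    %   F(x)     = f(x) + g(x)
    %   Q_L(x,y) = f(y) + <x - y, grad f(y)> + (L/2)||x - y||^2 + g(x)
    \begin{align*}
      F(p_L(y)) \le Q_L(p_L(y), y)
        &\iff f(p_L(y)) + g(p_L(y)) \le Q_L(p_L(y), y) \\
        &\iff f(p_L(y)) \le f(y) + \langle p_L(y) - y,\, \nabla f(y) \rangle
              + \tfrac{L}{2} \lVert p_L(y) - y \rVert^2
    \end{align*}

because the g(p_L(y)) term inside Q_L cancels against the one in F. If the Q on line 109 is the quadratic model of f alone (without g), the check would match this cancelled form.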

Thanks in advance!

Best,

Hubert

gradParameters *must* be shared when using optim, but must *not* be shared when using updateParameters()

The methodology required to share parameters between modules is not the same when using the standard Module:updateParameters() as it is when using the update functions of the optim package.

Without optim, only the parameters should be shared; the gradients should be independent.
If you share the gradients as well, the shared gradient buffer for each copy of the parameters accumulates the sum of the gradients over all of the copies. The standard updateParameters() then applies that full update once per copy, so the update of the shared parameters is effectively scaled by the number of copies. The accumulation across shared parameters must happen exactly once, in the parameters themselves, rather than in both the gradients and then again in the parameters.

If you use the optim package, the parameters and gradParameters are first flattened, and in the process shared parameters are coalesced into a single copy. The gradients must be shared both so that the sizes of the flattened parameters and gradParameters match, and because each coalesced set of shared parameters is updated only once; the accumulation across shared copies must therefore happen in the gradients, not in the parameters.

This difference is not obvious, and it does not surface during Jacobian unit-testing, which does not use the optim framework.
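
A minimal sketch of the two conventions, using two tied nn.Linear layers (the module names are illustrative):

    require 'nn'

    local lin  = nn.Linear(10, 10)
    local tied = lin:clone()
    local mlp  = nn.Sequential():add(lin):add(nn.Tanh()):add(tied)

    -- (a) for plain Module:updateParameters(): share only the parameters,
    --     so each copy keeps its own independent gradient buffers
    tied:share(lin, 'weight', 'bias')

    -- (b) for optim: share the gradients as well, so that getParameters()
    --     can coalesce each set of tied weights into one flattened copy
    -- tied:share(lin, 'weight', 'bias', 'gradWeight', 'gradBias')
    -- local x, dfdx = mlp:getParameters()  -- flat views passed to e.g. optim.sgd

Mixing the two conventions produces exactly the double-counting described above.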
