For those interested, these are the current timings for the likelihoods I've rewritten

likelihood function refactoring (numexpr_dist) about pymc HOT 16 CLOSED

pymc-devs commented on August 26, 2024

likelihood function refactoring (numexpr_dist)

from pymc.

Comments (16)

fonnesbeck commented on August 26, 2024

Interesting. I would imagine that cython would improve on the performance for smaller sizes. I'm assuming you started with numexpr because of ease of implementation?

jsalvatier commented on August 26, 2024

Yeah, I would expect that too. I started with numexpr because it was the first idea that seemed good. One good thing about numexpr is that it automatically multithreads the computation.

jsalvatier commented on August 26, 2024

Some timings for Cython code generated from jinja2 (http://jinja.pocoo.org/) templates.

sizes                                    1      10     100    1000   10000  100000
bernoulli_like(x, p)                 2.874   3.133   2.321   0.888   0.274   0.137
beta_like(x, alpha, beta)            4.253   4.071    3.54   3.625   3.856    3.97
betabin_like(xd, alpha, beta, n)     4.561   2.589   0.639   0.141   0.096    0.09
cauchy_like(x, alpha, beta)          4.468   4.034   3.185   0.978   0.267   0.172
gamma_like(x, alpha, beta)           4.258   3.636   1.524   0.246   0.031   0.008
normal_like(x, mu, tau)              4.326   4.251   3.199   1.468   0.257   0.145

The gist is that for small sizes the generated code is about 4x slower, but for sizes > 1000 there are ~8x improvements. I'm not sure what's going on with beta_like; it's not much different from the other functions.

To duplicate the comparisons, run speed_test2.py in the tests folder. You will need to rebuild, and you will need the code generation package I wrote, ufunc_gen (https://github.com/jsalvatier/ufunc_gen).
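
speed_test2.py itself is not reproduced in this thread. As a rough sketch of how size-dependent ratios like the ones in the table could be measured, the snippet below times a candidate implementation against a reference at several array sizes and reports candidate time divided by reference time. Everything in it is an assumption for illustration: the two pure-Python normal_like variants stand in for the Cython and Fortran versions, and relative_timings is a hypothetical helper, not part of PyMC or ufunc_gen.

```python
import math
import timeit


def normal_like_loop(x, mu, tau):
    """Reference stand-in: recompute every term inside the loop."""
    total = 0.0
    for xi in x:
        total += 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (xi - mu) ** 2
    return total


def normal_like_hoisted(x, mu, tau):
    """Candidate stand-in: hoist the constant term out of the loop."""
    const = 0.5 * math.log(tau / (2.0 * math.pi))
    return len(x) * const - 0.5 * tau * sum((xi - mu) ** 2 for xi in x)


def relative_timings(candidate, reference, sizes, repeat=3, number=20):
    """Return candidate_time / reference_time for each data size (hypothetical helper)."""
    ratios = []
    for n in sizes:
        x = [0.1 * (i % 17) for i in range(n)]
        # Take the best of several repeats to reduce timing noise.
        t_cand = min(timeit.repeat(lambda: candidate(x, 0.5, 2.0),
                                   repeat=repeat, number=number))
        t_ref = min(timeit.repeat(lambda: reference(x, 0.5, 2.0),
                                  repeat=repeat, number=number))
        ratios.append(t_cand / t_ref)
    return ratios


sizes = [1, 10, 100, 1000]
ratios = relative_timings(normal_like_hoisted, normal_like_loop, sizes)
for n, r in zip(sizes, ratios):
    print(f"size={n:>5}  candidate/reference = {r:.3f}")
```

A ratio above 1 means the candidate is slower than the reference at that size, and a ratio below 1 means it is faster, which is how the table above reads.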

Reasons why I like the code generation approach:

fast
supports multidimensional arrays correctly
makes it easy to apply efficiency improvements to many likelihoods simultaneously, or even to switch languages; it wouldn't be very tedious to generate and compile C extensions instead of Cython extensions
makes it easy to open up code generation to the user for custom likelihoods
opens up the possibility of automatically (but symbolically) differentiated gradients and Jacobians by using SymPy. This would eliminate even more code, work especially well for custom likelihoods, and eliminate one source of errors
eliminates a lot of manual Fortran code
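
To make the template idea concrete, here is a minimal sketch of generating several likelihood functions from one template. It is not how ufunc_gen works: it swaps jinja2 for the standard library's string.Template and generates plain Python instead of Cython, so it runs without a build step. KERNEL_TEMPLATE, SPECS and generate_likelihoods are names made up for this illustration.

```python
import math
from string import Template

# Hypothetical template: an elementwise log-likelihood term summed over the data.
KERNEL_TEMPLATE = Template('''
def ${name}(x, ${params}):
    """Generated: sum of per-element log-likelihood terms."""
    total = 0.0
    for xi in x:
        total += ${expr}
    return total
''')

# One entry per distribution: parameter names and the per-element expression.
SPECS = {
    "normal_like": (
        ["mu", "tau"],
        "0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (xi - mu) ** 2",
    ),
    "bernoulli_like": (
        ["p"],
        "xi * math.log(p) + (1 - xi) * math.log(1.0 - p)",
    ),
}


def generate_likelihoods(specs):
    """Render each spec to source, execute it, and return the functions by name."""
    namespace = {"math": math}
    for name, (params, expr) in specs.items():
        source = KERNEL_TEMPLATE.substitute(
            name=name, params=", ".join(params), expr=expr
        )
        exec(source, namespace)  # a real backend would compile Cython or C here
    return {name: namespace[name] for name in specs}


likelihoods = generate_likelihoods(SPECS)
print(likelihoods["normal_like"]([0.0, 1.0], 0.5, 2.0))
print(likelihoods["bernoulli_like"]([1, 0, 1], 0.7))
```

Changing the loop structure in the one template changes every generated function at once, which is the "improve many likelihoods simultaneously" point from the list above.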

Drawbacks:

multivariate distributions still need to be custom written.

I may look into Theano (http://deeplearning.net/software/theano/), which, it strikes me now, is doing more or less exactly this kind of code generation from a text kernel.

I am somewhat curious about eliminating the large quantity of metaprogramming we currently have in distributions.py and using templating to do code generation instead. I suspect this would make the code easier to understand and maintain. Code generation might also be good for auto writing tests. The reason I think code generation is suited to PyMC is that when we build distributions we essentially want to create many classes with a standard format.

jsalvatier commented on August 26, 2024

It comes to mind that Theano would also facilitate GPU usage. Theano also optimizes the whole graph at once, which I think would be very advantageous for likelihood calculations. It also does symbolic differentiation.

Once you start thinking code generation is a good idea, Theano starts to look really good.

apatil commented on August 26, 2024

Also, once you start thinking about code generation, Fortran looks a lot better. :) I agree with you that, if we use any code generation approach, it should be easy to convert to different 'backends'.

Theano would really shine if it had access to whole submodels, not just one variable at a time, right?

apatil commented on August 26, 2024

The timings you're showing are relative to the Fortran timings, right? How does Cython manage to beat Fortran by such a wide margin for some of them?

jsalvatier commented on August 26, 2024

Yes, I think the more you include in your graph, the better.

Yes, relative to Fortran. I am not sure how Cython is beating Fortran so much. I'll make sure I haven't made a mistake. Could it be compile options on Cython vs F2Py?

I asked about Theano on the Theano list, and the responses were very encouraging. It avoids all the potential issues I was worried about except for two: linear algebra support, which is currently primitive (dot, outer and tensordot) though under development, and masked arrays (though it should be possible to DIY). I am intrigued enough by Theano that I am going to try building a prototype PyMC-like package for its underlying DAG, to see how the design turns out. Then I will see what can inspire PyMC.

apatil commented on August 26, 2024

Cool, sounds extremely interesting. Please keep us posted.

fonnesbeck commented on August 26, 2024

An interesting thread on the numpy list on the relative virtues of C/C++/FORTRAN. This is the start of the relevant bit:

http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055392.html

fonnesbeck commented on August 26, 2024

For reference, here is the discussion started on the Google Code tracker: http://code.google.com/p/pymc/issues/detail?id=328

fonnesbeck commented on August 26, 2024

Once the 2.2 release is out the door, this is one of the first things that I want to improve for 2.3. I think the first step is to figure out what the optimal implementation would be given current performance and the outlook for the future. Numexpr might be the easiest to implement, but the capabilities of Cython are broader. Are we willing to take a hit on smaller models for the sake of speed in larger models?

twiecki commented on August 26, 2024

My $.02: I think Cython is a great backend for this. But I wouldn't rule out Theano, especially given that James Bergstra has already made some headway (although the project seems inactive) and that GPU support would come for free:

https://github.com/jaberg/MonteTheano

fonnesbeck commented on August 26, 2024

I suppose another advantage of Theano is that we would also get gradients for free, which would be handy for Hamiltonian MC. I will do some exploring.
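
As a small illustration of what "gradients for free" means, the sketch below gets the derivative of a normal log-likelihood without writing the derivative by hand. It is not Theano and not symbolic differentiation: it uses forward-mode automatic differentiation with dual numbers, a different technique that needs only the standard library. The Dual class and the helper names are assumptions made up for this example.

```python
import math


class Dual:
    """A number value + deriv * eps with eps**2 == 0 (forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Wrap plain numbers as constants (zero derivative)."""
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule carried along with the value.
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def log(a):
    """Natural log with the chain rule applied to the derivative part."""
    a = Dual.lift(a)
    return Dual(math.log(a.value), a.deriv / a.value)


def normal_like(x, mu, tau):
    """Normal log-likelihood with precision tau; mu or tau may be a Dual."""
    total = Dual(0.0)
    for xi in x:
        resid = xi - mu
        total = total + 0.5 * log(tau * (1.0 / (2.0 * math.pi))) - 0.5 * tau * resid * resid
    return total


def grad_mu(x, mu, tau):
    """d(log-likelihood)/d(mu), obtained by seeding mu with derivative 1."""
    return normal_like(x, Dual(mu, 1.0), tau).deriv


def grad_tau(x, mu, tau):
    """d(log-likelihood)/d(tau), obtained by seeding tau with derivative 1."""
    return normal_like(x, mu, Dual(tau, 1.0)).deriv


x = [0.2, 1.1, 0.7]
print(grad_mu(x, 0.5, 2.0), grad_tau(x, 0.5, 2.0))
```

These are the gradients a Hamiltonian MC step method would need. The analytic forms, tau * sum(x - mu) for mu and n / (2 * tau) - 0.5 * sum((x - mu)**2) for tau, can be used to check the output.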

fonnesbeck commented on August 26, 2024

On the other hand, you don't really ever get GPU for "free" -- you need to install CUDA, Boost, PyCUDA. I would like to keep the dependencies list as short as possible going forward, as it represents an adoption hurdle for new users.

twiecki commented on August 26, 2024

I don't think Theano depends on PyCUDA or Boost. And the CUDA dependency is optional:

http://deeplearning.net/software/theano/install.html

On Thu, Apr 5, 2012 at 10:48 PM, Chris Fonnesbeck wrote:

> On the other hand, you don't really ever get GPU for "free" -- you need to install CUDA, Boost, PyCUDA. I would like to keep the dependencies list as short as possible going forward, as it represents an adoption hurdle for new users.

jsalvatier commented on August 26, 2024

I think we've abandoned this for now.
