Comments (16)

fonnesbeck commented on August 26, 2024

Interesting. I would imagine that cython would improve on the performance for smaller sizes. I'm assuming you started with numexpr because of ease of implementation?

from pymc.

jsalvatier commented on August 26, 2024

Yeah, I would expect that too. I started with numexpr because it's the idea that first seemed good. One good thing about numexpr is that it automatically multithreads the computation.

jsalvatier commented on August 26, 2024

Some timings for Cython code generated from jinja2 (http://jinja.pocoo.org/) templates.

sizes                                   1      10     100    1000   10000  100000
bernoulli_like(x, p)                2.874   3.133   2.321   0.888   0.274   0.137
beta_like(x, alpha, beta)           4.253   4.071   3.540   3.625   3.856   3.970
betabin_like(xd, alpha, beta, n)    4.561   2.589   0.639   0.141   0.096   0.090
cauchy_like(x, alpha, beta)         4.468   4.034   3.185   0.978   0.267   0.172
gamma_like(x, alpha, beta)          4.258   3.636   1.524   0.246   0.031   0.008
normal_like(x, mu, tau)             4.326   4.251   3.199   1.468   0.257   0.145

The gist is that for small sizes it's about 4x slower, but for sizes > 1000 there are ~8x improvements. I'm not sure what's going on with beta_like; it's not much different from the other functions.
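The table can be read as follows, assuming each entry is the generated code's run time divided by the Fortran run time (the thread later confirms the timings are relative to Fortran). This is only a reading aid: the helper names `speedup` and `break_even_size` are illustrative and not part of PyMC or ufunc_gen.

```python
# Relative timings copied from the table above. Each value is assumed to be
# (generated-code time) / (Fortran time), so values below 1 mean "faster".
sizes = [1, 10, 100, 1000, 10000, 100000]
ratios = {
    "bernoulli_like": [2.874, 3.133, 2.321, 0.888, 0.274, 0.137],
    "beta_like": [4.253, 4.071, 3.54, 3.625, 3.856, 3.97],
    "betabin_like": [4.561, 2.589, 0.639, 0.141, 0.096, 0.09],
    "cauchy_like": [4.468, 4.034, 3.185, 0.978, 0.267, 0.172],
    "gamma_like": [4.258, 3.636, 1.524, 0.246, 0.031, 0.008],
    "normal_like": [4.326, 4.251, 3.199, 1.468, 0.257, 0.145],
}


def speedup(ratio):
    """Convert a relative time into a speedup factor (2.0 means twice as fast)."""
    return 1.0 / ratio


def break_even_size(row):
    """Return the smallest tested size at which the ratio drops below 1, or None."""
    for size, ratio in zip(sizes, row):
        if ratio < 1.0:
            return size
    return None


for name, row in ratios.items():
    print(f"{name:15s} break-even at {break_even_size(row)}, "
          f"speedup at n={sizes[-1]}: {speedup(row[-1]):.1f}x")
```

Read this way, beta_like never reaches break-even in the tested range, which matches the puzzlement expressed above.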

To reproduce the comparisons, run speed_test2.py in the tests folder. You will need to rebuild, and you will need ufunc_gen, the code generation package I wrote (https://github.com/jsalvatier/ufunc_gen).

Reasons why I like the code generation approach:

  • fast
  • supports multidimensional arrays correctly
  • it is now easy to do efficiency improvements on many likelihoods simultaneously, or even switch languages; it wouldn't be very tedious to generate and compile c-extensions instead of cython extensions.
  • easy to open up code generation to the user as well for custom likelihoods.
  • opens up the possibility of automatically (but symbolically) differentiated gradients and jacobians by using SymPy. This would eliminate even more code, work especially well for custom likelihoods and eliminate one source of errors.
  • eliminates a lot of manual fortran code

Drawbacks:

  • multivariate distributions still need to be custom written.
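As a rough illustration of the "generate code from a text kernel" idea discussed above, here is a minimal sketch. It is not the ufunc_gen API and it does not emit Cython: it swaps in the standard library's `string.Template` for jinja2 and generates plain Python so that it runs without a compile step. The names `KERNEL_TEMPLATE` and `make_likelihood` are hypothetical, and the normal log-likelihood is assumed to be parameterized by precision `tau`.

```python
import math
from string import Template

# Hypothetical template: a one-line text "kernel" is dropped into a standard
# loop that sums the per-element log-likelihood over equal-length sequences.
KERNEL_TEMPLATE = Template('''
def ${name}(${arr_args}):
    total = 0.0
    for ${loop_vars} in zip(${arr_args}):
        total += ${kernel}
    return total
''')


def make_likelihood(name, args, kernel):
    """Generate source for a summed log-likelihood and compile it with exec."""
    source = KERNEL_TEMPLATE.substitute(
        name=name,
        arr_args=", ".join(a + "_arr" for a in args),
        # The trailing comma keeps tuple unpacking valid for a single argument.
        loop_vars=", ".join(args) + ",",
        kernel=kernel,
    )
    namespace = {"log": math.log, "pi": math.pi}
    exec(source, namespace)  # the generated function is defined in `namespace`
    return namespace[name], source


# Only the kernel expression differs between distributions.
normal_like, source = make_likelihood(
    "normal_like",
    ["x", "mu", "tau"],
    "0.5 * log(tau / (2 * pi)) - 0.5 * tau * (x - mu) ** 2",
)
print(source)
print(normal_like([0.0, 1.0], [0.0, 0.0], [1.0, 1.0]))
```

The point of the design is that the loop, argument handling and (in a real backend) the compilation target live in one template, so changing the template changes every likelihood at once.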

I may look into Theano (http://deeplearning.net/software/theano/), which, it strikes me now, is doing more or less exactly this kind of code generation from a text kernel.

I am somewhat curious about eliminating the large quantity of metaprogramming we currently have in distributions.py and using templating to do code generation instead. I suspect this would make the code easier to understand and maintain. Code generation might also be good for auto writing tests. The reason I think code generation is suited to PyMC is that when we build distributions we essentially want to create many classes with a standard format.

jsalvatier commented on August 26, 2024

It comes to mind that Theano would also facilitate GPU usage. Theano also optimizes the whole graph at once, which I think would be very advantageous for likelihood calculations. It also does symbolic differentiation.

Once you start thinking code generation is a good idea, Theano starts to look really good.

apatil commented on August 26, 2024

Also, once you start thinking about code generation, Fortran looks a lot better. :) I agree with you that, if we use any code generation approach, it should be easy to convert to different 'backends'.

Theano would really shine if it had access to whole submodels, not just one variable at a time, right?

apatil commented on August 26, 2024

The timings you're showing are relative to the Fortran timings, right? How does Cython manage to beat Fortran by such a wide margin for some of them?

jsalvatier commented on August 26, 2024

Yes, I think the more you include in your graph, the better.

Yes, relative to Fortran. I am not sure how Cython is beating Fortran so much. I'll make sure I haven't made a mistake. Could it be compile options on Cython vs F2Py?

I asked about Theano on the Theano list, and the responses were very encouraging. It avoids all the potential issues I was worried about except for linear algebra support, which is currently primitive (dot, outer and tensordot) though under development, and masked arrays (though it should be possible to DIY). I am intrigued enough by Theano that I am going to try building a prototype PyMC-like package that uses it for the underlying DAG, to see how the design turns out. Then I will see what can inspire PyMC.

apatil commented on August 26, 2024

Cool, sounds extremely interesting. Please keep us posted.

fonnesbeck commented on August 26, 2024

An interesting thread on the numpy list on the relative virtues of C/C++/FORTRAN. This is the start of the relevant bit:

http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055392.html

fonnesbeck commented on August 26, 2024

For reference, here is the discussion started on the Google Code tracker.

fonnesbeck commented on August 26, 2024

Once the 2.2 release is out the door, this is one of the first things that I want to improve for 2.3. I think the first step is to figure out what the optimal implementation would be given current performance and the outlook for the future. Numexpr might be the easiest to implement, but the capabilities of Cython are broader. Are we willing to take a hit on smaller models for the sake of speed in larger models?

twiecki commented on August 26, 2024

My $.02: I think Cython is a great backend for this. But I wouldn't rule out Theano, especially given that James Bergstra has already made some headway (although the project seems inactive) and that GPU support would come for free:

https://github.com/jaberg/MonteTheano

fonnesbeck commented on August 26, 2024

I suppose another advantage of Theano is that we would also get gradients for free, which would be handy for Hamiltonian MC. I will do some exploring.
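To make concrete why gradients matter for Hamiltonian MC: the leapfrog integrator at its core evaluates the gradient of the log-density at every step. Below is a minimal sketch for a one-dimensional standard normal target with a hand-written gradient. The names `logp`, `grad_logp`, `leapfrog` and `hamiltonian` are illustrative and not PyMC or Theano API; with a symbolic backend, `grad_logp` is the piece that would be derived automatically instead of written by hand.

```python
def logp(q):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * q * q


def grad_logp(q):
    """Gradient of logp with respect to q (written by hand in this sketch)."""
    return -q


def leapfrog(q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics for n_steps leapfrog steps."""
    p = p + 0.5 * step_size * grad_logp(q)      # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                   # full step for position
        if i < n_steps - 1:
            p = p + step_size * grad_logp(q)    # full step for momentum
    p = p + 0.5 * step_size * grad_logp(q)      # final half step for momentum
    return q, p


def hamiltonian(q, p):
    """Total energy: potential (-logp) plus kinetic (unit mass)."""
    return -logp(q) + 0.5 * p * p


q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, step_size=0.1, n_steps=20)
energy_error = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(q1, p1, energy_error)
```

A small energy error is what lets HMC accept distant proposals; an incorrect gradient would show up as a large error here.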

fonnesbeck commented on August 26, 2024

On the other hand, you don't really ever get GPU for "free" -- you need to install CUDA, Boost, PyCUDA. I would like to keep the dependencies list as short as possible going forward, as it represents an adoption hurdle for new users.

twiecki commented on August 26, 2024

I don't think Theano depends on PyCUDA or Boost. And the CUDA dependency is optional:

http://deeplearning.net/software/theano/install.html

On Thu, Apr 5, 2012 at 10:48 PM, Chris Fonnesbeck wrote:

> On the other hand, you don't really ever get GPU for "free" -- you need to install CUDA, Boost, PyCUDA. I would like to keep the dependencies list as short as possible going forward, as it represents an adoption hurdle for new users.


jsalvatier commented on August 26, 2024

I think we've abandoned this for now.
