Giter VIP home page Giter VIP logo

Comments (8)

kyptin avatar kyptin commented on May 21, 2024

I'm not sure I understand why, but I did have success with the following approach, which uses code from pspy in my function:

(defn process
  [items]
  (let [id "process"
        t0 (System/nanoTime)]
    (try
      (reduce (fn [count item] (inc count)) 0 events)
      (finally
        (let [t-elapsed (- (System/nanoTime) t0)]
          (swap! p/*pdata* #(assoc % id (conj (% id []) t-elapsed))))))))

The main difference, I think, is that there is no inner anonymous function with this approach.

I hope that this approach could serve as the basis for a defnp macro that doesn't require a totally different mechanism for storing timing information.

from timbre.

ptaoussanis avatar ptaoussanis commented on May 21, 2024

Hi Jeff,

p/defnp shouldn't themselves interfere with any lazy sequences. Best bet would be if you could get me some kind of minimal example that reproduces the behaviour so I can take a look?

What makes you suspect that something's retaining a head? Just the OOM error?

Depending on what you've got wrapped for profiling, keep in mind that profiling itself involves quite a bit of data collection and could contribute to (or by itself cause) an OOM if you're already operating near your memory limits and/or if the way things have been wrapped means that the data collection doesn't get a chance to terminate regularly.

In particular, note that anything wrapped in a p (or variant) will get times conj'ed into a vector until the calling profile scope ends. This was a performance tradeoff so that data collection is cheaper (aggregate stats are only calculated once on scope termination) - but it does mean that there's an increasing memory overhead for every wrapped p call in a loop, etc.

An alternative would be to gather cumulative stats on each wrapped p call, which'd limit memory use at the cost of increased times.

Or (this occurs to me now), we could strike a balance between the two and gather the cumulative stats every 10k calls, say. I've never seen a case of this being an issue though, so we should confirm first: 1) That this is actually the cause of the OOM, and 2) That your p wrappings are actually tracking what you're expecting them to be tracking (i.e. let's rule out unintentional recursion, etc.).

So, yeah - try get me a reproducible case if you can and let's go from there. Cheers! :-)

from timbre.

kyptin avatar kyptin commented on May 21, 2024

OK, here's a REPL example that demonstrates the problem:

user=> (require '[taoensso.timbre.profiling :refer [profile defnp]])
nil
user=> (defn process [items] (reduce (fn [count item] (inc count)) 0 items))
#'user/process
user=> (defnp process2 [items] (reduce (fn [count item] (inc count)) 0 items))
#'user/process2
user=> (process (range 1e8))
100000000
user=> (profile :info :_ (process2 (range 1e8)))

OutOfMemoryError Java heap space  clojure.core/range (core.clj:2653)
user=> 

The only difference between the two examples the profiling, so I think profiling is causing the issue.

I thought about the vector of times. That could also cause problems, but I think it doesn't in my case. (I have about 5 functions that are each being called about 3 million times. That's a lot, but probably not too much for 3G of heap space.) I had thought about a streaming approach to computing the statistics, but I like your "chunked" approach better. I'd love to see this change, too, although it isn't as pressing for me at the moment.

Thanks for your help, and for timbre in general.

from timbre.

ptaoussanis avatar ptaoussanis commented on May 21, 2024

This is quite interesting! Have reduced it further:

((fn [coll]         (reduce (fn [n x] (inc n)) 0 coll))   (range 1e8)) ; fine (~6s)
((fn [coll] ((fn [] (reduce (fn [n x] (inc n)) 0 coll)))) (range 1e8)) ; OOMs

This is reproducible with (at least) Clojure 1.6.0-RC1 and 1.4.1.

So it's indeed the additional fn wrapping that's causing the problem, though it's not obvious to me why that would be the case.

UPDATE: In hindsight it's actually quite simple - the inner fn is realising the full range but the calling fn is still retaining a reference which the JVM can't release. Will push a fix shortly.

Thanks for your help, and for timbre in general.

You're very welcome, thank you for saying so! Appreciate the bug reports :-)

Cheers!

from timbre.

ptaoussanis avatar ptaoussanis commented on May 21, 2024

Okay, just pushed v3.1.6 with the fix.

from timbre.

kyptin avatar kyptin commented on May 21, 2024

Thanks very much—for the fix and the timeliness! This is a big help for me.

I'll file an issue for the periodic stats calculations, to keep a bound on the size of the times arrays in *pdata*, as we discussed above. As I said, I'm not currently affected by this, so as far as I'm concerned it's not urgent. But I think it is an important step in the evolution of the profiling code, in order to avoid introducing OOMs.

from timbre.

ptaoussanis avatar ptaoussanis commented on May 21, 2024

No problem, am happy you caught this.

I'll file an issue for the periodic stats calculations, to keep a bound on the size of the times arrays in pdata, as we discussed above.

Hmm - I'm on the fence about that one. Brought it up since I suspected it may have been the cause of the error you were seeing. After ruling it out, I'm not convinced it'd be a beneficial change in practice.

The memory overhead we're looking at is ~8 bytes per unterminated p call. In my experience most profiling jobs are measuring on the order of thousands to millions of calls. One normally profiles expensive code, so more than millions of calls for one job is unlikely. Anyway let's say 10m on the upper end - so we're talking ~80MB of memory that'll get GC'd on termination.

If the chunked stats change were completely free I'd say why not - but here it'd involve a reasonable amount of code modding and extra complication. Not sure if that tradeoff makes sense.

You're welcome to open an issue (and offer a counter argument if you have one) - it's certainly worth thinking about! What are your thoughts?

from timbre.

kyptin avatar kyptin commented on May 21, 2024

Sorry, I already filed the issue as #57. I'll reply to your comments there.

from timbre.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.