Comments (13)

grasph commented on June 8, 2024

Hello Jerry,

I forgot to reference #269 in the post. I've opened a separate issue about improving performance beyond the pre-v0.10.16 level.

The proposal to default to DataFrame sounds good to me. I like the approach of letting the caller specify the sink, as in the CSV.read interface.

Philippe.

from unroot.jl.

oschulz commented on June 8, 2024

DataFrames is a heavy dependency though ... but if the user requests a DataFrame explicitly, we could keep it lightweight via a Pkg extension (with a Requires fallback for Julia < v1.9).

grasph commented on June 8, 2024

The package's dependency on DataFrames has been removed, while the user can still specify DataFrames.DataFrame as the sink. The copycols option is passed on only if the sink supports it as a keyword argument.
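
One way to forward copycols only to sinks that accept it is to check for the keyword with Base's hasmethod. The following is a minimal sketch under that assumption, not the code from the branch: to_sink, CopyingSink and plain_sink are hypothetical names, and the toy sinks merely stand in for DataFrames.DataFrame and for a sink without that keyword.

```julia
# Hypothetical helper (not UnROOT's API): pass `copycols` only when the sink
# has a method that accepts it as a keyword argument.
function to_sink(sink, columns::NamedTuple; copycols::Bool=false)
    if hasmethod(sink, Tuple{typeof(columns)}, (:copycols,))
        return sink(columns; copycols=copycols)
    end
    return sink(columns)
end

# Toy sink that understands `copycols`, standing in for DataFrames.DataFrame.
struct CopyingSink
    data::NamedTuple
    function CopyingSink(cols::NamedTuple; copycols::Bool=true)
        return new(copycols ? map(copy, cols) : cols)
    end
end

# Toy sink without a `copycols` keyword: returns the columns unchanged.
plain_sink(cols::NamedTuple) = cols

cols = (pt = [1.0, 2.0], nMuon = [1, 0])

shared = to_sink(CopyingSink, cols; copycols=false)  # columns are aliased
copied = to_sink(CopyingSink, cols; copycols=true)   # columns are copied
plain  = to_sink(plain_sink, cols; copycols=true)    # keyword is not forwarded
```

With this shape the package itself never needs to load DataFrames; the caller supplies the sink.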

Moelf commented on June 8, 2024

yep, I have known about this for a while now...

sorry, I'm pretty sure it was introduced in:

because we lost cache :( will try to fix this soon
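
To illustrate why losing a cache hurts: with memoization, repeated requests for the same key reuse the stored result instead of redoing the work. Below is a minimal sketch using a plain Dict; it is not UnROOT's actual Memoization/LRUCache setup, and slow_lookup, cached_lookup and the key are made up for illustration.

```julia
# Hypothetical stand-in for an expensive operation, e.g. parsing branch metadata.
const N_CALLS = Ref(0)
function slow_lookup(key::String)
    N_CALLS[] += 1          # count how often the real work is done
    return length(key)      # placeholder result
end

# Cache keyed by the lookup argument; `get!` computes and stores only on a miss.
const CACHE = Dict{String,Int}()
cached_lookup(key::String) = get!(() -> slow_lookup(key), CACHE, key)

# A thousand requests for the same key trigger the expensive work only once.
for _ in 1:1000
    cached_lookup("Muon_pt")
end
```

Without the cache, every one of those requests would call slow_lookup again, so the cost grows with the number of requests rather than the number of distinct keys.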

Moelf commented on June 8, 2024

on a separate note:

The Tree contains many branches (1689). The setup time increases non linearly
with the number of branches: it takes 1.9s if only 706 branches are included,
2.8s if only the other 983 branches are included,

I think this is not surprising because we "abuse" the type system a bit. Maybe we should default to DataFrames.jl behavior, and let users opt in to this penalty only when they need "for evt in tree" performance?

grasph commented on June 8, 2024

Note that converting the LazyTree to a TypedTables Table takes less than 100 ms. So we should be able to have typed columns without the large time penalty?

Moelf commented on June 8, 2024

Note that converting the LazyTree to a TypedTables Table takes less than 100 ms. So we should be able to have typed columns without the large time penalty?

right now 100 ms sounds reasonable to me. But try measuring the latency of converting a 1600-column-wide DataFrame to TypedTables for the first time; I suspect it would be kinda slow too.


We probably want to change the default behavior at 1.0, because this would completely break

tree = LazyTree(...)

for evt in tree
...
end

in terms of performance

tamasgal commented on June 8, 2024

We will do a hackathon at JuliaHEP ;) We have a huge todo list...

grasph commented on June 8, 2024

For 0.x.y, I expect that an option to choose DataFrame plus a note in the documentation is sufficient.

Here is an attempt that adds a sink option to LazyTree: master...grasph:UnROOT.jl:sink-option

It remains to be confirmed that it does not break anything and that it preserves row-loop performance.

With DataFrame it takes 6.5 s instead of 21 s. Not as fast as hoped, but it's already an improvement.

Philippe.

grasph commented on June 8, 2024

2.5 s after adding copycols=true

Moelf commented on June 8, 2024

we don't need to depend on DataFrames; basically it comes down to returning something NamedTuple-like or Dict-like.

DataFrames.jl can easily ingest whatever we give back.
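
As a rough illustration of the two shapes, using only Base Julia: a NamedTuple of vectors records every column name and element type in its own type, which allows type-stable access and is presumably also what gets expensive for very wide trees, while a Dict keeps one small type regardless of the number of columns. The column names and values below are made up.

```julia
# Two hypothetical column containers holding the same data.
cols_nt   = (pt = [10.5, 20.25], nMuon = [1, 0])
cols_dict = Dict{Symbol,AbstractVector}(:pt => [10.5, 20.25], :nMuon => [1, 0])

# NamedTuple-like: column names and element types are part of the type itself.
nt_names = fieldnames(typeof(cols_nt))
nt_types = fieldtypes(typeof(cols_nt))

# Dict-like: the type says nothing about individual columns.
dict_valtype = valtype(cols_dict)

# Element access: for the NamedTuple the compiler can infer the element type
# of `pt`; for the Dict it only knows the value is some AbstractVector.
row_pt_nt(cols, i)   = cols.pt[i]
row_pt_dict(cols, i) = cols[:pt][i]
```

Either container can be handed to a table type afterwards; the difference is in how much information the compiler sees up front.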

Moelf commented on June 8, 2024

using the ./test/samples/NanoAODv5_sample.root file we have in the repo, I see:

v0.10.15

julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
  1.158496 seconds (7.63 M allocations: 1.040 GiB, 12.09% gc time, 49.04% compilation time)

master

julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
 67.227559 seconds (466.77 M allocations: 34.525 GiB, 13.13% gc time, 0.75% compilation time)

Moelf commented on June 8, 2024

oh, but on v0.10.15

julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
  1.444993 seconds (8.10 M allocations: 1.072 GiB, 7.23% gc time, 69.73% compilation time)

julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
ERROR: KeyError: key ((ROOTFile with 1 entry and 18 streamers.
/home/akako/Documents/github/dotFiles/homedir/.julia/dev/UnROOT/test/samples/tree_with_jagged_array.root
, "t1"), ()) not found
Stacktrace:
...
    @ LRUCache ~/.julia/packages/LRUCache/NCFtW/src/LRUCache.jl:124
  [5] _get!
    @ Memoization ~/.julia/packages/Memoization/7WxyR/src/Memoization.jl:214 [inlined]
  [6] _getindex
    @ UnROOT ~/.julia/packages/Memoization/7WxyR/src/Memoization.jl:209 [inlined]
  [7] getindex
    @ UnROOT ~/Documents/github/dotFiles/homedir/.julia/dev/UnROOT/src/root.jl:162 [inlined]
...

julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
  0.529355 seconds (6.85 M allocations: 1010.557 MiB, 23.07% gc time)

:(
