Comments (13)
Hello Jerry,
I forgot to reference #269 in the post. I've opened a separate issue to address improving performance beyond the pre-v0.10.16 level.
The proposal to default to DataFrame sounds good to me. I like the CSV.read interface approach to specify the sink.
Philippe.
from unroot.jl.
DataFrames is a heavy dependency though ... but if the user requests a DataFrame explicitly, then we could keep it lightweight via a package extension (with a Requires fallback for Julia < v1.9).
The package's dependency on DataFrames has been removed, while the user can still specify DataFrames.DataFrame as the sink. The copycols option is passed only if the sink supports it as a keyword argument.
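The sink idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the branch: `materialize` and `ToyTable` are hypothetical names, and `ToyTable` stands in for any table type (such as `DataFrames.DataFrame`) whose constructor accepts a `copycols` keyword. The sketch relies on `hasmethod` with keyword names (available since Julia 1.2) to decide whether to forward `copycols`.

```julia
# Hypothetical stand-in for a table type that understands `copycols`.
struct ToyTable
    cols::NamedTuple
    copied::Bool
end
ToyTable(cols::NamedTuple; copycols::Bool = true) =
    ToyTable(copycols ? map(copy, cols) : cols, copycols)

# Hypothetical helper: hand the columns to whatever sink the caller chose.
# `copycols` is forwarded only when the sink has a method accepting that keyword.
function materialize(columns::NamedTuple, sink = identity; copycols::Bool = false)
    if hasmethod(sink, Tuple{typeof(columns)}, (:copycols,))
        return sink(columns; copycols = copycols)
    end
    return sink(columns)
end

cols = (pt = [10.0, 20.0], n = [1, 2])
table = materialize(cols, ToyTable; copycols = true)  # keyword is forwarded
raw = materialize(cols)                               # `identity` ignores it
```

Because the sink is just a callable, the package itself never has to load the table library; the caller brings it.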
Yep, I have known about this for a while now...
Sorry, I'm pretty sure it was introduced in:
because we lost the cache :( I will try to fix this soon.
On a separate note:
> The Tree contains many branches (1689). The setup time increases non-linearly with the number of branches: it takes 1.9s if only 706 branches are included, 2.8s if only the other 983 branches are included,
I think this is not surprising because we "abuse" the type system a bit. Maybe we should default to DataFrames.jl behavior, and let users opt in to this penalty only when they need "for evt in tree" performance?
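To make the trade-off concrete, here is a rough sketch of the two storage styles being compared. It is not UnROOT's implementation; `ErasedTable`, `TypedTable`, `typed`, and `sum_a` are hypothetical names chosen for illustration. The type-erased container keeps column types out of the table type (similar in spirit to DataFrames.jl), while the typed one encodes every column type in a `NamedTuple` parameter, which is what lets a row or column loop compile to fast code but also what the compiler has to chew through for very wide tables.

```julia
# Type-erased storage: the table type does not depend on the column types.
struct ErasedTable
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Fully typed storage: every column type is part of the table's type.
struct TypedTable{NT<:NamedTuple}
    columns::NT
end

# Opt in to typed storage only when a fast loop is needed.
typed(t::ErasedTable) = TypedTable(NamedTuple{Tuple(t.names)}(Tuple(t.columns)))

# Inside this function the element type of column `a` is known to the compiler.
function sum_a(t::TypedTable)
    s = 0.0
    for x in t.columns.a
        s += x
    end
    return s
end

erased = ErasedTable([:a, :b], AbstractVector[[1.0, 2.0, 3.0], [1, 2, 3]])
total = sum_a(typed(erased))
```

With this split, construction stays cheap by default and the specialization cost is paid only by callers who ask for the typed form.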
Note that converting the LazyTree to a TypedTables Table takes less than 100ms. So we should be able to have typed column without the large time penalty?
> Note that converting the LazyTree to a TypedTables Table takes less than 100ms. So we should be able to have typed column without the large time penalty?
Right now 100ms sounds reasonable to me. But try measuring the latency of converting a 1600-column-wide DataFrame to TypedTables for the first time; I suspect it would be kind of slow too.
We probably want to change the default behavior at 1.0, because in terms of performance this would completely break:
    tree = LazyTree(...)
    for evt in tree
        ...
    end
We will do a hackathon at JuliaHEP ;) We have a huge todo list...
I expect that for 0.x.y, an option to choose DataFrame and a notice in the documentation are sufficient.
Here is an attempt that adds a sink option to LazyTree: master...grasph:UnROOT.jl:sink-option
It remains to be confirmed that it does not break anything and that it keeps row-loop performance.
With DataFrame it takes 6.5s instead of 21s. Not as fast as hoped, but it's already an improvement.
Philippe.
2.5 s after adding copycols=true
We don't need to depend on DataFrames; basically it comes down to returning something NamedTuple-like or Dict-like.
DataFrames.jl can easily ingest whatever we give back.
Using ./test/samples/NanoAODv5_sample.root, which we have in the repo, I see:
v0.10.15
julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
1.158496 seconds (7.63 M allocations: 1.040 GiB, 12.09% gc time, 49.04% compilation time)
master
julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
67.227559 seconds (466.77 M allocations: 34.525 GiB, 13.13% gc time, 0.75% compilation time)
oh, but on v0.10.15
julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
1.444993 seconds (8.10 M allocations: 1.072 GiB, 7.23% gc time, 69.73% compilation time)
julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
ERROR: KeyError: key ((ROOTFile with 1 entry and 18 streamers.
/home/akako/Documents/github/dotFiles/homedir/.julia/dev/UnROOT/test/samples/tree_with_jagged_array.root
, "t1"), ()) not found
Stacktrace:
...
@ LRUCache ~/.julia/packages/LRUCache/NCFtW/src/LRUCache.jl:124
[5] _get!
@ Memoization ~/.julia/packages/Memoization/7WxyR/src/Memoization.jl:214 [inlined]
[6] _getindex
@ UnROOT ~/.julia/packages/Memoization/7WxyR/src/Memoization.jl:209 [inlined]
[7] getindex
@ UnROOT ~/Documents/github/dotFiles/homedir/.julia/dev/UnROOT/src/root.jl:162 [inlined]
...
julia> @time LazyTree("./test/samples/NanoAODv5_sample.root", "Events");
0.529355 seconds (6.85 M allocations: 1010.557 MiB, 23.07% gc time)
:(