Comments (7)

Moelf commented on June 2, 2024

@jpivarski this is a head-scratcher. There are a few special features of this file:

  • the footer's extension links are not empty, meaning more fields and columns were created at some point after writing had started
  • because of that, it is perhaps not surprising that the first and second clusters have different numbers of storage columns (835 for the 1st cluster, 869 for the 2nd cluster and all subsequent ones)

869 matches the total number of column records after appending the ones found in the footer to those of the header.

None of this directly explains why the offset/content would suddenly misalign, though. My understanding is that the field and column records in the footer extension should just be "appended" to the header ones, so in theory they shouldn't mess up the indexing of storage columns, and everything I can read from the 1st cluster should continue to work in the 2nd cluster.
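A minimal Python sketch of that expectation: if extension records are only appended, every header column keeps its storage index. The function name and the placeholder records are hypothetical, and attributing the 835 columns to the header alone is an assumption; only the counts 835 and 869 come from this file.

```python
def merge_column_records(header_cols, extension_cols):
    """Append footer-extension column records after the header ones.

    Storage column i is then simply position i in the merged list,
    so the indices of the header columns do not change.
    """
    return list(header_cols) + list(extension_cols)


# Placeholder records (hypothetical); only the counts are taken from the thread.
header_cols = [f"header_col_{i}" for i in range(835)]
extension_cols = [f"extension_col_{i}" for i in range(869 - 835)]

merged = merge_column_records(header_cols, extension_cols)
print(len(merged))                   # 869
print(merged[:835] == header_cols)   # True: header indices are untouched
```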

from unroot.jl.

Moelf commented on June 2, 2024

I think that might be a red herring:

# remember Julia is 1-based index
julia> rf.header.field_records[182:201]
20-element rVector{UnROOT.FieldRecord}:
 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b5, struct_role=0x0002, flags=0x0000, repetition=0, field_name="AntiKt4TruthDressedWZJetsAux:", type_name="xAOD::JetAuxContainer_v1", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b5, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_0", type_name="xAOD::AuxContainerBase", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b6, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_0", type_name="SG::IAuxStore", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b7, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_0", type_name="SG::IConstAuxStore", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b6, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_1", type_name="SG::IAuxStoreIO", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b6, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_2", type_name="SG::IAuxStoreHolder", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000b6, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_3", type_name="ILockable", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000b5, struct_role=0x0001, flags=0x0000, repetition=0, field_name="pt", type_name="std::vector<float>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000bc, struct_role=0x0000, flags=0x0000, repetition=0, field_name="_0", type_name="float", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000b5, struct_role=0x0001, flags=0x0000, repetition=0, field_name="eta", type_name="std::vector<float>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000be, struct_role=0x0000, flags=0x0000, repetition=0, field_name="_0", type_name="float", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000b5, struct_role=0x0001, flags=0x0000, repetition=0, field_name="phi", type_name="std::vector<float>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000c0, struct_role=0x0000, flags=0x0000, repetition=0, field_name="_0", type_name="float", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000b5, struct_role=0x0001, flags=0x0000, repetition=0, field_name="m", type_name="std::vector<float>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000c2, struct_role=0x0000, flags=0x0000, repetition=0, field_name="_0", type_name="float", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000b5, struct_role=0x0001, flags=0x0000, repetition=0, field_name="constituentLinks", type_name="std::vector<std::vector<ElementLink<DataVector<xAOD::IParticle> >>>", type_alias="xAOD::JetAuxContainer_v1::ConstituentLinks_t", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000c4, struct_role=0x0001, flags=0x0000, repetition=0, field_name="_0", type_name="std::vector<ElementLink<DataVector<xAOD::IParticle> >>", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000c5, struct_role=0x0002, flags=0x0000, repetition=0, field_name="_0", type_name="ElementLink<DataVector<xAOD::IParticle> >", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0xffffffff, parent_field_id=0x000000c6, struct_role=0x0002, flags=0x0000, repetition=0, field_name=":_0", type_name="ElementLinkBase", type_alias="", field_desc="", )

 UnROOT.FieldRecord(field_version=0x00000000, type_version=0x00000000, parent_field_id=0x000000c7, struct_role=0x0000, flags=0x0000, repetition=0, field_name="m_persKey", type_name="std::uint32_t", type_alias="SG::sgkey_t", field_desc="", )


julia> collect(rnt.var"AntiKt4TruthDressedWZJetsAux:")
field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000bc, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000bd, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000be, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000bf, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c0, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000c1, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c2, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000c3, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c4, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c5, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0014, nbits=0x0020, field_id=0x000000c8, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0014, nbits=0x0020, field_id=0x000000c9, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000ca, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000cb, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000cc, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000bc, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000bd, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000be, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000bf, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c0, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000c1, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c2, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0011, nbits=0x0020, field_id=0x000000c3, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c4, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x000e, nbits=0x0040, field_id=0x000000c5, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0014, nbits=0x0020, field_id=0x000000c8, flags=0x00000000, first_ele_idx=0, )

field.columnrecord = UnROOT.ColumnRecord(type=0x0014, nbits=0x0020, field_id=0x000000c9, flags=0x00000000, first_ele_idx=0, )

none of the columns touched have flags=0x08 in this case.


Moelf commented on June 2, 2024
julia> rnt = LazyTree("./DAOD_TRUTH1.zprime125.rntuple.root", "RNT:CollectionTree", "AntiKt4TruthDressedWZJetsAux:");

julia> rnt.var"AntiKt4TruthDressedWZJetsAux:".rn.schema
RNTupleSchema with 1 top fields
└─ Symbol("AntiKt4TruthDressedWZJetsAux:")  Struct
                                             ├─ :m  Vector
                                             │       ├─ :offset  Leaf{UnROOT.Index64}(col=38)
                                             │       └─ :content  Leaf{Float32}(col=39)
                                             ├─ :pt  Vector
                                             │        ├─ :offset  Leaf{UnROOT.Index64}(col=32)
                                             │        └─ :content  Leaf{Float32}(col=33)
                                             ├─ :eta  Vector
                                             │         ├─ :offset  Leaf{UnROOT.Index64}(col=34)
                                             │         └─ :content  Leaf{Float32}(col=35)
                                             ├─ :constituentWeights  Vector
                                             │                        ├─ :offset  Leaf{UnROOT.Index64}(col=44)
                                             │                        └─ :content  Vector
                                             │                                      ├─ :offset  Leaf{UnROOT.Index64}(col=45)
                                             │                                      └─ :content  Leaf{Float32}(col=46)
                                             ├─ :phi  Vector
                                             │         ├─ :offset  Leaf{UnROOT.Index64}(col=36)
                                             │         └─ :content  Leaf{Float32}(col=37)
                                             └─ :constituentLinks  Vector
                                                                    ├─ :offset  Leaf{UnROOT.Index64}(col=40)
                                                                    └─ :content  Vector
                                                                                  ├─ :offset  Leaf{UnROOT.Index64}(col=41)
                                                                                  └─ :content  Struct
                                                                                                └─ Symbol(":_0")  Struct
                                                                                                                   ├─ :m_persKey  Leaf{UInt32}(col=42)
                                                                                                                   └─ :m_persIndex  Leaf{UInt32}(col=43)

let's manually check all the columns under the :constituentLinks, and specifically in the second cluster, see if they are consistent:

julia> io = rnt.var"AntiKt4TruthDressedWZJetsAux:".rn.io;

julia> rnt.var"AntiKt4TruthDressedWZJetsAux:"[1]; # fill the pagelinks cache

julia> cluster_group = rnt.var"AntiKt4TruthDressedWZJetsAux:".rn.pagelinks[1];

julia> cluster_group.cluster_summaries[2] # we're interested in the second cluster
UnROOT.ClusterSummary(528, 1432)

julia> cluster_group.nested_page_locations[2][40] # this is col=40 in the schema, we converted already
1-element UnROOT.RNTupleListNoFrame{UnROOT.PageDescription}:
 UnROOT.PageDescription(0x00000598, UnROOT.Locator(num_bytes=816, offset=0x0000000003f3eb4a, )
)

# ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=40)
julia> reinterpret(UnROOT.Index64, UnROOT.read_pagedesc(io, cluster_group.nested_page_locations[2][40], 64; split=true)) |> cumsum .|> Int
1432-element Vector{Int64}:
     9
    30
    36
    43
    57
    65
     
 12906
 12919
 12930

julia> cluster_group.nested_page_locations[2][41] # this is col=41 in the schema, we converted already
2-element UnROOT.RNTupleListNoFrame{UnROOT.PageDescription}:
 UnROOT.PageDescription(0x00002000, UnROOT.Locator(num_bytes=4740, offset=0x0000000003c6ab67, )
)
 UnROOT.PageDescription(0x00001282, UnROOT.Locator(num_bytes=2770, offset=0x0000000003f3eea4, )
)

# this is split and delta encoded
# ├─ :offset ⇒ Leaf{UnROOT.Index64}(col=41)
julia> reinterpret(UnROOT.Index64, UnROOT.read_pagedesc(io, cluster_group.nested_page_locations[2][41], 64; split=true)) |> cumsum .|> Int
12930-element Vector{Int64}:
     30
     34
     52
     66
...
 209999
 210008

julia> cluster_group.nested_page_locations[2][42]
8-element UnROOT.RNTupleListNoFrame{UnROOT.PageDescription}:
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x000000000196af81, )
) 
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x0000000001f361e0, )
)
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x00000000024cac93, )
)
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x0000000002ac8a47, )
)
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x00000000030d9f71, )
)
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x00000000036576d6, )
)
 UnROOT.PageDescription(0x00004000, UnROOT.Locator(num_bytes=38, offset=0x0000000003c710aa, )
)
 UnROOT.PageDescription(0x00003648, UnROOT.Locator(num_bytes=38, offset=0x0000000003f3f9a0, )
)

julia> reinterpret(UInt32, UnROOT.read_pagedesc(io, cluster_group.nested_page_locations[2][42], 32; split=true))
128584-element reinterpret(UInt32, ::Vector{UInt8}):
 0x2784318b
 0x2784318b
 0x2784318b
 0x2784318b
 0x2784318b
 0x2784318b
 0x2784318b
 0x2784318b


Moelf commented on June 2, 2024

First cluster seems to match:

ROOT

In [39]: df = ROOT.RDataFrame("RNT:CollectionTree", "./DAOD_TRUTH1.zprime125.rntuple.root")

In [41]: df.GetColumnType("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persKey")
Out[41]: 'ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>'

In [67]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persIndex"))[3])[2])
Out[67]: [748, 932, 936, 935, 934]

In [68]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persKey"))[3])[2])
Out[68]: [662974859, 662974859, 662974859, 662974859, 662974859]

In [62]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persIndex"))[527])[7])
Out[62]: [1575, 2481, 435, 2480, 2477, 2532]

In [63]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persKey"))[527])[7])
Out[63]: [662974859, 662974859, 662974859, 662974859, 662974859, 662974859]

UnROOT

julia> br = rnt.var"AntiKt4TruthDressedWZJetsAux:";

julia> [Int(x[1].m_persIndex) for x in br[4].constituentLinks[3]]
5-element Vector{Int64}:
 748
 932
 936
 935
 934

julia> [Int(x[1].m_persKey) for x in br[4].constituentLinks[3]]
5-element Vector{Int64}:
 662974859
 662974859
 662974859
 662974859
 662974859

julia> [Int(x[1].m_persIndex) for x in br[528].constituentLinks[8]]
6-element Vector{Int64}:
 1575
 2481
  435
 2480
 2477
 2532

julia> [Int(x[1].m_persKey) for x in br[528].constituentLinks[8]]
6-element Vector{Int64}:
 662974859
 662974859
 662974859
 662974859
 662974859
 662974859


Moelf commented on June 2, 2024

Second cluster

ROOT

In [65]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persIndex"))[528])[0])
Out[65]:
[1709,
 1122,
 1132,
 1808,
 1807,
...

In [92]: list(list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persIndex"))[528+1432-1])[-1])
Out[92]: [1427, 900, 546, 849, 1433, 1425, 1431, 845, 964]
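The indices 528 and 528+1432-1 are the first and last (0-based) entries of the second cluster. A Python sketch of that lookup, assuming `ClusterSummary(528, 1432)` means (first entry, number of entries) and that the first cluster is (0, 528); `locate_entry` is a hypothetical helper, not an UnROOT function.

```python
from bisect import bisect_right

# (first_entry, n_entries) per cluster. The second pair is the
# ClusterSummary(528, 1432) shown earlier; the first pair is an assumption.
cluster_summaries = [(0, 528), (528, 1432)]


def locate_entry(entry, summaries):
    """Map a 0-based global entry number to (cluster index, index within cluster)."""
    starts = [first for first, _ in summaries]
    idx = bisect_right(starts, entry) - 1
    if idx < 0:
        raise IndexError(f"entry {entry} is before the first cluster")
    first, n_entries = summaries[idx]
    if entry >= first + n_entries:
        raise IndexError(f"entry {entry} is past the last cluster")
    return idx, entry - first


print(locate_entry(527, cluster_summaries))             # (0, 527)
print(locate_entry(528, cluster_summaries))             # (1, 0)
print(locate_entry(528 + 1432 - 1, cluster_summaries))  # (1, 1431)
```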

Julia

we crash here, so using debug output:

# @show inside second cluster
length(content) = 128584
[Int((x[1]).m_persIndex) for x = first(content, 5)] = [1709, 1122, 1132, 1808, 1807]
[Int((x[1]).m_persIndex) for x = last(content, 9)] = [1427, 900, 546, 849, 1433, 1425, 1431, 845, 964]

So it looks like we have all the data we want; somehow the indices are just not aligned ...

what about total amount of data in 2nd cluster?

# be warned: this is slow
a = []
In [116]: for i in range(528, 528+1432):
     ...:     ns = list(list(df.Take['ROOT::VecOps::RVec<ROOT::VecOps::RVec<std::uint32_t>>']("AntiKt4TruthDressedWZJetsAux:.constituentLinks.:_0.m_persIndex"))[i])
     ...:     for n in ns:
     ...:         a.append(len(list(n)))

In [121]: sum(a)
Out[121]: 128584

Well, I'm not missing any data here, so I don't know what is going on.


Moelf commented on June 2, 2024

Given the content column is fine, now I suspect we're doing something wrong with the offset column:

field.offset_col = Leaf{UnROOT.Index64}(col=41)

and sure enough, this is that weird column we weren't able to read earlier, because we couldn't inflate 29 bytes into 131072

Let's compare the offsets

ROOT

In [8]: a[:10]
Out[8]: [30, 4, 18, 14, 5, 8, 10, 7, 8, 18]

UnROOT

Int.(first(offset, 10)) = [30, 34, 52, 66, 71, 79, 89, 96, 104, 122]
diff([0; Int.(first(offset, 10))]) = [30, 4, 18, 14, 5, 8, 10, 7, 8, 18]

so they start out agreeing!
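The comparison works because ROOT gives per-item lengths while the offset column stores cumulative end positions, so a running difference converts one into the other. A short Python sketch (the function name is hypothetical), fed with the ten offsets above:

```python
def lengths_from_offsets(offsets):
    """Turn cumulative end offsets into per-item lengths (a diff with a leading 0)."""
    lengths = []
    previous = 0
    for offset in offsets:
        lengths.append(offset - previous)
        previous = offset
    return lengths


unroot_offsets = [30, 34, 52, 66, 71, 79, 89, 96, 104, 122]
root_lengths = [30, 4, 18, 14, 5, 8, 10, 7, 8, 18]
print(lengths_from_offsets(unroot_offsets) == root_lengths)  # True
```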


Moelf commented on June 2, 2024

Compare the content of col=41

They appear to be the same at first glance, in fact they start and end the same:

julia> length(my_offset) == length(ref_offset)
true

julia> first(my_offset, 10) == first(ref_offset, 10)
true

julia> last(my_offset, 10) == last(ref_offset, 10)
true
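When two arrays agree at both ends, it helps to look for the first position where they diverge. A minimal Python sketch with a hypothetical helper and made-up toy values (not the real column content):

```python
def first_mismatch(a, b):
    """Return the first index where a and b differ, or None if none is found."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


# Toy values for illustration only.
mine = [30, 4, 18, 14, 70, 13, 9]
reference = [30, 4, 18, 14, 4, 13, 9]
print(first_mismatch(mine, reference))  # 4
```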

ah, it's because of

0x01 64 Index64 Mother columns of (nested) collections, counting is relative to the cluster
0x02 32 Index32 Mother columns of (nested) collections, counting is relative to the cluster

Specifically, "counting is relative to the cluster" was not clear to me. What it means in practice: if a column has 2 pages of Index64 in this cluster, then after undoing the split encoding you get these two arrays:

first page: [30, 4, 18, 14, 5, 8, 10, 7, 8, 18, ..., 22, 1, 16, 14, 14, 12, 4, 7, 15, 24]
first page after cumsum: [..., 81317, 81318, 81334, 81348, 81362, 81374, 81378, 81385, 81400, 81424]

second page: [81428, 13, 9, 7, 5, 20, 21, 4, 8, 6, ..., 8, 3, 4, 9, 14, 18, 8, 14, 16, 6, 8, 7, 11, 9]

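You can see that the second page doesn't start with the delta 4; instead it starts with a huge number, 81428, which is the absolute offset 81424 + 4. I guess this means the delta encoding restarts on every page: each index page has to be cumsum-ed on its own and the results concatenated, rather than cumsum-ing across all pages of the cluster at once.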
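A small Python sketch of the difference, using toy pages rather than the real column data. The first page reuses the first four deltas above, and the toy second page starts with the absolute value 70 = 66 + 4, mimicking 81428 = 81424 + 4. Both function names are hypothetical.

```python
from itertools import accumulate


def decode_per_page(pages):
    """Delta-decode every page on its own, then concatenate the results."""
    offsets = []
    for page in pages:
        offsets.extend(accumulate(page))
    return offsets


def decode_across_pages(pages):
    """The mistaken variant: one cumsum over all pages concatenated."""
    flat = [value for page in pages for value in page]
    return list(accumulate(flat))


pages = [[30, 4, 18, 14], [70, 13, 9]]
print(decode_per_page(pages))      # [30, 34, 52, 66, 70, 83, 92]
print(decode_across_pages(pages))  # [30, 34, 52, 66, 136, 149, 158]
```

The mistaken variant overshoots everything after the page boundary by the last offset of the previous page, which would account for 210008 - 128584 = 81424 in the earlier output.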

see also: root-project/root#14982

