uproot3's People

Contributors

ast0815, asymmetry, benkrikler, bfis, chrisburr, edopro98, eduardo-rodrigues, guitargeek, hdembinski, healthypear, henryiii, jmschoeffmann, jpivarski, jrueb, kreczko, masonproffitt, matthewfeickert, niclaseich, nsmith-, oshadura, plexoos, reikdas, riga, tamasgal, wiso

uproot3's Issues

Uninterpreted fields in the CMS dark matter search dataset

The following fields cannot be interpreted in the "good old" CMS dark matter search dataset:

leaf element Info.triggerBits can not be interpreted
leaf element Electron.hltMatchBits can not be interpreted
leaf element Muon.hltMatchBits can not be interpreted
leaf element Tau.hltMatchBits can not be interpreted
leaf element Photon.hltMatchBits can not be interpreted
leaf element AK4CHS.hltMatchBits can not be interpreted
leaf element AK8CHS.hltMatchBits can not be interpreted
leaf element CA15CHS.hltMatchBits can not be interpreted
leaf element AK4Puppi.hltMatchBits can not be interpreted
leaf element CA8Puppi.hltMatchBits can not be interpreted
leaf element CA15Puppi.hltMatchBits can not be interpreted

uproot version is 2.3.3.
File is /data/ivm/root_data/MC/WZ_13TeV_pythia8/WZ_13TeV_pythia8_1.root

Problem with iterate function

Hi, I'm using uproot to read some ROOT files and I have a problem with the uproot.iterate function. I wrote this script to demonstrate the problem.

from math import *
import numpy
import uproot
import sys
import argparse

parser=argparse.ArgumentParser(prog='PROG', description='inspect root files')
parser.add_argument("--fin", action="store",
            dest="fin", default="", help="Input ROOT file")
parser.add_argument("--branch", action="store",
            dest="branch", default="Events", help="Input ROOT file branch (default Events)")
parser.add_argument("--branches", action="store",
            dest="branches", default=[], help="Comma separated list of branches to read (default all)")
parser.add_argument("--fout", action="store",
            dest="fout", default="", help="Output file name to write")
args = parser.parse_args()
list_branches = args.branches.split(',') if args.branches else []

# The context manager closes the output file even if iteration fails.
with open(args.fout, "w") as outputfile:
    iterator = uproot.iterate(args.fin, args.branch, 1000,
                              branches=list_branches, outputtype=dict)
    for block in iterator:
        outputfile.write("%s \n" % block)

If I run
python try_uproot.py --fin /afs/cern.ch/work/b/bonacor/public/double.root --fout output --branches event
the last two blocks of the iterator are empty and the filling stops at the 16th iteration (so it seems that there are fewer than 16000 events). This is not correct, because the ROOT file contains 18000 events for branch "event". If I change the ROOT file and run
python try_uproot.py --fin=/afs/cern.ch/work/b/bonacor/public/small10kevts.root --fout=output --branch events --branches=evtNo
the output file shows 7981 elements in the branch evtNo, but the ROOT file contains 10000. Is there something wrong in uproot? This also happens if I select different branches.
By contrast, if I run
python try_uproot.py --fin /afs/cern.ch/user/v/valya/public/nano-RelValTTBar.root --fout output --branches event
which uses a file similar to the one in the first example but with fewer events (9000; I obtained double.root by copying nano-RelValTTBar.root twice, so it has 18000 events), the output file shows that all the events are read by the iterate function.
So is there perhaps a memory allocation problem when I run the iterate function? When I try to read big ROOT files (even bigger than those in the previous examples), reading stops inexplicably.

Thank you
Luca

@vkuznet

uproot.open crashing with hadded file as input, TypeError: ord() expected string of length 1, but int found

I'm using uproot for handling a few simple ntuples and I've come across an error when trying to open a file produced using hadd.
Full Traceback:

Python 3.6.3 (default, Oct  4 2017, 06:09:38)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import uproot
>>> uproot.open('test.root')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 59, in open
    return ROOTDirectory.read(localsource(path), **options)
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 138, in read
    classes = _defineclasses(streamerinfos)
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 567, in _defineclasses
    rename = dict((streamerinfo.fName, _safename(streamerinfo.fName)) for streamerinfo in streamerinfos)
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 567, in <genexpr>
    rename = dict((streamerinfo.fName, _safename(streamerinfo.fName)) for streamerinfo in streamerinfos)
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 556, in _safename
    out = re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 556, in <lambda>
    out = re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")
  File "/Users/ddavis/Software/Python/brewed/envs/fresh/lib/python3.6/site-packages/uproot/rootio.py", line 556, in <genexpr>
    out = re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")
TypeError: ord() expected string of length 1, but int found

Here are some toy files (tiny, 2 branches with 2 entries each) created and hadded with ROOT v6.10.08
The individual files that work:
http://cern.ch/ddavis/files/files_for_uproot/out1.root
http://cern.ch/ddavis/files/files_for_uproot/out2.root
The hadded file that doesn't:
http://cern.ch/ddavis/files/files_for_uproot/outs.root

In uproot 2.4.1 (with Python 3.6.3) the hadded file causes the crash. A quick Google search tells me that ord() is redundant in Python 3. I'm unfamiliar with this part of Python, but I'll give it a go and report back if I figure anything out.
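For context on the traceback above: in Python 3, iterating over a bytes object yields integers rather than length-1 strings, which is why ord(x) raises the TypeError. The following is a minimal sketch of a version-agnostic escape, assuming the same regular expression as in the traceback. The function name safename and the use of bytearray are illustrative choices, not the fix that was actually applied in uproot, and the bytes "%" formatting requires Python 3.5 or later (or Python 2).

```python
import re

def safename(name):
    """Replace runs of non-alphanumeric bytes with _<hex codes>_ (illustrative helper)."""
    def escape(match):
        # bytearray yields integers on both Python 2 and Python 3,
        # so no ord() call is needed.
        hexcodes = b"".join(b"%02x" % byte for byte in bytearray(match.group(0)))
        return b"_" + hexcodes + b"_"
    return re.sub(b"[^a-zA-Z0-9]+", escape, name).decode("ascii")

print(safename(b"vector<int>"))  # vector_3c_int_3e_
```

Names made only of letters and digits pass through unchanged, so only class names containing characters such as "<", ">" or ":" exercise the escaping path.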

Unable to open GEN-SIM-DIGI-RAW file

With uproot version 2.1.5 I'm unable to open a GEN-SIM-DIGI-RAW file
(/store/relval/CMSSW_9_4_0/RelValTTbar_13/GEN-SIM-DIGI-RAW/PU25ns_94X_upgrade2018_realistic_v5-v1/10000/3245F8B2-CBC8-E711-8FCA-0CC47A4D762E.root). Here is the traceback; it fails in uproot.open.

Traceback (most recent call last):
  File "./tfaas.py", line 53, in listBranches
    with uproot.open(fin) as tree:
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 59, in open
    return ROOTDirectory.read(localsource(path), **options)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 132, in read
    streamerinfos, streamerinfosmap, streamerrules = _readstreamers(streamerkey._source, streamerkey._cursor, streamercontext)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 453, in _readstreamers
    streamerinfos = list(topological_sort(streamerinfos))
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 449, in topological_sort
    raise ValueError("cannot sort TStreamerInfos into dependency order:\n\n{0}".format("\n".join("{0:20s} requires {1}".format(item.fName, " ".join(dependencies)) for item, dependencies in items)))
ValueError: cannot sort TStreamerInfos into dependency order:

SiPixelRecHit        requires TrackerSingleRecHit
TrackerSingleRecHit  requires OmniClusterRef BaseTrackerRecHit
OmniClusterRef       requires edm::RefCoreWithIndex
SiStripMatchedRecHit2D requires OmniClusterRef BaseTrackerRecHit
SiStripRecHit2D      requires TrackerSingleRecHit
ProjectedSiStripRecHit2D requires TrackerSingleRecHit
L2MuonTrajectorySeed requires edm::Ref<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle,edm::refhelper::FindUsingAdvance<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle> > TrajectorySeed edm::Ref<BXVector<l1t::Muon>,l1t::Muon,edm::refhelper::FindUsingAdvance<BXVector<l1t::Muon>,l1t::Muon> >
L3MuonTrajectorySeed requires edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > edm::Ref<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle,edm::refhelper::FindUsingAdvance<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle> > TrajectorySeed edm::Ref<BXVector<l1t::Muon>,l1t::Muon,edm::refhelper::FindUsingAdvance<BXVector<l1t::Muon>,l1t::Muon> >
TrackingParticle     requires edm::RefVector<vector<reco::GenParticle>,reco::GenParticle,edm::refhelper::FindUsingAdvance<vector<reco::GenParticle>,reco::GenParticle> > edm::RefVector<vector<TrackingVertex>,TrackingVertex,edm::refhelper::FindUsingAdvance<vector<TrackingVertex>,TrackingVertex> > edm::Ref<vector<TrackingVertex>,TrackingVertex,edm::refhelper::FindUsingAdvance<vector<TrackingVertex>,TrackingVertex> >
edm::Ref<vector<TrackingVertex>,TrackingVertex,edm::refhelper::FindUsingAdvance<vector<TrackingVertex>,TrackingVertex> > requires edm::RefCoreWithIndex
reco::Electron       requires reco::RecoCandidate edm::Ref<vector<reco::GsfTrack>,reco::GsfTrack,edm::refhelper::FindUsingAdvance<vector<reco::GsfTrack>,reco::GsfTrack> > edm::Ref<vector<reco::SuperCluster>,reco::SuperCluster,edm::refhelper::FindUsingAdvance<vector<reco::SuperCluster>,reco::SuperCluster> > edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> >
reco::GsfTrack       requires ROOT::Math::DisplacementVector3D<ROOT::Math::Cartesian3D<Double32_t>,ROOT::Math::DefaultCoordinateSystemTag> reco::Track edm::Ref<vector<reco::GsfTrackExtra>,reco::GsfTrackExtra,edm::refhelper::FindUsingAdvance<vector<reco::GsfTrackExtra>,reco::GsfTrackExtra> >
reco::Track          requires reco::TrackBase edm::Ref<vector<reco::TrackExtra>,reco::TrackExtra,edm::refhelper::FindUsingAdvance<vector<reco::TrackExtra>,reco::TrackExtra> >
reco::IsolatedPixelTrackCandidate requires reco::RecoCandidate edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > edm::Ref<BXVector<l1t::Tau>,l1t::Tau,edm::refhelper::FindUsingAdvance<BXVector<l1t::Tau>,l1t::Tau> > edm::Ref<vector<l1extra::L1JetParticle>,l1extra::L1JetParticle,edm::refhelper::FindUsingAdvance<vector<l1extra::L1JetParticle>,l1extra::L1JetParticle> >
edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > requires edm::RefCoreWithIndex
edm::Ref<vector<l1extra::L1JetParticle>,l1extra::L1JetParticle,edm::refhelper::FindUsingAdvance<vector<l1extra::L1JetParticle>,l1extra::L1JetParticle> > requires edm::RefCoreWithIndex
reco::MuonTrackLinks requires edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> >
reco::PFTau          requires edm::RefVector<vector<reco::PFRecoTauChargedHadron>,reco::PFRecoTauChargedHadron,edm::refhelper::FindUsingAdvance<vector<reco::PFRecoTauChargedHadron>,reco::PFRecoTauChargedHadron> > edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > edm::RefVector<vector<reco::RecoTauPiZero>,reco::RecoTauPiZero,edm::refhelper::FindUsingAdvance<vector<reco::RecoTauPiZero>,reco::RecoTauPiZero> > edm::Ref<vector<reco::PFTauTagInfo>,reco::PFTauTagInfo,edm::refhelper::FindUsingAdvance<vector<reco::PFTauTagInfo>,reco::PFTauTagInfo> > reco::BaseTau edm::Ref<vector<reco::PFJet>,reco::PFJet,edm::refhelper::FindUsingAdvance<vector<reco::PFJet>,reco::PFJet> > edm::Ptr<reco::PFCandidate>
reco::BaseTau        requires reco::RecoCandidate edm::RefVector<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > ROOT::Math::LorentzVector<ROOT::Math::PxPyPzE4D<double> > edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> >
reco::RecoChargedCandidate requires reco::RecoCandidate edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> >
reco::RecoEcalCandidate requires reco::RecoCandidate edm::Ref<vector<reco::SuperCluster>,reco::SuperCluster,edm::refhelper::FindUsingAdvance<vector<reco::SuperCluster>,reco::SuperCluster> >
edm::Ref<vector<reco::SuperCluster>,reco::SuperCluster,edm::refhelper::FindUsingAdvance<vector<reco::SuperCluster>,reco::SuperCluster> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::TrackExtra>,reco::TrackExtra,edm::refhelper::FindUsingAdvance<vector<reco::TrackExtra>,reco::TrackExtra> > requires edm::RefCoreWithIndex
edm::reftobase::Holder<reco::Track,edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> > > requires edm::reftobase::BaseHolder<reco::Track> edm::Ref<vector<reco::Track>,reco::Track,edm::refhelper::FindUsingAdvance<vector<reco::Track>,reco::Track> >
edm::Ref<vector<reco::RecoEcalCandidate>,reco::RecoEcalCandidate,edm::refhelper::FindUsingAdvance<vector<reco::RecoEcalCandidate>,reco::RecoEcalCandidate> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::RecoChargedCandidate>,reco::RecoChargedCandidate,edm::refhelper::FindUsingAdvance<vector<reco::RecoChargedCandidate>,reco::RecoChargedCandidate> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::CaloJet>,reco::CaloJet,edm::refhelper::FindUsingAdvance<vector<reco::CaloJet>,reco::CaloJet> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::MET>,reco::MET,edm::refhelper::FindUsingAdvance<vector<reco::MET>,reco::MET> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::CaloMET>,reco::CaloMET,edm::refhelper::FindUsingAdvance<vector<reco::CaloMET>,reco::CaloMET> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::IsolatedPixelTrackCandidate>,reco::IsolatedPixelTrackCandidate,edm::refhelper::FindUsingAdvance<vector<reco::IsolatedPixelTrackCandidate>,reco::IsolatedPixelTrackCandidate> > requires edm::RefCoreWithIndex
edm::Ref<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle,edm::refhelper::FindUsingAdvance<vector<l1extra::L1MuonParticle>,l1extra::L1MuonParticle> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::PFJet>,reco::PFJet,edm::refhelper::FindUsingAdvance<vector<reco::PFJet>,reco::PFJet> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::PFTau>,reco::PFTau,edm::refhelper::FindUsingAdvance<vector<reco::PFTau>,reco::PFTau> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::PFMET>,reco::PFMET,edm::refhelper::FindUsingAdvance<vector<reco::PFMET>,reco::PFMET> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::PFTauTagInfo>,reco::PFTauTagInfo,edm::refhelper::FindUsingAdvance<vector<reco::PFTauTagInfo>,reco::PFTauTagInfo> > requires edm::RefCoreWithIndex
SiStripRecHit1D      requires TrackerSingleRecHit
edm::Ref<vector<reco::GsfTrack>,reco::GsfTrack,edm::refhelper::FindUsingAdvance<vector<reco::GsfTrack>,reco::GsfTrack> > requires edm::RefCoreWithIndex
edm::Ref<vector<reco::GsfTrackExtra>,reco::GsfTrackExtra,edm::refhelper::FindUsingAdvance<vector<reco::GsfTrackExtra>,reco::GsfTrackExtra> > requires edm::RefCoreWithIndex
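To illustrate how an error of this shape can arise, here is a minimal sketch of a dependency sort that gives up when the remaining items all require something that can never be placed. It is not uproot's implementation: the function name, the dict-of-sets input, and the error text are assumptions made for the example. In the listing above, every remaining class requires at least one name (for example edm::RefCoreWithIndex or BaseTrackerRecHit) that was apparently never resolved.

```python
def sort_by_dependency(requirements):
    """Order names so that each one comes after everything it requires.

    requirements: dict mapping a name to the set of names it requires
    (hypothetical input format chosen for this sketch).
    """
    remaining = {name: set(deps) for name, deps in requirements.items()}
    ordered = []
    while remaining:
        # An item is ready once all of its requirements are already ordered.
        ready = sorted(name for name, deps in remaining.items()
                       if deps.issubset(ordered))
        if not ready:
            # Nothing can make progress: a requirement is missing or cyclic.
            lines = ["{0:20s} requires {1}".format(name, " ".join(sorted(deps)))
                     for name, deps in sorted(remaining.items())]
            raise ValueError("cannot sort into dependency order:\n\n" + "\n".join(lines))
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered

order = sort_by_dependency({
    "SiPixelRecHit": {"TrackerSingleRecHit"},
    "TrackerSingleRecHit": {"BaseTrackerRecHit"},
    "BaseTrackerRecHit": set(),
})
print(order)
```

If BaseTrackerRecHit were left out of the input, the loop would stall on the first pass and raise the ValueError, listing both remaining classes.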

directly open a TTree

I have a ROOT file like this

KEY: RooFitResult	fitResult;1	Fit Results
KEY: TProcessID	ProcessID0;1	46ded5cc-074c-11e8-9717-f83db9bcbeef
KEY: TTree	nllscan;1	nllscan

When opening it with uproot.open I get an error, probably due to the RooFitResult object:

ValueError: cannot sort TStreamerInfos into dependency order:

RooUniformBinning    requires RooAbsBinning

But this is not the point. Is it possible to have a function that directly opens a TTree inside a file and ignores everything else? Something like

uproot.open_tree(filename, treename)

Error reading root file

On ifdb01, several files would not open, producing exceptions:

import uproot
print uproot.version.__version__

tree = uproot.open("/data/kirby/prod_anatree_optfilter_bnb_v11_unblind_mcc8/ana_hist_013f8c72-eb27-430a-8dde-d95b9520a36a.root")["analysistree/anatree"]
2.8.12
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-17eb1ab6485e> in <module>()
      2 print uproot.version.__version__
      3 
----> 4 tree = uproot.open("/data/kirby/prod_anatree_optfilter_bnb_v11_unblind_mcc8/ana_hist_013f8c72-eb27-430a-8dde-d95b9520a36a.root")["analysistree/anatree"]

/home/ivm/anaconda2/lib/python2.7/site-packages/uproot-2.8.12-py2.7.egg/uproot/rootio.pyc in open(path, localsource, xrootdsource, httpsource, **options)
     62         else:
     63             openfcn = localsource
---> 64         return ROOTDirectory.read(openfcn(path), **options)
     65 
     66     elif _bytesid(parsed.scheme) == b"root":

/home/ivm/anaconda2/lib/python2.7/site-packages/uproot-2.8.12-py2.7.egg/uproot/rootio.pyc in read(source, *args, **options)
    154                 if read_streamers:
    155                     streamercontext = ROOTDirectory._FileContext(source.path, None, None, streamerclasses, uproot.source.compressed.Compression(fCompress), tfile)
--> 156                     streamerkey = TKey.read(source, Cursor(fSeekInfo), streamercontext, None)
    157                     streamerinfos, streamerinfosmap, streamerrules = _readstreamers(streamerkey._source, streamerkey._cursor, streamercontext, None)
    158                 else:

/home/ivm/anaconda2/lib/python2.7/site-packages/uproot-2.8.12-py2.7.egg/uproot/rootio.pyc in read(cls, source, cursor, context, parent)
    784             context = context.copy()
    785         out = cls.__new__(cls)
--> 786         out = cls._readinto(out, source, cursor, context, parent)
    787         out._postprocess(source, cursor, context, parent)
    788         return out

/home/ivm/anaconda2/lib/python2.7/site-packages/uproot-2.8.12-py2.7.egg/uproot/rootio.pyc in _readinto(cls, self, source, cursor, context, parent)
    817         if source.size() is not None:
    818             if source.size() - self.fSeekKey < self.fNbytes:
--> 819                 raise ValueError("TKey declares that object {0} has {1} bytes but only {2} remain in the file".format(repr(self.fName), self.fNbytes, source.size() - self.fSeekKey))
    820 
    821         # object size != compressed size means it's compressed

ValueError: TKey declares that object '\x00' has 1919905652 bytes but only -23002702 remain in the file
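One detail in this message may help with diagnosis: the declared size 1919905652 is exactly the four ASCII bytes "root" read as a big-endian 32-bit integer, which is how the size field at the start of a TKey header is stored. The check below shows this. What it implies is an assumption rather than a confirmed diagnosis: the key header may have been read from a position holding text (such as the "root" magic bytes at the start of a file) instead of a real TKey.

```python
import struct

declared_nbytes = 1919905652  # value taken from the error message above

# Re-encode the integer the way a big-endian 32-bit field is stored on disk.
raw = struct.pack(">i", declared_nbytes)
print(raw)  # b'root'
```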

Some branches cause numpy broadcast errors

With the NOvA CAF files, specifically, fardet_r00021429_s37_t00_R16-03-03-prod2reco.f_v1_data.caf.root, I get these errors:

can not get array for rec.spill.spill.intx: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.spill.inty: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.spill.bposx: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.spill.bposy: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.intx: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.inty: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.bposx: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)
can not get array for rec.spill.bposy: <type 'exceptions.ValueError'> could not broadcast input array from shape (2898) into shape (3501)

Handle TH1 w/ labeled bins

Currently I get this trace if I try to access one:

>>> h = f["cutflow"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 199, in __getitem__
    return self.get(name)
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 314, in get
    return key.get()
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 797, in get
    return self._context.classes[classname].read(self._source, self._cursor.copied(), self._context)
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 738, in read
    out = cls._readinto(out, source, cursor, context)
  File "<generated from TStreamerInfo 'TH1F' at 0x7f387398d990>", line 10, in _readinto
  File "<generated from TStreamerInfo 'TH1' at 0x7f387398da10>", line 15, in _readinto
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 738, in read
    out = cls._readinto(out, source, cursor, context)
  File "<generated from TStreamerInfo 'TAxis' at 0x7f387398db10>", line 16, in _readinto
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 421, in _readobjany
    obj = fct.read(source, cursor, context)        # new object
  File "/uscms/home/pedrok/.local/lib/python2.7/site-packages/uproot/rootio.py", line 738, in read
    out = cls._readinto(out, source, cursor, context)
  File "<generated from TStreamerInfo 'THashList' at 0x7f3882536cd0>", line 9, in _readinto
ValueError: attempting to read THashList object version 5 with a class generated by streamer version 0

Can provide example file on request.

Unable to open GEN-SIM-DIGI-RAW via xrootd interface

While trying to use the xrootd interface, I'm unable to open the following file:
/store/relval/CMSSW_9_4_0/RelValTTbar_13/GEN-SIM-DIGI-RAW/PU25ns_94X_upgrade2018_realistic_v5-v1/10000/3245F8B2-CBC8-E711-8FCA-0CC47A4D762E.root

Traceback (most recent call last):
  File "./tfaas.py", line 57, in listBranches
    with uproot.open(fin) as tree:
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 62, in open
    return xrootd(path, xrootdsource)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 68, in xrootd
    return ROOTDirectory.read(xrootdsource(path), **options)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/rootio.py", line 100, in read
    magic, fVersion = cursor.fields(source, ROOTDirectory._format1)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/source/cursor.py", line 72, in fields
    return format.unpack(source.data(start, stop))
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/source/chunked.py", line 84, in data
    chunk = self._cache[chunkindex]
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/cache/memorycache.py", line 340, in __getitem__
    return super(ThreadSafeMemoryCache, self).__getitem__(key)
  File "/afs/cern.ch/work/v/valya/uproot-2.1.5/uproot/cache/memorycache.py", line 82, in __getitem__
    value = self.spillover[key]
IndexError: tuple index out of range

Unable to read char* fields in objects

It seems that the remaining issue in #28 was caused by a char* field in an object. Example ROOT file and code attached. The issue is with the streamer for the name field in the mydata class.

test.root.gz

#include "TFile.h"
#include "TTree.h"

class mydata : public TObject {
    public:
        mydata() { };
        virtual ~mydata() { };

        int size;
        char* name;
        void set(const char* n, int s) {
                size = s;
                name = new char[s];
                strncpy(name,n,s);
        }

        ClassDef(mydata, 1);
};

void test_uproot()
{
  TFile file("test.root","RECREATE");
  TTree tree("T","T");

  mydata data;

  tree.Branch("data",&data);

  for (int i = 0; i < 1; i++) {
    data.set("test",4);
    tree.Fill();
  }

  tree.Write();

  file.Write();
  file.Close();
}

In [1]: import uproot
In [2]: uproot.open("test.root")
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-2-295bb97bcaec> in <module>()
----> 1 uproot.open("test.root")

/usr/local/lib/python3.5/dist-packages/uproot-2.5.2-py3.5.egg/uproot/rootio.py in open(path, localsource, xrootdsource, **options)
     57     if _bytesid(parsed.scheme) == b"file" or len(parsed.scheme) == 0:
     58         path = parsed.netloc + parsed.path
---> 59         return ROOTDirectory.read(localsource(path), **options)
     60 
     61     elif _bytesid(parsed.scheme) == b"root":

/usr/local/lib/python3.5/dist-packages/uproot-2.5.2-py3.5.egg/uproot/rootio.py in read(source, *args, **options)
    138                     streamerinfos, streamerinfosmap, streamerrules = [], {}, []
    139 
--> 140                 classes = _defineclasses(streamerinfos)
    141                 context = ROOTDirectory._FileContext(source.path, streamerinfos, streamerinfosmap, classes, uproot.source.compressed.Compression(fCompress), tfile)
    142 

/usr/local/lib/python3.5/dist-packages/uproot-2.5.2-py3.5.egg/uproot/rootio.py in _defineclasses(streamerinfos)
    625                         basicnames.append("self.{0}".format(_safename(element.fName)))
    626                         fields.append(_safename(element.fName))
--> 627                         basicletters += _ftype2struct(element.fType)
    628 
    629                         if elementi + 1 == len(streamerinfo.fElements) or not isinstance(streamerinfo.fElements[elementi + 1], TStreamerBasicType) or streamerinfo.fElements[elementi + 1].fArrayLength != 0:

/usr/local/lib/python3.5/dist-packages/uproot-2.5.2-py3.5.egg/uproot/rootio.py in _ftype2struct(fType)
    566         return "d"
    567     else:
--> 568         raise NotImplementedError(fType)
    569 
    570 def _safename(name):

NotImplementedError: 7
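In ROOT's streamer type enumeration, the value 7 is kCharStar, which matches the char* name field in mydata. The sketch below shows why such a field does not fit a lookup from type code to a fixed-width struct format: a C string has no fixed size, so it needs its own reading logic. The constant names and the table are illustrative and cover only a few types; this is not uproot's actual _ftype2struct.

```python
# Subset of ROOT streamer type codes (names are illustrative constants).
K_CHAR, K_SHORT, K_INT, K_FLOAT, K_CHARSTAR, K_DOUBLE = 1, 2, 3, 5, 7, 8

# Only fixed-width types map onto a single struct format character.
FIXED_WIDTH_FORMATS = {
    K_CHAR: "b",
    K_SHORT: "h",
    K_INT: "i",
    K_FLOAT: "f",
    K_DOUBLE: "d",
}

def ftype_to_struct(ftype):
    """Return the struct format character for a fixed-width type code."""
    if ftype in FIXED_WIDTH_FORMATS:
        return FIXED_WIDTH_FORMATS[ftype]
    # Variable-length types such as char* (code 7) end up here.
    raise NotImplementedError(ftype)

print(ftype_to_struct(K_INT))
```

Calling ftype_to_struct(K_CHARSTAR) raises NotImplementedError(7), the same shape of failure as in the traceback.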

'TH1D' object has no attribute 'classname'

Trying to read this file:
E_19keV_U_18kV_histo.zip

and getting:

> import uproot
> file = uproot.open('E_19keV_U_18kV_histo.root')
> file.values()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
    670                 type_pprinters=self.type_printers,
    671                 deferred_pprinters=self.deferred_printers)
--> 672             printer.pretty(obj)
    673             printer.flush()
    674             return stream.getvalue()

C:\Anaconda3\lib\site-packages\IPython\lib\pretty.py in pretty(self, obj)
    366                 if cls in self.type_pprinters:
    367                     # printer registered in self.type_pprinters
--> 368                     return self.type_pprinters[cls](obj, self, cycle)
    369                 else:
    370                     # deferred printer

C:\Anaconda3\lib\site-packages\IPython\lib\pretty.py in inner(obj, p, cycle)
    550                 p.text(',')
    551                 p.breakable()
--> 552             p.pretty(x)
    553         if len(obj) == 1 and type(obj) is tuple:
    554             # Special case for 1-item tuples.

C:\Anaconda3\lib\site-packages\IPython\lib\pretty.py in pretty(self, obj)
    366                 if cls in self.type_pprinters:
    367                     # printer registered in self.type_pprinters
--> 368                     return self.type_pprinters[cls](obj, self, cycle)
    369                 else:
    370                     # deferred printer

C:\Anaconda3\lib\site-packages\IPython\lib\pretty.py in inner(obj, p, cycle)
    540         if basetype is not None and typ is not basetype and typ.__repr__ != basetype.__repr__:
    541             # If the subclass provides its own repr, use it instead.
--> 542             return p.text(typ.__repr__(obj))
    543 
    544         if cycle:

C:\Anaconda3\lib\site-packages\uproot-2.8.12-py3.6.egg\uproot\hist.py in __repr__(self)
     45             return "<{0} at 0x{1:012x}>".format(self.classname, id(self))
     46         else:
---> 47             return "<{0} {1} 0x{2:012x}>".format(self.classname, repr(self.fName), id(self))
     48 
     49     @property

AttributeError: 'TH1D' object has no attribute 'classname'

ValueError: attempting to read TAttLine object version 2 with a class generated by streamer version 1

I cannot read ROOT files (~13 GB) as shown below (Python 2.7.13).

>>> branch = uproot.open("/path/to/mydir/foo.root")["mytree"]['mybranch']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/workdir/envname/lib/python2.7/site-packages/uproot/rootio.py", line 219, in __getitem__
    return self.get(name)
  File "/path/to/workdir/envname/lib/python2.7/site-packages/uproot/rootio.py", line 334, in get
    return key.get()
  File "/path/to/workdir/envname/lib/python2.7/site-packages/uproot/rootio.py", line 827, in get
    return self._context.classes[classname].read(self._source, self._cursor.copied(), self._context, self)
  File "/path/to/workdir/envname/lib/python2.7/site-packages/uproot/rootio.py", line 764, in read
    out = cls._readinto(out, source, cursor, context, parent)
  File "<generated from TStreamerInfo 'TTree' at 0x7fe3f7f22390>", line 11, in _readinto
  File "<generated from TStreamerInfo 'TAttLine' at 0x7fe3f7f223d0>", line 9, in _readinto
ValueError: attempting to read TAttLine object version 2 with a class generated by streamer version 1

I don't understand what is happening. Any ideas?

Best regards

UserWarning: attempted to read missing baskets from incompletely written file, but encountered ValueError: recovered 2 baskets, expected 1

I get this warning and a crash just when calling

open('f.root').get('nllscan').arrays()

file: https://www.dropbox.com/s/1qimjkzz6moppgr/f.root?dl=0

/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py:1395: UserWarning: attempted to read missing baskets from incompletely written file, but encountered ValueError: recovered 2 baskets, expected 1
  warnings.warn("attempted to read missing baskets from incompletely written file, but encountered {0}: {1}".format(err.__class__.__name__, str(err)))
Traceback (most recent call last):
  File "analyse_toys_correlation.py", line 13, in <module>
    t.arrays()
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 402, in arrays
    futures = [(branch.name, branch.array(interpretation=interpretation, entrystart=entrystart, entrystop=entrystop, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=False)) for branch, interpretation in branches]
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 1154, in array
    basket_itemoffset = self._basket_itemoffset(interpretation, basketstart, basketstop, keycache)
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 1114, in _basket_itemoffset
    for j, key in enumerate(self._threadsafe_iterate_keys(keycache, True, basketstart, basketstop)):
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 834, in _threadsafe_iterate_keys
    key = self._basketkey(keysource, i, complete)
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 1400, in _basketkey
    return self._BasketKey(source.parent(), Cursor(self.fBasketSeek[i]), uproot.source.compressed.Compression(self.fCompress), complete)
  File "/afs/cern.ch/user/t/turra/higgs-moriond/venv/lib/python2.7/site-packages/uproot/tree.py", line 1320, in __init__
    raise ValueError("TKey declares that object {0} has {1} bytes but only {2} remain in the file".format(repr(self.fName), self.fNbytes, source.size() - self.fSeekKey))
AttributeError: '_BasketKey' object has no attribute 'fName'

XRootD: I/O operation on closed file

In uproot/source/xrootd.py there is a multithreading issue that causes the XRootD File objects to be closed prematurely. The connections are closed inside XRootDSource.__del__ even though they are shared between instances of XRootDSource. If I prevent this by changing line 78 to out._source = None, everything works correctly.

For completeness, I'm using uproot 2.8.16, XRootD 4.8.2 and Python 3.6 and the simplest example of this bug I have is effectively:

import uproot
from joblib import Parallel, delayed
fn = 'root://a_file_with_two_or_more_ttrees.root'
def process_file(fn, key):
    f = uproot.open(fn)
    arr = f[key].arrays()
a = Parallel(n_jobs=1, verbose=100, backend='threading')(delayed(process_file)(fn, key) for key in uproot.open(fn).keys())
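
One general way to avoid closing a connection that other instances still use is reference counting. The sketch below only illustrates that idea and is not the code in uproot/source/xrootd.py: the SharedHandle class and its acquire/release methods are hypothetical names, and an in-memory io.BytesIO stands in for the XRootD file object.

```python
import io
import threading

class SharedHandle(object):
    """Hypothetical wrapper: closes a shared resource only after its last user releases it."""

    def __init__(self, resource):
        self._resource = resource
        self._users = 1                  # the creator counts as the first user
        self._lock = threading.Lock()

    def acquire(self):
        # Register one more user of the same underlying resource.
        with self._lock:
            if self._users == 0:
                raise ValueError("resource is already closed")
            self._users += 1
        return self

    def release(self):
        # Drop one user; close the resource only when nobody is left.
        with self._lock:
            if self._users == 0:
                return
            self._users -= 1
            last = self._users == 0
        if last:
            self._resource.close()

handle = SharedHandle(io.BytesIO(b"stand-in for an XRootD file"))
copy = handle.acquire()                  # a second source sharing the connection
handle.release()                         # the first source goes away
still_open = not handle._resource.closed
copy.release()                           # the last user releases; now it is closed
```

With this pattern, a destructor would call release() instead of closing the connection directly, so the order in which threads drop their sources no longer matters.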

Jagged array elements without fLeafCount

Some elements are interpreted as jagged arrays even though their fLeafCount cannot be found, for example:

import uproot
print uproot.version.__version__

utree = uproot.open("/data/NOvA/fardet_r00021429_s37_t00_R16-03-03-prod2reco.f_v1_data.caf.root")
tree = utree["recTree"]

x = tree["sel.cvn.output"]
a = x.array()
print type(a)
print x.fLeaves[0].fLeafCount

Output:

2.3.3
<class 'uproot.interp.jagged.JaggedArray'>
None

Encoding problem when installing

Issue:
On Debian Jessie with Python 3 installed, installing uproot fails with the following error:

>>pip install uproot

Collecting uproot
  Downloading uproot-2.8.12.tar.gz (9.7MB)
    100% |################################| 9.7MB 177kB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-wl1a7pfo/uproot/setup.py", line 74, in <module>
        long_description = get_description(),
      File "/tmp/pip-build-wl1a7pfo/uproot/setup.py", line 44, in get_description
        description = open("README.rst").read()
      File "/build/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 595: ordinal not in range(128)

This error does not happen on my local machine (Mac in a virtual environment).
I believe the error comes from the fact that the encodings are different.

Solution:
The solution I found is to specify the encoding explicitly.
Basically, changing https://github.com/scikit-hep/uproot/blob/master/setup.py#L44
into

description = open("README.rst",'rb').read().decode('utf8', 'ignore')

solves the problem in my docker (and also on my Mac).
Should be an easy fix...
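
A variant of the same idea, sketched with io.open so that the encoding is explicit on both Python 2 and Python 3. The helper name read_text and the temporary file are only for illustration, and this is not necessarily the change that was applied to setup.py.

```python
import io
import os
import tempfile

def read_text(path):
    # io.open accepts an explicit encoding on Python 2 and 3,
    # so the result does not depend on the locale of the build machine.
    with io.open(path, encoding="utf-8") as f:
        return f.read()

# Demonstration with a temporary file that contains a non-ASCII character.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "README.rst")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"uproot: \u03bc-sized ROOT I/O\n")

description = read_text(path)
```

Unlike decode('utf8', 'ignore'), this keeps the non-ASCII characters instead of silently dropping bytes that fail to decode.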

Open TDirectoryFile

I was trying to open a regular ROOT file with the following structure:

KEY: TDirectoryFile   events;1        events

TDirectoryFile*         events  events
 KEY: TTree     events;1       Tree containing event data

If I try to open the file and change the directory with

f = uproot.open('test.root')['events']

I only get an <Undefined (no class named 'TDirectoryFile') at 0x7f5cc9f9d0b8>.

I have also tried to rename the TTree to something else that is not identical to the TDirectory name and this works as expected! So uproot seems to have a problem if the TDirectory and TTree in it have the same name.

Fail to construct JaggedArray

Jim,
I got this error while reading one of the root files:

  File "build/bdist.macosx-10.13-x86_64/egg/uproot/tree.py", line 450, in iterate
  File "build/bdist.macosx-10.13-x86_64/egg/uproot/tree.py", line 433, in <lambda>
  File "build/bdist.macosx-10.13-x86_64/egg/uproot/tree.py", line 1032, in <lambda>
  File "build/bdist.macosx-10.13-x86_64/egg/uproot/interp/jagged.py", line 66, in empty
TypeError: __init__() takes exactly 4 arguments (3 given)

It turns out to be a bug here:
https://github.com/scikit-hep/uproot/blob/master/uproot/interp/jagged.py#L66
since the JaggedArray constructor requires 3 parameters (in addition to self):
https://github.com/scikit-hep/uproot/blob/master/uproot/interp/jagged.py#L224

So the stops parameter is missing in L66.

Underscores in branch/leaf names are not accepted

It appears that underscores in branch names are not accepted. One tree I am trying to use uproot on has a field "run_data" which causes the error:

/usr/local/lib/python3.5/dist-packages/uproot-2.4.1-py3.5.egg/uproot/rootio.py in <genexpr>(.0)
    554 
    555 def _safename(name):
--> 556     out = re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")
    557     if keyword.iskeyword(out):
    558         out = out + "__"

TypeError: ord() expected string of length 1, but int found

It seems to be caused by name = b"run_data". That indeed causes

In [16]: name = b"run_data"

In [17]: re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b
    ...: "_", name).decode("ascii")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-3b7fca76a1b8> in <module>()
----> 1 re.sub(b"[^a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")

(etc)

A suggested fix is to add the underscore to the matching string, e.g.

re.sub(b"[^_a-zA-Z0-9]+", lambda bad: b"_" + b"".join(b"%02x" % ord(x) for x in bad.group(0)) + b"_", name).decode("ascii")

I am not sure, however, whether the underscore is also used for token separation within the ROOT file format.

Relevant line in rootio.py is https://github.com/scikit-hep/uproot/blob/8eabb7ccc6de7f9ac87c9a4f6141de1c318736a6/uproot/rootio.py#L556
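
The TypeError comes from iterating over a bytes object: on Python 3 this yields integers, while on Python 2 it yields one-character strings, so ord(x) only works on Python 2. Below is a minimal sketch of a version that behaves the same on both, assuming the intent is to hex-escape every disallowed byte. The name safename and the use of bytearray are illustrative, not the fix adopted in uproot.

```python
import keyword
import re

def safename(name):
    """Turn a bytes name into a valid identifier by hex-escaping other bytes."""

    def escape(match):
        # bytearray yields integers on both Python 2 and Python 3,
        # so no call to ord() is needed.
        hexed = "".join("%02x" % byte for byte in bytearray(match.group(0)))
        return b"_" + hexed.encode("ascii") + b"_"

    out = re.sub(b"[^a-zA-Z0-9]+", escape, name).decode("ascii")
    if keyword.iskeyword(out):
        out = out + "__"
    return out

escaped = safename(b"run_data")      # the underscore (0x5f) is escaped
reserved = safename(b"class")        # Python keywords get a suffix
```

Here b"run_data" becomes "run_5f_data". Adding the underscore to the character class, as suggested above, would instead leave such names unchanged.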

Iteration over files and Pandas dataframe

Hi,

I need to iterate over a number of files. From the examples, it is not clear to me whether there is an easy way to convert the "chain" into a Pandas data frame. I may find a workaround, but this could be a useful feature nonetheless.

Cheers,
Riccardo

index out of bound

If I use the ROOT file /afs/cern.ch/user/v/valya/public/relval-qcd-7.4.root,
I get several different errors. Some of them are related to TList support not being implemented, but others seem to come from a failure to parse a particular tree branch. Here is a code snippet to reproduce:

import uproot
import traceback

def treeContent(tree):
    print(tree.contents)
    for key in tree.contents:
        branch = key.split(';')[0]
        print("key", key, branch)
        try:
            print(tree[branch])
        except:
            traceback.print_exc()

fname = '/afs/cern.ch/user/v/valya/public/relval-qcd-7.4.root'
tree = uproot.open(fname)
treeContent(tree)

And here is the traceback for the index-out-of-bounds error:

Traceback (most recent call last):
  File "vk_test.py", line 10, in treeContent
    print(tree[branch])
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 130, in __getitem__
    return self.get(name)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 153, in get
    return self.dir.get(name, cycle)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 213, in get
    out = out.keys.get(n, cycle)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 284, in get
    return key.get()
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 338, in get
    out = Deserialized.classes[self.classname](self._filewalker, self._walker)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 106, in __init__
    self.branches = list(uproot.core.TObjArray(filewalker, walker))
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/core.py", line 40, in __init__
    self.items = [uproot.rootio.Deserialized._deserialize(filewalker, walker) for i in range(nobjs)]
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 426, in _deserialize
    obj = fct(filewalker, walker)  # new object
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 991, in __init__
    TBranch.__init__(self, filewalker, walker)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 512, in __init__
    self.branches = list(uproot.core.TObjArray(filewalker, walker))
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/core.py", line 40, in __init__
    self.items = [uproot.rootio.Deserialized._deserialize(filewalker, walker) for i in range(nobjs)]
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 426, in _deserialize
    obj = fct(filewalker, walker)  # new object
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 991, in __init__
    TBranch.__init__(self, filewalker, walker)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 506, in __init__
    uproot.core.TNamed.__init__(self, filewalker, walker)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/core.py", line 72, in __init__
    self.title = walker.readstring()
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/_walker/lazyarraywalker.py", line 89, in readstring
    return super(LazyArrayWalker, self).readstring(index, length)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/_walker/arraywalker.py", line 91, in readstring
    length = self.data[index]
IndexError: index 84202969 is out of bounds for axis 0 with size 4785388

Error while reading vector

Hi,

I am trying to read a branch of type vector, but I am getting an error. Here is what I did:

>>> import uproot
>>> f = uproot.open("/eos/uscms/store/user/rasharma/LHE_GEN_Analyzer_Output/WPlepWMhad_aQGC.root")["otree"]
>>> a = f.array("LHEWeights")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "uproot/tree.py", line 347, in array
    return self.get(branch).array(interpretation=interpretation, entrystart=entrystart, entrystop=entrystop, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=blocking)
  File "uproot/tree.py", line 960, in array
    destination = interpretation.destination(basket_itemoffset[-1], basket_entryoffset[-1])
  File "uproot/interp/jagged.py", line 88, in destination
    contents = self.asdtype.destination(numitems, numentries)
  File "uproot/interp/numerical.py", line 123, in destination
    return numpy.empty((numitems // product,) + self.todims, dtype=self.todtype)
MemoryError

Please let me know how I can read a branch of type vector.

Thanks,
Ram

Unable to read ntuple with std::vector after doing 'hadd'

I ran into a strange case where I could not read an ntuple with std::vector after doing 'hadd'. I created a test ntuple to reproduce the issue. It contains one event and has several branches:

- v_int16 : std::vector<int16_t>
- v_int32 : std::vector<int32_t>
- v_int64 : std::vector<int64_t>
- v_uint16: std::vector<uint16_t>
- v_uint32: std::vector<uint32_t>
- v_uint64: std::vector<uint64_t>
- v_bool  : std::vector<bool>
- v_float : std::vector<float>
- v_double: std::vector<double>

In uproot 2.5.10, I tried the following and it worked:

>>> f = uproot.open('stlvector.root')
>>> print f.keys()
['ntupler;1']
>>> tree = f['ntupler/tree']
>>> print tree.keys()
['v_int16', 'v_int32', 'v_int64', 'v_uint16', 'v_uint32', 'v_uint64', 'v_bool', 'v_float', 'v_double']
>>> print tree.array('v_int16')
[[1 2 3]]
>>> print tree.array('v_int32')
[[1 2 3]]
>>> print tree.array('v_int64')
[[1 2 3]]
>>> print tree.array('v_uint16')
[[1 2 3]]
>>> print tree.array('v_uint32')
[[1 2 3]]
>>> print tree.array('v_uint64')
[[1 2 3]]
>>> print tree.array('v_bool')
[[False  True]]
>>> print tree.array('v_float')
[[ 999. -999.]]
>>> print tree.array('v_double')
[[ 999. -999.]]

However, after using hadd [1], I get errors (like [2] for std::vector<int16_t>) when doing the same thing as above:

>>> f = uproot.open('stlvector_after_hadd.root')
>>> print f.keys()
['ntupler;1']
>>> tree = f['ntupler/tree']
>>> print tree.keys()
['v_int16', 'v_int32', 'v_int64', 'v_uint16', 'v_uint32', 'v_uint64', 'v_bool', 'v_float', 'v_double']
>>> print tree.array('v_int16')
ValueError: cannot interpret branch 'v_int16' as a Python type
>>> print tree.array('v_int32')
ValueError: cannot interpret branch 'v_int32' as a Python type
>>> print tree.array('v_int64')
ValueError: cannot interpret branch 'v_int64' as a Python type
>>> print tree.array('v_uint16')
ValueError: cannot interpret branch 'v_uint16' as a Python type
>>> print tree.array('v_uint32')
ValueError: cannot interpret branch 'v_uint32' as a Python type
>>> print tree.array('v_uint64')
ValueError: cannot interpret branch 'v_uint64' as a Python type
>>> print tree.array('v_bool')
ValueError: cannot interpret branch 'v_bool' as a Python type
>>> print tree.array('v_float')
ValueError: cannot interpret branch 'v_float' as a Python type
>>> print tree.array('v_double')
[[ 999. -999.], [ 999. -999.], [ 999. -999.]]

I also tried

>>> print uproot.interpret(tree['v_int16'])
None
>>> print tree['v_int16']._streamer
None

So it appears that the Streamer info is missing somehow after doing 'hadd'? It's strange... I attach the root files 'stlvector.root' and 'stlvector_after_hadd.root'. I'd appreciate it if you could look into this. Thank you!

Best regards,
Jia Fu

[1]

hadd -f stlvector_after_hadd.root  stlvector.root stlvector.root stlvector.root

[2]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/jiafu/muon_work/venv/lib/python2.7/site-packages/uproot/tree.py", line 347, in array
    return self.get(branch).array(interpretation=interpretation, entrystart=entrystart, entrystop=entrystop, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=blocking)
  File "/tmp/jiafu/muon_work/venv/lib/python2.7/site-packages/uproot/tree.py", line 933, in array
    interpretation = self._normalize_interpretation(interpretation)
  File "/tmp/jiafu/muon_work/venv/lib/python2.7/site-packages/uproot/tree.py", line 717, in _normalize_interpretation
    raise ValueError("cannot interpret branch {0} as a Python type".format(repr(self.name)))
ValueError: cannot interpret branch 'v_int16' as a Python type

[3]

# My setup
virtualenv venv
source venv/bin/activate
pip install --upgrade pip
pip install uproot

uproot.open should implement support for with statement

In Python, almost all open operations can be used in a with statement. To support that, the object returned by uproot.open needs to implement the __enter__ and __exit__ methods. The following snippet

try:
    with uproot.open(fname) as tree:
        treeContent(tree)
except:
    traceback.print_exc()

produces the following traceback

Traceback (most recent call last):
  File "vk_test.py", line 17, in <module>
    with uproot.open(fname) as tree:
AttributeError: __exit__
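
For reference, a minimal sketch of the context manager protocol that the with statement relies on. The class OpenedFile is a hypothetical stand-in for whatever uproot.open returns, and io.BytesIO stands in for the underlying file source; this is not uproot's implementation.

```python
import io

class OpenedFile(object):
    """Hypothetical stand-in for the object returned by an open function."""

    def __init__(self, source):
        self._source = source

    def close(self):
        self._source.close()

    def __enter__(self):
        # The return value is what "with ... as name" binds to name.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the source; returning False lets exceptions propagate.
        self.close()
        return False

source = io.BytesIO(b"root")
with OpenedFile(source) as f:
    open_inside = not source.closed     # still usable inside the block
closed_after = source.closed            # released when the block exits
```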

Handle first TIOFeature bit: baskets without navigation buffers

To save space, ROOT 6.12 is capable of writing TTrees that have variable-length branches without navigation buffers. This only happens when the navigation data are redundant with a leaf count. In these cases, we should impute the missing navigation data with a cumulative sum of the leaf count.
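
The imputation described above can be sketched as follows, assuming that the leaf count gives the number of items per entry and that every item has a fixed size in bytes. The function name and the sample values are made up for illustration, and a real basket may need an additional header offset that is not modelled here.

```python
from itertools import accumulate  # Python 3

def offsets_from_counts(counts, itemsize):
    """Rebuild per-entry byte offsets from a leaf count (hypothetical helper)."""
    # Cumulative number of items before each entry, starting at zero.
    item_offsets = [0] + list(accumulate(counts))
    # Convert item positions to byte positions.
    return [n * itemsize for n in item_offsets]

counts = [3, 0, 2]                  # items per entry, as a leaf count would give
byte_offsets = offsets_from_counts(counts, itemsize=4)
```

For these counts and 4-byte items the offsets are [0, 12, 12, 20]: entry i occupies the bytes from byte_offsets[i] up to byte_offsets[i + 1], and an empty entry has equal start and stop.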

Unable to find/read data

Hi, I'm trying to read a CMSSW GEN-SIM-RECO ROOT file (/afs/cern.ch/user/v/valya/public/E6B7DD7A-D398-E611-BD88-0025905B85DE.root)
with uproot, and I know from the ROOT object browser that it contains data. For example, it has a recoTracks_generalTracks__RECO branch, which has
two sub-branches, recoTracks_generalTracks__RECO.present and recoTracks_generalTracks__RECO.obj; see https://www.dropbox.com/s/3xct7jskrelne56/RecoTracks.png?dl=0

Now, I read the file and dumped all branches. I can see the data for recoTracks_generalTracks__RECO.present (it is an array of booleans, all True), but the recoTracks_generalTracks__RECO.obj branch does not exist. Instead I see a recoTracks_generalTracks__RECO. branch. If I access it, it contains an empty array.

    with uproot.open(fname) as tree:
        eTree = tree['Events']
        names = ['recoTracks_generalTracks__RECO.present', 'recoTracks_generalTracks__RECO.', 'recoTracks_generalTracks__RECO.obj']
        for bName in names:
            print("Branch {} exists {}".format(bName, bName in eTree.branchnames))
            data = eTree.array(bName)
            print(data)

and I got the following:

Branch recoTracks_generalTracks__RECO.present exists False
[ True  True  True  True  True  True  True  True  True  True  True  True...]
Branch recoTracks_generalTracks__RECO. exists True
[]
Branch recoTracks_generalTracks__RECO.obj exists False
Traceback (most recent call last):
  File "./vk_test.py", line 41, in <module>
    data = eTree.array(bName)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 529, in array
    out, = self.arrays(branchdtypes=branchdtypes, executor=executor, outputtype=tuple, block=block)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 447, in arrays
    for branch, dtype in self._normalizeselection(branchdtypes, self.allbranches):
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 266, in _normalizeselection
    raise ValueError("cannot produce an array from branch {0}".format(repr(name)))
ValueError: cannot produce an array from branch 'recoTracks_generalTracks__RECO.obj'

So, there are a few questions:

  • why is recoTracks_generalTracks__RECO.present not in the branchnames list, even though it is listed if I iterate over the tree's array keys, as in tree.arrays().keys()?
  • why does this branch actually contain data?
  • why does uproot not see the branch recoTracks_generalTracks__RECO.obj?
  • why does it provide a recoTracks_generalTracks__RECO. branch, which is not shown in the ROOT object browser?
  • if recoTracks_generalTracks__RECO. is indeed recoTracks_generalTracks__RECO.obj, why is it empty, given that I can see data in the ROOT object browser?

I understand the traceback, because I'm trying to access a non-existent branch.

Cache bug while reading array of data

If I read the file /afs/cern.ch/user/v/valya/public/nano-RelValTTBar.root with the following code:

def read(fin, branch='Events'):
    normalBranches = []
    with uproot.open(fin) as istream:
        tree = istream[branch]
        branches = [n for _, n in tree.allitems()]
        normalBranches = []
        jaggedBranches = []
        for key, val in tree.allitems():
            data = val.array()
            if not isinstance(data, uproot.interp.jagged.JaggedArray):
                normalBranches.append(key)
            else:
                jaggedBranches.append(key)
        print("\n### Cache error in reading branches")
        cache = {}
        try:
            for key in branches:
                data = tree[key].array(cache=cache)
        except:
            traceback.print_exc()
        print("\n### Cache error in reading normal branches")
        cache = {}
        try:
            for key in normalBranches:
                data = tree[key].array(cache=cache)
        except:
            traceback.print_exc()
        print("\n### Cache error in reading jagged branches")
        cache = {}
        try:
            for key in normalBranches:
                data = tree[key].array(cache=cache)
        except:
            traceback.print_exc()

read('/opt/cms/data/nano-RelValTTBar.root')

I encounter different errors when passing a cache to array; see the errors below:

### Cache error in reading branches
Traceback (most recent call last):
  File "./cache_bug.py", line 42, in read
    data = tree[key].array(cache=cache)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 539, in __getitem__
    return self.get(name)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 275, in get
    raise KeyError("not found: {0}".format(repr(name)))
KeyError: "not found: <TBranch 'run' at 0x0001052366d0>"

### Cache error in reading normal branches
Traceback (most recent call last):
  File "./cache_bug.py", line 49, in read
    data = tree[key].array(cache=cache)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 938, in array
    cachekey = self._cachekey(interpretation, entrystart, entrystop)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 620, in _cachekey
    return "{0};{1};{2};{3};{4}-{5}".format(self._context.sourcepath, self._context.treename, self.name, interpretation.identifier, entrystart, entrystop)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/interp/numerical.py", line 93, in identifier
    fromdtype = "{0}{1}{2}".format(self._byteorder_transform[self.fromdtype.byteorder], self.fromdtype.kind, self.fromdtype.itemsize)
KeyError: '|'

### Cache error in reading jagged branches
Traceback (most recent call last):
  File "./cache_bug.py", line 56, in read
    data = tree[key].array(cache=cache)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 938, in array
    cachekey = self._cachekey(interpretation, entrystart, entrystop)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/tree.py", line 620, in _cachekey
    return "{0};{1};{2};{3};{4}-{5}".format(self._context.sourcepath, self._context.treename, self.name, interpretation.identifier, entrystart, entrystop)
  File "/Users/vk/Downloads/uproot-2.5.10/uproot/interp/numerical.py", line 93, in identifier
    fromdtype = "{0}{1}{2}".format(self._byteorder_transform[self.fromdtype.byteorder], self.fromdtype.kind, self.fromdtype.itemsize)
KeyError: '|'
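
The KeyError: '|' suggests that the lookup table does not cover every value numpy can report: dtype.byteorder is one of '=', '<', '>' or '|', where '|' marks types for which byte order does not apply, such as booleans and single-byte integers. Below is a sketch of a lookup that covers all four, assuming the table only needs to produce a label for the cache key. The function name and the labels are hypothetical, not uproot's implementation.

```python
import sys

def byteorder_label(byteorder):
    """Map a numpy dtype.byteorder character to an explicit label (hypothetical helper)."""
    native = "<" if sys.byteorder == "little" else ">"
    table = {
        "<": "<",        # little-endian
        ">": ">",        # big-endian
        "=": native,     # native order, resolved for this machine
        "|": "|",        # not applicable (e.g. bool, int8)
    }
    return table[byteorder]

labels = [byteorder_label(c) for c in "<>=|"]
```

Resolving '=' to a concrete order also keeps cache keys stable if they are ever shared between machines with different native byte order.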

vector<bool> not supported

Hi,
I've found that uproot does not support the type vector<bool> when I try to read such a branch from a ROOT file. Let me show you an example. I run this code

import uproot
t=uproot.open("/afs/cern.ch/work/b/bonacor/public/small10kevts.root")["events"]
t.show()
for (array,) in t.iterate("triggerBit", entrysteps=1000, outputtype=tuple):
    print len(array)

and I get this output:

runNo                      (no streamer)              asdtype('>i4')
evtNo                      (no streamer)              asdtype('>i4')
lumi                       (no streamer)              asdtype('>i4')
nvtx                       (no streamer)              asdtype('>i4')
nJets                      (no streamer)              asdtype('>i4')
nLeptons                   (no streamer)              asdtype('>i4')
nBJets                     (no streamer)              asdtype('>i4')
category                   (no streamer)              asdtype('>i4')
rho                        (no streamer)              asdtype('>f4')
ht                         (no streamer)              asdtype('>f4')
mva                        (no streamer)              asdtype('>f4')
met                        (no streamer)              asdtype('>f4')
metSig                     (no streamer)              asdtype('>f4')
mJJ                        (no streamer)              asdtype('>f4')
yJJ                        (no streamer)              asdtype('>f4')
ptJJ                       (no streamer)              asdtype('>f4')
dRJJ                       (no streamer)              asdtype('>f4')
dPhiJJ                     (no streamer)              asdtype('>f4')
dPhiLJ                     (no streamer)              asdtype('>f4')
jetIsBtag                  TStreamerSTL               None
jetFlavor                  TStreamerSTL               asstlvector(asdtype('>i4'))
jetFlavorHadron            TStreamerSTL               asstlvector(asdtype('>i4'))
jetNSub                    TStreamerSTL               asstlvector(asdtype('>i4'))
jetNBSub                   TStreamerSTL               asstlvector(asdtype('>i4'))
jetPt                      TStreamerSTL               asstlvector(asdtype('>f4'))
jetBtag                    TStreamerSTL               asstlvector(asdtype('>f4'))
jetEta                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetPhi                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetMass                    TStreamerSTL               asstlvector(asdtype('>f4'))
jetMassSoftDrop            TStreamerSTL               asstlvector(asdtype('>f4'))
jetChf                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetNhf                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetPhf                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetMuf                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetElf                     TStreamerSTL               asstlvector(asdtype('>f4'))
jetTau1                    TStreamerSTL               asstlvector(asdtype('>f4'))
jetTau2                    TStreamerSTL               asstlvector(asdtype('>f4'))
jetTau3                    TStreamerSTL               asstlvector(asdtype('>f4'))
jetBtagSub0                TStreamerSTL               asstlvector(asdtype('>f4'))
jetBtagSub1                TStreamerSTL               asstlvector(asdtype('>f4'))
jetMassSub0                TStreamerSTL               asstlvector(asdtype('>f4'))
jetMassSub1                TStreamerSTL               asstlvector(asdtype('>f4'))
jetPtSub0                  TStreamerSTL               asstlvector(asdtype('>f4'))
jetPtSub1                  TStreamerSTL               asstlvector(asdtype('>f4'))
jetEtaSub0                 TStreamerSTL               asstlvector(asdtype('>f4'))
jetEtaSub1                 TStreamerSTL               asstlvector(asdtype('>f4'))
jetPhiSub0                 TStreamerSTL               asstlvector(asdtype('>f4'))
jetPhiSub1                 TStreamerSTL               asstlvector(asdtype('>f4'))
jetFlavorSub0              TStreamerSTL               asstlvector(asdtype('>i4'))
jetFlavorSub1              TStreamerSTL               asstlvector(asdtype('>i4'))
jetFlavorHadronSub0        TStreamerSTL               asstlvector(asdtype('>i4'))
jetFlavorHadronSub1        TStreamerSTL               asstlvector(asdtype('>i4'))
lepId                      TStreamerSTL               asstlvector(asdtype('>i4'))
lepPt                      TStreamerSTL               asstlvector(asdtype('>f4'))
lepEta                     TStreamerSTL               asstlvector(asdtype('>f4'))
lepPhi                     TStreamerSTL               asstlvector(asdtype('>f4'))
lepE                       TStreamerSTL               asstlvector(asdtype('>f4'))
lepIso                     TStreamerSTL               asstlvector(asdtype('>f4'))
triggerBit                 TStreamerSTL               None
triggerPre                 TStreamerSTL               asstlvector(asdtype('>i4'))
Traceback (most recent call last):
  File "solution.py", line 7, in <module>
    for (array,) in t.iterate("triggerBit", entrysteps=1000, outputtype=tuple):
  File "/afs/cern.ch/user/b/bonacor/.local/lib/python2.6/site-packages/uproot-2.5.17-py2.6.egg/uproot/tree.py", line 415, in iterate
    branches = list(self._normalize_branches(branches))
  File "/afs/cern.ch/user/b/bonacor/.local/lib/python2.6/site-packages/uproot-2.5.17-py2.6.egg/uproot/tree.py", line 507, in _normalize_branches
    raise ValueError("cannot interpret branch {0} as a Python type".format(repr(name)))
ValueError: cannot interpret branch 'triggerBit' as a Python type

So triggerBit is a TStreamerSTL of an unknown type but, if I inspect the root file using the Print() function on the tree events, I can see that

*............................................................................*
*Br   58 :triggerBit : vector<bool>                                          *
*Entries :    10000 : Total  Size=     241281 bytes  File Size  =      38703 *
*Baskets :        9 : Basket Size=      32000 bytes  Compression=   6.22     *
*............................................................................*

Actually, triggerBit should be a vector of 10 elements (each one 0 or 1), because when I inspect this branch there are 100000 entries rather than 10000. So what is happening? It seems that uproot doesn't recognise the vector<bool> type, right?

Thanks
Luca

@vkuznet

How do I use the object returned by countleaf ?

I cannot figure out how to use the object returned by tree[key].countleaf. It does not appear to have any useful attributes or methods. In particular, I do not see how to get the name of the counting branch.

pt = tree["Muon.pt"]
c = pt.countleaf
print type(c)
print c
print dir(c)
print c.fName

<class 'uproot.rootio.TLeafElement'>
<TLeafElement 'Muon_' at 0x7f658c5cc950>
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__metaclass__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_classname', '_classversion', '_copycontext', '_fields', '_format1', '_postprocess', '_pycode', '_readinto', '_versions', 'fID', 'fIsRange', 'fIsUnsigned', 'fLeafCount', 'fLen', 'fLenType', 'fName', 'fOffset', 'fTitle', 'fType', 'read']
Muon_

Speed improvements and numba jit failures

If I read the file /afs/cern.ch/user/v/valya/public/nano-RelValTTBar.root with the following code:

from __future__ import print_function, division, absolute_import

# system modules
import os
import sys
import time

# numpy
import numpy as np

# uproot
import uproot

from numba import jit

#@jit
def read(fin, branch='Events'):
    normalBranches = []
    with uproot.open(fin) as istream:
        tree = istream[branch]
        for key, val in tree.allitems():
            data = val.array()
            if not isinstance(data, uproot.interp.jagged.JaggedArray):
                normalBranches.append(key)
        print("number of non-jagged branches %s" % len(normalBranches))
        time0 = time.time()
        for key in normalBranches:
            data = tree[key].array()
        print("elapsed time", time.time()-time0)

read('/opt/cms/data/nano-RelValTTBar.root')

the time spent reading the non-jagged (I call them normal) branches is quite high: 1.8 sec on my Mac. I understand that I could refactor the code to access each array only once. I read them twice on purpose, to measure the time for non-jagged arrays alone: the first pass identifies the non-jagged branches, and the second pass measures the time to read them all.

If I apply the jit decorator (commented out in the code), I get the following error:

Traceback (most recent call last):
  File "./test.py", line 40, in <module>
    read('/opt/cms/data/nano-RelValTTBar.root')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/dispatcher.py", line 307, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/dispatcher.py", line 579, in compile
    cres = self._compiler.compile(args, return_type)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/dispatcher.py", line 80, in compile
    flags=flags, locals=self.locals)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 763, in compile_extra
    return pipeline.compile_extra(func)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 360, in compile_extra
    return self._compile_bytecode()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 722, in _compile_bytecode
    return self._compile_core()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 709, in _compile_core
    res = pm.run(self.status)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 246, in run
    raise patched_exception
AssertionError: Caused By:
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 238, in run
    stage()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 374, in stage_analyze_bytecode
    func_ir = translate_stage(self.func_id, self.bc)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/compiler.py", line 827, in translate_stage
    return interp.interpret(bytecode)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/interpreter.py", line 92, in interpret
    self.cfa.run()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numba/controlflow.py", line 515, in run
    assert not inst.is_jump, inst
AssertionError: SETUP_WITH(arg=176, lineno=28)

Failed at object (analyzing bytecode)
SETUP_WITH(arg=176, lineno=28)

I'm new to Numba and am probably doing something stupid, but I got the impression that the code is Numba-aware, which could speed things up.

The main question: is reading 583 branches in 1.8 sec the expected speed, and can it be improved? If so, how?

The main use case here is to develop a reader that reads physics events, so I need to cache both normal and jagged branches. As I pointed out in #40, the cache parameter causes an error. I can use my own cache to store the data, but depending on the dimensions of the branches and the number of events this can cause a large memory overhead. And spending almost 2 sec per event to read branches is rather poor performance to me.
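One way to keep a home-made cache from growing without bound is to give it a byte budget and evict the least recently used arrays first. The following is only a minimal sketch of that idea: `BoundedArrayCache` and its methods are hypothetical names invented for illustration, and it is not uproot's `cache` parameter.

```python
from collections import OrderedDict
import sys


class BoundedArrayCache(object):
    """Least-recently-used cache that evicts entries once a byte budget is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> (value, size in bytes)

    @staticmethod
    def _sizeof(value):
        # NumPy arrays expose .nbytes; fall back to sys.getsizeof otherwise.
        return getattr(value, "nbytes", sys.getsizeof(value))

    def get(self, key, loader):
        """Return the cached value for key, calling loader() on a miss."""
        if key in self._items:
            value, size = self._items.pop(key)
            self._items[key] = (value, size)  # re-insert as most recently used
            return value
        value = loader()
        size = self._sizeof(value)
        self._items[key] = (value, size)
        self.used_bytes += size
        # Evict the oldest entries, but always keep the one just loaded.
        while self.used_bytes > self.max_bytes and len(self._items) > 1:
            _, (_, old_size) = self._items.popitem(last=False)
            self.used_bytes -= old_size
        return value

    def __contains__(self, key):
        return key in self._items
```

With the `tree` object from the benchmark above, a lookup would read `cache.get(key, lambda: tree[key].array())`, so each branch is read from the file only while it is absent from the cache.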

Incorrect number of entries returned by uproot

When opening a TTree with 50000 entries, uproot returns NumPy arrays containing either 47856 or 47868 entries, depending on the column. Plotting the data with TBrowser and loading the data with root_pandas both work as expected.

The file I've seen this with can be downloaded using:

xrdcp root://eospublic.cern.ch//eos/opendata/lhcb/AntimatterMatters2017/data/PhaseSpaceSimulation.root .

Using uproot

In [0]: import uproot
In [1]: f = uproot.open('PhaseSpaceSimulation.root')
In [2]: tree = f['PhaseSpaceTree']
In [3]: {var: len(data) for var, data in tree.arrays().items()}
{b'B_FlightDistance': 47856,
 b'B_VertexChi2': 47856,
 b'H1_Charge': 47868,
 b'H1_IPChi2': 47868,
 b'H1_PX': 47868,
 b'H1_PY': 47868,
 b'H1_PZ': 47868,
 b'H1_ProbK': 47868,
 b'H1_ProbPi': 47868,
 b'H1_isMuon': 47868,
 b'H2_Charge': 47868,
 b'H2_IPChi2': 47868,
 b'H2_PX': 47868,
 b'H2_PY': 47868,
 b'H2_PZ': 47868,
 b'H2_ProbK': 47868,
 b'H2_ProbPi': 47868,
 b'H2_isMuon': 47868,
 b'H3_Charge': 47868,
 b'H3_IPChi2': 47868,
 b'H3_PX': 47868,
 b'H3_PY': 47868,
 b'H3_PZ': 47868,
 b'H3_ProbK': 47868,
 b'H3_ProbPi': 47868,
 b'H3_isMuon': 47868}

Using ROOT pandas

In [0]: import root_pandas
In [1]: df = root_pandas.read_root('PhaseSpaceSimulation.root')
In [2]: df.apply(len)
[All columns contain 50000 entries as expected]
In [3]: df.isnull().sum()
[No columns contain nan values]
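A discrepancy like this is easier to spot with a small consistency check over the returned dictionary. The sketch below is illustrative only: `length_report` is a hypothetical helper, and it assumes nothing more than a mapping from column name to a sized array.

```python
from collections import Counter


def length_report(arrays, expected_entries):
    """Summarise array lengths and list the columns that disagree with expected_entries."""
    lengths = {name: len(values) for name, values in arrays.items()}
    histogram = Counter(lengths.values())  # length -> number of columns
    mismatched = sorted(name for name, n in lengths.items() if n != expected_entries)
    return histogram, mismatched
```

Called as `length_report(tree.arrays(), 50000)` on the file above, the histogram would show how many columns have each length, and `mismatched` would name every column that falls short of the expected entry count.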

Request support for tree branch with vector<TLorentzVector> objects

In giving uproot a try, we noticed that it does not currently support tree branches containing vector<TLorentzVector> objects. Many analyses make use of 4-vector information for many types of particles in the event. Having access to a branch with stored TLorentzVector objects would be very useful. It would be awesome if such a capability could be added, as it would benefit the physics community at large.

Support for nested vectors

This looks awesome, do you plan to support arbitrarily nested vectors? The documentation mentions support for vectors of native types but it's not clear if this extends to nested vectors.

std::string branches cannot be interpreted

I'm saving a few std::strings as single entry branches as meta data in a tree. To retrieve that data I'm trying to just use the structure:

>>> meta_info_str = uproot.open('file.root')['tree'].array('meta')[0]

Where meta is an std::string branch.

I'm getting the ValueError:

cannot interpret branch b'meta' as a Python type

They are saved in C++ as follows:

/// header .. define class member
TTree* m_tree;
std::string m_some_meta_string;
/// in source .. set the branch
m_tree = new TTree("tree","tree");
m_tree->Branch("meta",&m_some_meta_string);
/// later assign it
m_some_meta_string = function_which_returns_a_string();

An example file can be found here: https://phy.duke.edu/~ddavis/public/example.root
The tree with some strings is called WtLoop_meta. String branches include generator, sampleType, campaign, and initialState. If I use tree.show() I see a None interpretation for the strings.

Docs say that uproot should be able to figure out std::string, am I screwing something up (e.g. saving the string to the ROOT file in a way that's incompatible with uproot)?

Using ROOT v6-10-08 to create the file and uproot v2.8.8 to read.

ROOT-6.12.04 support: issue with new ROOT::TIOFeatures field in TTree

hi,

trying to read a ROOT file generated w/ ROOT-6.12.04, I get:

>>> import uproot
>>> uproot.__file__
'/home/binet/tmp/uproot/uproot/build/lib/uproot/__init__.py'
>>> uproot.open("/home/binet/dev/hepsw/go/src/go-hep.org/x/hep/rootio/testdata/stdvec-bool.root")
<ROOTDirectory b'stdvec-bool.root' at 0x7fca9a156b38>
>>> f=_
>>> f.get("tree")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/binet/tmp/uproot/uproot/build/lib/uproot/rootio.py", line 315, in get
    return key.get()
  File "/home/binet/tmp/uproot/uproot/build/lib/uproot/rootio.py", line 803, in get
    return self._context.classes[classname].read(self._source, self._cursor.copied(), self._context)
  File "/home/binet/tmp/uproot/uproot/build/lib/uproot/rootio.py", line 741, in read
    out = cls._readinto(out, source, cursor, context)
  File "<generated from TStreamerInfo b'TTree' at 0x7fca9a150b00>", line 19, in _readinto
  File "/home/binet/tmp/uproot/uproot/build/lib/uproot/rootio.py", line 741, in read
    out = cls._readinto(out, source, cursor, context)
  File "<generated from TStreamerInfo b'ROOT::TIOFeatures' at 0x7fca9b0d9048>", line 9, in _readinto
ValueError: attempting to read ROOT_3a3a_TIOFeatures object version 0 with a class generated by streamer version 1

the input file is available there:

Getting branch type and shape without reading data

Is there a way to get the branch type and shape without actually reading the data?
In particular, is there a way to do something like this:

b = tree["path"]
btype = b.type() # ndarray or JaggedArray ?
bshape = b.shape() # (nevents,) for ndarray and (nevents, None) for Jagged

This would significantly speed up the process of schema inference.

Thanks.

uproot + pyroot conflict?

Hi @jpivarski - We just added uproot to the CMSSW python stack. We have a unit test that basically imports everything in our python stack to make sure that installations are somewhat sane. From this unit test it seems uproot conflicts with other things in our stack if imported in the same python session. Is this to be expected?

Eg

import uproot

works, but

import rootpy
import uproot

ERROR:ROOT.TUnixSystem.DispatchSignals] segmentation violation
ERROR:stack] File "", line 1, in
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/lcg/root/6.10.09-omkpbe3/lib/ROOT.py", line 318, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/uproot/__init__.py", line 33, in
ERROR:stack] from uproot.tree import iterate
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/lcg/root/6.10.09-omkpbe3/lib/ROOT.py", line 318, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/uproot/tree.py", line 77, in
ERROR:stack] from uproot.interp.auto import interpret
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/lcg/root/6.10.09-omkpbe3/lib/ROOT.py", line 318, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/uproot/interp/__init__.py", line 33, in
ERROR:stack] from uproot.interp.jagged import asjagged
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/lcg/root/6.10.09-omkpbe3/lib/ROOT.py", line 318, in _importhook
ERROR:stack] return _orig_ihook( name, *args, **kwds )
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/uproot/interp/jagged.py", line 53, in
ERROR:stack] _compactify = numba.njit(_compactify)
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/numba/decorators.py", line 225, in njit
ERROR:stack] return jit(*args, **kws)
ERROR:stack] File "/cvmfs/cms-ib.cern.ch/nweek-02514/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-omkpbe3/lib/python2.7/site-packages/numba/decorators.py", line 167, in jit

(very long set of tracebacks truncated - I can provide if useful)

Add Zenodo DOI

Hi Jim. What do you think about getting a Zenodo DOI for uproot (and badge for the README)? I think it would be nice for citations.

Error while reading an AOD file

Jim, now I'm trying to read an AOD file
/afs/cern.ch/user/v/valya/public/C84930B2-7C55-E711-B915-02163E014722.root
and it fails right away:

    for branchname in tree.arrays().keys():
        print(branchname)

gives

Traceback (most recent call last):
  File "vk_test.py", line 38, in <module>
    branchNames(eTree)
  File "vk_test.py", line 24, in branchNames
    for branchname in tree.arrays().keys():
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 458, in arrays
    outi, res = branch.array(dtype, executor, False)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 1427, in array
    return TBranch.array(self, dtype, executor, block)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 1182, in array
    out[start:end] = self._basket(i, parallel=False)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/tree.py", line 857, in _basket
    self._basketwalkers[i]._evaluate(parallel)
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/_walker/lazyarraywalker.py", line 54, in _evaluate
    string = self._original_function(walker.readbytes(length))
  File "/Users/vk/Work/Languages/Python/GIT/uproot/uproot/rootio.py", line 84, in <lambda>
    return lambda x: zlib_decompress(x[9:])
error: Error -3 while decompressing data: incorrect header check
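The issue does not establish the cause, but one possibility (an assumption, not something confirmed here) is that the block was written with a codec other than zlib, in which case passing it to zlib fails on the header check. ROOT-compressed blocks start with a two-byte tag naming the codec, so a reader can dispatch on it. The sketch below handles only the "ZL" (zlib) and "XZ" (LZMA) tags; `decompress_block` is a hypothetical helper, not uproot's code, and the 9-byte header size is taken from the `x[9:]` slice in the traceback.

```python
import lzma
import zlib

ROOT_HEADER_SIZE = 9  # matches the x[9:] slice in the traceback


def decompress_block(block):
    """Decompress one block, choosing the codec from its two-byte tag."""
    tag, payload = block[:2], block[ROOT_HEADER_SIZE:]
    if tag == b"ZL":
        return zlib.decompress(payload)
    if tag == b"XZ":
        return lzma.decompress(payload)
    # Other tags are deliberately left unsupported in this sketch.
    raise ValueError("unsupported compression tag: {0!r}".format(tag))
```

Raising on an unknown tag gives a clearer message than letting zlib report an incorrect header check on data it was never meant to read.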

Does not work on Windows

The module relies on fcntl and therefore does not work on Windows:

ModuleNotFoundError: No module named 'fcntl'

Could you please use a cross-platform alternative? This package is especially important for Windows users, who often can't or won't install the latest ROOT distribution.
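A common way to avoid the import failure is to guard the POSIX-only import and degrade gracefully when it is missing. The sketch below assumes, purely for illustration, that fcntl is needed only for advisory file locking; what uproot actually uses it for is not stated here, and `lock_shared` and `unlock` are hypothetical helpers.

```python
import tempfile

try:
    import fcntl  # available on POSIX systems only
except ImportError:  # e.g. on Windows
    fcntl = None


def lock_shared(fileobj):
    """Take a shared advisory lock where fcntl exists; return whether one was taken."""
    if fcntl is None:
        return False
    fcntl.flock(fileobj.fileno(), fcntl.LOCK_SH)
    return True


def unlock(fileobj):
    """Release the lock taken by lock_shared, if any."""
    if fcntl is not None:
        fcntl.flock(fileobj.fileno(), fcntl.LOCK_UN)


# Usage: the same calls work on every platform; locking is simply skipped without fcntl.
with tempfile.TemporaryFile() as handle:
    locked = lock_shared(handle)
    unlock(handle)
```

The trade-off is that the platform without fcntl runs unlocked, which is acceptable only if locking is an optimisation rather than a correctness requirement.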

tests/HZZ*.root typo in Photon_E branch descriptor

running root-dump on the tests/HZZ.root files I get:

$> root-dump HZZ.root
>>> file[HZZ.root]
key[000]: events;1 "" (TTree)
root-dump: error dumping file "HZZ.root": rootio: Tree "events" has no branch named "Photons_E"

and indeed:

root-ls -t HZZ.root 
=== [HZZ.root] ===
version: 53201
TTree                         events                                  (entries=2421)
  NJet                        "NJet/I"                        TBranch
[...]
  NPhoton                     "NPhoton/I"                     TBranch
  Photon_Px                   "Photon_Px[NPhoton]/F"          TBranch
  Photon_Py                   "Photon_Py[NPhoton]/F"          TBranch
  Photon_Pz                   "Photon_Pz[NPhoton]/F"          TBranch
  Photon_E                    "Photons_E[NPhoton]/F"          TBranch
  Photon_Iso                  "Photon_Iso[NPhoton]/F"         TBranch
[...]

there is a typo in the branch descriptor for Photon_E (which reads Photons_E, with an extra s).

not sure who's "right" here (uproot that presumably correctly reads that file, or groot that chokes on it)
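A mismatch of this kind can be detected mechanically by comparing each branch name with the leaf name embedded in its descriptor. The sketch below works on (branch name, descriptor) pairs such as those printed by root-ls above; `mismatched_titles` is a hypothetical helper and the regular expression covers only the simple `name[counter]/T` form seen in this listing.

```python
import re

# "Photon_Px[NPhoton]/F" -> name "Photon_Px", counter "NPhoton", type code "F"
_LEAF_TITLE = re.compile(r"^(?P<name>\w+)(?:\[(?P<counter>\w+)\])?(?:/(?P<code>\w))?$")


def mismatched_titles(branches):
    """Return (branch name, leaf name) pairs whose names differ."""
    out = []
    for branch_name, title in branches:
        match = _LEAF_TITLE.match(title)
        if match is not None and match.group("name") != branch_name:
            out.append((branch_name, match.group("name")))
    return out
```

A reader that looks leaves up by the name in the descriptor would trip over any pair this function reports, while one that uses the branch name would not.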

Support opening files on the web

Normal ROOT supports opening files from the web, not directly through the constructor, but with the Open method. It would be nice if uproot did so as well.

>>> uproot.open('http://www.scikit-hep.org/uproot/examples/Zmumu.root')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/uproot/rootio.py", line 69, in open
    raise ValueError("URI scheme not recognized: {0}".format(path))
ValueError: URI scheme not recognized: http://www.scikit-hep.org/uproot/examples/Zmumu.root
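Supporting web files starts with recognising the scheme and routing the path to a matching reader. The sketch below shows only that dispatch step using the standard library; `classify_source` and its return labels are hypothetical, and this is not how uproot's `open` is implemented.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def classify_source(path):
    """Return 'file', 'xrootd' or 'http' depending on the URI scheme of path."""
    scheme = urlparse(path).scheme.lower()
    # A one-letter "scheme" is treated as a Windows drive letter such as C:.
    if scheme in ("", "file") or len(scheme) == 1:
        return "file"
    if scheme == "root":
        return "xrootd"
    if scheme in ("http", "https"):
        return "http"
    raise ValueError("URI scheme not recognized: {0}".format(path))
```

An HTTP reader behind the `'http'` label would still need to fetch byte ranges on demand, which this sketch leaves out.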

Python3 can't convert branches with multiple leaves, Python2 seems to work

uproot fails under Python3 when converting a ROOT file whose branches contain multiple leaves. The file has the following leaf structure for a given branch in the tree: "valueX/D:valueY/D:valueZ/D:valueN/I".

Using the following code to open the rootfile and access branches:

import numpy
import uproot

tfile = uproot.open('rootfile_name.root')
ttree = tfile.get('tree')
branch = ttree.get('branch_name')
branch.array()

A TypeError: data type not understood is raised from the numpy.dtype function on line 141 of /uproot/interp/auto.py when using Python3.

https://github.com/scikit-hep/uproot/blob/597a2bb03a9ffb45a32e3208be0aed4781986f9f/uproot/interp/auto.py#L141

I believe this is due to the numpy.dtype function not being able to parse the names of the leaves (valueX, valueY, valueZ, valueN), which carry a "bytes" type in Python3 versus a "string" type in Python2. When loading ROOT files into Python with uproot, are the names of the objects in the tree supposed to be interpreted as "bytes" or "string" types? numpy.dtype seems to have trouble with this depending on which version of Python is used.

(Comparing Python versions 3.5.4 and 2.7.13 using numpy version 1.11.2)
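If the bytes-valued names are indeed the cause, decoding them to native strings before building the dtype would sidestep the difference between Python versions. The sketch below turns a leaf list like the one above into (name, format) pairs; `leaflist_to_fields` and `ROOT_TYPE_CODES` are hypothetical names, only a few type codes are covered, and every leaf is assumed to carry an explicit code.

```python
# Big-endian NumPy format strings for a few ROOT leaf type codes.
ROOT_TYPE_CODES = {
    "D": ">f8",  # 64-bit float
    "F": ">f4",  # 32-bit float
    "I": ">i4",  # 32-bit signed integer
    "L": ">i8",  # 64-bit signed integer
    "S": ">i2",  # 16-bit signed integer
    "O": "?",    # boolean
}


def leaflist_to_fields(leaflist):
    """Convert 'valueX/D:valueN/I' (bytes or str) into [(name, format), ...] with str names."""
    if isinstance(leaflist, bytes):
        leaflist = leaflist.decode("ascii")
    fields = []
    for item in leaflist.split(":"):
        name, _, code = item.partition("/")
        fields.append((str(name), ROOT_TYPE_CODES[code]))
    return fields
```

The resulting list can be passed to `numpy.dtype(...)` to obtain a packed record dtype whose field names are native strings on both Python 2 and Python 3.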
