DeepNTuples

NTuple framework for DeepFlavour

Installation (CMSSW 8_0_25)

cmsrel CMSSW_8_0_25
cd CMSSW_8_0_25/src/
cmsenv
git cms-init
git clone https://github.com/CMSDeepFlavour/DeepNTuples
# Add JetToolBox
cd DeepNTuples
git submodule init
git submodule update

# Add DeepFlavour -- To be updated once the 80X PR is done
cd -
git cms-merge-topic -u mverzett:DeepFlavour-from-CMSSW_8_0_21
mkdir RecoBTag/DeepFlavour/data/
cd RecoBTag/DeepFlavour/data/
wget http://home.fnal.gov/~verzetti//DeepFlavour/training/DeepFlavourNoSL.json
cd -
#compile
scram b -j 4

Installation (CMSSW 8_1_X)

cmsrel CMSSW_8_1_0
cd CMSSW_8_1_0/src/
cmsenv
git cms-init
# Add DeepFlavour -- To be updated once the 80X PR is done
git cms-merge-topic -u cms-btv-pog:DeepFlavour-from-CMSSW_8_1_0
git clone https://github.com/CMSDeepFlavour/DeepNTuples
# Add JetToolBox
cd DeepNTuples
git submodule init
git submodule update

#compile
scram b -j 4

Installation (CMSSW 8_4_X and 9_0_X)

cmsrel CMSSW_8_4_0
cd CMSSW_8_4_0/src/
cmsenv
git cms-init
git clone https://github.com/CMSDeepFlavour/DeepNTuples
# Add JetToolBox
cd DeepNTuples
git submodule init
git submodule update

# DeepCSV is already in the release, but under different names, which will become the defaults in the near future
sed -i 's|deepFlavourJetTags|pfDeepCSVJetTags|g' DeepNtuplizer/production/DeepNtuplizer.py
#compile
scram b -j 4

Installation (CMSSW 9_1_X and newer)

cmsrel CMSSW_10_0_1
cd CMSSW_10_0_1/src/
cmsenv
git cms-init
git clone https://github.com/CMSDeepFlavour/DeepNTuples
cd DeepNTuples
git checkout 94X
# Add JetToolBox
git submodule init
git submodule update
#compile
scram b -j 4

Further settings

It is important to create your grid proxy in a location that is accessible from other nodes (this is not a security issue: your full credentials are still needed for access). To do so, redirect the grid proxy location by adding the following line to your login script:

export X509_USER_PROXY=${HOME}/.gridproxy.pem

Production

Before doing a batch submission you can test the ntuplizer locally in the production directory with:

cmsRun DeepNtuplizer.py inputFiles=/path/to/file.root

The jobs can be submitted using the following syntax:

jobSub.py --file <sample file> DeepNtuplizer.py <batch directory> --outpath /path/to/output/directory/

For examples of sample files, please refer to the .cfg files already in the production directory. Each entry first specifies the number of jobs to be submitted, then the input dataset name, followed by the name of the output. Other arguments such as gluonReduction can then be specified if needed. Arguments must be separated by at least two whitespace characters.
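The two-whitespace rule can be illustrated with a small splitter. This is only a sketch in C++: jobSub.py is a Python script and may parse its sample files differently, and the function name splitFields is a hypothetical helper, not part of the framework.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split one sample-file line into fields that are
// separated by runs of two or more whitespace characters.
// A single space does not end a field.
std::vector<std::string> splitFields(const std::string& line) {
    static const std::regex separator(R"(\s{2,})");
    std::vector<std::string> fields;
    std::sregex_token_iterator it(line.begin(), line.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        const std::string field = it->str();
        if (!field.empty()) {  // skip the empty token caused by leading whitespace
            fields.push_back(field);
        }
    }
    return fields;
}
```

Under this reading, the first three fields of a line would be the number of jobs, the input dataset name, and the output name, with any further fields being optional arguments. A line whose fields are separated by a single space would be read as one field, which is why the two-whitespace separation matters.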

The large job output (ROOT files) will NOT be stored in the batch directory. The storage directory is specified by the --outpath argument. The batch directory will contain a symlink to this directory. If the outpath is not specified, the ntuples are stored in the deepjet directory, where you need write permission.

The status of the jobs can be checked with:

cd <batch directory>
check.py <sample subdirectories to be checked>

The check.py script provides additional options to resubmit failed jobs or to create sample lists once a satisfactory fraction of jobs has ended successfully. In the latter case, do:

check.py <sample subdirectories to be checked> --action filelist

This will create file lists that can be further processed by the DeepJet framework. For resubmitting failed jobs, do:

check.py <sample subdirectories to be checked> --action resubmit

When the file lists are created, the part used for training of the ttbar and QCD samples (or in principle any other process) can be merged using the executable:

mergeSamples.py <no of jets per file> <output dir> <file lists 1> <file lists 2> <file lists 3> ...

For example:

mergeSamples.py 400000 /path/to/dir/merged ntuple_*/train_val_samples.txt

This will take a significant amount of time, likely more than the ntuple production itself. It is therefore recommended to run the command within 'screen'. In the 94X branch, you can also submit the merging as a batch job by adding the --batch option. This will create a batch directory in the folder from which the command is called.

mergeSamples.py 400000 /path/to/dir/merged ntuple_*/train_val_samples.txt --batch

Contributors

astakia, cvernier, dmajumder, emilbols, gouskos, hqucms, jkiesele, kirschen

deepntuples's Issues

sorting

In DeepNtuplizer/interface/sorting_modules.h, the function compareByABCInv does not define a strict ordering, because it returns true when the two values are equal. This can lead to a crash in std::sort.
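The difference between a non-strict and a strict comparator can be sketched as follows. This is not the code from sorting_modules.h: the Entry struct and the function names are hypothetical, and the sketch assumes that the "Inv" variant is meant to sort in descending order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the objects being sorted; not the real class.
struct Entry {
    float value;
    int index;
};

// Problematic pattern: returns true for equal values, so comp(a, a) is true
// and the comparator is not a strict weak ordering.
bool compareInvNonStrict(const Entry& a, const Entry& b) {
    return a.value >= b.value;
}

// Strict variant: for equal values it returns false in both directions.
bool compareInvStrict(const Entry& a, const Entry& b) {
    return a.value > b.value;
}

// Sort a copy in descending order of value using the strict comparator.
std::vector<Entry> sortDescending(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(), compareInvStrict);
    return entries;
}
```

std::sort requires its comparator to be a strict weak ordering, so passing a comparator like compareInvNonStrict is undefined behavior and may read outside the range being sorted. Replacing >= with > is the smallest change that restores the requirement for ordinary (non-NaN) values.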

Broken in 10X

I was able to run the 94X branch in 10_0_1 without issues, but after PR #44, this appears to be no longer possible.

The ntuplizer fails with the exception:

"RefCore: A request to resolve a reference to a product of type 'std::vector <reco::GenParticle> ' with ProductID '2:3084' can not be satisfied because the product cannot be found."

Specifically, it seems to occur when calling (*bhad_in_jet)->pt(), i.e. https://github.com/CMSDeepFlavour/DeepNTuples/pull/44/files#diff-4151dba03f072c2112f3609edced4746R405

An example of a file I use is:
/eos/cms/store/mc/RunIISpring18MiniAOD/QCD_HT2000toInf_TuneCP5_13TeV-madgraph-pythia8/MINIAODSIM/100X_upgrade2018_realistic_v10-v1/00000/26656634-1247-E811-BE41-FA163E262593.root

After reverting PR #44, I can run without issues.

Bhadron and Bhadron daughter

I had a crash when running the DeepNtuplizer and found a part of the code that I either did not understand or that may be problematic.
The part in question is in DeepNTuples/DeepNtuplizer/src/ntuple_JetInfo.cc, lines 168 to 211. There is a case in which the Bhadron_daughter_ vector gets no push_back but Bhadron_ gets one. This happens when the innermost if condition evaluates to false.

It follows that the std::vector Bhadron_ contains more objects than Bhadron_daughter_.
This is problematic, for example, in lines 393 to 410, where every B hadron should have a daughter (or should be mapped to itself) and both vectors should be filled in the same order. A crash can occur if iterIndex reaches or exceeds the length of the Bhadron_daughter_ vector.

Best regards,
David
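One way to keep the two vectors in step is sketched below. It is an illustration of the idea described in the report, not the code in ntuple_JetInfo.cc: the Hadron struct, the function names, and the use of integer ids are all hypothetical simplifications.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generator-level B hadron; not the CMSSW type.
struct Hadron {
    int id;
    bool hasSelectedDaughter;  // models the innermost condition being true
    int daughterId;            // only meaningful if hasSelectedDaughter is true
};

// Fill both vectors in lockstep: exactly one daughter entry per hadron entry.
void fillHadrons(const std::vector<Hadron>& input,
                 std::vector<int>& hadronIds,
                 std::vector<int>& daughterIds) {
    for (const Hadron& h : input) {
        hadronIds.push_back(h.id);
        // Fallback: map the hadron to itself so the sizes never diverge.
        daughterIds.push_back(h.hasSelectedDaughter ? h.daughterId : h.id);
    }
}

// Bounds-checked lookup: returns nullptr instead of reading past the end.
const int* daughterAt(const std::vector<int>& daughterIds, std::size_t index) {
    return index < daughterIds.size() ? &daughterIds[index] : nullptr;
}
```

Pushing a fallback entry keeps the index of each hadron valid for the daughter vector, and the bounds-checked lookup guards against any remaining mismatch.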
