
data-curation's Introduction

data-curation's People

Contributors

annatrz, artemislav, audrium, heitorpb, jmhogan, joudmas, katilp, mantasavas, mokotus, nancyhamdan, okraskaj, osamamomani, tiborsimko


data-curation's Issues

CMS: dataset for 2015 MC pileup

Add dataset for 2015 MC pileup:

Check in McM. The information in DAS is not reliable, because its collection of config files mixes two kinds: the config files used to produce the dataset, and the config files of later steps that used this dataset as input.
In McM, the name of the pileup dataset is available in the request dictionary as pileup_dataset_name, in the request whose output dataset is at the AODSIM level.

An example dataset in McM:

Check this during the metadata extraction.
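Below is a minimal Python sketch of that check. It assumes the McM request dictionaries of a chain have already been fetched, for example from the public REST endpoint https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/<prepid> used in another issue on this page. It also assumes each request lists its outputs in an output_dataset field. That field name and the helper names are assumptions; only pileup_dataset_name comes from the note above.

```python
from typing import Iterable, Optional


def pileup_for_request(request: dict) -> Optional[str]:
    """Return pileup_dataset_name if this request produces an AODSIM dataset.

    Assumption: `request` is an McM request dictionary with an
    `output_dataset` list (field name assumed) and `pileup_dataset_name`.
    """
    outputs = request.get("output_dataset") or []
    if any(dataset.endswith("/AODSIM") for dataset in outputs):
        # An empty string is treated as "no pileup information".
        return request.get("pileup_dataset_name") or None
    return None


def pileup_from_chain(requests: Iterable[dict]) -> Optional[str]:
    """Return the first pileup dataset name found in a chain of requests."""
    for request in requests:
        name = pileup_for_request(request)
        if name:
            return name
    return None
```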

NB:

  • 3209 out of a total of 7113 datasets have TuneCUETP8M1 in the name. Check whether this has a meaning.
  • Campaign info in McM does not indicate the pile-up dataset name in the summary view
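The count in the first point can be reproduced with a short Python sketch. It makes two assumptions: the input is a plain-text list with one dataset name per line, such as the CMS-2015-mc-datasets.txt input referenced in another issue here; and a tune appears in the name as "_Tune<Name>" (a regular-expression assumption). Counting all tunes, not only TuneCUETP8M1, is an illustrative extension.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: the tune appears in the dataset name as "_Tune<Name>",
# e.g. "_TuneCUETP8M1" in "/DYJetsToLL_M-50_TuneCUETP8M1_13TeV-.../MINIAODSIM".
TUNE_RE = re.compile(r"_(Tune[A-Za-z0-9]+)")


def count_tunes(lines: Iterable[str]) -> Counter:
    """Count datasets per tune; names without a tune go under 'no tune'."""
    counts: Counter = Counter()
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        match = TUNE_RE.search(name)
        counts[match.group(1) if match else "no tune"] += 1
    return counts


def report(path: str) -> None:
    """Print 'tune: n out of total' for a list file with one dataset per line."""
    with open(path) as handle:
        counts = count_tunes(handle)
    total = sum(counts.values())
    for tune, n in counts.most_common():
        print(f"{tune}: {n} out of {total}")
```

For example, report("cms-YYYY-simulated-datasets/inputs/CMS-2015-mc-datasets.txt") prints one line per tune, most frequent first.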

records: fix created date methodology description

Script needs some fixes:

  • created_date should be an array
  • methodology description should be: "These data were generated in several steps (see also CMS Monte Carlo Production Overview)"
  • "cmsDriver script" should be: "Production script"

CMS MC categorisation - check the "Miscellaneous" datasets

When running the categorisation for the 2015 MC list, there are more than 600 datasets in the "Miscellaneous" category, which collects the datasets that have not been assigned to any existing category.

Make a new Python script to study these datasets.

  • Input: the dataset list of the "Miscellaneous" category

  • Grouping: take those dataset names (title_lower in the script) and group them based on the first part of the string. As a first attempt, use the following for grouping:

    • the part of the name up to (and including) the first underscore in the name
    • the part of the name up to (and including) the first "To" in the name
  • Output: Markdown output similar to that of the original script, i.e.

    • the group "name" (i.e. the part of the name as defined above), the number of datasets in that group
    • the full listing for each group
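Here is a minimal Python sketch of the grouping described above, under these assumptions:
- The input is a text file with one "Miscellaneous" dataset name per line.
- Names without the separator form a group of their own.
- The Markdown layout (a summary table, then one section per group) is only a guess at "a similar way as in the original script".

The "To" split is done on the original-case names. In the lowercased title_lower, a search for "to" would also hit words such as "photon".

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_key(name: str, token: str) -> str:
    """Part of `name` up to and including the first `token`.

    Names that do not contain `token` form a group of their own.
    """
    idx = name.find(token)
    return name if idx == -1 else name[: idx + len(token)]


def group_datasets(names: Iterable[str], token: str) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[group_key(name, token)].append(name)
    return dict(groups)


def to_markdown(groups: Dict[str, List[str]], title: str) -> str:
    # Largest groups first, ties broken alphabetically.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    lines = [f"# {title}", "", "| group | datasets |", "| --- | --- |"]
    lines += [f"| {key} | {len(members)} |" for key, members in ordered]
    for key, members in ordered:
        lines += ["", f"## {key} ({len(members)})", ""]
        lines += [f"- {member}" for member in sorted(members)]
    return "\n".join(lines)


def write_report(list_path: str, out_path: str) -> None:
    """Group the names in `list_path` both ways and write Markdown to `out_path`."""
    with open(list_path) as handle:
        names = [line.strip() for line in handle if line.strip()]
    with open(out_path, "w") as out:
        # "To" is matched case-sensitively on the original names.
        for token in ("_", "To"):
            out.write(to_markdown(group_datasets(names, token), f"Grouped up to the first {token!r}"))
            out.write("\n\n")
```

Usage, with hypothetical file names: write_report("misc_datasets.txt", "misc_groups.md").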

CMS: run the categorization script for 2015 MC

(from #65)

The simulated data are divided into categories based on the physics processes. This is needed for searchability. The category is decided based on the first part of the dataset name.

See:

To do:
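The prefix-based decision can be sketched in Python as an ordered list of (prefix, category) rules. The first match wins, and anything unmatched falls into "Miscellaneous". The rule table below is purely hypothetical and does not reproduce the categories of the actual categorisation script.

```python
from typing import List, Tuple

# Hypothetical rule table: (prefix of the primary dataset name, category).
# The real prefixes and categories live in the categorisation script.
RULES: List[Tuple[str, str]] = [
    ("DYJetsToLL", "Drell-Yan"),
    ("TT", "Top physics"),
    ("QCD_", "QCD"),
]


def primary_name(dataset: str) -> str:
    """'/Primary/Processed/TIER' -> 'Primary'."""
    return dataset.strip("/").split("/")[0]


def categorise(dataset: str, rules: List[Tuple[str, str]] = RULES) -> str:
    """Return the category of the first rule whose prefix matches."""
    name = primary_name(dataset)
    for prefix, category in rules:
        if name.startswith(prefix):
            return category
    # Datasets not matched by any rule end up here.
    return "Miscellaneous"
```

Ordering the rules from most to least specific keeps a short prefix from capturing datasets meant for a longer one.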

CMS generator cards for 2015 files

(edited 11.10.2021)

The generator "gridpacks" are stored in /cvmfs/cms.cern.ch/phys_generator/gridpacks/
However, note that not all of the 2015 LHE cards are stored there yet.

Take an example dataset from https://github.com/cernopendata/data-curation/blob/master/cms-YYYY-simulated-datasets/inputs/CMS-2015-mc-datasets.txt

Find the generator cards "by hand" as follows:

Case no LHE:

Three options:

Through "fragments" stored in McM

Advantage: gets directly the relevant information

  1. Search for the dataset in McM (request -> output dataset) (example query MiniAODSIM)
  2. Find the parent dataset name and repeat step 1.
  3. If it is GEN-SIM, find the generator parameters in "Name of fragment".

As the metadata script reads the full dictionary, we should already have this information.
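The McM walk in steps 1 to 3 can be sketched in Python. The sketch assumes the request dictionaries have already been downloaded and that they carry three fields: output_dataset (a list), input_dataset (the parent) and name_of_fragment. These field names and the helpers are assumptions to be checked against the real dictionaries.

```python
from typing import Dict, List, Optional


def index_by_output(requests: List[dict]) -> Dict[str, dict]:
    """Map every output dataset name to the McM request that produced it."""
    index: Dict[str, dict] = {}
    for request in requests:
        for dataset in request.get("output_dataset") or []:
            index[dataset] = request
    return index


def find_fragment(dataset: str, index: Dict[str, dict], max_depth: int = 10) -> Optional[str]:
    """Follow the parent (input_dataset) links up to the GEN-SIM request."""
    for _ in range(max_depth):  # guard against unexpected loops in the chain
        request = index.get(dataset)
        if request is None:
            return None  # dataset not found among the downloaded requests
        if dataset.endswith("/GEN-SIM"):
            return request.get("name_of_fragment") or None
        dataset = request.get("input_dataset") or ""
        if not dataset:
            return None  # reached the top without a GEN-SIM step
    return None
```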

Config files

Advantage: already available as config for GEN-SIM step

Disadvantage: shows the full config file, not only the cards

Steps 1 and 2 as above, then

From edmProvDump

Advantage: get the information directly from the file

Disadvantage: has to be run in a CMSSW release area, and the output formatting is not ideal for display

  1. Find the name of a single ROOT file, $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
    edmProvDump -f "generator SIM" root://eospublic.cern.ch/$file | grep -A9999 "generator SIM"

Case LHE:

Case no gridpack

  • MINIAODSIM has mcdb_id > 1 in McM
  • in use before gridpacks were adopted
  1. Example McM query for a MINIAODSIM
  2. Take "Mcdb id", in the dictionary: "mcdb_id": 15839
  3. Find the LHE file in /eos/cms/store/lhe/$mcdb_id
  4. Extract the file and read the header with:
    xz -d -c /eos/cms/store/lhe/$mcdb_id/* > lhe.lhe
    awk '/<header>/,/<\/header>/' lhe.lhe > lhe_header
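For reference, here is a Python sketch of the same header extraction using the standard lzma module. It assumes the files under /eos/cms/store/lhe/$mcdb_id are xz-compressed LHE text files, as the xz -d call implies. Unlike the awk range, it stops after the first header block and reads one file at a time instead of concatenating them. lhe_header is a hypothetical helper name.

```python
import lzma


def lhe_header(path: str) -> str:
    """Return the first <header>...</header> block of an xz-compressed LHE file.

    Equivalent to the awk range pattern above, but stops after the first block.
    """
    block = []
    inside = False
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not inside and "<header>" in line:
                inside = True
            if inside:
                block.append(line)
                if "</header>" in line:
                    break
    return "".join(block)
```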

Case gridpack

  • mcdb_id = 0
  • Find the gridpack address. There are two options:
    • from the McM dictionary
    • from edmProvDump:
  1. Find the name of a single ROOT file, $file
  2. In a CMSSW release area (e.g. .../CMSSW_7_6_7/src) and after cmsenv:
edmProvDump -f "externalLHEProducer LHE" root://eospublic.cern.ch/$file | grep gridpacks > line
gp=$(sed "s/'/ /g" line | awk '{print $6}')
Extract the cards once the $gp address is known:
# $dir: output directory for the cards (assumed to be set beforehand)
mkdir -p "$dir"
# GNU tar (as on lxplus) needs --wildcards to match member names containing *
if [[ $file == *"madgraph"* ]]; then
    tar -xf "$gp" ./process/madevent/Cards/run_card.dat
    tar -xf "$gp" --wildcards './process/madevent/Cards/proc_card*.dat'
    tar -xf "$gp" ./process/madevent/Cards/param_card.dat
    mv ./process/madevent/Cards/*.dat "$dir"/
elif [[ $file == *"powheg"* ]]; then
    tar -xf "$gp" --wildcards '*.input'
    mv ./*.input "$dir"/
elif [[ $file == *"amcatnlo"* ]]; then
    tar -xf "$gp" process/Cards/run_card.dat
    tar -xf "$gp" --wildcards 'process/Cards/proc_card*.dat'
    tar -xf "$gp" process/Cards/param_card.dat
    mv process/Cards/*.dat "$dir"/
fi
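Here is a Python sketch of the same card extraction using the standard tarfile module. This avoids depending on GNU tar wildcard handling. It copies the member patterns and the generator-from-file-name logic of the shell recipe. Gridpacks may store members as "./process/..." or "process/...", so the sketch strips a leading "./" from member names to make the patterns work either way; that normalisation is an assumption. extract_cards is a hypothetical helper name.

```python
import fnmatch
import os
import tarfile
from typing import List

# Member patterns per generator, copied from the shell recipe above.
# The generator is guessed from the file (dataset) name, in the same order
# as the if/elif chain.
CARD_PATTERNS = {
    "madgraph": ["process/madevent/Cards/run_card.dat",
                 "process/madevent/Cards/proc_card*.dat",
                 "process/madevent/Cards/param_card.dat"],
    "powheg": ["*.input"],
    "amcatnlo": ["process/Cards/run_card.dat",
                 "process/Cards/proc_card*.dat",
                 "process/Cards/param_card.dat"],
}


def _strip_dot(name: str) -> str:
    # Gridpacks may store members as "./process/..." or "process/...".
    return name[2:] if name.startswith("./") else name


def extract_cards(gridpack: str, file_name: str, out_dir: str) -> List[str]:
    """Copy the generator cards from `gridpack` into `out_dir`; return their paths."""
    generator = next((g for g in CARD_PATTERNS if g in file_name), None)
    if generator is None:
        return []
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(gridpack) as tar:  # compression is detected automatically
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = _strip_dot(member.name)
            if any(fnmatch.fnmatch(name, p) for p in CARD_PATTERNS[generator]):
                target = os.path.join(out_dir, os.path.basename(name))
                # extractfile() reads the member without writing outside out_dir
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                written.append(target)
    return written
```

Usage: extract_cards(gp, file, "cards") returns the paths of the copied cards, or an empty list if no generator name is recognised.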

CMS - HI GT's

(from #72 )

Define the GTs needed for the analysis of the datasets in #73 and make them available as SQLite files in /cvmfs/cms-opendata-conddb.cern.ch/
Start with the transferred dataset, e.g. /HIHighPt/HIRun2011-15Apr2013-v1/RECO

(NB the Run1 GTs are available in https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions?rev=609)

2010 PbPb slc5, CMSSW_3_9_X:

  • /HICorePhysics/HIRun2010-PromptReco-v3/RECO
    • 201 runs Minimum: 150844 Maximum: 153368
  • /HIAllPhysics/HIRun2010-ZS-v2/RECO
    • 174 runs Minimum: 150436 Maximum: 153368

2011 pp slc5, CMSSW_4_4_X:

  • /AllPhysics2760/Run2011A-16Jul2011-v1/RECO
    • 49 runs Minimum: 161329 Maximum: 161484
  • /MinBias0Tesla0/Run2011A-PromptReco-v5/RECO
    • 21 runs Minimum: 169811 Maximum: 170044
  • /MinBias0Tesla1/Run2011A-PromptReco-v5/RECO
    • 21 runs Minimum: 169811 Maximum: 170044
  • /MinBias0Tesla2/Run2011A-PromptReco-v5/RECO
    • 21 runs Minimum: 169811 Maximum: 170044
    tbc:
  • /AllPhysics2760/Nov2011_HI-SD_JetHI-276TeV_ppRereco/RECO
    • 9 runs Minimum: 161366 Maximum: 161474
  • /AllPhysics2760/Nov2011_HI-SD_MuHI-276TeV_ppRereco/RECO
    • 9 runs Minimum: 161366 Maximum: 161474
  • /AllPhysics2760/Nov2011_HI-SD_PhotonHI-276TeV_ppRereco/RECO
    • 9 runs Minimum: 161366 Maximum: 161474
  • /ForwardTriggers/HIRun2011-PromptReco-v1/RECO
    • 724 runs Minimum: 181510 Maximum: 183337

2011 PbPb slc5, CMSSW_3_9_X/CMSSW_4_4:

  • /HIDiMuon/HIRun2011-04Mar2013-v1/RECO
    • 62 runs Minimum: 181611 Maximum: 183013
  • /HIHighPt/HIRun2011-15Apr2013-v1/RECO
    • 62 runs Minimum: 181611 Maximum: 183013
  • /HIMinBiasUPC/HIRun2011-12Jun2013-v1/RECO
    • 62 runs Minimum: 181611 Maximum: 183013
  • /MinimumBias/HIRun2011-PromptReco-v1/RECO
    • 362 runs Minimum: 181510 Maximum: 183337

II 2011 slc6-based HI-related samples

2011 pp slc6, CMSSW_5_3:

The corresponding AOD datasets are already released (or to be released); to be checked whether they are sufficient.

  • /MinimumBias/Run2011A-12Oct2013-v1/RECO
  • /MinimumBias/Run2011B-12Oct2013-v1/RECO
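The run counts and ranges quoted above (e.g. "201 runs Minimum: 150844 Maximum: 153368") can be recomputed from DAS. Here is a minimal Python sketch. It assumes dasgoclient is available (e.g. on lxplus with a valid grid proxy) and that the query "run dataset=<name>" prints one run number per line. das_runs and summarise are hypothetical helper names.

```python
import subprocess
from typing import Iterable, List, Tuple


def das_runs(dataset: str) -> List[int]:
    """Run numbers of a dataset, assuming dasgoclient prints one per line."""
    out = subprocess.run(
        ["dasgoclient", "-query", f"run dataset={dataset}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [int(token) for token in out.split() if token.isdigit()]


def summarise(runs: Iterable[int]) -> Tuple[int, int, int]:
    """(number of distinct runs, minimum run, maximum run)."""
    unique = sorted(set(runs))
    if not unique:
        raise ValueError("no runs given")
    return len(unique), unique[0], unique[-1]
```

Usage: summarise(das_runs("/HIHighPt/HIRun2011-15Apr2013-v1/RECO")) returns the number of runs and the minimum and maximum run in the format of the list above.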

CMS: test AOD and GT reading

Test that reading 2015 AOD and condition data works, and see what is still needed for 2015 MiniAOD/AOD with POET.

Condition data are needed for AOD for the jet and trigger examples, so it would be good to start with them.

Andro has already prepared an example branch for 2015 MiniAOD; it is documented in the README and in his summer student report. AOD has not been addressed yet.

The AOD datasets have now been transferred and are in /eos/opendata/cms/Run2015D
The dataset file indexes have not been built yet, but the exact addresses of the files can be found from lxplus, e.g. with:

export EOS_MGM_URL=root://eospublic.cern.ch
eos ls /eos/opendata/cms/Run2015D
# for example, the listing of DoubleMuon AOD files:
eos ls /eos/opendata/cms/Run2015D/DoubleMuon/AOD/16Dec2015-v1/10000
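The listing can also be done from Python. Here is a sketch, assuming the XRootD Python bindings are installed. They are imported only inside list_eos, so the URL helper works without them. Both function names are hypothetical.

```python
from typing import Iterable, List

EOS_SERVER = "root://eospublic.cern.ch"


def xrootd_urls(directory: str, names: Iterable[str], server: str = EOS_SERVER) -> List[str]:
    """Full XRootD URLs for the .root entries of an EOS directory listing."""
    base = directory.rstrip("/")
    # server + "/" + "/eos/..." gives the usual root://host//eos/... form
    return [f"{server}/{base}/{name}" for name in names if name.endswith(".root")]


def list_eos(directory: str, server: str = EOS_SERVER) -> List[str]:
    """List the ROOT files of an EOS directory via the XRootD Python bindings."""
    from XRootD import client  # imported lazily: optional dependency

    status, listing = client.FileSystem(server).dirlist(directory)
    if not status.ok:
        raise RuntimeError(f"cannot list {directory}: {status.message}")
    return xrootd_urls(directory, (entry.name for entry in listing), server)
```

Usage: list_eos("/eos/opendata/cms/Run2015D/DoubleMuon/AOD/16Dec2015-v1/10000") returns full root:// URLs that can be given as input file names to a CMSSW job.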

A slc6 container image with CMSSW_7_6_7 is available in gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cmssw_7_6_7-slc6_amd64_gcc493:2021-08-11-75df77cf
and the condition data are available on /cvmfs (see #103 (comment)).

The container image can read the GT transparently from the CMS servers.

However, we will also need to test reading the GT from /cvmfs. This requires /cvmfs to be mounted on the local host and passed to the container. Once an example code needing GT access is available, @katilp can test that.
Reading the condition data from /cvmfs should work in the same way as for the other Run-2 condition data already available (see 2016 and 2018 MC in the CODP condition data guide), i.e.:

process.GlobalTag.connect = cms.string('sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/76X_dataRun2_16Dec2015_v0.db')
process.GlobalTag.globaltag = '76X_dataRun2_16Dec2015_v0'
process.GlobalTag.snapshotTime = cms.string("9999-12-31 23:59:59.000")

I'm not sure whether the snapshot time is needed; maybe try without it first.

For information: we do not normally foresee releasing AOD for the rest of Run 2 (2016-2018). We do so for the 2015 collision data (not MC) because the 2015 MiniAOD format is an early version and may lack some information compared to the more recent ones.

CMS: 2015 collision datasets - which datasets from RunC

(From #65 ) The release is to be discussed and approved, this is to start the preparations.

HI - pending/to be completed after release

Pending items or to be completed after the HI release

generic fixture updater

Create a generic fixture updater helper script that would take as input a file containing JSON snippets like:

[
  {
    "recid": "7299",
    "somefield": "some value"
  },
  {
    "recid": "13",
    "somefield": "some other value"
  }
]

and that would go through the CERN Open Data fixtures in cernopendata/modules/fixtures/data/records looking for corresponding records and updating the "somefield" information in them via JSON patching.

This could be used e.g. for MC category updating as the categorisation script evolves, for type updating, and for any other "post-import" batch data curation.

P.S. See also update_fixtures_with_rich_file_information.py.
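Here is a minimal Python sketch of such an updater, under these assumptions:
- Every fixture file in the records directory is a JSON list of record objects.
- recids are compared as strings, since the snippet uses "7299" while the fixtures may store integers.
- The update is a plain overwrite of top-level fields, not an RFC 6902 JSON Patch.

Re-serialising the fixtures with json.dumps may also change their whitespace, which is worth checking in the diff. All function names are hypothetical.

```python
import json
import pathlib
from typing import Dict, List


def load_updates(path: str) -> Dict[str, dict]:
    """Read the list of snippets and index the new field values by recid."""
    with open(path) as handle:
        snippets = json.load(handle)
    updates: Dict[str, dict] = {}
    for snippet in snippets:
        fields = dict(snippet)
        recid = str(fields.pop("recid"))
        updates.setdefault(recid, {}).update(fields)
    return updates


def apply_updates(records: List[dict], updates: Dict[str, dict]) -> int:
    """Overwrite top-level fields of matching records in place; return the count."""
    changed = 0
    for record in records:
        fields = updates.get(str(record.get("recid")))
        if fields:
            record.update(fields)
            changed += 1
    return changed


def update_fixtures(fixtures_dir: str, updates_path: str) -> None:
    """Apply the snippets to every *.json fixture file in `fixtures_dir`."""
    updates = load_updates(updates_path)
    for fixture in sorted(pathlib.Path(fixtures_dir).glob("*.json")):
        records = json.loads(fixture.read_text())
        if apply_updates(records, updates):
            # NB: re-serialising may change the whitespace of the fixture file.
            fixture.write_text(json.dumps(records, indent=4, ensure_ascii=False) + "\n")
            print(f"updated {fixture}")
```

Usage: update_fixtures("cernopendata/modules/fixtures/data/records", "snippets.json").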

CMS: trigger listing for Run2 data taking

(from #65)

The trigger listings for Run2 data taking are now available in
https://twiki.cern.ch/twiki/bin/viewauth/CMS/HLTPathsRunIIListWith2015

NB (from that page)

Active lumi is the luminosity when the trigger prescale is not 0.
Effective luminosity is the active luminosity times the HLT and L1 prescales.

Because paths are seeded by many L1 seeds, it is difficult to combine L1 prescales properly; here the L1 prescale is simply the lowest non-zero prescale of its seeds. There could be accuracy issues for edge cases, so these luminosity values are simply a guide to use when picking your trigger and should not be used for detailed analysis. It is advisable to use the brilcalc trigger path luminosity feature as a cross-check.

Not all triggers are written out in physics streams, for example some (mostly for the tau group) triggers are run simply to record their decision but do not accept events. In this case the dataset is "NotFound". Other examples of events not going to the normal physics streams are HLT_Physics split paths and B parking paths.
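The L1 approximation described in the note can be written as a small Python sketch: take the lowest non-zero prescale among the seeds and multiply it by the HLT prescale. The function names are hypothetical. As the note says, the result is only a guide when picking a trigger; brilcalc remains the reference for luminosity values.

```python
from typing import Iterable, Optional


def l1_prescale(seed_prescales: Iterable[int]) -> Optional[int]:
    """Lowest non-zero prescale among the L1 seeds of a path (0 = seed off)."""
    active = [p for p in seed_prescales if p > 0]
    return min(active) if active else None


def combined_prescale(hlt_prescale: int, seed_prescales: Iterable[int]) -> Optional[int]:
    """HLT prescale times the approximated L1 prescale; None if the path is off."""
    l1 = l1_prescale(seed_prescales)
    if hlt_prescale <= 0 or l1 is None:
        return None
    return hlt_prescale * l1
```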

The attached table, with the data-taking year added as the last column:
triggers_runII_year.txt

CMS: note on 2011RunB ZeroBias

(From #64)

/ZeroBias/Run2011B-12Oct2013-v1/AOD has only 6 runs, of which 2 have some luminosity (178003, 178004), but these two runs are not in the validated run list.

It will not be included in the release.

For the record:

Luminosity check on lxplus (the exact CMSSW version does not matter)

cd CMSSW_10_6_10/src 
cmsenv
setenv PATH $HOME/.local/bin:/cvmfs/cms-bril.cern.ch/brilconda/bin:$PATH

brilcalc lumi -r 178004    

etc
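Checking whether runs appear in a validated run list can also be scripted. Here is a minimal Python sketch. It assumes the usual CMS validated-run JSON layout, where each run number (as a string) maps to a list of [first, last] lumisection ranges. The helper names and the file name in the usage line are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def validated_lumis(cert: Dict[str, list], run: int) -> int:
    """Number of certified lumisections of `run` (0 if the run is absent)."""
    return sum(last - first + 1 for first, last in cert.get(str(run), []))


def report(cert_path: str, runs: Iterable[int]) -> List[str]:
    """One line per run with its number of certified lumisections."""
    with open(cert_path) as handle:
        cert = json.load(handle)
    return [f"{run}: {validated_lumis(cert, run)} certified lumisections" for run in runs]
```

Usage: report("validated_runs_2011.json", [178003, 178004]).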


CMS: 2015 global tags

The global tags needed for 2015 are

  • 76X_dataRun2_16Dec2015_v0 for data
  • 76X_mcRun2_asymptotic_RunIIFall15DR76_v1 for MC

(see https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrontierConditions#Global_Tags_for_RunIIFall15DR76)

The run range for certified 2015 data at PromptReco level is:

246908 - 260627

see https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmV2015Analysis#ReReco_for_25_ns and
Cert_246908-260627_13TeV_PromptReco_Collisions15_25ns_JSON_v2.txt

This includes the 0 T runs 246908 - 250902
(see https://twiki.cern.ch/twiki/bin/view/CMS/CertificationResults2015#JSON_files)

0 T datasets are not part of the release. However, as they have been processed with the same GTs, it would be good to "include" their run range when getting the GT (if that matters).

The reprocessed certified range starts at a later run (magnet on) and extends to 260627 (#104)
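The run range of a certification JSON, and the 0 T part of it, can be checked with a short Python sketch. It assumes the standard validated-run JSON layout (run-number strings as keys) and takes the 0 T boundaries 246908 - 250902 from the note above. The helper names are hypothetical.

```python
import json
from typing import Dict, List, Tuple

# 0 T run range taken from the note above (runs 246908 - 250902).
ZERO_TESLA = (246908, 250902)


def load_cert(path: str) -> Dict[str, list]:
    """Read a validated-run JSON: {"run": [[first_ls, last_ls], ...], ...}."""
    with open(path) as handle:
        return json.load(handle)


def run_range(cert: Dict[str, list]) -> Tuple[int, int]:
    """Lowest and highest certified run."""
    runs = [int(run) for run in cert]
    return min(runs), max(runs)


def split_zero_tesla(cert: Dict[str, list],
                     zero_tesla: Tuple[int, int] = ZERO_TESLA) -> Tuple[List[int], List[int]]:
    """Split the certified runs into (0 T runs, magnet-on runs), both sorted."""
    first, last = zero_tesla
    runs = sorted(int(run) for run in cert)
    off = [run for run in runs if first <= run <= last]
    on = [run for run in runs if not first <= run <= last]
    return off, on
```

Usage: split_zero_tesla(load_cert("Cert_246908-260627_13TeV_PromptReco_Collisions15_25ns_JSON_v2.txt")) separates the 0 T runs from the magnet-on runs.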

CMS: a script to extract configurations for data taking

All configuration files used in CMS data taking are stored and made available as CODP records.

The configurations are listed in

Summary pages based on the information of listings above are provided for each year of data taking:

The configuration files have been extracted manually from https://cmsweb.cern.ch/confdb/

Write a script to extract the configuration files for Run2 data taking.

The CODP summary pages show only the data-taking periods that are open data, even though all config files will be extracted. The 2015 open data will be RunD; see #70 for the run numbers:
Minimum: 256630
Maximum: 260627
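Selecting the configurations that belong to the open-data run range can be sketched in Python. This assumes the menus are available as summary lines in the format used by the fwyzard summary pages quoted in the HI issues here: release, menu path, then run or run range. The extraction from confdb itself is not shown, and all names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches summary lines such as
# "CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.0/HIHLT/V2 (run 181531)"
LINE_RE = re.compile(r"^\s*(\S+)\s+(/\S+)\s+\(runs? (\d+)(?:\s*-\s*(\d+))?\)")


class MenuEntry(NamedTuple):
    release: str
    path: str
    first_run: int
    last_run: int


def parse_line(line: str) -> Optional[MenuEntry]:
    """Parse one summary line; return None for lines in another format."""
    match = LINE_RE.match(line)
    if not match:
        return None
    release, path, first, last = match.groups()
    return MenuEntry(release, path, int(first), int(last or first))


def overlapping(lines: Iterable[str], run_min: int, run_max: int) -> List[MenuEntry]:
    """Menus whose run interval overlaps [run_min, run_max]."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e.first_run <= run_max and e.last_run >= run_min]
```

Usage: overlapping(lines, 256630, 260627) keeps only the menus used within the 2015 RunD open-data range.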

CMS: HI trigger listings

(from #72)

2010

The possible triggers for 2010 datasets

Check whether the HI documentation lists them, or get them from the provenance dumps (see below).

2011 pp

  • /AllPhysics2760/Run2011A-16Jul2011-v1/RECO
  • /MinBias0Tesla0/Run2011A-PromptReco-v5/RECO
  • /MinBias0Tesla1/Run2011A-PromptReco-v5/RECO 930.3G
  • /MinBias0Tesla2/Run2011A-PromptReco-v5/RECO
  • /AllPhysics2760/Nov2011_HI-SD_JetHI-276TeV_ppRereco/RECO
  • /AllPhysics2760/Nov2011_HI-SD_MuHI-276TeV_ppRereco/RECO
  • /AllPhysics2760/Nov2011_HI-SD_PhotonHI-276TeV_ppRereco/RECO
  • /ForwardTriggers/HIRun2011-PromptReco-v1/RECO

All are included in https://fwyzard.web.cern.ch/fwyzard/hlt/2011/dataset

2011 PbPb

  • /HIDiMuon/HIRun2011-04Mar2013-v1/RECO
  • /HIHighPt/HIRun2011-15Apr2013-v1/RECO
  • /HIMinBiasUPC/HIRun2011-12Jun2013-v1/RECO

are included in https://fwyzard.web.cern.ch/fwyzard/hlt/2011/dataset

  • /MinimumBias/HIRun2011-PromptReco-v1/RECO

should be different from the pp-run MinimumBias dataset

CMS 2012

Add CMS 2012 open data release scripts.

cms-YYYY-simulated-datasets: fix `get_all_generator_text()`

Fix get_all_generator_text() in cms-YYYY-simulated-datasets that seems to:

  • have hardcoded RECO-HLT and GEN-SIM steps; these should be read from the configuration files
  • analyse only one parent instead of going all up the chain
  • have "Production Overview" capitalised instead of lowercase
  • have trouble parsing the process in process = cms.Process('RECO',eras.Run2_2016)

See also the list of troubles in cernopendata/opendata.cern.ch#2613

Example input:

/BulkGravTohhTohbbhbb_narrow_M-4500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

Beware of how the changes affect past years, since cms-NNNN-... should work for all years.
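For the process-parsing point, a regular expression that reads only the first quoted argument of cms.Process(...) handles the extra eras argument. Below is a Python sketch with a hypothetical helper name; the real get_all_generator_text() may parse the configuration differently.

```python
import re
from typing import Optional

# Captures the first quoted argument of cms.Process(...), ignoring any
# further arguments such as eras.
PROCESS_RE = re.compile(r"""cms\.Process\(\s*(['"])(?P<name>[^'"]+)\1""")


def process_name(config_text: str) -> Optional[str]:
    """Return the process name declared in a CMSSW configuration file."""
    match = PROCESS_RE.search(config_text)
    return match.group("name") if match else None
```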

CMS: MC datasets with _ext

270 of the 2015 MC datasets have _ext in the dataset name, e.g.:

/DYJetsToLL_M-1000to1500_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM
/DYJetsToLL_M-1000to1500_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext1-v1/MINIAODSIM
/DYJetsToLL_M-100to200_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM
/DYJetsToLL_M-100to200_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext1-v1/MINIAODSIM
/DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM
/DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext1-v1/MINIAODSIM
/DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext3-v1/MINIAODSIM
/DYJetsToLL_M-1500to2000_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM
/DYJetsToLL_M-1500to2000_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12_ext1-v1/MINIAODSIM
/DYJetsToLL_M-150_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/MINIAODSIM

These very likely contain additional events generated with the same parameters as the corresponding dataset without _ext.
In the CODP view, they should probably be combined in a single record, or at least the records should cross-link.
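Each _ext dataset can be matched with its base dataset by removing the _ext<N> suffix from the processing string, as in the Python sketch below. The suffix pattern is inferred from the examples above, and the helper names are hypothetical. Extensions whose base dataset is not in the list are left out of the result.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# "_ext<N>" just before the "-v<M>/" version of the processing string.
EXT_RE = re.compile(r"_ext\d+(?=-v\d+/)")


def base_name(dataset: str) -> str:
    """Dataset name with the _ext<N> suffix removed."""
    return EXT_RE.sub("", dataset)


def group_extensions(datasets: Iterable[str]) -> Dict[str, List[str]]:
    """Map each base dataset name to all its variants (base plus extensions)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset in datasets:
        groups[base_name(dataset)].append(dataset)
    return {base: sorted(members) for base, members in groups.items() if len(members) > 1}
```

Each resulting group is a candidate for a single CODP record or for cross-linked records.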

CMS - HI trigger menus

(from #72)

2010 HI trigger menu:

(from https://fwyzard.web.cern.ch/fwyzard/hlt/2010/summary) the only HI related config file is
CMSSW_3_8_1_onlpatch5_ONLINE /cdaq/special/HeavyIon/HLT_Basic_Express/V7 (runs 146417 - 146421)

  • The HI run range (Minimum: 150436 Maximum: 153368 see #75) is not in the listing above, the trigger configs have to be found somewhere else.
  • run numbers (these all appear empty in brilcalc):
    150844 150881 150882 150883 150886 150887 151019 151020 151022 151027 151058 151059 151075 151076 151077 151087 151088 151124 151126 151153 151211 151217 151235 151236 151237 151238 151239 151240 151266 151267 151268 151349 151350 151351 151352 151353 151426 151428 151449 151476 151526 151566 151583 151584 151586 151592 151653 151678 151694 151706 151784 151878 151922 151923 151925 151927 151935 151937 151940 151967 151968 151969 151975 151976 152026 152037 152042 152044 152045 152046 152047 152055 152110 152112 152113 152183 152234 152238 152240 152267 152327 152349 152350 152391 152431 152471 152474 152477 152479 152481 152485 152492 152493 152494 152501 152504 152507 152510 152518 152551 152561 152583 152585 152588 152592 152594 152597 152599 152601 152602 152603 152604 152606 152609 152611 152613 152623 152624 152625 152629 152636 152638 152640 152641 152642 152643 152648 152649 152650 152652 152659 152663 152665 152667 152670 152673 152675 152677 152679 152681 152695 152698 152699 152705 152706 152708 152709 152712 152716 152719 152721 152722 152724 152725 152729 152731 152734 152735 152739 152741 152743 152744 152745 152748 152751 152753 152760 152765 152766 152778 152783 152785 152791 152793 152795 152797 152948 152949 152951 152953 152955 152957 153016 153017 153020 153021 153029 153107 153112 153117 153128 153168 153184 153191 153295 153299 153301 153302 153303 153352 153368
  • From the elog on 6.12.2010: end of HI running after 153368
  • From elog on 20.11.10 CMSSW_3_9_1_onlpatch3: "Upon request of HLT. DAQ config switched to dev to test the old menu with the new patch. If stable keep it like that." (after 151784)
  • NB some lumi plot on 16.11
  • on 15.11 a message:

A new HLT menu for HI running is available:
/cdaq/physics/Run2010HI/v1.7/HIHLT/V1

A description of the changes relative to v1.6:

  • Add new prescale column "0". Old columns move one to the right (the old "0" is now "1", etc.). Columns "0" and "1" are identical.
  • DQM gets a global prescale of x10 and path content changed
  • HLT_HIL1SingleMu3 replaced by HLT_HIL2Mu3
  • HLT_HIPhoton15_Core replaced by HLT_HIPhoton20_Core
  • HLT_HIJet35U replaced by HLT_HIJet50U
  • Express gets a prescale x2 in "0" and "1"

A new L1+HLT key will be generated with this updated HLT menu

  • First run 150844 starts on 11.11.10 "New L1/HLT configuration is used L1_HLT_collisions_HI_69b65"(?)
  • on 11.11.10 a message:

A new HLT menu has been prepared: /cdaq/physics/Run2010HI/v1.5/HIHLT/V1
which supersedes /cdaq/physics/Run2010HI/v1.4/HIHLT/V2

The menu introduces the following changes:
-HLTMON stream removed
-HSCP trigger changed L1 seed to L1_SingleJet20U_NoBptxOR
-HIMinBiasPixel_SingleTrack changed L1 seed to L1_HcalHcCoincPmOR_BSCMinBiasThresh1_BptxAnd
-Added L1 passthrough for L1_HcalHcCoincPmOR_BSCMinBiasThresh1_BptxAnd
HLT_HIMinBiasHfOrBSC(_Core)
-added to AllPhysics
-replaces HIMinBiasHF in CorePhysics
-replaces HIMinBiasHF in Express
-replaces HIMinBiasHF in DQM
-Added prescale columns for 120/60/30 colliding bunches

2011 HI related trigger menus:

(from https://fwyzard.web.cern.ch/fwyzard/hlt/2011/summary)

CMSSW_4_1_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V1  (run 161366)
CMSSW_4_1_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V3  (runs 161396 - 161445)
CMSSW_4_1_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V4  (runs 161450 - 161474)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.0/HIHLT/V2        (run 181531)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.0/HIHLT/V3        (runs 181604 - 181611)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.1/HIHLT/V1        (runs 181749 - 181778)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.2/HIHLT/V2        (run 182123)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.3/HIHLT/V1        (runs 182133 - 182257)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.3/HIHLT/V5        (runs 182296 - 182382)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.4/HIHLT/V1        (runs 182398 - 182536)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.6/HIHLT/V1        (runs 182561 - 182591)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.6/HIHLT/V2        (runs 182609 - 183123)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/physics/Run2011HI/v1.7/HIHLT/V1        (runs 182960 - 183126)

CMSSW_4_2_5_onlpatch1_ONLINE /cdaq/special/0Tesla/HLT/V2          (runs 169985 - 170044)

CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HLT_HIon2011_L1TimingTest/V2  (run 181530)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HLT_HIon2011_Collision_FullBadAPVs/V2   (run 181532)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HLT_HIon2011_L1EG5/V1   (runs 181683 - 181693)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HLT_HIon2011_L1EG5/V5   (run 181695)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HIHLT_v1p1_V1_Run2010HI_EM/V1 (runs 181910 - 181938)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/HIHLT_v1p1_V2_Run2010H1_EM/V1 (runs 181946 - 181985)
CMSSW_4_4_2_onlpatch1_ONLINE /cdaq/special/HeavyIon/v1.1/HIHLT/V1                     (runs 182052 - 182124)

  • 2011 run ranges
    • Minimum: 161329 Maximum: 161484 (/AllPhysics2760/Run2011A-16Jul2011-v1/RECO)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V1 (run 161366)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V3 (runs 161396 - 161445)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V4 (runs 161450 - 161474)
    • Minimum: 169811 Maximum: 170044 (/MinBias0Tesla*/Run2011A-PromptReco-v5/RECO)
      • /cdaq/special/0Tesla/HLT/V2 (runs 169985 - 170044)
    • Minimum: 161366 Maximum: 161474 (/AllPhysics2760/Nov2011_HI-SD*-276TeV_ppRereco/RECO)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V1 (run 161366)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V3 (runs 161396 - 161445)
      • /cdaq/physics/Run2011HI/2760GeV/v1.1/HLT/V4 (runs 161450 - 161474)
    • Minimum: 181611 Maximum: 183013 (/HI*/HIRun2011-*2013-v1/RECO)
      • /cdaq/physics/Run2011HI/v1.0/HIHLT/V3 (runs 181604 - 181611)
      • /cdaq/physics/Run2011HI/v1.1/HIHLT/V1 (runs 181749 - 181778)
      • /cdaq/physics/Run2011HI/v1.2/HIHLT/V2 (run 182123)
      • /cdaq/physics/Run2011HI/v1.3/HIHLT/V1 (runs 182133 - 182257)
      • /cdaq/physics/Run2011HI/v1.3/HIHLT/V5 (runs 182296 - 182382)
      • /cdaq/physics/Run2011HI/v1.4/HIHLT/V1 (runs 182398 - 182536)
      • /cdaq/physics/Run2011HI/v1.6/HIHLT/V1 (runs 182561 - 182591)
      • /cdaq/physics/Run2011HI/v1.6/HIHLT/V2 (runs 182609 - 183123)
      • /cdaq/physics/Run2011HI/v1.7/HIHLT/V1 (runs 182960 - 183126)
    • Minimum: 181510 Maximum: 183337 (/MinimumBias/HIRun2011-PromptReco-v1/RECO)
      • /cdaq/physics/Run2011HI/v1.0/HIHLT/V2 (run 181531)
      • the ones above

Note on the run numbers which are listed in DAS for the datasets but not present in the trigger config listing:

They appear empty (or very brief) when checked with brilcalc. Take /AllPhysics2760/Run2011A-16Jul2011-v1/RECO as an example: the trigger menu listing indicates run 161366, runs 161396 - 161445 and runs 161450 - 161474. Of the earlier runs, 161329 161335 161336 161345 161352 161353 161354 161355 157357 161359 161361 give empty values (apart from 161357, which crashes), 161364 has one lumisection, and 161366 has data:

[lxplus7109.cern.ch]~/CMSSW_9_2_3/src $brilcalc beam -r 161329
#Data tag :  19v3
+------+-----+----+------+------+------------+------------+--------------+
| fill | run | ls | time | egev | intensity1 | intensity2 | ncollidingbx |
+------+-----+----+------+------+------------+------------+--------------+
+------+-----+----+------+------+------------+------------+--------------+

[lxplus747.cern.ch]~/CMSSW_5_3_32/src $brilcalc beam -r 161364
#Data tag :  19v3
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| fill | run    | ls  | time              | egev   | intensity1 | intensity2 | ncollidingbx |
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| 1650 | 161364 | 446 | 03/25/11 03:46:58 | 1380.3 | 8.4894e+12 | 8.5723e+12 | 68           |
+------+--------+-----+-------------------+--------+------------+------------+--------------+

[lxplus747.cern.ch]~/CMSSW_5_3_32/src $brilcalc beam -r 161366
#Data tag :  19v3
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| fill | run    | ls  | time              | egev   | intensity1 | intensity2 | ncollidingbx |
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| 1650 | 161366 | 2   | 03/25/11 03:51:37 | 1380.3 | 8.4882e+12 | 8.5717e+12 | 68           |
| 1650 | 161366 | 3   | 03/25/11 03:52:00 | 1380.3 | 8.4871e+12 | 8.5721e+12 | 68           |
....

Of the runs after 161366 and before 161396, 161367 161374 161376 161377 161380 161385 161387 are empty, but 161393 has some lumisections:

[lxplus747.cern.ch]~/CMSSW_5_3_32/src $brilcalc beam -r 161393
#Data tag :  19v3
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| fill | run    | ls  | time              | egev   | intensity1 | intensity2 | ncollidingbx |
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| 1651 | 161393 | 1   | 03/25/11 16:12:41 | 1380.3 | 1.0071e+13 | 9.9466e+12 | 68           |
| 1651 | 161393 | 169 | 03/25/11 17:18:00 | 1380.3 | 1.0072e+13 | 9.9447e+12 | 68           |
| 1651 | 161393 | 170 | 03/25/11 17:18:23 | 1380.2 | 1.0071e+13 | 9.9452e+12 | 68           |
| 1651 | 161393 | 171 | 03/25/11 17:18:47 | 1380.3 | 1.0071e+13 | 9.9453e+12 | 68           |
| 1651 | 161393 | 172 | 03/25/11 17:19:10 | 1380.3 | 1.0071e+13 | 9.9449e+12 | 68           |
| 1651 | 161393 | 173 | 03/25/11 17:19:33 | 1380.2 | 1.0070e+13 | 9.9454e+12 | 68           |
+------+--------+-----+-------------------+--------+------------+------------+--------------+
[lxplus7109.cern.ch]~/CMSSW_9_2_3/src $brilcalc beam -r 161396
#Data tag :  19v3
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| fill | run    | ls  | time              | egev   | intensity1 | intensity2 | ncollidingbx |
+------+--------+-----+-------------------+--------+------------+------------+--------------+
| 1651 | 161396 | 1   | 03/25/11 17:23:25 | 1380.3 | 8.1237e+12 | 8.8577e+12 | 68           |
| 1651 | 161396 | 2   | 03/25/11 17:23:48 | 1380.3 | 1.0071e+13 | 9.9482e+12 | 68          
....

The last two runs after 161474, 161476 and 161484, are both empty.
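Classifying runs as empty or not can be automated by parsing the brilcalc beam tables. Here is a Python sketch, assuming the table layout shown in the outputs above: a header row of column names between +---+ borders. The helper names are hypothetical. A run with no table rows counts as empty.

```python
from typing import Dict, List


def parse_brilcalc_table(text: str) -> List[Dict[str, str]]:
    """Parse the ASCII table printed by `brilcalc beam` into a list of rows."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+---+" borders, "#Data tag" lines and prompts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if not header:
            header = cells  # first "|" row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def lumisections_per_run(text: str) -> Dict[str, int]:
    """Count the lumisections listed for each run (an empty run has no entry)."""
    counts: Dict[str, int] = {}
    for row in parse_brilcalc_table(text):
        counts[row["run"]] = counts.get(row["run"], 0) + 1
    return counts
```

Usage: save the output with brilcalc beam -r 161393 > beam_161393.txt, then pass the file content to lumisections_per_run.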

CMS - HI collision data provenance

(from #72 )
Options to get the data provenance

  • query DAS (in all cases for nevents, size, CMSSW version, in most for GT)
  • for the 2010 pp data, the configs were found in a HyperNews forum
  • if config link from DAS not available, print parameters with edmProvDump
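The DAS part of these options can be scripted. Here is a Python sketch. It assumes dasgoclient is available and that the DAS query keys summary, release and config give, respectively, the event count and size, the CMSSW version, and the configuration files. The exact output of each query should be checked, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List, Optional

# Assumed DAS query keys for the provenance items listed above.
QUERIES = {
    "summary": "summary dataset={ds}",  # number of events, size
    "release": "release dataset={ds}",  # CMSSW version
    "config": "config dataset={ds}",    # configuration files, if registered
}


def das_commands(dataset: str) -> List[List[str]]:
    """One dasgoclient command line (JSON output) per provenance query."""
    return [["dasgoclient", "-query", q.format(ds=dataset), "-json"] for q in QUERIES.values()]


def provenance(dataset: str) -> Dict[str, Optional[object]]:
    """Run all queries; a failed or empty query is recorded as None."""
    results: Dict[str, Optional[object]] = {}
    for key, cmd in zip(QUERIES, das_commands(dataset)):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        ok = proc.returncode == 0 and proc.stdout.strip()
        results[key] = json.loads(proc.stdout) if ok else None
    return results
```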

Config files available in DAS:

(Dataset list provided by HI conveners, see also the detailed table by HI conveners )

I: 2010-2011 slc5-based HI samples

2010 PbPb slc5, CMSSW_3_9_X:

  • No: /HICorePhysics/HIRun2010-PromptReco-v3/RECO 19.3TB
  • No: /HIAllPhysics/HIRun2010-ZS-v2/RECO 190.9TB

2011 pp slc5, CMSSW_4_4_X :

  • No, no GT: /AllPhysics2760/Run2011A-16Jul2011-v1/RECO 6.6TB
  • No, no GT: /MinBias0Tesla0/Run2011A-PromptReco-v5/RECO 938.8GB
  • No, no GT: /MinBias0Tesla1/Run2011A-PromptReco-v5/RECO 930.3GB
  • No, no GT: /MinBias0Tesla2/Run2011A-PromptReco-v5/RECO 938.7GB
    tbc:
  • No, no GT: /AllPhysics2760/Nov2011_HI-SD_JetHI-276TeV_ppRereco/RECO 275.1GB
  • No, no GT: /AllPhysics2760/Nov2011_HI-SD_MuHI-276TeV_ppRereco/RECO 682.7GB
  • No, no GT: /AllPhysics2760/Nov2011_HI-SD_PhotonHI-276TeV_ppRereco/RECO 315.0GB
  • No: /ForwardTriggers/HIRun2011-PromptReco-v1/RECO 48.3GB

2011 PbPb slc5, CMSSW_3_9_X/CMSSW_4_4:

  • Yes: /HIDiMuon/HIRun2011-04Mar2013-v1/RECO 3.2TB
  • Yes: /HIHighPt/HIRun2011-15Apr2013-v1/RECO 1.6TB
  • Yes: /HIMinBiasUPC/HIRun2011-12Jun2013-v1/RECO 11.6TB
  • No: /MinimumBias/HIRun2011-PromptReco-v1/RECO 1.8TB

II 2011 slc6-based HI-related samples

2011 pp slc6, CMSSW_5_3:

The corresponding AOD datasets are already released (or to be released); to be checked whether they are sufficient.

  • Yes: /MinimumBias/Run2011A-12Oct2013-v1/RECO 36.0TB
  • Yes: /MinimumBias/Run2011B-12Oct2013-v1/RECO 21.8TB

CMS generator cards as "notes" in MC 2015 metadata

Make an McM query to check how often generator cards are recorded as an HTML file in "notes" for the 2015 MC

e.g. https://cms-pdmv.cern.ch/mcm/requests?produce=%2FExtendedWeakIsospinModel_mumujj_L15000_M1500_CalcHEP%2FRunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1%2FMINIAODSIM&page=0&shown=281474976710655
and the corresponding dictionary
https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get/EXO-RunIIFall15MiniAODv2-01039 with

"notes": "https://github.com/cms-sw/genproductions/blob/1b65fffa802c081907b40fde8fc49dc2bf60d5eb/bin/CalcHEP/cards/production/13TeV/ExtendedWeakIsospinModel/LHEHeaders/ExtendedWeakIsospinModel_mumujj_L15000_M1500_nnpdf30lo-single.txt"

These cards are in https://github.com/cms-sw/genproductions/tree/master/bin
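Once the request dictionaries of the 2015 MC are downloaded, the share of requests with genproductions links in "notes" can be counted with a Python sketch like this one. It assumes the notes field holds free text containing the GitHub URL, as in the example above. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Links to card files in the cms-sw/genproductions repository.
GENPROD_RE = re.compile(r"https://github\.com/cms-sw/genproductions/\S+")


def card_links(notes: Optional[str]) -> List[str]:
    """All genproductions URLs found in a notes string."""
    return GENPROD_RE.findall(notes or "")


def notes_statistics(requests: Iterable[dict]) -> Tuple[int, int]:
    """Return (requests whose notes link to genproductions, total requests)."""
    with_cards = total = 0
    for request in requests:
        total += 1
        if card_links(request.get("notes")):
            with_cards += 1
    return with_cards, total
```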

CMS: 2011 RunB dataset trigger listing

(from #64)
Check whether the RunB trigger menus are in http://opendata.cern.ch/record/1700


x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.0/HLT/V4                                           (runs 175832 - 175837)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.0/HLT/V7                                           (runs 175857 - 175921)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.1/HLT/V1                                           (runs 175971 - 176023)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.1/HLT/V2                                           (runs 176161 - 176207)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.2/HLT/V3                                           (runs 176286 - 176309)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v2.3/HLT/V2                                           (runs 176461 - 176470)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v3.0/HLT/V2                                           (runs 176545 - 176548)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v3.1/HLT/V1                                           (runs 176697 - 177053)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v4.0/HLT/V2                                           (runs 177074 - 177184)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v4.0/HLT/V3                                           (run 177201)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v4.0/HLT/V5                                           (runs 177222 - 177878)
x CMSSW_4_2_7_onlpatch3_ONLINE     /cdaq/physics/Run2011/3e33/v4.0/HLT/V6                                           (runs 177718 - 177719)
o CMSSW_4_2_9_HLT3_onlpatch1_ONLINE /cdaq/physics/Run2011/3e33/v5.0/HLT/V1                                           (runs 178078 - 178380)
o CMSSW_4_2_9_HLT3_onlpatch2_ONLINE /cdaq/physics/Run2011/5e33/v1.4/HLT/V3                                           (runs 178420 - 178479)
o CMSSW_4_2_9_HLT3_onlpatch2_ONLINE /cdaq/physics/Run2011/5e33/v1.4/HLT/V4                                           (runs 178667 - 178708)
o CMSSW_4_2_9_HLT3_onlpatch2_ONLINE /cdaq/physics/Run2011/5e33/v1.4/HLT/V5                                           (runs 178712 - 179889)
o CMSSW_4_2_9_HLT3_onlpatch4_ONLINE /cdaq/physics/Run2011/5e33/v2.2/HLT/V2                                           (runs 179959 - 180093)
o CMSSW_4_2_9_HLT3_onlpatch4_ONLINE /cdaq/physics/Run2011/5e33/v2.2/HLT/V4                                           (runs 180241 - 180252)

x: config available in https://cernbox.cern.ch/index.php/s/J2kW1nKWUE70XOA
o: among the 2011 missing configs
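The listing above can also be checked programmatically. The sketch below is hypothetical: it assumes every entry follows the layout flag, release, menu path, then "(runs A - B)" or "(run A)", and groups the entries by the x/o flag so that, for instance, runs whose menu config is still missing can be looked up.

```python
import re
from collections import defaultdict

# One listing entry: x/o flag, CMSSW release, HLT menu path and run range.
LINE_RE = re.compile(
    r"^(?P<flag>[xo])\s+(?P<release>\S+)\s+(?P<menu>\S+)\s+"
    r"\(runs?\s+(?P<first>\d+)(?:\s*-\s*(?P<last>\d+))?\)\s*$"
)


def parse_menu_listing(text):
    """Group (menu, first_run, last_run) entries by their x/o flag."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrelated lines
        first = int(match["first"])
        last = int(match["last"] or first)  # "(run 177201)" covers a single run
        groups[match["flag"]].append((match["menu"], first, last))
    return dict(groups)


def missing_menu_for_run(groups, run):
    """Return the first 'o' (missing config) menu whose run range covers run, else None."""
    for menu, first, last in groups.get("o", []):
        if first <= run <= last:
            return menu
    return None
```

Note that some ranges in the listing overlap (e.g. v4.0/HLT/V5 and v4.0/HLT/V6), so a run can belong to more than one menu; the lookup only returns the first match.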

CMS: ML release checklist

Release guidelines: see https://twiki.cern.ch/twiki/bin/view/CMS/DPOAMLSampleReleaseGuidelines

Agreements

  • the ML group agrees that these samples are of interest for a public release
    • presented/discussed in (meeting/presentation link)
  • the relevant POGs/PAGs and physics coordination agree that these samples, their parent datasets and the workflows to produce them can be brought into the public domain
    • presented/discussed in (meeting/presentation link)
  • CB approval

Parent dataset (if not already public data)

ML sample production

ML sample file

ML sample usage example

CMS: 2015 json for validated runs

Cert_13TeV_16Dec2015ReReco_Collisions15_25ns_JSON_v2.txt:
https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions15/13TeV/Reprocessing/Cert_13TeV_16Dec2015ReReco_Collisions15_25ns_JSON_v2.txt
(254231 - 260627)

(see https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmV2015Analysis#ReReco_for_25_ns)

NB that
https://twiki.cern.ch/twiki/bin/view/CMS/CertificationResults2015#JSON_files points to the JSON for the validated runs as in
/afs/cern.ch/cms/CAF/CMSCOMM/COMM_DQM/certification/Collisions15/13TeV/Cert_246908-254879_13TeV_PromptReco_Collisions15_JSON.txt or

Note that the link to the JSON repository on that page is broken; it should be https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions15/13TeV/

Note also that the page does not seem to have been updated for the reprocessing: the more recent JSON files are in
https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions15/13TeV/Reprocessing/, as indicated above.
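The certification JSON files map each run number (as a string key) to a list of inclusive [first, last] luminosity-section ranges. A minimal sketch for checking the run range quoted above and looking up a single lumi section; the helper names are hypothetical.

```python
import json


def load_certification(path):
    """Load a certification JSON of the form {"run": [[first_lumi, last_lumi], ...]}."""
    with open(path) as f:
        return json.load(f)


def run_range(cert):
    """Return (first_run, last_run) covered by the certification JSON."""
    runs = [int(run) for run in cert]
    return min(runs), max(runs)


def is_certified(cert, run, lumi):
    """True if luminosity section lumi of run lies in one of the certified ranges."""
    return any(first <= lumi <= last for first, last in cert.get(str(run), []))
```

Applied to the v2 file above, run_range would be expected to return the quoted (254231, 260627).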

Fix hard coded title 'conffile' for cms-YYYY-simulated-datasets

Using a similar approach as in data-curation/cms-2011-simulated-datasets/code/create_config_file_records.py:

def get_title(afile):
    "Return suitable title of configuration file."
    return 'Configuration file for ' + get_process(afile) + ' step ' + get_python_filename(afile)

The title should carry some meaning and be generated dynamically, rather than hard-coded as it is now in data-curation/cms-YYYY-simulated-datasets/code/dataset_records.py:

$ git grep conffile
dataset_records.py:236:            configuration_files['title'] = 'conffile'
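A possible sketch of the dynamic title, mirroring get_title() above. It assumes the process name is already available (as in the "process" key of the configuration file entries) and takes the python filename from the --python_filename options of the cmsDriver script. Both helper names are hypothetical and not part of dataset_records.py.

```python
import re


def python_filenames(cmsdriver_script):
    """Return the --python_filename values of a cmsDriver script, in order of appearance."""
    return re.findall(r"--python_filename\s+(\S+)", cmsdriver_script)


def configuration_file_title(process, python_filename=None):
    """Build a descriptive title to use instead of the hard-coded 'conffile'."""
    title = "Configuration file for " + process + " step"
    if python_filename:
        title += " " + python_filename
    return title
```

Pairing each extracted filename with its process step is not handled here and would need to follow the order of the steps in the script.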

CMS: 2011 RunB dataset release checklist

Checklist for the release

/ZeroBias/Run2011B-12Oct2013-v1/AOD
/TauPlusX/Run2011B-12Oct2013-v1/AOD
/Tau/Run2011B-12Oct2013-v1/AOD
/SingleMu/Run2011B-12Oct2013-v1/AOD
/SingleElectron/Run2011B-12Oct2013-v1/AOD
/PhotonHad/Run2011B-12Oct2013-v1/AOD
/Photon/Run2011B-12Oct2013-v1/AOD
/MultiJet/Run2011B-12Oct2013-v1/AOD
/MuOnia/Run2011B-12Oct2013-v1/AOD
/MuHad/Run2011B-12Oct2013-v1/AOD
/MuEG/Run2011B-12Oct2013-v1/AOD
/MinimumBias/Run2011B-12Oct2013-v1/AOD
/MET/Run2011B-12Oct2013-v1/AOD
/Jet/Run2011B-12Oct2013-v1/AOD
/HT/Run2011B-12Oct2013-v1/AOD
/ForwardTriggers/Run2011B-12Oct2013-v1/AOD
/ElectronHad/Run2011B-12Oct2013-v1/AOD
/DoubleMu/Run2011B-12Oct2013-v1/AOD
/DoubleElectron/Run2011B-12Oct2013-v1/AOD
/BTag/Run2011B-12Oct2013-v1/AOD
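When going through the checklist, a quick consistency check on the dataset names can help. The sketch assumes the /Primary/Era-Processing-vN/TIER pattern followed by the names in this list; the class and function names are hypothetical.

```python
from typing import NamedTuple


class DatasetName(NamedTuple):
    primary: str     # e.g. "ZeroBias"
    era: str         # e.g. "Run2011B"
    processing: str  # e.g. "12Oct2013"
    version: str     # e.g. "v1"
    tier: str        # e.g. "AOD"


def parse_dataset(name):
    """Split "/Primary/Era-Processing-vN/TIER" into its parts."""
    _, primary, processed, tier = name.split("/")
    era, rest = processed.split("-", 1)
    processing, version = rest.rsplit("-", 1)
    return DatasetName(primary, era, processing, version, tier)


def check_consistent(names, era, processing, tier):
    """Return the names whose era, processing or tier differ from the expected ones."""
    mismatched = []
    for name in names:
        parsed = parse_dataset(name)
        if (parsed.era, parsed.processing, parsed.tier) != (era, processing, tier):
            mismatched.append(name)
    return mismatched
```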

CMS - HI testing and usage instructions/example

Test that the HI data and the respective GT are readable in the open data environment.

Testing needs to be done in

The data will need to be read from the eospublic area, and not from the usual CMS location. @tiborsimko can provide the file listing.

For the Open Data VMs, the GTs will need to be in place in /cvmfs/cms-opendata-conddb.cern.ch/ (see #75); for the containers, the condition data can be accessed through the normal Frontier servers.

The usage instructions and example code should become available in a repository on https://github.com/cms-opendata-analyses when the data are released. They can be developed on
https://github.com/cms-legacydata-analyses; @caredg can advise on that.

The repository can be mirrored to gitlab for an eventual short CI/CD test on the available containers, see

(a very simple test case on a slc5 container in https://gitlab.cern.ch/kati/docker-reco-test/tree/slc5)
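A small hypothetical helper for the file listing: it assumes the listing gives one /eos/opendata/... path per line and turns each into an XRootD URL on eospublic, in the same root://eospublic.cern.ch//eos/... form used in the HI event display notes. The resulting list could then be passed to the input source of the test configuration.

```python
EOS_PUBLIC = "root://eospublic.cern.ch/"


def to_eospublic_url(path):
    """Turn an /eos/opendata/... path into a root://eospublic.cern.ch//eos/... URL."""
    if path.startswith("root://"):
        return path  # already a full URL
    if not path.startswith("/eos/opendata/"):
        raise ValueError("not an /eos/opendata path: " + path)
    return EOS_PUBLIC + path


def urls_from_listing(text):
    """Convert a listing with one path per line, ignoring blank lines."""
    return [to_eospublic_url(line.strip()) for line in text.splitlines() if line.strip()]
```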

CMS: update the script to build the trigger config file listing

The config files extracted in #67 are made available through listings such as

A script was used to generate the page for the 2011 trigger config file listing (https://github.com/cernopendata/data-curation/blob/master/cms-2011-hlt-triggers/code/create_hlt_trigger_information_records.py), but NB that it may need to be modified due to changes in the underlying data model of the portal.

Write a new script to put together pages for Run2 data taking (2015-2018).

CMS: event display files for HI events

We could add event display files of HI events to the portal as a record and also include them for access in the event display in the portal. This issue itemizes and records the steps and progress.

First look at data in root://eospublic.cern.ch//eos/opendata/cms/hidata/HIRun2011/HIDiMuon

  • Use CMS-OpenData-1.1.4 (I am using Virtual Box 6.1.12 on OSX 10.15.5)
  • From the CMS Shell, run scram project CMSSW_4_4_7
  • Follow instructions here replacing CMSSW_4_2_8 with CMSSW_4_4_7
  • Fetch list of validated runs from links in #85
  • I created a list of the files found in the EOS location above
  • Running edmDumpEventContent, the most relevant objects (and InputTags) seem to be:
```
edm::SortedCollection<EcalRecHit,edm::StrictWeakOrdering<EcalRecHit> >    "ecalRecHit"            "EcalRecHitsEB"   "RECO"
edm::SortedCollection<EcalRecHit,edm::StrictWeakOrdering<EcalRecHit> >    "ecalRecHit"            "EcalRecHitsEE"   "RECO"
edm::SortedCollection<EcalRecHit,edm::StrictWeakOrdering<EcalRecHit> >    "ecalPreshowerRecHit"   "EcalRecHitsES"   "RECO"
edm::SortedCollection<EcalTriggerPrimitiveDigi,edm::StrictWeakOrdering<EcalTriggerPrimitiveDigi> >    "ecalTPSkim"            ""                "RECO"
edm::SortedCollection<HBHERecHit,edm::StrictWeakOrdering<HBHERecHit> >    "hbhereco"              ""                "RECO"
edm::SortedCollection<HFRecHit,edm::StrictWeakOrdering<HFRecHit> >    "hfreco"                ""                "RECO"
edm::SortedCollection<HORecHit,edm::StrictWeakOrdering<HORecHit> >    "horeco"                ""                "RECO"
vector<reco::Track>    "generalTracks"         ""                "ppRECO"
vector<reco::Muon>    "muons"                 ""                "ppRECO"
```
  • The cfg file is at http://cmsdoc.cern.ch/~mccauley/HIDiMuon.py (will place on GitHub when finalized)
  • Event display files are produced, but their contents still need to be understood
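To carry the collections above into the cfg file, the edmDumpEventContent lines can be turned into InputTag strings of the usual "module:label:process" form. A minimal hypothetical helper, assuming each relevant line ends with the three quoted fields module, label and process as in the dump above:

```python
import re

# C++ type first, then "module" "label" "process" in double quotes.
DUMP_RE = re.compile(
    r'^(?P<type>.*?)\s+"(?P<module>[^"]*)"\s+"(?P<label>[^"]*)"\s+"(?P<process>[^"]*)"\s*$'
)


def input_tags(dump_text):
    """Return (type, "module:label:process") pairs from edmDumpEventContent output."""
    tags = []
    for line in dump_text.splitlines():
        match = DUMP_RE.match(line.strip())
        if match:  # header and separator lines have no quoted fields and are skipped
            tag = ":".join([match["module"], match["label"], match["process"]])
            tags.append((match["type"], tag))
    return tags
```

An empty label gives a tag such as "muons::ppRECO", which keeps the process name explicit.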

methodology.description -> methodology.steps

Create a script that will read methodology.description for CMS MC 2011 records located in cernopendata/modules/fixtures/data/records/cms-simulated-datasets-Run2011A.json.

Parse HTML such as:

    "methodology": {
      "description": "<p>These data were processed in several steps:</p>\n<p><strong>Step LHE</strong><br>Release: CMSSW_4_1_8_patch14<br>Global tag: START311_V2::All<br><a href=\"/record/3244\">Configuration file for LHE step HIG-Summer11pLHE-00118_1_cfg.py</a><br>Output dataset: /GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola/Summer11-START311_V2-v1/GEN</p>\n        <p><p><strong>Note</strong>\n        <br>\nTo get the exact LHE and generator's parameters, see <a href=\"/docs/cms-mc-production-overview\">CMS Monte Carlo production Overview</a>.</p>\n\n\n<strong>Step SIM</strong><br>Release: CMSSW_5_3_22_patch1<br>Global tag: START53_LV4::All<br><a href=\"/record/3051\">Configuration file for SIM step HIG-Summer11Leg-00197_1_cfg.py</a><br>Output dataset: /GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola/Summer11Leg-START53_LV4-v1/GEN-SIM</p>\n\n<p><strong>Step HLT RECO</strong><br>Release: CMSSW_5_3_23_patch1<br>Global tag: START53_LV6::All<br><a href=\"/record/3042\">Configuration file for HLT step HIG-Summer11LegDR-00204_1_cfg.py</a><br><a href=\"/record/3085\">Configuration file for RECO step HIG-Summer11LegDR-00204_2_cfg.py</a><br>Output dataset: /GluGluToHToZZTo4L_M-550_7TeV-minloHJJ-pythia6-tauola/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM</p>\n"
    },

and create methodology.steps nested JSON such as:

"generation": {
    "steps": [
      {
        "type": "RECO-HLT",
        "release": "CMSSW_5_3_11_patch2",
        "global_tag": "START53_V19E::All",
        "configuration_files": [
          {
            "type": "cmsDriver script",
            "script": "#!/bin/bash\nsource /cvmfs/cms.cern.ch/cmsset_default.sh\nexport SCRAM_ARCH=None\nif [ -r CMSSW_5_3_11_patch2/src ] ; then \n echo release CMSSW_5_3_11_patch2 already exists\nelse\nscram p CMSSW CMSSW_5_3_11_patch2\nfi\ncd CMSSW_5_3_11_patch2/src\neval `scram runtime -sh`\n\n\nscram b\ncd ../../\ncmsDriver.py step1 --filein \"dbs:/ADDdiLepton_LambdaT-1600_Tune4C_8TeV-pythia8/Summer12-START50_V13-v3/GEN-SIM\" --fileout file:EXO-Summer12DR53X-02487_step1.root --pileup_input \"dbs:/MinBias_TuneZ2star_8TeV-pythia6/Summer12-START50_V13-v3/GEN-SIM\" --mc --eventcontent RAWSIM --pileup 2012_Summer_50ns_PoissonOOTPU --datatier GEN-SIM-RAW --conditions START53_V19E::All --step DIGI,L1,DIGI2RAW,HLT:7E33v2 --python_filename EXO-Summer12DR53X-02487_1_cfg.py --no_exec --customise Configuration/DataProcessing/Utils.addMonitoring -n 360 || exit $? ; \n\ncmsDriver.py step2 --filein file:EXO-Summer12DR53X-02487_step1.root --fileout file:EXO-Summer12DR53X-02487.root --mc --eventcontent AODSIM,DQM --datatier AODSIM,DQM --conditions START53_V19E::All --step RAW2DIGI,L1Reco,RECO,VALIDATION:validation_prod,DQM:DQMOfflinePOGMC --python_filename EXO-Summer12DR53X-02487_2_cfg.py --no_exec --customise Configuration/DataProcessing/Utils.addMonitoring -n 360 || exit $? ; \n\n"
          },
          {
            "title": "conffile",
            "process": "HLT",
            "conffileID": "1937ebea238cd2fc28f3c019b0eb54ae"
          },
          {
            "title": "conffile",
            "process": "RECO",
            "conffileID": "1937ebea238cd2fc28f3c019b0f1dd0b"
          }
        ]
      },
      {
        "type": "GEN-SIM",
        "release": "CMSSW_5_1_3",
        "global_tag": "START50_V13::All",
        "configuration_files": [
          {
            "title": "cmsDriver script",
            "script": "#!/bin/bash\nsource /cvmfs/cms.cern.ch/cmsset_default.sh\nexport SCRAM_ARCH=None\nif [ -r CMSSW_5_1_3/src ] ; then \n echo release CMSSW_5_1_3 already exists\nelse\nscram p CMSSW CMSSW_5_1_3\nfi\ncd CMSSW_5_1_3/src\neval `scram runtime -sh`\ncurl  -s https://raw.githubusercontent.com/cms-sw/genproductions/V02-01-22/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py --retry 2 --create-dirs -o  Configuration/GenProduction/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py \n[ -s Configuration/GenProduction/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py ] || exit $?;\n\n\nscram b\ncd ../../\ncmsDriver.py Configuration/GenProduction/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py --fileout file:EXO-Summer12-01139.root --mc --eventcontent RAWSIM --pileup NoPileUp --datatier GEN-SIM --conditions START50_V13::All --beamspot Realistic8TeVCollision --step GEN,SIM --datamix NODATAMIXER --python_filename EXO-Summer12-01139_1_cfg.py --no_exec --customise Configuration/DataProcessing/Utils.addMonitoring -n 61 || exit $? ; \n\n"
          },
          {
            "url": "https://raw.githubusercontent.com/cms-sw/genproductions/V02-01-22/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py",
            "title": "Genfragment",
            "script": "import FWCore.ParameterSet.Config as cms\n\ngenerator = cms.EDFilter(\"Pythia8GeneratorFilter\",\n   comEnergy = cms.double(8000.0),\n   crossSection = cms.untracked.double(1.435),\n   filterEfficiency = cms.untracked.double(1),\n   maxEventsToPrint = cms.untracked.int32(0),\n   pythiaHepMCVerbosity = cms.untracked.bool(False),\n   pythiaPylistVerbosity = cms.untracked.int32(0),\n\n   PythiaParameters = cms.PSet(\n      processParameters = cms.vstring(\n         'Main:timesAllowErrors    = 10000',\n         'ParticleDecays:limitTau0 = on',\n         'ParticleDecays:tauMax = 10',\n         'Tune:pp 5',\n         'Tune:ee 3',\n         'PDF:pSet = 5',\n         'ExtraDimensionsLED:ffbar2llbar = on', \n         'ExtraDimensionsLED:gg2llbar = on', \n         'PhaseSpace:mHatMin = 1050',\n         'ExtraDimensionsLED:CutOffmode = 0',\n         'ExtraDimensionsLED:LambdaT = 1600'\n      ),\n      parameterSets = cms.vstring('processParameters')\n   )\n)\n\nconfigurationMetadata = cms.untracked.PSet(\n   version = cms.untracked.string('\\$Revision: 1.0 $'),\n   name = cms.untracked.string('\\$Source: /cvs_server/repositories/CMSSW/CMSSW/Configuration/GenProduction/python/EightTeV/ADD_Dilepton_LambdaT_1600_8TeV_pythia8_cff.py,v $'),\n   annotation = cms.untracked.string('2012 sample with PYTHIA8 at 8 TeV: ADD Dilepton samples with LambdaT = 1600 GeV, Tune4C, pdf: MSTW 2008 LO')\n)\n"
          },
          {
            "title": "Configuration file",
            "process": "SIM",
            "conffileID": "294fcd8902949eb73ba3813549dc621a"
          }
        ]
      }
    ],
    "description": "<p>These data were processed in several steps:</p>"
  },
  "generator": {
    "names": [
      "pythia8"
    ],
    "global_tag": "START50_V13::All"
  }

accordingly.

See cernopendata/opendata.cern.ch#2465 for motivation.
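A possible starting point for the script, sketched with regular expressions rather than a full HTML parser. It assumes every description follows the "<strong>Step ...</strong>", "Release:", "Global tag:", configuration-file link and "Output dataset:" pattern shown above, and that the fixture file holds a JSON list of records. The "recid" and "output_dataset" keys are hypothetical placeholders, since how the record links map onto the target "conffileID" entries is not specified here; the step type is kept as written (e.g. "HLT RECO").

```python
import json
import re

STEP_RE = re.compile(r"<strong>Step ([^<]+)</strong>")
RELEASE_RE = re.compile(r"Release:\s*([^<\s]+)")
GLOBAL_TAG_RE = re.compile(r"Global tag:\s*([^<\s]+)")
OUTPUT_RE = re.compile(r"Output dataset:\s*([^<\s]+)")
CONFIG_RE = re.compile(
    r'<a href="/record/(\d+)">Configuration file for (\S+) step ([^<]+)</a>'
)


def parse_steps(description):
    """Split the HTML description into one dictionary per "Step ..." block."""
    # With one capture group, re.split gives [preamble, name1, body1, name2, body2, ...]
    parts = STEP_RE.split(description)
    steps = []
    for name, body in zip(parts[1::2], parts[2::2]):
        step = {"type": name.strip()}
        for key, pattern in (("release", RELEASE_RE),
                             ("global_tag", GLOBAL_TAG_RE),
                             ("output_dataset", OUTPUT_RE)):
            match = pattern.search(body)
            if match:
                step[key] = match.group(1)
        step["configuration_files"] = [
            {"title": f"Configuration file for {process} step {filename}",
             "process": process,
             "recid": recid}  # hypothetical key: record id taken from the link
            for recid, process, filename in CONFIG_RE.findall(body)
        ]
        steps.append(step)
    return steps


def convert_record(record):
    """Move methodology.description into methodology.steps, in place."""
    methodology = record.get("methodology", {})
    description = methodology.get("description")
    if description:
        methodology["steps"] = parse_steps(description)
        methodology["description"] = "<p>These data were processed in several steps:</p>"
    return record


def convert_file(path_in, path_out):
    """Convert every record of a fixture file (assumed to be a JSON list)."""
    with open(path_in) as f:
        records = json.load(f)
    for record in records:
        convert_record(record)
    with open(path_out, "w") as f:
        json.dump(records, f, indent=2)
```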

CMS - HI validated runs

CMS: 2015 data release checklist

Checklist for the 2015 release, to be discussed and approved. This is to start the preparations.
(from cernopendata/opendata.cern.ch#1310)

CMS - HI transfers

(from #72 )

I: 2010-2011 slc5-based HI samples

Information provided by the HI conveners; see also their detailed table

Transfers that have started are indicated with a tick box, ongoing ones in bold

2010 PbPb slc5, CMSSW_3_9_X:

  • /HICorePhysics/HIRun2010-PromptReco-v3/RECO 19.3TB
  • /HIAllPhysics/HIRun2010-ZS-v2/RECO 190.9TB

2011 pp slc5, CMSSW_4_4_X:

  • /AllPhysics2760/Run2011A-16Jul2011-v1/RECO 6.6TB
  • /MinBias0Tesla0/Run2011A-PromptReco-v5/RECO 938.8GB
  • /MinBias0Tesla1/Run2011A-PromptReco-v5/RECO 930.3GB
  • /MinBias0Tesla2/Run2011A-PromptReco-v5/RECO 938.7GB
tbc:
  • /AllPhysics2760/Nov2011_HI-SD_JetHI-276TeV_ppRereco/RECO 275.1GB
  • /AllPhysics2760/Nov2011_HI-SD_MuHI-276TeV_ppRereco/RECO 682.7GB
  • /AllPhysics2760/Nov2011_HI-SD_PhotonHI-276TeV_ppRereco/RECO 315.0GB
  • /ForwardTriggers/HIRun2011-PromptReco-v1/RECO 48.3GB

2011 PbPb slc5, CMSSW_3_9_X/CMSSW_4_4:

  • /HIDiMuon/HIRun2011-04Mar2013-v1/RECO 3.2TB
  • /HIHighPt/HIRun2011-15Apr2013-v1/RECO 1.6TB
  • /HIMinBiasUPC/HIRun2011-12Jun2013-v1/RECO 11.6TB
  • /MinimumBias/HIRun2011-PromptReco-v1/RECO 1.8TB

II 2011 slc6-based HI-related samples

2011 pp slc6, CMSSW_5_3:

The corresponding AOD datasets are already released (or to be released); to be checked whether they are sufficient

  • /MinimumBias/Run2011A-12Oct2013-v1/RECO 36.0TB
  • /MinimumBias/Run2011B-12Oct2013-v1/RECO 21.8TB
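To compare these lists against the available space (for example a transfer quota), the sizes can be summed. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and that each line ends with its size as in the lists above; the helper names are hypothetical.

```python
import re

# Decimal units are an assumption here.
UNITS = {"GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a size such as "938.8GB" or "36.0TB" to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(GB|TB)\s*", text)
    if match is None:
        raise ValueError("unrecognised size: " + text)
    return float(match.group(1)) * UNITS[match.group(2)]


def total_size_tb(lines):
    """Sum the trailing sizes of lines like "/Primary/Processed/TIER 19.3TB", in TB."""
    return sum(parse_size(line.split()[-1]) for line in lines) / 10**12
```

For the two 2010 PbPb datasets, this gives 19.3 + 190.9 = 210.2 TB.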

CMS: provenance metadata for 2015 MC

(from #65)

Run2 legacy metadata of 2016-2018 will be extracted from the ultra-legacy production; see https://github.com/cernopendata/data-curation/tree/master/cms-run2-ultra-legacy-production.

Run1 metadata has been extracted with

2015 is a separate production and the script may need to be modified.

Run2 Fall15 MiniAOD v2 campaign (76X version 2): dataset=/*/*RunIIFall15MiniAODv2-PU25nsData2015v1*/*
Size: 227.4TB
7114 datasets

Check if the 2015 MC extraction works with the ultra-legacy script, or if the previous Run1 scripts should be used.

Run the script and prepare the records.
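A hypothetical offline cross-check of the dataset selection: given a saved list of dataset names (how the list is obtained is not shown), keep those matching the query pattern above and compare the count with the 7114 quoted. Note that fnmatch's "*" also matches "/", so this only approximates the DAS wildcard semantics.

```python
from fnmatch import fnmatchcase

# Pattern from the DAS query above.
CAMPAIGN_PATTERN = "/*/*RunIIFall15MiniAODv2-PU25nsData2015v1*/*"


def select_campaign(datasets, pattern=CAMPAIGN_PATTERN):
    """Return the dataset names matching the campaign pattern (case-sensitive)."""
    return [name for name in datasets if fnmatchcase(name, pattern)]
```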

CMS - HI release checklist

Checklist for the HI release

CMS: transfers with rucio

Setup

source /cvmfs/cms.cern.ch/cmsset_default.sh
source /cvmfs/cms.cern.ch/rucio/setup-py3.sh
voms-proxy-init -voms cms -rfc -valid 192:00
export RUCIO_ACCOUNT=`whoami`

Quota

Check:

rucio list-account-limits $RUCIO_ACCOUNT

Add (as manager):

rucio-admin account set-limits $RUCIO_ACCOUNT T3_CH_CERN_OpenData 20TB

Transfer

Start:

$ rucio add-rule cms:/DoubleMuon/Run2015D-16Dec2015-v1/MINIAOD 1 T3_CH_CERN_OpenData
cdce9cca6ad34b1d938fd74d9e9e557a
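When several datasets have to be transferred, the add-rule command above can be repeated in a loop. A minimal sketch, assuming the Setup steps have been done in the same shell and that, as in the example above, rucio prints the new rule ID on standard output; dry_run=True only prints the commands. The helper names are hypothetical.

```python
import shlex
import subprocess

RSE = "T3_CH_CERN_OpenData"


def add_rule_command(dataset, copies=1, rse=RSE):
    """Build the rucio add-rule command used above for one dataset."""
    return ["rucio", "add-rule", "cms:" + dataset, str(copies), rse]


def add_rules(datasets, dry_run=True):
    """Print (dry run) or execute one add-rule command per dataset; return the rule IDs."""
    rule_ids = []
    for dataset in datasets:
        cmd = add_rule_command(dataset)
        if dry_run:
            print(shlex.join(cmd))
            continue
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        rule_ids.append(result.stdout.strip())  # assumed: stdout holds only the rule ID
    return rule_ids
```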

Check:

$ rucio list-rules --account $RUCIO_ACCOUNT
ID                                ACCOUNT    SCOPE:NAME                                                                      STATE[OK/REPL/STUCK]    RSE_EXPRESSION         COPIES  EXPIRES (UTC)    CREATED (UTC)
--------------------------------  ---------  ------------------------------------------------------------------------------  ----------------------  -------------------  --------  ---------------  -------------------
cdce9cca6ad34b1d938fd74d9e9e557a  kati       cms:/DoubleMuon/Run2015D-16Dec2015-v1/MINIAOD                                   REPLICATING[0/1068/0]   T3_CH_CERN_OpenData         1                   2021-07-26 07:07:27

or

$ rucio rule-info cdce9cca6ad34b1d938fd74d9e9e557a
Id:                         cdce9cca6ad34b1d938fd74d9e9e557a
Account:                    kati
Scope:                      cms
Name:                       /DoubleMuon/Run2015D-16Dec2015-v1/MINIAOD
RSE Expression:             T3_CH_CERN_OpenData
Copies:                     1
State:                      REPLICATING
Locks OK/REPLICATING/STUCK: 0/1068/0
Grouping:                   DATASET
Expires at:                 None
Locked:                     False
Weight:                     None
Created at:                 2021-07-26 07:07:27
Updated at:                 2021-07-26 07:07:30
Error:                      None
Subscription Id:            None
Source replica expression:  None
Activity:                   User Subscriptions
Comment:                    None
Ignore Quota:               False
Ignore Availability:        False
Purge replicas:             False
Notification:               NO
End of life:                None
Child Rule Id:              None
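The rule-info output above is a list of "Key: value" lines, so it can be turned into a dictionary to follow the transfer progress from the Locks OK/REPLICATING/STUCK counts. A minimal sketch; the helper names are hypothetical.

```python
def parse_rule_info(text):
    """Turn rucio rule-info output ("Key:   value" lines) into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def replication_progress(info):
    """Return (ok, replicating, stuck, fraction_ok) from the Locks line."""
    counts = info["Locks OK/REPLICATING/STUCK"].split("/")
    ok, replicating, stuck = (int(n) for n in counts)
    total = ok + replicating + stuck
    return ok, replicating, stuck, (ok / total if total else 0.0)
```

For the rule above, this would report 0 of 1068 locks OK, i.e. the replication had just started.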

Monitor

https://cms-rucio-webui.cern.ch/rule?rule_id=cdce9cca6ad34b1d938fd74d9e9e557a

https://monit-kibana.cern.ch/kibana/goto/c3656cd9856f234d2c241e3a34f6f8b8

Datasets at eospublic

Check that they land in the proper location

export EOS_MGM_URL=root://eospublic.cern.ch
eos ls -lh /eos/opendata/cms/

CMS: 2015 collision datasets - which datasets from RunD

(from #65)
Which datasets and which part from RunD:

CMS: 2011 RunB transfer

Transfer started for:

/ZeroBias/Run2011B-12Oct2013-v1/AOD
/TauPlusX/Run2011B-12Oct2013-v1/AOD
/Tau/Run2011B-12Oct2013-v1/AOD
/SingleMu/Run2011B-12Oct2013-v1/AOD
/SingleElectron/Run2011B-12Oct2013-v1/AOD
/PhotonHad/Run2011B-12Oct2013-v1/AOD
/Photon/Run2011B-12Oct2013-v1/AOD
/MultiJet/Run2011B-12Oct2013-v1/AOD
/MuOnia/Run2011B-12Oct2013-v1/AOD
/MuHad/Run2011B-12Oct2013-v1/AOD
/MuEG/Run2011B-12Oct2013-v1/AOD
/MinimumBias/Run2011B-12Oct2013-v1/AOD
/MET/Run2011B-12Oct2013-v1/AOD
/Jet/Run2011B-12Oct2013-v1/AOD
/HT/Run2011B-12Oct2013-v1/AOD
/ForwardTriggers/Run2011B-12Oct2013-v1/AOD
/ElectronHad/Run2011B-12Oct2013-v1/AOD
/DoubleMu/Run2011B-12Oct2013-v1/AOD
/DoubleElectron/Run2011B-12Oct2013-v1/AOD
/BTag/Run2011B-12Oct2013-v1/AOD
