
cms-opendata-guide's People

Contributors

allanjales, asdru30, audrium, caredg, dbustamante98, jieunyoo, jmhogan, jrw46742, katilp, mattbellis, matthewbellis, mvidalgarcia, npervan, rodrigocampellos, sib37385, tiborsimko, topaklihuseyin


cms-opendata-guide's Issues

Updates for the x-section page

Thanks @jieunyoo for putting together the x-section page! I have some minor comments and suggestions:

  • doing cmsenv in the current containers should be avoided, as the environment is already set at startup, with some updates with respect to what cmsenv does. E.g. git clone https:... and curl work in the containers if cmsenv has not been run

  • it is not necessary to copy the data file into the container; the files are reachable directly with

    cmsRun ana.py inputFiles="root://eospublic.cern.ch//eos/opendata/cms/....

  • for ease of reading, and for a first try, we could also give the full command with an example file name (with the full file path, which one can get in the correct format from the record's file listing); see the sketch after this list

  • maybe a screenshot of the listing (or a verbatim copy of it) would be useful to have
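
As a concrete illustration, a minimal sketch of a configuration fragment that picks up the inputFiles argument, assuming the usual VarParsing pattern (the process name and maxEvents default are placeholders, and the actual file path should be taken from the record's file listing):

    # ana.py (sketch): read open data directly over XRootD, no local copy needed
    import FWCore.ParameterSet.Config as cms
    from FWCore.ParameterSet.VarParsing import VarParsing

    options = VarParsing('analysis')
    options.parseArguments()

    process = cms.Process("ANA")
    process.source = cms.Source(
        "PoolSource",
        # whatever is passed on the command line as inputFiles=... ends up here
        fileNames=cms.untracked.vstring(options.inputFiles)
    )
    process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(100))

It would then be run as above, e.g. cmsRun ana.py inputFiles="root://eospublic.cern.ch//eos/opendata/cms/<path-from-the-record-listing>".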

Maybe @caredg has some further suggestions?

Muon scale and momentum corrections

(from #7)
Information from the muon POG, for the Run 1 corrections:

  • pt < 200 GeV
  • pt > 200 GeV

Regarding the high pT muons in Run 1, there are no recommendations. At that time, there were not enough high pT muons to derive anything, or even to worry about it. This really became relevant during Run 2, when CMS could hope for a discovery in searches for particles at the TeV scale.

(For Run 2 high pT, the corrections are treated as a systematic uncertainty, as there are not enough statistics to derive precise corrections. You can find the recommendation here: https://twiki.cern.ch/twiki/bin/view/CMS/HighPtMuonReferenceRun2 )
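
For illustration only, a minimal sketch of how such a curvature-bias systematic could be propagated to a high-pT muon, shifting the curvature kappa = q/pT in the spirit of the generalized-endpoint method; the delta_kappa value below is a placeholder, NOT an official number, so take the real one from the TWiki above:

    # shift the muon curvature q/pT by delta_kappa (in 1/GeV) and
    # return the corresponding shifted pT
    def shifted_pt(pt, charge, delta_kappa):
        kappa = charge / pt
        return abs(charge / (kappa + delta_kappa))

    # vary the placeholder bias up and down for a 500 GeV mu+
    pt_kappa_up = shifted_pt(500.0, +1, +0.1e-3)    # ~476 GeV
    pt_kappa_down = shifted_pt(500.0, +1, -0.1e-3)  # ~526 GeV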

Add minimal set of links to the ROOT page

The CODP pages (e.g. docker guide) will link to this page for more information and further links.

Some cleaning of this page would be required. For the moment, a link to the ROOT tutorial page can be added, https://cms-opendata-workshop.github.io/workshop2021-lesson-preexercise-cpp-and-root/, although it is missing the simplest use case, i.e. opening an existing ROOT file and exploring its contents with TBrowser.

For the record, the new CODP getting started with 2015 data page (cernopendata/opendata.cern.ch#3165) does this, i.e. it shows how to open a file produced by POET and how to plot the pt distribution of electrons: http://opendata.cern.ch/docs/cms-getting-started-2015
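
For reference, the simplest use case could look like the following PyROOT sketch; the file, tree, and branch names ("myoutput.root", "events", "electron_pt") are placeholders, since a file produced by POET has its own structure, which TBrowser reveals:

    # open an existing ROOT file and explore its contents
    import ROOT

    f = ROOT.TFile.Open("myoutput.root")
    f.ls()                     # print the contents of the file

    browser = ROOT.TBrowser()  # browse the file interactively

    tree = f.Get("events")
    tree.Draw("electron_pt")   # quick plot of the electron pT distribution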

Analysis checklist

To-do list for the section "Analysis".
The topics can link to an existing analysis, e.g. to https://github.com/cms-opendata-analyses/HiggsTauTauNanoAODOutreachAnalysis (or its future extended version), to show an example.
The list below includes possible links to existing documentation which can be included in the text.

  • Data and Simulation

  • Selection
    It should flow as in a real analysis, where the first thing, generally, is to deal with the trigger (basically an object by itself), so it needs the #16 fix.

    • Write a brief introduction
    • Triggers: Explain what the trigger system is, how it works in CMS and how to extract the information with a few examples. See http://opendata.cern.ch/docs/cms-guide-trigger-system
      • Implement and explain the different available examples. See https://github.com/cms-legacydata-analyses/TriggerInfoTool. Maybe they will need separate pages in the doc. #17
      • Guide on general trigger information
      • Guide on getting prescales
      • Guide on how to get trigger objects and info
      • Guide on the basics of trigger matching
      • see suggestion in the context of the HTT example in #11
    • Objects: Describe, or point to a description of, what the physics objects are at the different levels if possible, i.e., MC truth and reco.
    • Object ID (here we can separate in different objects #19)
      • Muons
      • Electrons
      • Jets
      • MET
      • Photons
      • Taus
    • Additional corrections
      • jet energy correction
        • for MC (@slaurila): it would be best to apply the JEC already at the stage of NanoAOD production, and store the corrected jets as a separate collection. For the future, I think it would be useful to store the JEC uncertainties in the NanoAOD as well. Once the corrections are in the NanoAOD, applying them should be straightforward (just switch to the other jet collection), and this would also allow comparisons between corrected and non-corrected jets at the analysis stage (see the sketch after this list).
      • muon scale and momentum corrections (see #13)
      • electron scale and momentum corrections
      • tau (@slaurila): according to Tau POG, neither tau ID nor the energy scale needs a correction by default
  • Luminosity

  • Backgrounds

    • Techniques
    • QCD Estimation
    • for the example code suggestion see #9
    • Upper-limit Calculations
  • Systematics

    • @slaurila: Ideally, there would be recipes for estimating at least the dominating systematic uncertainties and including them in the results. But again, adding the machinery for a full systematics treatment (running different variations of the analysis for shape systematics, combining them into uncertainty bands in plots, etc.) will probably make the code considerably more complex, unless there is some new smart way of dealing with all of this in RDataFrame (see the sketch after this list).
    • Luminosity Uncertainties
    • MC Uncertainty
    • Object Uncertainty
    • Pileup Uncertainty
    • for the example code see #8
  • Interpretation

    • Statistics
    • for suggestions and considerations, see
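
To make the jet-correction and systematics points above concrete, a hedged RDataFrame sketch of switching between an uncorrected and a corrected jet collection and filling the same histogram twice; the branch names (Jet_pt, Jet_pt_corrected) are hypothetical placeholders for whatever the extended NanoAOD would store:

    import ROOT

    df = ROOT.RDataFrame("Events", "nanoaod.root")

    # the nominal and corrected versions differ only in the collection used
    h_nom = df.Define("sel_pt", "Jet_pt[Jet_pt > 30]") \
              .Histo1D(("jetpt_nom", "jet pT", 50, 0., 500.), "sel_pt")
    h_cor = df.Define("sel_pt_c", "Jet_pt_corrected[Jet_pt_corrected > 30]") \
              .Histo1D(("jetpt_cor", "jet pT", 50, 0., 500.), "sel_pt_c")

    # comparing the two (ratio or envelope) gives the uncertainty band
    h_nom.Draw()
    h_cor.Draw("same")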

cms-opendata-guide live site not updating (in openshift?)

@mvidalgarcia @tiborsimko I think the live site is not updating any longer. It seems that at the GitHub level everything builds OK, but somehow OpenShift isn't getting those updates. Could it be that the webhooks changed after the OpenShift upgrade? I do not think I can check anything about the configuration of this site on OpenShift, so I can't investigate further. Could you please help us? Thanks a lot.

Create introduction and 'how to use' instructions

According to #20, write the welcome intro on the Home page and instructions on how to use the site.
Here, the general idea of the structure of the site needs to be clear. It has to explain how to navigate and what to expect in terms of links and connections with the CERN Open Data Portal and the CMS public TWiki pages (mentioning explicitly their matching history).

Add Run 1 and Run 2 tabs to object pages

Add "Run 1 data" / "Run 2 data" tabs to object pages for the parts which are not common
For a proper display of subsections, they need to be placed under each subsection's title

  • Common tools
  • muons
  • electrons
  • photons
  • jets
  • met
  • tau

Documentation update checklist for future releases

Making a big checklist! The program lives in this Google doc and was presented in DPOA on 9/25/30.

EMPTY pages to fill:

  • Triggers: Start by summarizing and linking to workshop lessons
  • Luminosity: Get brilcalc examples; they exist. Try to remove the dropdown and have it as one page
  • #115
  • Interpretation: port workshop lessons / CMSDAS / Julie's EXO-23-006 summary for concepts. PyHF for calculation, as in #10
  • Systematics:
    • Lumi uncertainties: grab info and pubs from ~any of our papers
    • MC uncertainty: port Julie's info from EXO-23-006 as a starting point
    • Pileup uncertainty: grab info from pubs / TWikis. For 2016 this lives in correctionLib. For 2015 find docs
    • Lepton/photon uncertainties: write a basic description. For 2016 these are all in correctionLib. For 2015 and earlier find docs.
    • Idea to put a correctionLib instruction page here (see the sketch after this list)
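
As a seed for that instruction page, a minimal correctionLib sketch; the file name and correction key below are hypothetical, and the real ones come with the published correction files:

    import correctionlib

    cset = correctionlib.CorrectionSet.from_file("muon_sf.json")
    corr = cset["NUM_TightID_DEN_TrackerMuons"]  # hypothetical key

    # evaluate() takes the inputs defined in the JSON schema,
    # typically something like (eta, pt, variation)
    sf_nominal = corr.evaluate(1.2, 45.0, "nominal")
    sf_up = corr.evaluate(1.2, 45.0, "systup")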

Pages to rework / remove / big update:

  • FAQ: port things from the "troubleshooting" portal page over here, since we have better control over this repo for edits
  • Systematics --> Jet uncertainties: update into tabs for AOD/Mini/Nano, reference correctionLib for 2016.
  • Physics Objects: update into tabs for AOD/Mini/Nano and add new 2016 info. Do the "to-do"s in the pages
  • ID Efficiency Study: port most of this into a workshop-connected Carpentries site. Keep the basic tag-and-probe information and introduce the efficiency-based corrections that are detailed in Systematics. T&P would be the main way people can calculate their own scale factors.
  • Unix: add WSL2 box info with the notes about avoiding Git Bash for future container use.

Pages needing small updates / tweaks:

  • Finding Data: update workshop link and/or give multiple
  • ROOT: check links and provide a workshop lesson link
  • Docker: port intro explanations from the VM page. Amp up the tabs by linking workshop lessons
  • Data Model: eventually add 2016 and NanoAOD
  • Analyzer: add links to workshop lessons for Run 1 / Run 2 EDM files. Note POET 2011 vs 2012 branches. Eventually add 2016 info
  • Configuration: same notes as above
  • Conditions Data: check the examples for anything out of date. Expand the info at the bottom to include Run 2
  • Collision Data: can the warning label go?
  • MC Simulations: watch for cross section updates to the record pages and describe them here
  • Event Generation: update the example to show production of Mini and Nano from AOD.
  • About: add more contributor names.

QCD background measurement

(from #7 - @slaurila )

Currently the QCD background is estimated using a same-sign control region, and then normalized to the opposite-sign signal region with an ad-hoc transfer factor of 0.80. As suggested by Stefan, this transfer factor could and should be measured from a separate control region, as was done for the actual Run 1 result. However, having separate control regions for different purposes can make the code a bit messy, so it might be better to have a separate script to determine this transfer factor. See the sketch below.
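
In code, the current estimate amounts to something like the following sketch, where the same-sign (SS) data and non-QCD simulation histograms (h_data_ss, h_mc_nonqcd_ss) are hypothetical ROOT histograms:

    TRANSFER_FACTOR = 0.80  # current ad-hoc value, to be measured properly

    # QCD in the SS control region = data minus simulated non-QCD backgrounds
    h_qcd_ss = h_data_ss.Clone("qcd_ss")
    h_qcd_ss.Add(h_mc_nonqcd_ss, -1.0)

    # extrapolate to the opposite-sign signal region
    h_qcd_os = h_qcd_ss.Clone("qcd_os")
    h_qcd_os.Scale(TRANSFER_FACTOR)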

Add a link to docker tutorial (workshop)

For the CODP getting started pages, to avoid writing detailed technical instructions in the portal pages, it would be useful to have an open data guide page with links to the current docker tutorial.

Currently, the open data guide docker page points to the CODP docker page. This page needs rewriting, see cernopendata/opendata.cern.ch#3163

It would be better to have the open data guide docker page point to the latest tutorial: https://cms-opendata-workshop.github.io/workshop2021-lesson-docker/ It is far more complete than the CODP page and gives full instructions for different operating systems.

Or, if preferred, "import" the tutorial material to the guide. In both cases, for the 2015 data release, I will point for more information to the open data guide docker page.

Internal links from the public egamma page

The egamma public info page has the following internal links:

The information in these pages may have already been covered in the workshop tutorial and the POET electron code, but we should either make the pages public or add a mention to the page that the information is covered elsewhere.

Rename Frontier to Condition data

Rename the section "Frontier" to "Condition data".
Frontier refers to distributed database caching system use for the LHC, but we recommend reading them from /cvmfs, therefore the naming should not refer to Frontier
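
For context, reading the condition data from /cvmfs looks roughly like this in a job configuration, following the pattern used in the open data instructions; the global tag below is an example for 2012 data, and the right one is quoted on each record page:

    import FWCore.ParameterSet.Config as cms

    process = cms.Process("ANA")
    process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

    # the condition data are read from /cvmfs, not from Frontier
    process.GlobalTag.connect = cms.string(
        'sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db')
    process.GlobalTag.globaltag = 'FT53_V21A_AN6::All'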

How to get started page

We might need a section on how to get started. If someone is completely new to this, she might not know what to do first.

CMSSW checklist

To-do list for the section "CMSSW".
The topics can link to an existing analysis or a tool, e.g. to https://github.com/cms-opendata-analyses/AOD2NanoAODOutreachTool or to https://github.com/cms-legacydata-analyses/PhysObjectExtractorTool, to show an example.
The list below includes possible links to existing documentation which can be included in the text.

Pile-up corrections for MC

(from #7 - @slaurila )

Pileup corrections for MC: Three steps are needed to apply pileup corrections to MC in the HTT analysis:

  • Publishing the pileup distribution jsons for 2011/2012 data in the open data portal (could be skimmed to contain only the runs published as open data?). The pileup jsons can be found here.
  • Adding the number of interactions (TrueNumInteractions variable, see here) to open data NanoAOD
  • Implementing the reweighting in the HTT analysis code: first calculate the PU distribution in data by combining the pileup json with the golden json, then extract the PU distribution from the MC samples and calculate the weights as a function of TrueNumInteractions, and finally apply these weights to MC (see the sketch after this list)
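
The third step could look roughly like the following PyROOT sketch, assuming the data and MC pileup distributions have been saved as histograms (the file and histogram names are placeholders):

    import ROOT

    f_data = ROOT.TFile.Open("pileup_data.root")  # from pileup json + golden json
    f_mc = ROOT.TFile.Open("pileup_mc.root")      # TrueNumInteractions from MC

    h_data = f_data.Get("pileup")
    h_mc = f_mc.Get("pileup")

    # normalise both to unit area and take the bin-by-bin ratio
    h_data.Scale(1.0 / h_data.Integral())
    h_mc.Scale(1.0 / h_mc.Integral())
    h_weights = h_data.Clone("pu_weights")
    h_weights.Divide(h_mc)

    # per-event weight for an MC event
    def pu_weight(true_num_interactions):
        return h_weights.GetBinContent(h_weights.FindBin(true_num_interactions))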

General Stuff Checklist

While not the heart of the documentation, these are important to fill out:

  • Write a brief introduction to the guide (front page)
  • Write the How to use this guide section in the front page
  • Take care of the FAQ
  • Take care of the About section

Add a disclaimer

Remove "official" in the welcome sentence "Welcome to the official guide for CMS open data".

Add a disclaimer in line with https://www.gnu.org/licenses/gpl-3.0.en.html or similar, to make it clear that this guide is provided by the CMS open data group on a best-effort basis.

Maybe:

Welcome to the CMS open data guide.

This guide is brought to you by the CMS open data group, on a best-effort basis. All software and instructions are provided "as is", without warranty of any kind. This is ongoing work and we appreciate your feedback and/or your help in building this guide.

@caredg your thoughts?

Update physics object page with 2015 and MINIAOD

Update https://cms-opendata-guide.web.cern.ch/analysis/selection/objects/ with 2015 and MINIAOD information so that it can be linked directly from the open data portal getting started with the 2015 data page

Needed for the updated portal getting started page, see cernopendata/opendata.cern.ch#3165

Note that https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2015 exists.
Many of the links on that page point to CMSSW_7_4_X, the 2015 OD is CMSSW_7_6_X but the information is likely valid.
It would be good to check if the code snippets are similar to those in POET.

(Where's the corresponding up-to-date page for UL MINIAOD?
The tables for WorkBookMiniAODYYYY in https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD go only up to CMSSW_9_4_X and do not have a page for 2018: UL MiniAODv1, so CMSSW_10_6_X.
The NanoAOD is in https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD and is updated for the UL.)

Computing tools checklist

To-do list for the first section "Computing tools".
None of these topics needs new example code.
The list below includes possible links to existing documentation which can be included in the text.

Missing information about the CMS collaboration

On the front page of the guide, there is no information about what kind of data CMS open data is, which experimental collaboration it comes from, and what kind of information it contains (hadronic collisions for particle physics analysis). We need to add this.

Enable web analytics

We would like to set up a workflow to get web analytics for the open data page

Statistical interpretation of results

(from #7 - @slaurila )

Currently, the final product of the analysis is a set of histograms that are interesting to inspect qualitatively, but there is no way to conclude whether we actually see some hint of the Higgs boson in the data or not. After all selections, we have roughly 2500 signal events and 75000 background events, which gives a naive expected significance of 2500/sqrt(75000) = 9.1 sigma. Such a large statistics-only value means that it is impossible to draw realistic conclusions without including the systematic uncertainties (at least the dominating ones). However, it might be useful to provide a script that allows one to calculate the significance properly, and to see how adding in systematic uncertainties for different processes (even if they are just ad-hoc numbers, some 10% here and 20% there) affects the significance. In practice, we would need something like this: http://dpnc.unige.ch/~sfyrla/teaching/Statistics/handsOn3.html
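
The effect is easy to demonstrate numerically with the approximate significance s/sqrt(b + (rel_syst*b)^2), where rel_syst is a relative uncertainty on the background (the numbers below just reuse the counts quoted above):

    import math

    s, b = 2500.0, 75000.0

    print(s / math.sqrt(b))  # ~9.1 sigma, statistics only

    for rel_syst in (0.01, 0.05, 0.10):
        z = s / math.sqrt(b + (rel_syst * b) ** 2)
        print(rel_syst, z)   # 1% -> ~3.1, 5% -> ~0.7, 10% -> ~0.3 sigma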

Updating and integration issues (slowness) at push

Hi @tiborsimko and @mvidalgarcia, I am not sure what is going on, but every time one pushes to the repository to update the site, the Travis integration fails as there is no rule for 'make html' (no makefile), and it seems that nothing gets updated (this seems to have been the case since the very initial move to mkdocs). However, in the past, I have noticed that after a while the modifications ARE propagated. This is troublesome right now because the delay is considerable and we are sort of running against time to populate this documentation and need to see the site (I can build it and see it locally, but still). I can't completely figure out how the modifications are propagated to OpenShift. It is probably through webhooks, but I might be just dead wrong. I'd love to play around, but I do not want to mess up anything. Could you please take a look? Thanks!

Update description of tabs

There are three main tabs to help you navigate the site. It starts with the **Computing Tools** most likely needed to deal with CMS open data. Then, there is a little review of **CMSSW**, which is the software used by CMS. Finally the **Analysis** section guides you through the different steps (in the most general order) that you need to follow for performing a particle physics analysis with CMS open data.

This gets outdated if someone adds a section tab. Maybe it can be rewritten to make it more general and avoid having to update it every time something changes.

tag and probe package update

The material that was in a pull request should be moved to a tutorial. We need to contact the authors and see if they want to update it for the 2022 tutorials, and remove it from here.

Trigger efficiency

(from #7 - @slaurila )

If we can find an existing measurement of the trigger efficiency of HLT_IsoMu17_eta2p1_LooseIsoPFTau20 for data and MC, would it make sense to publish it in the Open Data portal, e.g. as a json file? Then it would be very straightforward to apply the trigger efficiency scale factors in the analysis (again, this could be done by @anniinakinnunen starting from the json); see the sketch below. Alternatively, we should come up with a tag-and-probe example for measuring the trigger efficiency, which then moves this item to the "longer-term" list.
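
If such a measurement were published as a json, applying the scale factors could be as simple as the following sketch; the json layout (pt-binned efficiencies for data and MC) is hypothetical:

    import json

    with open("trigger_eff_HLT_IsoMu17_eta2p1_LooseIsoPFTau20.json") as f:
        eff = json.load(f)

    def trigger_sf(pt):
        # return eff_data/eff_mc for the pt bin containing this muon pt
        for b in eff["bins"]:
            if b["pt_low"] <= pt < b["pt_high"]:
                return b["eff_data"] / b["eff_mc"]
        return 1.0  # outside the measured range: no correction

    event_weight = trigger_sf(25.0)  # e.g. for a 25 GeV muon in MC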
