
cms-opendata-guide's People

Contributors

allanjales, asdru30, audrium, caredg, dbustamante98, jieunyoo, jmhogan, jrw46742, katilp, mattbellis, matthewbellis, mvidalgarcia, npervan, rodrigocampellos, sib37385, tiborsimko, topaklihuseyin


cms-opendata-guide's Issues

Updates for the x-section page

Thanks @jieunyoo for putting together the x-section page! I have some minor comments and suggestions:

  • doing cmsenv in the current containers should be avoided, as the environment is already set at startup, with some updates with respect to what cmsenv does. E.g. git clone https:... and curl work in the containers if cmsenv has not been run

  • it is not necessary to copy the data file into the container; the files are reachable directly with

    cmsRun ana.py inputFiles="root://eospublic.cern.ch//eos/opendata/cms/....

  • for ease of reading, and for a first try, we could also give the full command with an example file name (with the full file path, which one can get in the correct format from the record's file listing); see the sketch after this list

  • maybe a screenshot of the listing (or a verbatim copy of it) would be useful to have
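
As a concrete illustration, a minimal sketch of a configuration fragment that picks up the inputFiles argument, assuming the usual VarParsing pattern (the process name and maxEvents default are placeholders, and the actual file path should be taken from the record's file listing):

    # ana.py (sketch): read open data directly over XRootD, no local copy needed
    import FWCore.ParameterSet.Config as cms
    from FWCore.ParameterSet.VarParsing import VarParsing

    options = VarParsing('analysis')
    options.parseArguments()

    process = cms.Process("ANA")
    process.source = cms.Source(
        "PoolSource",
        # whatever is passed on the command line as inputFiles=... ends up here
        fileNames=cms.untracked.vstring(options.inputFiles)
    )
    process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(100))

It would then be run as above, e.g. cmsRun ana.py inputFiles="root://eospublic.cern.ch//eos/opendata/cms/<path-from-the-record-listing>".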

Maybe @caredg has some further suggestions?

Muon scale and momentum corrections

(from #7)
Information from the muon POG, for the Run 1 corrections:

  • pt < 200 GeV
  • pt > 200 GeV

Regarding the high pT muons in Run 1, there are no recommendations. At that time, there were not enough high pT muons to derive anything, or even to worry about it. This really became relevant during Run 2, when CMS could hope for a discovery in searches for particles at the TeV scale.

(For Run 2 high pT, the corrections are treated as a systematic uncertainty, as there are not enough statistics to derive precise corrections. You can find the recommendation here: https://twiki.cern.ch/twiki/bin/view/CMS/HighPtMuonReferenceRun2 )
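
For illustration only, a minimal sketch of how such a curvature-bias systematic could be propagated to a high-pT muon, shifting the curvature kappa = q/pT in the spirit of the generalized-endpoint method; the delta_kappa value below is a placeholder, NOT an official number, so take the real one from the TWiki above:

    # shift the muon curvature q/pT by delta_kappa (in 1/GeV) and
    # return the corresponding shifted pT
    def shifted_pt(pt, charge, delta_kappa):
        kappa = charge / pt
        return abs(charge / (kappa + delta_kappa))

    # vary the placeholder bias up and down for a 500 GeV mu+
    pt_kappa_up = shifted_pt(500.0, +1, +0.1e-3)    # ~476 GeV
    pt_kappa_down = shifted_pt(500.0, +1, -0.1e-3)  # ~526 GeV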

Add minimal set of links to the ROOT page

The CODP pages (e.g. docker guide) will link to this page for more information and further links.

Some cleaning of this page would be required. For the moment, a link to the ROOT tutorial page can be added, https://cms-opendata-workshop.github.io/workshop2021-lesson-preexercise-cpp-and-root/, although it is missing the simplest use case, i.e. opening an existing ROOT file and exploring its contents with TBrowser.

For the record, the new CODP getting started with 2015 data page (cernopendata/opendata.cern.ch#3165) does this, i.e. it shows how to open a file produced by POET and how to plot the pt distribution of electrons: http://opendata.cern.ch/docs/cms-getting-started-2015
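
For reference, the simplest use case could look like the following PyROOT sketch; the file, tree, and branch names ("myoutput.root", "events", "electron_pt") are placeholders, since a file produced by POET has its own structure, which TBrowser reveals:

    # open an existing ROOT file and explore its contents
    import ROOT

    f = ROOT.TFile.Open("myoutput.root")
    f.ls()                     # print the contents of the file

    browser = ROOT.TBrowser()  # browse the file interactively

    tree = f.Get("events")
    tree.Draw("electron_pt")   # quick plot of the electron pT distribution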

Analysis checklist

To-do list for the section "Analysis".
The topics can link to an existing analysis, e.g. to https://github.com/cms-opendata-analyses/HiggsTauTauNanoAODOutreachAnalysis (or its future extended version), to show an example.
The list below includes possible links to existing documentation which can be included in the text.

  • Data and Simulation

  • Selection
    It should flow as in a real analysis, where the first thing, generally, is to deal with the trigger (basically an object by itself), so it needs the #16 fix.

    • Write a brief introduction
    • Triggers: Explain what the trigger system is, how it works in CMS and how to extract the information with a few examples. See http://opendata.cern.ch/docs/cms-guide-trigger-system
      • Implement and explain the different available examples. See https://github.com/cms-legacydata-analyses/TriggerInfoTool. Maybe they will need separate pages in the doc. #17
      • Guide on general trigger information
      • Guide on getting prescales
      • Guide on how to get trigger objects and info
      • Guide on the basics of trigger matching
      • see suggestion in the context of the HTT example in #11
    • Objects: Describe, or point to a description of, what the physics objects are at the different levels if possible, i.e., MC truth and reco.
    • Object ID (here we can separate in different objects #19)
      • Muons
      • Electrons
      • Jets
      • MET
      • Photons
      • Taus
    • Additional corrections
      • jet energy correction
        • for MC (@slaurila): it would be best to apply the JEC already at the stage of NanoAOD production, and store the corrected jets as a separate collection. For the future, I think it would be useful to store the JEC uncertainties in the NanoAOD as well. Once the corrections are in the NanoAOD, applying them should be straightforward (just switch to the other jet collection), and this would also allow comparisons between corrected and non-corrected jets at the analysis stage (see the sketch after this list).
      • muon scale and momentum corrections (see #13)
      • electron scale and momentum corrections
      • tau (@slaurila): according to Tau POG, neither tau ID nor the energy scale needs a correction by default
  • Luminosity

  • Backgrounds

    • Techniques
    • QCD Estimation
    • for the example code suggestion see #9
    • Upper-limit Calculations
  • Systematics

    • @slaurila: Ideally, there would be recipes for estimating at least the dominating systematic uncertainties and including them in the results. But again, adding the machinery for a full systematics treatment (running different variations of the analysis for shape systematics, combining them into uncertainty bands in plots, etc.) will probably make the code considerably more complex, unless there is some new smart way of dealing with all of this in RDataFrame (see the sketch after this list).
    • Luminosity Uncertainties
    • MC Uncertainty
    • Object Uncertainty
    • Pileup Uncertainty
    • for the example code see #8
  • Interpretation

    • Statistics
    • for suggestions and considerations, see
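
To make the jet-correction and systematics points above concrete, a hedged RDataFrame sketch of switching between an uncorrected and a corrected jet collection and filling the same histogram twice; the branch names (Jet_pt, Jet_pt_corrected) are hypothetical placeholders for whatever the extended NanoAOD would store:

    import ROOT

    df = ROOT.RDataFrame("Events", "nanoaod.root")

    # the nominal and corrected versions differ only in the collection used
    h_nom = df.Define("sel_pt", "Jet_pt[Jet_pt > 30]") \
              .Histo1D(("jetpt_nom", "jet pT", 50, 0., 500.), "sel_pt")
    h_cor = df.Define("sel_pt_c", "Jet_pt_corrected[Jet_pt_corrected > 30]") \
              .Histo1D(("jetpt_cor", "jet pT", 50, 0., 500.), "sel_pt_c")

    # comparing the two (ratio or envelope) gives the uncertainty band
    h_nom.Draw()
    h_cor.Draw("same")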

cms-opendata-guide live site not updating (in openshift?)

@mvidalgarcia @tiborsimko I think the live site is not updating any longer. It seems that at the GitHub level everything builds OK, but somehow OpenShift isn't getting those updates. Could it be that the webhooks changed after the OpenShift upgrade? I do not think I can check anything about the configuration of this site on OpenShift, so I can't investigate further. Could you please help us? Thanks a lot.

Create introduction and 'how to use' instructions

According to #20, write the welcome intro on the Home page and instructions on how to use the site.
Here, the general idea of the structure of the site needs to be clear. It has to explain how to navigate and what to expect in terms of links and connections with the CERN Open Data Portal and the CMS public TWiki pages (mentioning explicitly their matching history).

Add Run 1 and Run 2 tabs to object pages

Add "Run 1 data" / "Run 2 data" tabs to object pages for the parts which are not common
For a proper display of subsections, they need to be placed under each subsection's title

  • Common tools
  • muons
  • electrons
  • photons
  • jets
  • met
  • tau

Documentation update checklist for future releases

Making a big checklist! The program lives in this Google doc and was presented in DPOA on 9/25/30.

EMPTY pages to fill:

  • Triggers: Start by summarizing and linking to workshop lessons
  • Luminosity: Get brilcalc examples; they exist. Try to remove the dropdown and have it as one page
  • #115
  • Interpretation: port workshop lessons / CMSDAS / Julie's EXO-23-006 summary for concepts. PyHF for calculation, as in #10
  • Systematics:
    • Lumi uncertainties: grab info and pubs from ~any of our papers
    • MC uncertainty: port Julie's info from EXO-23-006 as a starting point
    • Pileup uncertainty: grab info from pubs / TWikis. For 2016 this lives in correctionLib. For 2015 find docs
    • Lepton/photon uncertainties: write a basic description. For 2016 these are all in correctionLib. For 2015 and earlier find docs.
    • Idea to put a correctionLib instruction page here (see the sketch after this list)
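
As a seed for that instruction page, a minimal correctionLib sketch; the file name and correction key below are hypothetical, and the real ones come with the published correction files:

    import correctionlib

    cset = correctionlib.CorrectionSet.from_file("muon_sf.json")
    corr = cset["NUM_TightID_DEN_TrackerMuons"]  # hypothetical key

    # evaluate() takes the inputs defined in the JSON schema,
    # typically something like (eta, pt, variation)
    sf_nominal = corr.evaluate(1.2, 45.0, "nominal")
    sf_up = corr.evaluate(1.2, 45.0, "systup")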

Pages to rework / remove / big update:

  • FAQ: port things from the "troubleshooting" portal page over here, since we have better control over this repo for edits
  • Systematics --> Jet uncertainties: update into tabs for AOD/Mini/Nano, reference correctionLib for 2016.
  • Physics Objects: update into tabs for AOD/Mini/Nano and add new 2016 info. Do the "to-do"s in the pages
  • ID Efficiency Study: port most of this into a workshop-connected Carpentries site. Keep the basic tag-and-probe information and introduce the efficiency-based corrections that are detailed in Systematics. T&P would be the main way people can calculate their own scale factors.
  • Unix: add WSL2 box info with the notes about avoiding Git Bash for future container use.

Pages needing small updates / tweaks:

  • Finding Data: update workshop link and/or give multiple
  • ROOT: check links and provide a workshop lesson link
  • Docker: port intro explanations from the VM page. Amp up the tabs by linking workshop lessons
  • Data Model: eventually add 2016 and NanoAOD
  • Analyzer: add links to workshop lessons for Run 1 / Run 2 EDM files. Note POET 2011 vs 2012 branches. Eventually add 2016 info
  • Configuration: same notes as above
  • Conditions Data: check the examples for anything out of date. Expand the info at the bottom to include Run 2
  • Collision Data: can the warning label go?
  • MC Simulations: watch for cross section updates to the record pages and describe them here
  • Event Generation: update the example to show production of Mini and Nano from AOD.
  • About: add more contributor names.

QCD background measurement

(from #7 - @slaurila )

Currently the QCD background is estimated using a same-sign control region, and then normalized to the opposite-sign signal region with an ad-hoc transfer factor of 0.80. As suggested by Stefan, this transfer factor could and should be measured from a separate control region, as was done for the actual Run 1 result. However, having separate control regions for different purposes can make the code a bit messy, so it might be better to have a separate script to determine this transfer factor. See the sketch below.
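
In code, the current estimate amounts to something like the following sketch, where the same-sign (SS) data and non-QCD simulation histograms (h_data_ss, h_mc_nonqcd_ss) are hypothetical ROOT histograms:

    TRANSFER_FACTOR = 0.80  # current ad-hoc value, to be measured properly

    # QCD in the SS control region = data minus simulated non-QCD backgrounds
    h_qcd_ss = h_data_ss.Clone("qcd_ss")
    h_qcd_ss.Add(h_mc_nonqcd_ss, -1.0)

    # extrapolate to the opposite-sign signal region
    h_qcd_os = h_qcd_ss.Clone("qcd_os")
    h_qcd_os.Scale(TRANSFER_FACTOR)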

Add a link to docker tutorial (workshop)

For the CODP getting started pages, to avoid writing detailed technical instructions in the portal pages, it would be useful to have an open data guide page with links to the current docker tutorial.

Currently, the open data guide docker page points to the CODP docker page. This page needs rewriting, see cernopendata/opendata.cern.ch#3163

It would be better to have the open data guide docker page point to the latest tutorial: https://cms-opendata-workshop.github.io/workshop2021-lesson-docker/ It is far more complete than the CODP page and gives full instructions for different operating systems.

Or, if preferred, "import" the tutorial material to the guide. In both cases, for the 2015 data release, I will point for more information to the open data guide docker page.

Internal links from the public egamma page

The egamma public info page has the following internal links:

The information in these pages may have already been covered in the workshop tutorial and the POET electron code, but we should either make the pages public or add a mention to the page that the information is covered elsewhere.

Rename Frontier to Condition data

Rename the section "Frontier" to "Condition data".
Frontier refers to distributed database caching system use for the LHC, but we recommend reading them from /cvmfs, therefore the naming should not refer to Frontier
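
For context, reading the condition data from /cvmfs looks roughly like this in a job configuration, following the pattern used in the open data instructions; the global tag below is an example for 2012 data, and the right one is quoted on each record page:

    import FWCore.ParameterSet.Config as cms

    process = cms.Process("ANA")
    process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')

    # the condition data are read from /cvmfs, not from Frontier
    process.GlobalTag.connect = cms.string(
        'sqlite_file:/cvmfs/cms-opendata-conddb.cern.ch/FT53_V21A_AN6_FULL.db')
    process.GlobalTag.globaltag = 'FT53_V21A_AN6::All'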

How to get started page

We might need a section on how to get started. If someone is completely new to this, she might not know what to do first.

CMSSW checklist

To-do list for the section "CMSSW".
The topics can link to an existing analysis or a tool, e.g. to https://github.com/cms-opendata-analyses/AOD2NanoAODOutreachTool or to https://github.com/cms-legacydata-analyses/PhysObjectExtractorTool, to show an example.
The list below includes possible links to existing documentation which can be included in the text.

Pile-up corrections for MC

(from #7 - @slaurila )

Pileup corrections for MC: Three steps are needed to apply pileup corrections to MC in the HTT analysis:

  • Publishing the pileup distribution jsons for 2011/2012 data in the open data portal (could be skimmed to contain only the runs published as open data?). The pileup jsons can be found here.
  • Adding the number of interactions (TrueNumInteractions variable, see here) to open data NanoAOD
  • Implementing the reweighting in the HTT analysis code: first calculate the PU distribution in data by combining the pileup json with the golden json, then extract the PU distribution from the MC samples and calculate the weights as a function of TrueNumInteractions, and finally apply these weights to MC (see the sketch after this list)
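
The third step could look roughly like the following PyROOT sketch, assuming the data and MC pileup distributions have been saved as histograms (the file and histogram names are placeholders):

    import ROOT

    f_data = ROOT.TFile.Open("pileup_data.root")  # from pileup json + golden json
    f_mc = ROOT.TFile.Open("pileup_mc.root")      # TrueNumInteractions from MC

    h_data = f_data.Get("pileup")
    h_mc = f_mc.Get("pileup")

    # normalise both to unit area and take the bin-by-bin ratio
    h_data.Scale(1.0 / h_data.Integral())
    h_mc.Scale(1.0 / h_mc.Integral())
    h_weights = h_data.Clone("pu_weights")
    h_weights.Divide(h_mc)

    # per-event weight for an MC event
    def pu_weight(true_num_interactions):
        return h_weights.GetBinContent(h_weights.FindBin(true_num_interactions))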

General Stuff Checklist

While not the heart of the documentation, these are important to fill out:

  • Write a brief introduction to the guide (front page)
  • Write the How to use this guide section in the front page
  • Take care of the FAQ
  • Take care of the About section

Add a disclaimer

Remove "official" in the welcome sentence "Welcome to the official guide for CMS open data".

Add a disclaimer in line with https://www.gnu.org/licenses/gpl-3.0.en.html or similar, to make it clear that this guide is provided by the CMS open data group on a best-effort basis.

Maybe:

Welcome to the CMS open data guide.

This guide is brought to you by the CMS open data group, on a best-effort basis. All software and instructions are provided "as is", without warranty of any kind. This is ongoing work and we appreciate your feedback and/or your help in building this guide.

@caredg your thoughts?

Update physics object page with 2015 and MINIAOD

Update https://cms-opendata-guide.web.cern.ch/analysis/selection/objects/ with 2015 and MINIAOD information so that it can be linked directly from the open data portal getting started with the 2015 data page

Needed for the updated portal getting started page, see cernopendata/opendata.cern.ch#3165

Note that https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2015 exists.
Many of the links on that page point to CMSSW_7_4_X, the 2015 OD is CMSSW_7_6_X but the information is likely valid.
It would be good to check if the code snippets are similar to those in POET.

(Where's the corresponding up-to-date page for UL MINIAOD?
The tables for WorkBookMiniAODYYYY in https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD go only up to CMSSW_9_4_X and do not have a page for 2018: UL MiniAODv1, so CMSSW_10_6_X.
The NanoAOD is in https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookNanoAOD and is updated for the UL.)

Computing tools checklist

To-do list for the first section "Computing tools".
None of these topics needs new example code.
The list below includes possible links to existing documentation which can be included in the text.

Missing information about the CMS collaboration

On the front page of the guide, there is no information about what kind of data CMS open data is, which experimental collaboration it comes from, and what kind of information it contains (hadronic collisions for particle physics analysis). We need to add this.

Enable web analytics

We would like to set up a workflow to get web analytics for the open data page

Statistical interpretation of results

(from #7 - @slaurila )

Currently, the final product of the analysis is a set of histograms that are interesting to inspect qualitatively, but there is no way to conclude whether we actually see some hint of the Higgs boson in the data or not. After all selections, we have roughly 2500 signal events and 75000 background events, which gives a naive expected significance of 2500/sqrt(75000) = 9.1 sigma. Such a large statistics-only value means that it is impossible to draw realistic conclusions without including the systematic uncertainties (at least the dominating ones). However, it might be useful to provide a script that allows one to calculate the significance properly, and to see how adding in systematic uncertainties for different processes (even if they are just ad-hoc numbers, some 10% here and 20% there) affects the significance. In practice, we would need something like this: http://dpnc.unige.ch/~sfyrla/teaching/Statistics/handsOn3.html
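
The effect is easy to demonstrate numerically with the approximate significance s/sqrt(b + (rel_syst*b)^2), where rel_syst is a relative uncertainty on the background (the numbers below just reuse the counts quoted above):

    import math

    s, b = 2500.0, 75000.0

    print(s / math.sqrt(b))  # ~9.1 sigma, statistics only

    for rel_syst in (0.01, 0.05, 0.10):
        z = s / math.sqrt(b + (rel_syst * b) ** 2)
        print(rel_syst, z)   # 1% -> ~3.1, 5% -> ~0.7, 10% -> ~0.3 sigma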

Updating and integration issues (slowness) at push

Hi @tiborsimko and @mvidalgarcia, I am not sure what is going on, but every time one pushes to the repository to update the site, the Travis integration fails as there is no rule for 'make html' (no makefile), and it seems that nothing gets updated (this seems to have been the case since the very initial move to mkdocs). However, in the past, I have noticed that after a while the modifications ARE propagated. This is troublesome right now because the delay is considerable and we are sort of running against time to populate this documentation and need to see the site (I can build it and see it locally, but still). I can't completely figure out how the modifications are propagated to OpenShift. It is probably through webhooks, but I might be just dead wrong. I'd love to play around, but I do not want to mess up anything. Could you please take a look? Thanks!

Update description of tabs

There are three main tabs to help you navigate the site. It starts with the **Computing Tools** most likely needed to deal with CMS open data. Then, there is a little review of **CMSSW**, which is the software used by CMS. Finally the **Analysis** section guides you through the different steps (in the most general order) that you need to follow for performing a particle physics analysis with CMS open data.

This gets outdated if someone adds a section tab. Maybe it can be rewritten to make it more general and avoid having to update it every time something changes.

tag and probe package update

The material that was in a pull request should be moved to a tutorial. We need to contact the authors and see if they want to update it for the 2022 tutorials, and remove it from here.

Trigger efficiency

(from #7 - @slaurila )

If we can find an existing measurement of the trigger efficiency of HLT_IsoMu17_eta2p1_LooseIsoPFTau20 for data and MC, would it make sense to publish it in the Open Data portal, e.g. as a json file? Then it would be very straightforward to apply the trigger efficiency scale factors in the analysis (again, this could be done by @anniinakinnunen starting from the json); see the sketch below. Alternatively, we should come up with a tag-and-probe example for measuring the trigger efficiency, which then moves this item to the "longer-term" list.
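
If such a measurement were published as a json, applying the scale factors could be as simple as the following sketch; the json layout (pt-binned efficiencies for data and MC) is hypothetical:

    import json

    with open("trigger_eff_HLT_IsoMu17_eta2p1_LooseIsoPFTau20.json") as f:
        eff = json.load(f)

    def trigger_sf(pt):
        # return eff_data/eff_mc for the pt bin containing this muon pt
        for b in eff["bins"]:
            if b["pt_low"] <= pt < b["pt_high"]:
                return b["eff_data"] / b["eff_mc"]
        return 1.0  # outside the measured range: no correction

    event_weight = trigger_sf(25.0)  # e.g. for a 25 GeV muon in MC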
