
optimization's People

Contributors

chiara-rizzi, kratsg, lawrenceleejr, mattleblanc


optimization's Issues

Add documentation about the `mul_bbb` error

NotImplementedError: couldn't find matching opcode for 'mul_bbb'

This occurs because of multiplication between two booleans. The cut needs to be written as 1*({0} < m_effective)*(m_effective < {0}+200) instead, so that one factor is promoted to an integer and the product is no longer boolean*boolean.
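
For context, a minimal sketch of the failure and the workaround, assuming the selection strings are evaluated with numexpr (the library whose missing 'mul_bbb' opcode produces this error); the concrete numbers below stand in for the {0} placeholder:

import numpy as np
import numexpr as ne

m_effective = np.array([150.0, 350.0, 450.0])

# Raises "NotImplementedError: couldn't find matching opcode for 'mul_bbb'"
# because the product of two comparisons is boolean * boolean:
#   ne.evaluate('(100 < m_effective) * (m_effective < 100 + 200)')

# Workaround: multiply by 1 so one factor is an integer and the product is
# integer * boolean, which numexpr does support.
passed = ne.evaluate('1 * (100 < m_effective) * (m_effective < 100 + 200)')
print(passed)  # [1 0 0]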

Figure out which of the 4 SRs is optimal for a given signal

@swiatlow: let's talk about signal regions now?

Max Swiatlowski [11:22 AM]
Cool

Max Swiatlowski [11:22 AM]
Did you see my slides?

Giordon Stark [11:23 AM]
the new ones?

Giordon Stark [11:23 AM]
with the 4 signal regions?

Max Swiatlowski [11:23 AM]
Yeah

Giordon Stark [11:23 AM]
I don't understand the 4 particularly.

Max Swiatlowski [11:23 AM]
4?

Giordon Stark [11:23 AM]
do you mean 2 signal regions for low lumi, and 2 for high lumi?

Max Swiatlowski [11:24 AM]
Yup

Max Swiatlowski [11:24 AM]
Low boost/ high boost

Max Swiatlowski [11:24 AM]
It's very approximate

Max Swiatlowski [11:24 AM]
The first thing to do is just implement these 4 SRs

Max Swiatlowski [11:25 AM]
Then check, for each lumi, which SR is best for each point

Max Swiatlowski [11:25 AM]
And what the significance is

Giordon Stark [11:25 AM]
so a SR == a cut?

Max Swiatlowski [11:25 AM]
If the sig is very different from the OPTIMAL you found... We might need to adjust, or add a new SR, etc.

Max Swiatlowski [11:25 AM]
SR is a set of cuts

Max Swiatlowski [11:26 AM]
It's the table on slide 15 or whatever

Giordon Stark [11:26 AM]
err, a SR = a supercut?

Max Swiatlowski [11:26 AM]
Yeah with all pivot

Max Swiatlowski [11:26 AM]
Just fixed cuts

Giordon Stark [11:26 AM]
gotcha. No problem.

Max Swiatlowski [11:26 AM]
Then, you check which SR is best at each point, etc etc

Giordon Stark [11:26 AM]
so I run all 4 supercuts I have

Max Swiatlowski [11:27 AM]
Yeah

Giordon Stark [11:27 AM]
and just look at all the significances reported for a given signal region

Giordon Stark [11:27 AM]
find which of the 4 maximizes the significance

Giordon Stark [11:27 AM]
and just make a grid showing 1,2,3,4

Giordon Stark [11:27 AM]
basically saying which one maximized that box?

Max Swiatlowski [11:27 AM]
Yup!
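
A sketch of the bookkeeping agreed on above, assuming the per-point significances for each of the 4 fixed SRs have already been collected into dictionaries; the data layout and names are illustrative, not the tool's actual output format:

def best_sr_per_point(sr_significances):
  # sr_significances: {sr_name: {signal_point: significance}}
  points = set().union(*(sigs.keys() for sigs in sr_significances.values()))
  best = {}
  for point in points:
    # pick the SR with the largest significance at this signal point
    best[point] = max(sr_significances,
                      key=lambda sr: sr_significances[sr].get(point, 0.0))
  return best

# best = best_sr_per_point({'SR1': {...}, 'SR2': {...}, 'SR3': {...}, 'SR4': {...}})
# then draw the grid, labelling each signal point with the winning SR (1-4).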

Parse DSID from directory-based structure as well

#@echo(write=logger.debug)
did_regex = re.compile(r'\.?(?:00)?(\d{6,8})\.?')
def get_did(filename):
  # Try to pull the 6-8 digit dataset ID (DSID) out of the parent directory
  # name first; fall back to the file name, and finally to the file name as-is.
  parts = filename.split("/")
  m = did_regex.search(parts[-2]) if len(parts) > 1 else None
  if m is None:
    logger.warning("Can't figure out the DID! Trying format II ...")
    m = did_regex.search(parts[-1])
  if m is None:
    logger.warning("Can't figure out the DID! Using input filename ...")
    return parts[-1]
  return m.group(1)
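
For illustration, with hypothetical paths and DSIDs (format II being the fall-back on the file name itself):

get_did('mc15_13TeV.370100.Gtt_output/tree.root')       # '370100', from the directory name
get_did('some_plain_dir/user.foo.370100._000001.root')  # '370100', from the file name (format II)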

Statistically insignificant warnings

We should make sure that it's clear to the user (with a flag, or just an abort, or something) when B (or S, though B is more likely) is statistically insignificant.

Max Swiatlowski [4:42 PM]
People often do this by requiring B(unweighted) > 10 or something, for example
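
A minimal sketch of such a guard, assuming the raw (unweighted) background count is available per cut; the threshold of 10 follows the suggestion above and the function name is hypothetical:

import logging
logger = logging.getLogger(__name__)

MIN_UNWEIGHTED_BKGD = 10  # require B(unweighted) > 10, as suggested above

def warn_if_insignificant(n_bkgd_raw):
  # Flag cuts whose background estimate rests on too few raw MC events.
  if n_bkgd_raw <= MIN_UNWEIGHTED_BKGD:
    logger.warning('only %d unweighted background events pass this cut; '
                   'its significance is statistically unreliable', n_bkgd_raw)
    return True
  return False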

Change insignificance

Just to write this down, we decided in the meeting to optimize towards 0.5 for insignificance

Aviv Cukierman [12:55 PM]
In optimize the insignificance for signal and background is the same number

Aviv Cukierman [12:55 PM]
Just --insignificance

Convenience functions for calculating some cuts manually

Let's say you're given a series of output hashes that contain cuts. You should be able to do something like

from optimize import *
cut = load_cut('/path/to/hash.json')
trees = get_ttrees('....')
signal = get_signal(...)
bkgd = get_bkgd(...)
apply_cut(...)

Example JSON dict usage

[
  {
    "branch": "multiplicity_jet",
    "min": 2,
    "max": "...",
    "stepSize": 1
  },
  {
    "branch": "pt_jet_rc8_1",
    "min": 50.0,
    "max": 250.0,
    "stepSize": 2.5
  }
]
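
A sketch of how one such entry expands into scanned cut values, assuming min is inclusive and the scan advances by stepSize (numpy's arange semantics; the exact edge convention in the tool may differ):

import numpy as np

entry = {"branch": "pt_jet_rc8_1", "min": 50.0, "max": 250.0, "stepSize": 2.5}
cut_values = np.arange(entry["min"], entry["max"], entry["stepSize"])
# array([ 50. ,  52.5,  55. , ..., 247.5]) -- one candidate cut per value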

Allow counting of branches that pass a certain cut

Given a list of branches (e.g. m_jet_largeR_0, m_jet_largeR_1, etc.), and a cut (e.g. ">200", or a window cut if that gets supported), create a value that counts the number of those branches that passed that cut. Then, allow us to cut on that value.

Perhaps in supercuts allow a flag called "derived" where you specify:
  • a list of branches
  • an initial cut (start, step, stop), e.g. [200;100;500] (or the initial cut could be fixed)
  • a final cut (start, step, stop), e.g. [0;1;4]

Note that if TA only stores the top 4 large-R jets (as it currently does), then the final cut could only go from 0 to 4. But I think that would be fine (e.g. top tagging is usually 0-1).
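
A sketch of the derived quantity with numpy, assuming the listed branches have been read into arrays with one entry per event; the function name is illustrative:

import numpy as np

def count_passing(branch_arrays, threshold):
  # Per event, count how many of the given branches exceed the threshold.
  return np.sum(np.vstack([arr > threshold for arr in branch_arrays]), axis=0)

# n_boosted = count_passing([m_jet_largeR_0, m_jet_largeR_1,
#                            m_jet_largeR_2, m_jet_largeR_3], 200.0)
# ...and then cut on n_boosted itself, e.g. require n_boosted >= 1.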

Replace top tagging with mass cuts

 "selections": "((m_jet_largeR_1 > {0})+ (m_jet_largeR_2 > {0}) + (m_jet_largeR_3 > {0}) + (m_jet_largeR_4 > {0})) > {1}",
  "st3": [
    [100, 250, 50],
    [0, 3, 1]
  ]

% increase in significance plots

We would like a utility that is able to do this. The best idea is to separate the plotting functionality into two pieces.

One piece produces a plain-text file of the points and values at each point, while another piece creates the actual grid. This allows us to diff the text files separately without having to remake them... and makes this portion easier to do.
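
A sketch of the proposed split, with a hypothetical file layout: one step dumps plain-text (x, y, value) rows per grid point, a separate step reads them back, so two dumps can be diffed or combined into a percent-increase grid without rerunning anything:

def dump_points(points, path):
  # points: {(x, y): value}, e.g. significance at each signal grid point
  with open(path, 'w') as f:
    for (x, y), value in sorted(points.items()):
      f.write('{0} {1} {2}\n'.format(x, y, value))

def load_points(path):
  points = {}
  with open(path) as f:
    for line in f:
      x, y, value = line.split()
      points[(float(x), float(y))] = float(value)
  return points

# percent increase between two dumps, point by point:
# {p: 100.0 * (new[p] - old[p]) / old[p] for p in old if p in new}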

pdfs plz

for 3.1 results, all files in pdf format for INT note plz

New workflow

  • Create significances as a dictionary whose keys are the hash of the cut (see the sketch after this list).
  • Run over a set of files to produce a set of event counts for the series of cuts specified by supercuts.
  • Potentially incorporate the calculation of significance into the plotting script.
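
A sketch of the hash-keyed bookkeeping for the first bullet, assuming a cut is fully described by a JSON-serialisable dict; the md5-of-sorted-JSON scheme is an assumption, not necessarily the tool's actual hash:

import hashlib
import json

def cut_hash(cut):
  # Serialise with sorted keys so the same cut always yields the same hash.
  return hashlib.md5(json.dumps(cut, sort_keys=True).encode('utf-8')).hexdigest()

significances = {}
# significances[cut_hash(cut)] = {'signal': ..., 'bkgd': ..., 'significance': ...}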

Add window cuts

Add another option to the cuts JSON to flag a branch for exploration as a window cut. This is pretty easy to do in practice: you just have two iterators, one for the start of the window and one for the end. Start with the iterators some MINWINDOW size apart from each other; advance the left iterator to sweep all available windows, reset the left iterator, advance the right iterator by a step, and repeat. You can have an option to search for signal inside the window or outside it (99% of the time we want inside, but just to be sure).
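
A sketch of that two-iterator sweep; MINWINDOW, the step, and the edge conventions are placeholders, and whether to select inside or outside the window is left to the caller:

def window_cuts(lo, hi, step, min_window):
  # For each right edge, sweep the left edge over every window at least
  # min_window wide, then advance the right edge by one step and repeat.
  right = lo + min_window
  while right <= hi:
    left = lo
    while right - left >= min_window:
      yield (left, right)  # e.g. select events with left < branch < right
      left += step
    right += step

# list(window_cuts(0, 300, 100, 100))
# -> [(0, 100), (0, 200), (100, 200), (0, 300), (100, 300), (200, 300)]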

This is mostly useful for mass cuts on the re-clustered jets: would be good to see whether a window improves anything over a simple cut.

I would not worry about extending the generator to this case; this is something we can specify manually, when we have a good reason to expect a particular branch to want a window.

Find max significance for a subset of the significances

E.g.: load in the supercuts file, ask the user which cuts to fix and at what values, and then just generate the hashes. A workflow (sketched below) would be:

  1. new supercuts
  2. load it in
  3. loop over cuts
  4. calculate hash
  5. look up hash in significances file
  6. dump it into output

Giordon Stark [8:26 AM]
this gives us a speed up because we don't spend time recomputing cuts.
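
A sketch of steps 3-6, assuming the significances file is a JSON object keyed by the cut hash (same hashing as in the sketch under "New workflow" above); generating the cuts from the new supercuts file is left to the existing machinery:

import hashlib
import json

def cut_hash(cut):
  return hashlib.md5(json.dumps(cut, sort_keys=True).encode('utf-8')).hexdigest()

def lookup_subset(cuts, significances_path, output_path):
  # cuts: iterable of cut dicts generated from the new (partially fixed) supercuts
  with open(significances_path) as f:
    significances = json.load(f)  # {hash: {'significance': ..., ...}}
  found = {}
  for cut in cuts:
    h = cut_hash(cut)
    if h in significances:  # already computed in the original scan
      found[h] = significances[h]
  with open(output_path, 'w') as f:
    json.dump(found, f, indent=2)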

Add simple maths support

Would be good to add simple branch math support. This is the ability to specify in the JSON that we want to scan over a variable like "BRANCH1 + BRANCH2".

I'm not sure about the best way to do this, but TTree::Draw's branch manipulation syntax is very good at doing this sort of thing. I'm sure numpy has something clever as well.
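
A sketch with numexpr (numpy would work equally well), assuming the needed branches have already been read into arrays; the branch names are the placeholders from the text:

import numpy as np
import numexpr as ne

branches = {'BRANCH1': np.array([1.0, 2.0, 3.0]),
            'BRANCH2': np.array([10.0, 20.0, 30.0])}

# Evaluate the expression string against the branch arrays; the result can be
# scanned over exactly like a stored branch.
derived = ne.evaluate('BRANCH1 + BRANCH2', local_dict=branches)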

Make luminosity/scaling part of optimize rather than cuts

Running `cuts` takes a long time; running `optimize` takes little time. If we want to quickly check different levels of luminosity (which we will, e.g. when we learn the final luminosity for 2015), it makes more sense to make luminosity part of `optimize`. Might as well make all scaling part of `optimize` as well, so that `cuts` only returns a value weighted by event weights and no weights file is necessary as an argument.
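
A sketch of the split, assuming `cuts` only ever returns event-weighted counts and all luminosity/scale factors are applied in `optimize`; the function and argument names are placeholders:

def scale_counts(weighted_count, luminosity, scale_factor=1.0):
  # weighted_count: sum of per-event MC weights passing the cut (output of `cuts`)
  # luminosity:     target integrated luminosity to scale to
  return weighted_count * luminosity * scale_factor

# Checking a different luminosity is then just a cheap rescaling in `optimize`:
# s = scale_counts(signal_weighted, lumi); b = scale_counts(bkgd_weighted, lumi)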
