
Comments (11)

julienguy commented on August 20, 2024

The flavor option of quickbrick should also include a 'mix' (or 'science') flavor, to allow testing of target class identification.

sbailey commented on August 20, 2024

I was embarrassed to tell the Milky Way Survey people that "newexp-desi --flavor science" was only for the dark time survey and was not very inclusive of their science. How very Berkeley of me. We should avoid "science" as meaning only the dark time survey, and use keys like "dark" and "bright" to mean various dark time survey or bright time survey mixes of object classes.

"flavor" was borrowed from SDSS exposure flavors (arc, flat, science...). Perhaps this option is better named "type" or "class" rather than "flavor"?

dkirkby commented on August 20, 2024

How about saving the true flux as an optional HDU in the standard brick format, to avoid adding three extra files? The same strategy could be used for the truth, but the truth would then be duplicated for each camera, so a separate file makes more sense there.

Note that the truth consists of more than TRUEZ. For example, which template was used for the simulation, was a DLA added, etc. I suggest defining a few required binary table columns and then allowing arbitrary additional columns.
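Something along these lines could work for the true flux (a sketch only; the TRUEFLUX extension name and the brick filename are made up):

import numpy as np
from astropy.io import fits

# Placeholder for the noiseless simulated spectra, shape (nspec, nwave).
true_flux = np.zeros((100, 4000), dtype=np.float32)

# Append the true flux as an extra image HDU alongside the existing brick HDUs.
hdu = fits.ImageHDU(true_flux, name='TRUEFLUX')  # hypothetical extension name
with fits.open('brick-b-testbrick.fits', mode='append') as hdulist:  # hypothetical brick filename
    hdulist.append(hdu)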

sbailey commented on August 20, 2024

We had considered packing the truth into extra HDUs or columns of the original brick files, but we also wanted the ability to temporarily hide the truth on a validation sample while still having it available for later comparison. It seemed odd to define that sometimes the extra truth information would be in the original files while other times it would be in a different file. That tipped the scales for keeping it in a separate file from the start.

Working through this does make me wonder if splitting the brick files into b vs. r vs. z is unnecessary. Grouping them together would mean extnames like FLUX_B, IVAR_R, RESOLUTION_Z, etc. and 15 HDUs per file instead of 3 files of 5 HDUs each, but that doesn't seem too crazy. Or we could even use the more flexible hierarchy of hdf5 (gasp) ... but that is beyond the scope of this particular ticket. For the most part the I/O of this should be an implementation detail at the boundaries of this code rather than baked into how it is fundamentally done.

Agreed about the required columns + optional columns. I'm not sure how to express that cleanly for truth files that contain a mix of target classes. OIIFLUX makes sense only for ELGs, while DLA makes sense only for QSOs, while REDSHIFT applies to both. Maybe pad with NaNs or default values if the column doesn't apply to that target type?

dkirkby commented on August 20, 2024

It sounds like there are two use cases, which nudge the design in different directions:

  • Simulations for pipeline development, where you want convenient and consistent access to the truth at each stage.
  • Blind data challenges where the data should look just like data and truth should be securely hidden from file consumers.

I recommend designing for whichever of these we expect to be used most heavily, to keep the design responsive for most users at the cost of extra overhead for the edge case. If neither scenario is an edge case, they might need different solutions.

sbailey commented on August 20, 2024

Julien made similar comments. Simulation for pipeline development is the primary use case, so it seems I should give up on the purity of a single data model for both cases and allow the truth to be either in the original data file or in a separate file.

moustakas commented on August 20, 2024

This is a great suggestion. A few comments:

  • I prefer the idea of keeping "truez" as a separate file. In addition to OIIFLUX (I'm pushing against O2FLUX, @sbailey -- these are forbidden transitions, not molecules!), we'll want to keep things like D4000, Teff, metallicity, etc. -- all the physical parameters inherited from the templates that we may want to look at when doing our redshift success post-mortem. Padding with -1 or some other null value if the variable doesn't apply to the given template should be fine.
  • It would also be useful for some analyses, especially as we are developing our pipeline and redshift-fitting code, to be able to access the original-resolution template. For example, as we start making and examining QAplots of successes and failures we'll want to look at the convolved/noisy spectrum, the best-fitting model (from the redshift-fitter), and the true template/spectrum. This will help diagnose spectral feature/break confusion and other failure modes that are hard to disentangle from z vs z plots.
  • We also need to carry forward the photometry/colors in "truth". This is a problem for the QSOs templates right now because we don't have a robust way of predicting the WISE fluxes.

dkirkby commented on August 20, 2024

We could easily end up with tens of truth parameters associated with each template class, and the set of parameters (and possibly also template classes) will likely evolve. I think this argues for some sort of (key, value) dictionary data structure, with some required keys (TRUEZ, CLASS, ...) and minimal restrictions on additional optional keys.

The dictionary would naturally map onto a FITS HDU header, either in the brick file or an associated file, but other choices are also possible. I believe this covers the important use cases:

  • Code reading bricks and treating them like data can ignore the optional brick HDU or associated file.
  • Code reading bricks and treating them as generic simulations can count on required keys being present for generic analysis, and determine the list of available optional keys by introspection.
  • Code reading bricks simulated for a specific class knows what optional keys to expect and can provide class-specific analysis.

Are there any other use cases we should be considering?
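As a rough sketch of that introspection pattern (all key names other than TRUEZ and CLASS are hypothetical):

# One truth dictionary per object: a few required keys plus arbitrary optional ones.
REQUIRED_KEYS = {'TRUEZ', 'CLASS'}

truth = {'TRUEZ': 1.13, 'CLASS': 'ELG', 'OIIFLUX': 8e-17, 'D4000': 1.2}

# Generic analysis code relies only on the required keys being present...
assert REQUIRED_KEYS <= truth.keys()
print(truth['CLASS'], truth['TRUEZ'])

# ...and discovers whatever optional keys happen to be present by introspection.
optional_keys = sorted(truth.keys() - REQUIRED_KEYS)
print('optional truth keys:', optional_keys)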

sbailey commented on August 20, 2024

I might be missing something about @dkirkby's suggestion, but I don't think it maps onto a FITS HDU header. For a given truth parameter, there will be one value per object for the objects to which it applies. HDU headers only naturally accommodate one value per parameter per HDU.

However, metadata tables for different template classes with different truth columns can still be combined using astropy.table.vstack([elgmeta, lrgmeta, ...], join_type='outer'), e.g.:

In [25]: from astropy.table import Table, vstack

In [26]: x = Table(dict(id=[1,2,3], a=[1,2,3], b=[4,3,2]))

In [27]: y = Table(dict(id=[4,5,6], a=[3,4,5], c=[2,3,4]))

In [28]: vstack([x, y], join_type='outer')
Out[28]: 
<Table masked=True length=6>
  a     b     id    c  
int64 int64 int64 int64
----- ----- ----- -----
    1     4     1    --
    2     3     2    --
    3     2     3    --
    3    --     4     2
    4    --     5     3
    5    --     6     4

Detail: join_type='outer' is different from what you get using astropy.table.join(), which is more DB-like but doesn't handle the common columns the way we want for this case.

The masked values unfortunately don't survive a roundtrip to a FITS file -- they end up as zeros instead of masked values. We might be able to get them to be NaNs instead.
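For example, something like this might do it (a sketch using illustrative column names from this thread; the per-class columns need to be floats to hold NaN):

import numpy as np
from astropy.table import Table, vstack

# Illustrative per-class metadata tables; the class-specific truth columns are
# floats so that NaN can stand in for "does not apply to this target class".
elgmeta = Table(dict(TARGETID=[1, 2], TRUEZ=[1.1, 0.9], OIIFLUX=[8e-17, 5e-17]))
qsometa = Table(dict(TARGETID=[3, 4], TRUEZ=[2.3, 2.7], DLA=[1.0, 0.0]))
truth = vstack([elgmeta, qsometa], join_type='outer')  # masked where a column doesn't apply

# Fill the per-class columns so masked entries become NaN rather than
# silently turning into zeros on the FITS roundtrip.
for name in ('OIIFLUX', 'DLA'):
    truth[name] = truth[name].filled(np.nan)

truth.write('truth.fits', overwrite=True)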

There is a prototype implementation of bin/quickbrick on the quickbrick branch of desispec. For the truth table, it uses the metadata table returned by desisim.templates.XYZ.make_templates(), adds OBJTYPE and TARGETID columns, and writes the result as an EXTNAME='_TRUTH' HDU in the brick file. This v0 only supports one objtype per file, but multiple objtypes could be supported by stacking the metadata tables as described above.
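Roughly, that truth-HDU step looks like the following (a sketch, not the actual quickbrick code; the metadata columns and brick filename are placeholders):

import numpy as np
from astropy.io import fits
from astropy.table import Table

# 'meta' stands in for the metadata table returned by make_templates();
# the columns shown here are illustrative only.
meta = Table(dict(TRUEZ=[1.07, 0.83], OIIFLUX=[8e-17, 5e-17]))

# Add the bookkeeping columns described above.
meta['OBJTYPE'] = np.full(len(meta), 'ELG')
meta['TARGETID'] = np.arange(len(meta))

# Append the truth table to the brick file as an EXTNAME='_TRUTH' HDU.
truth_hdu = fits.table_to_hdu(meta)
truth_hdu.name = '_TRUTH'
with fits.open('brick-b-testbrick.fits', mode='append') as hdulist:  # hypothetical brick filename
    hdulist.append(truth_hdu)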

To do before it will be ready for a pull request:

  • Simulate at high resolution and then downsample (slightly tricky for the resolution matrix data; specsim already supports this for the spectra themselves)
  • Sanity check on the S/N -- I generated 100 ELGs and ran them through redmonster using desispec/bin/desi_zfind.py and only one failed, which smells too good.
  • Add the ability to include multiple object types in a single output brick (though even without this, quickbrick could be useful).
  • Also missing but not strictly necessary (yet): ability to write truth to a separate file.

Even though it may be handy to write the truth table to the original brick files, I do want to retain the option to write it elsewhere to ensure that for final tests the code can't cheat, e.g. by using a truth value to make a quick photo-z prior.

dkirkby commented on August 20, 2024

I agree this doesn't fit the model of an HDU header and like Stephen's suggestion.

I wasn't thinking of the case where we merge different simulated template classes into the same output file. I suspect we won't be doing that often, but it's useful to have the capability. In that case we definitely need a required 'CLASS' string (?) column, in addition to 'ZTRUE', to specify which optional columns are expected, and perhaps also a 'CLASSVERSION' column to support schema evolution.
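For example, a reader could key the expected optional columns off those two fields (a hypothetical sketch; none of these column sets are actually defined yet):

from astropy.table import Table

# Hypothetical mapping from (CLASS, CLASSVERSION) to the optional truth columns
# a reader should expect for that class and schema version.
EXPECTED_COLUMNS = {
    ('ELG', 1): ['OIIFLUX', 'D4000'],
    ('QSO', 1): ['DLA'],
}

def check_truth_schema(truth):
    """Check that every class present in the truth table has its expected columns."""
    for cls, version in set(zip(truth['CLASS'], truth['CLASSVERSION'])):
        missing = [c for c in EXPECTED_COLUMNS.get((cls, version), [])
                   if c not in truth.colnames]
        if missing:
            raise KeyError('missing truth columns for {} v{}: {}'.format(cls, version, missing))

# Example: an ELG truth table with the expected columns passes silently.
truth = Table(dict(CLASS=['ELG'], CLASSVERSION=[1], TRUEZ=[1.1], OIIFLUX=[8e-17], D4000=[1.3]))
check_truth_schema(truth)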

weaverba137 commented on August 20, 2024

I'm running quickbrick at NERSC right now, so I'm guessing this issue has been resolved?
