keck-datareductionpipelines / framework

Project management and simple code repository for the DRP framework project

Python 50.81% Shell 0.39% Jupyter Notebook 48.80%

framework's Issues

Create simulated data sets (FITS files following the header ICD)

Generate a set of simulated data files (FITS files) that can be used to test our data model. These simulated data should cover a range of possible formats, including formats that represent raw data from various instruments and formats that might exist only in intermediate processing steps.

One goal is to cover as many cases as possible, to test that the data model (and eventually the primitives that act on it) correctly handles all aspects of the data: the array data itself, variance information, mask information, header information, and table data.

Types of input data

Single data extension with a simple header.
Multi-extension. One data HDU, one variance HDU, both with headers.
Multi-extension. One data HDU, one mask HDU, both with headers.
Multi-extension. One data HDU, one variance HDU, one mask HDU, all with headers.
Multi-extension. One header HDU, two data HDUs with headers.
Multi-extension. One header HDU, two data HDUs, two mask HDUs, all with headers.
Multi-extension. One header HDU, two data HDUs, two variance HDUs, all with headers.
Multi-extension. One header HDU, two data HDUs, two mask HDUs, two variance HDUs, all with headers.
Multi-extension. One data HDU, one variance HDU, one table HDU, all with headers.
Multi-extension. One data HDU, one mask HDU, one table HDU, all with headers.
Multi-extension. One data HDU, one variance HDU, one mask HDU, one table HDU, all with headers.

We should confirm that basic mathematical operators (addition, subtraction, multiplication, etc.) implemented in simple primitives propagate each type of data (pixel data, header, mask, variance, table) properly.

Examples:

Addition of a constant should change data, but not mask or variance.
Multiplication by a constant should change data and variance, but not mask.
Addition of two images should propagate variance correctly if both input images have variance and should handle the case where only one input image has variance.
etc.
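The expected behavior in these examples can be written down as reference rules that primitive tests compare against. This is a minimal numpy sketch: the Frame container and the function names are hypothetical, and it assumes independent (uncorrelated) errors, a noiseless constant, and that an image without variance contributes zero variance.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Frame:
    """Hypothetical minimal container: pixel data plus optional variance and mask."""
    data: np.ndarray
    variance: Optional[np.ndarray] = None
    mask: Optional[np.ndarray] = None


def _combine(x, y, op):
    """Combine two optional arrays; if only one exists, return it unchanged."""
    if x is None:
        return y
    if y is None:
        return x
    return op(x, y)


def add_constant(f, c):
    # A noiseless constant shifts the data only; variance and mask are unchanged.
    return Frame(f.data + c, f.variance, f.mask)


def multiply_constant(f, c):
    # Var(cX) = c**2 Var(X); the mask is unchanged.
    var = None if f.variance is None else f.variance * c**2
    return Frame(f.data * c, var, f.mask)


def add_images(a, b):
    # Independent errors: Var(A + B) = Var(A) + Var(B).
    # A missing variance is treated as zero (assumption).
    var = _combine(a.variance, b.variance, np.add)
    # A pixel flagged in either input is flagged in the output.
    mask = _combine(a.mask, b.mask, np.logical_or)
    return Frame(a.data + b.data, var, mask)
```

A primitive test can then run the real primitive and the matching reference rule on the same simulated input and compare data, variance, and mask separately.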

Note that astropy.nddata.NDData and astropy.nddata.CCDData already store mask and uncertainty (e.g. variance) information, and CCDData propagates both through its arithmetic methods, if we choose to use those classes.

Reproducibility summary

Describe how the requirements will support scientific reproducibility.

This is probably best done as a summary after other requirements are completed.

Record of processing history

Related to #3.

The system should have tools to generate a record of the processing history. The goal is to record enough information to exactly reproduce the reduced data from the same raw inputs (data and arguments). This record should be relatively easy for both humans and computers to read and should not contain excess information.

It should be possible to write code which parses the record and produces all the inputs necessary to run the pipeline to reproduce the same result. This code is not necessarily something provided by the system described here, but the need should be considered in the design.
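As an illustration of such parsing code, the sketch below assumes a hypothetical record format (a list with one dict per step, in execution order, holding primitive, version, and arguments keys) and a hypothetical registry that maps primitive names to callables. It refuses to plan a rerun when an installed primitive version differs from the recorded one, since the output might then not be reproduced exactly.

```python
def plan_rerun(history, installed_versions):
    """Return (primitive, arguments) pairs needed to reproduce a result.

    Raises ValueError if an installed primitive version differs from the
    recorded one.
    """
    plan = []
    for rec in history:
        name = rec["primitive"]
        installed = installed_versions.get(name)
        if installed != rec["version"]:
            raise ValueError(
                f"{name}: recorded version {rec['version']}, installed {installed}")
        plan.append((name, dict(rec["arguments"])))
    return plan


def rerun(plan, registry, data):
    """Apply each planned step in order; registry maps names to callables."""
    for name, args in plan:
        data = registry[name](data, **args)
    return data
```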

Information Which Needs to be Recorded

Version of the recipe, primitive, framework, Python libraries (e.g. numpy, astropy), and Python itself
Arguments to the processing step
Result of the processing step (i.e. whether it completed successfully)

Implementation Possibilities

In principle, this information could be recorded in a log file. However, a log file has a slightly different goal: it contains a more linear history of everything that happened, so it likely does not satisfy the need for a record that is simple to read and free of excess information.

A second possibility would be to record this information in the output data file (e.g. a FITS file header). This would be built into our data representation.
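A sketch of the header option using astropy.io.fits, assuming one set of indexed keywords per step. The keyword names PRMnnn, PVRnnn, and PSTnnn and the function name are hypothetical; standard FITS keyword names are limited to eight characters, which constrains any such scheme. A free-text HISTORY card is added alongside for human readers.

```python
from astropy.io import fits


def record_step_in_header(header, step, primitive, version, succeeded):
    """Write one processing step as indexed keywords plus a HISTORY card.

    Keyword names (PRMnnn, PVRnnn, PSTnnn) are hypothetical placeholders.
    """
    header[f"PRM{step:03d}"] = (primitive, f"primitive for step {step}")
    header[f"PVR{step:03d}"] = (version, f"primitive version for step {step}")
    header[f"PST{step:03d}"] = (succeeded, f"step {step} completed successfully")
    # Human-readable summary line for the same step.
    header.add_history(f"step {step}: {primitive} v{version} ok={succeeded}")
```

In use, the header would be that of the output HDU (e.g. hdul[0].header), so the history travels with the reduced data.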

A third possibility is to record this information in a flat file on disk (i.e. a "proctable").
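A minimal proctable sketch using Python's csv module, assuming one row per processing step. The column names and function names are hypothetical, and the arguments are stored as a JSON string so that a single cell can hold a dict of keyword arguments.

```python
import csv
import io
import json

# Hypothetical proctable columns, one row per processing step.
FIELDS = ["output_file", "step", "primitive", "primitive_version",
          "arguments", "succeeded"]


def proctable_to_csv(rows):
    """Serialize step records to CSV text, JSON-encoding the arguments."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "arguments": json.dumps(row["arguments"],
                                                        sort_keys=True)})
    return buf.getvalue()


def proctable_from_csv(text):
    """Parse CSV text back into records with native Python types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text, newline="")):
        row["step"] = int(row["step"])
        row["arguments"] = json.loads(row["arguments"])
        row["succeeded"] = row["succeeded"] == "True"
        rows.append(row)
    return rows
```

The CSV text can be written to disk with open(path, "w", newline=""), where it stays readable in any text editor or spreadsheet.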

Finally, one could record the information in a database. This would be similar in some ways to the proctable described above, but has the disadvantage of being less human-readable.
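For comparison, a minimal sketch of the database option using Python's built-in sqlite3 module. The table, column, and function names are hypothetical; the composite primary key assumes each output file has at most one record per step number.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS processing_history (
    output_file TEXT NOT NULL,
    step INTEGER NOT NULL,
    primitive TEXT NOT NULL,
    primitive_version TEXT,
    arguments TEXT,            -- JSON-encoded keyword arguments
    succeeded INTEGER NOT NULL,
    PRIMARY KEY (output_file, step)
)
"""


def open_history_db(path=":memory:"):
    """Open (or create) the history database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_step(conn, output_file, step, primitive, version, arguments, succeeded):
    """Insert one step; the context manager commits, or rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO processing_history VALUES (?, ?, ?, ?, ?, ?)",
            (output_file, step, primitive, version,
             json.dumps(arguments, sort_keys=True), int(succeeded)))


def history_for(conn, output_file):
    """Return the steps that produced one output file, in step order."""
    cur = conn.execute(
        "SELECT step, primitive, primitive_version, arguments, succeeded "
        "FROM processing_history WHERE output_file = ? ORDER BY step",
        (output_file,))
    return [(s, p, v, json.loads(a), bool(ok)) for s, p, v, a, ok in cur]
```

Querying is convenient for programs, but a human needs a query like history_for() to see anything, which is the readability cost noted above.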

Users

science users: people reducing data on a local machine under their control
nighttime users: people reducing data on the fly at the telescope, probably using facility (Keck or KOA) computers
remote users: people reducing data through a science platform (likely KOA)
support astronomers or other WMKO staff: people doing performance tracking or quality assessments
KOA: running the pipeline in fully automated mode for the archive

Stakeholders

pipeline authors: these are people who write recipes and primitives to reduce data from an instrument
support astronomers: who will answer user questions and trouble tickets and will do some software maintenance
software maintainers: who are not the original developers but are tasked to support the pipelines