
muler's Introduction

muler (μler)

version 0.4.0


A Python package for analyzing pipeline-processed data from high resolution near-infrared echelle spectrographs.

The echelle spectrum problem

Imagine you have just received data from an echelle spectrograph like IGRINS or HPF and you want to start science. Oftentimes you will get handed pipeline-reduced data from the observatory facility. When you examine the data you may notice some remaining instrumental signals: telluric contamination artifacts and instrument-induced slopes stand out. Or maybe you want to apply a barycentric correction based on telescope pointing information, but you're not sure which FITS header columns to use when multiple are available. You may want to normalize, deblaze (a.k.a. flatten), and plot the spectrum, or populate the spectrum into a pandas dataframe, or estimate a coarse radial velocity based on a noise-free stellar model atmosphere or a standard star template. Or maybe you want to measure an equivalent width, with or without an error bar. All of these operations are relatively routine, but their application, order, and algorithm choice may depend on the science case, and therefore they cannot be built into a default pipeline: it is up to you---the scientist---to conduct these activities.

muler is intended to reduce the friction of getting started with near-IR echelle spectroscopy. Typical spectral analyses become only 2 or 3 lines of compact muler code, rather than hundreds of Jupyter notebook cells just to load and post-process a spectrum.

Plotting a sky-subtracted, flattened spectrum:

from muler.hpf import HPFSpectrum

spectrum = HPFSpectrum(file=file, order=15)
spectrum.remove_nans().sky_subtract().deblaze().normalize().plot()

Measuring an equivalent width:

spectrum = HPFSpectrum(file=file, order=15)
clean_spectrum = spectrum.remove_nans().sky_subtract().deblaze().normalize()
ew = clean_spectrum.measure_ew()

Installation: pip and development version

We currently offer seamless installation with pip and conda! You can install muler in one line with:

pip install muler

or

conda install -c conda-forge muler

muler constantly changes and benefits from new community contributions like yours. We therefore recommend the slightly more tedious installation from the raw source code described on our Installation webpage. Installing from source code empowers you to modify the code for your purposes.

Our mission and your help

muler is aimed at collecting common spectral analysis methods into one place, simplifying the interactive process of astronomical discovery. We want to reduce the perceived complexity of working with echelle spectra, so that scientists have more time to focus on frontier science problems. We aspire to become a tool that new and experienced astronomers alike come to use and rely upon. In order to achieve these goals we need community contributions from practitioners like you.

That help can take many forms, including more creative outlets than most would guess. A great and easy first step is to ⭐ our repo or subscribe to notifications so that you can stay up to date as our project evolves. Glance around our tutorials, overall documentation, and Issues pages. Join our welcoming community and help us build some cool tools to make astronomy slightly easier, more reproducible, and more fun.

Spectrographs we currently support

We currently support the IGRINS, HPF, and NIRSPEC spectrographs. These three near-IR echelle spectrographs are attached to some of the greatest telescopes in the world: Gemini, HET, and Keck, respectively. HPF differs from IGRINS and NIRSPEC in its precision radial velocity (PRV) design: it uses fibers instead of slits. We do not purport to address the extreme precision demands of the PRV community, but we anticipate our package is still useful for a wide range of HPF science cases. We are open to supporting new spectrographs in the future, but at the present time we are focused on building, testing, and maintaining the core features for these three spectrographs before we add new ones. One way to get a new instrument supported is to open a new GitHub Issue describing the instrument and the rationale for adding it to muler, so other community members can join the conversation.

muler's People

Contributors

astrocaroline, dkrolikowski, ericasaw, gully, jessicaluna, joelburke02, kfkaplan, synapticarbors, xuanxu


muler's Issues

Roll back resample_list?

Hi @kfkaplan, I have some questions about the use cases for the resample_list method just to make sure I am understanding it correctly.

I just double checked the placement of the resample_list method and see that it is in EchelleSpectrum. I think it might need to be relocated from EchelleSpectrum to one of two new locations. It's really an operation on a SpectrumList, so it could/should be moved to EchelleSpectrumList. Alternatively, it is an operation on a model, in which case it should be moved to gollum's precomputed spectrum object. Which of these two (or some separate) use cases do you envision?

We currently expect most EchelleSpectrum objects to be single-order spectra, in which case resampling one onto a list does not make as much sense if the list is composed of spectra of distinct spectral regions. Can you share a tutorial or example of how you would use resample_list?

Reading in the IGRINS Wavelength solution

Hi Everyone,

I am currently poking around the muler IGRINS code and when I went to read in a reduced IGRINS spectrum, I realized it is trying to read the wavelength solution from the same reduced spectrum but that .spec.fits file had no wavelength solution in it. I am using the latest version of the IGRINS pipeline (from 2017) which will generate wavelength solutions for sky OH or standard star A0V spectra but the pipeline products outputted for these are seperate from the science targets. The .spec.fits files for my science targets do not have the WAVELENGTH keyword. Am I missing a step here or is there some updated version of the IGRINS pipeline I am unaware of? If not, I would like to modify the IGRINSSpectrum class to be able to include an option to read in a second file for the wavelength solution.

-Kyle

Change RA + DEC to qRA + qDEC

RA + DEC from goldilocks aren't always in the same units, we change the use to qRA and qDEC instead (which also encodes the pmRA + pmDEC corrections)

Base class for methods common to both IGRINS and HPF

Some methods will be shared between both IGRINS and HPF, and more generally any near-IR echelle spectrograph. It would make sense to create a base class that contains common methods for both spectrographs, such as normalize() and plot(). A working name for this class could be EchelleSpectrum.
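Below is a minimal sketch of what such a base class could look like, assuming (as described elsewhere on this page) that muler's spectra subclass specutils' Spectrum1D; the method bodies are illustrative, not muler's actual implementation:

import matplotlib.pyplot as plt
import numpy as np
from specutils import Spectrum1D

class EchelleSpectrum(Spectrum1D):
    """Container with methods common to near-IR echelle spectrographs."""

    def normalize(self):
        """Return a new spectrum with flux divided by its median."""
        median_flux = np.nanmedian(self.flux.value) * self.flux.unit
        return self.divide(median_flux, handle_meta="first_found")

    def plot(self, ax=None, **kwargs):
        """Plot flux versus wavelength."""
        if ax is None:
            _, ax = plt.subplots(figsize=(9, 4))
        ax.step(self.wavelength.value, self.flux.value, **kwargs)
        ax.set_xlabel(f"Wavelength ({self.wavelength.unit})")
        ax.set_ylabel("Flux")
        return ax

class HPFSpectrum(EchelleSpectrum):
    """HPF-specific container; would add sky-fiber handling, etc."""

class IGRINSSpectrum(EchelleSpectrum):
    """IGRINS-specific container; would add wavelength-solution handling, etc."""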

Cannot Normalize IGRINSSpectrum Object

I've been trying to read in a few A0 normalized spectra, but each time I try to normalize the spectrum (just using spec.normalize()) I get the following error:

UnitTypeError: SpectralCoord instances require units equivalent to '(Unit("Hz"), Unit("m"), Unit("J"), Unit("1 / m"), Unit("km / s"))', so cannot set it to ''.

I can provide example FITS data if needed; I haven't been able to get normalize to work on any of them yet.

how does muler conduct the RV shift

Does muler have a built-in function to RV-shift a given spectrum back to its rest frame based on a user-defined RV value? If yes, how do we use such a function? Thanks.
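muler's built-in RV handling is discussed in other issues on this page (it currently leans on specutils' radial_velocity attribute). For reference, here is a hand-rolled sketch of the underlying operation, assuming spectrum is a muler/specutils spectrum and rv is your user-defined value; this is illustrative, not the package's official API:

import astropy.units as u
from specutils import Spectrum1D

rv = 25.0 * u.km / u.s                # user-defined radial velocity (illustrative)
c = 299_792.458 * u.km / u.s          # speed of light

# Non-relativistic Doppler shift back to the rest frame:
rest_wavelength = spectrum.wavelength / (1 + rv / c)
rest_spectrum = Spectrum1D(spectral_axis=rest_wavelength, flux=spectrum.flux)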

New method brainstorming: Ideas for more actions you can do to a spectrum, ranked

Here is a brainstorm of things we can do to a spectrum. For each item we will give the estimated implementation difficulty (🔟 being the hardest) and the estimated benefit to the community (🔟 being the most valuable).

  1. .mask_telluric_regions( )
    Telluric regions are sometimes either uncorrected, or corrected so poorly that their residuals would ruin a science application. Simply masking these regions can be an adequate solution for many applications. This method would require having a list of spectral regions affected by telluric lines, presumably a line list or experimentally determined mask. Which approach we choose will have differences in complexity to implement, with line lists being possibly outside the scope of muler.
    Implementation difficulty: 7️⃣
    Benefit to community: 8️⃣

  2. .refine_wavelength_calibration(method='telluric')
    Sometimes the wavelength calibration is poor and can be improved/standardized based on either information in the stellar spectrum itself or on the conspicuous telluric absorption lines. In practice this method is delicate and may be instrument-specific, and really should be the domain of the instrument pipelines that have access to more diagnostic information in the first place.
    Implementation difficulty: 9️⃣
    Benefit to community: 6️⃣

  3. .to_pandas() and/or .to_excel()
    Often you want to post-process a spectrum and then simply save it as a csv file to send to a collaborator or post on a website. Or you may have some existing analysis code that assumes dataframes, so being able to convert to pandas would help. We have this method in astropy Tables, so it should be straightforward to implement (see the sketch after this list).
    Implementation difficulty: 2️⃣
    Benefit to community: 5️⃣

  4. .combine_spectra()
    Sometimes you have two or more spectra of the same object, wavelength range, and instrumental setup, and you want to combine them into a single high signal-to-noise ratio composite spectral template. In principle we can currently just add spectra together, one after another: net_spec = spec1.add(spec2) or net_spec = spec1 + spec2 + spec3. The problem with this naive approach is what to do with the metadata: should we be a bit smart about how to combine the dates? Should we warn the user to conduct RV registration first? The implementation difficulty depends on how "smart" we want to be. To some extent the implementation is made easier by specutils' built-in methods. Should this be a method on a SpectrumList instead?
    Implementation difficulty: 6️⃣
    Benefit to community: 5️⃣

  5. .cross_correlate(template)
    It's common to want to cross correlate an observed, noisy spectrum, with a high SNR or noise-free template. RV applications and other detection-based techniques apply cross-correlation. specutils already has this method as a built-in, so we would only have to provide a lightweight wrapper. The main trick is making sure that the pre-processing and flattening steps have occurred so the cross-correlation is meaningful and effective. So we would want a high quality tutorial to go along with the implementation.
    Implementation difficulty: 4️⃣
    Benefit to community: 6️⃣
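Here is the sketch promised in item 3, assuming Spectrum1D-style .wavelength, .flux, and .uncertainty attributes; the method name and column names are provisional:

import pandas as pd

def to_pandas(self):
    """Return the spectrum as a pandas DataFrame."""
    data = {"wavelength": self.wavelength.value, "flux": self.flux.value}
    if self.uncertainty is not None:
        data["uncertainty"] = self.uncertainty.array
    return pd.DataFrame(data)

With that in place, .to_excel() and .to_csv() become one-liners on the returned DataFrame.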

Barycentric RV correction (for HPF but also others), plus automatic Airmass estimation

Currently muler computes the barycentric RV given the source coordinates, date/time, and site location using astropy's RV correction method. The Goldilocks FITS header provides a BRVCORR value that differs by up to 80 m/s from the value estimated by muler. Which one is correct? I tend to trust that the @grzeimann value is correct.

There are two ways muler's estimate could be wrong. First, we are currently using (RA, Dec) from the TCS and not a standard like Gaia; we should use the Gaia value (making sure to shift by PMRA and PMDEC). Second, we use McDonald Observatory as the site, which probably corresponds to Mt Locke rather than HET's Mt Fowlkes; we should use the HET lat/long.

While we're on the topic of headers, we should add an attribute that computes airmass. The HPF header does not report the realized elevation, so it should be computed based on the source coordinates, time, and location.
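For reference, here is a sketch of both computations with astropy; the HET coordinates below are approximate placeholders, and the target coordinate would come from Gaia with proper motion applied:

import astropy.units as u
from astropy.coordinates import AltAz, EarthLocation, SkyCoord
from astropy.time import Time

# Approximate HET (Mt Fowlkes) location --- verify before adopting:
het = EarthLocation.from_geodetic(lon=-104.0147 * u.deg, lat=30.6814 * u.deg, height=2026 * u.m)
coord = SkyCoord(ra=150.0 * u.deg, dec=2.2 * u.deg)   # placeholder target
obstime = Time("2021-05-01T06:30:00", scale="utc")

# Barycentric RV correction:
brv = coord.radial_velocity_correction(obstime=obstime, location=het)
print(brv.to(u.m / u.s))

# Airmass from the computed elevation (sec z approximation):
altaz = coord.transform_to(AltAz(obstime=obstime, location=het))
print(altaz.secz)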

Update docs to describe new conda install option

We are officially on conda-forge. Yay! 🥳 We should revise the documentation to recommend conda as the easiest installation route:

conda install muler -c conda-forge

We may also want to add "developer documentation" with instructions on how to update PyPI and conda-forge.

Submit a paper to JOSS

The time has come: muler appears to have reached some threshold level of maturity. It has a feature set that can accomplish the key tasks it was built for. It has months' worth of user testing under a range of scenarios, and while it is not perfect, it's pretty good! It is relied upon for projects that are, themselves, destined for publication, and so a citeable reference is due. Also, its API has more-or-less stabilized to a point that it's not likely to have major breaking changes in the future.

So with all that said, let's submit a paper. We will plan to submit to JOSS, because they offer excellent two-reviewer reviews of the actual GitHub-based code. I don't believe it's possible to group muler with its sibling package gollum, but that's OK--- they'll each get their own standalone review.

What we mean by research software

muler is in-scope for JOSS because it meets these criteria:

  1. Solves complex modeling problems in a scientific context
    ✅ It has detailed sky fiber modeling built-in, deblazing models, and telluric masking
  2. Supports the functioning of research instruments
    ✅ Makes it easy to use pipeline-reduced data from 3 distinct astronomical instruments
  3. Supports the execution of research experiments
    ✅ Makes it easy to experiment with the order-of-operations of post-processing steps, and their tunable inputs
  4. Extracts knowledge from large data sets
    ✅ Can distill a high-bandwidth echelle spectrum (~57,000 data points) into an industry-standard scalar summary statistic (e.g. equivalent width)

Substantial scholarly effort

muler has taken months of work, with many contributors, commits, issues, pull requests, and considerable on-line and off-line discussions put into its software architecture. It meets these requirements:

  • ✅ enables some new research challenges to be addressed (e.g. sky subtraction, telluric masking, deblazing)
  • ✅ makes addressing research challenges significantly better (e.g., faster, easier, simpler)

Here are the numbers at-a-glance for muler:

  • Age of software: 1 year, 2 months
  • Number of commits: 254
  • Number of authors: 8
  • Lines of code: 1888 in the key source code, much more in docs and tutorials
  • Whether the software has already been cited in academic papers: papers currently in prep.
  • Whether the software is sufficiently useful that it is likely to be cited by your peer group: yes

Should we add a stitch method to SpectrumList-like objects?

It might be worth adding a .stitch() method to stitch e.g. HPF echelle orders into a single spectrum object. This stitched spectrum would be mostly for convenience and display purposes.

It's not clear what the returned object should be. Is it still, say, an HPFSpectrum object? Or has it transformed in some way? Would making it an HPFSpectrum object break some methods that expect an HPF spectrum to have only ~2048 pixels at a time? There are some unintended consequences to think about here.
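One possible sketch, assuming the list holds single-order Spectrum1D-like objects already sorted by wavelength with overlaps trimmed (otherwise the concatenated spectral axis would not be monotonic):

import numpy as np
from astropy.nddata import StdDevUncertainty
from specutils import Spectrum1D

def stitch(spec_list):
    """Concatenate echelle orders into one Spectrum1D, for display purposes."""
    wave = np.hstack([s.wavelength.value for s in spec_list]) * spec_list[0].wavelength.unit
    flux = np.hstack([s.flux.value for s in spec_list]) * spec_list[0].flux.unit
    unc = np.hstack([s.uncertainty.array for s in spec_list])
    return Spectrum1D(spectral_axis=wave, flux=flux, uncertainty=StdDevUncertainty(unc))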

HPF deblazing by twilight flat field calibration spectra

Currently our deblaze method defaults to spline fitting. Spline fitting may require tuning for spectra with certain genuine large-scale features, and may introduce signal self-subtraction and bias comparable to target signals in some high precision applications, so spline deblazing should be considered for quick-look purposes only.

Instead, HPF observes twilight flat field spectra that should exhibit smooth continua ideal for deblazing. We should make this strategy the default. @jessicaluna already added support for deblazing by flats via blaze_divide_flats; that method makes the following assumption about its input:

  • flat appears to be handed in as a numpy array derived from the FITS file with shape (3 x N_orders x N_wavelength) with (wavelength, flux, error) columns.

Ideally we would like to derive this (possibly pre-processed) flat array from the original FITS file, using muler. So we would want a new series of functions: pre_process_flat(flat_FITS_file_name) that creates that (3 x N_orders x N_wavelength) array.

The returned object should follow the new pattern for preserving metadata--- see examples of how other methods use return ________. The docstrings should house information on what the inputs and outputs of these functions are.
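Here is a sketch of pre_process_flat, under the (hypothetical) assumption that the flat FITS file stores wavelength, flux, and error as separate extensions of shape (N_orders x N_wavelength); the real HPF file layout may differ:

import numpy as np
from astropy.io import fits

def pre_process_flat(flat_FITS_file_name):
    """Return a (3 x N_orders x N_wavelength) array from a twilight flat."""
    with fits.open(flat_FITS_file_name) as hdus:
        wavelength = hdus["WAVELENGTH"].data   # hypothetical extension names
        flux = hdus["FLUX"].data
        error = hdus["ERROR"].data
    return np.stack([wavelength, flux, error])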

Tests went from passing to failing with no code change

The most recent commit failed, despite the only change being a tutorial. How could that happen? I am investigating! My first guess would be a dependency change, but we have a matrix of astropy/specutils/python dependencies that all went from passing to failing. It's not numpy, because that was the same between the passing and failing commits. Strange!

Instrumentally broaden the default A0V template

We have previously uploaded a default A0V template as a 10,000 K PHOENIX model, useful for telluric response estimation. We should have instrumentally broadened this spectrum before uploading it. Here is the code to re-generate the file:

from gollum.phoenix import PHOENIXSpectrum
import pandas as pd

template = PHOENIXSpectrum(teff=10_000, logg=4.5).normalize()
# New augmentation: assert vsini=90 km/s, R=55,000, and RV=0.
template = template.rotationally_broaden(90).instrumental_broaden(resolving_power=55_000)

dict_out = {'wave_ang':template.wavelength.value, 'flux':template.flux.value}

pd.DataFrame(dict_out).to_csv('../../../muler/src/muler/templates/PHOENIX_10kK_hpf_template.csv', index=False)

In practice, folks should really customize their telluric A0V template with gollum, but this simple template allows a quick look adequate for many purposes.

Project rationale, scope, and architecture brainstorm

The problem to solve

You have just received data from the IGRINS spectrograph and you want to start science. You may get handed plp-reduced data from the observatory facility or from a collaborator, or you may have reduced the data yourself with the plp. When you examine the data you may notice some remaining instrumental artifacts: incomplete telluric cancellation artifacts remain. These data artifacts hinder your science progress. Or maybe you want to apply a barycentric correction based on telescope pointing information in the FITS header, but you're not sure which FITS header columns to use when multiple are available. You may want to normalize the spectrum, or populate the spectrum into a pandas dataframe, or estimate a coarse radial velocity based on a template. All of these operations are relatively routine, but their application, order, and algorithm choice may depend on the science case, and therefore they cannot be built into a default pipeline.

Instead, IGRINS users develop bespoke methods for their science application of interest. This code development is relatively costly in time, as each user may "re-invent the wheel" for common operations that fellow IGRINS users have already solved. We wish to avoid the "re-inventing the wheel" phenomenon by encouraging code re-use and a shared framework for IGRINS data analysis.

The solution and project scope

Here we propose a Python package for plp-reduced IGRINS data. This package will introduce relevant Python containers that possess common attributes and routine methods. The inspirations for this project are principally lightkurve and specutils. Specifically, this project will encourage an API design, rather than a commandline/script-centric design. The scope of the project is limited to 1D echelle spectra, not the 2D raw frames, which are the domain of the plp. A key idea for the proposed package is that the plp essentially introduced a standard--- all IGRINS data have (roughly) the same attributes and data columns. This package will take advantage of that standardization, and can optionally accommodate data from previous plp versions that disobey the current standard. The project will initially begin with local data only, with prospects in the future for querying and populating data from remote databases.

The key priorities of this package are:

  1. Great documentation
  2. Copious tutorials
  3. Easy installation
  4. Robust, uncontroversial, well-tested algorithms for routine operations
  5. Community input and feedback (through GitHub Issues) and contributions (through GitHub Pull Requests)
  6. (eventually) Programmatic interface to querying remote databases such as the Gemini API.

Project Architecture

I propose an architecture closely matching that of lightkurve, which has become an exemplar for ease-of-use within modern astronomical data analysis. The key leap of the imagination is to change from time-series analysis to spectral analysis. To first approximation that's merely replacing time with wavelength. In practice, several echelle spectroscopy-specific concepts may make the mental mapping a bit subtle to architect. We also want to encourage the reuse of code from specutils.

Some architecture decisions to consider:

  • What are the routine operations users currently perform?
  • What structure to adopt from lightkurve?
  • What structure to adopt from specutils?
  • What are the tradeoffs in choice of fundamental building blocks? (e.g. Are they based on dataframes, numpy arrays, astropy Tables, Spec1D instances? Something else?)
  • How granular/flexible to make the notions of objects---MultiOrderSpectrum? SingleOrderSpectrum? FlattenedSpectrum? TelluricCorrectedSpectrum? Or just a single "Spectrum" type object? Tradeoff in API complexity versus expressiveness?
  • How/where to limit the scope: what is a genuine research problem (out-of-scope), and what is a well-agreed-upon operation (in-scope)?

HPF Telluric correction with an A0V standard and doctored A0V template

We should add a tutorial, or possibly a method, for doing telluric correction with an A0V standard.

Key idea and preliminaries

It is standard practice in the near-IR spectroscopy community to divide your spectrum by an instrumental-and-telluric-response function computed in the following way:
instrument_and_telluric_response = observed_A0V / template_A0V

We will simply call it a telluric response for brevity, but the wavelength dependent response computed in this way also contains the conspicuous upside-down-U-shape blaze function. It may be worth dividing out the blaze function (c.f. #20) in the observed target spectrum and the observed A0V, so that the conspicuous blaze shape is gone, which would make the next steps much easier:

telluric_response = observed_and_deblazed_A0V / template_A0V

Doctoring the A0V template

The problem is that the template_A0V spectrum has to be augmented by vsini and RV. Even then, the sharp Hydrogen lines in the A0V are rarely accurate in the template, and therefore leave residual artifacts. Another problem with this method is that it adds random noise from the noisy observed_A0V, not to mention the bias from the imperfect template. For many science applications---albeit not PRV ones---these noise and bias sources are negligible: the quicklook ability to diminish telluric artifacts outweighs minor noise sources. In the future, we could consider fancier telluric mitigation strategies, but for now, just dividing by the standard template will unlock a lot of science. Building this machinery is a prerequisite to the fancier stuff anyway.

Implementation plan

The steps of the process are:

  • read in an A0V template from gollum Phoenix (use a SpT-T_eff relation to get a close estimate of T_eff for your actual observed standard star, which may not be exactly an A0V)
  • (optionally) use the .show_dashboard(data=data) functionality of gollum to refine your vsini and RV best fit for the template. Eventually we could automate this step with a chi-squared search for vsini and RV. This step is made much easier if you have divided out the blaze pattern.
  • compute the ratio telluric_response = observed_and_deblazed_A0V / template_A0V, where you have doctored your template_A0V with the vsini and RV.
  • Finally, divide your science target by the response: corrected_target_spectrum = observed_target_spectrum / telluric_response

In the final step, you can see that deblazing by a reference cancels out; the purpose of deblazing the observed A0V is simply to help find the vsini and RV without the conspicuous blaze curve muddling the comparison.

That's it! If you're using muler and gollum---which are built on specutils--- all of the error propagation is automatic. So your final corrected_target_spectrum will have uncertainties that reflect the noise introduced from the telluric correction process: the places with deep telluric lines will have huge uncertainties (low SNR), as we'd expect.
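Putting the steps together, here is a sketch with muler and gollum, assuming Spectrum1D arithmetic (/ and *) propagates uncertainty as described; the vsini and resolving power values are illustrative, and the .resample() step is a hypothetical stand-in for matching the template onto the observed wavelength grid:

from gollum.phoenix import PHOENIXSpectrum
from muler.hpf import HPFSpectrum

observed_A0V = HPFSpectrum(file=file_A0V, order=15).remove_nans().deblaze().normalize()
template_A0V = (
    PHOENIXSpectrum(teff=10_000, logg=4.5)
    .normalize()
    .rotationally_broaden(90)                        # doctored vsini
    .instrumental_broaden(resolving_power=55_000)
    .resample(observed_A0V)                          # hypothetical resampling step
)

telluric_response = observed_A0V / template_A0V
target = HPFSpectrum(file=file_target, order=15).remove_nans().deblaze().normalize()
corrected_target_spectrum = target / telluric_response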

Possible Windows Issue?

Working with @18ganesha we have discovered what seems to be a platform-dependent muler bug. sky_subtract() raises a units error, which we thought was an astropy version problem, but we installed a fresh conda environment using the provided muler_dev environment.yml file, and the problem persisted. Weird! The unit tests show some failures, so that's a good place to look.

My hunch is that something is wrong with the importlib resources approach, which may go awry on Windows. I have a Windows dual-boot that I can use to test this out, so the tasks are:

  • Debug installation problems on Windows.
  • Investigate failed unit tests on Windows

Object Name in Plots

When looking at the tutorials in muler, I don't know what object I'm looking at when the plot is constructed. Is it possible to have the name of the target object appear within the title of the plot?

Refactor the HPF spectrum class

As part of creating an EchelleSpectrum class, we want to refactor the HPF spectrum class. Here are some tasks:

  • Should sky subtract accept a sky flux vector or an HPFSpectrum class? It currently takes a vector assumed to be the same length.
  • blaze_subtract_spline should propagate the blaze correction into the uncertainty.
  • Standardize docstrings to either numpy style or Google style but not both. Probably numpy style?
  • Add a Parameters description for all input parameters of new methods
  • Consider renaming time_edges to trim_edges_by_x or something to emphasize that the operation is on the original x coordinate, regardless of previous destructive operations.
  • How to handle the sky and LFC fibers--- which methods don't make sense anymore?
  • Add some attributes specific to HPF: a goldilocks or instrument_team flag, etc.
  • Alter the cached_HDUs kwarg to handle just one input spectrum, rather than the two needed for IGRINS.
  • Add back support for cached_HDU mimicking IGRINS

Add a method for estimating uncertainty from the data itself

We often want to estimate the SNR from the data itself. This operation is not unique and depends strongly on the properties of the data itself. A common and fair assumption is that data are near-critically sampled, with a few pixels per resolution element, such that the spectrum possesses many regions in which adjacent pixels share exactly the same input flux, but with only the noise values different. The input flux need not even be exactly the same, but merely smoothly-varying-in-wavelength so that a trendline perceives and removes the trend leaving only a residual spectrum with scatter.

From this residual spectrum we quantify the pixel-to-pixel scatter with a robust standard deviation metric, either homoscedastic or heteroscedastic with some sort of window-function-based standard deviation, like binned_statistic.

Finally, an uncertainty derived in this way can be compared to the pipeline-reported SNR to gauge the reliability of the pipeline, or flag bad data. We could add a method or show a tutorial for overriding the pipeline uncertainty with this user-derived uncertainty in cases where the pipeline uncertainty is inaccurate.

We already have some of the tools in place to generate this residual spectrum: .smooth_spectrum() exists and is exactly what we aim for as the running-average model estimate.
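Here is a sketch of this estimate, assuming spec is a muler spectrum and that .smooth_spectrum() supplies the running-average model:

import numpy as np
from scipy.stats import binned_statistic

residual = spec.flux.value - spec.smooth_spectrum().flux.value

def robust_std(r):
    """1.4826 * MAD approximates the standard deviation for Gaussian noise."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

# Homoscedastic: one global robust scatter estimate
sigma_global = robust_std(residual)

# Heteroscedastic: windowed robust scatter via binned_statistic
sigma_windowed, edges, _ = binned_statistic(
    spec.wavelength.value, residual, statistic=robust_std, bins=50
)

snr_estimate = np.median(spec.flux.value) / sigma_global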

Add support for HPF

Currently we focus on IGRINS; let's also add the ability to read in HPF spectra.

Add flexure correction?

Is there any way to shift and resample a spectrum in pixel space instead of wavelength space? I need this for flexure correction of IGRINS data.

Astropy, Python, and Specutils version requirements

muler is tightly dependent on specutils, since it inherits from the Spectrum1D and SpectrumList objects. Accordingly, the python and astropy versions supported by specutils are automatically imposed onto muler.

I have experimentally verified which versions of Python, astropy, and specutils work with muler using a GitHub Actions build matrix. I find that no version of Python 3.6 worked. Astropy 4.0.6 did not work on any of the builds either.

For these reasons, I am inclined to require Python 3.7 or 3.8, and Astropy >=4.2, in order for muler to work. It is not clear how much of a hurdle these requirements would be for most users. We may be able to relax these with a bit more investigating...

Feedback from user testing with @astrocaroline

Today we conducted valuable user testing with @astrocaroline. Key insights:

HPFSpectrum(file) does not work out of the box: file is a kwarg and not just an arg. I think that choice is baked in by Spectrum1D accepting Spectrum1D(wavelength, flux, error), but that use case is extremely rare for our custom HPFSpectrum, and would indeed break most of our methods that assume certain metadata from the get-go. I propose we:

  • Study whether it's possible to turn file into the only arg

HPFSpectrum(file=file, order=???): we currently default to order=19 to ensure that we return a single echelle order. We have intentionally made the single order the default pattern for most applications. Eventually, we should consider support for stitching orders, but then the spectra become too bulky to plot at full native resolution. We may one day want to support a scenario where the user passes order='all' or order=[1,2,5], we return a SpectrumList object (which we already support), and then the user can stitch the orders together.

Or maybe we state that HPF spectrum defaults to returning a SpectrumList (aka order='all'), but then the common usage pattern is: spec.get_order(10).normalize().... That's great because we can implement a method .get_order(wavelength=1.0830*u.micron) that fetches the single order of interest. I like this idea the most.

  • Implement a get_order() method on SpectrumList objects, and add a flag for HPFSpectrum(order='all')
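A sketch of that get_order idea, assuming a SpectrumList-like container of single-order spectra; the name and signature are provisional:

def get_order(self, index=None, wavelength=None):
    """Fetch a single echelle order, by list index or by a wavelength it contains."""
    if index is not None:
        return self[index]
    for spec in self:
        if spec.wavelength.min() <= wavelength <= spec.wavelength.max():
            return spec
    raise ValueError(f"No order contains {wavelength}")

# Usage: spec_list.get_order(wavelength=1.0830 * u.micron)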

Finally, we found that the order of operations matters, which we knew, but such ordering should be explained better or maybe trigger a warning. For example, deblazing caused problems when the spectrum was not previously normalized.

  • Add a warning when it's clear that certain expected operations have not occurred yet (e.g. spectrum has NaNs or is not normalized).

Subtle Error propagation considerations

Sky subtraction with the HPF sky fiber is a little subtle for two reasons:

  1. The target and sky fibers have slightly different wavelength solutions. For non-PRV applications those differences may be negligible.
  2. Subtracting a noise-free array from a Spectrum1D object does not propagate uncertainty, whereas subtracting a Spectrum1D from another Spectrum1D does propagate the uncertainty.
target_spectrum = HPFSpectrum(file=file, order=15)
sky_spectrum = HPFSpectrum(file=file, order=15, sky=True)

# Subtracting a bare flux array: no uncertainty propagation
sky_free_spectrum = target_spectrum.sky_subtract(sky_spectrum.flux)

# Subtracting a Spectrum1D from a Spectrum1D: uncertainty propagates
sky_free_spectrum_alt = target_spectrum - sky_spectrum

I think the second approach (the Spectrum1D subtraction) accounts for both effects, but I'm not sure how it does the wavelength interpolation.

Compatibility with specutils 1.5 and astropy 5.0

I just spot-checked muler provisioned with specutils 1.5 and astropy 5.0. Remarkably, it appears to work right out of the box! It even works with gwcs 0.17.1, which was previously disallowed for our muler install because of issue #70.

This ease of install is a good thing. It raises the issue of how to handle backwards compatibility. Should we demand that users upgrade to specutils 1.5, gwcs 0.17.1, and astropy 5? That may have some backwards breaking changes for folks. It also will probably break whatever conda environment they currently have.

On the other hand, how do we support some gwcs-specutils-astropy combinations and not others? Maybe the easiest thing to do is disallow gwcs 0.17.0, and all other astropy-specutils combos should just work? Not sure! I suppose we could add gwcs to our existing build matrix.

We may also want to write some guidance in the docs on how to make a fresh conda environment if you're upgrading to specutils.

By the way, the tests are only failing due to an experimental commit to the equivalent_width method.

HPF team uncertainty missing a square root of the variance operation

@jessicaluna points out that the HPF-instrument-team-provided reduction pipeline delivers variance (uncertainty squared) instead of the uncertainty that the Goldilocks pipeline delivers. Our user testing on the HPF-team reduction is fairly minimal (we've been sticking with Goldilocks these days), so the bug hasn't affected most users so far as I can tell. Anyway, it's a quick fix.

How / whether to support synthetic spectral models?

We often want to compare data and models. So far muler only knows about real data observed with a telescope. But in principle there is nothing stopping us from making classes parallel to HPFSpectrum and IGRINSSpectrum called, say, PHOENIXSpectrum, or SonoraSpectrum.

These synthetic spectral models would have no uncertainty associated with them, and they would have additional metadata about their physical labels. But the units and math operations would work out-of-the-box since a SonoraSpectrum would have units and common methods just like any other Spectrum1D instance. Some methods like .estimate_snr() or .deblaze() wouldn't work or make sense, but others like .measure_equivalent_width() would work and make sense.

The main architectural decision to make is:

  1. Whether we want to make an entirely separate package to deal with Synthetic spectra, call it say buler.
  2. Whether we want to keep Synthetic spectra and Data spectra under the same muler roof.
  3. Whether we want to make Synthetic spectra a part of the blase package.

The main reason to make an entirely new package is to be more general than just echelle spectra; muler doesn't currently support low resolution spectra, for example, and may never. We already have a parallel project blase intended to compress/emulate the synthetic spectral models, so maybe that's a natural forum for dealing with synthetic spectra.

I'm on the fence here! The solution may be one of expedience. If we can develop blase fast enough, it can gain these new features. But if not, muler will prevail. blase also has the impediment of being written in PyTorch, which serves as a slight barrier to entry for installation and contributions.

Add a cross correlate method

Cross correlation is a useful technique for template matching, and can be tricky to set up for new users (how to treat boundaries, resampling, etc.). Here we propose to add a .cross_correlate(input_spec) method.

One interesting implementation question is what object to return. A Spectrum1D seems not-quite-right because the cross correlation has a relative rather than absolute spectral_axis, and it's not clear whether specutils can handle relative coordinates.
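specutils' existing machinery could back the wrapper; here is a sketch assuming both spectra are flattened Spectrum1D objects and the observed one carries an uncertainty (which template_correlate requires):

import numpy as np
from specutils.analysis import correlation

corr, lag = correlation.template_correlate(observed_spectrum, template_spectrum)
rv_estimate = lag[np.argmax(corr)]   # velocity lag at the correlation peak

Note that lag is a velocity axis, which sidesteps the relative-spectral_axis question above: we could simply return the (lag, corr) pair rather than a Spectrum1D.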

Feature requests after working with IGRINS data

I was working with some IGRINS data and came up with some feature requests. Some of these are duplicates, but it's a useful datapoint to see it come up twice.

  • Implement a cross-correlation method in muler
  • Implement a telluric masking feature for IGRINS in muler (we already have this for HPF)
  • Implement a continuum-tilting method, possibly "warp-to-model" or something. I think we have that for gollum, warp_to_data. Alternatively we could add "normalize_to_wavelength(s)", to which you hand the wavelength locations, and it fits a smooth continuum to them.

Flatten continuum method

We want a .flatten() method, which is essentially just dividing the input spectrum by its smoothed version. The main challenge here is deriving the smoothed version with appropriate masking.
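A minimal sketch, assuming .smooth_spectrum() (mentioned in another issue on this page) supplies the masked, smoothed continuum estimate, and that Spectrum1D-style divide() is available:

def flatten(self):
    """Remove the continuum by dividing the spectrum by its smoothed version."""
    smoothed = self.smooth_spectrum()
    return self.divide(smoothed, handle_meta="first_found")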

Flatten the full spectrum with a black body

Now that we have .deblaze() working, we are left with a Rayleigh-Jeans looking spectrum from order-to-order. Let's add a method to divide this out with a black body, which should render a nearly flattened spectrum ideal for cross-correlation with a template.
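A sketch with astropy's BlackBody model, assuming spec is a deblazed muler spectrum and the temperature is a user-supplied guess:

import astropy.units as u
from astropy.modeling.models import BlackBody

bb = BlackBody(temperature=4000 * u.K)       # illustrative Teff
bb_curve = bb(spec.wavelength)               # spectral radiance vs. wavelength
bb_curve = bb_curve / bb_curve.max()         # normalize to a dimensionless shape
flattened_flux = spec.flux / bb_curve        # divide out the thermal slope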

Ancillary files

Model file locations and download instructions could be made clearer in the muler documentation.

We should put NotImplementedError placeholders for planned methods

We should put NotImplementedError placeholders for planned methods. This practice will help show which methods are planned but just happen not to be implemented yet, since the package is so new. Here is a brainstorm of such methods:

  • .match_stellar_model(model_spec) would resample an input high resolution synthetic stellar spectrum to the observed spectrum's spectral resolution and wavelength sampling
  • .match_telluric_model(model_spec) would resample an input synthetic telluric spectrum to the observed spectrum's spectral resolution and wavelength sampling
  • .interact_model() when called from a Jupyter notebook would display an interactive bokeh dashboard for either stellar or substellar spectral libraries. We have a working demo for substellar atmospheres called intuition.
  • .cross_correlate(input_spec) would cross correlate an input spectrum with the observed spectrum, returning the cross correlation signal as a function of velocity separation.
  • .fill_nans() would fill NaNs with an interpolated signal rather than simply dropping them.
  • .refine_telluric_correction() would refine the telluric correction provided by default from e.g. IGRINS. Occasionally the telluric division is offset and yields spike artifacts from over and under correction. This method would adjust for such issues.

Some of the methods require access to synthetic spectral models. We'll probably want to think about how to support synthetic spectral models. Do we want that to be a separate package or a module of muler?
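As a pattern, each placeholder would be a one-liner; here is a sketch for one of the methods above (the message text is illustrative):

def cross_correlate(self, input_spec):
    """Cross correlate with input_spec (planned; not yet implemented)."""
    raise NotImplementedError(
        "cross_correlate is planned but not yet implemented; see the GitHub Issues for status."
    )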

Feature Requests

I was working with a few IGRINS spectra today and found a few things that would be nice to integrate:

  • customizing the muler.echelle.EchelleSpectrumList function so the list can include specific orders (e.g. I wanted to stitch together an IGRINS spectrum but exclude orders on each side that are generally not used for science due to noise)
  • the ability to create Spectrum objects from existing wavelength and flux quantities (e.g. like lightkurve's LightCurve constructor: EchelleSpectrum.wavelength = arr; EchelleSpectrum.flux = arr)
  • utility that allows you to slice a SpectrumList and have it return a SpectrumList

Documentation standardization

The documentation is great, yay! In a few instances the documentation lacks information on the input parameters. The docstring style follows either numpy-style or google-style, inconsistently. To improve the user experience we can:

  • Standardize docstrings to either numpy style or Google style but not both. Probably numpy style?
  • Add a Parameters description for all input parameters of existing and new methods

Documentation helps users and contributors alike, and looks and feels really pro when it just works and can be relied upon.

Add an SNR attribute

We often use the signal-to-noise ratio (SNR) in our user tests and other applications. We should add an SNR attribute, which is simply the ratio of flux to uncertainty.

This feature would be implemented as an attribute and not a method, so it should have a noun as its identifier and take no arguments. SNR should be dimensionless. If the uncertainty is zero or undefined, we could return an error, zero, or infinity, or estimate it from the data itself.

signal_to_noise_ratio = spectrum.snr
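A minimal sketch of the attribute, assuming the Spectrum1D-style .flux and .uncertainty that muler spectra carry; the undefined-uncertainty behavior shown here (returning infinity) is just one of the options listed above:

import numpy as np

@property
def snr(self):
    """Dimensionless signal-to-noise ratio: flux / uncertainty."""
    if self.uncertainty is None:
        return np.full(self.flux.shape, np.inf)
    return self.flux / self.uncertainty.quantity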

Error message when using sky_subtract

Running the current version of the code, I get an error message when I run sky_subtract that looks related to astropy. I can solve this by adding "import astropy" and rerunning.

Integration with Google Drive Python API for querying IGRINS data

Yesterday @gmace announced that IGRINS will release a Google Drive archive of all IGRINS data from 2014-May 2021. 🥳 Yay! The release will come in June of 2022.

It is worth considering how/whether muler can support automatic fetching of the reduced IGRINS data. We could use this Python API to Google Drive to request and return the object, saving it locally, then use muler to read in that file. The ideal goal would be to resemble the lightkurve method .search_targetpixelfile() and its siblings .search_tesscut() etc. In practice I anticipate some problems based on my experience with lightkurve--- it is difficult to support search methods like these since they depend on changes to the Archive. Close communication between developers and database administrators is sometimes needed. Another problem is standardization--- sometimes there are multiple ways to do things: which way(s) would we support? Another friction point is authorization. While the archive is expected to be public, usually these APIs need to know who you are, which may require Google account credentials and a way to enter that information secretly.

As always, it may be that muler doesn't actually support or require the google-api, but that we simply have tutorials showing how to use the google-api in tandem with muler. That would alleviate our support burden at the expense of being a bit clunkier for users. Ideally we'd support the queries, and abstract away all the boilerplate.

Overall, streamlining a way to fetch data and analyze it in one fell swoop will be transformative for IGRINS. I've seen how awesome it is for Kepler/K2/TESS, so I look forward to the same for IGRINS.

Include example data on the GitHub project

We should package example data with the GitHub project so that folks can run the tutorials without having to do anything special. The data should be small enough (<10-50 MB) to fit on GitHub without noticing too much.

Increase test coverage by targeting lines of code missed in code coverage

Most methods should be tested automatically to make sure the methods function as expected under a range of inputs. I quantified the test coverage using the coverage tool. A snapshot of the coverage today shows these results:

Module                           Statements  Missing  Excluded  Coverage
src/muler/__init__.py                     0        0         0      100%
src/muler/echelle.py                    247       78         0       68%
src/muler/hpf.py                        157       45         0       71%
src/muler/igrins.py                     103       22         0       79%
src/muler/nirspec.py                     91        9         0       90%
src/muler/templates/__init__.py           0        0         0      100%
src/muler/utilities.py                    7        0         0      100%
tests/test_hpf.py                       105        0         0      100%
tests/test_igrins.py                    102       14         0       86%
tests/test_nirspec.py                    79        0         0      100%
tests/test_utilities.py                  40        0         0      100%
Total                                   931      168         0       82%

We see that HPF, echelle and IGRINS have coverage in the 68-79% range. We would like to increase the test coverage to the 80-90% range by adding more tests in those places that currently lack them.

There are services for automating this test coverage reporting. For now, I recommend we just manually check the coverage locally on demand, and try to have most new methods come with a test.

Bug with Specutils radial velocity comparison

Specutils offers a neat concept of the spectral_axis, which makes it amenable to frequency or wavelength axes, supports redshifting, etc. We currently use the radial_velocity= feature to let specutils do all the Doppler shifting for us. The problem is that specutils does not handle the comparison of two spectra with radial velocities defined:

spec1.radial_velocity = 5*u.km/u.s
spec2.radial_velocity = 7*u.km/u.s
net_spec = spec1-spec2 # How will it handle the velocity?
net_spec.radial_velocity # shows 0.0 km/s

There's a kwarg that can nominally support a callable for what to do with these different spectral axes:

net_spec = spec1.subtract(spec2, compare_wcs='first_found')

...but I haven't figured out how to make a callable that simply subtracts the Doppler-shifted spectra. Other bugs have arisen from the use of radial_velocity, and I am simply inclined to immediately make a zero-RV copy of each spectrum with a pristine spectral_axis set to the Doppler-shifted wavelength as a permanent workaround. This hacky-looking workaround appears to achieve this goal:

spec1.radial_velocity = 5*u.km/u.s
# Re-assign so the copy (with the shifted wavelengths baked into a pristine
# spectral_axis) replaces the original:
spec1 = spec1._copy(spectral_axis=spec1.wavelength.value*spec1.wavelength.unit)

So I will inject this code directly after the methods that touch the radial_velocity.
