
gdt-core's Introduction

Gamma-ray Data Tools - Core Components

The Gamma-ray Data Tools (GDT) is a centralized toolkit for hard X-ray and gamma-ray astrophysics data analysis, with a focus on providing a uniform interface to the data provided by several different missions and instruments.

The GDT Core Package (astro-gdt) contains the core components of the GDT that can be utilized for various instruments and is a generalized version of the Fermi GBM Data Tools. Individual mission or instrument packages can be developed using astro-gdt and released under the gdt.missions namespace (see astro-gdt-fermi as an example).

The full documentation can be found here.

Normal Installation

If you don't plan to contribute code to the project, the recommended method is to install from PyPI using:

pip install astro-gdt
gdt-data init

Running gdt-data init after installation is required to initialize the library.

Contributing to the GDT

Community contributions to the GDT are welcome and make the GDT a better toolkit for everyone to use. There are various types of contributions. These include:

  • Bug fixes
  • New features
  • Documentation improvements
  • API improvements
  • New mission packages

For each contribution, we consider it best practice to first open an issue (with an appropriate label). This may be as simple as notifying us (and other users!) that there is a bug. Perhaps you also have a proposed solution to fixing the issue. If so, then you can outline your proposed fix when you open the issue, and we will give you feedback about whether we think that would be a useful and appropriate fix.

Simple typographical fixes and clarifications made in the documentation do not require the creation of an issue.

If your proposed modifications are significant (e.g. a sizable documentation change, API improvements, or a new feature), we highly recommend that you detail the proposed change in the issue and wait for feedback. This is to save you precious time in the event that we decide not to accept your proposed solution. Often the proposed solution can break functionality elsewhere or can be simplified, and we would like to have a chance to provide useful feedback before you begin coding. Waiting for feedback is not a requirement, but it reduces the chance of additional changes being needed before your pull request is accepted.

If you are submitting code modifications, we require that you create a unit test to confirm expected operation of the code if those modifications aren't already covered by an existing unit test.
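
As a rough illustration, such a test might look like the following minimal sketch (the module, function, and values are hypothetical placeholders, not part of the GDT API):

import unittest

class TestMyBugfix(unittest.TestCase):
    """Minimal sketch of a unit test accompanying a code change."""

    def test_expected_operation(self):
        # my_function stands in for whatever code the change touches
        from gdt.core.example_module import my_function  # hypothetical
        self.assertEqual(my_function(2.0), 4.0)

if __name__ == '__main__':
    unittest.main()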

The usual sequence of events is:

  1. Create an issue describing the proposed changes.
  2. Wait for feedback if desired.
  3. Create a fork from the main branch.
  4. Use your fork to add your changes to the code base.
  5. Create unit tests showing that your changes work as intended.
  6. Create a pull request with a comment explaining how it closes the issue you created.

Setting up a development environment

If you do want to contribute code to this project (astro-gdt), you can use the following commands to quickly set up a development environment:

mkdir gdt-devel
cd gdt-devel
python -m venv venv
. venv/bin/activate
pip install --upgrade pip setuptools wheel
git clone git@github.com:USRA-STI/gdt-core.git
pip install -e gdt-core/
gdt-data init
pip install -r gdt-core/requirements.txt

This should result in gdt-devel having the following directory structure:

.
├── venv
└── gdt-core

with gdt-core installed in the virtual environment named venv.

Writing Extensions using Namespace Packaging

Gamma-ray Data Tools encourages missions to write extensions using namespace packages. Please use our Fermi extension as an example of how we expect other missions to contribute extensions to the Gamma-ray Data Tools.

The extension package should contain a directory 'gdt' with a subdirectory 'missions' which will hold the extension code in a package directory named after the mission.

For example, GDT-Fermi has the following directory layout:

.
├── config
├── dist
├── docs
├── src
│   └── gdt
│       └── missions
│           └── fermi
│               ├── gbm
│               │   └── __init__.py
│               ├── lat
│               │   └── __init__.py
│               └── __init__.py
└── tests
    └── missions
        └── fermi

Since GDT-Fermi uses namespace packaging, neither src/gdt nor src/gdt/missions contains a file named __init__.py, because they are namespace packages.

Notice that the directory src/gdt/missions/fermi and its subdirectories each contain an __init__.py file, signalling to Python that those directories are regular packages.

You can learn more about Namespace packages by reading PEP-420.
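
To make this concrete, here is a minimal sketch of how a setuptools-based build can declare the namespace layout (the package name is a placeholder, and the actual build configuration of a mission package may differ):

# Hypothetical setup.py for a mission extension package. Note
# find_namespace_packages, which picks up src/gdt and src/gdt/missions
# even though they contain no __init__.py (PEP 420).
from setuptools import setup, find_namespace_packages

setup(
    name='astro-gdt-mymission',  # placeholder name
    packages=find_namespace_packages(where='src', include=['gdt.missions.*']),
    package_dir={'': 'src'},
    install_requires=['astro-gdt'],
)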

Helping with Documentation

You can contribute additions and changes to the documentation. In order to use Sphinx to compile the documentation source files, we recommend that you install the packages listed in requirements.txt.

To compile the documentation, use the following commands:

cd gdt-core/docs
make html


gdt-core's Issues

"Nan" 'el' Values from 'gdt.core.coords.spacecraft', SpacecraftFrame.

Proposed solution:

import numpy as np
import astropy.coordinates as a_coords
from astropy import units as u
from scipy.spatial.transform import Rotation

from gdt.core.coords.spacecraft import SpacecraftFrame

@a_coords.frame_transform_graph.transform(a_coords.FunctionTransform,
                                          a_coords.ICRS, SpacecraftFrame)
def icrs_to_spacecraft_mod(icrs_frame, sc_frame):
    """Modified version of 'icrs_to_spacecraft' that removes NaN values of 'el'."""
    xyz = icrs_frame.cartesian.xyz.value
    rot = Rotation.from_quat(sc_frame.quaternion)
    xyz_prime = rot.inv().apply(xyz.T)
    if xyz_prime.ndim == 1:
        xyz_prime = xyz_prime.reshape(1, -1)
    az = np.arctan2(xyz_prime[:, 1], xyz_prime[:, 0])
    mask = (az < 0.0)
    az[mask] += 2.0 * np.pi
    # np.clip keeps the arccos argument within [-1, 1], avoiding NaN values
    el = np.pi / 2.0 - np.arccos(np.clip(xyz_prime[:, 2], -1.0, 1.0))
    return type(sc_frame)(az=az * u.radian, el=el * u.radian,
                          quaternion=sc_frame.quaternion)
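
Once registered, the transform is picked up automatically by astropy's frame machinery. A usage sketch (the coordinate values are illustrative and the quaternion construction is schematic; a real SpacecraftFrame would normally come from position history data):

coord = a_coords.SkyCoord(ra=10.0 * u.deg, dec=-30.0 * u.deg, frame='icrs')
frame = SpacecraftFrame(quaternion=[0.0, 0.0, 0.0, 1.0])
print(coord.transform_to(frame))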

Install not working

I’m currently trying to install GDT on my computer and I’ve run into an issue that I can’t figure out. I used the pip install command as it says on the docs page. However, when I run gdt-data init I receive the following error:

Traceback (most recent call last):
  File "/home/unholysun/Documents/GDT/GDT/bin/gdt-data", line 44, in <module>
    from gdt.core.heasarc import FileDownloader
  File "/home/unholysun/Documents/GDT/GDT/lib/python3.8/site-packages/gdt/core/heasarc.py", line 379, in <module>
    class FileDownloader(BaseFinder):
  File "/home/unholysun/Documents/GDT/GDT/lib/python3.8/site-packages/gdt/core/heasarc.py", line 402, in FileDownloader
    def bulk_download(self, urls: list[str], dest_dir: Union[str, Path], verbose: bool = True):
TypeError: 'type' object is not subscriptable

I’m not sure if this is user error or perhaps something specific to my setup. I’m running Ubuntu through WSL2 on Windows 11. I have all packages from the requirements.txt file installed and a virtual environment set up.
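
A likely cause, inferred from the python3.8 path in the traceback (this diagnosis is an addition, not part of the original report): subscripting the built-in list in annotations such as list[str] requires Python 3.9 or later. A sketch of two standard workarounds:

# On Python 3.8, evaluating list[str] at class-definition time raises
# TypeError: 'type' object is not subscriptable. One fix is typing.List,
# which is subscriptable on 3.8:
from pathlib import Path
from typing import List, Union

def bulk_download(self, urls: List[str], dest_dir: Union[str, Path],
                  verbose: bool = True):
    ...

# Alternatively, placing "from __future__ import annotations" at the top
# of heasarc.py defers annotation evaluation entirely (PEP 563).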

Request for consistent handling of time values between spacecraft frames and tte objects

@BillCleveland-USRA @AdamGoldstein-USRA

Currently, when working with position history (poshist) information from the GBM mission, a user can do:

poshist = GbmPosHist.open(pos_file)
frames = poshist.get_spacecraft_frame()

t = Time(val, format=format)
frame = frames.at(t)

where a Time object is used to retrieve a spacecraft frame. In this case, the user does not need knowledge of the underlying time format used in the mission. They only need to create a valid time object.

However, this is incompatible with the time format in tte files:

tte = GbmTte.open(tte_file)
frame = frames.at(tte.trigtime)  # <-- results in an error

because the tte class has no inherent knowledge of the underlying time format from the mission.

Would it be possible to require knowledge of the time format in the core tte class? Inherited classes from missions would then be required to set this value, similar to how GbmPosHist knows the underlying time format needed to generate spacecraft frames. Functions like splice_time could then allow for either float or time class input with something like:

def __init__(self, data, trigtime, ..., time_format):
    self.format = time_format  # need to figure out how to handle class + format string

def splice_time(self, tstart, tstop):
    if isinstance(tstart, Time):
        tstart = getattr(tstart, self.format)
This would allow users to provide Time values instead of needing to know the internal mission time formats themselves. You could also have a separate splice_time function for formatted input if you want to avoid slowing down the regular float method. The benefit here is that you avoid having to cast the underlying data to astropy Time objects.

There is also a need to return a Time class value for tte.trigtime.

What do you think? Should I prototype this for a pull request?

Add a tolerance when computing "good segments" for ExposureBins and TimeEnergyBins

ExposureBins and TimeEnergyBins calculate "good segments," which are contiguous segments of data separated by bad time intervals (typically no data). This is determined by comparing the high edge of a bin to the low edge of the next bin, and if they are not the same, then the end of the good segment is declared, and a new segment starts. The implementation in ExposureBins, for example, is

mask = (self.lo_edges[1:] != self.hi_edges[:-1])

There are cases when self.lo_edges[1:] and self.hi_edges[:-1] are slightly different, usually due to rounding/numerical-precision limitations, especially in older files that use 32-bit floats. This causes incorrect segmentation of the data. An improvement is to implement a tolerance:

mask = np.abs(self.lo_edges[1:] - self.hi_edges[:-1]) > tol

where tol is the tolerance in seconds allowed. As an example, tol=1e-4 is sufficient for BATSE continuous data.

The tol keyword could either be added as an option to the _calculate_good_segments() method and passed via an option in the initializer, or the initializer could set a private tolerance variable that _calculate_good_segments() uses.
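
A minimal sketch of the second variant (the class internals shown here are schematic, following the issue text rather than the actual ExposureBins implementation):

import numpy as np

class ExposureBins:
    def __init__(self, counts, lo_edges, hi_edges, exposure, tol=0.0):
        self.lo_edges = np.asarray(lo_edges)
        self.hi_edges = np.asarray(hi_edges)
        self._tol = tol  # allowed edge mismatch, in seconds

    def _calculate_good_segments(self):
        # A gap larger than the tolerance ends the current good segment
        mask = np.abs(self.lo_edges[1:] - self.hi_edges[:-1]) > self._tol
        ...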

`SkyPlot` heatmaps aren't rendered correctly when values exist with RA close to 0 or 360 degrees

I've noticed this issue when visualizing localizations with EquatorialPlot, where the probability heatmap isn't rendered correctly when the centroid RA is close to 0 or 360 degrees. This can be illustrated using a local copy of glg_healpix_all_bn220130968_v01.fit:

from gdt.core.plot.sky import EquatorialPlot
from gdt.missions.fermi.gbm.localization import GbmHealPix

gbm_healpix = GbmHealPix.open("./glg_healpix_all_bn220130968_v01.fit")
print(gbm_healpix.centroid)
# (357.87735849056605, -50.480044262442945)
eqplot = EquatorialPlot()
eqplot.add_localization(gbm_healpix, gradient=True)

The following plot is generated:

[Image: broken_eqplot, the incorrectly rendered heatmap]

According to the GBM catalog, the correct skymap should look like:

[Image: glg_skymap_all_bn220130968_v01.png, the correct skymap from the GBM catalog]

The issue appears to be caused by the transformation of specified lon/lat arrays to SkyCoord in SkyMap.plot_heatmap():

coords = SkyCoord(x.flatten(), y.flatten(), frame='gcrs', unit='deg')
try:
    coords = coords.transform_to(self._astropy_frame)
except:
    pass
lon, lat = get_lonlat(coords)
lon = lon.reshape(x.shape)
lat = lat.reshape(y.shape)
heatmap = SkyHeatmap(lon.value, lat.value, heatmap, self.ax,
                     frame=self._frame, flipped=self._flipped, **kwargs)

In the case of localizations, the lon_array specified to plot_heatmap() will contain values in $[0, 360]$. However, the call to SkyCoord() will convert the last 360 value to 0. E.g.:

from astropy.coordinates import SkyCoord
print(SkyCoord(ra=[0, 180, 360], dec=[45, 45, 45], unit="deg"))
# <SkyCoord (ICRS): (ra, dec) in deg
#     [(  0., 45.), (180., 45.), (  0., 45.)]>

This appears to cause the issue when subsequently rendering the heatmap. Note that it's only visible when centroids straddle RA=0/360 degrees; other localisation centroids are rendered correctly. I've also noticed a similar issue with the contour lines where, although it doesn't impact the visualisation as much as the heatmaps, the lines are broken; I assume this is also due to the use of SkyCoord.

I have a local workaround that overrides plot_heatmap() to exclude the call to SkyCoord(), where the specified lon_array and lat_array are passed directly to SkyHeatmap, similar to the previous GBM Data Tools implementation:

from gdt.core.plot.plot import SkyHeatmap

class FixEquatorialPlot(EquatorialPlot):
    def plot_heatmap(self, heatmap, lon_array, lat_array, **kwargs):
        """Plot a heatmap on the sky.
           This is a workaround for the issue in the parent SkyPlot.plot_heatmap(),
           where the meshgrid results in a lon_array starting/ending with 0 degrees.
           This causes issues where the heatmap generated from a HEALPix location
           is populated at both RA=0/360 degrees, resulting in an unexpected output
           from the subsequent call to Matplotlib's pcolormesh().

           The old GBM Data Tools didn't generate a meshgrid, and just passes the
           specified coordinate arrays unchanged.

        Args:
            heatmap (np.array): A 2D array of values
            lon_array (np.array): The array of longitudinal gridpoints
            lat_array (np.array): The array of latitudinal gridpoints
            **kwargs: Options to pass to :class:`~gdt.plot.plot.SkyHeatmap`
        """
        heatmap = SkyHeatmap(
            lon_array,
            lat_array,
            heatmap,
            self.ax,
            frame=self._frame,
            flipped=self._flipped,
            **kwargs
        )

        return heatmap

This results in a correct rendering of the heatmap, similar to that contained in the catalog:

fixeqplot = FixEquatorialPlot()
fixeqplot.add_localization(gbm_healpix, gradient=True)

[Image: fixed_eqplot, the correctly rendered heatmap]

I can create a PR with this fix but I wanted to check first whether the SkyCoord usage was required for other reasons.

Pstat does not handle zero counts

The Pstat likelihood does not handle zero-count bins. While this limitation is noted in SpectralFitterPstat, it can be rectified by including the special-case likelihood calculation for zero-count bins:

$-\ln \mathcal{L}_i(S_i = 0) = t_s \, (m_i + b_i)$

where $t_s$ is the source exposure, $m_i$ is the source model count rate for that bin, and $b_i$ is the background model count rate for that bin. Without this correction, using Pstat when many bins have zero counts can significantly bias the fit.
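
A sketch of the per-bin special case (variable names are illustrative; the actual SpectralFitterPstat internals differ):

import numpy as np

def pstat_neg_loglike_terms(src_counts, exposure, model_rate, back_rate):
    """Per-bin contributions to the Pstat negative log-likelihood,
    including the zero-count special case described above."""
    terms = np.zeros_like(model_rate)
    zero = (src_counts == 0)
    # For S_i = 0 the Poisson probability is exp(-t_s (m_i + b_i)),
    # so the contribution to -ln(L) is t_s * (m_i + b_i)
    terms[zero] = exposure * (model_rate[zero] + back_rate[zero])
    # ... nonzero-count bins use the standard Pstat expression ...
    return terms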

Error in heasarc.py

from gdt.core.heasarc import BrowseCatalog

Error Returned:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "gdt-core/src/gdt/core/heasarc.py", line 379, in <module>
    class FileDownloader(BaseFinder):
  File "gdt-core/src/gdt/core/heasarc.py", line 402, in FileDownloader
    def bulk_download(self, urls: list[str], dest_dir: Union[str, Path], verbose: bool = True):
TypeError: 'type' object is not subscriptable
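
This is the same list[str] annotation error reported in "Install not working" above; see the workaround sketch there.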

Slow TTE file read behavior

I noticed that opening TTE files takes about 3x longer with GDT compared to the older GBM Data Tools. The slowdown traces to a list(events) conversion happening inside the data_primitives.EventList class:

https://github.com/USRA-STI/gdt-core/blob/main/src/gdt/core/data_primitives.py#L505

This can be avoided by first creating an empty numpy array and then filling it with the TIME and PHA columns:

self._events = np.empty(times.size, dtype=[('TIME', times.dtype.type),
                                           ('PHA', channels.dtype.type)])
self._events['TIME'] = times
self._events['PHA'] = channels

Filling the empty array takes ~0.06 s on a 2019 MacBook Pro, versus ~1 s for the list conversion. I'll work on a pull request with this suggested change.

healpix convolve does not preserve header information

return type(self).from_data(new_hpx, trigtime=self.trigtime, **kwargs)

Currently, trigtime is the only property preserved when returning a HealPix-derived class from the convolve function.

In practice, this means a user trying to convolve a systematic error with a GBM map will receive a map with header, scpos, and quaternion information stripped away. Is this the intended behavior? Should users be expected to re-apply that information to the returned map?
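
If that information should be preserved, one possible direction is sketched below; whether from_data() accepts these keywords for a given derived class is an assumption based on the attributes named in this issue, not a confirmed part of the API:

# Hypothetical fix: forward additional attributes alongside trigtime
return type(self).from_data(new_hpx, trigtime=self.trigtime,
                            headers=self.headers, scpos=self.scpos,
                            quaternion=self.quaternion, **kwargs)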

Add Bayesian Blocks binning functions for binned and unbinned data

A highly requested feature is an implementation of the Bayesian Blocks binning algorithm in the GDT.

This could be implemented using Astropy's bayesian_blocks function. Both binned and unbinned data would use fitness='events' and would accept the gamma and ncp_prior keywords.

For unbinned data, the GDT function would be little more than a wrapper around bayesian_blocks, since the input is an array of event times and the output is an array of bin edges.

For binned data, the GDT function is slightly more complex because the number of counts in each new bin must be calculated as well as the exposure of each bin.
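
For the unbinned case, a minimal sketch of such a wrapper (the function name and signature are hypothetical, not a proposed GDT API):

import numpy as np
from astropy.stats import bayesian_blocks

def unbinned_bayesian_blocks(times, gamma=None, ncp_prior=None):
    """Return Bayesian Blocks bin edges for an array of event times."""
    kwargs = {}
    if gamma is not None:
        kwargs['gamma'] = gamma
    if ncp_prior is not None:
        kwargs['ncp_prior'] = ncp_prior
    # fitness='events' treats the input as unbinned event data
    return bayesian_blocks(np.asarray(times), fitness='events', **kwargs)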
