b612-asteroid-institute / precovery
Fast precovery of small body observations at scale
License: BSD 3-Clause "New" or "Revised" License
Brute-force attribution of (458508) finds 10 observations within 1 arcsecond in observations dated between August 30th, 2013 and September 30th, 2013.
The precovery code finds 2 observations; only observation c4d.230825.30.319 agrees with those found by brute-force attribution.
import os
import pandas as pd
from astropy.time import Time
from precovery import precovery_db
from precovery.orbit import Orbit, EpochTimescale
SYSTEM_DIR = "/epyc"
ORBIT_DIR = os.path.join(SYSTEM_DIR, "projects/thor/thor_results/nsc/precovery/458508/")
DB_DIR = os.path.join(SYSTEM_DIR, "projects/thor/thor_data/nsc/precovery/")
OBSERVATIONS_FILE = os.path.join(SYSTEM_DIR, "projects/thor/thor_data/nsc/preprocessed/nsc_dr2_observations_2013-08-30_2013-09-30.h5")
# Read database from directory
db = precovery_db.PrecoveryDatabase.from_dir(DB_DIR, create=True)
# Read MPC (truth) orbit
mpc_orbit = pd.read_csv(
os.path.join(ORBIT_DIR, "mpc_orbit_cartesian.csv"),
index_col=False
)
# Read results from brute-force attributions
orbit_attributions = pd.read_csv(
os.path.join(ORBIT_DIR, "attributions.csv"),
index_col=False
)
# Read observations (not necessary to reproduce example).
# observations = pd.read_hdf(OBSERVATIONS_FILE, key="data")
# Create precovery Orbit
# Epochs are defined in TDB while observations are defined in UTC, so let's convert the orbit's
# epoch to UTC so that the timescales match the observations when propagation and ephemeris
# generation occur.
epoch_tdb = Time(mpc_orbit["mjd_tdb"].values[0], scale="tdb", format="mjd")
epoch_utc = epoch_tdb.utc.mjd
orbit = Orbit.cartesian(
    0,                         # orbit ID
    mpc_orbit["x"].values[0],
    mpc_orbit["y"].values[0],
    mpc_orbit["z"].values[0],
    mpc_orbit["vx"].values[0],
    mpc_orbit["vy"].values[0],
    mpc_orbit["vz"].values[0],
    epoch_utc,
    EpochTimescale.UTC,
    20,                        # absolute magnitude (H)
    0.15                       # slope parameter (G)
)
matches = [m for m in db.precover(orbit, tolerance=1/3600)]
obs_ids_precovered = [m.id for m in matches]
print(obs_ids_precovered)
print(orbit_attributions["obs_id"].values)
This outputs the following:
['c4d.230825.30.319', 'c4d.237746.22.451']
['c4d.230114.59.331' 'c4d.230825.30.319' 'c4d.231493.14.573'
'c4d.232341.24.965' 'c4d.232342.24.1141' 'c4d.232755.5.890'
'c4d.232756.5.1516' 'c4d.232759.5.1678' 'c4d.233134.54.352'
'c4d.233473.32.288']
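A quick set comparison of the two printed lists above confirms the overlap is a single observation:

```python
# Observation IDs copied from the two printed lists above.
precovered = {"c4d.230825.30.319", "c4d.237746.22.451"}
attributed = {
    "c4d.230114.59.331", "c4d.230825.30.319", "c4d.231493.14.573",
    "c4d.232341.24.965", "c4d.232342.24.1141", "c4d.232755.5.890",
    "c4d.232756.5.1516", "c4d.232759.5.1678", "c4d.233134.54.352",
    "c4d.233473.32.288",
}
# Set intersection: observations found by both methods.
common = precovered & attributed
print(common)  # {'c4d.230825.30.319'}
```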
Tracking metadata such as the dataset name, where documentation exists, and which cutout service to use will be useful as we add more datasets and for cutout link generation.
Once the metadata table is implemented, generating cutout URLs for each frame should be straightforward. For example:
cutout_service = "https://datalab.noirlab.edu/svc/cutout" # metadata table
dataset = "nsc_dr2" # metadata table
exposure_id = "c4d_130903_045146_ooi_g_ls9"
ra = 325.7731518779698
dec = 0.3930505122291065
length = 30 # arcsec (hard code for now)
width = 30 # arcsec (hard code for now)
preview = True
cutout_url = f"{cutout_service}?col={dataset}&siaRef={exposure_id}.fits.fz&extn=7&POS={ra:.10f},{dec:.10f}&SIZE={length/3600},{width/3600}&preview={preview}"
print(cutout_url)
Which prints:
https://datalab.noirlab.edu/svc/cutout?col=nsc_dr2&siaRef=c4d_130903_045146_ooi_g_ls9.fits.fz&extn=7&POS=325.7731518780,0.3930505122&SIZE=0.008333333333333333,0.008333333333333333&preview=True
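The same URL can be assembled with urllib.parse.urlencode, which avoids manual string interpolation (a sketch using the values from the snippet above; safe=',' keeps the commas in POS and SIZE unescaped so the result matches the hand-built URL):

```python
from urllib.parse import urlencode

# Sketch: build the same cutout URL with urlencode instead of a hand-written
# f-string. Values are taken from the example above.
cutout_service = "https://datalab.noirlab.edu/svc/cutout"
ra = 325.7731518779698
dec = 0.3930505122291065
params = {
    "col": "nsc_dr2",
    "siaRef": "c4d_130903_045146_ooi_g_ls9.fits.fz",
    "extn": 7,
    "POS": f"{ra:.10f},{dec:.10f}",
    "SIZE": f"{30/3600},{30/3600}",
    "preview": True,
}
# safe=',' prevents percent-encoding of the commas in POS and SIZE.
cutout_url = f"{cutout_service}?{urlencode(params, safe=',')}"
print(cutout_url)
```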
Changes to the precovery code, particularly changes to the format and quantities that are indexed, may break compatibility with previously indexed observations. The code version used to index observations should be stored as metadata, either in a file in the output directory or in index.db. See #9 for more discussion.
Versions are currently manually tagged. Adding versioning with setuptools_scm will allow non-tagged versioning for edits between minor or major versions.
Subscribe to notifications on this issue to be updated about upgrades/changes/status of the ADAM::Precovery Service.
Also, feel free to open issues here for any ideas, questions, or bug reports!
A potential use case is also returning "negative observations": instances where a given trajectory intersects a HEALPixel frame but no observations were found. This only makes sense when the HEALPixels are small enough, but it could be worthwhile to add as a user-facing option.
Additional vetting of precovery observations may require looking at lightcurves, which means we would also need to store magnitudes and filter information.
PrecoveryCandidate HEALPixel IDs are currently set to the HEALPixel ID of the HealpixFrame. If we decide to re-index the observations to a higher nside parameter, then HEALPixel IDs from previous versions will not match the new ones. The solution is to return HEALPixel IDs with an nside parameter significantly higher than any nside with which we would ever consider indexing the observations. At that point, matching HEALPixel IDs from a higher nside parameter to a lower one is a simple right shift.
For example,
import healpy as hp
k = 15
nside_15 = 2**k
nside_17 = 2**(k + 2)
id_15 = hp.ang2pix(nside_15, 0, 0, nest=True, lonlat=True)
id_17 = hp.ang2pix(nside_17, 0, 0, nest=True, lonlat=True)
print("id (k=15): ", id_15)
print("id (k=17): ", id_17)
print("id (k=17) >> 2 >> 2:", id_17 >> 2 >> 2)
Which returns:
id (k=15): 5100273664
id (k=17): 81604378624
id (k=17) >> 2 >> 2: 5100273664
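This generalizes to a small helper (degrade_pixel is hypothetical, not part of healpy or precovery) that shifts a NESTED pixel ID down by 2 bits per step in k:

```python
# Hypothetical helper: map a NESTED-ordering HEALPixel ID computed at a higher
# resolution parameter k_high down to the ID of its parent pixel at k_low.
# Each step down in k merges 4 child pixels, i.e. a right shift by 2 bits.
def degrade_pixel(pixel_id: int, k_high: int, k_low: int) -> int:
    assert k_high >= k_low
    return pixel_id >> (2 * (k_high - k_low))

# Values from the healpy example above:
print(degrade_pixel(81604378624, 17, 15))  # 5100273664
```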
HDF5 NSC files could contain an obscode, which would mean we don't need to derive them from the exposure ID. We should modify the HDF catalogs on Epyc to include obscode, and then change precovery/sourcecatalog.py (line 134 at commit 71bc140).
If the window size doesn't affect the underlying data structure of the precovery binary files and the index database, then exposing the window_size parameter to the precovery function would be useful to allow tuning while running precovery searches.
Basically adding a keyword argument here: https://github.com/B612-Asteroid-Institute/precovery/blob/master/precovery/precovery_db.py#L78
On DiRAC's shared machine, a sample test orbit takes 95 seconds to return 27 matches in nearly 7 years of NSC observations (with profiling enabled). The profile indicates there are some low-hanging optimizations that can speed this time up:
Found 27 potential matches for orbit ID: 330784
38862790 function calls (38779094 primitive calls) in 95.102 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
412778 33.377 0.000 38.044 0.000 orbit.py:192(compute_ephemeris)
524266 5.920 0.000 6.474 0.000 spherical_geom.py:79(haversine_distance_deg)
1650539 5.709 0.000 5.709 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1543349 5.030 0.000 5.030 0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
412591 3.191 0.000 7.129 0.000 pixelfunc.py:153(check_theta_valid)
412591 3.089 0.000 3.089 0.000 healpix_geom.py:17(radec_to_thetaphi)
825327 2.596 0.000 7.628 0.000 fromnumeric.py:69(_wrapreduction)
412591 2.419 0.000 22.506 0.000 pixelfunc.py:424(ang2pix)
412115 2.159 0.000 31.309 0.000 orbit.py:240(approximately_propagate)
503 2.046 0.004 90.738 0.180 precovery_db.py:146(_check_window)
412778 1.978 0.000 1.978 0.000 orbit.py:13(_ensure_pyoorb_initialized)
413897 1.680 0.000 1.680 0.000 {built-in method numpy.array}
1954135 1.552 0.000 10.135 0.000 frame_db.py:197(<genexpr>)
1543347 1.498 0.000 8.221 0.000 result.py:380(iterrows)
825182 1.341 0.000 3.690 0.000 {method 'all' of 'numpy.generic' objects}
825182 1.146 0.000 1.330 0.000 pixelfunc.py:1233(isnsideok)
825185 1.121 0.000 8.747 0.000 fromnumeric.py:2367(all)
412591 1.099 0.000 11.566 0.000 frame_db.py:168(propagation_targets)
825182 0.993 0.000 12.958 0.000 pixelfunc.py:1279(check_nside)
524266 0.954 0.000 7.771 0.000 frame_db.py:97(distance)
Some functions do not fully utilize the vectorization available in third-party packages and could be rewritten to do so.
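For example, the per-observation haversine calls visible in the profile could instead be evaluated over whole arrays. The sketch below is an illustrative NumPy version, not the package's actual spherical_geom implementation:

```python
import numpy as np

# Illustrative sketch (not the actual spherical_geom code): a haversine
# distance built from NumPy ufuncs evaluates whole arrays in one call,
# replacing a Python-level loop over observations.
def haversine_distance_deg(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2.0) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    )
    return np.degrees(2.0 * np.arcsin(np.sqrt(a)))

# One vectorized call computes every pairwise distance at once.
lon = np.array([0.0, 0.0, 45.0])
lat = np.array([0.0, 90.0, 0.0])
distances = haversine_distance_deg(0.0, 0.0, lon, lat)
print(distances)
```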
Note that these times were measured after creating a multi-level index on mjd and healpixel in the frames table. Code to do this on data ingest needs to be added to this PR as well.
Timings on the newly indexed database (4 single indexes, 1 composite index):
'ix_frames_obscode', 'ix_frames_mjd', 'ix_frames_healpixel', 'sqlite_autoindex_datasets_1', 'fast_query'
Found 29 potential matches for orbit ID 330784 in 139.495s
Found 33 potential matches for orbit ID 376552 in 136.716s
Found 22 potential matches for orbit ID 458508 in 137.527s
Timings on the new database (1 composite index):
'fast_query'
Found 29 potential matches for orbit ID 330784 in 118.748s
Found 33 potential matches for orbit ID 376552 in 108.497s
Found 22 potential matches for orbit ID 458508 in 106.770s
Timings on the old database (1 composite index):
'fast_query'
Found 29 potential matches for orbit ID 330784 in 111.993s
Found 33 potential matches for orbit ID 376552 in 109.673s
Found 22 potential matches for orbit ID 458508 in 107.731s
This commit did not have the intended effect of creating a faster query. We need to add a fix that creates a composite index equivalent to:
import sqlite3 as sql
con = sql.connect(DB)
con.execute("""CREATE INDEX fast_query ON frames (healpixel, mjd, obscode)""")
con.close()
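Whether a lookup actually uses the composite index can be checked with EXPLAIN QUERY PLAN (a sketch against a simplified frames schema, not the real one):

```python
import sqlite3 as sql

# Sketch with a simplified frames schema: use EXPLAIN QUERY PLAN to confirm
# that a typical window lookup hits the composite index rather than scanning
# the table.
con = sql.connect(":memory:")
con.execute("CREATE TABLE frames (healpixel INTEGER, mjd REAL, obscode TEXT)")
con.execute("CREATE INDEX fast_query ON frames (healpixel, mjd, obscode)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM frames "
    "WHERE healpixel = ? AND mjd BETWEEN ? AND ?",
    (5100273664, 56000.0, 56030.0),
).fetchall()
con.close()
print(plan)  # the detail column should mention the fast_query index
```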
I was trying to run a precovery search using the precovery function in precovery.main; however, the following error appeared. I think it would be useful to expose the allow_version_mismatch argument in this function.
Exception: Version mismatch:
Running version: 0.2.dev112+g40f8bf5.d20230215
Database version: 0.2.dev108+gdbdc8b0
Use allow_version_mismatch=True to ignore this error.
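A minimal sketch of what that escape hatch could look like (hypothetical function and names, mirroring the error message above; the real check lives inside the precovery database loader):

```python
# Hypothetical sketch: reproduce the version check with an opt-out flag.
# Not the actual precovery implementation.
def check_version(running: str, database: str, allow_version_mismatch: bool = False) -> None:
    if running != database and not allow_version_mismatch:
        raise Exception(
            f"Version mismatch:\n"
            f"Running version: {running}\n"
            f"Database version: {database}\n"
            f"Use allow_version_mismatch=True to ignore this error."
        )

# Raises unless the caller explicitly opts in:
check_version("0.2.dev112+g40f8bf5.d20230215", "0.2.dev108+gdbdc8b0",
              allow_version_mismatch=True)
```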
I am working with @moeyensj on SkyMapper.
We are working to support multiple different datasets; however, as it stands, indexing multiple datasets places them all in the same set of binary files. Ideally, we'd want to be able to subsample the data so we can combine different dataset indexes or split them apart trivially.
The solution would be to separate each dataset's observations into its own subdirectory and then split each binary file by calendar month. This way, downsampling the full dataset becomes a simple table query.
Proposed schema:
$INDEX_DIR
- index.db
- data
- nsc_dr2
- 2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
- 2012-11-01T00:00:00.000_2012-11-30T23:59:59.999.data
- ...
- sdss_dr9
- 2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
- 2012-11-01T00:00:00.000_2012-11-30T23:59:59.999.data
- ...
I've modified a code snippet from Stack Overflow that divides a date range into calendar months (this could be a good starting place for an implementation).
# Modified from: https://stackoverflow.com/questions/51293632/how-do-i-divide-a-date-range-into-months-in-python
import datetime
from astropy.time import Time

begin = '2012-09-01'
end = '2020-01-01'
dt_start = datetime.datetime.strptime(begin, '%Y-%m-%d')
dt_end = datetime.datetime.strptime(end, '%Y-%m-%d')
one_day = datetime.timedelta(1)

start_dates = [dt_start]
end_dates = []
today = dt_start
while today <= dt_end:
    tomorrow = today + one_day
    if tomorrow.month != today.month:
        start_dates.append(tomorrow)
        # End date is 1 millisecond before midnight of the first day of the next month
        end_dates.append(today + datetime.timedelta(1 - 1/86400/1000))
    today = tomorrow
end_dates.append(dt_end)

start_times = Time(start_dates)
end_times = Time(end_dates)
start_times = start_times[:-2]
end_times = end_times[:-2]
for i, j in zip(start_times.isot, end_times.isot):
    print(i, j)
start_times and end_times are both astropy Time objects that can then be used to filter and appropriately split the input observations.
The loop prints the following:
2012-09-01T00:00:00.000 2012-09-30T23:59:59.999
2012-10-01T00:00:00.000 2012-10-31T23:59:59.999
2012-11-01T00:00:00.000 2012-11-30T23:59:59.999
2012-12-01T00:00:00.000 2012-12-31T23:59:59.999
2013-01-01T00:00:00.000 2013-01-31T23:59:59.999
2013-02-01T00:00:00.000 2013-02-28T23:59:59.999
2013-03-01T00:00:00.000 2013-03-31T23:59:59.999
2013-04-01T00:00:00.000 2013-04-30T23:59:59.999
2013-05-01T00:00:00.000 2013-05-31T23:59:59.999
2013-06-01T00:00:00.000 2013-06-30T23:59:59.999
2013-07-01T00:00:00.000 2013-07-31T23:59:59.999
2013-08-01T00:00:00.000 2013-08-31T23:59:59.999
2013-09-01T00:00:00.000 2013-09-30T23:59:59.999
2013-10-01T00:00:00.000 2013-10-31T23:59:59.999
2013-11-01T00:00:00.000 2013-11-30T23:59:59.999
2013-12-01T00:00:00.000 2013-12-31T23:59:59.999
2014-01-01T00:00:00.000 2014-01-31T23:59:59.999
2014-02-01T00:00:00.000 2014-02-28T23:59:59.999
2014-03-01T00:00:00.000 2014-03-31T23:59:59.999
2014-04-01T00:00:00.000 2014-04-30T23:59:59.999
2014-05-01T00:00:00.000 2014-05-31T23:59:59.999
2014-06-01T00:00:00.000 2014-06-30T23:59:59.999
2014-07-01T00:00:00.000 2014-07-31T23:59:59.999
2014-08-01T00:00:00.000 2014-08-31T23:59:59.999
...
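Given the start/end times above, the per-month file names in the proposed schema can be derived directly (monthly_data_path is a hypothetical helper, shown here with plain ISOT strings):

```python
# Hypothetical helper matching the proposed directory schema: build the
# per-month binary file path for a dataset from its ISOT window boundaries.
def monthly_data_path(dataset: str, start_isot: str, end_isot: str) -> str:
    return f"data/{dataset}/{start_isot}_{end_isot}.data"

path = monthly_data_path("nsc_dr2",
                         "2012-10-01T00:00:00.000",
                         "2012-10-31T23:59:59.999")
print(path)
# data/nsc_dr2/2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
```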
Add exposure duration or time to frames, and add FWHM if available as well.
Each individual observation should support having any observation time within the range defined by the start of exposure + exposure duration. This is particularly important for the LSST use case where each individual DIASource from a single visit will have a distinct observation time (this is to account for the motion of the shutter as it opens and closes -- pretty cool!).
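A simple consistency check for such per-source times might look like this (a sketch; the column names and the MJD/seconds conventions are assumptions):

```python
# Sketch: check that an observation's time lies inside its exposure window,
# [exposure_start, exposure_start + duration]. Times in MJD, duration in seconds.
def in_exposure_window(obs_mjd: float, exposure_start_mjd: float,
                       exposure_duration_s: float) -> bool:
    return exposure_start_mjd <= obs_mjd <= exposure_start_mjd + exposure_duration_s / 86400.0

# A 30 s LSST-like exposure: a source timestamp ~8.6 s after shutter open is valid...
print(in_exposure_window(59000.0001, 59000.0, 30.0))  # True
# ...but one ~86 s after shutter open is not.
print(in_exposure_window(59000.001, 59000.0, 30.0))   # False
```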
Returning the predicted RA and Dec and other useful observables as part of the PrecoveryCandidate dataclass will be useful for generating cutouts/postage stamps since those are queried on predicted positions.
Add conda recipe and publish the precovery package to the asteroid-institute conda channel.
Adding uncertainties to the indexed observations will allow orbit determination (OD) to be performed using precovery outputs rather than needing to return to the source observation files.
Additional useful quantities to include in the PrecoveryCandidate dataclass are the sky-plane residuals and the great circle distance.
Right now, "Email Me Candidate Cutouts" is checked by default. That keeps the 'Submit' button greyed out until an e-mail is entered. An impatient or unobservant user (== yours truly) won't notice that and will be confused about why the 'Submit' button can't be clicked.
Proposed resolution: make the "Email me candidate cutouts" checkbox off by default.
Related issue:
After typing in the e-mail, the 'Submit' button remains disabled until focus moves away from the field. This will be confusing to the user.
We've had issues in the past where packages available on pip and conda differ (in version or name), so testing both use cases, where a user installs dependencies from either source, seems worthwhile.
The ZTF database has no exposure IDs, so we turned that column into an empty column of strings. This allowed precovery to index the HDF5 file but caused other issues when running the precovery search: when we looked for a specific asteroid, all the "mjd_utc" values fell on the same night. We need a solution for datasets that don't have exposure IDs.
df5["exposure_id"] = df5["mjd_utc"].apply(lambda x: str(x))
df5.to_hdf('ztf_observations_610_624.h5', key='data', mode='w', format='table', encoding='utf-8')
This is what we ended up doing so that "exposure_id" could have values based on "mjd_utc".
In that case, the orbit would be pulled from (e.g.) JPL.
Let's add a ~paragraph describing at a high level what the library does.
I have run into issues installing the requirements using conda/pip. I initially tried installing with pip using requirements.txt and dev-requirements.txt per the README:
conda create -n precovery python=3.10
conda activate precovery
# python3.10 -m venv virtualenv3.10 # ignore since I installed python3.10 with conda
# source virtualenv3.10/bin/activate # ignore since I made a conda env
pip install pip --upgrade
pip install -r requirements.txt -r dev-requirements.txt
pre-commit install
gave errors saying that llvmlite==0.36.0 and numba==0.53.1 were not available, making the pinned versions impossible to install.
I then installed using the requirements.in and dev-requirements.in files, where only numpy and numba are versioned, and continued to receive an error since numba==0.53.1 is defined in requirements.in. I replaced numba==0.53.1 with numba==0.55.1, which led to a solve. Finally, pip installed numpy version 1.22, but numba threw errors on execution claiming it needs numpy<=1.21. I solved this with:
pip install numpy==1.21
And with that the precovery code ran.
Additionally, I needed to install pyoorb:
conda install openorb
I can make updates to the README and a pull request at a later time.