b612-asteroid-institute / precovery
Fast precovery of small body observations at scale
License: BSD 3-Clause "New" or "Revised" License
Brute-force attribution of (458508) finds 10 observations within 1 arcsecond in observations dated between August 30th, 2013 and September 30th, 2013.
The precovery code finds 2 observations; only observation c4d.230825.30.319 agrees with those found by brute-force attribution.
import os
import pandas as pd
from astropy.time import Time
from precovery import precovery_db
from precovery.orbit import Orbit, EpochTimescale
SYSTEM_DIR = "/epyc"
ORBIT_DIR = os.path.join(SYSTEM_DIR, "projects/thor/thor_results/nsc/precovery/458508/")
DB_DIR = os.path.join(SYSTEM_DIR, "projects/thor/thor_data/nsc/precovery/")
OBSERVATIONS_FILE = os.path.join(SYSTEM_DIR, "projects/thor/thor_data/nsc/preprocessed/nsc_dr2_observations_2013-08-30_2013-09-30.h5")
# Read database from directory
db = precovery_db.PrecoveryDatabase.from_dir(DB_DIR, create=True)
# Read MPC (truth) orbit
mpc_orbit = pd.read_csv(
os.path.join(ORBIT_DIR, "mpc_orbit_cartesian.csv"),
index_col=False
)
# Read results from brute-force attributions
orbit_attributions = pd.read_csv(
os.path.join(ORBIT_DIR, "attributions.csv"),
index_col=False
)
# Read observations (not necessary to reproduce example).
# observations = pd.read_hdf(OBSERVATIONS_FILE, key="data")
# Create precovery Orbit
# Epochs are defined in TDB while observations are defined in UTC, so let's convert the orbit's
# epoch to UTC so that the timescales match the observations when propagation and ephemeris
# generation occur.
epoch_tdb = Time(mpc_orbit["mjd_tdb"].values[0], scale="tdb", format="mjd")
epoch_utc = epoch_tdb.utc.mjd
orbit = Orbit.cartesian(
    0,                         # orbit ID
    mpc_orbit["x"].values[0],
    mpc_orbit["y"].values[0],
    mpc_orbit["z"].values[0],
    mpc_orbit["vx"].values[0],
    mpc_orbit["vy"].values[0],
    mpc_orbit["vz"].values[0],
    epoch_utc,
    EpochTimescale.UTC,
    20,                        # absolute magnitude (H)
    0.15                       # slope parameter (G)
)
matches = [m for m in db.precover(orbit, tolerance=1/3600)]
obs_ids_precovered = [m.id for m in matches]
print(obs_ids_precovered)
print(orbit_attributions["obs_id"].values)
This outputs the following:
['c4d.230825.30.319', 'c4d.237746.22.451']
['c4d.230114.59.331' 'c4d.230825.30.319' 'c4d.231493.14.573'
'c4d.232341.24.965' 'c4d.232342.24.1141' 'c4d.232755.5.890'
'c4d.232756.5.1516' 'c4d.232759.5.1678' 'c4d.233134.54.352'
'c4d.233473.32.288']
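A quick set comparison of the two printed lists above confirms the overlap is a single observation:

```python
# Observation IDs copied from the two printed lists above.
precovered = {"c4d.230825.30.319", "c4d.237746.22.451"}
attributed = {
    "c4d.230114.59.331", "c4d.230825.30.319", "c4d.231493.14.573",
    "c4d.232341.24.965", "c4d.232342.24.1141", "c4d.232755.5.890",
    "c4d.232756.5.1516", "c4d.232759.5.1678", "c4d.233134.54.352",
    "c4d.233473.32.288",
}
# Set intersection: observations found by both methods.
common = precovered & attributed
print(common)  # {'c4d.230825.30.319'}
```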
Tracking metadata such as the dataset name, where documentation exists, and which cutout service to use will be useful as we add more datasets and for cutout link generation.
Once the metadata table is implemented, generating cutout URLs for each frame should be straightforward. For example:
cutout_service = "https://datalab.noirlab.edu/svc/cutout" # metadata table
dataset = "nsc_dr2" # metadata table
exposure_id = "c4d_130903_045146_ooi_g_ls9"
ra = 325.7731518779698
dec = 0.3930505122291065
length = 30 # arcsec (hard code for now)
width = 30 # arcsec (hard code for now)
preview = True
cutout_url = f"{cutout_service}?col={dataset}&siaRef={exposure_id}.fits.fz&extn=7&POS={ra:.10f},{dec:.10f}&SIZE={length/3600},{width/3600}&preview={preview}"
print(cutout_url)
Which prints:
https://datalab.noirlab.edu/svc/cutout?col=nsc_dr2&siaRef=c4d_130903_045146_ooi_g_ls9.fits.fz&extn=7&POS=325.7731518780,0.3930505122&SIZE=0.008333333333333333,0.008333333333333333&preview=True
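The same URL can be assembled with urllib.parse.urlencode, which avoids manual string interpolation (a sketch using the values from the snippet above; safe=',' keeps the commas in POS and SIZE unescaped so the result matches the hand-built URL):

```python
from urllib.parse import urlencode

# Sketch: build the same cutout URL with urlencode instead of a hand-written
# f-string. Values are taken from the example above.
cutout_service = "https://datalab.noirlab.edu/svc/cutout"
ra = 325.7731518779698
dec = 0.3930505122291065
params = {
    "col": "nsc_dr2",
    "siaRef": "c4d_130903_045146_ooi_g_ls9.fits.fz",
    "extn": 7,
    "POS": f"{ra:.10f},{dec:.10f}",
    "SIZE": f"{30/3600},{30/3600}",
    "preview": True,
}
# safe=',' prevents percent-encoding of the commas in POS and SIZE.
cutout_url = f"{cutout_service}?{urlencode(params, safe=',')}"
print(cutout_url)
```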
Changes to the precovery code, particularly changes to the format and quantities that are indexed, may break compatibility with previously indexed observations. The code version used to index observations should be stored as metadata, either in a file in the output directory or in index.db. See #9 for more discussion.
Versions are currently manually tagged. Adding versioning with setuptools_scm will allow non-tagged versioning for edits between minor or major versions.
Subscribe to notifications on this issue to be updated about upgrades/changes/status of the ADAM::Precovery Service.
Also, feel free to open issues here for any ideas, questions, or bug reports!
A potential use case is also returning "negative observations": instances where a given trajectory intersects a HEALPixel frame but no observations were found. This only makes sense when the HEALPixels are small enough, but it could be worthwhile to add as a user-facing option.
Additional vetting of precovery observations may require looking at lightcurves, which means we would also need to store magnitudes and filter information.
PrecoveryCandidate HEALPixel IDs are currently set to the HEALPixel ID of the HealpixFrame. If we decide to re-index the observations to a higher nside parameter, then HEALPixel IDs from previous versions will not match the new ones. The solution is to return HEALPixel IDs with an nside parameter significantly higher than any nside with which we would ever consider indexing the observations. At that point, matching HEALPixel IDs from a higher nside parameter to a lower one is a simple right shift.
For example,
import healpy as hp
k = 15
nside_15 = 2**k
nside_17 = 2**(k + 2)
id_15 = hp.ang2pix(nside_15, 0, 0, nest=True, lonlat=True)
id_17 = hp.ang2pix(nside_17, 0, 0, nest=True, lonlat=True)
print("id (k=15): ", id_15)
print("id (k=17): ", id_17)
print("id (k=17) >> 2 >> 2:", id_17 >> 2 >> 2)
Which returns:
id (k=15): 5100273664
id (k=17): 81604378624
id (k=17) >> 2 >> 2: 5100273664
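This generalizes to a small helper (degrade_pixel is hypothetical, not part of healpy or precovery) that shifts a NESTED pixel ID down by 2 bits per step in k:

```python
# Hypothetical helper: map a NESTED-ordering HEALPixel ID computed at a higher
# resolution parameter k_high down to the ID of its parent pixel at k_low.
# Each step down in k merges 4 child pixels, i.e. a right shift by 2 bits.
def degrade_pixel(pixel_id: int, k_high: int, k_low: int) -> int:
    assert k_high >= k_low
    return pixel_id >> (2 * (k_high - k_low))

# Values from the healpy example above:
print(degrade_pixel(81604378624, 17, 15))  # 5100273664
```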
HDF5 NSC files could contain an obscode, which would mean we don't need to derive them from the exposure ID. We should modify the HDF catalogs on Epyc to include obscode, and then change precovery/sourcecatalog.py (line 134 at commit 71bc140).
If the window size doesn't affect the underlying data structure of the precovery binary files and the index database, then exposing the window_size parameter to the precovery function would be useful to allow tuning while running precovery searches.
Basically adding a keyword argument here: https://github.com/B612-Asteroid-Institute/precovery/blob/master/precovery/precovery_db.py#L78
On DiRAC's shared machine, a sample test orbit takes 95 seconds to return 27 matches in nearly 7 years of NSC observations (with profiling enabled). The profile indicates there are some low-hanging optimizations that can speed this time up:
Found 27 potential matches for orbit ID: 330784
38862790 function calls (38779094 primitive calls) in 95.102 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
412778 33.377 0.000 38.044 0.000 orbit.py:192(compute_ephemeris)
524266 5.920 0.000 6.474 0.000 spherical_geom.py:79(haversine_distance_deg)
1650539 5.709 0.000 5.709 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1543349 5.030 0.000 5.030 0.000 {method 'fetchone' of 'sqlite3.Cursor' objects}
412591 3.191 0.000 7.129 0.000 pixelfunc.py:153(check_theta_valid)
412591 3.089 0.000 3.089 0.000 healpix_geom.py:17(radec_to_thetaphi)
825327 2.596 0.000 7.628 0.000 fromnumeric.py:69(_wrapreduction)
412591 2.419 0.000 22.506 0.000 pixelfunc.py:424(ang2pix)
412115 2.159 0.000 31.309 0.000 orbit.py:240(approximately_propagate)
503 2.046 0.004 90.738 0.180 precovery_db.py:146(_check_window)
412778 1.978 0.000 1.978 0.000 orbit.py:13(_ensure_pyoorb_initialized)
413897 1.680 0.000 1.680 0.000 {built-in method numpy.array}
1954135 1.552 0.000 10.135 0.000 frame_db.py:197(<genexpr>)
1543347 1.498 0.000 8.221 0.000 result.py:380(iterrows)
825182 1.341 0.000 3.690 0.000 {method 'all' of 'numpy.generic' objects}
825182 1.146 0.000 1.330 0.000 pixelfunc.py:1233(isnsideok)
825185 1.121 0.000 8.747 0.000 fromnumeric.py:2367(all)
412591 1.099 0.000 11.566 0.000 frame_db.py:168(propagation_targets)
825182 0.993 0.000 12.958 0.000 pixelfunc.py:1279(check_nside)
524266 0.954 0.000 7.771 0.000 frame_db.py:97(distance)
Some functions do not fully utilize the vectorization available in third-party packages and could be rewritten to do so.
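For example, the per-observation haversine calls visible in the profile could instead be evaluated over whole arrays. The sketch below is an illustrative NumPy version, not the package's actual spherical_geom implementation:

```python
import numpy as np

# Illustrative sketch (not the actual spherical_geom code): a haversine
# distance built from NumPy ufuncs evaluates whole arrays in one call,
# replacing a Python-level loop over observations.
def haversine_distance_deg(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (
        np.sin((lat2 - lat1) / 2.0) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    )
    return np.degrees(2.0 * np.arcsin(np.sqrt(a)))

# One vectorized call computes every pairwise distance at once.
lon = np.array([0.0, 0.0, 45.0])
lat = np.array([0.0, 90.0, 0.0])
distances = haversine_distance_deg(0.0, 0.0, lon, lat)
print(distances)
```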
Note that these times were measured after creating a multi-level index on mjd and healpixel in the frames table. Code to do this on data ingest needs to be added to this PR as well.
Timings on the newly indexed database (4 single indexes, 1 composite index):
'ix_frames_obscode', 'ix_frames_mjd', 'ix_frames_healpixel', 'sqlite_autoindex_datasets_1', 'fast_query'
Found 29 potential matches for orbit ID 330784 in 139.495s
Found 33 potential matches for orbit ID 376552 in 136.716s
Found 22 potential matches for orbit ID 458508 in 137.527s
Timings on the new database (1 composite index):
'fast_query'
Found 29 potential matches for orbit ID 330784 in 118.748s
Found 33 potential matches for orbit ID 376552 in 108.497s
Found 22 potential matches for orbit ID 458508 in 106.770s
Timings on the old database (1 composite index):
'fast_query'
Found 29 potential matches for orbit ID 330784 in 111.993s
Found 33 potential matches for orbit ID 376552 in 109.673s
Found 22 potential matches for orbit ID 458508 in 107.731s
This commit did not have the intended effect of creating a faster query. We need to add a fix that creates a composite index equivalent to:
import sqlite3 as sql
con = sql.connect(DB)
con.execute("""CREATE INDEX fast_query ON frames (healpixel, mjd, obscode)""")
con.close()
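Whether a lookup actually uses the composite index can be checked with EXPLAIN QUERY PLAN (a sketch against a simplified frames schema, not the real one):

```python
import sqlite3 as sql

# Sketch with a simplified frames schema: use EXPLAIN QUERY PLAN to confirm
# that a typical window lookup hits the composite index rather than scanning
# the table.
con = sql.connect(":memory:")
con.execute("CREATE TABLE frames (healpixel INTEGER, mjd REAL, obscode TEXT)")
con.execute("CREATE INDEX fast_query ON frames (healpixel, mjd, obscode)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM frames "
    "WHERE healpixel = ? AND mjd BETWEEN ? AND ?",
    (5100273664, 56000.0, 56030.0),
).fetchall()
con.close()
print(plan)  # the detail column should mention the fast_query index
```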
I was trying to run a precovery search using the precovery function in precovery.main; however, the following error appeared. I think it would be useful to expose the allow_version_mismatch argument in this function.
Exception: Version mismatch:
Running version: 0.2.dev112+g40f8bf5.d20230215
Database version: 0.2.dev108+gdbdc8b0
Use allow_version_mismatch=True to ignore this error.
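A minimal sketch of what that escape hatch could look like (hypothetical function and names, mirroring the error message above; the real check lives inside the precovery database loader):

```python
# Hypothetical sketch: reproduce the version check with an opt-out flag.
# Not the actual precovery implementation.
def check_version(running: str, database: str, allow_version_mismatch: bool = False) -> None:
    if running != database and not allow_version_mismatch:
        raise Exception(
            f"Version mismatch:\n"
            f"Running version: {running}\n"
            f"Database version: {database}\n"
            f"Use allow_version_mismatch=True to ignore this error."
        )

# Raises unless the caller explicitly opts in:
check_version("0.2.dev112+g40f8bf5.d20230215", "0.2.dev108+gdbdc8b0",
              allow_version_mismatch=True)
```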
I am working with @moeyensj on SkyMapper.
We are working to support multiple different datasets; however, as it stands, indexing multiple datasets places them all in the same set of binary files. Ideally, we'd want to be able to subsample the data so we can combine different dataset indexes or split them apart trivially.
The solution would be to separate each dataset's observations into its own subdirectory and then split each binary file by calendar month. This way, downsampling the full dataset becomes a simple table query.
Proposed schema:
$INDEX_DIR
- index.db
- data
- nsc_dr2
- 2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
- 2012-11-01T00:00:00.000_2012-11-30T23:59:59.999.data
- ...
- sdss_dr9
- 2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
- 2012-11-01T00:00:00.000_2012-11-30T23:59:59.999.data
- ...
I've modified a code snippet from Stack Overflow that divides a date range into calendar months (this could be a good starting place for an implementation).
# Modified from: https://stackoverflow.com/questions/51293632/how-do-i-divide-a-date-range-into-months-in-python
import datetime
from astropy.time import Time

begin = '2012-09-01'
end = '2020-01-01'
dt_start = datetime.datetime.strptime(begin, '%Y-%m-%d')
dt_end = datetime.datetime.strptime(end, '%Y-%m-%d')
one_day = datetime.timedelta(1)

start_dates = [dt_start]
end_dates = []
today = dt_start
while today <= dt_end:
    tomorrow = today + one_day
    if tomorrow.month != today.month:
        start_dates.append(tomorrow)
        # End date is 1 millisecond before midnight of the first day of the next month
        end_dates.append(today + datetime.timedelta(1 - 1/86400/1000))
    today = tomorrow
end_dates.append(dt_end)

start_times = Time(start_dates)
end_times = Time(end_dates)
start_times = start_times[:-2]
end_times = end_times[:-2]
for i, j in zip(start_times.isot, end_times.isot):
    print(i, j)
start_times and end_times are both astropy Time objects that can then be used to filter and appropriately split the input observations.
The loop prints the following:
2012-09-01T00:00:00.000 2012-09-30T23:59:59.999
2012-10-01T00:00:00.000 2012-10-31T23:59:59.999
2012-11-01T00:00:00.000 2012-11-30T23:59:59.999
2012-12-01T00:00:00.000 2012-12-31T23:59:59.999
2013-01-01T00:00:00.000 2013-01-31T23:59:59.999
2013-02-01T00:00:00.000 2013-02-28T23:59:59.999
2013-03-01T00:00:00.000 2013-03-31T23:59:59.999
2013-04-01T00:00:00.000 2013-04-30T23:59:59.999
2013-05-01T00:00:00.000 2013-05-31T23:59:59.999
2013-06-01T00:00:00.000 2013-06-30T23:59:59.999
2013-07-01T00:00:00.000 2013-07-31T23:59:59.999
2013-08-01T00:00:00.000 2013-08-31T23:59:59.999
2013-09-01T00:00:00.000 2013-09-30T23:59:59.999
2013-10-01T00:00:00.000 2013-10-31T23:59:59.999
2013-11-01T00:00:00.000 2013-11-30T23:59:59.999
2013-12-01T00:00:00.000 2013-12-31T23:59:59.999
2014-01-01T00:00:00.000 2014-01-31T23:59:59.999
2014-02-01T00:00:00.000 2014-02-28T23:59:59.999
2014-03-01T00:00:00.000 2014-03-31T23:59:59.999
2014-04-01T00:00:00.000 2014-04-30T23:59:59.999
2014-05-01T00:00:00.000 2014-05-31T23:59:59.999
2014-06-01T00:00:00.000 2014-06-30T23:59:59.999
2014-07-01T00:00:00.000 2014-07-31T23:59:59.999
2014-08-01T00:00:00.000 2014-08-31T23:59:59.999
...
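Given the start/end times above, the per-month file names in the proposed schema can be derived directly (monthly_data_path is a hypothetical helper, shown here with plain ISOT strings):

```python
# Hypothetical helper matching the proposed directory schema: build the
# per-month binary file path for a dataset from its ISOT window boundaries.
def monthly_data_path(dataset: str, start_isot: str, end_isot: str) -> str:
    return f"data/{dataset}/{start_isot}_{end_isot}.data"

path = monthly_data_path("nsc_dr2",
                         "2012-10-01T00:00:00.000",
                         "2012-10-31T23:59:59.999")
print(path)
# data/nsc_dr2/2012-10-01T00:00:00.000_2012-10-31T23:59:59.999.data
```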
Add exposure duration or time to frames, and add FWHM if available as well.
Each individual observation should support having any observation time within the range defined by the start of exposure + exposure duration. This is particularly important for the LSST use case where each individual DIASource from a single visit will have a distinct observation time (this is to account for the motion of the shutter as it opens and closes -- pretty cool!).
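A simple consistency check for such per-source times might look like this (a sketch; the column names and the MJD/seconds conventions are assumptions):

```python
# Sketch: check that an observation's time lies inside its exposure window,
# [exposure_start, exposure_start + duration]. Times in MJD, duration in seconds.
def in_exposure_window(obs_mjd: float, exposure_start_mjd: float,
                       exposure_duration_s: float) -> bool:
    return exposure_start_mjd <= obs_mjd <= exposure_start_mjd + exposure_duration_s / 86400.0

# A 30 s LSST-like exposure: a source timestamp ~8.6 s after shutter open is valid...
print(in_exposure_window(59000.0001, 59000.0, 30.0))  # True
# ...but one ~86 s after shutter open is not.
print(in_exposure_window(59000.001, 59000.0, 30.0))   # False
```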
Returning the predicted RA and Dec and other useful observables as part of the PrecoveryCandidate dataclass will be useful for generating cutouts/postage stamps since those are queried on predicted positions.
Add conda recipe and publish the precovery package to the asteroid-institute conda channel.
Adding uncertainties to the indexed observations will allow orbit determination (OD) to be performed using precovery outputs rather than needing to return to the source observation files.
Additional useful quantities to include in the PrecoveryCandidate dataclass are the sky-plane residuals and the great circle distance.
Right now, "Email Me Candidate Cutouts" is checked by default. That keeps the 'Submit' button greyed out until an e-mail is entered. An impatient or unobservant user (== yours truly) won't notice that and will be confused about why the 'Submit' button can't be clicked.
Proposed resolution: make the "Email me candidate cutouts" checkbox off by default.
Related issue:
After typing in the e-mail, the 'Submit' button remains disabled until focus moves away from the field. This will be confusing to the user.
We've had issues in the past where packages available on pip and conda differ (in version or name), so testing both use cases, where a user installs dependencies from either source, seems worthwhile.
The ZTF database has no exposure IDs, so we turned that column into an empty column of strings. This allowed precovery to index the HDF5 file but caused other issues when running the precovery search: when we looked for a specific asteroid, all the "mjd_utc" values fell on the same night. We need a solution for datasets that don't have exposure IDs.
df5["exposure_id"] = df5["mjd_utc"].apply(lambda x: str(x))
df5.to_hdf('ztf_observations_610_624.h5', key='data', mode='w', format='table', encoding='utf-8')
This is what we ended up doing so that "exposure_id" could have values based on "mjd_utc".
In that case, the orbit would be pulled from (e.g.) JPL.
Let's add a ~paragraph describing at a high level what the library does.
I have run into issues installing the requirements using conda/pip. I initially tried installing with pip using requirements.txt and dev-requirements.txt per the README:
conda create -n precovery python=3.10
conda activate precovery
# python3.10 -m venv virtualenv3.10 # ignore since I installed python3.10 with conda
# source virtualenv3.10/bin/activate # ignore since I made a conda env
pip install pip --upgrade
pip install -r requirements.txt -r dev-requirements.txt
pre-commit install
gave errors saying that llvmlite==0.36.0 and numba==0.53.1 were not available, making the pinned versions impossible to install.
I then installed using the requirements.in and dev-requirements.in files, where only numpy and numba are versioned, and continued to receive an error since numba==0.53.1 is defined in requirements.in. I replaced numba==0.53.1 with numba==0.55.1, which led to a solve. Finally, pip installed numpy version 1.22, but numba threw errors on execution claiming it needs numpy<=1.21. I solved this with:
pip install numpy==1.21
And with that the precovery code ran.
Additionally, I needed to install pyoorb:
conda install openorb
I can make updates to the README and a pull request at a later time.