Comments (3)
I think @yymao summarized the key issue well in the SRC thread:
"""
how we can minimally change the function so that it accepts both a GCRCatalogs iterator and a pandas/dask dataframe. The test would supposedly need very different code to be applied to these two cases. And I think that's the main issue we are facing.
"""
I've pulled out this specific example from `descqa.basic_tests.SkyArea` to help think through this. This is a simple function that takes all of the RA, Dec values, maps them to healpixels, and then keeps the set of filled pixels. The code goes through the data in chunks using a GCRCatalogs `get_quantities` iterator.

    def calc_healpix_set(self, catalog_instance, nside, ra_col, dec_col):
        """
        Calculating the healpixel for all of the data is the I/O-intensive step,
        so we separate it out here into its own function.
        """
        pixels = set()
        # Read the catalog in chunks; each chunk maps a column name to an array.
        for d in catalog_instance.get_quantities([ra_col, dec_col], return_iterator=True):
            # hp is healpy (import healpy as hp); lonlat=True takes RA, Dec in degrees.
            pixels.update(hp.ang2pix(nside, d[ra_col], d[dec_col], lonlat=True))
        return pixels

In my branch I broke this out a bit into its own (notionally free) function to isolate the core details. The rest of the `run_on_single_catalog` function takes the pixels, computes the fraction of the area covered, and plots a map. These operations act on the aggregated set from the function above, so they aren't central to the question of how to access the data.
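One way to picture the "accept both" idea is to have the function depend only on an iterable of chunks, where each chunk supports `chunk[column]`. The following is a minimal sketch under that assumption, not the code from the branch mentioned above or from descqa. The names `calc_pixel_set`, `healpy_pixelizer`, and `covered_fraction` are hypothetical, and passing the pixelization in as `to_pixels` is a choice made here so the chunk logic can be exercised without healpy installed.

```python
def healpy_pixelizer(nside):
    """Return a function mapping (ra, dec) in degrees to HEALPix pixel indices."""
    import healpy as hp  # imported lazily; only needed when this helper is used

    def to_pixels(ra, dec):
        return hp.ang2pix(nside, ra, dec, lonlat=True)

    return to_pixels


def calc_pixel_set(chunks, ra_col, dec_col, to_pixels):
    """Collect the unique pixel indices over an iterable of chunks.

    Each chunk only needs to support chunk[column], which holds for a dict
    of arrays and for a pandas DataFrame alike (assumption of this sketch).
    """
    pixels = set()
    for chunk in chunks:
        pixels.update(to_pixels(chunk[ra_col], chunk[dec_col]))
    return pixels


def covered_fraction(pixels, nside):
    """Fraction of the sky covered: a HEALPix map has 12 * nside**2 pixels."""
    return len(pixels) / (12 * nside ** 2)
```

With this shape, the iterator returned by `get_quantities(..., return_iterator=True)` could be passed directly as `chunks`, while a single pandas DataFrame could be passed as the one-element list `[df]`. A dask DataFrame would still need something that yields its partitions one at a time, which this sketch does not attempt.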
This makes me think of a scatter-gather pattern, but the scatter part (going through each chunk and identifying the unique healpixel numbers) is done serially.
I'm starting to wonder how the question of accessing data in chunks relates to the question of processing data in parallel.
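To make the scatter-gather reading concrete, here is a rough sketch using only the standard library. It is an illustration rather than a proposal for descqa: `unique_pixels` and `calc_pixel_set_parallel` are hypothetical names, `to_pixels` stands in for something like `hp.ang2pix` with `nside` already bound, and threads are used only because they need no extra dependencies. Whether threads actually speed this up depends on whether the pixelization releases the GIL, which is not checked here.

```python
from concurrent.futures import ThreadPoolExecutor


def unique_pixels(chunk, ra_col, dec_col, to_pixels):
    """Scatter step: reduce one chunk to its set of unique pixel indices."""
    return set(to_pixels(chunk[ra_col], chunk[dec_col]))


def calc_pixel_set_parallel(chunks, ra_col, dec_col, to_pixels, max_workers=4):
    """Process chunks in a thread pool, then union the per-chunk sets."""

    def work(chunk):
        return unique_pixels(chunk, ra_col, dec_col, to_pixels)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map collects all of `chunks` immediately, so the reads
        # themselves still happen serially in the calling thread.
        partial_sets = list(pool.map(work, chunks))

    # Gather step: merge the per-chunk sets into one.
    return set().union(*partial_sets)
```

One thing this makes visible is that the two questions are coupled: because the executor pulls every chunk from the iterator before the workers finish, the memory bound that chunked iteration provides in the serial loop is lost unless submission is throttled.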