
Comments (3)

wmwv avatar wmwv commented on September 17, 2024

I think @yymao summarized the key issue well in the SRC thread:
"""
how we can minimally change the function so that it accepts both a GCRCatalogs iterator and a pandas/dask dataframe. The test would supposedly need very different code to be applied to these two cases. And I think that's the main issue we are facing.
"""
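One possible way to approach this, sketched under assumptions rather than taken from descqa: put a thin adapter in front of the data so the test body only ever sees an iterator of dict-like chunks. The names `iter_chunks`, `count_rows`, and `FakeCatalog` are hypothetical. The only GCRCatalogs call assumed is `get_quantities(..., return_iterator=True)`, as used elsewhere in this thread. The table branch assumes an in-memory object indexable by column name (a dict of arrays or a pandas DataFrame); a dask dataframe would likely need its own partition-wise branch.

```python
from typing import Any, Dict, Iterator, List


def iter_chunks(data: Any, columns: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield dict-like chunks of `columns` from either kind of input."""
    if hasattr(data, "get_quantities"):
        # Catalog-style input: stream the requested columns chunk by chunk.
        yield from data.get_quantities(columns, return_iterator=True)
    else:
        # Assumption: an in-memory table indexable by column name
        # (dict of arrays, pandas DataFrame). Treated as a single chunk.
        yield {col: data[col] for col in columns}


def count_rows(data: Any, column: str) -> int:
    """Example consumer: the same code path serves both input types."""
    return sum(len(chunk[column]) for chunk in iter_chunks(data, [column]))


class FakeCatalog:
    """Hypothetical stand-in that mimics only the iterator mode."""

    def __init__(self, chunks):
        self._chunks = chunks

    def get_quantities(self, columns, return_iterator=True):
        return iter({c: chunk[c] for c in columns} for chunk in self._chunks)


catalog = FakeCatalog([{"ra": [1.0, 2.0]}, {"ra": [3.0]}])
table = {"ra": [1.0, 2.0, 3.0]}
```

With this shape, the test logic does not branch on the input type; only the adapter does.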

from descqa.

wmwv avatar wmwv commented on September 17, 2024

I've pulled out this specific example from descqa.basic_tests.SkyArea to help think through this. This is a simple function that takes all of the RA, Dec values, maps them to healpixels, and then keeps the set of filled pixels. The code goes through the data in chunks using a GCRCatalogs get_quantities iterator.

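    # Requires `import healpy as hp` at module level.
    def calc_healpix_set(self, catalog_instance, nside, ra_col, dec_col):
        """
        Calculating the healpixel for all of the data is the I/O-intensive step,
        so we separate it out here into its own function.
        """
        pixels = set()
        # Read the catalog chunk by chunk so the full table is never held in memory.
        for d in catalog_instance.get_quantities([ra_col, dec_col], return_iterator=True):
            # lonlat=True: inputs are longitude/latitude (RA, Dec) in degrees.
            pixels.update(hp.ang2pix(nside, d[ra_col], d[dec_col], lonlat=True))

        return pixels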

In my branch I broke this out a bit into its own (notionally free) function to isolate the core details.

The rest of the run_on_single_catalog function takes the pixels, computes the fraction of sky area covered, and plots a map. These operations act on the aggregated set from the above, so they aren't central to the question of how to access the data.
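For reference, the area step can be sketched from the pixel set alone. This is not necessarily how descqa computes it; it assumes only the standard HEALPix facts that a map at resolution `nside` has 12 * nside**2 pixels and that they all have equal area. The function names are hypothetical.

```python
import math

# Full sky in square degrees: 4*pi steradians times (180/pi)^2.
FULL_SKY_SQ_DEG = 4 * math.pi * (180.0 / math.pi) ** 2


def covered_sky_fraction(pixels, nside):
    """Fraction of the sky covered by a set of filled pixel indices."""
    npix = 12 * nside**2  # total number of HEALPix pixels at this nside
    return len(pixels) / npix


def covered_area_sq_deg(pixels, nside):
    """Covered area in square degrees, using the equal-area property."""
    return covered_sky_fraction(pixels, nside) * FULL_SKY_SQ_DEG
```

Because this step needs only the aggregated set, it is the same regardless of how the data were read.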


wmwv avatar wmwv commented on September 17, 2024

This makes me think of a scatter-gather pattern, but the scatter part (going through each chunk and identifying the unique healpixel numbers) is done serially.

I'm starting to wonder how the question of how to access data in chunks relates to the question of how to process data in parallel.
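To make the scatter-gather idea concrete, here is a minimal sketch using only the standard library. It is an illustration, not the descqa implementation. `to_pixel` is a hypothetical callable that maps arrays of RA and Dec to pixel indices; with healpy it could be something like `lambda ra, dec: hp.ang2pix(nside, ra, dec, lonlat=True)`. Two caveats: `Executor.map` submits every chunk up front, so the iterator is drained eagerly, and threads only help if the per-chunk work releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def pixels_in_chunk(chunk, ra_col, dec_col, to_pixel):
    """Scatter step: the set of unique pixel indices in one chunk."""
    return set(to_pixel(chunk[ra_col], chunk[dec_col]))


def calc_pixel_set_parallel(chunks, ra_col, dec_col, to_pixel, max_workers=4):
    """Process chunks concurrently, then gather by taking the union."""
    work = partial(pixels_in_chunk, ra_col=ra_col, dec_col=dec_col,
                   to_pixel=to_pixel)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_sets = pool.map(work, chunks)
        # Gather step: set union is order-independent, so the order in
        # which chunks finish does not matter.
        return set().union(*partial_sets)


def toy_pixel(ra, dec):
    """Hypothetical stand-in for a real pixelization (not HEALPix)."""
    return [int(r // 90) + 4 * int((d + 90) // 60) for r, d in zip(ra, dec)]


chunks = [
    {"ra": [10.0, 100.0], "dec": [0.0, 0.0]},
    {"ra": [10.0, 200.0], "dec": [0.0, 0.0]},
]
result = calc_pixel_set_parallel(chunks, "ra", "dec", toy_pixel)
```

Since the gather is a plain set union, the same per-chunk function could be mapped over dataframe partitions instead of iterator chunks, which is one way the two questions connect.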

