
Comments (4)

spencerahill commented on June 13, 2024

Copying over relevant discussion w/ @spencerkclark in #90, wherein we have started in on a DataLoader module that handles loading data from disk. Importantly, DataLoader will be glued on at the Run level rather than to each Calc: each Run will have a DataLoader object as an attribute, and whenever data is to be loaded from disk corresponding to that Run, the DataLoader will be used. This is a departure from the CalcDataGetter and DataLoader steps described above, although we may still end up using the former.
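To make the proposed structure concrete, here is a minimal sketch of attaching the DataLoader at the Run level rather than to each Calc. All class and method names here are illustrative placeholders, not the actual aospy API:

```python
# Hypothetical sketch of the Run-level DataLoader design; names are
# illustrative, not the actual aospy API.

class DataLoader:
    """Handles loading data from disk for a particular file layout."""
    def load_variable(self, name, start_date, end_date):
        # Concrete subclasses (e.g. for the GFDL or one_dir layouts)
        # would implement the actual file lookup and loading.
        raise NotImplementedError


class Run:
    """A model run; owns the single DataLoader used for all its data."""
    def __init__(self, name, data_loader):
        self.name = name
        self.data_loader = data_loader


class Calc:
    """A computation; delegates all disk access to its Run's loader."""
    def __init__(self, run, var_name):
        self.run = run
        self.var_name = var_name

    def _load_data(self, start_date, end_date):
        # No loading logic lives in Calc itself.
        return self.run.data_loader.load_variable(
            self.var_name, start_date, end_date)
```

The point of the design is that Calc never touches the filesystem directly; swapping file layouts means swapping the Run's DataLoader, with no change to Calc.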

Will we need to implement the GFDL and one_dir mappings before we can meaningfully attempt plugging this into Calc?

The short answer is yes. Unfortunately it may take a fair amount more, since as it stands now, Calc is quite convoluted (and not really atomic enough to simply plug this logic in on its own). I'll start a list of a few things here, but there are probably still other things we'll need to consider:

  • _add_grid_attributes in Calc currently does what rename_grid_attrs and set_grid_attrs_as_coords do in DataLoader, and more. It also compares grid attributes to those of the associated Model (how do we want to maintain this behavior moving forward? See discussion in #14).
  • The effects of dt_from_time_bounds and _get_dt are both now implemented in the timedate module. I don't think we should require Calc to have a dt attribute anymore. It should just be carried with the DataArrays where needed as a coordinate. If the DataArray does not have an 'average_DT' attribute, then one can just take a time mean without weighting by dt (thus there is no reason to generate an array of ones for instantaneous data, as is done now).
  • In DataLoader.load_variable I subset things immediately in time to isolate the full time series of the data of interest (before loading it into memory). This and further time subsetting of data (i.e. the selection of a series of months for seasonal averages) is currently handled in the _to_desired_dates function.
  • Tracing all the way back to a main-like script, what is the minimum set of parameters needed to identify a given file set for any DataLoader? How should we specify those parameters when submitting a computation? In some ways this traces back to your comment on _generate_file_set above. I have some ideas, but it's probably best we discuss those verbally on Wednesday.
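As a rough illustration of the two-stage time subsetting mentioned above, here is a sketch in plain xarray (the actual `load_variable` and `_to_desired_dates` implementations may differ):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example of the two-stage time subsetting: first isolate
# the full analysis period, then select the months needed for a
# seasonal (here DJF) average.
time = pd.date_range("2000-01-01", "2002-12-31", freq="MS")
da = xr.DataArray(np.arange(len(time), dtype=float),
                  coords={"time": time}, dims="time")

# Stage 1 (as in DataLoader.load_variable): subset to the full time
# series of interest before loading the data into memory.
subset = da.sel(time=slice("2000-01-01", "2001-12-31"))

# Stage 2 (as in _to_desired_dates): keep only the desired months.
djf = subset.isel(time=subset["time.month"].isin([12, 1, 2]))
djf_mean = djf.mean("time")
```

Doing the coarse date-range subset first keeps the lazy (e.g. dask-backed) load cheap; the finer month selection then operates on the already-restricted series.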

Feel free to list any other concerns of your own that come to mind!


spencerahill commented on June 13, 2024

The effects of dt_from_time_bounds and _get_dt are both now implemented in the timedate module. I don't think we should require Calc to have a dt attribute anymore. It should just be carried with the DataArrays where needed as a coordinate. If the DataArray does not have an 'average_DT' attribute, then one can just take a time mean without weighting by dt (thus there is no reason to generate an array of ones for instantaneous data, as is done now).

I mostly agree. One potential use case for a Calc-level dt would be a Calc requiring multiple variables that receives data with differing time spacing (e.g. one variable daily and the other 6-hourly, with desired daily output). We have not come across this in practice yet, and I think it would require a fair amount of other work, so maybe it's not worth worrying about for now.
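The weighting behavior described above might look roughly like this (a sketch only; the actual timedate-module implementation may differ). The key point is that a missing 'average_DT' coordinate falls back to an unweighted mean, so no array of ones is ever needed for instantaneous data:

```python
import numpy as np
import pandas as pd
import xarray as xr


def time_mean(da):
    """Time mean, weighted by the 'average_DT' coordinate if present.

    Sketch of the proposed behavior: instantaneous data simply lacks
    'average_DT' and gets a plain unweighted mean.
    """
    if "average_DT" in da.coords:
        # Works for numeric or timedelta weights alike.
        dt = da["average_DT"].astype(float)
        return (da * dt).sum("time") / dt.sum("time")
    return da.mean("time")


# Monthly data, with each month's length (in days) as its averaging
# interval, carried as a non-dimension coordinate on the DataArray.
time = pd.date_range("2000-01-01", "2000-12-31", freq="MS")
dt = xr.DataArray(time.days_in_month.values.astype(float),
                  coords={"time": time}, dims="time")
da = xr.DataArray(np.arange(12.0),
                  coords={"time": time, "average_DT": dt}, dims="time")
```

Because the weights travel with the DataArray as a coordinate, any downstream reduction can use them without Calc needing its own dt attribute.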


spencerahill commented on June 13, 2024

Tracing all the way back to a main-like script, what is the minimum set of parameters needed to identify a given file set for any DataLoader? How should we specify those parameters when submitting a computation? In some ways this traces back to your comment on _generate_file_set above. I have some ideas, but it's probably best we discuss those verbally on Wednesday.

Let's continue the conversation on this in #32, wherein I just copied this comment, since it seems most applicable to that Issue.


spencerahill commented on June 13, 2024

@spencerkclark and I have decided that the best way forward is to create separate PRs for each new module or coherent piece related to this effort, without actually rewriting the calc.py code in any of those PRs. Then, once enough of the functionality is modularized across these PRs, we can create one or more PRs that actually merge the new structure into Calc.

