Comments (4)
Copying over relevant discussion w/ @spencerkclark in #90, wherein we have started in on a `DataLoader` module that handles loading data from disk. Importantly, `DataLoader` will be glued on at the `Run` level rather than to each `Calc`: each `Run` will have a `DataLoader` object as an attribute, and whenever data is to be loaded from disk corresponding to that `Run`, the `DataLoader` will be used. This is a departure from the `CalcDataGetter` and `DataLoader` steps described above, although we may still end up using the former.
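As a rough sketch of this `Run`-level wiring (all class and function names here are hypothetical stand-ins for illustration, not aospy's actual API):

```python
# Hypothetical sketch of attaching a DataLoader at the Run level.
# Names are illustrative only, not aospy's real API.

class DataLoader:
    """Knows how to locate and open data on disk for one file layout."""
    def load_variable(self, name, start_date, end_date):
        # A real implementation would open files (e.g. with xarray)
        # and subset to the requested dates before loading into memory.
        raise NotImplementedError

class Run:
    """A model run; owns exactly one DataLoader for all of its data."""
    def __init__(self, name, data_loader):
        self.name = name
        self.data_loader = data_loader

def load_for_run(run, var_name, start_date, end_date):
    # Any code that needs data for a Run goes through its DataLoader,
    # rather than each Calc carrying its own loading logic.
    return run.data_loader.load_variable(var_name, start_date, end_date)
```

A concrete subclass (e.g. for the GFDL or one_dir layouts mentioned below) would then override `load_variable` with the appropriate path-construction logic.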
> Will we need to implement the GFDL and one_dir mappings before we can meaningfully attempt plugging this into Calc?
The short answer is yes. Unfortunately it may take a fair amount more than that, since as it stands now, `Calc` is quite convoluted (and not really atomic enough to simply plug this logic in on its own). I'll start a list of a few things here, but there are probably still other things we'll need to consider:
- `_add_grid_attributes` in `Calc` currently does what `rename_grid_attrs` and `set_grid_attrs_as_coords` do in `DataLoader`, and more. It also compares grid attributes to those of the associated `Model` (how do we want to maintain this behavior moving forward? See the discussion in #14).
- The effects of `dt_from_time_bounds` and `_get_dt` are both now implemented in the `timedate` module. I don't think we should require `Calc` to have a `dt` attribute anymore; it should just be carried with the `DataArray`s where needed as a coordinate. If the `DataArray` does not have an `'average_DT'` attribute, then one can just take a time mean without weighting by `dt` (thus there is no reason to generate an array of ones for instantaneous data, as is done now).
- In `DataLoader.load_variable` I subset things immediately in time to isolate the full time series of the data of interest (before loading it into memory). This and further time subsetting of the data (i.e. the selection of a series of months for seasonal averages) is currently handled in the `_to_desired_dates` function.
- Tracing all the way back to a `main`-like script, what is the minimum set of parameters needed to identify a given file set for any `DataLoader`? How should we specify those parameters when submitting a computation? In some ways this traces back to your comment on `_generate_file_set` above. I have some ideas, but it's probably best we discuss them verbally on Wednesday.
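The subset-in-time-first ordering described above can be sketched in plain Python; `subset_time` and `select_months` are hypothetical names for illustration, not aospy functions:

```python
# Illustrative sketch only: subset to the full time range of interest
# first (before loading data into memory), then apply finer selection
# such as picking particular months, as _to_desired_dates does today.
from datetime import date

def subset_time(times, start, end):
    """Indices of the time series falling within [start, end]."""
    return [i for i, t in enumerate(times) if start <= t <= end]

def select_months(times, months):
    """Further subsetting, e.g. DJF indices for a seasonal average."""
    return [i for i, t in enumerate(times) if t.month in months]

# Monthly timestamps for a single year (hypothetical example data).
times = [date(2000, m, 15) for m in range(1, 13)]
spring = subset_time(times, date(2000, 3, 1), date(2000, 5, 31))
djf = select_months(times, {12, 1, 2})
```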
Feel free to list any other concerns of your own that come to mind!
from aospy.
> The effects of `dt_from_time_bounds` and `_get_dt` are both now implemented in the `timedate` module. I don't think we should require `Calc` to have a `dt` attribute anymore. It should just be carried with the `DataArray`s where needed as a coordinate. If the `DataArray` does not have an `'average_DT'` attribute, then one can just take a time mean without weighting by `dt` (thus there is no reason to generate an array of ones for instantaneous data, as is done now).
I mostly agree. One potential use-case for a `Calc`-level `dt` would be if a `Calc` requiring multiple variables received data with differing time-spacing (e.g. one variable was daily and the other 6-hourly, and the desired output was daily). We have not come across this in practice yet, and I think it would require a fair amount of other work, so maybe it's not worth worrying about for now.
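For illustration, the dt-as-coordinate idea can be sketched with numpy: the per-step durations travel alongside the data and are used only when present. This is a hedged sketch of the behavior described above, not aospy's implementation:

```python
import numpy as np

def time_mean(data, dt=None):
    """Mean over the leading (time) axis, weighted by dt when given.

    If dt is None (instantaneous data), take a plain unweighted mean;
    there is no need to generate an array of ones.
    """
    data = np.asarray(data, dtype=float)
    if dt is None:
        return data.mean(axis=0)
    dt = np.asarray(dt, dtype=float)
    # Reshape dt so it broadcasts across any trailing spatial dims.
    w = dt.reshape((-1,) + (1,) * (data.ndim - 1))
    return (data * w).sum(axis=0) / dt.sum()
```

With monthly-mean data, for example, passing the number of days in each month as `dt` reproduces the usual month-length weighting.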
> Tracing all the way back to a `main`-like script, what is the minimum set of parameters needed to identify a given file set for any `DataLoader`? How should we specify those parameters when submitting a computation? In some ways this traces back to your comment on `_generate_file_set` above. I have some ideas, but it's probably best we discuss those verbally on Wednesday.
Let's continue the conversation on this in #32, wherein I just copied this comment, since it seems most applicable to that issue.
@spencerkclark and I have decided that the best way forward is to create separate PRs for each new module or coherent piece related to this effort, without actually rewriting the `calc.py` code in any of those PRs. Then, once we have modularized enough of the functionality across these PRs, we can create one or more PRs that actually merge the new structure into `Calc`.