Comments (2)
Thanks for opening this issue @NickGeneva. It will take me some time to parse. Starting with "The current"
The Current
Presently data is exchanged between parts of earth2mip via numpy/pytorch tensors. This data is physical, thus some metadata is needed to describe what exactly this data represents. We currently do this through a set of properties assigned to each of these components, which return either python primitives or alternatively some object (grid schema).
Perhaps it would be helpful to give some examples of these existing objects.
This means that to communicate these required properties we need package-wide concepts such as geooperator / timestep / timestepper. Additionally we need global schemas for these coordinate systems in some cases (although we have been slowly moving away from these).
Let's define "schema".
The earth2mip.grid.LatLonGrid objects are no longer earth2mip.schema objects that hardcode 721x1440 etc. They are more flexible and encompass any lat/lon grid. Do we have any concrete use cases that do not use lat/lon grids? If so, we can add an earth2mip.grid.Unstructured
.
For the most part, within the bounds of this package, this has worked. Granted, the natural coupling between the package-wide interfaces has made adding new models a little painful and sometimes more challenging to debug, but documentation can likely fix that.
What code objects are the trouble?
Can you provide specific examples of trouble adding new models? I feel a lot of the trouble came from before the more recent batch of APIs (DataSource, TimeLoop) were formalized and when we still used enum objects in earth2mip.schema for the grid and channels. Another difficulty was using earth2mip.networks.Inference for all models, but we no longer do that.
The Issue
Generally I see two issues with this:
- Presently it's more difficult to use components of earth2mip in isolation. Without pure functions, we require users / developers to always have clear knowledge of the properties present / needed. You end up in cycles of property updates for different workflows.
Let's unpack "pure function" more. To me this means a function without state. Turning such a function into a class that has some properties for metadata does not make the overall use less pure.
- Updates to the schemas / property requirements have rolling effects across the package... if one model has additional / unique needs the coupling between components can result in an update being challenging.
What is the rolling effect specifically in the linked PR? While motivated by graphcast the updates in that PR fixed bugs in the scoring that could appear with other models (e.g. assuming input_channels == output_channels). Just because graphcast is the first example of such a model that we have added, does not mean it is a model specific issue.
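To make the "pure function" point concrete, here is a minimal sketch. It is not earth2mip code: `scale_field`, `ScaleOperator`, and the channel names are hypothetical, and plain Python lists stand in for numpy/pytorch tensors so the example has no dependencies. It only illustrates that wrapping a stateless function in a class with read-only metadata properties does not change what the computation returns.

```python
from dataclasses import dataclass
from typing import List, Tuple


def scale_field(values: List[float], factor: float) -> List[float]:
    """Pure function: the result depends only on the arguments."""
    return [v * factor for v in values]


@dataclass(frozen=True)
class ScaleOperator:
    """Hypothetical wrapper: the same computation plus read-only metadata."""

    factor: float
    channel_names: Tuple[str, ...]

    def __call__(self, values: List[float]) -> List[float]:
        # Nothing on the instance is mutated, so repeated calls with the
        # same input return the same output.
        return scale_field(values, self.factor)


# Hypothetical channel names, used only for illustration.
op = ScaleOperator(factor=2.0, channel_names=("t2m", "z500"))
data = [1.0, 2.0, 3.0]
first = op(data)
second = op(data)
```

Because the dataclass is frozen, the metadata cannot drift between calls; the class is just the function plus a description of what it operates on.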
Outside of that I've been thinking about just the general concepts of how people can better understand how information moves in the package. Just the data alone can reveal a lot for users, but there needs to be metadata with it that is presently hard to get to, which can lead (and has led) to challenging debugging.
I think the main debate here is static vs dynamic structure. I feel users may benefit from a dynamic interface, but the more static structure (e.g. grid, channels etc. live on the object and can be checked ahead of time) is essential for writing pipelines that work at scale.
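As a rough illustration of what "checked ahead of time" could look like, the sketch below validates a pipeline from metadata alone, before any tensor is allocated. `ComponentSpec` and `check_pipeline` are hypothetical names, not earth2mip APIs, and the channel names are made up. The example deliberately uses a model whose input and output channels differ, the kind of case discussed above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical static metadata a pipeline component might expose."""

    name: str
    in_channels: Tuple[str, ...]
    out_channels: Tuple[str, ...]
    grid_shape: Tuple[int, int]


def check_pipeline(specs: List[ComponentSpec]) -> List[str]:
    """Return readable mismatches between consecutive components."""
    problems = []
    for upstream, downstream in zip(specs, specs[1:]):
        missing = [
            c for c in downstream.in_channels if c not in upstream.out_channels
        ]
        if missing:
            problems.append(
                f"{downstream.name} needs channels {missing} "
                f"not produced by {upstream.name}"
            )
        if upstream.grid_shape != downstream.grid_shape:
            problems.append(
                f"{upstream.name} grid {upstream.grid_shape} != "
                f"{downstream.name} grid {downstream.grid_shape}"
            )
    return problems


# A model with more input channels than output channels, followed by a
# scorer that wrongly assumes the two sets are equal.
model = ComponentSpec("model", ("t2m", "z500", "tisr"), ("t2m", "z500"), (721, 1440))
scorer = ComponentSpec("scorer", ("t2m", "z500", "tisr"), ("rmse",), (721, 1440))
problems = check_pipeline([model, scorer])
```

Here the mismatch is reported from the specs alone, without running the model. With a purely dynamic interface, the same error would only surface once data reaches the scorer.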
The decision was that this would be too invasive and the current data flow works, so let's keep it.
Thanks for the input!