
Comments (2)

nbren12 commented on September 27, 2024

Thanks for opening this issue @NickGeneva. It will take me some time to parse, so I'll start with "The Current".

The Current

Presently data is exchanged between parts of earth2mip via numpy/pytorch tensors. This data is physical, thus some metadata is needed to describe what exactly the data represents. We currently do this through a set of properties assigned to each of these components, which return either Python primitives or some object (grid schema).

Perhaps it would be helpful to give some examples of these existing objects.

This means that to communicate these required properties we need package-wide concepts such as geooperator / timestep / timestepper. Additionally, we need global schemas for these coordinate systems in some cases (although we have been slowly moving away from these).

Let's define "schema".

The earth2mip.grid.LatLonGrid objects are no longer the earth2mip.schema enums that hardcode 721x1440 etc. They are more flexible and encompass any lat/lon grid. Do we have any concrete use cases that do not use lat/lon grids? If so, we can add an earth2mip.grid.Unstructured.

For the most part, within the bounds of this package, this has worked. Granted, the natural coupling between the package-wide interfaces has made adding new models a little painful and sometimes more challenging to debug, but documentation can likely fix that.

What code objects are the trouble?

Can you provide specific examples of trouble adding new models? I feel a lot of the trouble dates from before the more recent batch of APIs (DataSource, TimeLoop) were formalized, when we still used enum objects in earth2mip.schema for the grid and channels. Another difficulty was using earth2mip.networks.Inference for all models, but we no longer do that.

The Issue

Generally I see two issues with this:

  • Presently it's more difficult to use components of earth2mip in isolation. Without pure functions, we require users / developers to always have clear knowledge of the properties present / needed. You end up in cycles of property updates for different workflows.

Let's unpack "pure function" more. To me this means a function without state. Turning such a function into a class that has some properties for metadata does not make the overall use less pure.
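To make that point concrete, here is a minimal sketch. The names `scale`, `Scale`, and `in_channels` are hypothetical and are not part of earth2mip, and plain lists stand in for tensors. It only illustrates that wrapping a stateless function in a class carrying read-only metadata leaves the computation just as pure.

```python
from dataclasses import dataclass
from typing import List, Tuple


def scale(values: List[float], factor: float) -> List[float]:
    """Pure function: the result depends only on the arguments."""
    return [v * factor for v in values]


@dataclass(frozen=True)
class Scale:
    """The same computation, with read-only metadata attached."""

    factor: float
    in_channels: Tuple[str, ...]  # hypothetical metadata property

    def __call__(self, values: List[float]) -> List[float]:
        # No hidden state is read or written; frozen=True forbids mutation.
        return scale(values, self.factor)


op = Scale(factor=2.0, in_channels=("t2m", "u10m"))
data = [1.0, 2.5]
same = op(data) == scale(data, 2.0)  # both paths give the same result
```

The metadata can be inspected without calling the object, while the call itself behaves exactly like the bare function.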

  • Updates to the schemas / property requirements have rolling effects across the package... if one model has additional / unique needs the coupling between components can result in an update being challenging.

What is the rolling effect specifically in the linked PR? While motivated by graphcast, the updates in that PR fixed bugs in the scoring that could appear with other models (e.g. assuming input_channels == output_channels). Just because graphcast is the first such model we have added does not mean it is a model-specific issue.
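As an illustration of the kind of bug meant here, the sketch below aligns channels by name instead of assuming the input and output lists match. This is not the code from the linked PR; the helper names and the channel lists are made up for the example.

```python
from typing import List, Sequence


def common_channels(
    input_channels: Sequence[str], output_channels: Sequence[str]
) -> List[str]:
    """Channels present in both lists, kept in output order."""
    available = set(input_channels)
    return [c for c in output_channels if c in available]


def channel_indices(channels: Sequence[str], subset: Sequence[str]) -> List[int]:
    """Positions of each name in `subset` within `channels`."""
    return [list(channels).index(c) for c in subset]


# Hypothetical model whose outputs differ from its inputs.
input_channels = ["t2m", "u10m", "z500"]
output_channels = ["z500", "t2m", "tp"]

shared = common_channels(input_channels, output_channels)
in_idx = channel_indices(input_channels, shared)
out_idx = channel_indices(output_channels, shared)
```

Scoring would then index the truth and the forecast with `in_idx` and `out_idx` respectively, which works for any model, not only one where the two lists happen to be equal.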

Outside of that, I've been thinking about the general question of how people can better understand how information moves through the package. The data alone can reveal a lot for users, but it needs accompanying metadata, which is presently hard to get to; this can lead, and has led, to challenging debugging.

I think the main debate here is static vs dynamic structure. I feel users may benefit from a dynamic interface, but the more static structure (e.g. grid, channels etc. live on the object and can be checked ahead of time) is essential for writing pipelines that work at scale.
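A rough sketch of what "checked ahead of time" could look like, under the assumption that each stage exposes its grid shape and channels as static attributes. `Stage` and `check_pipeline` are hypothetical names, not earth2mip APIs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage with static metadata."""

    name: str
    in_channels: Tuple[str, ...]
    out_channels: Tuple[str, ...]
    grid_shape: Tuple[int, int]


def check_pipeline(stages: Sequence[Stage]) -> None:
    """Raise ValueError if adjacent stages are incompatible.

    Runs before any data is loaded, so a mismatch fails fast.
    """
    for up, down in zip(stages, stages[1:]):
        if up.grid_shape != down.grid_shape:
            raise ValueError(
                f"{up.name} -> {down.name}: grid {up.grid_shape} != {down.grid_shape}"
            )
        missing = set(down.in_channels) - set(up.out_channels)
        if missing:
            raise ValueError(
                f"{up.name} -> {down.name}: missing channels {sorted(missing)}"
            )


source = Stage("source", (), ("t2m", "z500"), (721, 1440))
model = Stage("model", ("t2m", "z500"), ("t2m", "z500", "tp"), (721, 1440))
check_pipeline([source, model])  # passes silently
```

With a purely dynamic interface, the same mismatch would only surface once data reached the incompatible stage, possibly deep into a large job.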


NickGeneva commented on September 27, 2024

The decision was that this would be too invasive and the current data flow works, so let's keep it.

Thanks for the input!
