Currently, cdf uses data-based configuration for sinks.
We are moving to Python-based configuration, which provides the actual objects and collects them into a dataclass to group them logically. With config injection we still draw values from our config file / env, but with drastically increased flexibility, and concrete code makes sinks less ephemeral.
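As a rough illustration of the idea (not cdf's actual API), a sink could be a small dataclass whose fields hold concrete objects, built by a function whose parameters are filled from config or env. Everything named here is hypothetical: `SinkSpec`, `inject`, `dev_sink`, and the `SINK_` env prefix are assumptions for the sketch, and a plain dict stands in for a real destination object.

```python
import inspect
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class SinkSpec:
    """Hypothetical container grouping the concrete objects of one sink."""
    name: str
    ingest: Any            # object used to load data (opaque in this sketch)
    transform: Any = None  # object used for transformations, if any


def inject(fn: Callable[..., Any], config: Mapping[str, Any],
           env_prefix: str = "SINK_") -> Any:
    """Call fn, filling each parameter from config, then env, then its default."""
    kwargs = {}
    for pname, param in inspect.signature(fn).parameters.items():
        env_key = env_prefix + pname.upper()
        if pname in config:
            kwargs[pname] = config[pname]
        elif env_key in os.environ:
            kwargs[pname] = os.environ[env_key]
        elif param.default is inspect.Parameter.empty:
            raise KeyError(f"no config value for required parameter {pname!r}")
    return fn(**kwargs)


def dev_sink(path: str = "dev.duckdb") -> SinkSpec:
    # A plain dict stands in for a real destination object here.
    return SinkSpec(name="dev", ingest={"kind": "duckdb", "path": path})


# The value still comes from data, but the sink itself is concrete code.
sink = inject(dev_sink, {"path": "local.duckdb"})
```

The sink definition is ordinary Python that can be imported and tested, while the values it needs still live in the config file or environment.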
Our assumption right now is that a sink is labelled prod and that this sink alone is used for automatic metadata generation.
This burdens us, though, with a disconnected development experience: we must deploy to prod before we can properly generate metadata. This is not ideal. We should be able to generate metadata without deploying to prod.
Therefore we can solve this by having our metadata folder structure be:
<workspace>/metadata/<sink_name>/*.yaml
In this case a dedicated development sink which writes to DuckDB can generate metadata (gitignored by the user?).
We can even promote the metadata eagerly if useful by copying auto-derived files over to the appropriate sink folder...
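A minimal sketch of what per-sink metadata folders and eager promotion could look like, assuming the `<workspace>/metadata/<sink_name>/*.yaml` layout above. The helper names `metadata_dir` and `promote_metadata` are hypothetical, as is the choice to skip files that already exist in the target unless `overwrite` is set.

```python
import shutil
from pathlib import Path
from typing import List


def metadata_dir(workspace: Path, sink_name: str) -> Path:
    """Return <workspace>/metadata/<sink_name>."""
    return workspace / "metadata" / sink_name


def promote_metadata(workspace: Path, src_sink: str, dst_sink: str,
                     overwrite: bool = False) -> List[Path]:
    """Copy auto-derived *.yaml files from one sink's folder to another's.

    Existing files in the target are left alone unless overwrite=True.
    Returns the list of files that were written.
    """
    src = metadata_dir(workspace, src_sink)
    dst = metadata_dir(workspace, dst_sink)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.glob("*.yaml")):
        target = dst / f.name
        if target.exists() and not overwrite:
            continue
        shutil.copy2(f, target)
        copied.append(target)
    return copied
```

For example, `promote_metadata(Path("my_workspace"), "dev", "prod")` would copy whatever the dev sink derived into the prod sink's folder without touching files already there.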
Now we need to consider whether a single destination should still be considered prod within a workspace, so that we can use it for generate-staging-layer. Or should we leave that up to the user? Perhaps leaving it up to the user is good here, in which case we run cdf generate-staging-layer <workspace>.<sink_name>. That is quite nice since cdf will not delete staging models during this process, only add them. We can therefore eagerly add models for more holistic PRs and workflows.
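To make the add-only behaviour concrete, here is a hedged sketch of how a `<workspace>.<sink_name>` target might be parsed and how staging models could be added without deleting or overwriting anything. This is not cdf's implementation: `parse_target`, `add_missing_models`, the `.sql` extension, and the placeholder file contents are all assumptions.

```python
from pathlib import Path
from typing import List, Tuple


def parse_target(target: str) -> Tuple[str, str]:
    """Split '<workspace>.<sink_name>' into (workspace, sink_name)."""
    workspace, sep, sink_name = target.partition(".")
    if not (workspace and sep and sink_name):
        raise ValueError(f"expected '<workspace>.<sink_name>', got {target!r}")
    return workspace, sink_name


def add_missing_models(metadata_dir: Path, staging_dir: Path) -> List[Path]:
    """Create a stub model for each metadata file that lacks one.

    Never deletes or overwrites an existing model, so running it against
    several sinks in turn only ever grows the staging layer.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for meta in sorted(metadata_dir.glob("*.yaml")):
        model = staging_dir / f"{meta.stem}.sql"  # file extension is an assumption
        if model.exists():
            continue
        model.write_text(f"-- placeholder staging model for {meta.stem}\n")
        created.append(model)
    return created
```

Because the operation is purely additive, running it for a dev sink first and a prod sink later cannot remove models that a PR already depends on.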
The ONLY consideration is that we cannot do sqlmesh plans when prod does not yet have the data, even if we have done absolutely everything end-to-end in dev + staging.
So a disjointed flow may be unavoidable. Pipeline development + deployment to prod must precede model development.
Unless we can dynamically trim the transformation subgraphs, which we technically could with our custom Loader by grabbing the appropriate /metadata folder and pruning all models where depends_on is not found upstream.
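The pruning step could be sketched roughly as follows. This is a standalone illustration, not the custom Loader itself: `prune_models` is a hypothetical helper, models are represented as a plain mapping from name to the names they depend on, and `available_sources` is assumed to be whatever tables the chosen sink's /metadata folder describes. Pruning has to be transitive, since a model that depends on a pruned model is itself unresolvable.

```python
from typing import Dict, Set


def prune_models(depends_on: Dict[str, Set[str]],
                 available_sources: Set[str]) -> Set[str]:
    """Return the models whose dependencies all resolve.

    A dependency resolves if it is an available source or a model that is
    itself kept. Repeats until no more models drop out, so anything
    downstream of a pruned model is pruned too.
    """
    kept = set(depends_on)
    changed = True
    while changed:
        changed = False
        for model in sorted(kept):  # sorted() copies, so kept can be mutated
            deps = depends_on[model]
            if any(d not in available_sources and d not in kept for d in deps):
                kept.discard(model)
                changed = True
    return kept


# Hypothetical graph: only raw.orders has been loaded into this sink.
models = {
    "stg_orders": {"raw.orders"},
    "stg_customers": {"raw.customers"},
    "orders_by_customer": {"stg_orders", "stg_customers"},
    "order_totals": {"stg_orders"},
}
kept = prune_models(models, available_sources={"raw.orders"})
# kept == {"stg_orders", "order_totals"}
```

Here `stg_customers` is dropped because its source is missing, and `orders_by_customer` follows on the next pass, leaving a subgraph that can be planned against the sink as it currently stands.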
There are a fair number of ways to "break" this, but I think it might actually tackle a sufficient number of use cases to make it worth putting behind a flag. Epic indeed.