Context
Currently the WhereFilter protocol only has one property, where_sql_template
which is defined as a str
. However, in reality this string
is a highly structured object. We know we are going to make changes to this structure over time. This is problematic because the protocol only captures that it is a string and nothing more, thus any changes to the structure wouldn't actually be associated with any DSI version. This could lead to some really funky situations. To my knowledge we can't set the protocol filter to match a specific string pattern. Though if we could, that would be cool. The alternative is to take the structure out of the string, i.e. have a more structured protocol definition for WhereFilter
WhereFilter Protocol
class DimensionInput(Protocol):
dimension: str
primary_entity: str
entity_path: Optional[List[str]]
class TimeDimensionInput(Protocol):
dimension: str
primary_entity: str
granularity: TimeGranularity
entity_path: Optional[List[str]]
class EntityInput(Protocol):
entity: str
entity_path: Optional[List[str]]
class WhereFilter(Protocol):
where_sql_template: str
input_dimensions: List[DimensionInput]
input_time_dimensions: List[TimeDimensionInput]
input_entities: List[EntityInput]
Example instantiated WhereFilter
object
WhereFilter(
where_sql_template="{{ country }} = 'US' AND {{ ds }} >= '2023-0701' AND {{ user }} == 'SOME_USER_ID'",
input_dimensions=[
DimensionInput(
dimension='country',
entity_path=['user']
)
],
input_time_dimensions=[
TimeDimensionInput(
dimension='ds',
entity_path=['user', 'transaction'],
granularity=TimeGranularity.MONTH
)
],
input_entities=[EntityInput(entity='user')]
}
Example serialized WhereFilter
{
"where_sql_template": "{{ country }} = 'US' AND {{ ds }} >= '2023-0701' AND {{ user }} == 'SOME_USER_ID'"
"input_dimensions": [
{
"dimension": "country",
"entity_path": ["user"]
}
],
"input_time_dimensions": [
{
"dimension": "ds",
"entity_path": ["user", "transaction"],
"granularity": "month"
}
],
"input_entities": [{"entity": "user"}]
}
User facing spec
We don't want the user to have to specify their filters in this verbose structured manner. That would suck. However, the user facing spec and the protocol can be divorced, and in core they actually are. My view more and more has become that the user spec compiles down to the protocol definitions. The user facing spec in YAML would continue to be
metric:
- name: 'my metric name'
...
- filter: "{{ dimension(name='country', entity_path=['user']) }} = 'US' AND {{ time_dimension('ds', 'month', entity_path=['user', 'transaction']) }} >= '2023-07-01' AND {{ entity('user') }} == 'SOME_USER_ID'"
This example would then compile down to the structure protocol definition example given above. This lifts the string specification into implementations of the protocols, and the agreed protocol definition would be structured (and likely less frequently change).
Implementing in DSI
We'll lift the call_parameter_sets
into a generic jinja where filter compiler.
Implementing in Core
Core has a unparsed nodes and parsed nodes. It'll be fairly straight forward to handle this in core. And core will have it's own ticketed work for doing this. It'll likely just reuse the generic jinja where filter compiler produced by DSI.