dbt-labs / dbt-semantic-interfaces
The shared semantic layer definitions that dbt-core and MetricFlow use.
License: Apache License 2.0
As requested by @marcodamore - it would be useful to have a configuration object in the semantic manifest to hold configuration options like:
{
  "project_config": {
    "dsi_version_number": 123,
    "timespine": {
      "location": "transform.metricflow_time_spine",
      "column_name": "date_day",
      "grain": "daily"
    }
  }
}
call_parameter_sets is currently a property on the Pydantic implementation of the WhereFilter. Since this property might be more broadly useful, it should be moved into the protocol definition.
Right now, having Dimension be agnostic of its type forces us to do validation throughout the codebase. Breaking it up into two different classes makes it a lot easier for us to break up the code!
We recently discovered that the named parameter etype of traceback.format_exception_only was dropped in Python 3.10+. Currently we use the etype parameter in two places:
If either of these code paths is hit when running Python 3.10+, then an exception like the following gets raised:
File "/Users/quigleymalcolm/Developer/dbt-labs/dbt-semantic-interfaces/dbt_semantic_interfaces/validations/validator_helpers.py", line 365, in wrapper
generate_exception_issue(
File "/Users/quigleymalcolm/Developer/dbt-labs/dbt-semantic-interfaces/dbt_semantic_interfaces/validations/validator_helpers.py", line 343, in generate_exception_issue
f"{''.join(traceback.format_exception_only(etype=type(e), value=e))}",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: format_exception_only() got an unexpected keyword argument 'etype'
This is extra problematic because the only way to work around the issue is to fix whatever is wrong with the config that is trying to raise a validation issue, and this exception ends up swallowing the originally discovered problem.
We could continue passing the type of the exception, as it's still accepted as a positional argument; it just has to be unnamed and first in the function call. However, as of Python 3.5 the exception type is inferred from the passed-in exception value. Since we only support Python 3.8+, it seems like the best path forward is to just drop the etype keyword in the two linked calls.
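A minimal sketch of the cross-version-safe call; generate_exception_issue is the real call site, while format_error here is just a hypothetical stand-in:

```python
import traceback


def format_error(e: Exception) -> str:
    # The keyword form (etype=...) raises TypeError on Python 3.10+, where
    # the parameter was removed. Passing the type positionally works on
    # 3.8 through 3.10+, and since 3.5 the type is inferred from the value
    # anyway, so the first argument is effectively ignored.
    return "".join(traceback.format_exception_only(type(e), e))
```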
We want a standard definition of what a valid SemanticModel implementation should have. This allows for a shared definition, understood in both MetricFlow and dbt-core, of what anything implementing the SemanticModel protocol will make available, without MetricFlow and dbt-core needing to import each other. This should use the Protocol type, which exists in Python 3.8+.
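For illustration, a minimal sketch of how typing.Protocol enables this structural sharing; the attribute surface shown here is truncated and hypothetical, not the real protocol:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SemanticModel(Protocol):
    # Truncated attribute surface, for illustration only.
    name: str
    description: str


# Any class with matching attributes satisfies the protocol structurally,
# without dbt-core and MetricFlow importing each other's classes.
class CoreSemanticModel:
    def __init__(self) -> None:
        self.name = "orders"
        self.description = "Order facts"


assert isinstance(CoreSemanticModel(), SemanticModel)
```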
This will be an open source repository. As such, we should have a contributing guide. It can likely be much the same as dbt-core's, but pared down, as this repository should be smaller and have fewer dependencies.
As above so below. Let's do some renaming!
In the new world of dbt-core x MetricFlow, Identifiers are becoming Entities. Additionally, some of the properties of the object are changing. The resulting object should have the following properties:
Property Name | Type | Description |
---|---|---|
name | str | Name of the entity |
type | enum | Type of the entity |
description | str | Description of the entity |
role | str | Role of the entity |
entities | List[str] | List of composite sub-entities |
expr | str | Expression of the entity |
The above properties were pulled from dbt-labs/dbt-core#7456
Metrics can have many WhereFilters associated with them. Specifically:
Metric.filter
Metric.type_params.measure.filter
Metric.type_params.denominator.filter
Metric.type_params.numerator.filter
Metric.type_params.input_metrics[x].filter
The where_sql_template of a WhereFilter is highly structured. Technically, with #110, call_parameter_sets guarantees the structure of a WhereFilter's where_sql_template, but only if call_parameter_sets is actually called. The best way to pseudo-guarantee this happens is to add a SemanticManifestValidationRule to the default rules of the SemanticManifestValidator. It's only a pseudo-guarantee because, from DSI's perspective, it's not guaranteed that a SemanticManifest has been run through the SemanticManifestValidator; however, that is best practice and what people should do.
THERE EXISTS a SemanticManifestValidationRule THAT checks call_parameter_sets of all filters of all metrics on a SemanticManifest AND the new rule is added to the default rules of the SemanticManifestValidator
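A sketch of what such a rule could look like. The class name, the traversal helper, and the plain-string issues are all assumptions for illustration; a real rule would return proper validation issue objects and walk every filter location listed above:

```python
from typing import Any, List


def _filters_of(metric: Any) -> List[Any]:
    # Hypothetical traversal: the real rule would also walk
    # type_params.measure, numerator, denominator, and input_metrics.
    f = getattr(metric, "filter", None)
    return [f] if f is not None else []


class WhereFiltersAreParseableRule:
    """Force call_parameter_sets to be evaluated for every metric filter,
    so malformed where_sql_templates surface at validation time."""

    @staticmethod
    def validate_manifest(manifest: Any) -> List[str]:
        issues: List[str] = []
        for metric in manifest.metrics:
            for where_filter in _filters_of(metric):
                try:
                    where_filter.call_parameter_sets  # property access triggers parsing
                except Exception as e:
                    issues.append(f"metric {metric.name}: {e}")
        return issues
```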
This should be a protocol that is simply the composition of the SemanticModel and Metric protocols. Something like:
from typing import List, Protocol
from dbt_semantic_interfaces.protocols import Metric, SemanticModel

class SemanticManifest(Protocol):
    semantic_models: List[SemanticModel]
    metrics: List[Metric]
Most of our open source repositories at dbt Labs have GitHub workflow release actions (core, dbt-snowflake, dbt-redshift, dbt-bigquery). These workflows handle compiling the changie docs, bumping the version, building the distributions, and pushing the distributions to PyPI and GitHub. We should have this in place for the 0.1.0 release of DSI.
We don't have PR templates. We should!
Continuing this rodeo wild-west world
Everyone who contributes here
Yarp
As soon as we've got a working version, we should publish dbt-semantic-interfaces on PyPI so that both dbt-core and MetricFlow can install it without needing to reference the GitHub link.
Having them reference the hard-coded git link works for now, but that seems like a temporary workaround at best.
The developers of MetricFlow & dbt-core!
Yarp
Currently the SemanticManifestValidator (currently named ModelValidator) expects a UserConfiguredModel, which is a concrete object. Initially we thought we should move to it expecting the SemanticManifest protocol. However, in dbt-core we'll be writing nodes which extend the protocol definition. If we want to be able to write validation rules that can operate on the extensions and guarantee type safety, then we need to take it a step further. Thus, the SemanticManifestValidator should instead operate on a generic bound by the SemanticManifest protocol.
Something like...
from typing import Generic, TypeVar

from dbt_semantic_interfaces.protocols import SemanticManifest

T = TypeVar("T", bound=SemanticManifest)

class SemanticManifestValidator(Generic[T]):
    ...
    def validate(self, semantic_manifest: T) -> ValidationResults:
        ...
This is an alternative to, or potential addition to, #56. We want some way of defining metrics/measures without showing them.
We want a standard definition of what a valid Entity implementation should have. This allows for a shared definition, understood in both MetricFlow and dbt-core, of what anything implementing the Entity protocol will make available, without MetricFlow and dbt-core needing to import each other. This should use the Protocol type, which exists in Python 3.8+.
In the new world of dbt-core x MetricFlow, the properties of metrics are changing slightly. Metric objects should have the following properties:
Property Name | Type | Description |
---|---|---|
name | str | Name of the metric |
type | enum | Metric type |
description | str | Description of the metric |
type_params | TypeParams | Type parameters for the metric. These parameters change based on the type |
filter | str | WHERE clause constraint applied to the metric |
The above properties were pulled from dbt-labs/dbt-core#7456
The requirement for this functionality was discussed during a standup and it resolves some issues we would have around knowing what version of DSI a semantic manifest is using.
Not doing this
The metricflow developers
Currently the SemanticManifestTransformer (currently named ModelTransformer) expects and returns a UserConfiguredModel, which is a concrete object. Initially we thought we should move to it expecting and returning the SemanticManifest protocol. However, in dbt-core we'll be writing nodes which extend the protocol definition. Additionally, we want to be able to hand in raw-ish parsings and transform them into the final objects, meaning the input and return types should be different. If we want to be able to write transformation rules that can operate on the raw parsed objects, return the extended classes, and guarantee type safety, then we need to take it a step further. Thus, the SemanticManifestTransformer should instead expect a generic and return a generic bound by the SemanticManifest protocol.
Something like...
from typing import Sequence, Tuple, TypeVar

from dbt_semantic_interfaces.protocols import SemanticManifest
# (imports of SemanticManifestTransformRule and DEFAULT_RULES omitted)

T = TypeVar("T", bound=SemanticManifest)
U = TypeVar("U")

class SemanticManifestTransformer:
    ...
    @staticmethod
    def transform(
        raw_semantic_manifest: U,
        ordered_rule_sequences: Tuple[Sequence[SemanticManifestTransformRule], ...] = DEFAULT_RULES,
    ) -> T:
        ...
This entails getting rid of two metric types, because their functionality can be accomplished with derived. We want to commit to simple metrics being the building blocks and derived metrics serving as the mechanism to create more complicated metrics.
Measure proxy isn't going to make as much sense in the new world of simple metrics & derived metrics. So let's rename it to simple!
The version of mypy should be upgraded from 1.1.1 to the latest version, 1.3.0, to be consistent with what's used in metricflow.
mypy fails in metricflow with: error: Skipping analyzing "dbt_semantic_interfaces.test_utils": module is installed, but missing library stubs or py.typed marker [import]
Expected behavior: mypy does type checking without skipping the imports.
Steps to reproduce: take a dependency on dbt_semantic_interfaces with follow-imports enabled, and run mypy.
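Assuming the standard PEP 561 approach applies here (this is an assumption, not confirmed as the repo's chosen fix), shipping an empty py.typed marker inside the package, and including it in the package data, tells mypy the library carries inline types:

```shell
# PEP 561: an empty marker file named py.typed inside the package directory.
# It must also be shipped in the wheel/sdist as package data.
mkdir -p dbt_semantic_interfaces
touch dbt_semantic_interfaces/py.typed
```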
Say you want to define a derived metric that is a ratio of two measures. In the old world you'd use a ratio metric type, but that will no longer exist. So you have two options:
metric:
  name: derived_metric
  type: derived
  type_params:
    expr: "metric_a / metric_b"
    metrics:
      - name: metric_a
        type: simple
        type_params:
          - measure: column_a
      - name: metric_b
        type: simple
        type_params:
          - measure: column_b
In the above example, metric_a and metric_b would NOT show up in list-metrics, as they are only defined inline to derived_metric, similar to how dbt uses ephemeral models today.
Rename the 'schema_name' field on node_relation to simply 'schema'
Right now everything in dbt land is represented as a node in the DAG. These relationships are established with things like ref('some_model').
Once we introduce the semantic constructs (metric & semantic model), we'll want those relationships to hold true, i.e. I should be able to see that model A feeds into semantic model B and metric C.
Hiding?
Anyone who uses the semantic layer
Yarp
With this being a new repository, we should establish some good hygiene; as part of that, we should set up pre-commit hooks similar to the pre-commit hooks in dbt-core.
@QMalcolm can you offer more context here on what this issue entails?
In the before times, pre-March 2023, we referred to SemanticManifests as Models. We've taken an initial pass at getting a fair number of these moved over, but many remain (Examples: A, B, C, etc.).
Getting these corrected is slowly becoming harder due to Hyrum's Law. The incorrect usage of the word model shows up in class names, function names, function attributes, variables within functions, and comments. Some of these are easier to fix than others. Uses of model in comments and in variables instantiated within a function have no outside exposure and can be changed without worry. Uses of model in function names beginning with a _ and in parameters of a function starting with a _ are safe to change because they are "private". For public function names, parameters of public functions, class names, and public attributes of classes, we'll first have to investigate MetricFlow and dbt-core to see whether these names are depended on. At this time I'm not too worried about third-party exposure, given that we're in RC of our first version release.
Oh, and this is extra hard because pydantic's HashableBaseModel and BaseModel, SemanticModel, and the like have valid uses of the word model, and thus a simple find-and-replace cannot be used.
Do measures always create metrics? In a world where all complicated metrics are defined as derived metrics on top of simple metrics, this might be true.
This issue is complicated by the fact that create_metric = true has implications in parsing for core.
Additionally, we have the create_metric_display_name property. If we want to retain it, then we should make create_metric a dict to support it as a class. That way someone can't add the display value without create_metric.
Currently the WhereFilterParser uses jinja2.Template when parsing the FilterCallParameterSets from a str. This is not considered best practice, because jinja2.Template does not use a SandboxedEnvironment by default. Generally, DSI doesn't make assumptions about the implementing architecture of projects using DSI, and depending on your usage, more security checks should be done on the implementation side. However, it's a pretty small change to begin using a SandboxedEnvironment for jinja rendering, to provide people utilizing DSI some additional peace of mind.
THE WhereFilterParser USES a SandboxedEnvironment WHEN performing jinja parsing/rendering
In the new world of dbt-core x MetricFlow, the properties of measures are changing slightly. Measure objects should have the following properties:
Property Name | Type | Description |
---|---|---|
name | str | Name of the measure |
agg | enum | Aggregation type |
description | str | Description of the measure |
expr | str | Expression of the measure |
create_metric | Bool | Boolean flag that creates a metric from the measure if True |
agg_params | Optional[AggregationParameters] | The aggregation parameters |
non_additive_dimension | ? | Non-additive dimension parameters |
agg_time_dimension | str | The time dimension to aggregate the measure by |
The above properties were pulled from dbt-labs/dbt-core#7456
Describe the Feature
I want to be able to retrieve a display_name from the domain objects. Currently, the Metric and Dimension domain objects do not have display_name as an attribute. It is available in the schema, but isn't defined at the domain level.
Would you like to contribute?
Certainly, but it is a 2-LOC change.
Anything Else?
@nhandel I know there are some changes coming; maybe this is something that could be added?
Describe the bug
When we run our unit tests everything grinds to a halt on the semantic validator tests
This is potentially caused by our test construction - most of these tests do something like:
with pytest.raises(...):
    # run all validations
That means we run every validation on every test input model in every test case, even though most, if not all, of these test cases are targeted at highly specific validation rules.
A good start here would be to migrate these test cases to only run the specific rule we care to test.
Another possible optimization - both for runtime and readability - is to use smaller, more targeted models defined local to the test case, so converting these cases away from model fixtures onto the local model shims would be great as well. See https://github.com/transform-data/metricflow/blob/main/metricflow/test/model/validations/test_validity_param_definitions.py#L51-L71 for an example of a locally defined model with a specific failure state written into it.
Steps To Reproduce
Steps to reproduce the behavior: run make test on the validations path.
Expected behavior
These should be much faster.
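A sketch of the shape this migration might take; the rule, the model shim, and the assertion are all hypothetical stand-ins, not the real DSI test suite:

```python
# Hypothetical single rule under test, standing in for a real
# SemanticManifestValidationRule.
class UniqueMeasureNamesRule:
    @staticmethod
    def validate(model: dict) -> list:
        names = [m["name"] for m in model["measures"]]
        return [] if len(names) == len(set(names)) else ["duplicate measure name"]


def test_duplicate_measure_names_flagged() -> None:
    # Small, local model shim with the failure state written into it,
    # instead of a shared fixture run through every validation rule.
    model = {"measures": [{"name": "revenue"}, {"name": "revenue"}]}
    assert UniqueMeasureNamesRule.validate(model) == ["duplicate measure name"]
```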
The create_metric key on a measure shouldn't be part of the protocol. It's an ergonomics field that is only ever used at parse time; it's never used by MetricFlow. It can be left on the PydanticMeasure implementation, but should be removed from the protocol.
We want to have a good change log for this repository, starting as soon as possible. dbt-core uses Changie, and we want to follow that pattern. Here is dbt-core's changie config. Additionally, PRs should require a changie entry to be present.
This repository / package is intended to be where the semantic interface protocols live, such that MetricFlow and dbt-core (and other projects) have a shared, importable, understood set of protocols. That's a tall order. It all starts, though, with the current definitions as they are in MetricFlow, specifically everything that lives in the model directory. The only carve-outs are the files & directories for data warehouse validations and for dbt-metrics -> MetricFlow conversion. Data warehouse validations will remain part of MetricFlow, and conversions from dbt-metrics to MetricFlow will no longer be needed.
In the new world of dbt-core x MetricFlow, the properties of dimensions are changing slightly. Dimension objects should have the following properties:
Property Name | Type | Description |
---|---|---|
name | str | Name of the dimension |
type | enum | Type of the dimension |
description | str | Description of the dimension |
expr | str | Expression of the dimension |
type_params | TypeParameters | Parameters needed for the given type |
The above properties were pulled from dbt-labs/dbt-core#7456
Currently, pyproject.toml specifies pinned versions for many dependencies. The pinned dependencies make it more difficult to use this project alongside other projects, as there will be dependency conflicts. Consequently, the pins should be relaxed where possible.
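For example (the package name and version range here are illustrative, not the actual constraints the project should adopt), a hard pin can usually become a compatible range:

```toml
[project]
dependencies = [
    # before: "jsonschema==3.2.0"  (hard pin; conflicts easily)
    # after: a range that still excludes known-breaking majors
    "jsonschema>=3.0,<4",
]
```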
As noted in dbt-labs/dbt-core#7456, for the integration of dbt-core and MetricFlow we are creating a third repo to put the shared semantic interface definitions in. This is that repo. What that entails is protocol definitions for the semantic interfaces, default object implementations, default parsing / transformations / validations, and associated tests.
In the new world of dbt-core x MetricFlow, Models have become SemanticModels and the properties have changed slightly. Semantic Model objects should have the following properties:
Property Name | Type | Description |
---|---|---|
name | str | Name of the semantic model |
description | str | Description of the semantic model |
entities | List[Entity] | Entities for the semantic model |
dimensions | List[Dimension] | Dimensions for the semantic model |
measures | List[Measure] | Measures for the semantic model |
data_path | DataPath | Path object to where the data should be |
Relatedly, the DataPath object should have the following for this first pass:
Property Name | Type |
---|---|
name | str |
schema | str |
database | Optional[str] |
The above properties were pulled from dbt-labs/dbt-core#7456
We want a standard definition of what a valid Metric implementation should have. This allows for a shared definition, understood in both MetricFlow and dbt-core, of what anything implementing the Metric protocol will make available, without MetricFlow and dbt-core needing to import each other. This should use the Protocol type, which exists in Python 3.8+.
Two facts:
1. SemanticModel objects can have the same primary Entity.
2. Dimension names are not required to be unique across SemanticModels in the SemanticManifest.
Together, statements 1 and 2 mean that it is possible for SemanticModels to have the same primary Entity and to have Dimensions of the same name. This is problematic because if a dimension <primary_entity>__<dimension_name> is specified in a WhereFilter, it can be ambiguous which dimension is actually being referenced.
Disallow SemanticModels with the same primary Entity from having duplicate Dimensions.
A validation rule exists which disallows SemanticModels with the same primary Entity from having duplicate Dimensions.
This is a follow-up to #49. In reality, this is more of a semantic model property than a dimension property.
Currently the WhereFilter protocol only has one property, where_sql_template, which is defined as a str. However, in reality this string is a highly structured object, and we know we are going to make changes to this structure over time. This is problematic because the protocol only captures that it is a string and nothing more; thus any changes to the structure wouldn't actually be associated with any DSI version, which could lead to some really funky situations. To my knowledge we can't set the protocol property to match a specific string pattern (though if we could, that would be cool). The alternative is to take the structure out of the string, i.e. have a more structured protocol definition for WhereFilter:
from typing import List, Optional, Protocol
# (TimeGranularity is DSI's time-grain enum; import omitted)

class DimensionInput(Protocol):
    dimension: str
    primary_entity: str
    entity_path: Optional[List[str]]

class TimeDimensionInput(Protocol):
    dimension: str
    primary_entity: str
    granularity: TimeGranularity
    entity_path: Optional[List[str]]

class EntityInput(Protocol):
    entity: str
    entity_path: Optional[List[str]]

class WhereFilter(Protocol):
    where_sql_template: str
    input_dimensions: List[DimensionInput]
    input_time_dimensions: List[TimeDimensionInput]
    input_entities: List[EntityInput]
An example WhereFilter object:

WhereFilter(
    where_sql_template="{{ country }} = 'US' AND {{ ds }} >= '2023-07-01' AND {{ user }} == 'SOME_USER_ID'",
    input_dimensions=[
        DimensionInput(
            dimension='country',
            entity_path=['user']
        )
    ],
    input_time_dimensions=[
        TimeDimensionInput(
            dimension='ds',
            entity_path=['user', 'transaction'],
            granularity=TimeGranularity.MONTH
        )
    ],
    input_entities=[EntityInput(entity='user')]
)
The same WhereFilter serialized:

{
    "where_sql_template": "{{ country }} = 'US' AND {{ ds }} >= '2023-07-01' AND {{ user }} == 'SOME_USER_ID'",
    "input_dimensions": [
        {
            "dimension": "country",
            "entity_path": ["user"]
        }
    ],
    "input_time_dimensions": [
        {
            "dimension": "ds",
            "entity_path": ["user", "transaction"],
            "granularity": "month"
        }
    ],
    "input_entities": [{"entity": "user"}]
}
We don't want the user to have to specify their filters in this verbose structured manner. That would suck. However, the user-facing spec and the protocol can be divorced, and in core they actually are. My view, more and more, has become that the user spec compiles down to the protocol definitions. The user-facing spec in YAML would continue to be:
metric:
  - name: 'my metric name'
    ...
    filter: "{{ dimension(name='country', entity_path=['user']) }} = 'US' AND {{ time_dimension('ds', 'month', entity_path=['user', 'transaction']) }} >= '2023-07-01' AND {{ entity('user') }} == 'SOME_USER_ID'"
This example would then compile down to the structured protocol definition example given above. This lifts the string specification into implementations of the protocols, and the agreed protocol definition would be structured (and likely change less frequently).
We'll lift call_parameter_sets into a generic jinja where-filter compiler.
Core has unparsed nodes and parsed nodes; it'll be fairly straightforward to handle this in core, and core will have its own ticketed work for doing so. It'll likely just reuse the generic jinja where-filter compiler produced by DSI.
We want a standard definition of what a valid Measure implementation should have. This allows for a shared definition, understood in both MetricFlow and dbt-core, of what anything implementing the Measure protocol will make available, without MetricFlow and dbt-core needing to import each other. This should use the Protocol type, which exists in Python 3.8+.
We want a standard definition of what a valid Dimension implementation should have. This allows for a shared definition, understood in both MetricFlow and dbt-core, of what anything implementing the Dimension protocol will make available, without MetricFlow and dbt-core needing to import each other. This should use the Protocol type, which exists in Python 3.8+.