When running incremental models on Synpase Replicated tables an error is thrown: <

PR <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id=

Error when using incremental models on Synapse REPLICATE table distribution about dbt-synapse HOT 4 OPEN

dbt-msft commented on August 16, 2024 1

Error when using incremental models on Synapse REPLICATE table distribution

from dbt-synapse.

Comments (4)

prdpsvs commented on August 16, 2024 1

PR #138

You can now create seed tables with different distribution and index strategy by providing required confiuration in dbt_project.yml file. The default choice is REPLICATE disttribution and HEAP (no indexing). If you want to override this configuration, the following sample should help.

seeds:
  jaffle_shop:
    index: HEAP
    dist: ROUND_ROBIN
    raw_customers:
      index: HEAP
      dist: REPLICATE
    raw_payments:
      dist: HASH(payment_method)
      index: CLUSTERED INDEX(id,order_id)

Create a new context "seeds:" at the root followed by project name and seed name. In this case the project name is jaffle_shop and seeds are raw_customers and raw_payments. Provide index and distribution values using index and dist keys. Use replicate, round_robin, hash({column name}) as a value. Example: dist: replicate. The raw_customers seed table will be replicated a table. For hash distribution, the user need to provide the vaule HASH(payment_method). Example: dist: hash(payment_method)

To specific index, index as a key and CLUSTERED INDEX({Column1, Column2}), HEAP, CLUSTERED COLUMNSTORE INDEX as a value. Example: index: HEAP. The raw_customers seed table will use heap index strategy. For clustered index, the user need to provide one or more columns to create clustered index on. Example: index: CLUSTERED INDEX(id,order_id). The default value of index and distribution can also be set for all seeds under project name.

from dbt-synapse.

AdamG30 commented on August 16, 2024

Hi. First, I want to say thanks to all the contributors here for making it possible to use dbt with Azure Synapse. I've been running through a series of tests, and bumped into this error. Here's a bit more detail on the problem for clarity.

The test case was - Distribution of the model was changed to REPLICATE

Model

{{ config(materialized='incremental'
, index='clustered index(OLDACNUM)'
, dist='REPLICATE'
, on_schema_change='sync_all_columns' )    }}

SELECT TOP (1000) 
      [RecordCreateRunId]
      ,[DEBTORNUM]
      ,[OLDACNUM]
      ,[CREDCOLACTION]
      ,[LASTSTMTDATE]
      ,[CUST_TYPE]
      ,[LETTERDATE]
      ,[CHARGE_CODE]
      ,[NUM_REMINDERS]
      ,[PAYMENT_CODE]
      ,[CYCLE_GROUP]
      ,ODS_START_DATE	  
FROM [DBO].[AR_DEBTOR]
{% if is_incremental() %}
  -- this filter will only be applied on an incremental run
  WHERE ODS_START_DATE > (select max(ODS_START_DATE) from {{ this }})
{% endif %}

Incremental Run

The model was run as incremental
This caused the error "'REPLICATE User Temp Table' is not supported in this version of SQL Server"
Offending code:

  CREATE TABLE "DBT"."#STG_DEBTORS__dbt_tmp"
    WITH(
      DISTRIBUTION = REPLICATE,
      clustered index(OLDACNUM)
      )
    AS (SELECT * FROM DBT.STG_DEBTORS__dbt_tmp_temp_view)

Temporary tables don't support REPLICATE distribution.
The recommended fix above means the generated code will never use temporary tables in any circumstance (I think). This may be problematic. In the dedicated SQL pool resource, temporary tables offer a performance benefit because their results are written to local rather than remote storage. A better approach would be only not to use temporary tables when DISTRIBUTION=REPLICATE.

Versions - dbt =1.1.2, sqlserver=1.1.1, synapse = 1.1.0

from dbt-synapse.

AdamG30 commented on August 16, 2024

I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.

Create a new macro in the macros folder of your project and use this code.

{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
    {% if config.get('dist')|upper == 'REPLICATE' -%}
        {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {%- else -%}
        {% set tmp_identifier = '#' ~  base_relation.identifier ~ suffix %}                        
   {% endif %}
    {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
    {% do return(tmp_relation) %}
{% endmacro %}

To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.

from dbt-synapse.

jgilfillan commented on August 16, 2024

Hi

I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.

Create a new macro in the macros folder of your project and use this code.
{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
    {% if config.get('dist')|upper == 'REPLICATE' -%}
        {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {%- else -%}
        {% set tmp_identifier = '#' ~  base_relation.identifier ~ suffix %}                        
   {% endif %}
    {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
    {% do return(tmp_relation) %}
{% endmacro %}
To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.

Is this going to be incorporated into the adapter, and if so any thoughts on when?

from dbt-synapse.

Error when using incremental models on Synapse REPLICATE table distribution about dbt-synapse HOT 4 OPEN

Comments (4)

Model

Incremental Run

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent