Giter VIP home page Giter VIP logo

Comments (4)

prdpsvs avatar prdpsvs commented on August 16, 2024 1

PR #138

You can now create seed tables with different distribution and index strategy by providing required confiuration in dbt_project.yml file. The default choice is REPLICATE disttribution and HEAP (no indexing). If you want to override this configuration, the following sample should help.

seeds:
  jaffle_shop:
    index: HEAP
    dist: ROUND_ROBIN
    raw_customers:
      index: HEAP
      dist: REPLICATE
    raw_payments:
      dist: HASH(payment_method)
      index: CLUSTERED INDEX(id,order_id)  

Create a new context "seeds:" at the root followed by project name and seed name. In this case the project name is jaffle_shop and seeds are raw_customers and raw_payments. Provide index and distribution values using index and dist keys. Use replicate, round_robin, hash({column name}) as a value. Example: dist: replicate. The raw_customers seed table will be replicated a table. For hash distribution, the user need to provide the vaule HASH(payment_method). Example: dist: hash(payment_method)

To specific index, index as a key and CLUSTERED INDEX({Column1, Column2}), HEAP, CLUSTERED COLUMNSTORE INDEX as a value. Example: index: HEAP. The raw_customers seed table will use heap index strategy. For clustered index, the user need to provide one or more columns to create clustered index on. Example: index: CLUSTERED INDEX(id,order_id). The default value of index and distribution can also be set for all seeds under project name.

from dbt-synapse.

AdamG30 avatar AdamG30 commented on August 16, 2024

Hi. First, I want to say thanks to all the contributors here for making it possible to use dbt with Azure Synapse. I've been running through a series of tests, and bumped into this error. Here's a bit more detail on the problem for clarity.

The test case was - Distribution of the model was changed to REPLICATE

Model

{{ config(materialized='incremental'
, index='clustered index(OLDACNUM)'
, dist='REPLICATE'
, on_schema_change='sync_all_columns' )    }}

SELECT TOP (1000) 
      [RecordCreateRunId]
      ,[DEBTORNUM]
      ,[OLDACNUM]
      ,[CREDCOLACTION]
      ,[LASTSTMTDATE]
      ,[CUST_TYPE]
      ,[LETTERDATE]
      ,[CHARGE_CODE]
      ,[NUM_REMINDERS]
      ,[PAYMENT_CODE]
      ,[CYCLE_GROUP]
      ,ODS_START_DATE	  
FROM [DBO].[AR_DEBTOR]
{% if is_incremental() %}
  -- this filter will only be applied on an incremental run
  WHERE ODS_START_DATE > (select max(ODS_START_DATE) from {{ this }})
{% endif %}

Incremental Run

  • The model was run as incremental
  • This caused the error "'REPLICATE User Temp Table' is not supported in this version of SQL Server"
  • Offending code:
  CREATE TABLE "DBT"."#STG_DEBTORS__dbt_tmp"
    WITH(
      DISTRIBUTION = REPLICATE,
      clustered index(OLDACNUM)
      )
    AS (SELECT * FROM DBT.STG_DEBTORS__dbt_tmp_temp_view)
  • Temporary tables don't support REPLICATE distribution.
  • The recommended fix above means the generated code will never use temporary tables in any circumstance (I think). This may be problematic. In the dedicated SQL pool resource, temporary tables offer a performance benefit because their results are written to local rather than remote storage. A better approach would be only not to use temporary tables when DISTRIBUTION=REPLICATE.

Versions - dbt =1.1.2, sqlserver=1.1.1, synapse = 1.1.0

from dbt-synapse.

AdamG30 avatar AdamG30 commented on August 16, 2024

Hi

I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.

Create a new macro in the macros folder of your project and use this code.

{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
    {% if config.get('dist')|upper == 'REPLICATE' -%}
        {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {%- else -%}
        {% set tmp_identifier = '#' ~  base_relation.identifier ~ suffix %}                        
   {% endif %}
    {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
    {% do return(tmp_relation) %}
{% endmacro %}

To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.

from dbt-synapse.

jgilfillan avatar jgilfillan commented on August 16, 2024

Hi

I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.

Create a new macro in the macros folder of your project and use this code.

{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
    {% if config.get('dist')|upper == 'REPLICATE' -%}
        {% set tmp_identifier = base_relation.identifier ~ suffix %}
    {%- else -%}
        {% set tmp_identifier = '#' ~  base_relation.identifier ~ suffix %}                        
   {% endif %}
    {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
    {% do return(tmp_relation) %}
{% endmacro %}

To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.

Is this going to be incorporated into the adapter, and if so any thoughts on when?

from dbt-synapse.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.