Comments (4)
PR #138
You can now create seed tables with different distribution and index strategy by providing required confiuration in dbt_project.yml file. The default choice is REPLICATE disttribution and HEAP (no indexing). If you want to override this configuration, the following sample should help.
seeds:
jaffle_shop:
index: HEAP
dist: ROUND_ROBIN
raw_customers:
index: HEAP
dist: REPLICATE
raw_payments:
dist: HASH(payment_method)
index: CLUSTERED INDEX(id,order_id)
Create a new context "seeds:" at the root followed by project name and seed name. In this case the project name is jaffle_shop and seeds are raw_customers and raw_payments. Provide index and distribution values using index and dist keys. Use replicate, round_robin, hash({column name}) as a value. Example: dist: replicate. The raw_customers seed table will be replicated a table. For hash distribution, the user need to provide the vaule HASH(payment_method). Example: dist: hash(payment_method)
To specific index, index as a key and CLUSTERED INDEX({Column1, Column2}), HEAP, CLUSTERED COLUMNSTORE INDEX as a value. Example: index: HEAP. The raw_customers seed table will use heap index strategy. For clustered index, the user need to provide one or more columns to create clustered index on. Example: index: CLUSTERED INDEX(id,order_id). The default value of index and distribution can also be set for all seeds under project name.
from dbt-synapse.
Hi. First, I want to say thanks to all the contributors here for making it possible to use dbt with Azure Synapse. I've been running through a series of tests, and bumped into this error. Here's a bit more detail on the problem for clarity.
The test case was - Distribution of the model was changed to REPLICATE
Model
{{ config(materialized='incremental'
, index='clustered index(OLDACNUM)'
, dist='REPLICATE'
, on_schema_change='sync_all_columns' ) }}
SELECT TOP (1000)
[RecordCreateRunId]
,[DEBTORNUM]
,[OLDACNUM]
,[CREDCOLACTION]
,[LASTSTMTDATE]
,[CUST_TYPE]
,[LETTERDATE]
,[CHARGE_CODE]
,[NUM_REMINDERS]
,[PAYMENT_CODE]
,[CYCLE_GROUP]
,ODS_START_DATE
FROM [DBO].[AR_DEBTOR]
{% if is_incremental() %}
-- this filter will only be applied on an incremental run
WHERE ODS_START_DATE > (select max(ODS_START_DATE) from {{ this }})
{% endif %}
Incremental Run
- The model was run as incremental
- This caused the error "'REPLICATE User Temp Table' is not supported in this version of SQL Server"
- Offending code:
CREATE TABLE "DBT"."#STG_DEBTORS__dbt_tmp"
WITH(
DISTRIBUTION = REPLICATE,
clustered index(OLDACNUM)
)
AS (SELECT * FROM DBT.STG_DEBTORS__dbt_tmp_temp_view)
- Temporary tables don't support REPLICATE distribution.
- The recommended fix above means the generated code will never use temporary tables in any circumstance (I think). This may be problematic. In the dedicated SQL pool resource, temporary tables offer a performance benefit because their results are written to local rather than remote storage. A better approach would be only not to use temporary tables when DISTRIBUTION=REPLICATE.
Versions - dbt =1.1.2, sqlserver=1.1.1, synapse = 1.1.0
from dbt-synapse.
Hi
I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.
Create a new macro in the macros folder of your project and use this code.
{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #}
{% macro synapse__make_temp_relation(base_relation, suffix) %}
{% if config.get('dist')|upper == 'REPLICATE' -%}
{% set tmp_identifier = base_relation.identifier ~ suffix %}
{%- else -%}
{% set tmp_identifier = '#' ~ base_relation.identifier ~ suffix %}
{% endif %}
{% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%}
{% do return(tmp_relation) %}
{% endmacro %}
To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.
from dbt-synapse.
Hi
I've been able to fix this issue by overriding the sqlserver__make_temp_relation with a synapse version of this macro that does not use a temp table when the distribution type is REPLICATE.
Create a new macro in the macros folder of your project and use this code.
{# Fix for Temporary tables don't support REPLICATE distribution. Don't use temp table if Dist=REPLICATE #} {% macro synapse__make_temp_relation(base_relation, suffix) %} {% if config.get('dist')|upper == 'REPLICATE' -%} {% set tmp_identifier = base_relation.identifier ~ suffix %} {%- else -%} {% set tmp_identifier = '#' ~ base_relation.identifier ~ suffix %} {% endif %} {% set tmp_relation = base_relation.incorporate(path={"identifier": tmp_identifier}) -%} {% do return(tmp_relation) %} {% endmacro %}
To permanently fix this in this synapse adapter, this logic should be added to the relation.sql file.
Is this going to be incorporated into the adapter, and if so any thoughts on when?
from dbt-synapse.
Related Issues (20)
- upgrade to support dbt-core v1.2.0 HOT 1
- upgrade to support dbt-core v1.2.0
- upgrade to support dbt-core v1.2.0
- upgrade to support dbt-core v1.2.0
- Incorrect syntax near 'column' error adding a New Column with on_schema_change='sync_all_columns' HOT 1
- Incorrect syntax near 'column' error renaming a Column with on_schema_change='sync_all_columns' HOT 1
- upgrade to support dbt-core v1.3.0 HOT 1
- sam, are you there?
- sam, are you there?
- Changing the distribution and index types in existing snapshots don't have any effect HOT 1
- upgrade to support dbt-core v1.4.0 HOT 2
- Stand up new CI-testing pool HOT 1
- upgrade to support dbt-core v1.5.0
- This repo is missing important files
- upgrade to support dbt-core v1.6.0
- Encountered an error: dbt_sqlserver://macros/adapters.sql HOT 2
- dbt found two macros named "synapse__snapshot_merge_sql" in the project "dbt_synapse". HOT 1
- How to load parquet files from Azure Data Lake Gen2 to dedicated pool using dbt? HOT 1
- create_nonclustered_index macro ends up in endless recursion HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dbt-synapse.