Giter VIP home page Giter VIP logo

dbt-formatter's Introduction

CircleCI

DBT Formatter

Install

npm install -s dbt-formatter

Usage

import formatter from 'dbt-formatter';

const mySql = "SELECT * FROM {{ ref('myTableRef') }}";
const myOpts = { sql: 'default', indent: 2, upper: false };

formatter.format(mySql, myOpts);

This will result in:

SELECT
  *
FROM
  {{ ref('myTableRef') }}

Usage options

Fine tune dbt-formatter behavior with the following options:

Option Default Description
sql default The sql dialect you want to use, currently only default is available
indent 2 How many spaces you want an indentation to be
upper false Formats sql reserved words to be uppercase when set to true
newline false Appends a new line at the end of the formatted sql string
lowerWords false Lowercases all words as identified by the tokenizer
allowCamelcase true Allows column names to be camelcased

Development

NPM scripts

  • npm test: Run test suite
  • npm start: Run npm run build in watch mode
  • npm run build: Generate bundles and typings, create docs
  • npm run lint: Lints code
  • npm run package: Package dbt-formatter as a binary

Roadmap

  • Add more sql dialects:
    • snowflake
    • redshift
    • bigquery
    • postgres
    • presto

dbt-formatter's People

Contributors

dependabot[bot] avatar elliottohara avatar henriblancke avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

dbt-formatter's Issues

Formatter changes `null end` to `nullend` in case statement

The formatter can introduce an error when formatting the end of a case statement:
Before formatting:

when GTTV_28 between 2500 and 4000 then 'Trending'
when GTTV_28 < 2500 then 'Low'
else null end csm_status

After formatting:

WHEN GTTV_28 BETWEEN 2500 
AND 4000 THEN 'Trending'
WHEN GTTV_28 < 2500 THEN 'Low'
ELSE NULLEND csm_status

NULL END should remain as separate words.

This does not happen on all case statements; in this case it may have been related to use of between X and Y in one of the WHEN clauses

Justified "AS" column aliases?

We would love to start using this in our DBT projects. We are currently formatting SQL queries manually, and we are using a justified AS blocks for column aliases - which adds a lot in terms of readability. Has anyone looked into what this would require?

For example:

SELECT
    CAST(1 as int)                     AS col_a
  , CAST('text_value' as varchar(255)) AS col_b

...instead of:

SELECT
    CAST(1 as int) AS col_a
  , CAST('text_value' as varchar(255)) AS col_b

I'm happy to help contribute code if someone is able to point me in the right direction.

asin seen as reserved word

I typed:

SELECT
  updated_at,
  asin,
  data->'title' AS product_title
FROM
  product

And it formatted to:

SELECT
  updated_at,
  ASIN,
  DATA -> 'title' AS product_title
FROM
  product

Why is asin and data capitalized?

Open to PRs?

Hello maintainers of this package ๐Ÿ‘‹ thanks for your hard work on this! I have been using this for a little while now at work. I've noticed a few bugs in the formatter such that it produces invalid SQL. I'm wondering if you're open to PRs to address bugs?

Fails to parse square brackets in variables correctly

Hello! We're keen to use this to start formatting our dbt codebase; it looks like exactly what we want to have a standardised SQL output that takes into account jinja code blocks, but have come across a couple of issues. I'm happy to look at fixing this but want to validate the approach first.

Reproducing the issue

The issue is to do with the way the tokeniser treats [ and ] as string delimiters, which is fine for SQL Server SQL, but breaks when we start using arrays and dictionaries in our jinja code (which we currently do). The errors can be reproduced as follows:

  it('works with arrays in the jinja statement', () => {
    const formatted = formatter(`{{ some_macro(arg_1 = 'foo', arg_2 = ['[string with square brackets]'], arg_3 = 'bar') }}`);

    expect(formatted).toBe(`{{ some_macro(
  arg_1 = 'foo',
  arg_2 = ['[string with square brackets]'],
  arg_3 = 'bar'
) }}`);
  })

  it('does not treat square brackets in jinja statements as strings', () => {
    const formatted = formatter(`{{ variable['key'] }}`);
    expect(formatted).toBe(`{{ variable['key'] }}`);
  })

When I run these tests, I get the following output:

  โ— jinja flavored queries โ€บ works with arrays in the jinja statement

    expect(received).toBe(expected) // Object.is equality

    - Expected  - 3
    + Received  + 2

      {{ some_macro(
        arg_1 = 'foo',
    -   arg_2 = ['[string with square brackets]'],
    -   arg_3 = 'bar'
    - ) }}
    +   arg_2 = ['[string with square brackets] '], arg_3 = ' bar ') }}
    +

  โ— jinja flavored queries โ€บ does not treat square brackets in jinja statements as strings

    expect(received).toBe(expected) // Object.is equality

    - Expected  - 1
    + Received  + 1

    - {{ variable['key'] }}
    + {{ variable ['key'] }}
      โ†ต

Why these tests fail

The problem in the first test is that it tokenises ['[string with square brackets]'] into ['[string with square brackets] and '], arg_3 = '" }. The incorrect match on the trailing bracket ultimately results in tokenising the entirety of the remainder of the statment into a single token, and we end up with our entire file on one line.

The problem in the second test is that due to the treatment of [ ... ] as a string, it separates a dictionary access in the jinja code variable['key'] into two tokens variable and ['key'], which are then separated by whitespace because they're just two different strings.

Proposed solution

I think the root of this problem is due to the fact that the tokeniser doesn't distinguish between SQL and code in the jinja templates. Obviously we still need to treat [...] as a string in the SQL but treating it like a string in python doesn't make sense. Consequently I think what we need to do is to parse and format jinja separately to the SQL:

  1. Firstly, tokenise the query as SQL and break out jinja variables or templates into whole tokens, instead of the current behaviour. So a string like SELECT * FROM {{ ref("table") }} would get tokenised into something like:
    • {'type': 'reserved', 'value': 'SELECT'}
    • {'type': 'string', 'value': '*'}
    • {'type': 'reserved', 'value': 'FROM'}
    • {'type': 'jinja', 'value': '{{ ref("table") }}'}
  2. Have a separate parser for jinja variables and templates which get applied when we are going through the jinja token type.

Does the problem and the proposed solution make sense? Like I say, I'm happy to work on a fix for this but given it's a fairly substantial change I thought I'd make sure I'm not missing something first ๐Ÿ˜„

Case statement bug

When else is NULL it will append END without space to NULL

Example:

CASE
WHEN a > 1 THEN b
WHEN b > 1 THEN a
ELSE NULL
END AS col

JSONB and type casting

I think JSONB and type casting is being mis-formated:

From:

(data ->> 'newPriceIsMAP')::boolean as new_price_is_map,

To:

  (
    data ->> 'newPriceIsMAP'
  ) :: boolean as new_price_is_map,

Seems like the first option is more readable.

Support for tabs

It would be nice if you could configure the plugin to use tabs vs spaces.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.