PostgreSQL Languages AST and statements prettifier: master branch covers PG10, v2 branch covers PG12, v3 covers PG13, v4 covers PG14, v5 covers PG15, v6 covers PG16
Hello,
In the process of removing my overrides and falling back to the new release of pglast, I noticed a bug in the CREATE TRIGGER printer: the WHEN condition must always be wrapped in parentheses. For example:
CREATE TRIGGER trig1
AFTER UPDATE
ON schema1.table1
FOR EACH ROW
WHEN (new.col1 IS DISTINCT FROM old.col1)
EXECUTE PROCEDURE schema1.func1()
Is "prettified" as
CREATE TRIGGER trig1
AFTER UPDATE
ON schema1.table1
FOR EACH ROW
WHEN new.col1 IS DISTINCT FROM old.col1
EXECUTE PROCEDURE schema1.func1()
which is not valid syntax.
Good day!
I am interested in implementing the following task.
I have a query, for example:
EXECUTE format($$
SELECT field1, field2, field3
FROM table
%1$s
$$, create_where(whereTree))
I am interested in building whereTree using pglast's functionality.
How can I use the classes from this library to generate this:
WHERE field1 IS NOT NULL
AND field2 = 'value1'
Can I do this with pglast?
My interpretation of PG_VERSION_NUM seems wrong, but there is something fishy in the PG headers as well...
Compiling the v12.1 tarball, pg_config.h contains:
/* PostgreSQL version as a string */
#define PG_VERSION "12.1"
/* PostgreSQL version as a number */
#define PG_VERSION_NUM 120001
So it seems that it isn't computed as MAJOR*1000+MINOR*100+PATCH
as I assumed.
OTOH, in several places in the PG sources there's the expression PG_VERSION_NUM / 100 to get the major version; in particular, pg_upgrade.h defines the macro GET_MAJOR_VERSION(v) as ((v) / 100).
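For reference, the arithmetic can be checked directly. Since PostgreSQL 10 the scheme is MAJOR * 10000 + MINOR (so 12.1 → 120001); pre-10 releases used MAJOR * 10000 + MINOR * 100 + PATCH (so 9.6.3 → 90603), which is why GET_MAJOR_VERSION divides by 100: for 9.x it yields e.g. 906 (meaning "9.6"), and for 12.x it yields 1200.

```python
# How PG_VERSION_NUM is computed, per PostgreSQL's versioning scheme:
# since PG 10 it is MAJOR * 10000 + MINOR; before 10 it was
# MAJOR * 10000 + MINOR * 100 + PATCH.
def version_num(major, minor, patch=None):
    if patch is None:                      # PG >= 10: two-part versions
        return major * 10000 + minor
    return major * 10000 + minor * 100 + patch  # PG < 10

def get_major_version(v):
    # Mirrors pg_upgrade.h's GET_MAJOR_VERSION(v): ((v) / 100).
    return v // 100

print(version_num(12, 1))         # 120001
print(version_num(9, 6, 3))       # 90603
print(get_major_version(120001))  # 1200
print(get_major_version(90603))   # 906  (i.e. "9.6")
```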
...
Consider the following code:
import pglast
sql = "ALTER TABLE T ADD CONSTRAINT C EXCLUDE USING gist (F WITH OPERATOR(intarray.&&))"
print(pglast.prettify(sql))
which produces the following error:
Traceback (most recent call last):
File "format.py", line 3, in <module>
print(pglast.prettify(sql))
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/__init__.py", line 46, in prettify
prettified = IndentedStream(**options)(Node(orig_pt))
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 251, in __call__
self.print_node(statement)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/dml.py", line 701, in raw_stmt
output.print_node(node.stmt)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 252, in alter_table_stmt
output.print_list(node.cmds, ',', standalone_items=True)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 598, in print_list
super().print_list(nodes, sep, relative_indent, standalone_items, are_names, is_symbol)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 437, in print_list
is_symbol=is_symbol)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 401, in _print_items
self.print_node(item, is_name=are_names, is_symbol=is_symbol and idx == last)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 284, in alter_table_cmd
output.print_node(constraint)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 488, in constraint
output.write(clauses.string_value)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/node.py", line 143, in string_value
raise TypeError('%r does not contain a single String node' % self)
TypeError: [2*{String}] does not contain a single String node
This is almost certainly due to the && operator explicitly specifying the schema, that is, intarray.&&.
Version used is pglast 1.17 on Python 3.5.2.
Is it possible to take a root Node and "stringify" it back into raw SQL?
Hello,
I'm currently using your library to build a tool, and running it on our pg_dumps and migration scripts made me implement or override printers for a large number of statement types. I did not bother handling indentation for now, as that's not my intended use, but I managed to get no errors on our 300k+ lines of SQL code when running prettify with the safety belt on. (There's still a lot to do if we want to handle everything present in the PostgreSQL regression tests, though :) )
I have implemented support for things we use, which may be incomplete, for the following nodes:
AccessPriv
AlterDatabaseSetStmt
AlterDefaultPrivilegesStmt
AlterEnumStmt
AlterFunctionStmt
AlterObjectSchemaStmt
AlterOwnerStmt
AlterRoleStmt
AlterSeqStmt
AlterTableCmd
AlterTableStmt
ClusterStmt
CompositeTypeStmt
CreateEnumStmt
CreateFunctionStmt
CreatePolicyStmt
CreateTrigStmt
DiscardStmt
DoStmt
GrantRoleStmt
GrantStmt
GroupingSet
LockStmt
RenameStmt
TriggerTransition
VacuumStmt
VariableSetStmt
ViewStmt
In addition, I've overridden printers for those nodes (adding support for unsupported options / handling corner cases):
A_ArrayExpr
A_Expr
ColumnDef
Constraint
CreateTableAsStmt
DefineStmt
DropStmt
IndexElem
NullTest
ObjectWithArgs
RowExpr
TypeCast
I obviously want to contribute that back, but I don't really know how to do that. Would you like individual PRs for every node, or a huge one for the whole lot? How should that code be tested?
Thank you in advance for your help regarding this.
In the PostgreSQL test suite, we may encounter statements which are not serialized back to the same form in which they were originally written. However, the constructions involved are semantically identical.
As of now, I identified the following constructions detected as different even though they are strictly the same thing.
On TypeName parsing:
I will update this list as I find other issues.
Hi,
is there a deparse function implemented already?
thanks for the reply.
It would be a killer feature if this could be extended to reformat Python source files in place, similar to what Black does.
libpg_query now supports deparse/unparse, with very good coverage.
pganalyze/libpg_query@aa05c85#diff-a4d461fa40ad82727605dcf20cbb7b43f9a2023fbc1ecae058cddf5ba934899c
It would be nice to have that as an option.
Hi—really been enjoying this library
While on branch v2 (https://github.com/lelit/pglast/tree/v2), I've been noticing some inconsistencies between the dictionaries returned by parse_sql and some of the enums in enums/parsenodes.py.
As an example, the value of the attribute 'subtype' in the AlterTableCmd node is supposed to reference a member of the AlterTableType enum in parsenodes.py. When running parse_sql on a DROP COLUMN command, the value of 'subtype' in the AlterTableCmd node is 10 (which is equal to the value of the enum member named AT_SetStorage) when we were expecting 11 (which references AT_DropColumn).
To illustrate, when running:
alterStmt = 'ALTER TABLE myschema.tablename DROP dropCol'
print(parse_sql(alterStmt))
print("'subtype' returned: " + repr(AlterTableType(10)))
print("expected: " + repr(AlterTableType.AT_DropColumn))
the output is:
[{'RawStmt': {'stmt': {'AlterTableStmt': {'cmds': [{'AlterTableCmd': {'behavior': 0,
'name': 'dropcol',
'subtype': 10}}], <--
'relation': {'RangeVar': {'inh': True,
'location': 12,
'relname': 'tablename',
'relpersistence': 'p',
'schemaname': 'myschema'}},
'relkind': 37}}}}]
'subtype' returned: <AlterTableType.AT_SetStorage: 10>
expected: <AlterTableType.AT_DropColumn: 11>
I've also noticed similar behavior in the ObjectType enum, where the value returned by parse_sql (37) is slightly lower than the value I expected (ObjectType.OBJECT_TABLE, which has a value of 39).
This suggests the parse_sql function is referencing a different version of PostgreSQL than the enums/parsenodes.py file (which seems to match up with the PostgreSQL 12 source code), because it's missing some enum members.
Would you be able to update this? Thanks so much!
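This kind of off-by-N drift is what happens when a newer PostgreSQL release inserts members into the middle of a C enum while the Python mirror lags behind: every subsequent value shifts. A toy illustration with stdlib enums (the member names and layouts here are purely hypothetical, not the real AlterTableType definitions):

```python
from enum import IntEnum

# Hypothetical older layout: values are assigned by position in the C enum.
class AlterTableTypeOld(IntEnum):
    AT_AddColumn = 0
    AT_DropColumn = 1

# A newer release inserting a member in the middle shifts every
# subsequent value, so code interpreting raw integers silently drifts.
class AlterTableTypeNew(IntEnum):
    AT_AddColumn = 0
    AT_SetStorage = 1   # newly inserted member
    AT_DropColumn = 2

raw_subtype = 1  # integer produced against the old layout
print(AlterTableTypeOld(raw_subtype).name)  # AT_DropColumn
print(AlterTableTypeNew(raw_subtype).name)  # AT_SetStorage
```

This is why the parser library and the enum definitions must come from the same PostgreSQL version.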
The proxy has been there since the beginning, because in v1 it dealt with the JSON structure returned by libpg_query's parse_sql() function.
Now that there is a set of concrete ast classes, maybe it would make sense to move the hierarchy tracking data (parent_node and parent_attribute) into the concrete nodes.
Will think about this for v4.
Getting the following error when trying to parse a PL/pgSQL procedure DDL where the SQL is stored in a variable and executed using the EXECUTE command:
Unexpected value for 'details', must be a dict with exactly one key, got <class 'dict'>
If you create an AST with a column like:
ColumnDef(
'column_name',
inhcount=0,
is_from_type=False,
is_local=True,
is_not_null=False,
typeName=TypeName(
names=(String('char'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(A_Const(10),)
)
)
it will serialize to "char"(20), which is invalid SQL. It should just be char(20).
Please consider the function definition in this code:
from pg_query import Node, parse_plpgsql
s = '''CREATE OR REPLACE FUNCTION "get_ip_location" (IN ip_address text,
OUT country varchar,
OUT isp varchar,
OUT ip_type varchar,
OUT latitude float4,
OUT longitude float4)
RETURNS record AS
$BODY$
declare
ip_number bigint;
begin
select foo.country,foo.isp,foo.ip_type,foo.latitude,foo.longitude
from batch_foo.ip_location foo
where ip_range @> ip_number
into
country,
isp,
ip_type,
latitude,
longitude;
return;
end
$BODY$
LANGUAGE plpgsql;'''
print(s)
root = Node(parse_plpgsql(s))
for node in root.traverse():
print(node)
Output:
pasha@jp0sq32:~/PyEnvironments$ python3 hello.py
...
Traceback (most recent call last):
File "hello.py", line 28, in <module>
root = Node(parse_plpgsql(s))
File "pg_query/parser.pyx", line 98, in pg_query.parser.parse_plpgsql
pg_query.parser.ParseError: "country" is not a known variable, at location 0
I get a "non-cosmetic difference" error when I try to use pgpp on a file with e.g. the following contents:
select * from t;
The same query, and other queries, are fine without the semicolon.
What's happening here? Perhaps pgpp doesn't support multiple statements?
Happy to take a look at fixing this if you'd like to point me in the right direction.
Currently the IndentedStream emits lists of values/nodes as standalone items, that is, one per line:
SELECT a
, b
, c
FROM function(1
, 2
, 3)
It would be nice to set a threshold, a maximum width, and make it emit those items on a single line when the resulting width is below that value, so that, given a threshold of 20, the above would result in the following:
SELECT a, b, c
FROM function(1
, 2
, 3)
How can I extract the tables from a query, like pg_query does?
https://github.com/lfittl/pg_query/blob/master/README.md#extracting-tables-from-a-query
Thanks!
Hi, first of all thank you very much for this library. This code is extremely useful, and I couldn't be more happy to have found it.
I have a fairly large collection of SQL queries that I'd like to work on. And while processing the queries, I have found several examples that trigger the "safety belt":
SELECT DISTINCT T1.player_name FROM Player AS T1 JOIN Player_Attributes AS T2 ON T1.player_api_id = T2.player_api_id WHERE T2.crossing > 90 AND T2.preferred_foot = "right"
becomes
SELECT DISTINCT t1.player_name
FROM player AS t1
INNER JOIN player_attributes AS t2 ON t1.player_api_id = t2.player_api_id
WHERE (t2.crossing > 90)
AND (t2.preferred_foot = right)
Note that right is not in quotation marks.
SELECT DISTINCT T1.player_name FROM Player AS T1 JOIN Player_Attributes AS T2 ON T1.player_api_id = T2.player_api_id WHERE T2.preferred_foot = "left" AND T2.overall_rating >= 85 AND T2.overall_rating <= 90
becomes
SELECT DISTINCT t1.player_name
FROM player AS t1
INNER JOIN player_attributes AS t2 ON t1.player_api_id = t2.player_api_id
WHERE (t2.preferred_foot = left)
AND (t2.overall_rating >= 85)
AND (t2.overall_rating <= 90)
Note that left is not in quotation marks.
SELECT player_api_id FROM Player WHERE height >= 180 AND height <= 190 INTERSECT SELECT player_api_id FROM Player_Attributes WHERE preferred_foot = "left"
becomes
SELECT player_api_id
FROM player
WHERE (height >= 180)
AND (height <= 190)
INTERSECT
SELECT player_api_id
FROM player_attributes
WHERE preferred_foot = left
Same problem with left, and perhaps an issue with INTERSECT. Parentheses around the select statements could help.
(SELECT document_name FROM documents GROUP BY document_type_code ORDER BY count(*) DESC LIMIT 3) INTERSECT SELECT document_name FROM documents GROUP BY document_structure_code ORDER BY count(*) DESC LIMIT 3
becomes
SELECT document_name
FROM documents
GROUP BY document_type_code
ORDER BY count(*) DESC
LIMIT 3
INTERSECT
SELECT document_name
FROM documents
GROUP BY document_structure_code
ORDER BY count(*) DESC
LIMIT 3
Parentheses around the select statements are mandatory here, it would seem.
I'm also wondering what happens if one has nested UNION, INTERSECT, and EXCEPT queries, for instance A INTERSECT (B UNION C) vs (A INTERSECT B) UNION C, where A, B, and C are placeholders for select statements. It seems it's always a good idea to print parentheses when using set operators.
I'm getting an error message here:
BEGIN;
CREATE MATERIALIZED VIEW a AS (
SELECT 1
FROM y
);
CREATE MATERIALIZED VIEW b AS (
SELECT 2
FROM y
);
ROLLBACK;
Traceback (most recent call last):
File "/home/user/.local/venvs/pipx/bin/pgpp", line 10, in <module>
sys.exit(main())
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__main__.py", line 82, in main
workhorse(args)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__main__.py", line 36, in workhorse
comma_at_eoln=args.comma_at_eoln)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__init__.py", line 38, in prettify
prettified = IndentedStream(**options)(Node(orig_pt))
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 232, in __call__
self.print_node(statement)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 357, in print_node
printer(node, self)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printers/dml.py", line 634, in raw_stmt
output.print_node(node.stmt)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 357, in print_node
printer(node, self)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printers/ddl.py", line 600, in create_table_as_stmt
and len(node.query.targetList[0].val.fields) == 1
TypeError: object of type 'Missing' has no len()
pglast 1.2
When prefixing an expression with a minus sign the query is correctly parsed but the A_Expr printer fails. It works if the minus sign has a value to the left of it.
>>> from pglast import printers, Node, parse_sql
>>> from pglast.printer import RawStream
>>>
>>>
>>> RawStream()(Node(parse_sql('SELECT 0-(1+1)')))
'SELECT 0 - ((1 + 1))'
>>>
>>>
>>> RawStream()(Node(parse_sql('SELECT -(1+1)')))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 251, in __call__
self.print_node(statement)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 701, in raw_stmt
output.print_node(node.stmt)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 777, in select_stmt
output.print_list(node.targetList)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 437, in print_list
is_symbol=is_symbol)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 401, in _print_items
self.print_node(item, is_name=are_names, is_symbol=is_symbol and idx == last)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 707, in res_target
output.print_node(node.val)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 33, in a_expr
if node.lexpr.node_tag == 'A_Expr':
AttributeError: 'Missing' object has no attribute 'node_tag'
This is a valid expression:
postgres=# SELECT -(1+1);
?column?
----------
-2
(1 row)
The data structure for SELECT -(1+1) is:
[
{
"RawStmt": {
"stmt": {
"SelectStmt": {
"targetList": [
{
"ResTarget": {
"val": {
"A_Expr": {
"kind": 0,
"name": [
{
"String": {
"str": "-"
}
}
],
"rexpr": {
"A_Expr": {
"kind": 0,
"name": [
{
"String": {
"str": "+"
}
}
],
"lexpr": {
"A_Const": {
"val": {
"Integer": {
"ival": 1
}
},
"location": 9
}
},
"rexpr": {
"A_Const": {
"val": {
"Integer": {
"ival": 1
}
},
"location": 11
}
},
"location": 10
}
},
"location": 7
}
},
"location": 7
}
}
],
"op": 0
}
}
}
}
]
This is where it fails:
Line 33 in ad450de
The printer heavily assumes that lexpr always exists, so I'm not sure what the best way to fix it is.
Support for PG 13 is steadily progressing in the underlying libpg_query.
I had an initial look at it, and I foresee a considerable effort in porting current pglast to that level.
The most breaking change is the introduction of a protobuf-based result, much richer than the JSON-based one: IIUC, it would make it easier to implement #60. OTOH, the Python module generated from the definition is a surprising 1.232 MB... I'm still undecided on whether it's the way to go or not.
I will upload my WIP branch in the following days, but most tests are currently failing.
When I try to install this package without using a pre-built wheel (such as on macOS, where there are no wheels on PyPI), I get a build error relating to xxhash headers. If I install from a local folder, it works fine.
pglast version: v1.17
I get identical results when parsing these two queries:
>>> pprint(pglast.parse_sql('select 1 from b.a'))
[{'RawStmt': {'stmt': {'SelectStmt': {'fromClause': [{'RangeVar': {'inh': True,
'location': 14,
'relname': 'a',
'relpersistence': 'p',
'schemaname': 'b'}}],
'op': 0,
'targetList': [{'ResTarget': {'location': 7,
'val': {'A_Const': {'location': 7,
'val': {'Integer': {'ival': 1}}}}}}]}}}}]
>>> pprint(pglast.parse_sql('select 1 from c.b.a'))
[{'RawStmt': {'stmt': {'SelectStmt': {'fromClause': [{'RangeVar': {'inh': True,
'location': 14,
'relname': 'a',
'relpersistence': 'p',
'schemaname': 'b'}}],
'op': 0,
'targetList': [{'ResTarget': {'location': 7,
'val': {'A_Const': {'location': 7,
'val': {'Integer': {'ival': 1}}}}}}]}}}}]
The behaviour I expected in the second case is either an error, or an additional field ['RangeVar']['catalogname'] = 'c'.
Hi, Thank you for being so quick to respond with my other issue, #55!
I have observed a strange behaviour that, as a consumer of pglast parse trees, is making my work a bit more difficult.
In particular, when the rexpr field of the A_Expr constructor is present, the raw data structure returned by parse_sql for that field is sometimes a list and sometimes a dict. It is never a list of a single element, though.
The field therefore has a cardinality that is neither purely optional nor multiple. If the cardinality were strictly multiple, then, in the non-missing case, parse_sql would always return a list representing one or more field entries. If the cardinality were strictly optional, then, in the non-missing case, parse_sql would always return a dict representing a single field entry. All other fields that can be missing behave in one of these two ways; only rexpr breaks this. That makes it more difficult to deal with this field.
I am not sure if consistency concerns of this kind are in the scope of pglast because the cause may well be within postgres's parser, https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y, I haven't checked though.
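Until the cardinality is made consistent upstream, one defensive workaround on the consumer side is a small normalizer that maps all three shapes (missing, single dict, list of dicts) onto a list. A sketch:

```python
def as_list(value):
    """Normalize a parse-tree field that may be missing (None),
    a single dict entry, or a list of dict entries, onto a list."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    return list(value)

# The three shapes rexpr has been observed to take:
print(as_list(None))                          # []
print(as_list({'A_Const': {}}))               # [{'A_Const': {}}]
print(as_list([{'A_Const': {}}, {'A_Const': {}}]))
```

Running every ambiguous field through such a helper keeps the downstream traversal code uniform.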
Take for example the query:
SELECT * FROM manufacturers
ORDER BY EXTRACT('year' FROM deliver_date) ASC,
EXTRACT('month' FROM deliver_date) ASC,
EXTRACT('day' FROM deliver_date) ASC
It is currently rendered as:
SELECT *
FROM manufacturers
ORDER BY pg_catalog.date_part('year'
, deliver_date) ASC
, pg_catalog.date_part('month'
, deliver_date) ASC
, pg_catalog.date_part('day'
, deliver_date) ASC
because that's how the EXTRACT function is remapped by the parser.
It would be nice to optionally have a way to recognize those functions and map them back to the more verbose form.
I am trying to use the pglast CLI to format SQL files, but I can't find all the available options. All I can seem to find are comma_at_eoln and compact.
Is there a list somewhere I can look at?
Windows 10, Python 3.6.
Full output is:
C:\Python36-32\Scripts>python -m pip install pg_query
Collecting pg_query
Using cached pg_query-0.27.tar.gz
Ignoring aenum: markers 'python_version < "3.6"' don't match your environment
Requirement already satisfied: setuptools in c:\python36-32\lib\site-packages (from pg_query)
Installing collected packages: pg-query
Running setup.py install for pg-query ... error
Complete output from command C:\Python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\PAVLO~1.GOL\\AppData\\Local\\Temp\\pip-build-8ph4yow2\\pg-query\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-m36oza86-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win32-3.6
creating build\lib.win32-3.6\pg_query
copying pg_query\error.py -> build\lib.win32-3.6\pg_query
copying pg_query\keywords.py -> build\lib.win32-3.6\pg_query
copying pg_query\node.py -> build\lib.win32-3.6\pg_query
copying pg_query\printer.py -> build\lib.win32-3.6\pg_query
copying pg_query\__init__.py -> build\lib.win32-3.6\pg_query
copying pg_query\__main__.py -> build\lib.win32-3.6\pg_query
creating build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\lockoptions.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\nodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\parsenodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\pg_class.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\primnodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\__init__.py -> build\lib.win32-3.6\pg_query\enums
creating build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\ddl.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\dml.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\sfuncs.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\__init__.py -> build\lib.win32-3.6\pg_query\printers
running build_ext
error: [WinError 2] The system cannot find the file specified
----------------------------------------
Command "C:\Python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\PAVLO~1.GOL\\AppData\\Local\\Temp\\pip-build-8ph4yow2\\pg-query\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-m36oza86-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-build-8ph4yow2\pg-query\
Hi,
CTE behaviour changed in PostgreSQL 12, and with that change two new keywords were added for CTE statements: MATERIALIZED and NOT MATERIALIZED.
Examples:
WITH k AS MATERIALIZED (select * from m) select * from k;
WITH k AS NOT MATERIALIZED (select * from m) select from k;
It seems the v2 branch does not support this new syntax.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pglast/parser.pyx", line 75, in pglast.parser.parse_sql
pglast.parser.ParseError: syntax error at or near "MATERIALIZED", at location 11
Can you help with this issue?
From master
pgpp <<EOF
SELECT 1;
SELECT 2;
EOF
output:
SELECT 1;
SELECT 2
I'd like to keep the final semicolon if possible.
Very cool project.
I've been looking at pglast and the sqlfmt tool (demo). I'm hoping to use pglast in my projects.
I know people have lots of different style preferences, and each auto-formatter must decide on configurability (like yapf) vs consistency (like black).
I'm still developing my taste in SQL formatting, but to my own eye, a couple features of sqlfmt's style can make code easier to read.
Keywords like SELECT and FROM appear on their own line, with their content on the next line, indented:
SELECT
count(*) AS count,
winner,
counter * 60 * 5 AS counter
FROM
(
SELECT
winner,
round(length / (60 * 5)) AS counter
FROM
players
WHERE
build = $1
AND ((hero = $2 OR region = $3))
)
AS s
GROUP BY
winner, counter;
Compare with pgpp 1.2:
SELECT count(*) AS count,
winner,
(counter * 60) * 5 AS counter
FROM (SELECT winner,
round(length / (60 * 5)) AS counter
FROM players
WHERE (build = $1)
AND (( (hero = $2)
OR (region = $3)))) AS s
GROUP BY winner, counter
I'd like to configure my pgpp to add newlines and indentation. How would you feel about adding such options, or adding a cookbook with a few examples on how to make these adjustments?
Thanks very much for sharing the project!
I use stored functions quite a bit and would like to be able to format them.
Currently this results in:
NotImplementedError: Printer for node 'CreateFunctionStmt' is not implemented yet
How much work would be needed to support this? Happy to look into fixing this up if you're willing to advise me on the best approach.
I've run into a weird parsing error that seems to happen on pglast.split.
Given the following code, an exception is raised on L9 in both 1.5 and in master:
import pglast
SQL = 'CREATE CONSTRAINT TRIGGER monitor_account_status AFTER UPDATE ON accounts DEFERRABLE INITIALLY DEFERRED FOR EACH ROW WHEN ((((new.status_id = ANY (ARRAY[4, 5, 6])) <> (old.status_id = ANY (ARRAY[4, 5, 6]))) OR (new.status_id = 7))) EXECUTE PROCEDURE log_account_status_change();'
result = pglast.parse_sql(SQL)
print(result)
for statement in pglast.split(SQL):
print(statement)
result = pglast.parse_sql(statement)
print(result)
The output:
[{'RawStmt': {'stmt': {'CreateTrigStmt': {'trigname': 'monitor_account_status', 'relation': {'RangeVar': {'relname': 'accounts', 'inh': True, 'relpersistence': 'p', 'location': 65}}, 'funcname': [{'String': {'str': 'log_account_status_change'}}], 'row': True, 'events': 16, 'whenClause': {'BoolExpr': {'boolop': 1, 'args': [{'A_Expr': {'kind': 0, 'name': [{'String': {'str': '<>'}}], 'lexpr': {'A_Expr': {'kind': 1, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'new'}}, {'String': {'str': 'status_id'}}], 'location': 126}}, 'rexpr': {'A_ArrayExpr': {'elements': [{'A_Const': {'val': {'Integer': {'ival': 4}}, 'location': 153}}, {'A_Const': {'val': {'Integer': {'ival': 5}}, 'location': 156}}, {'A_Const': {'val': {'Integer': {'ival': 6}}, 'location': 159}}], 'location': 147}}, 'location': 140}}, 'rexpr': {'A_Expr': {'kind': 1, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'old'}}, {'String': {'str': 'status_id'}}], 'location': 168}}, 'rexpr': {'A_ArrayExpr': {'elements': [{'A_Const': {'val': {'Integer': {'ival': 4}}, 'location': 195}}, {'A_Const': {'val': {'Integer': {'ival': 5}}, 'location': 198}}, {'A_Const': {'val': {'Integer': {'ival': 6}}, 'location': 201}}], 'location': 189}}, 'location': 182}}, 'location': 164}}, {'A_Expr': {'kind': 0, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'new'}}, {'String': {'str': 'status_id'}}], 'location': 211}}, 'rexpr': {'A_Const': {'val': {'Integer': {'ival': 7}}, 'location': 227}}, 'location': 225}}], 'location': 207}}, 'isconstraint': True, 'deferrable': True, 'initdeferred': True}}, 'stmt_len': 277}}]
And the traceback:
Traceback (most recent call last):
File "/Users/gavinr/Source/Applications/pglifecycle/example.py", line 9, in <module>
for statement in pglast.split(SQL):
File "/Users/gavinr/Source/Libraries/pglast/pglast/__init__.py", line 95, in split
printed_pt = parse_sql(printed)[0]
File "pglast/parser.pyx", line 76, in pglast.parser.parse_sql
raise ParseError(message, cursorpos)
Any thoughts on what it could be? I'm happy to try and track it down if you can point me in the right direction.
I frequently use psql variables in my scripts.
It would be awesome if pglast could support them.
$ echo "ALTER FUNCTION myschema.myfunction OWNER TO :owner;" | pgpp
syntax error at or near ":", at location 45
To my eye, it's a little nicer for long statements to have two lines between them, not just one. It helps visually separate them from each other. What do you think?
Hi Team,
We are planning to send postgres logs to ELK stack.
everything works fine, but there is a concern where we log queries which are too long or problematic.
https://gist.github.com/cabecada/dd765a30f6946fdbf0bec0eb31fba047
Now the concern is that queries may contain sensitive data with respect to GDPR etc., and that creates problems when logging to a centralized logging system.
I found this extension awesome for normalizing queries; it works best when inserts/updates are not parameterized/prepared.
But, to my bad luck, when I thought of trying to use this in a Logstash plugin, it turned out that JRuby (Logstash's JRuby) does not compile C extensions.
Now the main part:
can you help me understand whether it is possible to hook this extension into the PostgreSQL logging facility and anonymize the params in the logged queries?
Just asking, feel free to ignore. :)
I would like to understand the output of the following lines of code:
for node in root.traverse(): print(node)
I understand it gives me an AST tree, but how can I interpret it to get meaningful information out, like column names, the inner join condition, the inner join table name, the where clause, etc.?
For example: When I run
sql2 = 'select * from mj inner join jimmy on jimmy.eyes = mj.eyes'
root = Node(parse_sql(sql2))
I get the following output
None[0]={RawStmt}
  stmt={SelectStmt}
    fromClause[0]={JoinExpr}
      jointype=<0>
      larg={RangeVar}
        inh=<True>
        location=<14>
        relname=<'mj'>
        relpersistence=<'p'>
      quals={A_Expr}
        kind=<0>
        lexpr={ColumnRef}
          fields[0]={String}
            str=<'jimmy'>
          fields[1]={String}
            str=<'eyes'>
          location=<37>
        location=<48>
        name[0]={String}
          str=<'='>
        rexpr={ColumnRef}
          fields[0]={String}
            str=<'mj'>
          fields[1]={String}
            str=<'eyes'>
          location=<50>
      rarg={RangeVar}
        inh=<True>
        location=<28>
        relname=<'jimmy'>
        relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
      location=<7>
      val={ColumnRef}
        fields[0]={A_Star}
        location=<7>
Can someone help me understand this? Thanks
-- hello
select 1
Running pgpp master on the file above returns
SELECT 1
I would prefer that it retain the comments :-)
Here is a minimal example to reproduce (with pglast
version 1.1).
from pglast import prettify
sql = "SELECT * FROM tbl WHERE foo IS TRUE"
prettify(sql)
Note that if we replace WHERE foo IS TRUE
with WHERE foo = TRUE
the error goes away.
Traceback:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
38 try:
---> 39 return NODE_PRINTERS[(parent_node_tag, node_tag)]
40 except KeyError:
KeyError: ('SelectStmt', 'BooleanTest')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
41 try:
---> 42 return NODE_PRINTERS[node_tag]
43 except KeyError:
KeyError: 'BooleanTest'
During handling of the above exception, another exception occurred:
NotImplementedError Traceback (most recent call last)
<ipython-input-2-2b516e62760f> in <module>
2
3 sql = "SELECT * FROM tbl WHERE foo IS TRUE"
----> 4 prettify(sql)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/__init__.py in prettify(statement, safety_belt, **options)
37
38 orig_pt = parse_sql(statement)
---> 39 prettified = IndentedStream(**options)(Node(orig_pt))
40 if safety_belt:
41 try:
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in __call__(self, sql, plpgsql)
225 if self.separate_statements:
226 self.newline()
--> 227 self.print_node(statement)
228 return self.getvalue()
229
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
351 else:
--> 352 printer(node, self)
353 self.separator()
354
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printers/dml.py in raw_stmt(node, output)
613 @node_printer('RawStmt')
614 def raw_stmt(node, output):
--> 615 output.print_node(node.stmt)
616
617
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
351 else:
--> 352 printer(node, self)
353 self.separator()
354
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printers/dml.py in select_stmt(node, output)
689 output.newline()
690 output.write('WHERE ')
--> 691 output.print_node(node.whereClause)
692 if node.groupClause:
693 output.newline()
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
346 else:
347 parent_node_tag = node.parent_node and node.parent_node.node_tag
--> 348 printer = get_printer_for_node_tag(parent_node_tag, node.node_tag)
349 if is_name and node.node_tag == 'String':
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
43 except KeyError:
44 raise NotImplementedError("Printer for node %r is not implemented yet"
---> 45 % node_tag)
46
47
NotImplementedError: Printer for node 'BooleanTest' is not implemented yet
I'm unboxing the parsed statements after parse_sql()
to have fewer layers (of still-unknown APIs), and I'm trying to print a CreateStmt
reboxed just into a pglast.node.Node
instance. This is the relevant part of the traceback:
File "/home/azazel/wip/my/my-network/my-network-db/dump.py", line 69, in format_stmt
return is_(pglast.Node(stmt))
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printer.py", line 258, in __call__
self.print_node(statement)
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printer.py", line 429, in print_node
printer(node, self)
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printers/ddl.py", line 1955, in create_stmt
if node.parent_node.node_tag == 'CreateForeignTableStmt':
AttributeError: 'NoneType' object has no attribute 'node_tag'
The fix is trivial, but maybe for some reason you would prefer the submitted
statement to always be wrapped at least into a pglast.ast.RawStmt, and to enforce
that? I don't know.
In the meantime I'll rewrap it before passing it to the pglast.printer.IndentedStream
, first into such RawStmt
and then into a Node
.
The following statement
select a.one,
not a.bool_flag and a.something is null as foobar,
a.value1 + b.value2 * b.value3 as ciao
from sometable as a
gets reformatted as
SELECT a.one
, (NOT a.bool_flag)
AND a.something IS NULL AS foobar
, a.value1 + (b.value2 * b.value3) AS ciao
FROM sometable AS a
whereas it should be:
SELECT a.one
, (NOT a.bool_flag)
AND a.something IS NULL AS foobar
, a.value1 + (b.value2 * b.value3) AS ciao
FROM sometable AS a
It appears to try to parse the data as SQL. Is this expected behaviour, or a bug?
Hi there!
I am using pglast version 3.3
Please give me a hint on how I can get all the columns included in all SelectStmt objects, together with the expression (formula) they may be computed from.
Just example.
SELECT t1.name AS name ,
t1.name||' '||t2.name AS new_name
FROM t1 inner join t2 on t1.id = t2.par_id;
and I need to get the following columns:
name: t1.name
new_name: t1.name||' '||t2.name
thanks!
Would you be interested in my attempting to add type annotations?
I'm picturing writing type annotations compatible with python 3.6+ for the python code, some type-generation step for the cython code, and adding a mypy
check for the test suite.
I'm looking for some clarification on the licensing choice. I was looking forward to using this library until I realized that the GPLv3 license will most likely be an issue in the future.
Looking at your main dependency, libpg_query, which powers your Python wrapper, why did you decide to deviate from its BSD-3-Clause licensing?
In addition, all of the derivative works from libpg_query, including Ruby pg_query, Go pg_query_go, JS Node pg-query-parser, JS Browser pg-query-emscripten, and Python psqlparse, have opted to follow in the spirit of libpg_query by licensing their work under BSD-3-Clause.
Are you willing to follow suit with the other libraries? Any insight would be much appreciated.
README says
Currently this covers most DML statements such as SELECTs, INSERTs, DELETEs and UPDATEs, fulfilling my needs, but I'd like to extend it to handle also DDL statements and, why not, PLpgSQL instructions too.
It actually seems like most DDL is covered.
hi.
we have an issue using pglast in a multithreaded environment: when parsing SQL with wrong syntax, the process exits with error code 2.
Would you please help take a look?
Thanks
hong
the simple code to reproduce:
from pglast import parse_sql
import time
import threading
cmd = 'SELECT * FROM (SELECT DISTINCT ON (Item) "Item", "Location" FROM "UN"."t10" ORDER BY "Item", "Location" DESC) ORDER BY "Location" DESC;'
def parser():
    try:
        try:
            parse_sql(cmd)
        except Exception as e:
            print(e)
            raise
    except:
        time.sleep(1)
        pass

for i in range(2):
    x = threading.Thread(target=parser)
    time.sleep(1)
    x.start()

while 1:
    pass
Minor issue, but the pglast
module does not have a __version__
attribute:
>>> import pglast
>>> print(pglast.__version__)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-8eeeaab38785> in <module>
1 import pglast
----> 2 print(pglast.__version__)
AttributeError: module 'pglast' has no attribute '__version__'
It would be helpful to add this in order to be able to check the version from within Python.
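In the meantime, the installed version can be read from the package metadata with a small stdlib-only helper (Python 3.8+):

```python
# Read a package's installed version from its distribution metadata,
# without importing the package itself.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version of *package*, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version('pglast'))
```

This works for any installed distribution, so it is a reasonable stopgap until the module grows a `__version__` attribute of its own.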
I know this is somewhat of a duplicate of issue #9, but I'd like to ask for a clarification.
My employer would like to use this library in our CI test suite merely as a linter for our ETL's SQL scripts; it's not even installed in production. Neither our ETL nor the test suite are publicly available. Would this be an acceptable usage?
The visitor pattern is a common way to process SQL parse trees. Very simply, there is a visitor with the following pseudo-code:
class Visitor:
    def visit(self, obj):
        if obj is None:
            return None
        if isinstance(obj, AcceptingNode):
            self.visit_node(obj)
        elif isinstance(obj, AcceptingList):
            self.visit_list(obj)
        elif isinstance(obj, AcceptingScalar):
            self.visit_scalar(obj)

    def visit_node(self, obj):
        # Visit all the child nodes
        ...
The Base
class in node.py
has to have an accept function:
def accept(self, visitor):
    visitor.visit(self)
Then other apps can build visitors to process the parse tree. I've got visitors to work in my project based on pglast:
node.py, where I've added the accept
function to the Base
class.
Visitor Base Class that handles navigation among children node.
Table Visitor that collects all the table names in the SQL query.
I can contribute the accept
function and Visitor
base class if you think it will be useful to other developers of your project. I can support the Visitor pattern as well.
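To make the proposal concrete, here is a minimal self-contained sketch of the accept/visitor protocol; the class names mirror the pseudo-code above and are not pglast's actual API:

```python
# Minimal sketch of the proposed accept/visitor protocol; Node,
# Visitor and TableVisitor are illustrative names, not pglast API.
class Node:
    def __init__(self, tag, **children):
        self.tag = tag
        self.children = children

    def accept(self, visitor):
        visitor.visit(self)

class Visitor:
    def visit(self, obj):
        if obj is None:
            return
        if isinstance(obj, Node):
            self.visit_node(obj)
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                self.visit(item)

    def visit_node(self, node):
        # default behaviour: descend into all the children
        for child in node.children.values():
            self.visit(child)

class TableVisitor(Visitor):
    """Collect the names of all RangeVar (table reference) nodes."""
    def __init__(self):
        self.tables = []

    def visit_node(self, node):
        if node.tag == 'RangeVar':
            self.tables.append(node.children['relname'])
        super().visit_node(node)

# a toy tree mimicking SELECT ... FROM mj JOIN jimmy
tree = Node('SelectStmt', fromClause=[
    Node('JoinExpr',
         larg=Node('RangeVar', relname='mj'),
         rarg=Node('RangeVar', relname='jimmy'))])

visitor = TableVisitor()
tree.accept(visitor)
print(visitor.tables)  # → ['mj', 'jimmy']
```

The accept method keeps the traversal logic in the visitor, so applications can add new analyses (table collection, column lineage, etc.) without touching the node classes.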