PostgreSQL Languages AST and statements prettifier: master branch covers PG10, v2 branch covers PG12, v3 covers PG13, v4 covers PG14, v5 covers PG15, v6 covers PG16
Hello,
In the process of removing my overrides and falling back to the new release of pglast, I noticed a bug in the CREATE TRIGGER printer: the WHEN condition must always be wrapped in parentheses. For example:
CREATE TRIGGER trig1
AFTER UPDATE
ON schema1.table1
FOR EACH ROW
WHEN (new.col1 IS DISTINCT FROM old.col1)
EXECUTE PROCEDURE schema1.func1()
Is "prettified" as
CREATE TRIGGER trig1
AFTER UPDATE
ON schema1.table1
FOR EACH ROW
WHEN new.col1 IS DISTINCT FROM old.col1
EXECUTE PROCEDURE schema1.func1()
which is not valid syntax.
Good day!
I am interested in implementing the following task.
I have a query, for example:
EXECUTE format($$
SELECT field1, field2, field3
FROM table
%1$s
$$, create_where(whereTree))
I am interested in building whereTree using pglast's functionality.
How can I use the classes from this library to generate this:
WHERE field1 IS NOT NULL
AND field2 = 'value1'
Can I do this with pglast?
My interpretation of PG_VERSION_NUM seems wrong, but there is something fishy in the PG headers as well...
Compiling the v12.1 tarball, pg_config.h contains:
/* PostgreSQL version as a string */
#define PG_VERSION "12.1"
/* PostgreSQL version as a number */
#define PG_VERSION_NUM 120001
So it seems that it isn't computed as MAJOR*1000+MINOR*100+PATCH
as I assumed.
OTOH, in several places in the PG sources there's the expression PG_VERSION_NUM / 100 to get the major version; in particular, pg_upgrade.h defines the macro GET_MAJOR_VERSION(v) as ((v) / 100).
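For reference, the arithmetic can be checked directly. Since PostgreSQL 10 the scheme is MAJOR * 10000 + MINOR (so 12.1 → 120001); pre-10 releases used MAJOR * 10000 + MINOR * 100 + PATCH (so 9.6.3 → 90603), which is why GET_MAJOR_VERSION divides by 100: for 9.x it yields e.g. 906 (meaning "9.6"), and for 12.x it yields 1200.

```python
# How PG_VERSION_NUM is computed, per PostgreSQL's versioning scheme:
# since PG 10 it is MAJOR * 10000 + MINOR; before 10 it was
# MAJOR * 10000 + MINOR * 100 + PATCH.
def version_num(major, minor, patch=None):
    if patch is None:                      # PG >= 10: two-part versions
        return major * 10000 + minor
    return major * 10000 + minor * 100 + patch  # PG < 10

def get_major_version(v):
    # Mirrors pg_upgrade.h's GET_MAJOR_VERSION(v): ((v) / 100).
    return v // 100

print(version_num(12, 1))         # 120001
print(version_num(9, 6, 3))       # 90603
print(get_major_version(120001))  # 1200
print(get_major_version(90603))   # 906  (i.e. "9.6")
```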
...
Consider the following code:
import pglast
sql = "ALTER TABLE T ADD CONSTRAINT C EXCLUDE USING gist (F WITH OPERATOR(intarray.&&))"
print(pglast.prettify(sql))
which produces the following error:
Traceback (most recent call last):
File "format.py", line 3, in <module>
print(pglast.prettify(sql))
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/__init__.py", line 46, in prettify
prettified = IndentedStream(**options)(Node(orig_pt))
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 251, in __call__
self.print_node(statement)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/dml.py", line 701, in raw_stmt
output.print_node(node.stmt)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 252, in alter_table_stmt
output.print_list(node.cmds, ',', standalone_items=True)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 598, in print_list
super().print_list(nodes, sep, relative_indent, standalone_items, are_names, is_symbol)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 437, in print_list
is_symbol=is_symbol)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 401, in _print_items
self.print_node(item, is_name=are_names, is_symbol=is_symbol and idx == last)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 284, in alter_table_cmd
output.print_node(constraint)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/printers/ddl.py", line 488, in constraint
output.write(clauses.string_value)
File "~/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pglast/node.py", line 143, in string_value
raise TypeError('%r does not contain a single String node' % self)
TypeError: [2*{String}] does not contain a single String node
This is almost certainly due to the && operator explicitly specifying the schema, that is, intarray.&&.
Version used is pglast 1.17 on Python 3.5.2.
Is it possible to take a root Node and "stringify" it back into raw SQL?
Hello,
I'm currently using your library to build a tool, and running it on our pg_dumps and migration scripts made me implement or override printers for a large number of statement types. I did not bother handling indentation for now, as that's not my intended use, but I managed to get no errors on our 300k+ lines of SQL code when running prettify with the safety belt on. (There's still a lot to do if we want to handle everything present in the PostgreSQL regression tests, though :) )
I have implemented support for things we use, which may be incomplete, for the following nodes:
AccessPriv
AlterDatabaseSetStmt
AlterDefaultPrivilegesStmt
AlterEnumStmt
AlterFunctionStmt
AlterObjectSchemaStmt
AlterOwnerStmt
AlterRoleStmt
AlterSeqStmt
AlterTableCmd
AlterTableStmt
ClusterStmt
CompositeTypeStmt
CreateEnumStmt
CreateFunctionStmt
CreatePolicyStmt
CreateTrigStmt
DiscardStmt
DoStmt
GrantRoleStmt
GrantStmt
GroupingSet
LockStmt
RenameStmt
TriggerTransition
VacuumStmt
VariableSetStmt
ViewStmt
In addition, I've overridden printers for those nodes (adding support for unsupported options / handling corner cases):
A_ArrayExpr
A_Expr
ColumnDef
Constraint
CreateTableAsStmt
DefineStmt
DropStmt
IndexElem
NullTest
ObjectWithArgs
RowExpr
TypeCast
I obviously want to contribute that back, but I don't really know how to do that. Would you like individual PRs for every node, or a huge one for the whole lot? How should that code be tested?
Thank you in advance for your help regarding this.
In the PostgreSQL test suite, we may encounter statements which are not serialized back to the same form in which they were originally written. However, the constructions involved are semantically identical.
As of now, I identified the following constructions detected as different even though they are strictly the same thing.
On TypeName parsing:
I will update this list as I find other issues.
Hi,
is there a deparse function implemented already?
thanks for the reply.
It would be a killer feature if this could be extended to reformat Python source files in place, similar to what Black does.
libpg_query now supports deparse/unparse, with very good coverage.
pganalyze/libpg_query@aa05c85#diff-a4d461fa40ad82727605dcf20cbb7b43f9a2023fbc1ecae058cddf5ba934899c
It would be nice to have that as an option.
Hi—really been enjoying this library
While on branch v2 (https://github.com/lelit/pglast/tree/v2), I've been noticing some inconsistencies between the dictionaries returned by parse_sql and some of the enums in enums/parsenodes.py.
As an example, the value of the attribute 'subtype' in the AlterTableCmd node is supposed to reference a member of the AlterTableType enum in parsenodes.py. When running parse_sql on a DROP COLUMN command, the value of 'subtype' in the AlterTableCmd node is 10 (which is equal to the value of the enum member named AT_SetStorage) when we were expecting 11 (which references AT_DropColumn).
To illustrate, when running:
alterStmt = 'ALTER TABLE myschema.tablename DROP dropCol'
print(parse_sql(alterStmt))
print("'subtype' returned: " + repr(AlterTableType(10)))
print("expected: " + repr(AlterTableType.AT_DropColumn))
the output is:
[{'RawStmt': {'stmt': {'AlterTableStmt': {'cmds': [{'AlterTableCmd': {'behavior': 0,
'name': 'dropcol',
'subtype': 10}}], <--
'relation': {'RangeVar': {'inh': True,
'location': 12,
'relname': 'tablename',
'relpersistence': 'p',
'schemaname': 'myschema'}},
'relkind': 37}}}}]
'subtype' returned: <AlterTableType.AT_SetStorage: 10>
expected: <AlterTableType.AT_DropColumn: 11>
I've also noticed similar behavior in the ObjectType enum, where the value returned by parse_sql (37) is slightly lower than the value I expected (ObjectType.OBJECT_TABLE, which has a value of 39).
This suggests the parse_sql function is referencing a different version of PostgreSQL than the enums/parsenodes.py file (which seems to match up with the PostgreSQL 12 source code), because it's missing some enum members.
Would you be able to update this? Thanks so much!
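This kind of off-by-N drift is what happens when a newer PostgreSQL release inserts members into the middle of a C enum while the Python mirror lags behind: every subsequent value shifts. A toy illustration with stdlib enums (the member names and layouts here are purely hypothetical, not the real AlterTableType definitions):

```python
from enum import IntEnum

# Hypothetical older layout: values are assigned by position in the C enum.
class AlterTableTypeOld(IntEnum):
    AT_AddColumn = 0
    AT_DropColumn = 1

# A newer release inserting a member in the middle shifts every
# subsequent value, so code interpreting raw integers silently drifts.
class AlterTableTypeNew(IntEnum):
    AT_AddColumn = 0
    AT_SetStorage = 1   # newly inserted member
    AT_DropColumn = 2

raw_subtype = 1  # integer produced against the old layout
print(AlterTableTypeOld(raw_subtype).name)  # AT_DropColumn
print(AlterTableTypeNew(raw_subtype).name)  # AT_SetStorage
```

This is why the parser library and the enum definitions must come from the same PostgreSQL version.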
The proxy has been there since the beginning, because in v1 it dealt with the JSON structure returned by libpg_query's parse_sql() function.
Now that there is a set of concrete ast classes, maybe it would make sense to move the hierarchy tracking data (parent_node and parent_attribute) into the concrete nodes.
Will think about this for v4.
Getting the following error when trying to parse a PL/pgSQL procedure DDL where the SQL is stored in a variable and executed using the EXECUTE command:
Unexpected value for 'details', must be a dict with exactly one key, got <class 'dict'>
If you create an AST with a column like:
ColumnDef(
'column_name',
inhcount=0,
is_from_type=False,
is_local=True,
is_not_null=False,
typeName=TypeName(
names=(String('char'),),
pct_type=False,
setof=False,
typemod=-1,
typmods=(A_Const(10),)
)
)
it will serialize to "char"(20), which is invalid SQL. It should just be char(20).
Please consider the function definition in this code:
from pg_query import Node, parse_plpgsql
s = '''CREATE OR REPLACE FUNCTION "get_ip_location" (IN ip_address text,
OUT country varchar,
OUT isp varchar,
OUT ip_type varchar,
OUT latitude float4,
OUT longitude float4)
RETURNS record AS
$BODY$
declare
ip_number bigint;
begin
select foo.country,foo.isp,foo.ip_type,foo.latitude,foo.longitude
from batch_foo.ip_location foo
where ip_range @> ip_number
into
country,
isp,
ip_type,
latitude,
longitude;
return;
end
$BODY$
LANGUAGE plpgsql;'''
print(s)
root = Node(parse_plpgsql(s))
for node in root.traverse():
print(node)
Output:
pasha@jp0sq32:~/PyEnvironments$ python3 hello.py
...
Traceback (most recent call last):
File "hello.py", line 28, in <module>
root = Node(parse_plpgsql(s))
File "pg_query/parser.pyx", line 98, in pg_query.parser.parse_plpgsql
pg_query.parser.ParseError: "country" is not a known variable, at location 0
I get a "non-cosmetic difference" error when I try to use pgpp on a file with e.g. the following contents:
select * from t;
The same query, and other queries, are fine without the semicolon.
What's happening here? Perhaps pgpp doesn't support multiple statements?
Happy to take a look at fixing this if you'd like to point me in the right direction.
Currently the IndentedStream emits lists of values/nodes as standalone items, that is, one per line:
SELECT a
, b
, c
FROM function(1
, 2
, 3)
It would be nice to set a threshold, a maximum width, and make it emit those items on a single line when the resulting width is below that value, so that, given a threshold of 20, the above would result in the following:
SELECT a, b, c
FROM function(1
, 2
, 3)
How can I extract the tables from a query, like pg_query does?
https://github.com/lfittl/pg_query/blob/master/README.md#extracting-tables-from-a-query
Thanks!
Hi, first of all thank you very much for this library. This code is extremely useful, and I couldn't be more happy to have found it.
I have a fairly large collection of SQL queries that I'd like to work on. And while processing the queries, I have found several examples that trigger the "safety belt":
SELECT DISTINCT T1.player_name FROM Player AS T1 JOIN Player_Attributes AS T2 ON T1.player_api_id = T2.player_api_id WHERE T2.crossing > 90 AND T2.preferred_foot = "right"
becomes
SELECT DISTINCT t1.player_name
FROM player AS t1
INNER JOIN player_attributes AS t2 ON t1.player_api_id = t2.player_api_id
WHERE (t2.crossing > 90)
AND (t2.preferred_foot = right)
Note that right is not in quotation marks.
SELECT DISTINCT T1.player_name FROM Player AS T1 JOIN Player_Attributes AS T2 ON T1.player_api_id = T2.player_api_id WHERE T2.preferred_foot = "left" AND T2.overall_rating >= 85 AND T2.overall_rating <= 90
becomes
SELECT DISTINCT t1.player_name
FROM player AS t1
INNER JOIN player_attributes AS t2 ON t1.player_api_id = t2.player_api_id
WHERE (t2.preferred_foot = left)
AND (t2.overall_rating >= 85)
AND (t2.overall_rating <= 90)
Note that left is not in quotation marks.
SELECT player_api_id FROM Player WHERE height >= 180 AND height <= 190 INTERSECT SELECT player_api_id FROM Player_Attributes WHERE preferred_foot = "left"
becomes
SELECT player_api_id
FROM player
WHERE (height >= 180)
AND (height <= 190)
INTERSECT
SELECT player_api_id
FROM player_attributes
WHERE preferred_foot = left
Same problem with left, and perhaps an issue with INTERSECT. Parentheses around the select statements could help.
(SELECT document_name FROM documents GROUP BY document_type_code ORDER BY count(*) DESC LIMIT 3) INTERSECT SELECT document_name FROM documents GROUP BY document_structure_code ORDER BY count(*) DESC LIMIT 3
becomes
SELECT document_name
FROM documents
GROUP BY document_type_code
ORDER BY count(*) DESC
LIMIT 3
INTERSECT
SELECT document_name
FROM documents
GROUP BY document_structure_code
ORDER BY count(*) DESC
LIMIT 3
Parentheses around the select statements are mandatory here, it would seem.
I'm also wondering what happens if one has nested UNION, INTERSECT, and EXCEPT queries, for instance A INTERSECT (B UNION C) vs (A INTERSECT B) UNION C, where A, B, and C are placeholders for select statements. It seems it's always a good idea to print parentheses when using set operators.
I'm getting an error message here:
BEGIN;
CREATE MATERIALIZED VIEW a AS (
SELECT 1
FROM y
);
CREATE MATERIALIZED VIEW b AS (
SELECT 2
FROM y
);
ROLLBACK;
Traceback (most recent call last):
File "/home/user/.local/venvs/pipx/bin/pgpp", line 10, in <module>
sys.exit(main())
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__main__.py", line 82, in main
workhorse(args)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__main__.py", line 36, in workhorse
comma_at_eoln=args.comma_at_eoln)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/__init__.py", line 38, in prettify
prettified = IndentedStream(**options)(Node(orig_pt))
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 232, in __call__
self.print_node(statement)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 357, in print_node
printer(node, self)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printers/dml.py", line 634, in raw_stmt
output.print_node(node.stmt)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printer.py", line 357, in print_node
printer(node, self)
File "/home/user/.local/venvs/pipx/lib/python3.6/site-packages/pglast/printers/ddl.py", line 600, in create_table_as_stmt
and len(node.query.targetList[0].val.fields) == 1
TypeError: object of type 'Missing' has no len()
pglast 1.2
When prefixing an expression with a minus sign the query is correctly parsed but the A_Expr printer fails. It works if the minus sign has a value to the left of it.
>>> from pglast import printers, Node, parse_sql
>>> from pglast.printer import RawStream
>>>
>>>
>>> RawStream()(Node(parse_sql('SELECT 0-(1+1)')))
'SELECT 0 - ((1 + 1))'
>>>
>>>
>>> RawStream()(Node(parse_sql('SELECT -(1+1)')))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 251, in __call__
self.print_node(statement)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 701, in raw_stmt
output.print_node(node.stmt)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 777, in select_stmt
output.print_list(node.targetList)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 437, in print_list
is_symbol=is_symbol)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 401, in _print_items
self.print_node(item, is_name=are_names, is_symbol=is_symbol and idx == last)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 707, in res_target
output.print_node(node.val)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printer.py", line 380, in print_node
printer(node, self)
File "/Users/aiham/Library/Caches/pypoetry/virtualenvs/aiham-parser-UHj-Q3Ed-py3.7/lib/python3.7/site-packages/pglast/printers/dml.py", line 33, in a_expr
if node.lexpr.node_tag == 'A_Expr':
AttributeError: 'Missing' object has no attribute 'node_tag'
This is a valid expression:
postgres=# SELECT -(1+1);
?column?
----------
-2
(1 row)
The data structure for SELECT -(1+1) is:
[
{
"RawStmt": {
"stmt": {
"SelectStmt": {
"targetList": [
{
"ResTarget": {
"val": {
"A_Expr": {
"kind": 0,
"name": [
{
"String": {
"str": "-"
}
}
],
"rexpr": {
"A_Expr": {
"kind": 0,
"name": [
{
"String": {
"str": "+"
}
}
],
"lexpr": {
"A_Const": {
"val": {
"Integer": {
"ival": 1
}
},
"location": 9
}
},
"rexpr": {
"A_Const": {
"val": {
"Integer": {
"ival": 1
}
},
"location": 11
}
},
"location": 10
}
},
"location": 7
}
},
"location": 7
}
}
],
"op": 0
}
}
}
}
]
This is where it fails:
Line 33 in ad450de
The printer heavily assumes that lexpr always exists, so I'm not sure what the best way to fix it is.
Support for PG 13 is steadily progressing in the underlying libpg_query.
I had an initial look at it, and I foresee a considerable effort in porting current pglast to that level.
The most breaking change is the introduction of a protobuf-based result, much richer than the JSON-based one: IIUC, it would make it easier to implement #60. OTOH, the Python module generated from the definition is a surprising 1.232 MB... I'm still undecided on whether it's the way to go or not.
I will upload my WIP branch in the following days, but most tests are currently failing.
When I try to install this package without using a pre-built wheel (such as on macOS, where there are no wheels on PyPI), I get a build error relating to xxhash headers. If I install from a local folder, it works fine.
pglast version: v1.17
I get identical results when parsing these two queries:
>>> pprint(pglast.parse_sql('select 1 from b.a'))
[{'RawStmt': {'stmt': {'SelectStmt': {'fromClause': [{'RangeVar': {'inh': True,
'location': 14,
'relname': 'a',
'relpersistence': 'p',
'schemaname': 'b'}}],
'op': 0,
'targetList': [{'ResTarget': {'location': 7,
'val': {'A_Const': {'location': 7,
'val': {'Integer': {'ival': 1}}}}}}]}}}}]
>>> pprint(pglast.parse_sql('select 1 from c.b.a'))
[{'RawStmt': {'stmt': {'SelectStmt': {'fromClause': [{'RangeVar': {'inh': True,
'location': 14,
'relname': 'a',
'relpersistence': 'p',
'schemaname': 'b'}}],
'op': 0,
'targetList': [{'ResTarget': {'location': 7,
'val': {'A_Const': {'location': 7,
'val': {'Integer': {'ival': 1}}}}}}]}}}}]
The behaviour I expected in the second case is either an error, or an additional field ['RangeVar']['catalogname'] = 'c'.
Hi, Thank you for being so quick to respond with my other issue, #55!
I have observed a strange behaviour that, as a consumer of pglast parse trees, is making my work a bit more difficult.
In particular, when the rexpr field of the A_Expr constructor is present, the raw data structure returned by parse_sql for that field is sometimes a list and sometimes a dict. It is never a list of a single element, though.
The field therefore has a cardinality that is neither purely optional nor multiple. If the cardinality were strictly multiple, then, in the non-missing case, parse_sql would always return a list representing one or more field entries. If the cardinality were strictly optional, then, in the non-missing case, parse_sql would always return a dict representing a single field entry. All other fields that can be missing behave in one of these two ways; only rexpr breaks this. That makes it more difficult to deal with this field.
I am not sure if consistency concerns of this kind are in the scope of pglast because the cause may well be within postgres's parser, https://github.com/postgres/postgres/blob/master/src/backend/parser/gram.y, I haven't checked though.
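Until the cardinality is made consistent upstream, one defensive workaround on the consumer side is a small normalizer that maps all three shapes (missing, single dict, list of dicts) onto a list. A sketch:

```python
def as_list(value):
    """Normalize a parse-tree field that may be missing (None),
    a single dict entry, or a list of dict entries, onto a list."""
    if value is None:
        return []
    if isinstance(value, dict):
        return [value]
    return list(value)

# The three shapes rexpr has been observed to take:
print(as_list(None))                          # []
print(as_list({'A_Const': {}}))               # [{'A_Const': {}}]
print(as_list([{'A_Const': {}}, {'A_Const': {}}]))
```

Running every ambiguous field through such a helper keeps the downstream traversal code uniform.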
Take for example the query:
SELECT * FROM manufacturers
ORDER BY EXTRACT('year' FROM deliver_date) ASC,
EXTRACT('month' FROM deliver_date) ASC,
EXTRACT('day' FROM deliver_date) ASC
It is currently rendered as:
SELECT *
FROM manufacturers
ORDER BY pg_catalog.date_part('year'
, deliver_date) ASC
, pg_catalog.date_part('month'
, deliver_date) ASC
, pg_catalog.date_part('day'
, deliver_date) ASC
because that's how the EXTRACT function is remapped by the parser.
It would be nice to optionally have a way to recognize those functions and map them back to the more verbose form.
I am trying to use the pglast CLI to format SQL files, but I can't find all the available options. All I can seem to find are comma_at_eoln and compact.
Is there a list somewhere I can look at?
Windows 10, Python 3.6.
Full output is:
C:\Python36-32\Scripts>python -m pip install pg_query
Collecting pg_query
Using cached pg_query-0.27.tar.gz
Ignoring aenum: markers 'python_version < "3.6"' don't match your environment
Requirement already satisfied: setuptools in c:\python36-32\lib\site-packages (from pg_query)
Installing collected packages: pg-query
Running setup.py install for pg-query ... error
Complete output from command C:\Python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\PAVLO~1.GOL\\AppData\\Local\\Temp\\pip-build-8ph4yow2\\pg-query\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-m36oza86-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win32-3.6
creating build\lib.win32-3.6\pg_query
copying pg_query\error.py -> build\lib.win32-3.6\pg_query
copying pg_query\keywords.py -> build\lib.win32-3.6\pg_query
copying pg_query\node.py -> build\lib.win32-3.6\pg_query
copying pg_query\printer.py -> build\lib.win32-3.6\pg_query
copying pg_query\__init__.py -> build\lib.win32-3.6\pg_query
copying pg_query\__main__.py -> build\lib.win32-3.6\pg_query
creating build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\lockoptions.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\nodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\parsenodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\pg_class.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\primnodes.py -> build\lib.win32-3.6\pg_query\enums
copying pg_query\enums\__init__.py -> build\lib.win32-3.6\pg_query\enums
creating build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\ddl.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\dml.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\sfuncs.py -> build\lib.win32-3.6\pg_query\printers
copying pg_query\printers\__init__.py -> build\lib.win32-3.6\pg_query\printers
running build_ext
error: [WinError 2] The system cannot find the file specified
----------------------------------------
Command "C:\Python36-32\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\PAVLO~1.GOL\\AppData\\Local\\Temp\\pip-build-8ph4yow2\\pg-query\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-m36oza86-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\PAVLO~1.GOL\AppData\Local\Temp\pip-build-8ph4yow2\pg-query\
Hi,
CTE behaviour changed in PostgreSQL 12, and with that change two new keywords were added for CTE statements: MATERIALIZED and NOT MATERIALIZED.
Examples:
WITH k AS MATERIALIZED (select * from m) select * from k;
WITH k AS NOT MATERIALIZED (select * from m) select from k;
It seems the v2 branch does not support this new syntax.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pglast/parser.pyx", line 75, in pglast.parser.parse_sql
pglast.parser.ParseError: syntax error at or near "MATERIALIZED", at location 11
Can you help with this issue?
From master
pgpp <<EOF
SELECT 1;
SELECT 2;
EOF
output:
SELECT 1;
SELECT 2
I'd like to keep the final semicolon if possible.
Very cool project.
I've been looking at pglast and the sqlfmt tool (demo). I'm hoping to use pglast in my projects.
I know people have lots of different style preferences, and each auto-formatter must decide on configurability (like yapf) vs consistency (like black).
I'm still developing my taste in SQL formatting, but to my own eye, a couple features of sqlfmt's style can make code easier to read.
Keywords like SELECT and FROM appear on their own line, with their content on the next line, indented:
SELECT
count(*) AS count,
winner,
counter * 60 * 5 AS counter
FROM
(
SELECT
winner,
round(length / (60 * 5)) AS counter
FROM
players
WHERE
build = $1
AND ((hero = $2 OR region = $3))
)
AS s
GROUP BY
winner, counter;
Compare with pgpp 1.2:
SELECT count(*) AS count,
winner,
(counter * 60) * 5 AS counter
FROM (SELECT winner,
round(length / (60 * 5)) AS counter
FROM players
WHERE (build = $1)
AND (( (hero = $2)
OR (region = $3)))) AS s
GROUP BY winner, counter
I'd like to configure my pgpp to add newlines and indentation. How would you feel about adding such options, or adding a cookbook with a few examples on how to make these adjustments?
Thanks very much for sharing the project!
I use stored functions quite a bit and would like to be able to format them.
Currently this results in:
NotImplementedError: Printer for node 'CreateFunctionStmt' is not implemented yet
How much work would be needed to support this? Happy to look into fixing this up if you're willing to advise me on the best approach.
I've run into a weird parsing error that seems to happen on pglast.split.
Given the following code, an exception is raised on L9 in both 1.5 and in master:
import pglast
SQL = 'CREATE CONSTRAINT TRIGGER monitor_account_status AFTER UPDATE ON accounts DEFERRABLE INITIALLY DEFERRED FOR EACH ROW WHEN ((((new.status_id = ANY (ARRAY[4, 5, 6])) <> (old.status_id = ANY (ARRAY[4, 5, 6]))) OR (new.status_id = 7))) EXECUTE PROCEDURE log_account_status_change();'
result = pglast.parse_sql(SQL)
print(result)
for statement in pglast.split(SQL):
print(statement)
result = pglast.parse_sql(statement)
print(result)
The output:
[{'RawStmt': {'stmt': {'CreateTrigStmt': {'trigname': 'monitor_account_status', 'relation': {'RangeVar': {'relname': 'accounts', 'inh': True, 'relpersistence': 'p', 'location': 65}}, 'funcname': [{'String': {'str': 'log_account_status_change'}}], 'row': True, 'events': 16, 'whenClause': {'BoolExpr': {'boolop': 1, 'args': [{'A_Expr': {'kind': 0, 'name': [{'String': {'str': '<>'}}], 'lexpr': {'A_Expr': {'kind': 1, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'new'}}, {'String': {'str': 'status_id'}}], 'location': 126}}, 'rexpr': {'A_ArrayExpr': {'elements': [{'A_Const': {'val': {'Integer': {'ival': 4}}, 'location': 153}}, {'A_Const': {'val': {'Integer': {'ival': 5}}, 'location': 156}}, {'A_Const': {'val': {'Integer': {'ival': 6}}, 'location': 159}}], 'location': 147}}, 'location': 140}}, 'rexpr': {'A_Expr': {'kind': 1, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'old'}}, {'String': {'str': 'status_id'}}], 'location': 168}}, 'rexpr': {'A_ArrayExpr': {'elements': [{'A_Const': {'val': {'Integer': {'ival': 4}}, 'location': 195}}, {'A_Const': {'val': {'Integer': {'ival': 5}}, 'location': 198}}, {'A_Const': {'val': {'Integer': {'ival': 6}}, 'location': 201}}], 'location': 189}}, 'location': 182}}, 'location': 164}}, {'A_Expr': {'kind': 0, 'name': [{'String': {'str': '='}}], 'lexpr': {'ColumnRef': {'fields': [{'String': {'str': 'new'}}, {'String': {'str': 'status_id'}}], 'location': 211}}, 'rexpr': {'A_Const': {'val': {'Integer': {'ival': 7}}, 'location': 227}}, 'location': 225}}], 'location': 207}}, 'isconstraint': True, 'deferrable': True, 'initdeferred': True}}, 'stmt_len': 277}}]
And the traceback:
Traceback (most recent call last):
File "/Users/gavinr/Source/Applications/pglifecycle/example.py", line 9, in <module>
for statement in pglast.split(SQL):
File "/Users/gavinr/Source/Libraries/pglast/pglast/__init__.py", line 95, in split
printed_pt = parse_sql(printed)[0]
File "pglast/parser.pyx", line 76, in pglast.parser.parse_sql
raise ParseError(message, cursorpos)
Any thoughts on what it could be? I'm happy to try and track it down if you can point me in the right direction.
I frequently use psql variables in my scripts.
It would be awesome if pglast could support them.
$ echo "ALTER FUNCTION myschema.myfunction OWNER TO :owner;" | pgpp
syntax error at or near ":", at location 45
To my eye, it's a little nicer for long statements to have two lines between them, not just one. It helps visually separate them from each other. What do you think?
Hi Team,
We are planning to send postgres logs to ELK stack.
everything works fine, but there is a concern where we log queries which are too long or problematic.
https://gist.github.com/cabecada/dd765a30f6946fdbf0bec0eb31fba047
Now the concern is that queries may contain sensitive data with respect to GDPR etc., and that creates problems when logging to a centralized logging system.
I found this extension awesome for normalizing queries; it works best when inserts/updates are not parameterized/prepared.
But, to my bad luck, when I thought of trying to use this in a Logstash plugin, it turned out that JRuby (Logstash's JRuby) does not compile C extensions.
Now the main part:
can you help me understand whether it is possible to hook this extension into the PostgreSQL logging facility and anonymize the params in the logged queries?
Just asking, feel free to ignore. :)
I would like to understand the output of the following lines of code:
for node in root.traverse(): print(node)
I understand it gives me an AST tree, but how can I interpret it to get meaningful information out, like column names, the inner join condition, the inner join table name, the where clause, etc.?
For example: When I run
sql2 = 'select * from mj inner join jimmy on jimmy.eyes = mj.eyes'
root = Node(parse_sql(sql2))
I get the following output
None[0]={RawStmt}
  stmt={SelectStmt}
    fromClause[0]={JoinExpr}
      jointype=<0>
      larg={RangeVar}
        inh=<True>
        location=<14>
        relname=<'mj'>
        relpersistence=<'p'>
      quals={A_Expr}
        kind=<0>
        lexpr={ColumnRef}
          fields[0]={String}
            str=<'jimmy'>
          fields[1]={String}
            str=<'eyes'>
          location=<37>
        location=<48>
        name[0]={String}
          str=<'='>
        rexpr={ColumnRef}
          fields[0]={String}
            str=<'mj'>
          fields[1]={String}
            str=<'eyes'>
          location=<50>
      rarg={RangeVar}
        inh=<True>
        location=<28>
        relname=<'jimmy'>
        relpersistence=<'p'>
    op=<0>
    targetList[0]={ResTarget}
      location=<7>
      val={ColumnRef}
        fields[0]={A_Star}
        location=<7>
Can someone help me understand this? Thanks
-- hello
select 1
Running pgpp master on the file above returns
SELECT 1
I would prefer that it retain the comments :-)
Here is a minimal example to reproduce (with pglast
version 1.1).
from pglast import prettify
sql = "SELECT * FROM tbl WHERE foo IS TRUE"
prettify(sql)
Note that if we replace WHERE foo IS TRUE
with WHERE foo = TRUE
the error goes away.
Traceback:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
38 try:
---> 39 return NODE_PRINTERS[(parent_node_tag, node_tag)]
40 except KeyError:
KeyError: ('SelectStmt', 'BooleanTest')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
41 try:
---> 42 return NODE_PRINTERS[node_tag]
43 except KeyError:
KeyError: 'BooleanTest'
During handling of the above exception, another exception occurred:
NotImplementedError Traceback (most recent call last)
<ipython-input-2-2b516e62760f> in <module>
2
3 sql = "SELECT * FROM tbl WHERE foo IS TRUE"
----> 4 prettify(sql)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/__init__.py in prettify(statement, safety_belt, **options)
37
38 orig_pt = parse_sql(statement)
---> 39 prettified = IndentedStream(**options)(Node(orig_pt))
40 if safety_belt:
41 try:
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in __call__(self, sql, plpgsql)
225 if self.separate_statements:
226 self.newline()
--> 227 self.print_node(statement)
228 return self.getvalue()
229
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
351 else:
--> 352 printer(node, self)
353 self.separator()
354
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printers/dml.py in raw_stmt(node, output)
613 @node_printer('RawStmt')
614 def raw_stmt(node, output):
--> 615 output.print_node(node.stmt)
616
617
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
351 else:
--> 352 printer(node, self)
353 self.separator()
354
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printers/dml.py in select_stmt(node, output)
689 output.newline()
690 output.write('WHERE ')
--> 691 output.print_node(node.whereClause)
692 if node.groupClause:
693 output.newline()
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in print_node(self, node, is_name, is_symbol)
346 else:
347 parent_node_tag = node.parent_node and node.parent_node.node_tag
--> 348 printer = get_printer_for_node_tag(parent_node_tag, node.node_tag)
349 if is_name and node.node_tag == 'String':
350 printer(node, self, is_name=is_name, is_symbol=is_symbol)
~/.local/share/virtualenvs/flowmachine-k9pemXkM/lib/python3.7/site-packages/pglast/printer.py in get_printer_for_node_tag(parent_node_tag, node_tag)
43 except KeyError:
44 raise NotImplementedError("Printer for node %r is not implemented yet"
---> 45 % node_tag)
46
47
NotImplementedError: Printer for node 'BooleanTest' is not implemented yet
I'm unboxing the parsed statements after parse_sql()
to have fewer layers (of still-unknown APIs), and I'm trying to print a CreateStmt
reboxed just into a pglast.node.Node
instance. This is the relevant part of the traceback:
File "/home/azazel/wip/my/my-network/my-network-db/dump.py", line 69, in format_stmt
return is_(pglast.Node(stmt))
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printer.py", line 258, in __call__
self.print_node(statement)
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printer.py", line 429, in print_node
printer(node, self)
File "/home/azazel/wip/my/my-network/my-network-db/.venv/lib/python3.7/site-packages/pglast/printers/ddl.py", line 1955, in create_stmt
if node.parent_node.node_tag == 'CreateForeignTableStmt':
AttributeError: 'NoneType' object has no attribute 'node_tag'
The fix is trivial, but maybe for some reason you would prefer the submitted
statement to always be wrapped at least into a pglast.ast.RawStmt, and to enforce
that? I don't know.
In the meantime I'll rewrap it before passing it to the pglast.printer.IndentedStream
, first into such RawStmt
and then into a Node
.
The following statement
select a.one,
not a.bool_flag and a.something is null as foobar,
a.value1 + b.value2 * b.value3 as ciao
from sometable as a
gets reformatted as
SELECT a.one
, (NOT a.bool_flag)
AND a.something IS NULL AS foobar
, a.value1 + (b.value2 * b.value3) AS ciao
FROM sometable AS a
whereas it should be:
SELECT a.one
, (NOT a.bool_flag)
AND a.something IS NULL AS foobar
, a.value1 + (b.value2 * b.value3) AS ciao
FROM sometable AS a
It appears to try to parse the data as SQL. Is this expected behaviour, or a bug?
Hi there!
I am using pglast version 3.3
Please give me a hint on how I can get all the columns included in all SelectStmt objects, together with the expression (formula) they may be computed from.
Just example.
SELECT t1.name AS name ,
t1.name||' '||t2.name AS new_name
FROM t1 inner join t2 on t1.id = t2.par_id;
and I need to get the following columns:
name: t1.name
new_name: t1.name||' '||t2.name
thanks!
Would you be interested in my attempting to add type annotations?
I'm picturing writing type annotations compatible with python 3.6+ for the python code, some type-generation step for the cython code, and adding a mypy
check for the test suite.
I'm looking for some clarification on the licensing choice. I was looking forward to using this library until I realized that the GPLv3 license will most likely be an issue in the future.
Looking at your main dependency, libpg_query, which powers your Python wrapper, why did you decide to deviate from its BSD-3-Clause licensing?
In addition, all of the derivative works from libpg_query, including Ruby pg_query, Go pg_query_go, JS Node pg-query-parser, JS Browser pg-query-emscripten, and Python psqlparse, have opted to follow in the spirit of libpg_query by licensing their work under BSD-3-Clause.
Are you willing to follow suit with the other libraries? Any insight would be much appreciated.
README says
Currently this covers most DML statements such as SELECTs, INSERTs, DELETEs and UPDATEs, fulfilling my needs, but I'd like to extend it to handle also DDL statements and, why not, PLpgSQL instructions too.
It actually seems like most DDL is covered.
hi.
we have an issue using pglast in a multithreaded environment: when parsing SQL with wrong syntax, the process exits with error code 2.
Would you please help take a look?
Thanks
hong
the simple code to reproduce:
from pglast import parse_sql
import time
import threading
cmd = 'SELECT * FROM (SELECT DISTINCT ON (Item) "Item", "Location" FROM "UN"."t10" ORDER BY "Item", "Location" DESC) ORDER BY "Location" DESC;'
def parser():
    try:
        try:
            parse_sql(cmd)
        except Exception as e:
            print(e)
            raise
    except:
        time.sleep(1)
        pass

for i in range(2):
    x = threading.Thread(target=parser)
    time.sleep(1)
    x.start()

while 1:
    pass
Minor issue, but the pglast
module does not have a __version__
attribute:
>>> import pglast
>>> print(pglast.__version__)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-8eeeaab38785> in <module>
1 import pglast
----> 2 print(pglast.__version__)
AttributeError: module 'pglast' has no attribute '__version__'
It would be helpful to add this in order to be able to check the version from within Python.
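In the meantime, the installed version can be read from the package metadata with a small stdlib-only helper (Python 3.8+):

```python
# Read a package's installed version from its distribution metadata,
# without importing the package itself.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version of *package*, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version('pglast'))
```

This works for any installed distribution, so it is a reasonable stopgap until the module grows a `__version__` attribute of its own.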
I know this is somewhat of a duplicate of issue #9, but I'd like to ask for a clarification.
My employer would like to use this library in our CI test suite merely as a linter for our ETL's SQL scripts; it's not even installed in production. Neither our ETL nor the test suite are publicly available. Would this be an acceptable usage?
The visitor pattern is a common way to process SQL parse trees. Very simply, there is a visitor with the following pseudo-code:
class Visitor:
    def visit(self, obj):
        if obj is None:
            return None
        if isinstance(obj, AcceptingNode):
            self.visit_node(obj)
        elif isinstance(obj, AcceptingList):
            self.visit_list(obj)
        elif isinstance(obj, AcceptingScalar):
            self.visit_scalar(obj)

    def visit_node(self, obj):
        # Visit all the child nodes
        ...
The Base
class in node.py
has to have an accept function:
def accept(self, visitor):
    visitor.visit(self)
Then other apps can build visitors to process the parse tree. I've got visitors to work in my project based on pglast:
node.py, where I've added the accept
function to the Base
class.
Visitor Base Class that handles navigation among children node.
Table Visitor that collects all the table names in the SQL query.
I can contribute the accept
function and Visitor
base class if you think it will be useful to other developers of your project. I can support the Visitor pattern as well.
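To make the proposal concrete, here is a minimal self-contained sketch of the accept/visitor protocol; the class names mirror the pseudo-code above and are not pglast's actual API:

```python
# Minimal sketch of the proposed accept/visitor protocol; Node,
# Visitor and TableVisitor are illustrative names, not pglast API.
class Node:
    def __init__(self, tag, **children):
        self.tag = tag
        self.children = children

    def accept(self, visitor):
        visitor.visit(self)

class Visitor:
    def visit(self, obj):
        if obj is None:
            return
        if isinstance(obj, Node):
            self.visit_node(obj)
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                self.visit(item)

    def visit_node(self, node):
        # default behaviour: descend into all the children
        for child in node.children.values():
            self.visit(child)

class TableVisitor(Visitor):
    """Collect the names of all RangeVar (table reference) nodes."""
    def __init__(self):
        self.tables = []

    def visit_node(self, node):
        if node.tag == 'RangeVar':
            self.tables.append(node.children['relname'])
        super().visit_node(node)

# a toy tree mimicking SELECT ... FROM mj JOIN jimmy
tree = Node('SelectStmt', fromClause=[
    Node('JoinExpr',
         larg=Node('RangeVar', relname='mj'),
         rarg=Node('RangeVar', relname='jimmy'))])

visitor = TableVisitor()
tree.accept(visitor)
print(visitor.tables)  # → ['mj', 'jimmy']
```

The accept method keeps the traversal logic in the visitor, so applications can add new analyses (table collection, column lineage, etc.) without touching the node classes.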