slaclab / lacquer Goto Github PK
View Code? Open in Web Editor NEWSQL Parser derived from Presto, written in Python with the PLY framework
License: Apache License 2.0
SQL Parser derived from Presto, written in Python with the PLY framework
License: Apache License 2.0
I don't think this will be too hard, but I think my initial implementation of these generated a few shift/reduce conflicts.
@brianv0 I was leveraging the logic you provided in the examples/gather_columns.py
script, and agree with your logic for a query of the form,
SELECT foo, b.bar FROM a JOIN b ON a.id = b.id
which results in a table/column resolution of:
SingleColumn(expression=QualifiedNameReference(name=QualifiedName(foo))): [Table(name=QualifiedName(a)), Table(name=QualifiedName(b))]
SingleColumn(expression=QualifiedNameReference(name=QualifiedName(b.bar))): [Table(name=QualifiedName(b))]
i.e., from the query one cannot decipher whether the column foo
is from a or b, however for a UNION
type query
SELECT foo AS bar FROM a UNION SELECT bar FROM b
the table/column resolution is:
SingleColumn(alias='bar', expression=QualifiedNameReference(name=QualifiedName(foo))): [Table(name=QualifiedName(b)), Table(name=QualifiedName(a))]
SingleColumn(expression=QualifiedNameReference(name=QualifiedName(bar))): [Table(name=QualifiedName(b)), Table(name=QualifiedName(a))]
although it's apparent from the query that the foo
column is from table a
and the bar
column is from table b
. Do you know of any resolution to this issue?
It'd be great to be able to successfully say we can parse all TPC-H queries without error.
A list of the ones presto uses (with some additional presto-specific stuff):
https://github.com/prestodb/presto/tree/master/presto-parser/src/test/resources/tpch/queries
SELECT COUNT(*) FROM foo
currently fails as a syntax error
The syntax error message is a bit off and doesn't produce helpful output about the token the error is seen at.
Howdy - I just tried out lacquer and it seems like CAST causes an uninformative error to be thrown
In [44]: parser.parse("SELECT CAST(a AS BIGINT) FROM b")
File "<string>", line 94
SELECT CAST(a AS BIGINT) FROM b
^
SyntaxError: Syntax error
FYI @johnbodley
Hey @brianv0
First of, awesome job on this parser. I came across an issue, easy to work around, but though I would share. When parsing queries with UNION, it parser correctly only if UNION is upper case. See below:
Query 1:
SELECT name,sex FROM zoo_a.animals UNION SELECT name,sex FROM zoo_b.animals
Result 1:
{'query_body': {'distinct': True, 'line': 0, 'pos': 0, 'relations': [QuerySpecification(select=Select(distinct=False, select_items=[SingleColumn(expression=QualifiedNameReference(name=QualifiedName(name))), SingleColumn(expression=QualifiedNameReference(name=QualifiedName(sex)))]), from_=Table(name=QualifiedName(zoo_a.animals)), order_by=[]), QuerySpecification(select=Select(distinct=False, select_items=[SingleColumn(expression=QualifiedNameReference(name=QualifiedName(name))), SingleColumn(expression=QualifiedNameReference(name=QualifiedName(sex)))]), from_=Table(name=QualifiedName(zoo_b.animals)), order_by=[])]}, 'line': 0, 'order_by': [], 'pos': 0}
==================================================
Query 2:
SELECT name,sex FROM zoo_a.animals union SELECT name,sex FROM zoo_b.animals
Result 2:
{'query_body': {'distinct': True, 'line': 0, 'right': None, 'pos': 0, 'left': QuerySpecification(select=Select(distinct=False, select_items=[SingleColumn(expression=QualifiedNameReference(name=QualifiedName(name))), SingleColumn(expression=QualifiedNameReference(name=QualifiedName(sex)))]), from_=Table(name=QualifiedName(zoo_a.animals)), order_by=[])}, 'line': 0, 'order_by': [], 'pos': 0}
Equal: False
Cheers
The Redbaron project has an Syntax Tree query interface:
https://redbaron.readthedocs.io/en/latest/querying.html
Which is partially implemented as such:
https://github.com/PyCQA/redbaron/blob/master/redbaron/base_nodes.py#L804
Conceivably, we should be able to use the Visitor pattern to implement this.
Where this will really potentially shine is basic pattern matching where writing a new visitor is a bit more work than it's worth.
Based on examples/gather_columns.py and other needs, it'd be good to see if we can define a API for answering common questions about queries, like what tables and columns are referenced in a given query.
I'm not sure if there's enough overlap in the questions people want to ask, so this issue is a placeholder to understand them, and a continuation of #13.
Would you be willing to publish this package to pypi so that it'll be easier to install ?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.