Giter VIP home page Giter VIP logo

mo-sql-parsing's Introduction

More SQL Parsing!

PyPI Latest Release Build Status Downloads

Parse SQL into JSON so we can translate it for other datastores!

See changes

Objective

The objective is to convert SQL queries to JSON-izable parse trees. This originally targeted MySQL, but has grown to include other database engines. Please paste some SQL into a new issue if it does not work for you

Project Status

December 2023 - I continue to resolve issues as they are raised. There are over 1100 tests, that cover most SQL for most databases, with limited DML and UDF support, including:

  • inner queries,
  • with clauses,
  • window functions
  • create/drop/alter tables and views
  • insert/update/delete statements
  • create procedure and function statements (MySQL only)

Install

pip install mo-sql-parsing

Parsing SQL

>>> from mo_sql_parsing import parse
>>> parse("select count(1) from jobs")
{'select': {'value': {'count': 1}}, 'from': 'jobs'}

Each SQL query is parsed to an object: Each clause is assigned to an object property of the same name.

>>> parse("select a as hello, b as world from jobs")
{'select': [{'value': 'a', 'name': 'hello'}, {'value': 'b', 'name': 'world'}], 'from': 'jobs'}

The SELECT clause is an array of objects containing name and value properties.

SQL Flavours

There are a few parsing modes you may be interested in:

Double-quotes for literal strings

MySQL uses both double quotes and single quotes to declare literal strings. This is not ansi behaviour, but it is more forgiving for programmers coming from other languages. A specific parse function is provided:

result = parse_mysql(sql)

SQLServer Identifiers ([])

SQLServer uses square brackets to delimit identifiers. For example

SELECT [Timestamp] FROM [table]

which conflicts with BigQuery array constructor (eg [1, 2, 3, 4]). You may use the SqlServer flavour with

from mo_sql_parsing import parse_sqlserver as parse

NULL is None

The default output for this parser is to emit a null function {"null":{}} wherever NULL is encountered in the SQL. If you would like something different, you can replace nulls with None (or anything else for that matter):

result = parse(sql, null=None)

this has been implemented with a post-parse rewriting of the parse tree.

Normalized function call form

The default behaviour of the parser is to output function calls in simple_op format: The operator being a key in the object; {op: params}. This form can be difficult to work with because the object must be scanned for known operators, or possible optional arguments, or at least distinguished from a query object.

You can have the parser emit function calls in normal_op format

>>> from mo_sql_parsing import parse, normal_op
>>> parse("select trim(' ' from b+c)", calls=normal_op)

which produces calls in a normalized format

{"op": op, "args": args, "kwargs": kwargs}

here is the pretty-printed JSON from the example above:

{'select': {'value': {
    'op': 'trim', 
    'args': [{'op': 'add', 'args': ['b', 'c']}], 
    'kwargs': {'characters': {'literal': ' '}}
}}}

Generating SQL

You may also generate SQL from a given JSON document. This is done by the formatter, which is usually lagging the parser (Dec2023).

>>> from mo_sql_parsing import format
>>> format({"from":"test", "select":["a.b", "c"]})
'SELECT a.b, c FROM test'

Contributing

In the event that the parser is not working for you, you can help make this better but simply pasting your sql (or JSON) into a new issue. Extra points if you describe the problem. Even more points if you submit a PR with a test. If you also submit a fix, then you also have my gratitude.

Run Tests

See the tests directory for instructions running tests, or writing new ones.

More about implementation

SQL queries are translated to JSON objects: Each clause is assigned to an object property of the same name.

# SELECT * FROM dual WHERE a>b ORDER BY a+b
{
    "select": {"all_columns": {}} 
    "from": "dual", 
    "where": {"gt": ["a", "b"]}, 
    "orderby": {"value": {"add": ["a", "b"]}}
}

Expressions are also objects, but with only one property: The name of the operation, and the value holding (an array of) parameters for that operation.

{op: parameters}

and you can see this pattern in the previous example:

{"gt": ["a","b"]}

Array Programming

The mo-sql-parsing.scrub() method is used liberally throughout the code, and it "simplifies" the JSON. You may find this form a bit tedious to work with because the JSON property values can be values, lists of values, or missing. Please consider converting everything to arrays:

def listwrap(value):
    if value is None:
        return []
    elif isinstance(value, list)
        return value
    else:
        return [value]

then you may avoid all the is-it-a-list checks :

for select in listwrap(parsed_result.get('select')):
    do_something(select)

Version Changes, Features

Version 10

December 2023

SELECT * now emits an all_columns call instead of plain star (*).

>>> from mo_sql_parsing import parse
>>> parse("SELECT * FROM table")
{'select': {'all_columns': {}}, 'from': 'table'}

This works better with the except clause, and is more explicit when selecting all child properties.

>>> parse("SELECT a.* EXCEPT b FROM table")
>>> {"select": {"all_columns": "a", "except": "b"}, "from": "table"}

You may get the original behaviour by staying with version 9, or by using all_columns="*":

>>> parse("SELECT * FROM table", all_columns="*")
{'select': "*", 'from': 'table'}

Version 9

November 2022

Output for COUNT(DISTINCT x) has changed from function composition

{"count": {"distinct": x}}

to named parameters

{"count": x, "distinct": true}

This was part of a bug fix issue142 - realizing distinct is just one parameter of many in an aggregate function. Specifically, using the calls=normal_op for clarity:

>>> from mo_sql_parsing import parse, normal_op
>>> parse("select count(distinct x)", calls=normal_op)

{'select': {'value': {
    'op': 'count', 
    'args': [x], 
    'kwargs': {'distinct': True}
}}}

Version 8.200+

September 2022

  • Added ALTER TABLE and COPY command parsing for Snowflake

Version 8

November 2021

  • Prefer BigQuery [] (create array) over SQLServer [] (identity)

  • Added basic DML (INSERT/UPDATE/DELETE)

  • flatter CREATE TABLE structures. The option list in column definition has been flattened:
    Old column format

      {"create table": {
          "columns": {
              "name": "name",
              "type": {"decimal": [2, 3]},
              "option": [
                  "not null",
                  "check": {"lt": [{"length": "name"}, 10]}
              ]
          }
      }}
    

    New column format

      {"create table": {
          "columns": {
              "name": "name", 
              "type": {"decimal": [2, 3]}
              "nullable": False,
              "check": {"lt": [{"length": "name"}, 10]} 
          }
      }}
    

Version 7

October 2021

  • changed error reporting; still terrible
  • upgraded mo-parsing library which forced version change

Version 6

October 2021

  • fixed SELECT DISTINCT parsing
  • added DISTINCT ON parsing

Version 5

August 2021

  • remove inline module mo-parsing
  • support CREATE TABLE, add SQL "flavours" emit {null:{}} for None

Version 4

November 2021

  • changed parse result of SELECT DISTINCT
  • simpler ORDER BY clause in window functions

mo-sql-parsing's People

Contributors

alberto15romero avatar amolk avatar betodealmeida avatar csenki avatar databobek avatar didone avatar diggzhang avatar fireis avatar fy975713384 avatar johnatannvmd avatar julio-vaz avatar khml avatar klahnakoski avatar marqueewinq avatar miketzian avatar mknorps avatar mozilla-github-standards avatar mustafa-travisci avatar nishikeshkardak avatar paullucasleit avatar pydolan avatar rbergm avatar rostykob avatar sam-smo avatar samyxdev avatar sonalisinghal avatar thrbowl avatar tiberiuichim avatar todpole3 avatar ykawakamy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mo-sql-parsing's Issues

mo_parsing.exceptions.ParseException: Expecting {offset} | {StringEnd}, found "INTERSECT " (at char 95), (line:1, col:96)

Hi there,
I've got the following error. Could you advise me about it? Please see below detail. Thanks a lot.

----error message ----
mo_parsing.exceptions.ParseException: Expecting {offset} | {StringEnd}, found "INTERSECT " (at char 95), (line:1, col:96)

----SQL query that is ready to be parsed----
SELECT document_name FROM documents GROUP BY document_type_code ORDER BY count() DESC LIMIT 3 INTERSECT SELECT document_name FROM documents GROUP BY document_structure_code ORDER BY count() DESC LIMIT 3
T

Best,
Zea

Incorrect formatting of the DISTINCT ON construct

>>> import mo_sql_parsing
>>> q = "SELECT DISTINCT ON (col) col, col2 FROM test"
>>> mo_sql_parsing.format(mo_sql_parsing.parse(q))
'SELECT DISTINCT ON(col) AS col, col2 FROM test'

AS is invalid part of the DISTINCT ON construct

Checked on PostgreSQL 13

New failing test cases from a Redshift user (part 3)

Testing the library on our 2800 Redshift SQL queries codebase, I'm down to 1.7857% failing rate. In this issue I collect the corner cases that currently fail.

This is my last batch of test cases. I went over each failing query in my codebase and they should all be covered by the cases below 🙏 .

Relates to :

first_value ignore nulls

special case for first_value and last_value window functions

parse("""
select venuestate, venueseats, venuename,
first_value(venuename ignore nulls)
over(partition by venuestate
order by venueseats desc
rows between unbounded preceding and unbounded following)
from (select * from venue where venuestate='CA')
order by venuestate;
""")

Ref: last example of https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html#r_WF_first_value-examples

nested CTEs

parse("""
with outer_cte as (
    with inner_cte as (
        select * from source
    )
    select date_at :: date from inner_cte
)
select * from outer_cte
""")

similar to

parse("""
select distinct city from users
where city similar to '%E%|%H%' order by city;
""")

Ref: https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-similar-to.html#pattern-matching-conditions-similar-to-examples

mixed union and union all

parse("""
select * from a
union
select * from b
union all
select * from c
""")

nvl combined with sum window function

This works

parse("""
select
    SUM(prev_day_count - line_count) over (order by date_at rows unbounded preceding) as dead_crons
from source
""")

This doesn't work

parse("""
select
    NVL(
        SUM(prev_day_count - line_count) over (order by date_at rows unbounded preceding)
    , 0) as dead_crons
from source
""")

double precision data type

parse("select sum(price::double precision) as revenue from source")

Ref: https://docs.aws.amazon.com/redshift/latest/dg/r_Numeric_types201.html#r_Numeric_types201-floating-point-types

custom udfs

We use a few user defined functions (UDFs) which might take arbitrary names.
For example, f_bigint_to_hhmmss might convert a timestamp to a human readable time.

select f_bigint_to_hhmmss(device_timezone) from t

Given the names are arbitrary, what do you think is the best way to parse those corner case queries? Should I fork the repo and add to the grammar our custom function names? or is there a more generic way to handle functions that can be arbitrarily named?

Ref:

Parse and format is not correct for DISTICT keyword

hi, Primes:
This maybe not correct for the DISTICT parse and format. Could you please help for a look? Thanks.

q = "select DISTICT col_a, col_b from table_test"
pasred_query = parse_mysql(q)
print(f"parsed_query: {pasred_query}")
re_format_query = format(parsed_query)
print(f"re_format_query: {re_format_query}")

parsed_query: {'select': [{'value': 'DISTICT', 'name': 'col_a'}, {'value': 'col_b'}], 'from': 'table_test'}
re_format_query: 'SELECT DISTICT AS col_a, col_b FROM table_test'

Escape dots in aliases

Sources:

select a.x as a.b.c.x from a
select a.x as "a.b.c.x" from a

Both parsed to:

{
    "select": {
        "value": "a.x",
        "name": "a.b.c.x"
    },
    "from": "a"
}

But, on sql side this two statements means different.

I faced this not on parsing, but formatting part. I need to make a.b.c.d alias to some column, but have no clue how to escape dots. I've found \. shadowing in formatter source code, but this leads to \ character present in alias, since . replaced back with \. after escape on combine step.

bug in parenthesis

following #35 and #36
SELECT name FROM stadium WHERE stadium_id NOT IN (SELECT stadium_id FROM concert)
to
SELECT name FROM stadium WHERE stadium_id NOT IN ((SELECT stadium_id FROM concert))

SELECT rid FROM routes WHERE dst_apid IN (SELECT apid FROM airports WHERE country = 'United States')
to
SELECT rid FROM routes WHERE dst_apid IN ((SELECT apid FROM airports WHERE country = 'United States'))

SELECT COUNT(*) FROM (SELECT cName FROM tryout INTERSECT SELECT cName FROM tryout)
to
SELECT COUNT(*) FROM SELECT cName FROM tryout INTERSECT SELECT cName FROM tryout

Support UNNEST without AS

Hello,

we have been using your library for a while to parse our queries and are super happy that it exists. During the last half year (we started with moz-sql-parser, right before you started not updating it anymore and did not realise that), we collected a bunch of test cases that the parser does not understand and we are kinda painfully fixing by adding a preprocessor to our code. I will include the preprocessor's statements here, I think they are self-explanatory. I will open one issue per test case

# UNNEST without AS is not understood.
fail = 'SELECT * FROM UNNEST(ARRAY[foo,bar]) table_name(column_name)'
success = 'SELECT * FROM UNNEST(ARRAY[foo,bar]) AS table_name(column_name)'

I would be happy to help out fixing them, but could probably use one or two pointers where to start 🙂

Our engine is PrestoDB (Athena), but that should be covered by this issue as far as I understood.

parse method breaks with trim function

The parse method breaks when the SQL has something similar to trim(both ' ' from ' This is a test')

The parser breaks because, the from is a keyword. The only way I have worked around is to regex replace this kind of occurrence using the following:
re.sub("trim\(.*?\)",'trim()',sql, flags=re.DOTALL|re.I)

>>> from moz_sql_parser import parse
>>> sql = "select trim(' ' from ' This is a test') from dual"
>>> parse(sql)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/moz_sql_parser/__init__.py", line 23, in parse
    parse_result = SQLParser.parseString(sql, parseAll=True)
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/mo_parsing/core.py", line 72, in output
    return func(self, *args[1:], **kwargs)
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/mo_parsing/core.py", line 275, in parseString
    return self._parseString(string, parseAll=parseAll)
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/mo_parsing/core.py", line 289, in _parseString
    StringEnd()._parse(string, end)
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/mo_parsing/core.py", line 233, in _parse
    result = self.parseImpl(string, index, doActions)
  File "/Users/tej/virtualenvs/lib/python3.8/site-packages/mo_parsing/tokens.py", line 657, in parseImpl
    raise ParseException(self, start, string)
mo_parsing.exceptions.ParseException: Expecting StringEnd, found "(' ' from " (at char 11, (line:1, col:12)
>>> 

INTERSECT is missing from the parser

Hi,
trying to parse the following query:
SELECT Status FROM city WHERE Population > 1500 INTERSECT SELECT Status FROM city WHERE Population < 500

and got:
mo_parsing.exceptions.ParseException: Expecting StringEnd, found "INTERSECT " (at char 50, (line:1, col:51)

Can we add support for INTERSECT statements?
(new here, please let me know if any information is missing)

Support FILTER in Aggregates

Hello,

we have been using your library for a while to parse our queries and are super happy that it exists. During the last half year (we started with moz-sql-parser, right before you started not updating it anymore and did not realise that), we collected a bunch of test cases that the parser does not understand and we are kinda painfully fixing by adding a preprocessor to our code. I will include the preprocessor's statements here, I think they are self-explanatory. I will open one issue per test case

# `FILTER` is not understood.
fail = 'SELECT MAX(1) FILTER (WHERE 1=1) AS foo',
success = """
    SELECT MAX(1)
        /* parser-replace: */
        FILTER (WHERE 1=1)
        /* end:parser-replace */
    AS foo
    """

I would be happy to help out fixing them, but could probably use one or two pointers where to start 🙂

Our engine is PrestoDB (Athena), but that should be covered by this issue as far as I understood.

Can this reliably check for non-select statements?

In my testing, I'm seeing that parse('any statement that is not a select statement') will cause an error. I like this; this is one of the reasons I need to parse SQL, but I don't see anywhere that this is specified as intentional or reliable.

I've got a basic try/except, like this:

try:
    parsed = parse(sql)
except Exception:
    return "use a select statement"

I wasn't able to get past parse when supplying any statement that was not a select statement, but I wanted to make sure I could dependably do this.

Formatter deletes Over clause

from mo_sql_parsing import parse
from mo_sql_parsing import format
format(parse("SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees;"))

The resulting is "SELECT name, dept, RANK() AS rank FROM employees", without "OVER (PARTITION BY dept ORDER BY salary)"

Parse regex into ParserElements for better analysis

The RegEx() ParserElement is opaque. It could be parsed into finer ParserElements so that min_length() and expecting() can be used for further optimization. This should not slow down parsing because most ParserElements are assembled to Python regexps anyway.

The format for cast(col as FLOAT) not work as expected for as

q = "select node,datetime from eutrancellfdd where ((900 - ( cast(pmCellDowntimeAuto as FLOAT) + cast(pmCellDowntimeMan as FLOAT) ) ) / 900) < 0.9 order by datetime limit 100"
parsed_query = parse_mysql(q)
print(f"parsed_query: {parsed_query}")
re_format_query= format(parsed_query)
print(f"re_format_query: {re_format_query}")

parsed_query: {'select': [{'value': 'node'}, {'value': 'datetime'}], 'from': 'eutrancellfdd', 'where': {'lt': [{'div': [{'sub': [900, {'add': [{'cast': ['pmCellDowntimeAuto', {'float': {}}]}, {'cast': ['pmCellDowntimeMan', {'float': {}}]}]}]}, 900]}, 0.9]}, 'orderby': {'value': 'datetime'}, 'limit': 100}

re_format_query: SELECT node, datetime FROM eutrancellfdd WHERE 900 - (CAST(pmCellDowntimeAuto, FLOAT()) + CAST(pmCellDowntimeMan, FLOAT())) / 900 < 0.9 ORDER BY datetime LIMIT 100

Auto-detect SQL flavour

SQLServer identifiers (eg [this is a column name]) conflict with BigQuery's Array constructor (eg [1, 2, 3, 4]). Can a good guess be made about which flavour we are parsing?

Select CASE with NULL is missing from the parser

Coming from the issue and the fix, I think the parser may still need to be fixed.

For example, using the following SQL statement

from moz_sql_parser import parse, format
parse("SELECT a, CASE WHEN some_columns = 'Bob' THEN NULL ELSE 'helloworld' END AS some_columns FROM mytable")

we get this, the first case object is already missing the [then NULL] part

{
    "select": [
        {
            "value": "a"
        },
        {
            "value": {
                "case": [
                    {
                        "when": {
                            "eq": [
                                "some_columns",
                                {
                                    "literal": "Bob"
                                }
                            ]
                        }
                    },
                    {
                        "literal": "helloworld"
                    }
                ]
            },
            "name": "some_columns"
        }
    ],
    "from": "mytable"
}

Also, I think this might help verify if they work together

parse(format(parse("...")))

New failing test cases from a Redshift user (part 2)

Next batch of failures on our 2800 (Redshift) SQL queries codebase after mozilla#149

My programmatic testing script

from glob import glob
from tqdm import tqdm
from moz_sql_parser import parse, ParseException

sql_files = glob("models/**/*.sql", recursive=True)
print(len(sql_files))  # 2800

failures = []
with_cast_date = []
for f in tqdm(sql_files):
    with open(f, "r") as fh:
        query = fh.read()
    try:
        parse(query)
    except ParseException as e:
        failures.append(f)
        if ":: date" in query or "::date" in query:
            with_cast_date.append(f)

print(len(failures) / len(sql_files))  # 0.05714285714285714
print(len(with_cast_date) / len(sql_files))  # 0.04607142857142857

With the next 1️⃣ case, my estimation (using Regex) is that we can get this failure rate down to 1% on my codebase.
The rest of the cases are even more corner cases. I can definitely see the Pareto effect there. So fixing that one case would already be huge for my usage. I myself stop looking further in the long tail of failing queries for now 😁

:: as date

parse("select * from t left join ex on t.date = ex.date_at :: date""")

or

parse("select distinct date_at :: date as date_at from t")

or

parse("""
    select
        datediff('day', u.birth_date :: date, us.date_at :: date) as day_diff
    from
        users as u
    inner join
        user_sessions as us
""")

columns with special name

This one is scary to me.

Some of our columns from raw tables use reserved names, like date, so we rename them, for example to date_at.

parse("select date as date_at from t")

or

parse("""
select count(*) as validation_errors
from t
where date is null
""")

or

parse("""
select count(*) as validation_errors
from t
where timestamp is null
""")

sum window function

parse("""
select
    salesid, 
    dateid, 
    sellerid, 
    qty,
    sum(qty) over (order by dateid, salesid rows unbounded preceding) as sum
from winsales
order by 2,1;
""")

Ref: https://docs.aws.amazon.com/redshift/latest/dg/r_WF_SUM.html#r_WF_SUM-examples

extract

parse("select extract('epoch' from occurred_at) as my_timestamp from t")

Ref: https://docs.aws.amazon.com/redshift/latest/dg/r_EXTRACT_function.html#r_EXTRACT_function-examples

parse and format query are not works as expection

hi all:
This is similar but more issue with #close45, I'm afraid the 45 issue is not fully solved in 7.4.21313 based on our test.
Take below query for example:
q = "SELECT c1, c2 FROM t1 WHERE ((900 - (CAST(c3 AS FLOAT) + CAST(c4 AS FLOAT))) / 900) < 0.9 ORDER BY c2 LIMIT 100"
parsed_query = parse_mysql(q)
print(f"parsed_query={parsed_query}")

formatted_query = format(parsed_query)
print(f"formatted_query={formatted_query}")

new_parsed_query = parse_mysql(formatted_query)
print(f"formatted_query={new_parsed_query}")

Issues:

  1. the result for the parsed_query and formatted_query is as blow. the "()" after c3 AS FLOCAT in formatted_query should not expected.
    parsed_query={'select': [{'value': 'c1'}, {'value': 'c2'}], 'from': 't1', 'where': {'lt': [{'div': [{'sub': [900, {'add': [{'cast': ['c3', {'float': {}}]}, {'cast': ['c4', {'float': {}}]}]}]}, 900]}, 0.9]}, 'orderby': {'value': 'c2'}, 'limit': 100}
    formatted_query=SELECT c1, c2 FROM t1 WHERE 900 - (CAST(c3 AS FLOAT**()) + CAST(c4 AS FLOAT()**)) / 900 < 0.9 ORDER BY c2 LIMIT 100

  2. after invoking parsed_query=parse_mysql(q), formatted_query=format(parsed_query), and new_parsed_query=parse_mysql(formatted_query), the expectation is new_parsed_query==q,
    To be clear, the expected result should match one of the following:
    SELECT c1, c2 FROM t1 WHERE ((900 - (CAST(c3 AS FLOAT) + CAST(c4 AS FLOAT))) / 900) < 0.9 ORDER BY c2 LIMIT 100
    SELECT c1, c2 FROM t1 WHERE (900 - (CAST(c3 AS FLOAT) + CAST(c4 AS FLOAT))) / 900 < 0.9 ORDER BY c2 LIMIT 100

  3. An exception is raised when invoking "new_parsed_query = parse_mysql(formatted_query)"


ParseException Traceback (most recent call last)
~\Anaconda3\lib\site-packages\mo_parsing\expressions.py in parse_impl(self, string, start, do_actions)
504 try:
--> 505 result = e._parse(string, start, do_actions)
506 return ParseResults(

~\Anaconda3\lib\site-packages\mo_parsing\core.py in _parse(self, string, start, do_actions)
450 try:
--> 451 result = self.parse_impl(string, start, do_actions)
452 except Exception as cause:

~\Anaconda3\lib\site-packages\mo_parsing\expressions.py in parse_impl(self, string, start, do_actions)
280 else:
--> 281 raise pe
282

~\Anaconda3\lib\site-packages\mo_parsing\expressions.py in parse_impl(self, string, start, do_actions)
268 try:
--> 269 result = expr._parse(string, index, do_actions)
270 acc.append(result)

~\Anaconda3\lib\site-packages\mo_parsing\core.py in _parse(self, string, start, do_actions)
450 try:
--> 451 result = self.parse_impl(string, start, do_actions)
452 except Exception as cause:

~\Anaconda3\lib\site-packages\mo_parsing\expressions.py in parse_impl(self, string, start, do_actions)
280 else:
--> 281 raise pe
282

~\Anaconda3\lib\site-packages\mo_parsing\expressions.py in parse_impl(self, string, start, do_actions)
268 try:
--> 269 result = expr._parse(string, index, do_actions)
270 acc.append(result)

~\Anaconda3\lib\site-packages\mo_parsing\core.py in _parse(self, string, start, do_actions)
450 try:
--> 451 result = self.parse_impl(string, start, do_actions)
452 except Exception as cause:

~\Anaconda3\lib\site-packages\mo_parsing\tokens.py in parse_impl(self, string, start, do_actions)
234 )
--> 235 raise ParseException(self, start, string)
236

ParseException: Expecting union, found "- (CAST(c3" (at char 32), (line:1, col:33)

The above exception was the direct cause of the following exception:

ParseException Traceback (most recent call last)
ParseException: Expecting {union} | {intersect} | {except} | {minus}, found "- (CAST(c3" (at char 32), (line:1, col:33)

The above exception was the direct cause of the following exception:

ParseException Traceback (most recent call last)
in
7 print(f"formatted_query={formatted_query}")
8
----> 9 new_parsed_query = parse_mysql(formatted_query)
10 # print(f"formatted_query={new_parsed_query}")

~\Anaconda3\lib\site-packages\mo_sql_parsing_init_.py in parse_mysql(sql, null, calls)
50 if not mysql_parser:
51 mysql_parser = sql_parser.mysql_parser()
---> 52 return _parse(mysql_parser, sql, null, calls)
53
54

~\Anaconda3\lib\site-packages\mo_sql_parsing_init_.py in _parse(parser, sql, null, calls)
60 utils.scrub_op = calls
61 sql = sql.rstrip().rstrip(";")
---> 62 parse_result = parser.parse_string(sql, parse_all=True)
63 output = scrub(parse_result)
64 for o, n in utils.null_locations:

~\Anaconda3\lib\site-packages\mo_parsing\core.py in output(*args, **kwargs)
77 Log.error("reset action failed", cause=e)
78
---> 79 return func(*args, **kwargs)
80
81 return output

~\Anaconda3\lib\site-packages\mo_parsing\core.py in parse_string(self, string, parse_all)
133
134 """
--> 135 return self._parseString(string, parse_all=parse_all)
136
137 def _parseString(self, string, parse_all=False):

~\Anaconda3\lib\site-packages\mo_parsing\core.py in _parseString(self, string, parse_all)
153 return tokens.tokens[0]
154 except ParseException as cause:
--> 155 raise cause.best_cause
156
157 @entrypoint

ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order} | {limit} | {offset} | {StringEnd}, found "- (CAST(c3" (at char 32), (line:1, col:33)

Postgres AT TIME ZONE is parsed as a function

The expression below is parsed as a non existent function in postgresql. Im not sure if you want this syntax to be supported or not, let me know!

SELECT 
    id, 
    create_date AT TIME ZONE 'UTC' as created_at, 
    write_date AT TIME ZONE 'UTC' as updated_at
FROM sometable;

Incorrect formatting of Interval

from mo_sql_parsing import parse
from mo_sql_parsing import format
format(parse("select now() + interval 2 week"))
The resulting is "SELECT NOW() + INTERVAL(2, week)"

Support dot access after bracket access

Engine: PrestoDB/Athena Structural Types

fails = "SELECT b['c'].d"
succeeds = "SELECT b['c']/* parser-replace: */.d/* end:parser-replace */"

This happens in our case when we have a map b of structs and want to access property b of the struct. It would be nice if that would also work with Arrays, but I did not test yet if that also fails. Might be a good test case: SELECT b[0].d

Thanks again!

MySQL quoting is unusual and doesn't parse correctly

I believe in ANSI a single quote surrounds strings while double quote is used to 'un-keyword' a token. However, in MySQL a backquote is used to 'un-keyword' while single and double quote both mean literal strings. Is there a way to parse using the MySQL meaning so that the following:

SELECT 'fred'
SELECT "fred"

Both give:

{"select": {"value": {"literal": "fred"}}}

Currently the latter is interpreted as SELECT fred and hence results in:

{"select": {"value": "fred"}}

Parse Fails for Tables with One or More Dashes in the Column Name

from mo_sql_parsing import parse_mysql, parse
q = "select col-cpu-usage from test-information"
parse_mysql(q)
Traceback (most recent call last):
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 144, in _parseString
StringEnd()._parse(string, end)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 437, in _parse
result = self.parseImpl(string, start, doActions)
File "env/lib/python3.8/site-packages/mo_parsing/tokens.py", line 659, in parseImpl
raise ParseException(self, start, string)
mo_parsing.exceptions.ParseException: Expecting StringEnd, found "-informati" (at char 30), (line:1, col:31)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 146, in _parseString
raise ParseException(self.element, 0, string, cause=tokens.failures+[pe])
mo_parsing.exceptions.ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order} | {limit} | {offset} | {StringEnd}, found "-informati" (at char 30), (line:1, col:31)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "env/lib/python3.8/site-packages/mo_sql_parsing/init.py", line 52, in parse_mysql
return _parse(mysql_parser, sql, null, calls)
File "env/lib/python3.8/site-packages/mo_sql_parsing/init.py", line 62, in _parse
parse_result = parser.parseString(sql, parseAll=True)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 79, in output
return func(*args, **kwargs)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 135, in parseString
return self._parseString(string, parseAll=parseAll)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 153, in _parseString
raise cause.best_cause
mo_parsing.exceptions.ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order} | {limit} | {offset} | {StringEnd}, found "-informati" (at char 30), (line:1, col:31)

q = "select 'col-cpu-usage' from test-information"
parse_mysql(q)
Traceback (most recent call last):
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 144, in _parseString
StringEnd()._parse(string, end)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 437, in _parse
result = self.parseImpl(string, start, doActions)
File "env/lib/python3.8/site-packages/mo_parsing/tokens.py", line 659, in parseImpl
raise ParseException(self, start, string)
mo_parsing.exceptions.ParseException: Expecting StringEnd, found "-informati" (at char 32), (line:1, col:33)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 146, in _parseString
raise ParseException(self.element, 0, string, cause=tokens.failures+[pe])
mo_parsing.exceptions.ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order} | {limit} | {offset} | {StringEnd}, found "-informati" (at char 32), (line:1, col:33)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "env/lib/python3.8/site-packages/mo_sql_parsing/init.py", line 52, in parse_mysql
return _parse(mysql_parser, sql, null, calls)
File "env/lib/python3.8/site-packages/mo_sql_parsing/init.py", line 62, in _parse
parse_result = parser.parseString(sql, parseAll=True)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 79, in output
return func(*args, **kwargs)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 135, in parseString
return self._parseString(string, parseAll=parseAll)
File "env/lib/python3.8/site-packages/mo_parsing/core.py", line 153, in _parseString
raise cause.best_cause
mo_parsing.exceptions.ParseException: Expecting {union} | {intersect} | {except} | {minus} | {order} | {limit} | {offset} | {StringEnd}, found "-informati" (at char 32), (line:1, col:33)

Clarify Documentation

The following snippet from the README is strange to me.

you may find it easier if all JSON expressions had a list of operands:

def normalize(expression)
    # ensure parameters are in a list
    return {
        op: params
        for op, param = expression.items()
        for params in [[normalize(p) for p in listwrap(param)]]
    }

The function definition normalize is missing a colon after the parens. The body also does not seem to be valid python syntax. Could you clarify if what is intended?

Thanks

Parsed switch (case) statement has no else

from mo_sql_parsing import parse
parse("select case table0.y1 when 'a' then 1 else 0 end from table0")

Actual output: {'select': {'value': {'case': {'when': {'eq': ['table0.y1', {'literal': 'a'}]}, 'then': 1}}}, 'from': 'table0'}

Expected output: {'select': {'value': {'case': [{'when': {'eq': ['table0.y1', {'literal': 'a'}]}, 'then': 1}, 0]}}, 'from': 'table0'}

Version: mo-sql-parsing-5.43.21240

Not sure about exact output format, but clearly dropping the else clause entirely is incorrect.

Formatter does not work for queries with INTERSECT/EXCEPT/UNION

With the recent addition of INTERSECT of #20 (Thanks a lot for that!), the format operation does not support queries with INTERSECT/EXCEPT/UNION keywords:

import mo_sql_parsing as msp

query = "SELECT stuid FROM student INTERSECT SELECT stuid FROM student"
d = msp.parse(query)
back = msp.format(d)

msp.format(d) returns an empty string

Support Structural Data Types as Cast Target

Engine: PrestoDB/Athena Structural Types

fails = 'SELECT CAST(x AS ROW(y VARCHAR))'
succeeds = """
    SELECT CAST(
        x AS /* parser-replace: VARCHAR */ROW(y VARCHAR)
        /* end:parser-replace */
    )
"""

More information:

  • Nested types lke ARRAY(ROW(y VARCHAR)) should also work - in fact, this is our use case

Thanks again!

Set difference operator causes ParseException

Both minus (Oracle) and except (Postgres) cause a ParseException:

from mo_sql_parsing import parse
sql = """select name from employee
minus
select 'Alan' from dual
"""
try:
    parse(sql)
except Exception as e:
    print(e)

Expecting StringEnd, found "select 'Al" (at char 32, (line:3, col:1)

from mo_sql_parsing import parse
sql = """select name from employee
except
select 'Alan'
"""
try:
    parse(sql)
except Exception as e:
    print(e)

Expecting StringEnd, found "select 'Al" (at char 33, (line:3, col:1)

Formatter deletes parenthesis around subqueries

import mo_sql_parsing as msp
s = "SELECT count(*) FROM (SELECT city FROM airports GROUP BY city HAVING count(*)  >  3)"
d = msp.parse(s)
back = msp.format(d)
print(back)

This code produces SELECT COUNT(*) FROM SELECT city FROM airports GROUP BY city HAVING COUNT(*) > 3 where parenthesis disappear.

Error when trying to parse more then one WITH clause

From snowflake tutorials.

Non-Working Example:

with ssr as
 (select s_store_id,
        sum(sales_price) as sales,
        sum(profit) as profit,
        sum(return_amt) as returns,
        sum(net_loss) as profit_loss
 from
  ( select  ss_store_sk as store_sk,
            ss_sold_date_sk  as date_sk,
            ss_ext_sales_price as sales_price,
            ss_net_profit as profit,
            cast(0 as decimal(7,2)) as return_amt,
            cast(0 as decimal(7,2)) as net_loss
    from store_sales
    union all
    select sr_store_sk as store_sk,
           sr_returned_date_sk as date_sk,
           cast(0 as decimal(7,2)) as sales_price,
           cast(0 as decimal(7,2)) as profit,
           sr_return_amt as return_amt,
           sr_net_loss as net_loss
    from store_returns
   ) salesreturns,
     date_dim,
     store
 where date_sk = d_date_sk
       and d_date between cast('1998-08-18' as date) 
                  and dateadd(day, 14, cast('1998-08-18' as date))
       and store_sk = s_store_sk
 group by s_store_id)
 ,
 csr as
 (select cp_catalog_page_id,
        sum(sales_price) as sales,
        sum(profit) as profit,
        sum(return_amt) as returns,
        sum(net_loss) as profit_loss
 from
  ( select  cs_catalog_page_sk as page_sk,
            cs_sold_date_sk  as date_sk,
            cs_ext_sales_price as sales_price,
            cs_net_profit as profit,
            cast(0 as decimal(7,2)) as return_amt,
            cast(0 as decimal(7,2)) as net_loss
    from catalog_sales
    union all
    select cr_catalog_page_sk as page_sk,
           cr_returned_date_sk as date_sk,
           cast(0 as decimal(7,2)) as sales_price,
           cast(0 as decimal(7,2)) as profit,
           cr_return_amount as return_amt,
           cr_net_loss as net_loss
    from catalog_returns
   ) salesreturns,
     date_dim,
     catalog_page
 where date_sk = d_date_sk
       and d_date between cast('1998-08-18' as date)
                  and dateadd(day, 14, cast('1998-08-18' as date))
       and page_sk = cp_catalog_page_sk
 group by cp_catalog_page_id)
 ,
 wsr as
 (select web_site_id,
        sum(sales_price) as sales,
        sum(profit) as profit,
        sum(return_amt) as returns,
        sum(net_loss) as profit_loss
 from
  ( select  ws_web_site_sk as wsr_web_site_sk,
            ws_sold_date_sk  as date_sk,
            ws_ext_sales_price as sales_price,
            ws_net_profit as profit,
            cast(0 as decimal(7,2)) as return_amt,
            cast(0 as decimal(7,2)) as net_loss
    from web_sales
    union all
    select ws_web_site_sk as wsr_web_site_sk,
           wr_returned_date_sk as date_sk,
           cast(0 as decimal(7,2)) as sales_price,
           cast(0 as decimal(7,2)) as profit,
           wr_return_amt as return_amt,
           wr_net_loss as net_loss
    from web_returns left outer join web_sales on
         ( wr_item_sk = ws_item_sk
           and wr_order_number = ws_order_number)
   ) salesreturns,
     date_dim,
     web_site
 where date_sk = d_date_sk
       and d_date between cast('1998-08-18' as date)
                  and dateadd(day, 14, cast('1998-08-18' as date))
       and wsr_web_site_sk = web_site_sk
 group by web_site_id)
  select  channel
        , id
        , sum(sales) as sales
        , sum(returns) as returns
        , sum(profit) as profit
 from 
 (select 'store channel' as channel
        , 'store' || s_store_id as id
        , sales
        , returns
        , (profit - profit_loss) as profit
 from   ssr
 union all
 select 'catalog channel' as channel
        , 'catalog_page' || cp_catalog_page_id as id
        , sales
        , returns
        , (profit - profit_loss) as profit
 from  csr
 union all
 select 'web channel' as channel
        , 'web_site' || web_site_id as id
        , sales
        , returns
        , (profit - profit_loss) as profit
 from   wsr
 ) x
 group by rollup (channel, id)
 order by channel
         ,id
 limit 100;

Support NULLS FIRST/LAST

Hello,

we have been using your library for a while to parse our queries and are super happy that it exists. During the last half year (we started with moz-sql-parser, right before you started not updating it anymore and did not realise that), we collected a bunch of test cases that the parser does not understand and we are kinda painfully fixing by adding a preprocessor to our code. I will include the preprocessor's statements here, I think they are self-explanatory. I will open one issue per test case

# NULLS FIRST/LAST is not understood.
fail = 'SELECT X() OVER (ORDER BY foo DESC NULLS FIRST)',
success = """
    SELECT X() OVER (
        ORDER BY foo DESC
        /* parser-replace: */NULLS FIRST/* end:parser-replace */
    )
""",

I would be happy to help out fixing them, but could probably use one or two pointers where to start 🙂

Our engine is PrestoDB (Athena), but that should be covered by this issue as far as I understood.

Problems to get LIMIT keyword

Hi there.

Firstly, I just one to point out that this project is awesome. I discovered it recently and I'm already using it in my code.

Issue

The query SELECT * FROM jobs LIMIT 100 is retrivieng a dict without a key representing the LIMIT keyword: {'select': '*', 'from': 'jobs'}.

Code snippet

import moz_sql_parser as msp

print(msp.parse('SELECT * FROM jobs LIMIT 10'))

> {'select': '*', 'from': 'jobs'}

Reproducing the behavior

These were the tested versions:

  • 3.32.20026 (dev latest at this date)
  • 3.125.20293 (latest at Mozilla master branch)

Thanks

COUNT (DISTINCT Y) generate an error

I tried to parse the following line:
parse("SELECT COUNT(DISTINCT Y) FROM A ")
Got this error:
Traceback (most recent call last):
File "", line 1, in
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\moz_sql_parser_init_.py", line 31, in parse
parse_result = SQLParser.parseString(sql, parseAll=True)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\core.py", line 73, in output
return func(*args, **kwargs)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\core.py", line 319, in parseString
raise exc
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\core.py", line 317, in parseString
se._parse(string, loc)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\core.py", line 222, in _parse
loc, tokens = self.parseImpl(string, preloc, doActions)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\expressions.py", line 198, in parseImpl
raise pe
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\expressions.py", line 190, in parseImpl
loc, exprtokens = expr._parse(string, loc, doActions)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\core.py", line 222, in _parse
loc, tokens = self.parseImpl(string, preloc, doActions)
File "D:\Github\sol-etl-process-service\venv\lib\site-packages\mo_parsing\tokens.py", line 1120, in parseImpl
raise ParseException(self, loc, string)
mo_parsing.exceptions.ParseException: Expecting StringEnd, found '(' (at char 12), (line:1, col:13)

Thanks

Identifiers do not allow for character encoding other than 7bit ASCII

Table issue:

from mo_sql_parsing import parse
sql = """select ace from motörhead"""
try:
    parse(sql)
except Exception as e:
    print(e)

Expecting StringEnd, found "\u00f6rhead" (at char 19, (line:1, col:20)

Column issue:

from mo_sql_parsing import parse
sql = """select äce from motorhead"""
try:
    parse(sql)
except Exception as e:
    print(e)
Expecting {{expression1 + [{[as] + alias}]}} | {*}, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
	Expecting value, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
		Expecting *, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
		Expecting {null} | {true} | {false} | {nocase} | {{interval + {{' + params + duration + [{, + params + duration}]... + '}} | {{params + duration}}}} | {{op + {params} | {params}}} | {{extract + ( + {duration} | {Forward: Forward: {value + [{within + over}]}} + from + Forward: Forward: {value + [{within + over}]} + )}} | {{case + case + [{else + Forward: Forward: {value + [{within + over}]}}] + end}} | {{case + Forward: Forward: {value + [{within + over}]} + case + [{else + Forward: Forward: {value + [{within + over}]}}] + end}} | {Group:({cast + ( + Forward: Forward: {value + [{within + over}]} + as + params + )})} | {{distinct + params}} | {{( + Group:(Forward: ordered sql) + )}} | {{( + Group:({Forward: {value + [{within + over}]} + [{, + Forward: {value + [{within + over}]}}]...}) + )}} | {string} | {hex} | {{{float} | {int} + {identifier + ( + params + ignore_nulls + )}}} | {{{float} | {int} + table name}} | {float} | {int} | {{identifier + ( + params + ignore_nulls + )}} | {Group:(array)} | {Group:(bigint)} | {Group:(bool)} | {Group:(boolean)} | {{blob + [{( + int + )}]}} | {{bytes + [{( + int + )}]}} | {{char + [{( + int + )}]}} | {{date + [params]}} | {{datetime + [params]}} | {{decimal + [{( + int + , + int + )}]}} | {{double + precision}} | {Group:(double)} | {Group:(float64)} | {Group:(float)} | {Group:(geometry)} | {Group:(integer)} | {Group:(int)} | {Group:(int32)} | {Group:(int64)} | {{numeric + [{( + int + , + int + )}]}} | {Group:(real)} | {Group:(text)} | {Group:(smallint)} | {Group:(string)} | {Group:(struct)} | {{time + [params]}} | {{timestamp + [params]}} | {{timestamptz + [params]}} | {{timetz + [params]}} | {{varchar + [{( + int + )}]}} | {{varbinary + [{( + int + )}]}} | {{tinyint + [{( + int + )}]}} | {Combine:({table name + [.*]})} | {{( + expression + )}}, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
			expecting one of ["'", "(", "+", "-", ".", "0", "0x", "1", "2", "3", "4", "5", "6", "7", "8", "9", "case", "cast", "date", "datetime", "distinct", "extract", "false", "interval", "nocase", "null", "time", "timestamp", "timestamptz", "timetz", "true"] (), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
			Expecting {Regex:({" + [1]... + "})} | {Regex:({` + [1]... + `})} | {Regex:({[ + [1]... + ]})} | {W:([\$0-9@-Z_a-z]+)}, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({" + [1]... + "}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({` + [1]... + `}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({[ + [1]... + ]}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting W:([\$0-9@-Z_a-z]+), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
			expecting one of ["array", "bigint", "blob", "bool", "boolean", "bytes", "char", "date", "datetime", "decimal", "double", "float", "float64", "geometry", "int", "int32", "int64", "integer", "numeric", "real", "smallint", "string", "struct", "text", "time", "timestamp", "timestamptz", "timetz", "tinyint", "varbinary", "varchar"] (), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
			Expecting {Regex:({" + [1]... + "})} | {Regex:({` + [1]... + `})} | {Regex:({[ + [1]... + ]})} | {W:([\$0-9@-Z_a-z]+)}, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({" + [1]... + "}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({` + [1]... + `}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting Regex:({[ + [1]... + ]}), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
				Expecting W:([\$0-9@-Z_a-z]+), found "\u00e4ce from m" (at char 7, (line:1, col:8)	
			Expecting (, found "\u00e4ce from m" (at char 7, (line:1, col:8)	
	Expecting *, found "\u00e4ce from m" (at char 7, (line:1, col:8)	

UTF-8 is not a problem per se:

from mo_sql_parsing import parse
sql = """select 'äce'"""
try:
    parse(sql)
    print('works')
except Exception as e:
    print(e)

works

Why this bevaiour when UTF-8 is found in identifier?

Support binary Operators after Window Statement

Hello,

we have been using your library for a while to parse our queries and are super happy that it exists. During the last half year (we started with moz-sql-parser, right before you started not updating it anymore and did not realise that), we collected a bunch of test cases that the parser does not understand and we are kinda painfully fixing by adding a preprocessor to our code. I will include the preprocessor's statements here, I think they are self-explanatory. I will open one issue per test case

# NULLS FIRST/LAST is not understood.
fail = 'SELECT X() OVER () = 1 AS is_true'
success = """
SELECT
    X() OVER ()
    /* parser-replace: */= 1/* end:parser-replace */ AS is_true
"""

I would be happy to help out fixing them, but could probably use one or two pointers where to start 🙂

Our engine is PrestoDB (Athena), but that should be covered by this issue as far as I understood.

Support IS DISTINCT FROM

Engine: PrestoDB/Athena IS DISTINCT FROM

fails = 'SELECT 1 IS DISTINCT FROM 2'
succeeds = """
    SELECT 1
    /* parser-replace: */IS DISTINCT FROM 2/* end:parser-replace */
"""

Thanks again!

`WHERE IN` with multiple columns parse → format sequence produce illegal sql

I used mo-sql-parsing from dev branch:

>>> import mo_sql_parsing as sqlp
>>> sqlp.parse("select * from T where (a, b) in (('a', 'b'), ('c', 'd'))")
{'select': '*', 'from': 'T', 'where': {'in': [['a', 'b'], [{'literal': ['a', 'b']}, {'literal': ['c', 'd']}]]}}
>>> sqlp.format(_)
"SELECT * FROM T WHERE ['a', 'b'] IN ('a', 'b'), ('c', 'd')"

So, the square brackets after WHERE are illegal sql syntax.

parsing error when using squared bracket in from

Hello,

I'm looking at using your lib to deduplicate queries in a large inventory of Ms SQL Server based queries made by 100 of users.

While parsing our inventory I've encounter the following error:
ParseException: Expecting table_source, found "[myDB].[my" (at char 49), (line:1, col:50)

You can use this sample query to replicate the issue:
SELECT [Timestamp] ,[RowsCount] ,[DataName] FROM [myDB].[myTable] where [Timestamp] >='2020-01-01' and [Timestamp]<'2020-12-31'

I hope you can include this use case in your lib, in the meantime, I'm doing some clean-up before using your parser.

thanks for your work !

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.