
dt-python-parser's People

Contributors

dependabot[bot] · haydenorz · joshi1983 · profbramble · wewoor


dt-python-parser's Issues

Problem when using listener mode

When using listener mode, the parser always expects a line break after each line of Python code.
For instance,

import { Python3Parser, Python3Listener } from 'dt-python-parser';

const parser = new Python3Parser();
const python = 'import sys\nfor i in sys.argv:\n print(i)';
// parseTree
const tree = parser.parse(python);
class MyListener extends Python3Listener {
    enterImport_name(ctx): void {
        let importName = ctx
            .getText()
            .toLowerCase()
            .match(/(?<=import).+/)?.[0];
        console.log('ImportName', importName);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

The above code works, but when I remove the '\n' characters from the Python code it fails (shown below).

import { Python3Parser, Python3Listener } from 'dt-python-parser';

const parser = new Python3Parser();
const python = 'import sys';
// parseTree
const tree = parser.parse(python);
class MyListener extends Python3Listener {
    enterImport_name(ctx): void {
        let importName = ctx
            .getText()
            .toLowerCase()
            .match(/(?<=import).+/)?.[0];
        console.log('ImportName', importName);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

Is there a workaround for this? I want to pass Python code to the parser dynamically, so it may not always end with a line break.
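One workaround sketch (my own helper, not part of dt-python-parser's API) is to normalize the input before parsing by appending a newline when one is missing:

```javascript
// Hypothetical helper: append a trailing newline when the source lacks one,
// so the line-break token the grammar expects is always present.
function ensureTrailingNewline(source) {
    return source.endsWith('\n') ? source : source + '\n';
}

// e.g. const tree = parser.parse(ensureTrailingNewline(python));
```

This only papers over the symptom; whether the grammar should accept EOF in place of a NEWLINE is a separate question.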

How to parse/tokenize only INDENT and DEDENT tokens?

Hello, I'm working on my VS Code extension, blockman, which renders blocks around nested code blocks to make the code easier to perceive visually.

video:
https://youtu.be/2Ajh8WQJvHs

The "dt-python-parser" package works really well, I use "getAllTokens" function to get all the tokens from python text file and then filter only type 93 (INDENT) and type 94 (DEDENT).

I need only the INDENT and DEDENT locations, nothing more, but "getAllTokens" tokenizes everything and therefore loses a lot of time: if the file has more than 1000 lines (say 5000 or so), getAllTokens takes many seconds to return, so rendering the blocks takes longer and the wait is uncomfortable for the user.

So, as an optimization, can I use the parser/tokenizer in such a way that it finds only the INDENT/DEDENT locations and nothing more?
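As a possible alternative (a rough sketch of my own, not using dt-python-parser's API), INDENT/DEDENT positions can be approximated in a single pass over the lines with the same stack-of-indent-levels idea CPython's tokenizer uses. It ignores continuation lines, brackets, and multi-line strings, so it is only an approximation:

```javascript
// Naive indentation scanner: returns { line, type } records, where type is
// 'INDENT' or 'DEDENT'. Lines inside strings or brackets are NOT handled,
// so treat this as an approximation, not a replacement for the lexer.
function scanIndents(source) {
    const events = [];
    const stack = [0]; // indentation levels, like CPython's tokenizer
    source.split('\n').forEach((text, i) => {
        const trimmed = text.trim();
        if (trimmed === '' || trimmed.startsWith('#')) return; // skip blanks/comments
        const indent = text.length - text.trimStart().length;
        if (indent > stack[stack.length - 1]) {
            stack.push(indent);
            events.push({ line: i + 1, type: 'INDENT' });
        } else {
            while (indent < stack[stack.length - 1]) {
                stack.pop();
                events.push({ line: i + 1, type: 'DEDENT' });
            }
        }
    });
    return events;
}
```

For block rendering, an approximation like this may be acceptable because it runs in linear time over the raw text instead of driving the full ANTLR lexer.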

lexer freezes on this Python code

Steps to reproduce:

  1. Try to run lexer("print('\'m')")
  2. Notice that it freezes.

I traced the freeze to an infinite do-while loop in the lexer. In the following code, input is set to the code "print('\'m')", but the current variable climbs into the billions: currentChar is undefined, and validator.test(currentChar) keeps returning false when currentChar is undefined.

/**
 * Filter (extract) the content inside quotation marks
 */
// eslint-disable-next-line
const matchQuotation = (currentChar, validator, TokenType) => {
    do {
        if (currentChar === '\n') {
            line++;
        }
        currentChar = input[++current];
    } while (!validator.test(currentChar));
    ++current;
};

I wonder if this has anything to do with the backslash-escaped quotation mark. Maybe the grammar doesn't define STRING properly for escaped quotes.
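A defensive fix (my own sketch, not a patch that exists in dt-python-parser) would be to bound the loop by the input length, so an unmatched quote ends the scan at EOF instead of running off the end of the string. Here the lexer's closure variables (input, current, line) are packed into a state object to keep the sketch self-contained:

```javascript
// Bounds-checked variant of matchQuotation. `state` stands in for the
// lexer's closure variables; the shape is invented for this illustration.
function matchQuotation(state, validator) {
    let { input, current } = state;
    let currentChar = input[current];
    do {
        if (currentChar === '\n') state.line++;
        currentChar = input[++current];
    } while (current < input.length && !validator.test(currentChar));
    state.current = current + 1; // skip the closing quote, or stop at EOF
}
```

With the `current < input.length` guard, an unterminated string still produces a (possibly wrong) token boundary, but the lexer terminates.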

I edited this issue because I first thought the bug was reproducible by parsing, but it actually requires running the lexer function on the code.

Async / Await Support for the Python parser

Hello,
I use your dt-python-parser as a syntax validator in my web-based Python IDE.
However, I found out that your parser's grammar does not support the async / await keywords.
URL

Do you plan to include them in the ANTLR grammar? I would appreciate it, because my web-based IDE runs the Python code in an event-driven environment.
Thanks for your answer.

auto-complete

Support auto-completion for already defined variables, functions, classes, etc.

Lexer throws RangeError: Invalid string length

There appears to be a bug in the lexer that is reproduced by the following JavaScript:

import { lexer } from 'dt-python-parser';

// This Python code is processed with no problem:
const python = `"""it is for test"""\nvar1 = "Hello World!"\n# comment here\nfor i in range(5):\n    print(i)`;
const commentTokens = lexer(python);
console.log(commentTokens);
/*
    [
      {
        type: 'Comment',
        value: '"""it is for test"""',
        start: 0,
        lineNumber: 1,
        end: 20
      }
    ]
*/

// HERE is where the bug is reproduced:
const commentTokens2 = lexer('# hi');
console.log(commentTokens2); // never reaches this point.

Here is the stack trace I get:
RangeError: Invalid string length
at lexer (C:\Users\josh.greig\Desktop\turtle\python-parser\node_modules\dt-python-parser\dist\utils\index.js:76:26)
at file:///C:/Users/josh.greig/Desktop/turtle/python-parser/comments.mjs:20:24
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)

Oddly enough, I can parse the same code without a problem. The resulting tree doesn't contain the single-line comments, but that is how parse is intended to work.

I'm working around this by appending a '\n' to the end of the Python code before passing it to the lexer. The bug is reproduced by Python code that ends with a '#' comment and no trailing newline character.

Is the problem that this rule in the grammar doesn't match EOF and instead strictly looks for a line break?
fragment COMMENT
    : '#' ~[\r\n]*
    ;
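For what it's worth, the equivalent JavaScript regex for that fragment does match a comment that ends at EOF (since `~[\r\n]*` can simply stop at the end of input), which suggests the problem lies in the lexer's scanning loop rather than in the pattern itself. This is my reading, not a confirmed diagnosis:

```javascript
// JS analogue of the ANTLR fragment COMMENT : '#' ~[\r\n]* ;
const COMMENT = /#[^\r\n]*/;

COMMENT.exec('# hi')[0];   // → '# hi' (matches even with no trailing newline)
COMMENT.exec('# hi\n')[0]; // → '# hi' (stops before the line break)
```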

Python Code Generator and Tree-Rewrite Engine for Python Parser

Hi all,
in my Python IDE I need to emulate synchronous Python code in the browser. The reason is that we cannot teach young students the asynchronous programming paradigm with the async and await keywords. My plan is to parse the code with dt-python-parser, use a grammar-based tree-rewrite engine on the AST to rewrite the synchronous code into asynchronous code, and finally use a Python synthesizer to generate Python code again.

  1. Before I implement the Python code generator and the Python tree-rewrite engine, I want to ask whether anyone is already working on this.

  2. My second question is whether you would want this contribution to become part of dt-python-parser. It would be quite nice to have it all in one library.

Thanks very much for your answer.
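To make the intent concrete, here is a toy illustration of the kind of rewrite I have in mind. The node shapes ({ type, name, body }) are invented for this sketch; a real implementation would walk the ANTLR parse tree produced by dt-python-parser instead:

```javascript
// Toy sync-to-async tree rewrite: function definitions become async and
// call sites get wrapped in an Await node. Purely illustrative node types.
function rewriteSyncToAsync(node) {
    if (node.type === 'FunctionDef') {
        return {
            ...node,
            type: 'AsyncFunctionDef',
            body: node.body.map(rewriteSyncToAsync),
        };
    }
    if (node.type === 'Call') {
        return { type: 'Await', value: node }; // await every call site
    }
    return node; // leave other nodes untouched
}
```

A code generator would then walk the rewritten tree and emit `async def` / `await` in place of the original constructs.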
