
dt-python-parser's People

Contributors

dependabot[bot] · haydenorz · joshi1983 · profbramble · wewoor


dt-python-parser's Issues

Problem when using listener mode

When using listener mode, the parser always expects a line break after each line of Python code.
For instance,

import { Python3Parser, Python3Listener } from 'dt-python-parser';

const parser = new Python3Parser();
const python = 'import sys\nfor i in sys.argv:\n print(i)';
// parseTree
const tree = parser.parse(python);
class MyListener extends Python3Listener {
    enterImport_name(ctx): void {
        let importName = ctx
            .getText()
            .toLowerCase()
            .match(/(?<=import).+/)?.[0];
        console.log('ImportName', importName);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

The above code works, but when I remove the '\n' characters from the Python code it fails (shown below).

import { Python3Parser, Python3Listener } from 'dt-python-parser';

const parser = new Python3Parser();
const python = 'import sys';
// parseTree
const tree = parser.parse(python);
class MyListener extends Python3Listener {
    enterImport_name(ctx): void {
        let importName = ctx
            .getText()
            .toLowerCase()
            .match(/(?<=import).+/)?.[0];
        console.log('ImportName', importName);
    }
}
const listenTableName = new MyListener();
parser.listen(listenTableName, tree);

Is there a workaround for this? I want to pass Python code to the parser dynamically, so it may not always end with a line break.
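One workaround sketch (my own helper, not part of dt-python-parser's API) is to normalize the input before parsing by appending a newline when one is missing:

```javascript
// Hypothetical helper: append a trailing newline when the source lacks one,
// so the line-break token the grammar expects is always present.
function ensureTrailingNewline(source) {
    return source.endsWith('\n') ? source : source + '\n';
}

// e.g. const tree = parser.parse(ensureTrailingNewline(python));
```

This only papers over the symptom; whether the grammar should accept EOF in place of a NEWLINE is a separate question.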

How to parse/tokenize only INDENT and DEDENT tokens?

Hello, I'm working on my VS Code extension, blockman, which renders blocks around nested code blocks to make the code easier to perceive visually.

video:
https://youtu.be/2Ajh8WQJvHs

The "dt-python-parser" package works really well, I use "getAllTokens" function to get all the tokens from python text file and then filter only type 93 (INDENT) and type 94 (DEDENT).

I need only the INDENT and DEDENT locations, nothing more, but "getAllTokens" tokenizes everything and therefore loses a lot of time: if the file has more than 1000 lines (say 5000 or so), getAllTokens takes many seconds to return, so rendering the blocks takes longer and the wait is uncomfortable for the user.

So, as an optimization, can I use the parser/tokenizer in such a way that it finds only the INDENT/DEDENT locations and nothing more?
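As a possible alternative (a rough sketch of my own, not using dt-python-parser's API), INDENT/DEDENT positions can be approximated in a single pass over the lines with the same stack-of-indent-levels idea CPython's tokenizer uses. It ignores continuation lines, brackets, and multi-line strings, so it is only an approximation:

```javascript
// Naive indentation scanner: returns { line, type } records, where type is
// 'INDENT' or 'DEDENT'. Lines inside strings or brackets are NOT handled,
// so treat this as an approximation, not a replacement for the lexer.
function scanIndents(source) {
    const events = [];
    const stack = [0]; // indentation levels, like CPython's tokenizer
    source.split('\n').forEach((text, i) => {
        const trimmed = text.trim();
        if (trimmed === '' || trimmed.startsWith('#')) return; // skip blanks/comments
        const indent = text.length - text.trimStart().length;
        if (indent > stack[stack.length - 1]) {
            stack.push(indent);
            events.push({ line: i + 1, type: 'INDENT' });
        } else {
            while (indent < stack[stack.length - 1]) {
                stack.pop();
                events.push({ line: i + 1, type: 'DEDENT' });
            }
        }
    });
    return events;
}
```

For block rendering, an approximation like this may be acceptable because it runs in linear time over the raw text instead of driving the full ANTLR lexer.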

lexer freezes on this Python code

Steps to reproduce:

  1. Try to run lexer("print('\'m')")
  2. Notice that it freezes.

I traced the freeze to an infinite do-while loop in the lexer. In the following code, input is set to the code "print('\'m')", but the current variable climbs into the billions: currentChar is undefined, and validator.test(currentChar) keeps returning false when currentChar is undefined.

/**
 * Filter (extract) the content inside quotation marks
 */
// eslint-disable-next-line
const matchQuotation = (currentChar, validator, TokenType) => {
    do {
        if (currentChar === '\n') {
            line++;
        }
        currentChar = input[++current];
    } while (!validator.test(currentChar));
    ++current;
};

I wonder if this has anything to do with the backslash-escaped quotation mark. Maybe the grammar doesn't define STRING properly for escaped quotes.
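A defensive fix (my own sketch, not a patch that exists in dt-python-parser) would be to bound the loop by the input length, so an unmatched quote ends the scan at EOF instead of running off the end of the string. Here the lexer's closure variables (input, current, line) are packed into a state object to keep the sketch self-contained:

```javascript
// Bounds-checked variant of matchQuotation. `state` stands in for the
// lexer's closure variables; the shape is invented for this illustration.
function matchQuotation(state, validator) {
    let { input, current } = state;
    let currentChar = input[current];
    do {
        if (currentChar === '\n') state.line++;
        currentChar = input[++current];
    } while (current < input.length && !validator.test(currentChar));
    state.current = current + 1; // skip the closing quote, or stop at EOF
}
```

With the `current < input.length` guard, an unterminated string still produces a (possibly wrong) token boundary, but the lexer terminates.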

I edited this issue because I first thought the bug was reproducible by parsing, but it actually requires running the lexer function on the code.

Async / Await Support for the Python parser

Hello,
I use your dt-python-parser as a syntax validator in my web-based Python IDE.
However, I found out that your parser's grammar does not support the async / await keywords.
URL

Do you plan to include them in the ANTLR grammar? I would appreciate it, because my web-based IDE runs the Python code in an event-driven environment.
Thanks for your answer.

auto-complete

Support auto-completion for already defined variables, functions, classes, etc.

Lexer throws RangeError: Invalid string length

There appears to be a bug in the lexer that is reproduced by the following JavaScript:

import { lexer } from 'dt-python-parser';

// This Python code is processed with no problem:
const python = `"""it is for test"""\nvar1 = "Hello World!"\n# comment here\nfor i in range(5):\n    print(i)`;
const commentTokens = lexer(python);
console.log(commentTokens);
/*
    [
      {
        type: 'Comment',
        value: '"""it is for test"""',
        start: 0,
        lineNumber: 1,
        end: 20
      }
    ]
*/

// HERE is where the bug is reproduced:
const commentTokens2 = lexer('# hi');
console.log(commentTokens2); // never reaches this point.

Here is the stack trace I get:
RangeError: Invalid string length
at lexer (C:\Users\josh.greig\Desktop\turtle\python-parser\node_modules\dt-python-parser\dist\utils\index.js:76:26)
at file:///C:/Users/josh.greig/Desktop/turtle/python-parser/comments.mjs:20:24
at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
at async Loader.import (internal/modules/esm/loader.js:166:24)
at async Object.loadESM (internal/process/esm_loader.js:68:5)

Oddly enough, I can parse the same code without a problem. The resulting tree doesn't contain the single-line comments, but that is how parse is intended to work.

I'm working around this by appending a '\n' to the end of the Python code before passing it to the lexer. The bug is reproduced by Python code that ends with a '#' comment and no trailing newline character.

Is the problem that this rule in the grammar doesn't match EOF and instead strictly looks for a line break?
fragment COMMENT
    : '#' ~[\r\n]*
    ;
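For what it's worth, the equivalent JavaScript regex for that fragment does match a comment that ends at EOF (since `~[\r\n]*` can simply stop at the end of input), which suggests the problem lies in the lexer's scanning loop rather than in the pattern itself. This is my reading, not a confirmed diagnosis:

```javascript
// JS analogue of the ANTLR fragment COMMENT : '#' ~[\r\n]* ;
const COMMENT = /#[^\r\n]*/;

COMMENT.exec('# hi')[0];   // → '# hi' (matches even with no trailing newline)
COMMENT.exec('# hi\n')[0]; // → '# hi' (stops before the line break)
```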

Python Code Generator and Tree-Rewrite Engine for Python Parser

Hi all,
in my Python IDE I need to emulate synchronous Python code in the browser. The reason is that we cannot teach young students the asynchronous programming paradigm with the async and await keywords. My plan is to parse the code with dt-python-parser, use a grammar-based tree-rewrite engine on the AST to rewrite the synchronous code into asynchronous code, and finally use a Python synthesizer to generate Python code again.

  1. Before I implement the Python code generator and the Python tree-rewrite engine, I want to ask whether anyone is already working on this.

  2. My second question is whether you would want this contribution to become part of dt-python-parser. It would be quite nice to have it all in one library.

Thanks very much for your answer.
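To make the intent concrete, here is a toy illustration of the kind of rewrite I have in mind. The node shapes ({ type, name, body }) are invented for this sketch; a real implementation would walk the ANTLR parse tree produced by dt-python-parser instead:

```javascript
// Toy sync-to-async tree rewrite: function definitions become async and
// call sites get wrapped in an Await node. Purely illustrative node types.
function rewriteSyncToAsync(node) {
    if (node.type === 'FunctionDef') {
        return {
            ...node,
            type: 'AsyncFunctionDef',
            body: node.body.map(rewriteSyncToAsync),
        };
    }
    if (node.type === 'Call') {
        return { type: 'Await', value: node }; // await every call site
    }
    return node; // leave other nodes untouched
}
```

A code generator would then walk the rewritten tree and emit `async def` / `await` in place of the original constructs.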
