klauer / blark Goto Github PK

View Code? Open in Web Editor NEW

42.0 42.0 5.0 3.97 MB

Beckhoff TwinCAT ST (IEC 61131-3) code parsing in Python using Lark (Earley)

Home Page: https://klauer.github.io/blark/

License: GNU General Public License v2.0

Python 99.00% Smalltalk 1.00%

iec61131-3 sphinx-domain structured-text twincat

blark's Introduction

Let's collaborate!

blark's People

Contributors

Stargazers

Watchers

Forkers

cssaheel engineerjoe440 din-dke drehstromer ggessner

blark's Issues

Add Support for Multi-Line Object Oriented Function-Block Usage

Ok, so admittedly this is a new one to me. I had someone send me a snippet of 61131, and I was playing with it in blark and ran into a nice-big-ol'-error. Thought I'd report it, since it does seem worth supporting.

Problem

Apparently, given the right constraints, it's possible to "chain" method calls in 61131 (e.g., object.call1().call2()) and it's legal to have that "chaining" extend across line-separation. In other words, the following logic is "legal:"

object.call1()
      .call2();

The problem is that white-space between call1() and call2(). blark does not like it. 😦

Here's a sample of what the error might look like (through the lens of a pytest run)

E           lark.exceptions.UnexpectedCharacters: No terminal matches '.' in the current parser context, at line 9 col 4
E           
E              .DoSomethingElse(input_1:=5)
E              ^
E           Expected one of: 
E               * SEMICOLON

What's Needed

I think we need to figure out how to allow an indeterminate amount of white-space to be allowed in these chained-method cases. I'll have to do some poking around to see if I might be able to figure out how to make that work. Any suggestions are welcome.

I have a new test case that demonstrates this issue here: https://github.com/engineerjoe440/blark/blob/bugfix/newline-separated-oop-logic/blark/tests/source/object_oriented_dotaccess.st#L8-L10

Definitely an interesting pattern in the ST. I have to admit, it's kinda cool...

Consider trying the OpenPLC grammar

OpenPLC's matiec is a working compiler which has a grammar based on flex/yacc
https://github.com/thiagoralves/OpenPLC_v3/blob/master/utils/matiec_src/stage1_2/iec_bison.yy

It's similar to the one blark is currently based on, but likely better tested. It may make sense to give porting it a try at some point.

Support `CONTINUE` Statement

Hello! It's been a while! I'm poking around at the "linter" project I've been working on, again, and I stumbled across a bit of missing grammar. I think that we're missing "CONTINUE" from the grammar. Looks like it's an extension of the 61131 standard, but both CODESYS and Schneider Electric seem to support it:

A simple example might be like the following:

WHILE condition DO
    IF another_condition THEN
        CONTINUE;
    END_IF
END_WHILE

I'll see if I can't get a PR up to add the syntax.

Move to lark-only grammar

Currently parsing + generating iec.lark from iec.grammar (iec2xml)

TODO

Remove iec.grammar + conversion utility
Reformat iec.lark

Array Initialization with Zero

This is an interesting syntax I haven't seen, before.

This screenshot was taken in a CODESYS environment, and I can't find specific documentation for Beckhoff systems that allow this as legal syntax (but I can't find it as illegal syntax, either, for that matter). I think this should be confirmed as legal (or proven as illegal) syntax for Beckhoff systems, and then perhaps we can proceed from there.

Add documentation to repository

Add sphinx (or mkdocs would be fine too) documentation to the repository
Document the blark API (automatically)
Document the blark CLI entrypoint
Add in stuff from the README to the documentation and expound upon it
Add a CI job to deploy documentation (preferably with docs-versions-menu)

Add support for UNION (TYPE)

Ref: https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2529421195.html&id=

Support TwinCAT solutions/project/source code files directly

blark will continue to support the current input formats - just without pytmc as a required dependency.
I'm hoping to make file loading more easily customizable, with the option to load arbitrary file formats so long as you can provide an adapter (or a JSON-ified code block).

This is more of a TODO list at the moment - more details on the way soon.

Most of the following are at least somewhat done in the WIP branch wip_standalone_solution_parser:

Ensure apischema round-trips all test suite cases

Goal

All blark dataclasses from blark.transform should be able to be converted to JSON by way of apischema.serialize
All generated JSON should similarly be able to be transformed back to dataclasses by way of apischema.deserialize
apischema should remain an optional dependency of blark

"JSON" above refers to JSON-serializable data types, including Python built-ins such as dicts, lists, ints, floats, and strings. The conversion of that to an actual serialized JSON string is an additional step.

Issue

During the course of testing, I ran into some issues with apischema on Python 3.9+ (at least according to inline code comments). I wasn't able to get to the heart of the issue to determine a path forward, I merely set up a "skip if on Python 3.9+ or apischema is unavailable".

Test suite reference:

blark/blark/tests/test_transformer.py

Lines 19 to 21 in 263e263

 # TODO: apischema serialization is recursing infinitely on 3.9 and 3.10; 

 # need to dig into details and report it (first test that fails is ARRAY-related) 

 APISCHEMA_SKIP = sys.version_info[:2] >= (3, 9)

Optional dependency in the transformer dataclass section:

blark/blark/transform.py

Lines 18 to 28 in 263e263

 try: 

 # NOTE: apischema is an optional requirement; this should work regardless. 

 import apischema 

 from .apischema_compat import as_tagged_union 

 except ImportError: 

 apischema = None 

 def as_tagged_union(cls: Type[T]) -> Type[T]: 

 """No-operation stand-in for when apischema is not available.""" 

 return cls

Failing tests as of `263e263`

With APISCHEMA_SKIP set to False on Python 3.9 and blark 263e263, the following tests fail on the serialization portion (at least sometimes due to a recursion error in apischema). apischema=0.17.5, 84 failed, 144 passed, 5 xfailed in 77.86s (0:01:17)

FAILED tests/test_transformer.py::test_expression_roundtrip[array_type_declaration-TypeName : ARRAY [1..2, 3..4] OF INT] - Re...
FAILED tests/test_transformer.py::test_expression_roundtrip[array_type_declaration-TypeName : ARRAY [1..2] OF INT := [1, 2]]
FAILED tests/test_transformer.py::test_expression_roundtrip[array_type_declaration-TypeName : ARRAY [1..2, 3..4] OF INT := [2(3), 3(4)]]
FAILED tests/test_transformer.py::test_expression_roundtrip[array_type_declaration-TypeName : ARRAY [1..2, 3..4] OF Tc.SomeType]
FAILED tests/test_transformer.py::test_expression_roundtrip[structure_type_declaration-TypeName :\nSTRUCT\nEND_STRUCT] - Recu...
FAILED tests/test_transformer.py::test_expression_roundtrip[structure_type_declaration-TypeName EXTENDS Other.Type :\nSTRUCT\nEND_STRUCT]
FAILED tests/test_transformer.py::test_expression_roundtrip[structure_type_declaration-TypeName : POINTER TO\nSTRUCT\nEND_STRUCT]
FAILED tests/test_transformer.py::test_expression_roundtrip[structure_type_declaration-TypeName : POINTER TO\nSTRUCT\n    iValue : INT;\nEND_STRUCT]
FAILED tests/test_transformer.py::test_expression_roundtrip[structure_type_declaration-TypeName : POINTER TO\nSTRUCT\n    iValue : INT := 3 + 4;\n    stTest : ST_Testing := (1, 2);\n    eValue : E_Test := E_Test.ABC;\n    arrValue : ARRAY [1..2] OF INT := [1, 2];\n    arrValue1 : INT (1..2);\n    arrValue1 : (Value1 := 1) INT;\n    sValue : STRING := 'abc';\n    iValue1 AT %I* : INT := 5;\n    iValue2 AT %Q* : INT := 5;\n    iValue3 AT %M* : INT := 5;\n    sValue1 : STRING[10] := 'test';\nEND_STRUCT]
FAILED tests/test_transformer.py::test_input_roundtrip[input_declarations-VAR_INPUT\nEND_VAR] - RecursionError: maximum recur...
FAILED tests/test_transformer.py::test_input_roundtrip[input_declarations-VAR_INPUT RETAIN\nEND_VAR] - RecursionError: maximu...
FAILED tests/test_transformer.py::test_input_roundtrip[input_declarations-VAR_INPUT RETAIN\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\nEND_VAR]
FAILED tests/test_transformer.py::test_input_roundtrip[input_declarations-VAR_INPUT RETAIN\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\n    fbTest : FB_Test(1, 2, 3);\n    fbTest : FB_Test(A := 1, B := 2, C => 3);\n    fbTest : FB_Test(1, 2, A := 1, B := 2, C => 3);\n    fbTest : FB_Test := (1, 2, 3);\nEND_VAR]
FAILED tests/test_transformer.py::test_output_roundtrip[output_declarations-VAR_OUTPUT\nEND_VAR] - RecursionError: maximum re...
FAILED tests/test_transformer.py::test_output_roundtrip[output_declarations-VAR_OUTPUT RETAIN\nEND_VAR] - RecursionError: max...
FAILED tests/test_transformer.py::test_output_roundtrip[output_declarations-VAR_OUTPUT RETAIN\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\nEND_VAR]
FAILED tests/test_transformer.py::test_output_roundtrip[output_declarations-VAR_OUTPUT RETAIN\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\n    fbTest : FB_Test(1, 2, 3);\n    fbTest : FB_Test(A := 1, B := 2, C => 3);\n    fbTest : FB_Test(1, 2, A := 1, B := 2, C => 3);\n    fbTest : FB_Test := (1, 2, 3);\nEND_VAR]
FAILED tests/test_transformer.py::test_input_output_roundtrip[input_output_declarations-VAR_IN_OUT\nEND_VAR] - RecursionError...
FAILED tests/test_transformer.py::test_input_output_roundtrip[input_output_declarations-VAR_IN_OUT\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\nEND_VAR]
FAILED tests/test_transformer.py::test_input_output_roundtrip[input_output_declarations-VAR_IN_OUT\n    iValue : INT;\n    sValue : STRING := 'abc';\n    wsValue : WSTRING := "abc";\n    fbTest : FB_Test(1, 2, 3);\n    fbTest : FB_Test(A := 1, B := 2, C => 3);\n    fbTest : FB_Test(1, 2, A := 1, B := 2, C => 3);\n    fbTest : FB_Test := (1, 2, 3);\nEND_VAR]
FAILED tests/test_transformer.py::test_global_roundtrip[global_var_declarations-VAR_GLOBAL\nEND_VAR] - RecursionError: maximu...
FAILED tests/test_transformer.py::test_global_roundtrip[global_var_declarations-VAR_GLOBAL CONSTANT\nEND_VAR] - RecursionErro...
FAILED tests/test_transformer.py::test_global_roundtrip[global_var_declarations-VAR_GLOBAL PERSISTENT\nEND_VAR] - RecursionEr...
FAILED tests/test_transformer.py::test_global_roundtrip[global_var_declarations-VAR_GLOBAL CONSTANT PERSISTENT\nEND_VAR] - Re...
FAILED tests/test_transformer.py::test_global_roundtrip[global_var_declarations-VAR_GLOBAL CONSTANT PERSISTENT\n    iValue : INT := 5;\n    fbTest1 : FB_Test(1, 2);\n    fbTest2 : FB_Test(A := 1, B := 2);\n    fbTest3 : FB_TestC;\n    i_iFscEB1Ch0AI AT %I* : INT;\nEND_VAR]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName IMPLEMENTS I_fbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName IMPLEMENTS I_fbName, I_fbName2\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK ABSTRACT fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK PRIVATE fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK PUBLIC fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK INTERNAL fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK PROTECTED fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK FINAL fbName EXTENDS OtherFbName\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName\nVAR_INPUT\n    bExecute : BOOL;\nEND_VAR\nVAR_OUTPUT\n    iResult : INT;\nEND_VAR\nVAR_IN_OUT\n    iShared : INT;\nEND_VAR\nVAR CONSTANT\n    iConstant : INT := 5;\nEND_VAR\nVAR\n    iInternal : INT;\nEND_VAR\nVAR RETAIN\n    iRetained : INT;\nEND_VAR\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[located_var_declarations-VAR RETAIN\n    iValue AT %IB1 : INT := 5;\nEND_VAR]
FAILED tests/test_transformer.py::test_fb_roundtrip[temp_var_decls-VAR_TEMP\n    iGlobalVar : INT;\nEND_VAR] - RecursionError...
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName\n    iValue := 1;\n    iValue;\n    iValue S= 1;\n    iValue R= 1;\n    iValue REF= GVL.iTest;\n    fbOther(A := 5, B => iValue, NOT C => iValue1);\n    IF 1 THEN\n        iValue := 1;\n        IF 1 THEN\n            iValue := 1;\n        END_IF\n    END_IF\n    Method();\n    RETURN;\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_type_declaration-FUNCTION_BLOCK fbName\n    Method();\n    IF 1 THEN\n        EXIT;\n    END_IF\nEND_FUNCTION_BLOCK]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PRIVATE MethodName : RETURNTYPE\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PRIVATE MethodName : ARRAY [1..2] OF INT\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PUBLIC MethodName : RETURNTYPE\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PUBLIC MethodName\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD MethodName : RETURNTYPE\n    VAR_INPUT\n        bExecute : BOOL;\n    END_VAR\n    VAR_OUTPUT\n        iResult : INT;\n    END_VAR\n    iResult := 5;\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PUBLIC MethodName\n    VAR_INST\n        bExecute : BOOL;\n    END_VAR\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_method_declaration-METHOD PUBLIC ABSTRACT MethodName : LREAL\n    VAR_INPUT\n        I : UINT;\n    END_VAR\nEND_METHOD]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PRIVATE PropertyName : RETURNTYPE\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PRIVATE PropertyName : ARRAY [1..2] OF INT\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PUBLIC PropertyName : RETURNTYPE\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PUBLIC PropertyName\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PropertyName : RETURNTYPE\n    VAR_INPUT\n        bExecute : BOOL;\n    END_VAR\n    VAR_OUTPUT\n        iResult : INT;\n    END_VAR\n    iResult := 5;\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_fb_roundtrip[function_block_property_declaration-PROPERTY PUBLIC ABSTRACT PropertyName : LREAL\n    VAR_INPUT\n        I : UINT;\n    END_VAR\nEND_PROPERTY]
FAILED tests/test_transformer.py::test_statement_roundtrip[if_statement-IF 1 THEN\n    iValue := 1;\n    IF 1 THEN\n        iValue := 1;\n    END_IF\nELSIF 2 THEN\n    iValue := 2;\nELSIF 3 THEN\n    iValue := 2;\nELSE\n    iValue := 3;\nEND_IF]
FAILED tests/test_transformer.py::test_statement_roundtrip[if_statement-IF 1 THEN\n    IF 2 THEN\n        IF 3 * x THEN\n            y();\n        ELSE\n        END_IF\n    END_IF\nEND_IF]
FAILED tests/test_transformer.py::test_statement_roundtrip[if_statement-IF 1 AND_THEN 1 THEN\n    y();\nEND_IF] - RecursionEr...
FAILED tests/test_transformer.py::test_statement_roundtrip[if_statement-IF 0 OR_ELSE 1 THEN\n    y();\nEND_IF] - RecursionErr...
FAILED tests/test_transformer.py::test_statement_roundtrip[case_statement-CASE expr OF\n1:\n    abc();\n2, 3, GVL.Constant:\n    def();\nELSE\n    ghi();\nEND_CASE]
FAILED tests/test_transformer.py::test_statement_roundtrip[case_statement-CASE a.b.c^.d OF\n1..10:\n    OneToTen := OneToTen + 1;\nEnumValue:\nEND_CASE]
FAILED tests/test_transformer.py::test_statement_roundtrip[while_statement-WHILE expr\nDO\n    iValue := iValue + 1;\nEND_WHILE]
FAILED tests/test_transformer.py::test_statement_roundtrip[repeat_statement-REPEAT\n    iValue := iValue + 1;\nUNTIL expr\nEND_REPEAT]
FAILED tests/test_transformer.py::test_statement_roundtrip[for_statement-FOR iIndex := 0 TO 10\nDO\n    iValue := iIndex * 2;\nEND_FOR]
FAILED tests/test_transformer.py::test_statement_roundtrip[for_statement-FOR iIndex := 0 TO 10 BY 1\nDO\n    iValue := iIndex * 2;\nEND_FOR]
FAILED tests/test_transformer.py::test_statement_roundtrip[for_statement-FOR iIndex := (iValue - 5) TO iValue * 10 BY iValue MOD 10\nDO\n    arrArray[iIndex] := iIndex * 2;\nEND_FOR]
FAILED tests/test_transformer.py::test_statement_roundtrip[for_statement-FOR iIndex[1] := 0 TO 10\nDO\n    iValue := iIndex * 2;\nEND_FOR]
FAILED tests/test_transformer.py::test_function_roundtrip[int_with_input] - RecursionError: maximum recursion depth exceeded ...
FAILED tests/test_transformer.py::test_function_roundtrip[int_with_pointer_retval] - RecursionError: maximum recursion depth ...
FAILED tests/test_transformer.py::test_function_roundtrip[int_with_input_output] - RecursionError: maximum recursion depth ex...
FAILED tests/test_transformer.py::test_function_roundtrip[no_return_type] - RecursionError: maximum recursion depth exceeded ...
FAILED tests/test_transformer.py::test_function_roundtrip[dotted_return_type] - RecursionError: maximum recursion depth excee...
FAILED tests/test_transformer.py::test_program_roundtrip[program_declaration-PROGRAM ProgramName\nEND_PROGRAM] - RecursionErr...
FAILED tests/test_transformer.py::test_program_roundtrip[program_declaration-PROGRAM ProgramName\n    VAR_INPUT\n        iValue : INT;\n    END_VAR\n    VAR_ACCESS\n        AccessName : SymbolicVariable : TypeName READ_WRITE;\n    END_VAR\n    iValue := iValue + 1;\nEND_PROGRAM]
FAILED tests/test_transformer.py::test_input_output_comments[input_output_declarations-/ Var in and out\n(* Var in and out *)\nVAR_IN_OUT\n    / Variable\n    iVar : INT;\nEND_VAR]
FAILED tests/test_transformer.py::test_incomplete_located_var_decls[incomplete_located_var_declarations-VAR\n    iValue AT %Q* : INT;\n    sValue AT %I* : STRING [255];\n    wsValue AT %I* : WSTRING [255];\nEND_VAR]
FAILED tests/test_transformer.py::test_incomplete_located_var_decls[incomplete_located_var_declarations-VAR RETAIN\n    iValue AT %I* : INT;\n    iValue1 AT %Q* : INT;\nEND_VAR]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE\nEND_TYPE] - RecursionError: maximum ...
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE TypeName :\n    STRUCT\n        xPLC_CnBitsValid : BOOL;\n        xPLC_CnBits : ARRAY [0..20] OF BYTE;\n    END_STRUCT\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE ArrayTypeName : ARRAY [1..10, 2..20] OF INT;\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE StringTypeName : STRING[10] := 'Literal';\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE SimpleTypeName EXTENDS OtherType : POINTER TO INT;\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE SubrangeTypeName : POINTER TO INT (1..5);\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE EnumeratedTypeName : REFERENCE TO Identifier;\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE EnumeratedTypeName : REFERENCE TO (IdentifierA, INT#IdentifierB := 1);\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE TypeName :\n    UNION\n        intval : INT;\n        as_bytes : ARRAY [0..2] OF BYTE;\n    END_UNION\nEND_TYPE]
FAILED tests/test_transformer.py::test_data_type_declaration[data_type_declaration-TYPE TypeName :\n    UNION\n        intval : INT;\n        enum : (iValue := 1, iValue2 := 2) INT;\n    END_UNION\nEND_TYPE]

Random thoughts

It's likely that the issue is due to mistaken/incorrect type hints. There is at least a tiny possibility that this is an apischema bug.
Type hints for the problematic ones in the failed tests above need reviewing to see what's wrong.
CI should be updated to "run test suite with optional dependencies" and "run test suite without optional dependencies" at some point
Ideally, the round-tripping of code -> dataclass -> code and code -> dataclass -> JSON -> dataclass would be separate in the test suite. As of now, these are only just tacked on in roundtrip_rule.

Language features in upcoming TwinCAT 4026 release

https://www.beckhoff.com/en-us/products/automation/twincat/twincat-3-build-4026/

Some relevant ones might include:

keyword: ABSTRACT for abstract function block/method/property definition
ENUMs available as strings in the PLC
exception handling via TRY-CATCH
conditional compilation available in declaration part (in addition to implementation part)
multi-line support in pragma declarations
optional storage format Base64 for graphical PLC objects

Will need code examples. If you find more details about the build - specifically structured text-related changes - please feel free to add a comment here.

Have you seen TcBlack?

Hi,

I just found your repo when looking for structured text parsers. Your project looks very interesting!

In your readme you mention you want to make a formatter based on Python's Black. I already started such a project: TcBlack.

The current implementation of TcBlack tries to parse structured text using regex. This is not very scalable and therefore I was considering writing my own parser, because I couldn't find anything. But now I see your project and it seems like maybe I could use blark to do the parsing. What is the status of the parser? How much can it parse currently? We could maybe also collaborate?

Roald.

Comments in output tree?

Wondering if this is something you considered including or thing you'd be willing to accept?

Asking here first before I fork or push a diff up, as I have a partial solution working for me. But I think this could be helpful.

I used this lib, and some very-rough text extraction to transpile some ST structs for use in Python and JSON. I had need to preserve the comments, because the fields names for the structs are often so short.

Need Support for `JMP` Statements

Found this in the wild, today:

Apparently, it's possible to use jump (JMP) statements in 61131. I'm not happy about it, you're not happy about it, who is happy about it? Anyway... It can be done...

Here's TwinCAT's docs on it: https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2528307851.html&id=

I'm not sure how the labels are getting parsed in blark right now, but they may also need to be addressed.

Here's some example logic:

IF
    logMessage := concat(logMessage, ', ...<truncated>');
    JMP JumpToMessageLogging;
END_IF

Which resulted in the following traceback:

λ selint
Traceback (most recent call last):
  File "C:\Users\joestan\Documents\Python RTAC Projects\selint\selint\core\__init__.py", line 177, in _parse
    self._parse_source_code(self.source)
  File "C:\Users\joestan\Documents\Python RTAC Projects\selint\selint\core\__init__.py", line 280, in _parse_source_code
    self.source_tree = self.__parser.parse(
                       ^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\lark\lark.py", line 652, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\lark\parser_frontends.py", line 101, in parse
    return self.parser.parse(stream, chosen_start, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\lark\parsers\earley.py", line 276, in parse
    to_scan = self._parse(lexer, columns, to_scan, start_symbol)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\lark\parsers\xearley.py", line 146, in _parse
    to_scan = scan(i, to_scan)
              ^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\lark\parsers\xearley.py", line 119, in scan
    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches 'J' in the current parser context, at line 100 col 7

                JMP JumpToMessageLogging;
                    ^
Expected one of:
        * DOT
        * ASSIGNMENT
        * __ANON_10
        * SEMICOLON
        * LSQB
        * LPAR
        * DEREFERENCED
        * __ANON_12
        * __ANON_11

Python 3.8 import of blark (-> apischema) may fail

Seeing this error on CI:

https://github.com/klauer/blark/actions/runs/4419005433/jobs/7746872821

blark/__init__.py:4: in <module>
    from .parse import get_parser, parse_project, parse_source_code
blark/parse.py:17: in <module>
    from . import summary
blark/summary.py:10: in <module>
    from . import transform as tf
blark/transform.py:22: in <module>
    import apischema
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/__init__.py:29: in <module>
    from . import (  # noqa: F401
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/json_schema/__init__.py:8: in <module>
    from .schema import definitions_schema, deserialization_schema, serialization_schema
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/json_schema/schema.py:46: in <module>
    from apischema.json_schema.types import JsonSchema, JsonType, json_schema
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/json_schema/types.py:71: in <module>
    serializer(Conversion(dict, source=JsonSchema))
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/conversions/converters.py:165: in serializer
    resolved = resolve_conversion(serializer)
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/conversions/conversions.py:73: in resolve_conversion
    source, target = converter_types(
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/conversions/utils.py:35: in converter_types
    if target is None and is_type(converter):
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/typing.py:250: in is_type
    return isinstance(tp, type) and not get_args(tp)
/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/apischema/typing.py:54: in get_args
    res = tp.__args__
E   AttributeError: type object 'dict' has no attribute '__args__'

It's almost certainly apischema's fault, but perhaps our dependency pinning could be updated to fix it on our end (?)

Alternatively, we may move to Python 3.9+ anyway since the NEP29 deprecation schedule moves to Python 3.9+ on Apr 14, 2023

Build up summary tool

Basis for auto-generated code documentation
Functions, function blocks, ...
Input/output/externally accessible parameters, etc.

Suggestions on Implementing Custom XML Parser with `blark`s Grammar

Hi there!

First of all, WOW! This project is awesome!

I just happened to stumble across this project recently. I'm tinkering with a linter for SEL (Schweitzer Engineering Laboratories) RTAC projects. The SEL RTAC uses CODESYS, so there's some intricacies that I need to handle in the different XML formatting. I had been working on my own Lark grammar until I stumbled across this work, and now, it seems sensible to leverage the great work you've already produced, and contribute where possible.

I've built logic that's capable of breaking down the RTAC XML to its constituent declaration and implementation sections (separately) and I'm wondering if you might be able to suggest a good way to interact with these components. For example, I have a section of implementation code as follows:

// Default return value to TRUE
SerializeJson := TRUE;

// Set to Root of Structure
Root();

SerializeJson := SerializeJson AND _serializedContent.Recycle();

// Set up Initial States for Indices
lastLevel := Current.LEVEL;
outerType := Current.JSON_TYPE;

And using a little Python as follows:

def lark_61131_implementation(**kwargs):
    """Generate Lark Parser for 61131 Implementation Sections."""
    return lark.Lark.open(
        'iec.lark',
        rel_to=__file__,
        parser='earley',
        maybe_placeholders=True,
        propagate_positions=True,
        **kwargs
    )

...

parser = lark_61131_implementation()
print(parser.parse(interface).pretty()) # `interface` is a string of the aforementioned 61131

Ultimately, that gives me a nice Lark error:

Traceback (most recent call last):
  File "/home/joestan/Documents/selint/selint/core/lexer/__init__.py", line 155, in <module>
    print(parser.parse(interface).pretty())
  File "/home/joestan/.local/lib/python3.8/site-packages/lark/lark.py", line 625, in parse
    return self.parser.parse(text, start=start, on_error=on_error)
  File "/home/joestan/.local/lib/python3.8/site-packages/lark/parser_frontends.py", line 96, in parse
    return self.parser.parse(stream, chosen_start, **kw)
  File "/home/joestan/.local/lib/python3.8/site-packages/lark/parsers/earley.py", line 266, in parse
    to_scan = self._parse(lexer, columns, to_scan, start_symbol)
  File "/home/joestan/.local/lib/python3.8/site-packages/lark/parsers/xearley.py", line 146, in _parse
    to_scan = scan(i, to_scan)
  File "/home/joestan/.local/lib/python3.8/site-packages/lark/parsers/xearley.py", line 119, in scan
    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches 'S' in the current parser context, at line 3 col 1

SerializeJson := TRUE;
^
Expected one of: 
        * PROGRAM
        * TYPE
        * PROPERTY
        * FUNCTION
        * FUNCTION_BLOCK
        * ACTION
        * METHOD
        * VAR_GLOBAL

Before I go diving back into the grammar, I thought it might behoove me to ask exactly how these strings are intended to be parsed according to the grammar file you've provided. Should I be combining declaration and implementation into a single block? Are there other concepts you think I might be overlooking?

Thank you so much for your time!

Variables Containing the word "Return" Incorrectly Identified as `RETURN` Statements.

Found this one while testing some more "in the wild" code.

It seems that if a variable's name begins with the word "Return" blark will incorrectly identify that word as a RETURN statement instead of part of the variable. I will need to confirm, but I suspect this may also be the case with CONTINUE and BREAK.

Here's an example I'm adding as a test-case that will hopefully lead to an improvement pull-request.

            FUNCTION_BLOCK fbName
                iValue := 1;
                IF 1 THEN
                    iValue := 1;
                    IF 1 THEN
                        iValue := 1;
                    END_IF
                END_IF
                Method();
                ReturnStatus := mReturnStatus;
            END_FUNCTION_BLOCK

blark-parsed results:

FUNCTION_BLOCK fbName
    iValue := 1;
    IF 1 THEN
        iValue := 1;
        IF 1 THEN
            iValue := 1;
        END_IF
    END_IF
    Method();
    RETURN;              //  <-- Here's the offender
    Status := mReturnStatus;
END_FUNCTION_BLOCK

Parse Tree

The parse tree is:
function_block_type_declaration
FUNCTION_BLOCK
None
fbName
None
None
statement_list
  assignment_statement
    variable_name
      iValue
      None
    expression
      assignment_expression
        or_else_expression
          and_then_expression
            xor_expression
              and_expression
                comparison_expression
                  equality_expression
                    add_expression
                      expression_term
                        power_expression
                          unary_expression
                            None
                            constant
                              integer_literal
                                None
                                integer       1
  if_statement
    expression
      assignment_expression
        or_else_expression
          and_then_expression
            xor_expression
              and_expression
                comparison_expression
                  equality_expression
                    add_expression
                      expression_term
                        power_expression
                          unary_expression
                            None
                            constant
                              integer_literal
                                None
                                integer       1
    statement_list
      assignment_statement
        variable_name
          iValue
          None
        expression
          assignment_expression
            or_else_expression
              and_then_expression
                xor_expression
                  and_expression
                    comparison_expression
                      equality_expression
                        add_expression
                          expression_term
                            power_expression
                              unary_expression
                                None
                                constant
                                  integer_literal
                                    None
                                    integer   1
      if_statement
        expression
          assignment_expression
            or_else_expression
              and_then_expression
                xor_expression
                  and_expression
                    comparison_expression
                      equality_expression
                        add_expression
                          expression_term
                            power_expression
                              unary_expression
                                None
                                constant
                                  integer_literal
                                    None
                                    integer   1
        statement_list
          assignment_statement
            variable_name
              iValue
              None
            expression
              assignment_expression
                or_else_expression
                  and_then_expression
                    xor_expression
                      and_expression
                        comparison_expression
                          equality_expression
                            add_expression
                              expression_term
                                power_expression
                                  unary_expression
                                    None
                                    constant
                                      integer_literal
                                        None
                                        integer       1
        None
    None
  function_call_statement
    function_call
      variable_name
        Method
        None
      None
  return_statement
  assignment_statement
    variable_name
      Status
      None
    expression
      assignment_expression
        or_else_expression
          and_then_expression
            xor_expression
              and_expression
                comparison_expression
                  equality_expression
                    add_expression
                      expression_term
                        power_expression
                          unary_expression
                            None
                            variable_name
                              mReturnStatus
                              None
END_FUNCTION_BLOCK

Nested or inlined comments

Nested comments do not work. For example

https://github.com/pcdshub/lcls-plc-kfe-xgmd-vac/blob/63b62d530f30cfcb8e0a8f88df11815d5948bd9a/plc/plc-kfe-xgmd-vac/plc_kfe_xgmd_vac/POUs/FB_VGP.TcPOU#L499-L516

Possible solution? https://stackoverflow.com/questions/462843/improving-fixing-a-regex-for-c-style-block-comments

Inlined comments also appear problematic somehow, perhaps a misunderstanding of %ignore
For example
https://github.com/pcdshub/lcls-plc-kfe-xgmd-vac/blob/63b62d530f30cfcb8e0a8f88df11815d5948bd9a/plc/plc-kfe-xgmd-vac/plc_kfe_xgmd_vac/POUs/FB_VGP.TcPOU#L118

Syntax Parsing Failure - Accessing an Attribute on a Dereferenced Pointer Returned from a Method Doesn't Work

Ok... here's yet another interesting one...

filename := _directoryFileList.Item(_i)^.ToString();
                                        ^
Expected one of:
        * LOGICAL_AND_THEN
        * LOGICAL_XOR
        * LOGICAL_OR
        * MULTIPLY_OPERATOR
        * LOGICAL_OR_ELSE
        * EQUALS_OP
        * SEMICOLON
        * ASSIGNMENT
        * LOGICAL_AND
        * ADD_OPERATOR
        * COMPARE_OP

Seems like blark doesn't like the fact that we're accessing some attribute from whatever that dereferenced structure is supposed to be.

Thank you for all of your help and welcomeness, as ever!

`POINTER TO STRING(c_Size - 1);` Not Parsing Correctly

Came across this "in the wild" today:

pt_str : POINTER TO STRING(g_c_lenTimestampStr + 3);

Seems that blark doesn't really care for the parenthesis:

lark.exceptions.UnexpectedCharacters: No terminal matches '(' in the current parser context, at line 19 col 31

At a high-level, I think this should be fairly reasonable to iron out. I'll see if I can jump in and author a fix.

Working branch is here: https://github.com/engineerjoe440/blark/tree/bugfix/pointer-to-string-with-variable-not-parsing

Remove unused grammar rules

Program configuration settings (with VAR_ACCESS), and some others aren't actually valid for TwinCAT 3-flavor IEC61131-3.

Probably those that do not have a set of transformer dataclasses defined yet can be removed - or at least set aside - for the time being.

Finish up sphinx domain

The sphinx domain allows for generating PLC project API documentation directly from the source code. Currently, it's pretty limited - the proof-of-concept from #12 is now in master.

A simple example might be:

conf.py

extensions = [
    "blark.sphinxdomain",
]

blark_projects = [
    "/Users/klauer/Repos/lcls-twincat-general/LCLSGeneral/LCLSGeneral.tsproj",
]
blark_signature_show_type = True

html_css_files = [
    "css/blark_default.css",
]

and the reStructuredText source:

.. bk:type:: ST_System

.. bk:function_block:: FB_TempSensor

.. bk:function_block:: FB_EtherCATDiag

    .. bk:variable_block:: VAR_OUTPUT

Where to from here? Some notes to self mostly:

At the very least, I'd like to be able to document programs and other things I've forgotten.
Would syntax highlighting be a possibility?
What about dependency graphs?
Multiple projects and dependencies (I had done an initial dependency store thing with whatrecord; https://github.com/pcdshub/whatrecord/blob/e0a5cbfaf62ae80df3aeb69565893a4efc806205/whatrecord/plugins/twincat_pytmc.py#L673; this needs overhauling)
TMC file data structures?

Remove `method_statement` from grammar

Looks like it's time to get rid of method_statement entirely.

function_call_statement (using function_call) handles everything it does and more, so having both just makes for ambiguity in the grammar.

Originally posted by @klauer in #59 (comment)

Checklist:

Remove method_statement from iec.lark
Remove MethodStatement from blark.transformers

Add documentation for how to submit new tests cases

Goal

If someone tries blark and it fails to parse their code or fails otherwise, we'd like to know about it.

Creating an issue would be just fine
Creating a failing test case would be even better
- Dropping in a .st file in blark/tests/source/
- Or a .TcPOU in blark/tests/POUs

If the missing/mistaken grammar construct can be identified, we should reference back to infosys when possible.

Add support for INTERFACE and IMPLEMENTS in function blocks

https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/4262436875.html&id=

Also find a better reference for it.

TODO:

Add support for IMPLEMENTS in FUNCTION_BLOCK
Find better documentation and grammar references for IMPLEMENTS
Add top-level grammar for INTERFACE declaration (interface_declaration)
Add declarations under interface_declaration for supported grammar (methods, properties)
Add INTERFACE support in the transformers
Add INTERFACE test cases and ensure they round-trip from code->Python dataclasses->code
Follow up on pytmc support of .TcIO files: pcdshub/pytmc#276

Add Support for Recursive "POINTER TO" Syntax for Variable Type Declaration

Description

In some of the 61131 that I've reviewed, I've seen "recursive" POINTER TO variable type declarations as shown below:

FUNCTION_BLOCK fb_test
    VAR
        _pt_Strings : POINTER TO POINTER TO someObject;
        _pt_Current : POINTER TO POINTER TO someObject;
    END_VAR
END_FUNCTION_BLOCK

This simply makes a pointer to a pointer to some object. It's "legal" in 61131, and is helpful in editors/IDE's which understand the recursive dereference. That said, blark does not seem to be able to parse the variable type.

Work in Progress:

I've started to work on adding support for this declaration type in a branch of my fork: fe1d37d, but I'm afraid that it's failing quite effectively in pytest: https://github.com/engineerjoe440/blark/runs/6085242624?check_suite_focus=true

Goal/Request:

I was hoping that you might be able to provide some suggestions for how I might modify the transform.py functionality to accommodate this new syntax behavior. Do you have any thoughts or recommendations?

Clean up and expand provided examples

The examples are old ones carried forward from iec2xml and don't really represent what blark can do nowadays.

I'd like to add:

Better PLC code examples
Examples on how to use blark in Python
- Parsing plain source code
- Parsing TwinCAT projects (with or without dependencies)
- Using the summary tools
One example on how to use the sphinx domain for documentation generation

Make pragmas tokens; ignore them

There are too many places where pragmas are valid in code (essentially anywhere, I think)

So:

Make pragma -> PRAGMA in the lark grammar (a token)
Aggregate pragmas during lexing
Optionally attempt to match them up with code items after parsing

Make note of related/alternative/useful projects

https://github.com/nucleron/matiec (older ST compiler with parser) and updated version https://github.com/sm1820/matiec
https://github.com/thiagoralves/OpenPLC_v3 (uses matiec, may have some customizations)
https://github.com/PLC-lang/rusty (rust-based ST compiler)
https://github.com/jubnzv/iec-checker (OCaml grammar + static analysis tool)
https://github.com/Roald87/TcBlack Python black-like code formatter for TwinCAT code

Include comments from iec-checker author from #13:

Author of iec-checker here. I was alerted that you mentioned my project here, so I just want to add a few points about repositories listed above that I have worked with.

https://github.com/nucleron/matiec (older ST compiler with parser)

Matiec supports IEC61131-3 second edition, without classes, namespaces and other fancy features. If you decide to use it, I recommend applying the fixes from the company I worked for (see my recent commits). There were some segmentation faults in the parser that we found using fuzzing. After these fixes it seems pretty stable, and there haven't been any crashes in the few months of continuous fuzzing.
You may also be interested in the latest improvements in the parser in the upstream repository. They extended the parser with some constructs from CoDeSys and supported JSON dumps, which may be useful for you.

https://github.com/jubnzv/iec-checker (OCaml grammar + static analysis tool)

iec-checker has the ability to parse ST source code and dump AST and CFG to JSON format, so you can process it with your language of choice. This feature is used in test suite written in Python.
I'm not sure about the parser speed compared to other implementations, but I use menhir for the parser. This is a stable industrial grade LR(1) parser generator for OCaml and Coq.
The grammar completely covers IEC61131-3 3 ed. features used in the company I worked for. If something is missing there, you could create an issue in the repo.

`BYTE#` literals not properly handled in `CASE` statement

Continuing to work through testing this project against a number of libraries I contribute to, I discovered a failure that's related to CASE statements which use multiple subranges:

Click to expand pytest output

source_filename = '/home/joestan/Documents/blark/blark/tests/source/commas_in_case.st'

    def test_parsing_source(source_filename):
        """Test plain source 61131 files."""
        try:
            with open(source_filename, "r", encoding="utf-8") as src:
                content = src.read()
>           result = parse_source_code(content)

blark/tests/test_parsing.py:49: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
blark/parse.py:109: in parse_source_code
    tree = parser.parse(processed_source)
../../.local/lib/python3.8/site-packages/lark/lark.py:625: in parse
    return self.parser.parse(text, start=start, on_error=on_error)
../../.local/lib/python3.8/site-packages/lark/parser_frontends.py:96: in parse
    return self.parser.parse(stream, chosen_start, **kw)
../../.local/lib/python3.8/site-packages/lark/parsers/earley.py:266: in parse
    to_scan = self._parse(lexer, columns, to_scan, start_symbol)
../../.local/lib/python3.8/site-packages/lark/parsers/xearley.py:146: in _parse
    to_scan = scan(i, to_scan)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

i = 528
to_scan = {__xor_expression_star_33 ::= * LOGICAL_XOR and_expression (528), __equality_expression_star_36 ::= * COMPARE_OP add_e...LEVATED_BY unary_expression (528), __assignment_expression_star_30 ::= * LOGICAL_OR_ELSE or_else_expression (528), ...}

    def scan(i, to_scan):
        """The core Earley Scanner.
    
        This is a custom implementation of the scanner that uses the
        Lark lexer to match tokens. The scan list is built by the
        Earley predictor, based on the previously completed tokens.
        This ensures that at each phase of the parse we have a custom
        lexer context, allowing for more complex ambiguities."""
    
        node_cache = {}
    
        # 1) Loop the expectations and ask the lexer to match.
        # Since regexp is forward looking on the input stream, and we only
        # want to process tokens when we hit the point in the stream at which
        # they complete, we push all tokens into a buffer (delayed_matches), to
        # be held possibly for a later parse step when we reach the point in the
        # input stream at which they complete.
        for item in set(to_scan):
            m = match(item.expect, stream, i)
            if m:
                t = Token(item.expect.name, m.group(0), i, text_line, text_column)
                delayed_matches[m.end()].append( (item, i, t) )
    
                if self.complete_lex:
                    s = m.group(0)
                    for j in range(1, len(s)):
                        m = match(item.expect, s[:-j])
                        if m:
                            t = Token(item.expect.name, m.group(0), i, text_line, text_column)
                            delayed_matches[i+m.end()].append( (item, i, t) )
    
                # XXX The following 3 lines were commented out for causing a bug. See issue #768
                # # Remove any items that successfully matched in this pass from the to_scan buffer.
                # # This ensures we don't carry over tokens that already matched, if we're ignoring below.
                # to_scan.remove(item)
    
        # 3) Process any ignores. This is typically used for e.g. whitespace.
        # We carry over any unmatched items from the to_scan buffer to be matched again after
        # the ignore. This should allow us to use ignored symbols in non-terminals to implement
        # e.g. mandatory spacing.
        for x in self.ignore:
            m = match(x, stream, i)
            if m:
                # Carry over any items still in the scan buffer, to past the end of the ignored items.
                delayed_matches[m.end()].extend([(item, i, None) for item in to_scan ])
    
                # If we're ignoring up to the end of the file, # carry over the start symbol if it already completed.
                delayed_matches[m.end()].extend([(item, i, None) for item in columns[i] if item.is_complete and item.s == start_symbol])
    
        next_to_scan = set()
        next_set = set()
        columns.append(next_set)
        transitives.append({})
    
        ## 4) Process Tokens from delayed_matches.
        # This is the core of the Earley scanner. Create an SPPF node for each Token,
        # and create the symbol node in the SPPF tree. Advance the item that completed,
        # and add the resulting new item to either the Earley set (for processing by the
        # completer/predictor) or the to_scan buffer for the next parse step.
        for item, start, token in delayed_matches[i+1]:
            if token is not None:
                token.end_line = text_line
                token.end_column = text_column + 1
                token.end_pos = i + 1
    
                new_item = item.advance()
                label = (new_item.s, new_item.start, i)
                token_node = TokenNode(token, terminals[token.type])
                new_item.node = node_cache[label] if label in node_cache else node_cache.setdefault(label, SymbolNode(*label))
                new_item.node.add_family(new_item.s, item.rule, new_item.start, item.node, token_node)
            else:
                new_item = item
    
            if new_item.expect in self.TERMINALS:
                # add (B ::= Aai+1.B, h, y) to Q'
                next_to_scan.add(new_item)
            else:
                # add (B ::= Aa+1.B, h, y) to Ei+1
                next_set.add(new_item)
    
        del delayed_matches[i+1]    # No longer needed, so unburden memory
    
        if not next_set and not delayed_matches and not next_to_scan:
            considered_rules = list(sorted(to_scan, key=lambda key: key.rule.origin.name))
>           raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
                                       set(to_scan), state=frozenset(i.s for i in to_scan),
                                       considered_rules=considered_rules
                                       )
E           lark.exceptions.UnexpectedCharacters: No terminal matches ',' in the current parser context, at line 17 col 29
E           
E               BYTE#9..BYTE#10, BYTE#13, BYTE#28..BYTE#126:
E                                       ^
E           Expected one of: 
E               * EQUALS_OP
E               * __ANON_7
E               * LOGICAL_XOR
E               * LOGICAL_OR
E               * ASSIGNMENT
E               * ADD_OPERATOR
E               * COMPARE_OP
E               * LOGICAL_AND_THEN
E               * ELEVATED_BY
E               * MULTIPLY_OPERATOR
E               * LOGICAL_AND
E               * LOGICAL_OR_ELSE

../../.local/lib/python3.8/site-packages/lark/parsers/xearley.py:119: UnexpectedCharacters

I've created a branch in my local fork to contain a sample of some such logic that is valid (at least in CODESYS) with the CASE. This seems like it should be supported since both of the following

I've created a branch in my local fork to contain a sample of some such logic that is valid (at least in CODESYS) with the CASE (my sample fork is here ). This seems like it should be supported since both of the following options are supported.

Failure to Parse POINTER TO Object using FB_Init

Here's one that seems to have a bit of an odd syntax:

VAR
    pt_Results : POINTER TO DynamicVectors.class_BaseVector(SIZEOF(struct_EventDetails), 32);
END_VAR

It seems to me that it's almost a bit pointless to have the FB_Init in this particular context. Still, this was "valid logic" encountered in the wild.

Add GitHub Actions job to deploy API documentation

Preliminary sphinx docs are now in with #89
Need to add a GitHub Actions job to build and deploy those docs to GitHub Pages
- docs-versions-menu usage would be nice
Separate but related - more docs need to be written (#52)

Variable with underscore

I get an error if i want to parse a TcPou with vars inside which start with an underscore.
for example:
_eLocalSeverity : E_Severity;

I get the following error message:
Failed to parse D:\GIT\Logging\POU\LogFB\FB_LogBase.TcPOU FB_LogBase/declaration: Exception: UnexpectedCharacters: No terminal matches '_' in the current parser context, at line 9 col 2

    _eLocalSeverity                                 : E_Severity;
    ^

Expected one of:
* __ANON_12
* DOT
* ASSIGNMENT
* __ANON_10
* SEMICOLON
* COLON
* DEREFERENCED
* __ANON_11
* LSQB
* LPAR

Failed to parse some source code files:
D:\GIT\Logging\POU\LogFB\FB_LogBase.TcPOU: FB_LogBase/declaration

I'm sorry but i dont understand python and lark well enough to find the error and the solution to that error myself.
I hope you can help me.
Thanks

Add Support for POU Properties

Might just be something I'm overlooking, but as it exists, I don't think there's a way to parse POU Properties; or is there one that I'm just missing?

This seems a bit tricky, since a property has one or two methods, but neither method has a traditional "declaration" like other METHOD objects or, really, anything else in 61131. This makes me wonder if it's necessary to have a separate function responsible for parsing them and using a subset of the Lark rules? Not sure... What do you think?

Array of Pointer to Structure using Addresses Dereferencing by Index Failing to Parse

Ok, so I think this one is much simpler than the title may suggest, but it's the use-case I found this issue with, so I'm going to run with its full complexity for the sake of testing. Here's a snippet that seems to fail:

FUNCTION_BLOCK class_SomethingCool
VAR
        currentChannel : ARRAY[1..g_c_someConstant] OF POINTER TO struct_groupData :=
                [       ADR(_object[1].someValue),
                        ADR(_object[2].someValue),
                        ADR(_object[3].someValue),
                        ADR(_object[4].someValue),
                        ADR(_object[5].someValue),
                        ADR(_object[6].someValue),
                        ADR(_object[7].someValue),
                        ADR(_object[8].someValue),
                        ADR(_object[9].someValue),
                        ADR(_object[10].someValue)
                ];
END_VAR
END_FUNCTION_BLOCK

With the following pytest exception (shortened significantly):

E           lark.exceptions.UnexpectedCharacters: No terminal matches '[' in the current parser context, at line 4 col 36
E
E                           [       ADR(_object[1].someValue),
E                                              ^
E           Expected one of: 
E               * HASH
E               * LPAR
E               * COMMA
E               * RPAR
E               * ASSIGNMENT

C:\Python311\Lib\site-packages\lark\parsers\xearley.py:119: UnexpectedCharacters

Development branch is here: https://github.com/engineerjoe440/blark/tree/bugfix/advanced-array-development Clearly, I've added a simple sanity-check test, but haven't completed a fix, yet. May not have a chance until later this week at the earliest.

Double-asterisk (`**`) exponential operator is not supported by TwinCAT

Double-asterisk ('**') exponential operator is not supported by TwinCAT or the underlying CODESYS, as far as I can tell.

This came in as part of the original grammar that blark is based on as the power_expression rule:

blark/blark/iec.lark

Line 703 in 11d5c30

power_expression: unary_expression ( ELEVATED_BY unary_expression )*

This should be removed from the grammar and relevant test cases should be cleaned as well.

Repeated Variable Declaration in Structure with a Single Structured Data Type Fails to Parse

Looks like blark isn't liking "repeated" variable declarations using a structured type. I just added a test case for this on this branch. Hope that I'll get a chance to dig into it a little more very soon!

TYPE somethingCool :
STRUCT
        AlertTimer, SignalBadTimer, QualityBadTimer : library.TPUDO;
END_STRUCT
END_TYPE

ℹ️ Note:

I updated this description and title to reflect that this is specifically related to structures.

Dereferenced Variables in Function Calls Not Parsing Correctly

Found yet another little oddity while working through some "in-the-wild" ST code. In the segment below, the GetTagName()^ does not parse correctly. There seems to be an error handling that little ^ dereference in this context. 😕

Something := anObject.WriteLine(THIS^.something.t, THIS^.GetTagName()^, INT_TO_STRING(BOOL_TO_INT(THIS^.someAttribute)));

I hope that early next week I can start investigating this more closely!

apischema as a non-optional requirement?

I'm considering making apischema a non-optional requirement.
For those that are unaware, it's what allows blark to easily (de)serialize parsed code to/from JSON directly from the blark.transform dataclasses.
It's currently made optional through a bit of hackery:

blark/blark/transform.py

Lines 26 to 36 in abce326

 try: 

 # NOTE: apischema is an optional requirement; this should work regardless. 

 import apischema 

 from .apischema_compat import as_tagged_union 

 except ImportError: 

 apischema = None 

 def as_tagged_union(cls: Type[T]) -> Type[T]: 

 """No-operation stand-in for when apischema is not available.""" 

 return cls

It's a reasonably light dependency, and it'd make things more convenient were it just a hard requirement.

Any thoughts, objections, emojis?

Array initializer from TwinCAT documentation fails to parse

From https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/2530335371.html&id=

PROGRAM MAIN
VAR
    fbSample  : FB_Sample(nInitParam := 1) := (nInput := 2, nMyProperty := 3);
    aSample   : ARRAY[1..2] OF FB_Sample[(nInitParam := 4), (nInitParam := 7)]
                            := [(nInput := 5, nMyProperty := 6), (nInput := 8, nMyProperty := 9)];
END_VAR
END_PROGRAM

This odd syntax is explained later in the linked documentation page.

Explicit support for base 10/decimal literals

The grammar currently does not have a spot for explicit base 10/decimal literals - only implicit ones of course:

blark/blark/iec.lark

Lines 61 to 65 in 043f20e

 ?any_integer: "2#" BIT_STRING -> binary_integer 

 | "8#" OCTAL_STRING -> octal_integer 

 | "16#" HEX_STRING -> hex_integer 

 | SIGNED_INTEGER -> signed_integer 

 | integer

I think the integer rule could probably turn into:

integer: [ "10#" ] INTEGER

And then we just throw away the base specifier.

Related: does TwinCAT actually support arbitrary bases for integers?

VAR
    someData : ARRAY [1..5, 6..9] OF ARRAY [0 .. 10] OF INT;
END_VAR

Or as Beckhoff documents.... https://infosys.beckhoff.com/english.php?content=../content/1033/tc3_plc_intro/8825255691.html&id=

I think this should be a relatively easy fix (I think). I'll have a PR inbound shortly.

	# TODO: apischema serialization is recursing infinitely on 3.9 and 3.10;
	# need to dig into details and report it (first test that fails is ARRAY-related)
	APISCHEMA_SKIP = sys.version_info[:2] >= (3, 9)

	try:
	# NOTE: apischema is an optional requirement; this should work regardless.
	import apischema

	from .apischema_compat import as_tagged_union
	except ImportError:
	apischema = None

	def as_tagged_union(cls: Type[T]) -> Type[T]:
	"""No-operation stand-in for when apischema is not available."""
	return cls

	?any_integer: "2#" BIT_STRING -> binary_integer
	\| "8#" OCTAL_STRING -> octal_integer
	\| "16#" HEX_STRING -> hex_integer
	\| SIGNED_INTEGER -> signed_integer
	\| integer

klauer / blark Goto Github PK

blark's Introduction

blark's People

Contributors

Stargazers

Watchers

Forkers

blark's Issues

Problem

What's Needed

Goal

Issue

Failing tests as of 263e263

Random thoughts

Goal

Description

Work in Progress:

Goal/Request:

ℹ️ Note:

Recommend Projects

Recommend Topics

Recommend Org

Failing tests as of `263e263`