ridencww / goldengine



Java implementation of Devin Cook's GOLD Parser engine

License: Other

Java 99.94% Batchfile 0.06%


goldengine's Issues

The engine does not verify the header of the grammar files.

According to GOLD Parser's documentation, grammar table files begin with a header (the cgt class reads it).

However, the Parser class never checks that header, so a file that is not a valid grammar table is not rejected up front and can cause less obvious errors later.
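
A minimal sketch of such a check, not taken from the engine. It assumes the file starts with a null-terminated UTF-16LE string and that the expected values are "GOLD Parser Tables/v1.0" (CGT) and "GOLD Parser Tables/v5.0" (EGT); both assumptions should be confirmed against the GOLD file format documentation. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class GrammarHeaderCheck {
    // Assumed header strings; verify against the GOLD file format documentation.
    static final String CGT_HEADER = "GOLD Parser Tables/v1.0";
    static final String EGT_HEADER = "GOLD Parser Tables/v5.0";

    /** Returns the null-terminated UTF-16LE string at the start of the data, or null if there is no terminator. */
    static String readHeader(byte[] data) {
        for (int i = 0; i + 1 < data.length; i += 2) {
            if (data[i] == 0 && data[i + 1] == 0) {
                return new String(data, 0, i, StandardCharsets.UTF_16LE);
            }
        }
        return null;
    }

    /** True if the data starts with one of the assumed header strings. */
    static boolean hasKnownHeader(byte[] data) {
        String header = readHeader(data);
        return CGT_HEADER.equals(header) || EGT_HEADER.equals(header);
    }
}
```

A loader could call hasKnownHeader on the file contents and fail with a clear message before attempting to read any records.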

Line comments not working

I'm trying to parse line comments and the following error appears:

Runaway group (no closing group terminator found). Last position line 6, column 9.

It might be related to issue #17.
To temporarily work around that issue, I used the following:
String text = ltsInputString.getSource().replaceAll("\n", "\r");

When I remove the replacement of \n with \r (my workaround for issue 17), the comments work properly, but the errors are reported on the wrong line (issue 17 still occurs).

I'm running on Linux.

Block comments work.

The GRM file contains:

Comment Start = '/*'
Comment End = '*/'
Comment Line = '//'

I don't think the whole grammar file will help.

Parser fails with runaway group error if the last non-empty code line ends with a line comment

If the parsed source file contains a line comment (provided of course that the used grammar specifies line comments) in the last non-empty code line then the parser fails with error.group_runaway, no matter whether the file ends with a newline or not. Even if hundreds of newlines follow, the parsing will fail.

With trailing block comments, the behaviour is correct (if the end symbol is missing, i.e., the group is actually unterminated, then the error duly occurs; otherwise it does not).

The attachment contains a Java grammar and a Java file that together pass the test in GOLDBuilder but fail in the engine with the error described above.
final_line_comment_failure.zip

UTF-8 symbols

I have UTF-8 symbols like this
NOT = [¬]
in my grammar.

If I run the generated Java code like this

java -classpath .;./goldengine-5.0.3-SNAPSHOT prenex_FOLsn ..\Algor.fol -tree

I get

Lexical error at line 5, column 37. Read (Error)
Parse tree is not available. Did you set generateTree(true)?

exactly where ¬ appears in Algor.fol.

What should I do?

Alex
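
The report does not identify the cause. One thing worth ruling out is that the source file is decoded with the platform default charset instead of UTF-8, in which case ¬ would reach the lexer as different characters. The sketch below reads a file explicitly as UTF-8 so the resulting string can be handed to the parser; the class and method names are hypothetical, and this is an assumption about the cause, not a confirmed fix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8SourceReader {
    /** Reads the whole file and decodes it as UTF-8, independent of the platform default charset. */
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```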

BOM (byte order mark) causes parse error

UTF-8 encoded source files with a BOM cause a parsing error. The parser needs to handle the BOM properly.
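
Until the parser handles this itself, a caller can strip the mark before parsing. A minimal sketch, assuming the source has already been decoded into a String (a UTF-8 BOM then appears as the single character U+FEFF at index 0); the class and method names are hypothetical.

```java
final class BomStripper {
    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading byte order mark; other input is returned unchanged. */
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }
}
```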

Is there a built-in infrastructure to associate sourcecode comments to tokens?

For the reverse engineering facility in Structorizer (Nassi-Shneiderman diagram generation from source code parsing) we were interested in being able to associate identified source comments with the closest tokens.
We didn't find such a possibility, though, and wrote our own workaround. Have we missed something? If not, would this be a helpful enhancement?
I have attached the GOLDParser subclass we wrote for this purpose (ignore the proprietary logging mechanism, which has nothing to do with it). It simply produces a hash map Token --> String (the protected field commentMap). In the diagram generator, we then defined a further map Reduction --> Token, derived from all non-terminal entries of the token-comment map. We also defined language-specific sets of production rule IDs as stoppers for the actual association of the retrieved comment strings with meaningful syntactical units (diagram elements), because we had to prevent all comments of substructure elements from also being attached to their containing compound statements. This is of course an application-specific detail, briefly outlined in the class comment.
AuParser.zip

If you decide to integrate the proposed comment retrieval infrastructure, it might be helpful to make the commentMap field public.
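
For readers without the attachment, the general idea of a token-to-comment map can be sketched as follows. This is not the AuParser code: Tok is a hypothetical stand-in for the engine's token type, and attaching each comment to the next non-comment token is a simplification of "closest token".

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class CommentCollector {
    /** Hypothetical stand-in for the engine's token type. */
    static final class Tok {
        final String text;
        final boolean isComment;

        Tok(String text, boolean isComment) {
            this.text = text;
            this.isComment = isComment;
        }
    }

    /** Attaches each run of comment tokens to the next non-comment token in the stream. */
    static Map<Tok, String> attachComments(List<Tok> tokens) {
        // Identity semantics: two distinct tokens with equal text must not share an entry.
        Map<Tok, String> commentMap = new IdentityHashMap<>();
        StringBuilder pending = new StringBuilder();
        for (Tok tok : tokens) {
            if (tok.isComment) {
                if (pending.length() > 0) {
                    pending.append('\n');
                }
                pending.append(tok.text);
            } else if (pending.length() > 0) {
                commentMap.put(tok, pending.toString());
                pending.setLength(0);
            }
        }
        return commentMap;
    }
}
```

Comments at the very end of the stream have no following token and are dropped by this sketch; a real implementation would have to decide where they belong.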

Some grammar tables cannot be loaded

When constructing an instance of the parser, an IllegalStateException is thrown when loading the CGT or EGT file. This can occur when the size of the tables exceeds 128 items.
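
The report does not state the cause. A threshold of exactly 128 often points to a byte being treated as signed when a multi-byte integer is assembled, so the sketch below shows an unsigned 16-bit little-endian read with explicit masking. That this is what went wrong in the engine is an assumption; the class and method names are hypothetical.

```java
final class LittleEndian {
    /** Reads an unsigned 16-bit little-endian integer starting at the given offset. */
    static int readUInt16(byte[] data, int offset) {
        // Java bytes are signed; masking with 0xFF avoids sign extension for values >= 0x80.
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }
}
```

Without the masks, the bytes 0x80 0x00 would combine to -128 instead of 128.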

Abstract Variable ?

Hi Ralph,

Sorry if this is not the right place for this, but I'm new to GitHub.

Just one question: why is the Variable class not an abstract class or an interface? I'm asking because I have the feeling that the implementation of Variable is too limited for me.

For example, Variable uses asDouble(), asInt() and asNumber(), when it would be possible (I think) to use one method for integers and one for doubles, or to leave the choice to the end user of your GoldEngine to use int, Integer, long, Long, BigDecimal, BigInteger or any other type.

Another goal for me is to be able to choose between implementations of Variable: for example, one using only primitive types for fast execution, or one using BigDecimal and BigInteger for high precision, just by switching the variable type.

Regards,
Stef
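
The request could look roughly like the sketch below: an interface with one fast and one precise implementation. All names here (NumericValue, DoubleValue, DecimalValue) are hypothetical and are not part of the engine's Variable class.

```java
import java.math.BigDecimal;

/** Hypothetical abstraction over a numeric variable. */
interface NumericValue {
    NumericValue add(NumericValue other);

    BigDecimal asBigDecimal();
}

/** Fast implementation backed by a primitive double. */
final class DoubleValue implements NumericValue {
    private final double value;

    DoubleValue(double value) {
        this.value = value;
    }

    public NumericValue add(NumericValue other) {
        return new DoubleValue(value + other.asBigDecimal().doubleValue());
    }

    public BigDecimal asBigDecimal() {
        return BigDecimal.valueOf(value);
    }
}

/** Precise implementation backed by BigDecimal. */
final class DecimalValue implements NumericValue {
    private final BigDecimal value;

    DecimalValue(String text) {
        this(new BigDecimal(text));
    }

    DecimalValue(BigDecimal value) {
        this.value = value;
    }

    public NumericValue add(NumericValue other) {
        return new DecimalValue(value.add(other.asBigDecimal()));
    }

    public BigDecimal asBigDecimal() {
        return value;
    }
}
```

With this shape, the calling code depends only on NumericValue, so switching between speed and precision means changing which implementation is constructed.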

Maven doesn't build with examples

When building with Maven, only the goldengine.jar was built, which was confusing because of the documentation. The project always built with Ant, but Maven support was added so the engine.jar could be posted to the Maven Central Repository.

Are updates for newer GOLD Parser versions relevant?

The readme says:

The goldengine for Java is compatible with version 5.0 of the GOLD parsing engine.

http://www.goldparser.org/news/index.htm mentions some changes after 5.0. Are any of them relevant to the Java engine implementation?

Seems like a serious problem

I was writing a simplified version of a C grammar for my program, and I tested it with a couple of lines of code in Golden Parser Generator; it was correct and worked well.

But when I used it in my program and tried to run it with the ridencww engine, it didn't "reduce" some tokens correctly and came up with some errors.

I attached my grammar and my test file so you can analyze them in your engine.

http://www.mediafire.com/download.php?sd4vs0sdcplau1a

Unfortunately I'm running out of time and need to deliver the program to my client soon, so I have to use another engine for now, but I like your engine very much and want to use it in my future projects.

Thank you.

URL encoded path causes problem in loading classes

Hello dear friend,

First, I want to thank you for the handy and clean engine you have shared with us.
I also want to mention a small bug:

The library on my PC (OS: Windows 7) is located at
"D:\Science\Lessons\Compiler\Gold-Grammer\Ralph Iden Engine\JavaLib"

and the spaces in the path result in a corrupted path here:

File: ResourceHelper.java
Function: findClassesInPackage
Code fragment:

        URL resource = resources.nextElement();
        jarFile = getJarFile(resource.toString());  // <-- problem here
        if (jarFile != null) {
            break;
        }
        dirs.add(new File(resource.getFile()));  // <-- and here

The issue is that when you convert a URL to a normal path with getFile() or toString(), some special characters stay in URL-encoded form; a space, for example, remains %20, which produces an incorrect path:

"D:\Science\Lessons\Compiler\Gold-Grammer\Ralph%20Iden%20Engine\JavaLib"

I changed the code slightly, as follows, and it works well now:

        URL resource = resources.nextElement();
        String path = URLDecoder.decode(resource.getFile(), "UTF-8");
        jarFile = getJarFile(path);
        if (jarFile != null) {                
            break;
        }
        dirs.add(new File(path));
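
An alternative to URLDecoder, sketched here under the assumption that the resource is a plain file: URL, is to go through java.net.URI, whose getPath() decodes percent escapes. Unlike URLDecoder.decode, it does not turn a literal '+' in a directory name into a space. The class and method names are hypothetical.

```java
import java.net.URISyntaxException;
import java.net.URL;

final class UrlPaths {
    /**
     * Returns the percent-decoded path of a file: URL.
     * For jar: URLs the URI is opaque and getPath() returns null, so those need separate handling.
     */
    static String decodedPath(URL resource) throws URISyntaxException {
        return resource.toURI().getPath();
    }
}
```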

Problem with the new lines ?

Hi,

I have a strange problem. At first I saw it with my own grammar, but I then adapted the sample2 grammar to be newline-based:

{WS} = {Whitespace} - {CR} - {LF}
Whitespace = {WS}+

NewLine = {CR}{LF}|{CR}

<nl> ::= NewLine <nl> !One or more
| NewLine

Finally, here is the kind of code I used with this grammar:
assign n = 1
while n >= 1 do display n
assign n = n - 1
end
display 'Blast off!'

This code works perfectly with GOLD Parser Builder 5.2, but when I try it with the Iden Java Engine, I get this error:

2013-06-25 23:20:23 ERROR HtmlController:255 - Lexical error at line 1, column 13. Read (Error)
2013-06-25 23:20:23 ERROR HtmlController:256 - assign n = 1
while n >= 1 do display n
assign n = n - 1
end
display 'Blast off!'

I just get "(Error)" without the actual reason for the error.

Is there a mistake in my grammar, or a problem with the Iden Java Engine?

Thank you for your help.

Regards,
Stef
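
One possible explanation, inferred from the report and not confirmed: "assign n = 1" is 12 characters long, so column 13 is the line terminator, and NewLine = {CR}{LF}|{CR} does not match a bare LF. If the input reaches the engine with LF-only line endings, normalizing them before parsing would be one workaround (adding {LF} as a further alternative in the grammar would be another). The sketch below shows the normalization; the class and method names are hypothetical.

```java
final class LineEndings {
    /** Converts CRLF, lone CR and lone LF line endings to CRLF. */
    static String toCrLf(String text) {
        // The CRLF alternative comes first so an existing pair is not doubled.
        return text.replaceAll("\\r\\n|\\r|\\n", "\r\n");
    }
}
```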

Mismatch between the grammar builder's parse tree and the engine's parse tree

Hi @ridencww ,

When I use this grammar with both the builder and the engine, a few of the reductions are missing from the tree produced by the engine parser. Could you please take a look at it? I'm using goldparser for the first time, so please let me know if there is something wrong in my grammar implementation or engine usage.

MANUFACTURER 252,
DEVICE_TYPE 1,
DEVICE_REVISION 1,
DD_REVISION 1,
MANUFACTURER_EXT "xyz"

BLOCK DeviceBlock
{
TYPE PHYSICAL;
NUMBER 1;
}

The grammar is for EDDL (Electronic Device Description Language). The grammar implementation is shown below.

"Name" = 'EDDL grammar'
"Author" = 'Ashwin Jason Fernandes'
"Version" = 'The version of the grammar and/or language'
"About" = 'A short description of the grammar'

"Case Sensitive" = True
"Start Symbol" =

{Hex Digit} = {Digit} + [abcdefABCDEF]
{Oct Digit} = [01234567]

{String Ch} = {Printable} - ["]

{Id Head} = {Letter} + [_]
{Id Tail} = {Id Head} + {Digit}

DecLiteral = [123456789]{digit}*
HexLiteral = 0X | 0x{Hex Digit}+
OctLiteral = 0{Oct Digit}*
FloatLiteral = {Digit}*'.'{Digit}+

Id = {Id Head}{Id Tail}*

! ===================================================================
! Comments
! ===================================================================

Comment Start = '/'
Comment End = '/'
Comment Line = '//'

! -------------------------------------------------
! Character Sets
! -------------------------------------------------

{String Chars} = {Printable} + {HT} - ["]

! -------------------------------------------------
! Terminals
! -------------------------------------------------

!Identifier = {Letter}{AlphaNumeric}*
StringLiteral = '"' {String Chars}* '"'

! -------------------------------------------------
! Constants / Literals
! -------------------------------------------------

::= | StringLiteral

::= | FloatLiteral

::= DecLiteral | HexLiteral | OctLiteral

! -------------------------------------------------
! Rules
! -------------------------------------------------

! The grammar starts below
::= | |

! ===================================================================
! EDD Identification Declaration
! ===================================================================
::= ','
|

::=

::= MANUFACTURER
| DEVICE_TYPE
| DEVICE_REVISION
| DD_REVISION
| MANUFACTURER_EXT

::=
|
|
|
|
|
|

! ===================================================================
! Type Declaration
! ===================================================================

! ===================================================================
! Block Declaration
! ===================================================================

::= BLOCK Id '{' '}'

::= TYPE ';'
| NUMBER';'
|

::= PHYSICAL
| TRANSDUCER
| FUNCTION

! ===================================================================
! Variable Declaration
! ===================================================================

::= VARIABLE Id '{' '}'

::=
|
|
|
|
|
|
|

::= TYPE ';'
| TYPE '(' ')'';'
| TYPE '{''}'
| TYPE '(' ')''{''}'

::= CLASS ';'

::= CONTAINED
| DIAGNOSTIC
| LOCAL

::= LABEL StringLiteral';' | LABEL '['Id']'';'

::= HELP StringLiteral';' | HELP '['Id']'';'

::= CONSTANT_UNIT '['Id']'';' | CONSTANT_UNIT StringLiteral';'

::= '{'','StringLiteral'}'','|'{'','StringLiteral'}'

::= '{'','StringLiteral','StringLiteral',''}'','
| '{'','StringLiteral','StringLiteral',''}'

::= READ_TIMEOUT Id';'
| READ_TIMEOUT ';'

::= HANDLING ';'|HANDLING '&' ';'

::= READ
| WRITE

::= | '&' |

::= HARDWARE
| SOFTWARE
| CORRECTABLE
| UNCORRECTABLE

! ===================================================================
! Array Declaration
! ===================================================================

::= ARRAY Id '{' '}'

::= | TYPE Id ';' | NUMBER_OF_ELEMENTS ';' |

! ===================================================================
! Collection Declaration
! ===================================================================

::= COLLECTION Id '{' '}' | COLLECTION OF VARIABLE Id '{' '}'

::= | MEMBERS '{''}'|

::= Id','Id';'|

! ===================================================================
! List Declaration
! ===================================================================

::= LIST Id '{' '}'

! ===================================================================
! Command Declaration
! ===================================================================

::= COMMAND Id '{' '}'

::= BLOCK Id';'
| INDEX ';'
| NUMBER ';'
| OPERATION ';'
| TRANSACTION '{' '}'
| RESPONSE_CODES '{' '}'
|

::= READ
| WRITE
| COMMAND
| DATA_EXCHANGE

::= REQUEST '{''}' | REPLY '{''}' |

::= '['']' |

::= Id',' | Id |

::= ','','StringLiteral';' |

::= SUCCESS
| MISC_ERROR
| MISC_WARNING
| DATA_ENTRY_ERROR
| MODE_ERROR

::= ','StringLiteral |

! ===================================================================
! Component Declaration
! ===================================================================

::= COMPONENT Id '{' '}'

::=
|
| CAN_DELETE ';'
| CLASSIFICATION ';'
| DECLARATION '{' '}'
| PROTOCOL Id';'

::= TRUE | FALSE

::= NETWORK_COMPONENT

::=
|

The grammar looks a bit funny when you view it. It looks fine in edit mode. If you have any problems, please let me know and I will mail you the grammar.

Regards,
Ashwin

On Linux, the row (line) number doesn't increment for consecutive empty lines.

When I use your engine on Linux, the row (line) number is not incremented when the text contains consecutive empty lines.

For example:

A = 2
 
 
 
 
wrdjsadfklja

Error message: Syntax error at line 2, column 1.

Should be

Syntax error at line 6, column 1.

I suspect the issue is here:

Parser.java (class)

    private void consumeBuffer(int count) {
        if (count > 0 && count <= lookaheadBuffer.length()) {
            // Adjust position
            for (int i = 0; i < count; i++) {
                char c = lookaheadBuffer.charAt(i);
                if (c == 0x0A) {
                    if (sysPosition.getColumn() > 1) {
                        // Increment row if Unix EOLN (LF)
                        sysPosition.incrementLine();
                    }
                } else if (c == 0x0D) {
                    sysPosition.incrementLine();
                } else {
                    sysPosition.incrementColumn();
                }
            }
            // ... (the remainder of the method is not shown in this report)
        }
    }
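
In the quoted code, an LF only advances the line when the column is greater than 1, so every LF that follows another LF is ignored. A minimal sketch of an alternative, not the engine's code: remember whether the previous character was a CR and skip only the LF that completes a CRLF pair. PositionTracker is a hypothetical class standing in for the engine's position handling.

```java
final class PositionTracker {
    private int line = 1;
    private int column = 1;
    private boolean lastWasCr = false;

    /** Advances the position over the given text, treating CR, LF and CRLF each as one line break. */
    void consume(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                newLine();
                lastWasCr = true;
            } else if (c == '\n') {
                if (!lastWasCr) {
                    newLine(); // lone LF (Unix); an LF right after CR was already counted
                }
                lastWasCr = false;
            } else {
                column++;
                lastWasCr = false;
            }
        }
    }

    private void newLine() {
        line++;
        column = 1;
    }

    int line() {
        return line;
    }

    int column() {
        return column;
    }
}
```

For the example above ("A = 2" followed by four empty lines), this puts the next token at line 6, column 1, regardless of which of the three line-ending styles is used.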

jar: prefix stripped away by resource.getFile()

com.creativewidgetworks.goldparser.util.ResourceHelper
In the method findClassesInPackage(String packageName), resource.getFile() strips away the "jar:" prefix.
As a result, getJarFile(String filePath) cannot find "jar:file:/" and does not recognize jar files.

In getJarFile(String filePath), replacing:
filePath = filePath.substring((filePath.indexOf("jar:file:/") + 9), filePath.indexOf('!'));
with:
filePath = filePath.substring( filePath.indexOf("/"), filePath.indexOf('!'));
fixed it for me.
Thanks,
Kevin
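
A sketch of a variant that works whether or not the "jar:" prefix is still present, by stripping the scheme prefixes individually instead of searching for a fixed "jar:file:/" string. JarPaths and jarFilePath are hypothetical names, and percent-encoded characters are not decoded here.

```java
final class JarPaths {
    /**
     * Returns the file-system path of the jar named in a resource URL string,
     * or null if the string does not reference a jar entry (no '!' separator).
     */
    static String jarFilePath(String resourceUrl) {
        int bang = resourceUrl.indexOf('!');
        if (bang < 0) {
            return null;
        }
        String path = resourceUrl.substring(0, bang);
        if (path.startsWith("jar:")) {
            path = path.substring("jar:".length());
        }
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        return path; // keeps the leading '/', so the path stays absolute
    }
}
```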

What is wrong with ParserTest?

First of all, thanks a lot for your work. This is a wonderful library. I am creating a GOLD parser engine for Go, using your codebase as a reference because it has an extensive set of test cases. While porting the codebase, I found that one of the assert statements in ParserTest.java is commented out at line 293, with a comment saying:

 will be null for 1.0 and 0 for 5.0

But when I uncommented it, I saw that the value of groups.size() is 2 instead of 0 or null. So I was wondering: is there an issue or bug behind it?

Also, I am curious whether the fixes for the issues reported by @nimatrueway are in master.

Thanks again for your work.

No rule handler for rule

I get the message

No rule handler for rule ::= Declaration Id sort .

But this handler (RuleHandler13.class) does exist in my handlers folder (\prenex_FOLsn).

What should I do?

Is there a chance to resume the parsing process after an error?

Due to the restrictions of LALR(1) grammars, some GOLD grammars provoke ambiguous situations for totally legal code. It's not always straightforward to tweak the grammar so that the engine can cope; it may not even be possible.
As far as I can see, the engine always stops on detecting an error. As a workaround, the code to be parsed can of course be preprocessed by trial and error, which is very time-consuming for large source files.
In some cases, however, it might be relatively easy to intervene manually and interactively, advising the engine which way to go in order to put the parsing process back on track and resume.
Might there be a chance to allow an embedding application to do this, e.g. to skip a line, assign a token type or decide on a certain reduction among a limited choice, and to resume the parsing process from that very point?
See e.g. these issues for details: fesch/Structorizer.Desktop#470 and fesch/Structorizer.Desktop#472.
Regards, Kay

Resources are not properly loaded from jar files

I have a diff here that shows the change that works for me (I haven't tested it on Windows).

brweber2@ba09761

The leading '/' is stripped off the jar file path making it relative instead of absolute... so the jar file can't be found.

I just added 9 instead of 10 to remove the leading 'jar:file:' instead of 'jar:file:/' but I'm not sure if you'll ever see the double slash in a jar URI?

Interested in Maven support?

I forked your project and added Maven support in a branch. With a little bit of work on your Ant build script, I think we could get this working with both Ant and Maven. Is this something you are interested in?

generateTree(true) conflicts with reduction execution

The rule handlers are not executed if generateTree(true) is set.
Is this intentional for some reason? If not, it would be nice to have both the tree generated and the reductions executed.

My hypothesis is that the following code in the GOLDParser class is the problem:

    /**
     * Base parser builds a tree of Reduction objects
     * Override to process reductions
     * @return Boolean to indicate if processing should stop (true) or continue (false). 
     */
    protected boolean processReduction() {
        if (!generateTree && ruleHandlers.size() > 0) {
            try {
                Reduction reduction = createInstance();
                setCurrentReduction(reduction);
            } catch (Throwable t) {
                addErrorMessage(t.getMessage());
                return true;
            }
        }
        return false;
    }

I think it can be corrected by removing the !generateTree check:

    /**
     * Base parser builds a tree of Reduction objects
     * Override to process reductions
     * @return Boolean to indicate if processing should stop (true) or continue (false). 
     */
    protected boolean processReduction() {
        if ( ruleHandlers.size() > 0) {
            try {
                Reduction reduction = createInstance();
                setCurrentReduction(reduction);
            } catch (Throwable t) {
                addErrorMessage(t.getMessage());
                return true;
            }
        }
        return false;
    }

Documentation

May I ask where to find the documentation for the engine?