
Comments (5)

t-h-e avatar t-h-e commented on August 28, 2024

Wouldn't it be easiest to let regular expressions do the dirty work? I use them in a C# implementation.

With this approach you can have:

  • production separators spread across multiple lines
  • comments at the end of any line
  • single quotes inside double quotes and vice versa
  • any character inside a quoted string, even the separator '|'

Additionally, the code becomes more readable and maintainable, and less error-prone.

Examples of parsing some of my grammars can be found here:

Feel free to improve the regular expressions and give feedback.

from re import finditer, DOTALL, MULTILINE

    # Raw strings keep the regex escapes (\S, \s, \#) intact and avoid invalid-escape warnings.
    ruleregex = r'(?P<rulename><\S+>)\s*::=\s*(?P<production>(?:(?=\#)\#[^\r\n]*|(?!<\S+>\s*::=).+?)+)'
    productionregex = r'(?=\#)(?:\#.*$)|(?!\#)\s*(?P<production>(?:[^\'\"\|\#]+|\'.*?\'|".*?")+)'
    productionpartsregex = r'\ *([\r\n]+)\ *|([^\'"<\r\n]+)|\'(.*?)\'|"(.*?)"|(?P<subrule><[^>|\s]+>)|([<]+)'

    def read_bnf_file(self, file_name):
        with open(file_name, 'r') as bnf:
            content = bnf.read()
            for rule in finditer(self.ruleregex, content, DOTALL):
                if self.start_rule is None:
                    self.start_rule = (rule.group('rulename'), self.NT)
                self.non_terminals[rule.group('rulename')] = {'id': rule.group('rulename'),
                                                              'min_steps': 9999999999999,
                                                              'expanded': False,
                                                              'recursive': True,
                                                              'permutations': None,
                                                              'b_factor': 0}
                tmp_productions = []
                for p in finditer(self.productionregex, rule.group('production'), MULTILINE):
                    if p.group('production') is None or p.group('production').isspace():
                        continue
                    tmp_production = []
                    terminalparts = ''
                    for sub_p in finditer(self.productionpartsregex, p.group('production').strip()):
                        if sub_p.group('subrule'):
                            if terminalparts:
                                symbol = [terminalparts, self.T, 0, False]
                                tmp_production.append(symbol)
                                self.terminals.append(terminalparts)
                                terminalparts = ''
                            tmp_production.append([sub_p.group('subrule'), self.NT])
                        else:
                            terminalparts += ''.join([part for part in sub_p.groups() if part])

                    if terminalparts:
                        symbol = [terminalparts, self.T, 0, False]
                        tmp_production.append(symbol)
                        self.terminals.append(terminalparts)
                    tmp_productions.append(tmp_production)

                if rule.group('rulename') not in self.rules:
                    self.rules[rule.group('rulename')] = tmp_productions
                    if len(tmp_productions) == 1:
                        print("Warning: Grammar contains unit production "
                              "for production rule", rule.group('rulename'))
                        print("       Unit productions consume GE codons.")
                else:
                    raise ValueError("lhs should be unique", rule.group('rulename'))
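To try the three patterns outside the class, here is a minimal standalone sketch. It assumes nothing about the rest of PonyGE2: the function name `parse_bnf`, the sample `GRAMMAR`, and the `'T'`/`'NT'` labels are hypothetical stand-ins for `self.T` and `self.NT`, and the bookkeeping fields (`min_steps`, `b_factor`, and so on) are left out. The patterns themselves are the ones above, written as raw strings.

```python
import re

# The three patterns from the method above, as raw strings.
RULE_RE = r'(?P<rulename><\S+>)\s*::=\s*(?P<production>(?:(?=\#)\#[^\r\n]*|(?!<\S+>\s*::=).+?)+)'
PRODUCTION_RE = r'(?=\#)(?:\#.*$)|(?!\#)\s*(?P<production>(?:[^\'\"\|\#]+|\'.*?\'|".*?")+)'
PARTS_RE = r'\ *([\r\n]+)\ *|([^\'"<\r\n]+)|\'(.*?)\'|"(.*?)"|(?P<subrule><[^>|\s]+>)|([<]+)'

# Hypothetical sample grammar: a trailing comment, a production separator
# on its own line, and a quoted '|' used as a terminal.
GRAMMAR = (
    '<expr> ::= <expr> "+" <term> | <term>  # trailing comment\n'
    "<term> ::= 'x'\n"
    '         | "|"\n'
)


def parse_bnf(text):
    """Return {rule name: [production, ...]}; a production is a list of (symbol, kind) pairs."""
    rules = {}
    for rule in re.finditer(RULE_RE, text, re.DOTALL):
        name = rule.group('rulename')
        if name in rules:
            raise ValueError("lhs should be unique", name)
        productions = []
        for p in re.finditer(PRODUCTION_RE, rule.group('production'), re.MULTILINE):
            body = p.group('production')
            if body is None or body.isspace():
                continue  # comment-only match or leftover whitespace
            symbols, terminal = [], ''
            for part in re.finditer(PARTS_RE, body.strip()):
                if part.group('subrule'):
                    # Flush any terminal text collected before this non-terminal.
                    if terminal:
                        symbols.append((terminal, 'T'))
                        terminal = ''
                    symbols.append((part.group('subrule'), 'NT'))
                else:
                    # Quoted and unquoted pieces are concatenated into one terminal.
                    terminal += ''.join(g for g in part.groups() if g)
            if terminal:
                symbols.append((terminal, 'T'))
            productions.append(symbols)
        rules[name] = productions
    return rules


if __name__ == "__main__":
    for name, productions in parse_bnf(GRAMMAR).items():
        print(name, productions)
```

Note that unquoted whitespace between symbols is kept as part of the adjacent terminal, so `<expr> "+" <term>` yields the terminal `' + '` rather than `'+'`.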

from ponyge2.

mikefenton avatar mikefenton commented on August 28, 2024

Any consensus on these suggestions? James/Dave, what are your thoughts?


dvpfagan avatar dvpfagan commented on August 28, 2024

Stefan's seems good.

On Thursday, 15 September 2016, Michael Fenton [email protected]
wrote:

Any consensus on these suggestions? James/Dave, what are your thoughts?




jmmcd avatar jmmcd commented on August 28, 2024

If we can replace some manual parsing with an RE then we should.


Dr. James McDermott
Lecturer in Business Analytics
Programme Director, MSc in Business Analytics
D209 UCD Michael Smurfit Graduate Business School
College of Business, University College Dublin, Ireland.
Phone +353 1 716 8031
http://jmmcd.net
http://www.ucd.ie/cba/members/jamesmcdermott/


mikefenton avatar mikefenton commented on August 28, 2024

Stefan's RegEx version is now implemented. The old system still exists under the name old_read_bnf_file() and can be removed easily.

