lug's Introduction

An embedded domain-specific language for expressing parsers as extended parsing expression grammars (PEGs) in C++17.

Features

  • Natural syntax more akin to external parser generator languages
  • Separation of syntactic and lexical rules, with customizable implicit whitespace skipping
  • Direct and indirect left recursion with precedence levels to disambiguate subexpressions with mixed left/right recursion
  • Traditional PEG syntax has been extended to support attribute grammars
  • Cut operator to commit to currently matched parse prefix and prune all backtrack entries
  • Deferred evaluation of semantic actions, ensuring actions do not execute on failed branches or invalid input
  • Generated parsers are compiled to special-purpose bytecode and executed in a virtual parsing machine
  • UTF-8 text parsing with complete Level 1 and partial Level 2 support of the UTS #18 Unicode Regular Expressions technical standard
  • Automatic line and column tracking with customizable tab width and alignment
  • Uses expression template functors to implement the rules of the domain-specific language
  • Header-only library using C++17 language and library features
  • Relatively small, with the intent that the parser core remain under 1500 lines of terse code
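To make the deferred-action feature concrete, here is a minimal sketch of the general idea: actions are queued while matching, discarded when a branch fails, and run only once the whole parse has succeeded. Everything in it (the `toy` namespace, `action_queue`, `match_pair`, `parse_ab_or_ac`) is a hypothetical illustration and is not lug's API or its internal design.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

namespace toy { // hypothetical names, not part of lug

class action_queue {
public:
    // Remember how many actions were queued before trying an alternative.
    std::size_t mark() const { return pending_.size(); }

    // Backtracking: forget every action queued since the mark.
    void rollback(std::size_t m) { pending_.resize(m); }

    // Queue an action instead of running it immediately.
    void defer(std::function<void()> action) { pending_.push_back(std::move(action)); }

    // The whole parse succeeded: run the surviving actions in order.
    void accept() {
        for (auto& action : pending_) action();
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Builds a label such as "[1:a]" for alternative 1 matching 'a'.
inline std::string tag(int alt, char c) {
    return "[" + std::to_string(alt) + ":" + std::string(1, c) + "]";
}

// Matches exactly `first` then `second`, deferring one action per matched character.
inline bool match_pair(const std::string& input, char first, char second, int alt,
                       action_queue& actions, std::string& trace) {
    if (input.size() != 2 || input[0] != first) return false;
    actions.defer([&trace, alt, first] { trace += tag(alt, first); });
    if (input[1] != second) return false; // the action for `first` is still queued here
    actions.defer([&trace, alt, second] { trace += tag(alt, second); });
    return true;
}

// Ordered choice between "ab" and "ac"; `trace` records which actions actually ran.
inline bool parse_ab_or_ac(const std::string& input, std::string& trace) {
    action_queue actions;
    const std::size_t saved = actions.mark();
    bool ok = match_pair(input, 'a', 'b', 1, actions, trace);
    if (!ok) {
        actions.rollback(saved); // drop actions queued by the failed branch
        ok = match_pair(input, 'a', 'c', 2, actions, trace);
    }
    if (!ok) return false;       // nothing executes on invalid input
    actions.accept();
    return true;
}

} // namespace toy
```

With the input `ac`, the first alternative queues an action for `a` and then fails, so only the actions of the second alternative are ever executed.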

It is based on research introduced in the following papers:

Bryan Ford, Parsing expression grammars: a recognition-based syntactic foundation, Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, p.111-122, January 2004

Sérgio Medeiros et al., A parsing machine for PEGs, Proceedings of the 2008 symposium on Dynamic Languages, p.1-12, July 2008

Kota Mizushima et al., Packrat parsers can handle practical grammars in mostly constant space, Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, p.29-36, June 2010

Sérgio Medeiros et al., Left recursion in Parsing Expression Grammars, Science of Computer Programming, v.96 n.P2, p.177-190, December 2014

Leonardo Reis et al., The formalization and implementation of Adaptable Parsing Expression Grammars, Science of Computer Programming, v.96 n.P2, p.191-210, December 2014

Sérgio Medeiros et al., A parsing machine for parsing expression grammars with labeled failures, Proceedings of the 31st Annual ACM symposium on Applied Computing, p.1960-1967, April 2016

Building

As a header-only library, lug itself does not need to be built. Simply ensure the lug header directory is in your include path and you're good to go.

As a baseline, the following compiler versions are known to work with lug.

Compiler                                        Language Mode
Clang 5.0.0 (September 2017)                    -std=c++17 or -std=gnu++17
GCC 7.1.0 (May 2017)                            -std=c++17 or -std=gnu++17
Microsoft Visual C++ 2017 15.5 (December 2017)  Platform Toolset: Visual Studio 2017 Toolset (v141), Language Standard: ISO C++17 Standard (/std:c++17)
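
If you want a translation unit to fail early when the compiler is not in one of the language modes listed above, a compile-time guard along these lines may help. This is only a sketch: the macro name `TOY_CPP_VERSION` is made up for illustration and is not something lug defines. MSVC keeps `__cplusplus` at `199711L` unless `/Zc:__cplusplus` is passed, so the guard reads `_MSVC_LANG` there instead.

```cpp
// Hypothetical guard macro; not part of lug.
#if defined(_MSVC_LANG)
#  define TOY_CPP_VERSION _MSVC_LANG   // reflects the /std: switch on MSVC
#else
#  define TOY_CPP_VERSION __cplusplus  // reflects -std= on GCC and Clang
#endif

static_assert(TOY_CPP_VERSION >= 201703L,
              "C++17 mode is required (for example -std=c++17 or /std:c++17)");
```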

To build the sample programs and unit tests, a makefile is provided for Linux and BSD platforms and a Visual Studio solution is available for use on Windows.

Syntax Reference

Operator            Syntax
Sequence            e1 > e2
Ordered Choice      e1 | e2
Zero-or-More        *e
One-or-More         +e
Optional            ~e
Positive Lookahead  &e
Negative Lookahead  !e

Terminal     Description
chr(c)       Matches the UTF-8, UTF-16, or UTF-32 character c
chr(c1, c2)  Matches characters in the UTF-8, UTF-16, or UTF-32 interval [c1-c2]
str(s)       Matches the sequence of characters in a string
bre(s)       POSIX Basic Regular Expression (BRE)
any          Matches any single character
any(flags)   Matches a character exhibiting any of the character properties
all(flags)   Matches a character with all of the character properties
none(flags)  Matches a character with none of the character properties
eps          Matches the empty string
eoi          Matches the end of the input sequence
eol          Matches a Unicode line-ending
nop          No operation, does not emit any instructions
cut          Emits a cut operation into the stream of semantic actions without matching

Literal  Name                                   Description
_cx      Character Expression                   Matches the UTF-8, UTF-16, or UTF-32 character literal
_sx      String Expression                      Matches the sequence of characters in a string literal
_rx      Regular Expression                     POSIX Basic Regular Expression (BRE)
_icx     Case Insensitive Character Expression  Same as _cx but case insensitive
_isx     Case Insensitive String Expression     Same as _sx but case insensitive
_irx     Case Insensitive Regular Expression    Same as _rx but case insensitive
_scx     Case Sensitive Character Expression    Same as _cx but case sensitive
_ssx     Case Sensitive String Expression       Same as _sx but case sensitive
_srx     Case Sensitive Regular Expression      Same as _rx but case sensitive
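
The operator spellings in the first table work because C++ lets a library overload `>`, `|`, and the unary operators for its own types, and their built-in precedence (unary, then `>`, then `|`) happens to suit grammar notation. The following is a minimal, self-contained sketch of that idea using plain closures over byte strings. The `toy` namespace and everything in it are assumptions made for illustration: lug itself uses expression templates compiled to bytecode, and none of these names are lug's API. Positive lookahead is left out of the sketch because it can be written as `!!e`.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <utility>

namespace toy { // hypothetical names, not part of lug

// A successful match yields the position just past the matched text.
using result = std::optional<std::size_t>;

struct expr {
    std::function<result(std::string_view, std::size_t)> run;
};

// Terminal: a single byte in the inclusive range [lo, hi].
inline expr chr(char lo, char hi) {
    return {[lo, hi](std::string_view in, std::size_t pos) -> result {
        if (pos < in.size() && lo <= in[pos] && in[pos] <= hi) return pos + 1;
        return std::nullopt;
    }};
}

// Terminal: exactly one byte.
inline expr chr(char c) { return chr(c, c); }

// Sequence: e1 > e2
inline expr operator>(expr a, expr b) {
    return {[a = std::move(a), b = std::move(b)](std::string_view in, std::size_t pos) -> result {
        if (auto mid = a.run(in, pos)) return b.run(in, *mid);
        return std::nullopt;
    }};
}

// Ordered choice: e1 | e2 (e2 is tried only if e1 fails)
inline expr operator|(expr a, expr b) {
    return {[a = std::move(a), b = std::move(b)](std::string_view in, std::size_t pos) -> result {
        if (auto end = a.run(in, pos)) return end;
        return b.run(in, pos);
    }};
}

// Zero-or-more: *e (greedy, never fails)
inline expr operator*(expr e) {
    return {[e = std::move(e)](std::string_view in, std::size_t pos) -> result {
        while (auto next = e.run(in, pos)) {
            if (*next == pos) break; // stop if e matched the empty string
            pos = *next;
        }
        return pos;
    }};
}

// One-or-more: +e is e followed by *e
inline expr operator+(expr e) { return e > *e; }

// Optional: ~e
inline expr operator~(expr e) {
    return {[e = std::move(e)](std::string_view in, std::size_t pos) -> result {
        if (auto next = e.run(in, pos)) return next;
        return pos;
    }};
}

// Negative lookahead: !e succeeds without consuming input when e fails
inline expr operator!(expr e) {
    return {[e = std::move(e)](std::string_view in, std::size_t pos) -> result {
        if (e.run(in, pos)) return std::nullopt;
        return pos;
    }};
}

// True when e matches the entire input.
inline bool matches(const expr& e, std::string_view in) {
    const result end = e.run(in, 0);
    return end && *end == in.size();
}

} // namespace toy
```

With these definitions, a decimal number can be written as `~(chr('-') | chr('+')) > +digit > ~(chr('.') > +digit)` where `digit` is `chr('0', '9')`. Because choice is ordered, `chr('a') | chr('a') > chr('b')` commits to the first alternative and does not match all of `ab`.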

TODO

  • parser error recovery
  • add an interactive processing mode flag to input sources?
  • handle exceptions thrown from semantic actions in semantics::accept?
  • feature: symbol tables and parsing conditions
  • feature: Adams-Nestra grammars and whitespace alignment
  • feature: syntax to specify number range of allowed iteration
  • optimization: tail recursion
  • optimization: reduce number of false-positive left-recursive calls even further by lazily evaluating rule mandate
  • optimization: additional instructions (test_char, test_any, test_range, test_class)
  • more samples, testing, and bug fixing
  • increase compiler warning level and fix any issues
  • documentation
