Giter VIP home page Giter VIP logo

lex's Introduction

foonathan/lex

Build Status

Note: This project is currently WIP, no guarantees are made until an 0.1 release.

This library is a C++14 constexpr tokenization and (in the future) parsing library. The tokens are specified in the type system so they are available at compile-time. With this information a trie is constructed that efficiently matches the input.

Basic Example

The tokens for a simple calculator:

using tokens = lex::token_spec<struct variable, struct plus, struct minus, …>;

struct variable : lex::rule_token<variable, tokens>
{
    static constexpr auto rule() const noexcept
    {
        // variables consists of one or more characters
        return lex::token_rule::plus(lex::ascii::is_alpha);
    }
};

struct plus : lex::literal_token<'+'>
{};

struct minus : lex::literal_token<'-'>
{};

See example/ctokenizer.cpp for an annotated example and tutorial.

Features

  • Declarative token specification: No need to worry about ordering or implementing lexing by hand.
  • Fast: Performance is comparable or faster to a handwritten state machine, see benchmarks.
  • Lightweight: No memory allocation, tokens are just string views into the input.
  • Lazy: The lex::tokenizer will just tokenize the next token in the input.
  • Fully constexpr: The entire lexing can happen at compile-time.
  • Flexible error handling: On invalid input, a lex::error_token is created consuming one characters. The parser can then decide how an error should be handled.

FAQ

Q: Isn't the name lex already taken?

A: It is. That's why the library is called foonathan/lex. In my defense, naming is hard. I could come up with some cute name, but then its not really descriptive. If you know foonathan/lex, you know what the project is about.

Q: Sounds great, but what about compile-time?

A: Compiling the foonathan_lex_ctokenizer target, which contains an implementation of a tokenizer for C (modulo some details), takes under three seconds. Just including <iostream> takes about half a second, including <iostream> and <regex> takes about two seconds. So the compile time is noticeable, but as a tokenizer will not be used in a lot of files of the project and rarely changes, acceptable.

In the future, I will probably look at optimizing it as well.

Q: My lex::rule_token doesn't seem to be matched?

A: This could be due to one of two things:

  • Multiple rule tokens would match the input. Then the tokenizer just picks the one that comes first. Make sure that all rule tokens are mutually exclusive, maybe by using lex::null_token and creating them all in one place at necessary. See int_literal and float_literal in the C tokenizer for an example.
  • A literal token is a prefix of the rule token, e.g. a C comment /* … */ and the / operator are in conflict. By default, the literal token is preferred in that case. Implement is_conflicting_literal() in your rule token as done by the comment token in the C tokenizer.

A mode to test for this issues is planned.

Q: The lex::tokenizer gives me just the next token, how do I implement lookahead for specific tokens?

A: Simple call get() until you've reached the token you want to lookahead, then reset() the tokenizer to the earlier position.

Q: How does it compare to compile-time-regular-expressions?

A: That project implements a RegEx parser at compile-time, which can be used to match strings. foonathan/lex is project is purely designed to tokenize strings. You could implement a tokenizer with the compile-time RegEx but I have choosen a different approach.

Q: How does it compare to PEGTL?

A: That project implements matching parsing expression grammars (PEGs), which are a more powerful RegEx, basically. On top of that they've implemented a parsing interface, so you can create a parse tree, for example. foonathan/lex currently does just tokenization, but I plan on adding parse rules on top of the tokens later on. Complex tokens in foonathan/lex can be described using PEG as well, but here the PEGs are described using operator overloading and functions, and in PEGTL they are described by the type system.

Q: It breaks when I do this!

A: Don't do that. And file an issue (or a PR, I have a lot of other projects...).

Q: This is awesome!

A: Thanks. I do have a Patreon page, so consider checking it out:

Patreon

Documentation

A full reference documentation is WIP, look at the comments in the header files for now.

Compiler Support

The library requires a C++14 compiler with reasonable constexpr support. Compilers that are being tested on CI:

  • Linux:
    • GCC 5 to 8, but compile-time tokenization is not support for GCC < 8 (still works at runtime)
    • clang 4 to 7
  • MacOS:
    • XCode 9 and 10
  • Windows:
    • Visual Studio 2017

Installation

The library is header-only and has only my debug_assert library as dependency.

If you use CMake, debug_assert will be cloned automatically if not installed on the system. You can use it with add_subdirectory() or install it and use find_package(foonathan_lex), then link to foonathan::foonathan_lex and everything will be setup automatically.

Planned Features

  • Parser on top of the tokenizer
  • Integrated way to handle data associated with tokens (like the value of an integer literal)
  • Optimize compile-time

lex's People

Contributors

azure-pipelines[bot] avatar foonathan avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.