
participle's Introduction

A dead simple parser package for Go


V2

This is version 2 of Participle.

It can be installed with:

$ go get github.com/alecthomas/participle/v2@latest

The latest version from v0 can be installed via:

$ go get github.com/alecthomas/participle@latest

Introduction

The goal of this package is to provide a simple, idiomatic and elegant way of defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go programmer who has used the encoding/json package: struct field tags define what and how input is mapped to those same fields. This is not unusual for Go encoders, but is unusual for a parser.

Tutorial

A tutorial is available, walking through the creation of an .ini parser.

Tag syntax

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

Field string `@Ident @("," Ident)*`

However, this does not coexist well with other tags such as JSON, etc. and may cause issues with linters. If this is an issue then you can use the parser:"" tag format. In this case single quotes can be used to quote literals making the tags somewhat easier to write, eg.

Field string `parser:"@Ident (',' Ident)*" json:"field"`

Overview

A grammar is an annotated Go structure that both defines the parser grammar and is the AST output by the parser. As an example, the following is the final INI parser from the tutorial.

type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Float  *float64 `| @Float`
  Int    *int     `| @Int`
}

Note: Participle also supports named struct tags (eg. Hello string `parser:"@Ident"`).

A parser is constructed from a grammar and a lexer:

parser, err := participle.Build[INI]()

Once constructed, the parser is applied to input to produce an AST:

ast, err := parser.ParseString("", "size = 10")
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Int: &10}},
//   },
// }

Grammar syntax

Participle grammars are defined as tagged Go structures. Participle will first look for tags in the form parser:"...". It will then fall back to using the entire tag body.

The grammar format is:

  • @<expr> Capture expression into the field.
  • @@ Recursively capture using the field's own type.
  • <identifier> Match named lexer token.
  • ( ... ) Group.
  • "..." or '...' Match the literal (note that the lexer must emit tokens matching this literal exactly).
  • "...":<identifier> Match the literal, specifying the exact lexer token type to match.
  • <expr> <expr> ... Match expressions.
  • <expr> | <expr> | ... Match one of the alternatives. Each alternative is tried in order, with backtracking.
  • ~<expr> Match any token that is not the start of the expression (eg: @~";" matches anything but the ; character into the field).
  • (?= ... ) Positive lookahead group - requires the contents to match further input, without consuming it.
  • (?! ... ) Negative lookahead group - requires the contents not to match further input, without consuming it.

The following modifiers can be used after any expression:

  • * Expression can match zero or more times.
  • + Expression must match one or more times.
  • ? Expression can match zero or once.
  • ! Require a non-empty match (this is useful with a sequence of optional matches eg. ("a"? "b"? "c"?)!).

Notes:

  • Each struct is a single production, with each field applied in sequence.
  • @<expr> is the mechanism for capturing matches into the field.
  • If a struct field is not keyed with "parser", the entire struct tag will be used as the grammar fragment. This allows the grammar syntax to remain clear and simple to maintain.

Capturing

Prefixing any expression in the grammar with @ will capture matching values for that expression into the corresponding field.

For example:

// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}

For slice and string fields, each instance of @ will accumulate into the field (including repeated patterns). Accumulation into other types is not supported.

For integer and floating point types, a successful capture will be parsed with strconv.ParseInt() and strconv.ParseFloat() respectively.

A successful capture match into a bool field will set the field to true.

Tokens can also be captured directly into fields of type lexer.Token and []lexer.Token.

Custom control of how values are captured into fields can be achieved by a field type implementing the Capture interface (Capture(values []string) error).

Additionally, any field implementing the encoding.TextUnmarshaler interface will be capturable too. One caveat is that UnmarshalText() will be called once for each captured token, so eg. @(Ident Ident Ident) will be called three times.

Capturing boolean value

By default, a boolean field is used to indicate that a match occurred, which turns out to be much more useful and common in Participle than parsing true or false literals. For example, parsing a variable declaration with a trailing optional syntax:

type Var struct {
  Name string `"var" @Ident`
  Type string `":" @Ident`
  Optional bool `@"?"?`
}

In practice this gives more useful ASTs. If bool were to be parsed literally then you'd need to have some alternate type for Optional such as string or a custom type.

To capture literal boolean values such as true or false, implement the Capture interface like so:

type Boolean bool

func (b *Boolean) Capture(values []string) error {
	*b = values[0] == "true"
	return nil
}

type Value struct {
	Float  *float64 `  @Float`
	Int    *int     `| @Int`
	String *string  `| @String`
	Bool   *Boolean `| @("true" | "false")`
}
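Since Capture is an ordinary method, it can be exercised directly, independent of the parser. A self-contained sketch (repeating the Boolean type from above):

```go
package main

import "fmt"

type Boolean bool

// Capture receives the raw matched token text and decides
// how to store it in the field.
func (b *Boolean) Capture(values []string) error {
	*b = values[0] == "true"
	return nil
}

func main() {
	var b Boolean
	if err := b.Capture([]string{"true"}); err != nil {
		panic(err)
	}
	fmt.Println(b) // true
}
```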

"Union" types

A very common pattern in parsers is "union" types, an example of which is shown above in the Value type. A common way of expressing this in Go is via a sealed interface, with each member of the union implementing this interface.

eg. this is how the Value type could be expressed in this way:

type Value interface { value() }

type Float struct { Value float64 `@Float` }
func (f Float) value() {}

type Int struct { Value int `@Int` }
func (f Int) value() {}

type String struct { Value string `@String` }
func (f String) value() {}

type Bool struct { Value Boolean `@("true" | "false")` }
func (f Bool) value() {}

Thanks to the efforts of Jacob Ryan McCollum, Participle now supports this pattern. Simply construct your parser with the Union[T](member...T) option, eg.

parser := participle.MustBuild[AST](participle.Union[Value](Float{}, Int{}, String{}, Bool{}))

Custom parsers may also be defined for union types with the ParseTypeWith option.

Custom parsing

There are three ways of defining custom parsers for nodes in the grammar:

  1. Implement the Capture interface.
  2. Implement the Parseable interface.
  3. Use the ParseTypeWith option to specify a custom parser for union interface types.

Lexing

Participle relies on distinct lexing and parsing phases. The lexer takes raw bytes and produces tokens which the parser consumes. The parser transforms these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go text/scanner package and thus produces tokens for C/Go-like source code. This is surprisingly useful, but if you do require more control over lexing, the included stateful lexer (participle/lexer) should cover most other cases. If that in turn is not flexible enough, you can implement your own lexer.

Configure your parser with a lexer using the participle.Lexer() option.

To use your own Lexer you will need to implement two interfaces: Definition (and optionally StringsDefinition and BytesDefinition) and Lexer.
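To get a feel for what a hand-rolled lexer involves, here is a stdlib-only sketch of a Next()-style tokenizer over a string. Participle's real Definition and Lexer interfaces differ in detail (token types, positions, io.Reader input), so treat this purely as an illustration, not as the library's API.

```go
package main

import (
	"fmt"
	"unicode"
)

// A toy Next()-style lexer. Participle's real Lexer returns
// lexer.Token values with positions; this sketch returns only
// the raw text of each whitespace-separated token.
type toyLexer struct {
	input string
	pos   int
}

// Next returns the next token and false once input is exhausted.
func (l *toyLexer) Next() (string, bool) {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++ // skip spaces
	}
	if l.pos >= len(l.input) {
		return "", false // EOF
	}
	start := l.pos
	for l.pos < len(l.input) && !unicode.IsSpace(rune(l.input[l.pos])) {
		l.pos++
	}
	return l.input[start:l.pos], true
}

func main() {
	l := &toyLexer{input: "let x = 1"}
	for tok, ok := l.Next(); ok; tok, ok = l.Next() {
		fmt.Println(tok)
	}
}
```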

Stateful lexer

In addition to the default lexer, Participle includes an optional stateful/modal lexer which provides powerful yet convenient construction of most lexers. (Notably, indentation based lexers cannot be expressed using the stateful lexer -- for discussion of how these lexers can be implemented, see #20).

It is sometimes the case that a simple lexer cannot fully express the tokens required by a parser. The canonical example of this is interpolated strings within a larger language. eg.

let a = "hello ${name + ", ${last + "!"}"}"

This is impossible to tokenise with a normal lexer due to the arbitrarily deep nesting of expressions. To support this case Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state name. Each rule within the state includes the name of the produced token, the regex to match, and an optional operation to apply when the rule matches.

As a convenience, any Rule starting with a lowercase letter will be elided from output, though it is recommended to use participle.Elide() instead, as it better integrates with the parser.

Lexing starts in the Root group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.

A state change can be introduced with the Action Push(state). Pop() will return to the previous state.

To reuse rules from another state, use Include(state).

A special named rule Return() can also be used as the final rule in a state to always return to the previous state.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs. See the tests for an example of this, among others.

Example stateful lexer

Here's a cut down example of the string interpolation described above. Refer to the stateful example for the corresponding parser.

var lexer = lexer.Must(Rules{
	"Root": {
		{`String`, `"`, Push("String")},
	},
	"String": {
		{"Escaped", `\\.`, nil},
		{"StringEnd", `"`, Pop()},
		{"Expr", `\${`, Push("Expr")},
		{"Char", `[^$"\\]+`, nil},
	},
	"Expr": {
		Include("Root"),
		{`whitespace`, `\s+`, nil},
		{`Oper`, `[-+/*%]`, nil},
		{"Ident", `\w+`, nil},
		{"ExprEnd", `}`, Pop()},
	},
})

Example simple/non-stateful lexer

Other than the default and stateful lexers, it's easy to define your own stateless lexer using the lexer.MustSimple() and lexer.NewSimple() functions. These functions accept a slice of lexer.SimpleRule{} objects consisting of a key and a regex-style pattern.

Note: The stateful lexer replaces the old regex lexer.

For example, the lexer for a form of BASIC:

var basicLexer = lexer.MustSimple([]lexer.SimpleRule{
    {"Comment", `(?i)rem[^\n]*`},
    {"String", `"(\\"|[^"])*"`},
    {"Number", `[-+]?(\d*\.)?\d+`},
    {"Ident", `[a-zA-Z_]\w*`},
    {"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`},
    {"EOL", `[\n\r]+`},
    {"whitespace", `[ \t]+`},
})

Experimental - code generation

Participle v2 now has experimental support for generating code to perform lexing.

This will generally provide around a 10x improvement in lexing performance while producing O(1) garbage.

To use:

  1. Serialize the stateful lexer definition to a JSON file (pass the lexer definition to json.Marshal).
  2. Run the participle command (see scripts/participle) to generate go code from the lexer JSON definition. For example:
participle gen lexer <package name> [--name SomeCustomName] < mylexer.json | gofmt > mypackage/mylexer.go

(see genLexer in conformance_test.go for a more detailed example)

  3. When constructing your parser, use the generated lexer for your lexer definition, such as:
var ParserDef = participle.MustBuild[someGrammar](participle.Lexer(mylexer.SomeCustomnameLexer))

Consider contributing to the tests in conformance_test.go if they do not appear to cover the types of expressions you are using with the generated lexer.

Known limitations of the code generated lexer:

  • The lexer is always greedy. e.g., the regex "[A-Z][A-Z][A-Z]?T" will not match "EST" in the generated lexer, because the ? operator matches greedily and does not "give back" to try other possibilities. You can work around this with alternation if you need a non-greedy match: e.g., "[A-Z][A-Z]|(?:[A-Z]T|T)" produces correct results in both lexers (see #276 for more detail). This limitation is what allows the generated lexer to be very fast and memory efficient.
  • Backreferences in regular expressions are not currently supported

Options

The Parser's behaviour can be configured via Options.

Examples

There are several examples included, some of which are linked directly here. These examples should be run from the _examples subdirectory within a cloned copy of this repository.

  • BASIC: A lexer, parser and interpreter for a rudimentary dialect of BASIC.
  • EBNF: A parser for the form of EBNF used by Go.
  • Expr: A basic mathematical expression parser and evaluator.
  • GraphQL: A lexer and parser for GraphQL schemas.
  • HCL: A parser for the HashiCorp Configuration Language.
  • INI: An INI file parser.
  • Protobuf: A full Protobuf version 2 and 3 parser.
  • SQL: A very rudimentary SQL SELECT parser.
  • Stateful: A basic example of a stateful lexer and corresponding parser.
  • Thrift: A full Thrift parser.
  • TOML: A TOML parser.

Included below is a full GraphQL lexer and parser:

package main

import (
	"fmt"
	"os"

	"github.com/alecthomas/kong"
	"github.com/alecthomas/repr"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

type File struct {
	Entries []*Entry `@@*`
}

type Entry struct {
	Type   *Type   `  @@`
	Schema *Schema `| @@`
	Enum   *Enum   `| @@`
	Scalar string  `| "scalar" @Ident`
}

type Enum struct {
	Name  string   `"enum" @Ident`
	Cases []string `"{" @Ident* "}"`
}

type Schema struct {
	Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
	Name       string   `"type" @Ident`
	Implements string   `( "implements" @Ident )?`
	Fields     []*Field `"{" @@* "}"`
}

type Field struct {
	Name       string      `@Ident`
	Arguments  []*Argument `( "(" ( @@ ( "," @@ )* )? ")" )?`
	Type       *TypeRef    `":" @@`
	Annotation string      `( "@" @Ident )?`
}

type Argument struct {
	Name    string   `@Ident`
	Type    *TypeRef `":" @@`
	Default *Value   `( "=" @@ )`
}

type TypeRef struct {
	Array       *TypeRef `(   "[" @@ "]"`
	Type        string   `  | @Ident )`
	NonNullable bool     `( @"!" )?`
}

type Value struct {
	Symbol string `@Ident`
}

var (
	graphQLLexer = lexer.MustSimple([]lexer.SimpleRule{
		{"Comment", `(?:#|//)[^\n]*\n?`},
		{"Ident", `[a-zA-Z]\w*`},
		{"Number", `(?:\d*\.)?\d+`},
		{"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`},
		{"Whitespace", `[ \t\n\r]+`},
	})
	parser = participle.MustBuild[File](
		participle.Lexer(graphQLLexer),
		participle.Elide("Comment", "Whitespace"),
		participle.UseLookahead(2),
	)
)

var cli struct {
	EBNF  bool     `help:"Dump EBNF."`
	Files []string `arg:"" optional:"" type:"existingfile" help:"GraphQL schema files to parse."`
}

func main() {
	ctx := kong.Parse(&cli)
	if cli.EBNF {
		fmt.Println(parser.String())
		ctx.Exit(0)
	}
	for _, file := range cli.Files {
		r, err := os.Open(file)
		ctx.FatalIfErrorf(err)
		ast, err := parser.Parse(file, r)
		r.Close()
		repr.Println(ast)
		ctx.FatalIfErrorf(err)
	}
}

Performance

One of the included examples is a complete Thrift parser (shell-style comments are not supported). This provides a convenient baseline for comparison against the PEG-based pigeon, which is the parser used by go-thrift. Note that the pigeon parser is generated ahead of time, while the participle parser is built at run time.

You can run the benchmarks yourself, but here's the output on my machine:

BenchmarkParticipleThrift-12    	   5941	   201242 ns/op	 178088 B/op	   2390 allocs/op
BenchmarkGoThriftParser-12      	   3196	   379226 ns/op	 157560 B/op	   2644 allocs/op

On a real life codebase of 47K lines of Thrift, Participle takes 200ms and go-thrift takes 630ms, which aligns quite closely with the benchmarks.

Concurrency

A compiled Parser instance can be used concurrently. A LexerDefinition can be used concurrently. A Lexer instance cannot be used concurrently.

Error reporting

There are a few areas where Participle can provide useful feedback to users of your parser.

  1. Errors returned by Parser.Parse*() will be:
    1. Of type Error, containing positional information where available.
    2. Either a ParseError or a lexer.Error.
  2. Participle will make a best effort to return as much of the AST up to the error location as possible.
  3. Any node in the AST containing a field Pos lexer.Position [1] will be automatically populated from the nearest matching token.
  4. Any node in the AST containing a field EndPos lexer.Position [1] will be automatically populated from the token at the end of the node.
  5. Any node in the AST containing a field Tokens []lexer.Token will be automatically populated with all tokens captured by the node, including elided tokens.

These related pieces of information can be combined to provide fairly comprehensive error reporting.

Comments

Comments can be difficult to capture as in most languages they may appear almost anywhere. There are three ways of capturing comments, with decreasing fidelity.

The first is to elide tokens in the parser, then add Tokens []lexer.Token as a field to each AST node. Comments will be included. This has the downside that there's no straightforward way to know where the comments are relative to non-comment tokens in that node.

The second way is to not elide comment tokens, and explicitly capture them at every location in the AST where they might occur. This has the downside that unless you place these captures in every possible valid location, users might insert valid comments that then fail to parse.

The third way is to elide comment tokens and capture them where they're semantically meaningful, such as for documentation comments. Participle supports explicitly matching elided tokens for this purpose.

Limitations

Internally, Participle is a recursive descent parser with backtracking (see UseLookahead(K)).

Among other things, this means that Participle grammars do not support left recursion. Left recursion must be eliminated by restructuring your grammar.

EBNF

The old EBNF lexer was removed in a major refactoring at 362b26. If you have an EBNF grammar you need to implement, you can either translate it into regex-style lexer.Rule{} syntax or implement your own EBNF lexer -- the old EBNF lexer might be useful as a starting point.

Participle supports outputting an EBNF grammar from a Participle parser. Once the parser is constructed simply call String().

Participle also includes a parser for this form of EBNF (naturally).

eg. The GraphQL example produces the following EBNF:

File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .

Syntax/Railroad Diagrams

Participle includes a command-line utility to take an EBNF representation of a Participle grammar (as returned by Parser.String()) and produce a Railroad Diagram using tabatkins/railroad-diagrams.

Here's what the GraphQL grammar looks like:

EBNF Railroad Diagram

Footnotes

  1. Either the concrete type or a type convertible to it, allowing user defined types to be used.


participle's Issues

Several examples don't compile

When trying to understand why my own parser doesn't work, I tried building the examples to better interact with something that should work. However, several don't build:

# github.com/alecthomas/participle/_examples/basic
./main.go:33:26: not enough arguments in call to participle.UseLookahead
        have ()
        want (int)
# github.com/alecthomas/participle/_examples/ini
./main.go:44:88: cannot use iniLexer (type lexer.Definition) as type string in argument to participle.Unquote
# github.com/alecthomas/participle/_examples/expr2
./main.go:71:66: not enough arguments in call to participle.UseLookahead
        have ()
        want (int)

Would it make sense to build the examples as part of CI to ensure they stay up-to-date when there are changes?

Bug in sql parser

I was just playing with the examples and noticed that the sql parser is a little broken:

$ go run src/github.com/alecthomas/participle/_examples/sql/main.go "select * from foo where user = 1 + 2 + 3"
main: error: <source>:1:38: unexpected token "+"
exit status 1

@@ panics if struct has no fields

package main

import "github.com/alecthomas/participle"

type Grammar struct {
	Foo Empty `@@`
}

type Empty struct {
}

func main() {
	_ = participle.MustBuild(&Grammar{}, nil)
}
panic: reflect: Field index out of bounds [recovered]
	panic: reflect: Field index out of bounds [recovered]
	panic: reflect: Field index out of bounds [recovered]
	panic: reflect: Field index out of bounds [recovered]
	panic: reflect: Field index out of bounds

goroutine 1 [running]:
github.com/alecthomas/participle.recoverToError(0xc42004df30)
	/home/tv/go/src/github.com/alecthomas/participle/nodes.go:50 +0x142
panic(0x4ac0c0, 0x4e2810)
	/home/tv/src/go-1.10/src/runtime/panic.go:502 +0x229
github.com/alecthomas/participle.decorate(0x4a11e9, 0x7)
	/home/tv/go/src/github.com/alecthomas/participle/nodes.go:37 +0x2c8
panic(0x4ac0c0, 0x4e2810)
	/home/tv/src/go-1.10/src/runtime/panic.go:502 +0x229
github.com/alecthomas/participle.decorate(0x49c9b9, 0x3)
	/home/tv/go/src/github.com/alecthomas/participle/nodes.go:37 +0x2c8
panic(0x4ac0c0, 0x4e2810)
	/home/tv/src/go-1.10/src/runtime/panic.go:502 +0x229
github.com/alecthomas/participle.decorate(0x49e825, 0x5)
	/home/tv/go/src/github.com/alecthomas/participle/nodes.go:37 +0x2c8
panic(0x4ac0c0, 0x4e2810)
	/home/tv/src/go-1.10/src/runtime/panic.go:502 +0x229
reflect.(*structType).Field(0x4b4a20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/tv/src/go-1.10/src/reflect/type.go:1231 +0x1f7
reflect.(*rtype).Field(0x4b4a20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/tv/src/go-1.10/src/reflect/type.go:966 +0x8c
github.com/alecthomas/participle.lexStruct(0x4e3620, 0x4b4a20, 0xc42004d868)
	/home/tv/go/src/github.com/alecthomas/participle/struct.go:19 +0x52
github.com/alecthomas/participle.(*generatorContext).parseType(0xc42008a020, 0x4e3620, 0x4b4a20, 0x0, 0x0)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:43 +0x2f7
github.com/alecthomas/participle.(*generatorContext).parseCapture(0xc42008a020, 0xc4200822d0, 0xc420094020, 0x1)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:128 +0xfe
github.com/alecthomas/participle.(*generatorContext).parseTerm(0xc42008a020, 0xc4200822d0, 0xc420094020, 0x1)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:102 +0x18c
github.com/alecthomas/participle.(*generatorContext).parseSequence(0xc42008a020, 0xc4200822d0, 0x2, 0x425aa0)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:79 +0xa6
github.com/alecthomas/participle.(*generatorContext).parseExpression(0xc42008a020, 0xc4200822d0, 0x49c9b9, 0x3)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:59 +0x84
github.com/alecthomas/participle.(*generatorContext).parseType(0xc42008a020, 0x4e3620, 0x4a6f20, 0x0, 0x0)
	/home/tv/go/src/github.com/alecthomas/participle/grammar.go:45 +0x38c
github.com/alecthomas/participle.Build(0x4a6f20, 0x56f788, 0x4e2ea0, 0x56f600, 0x0, 0x0, 0x0)
	/home/tv/go/src/github.com/alecthomas/participle/parser.go:40 +0xce
github.com/alecthomas/participle.MustBuild(0x4a6f20, 0x56f788, 0x0, 0x0, 0xc42007e058)
	/home/tv/go/src/github.com/alecthomas/participle/parser.go:21 +0x49
main.main()
	/home/tv/go/src/eagain.net/2018/learn-participle/bug.go:13 +0x45
exit status 2

Recognize longest valid token with regexp lexer

The regexp lexer currently returns the first valid token (based on the order of the regexp alternatives), rather than the longest valid token.

For instance, the input source i32 is lexed as the two tokens "i" and "32" rather than the single, longer token "i32", given the following lexer regexp:

	def, err := lexer.Regexp(
		`(?P<Ident>[a-z_]+)` +
			`|(?P<IntType>i\d+)` +
			`|(\s+)` +
			`|(?P<Number>\d+)`,
	)

The same input source is lexed as "i32" given the following lexer regexp:

	def, err := lexer.Regexp(
		`(?P<IntType>i\d+)` +
			`|(?P<Ident>[a-z_]+)` +
			`|(\s+)` +
			`|(?P<Number>\d+)`,
	)

Case insensitive literals

It would be nice if literals could be matched case-insensitively. For now if I write

OrCondition struct {
	And []*Condition `@@ { "AND" @@ }`
}

the literal AND is matched with uppercase word AND only, but not and, which will be not matched.

Document concurrency guarantees

It doesn't seem that there is anywhere in the documentation telling whether it is okay to use a single parser object to parse concurrently in different goroutines.

Using chroma for lexers?

It would be awesome to have an adapter to use any chroma lexer as a lexer here. This would make creating quick parsers for known languages much straightforward.

Column count in parse error is wrong

Sample code to parse JavaScript-like === and == operators.

package main

import (
	"fmt"
	"os"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
	"github.com/alecthomas/participle/lexer/ebnf"
)

type Prog struct {
	Left  string `@Integer`
	Op    string `(@EqEqEq | @EqEq)`
	Right int    `@Integer`
}

var parser = participle.MustBuild(
	&Prog{},
	participle.Lexer(lexer.Must(ebnf.New(`
				EqEqEq = "===" .
				EqEq = "==" .
				Integer = "0" | "1"…"9" { digit } .
				Whitespace = " " | "\t" | "\n" | "\r" .
				Punct = "!"…"/" | ":"…"@" | "["…`+"\"`\""+` | "{"…"~" .
				digit = "0"…"9" .
`))),
	participle.Elide("Whitespace"),
)

func Parse(src string) (*Prog, error) {
	var p Prog
	if err := parser.ParseString(src, &p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	got, err := Parse("10 ==! 10")
	if err != nil {
		fmt.Printf("%s\n", err)
		os.Exit(1)
	}
	fmt.Printf("%v %v %v\n", got.Left, got.Op, got.Right)
}

Current behavior

<source>:1:9: unexpected "!" (expected <integer>)

Expected behavior

<source>:1:6: unexpected "!" (expected <integer>)

The column count 9 seems to be incorrect.

Multiline strings - what should I do?

I have got a string like this one:

Hey! How are you?

https://google.it

{
  "Click here!" - "https://github.com"
  "Don't click here!" - "https://facebook.com"
}
{
  "Hey!" - "https://twitter.com"
}

And I would like to get a struct like this:

{
    Text: "Hey! How are you?\n\nhttps://google.it"
    Keyboard: [
        [
            {
                Text: "Click here!"
                URL: "https://github.com"
            }
            {
                Text: "Don't click here!"
                URL: "https://facebook.com"
            }
        ]
        [
            {
                Text: "Hey!"
                URL: "https://twitter.com"
            }
        ]
    ]
}

I'm using this piece of code at the moment:

type Button struct {
	Text string `parser:"@String \"-\""`
	URL  string `parser:"@String"`
}

type Keyboard struct {
	Rows []*Row `parser:"{ @@ }"`
}

type Row struct {
	Buttons []*Button `parser:"\"{\" ( @@ )* \"}\""`
}

type Template struct {
	Text     string    `parser:"@@"`
	Keyboard *Keyboard `parser:"@@"`
}

Thank you so much!

Export EBNF

It would be very handy to be able to export an EBNF representation of the parsed grammar, for documentation purposes.

Request: streaming parser support

Hi,

I have relatively big compressed xml files with different schemas. And i used antlr with java for this purpose and worked well. I was wondering is participle also suitable for this ? For example can i iterate through particular xml elements while parsing?

Thanks

Is it possible to parse this (simpler) version of SQL Select with participle?

I am trying to use participle to parse a simpler subset of SQL Select. Like in PostgreSQL, the table alias does not need the "as" word - so all the following are valid for the version of SQL I want to parse:

select * from mytable m
select * from mytable as m where id > 1
select * from mytable m where id > 1
select * from mytable where id > 1

I started with the example in the repo - and modified it with the diff below. With that version, the last statement above is not parsed. Is there any way to solve this with participle? Is this grammar not left recursive?

diff --git a/main.go b/main.go
index 554c19a..5f53ecc 100644
--- a/main.go
+++ b/main.go
@@ -18,14 +18,9 @@ func (b *Boolean) Capture(values []string) error {
 
 // Select based on http://www.h2database.com/html/grammar.html
 type Select struct {
-       Top        *Term             `"SELECT" [ "TOP" @@ ]`
-       Distinct   bool              `[  @"DISTINCT"`
-       All        bool              ` | @"ALL" ]`
-       Expression *SelectExpression `@@`
+       Expression *SelectExpression `"SELECT" @@`
        From       *From             `"FROM" @@`
-       Limit      *Expression       `[ "LIMIT" @@ ]`
-       Offset     *Expression       `[ "OFFSET" @@ ]`
-       GroupBy    *Expression       `[ "GROUP" "BY" @@ ]`
+       Limit      *Value            `[ "LIMIT" @@ ]`
 }
 
 type From struct {
@@ -34,10 +29,8 @@ type From struct {
 }
 
 type TableExpression struct {
-       Table  string        `( @Ident { "." @Ident }`
-       Select *Select       `  | "(" @@ ")"`
-       Values []*Expression `  | "VALUES" "(" @@ { "," @@ } ")")`
-       As     string        `[ "AS" @Ident ]`
+       Table string `@Ident { "." @Ident }`
+       As    string `( "AS"? @Ident )?`
 }
 
 type SelectExpression struct {

Panic on empty disjunction node

I came across this one today.

When there is an empty node in a disjunction, it adds nil to the nodes slice, which causes the following panic because a == nil in the code below:
panic: runtime error: invalid memory address or nil pointer dereference

if value, err := a.Parse(branch, parent); err != nil {

Example:

type Foo struct {
	Field string `parser:"@( \"foo\" | | \"bar\" | \"baz\" )"`
}

I'm not sure if you would want that syntax to generate a build error or for the nil node to be ignored at runtime. I'd be happy to submit a PR for either option.

How to write all optional but at least one necessary

I'm implementing the CSS Selectors grammar (https://www.w3.org/TR/selectors-4/#grammar) with participle.
But I'm stuck trying to implement compound-selector.

<compound-selector> = [ <type-selector>? <subclass-selector>*
                        [ <pseudo-element-selector> <pseudo-class-selector>* ]* ]!

The trailing ! means compound-selector requires at least one of its optional parts to be present. Without that constraint the parser goes into an infinite loop. Is it possible to implement this with participle?
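As far as I know participle has no direct equivalent of the CSS `!` (non-empty) combinator, so one workable approach (a sketch with hypothetical field names, not a tested grammar) is to make every branch optional in the grammar and enforce the at-least-one constraint after parsing:

```go
package main

import (
	"errors"
	"fmt"
)

// CompoundSelector mirrors the CSS production with all parts optional;
// the participle tags would be something like `@Ident?`, `@@*`, etc.
type CompoundSelector struct {
	Type       string   // <type-selector>?
	Subclasses []string // <subclass-selector>*
	Pseudos    []string // pseudo-element/class pairs
}

// Validate enforces the trailing "!": at least one component must be
// present. Rejecting the empty match also prevents the enclosing
// repetition from looping forever on zero-width matches.
func (c *CompoundSelector) Validate() error {
	if c.Type == "" && len(c.Subclasses) == 0 && len(c.Pseudos) == 0 {
		return errors.New("compound-selector must contain at least one component")
	}
	return nil
}

func main() {
	fmt.Println((&CompoundSelector{Type: "div"}).Validate()) // <nil>
	fmt.Println((&CompoundSelector{}).Validate())            // the at-least-one error
}
```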

Make tags to be reflect.StructTag.Get compatible

It would be useful to have the possibility of using a tag syntax compatible with reflect.StructTag.Get, i.e.

type Cmds struct {
   Cmds []Cmd     `parse: "{ @@ }"`
}

It would not interfere with other tag-based features. It would be nice if this could be an option, parallel to the current syntax.

Fix [ ] and { }

Neither currently works correctly. For example, this works:

@(("-" Number) | Number)

But this does not:

@(["-"] Number)

This should not be.

why not "==" works?

I'm trying to parse a comparison node as follows:

type pCmp struct {
	Left *pSum		`@@`
	Op string		`( @("==" | "!=" | ">=" | ">" | "<=" | "<" )`
	Right *pSum		`  @@ )?`
}

It fails to parse "1 == 2". I searched the examples and found that expr2 does something similar, but implemented in a different way:

type Comparison struct {
	Addition *Addition   `@@`
	Op       string      `[ @( ">" | ">" "=" | "<" | "<" "=" )`
	Next     *Comparison `  @@ ]`
}

So I changed my code to this style, turning "==" into "=" "=", and now it works for "1 == 2", but it also parses "1 = = 2" successfully, which is unexpected.

So, if "==" is invalid, then how should I write a parser which accept "1 == 2" but reject "1 = = 2"?

Go1.13 golang.org/cl/161199 changes text/scanner error message affecting lexer

Heads up, https://golang.org/cl/161199 updates text/scanner error messages and breaks the assumption in the following code --

https://github.com/alecthomas/participle/blob/master/lexer/text_scanner.go#L56

$ go test ./...
ok      github.com/alecthomas/participle        0.067s
--- FAIL: TestLexSingleString (0.00s)
    require.go:794:
                Error Trace:    text_scanner_test.go:35
                Error:          Received unexpected error:
                                <source>:1:14: invalid char literal
                Test:           TestLexSingleString
FAIL
FAIL    github.com/alecthomas/participle/lexer  0.033s
ok      github.com/alecthomas/participle/lexer/ebnf     0.045s
ok      github.com/alecthomas/participle/lexer/ebnf/internal    0.019s

I can send a PR to update the error message comparison or someone else can come up with a better way of not relying on the error message string.

Document Lexer Rules

Depending on the lexical order of the EBNF rules, what is and is not a valid parse changes.

Now to a degree this makes sense, but most implementations resolve this ahead of time.

But for example if I have an EBNF like

(. changed to ; for github syntax highlighting)

num = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
Version = { num } [ "." { num } [ "." { num } ] ] ;

letter = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" ;
package_name = letter | num | "_" | "-" | "." ;
PackageID = letter { package_name } ;

This will correctly parse for @Version and @PackageID

But if I add the rule:

NameID = ID ":" ID ":"  Version ;

Anything tagged as @PackageID breaks because it doesn't contain a : character.

Using interface as type

Hi, first of all - thank you for creating such a great parser @alecthomas!

I wonder if it would be possible to support interfaces, e.g:

type Example struct {
  Value interface{} `@Anything`
}

Currently this returns an error: unsupported field type interface {} for field Anything

recover functions are not handling TypeAssertionError

During a recent crash participle was generating panics like this:

go test                                                                                                       
panic: interface conversion: interface {} is *runtime.TypeAssertionError, not string                                                                                                       
                                                                                                                                                                                        
goroutine 1 [running]:                                                                                                                                                                     
www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle.MustBuild(0x566660, 0xc42005e740, 0x5b83c0, 0xc42000a1c0, 0x1)                                                 
        /home/mic/projects/go/src/www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle/parser.go:23 +0x80                                                              
exit status 2                                                                                                                       

This is very hard to debug because the error was an interface conversion and the backtrace did not indicate where the crash actually occurred. It turns out that this is caused by liberal use of recover() in the code base, which catches panics and re-raises them. The interface conversion error occurs in two places:

func decorate(name string) {                                                                                                                                                               
        if msg := recover(); msg != nil {                                                                                                                                                  
                panic(name + ": " + msg.(string))    <---                                                                                                                                          
        }                                                                                                                                                                                  
}  

And

                defer func() {                                                                                                                                                             
                        if msg := recover(); msg != nil {                                                                                                                                  
                                panic(slexer.Field().Name + ": " + msg.(string))  <----                                                                                                           
                        }                                                                                                                                                                  
                }()                                                                                                                                                                        

So the actual error was caused by these error handlers trying to convert the message to a string. I think the proper way in Go to get a string from an error is to call its Error() method.

Coming to Go from Python and C++, this seems like very poor practice: exception handlers should only catch the types of exceptions they are expecting and let other exceptions bubble up (in Python, a catch-all except Exception: is a well-known code smell). Catching and re-raising all exceptions kills the backtrace and makes it hard to debug actual problems. Please consider at least capturing the backtrace in the error handler, or only adding these handlers optionally.

Removing these error handlers gives a much better panic message (it would have saved me a lot of time tracking down the actual problem :-)):

panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                 
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x51c940]                                                                                                                    
                                                                                                                                                                                        
goroutine 1 [running]:                                                                                                                                                                     
www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle.(*generatorContext).parseSequence(0xc42004dea0, 0xc42006c4b0, 0x1, 0x4295d0)                                   
        /home/mic/projects/go/src/www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle/grammar.go:92 +0x190                                                            
www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle.(*generatorContext).parseExpression(0xc42004dea0, 0xc42006c4b0, 0xc42006c4b0, 0xc4200aa098)                    
        /home/mic/projects/go/src/www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle/grammar.go:59 +0x84                                                             
www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle.(*generatorContext).parseType(0xc42004dea0, 0x5b8520, 0x5513a0, 0x0, 0x0)                                      
        /home/mic/projects/go/src/www.velocidex.com/golang/vfilter/vendor/github.com/alecthomas/participle/grammar.go:46 +0x2c7                               

Floats and Ints

I have a file containing lines like these:

A paid $30.80 for snacks.
B paid $70 for house-cleaning.
C paid $63.50 for utilities.

And this is my grammar

type Grammar struct {
	Expenses []*Expense `{ @@ }`
}

type Expense struct {
	Name string `@Ident "paid"`
	Amount *Value `@@`
}

type Value struct {
	Float   *float64 `  "$" @Float {@Ident} "."`
	Integer *int     `| "$" @Int {@Ident} "."`
}

While parsing the file, I get expected ( Float:Float ) not "70".

What's missing? What am I doing wrong?
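A likely cause: both alternatives start with "$", and without lookahead the parser commits to the first branch as soon as "$" matches, then fails when it sees an Int token instead of Float. Hoisting the common prefix out of the alternation, e.g. a hypothetical `"$" ( @Float | @Int )`, avoids the premature commitment. A stdlib-only sketch of the same factoring:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseAmount factors out the common "$" prefix first, then parses the
// remainder as a number, mirroring a grammar of the hypothetical form
// `"$" ( @Float | @Int )` where the alternation no longer shares a prefix.
func ParseAmount(tok string) (float64, error) {
	if !strings.HasPrefix(tok, "$") {
		return 0, fmt.Errorf("expected $ prefix in %q", tok)
	}
	// ParseFloat accepts both the Float form (30.80) and the Int form (70).
	return strconv.ParseFloat(tok[1:], 64)
}

func main() {
	f, _ := ParseAmount("$30.80")
	n, _ := ParseAmount("$70")
	fmt.Println(f, n) // 30.8 70
}
```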

invalid token [ (hard bracket)

This might be a stupid question, but here goes.

I'm trying to parse PlainMusic as described here, using the "mary had a little lamb" example at the bottom of the page.

My code looks like this:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
)

type PlainMusic struct {
	Instruments []Instrument `@@*`
}

type Instrument struct {
	Name          string `("[instrument: " @Ident "]")?`
	Clef          rune   `("[clef: " @Ident "]")?`
	Key           string `("[key: " @Ident "]")?`
	Transposition int    `("[transpose: " @Num "]")?`
}

var (
	myLexer = lexer.Must(lexer.Regexp(`
		(?P<Ident>[a-zA-Z][a-zA-Z0-9 ]*)
		(?P<Num>\-?\d*)
		| (?P<Comment>#.*(\r\n|\n))
	`))

	parser = participle.MustBuild(&PlainMusic{},
		participle.Lexer(myLexer),
		participle.Elide("Comment"),
	)
)

func main() {
	pm := &PlainMusic{}
	file, err := os.Open("mary had a little lamb.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	err = parser.Parse(file, pm)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(pm)
}

When I run it I get this error:
15:29:05 mary had a little lamb.txt:1:1: invalid token '['

What am I missing to make this work?
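The lexer regexp only defines Ident, Num and Comment patterns, so the literal characters `[`, `]`, `:` and `!` in the input never match any rule and the lexer reports `invalid token '['`: every byte of the input must be covered by some pattern. A punctuation group along these lines (an assumption about the intended grammar, not a tested fix) covers them:

```go
package main

import (
	"fmt"
	"regexp"
)

// lexPattern extends the reporter's rules with Punct and Whitespace
// groups so that '[', ']', ':' and '!' become tokens instead of lexer
// errors.
var lexPattern = regexp.MustCompile(
	`(?P<Punct>[\[\]:!])` +
		`|(?P<Num>-?\d+)` +
		`|(?P<Ident>[a-zA-Z][a-zA-Z0-9 ]*)` +
		`|(?P<Comment>#[^\n]*)` +
		`|(?P<Whitespace>\s+)`)

func main() {
	// Every byte of the line now belongs to some token.
	fmt.Printf("%q\n", lexPattern.FindAllString("[instrument: piano]", -1))
}
```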

Please release a new version

Hi, thanks for participle, I enjoy it a lot.

Given the latest improvements can be of great help (for me #62 was helpful), would you mind releasing a new version?

Thanks in advance

Broken examples

Example code inside _examples isn't compatible with the newer API, which requires a lexer instance to be passed to participle.Build().

Running this example returns the error:

./ex.go:71:33: not enough arguments in call to participle.Build
	have (*TOML)
	want (interface {}, lexer.Definition)

Lexer panics for unterminated single quoted strings

Here's a minimal example that causes a panic for me with the current version of Participle.

package main

import (
  "fmt"
  "github.com/alecthomas/participle"
)

type Field struct {
  Name  string `@String`
  Value string `":" @String`
}

type Dict struct {
  Fields []*Field `[ "{" [ @@ { "," @@ } ] "}" ]`
}

func Parse(src string) (out map[string]string, err error) {
  parser, err := participle.Build(&Dict{})
  if err != nil {
    return nil, err
  }
  d := &Dict{}
  err = parser.ParseString(src, d)
  if err != nil {
    return nil, err
  }
  out = make(map[string]string)
  for _, f := range d.Fields {
    out[f.Name] = f.Value
  }
  return out, nil
}

func main() {
  out, err := Parse("{ 'foo")
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(out)
}
panic: <source>:1:7: literal not terminated

goroutine 1 [running]:
github.com/alecthomas/participle/lexer.Lex.func1(0xc00009e100, 0x4e5232, 0x16)
	/home/acb/dev/participle/lexer/text_scanner.go:56 +0x19c
text/scanner.(*Scanner).error(0xc00009e100, 0x4e5232, 0x16)
	/usr/lib/golang/src/text/scanner/scanner.go:327 +0x2be
text/scanner.(*Scanner).scanString(0xc00009e100, 0xc000000027, 0x495316)
	/usr/lib/golang/src/text/scanner/scanner.go:476 +0xae
text/scanner.(*Scanner).scanChar(0xc00009e100)
	/usr/lib/golang/src/text/scanner/scanner.go:501 +0x33
text/scanner.(*Scanner).Scan(0xc00009e100, 0x586a9b)
	/usr/lib/golang/src/text/scanner/scanner.go:599 +0x363
github.com/alecthomas/participle/lexer.(*textScannerLexer).Next(0xc00000a780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/acb/dev/participle/lexer/text_scanner.go:89 +0x69
github.com/alecthomas/participle/lexer.(*lookaheadLexer).Next(0xc000080630, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/acb/dev/participle/lexer/peek.go:36 +0x161
github.com/alecthomas/participle.newRewinder(0x7fa34edfd048, 0xc000080630, 0x4c8900, 0x4c74a0, 0x595f00)
	/home/acb/dev/participle/context.go:84 +0x89
github.com/alecthomas/participle.newParseContext(0x7fa34edfd048, 0xc000080630, 0x1, 0xc000080660, 0xc000080630, 0x0, 0x0)
	/home/acb/dev/participle/context.go:25 +0x39
github.com/alecthomas/participle.(*Parser).Parse(0xc0000221e0, 0x4fbe00, 0xc00000a760, 0x4b8680, 0xc00000a740, 0xc00000e201, 0xc00000a740)
	/home/acb/dev/participle/parser.go:137 +0x43c
github.com/alecthomas/participle.(*Parser).ParseString(0xc0000221e0, 0x4e292e, 0x6, 0x4b8680, 0xc00000a740, 0xc0000221e0, 0x0)
	/home/acb/dev/participle/parser.go:218 +0x93
main.Parse(0x4e292e, 0x6, 0x404f30, 0xc00007e058, 0x0)
	/home/acb/dev/participle/_examples/bug_mwe/main.go:20 +0xbd
main.main()
	/home/acb/dev/participle/_examples/bug_mwe/main.go:32 +0x3a
exit status 2

This seems to fix the issue. Would you accept a pull request?

> git checkout lexer-panic
Switched to branch 'lexer-panic'
> go run _examples/bug_mwe/main.go
<source>:1:7: literal not terminated

Calculate follow-set automatically from multiple production alternatives

First of all, thanks a lot for sharing participle with the world! I felt very happy to discover what feels like a very novel approach to parser generation, parser libraries, etc.

I took a stab at rewriting a grammar for LLVM IR to use participle, but reached the following stumbling block. I thought I'd reach out and ask whether it is intended by design (to keep the parser simple), whether I've done something wrong, or otherwise whether we could resolve it so that the follow set of a token is calculated from all of the production alternatives present. In this case, the follow set of "target" should be {"datalayout", "triple"}.

I wish to write the grammar as example 2, but have only gotten example 1 to work so far. Any ideas?

Cheers :)
/u

Input

LLVM IR input source:

source_filename = "foo.c"
target datalayout = "bar"
target triple = "baz"

Example 1

Grammar:

type Module struct {
	Decls []*Decl `{ @@ }`
}

type Decl struct {
	SourceFilename string      `  "source_filename" "=" @String`
	TargetSpec     *TargetSpec `| "target" @@`
}

type TargetSpec struct {
	DataLayout   string `  "datalayout" "=" @String`
	TargetTriple string `| "triple" "=" @String`
}

Example run:

u@x220 ~/D/g/s/g/m/low> low a.ll 
&main.Module{
    Decls: {
        &main.Decl{
            SourceFilename: "foo.c",
            TargetSpec:     (*main.TargetSpec)(nil),
        },
        &main.Decl{
            SourceFilename: "",
            TargetSpec:     &main.TargetSpec{DataLayout:"bar", TargetTriple:""},
        },
        &main.Decl{
            SourceFilename: "",
            TargetSpec:     &main.TargetSpec{DataLayout:"", TargetTriple:"baz"},
        },
    },
}

Example 2

Grammar:

type Module struct {
	Decls []*Decl `{ @@ }`
}

type Decl struct {
	SourceFilename string `  "source_filename" "=" @String`
	DataLayout     string `| "target" "datalayout" "=" @String`
	TargetTriple   string `| "target" "triple" "=" @String`
}

Example run:

u@x220 ~/D/g/s/g/m/low> low a.ll
2017/08/27 21:15:38 a.ll:3:7: expected ( "datalayout" ) not "triple"

Error parsing into slice of Parseable

I've run into an issue trying to parse into a slice of Parseable.

In the code below, the checks are against rt, but if you change them to t, then it works as expected. I have tested this with a Parseable string type and a Parseable struct type, as well as running the full test suite, and everything passes. Given the explicit rt := t line though, I figured there might be some history to check with you on before submitting a PR.

participle/grammar.go

Lines 26 to 37 in ed80074

func (g *generatorContext) parseType(t reflect.Type) (_ node, returnedError error) {
	rt := t
	t = indirectType(t)
	if n, ok := g.typeNodes[t]; ok {
		return n, nil
	}
	if rt.Implements(parseableType) {
		return &parseable{rt.Elem()}, nil
	}
	if reflect.PtrTo(rt).Implements(parseableType) {
		return &parseable{rt}, nil
	}

Test Code

package main

import (
	"fmt"
	"strconv"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
)

type Bar struct {
	A int
}

func (x *Bar) Parse(lex lexer.PeekingLexer) error {
	token, err := lex.Next()
	if err != nil {
		return err
	}
	x.A, err = strconv.Atoi(token.Value)
	return err
}

type Foo struct {
	Bars []Bar `parser:"@@+"`
}

func main() {
	foo := &Foo{}
	parser := participle.MustBuild(foo)

	err := parser.ParseString("1 2 3", foo)
	fmt.Printf("Got error=%v value=%+v", err, foo)
}

Results

Expected

Got error=<nil> value=&{Bars:[{A:1} {A:2} {A:3}]}

Actual

panic: Bars: can not parse into empty struct main.Bar

Proposal: Order independent parsing

I stumbled across a minor issue which might be worth further discussion.

I think it would be good to have a possibility to declare substructs as order independent.

Use Case

I have a language influenced by hcl, but with support for nested blocks with special names. The idea is to support an arbitrary order for those sub-blocks, but have each block appear either zero or one time.

Solution proposal

Introduce a new parser primitive, analogous to { } for repetition (e.g. using < >), which defines a set of alternatives that can be parsed order-independently.

Use tag on structure

I'm trying to use encoding/json with its tags to change the attribute names of my structure, but the parser complains that the json tag is not recognized.
How can I escape the json tag?

Stack overflow in recursive definitions

Participle generates a nice error message when it fails to parse the input. However, an infinite recursion may occur when converting a node to a string if the node is recursively defined.

Here is an example to reproduce it:

package main

import (
	"strings"

	"github.com/alecthomas/participle"
)

type Grammar struct {
	Expr *Expr `"START" @@`
}

type Expr struct {
	Identifier *string `  @Ident`
	SubExpr    *Expr   `| "(" @@ ")"`
}

func main() {
	parser := participle.MustBuild(&Grammar{}, nil)

	root := &Grammar{}

	// BOOM!
	parser.Parse(strings.NewReader(`START 1`), root)
}
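The usual fix for stringifying recursive structures is to track visited nodes or bound the recursion depth while printing; a stdlib-only sketch of the depth-bounding approach (not participle's actual stringer):

```go
package main

import "fmt"

// Expr is a recursively defined node, like the grammar in the report.
type Expr struct {
	Identifier string
	SubExpr    *Expr
}

// Render stringifies an Expr but bounds the recursion depth, so a
// cyclic or deeply nested node produces "..." instead of overflowing
// the stack.
func Render(e *Expr, depth int) string {
	if e == nil {
		return ""
	}
	if depth <= 0 {
		return "..."
	}
	if e.SubExpr != nil {
		return "(" + Render(e.SubExpr, depth-1) + ")"
	}
	return e.Identifier
}

func main() {
	cyclic := &Expr{}
	cyclic.SubExpr = cyclic // a cycle: naive recursion would never return
	fmt.Println(Render(cyclic, 3)) // (((...)))
}
```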

Production orders and weird bug

Hi,

I'm currently having some weird issues in my current project.
I have reduced the problem to this extremely simple minimal use case (which still doesn't work):

package main

import (
	"log"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
	"github.com/alecthomas/participle/lexer/ebnf"
	"github.com/alecthomas/repr"
)

type Query struct {
	Number string ` @MyNumber `
}

func main() {
	queryLexer := lexer.Must(ebnf.New(`
		MyLabel = "0" .
		MyNumber = "0" .
	`))
	queryParser := participle.MustBuild(
		&Query{},
		participle.Lexer(queryLexer),
	)

	ast := &Query{}
	err := queryParser.ParseString("0", ast)
	if err != nil {
		log.Println(err)
	} else {
		repr.Println(ast)
	}
}

On execution :

$ go run main.go
2018/11/18 18:26:57 <source>:1:1: expected <mynumber> but got "0"

The weird thing is, when you swap MyLabel and MyNumber declaration in ebnf, it works.

	queryLexer := lexer.Must(ebnf.New(`
		MyNumber = "0" .
		MyLabel = "0" .
	`))
$ go run main.go
&main.Query{
  Number: "0",
}

This obviously leads to much more annoying issues when processing larger and more complex EBNF grammars.
Am I missing something about participle usage or EBNF declarations?

Best Regards,

Multiline-Value

Hi. I have a text file with this structure:

Name            foo
Surname         bar
Text            Lorem ipsum dolor sit amet, consetetur sadipscing eli
                tr, sed diam nonumy eirmod tempor invidunt ut labore 
                et dolore magna aliquyam erat, sed diam voluptua. At 
                vero eos et accusam et justo duo dolores et ea rebum

Name            bar
Surname         foo
Text            Stet clita kasd gubergren, no sea takimata sanctus es
                t Lorem ipsum dolor sit amet. Lorem ipsum dolor sit a
                met, consetetur sadipscing elitr, sed diam nonumy eir
                mod tempor invidunt ut labore et dolore magna aliquya
                m erat, sed diam voluptua.

Any ideas how to parse this file?
Thank you!
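One pragmatic approach is to fold continuation lines before (or instead of) grammar-based parsing: a line starting with whitespace continues the previous value, which matches the sample above where wrapped words are split mid-word. A stdlib sketch of that folding step:

```go
package main

import (
	"fmt"
	"strings"
)

// FoldContinuations joins lines that begin with whitespace onto the
// previous line, so each logical "Key Value" record becomes one line
// that a simple grammar (or strings.Fields) can then split.
func FoldContinuations(input string) []string {
	var records []string
	for _, line := range strings.Split(input, "\n") {
		if line == "" {
			continue
		}
		if (line[0] == ' ' || line[0] == '\t') && len(records) > 0 {
			// Continuation: append with no separator, since the sample
			// wraps in the middle of words.
			records[len(records)-1] += strings.TrimLeft(line, " \t")
			continue
		}
		records = append(records, line)
	}
	return records
}

func main() {
	text := "Name  foo\nText  Lorem ipsum dolor sit am\n      et, consetetur"
	for _, r := range FoldContinuations(text) {
		fmt.Println(r)
	}
}
```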

Document how to parse indentation based grammars

Hi, I want to check whether the previous char is a :. The exact example code I have is this:

Prefix: !

Basically, I want to check whether the previous char is a :; if so, the ! should be captured as a string. Later I want to reuse the same look-back logic to check whether I am inside an array, defined as in YAML, to make sure the last two lines of the following example don't end up in the array:

Commands:
    - Trigger: afk
      Response: "{{user}} just went afk"
    - Trigger: [join_event]
      Response: "Welcome to the Room {{last_joined_user}}"
Homeserver: matrix.org
HomeRoom: {{name}}

(Commands defines an array with two objects in it, each containing two key-value pairs. Homeserver and HomeRoom are outside of that array.)
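The standard technique for indentation-sensitive (YAML-like) grammars is a lexer preprocessing pass that converts leading whitespace into synthetic INDENT/DEDENT tokens, which the grammar can then treat like braces. A stdlib sketch of that pass, operating on per-line indent widths:

```go
package main

import "fmt"

// IndentTokens converts per-line indent widths into INDENT/DEDENT
// tokens using a stack of open indentation levels, the classic
// Python-style approach. Input is the indent width of each line.
func IndentTokens(widths []int) []string {
	stack := []int{0}
	var toks []string
	for _, w := range widths {
		for w < stack[len(stack)-1] {
			stack = stack[:len(stack)-1]
			toks = append(toks, "DEDENT")
		}
		if w > stack[len(stack)-1] {
			stack = append(stack, w)
			toks = append(toks, "INDENT")
		}
		toks = append(toks, "LINE")
	}
	for len(stack) > 1 { // close any levels still open at EOF
		stack = stack[:len(stack)-1]
		toks = append(toks, "DEDENT")
	}
	return toks
}

func main() {
	// Mirrors: Commands: / indented entries / Homeserver: back at column 0.
	fmt.Println(IndentTokens([]int{0, 4, 4, 0}))
}
```

With DEDENT emitted when the indent drops back to column 0, the Homeserver and HomeRoom lines naturally fall outside the Commands array.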

reflect.Value.Addr of unaddressable value

I'm trying to write a very simple SQL parser. I wanted to start from scratch and I am looking at the sql example. I'm trying to parse this and keep getting a panic.

The issue seems to occur when I want to parse zero or more comma-separated columns, such as select a,b from c.

Any ideas would be appreciated! This is a great library.

panic: reflect.Value.Addr of unaddressable value [recovered]
	panic: reflect.Value.Addr of unaddressable value [recovered]
	panic: reflect.Value.Addr of unaddressable value [recovered]
	panic: reflect.Value.Addr of unaddressable value

Test

func TestSelect(t *testing.T) {
	sql := "select a from b"
	query := &Query{}
	err := Parser.ParseString(sql, query)
	if err != nil {
		t.Error(err)
	}
	repr.Println(query, repr.Indent("  "), repr.OmitEmpty(true))
}

Query:


var (
	sqlLexer = lexer.Must(lexer.Regexp(`(\s+)` +
		`|(?P<Keyword>(?i)SELECT|FROM|INSERT|WHERE|INTO|TRUE|FALSE|NULL)` +
		`|(?P<Ident>[a-zA-Z_][a-zA-Z0-9_]*)` +
		`|(?P<Number>[-+]?\d*\.?\d+([eE][-+]?\d+)?)` +
		`|(?P<String>'[^']*'|"[^"]*")` +
		`|(?P<Operators><>|!=|<=|>=|[-+*/%,.()=<>])`,
	))
	Parser = participle.MustBuild(
		&Query{},
		participle.Lexer(sqlLexer),
		participle.Unquote(sqlLexer, "String"),
		participle.Upper(sqlLexer, "Keyword"),
		// Need to solve left recursion detection first, if possible.
		// participle.UseLookahead(),
	)
)

type Query struct {
	Select *Select `"SELECT" @@`
}

type Select struct {
	Columns   []*string `@Ident { "," @Ident }`
	TableName string    `"FROM" @Ident`
}

Multiple possible token types

Hello, and thank you for your work :)

I ran into an interesting problem while using participle to parse a (well, to be honest, poorly designed) grammar: some tokens may be treated differently depending on context, for example as a keyword or as an identifier.

Simplest example is SQL parser from _examples: it cannot parse "SELECT * FROM select", because table name "select" was identified by lexer as keyword, not ident. Of course that's invalid SQL and table name in this case should be quoted as `select`, but anyway - in my case grammar does not have that quotes.

I'm not sure what the best solution would be here; the only idea I have is to allow multiple lexers per parser: if the parser fails to parse with one lexer, try with another, and so on.

What do you think? Perhaps there are no plans to support anything except CFG grammars, in which case there is no such problem at all.

check all possibilities before parsing fails

Hey, love the library, but in trying to build a query parser I am hitting an interesting issue. Parsing "key":"value" as distinct from "this phrase" seems problematic.

// Term holds the different possible terms
type Term struct {
	KV            *KV         ` @@ `
	Text          *string     `| @String `
	Subexpression *Expression `| "(" @@ ")"`
}

// KV represents a json kv
type KV struct {
	Key   *Key   `@@`
	Value *Value `@@`
}

// Key holds the possible key types for a kv
type Key struct {
	Ident *string `@Ident ":"`
	Str   *string `| @String ":"`
}

// Value holds the possible values for a kv
type Value struct {
	Bool  *Bool    `@("true"|"false")`
	Str   *string  `| @String`
	Ident *string  `| @Ident`
	Int   *int64   `| @Int`
	Float *float64 `| @Float`
}

If I put KV first in the Term struct, it fails because it looks for the ":" operator when given a text field. If I put Text first, it fails because it finds the ":" operator on a KV field. The Text and KV fields are alternatives (|); would it be possible to check that all of the alternatives fail before erroring? Or am I missing something about the implementation?
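What the reporter is asking for is essentially PEG-style ordered choice with backtracking: save the input position, try each alternative from it, and only fail when every branch fails. A minimal stdlib sketch of that control flow (not participle's internals):

```go
package main

import "fmt"

// Parser tracks a position in a token stream and can rewind to a mark,
// which is what lets alternation try the next branch after a failure.
type Parser struct {
	toks []string
	pos  int
}

func (p *Parser) eat(tok string) bool {
	if p.pos < len(p.toks) && p.toks[p.pos] == tok {
		p.pos++
		return true
	}
	return false
}

// Choice resets to the saved position between branches, so a branch
// that consumed tokens before failing (like KV consuming a String and
// then missing ":") doesn't poison the branches after it.
func (p *Parser) Choice(branches ...func() bool) bool {
	mark := p.pos
	for _, b := range branches {
		if b() {
			return true
		}
		p.pos = mark
	}
	return false
}

func main() {
	p := &Parser{toks: []string{"str", "str"}} // "this phrase": no ":" follows
	kv := func() bool { return p.eat("str") && p.eat(":") }
	text := func() bool { return p.eat("str") }
	fmt.Println(p.Choice(kv, text)) // true: KV fails, Text succeeds
}
```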

Incorrect type when marshaling slice

First off, thank you for a great library!

I ran into a small issue trying to marshal into a slice field with a type alias.

type Identifier string

type Foo struct {
	Identifiers []Identifier `parser:"@Ident+"`
}

panic: reflect.Set: value of type string is not assignable to type foo.Identifier

It looks like the code checks that the kind is the same, but does not check the type:

participle/nodes.go

Lines 403 to 406 in 55fa451

if v.Kind() == t.Kind() {
	out = append(out, v)
	continue
}

I was able to make the following change and it seems to work as expected now:

if v.Kind() == t.Kind() {
	if v.Type() != t {
		v = v.Convert(t)
	}
	out = append(out, v)
	continue
}

Possible bug in parser

I'm trying to write a SQL Select parser based on the sql example here, but I seem to be encountering a bug - here is a more-or-less minimal program that demonstrates the problem:

// nolint: govet
package main

import (
	"fmt"
	"strings"

	"github.com/alecthomas/participle"
	"github.com/alecthomas/participle/lexer"
	"github.com/alecthomas/repr"
)

type Boolean bool

func (b *Boolean) Capture(values []string) error {
	*b = strings.ToLower(values[0]) == "true"
	return nil
}

type LiteralString string

func (ls *LiteralString) Capture(values []string) error {
	// Remove enclosing single quote
	n := len(values[0])
	r := values[0][1 : n-1]
	// Translate doubled quotes
	*ls = LiteralString(strings.Replace(r, "''", "'", -1))
	return nil
}

type PrimaryTerm struct {
	Value    *Value    `  @@`
	Var      *string   `| @Ident`
	FuncCall *FuncExpr `| @@`
}

type FuncExpr struct {
	FunctionName string     ` @( "AVG" | "COUNT" | "MAX" | "MIN" | "SUM" |  "COALESCE" | "NULLIF" | "CAST" | "DATE_ADD" | "DATE_DIFF" | "EXTRACT" | "TO_STRING" | "TO_TIMESTAMP" | "UTCNOW" | "CHAR_LENGTH" | "CHARACTER_LENGTH" | "LOWER" | "SUBSTRING" | "TRIM" | "UPPER")`
	ArgsList     []*FuncArg `"(" @@ ("," @@)* ")"`
}

type FuncArg struct {
	StarArg    bool        `  @"*"`
	ExprArg    *ExprOrCast `| @@`
	ExtractArg *ExtractArg `| @@`
}

type ExprOrCast struct {
	Expr     *PrimaryTerm ` @@`
	CastType string       `("AS" @( "BOOL" | "INT" | "INTEGER" | "STRING" | "FLOAT" | "DECIMAL" | "NUMERIC" | "TIMESTAMP" ))?`
}

type ExtractArg struct {
	ItemToExtract string         `@( "YEAR" | "MONTH" | "DAY" | "HOUR" | "MINUTE" | "SECOND" | "TIMEZONE_HOUR" | "TIMEZONE_MINUTE" )`
	FromTimestamp *LiteralString `"FROM" @LitString`
}

type Value struct {
	Number  *float64       `(  @Number`
	String  *LiteralString ` | @LitString`
	Boolean *Boolean       ` | @("TRUE" | "FALSE")`
	Null    bool           ` | @"NULL")`
}

var (
	sqlLexer = lexer.Must(lexer.Regexp(`(\s+)` +
		`|(?P<Keyword>(?i)SELECT|FROM|TOP|DISTINCT|ALL|WHERE|GROUP|BY|HAVING|UNION|MINUS|EXCEPT|INTERSECT|ORDER|LIMIT|OFFSET|TRUE|FALSE|NULL|IS|NOT|ANY|SOME|BETWEEN|AND|OR|LIKE|AS|IN|BOOL|INT|INTEGER|STRING|FLOAT|DECIMAL|NUMERIC|TIMESTAMP|YEAR|MONTH|DAY|HOUR|MINUTE|SECOND|TIMEZONE_HOUR|TIMEZONE_MINUTE|AVG|COUNT|MAX|MIN|SUM|COALESCE|NULLIF|CAST|DATE_ADD|DATE_DIFF|EXTRACT|TO_STRING|TO_TIMESTAMP|UTCNOW|CHAR_LENGTH|CHARACTER_LENGTH|LOWER|SUBSTRING|TRIM|UPPER)` +
		`|(?P<Ident>[a-zA-Z_][a-zA-Z0-9_]*)` +
		`|(?P<QuotIdent>"([^"]*("")?)*")` +
		`|(?P<Number>\d*\.?\d+([eE][-+]?\d+)?)` +
		`|(?P<LitString>'([^']*('')?)*')` +
		`|(?P<Operators><>|!=|<=|>=|\.\*|\[\*\]|[-+*/%,.()=<>\[\]])`,
	))
)

func main() {
	var fex FuncExpr
	p := participle.MustBuild(
		&FuncExpr{},
		participle.Lexer(sqlLexer),
		participle.CaseInsensitive("Keyword"),
	)

	validCases := []string{
		"cast('2' as decimal)",
		"cast('2' as string)",
		"cast('2' as numeric)",
		"cast('2' as timestamp)",
		"cast('2' as bool)",
		// Cases below fail!!
		"cast('2' as int)",
		"cast('2' as integer)",
	}
	for i, tc := range validCases {
		err := p.ParseString(tc, &fex)
		if err != nil {
			fmt.Printf("ERROR: %d: %v\n", i, err)
			continue
		}
		repr.Println(fex, repr.Indent("  "), repr.OmitEmpty(true))
	}

}

The program tests the parser for FuncExpr productions (to parse function invocations in SQL).

The last two cases fail mysteriously:

...
ERROR: 5: <source>:1:10: unexpected "as" (expected ")")
ERROR: 6: <source>:1:10: unexpected "as" (expected ")")

Is it a bug or is something wrong with the program above?

Proposal: Support grammar aliases.

In moderately complex grammars it's fairly common to see duplicate patterns emerge. For example, when matching a dot-separated identifier (eg. foo.bar) the pattern (Ident { "." Ident }) is used repeatedly. This can be handled by a Go type alias implementing the Parseable interface, but that is quite onerous.

I propose adding support for grammar aliases. Here's an example creating and using an Identifier alias:

type Assignment struct {
  Name string `@Identifier=(Ident { "." Ident })`
  Variable string `"=" @Identifier`
}

AST until error

Currently, if an error occurs while parsing a string, no AST is returned. Is it possible to return the AST up to the point where the error occurred? How much work do you think it would take to implement this?

I want to use my grammar to provide suggestions for the next token based on the current context.

Support for sub-lexers

To support more complex languages, it should be possible to elegantly define stateful lexers.

Ideally this would support:

  • "here docs", eg. cat << EOF\nEOF (where EOF is a user-defined marker).
  • Runtime-selected sub-lexers, à la Markdown's ```<language> blocks - this would defer to an external function to select the lexer based on <language>.
  • Recursive lexers for eg. string interpolation, "${"${var}"}" - in this situation a new lexer is pushed onto the state when ${ is encountered, and popped when } is encountered.

My hunch is that some kind of "stateful EBNF" could work, but it would need to be programmatically extensible, and it's not clear exactly how this would be expressed.

Parsing comments in a file (eg. // or /* */) is not supported

I ran into a problem while trying to parse the comments out of a Thrift file into the AST, and it does not appear to be possible with the current implementation.

I traced it down to the default lexer setting scanner.Mode = SkipComments. Would it be possible to set the mode to ScanComments instead, and add a test demonstrating capture via the @Comment type?

Here is the snippet I was trying to use for my tests

/* comment test */
// testing a comment
enum TweetType {
    TWEET
    RETWEET = 2
    DM = 3
    REPLY
}
And the grammar structs:

type Comment struct {
	Message string `@Comment`
}

type Thrift struct {
	Includes   []string     `{ "include" @String`
	Comments   []*Comment   `  | @@`
	Namespaces []*Namespace `  | @@`
	Structs    []*Struct    `  | @@`
	Exceptions []*Exception `  | @@`
	Services   []*Service   `  | @@`
	Enums      []*Enum      `  | @@`
	Typedefs   []*Typedef   `  | @@`
	Consts     []*Const     `  | @@ }`
}
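For reference, the usual way to handle this in current participle is to supply a custom lexer that emits Comment tokens, rather than relying on text/scanner's SkipComments. A sketch against the v2 lexer.MustSimple API; the rule names and patterns below are illustrative assumptions, not taken from the issue:

```go
// Illustrative lexer: emits Comment tokens that a grammar can then
// capture with `@Comment`. Elide "Whitespace" when building the parser
// (e.g. participle.Elide("Whitespace")) so it stays out of the stream.
var thriftLexer = lexer.MustSimple([]lexer.SimpleRule{
	{"Comment", `//[^\n]*|/\*([^*]|\*+[^*/])*\*+/`},
	{"String", `"[^"]*"`},
	{"Ident", `[a-zA-Z_][a-zA-Z0-9_]*`},
	{"Int", `\d+`},
	{"Punct", `[={}()\[\],;]`},
	{"Whitespace", `\s+`},
})
```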

My eventual goal is to make sure the comments carry over when running code generation into the native language output.
