This is a parser combinator library for Dotty. Since Dotty is still under active development, its language features and APIs are unstable. As a result, this library offers no guarantee of backward compatibility across minor releases. In addition, the library will try to keep up with each Dotty release.
This project focuses on simplicity over performance. It has a flexible API and is powerful enough for typical use cases in implementing a programming language. It also has good support for error reporting.
It's not production-ready yet. I created this library because I couldn't find a parser library that works with Dotty. I use it for my personal projects and will try my best to fix bugs and improve performance along the way. Anyone is welcome to contribute, either by raising issues or by sending PRs. I will try to address them as soon as possible.
The library takes inspiration from parsec and Lightyear.
TODO: add this section after this is published.
Import the following where this library is needed.
import io.github.tgeng.parse._
import io.github.tgeng.parse.string.{_, given _}
- `io.github.tgeng.parse`: the core `ParserT[I, T]` trait and various generic combinators.
- `io.github.tgeng.parse.string`: the string parser trait `Parser[T]`, which is a type alias for `ParserT[Char, T]`, and other combinators that are specific to parsing strings.
For the sake of simplicity, assume we are parsing strings. Essentially, we want to use this library to build a `Parser[T]` object that parses a string and outputs a result of type `T`. For example,
scala> val positiveInt = "[0-9]+".rp.map(_.toInt)
val positiveInt: io.github.tgeng.parse.ParserT[Char, Int] = Parser{/[0-9]+/}
scala> positiveInt.parse("123")
val res0: Either[io.github.tgeng.parse.ParserError[Char], Int] = Right(123)
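Since `parse` returns an `Either`, callers usually pattern match on the result. The following is a minimal sketch that reuses the `positiveInt` parser from above; `describe` is a hypothetical helper introduced here for illustration, and the exact text of the error message is not assumed.

```scala
import io.github.tgeng.parse._
import io.github.tgeng.parse.string.{_, given _}

val positiveInt = "[0-9]+".rp.map(_.toInt)

// Hypothetical helper (not part of the library): turns the parse result
// into a human-readable message.
def describe(input: String): String =
  positiveInt.parse(input) match {
    case Right(n)    => s"parsed $n"
    case Left(error) => s"failed: $error" // relies only on the error's toString
  }
```

`Right` carries the parsed value and `Left` carries a `ParserError[Char]`, so both outcomes are handled without exceptions.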
Assume we have the following enum representing JSON values.
enum JValue {
case JNull
case JBoolean(value: Boolean)
case JNumber(value: Double)
case JString(value: String)
case JArray(value: Vector[JValue])
case JObject(value: Map[String, JValue])
}
Below is a simple parser that converts a JSON string to a `JValue`.
// ┌ a macro that names the parser according to the enclosing definition. For example, in
// | this case, the created parser is named "<jNull>". `S` in `PS` is for strong name,
// | which means the `jNull` parser will not report its internals in an error message. To
// | allow reporting internals, use `P` instead, as shown below with `jArray`, `jObject`,
// | and `jObjectEntry`.
// |
// | ┌───── as ────┐
// | ┌─── >> ───┐ │
val jNull = PS { 'n' >>! "ull" as JNull }
// │ │ │
// │ │ └ give it an intuitive name so the error message is
// │ │ easier to understand
// │ │
// │ └ convert the string parser returning "ull" to a parser returning
// │ `JNull`
// │
// └ throw away result from the first parser, which matches 'n', and return
// the result of the second parser, which, in this case, returns "ull". In
// addition, commit right after seeing 'n' to speed up parsing failure in
// case 'n' is followed by things other than "ull". Without committing, the
// parser would keep trying <jBoolean>, <jNumber>, and so on.
//
//
// ┌ if matching "true" fails, try the following
// │ to match false
// │
def jBoolean = ('t' >>! "rue" as JBoolean(true)) |
('f' >>! "alse" as JBoolean(false)) withName "<jBoolean>"
// |
// └ one could name the parser explicitly like so
// without the `P` macro as well
val jNumber = PS { commitAfter(double.map(JNumber(_))) }
// │
// └ similar to `as`, but it consumes the result from the
// double parser
def jString = PS { quoted().map(JString(_)) }
def jArray : Parser[JValue] =
P { '[' >>! (jValue sepBy ',').map(JArray(_)) << commitAfter(']') }
// │
// └ matches `JValue` objects separated by `,` zero or more times and
// returns the matched `JValue`s inside a `Vector`
val jObjectKey = PS { whitespaces >> quoted() << whitespaces }
def jObjectEntry : Parser[(String, JValue)] =
P { lift(jObjectKey << commitAfter(":"), jValue) }
// │
// └ combines two parsers, `jObjectKey << commitAfter(":")` and `jValue`, and produces a parser
// that returns a tuple containing the parsed key string and `JValue` object.
def jObject : Parser[JValue] = P {
'{' >>!
(jObjectEntry sepBy ',').map(c => JObject(c.toMap))
<< commitAfter('}')
}
def jValue : Parser[JValue] = P {
whitespaces >>
(jNull | jBoolean | jNumber | jString | jArray | jObject)
<< whitespaces
}
jValue.parse("""[1, "a", {}]""") // JArray([JNumber(1), JString("a"), JObject({})])
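Once parsing succeeds, the result is an ordinary `JValue` tree that can be traversed with pattern matching. The sketch below repeats the enum so that it stands alone; `countLeaves` is a hypothetical helper introduced here for illustration and is not part of the library.

```scala
enum JValue {
  case JNull
  case JBoolean(value: Boolean)
  case JNumber(value: Double)
  case JString(value: String)
  case JArray(value: Vector[JValue])
  case JObject(value: Map[String, JValue])
}
import JValue._

// Hypothetical helper (not part of the library): counts the scalar values
// (null, booleans, numbers, strings) contained in a JSON tree.
def countLeaves(v: JValue): Int = v match {
  case JNull | JBoolean(_) | JNumber(_) | JString(_) => 1
  case JArray(items)   => items.map(countLeaves).sum
  case JObject(fields) => fields.values.map(countLeaves).sum
}
```

Because `parse` returns an `Either`, a helper like this can be applied to the parser above with `jValue.parse(input).map(countLeaves)`.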
Scala `Char`, `String`, and `Regex` can be implicitly converted to parsers that match things intuitively. In addition to implicit conversion, the extension methods `rp` (regex parser) and `rpm` (regex parser outputting a `Match` object) are defined on `String`s to convert the string to a regex parser that outputs the matched `String` and `scala.util.matching.Regex.Match`, respectively.
scala> import scala.language.implicitConversions
scala> val p1 : Parser[String] = "abc"
val p1: io.github.tgeng.parse.string.Parser[String] = Parser{"abc"}
scala> p1.parse("abc")
val res1: Either[io.github.tgeng.parse.ParserError[Char], String] = Right(abc)
scala> p1.parse("def")
val res2: Either[io.github.tgeng.parse.ParserError[Char], String] = Left(0: "abc")
scala> val p2 : Parser[String] = "[0-9]".rp
val p2: io.github.tgeng.parse.string.Parser[String] = Parser{/[0-9]/}
scala> p2.parse("123")
val res3: Either[io.github.tgeng.parse.ParserError[Char], String] = Right(1)
scala> p2.parse("abc")
val res4: Either[io.github.tgeng.parse.ParserError[Char], String] = Left(0: /[0-9]/)
scala> val p3 : Parser[Char] = 'x'
val p3: io.github.tgeng.parse.string.Parser[Char] = Parser{'x'}
scala> p3.parse("x")
val res5: Either[io.github.tgeng.parse.ParserError[Char], Char] = Right(x)
scala> p3.parse("y")
val res6: Either[io.github.tgeng.parse.ParserError[Char], Char] = Left(0: 'x')
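Since `rpm` yields the whole `Match` rather than just the matched text, capture groups remain available after parsing. The following is a minimal sketch under that assumption; the `version` parser and its pattern are made up for illustration.

```scala
import io.github.tgeng.parse._
import io.github.tgeng.parse.string.{_, given _}

// `rpm` produces a scala.util.matching.Regex.Match, so the two capture
// groups can be read with the standard `group(i)` accessor.
val version: Parser[(Int, Int)] =
  """([0-9]+)\.([0-9]+)""".rpm.map(m => (m.group(1).toInt, m.group(2).toInt))

val result = version.parse("0.25")
```

Use `rp` when only the matched text matters, and `rpm` when parts of the match need to be pulled out individually.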
Many common combinators are provided. To see all of them please refer to core.scala and extension.scala.
For more examples, please refer to `ParserTest.scala`.
Dotty | dotty-parser-combinators |
---|---|
0.23-RC1 | 0.1.0 |
0.24-RC1 | 0.1.1+ |
0.24-RC1 | 0.2.0-8 |
0.25-RC2 | 0.2.8+ |