systemincloud / rly Goto Github PK

License: Other

R 99.72% C 0.24% BASIC 0.04%

rly's People

caprice-j henricowitvliet billdenney hrbrmstr jontidswell

rly's Issues

Passing Variables between Tokenizing/Parsing Rules

Is it possible to pass variables between t_* and p_* functions?
Currently I just define an environment globally and access it from both, but there may be a much cleaner way.
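
One way to avoid a global, sketched in base R only: create the shared environment inside a factory function and let both rule functions close over it. The names make_rules, t_rule and p_rule are hypothetical and are not part of rly; whether rly preserves such closures when it builds the lexer and parser is an assumption that would need checking.

```r
# Hypothetical factory: both functions share one private environment.
make_rules <- function() {
  state <- new.env(parent = emptyenv())
  list(
    # Stand-in for a t_* rule: records the value it saw.
    t_rule = function(value) {
      state$last_token <- value
      value
    },
    # Stand-in for a p_* rule: reads what the tokenizing rule stored.
    p_rule = function() state$last_token
  )
}

rules <- make_rules()
rules$t_rule("label1")
rules$p_rule()
```

Because environments have reference semantics, anything written by one function is visible to the other without touching the global environment.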

dynamic regex in function

I like this library, but I think I'm using it the wrong way. I'm trying to build a dynamic parser: I build a regular expression from the labels I expect, and I try to use that variable as the re argument of a lexer token:

reLABEL <- '(label1|label2|....'
....
lexer <- R6Class("Lexer",
  ....
  t_LABEL = function(re = reLABEL, t) {

In the function get_regex the value is not a string, but is of class "name" and has the value reLABEL.

It seems that the following change works for me:

get_regex = function(func) {
  val <- formals(func)[['re']]
  if (!is(val, 'character')) {
    val <- eval(val, parent.env(environment(func)))
  }
  return(val)
}

Do you know a better way to use variables containing the string with the regular expression?
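
A possible alternative that leaves get_regex untouched, sketched in base R: formals() stores the default of re unevaluated, so the default can be overwritten with the evaluated string before the function is placed in the class definition. The label set and the function body below are placeholders, and it is an assumption that rly then reads the string the same way it reads a literal default.

```r
reLABEL <- "(label1|label2)"  # hypothetical label set

# As written in the issue: the default is stored as an unevaluated symbol.
t_LABEL <- function(re = reLABEL, t) t
class(formals(t_LABEL)[["re"]])   # "name"

# Replace the symbol with its current value, so the default is a plain string.
formals(t_LABEL)$re <- reLABEL
class(formals(t_LABEL)[["re"]])   # "character"
```

The patched function could then be passed in as public = list(..., t_LABEL = t_LABEL) instead of being written inline.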

Error with Environment?

When running the code below, I get an error about trying to index into an environment. I think that a safety check is needed in the p$callable() function, but I'm not sure what.

library(rly)

TOKENS <- c("MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND",
            "INTEGER", "PREPOST")
LITERALS <- c(".", "-")

Lexer <- R6::R6Class(
  "Lexer",
  public=list(
    tokens=TOKENS,
    literals=LITERALS,
    t_MONTH="(?:mon(?:th)?|MON(?:TH)?)",
    t_WEEK="(?:W(?:EE)?K|w(?:ee)?k)",
    t_DAY="(?:D(?:A)?Y|d(?:a)?y)",
    t_HOUR="(?:H(?:(?:OU)?R)?|h(?:(?:ou)?r)?)",
    t_MINUTE="(?:M(?:IN(?:UTE)?)?|m(?:in(?:ute)?)?)",
    t_SECOND="(?:S(?:EC(?:OND)?)?|s(?:ec(?:ond)?)?)",
    t_INTEGER="[0-9]+",
    
    t_ignore=" \t",
    t_error=function(t) {
      cat(sprintf("Illegal character '%s'", t$value[1]))
      t$lexer$skip(1)
      return(t)
    }
  )
)

Parser <- R6::R6Class(
  "Parser",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    # Parsing rules
    precedence = list(),
    # dictionary of names
    names = new.env(hash=TRUE),
    p_valueunit_prepost=function(doc="valueunit : valueunit PREPOST", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_valueunit_base=function(doc="valueunit : value timeunit
                                        | timeunit value", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_negative=function(doc="value : '-' value", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_float=function(doc="value : INTEGER '.' INTEGER", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_integer=function(doc="value : INTEGER", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_time_unit=function(doc="timeunit : MONTH
                                       | WEEK
                                       | DAY
                                       | HOUR
                                       | MINUTE
                                       | SECOND", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_error = function(p) {
      if(is.null(p)) cat("Syntax error at EOF")
      else           cat(sprintf("Syntax error at '%s'", p$value))
    }
  )
)

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)

parser$parse("15min", lexer)
#> Error in value[[3L]](cond): wrong arguments for subsetting an environment

Created on 2018-12-08 by the reprex package (v0.2.0).
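
For reference, `[[` on an environment expects a single character string, so a NULL or zero-length key is one plausible trigger for an error like the one above; whether that is what happens inside rly here is an assumption. A minimal base R sketch of a guarded lookup (safe_lookup is a hypothetical helper, not an rly function):

```r
# Return the value stored under `key`, or NULL if the key is unusable or absent.
safe_lookup <- function(env, key) {
  key <- as.character(key)
  if (length(key) != 1L || is.na(key) || !nzchar(key)) {
    return(NULL)
  }
  get0(key, envir = env, inherits = FALSE)
}

units <- new.env(hash = TRUE)
assign("min", 60, envir = units)

safe_lookup(units, "min")   # 60
safe_lookup(units, NULL)    # NULL, instead of an error
safe_lookup(units, "year")  # NULL, key not present
```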

Input of example

I saw that a newline rule is defined in the "Calculator Example".
But when I call parser$parse(input = "a=2+3 \n b=a/2 ",lexer = lexer),
it does not seem to work.
I also tried:

 parser$parse(input = "a=2+3 
                       b=a/2 ",lexer = lexer)

That does not seem to work either.

Possible to get a return value?

Is it possible to get some form of return value from the parser? Specifically, I'm trying to make a parser to parse out the parts of a date. I'd like to have the return value provide a list with the year, month, day, hour, minute, and second. I'm writing this as a parser instead of a regexp because I'm trying to match the ISO 8601 standard which has many variants of the types of dates that are allowed.

In the example below, I'd like to access self$values after parsing.

#install.packages("rly")
library(rly)

TOKENS <- "DIGIT"
LITERALS <- c(".", ",", "-", "T", "W")

Lexer <-
  R6::R6Class(
    "Lexer",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      t_DIGIT="[0-9]",
      #t_ignore = " \t",
      t_newline = function(re='\\n+', t) {
        t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
        return(NULL)
      },
      t_error = function(t) {
        cat(sprintf("Illegal character '%s'", t$value[1]))
        t$lexer$skip(1)
        return(t)
      }
    )
  )

Parser <-
  R6::R6Class(
    "Parser",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      values=list(),
      p_date_year_month_day=function(doc="date_year_month_day : date_year_month '-' DIGIT DIGIT", p) {
        message("DAY")
        daynum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
        stopifnot(0 < daynum & daynum < 32)
        self$values$DAY <- daynum
        self$values
      },
      p_date_year_month=function(doc="date_year_month : date_year '-' DIGIT DIGIT", p) {
        message("MONTH")
        monthnum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
        stopifnot(0 < monthnum & monthnum < 13)
        self$values$MONTH <- monthnum
        self$values
      },
      p_date_year=function(doc="date_year : DIGIT DIGIT DIGIT DIGIT", p) {
        message("YEAR")
        self$values$YEAR <- as.numeric(paste(sapply(2:5, p$get), collapse=""))
        self$values
      },
      p_error = function(p) {
        if(is.null(p)) cat("Syntax error at EOF")
        else           cat(sprintf("Syntax error at '%s'", p$value))
      }
    )
  )

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)

# x <- c("2020-11-01", "2020-11", "2020")
# lexer$input("2020-11-01")

parser$parse("2020-11-01", lexer)

Potential Precedence Issue

I'm still working on the ISO 8601 parser mentioned in #15.

I've made some good progress, but there is an issue with an unambiguous, but initially multiply-matching rule. In the example below, I expected the parser to do the following:

1. Find digit4 (it does that correctly)
2. Assign the digit4 to yearnum (it does that correctly)
3. Find digit2 (it doesn't do that)
4. Assign the digit2 to monthnum
5. Find digit2
6. Assign the digit2 to mdaynum

I'm not sure why it's not finding digit2 and is finding digit3 instead. It should only find digit3 if that is the end of the string. Below is the code; the issue is with the first call to parser$parse().

library(rly)

TOKENS <- c("DIGIT", "DECIMALPOINT")
LITERALS <- c("W", "Z", "Q", "W", "T", ":", "-")

p_collapse <- function(x, p) {
  paste0(sapply(X=x, FUN=p$get), collapse="")
}

set_value <- function(p) {
  ret <- list()
  for (idx in (1 + seq_len(p$length() - 1))) {
    current <- p$get(idx)
    for (nm in names(current)) {
      if (nm %in% names(ret)) {
        if (ret[[nm]] != current[[nm]]) {
          print(ret)
          print(current)
          stop(sprintf("mismatch with %s: %s vs %s", nm, ret[[nm]], current[[nm]]))
        }
      } else {
        ret[[nm]] <- current[[nm]]
      }
    }
  }
  ret
}

# Lexer ####

Lexer <-
  R6::R6Class(
    "Lexer",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      t_DIGIT="[0-9]",
      t_DECIMALPOINT="[\\.,]",
      #t_ignore = " \t",
      t_newline = function(re='\\n+', t) {
        t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
        return(NULL)
      },
      t_error = function(t) {
        cat(sprintf("Illegal character '%s'", t$value[1]))
        t$lexer$skip(1)
        return(t)
      }
    )
  )

# General parser support functions ####

l_parser_general <-
  list(
    tokens=TOKENS,
    literals=LITERALS,

    ## Helpers ####
    p_fraction=function(doc="fraction : DECIMALPOINT multi_digit", p) {
      part <- "fraction"
      message(part)
      p$set(1, list(fraction=p$get(3)))
    },
    p_multi_digit=function(doc="multi_digit : DIGIT
                                            | digit2
                                            | digit3
                                            | digit4", p) {
      part <- "multi_digit"
      message(part)
      p$set(1, p$get(2))
    },
    p_digit4=function(doc="digit4 : digit3 DIGIT", p) {
      part <- "digit4"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_digit3=function(doc="digit3 : digit2 DIGIT", p) {
      part <- "digit3"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_digit2=function(doc="digit2 : DIGIT DIGIT", p) {
      part <- "digit2"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_basic=function(doc="basic : ", p) {
      p$set(1, list(iso_8601_format="basic"))
    },
    p_error = function(p) {
      if(is.null(p)) {
        cat("Syntax error at EOF")
      } else {
        cat(sprintf(
          "Syntax error at '%s'\n%s\n%s^",
          p$value, p$lexer$lexdata, strrep(' ', p$lexpos - 1)
        ))
      }
    }
  )

# Specific numbers ####

l_specific_numbers <-
  list(
    p_yearnum=function(doc="yearnum : digit4", p) {
      part <- "yearnum"
      message(part)
      p$set(1, list(year=p$get(2)))
    },
    p_monthnum=function(doc="monthnum : digit2", p) {
      part <- "monthnum"
      message(part)
      p$set(1, list(month=p$get(2)))
    },
    p_mdaynum=function(doc="mdaynum : digit2", p) {
      part <- "mdaynum"
      message(part)
      p$set(1, list(mday=p$get(2)))
    },
    p_weeknum=function(doc="weeknum : digit2", p) {
      part <- "weeknum"
      message(part)
      p$set(1, list(week=p$get(2)))
    },
    p_weekdaynum=function(doc="weekdaynum : DIGIT", p) {
      part <- "weekdaynum"
      message(part)
      p$set(1, list(weekday=p$get(2)))
    },
    p_odaynum=function(doc="odaynum : digit3", p) {
      part <- "odaynum"
      message(part)
      p$set(1, list(oday=p$get(2)))
    },
    p_hournum=function(doc="hournum : digit2", p) {
      part <- "hournum"
      message(part)
      p$set(1, list(hour=p$get(2)))
    },
    p_minutenum=function(doc="minutenum : digit2", p) {
      part <- "minutenum"
      message(part)
      p$set(1, list(minute=p$get(2)))
    },
    p_secondnum=function(doc="secondnum : digit2", p) {
      part <- "secondnum"
      message(part)
      p$set(1, list(second=p$get(2)))
    }
  )

# Extended Parser ####

l_extended_iso8601 <-
  list(
    p_date=function(doc="date : year", p) {
      part <- "date"
      message(part)
      p$set(1, set_value(p))
    },
    p_year=function(doc="year : yearnum
                              | yearnum fraction
                              | yearnum basic subyear
                              | yearnum dash subyear", p) {
      part <- "year"
      message(part)
      p$set(1, set_value(p))
    },
    p_subyear=function(doc="subyear : month
                                    | week
                                    | oday", p) {
      part <- "subyear"
      message(part)
      p$set(1, set_value(p))
    },
    p_month=function(doc="month : monthnum
                                | monthnum fraction
                                | monthnum basic mday
                                | monthnum dash mday", p) {
      part <- "month"
      message(part)
      p$set(1, set_value(p))
    },
    p_mday=function(doc="mday : mdaynum
                              | mdaynum fraction
                              | mdaynum subday", p) {
      part <- "mday"
      message(part)
      p$set(1, set_value(p))
    },
    p_week=function(doc="week : week_w weeknum
                              | week_w weeknum fraction
                              | week_w weeknum basic weekday
                              | week_w weeknum dash weekday", p) {
      part <- "week"
      message(part)
      p$set(1, set_value(p))
    },
    p_week_w=function(doc="week_w : 'W'", p) {
      part <- "week_w"
      message(part)
      p$set(1, list())
    },
    p_weekday=function(doc="weekday : weekdaynum
                                    | weekdaynum fraction
                                    | weekdaynum subday", p) {
      part <- "weekday"
      message(part)
      p$set(1, set_value(p))
    },
    p_oday=function(doc="oday : odaynum
                              | odaynum fraction
                              | odaynum subday", p) {
      part <- "oday"
      message(part)
      p$set(1, set_value(p))
    },
    p_subday=function(doc="subday : time_with_t", p) {
      part <- "subday"
      message(part)
      p$set(1, set_value(p))
    },
    p_time=function(doc="time : time_with_t
                              | time_without_t", p) {
      # if just hour is given, it must be preceded by 'T'
      part <- "time"
      message(part)
      p$set(1, set_value(p))
    },
    p_time_with_t=function(doc="time_with_t : time_t hournum
                                            | time_t hournum fraction
                                            | time_t time_without_t", p) {
      part <- "time_with_t"
      message(part)
      p$set(1, set_value(p))
    },
    p_time_t=function(doc="time_t : 'T'", p) {
      part <- "time_t"
      message(part)
      p$set(1, list())
    },
    p_time_without_t=function(doc="time_without_t : hournum basic minute
                                                  | hournum colon minute", p) {
      part <- "time_without_t"
      message(part)
      p$set(1, set_value(p))
    },
    p_minute=function(doc="minute : minutenum
                                  | minutenum fraction
                                  | minutenum basic second
                                  | minutenum colon second", p) {
      part <- "minute"
      message(part)
      p$set(1, set_value(p))
    },
    p_second=function(doc="second : secondnum
                                  | secondnum fraction", p) {
      part <- "second"
      message(part)
      p$set(1, set_value(p))
    },
    p_dash=function(doc="dash : '-'", p) {
      part <- "dash"
      message(part)
      p$set(1, list(iso_8601_format="extended"))
    },
    p_colon=function(doc="colon : ':'", p) {
      part <- "colon"
      message(part)
      p$set(1, list(iso_8601_format="extended"))
    }
  )

Parser <-
  R6::R6Class(
    "Basic Parser",
    public=append(append(l_extended_iso8601, l_specific_numbers), l_parser_general)
  )

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)
#> WARN [2021-11-12 13:45:33] Rule time defined, but not used
#> WARN [2021-11-12 13:45:33] There is 1 unused rule
#> WARN [2021-11-12 13:45:33] Symbol time is unreachable

parser$parse("20201101", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> Syntax error at '1'
#> 20201101
#>        ^
#> NULL
parser$parse("2020110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "basic"
#> 
#> $oday
#> [1] "110"
parser$parse("2020-11-01", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> monthnum
#> dash
#> digit2
#> mdaynum
#> mday
#> month
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "extended"
#> 
#> $month
#> [1] "11"
#> 
#> $mday
#> [1] "01"
parser$parse("2020-110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "extended"
#> 
#> $oday
#> [1] "110"

Created on 2021-11-12 by the reprex package (v2.0.1)

Missing DESCRIPTION file on github

Is there any way to use your GitHub repository as an R package?

I tried to pull your fix into my local rly package with devtools::install_github(),
but it returned the error "Does not appear to be an R package (no DESCRIPTION)".
The DESCRIPTION file is currently listed in .gitignore and is not on GitHub.

Documentation Request: How do I test the lexer?

Thank you for the library!

I'm trying to test my lexer to ensure that I'm getting the tokens that I want. The documentation doesn't make it clear how to test just the lexer. Can you please update the docs to show that (or if I'm missing it, point me to the correct docs)?

Warning for is.na Check During Error

With the code below, I get the following warning: "is.na() applied to non-(list or vector) of type 'environment'"

# Based on https://www.stata.com/manuals/dinfilefixedformat.pdf

#library(rly)
devtools::load_all()
devtools::test()
library(R6)

TOKENS <- c("COMMENT", "SPECIFICATION_HEADER", "NUMBER",
            "LOCATION_JUMP",
            "VARTYPE", "VARNAME", "FORMAT", "LABEL")
LITERALS = c("{", "}", "(", ")", ".")

Lexer <- R6Class(
  "Lexer",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    t_COMMENT = function(re="\\*(?:.*)", t) {
      t
    },
    t_FORMAT = "%[0-9]+(?:(?:\\.[0-9]+)?[efg]|[sS])",
    t_SPECIFICATION_HEADER = "_(?:first(?:lineoffile)|lines|lrecl)",
    t_LOCATION_JUMP="_(?:column|line|newline|skip)",
    t_NUMBER = "[0-9]+",
    t_VARTYPE = "(?:int|str[0-9]+|byte)",
    t_VARNAME = "[A-Za-z_][A-Za-z0-9_]*",
    t_LABEL = '".*?"',
    t_ignore = " \t",
    t_newline = function(re = "\\r?\\n", t) {
      t$lexer$lineno <- t$lexer$lineno + 1
      NULL
    },
    t_error = function(t) {
      cat(sprintf("Illegal character '%s'\n", t$value[1]))
      t$lexer$skip(1)
      t
    }
  )
)

dct <- paste0(readLines("http://www.nber.org/natality/1968/natl1968.dct"), collapse="\n")
lexer <- rly::lex(Lexer)
lexer$input(dct)
while (!is.null(current_token <- lexer$token())) {
  print(current_token)
}

var_update <- function(x, name=NULL, type=NULL, label=NULL, format=NULL) {
  if (is.null(x)) {
    if (is.null(name)) {
      # Nothing to build a variable from
      return(NULL)
    }
    list(name=name, type=type, label=label, format=format)
  } else {
    if (!is.null(x)) {
      ret <- x
    } else {
      ret <- list()
    }
    if (!is.null(name)) {
      ret$name <- name
    }
    if (!is.null(type)) {
      ret$type <- type
    }
    if (!is.null(label)) {
      ret$label <- label
    }
    if (!is.null(format)) {
      ret$format <- format
    }
    ret
  }
}

Parser <- R6Class(
  "Parser",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    precedence = list(),
    p_outer = function(doc = 'outer : VARNAME VARNAME "{" nodes "}"', p) {
      cat(sprintf("outer at %d\n", p$lexpos(2)))
      p$set(1, p$get(5))
    },
    p_nodes = function(doc = "nodes : header
                                    | comment
                                    | location
                                    | var", p) {
      p$set(1, p$get(2))
    },
    p_varlabel = function(doc = "var : var LABEL", p) {
      cat(sprintf("varlabel at %d\n", p$lexpos(3)))
      p$set(1, list("var", var_update(p$get(2), label=p$get(3))))
    },
    p_vartype = function(doc = "var : VARTYPE var", p) {
      cat(sprintf("vartype at %d\n", p$lexpos(2)))
      p$set(1, list("var", var_update(p$get(3), type=p$get(2))))
    },
    p_varfmt = function(doc = "var : var FORMAT", p) {
      cat(sprintf("varfmt at %d\n", p$lexpos(3)))
      p$set(1, list("var", var_update(p$get(2), format=p$get(3))))
    },
    p_varbase = function(doc = "var : VARNAME", p) {
      cat(sprintf("varbase at %d\n", p$lexpos(2)))
      p$set(1, list("var", var_update(NULL, name=p$get(2))))
    },
    p_location_num = function(doc = "location : location number_group", p) {
      p$set(1, list("location", p$get(2), p$get(3)))
    },
    p_location = function(doc = "location : LOCATION_JUMP", p) {
      cat(sprintf("location at %d\n", p$lexpos(2)))
      p$set(1, list("location", p$get(2)))
    },
    p_header = function(doc = 'header : SPECIFICATION_HEADER number_group', p) {
      cat(sprintf("header at %d\n", p$lexpos(2)))
      p$set(1, list("header", p$get(2), p$get(3)))
    },
    p_number_group = function(doc='number_group : "(" NUMBER ")"', p) {
      cat(sprintf("number_group at %d\n", p$lexpos(3)))
      p$set(1, list("number_group", p$get(3)))
    },
    p_comment = function(doc="comment : COMMENT", p) {
      cat(sprintf("comment at %d\n", p$lexpos(2)))
      p$set(1, list("comment", p$get(2)))
    },
    p_error = function(p) {
      if(is.null(p)) cat("Syntax error at EOF")
      else           cat(sprintf("Syntax error for value '%s' at line %d (lexer position %d)\n", p$value, p$lineno, p$lexpos))
    }
  )
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser, debug=TRUE)

foo <- parser$parse(dct, lexer=lexer)

Precedence between Tokenization Rules

Hi, thank you for providing a great library.

I'm now trying to implement my language with rly.
Is there any way to make the Lexer prefer one tokenization rule over the others?

Specifically, I have the following two rules in R6::R6Class("Lexer"):

t_THEME = '(\\+|\\|)\\stheme',
t_LAYER = function(re='(\\+|\\|)\\s[a-z_]+', t) { ... }

THEME is a token preceded by whitespace and a plus sign (e.g. " + theme"),
and LAYER is a similar token whose suffix is an arbitrary alphanumeric name (e.g. " + myvar1" or " + ididid").

Currently my lexer wrongly recognizes rly::lex(module=myLexer)$input(' + theme') as LAYER.

I know that adding a negation of 'theme' to LAYER would achieve what I want, but I assume there is an interface for defining the precedence of tokenization rules.
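
For context, in PLY, the Python library rly is modeled on, rules defined as functions are tried in definition order before rules defined as strings, so ordering is the usual lever there; that rly behaves identically is an assumption. The base R sketch below does not use rly at all. It only illustrates the "first matching rule wins" principle with a hypothetical first_match helper and illustrative patterns (not the exact ones from the issue).

```r
# Try each named pattern in order at the start of the input; first match wins.
first_match <- function(input, rules) {
  for (name in names(rules)) {
    pattern <- paste0("^(?:", rules[[name]], ")")
    m <- regmatches(input, regexpr(pattern, input, perl = TRUE))
    if (length(m) == 1L) {
      return(list(type = name, value = m))
    }
  }
  NULL
}

# Illustrative patterns: optional whitespace, '+' or '|', then a name.
layer_first <- c(LAYER = "\\s*(\\+|\\|)\\s*[a-z_]+",
                 THEME = "\\s*(\\+|\\|)\\s*theme\\b")
theme_first <- layer_first[c("THEME", "LAYER")]

first_match(" + theme", layer_first)$type  # "LAYER": the general rule shadows THEME
first_match(" + theme", theme_first)$type  # "THEME": the specific rule is tried first
first_match(" + myvar", theme_first)$type  # "LAYER"
```

Under that assumption, defining THEME as a function placed before LAYER would be the analogous change in the lexer class.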
