
rly's People

Contributors: billdenney, caprice-j, henricowitvliet, hrbrmstr, marekjagielski


rly's Issues

dynamic regex in function

I like this library, but I think I'm using it the wrong way. I'm trying to build a dynamic parser: I construct a regular expression from the labels I expect and then try to use that variable as the re argument of a lexer token:

reLABEL <- '(label1|label2|....'
....
lexer <- R6Class("Lexer",
  ....
  t_LABEL = function(re = reLABEL, t) {

In the function get_regex the value is not a string but an object of class "name" whose value is reLABEL.

It seems that the following change works for me:

get_regex = function(func) {
  val <- formals(func)[['re']]
  if (!is(val, 'character')) {
    # the default for 're' is an unevaluated symbol; resolve it in the
    # function's enclosing environment to obtain the regex string
    val <- eval(val, parent.env(environment(func)))
  }
  return(val)
}

Is there a better way to use a variable that holds the regular-expression string?
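
For what it's worth, one workaround that seems to sidestep the problem is to build the regex string before the class definition: the public list is evaluated when R6Class() runs, so the token field captures a plain character value instead of a symbol. A minimal sketch, with a made-up label set:

library(rly)

reLABEL <- paste0("(", paste(c("label1", "label2"), collapse = "|"), ")")

Lexer <- R6::R6Class(
  "Lexer",
  public = list(
    tokens = c("LABEL"),
    # already a character string by the time get_regex() inspects it
    t_LABEL = reLABEL,
    t_ignore = " \t",
    t_error = function(t) {
      cat(sprintf("Illegal character '%s'\n", t$value[1]))
      t$lexer$skip(1)
      return(t)
    }
  )
)

lexer <- rly::lex(Lexer)
lexer$input("label1 label2")
while (!is.null(tok <- lexer$token())) print(tok)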

Error with Environment?

When running the code below, I get an error about subsetting an environment. I think a safety check is needed in the p$callable() function, but I'm not sure what it should check.

library(rly)

TOKENS <- c("MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND",
            "INTEGER", "PREPOST")
LITERALS <- c(".", "-")

Lexer <- R6::R6Class(
  "Lexer",
  public=list(
    tokens=TOKENS,
    literals=LITERALS,
    t_MONTH="(?:mon(?:th)?|MON(?:TH)?)",
    t_WEEK="(?:W(?:EE)?K|w(?:ee)?k)",
    t_DAY="(?:D(?:A)?Y|d(?:a)?y)",
    t_HOUR="(?:H(?:(?:OU)?R)?|h(?:(?:ou)?r)?)",
    t_MINUTE="(?:M(?:IN(?:UTE)?)?|m(?:in(?:ute)?)?)",
    t_SECOND="(?:S(?:EC(?:OND)?)?|s(?:ec(?:ond)?)?)",
    t_INTEGER="[0-9]+",
    
    t_ignore=" \t",
    t_error=function(t) {
      cat(sprintf("Illegal character '%s'", t$value[1]))
      t$lexer$skip(1)
      return(t)
    }
  )
)

Parser <- R6::R6Class(
  "Parser",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    # Parsing rules
    precedence = list(),
    # dictionary of names
    names = new.env(hash=TRUE),
    p_valueunit_prepost=function(doc="valueunit : valueunit PREPOST", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_valueunit_base=function(doc="valueunit : value timeunit
                                        | timeunit value", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_negative=function(doc="value : '-' value", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_float=function(doc="value : INTEGER '.' INTEGER", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_value_integer=function(doc="value : INTEGER", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_time_unit=function(doc="timeunit : MONTH
                                       | WEEK
                                       | DAY
                                       | HOUR
                                       | MINUTE
                                       | SECOND", p) {
      p$set(1, self$names[[as.character(p$get(2))]])
    },
    p_error = function(p) {
      if(is.null(p)) cat("Syntax error at EOF")
      else           cat(sprintf("Syntax error at '%s'", p$value))
    }
  )
)

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)

parser$parse("15min", lexer)
#> Error in value[[3L]](cond): wrong arguments for subsetting an environment

Created on 2018-12-08 by the reprex package (v0.2.0).
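
For reference, the message itself comes straight from base R: subsetting an environment with anything other than a single character string fails exactly this way, which suggests an environment is reaching [[ where a name was expected.

e <- new.env()
e[[1]]
#> Error in e[[1]] : wrong arguments for subsetting an environment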

Input of example

I saw that a newline expression is set in the "Calculator Example", but when I use parser$parse(input = "a=2+3 \n b=a/2 ", lexer = lexer) it does not seem to work. I also tried

 parser$parse(input = "a=2+3 
                       b=a/2 ", lexer = lexer)

and it does not work either.
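
For anyone hitting the same thing, one workaround to try (an assumption on my part that the calculator example parses a single statement per call) is to split the input on newlines and parse each statement separately:

for (statement in strsplit("a=2+3\nb=a/2", "\n", fixed = TRUE)[[1]]) {
  parser$parse(trimws(statement), lexer)
}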

Possible to get a return value?

Is it possible to get some form of return value from the parser? Specifically, I'm trying to write a parser that extracts the parts of a date, and I'd like the return value to be a list with the year, month, day, hour, minute, and second. I'm writing this as a parser rather than a regexp because I'm targeting the ISO 8601 standard, which allows many variant date formats.

In the example below, I'd like to access self$values after parsing.

#install.packages("rly")
library(rly)

TOKENS <- "DIGIT"
LITERALS <- c(".", ",", "-", "T", "W")

Lexer <-
  R6::R6Class(
    "Lexer",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      t_DIGIT="[0-9]",
      #t_ignore = " \t",
      t_newline = function(re='\\n+', t) {
        t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
        return(NULL)
      },
      t_error = function(t) {
        cat(sprintf("Illegal character '%s'", t$value[1]))
        t$lexer$skip(1)
        return(t)
      }
    )
  )

Parser <-
  R6::R6Class(
    "Parser",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      values=list(),
      p_date_year_month_day=function(doc="date_year_month_day : date_year_month '-' DIGIT DIGIT", p) {
        message("DAY")
        daynum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
        stopifnot(0 < daynum & daynum < 32)  # day of month: 1-31, not 1-12
        self$values$DAY <- daynum
        self$values
      },
      p_date_year_month=function(doc="date_year_month : date_year '-' DIGIT DIGIT", p) {
        message("MONTH")
        monthnum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
        stopifnot(0 < monthnum & monthnum < 13)
        self$values$MONTH <- monthnum
        self$values
      },
      p_date_year=function(doc="date_year : DIGIT DIGIT DIGIT DIGIT", p) {
        message("YEAR")
        self$values$YEAR <- as.numeric(paste(sapply(2:5, p$get), collapse=""))
        self$values
      },
      p_error = function(p) {
        if(is.null(p)) cat("Syntax error at EOF")
        else           cat(sprintf("Syntax error at '%s'", p$value))
      }
    )
  )

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)

# x <- c("2020-11-01", "2020-11", "2020")
# lexer$input("2020-11-01")

parser$parse("2020-11-01", lexer)

Potential Precedence Issue

I'm still working on the ISO 8601 parser mentioned in #15.

I've made some good progress, but there is an issue with a rule that is unambiguous overall yet initially matches in multiple ways. In the example below, I expected the parser to do the following:

  1. Find digit4 (it does that correctly)
  2. Assign the digit4 to yearnum (it does that correctly)
  3. Find digit2 (it doesn't do that)
  4. Assign the digit2 to monthnum
  5. Find digit2
  6. Assign the digit2 to mdaynum

I'm not sure why it finds digit3 rather than digit2; it should only find digit3 when that is the end of the string. Below is the code; the issue is with the first call to parser$parse().

library(rly)

TOKENS <- c("DIGIT", "DECIMALPOINT")
LITERALS <- c("W", "Z", "Q", "W", "T", ":", "-")

p_collapse <- function(x, p) {
  paste0(sapply(X=x, FUN=p$get), collapse="")
}

set_value <- function(p) {
  ret <- list()
  for (idx in (1 + seq_len(p$length() - 1))) {
    current <- p$get(idx)
    for (nm in names(current)) {
      if (nm %in% names(ret)) {
        if (ret[[nm]] != current[[nm]]) {
          print(ret)
          print(current)
          stop(sprintf("mismatch with %s: %s vs %s", nm, ret[[nm]], current[[nm]]))
        }
      } else {
        ret[[nm]] <- current[[nm]]
      }
    }
  }
  ret
}

# Lexer ####

Lexer <-
  R6::R6Class(
    "Lexer",
    public=list(
      tokens=TOKENS,
      literals=LITERALS,
      t_DIGIT="[0-9]",
      t_DECIMALPOINT="[\\.,]",
      #t_ignore = " \t",
      t_newline = function(re='\\n+', t) {
        t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
        return(NULL)
      },
      t_error = function(t) {
        cat(sprintf("Illegal character '%s'", t$value[1]))
        t$lexer$skip(1)
        return(t)
      }
    )
  )

# General parser support functions ####

l_parser_general <-
  list(
    tokens=TOKENS,
    literals=LITERALS,

    ## Helpers ####
    p_fraction=function(doc="fraction : DECIMALPOINT multi_digit", p) {
      part <- "fraction"
      message(part)
      p$set(1, list(fraction=p$get(3)))
    },
    p_multi_digit=function(doc="multi_digit : DIGIT
                                            | digit2
                                            | digit3
                                            | digit4", p) {
      part <- "multi_digit"
      message(part)
      p$set(1, p$get(2))
    },
    p_digit4=function(doc="digit4 : digit3 DIGIT", p) {
      part <- "digit4"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_digit3=function(doc="digit3 : digit2 DIGIT", p) {
      part <- "digit3"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_digit2=function(doc="digit2 : DIGIT DIGIT", p) {
      part <- "digit2"
      message(part)
      p$set(1, p_collapse(2:3, p))
    },
    p_basic=function(doc="basic : ", p) {
      p$set(1, list(iso_8601_format="basic"))
    },
    p_error = function(p) {
      if(is.null(p)) {
        cat("Syntax error at EOF")
      } else {
        cat(sprintf(
          "Syntax error at '%s'\n%s\n%s^",
          p$value, p$lexer$lexdata, strrep(' ', p$lexpos - 1)
        ))
      }
    }
  )

# Specific numbers ####

l_specific_numbers <-
  list(
    p_yearnum=function(doc="yearnum : digit4", p) {
      part <- "yearnum"
      message(part)
      p$set(1, list(year=p$get(2)))
    },
    p_monthnum=function(doc="monthnum : digit2", p) {
      part <- "monthnum"
      message(part)
      p$set(1, list(month=p$get(2)))
    },
    p_mdaynum=function(doc="mdaynum : digit2", p) {
      part <- "mdaynum"
      message(part)
      p$set(1, list(mday=p$get(2)))
    },
    p_weeknum=function(doc="weeknum : digit2", p) {
      part <- "weeknum"
      message(part)
      p$set(1, list(week=p$get(2)))
    },
    p_weekdaynum=function(doc="weekdaynum : DIGIT", p) {
      part <- "weekdaynum"
      message(part)
      p$set(1, list(weekday=p$get(2)))
    },
    p_odaynum=function(doc="odaynum : digit3", p) {
      part <- "odaynum"
      message(part)
      p$set(1, list(oday=p$get(2)))
    },
    p_hournum=function(doc="hournum : digit2", p) {
      part <- "hournum"
      message(part)
      p$set(1, list(hour=p$get(2)))
    },
    p_minutenum=function(doc="minutenum : digit2", p) {
      part <- "minutenum"
      message(part)
      p$set(1, list(minute=p$get(2)))
    },
    p_secondnum=function(doc="secondnum : digit2", p) {
      part <- "secondnum"
      message(part)
      p$set(1, list(second=p$get(2)))
    }
  )

# Extended Parser ####

l_extended_iso8601 <-
  list(
    p_date=function(doc="date : year", p) {
      part <- "date"
      message(part)
      p$set(1, set_value(p))
    },
    p_year=function(doc="year : yearnum
                              | yearnum fraction
                              | yearnum basic subyear
                              | yearnum dash subyear", p) {
      part <- "year"
      message(part)
      p$set(1, set_value(p))
    },
    p_subyear=function(doc="subyear : month
                                    | week
                                    | oday", p) {
      part <- "subyear"
      message(part)
      p$set(1, set_value(p))
    },
    p_month=function(doc="month : monthnum
                                | monthnum fraction
                                | monthnum basic mday
                                | monthnum dash mday", p) {
      part <- "month"
      message(part)
      p$set(1, set_value(p))
    },
    p_mday=function(doc="mday : mdaynum
                              | mdaynum fraction
                              | mdaynum subday", p) {
      part <- "mday"
      message(part)
      p$set(1, set_value(p))
    },
    p_week=function(doc="week : week_w weeknum
                              | week_w weeknum fraction
                              | week_w weeknum basic weekday
                              | week_w weeknum dash weekday", p) {
      part <- "week"
      message(part)
      p$set(1, set_value(p))
    },
    p_week_w=function(doc="week_w : 'W'", p) {
      part <- "week_w"
      message(part)
      p$set(1, list())
    },
    p_weekday=function(doc="weekday : weekdaynum
                                    | weekdaynum fraction
                                    | weekdaynum subday", p) {
      part <- "weekday"
      message(part)
      p$set(1, set_value(p))
    },
    p_oday=function(doc="oday : odaynum
                              | odaynum fraction
                              | odaynum subday", p) {
      part <- "oday"
      message(part)
      p$set(1, set_value(p))
    },
    p_subday=function(doc="subday : time_with_t", p) {
      part <- "subday"
      message(part)
      p$set(1, set_value(p))
    },
    p_time=function(doc="time : time_with_t
                              | time_without_t", p) {
      # if just hour is given, it must be preceded by 'T'
      part <- "time"
      message(part)
      p$set(1, set_value(p))
    },
    p_time_with_t=function(doc="time_with_t : time_t hournum
                                            | time_t hournum fraction
                                            | time_t time_without_t", p) {
      part <- "time_with_t"
      message(part)
      p$set(1, set_value(p))
    },
    p_time_t=function(doc="time_t : 'T'", p) {
      part <- "time_t"
      message(part)
      p$set(1, list())
    },
    p_time_without_t=function(doc="time_without_t : hournum basic minute
                                                  | hournum colon minute", p) {
      part <- "time_without_t"
      message(part)
      p$set(1, set_value(p))
    },
    p_minute=function(doc="minute : minutenum
                                  | minutenum fraction
                                  | minutenum basic second
                                  | minutenum colon second", p) {
      part <- "minute"
      message(part)
      p$set(1, set_value(p))
    },
    p_second=function(doc="second : secondnum
                                  | secondnum fraction", p) {
      part <- "second"
      message(part)
      p$set(1, set_value(p))
    },
    p_dash=function(doc="dash : '-'", p) {
      part <- "dash"
      message(part)
      p$set(1, list(iso_8601_format="extended"))
    },
    p_colon=function(doc="colon : ':'", p) {
      part <- "colon"
      message(part)
      p$set(1, list(iso_8601_format="extended"))
    }
  )

Parser <-
  R6::R6Class(
    "Basic Parser",
    public=append(append(l_extended_iso8601, l_specific_numbers), l_parser_general)
  )

lexer  <- rly::lex(Lexer)
parser <- rly::yacc(Parser)
#> WARN [2021-11-12 13:45:33] Rule time defined, but not used
#> WARN [2021-11-12 13:45:33] There is 1 unused rule
#> WARN [2021-11-12 13:45:33] Symbol time is unreachable

parser$parse("20201101", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> Syntax error at '1'
#> 20201101
#>        ^
#> NULL
parser$parse("2020110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "basic"
#> 
#> $oday
#> [1] "110"
parser$parse("2020-11-01", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> monthnum
#> dash
#> digit2
#> mdaynum
#> mday
#> month
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "extended"
#> 
#> $month
#> [1] "11"
#> 
#> $mday
#> [1] "01"
parser$parse("2020-110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#> 
#> $iso_8601_format
#> [1] "extended"
#> 
#> $oday
#> [1] "110"

Created on 2021-11-12 by the reprex package (v2.0.1)
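
My reading of the failure, inferred from standard yacc behaviour rather than from rly's internals: after yearnum is reduced, digits five and six reduce to digit2, and the parser then hits a shift/reduce conflict on the next DIGIT, between reducing digit2 to monthnum and shifting toward digit3 : digit2 DIGIT. Yacc-family generators resolve shift/reduce conflicts in favour of shifting, so the basic-format month/day branch is never entered; digit3 then grows into digit4, which nothing after a year can accept, hence the syntax error at the eighth character. One way to sidestep the conflict is to consume the four basic-format month+day digits as a single digit4 and split it in the action. An unverified sketch, reusing the names from the reprex:

# would replace the 'monthnum basic mday' alternative for the basic format
p_subyear_monthday_basic=function(doc="subyear : digit4", p) {
  txt <- p$get(2)
  p$set(1, list(month = substr(txt, 1, 2),
                mday  = substr(txt, 3, 4),
                iso_8601_format = "basic"))
}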

Missing DESCRIPTION file on github

Is there any way to use your GitHub repository as an R package?

I tried to pull your fix into my local rly installation with devtools::install_github(), but it returned a "Does not appear to be an R package (no DESCRIPTION)" error. The DESCRIPTION file is currently listed in .gitignore and is not on GitHub.
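
Until the DESCRIPTION is committed, one workaround for a purely local build is to clone the repository and restore a minimal DESCRIPTION by hand before installing. Every field value below is a placeholder, not the package's real metadata:

Package: rly
Version: 0.0.0.9000
Title: Local Build of rly
Description: Placeholder metadata so the clone installs as a package.
Authors@R: person("Placeholder", "Author", role = c("aut", "cre"))
License: MIT + file LICENSE
Imports: R6

With that file in place, devtools::install() from the clone directory should work.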

Documentation Request: How do I test the lexer?

Thank you for the library!

I'm trying to test my lexer to make sure I'm getting the tokens I want, but the documentation doesn't make it clear how to test just the lexer. Could you update the docs to show that (or, if I'm missing it, point me to the correct docs)?
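
Until the docs cover it, the pattern used in the Stata-dictionary reprex further down this page exercises the lexer on its own: feed it input, then pull tokens one at a time and check they are what you expect.

lexer <- rly::lex(Lexer)  # Lexer is your R6 lexer class
lexer$input("text to tokenize")
while (!is.null(tok <- lexer$token())) {
  print(tok)  # inspect each token's type and value
}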

Warning for is.na Check During Error

With the code below, I get the following warning: "is.na() applied to non-(list or vector) of type 'environment'"

# Based on https://www.stata.com/manuals/dinfilefixedformat.pdf

#library(rly)
devtools::load_all()  # load the package under development
devtools::test()
library(R6)

TOKENS <- c("COMMENT", "SPECIFICATION_HEADER", "NUMBER",
            "LOCATION_JUMP",
            "VARTYPE", "VARNAME", "FORMAT", "LABEL")
LITERALS = c("{", "}", "(", ")", ".")

Lexer <- R6Class(
  "Lexer",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    t_COMMENT = function(re="\\*(?:.*)", t) {
      t
    },
    t_FORMAT = "%[0-9]+(?:(?:\\.[0-9]+)?[efg]|[sS])",
    t_SPECIFICATION_HEADER = "_(?:first(?:lineoffile)|lines|lrecl)",
    t_LOCATION_JUMP="_(?:column|line|newline|skip)",
    t_NUMBER = "[0-9]+",
    t_VARTYPE = "(?:int|str[0-9]+|byte)",
    t_VARNAME = "[A-Za-z_][A-Za-z0-9_]*",
    t_LABEL = '".*?"',
    t_ignore = " \t",
    t_newline = function(re = "\\r?\\n", t) {
      t$lexer$lineno <- t$lexer$lineno + 1
      NULL
    },
    t_error = function(t) {
      cat(sprintf("Illegal character '%s'\n", t$value[1]))
      t$lexer$skip(1)
      t
    }
  )
)

dct <- paste0(readLines("http://www.nber.org/natality/1968/natl1968.dct"), collapse="\n")
lexer <- rly::lex(Lexer)
lexer$input(dct)
while (!is.null(current_token <- lexer$token())) {
  print(current_token)
}

var_update <- function(x, name=NULL, type=NULL, label=NULL, format=NULL) {
  if (is.null(x)) {
    if (is.null(name)) {
      # nothing to build on and no name supplied
      return(NULL)
    }
    list(name=name, type=type, label=label, format=format)
  } else {
    ret <- x
    if (!is.null(name)) {
      ret$name <- name
    }
    if (!is.null(type)) {
      ret$type <- type
    }
    if (!is.null(label)) {
      ret$label <- label
    }
    if (!is.null(format)) {
      ret$format <- format
    }
    ret
  }
}

Parser <- R6Class(
  "Parser",
  public = list(
    tokens = TOKENS,
    literals = LITERALS,
    precedence = list(),
    p_outer = function(doc = 'outer : VARNAME VARNAME "{" nodes "}"', p) {
      cat(sprintf("outer at %d\n", p$lexpos(2)))
      p$set(1, p$get(5))
    },
    p_nodes = function(doc = "nodes : header
                                    | comment
                                    | location
                                    | var", p) {
      p$set(1, p$get(2))
    },
    p_varlabel = function(doc = "var : var LABEL", p) {
      cat(sprintf("varlabel at %d\n", p$lexpos(3)))
      p$set(1, list("var", var_update(p$get(2), label=p$get(3))))
    },
    p_vartype = function(doc = "var : VARTYPE var", p) {
      cat(sprintf("vartype at %d\n", p$lexpos(2)))
      p$set(1, list("var", var_update(p$get(3), type=p$get(2))))
    },
    p_varfmt = function(doc = "var : var FORMAT", p) {
      cat(sprintf("varfmt at %d\n", p$lexpos(3)))
      p$set(1, list("var", var_update(p$get(2), format=p$get(3))))
    },
    p_varbase = function(doc = "var : VARNAME", p) {
      cat(sprintf("varbase at %d\n", p$lexpos(2)))
      p$set(1, list("var", var_update(NULL, name=p$get(2))))
    },
    p_location_num = function(doc = "location : location number_group", p) {
      p$set(1, list("location", p$get(2), p$get(3)))
    },
    p_location = function(doc = "location : LOCATION_JUMP", p) {
      cat(sprintf("location at %d\n", p$lexpos(2)))
      p$set(1, list("location", p$get(2)))
    },
    p_header = function(doc = 'header : SPECIFICATION_HEADER number_group', p) {
      cat(sprintf("header at %d\n", p$lexpos(2)))
      p$set(1, list("header", p$get(2), p$get(3)))
    },
    p_number_group = function(doc='number_group : "(" NUMBER ")"', p) {
      cat(sprintf("number_group at %d\n", p$lexpos(3)))
      p$set(1, list("number_group", p$get(3)))
    },
    p_comment = function(doc="comment : COMMENT", p) {
      cat(sprintf("comment at %d\n", p$lexpos(2)))
      p$set(1, list("comment", p$get(2)))
    },
    p_error = function(p) {
      if(is.null(p)) cat("Syntax error at EOF")
      else           cat(sprintf("Syntax error for value '%s' at line %d (lexer position %d)\n", p$value, p$lineno, p$lexpos))
    }
  )
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser, debug=TRUE)

foo <- parser$parse(dct, lexer=lexer)
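
For reference, base R emits exactly this warning whenever is.na() is applied to an environment, so somewhere in the error-handling path an environment is presumably being passed where a token or value was expected:

is.na(new.env())
#> [1] FALSE
#> Warning message:
#> In is.na(new.env()) : is.na() applied to non-(list or vector) of type 'environment'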

Precedence between Tokenization Rules

Hi, thank you for providing a great library.

I'm now trying to implement my own language with rly. Is there any way to give one tokenization rule in the Lexer priority over the others?

Specifically, I have the following two rules in R6::R6Class("Lexer"):

t_THEME = '(\\+|\\|)\\s*theme',
t_LAYER = function(re='(\\+|\\|)\\s*[a-z_]+', t) { ... }

THEME is a token made of a plus sign (or pipe) followed by whitespace and the word theme (e.g. " + theme"), and LAYER is a similar token whose suffix is an arbitrary alphanumeric name (e.g. " + myvar1" or " + ididid").

Currently my lexer wrongly recognizes rly::lex(module=myLexer)$input(' + theme') as LAYER.

I know I could add a negation of 'theme' to the LAYER pattern to achieve what I want, but I assume there is an interface for defining the precedence of tokenization rules.
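
If rly follows PLY's conventions (rly is a port of PLY, so this is an assumption worth testing): token rules defined as functions are matched in the order they are defined and take priority over string rules, while string rules are tried longest-pattern-first. Defining THEME as a function above LAYER should therefore give it priority. A minimal sketch:

Lexer <- R6::R6Class(
  "Lexer",
  public = list(
    tokens = c("THEME", "LAYER"),
    # function rules are tried in definition order, so THEME wins
    t_THEME = function(re='(\\+|\\|)\\s*theme', t) {
      t
    },
    t_LAYER = function(re='(\\+|\\|)\\s*[a-z_]+', t) {
      t
    },
    t_error = function(t) {
      t$lexer$skip(1)
      return(t)
    }
  )
)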
