systemincloud / rly
License: Other
Is it possible to pass variables between t_* and p_* functions?
Currently I define an environment globally and access it from both, but there may be a much cleaner way.
I like this library, but I think I'm using it the wrong way. I'm trying to build a dynamic parser: I build a regular expression from the labels I expect, and I try to use that variable as the re argument of a lexer token:
reLABEL <- '(label1|label2|....'
....
lexer <- R6Class("Lexer",
....
t_LABEL = function(re = reLABEL, t) {
In the function get_regex the value is not a string, but is of class "name" and has the value reLABEL.
It seems that the following change works for me:
get_regex = function(func) {
val <- formals(func)[['re']]
if (!is(val, 'character')) {
val <- eval(val, parent.env(environment(func)))
}
return(val)
}
Do you know a better way to use variables containing the string with the regular expression?
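The behaviour described above can be reproduced in base R without rly: formals() returns default arguments unevaluated, so a default written as a variable name comes back as a symbol rather than a string. The following is a minimal sketch under that reading. The names reLABEL and t_LABEL are taken from the snippet above, the label values are invented for illustration, and nothing here is confirmed as how rly itself resolves the argument.

```r
# Hypothetical pattern; the real label list is elided in the report above.
reLABEL <- "(label1|label2)"

# A token-style function whose default `re` is a variable, not a literal.
t_LABEL <- function(re = reLABEL, t) t

# formals() hands back the default unevaluated: a symbol, not a string.
val <- formals(t_LABEL)[["re"]]
is.name(val)        # TRUE
is.character(val)   # FALSE

# Evaluating the symbol where the function was defined recovers the string.
resolved <- eval(val, environment(t_LABEL))

# One way to sidestep the lookup: splice the string into the formals up front,
# so the default is already a literal when anything inspects it.
formals(t_LABEL)$re <- reLABEL
is.character(formals(t_LABEL)[["re"]])   # TRUE
```

Splicing the value into the formals before the class is handed to the lexer keeps the change on the caller's side, at the cost of having to repeat it whenever the pattern changes.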
When running the code below, I get an error about trying to index into an environment. I think that a safety check is needed in the p$callable() function, but I'm not sure what.
library(rly)
TOKENS <- c("MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND",
"INTEGER", "PREPOST")
LITERALS <- c(".", "-")
Lexer <- R6::R6Class(
"Lexer",
public=list(
tokens=TOKENS,
literals=LITERALS,
t_MONTH="(?:mon(?:th)?|MON(?:TH)?)",
t_WEEK="(?:W(?:EE)?K|w(?:ee)?k)",
t_DAY="(?:D(?:A)?Y|d(?:a)?y)",
t_HOUR="(?:H(?:(?:OU)?R)?|h(?:(?:ou)?r)?)",
t_MINUTE="(?:M(?:IN(?:UTE)?)?|m(?:in(?:ute)?)?)",
t_SECOND="(?:S(?:EC(?:OND)?)?|s(?:ec(?:ond)?)?)",
t_INTEGER="[0-9]+",
t_ignore=" \t",
t_error=function(t) {
cat(sprintf("Illegal character '%s'", t$value[1]))
t$lexer$skip(1)
return(t)
}
)
)
Parser <- R6::R6Class(
"Parser",
public = list(
tokens = TOKENS,
literals = LITERALS,
# Parsing rules
precedence = list(),
# dictionary of names
names = new.env(hash=TRUE),
p_valueunit_prepost=function(doc="valueunit : valueunit PREPOST", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_valueunit_base=function(doc="valueunit : value timeunit
| timeunit value", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_value_negative=function(doc="value : '-' value", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_value_float=function(doc="value : INTEGER '.' INTEGER", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_value_integer=function(doc="value : INTEGER", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_time_unit=function(doc="timeunit : MONTH
| WEEK
| DAY
| HOUR
| MINUTE
| SECOND", p) {
p$set(1, self$names[[as.character(p$get(2))]])
},
p_error = function(p) {
if(is.null(p)) cat("Syntax error at EOF")
else cat(sprintf("Syntax error at '%s'", p$value))
}
)
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser)
parser$parse("15min", lexer)
#> Error in value[[3L]](cond): wrong arguments for subsetting an environment
Created on 2018-12-08 by the reprex package (v0.2.0).
I saw that a newline rule is defined in the "Calculator Example".
But when I call parser$parse(input = "a=2+3 \n b=a/2 ",lexer = lexer)
it does not seem to work.
I also tried
parser$parse(input = "a=2+3
b=a/2 ",lexer = lexer)
That does not seem to work either.
Is it possible to get some form of return value from the parser? Specifically, I'm trying to make a parser to parse out the parts of a date. I'd like to have the return value provide a list with the year, month, day, hour, minute, and second. I'm writing this as a parser instead of a regexp because I'm trying to match the ISO 8601 standard which has many variants of the types of dates that are allowed.
In the example below, I'd like to access self$values after parsing.
#install.packages("rly")
library(rly)
TOKENS <- "DIGIT"
LITERALS <- c(".", ",", "-", "T", "W")
Lexer <-
R6::R6Class(
"Lexer",
public=list(
tokens=TOKENS,
literals=LITERALS,
t_DIGIT="[0-9]",
#t_ignore = " \t",
t_newline = function(re='\\n+', t) {
t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
return(NULL)
},
t_error = function(t) {
cat(sprintf("Illegal character '%s'", t$value[1]))
t$lexer$skip(1)
return(t)
}
)
)
Parser <-
R6::R6Class(
"Parser",
public=list(
tokens=TOKENS,
literals=LITERALS,
values=list(),
p_date_year_month_day=function(doc="date_year_month_day : date_year_month '-' DIGIT DIGIT", p) {
message("DAY")
daynum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
# A day of the month runs from 1 to 31
stopifnot(0 < daynum & daynum < 32)
self$values$DAY <- daynum
self$values
},
p_date_year_month=function(doc="date_year_month : date_year '-' DIGIT DIGIT", p) {
message("MONTH")
monthnum <- as.numeric(paste(sapply(4:5, p$get), collapse=""))
stopifnot(0 < monthnum & monthnum < 13)
self$values$MONTH <- monthnum
self$values
},
p_date_year=function(doc="date_year : DIGIT DIGIT DIGIT DIGIT", p) {
message("YEAR")
self$values$YEAR <- as.numeric(paste(sapply(2:5, p$get), collapse=""))
self$values
},
p_error = function(p) {
if(is.null(p)) cat("Syntax error at EOF")
else cat(sprintf("Syntax error at '%s'", p$value))
}
)
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser)
# x <- c("2020-11-01", "2020-11", "2020")
# lexer$input("2020-11-01")
parser$parse("2020-11-01", lexer)
I'm still working on the ISO 8601 parser mentioned in #15.
I've made some good progress, but there is an issue with an unambiguous, but initially multiply-matching rule. In the example below, I expected the parser to do the following:
I'm not sure why it's not finding digit2 and is finding digit3 instead. It should only find digit3 if that is the end of the string. The code is below; the issue is with the first call to parser$parse().
library(rly)
TOKENS <- c("DIGIT", "DECIMALPOINT")
LITERALS <- c("W", "Z", "Q", "T", ":", "-")
p_collapse <- function(x, p) {
paste0(sapply(X=x, FUN=p$get), collapse="")
}
set_value <- function(p) {
ret <- list()
for (idx in (1 + seq_len(p$length() - 1))) {
current <- p$get(idx)
for (nm in names(current)) {
if (nm %in% names(ret)) {
if (ret[[nm]] != current[[nm]]) {
print(ret)
print(current)
stop(sprintf("mismatch with %s: %s vs %s", nm, ret[[nm]], current[[nm]]))
}
} else {
ret[[nm]] <- current[[nm]]
}
}
}
ret
}
# Lexer ####
Lexer <-
R6::R6Class(
"Lexer",
public=list(
tokens=TOKENS,
literals=LITERALS,
t_DIGIT="[0-9]",
t_DECIMALPOINT="[\\.,]",
#t_ignore = " \t",
t_newline = function(re='\\n+', t) {
t$lexer$lineno <- t$lexer$lineno + nchar(t$value)
return(NULL)
},
t_error = function(t) {
cat(sprintf("Illegal character '%s'", t$value[1]))
t$lexer$skip(1)
return(t)
}
)
)
# General parser support functions ####
l_parser_general <-
list(
tokens=TOKENS,
literals=LITERALS,
## Helpers ####
p_fraction=function(doc="fraction : DECIMALPOINT multi_digit", p) {
part <- "fraction"
message(part)
p$set(1, list(fraction=p$get(3)))
},
p_multi_digit=function(doc="multi_digit : DIGIT
| digit2
| digit3
| digit4", p) {
part <- "multi_digit"
message(part)
p$set(1, p$get(2))
},
p_digit4=function(doc="digit4 : digit3 DIGIT", p) {
part <- "digit4"
message(part)
p$set(1, p_collapse(2:3, p))
},
p_digit3=function(doc="digit3 : digit2 DIGIT", p) {
part <- "digit3"
message(part)
p$set(1, p_collapse(2:3, p))
},
p_digit2=function(doc="digit2 : DIGIT DIGIT", p) {
part <- "digit2"
message(part)
p$set(1, p_collapse(2:3, p))
},
p_basic=function(doc="basic : ", p) {
p$set(1, list(iso_8601_format="basic"))
},
p_error = function(p) {
if(is.null(p)) {
cat("Syntax error at EOF")
} else {
cat(sprintf(
"Syntax error at '%s'\n%s\n%s^",
p$value, p$lexer$lexdata, strrep(' ', p$lexpos - 1)
))
}
}
)
# Specific numbers ####
l_specific_numbers <-
list(
p_yearnum=function(doc="yearnum : digit4", p) {
part <- "yearnum"
message(part)
p$set(1, list(year=p$get(2)))
},
p_monthnum=function(doc="monthnum : digit2", p) {
part <- "monthnum"
message(part)
p$set(1, list(month=p$get(2)))
},
p_mdaynum=function(doc="mdaynum : digit2", p) {
part <- "mdaynum"
message(part)
p$set(1, list(mday=p$get(2)))
},
p_weeknum=function(doc="weeknum : digit2", p) {
part <- "weeknum"
message(part)
p$set(1, list(week=p$get(2)))
},
p_weekdaynum=function(doc="weekdaynum : DIGIT", p) {
part <- "weekdaynum"
message(part)
p$set(1, list(weekday=p$get(2)))
},
p_odaynum=function(doc="odaynum : digit3", p) {
part <- "odaynum"
message(part)
p$set(1, list(oday=p$get(2)))
},
p_hournum=function(doc="hournum : digit2", p) {
part <- "hournum"
message(part)
p$set(1, list(hour=p$get(2)))
},
p_minutenum=function(doc="minutenum : digit2", p) {
part <- "minutenum"
message(part)
p$set(1, list(minute=p$get(2)))
},
p_secondnum=function(doc="secondnum : digit2", p) {
part <- "secondnum"
message(part)
p$set(1, list(second=p$get(2)))
}
)
# Extended Parser ####
l_extended_iso8601 <-
list(
p_date=function(doc="date : year", p) {
part <- "date"
message(part)
p$set(1, set_value(p))
},
p_year=function(doc="year : yearnum
| yearnum fraction
| yearnum basic subyear
| yearnum dash subyear", p) {
part <- "year"
message(part)
p$set(1, set_value(p))
},
p_subyear=function(doc="subyear : month
| week
| oday", p) {
part <- "subyear"
message(part)
p$set(1, set_value(p))
},
p_month=function(doc="month : monthnum
| monthnum fraction
| monthnum basic mday
| monthnum dash mday", p) {
part <- "month"
message(part)
p$set(1, set_value(p))
},
p_mday=function(doc="mday : mdaynum
| mdaynum fraction
| mdaynum subday", p) {
part <- "mday"
message(part)
p$set(1, set_value(p))
},
p_week=function(doc="week : week_w weeknum
| week_w weeknum fraction
| week_w weeknum basic weekday
| week_w weeknum dash weekday", p) {
part <- "week"
message(part)
p$set(1, set_value(p))
},
p_week_w=function(doc="week_w : 'W'", p) {
part <- "week_w"
message(part)
p$set(1, list())
},
p_weekday=function(doc="weekday : weekdaynum
| weekdaynum fraction
| weekdaynum subday", p) {
part <- "weekday"
message(part)
p$set(1, set_value(p))
},
p_oday=function(doc="oday : odaynum
| odaynum fraction
| odaynum subday", p) {
part <- "oday"
message(part)
p$set(1, set_value(p))
},
p_subday=function(doc="subday : time_with_t", p) {
part <- "subday"
message(part)
p$set(1, set_value(p))
},
p_time=function(doc="time : time_with_t
| time_without_t", p) {
# if just hour is given, it must be preceded by 'T'
part <- "time"
message(part)
p$set(1, set_value(p))
},
p_time_with_t=function(doc="time_with_t : time_t hournum
| time_t hournum fraction
| time_t time_without_t", p) {
part <- "time_with_t"
message(part)
p$set(1, set_value(p))
},
p_time_t=function(doc="time_t : 'T'", p) {
part <- "time_t"
message(part)
p$set(1, list())
},
p_time_without_t=function(doc="time_without_t : hournum basic minute
| hournum colon minute", p) {
part <- "time_without_t"
message(part)
p$set(1, set_value(p))
},
p_minute=function(doc="minute : minutenum
| minutenum fraction
| minutenum basic second
| minutenum colon second", p) {
part <- "minute"
message(part)
p$set(1, set_value(p))
},
p_second=function(doc="second : secondnum
| secondnum fraction", p) {
part <- "second"
message(part)
p$set(1, set_value(p))
},
p_dash=function(doc="dash : '-'", p) {
part <- "dash"
message(part)
p$set(1, list(iso_8601_format="extended"))
},
p_colon=function(doc="colon : ':'", p) {
part <- "colon"
message(part)
p$set(1, list(iso_8601_format="extended"))
}
)
Parser <-
R6::R6Class(
"Basic Parser",
public=append(append(l_extended_iso8601, l_specific_numbers), l_parser_general)
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser)
#> WARN [2021-11-12 13:45:33] Rule time defined, but not used
#> WARN [2021-11-12 13:45:33] There is 1 unused rule
#> WARN [2021-11-12 13:45:33] Symbol time is unreachable
parser$parse("20201101", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> Syntax error at '1'
#> 20201101
#> ^
#> NULL
parser$parse("2020110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#>
#> $iso_8601_format
#> [1] "basic"
#>
#> $oday
#> [1] "110"
parser$parse("2020-11-01", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> monthnum
#> dash
#> digit2
#> mdaynum
#> mday
#> month
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#>
#> $iso_8601_format
#> [1] "extended"
#>
#> $month
#> [1] "11"
#>
#> $mday
#> [1] "01"
parser$parse("2020-110", lexer)
#> digit2
#> digit3
#> digit4
#> yearnum
#> dash
#> digit2
#> digit3
#> odaynum
#> oday
#> subyear
#> year
#> date
#> $year
#> [1] "2020"
#>
#> $iso_8601_format
#> [1] "extended"
#>
#> $oday
#> [1] "110"
Created on 2021-11-12 by the reprex package (v2.0.1)
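One possible reading of the first trace: once digit2 has been reduced, the parser can either reduce it to monthnum or shift the next DIGIT towards digit3, and yacc-style parsers conventionally resolve that kind of shift/reduce choice by shifting. That would explain the digit3 message followed by the syntax error, but it is an assumption and has not been checked against the tables rly generates. As a cross-check for the expected results, the sketch below splits basic-format dates by their total length using plain regular expressions. It is a different technique from the grammar above, not an rly parser, and the function name parse_basic_date is hypothetical.

```r
# Split an ISO 8601 basic-format calendar or ordinal date by its length.
# Plain base R string handling; no lexer or grammar is involved.
parse_basic_date <- function(x) {
  stopifnot(is.character(x), length(x) == 1)
  if (grepl("^[0-9]{8}$", x)) {
    # YYYYMMDD: calendar date
    list(year = substr(x, 1, 4),
         month = substr(x, 5, 6),
         mday = substr(x, 7, 8))
  } else if (grepl("^[0-9]{7}$", x)) {
    # YYYYDDD: ordinal date
    list(year = substr(x, 1, 4),
         oday = substr(x, 5, 7))
  } else if (grepl("^[0-9]{4}$", x)) {
    # YYYY: year only
    list(year = x)
  } else {
    stop("unsupported basic-format date: ", x)
  }
}

calendar <- parse_basic_date("20201101")
ordinal <- parse_basic_date("2020110")
```

Here the decision between month-plus-day and ordinal day is made from the whole string, which is information a parser with a single token of lookahead does not have at the point where it sees the third digit after the year.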
Is there any way to use your GitHub repository as an R package?
I tried to pull your fix into my local rly package with devtools::install_github(), but it returned a "Does not appear to be an R package (no DESCRIPTION)" error.
The DESCRIPTION file is now in .gitignore and not on GitHub.
Thank you for the library!
I'm trying to test my lexer to ensure that I'm getting the tokens that I want. The documentation doesn't make it clear how to test just the lexer. Can you please update the docs to show that (or if I'm missing it, point me to the correct docs)?
With the code below, I get the following warning: "is.na() applied to non-(list or vector) of type 'environment'"
# Based on https://www.stata.com/manuals/dinfilefixedformat.pdf
#library(rly)
devtools::load()
devtools::test()
library(R6)
TOKENS <- c("COMMENT", "SPECIFICATION_HEADER", "NUMBER",
"LOCATION_JUMP",
"VARTYPE", "VARNAME", "FORMAT", "LABEL")
LITERALS = c("{", "}", "(", ")", ".")
Lexer <- R6Class(
"Lexer",
public = list(
tokens = TOKENS,
literals = LITERALS,
t_COMMENT = function(re="\\*(?:.*)", t) {
t
},
t_FORMAT = "%[0-9]+(?:(?:\\.[0-9]+)?[efg]|[sS])",
t_SPECIFICATION_HEADER = "_(?:first(?:lineoffile)|lines|lrecl)",
t_LOCATION_JUMP="_(?:column|line|newline|skip)",
t_NUMBER = "[0-9]+",
t_VARTYPE = "(?:int|str[0-9]+|byte)",
t_VARNAME = "[A-Za-z_][A-Za-z0-9_]*",
t_LABEL = '".*?"',
t_ignore = " \t",
t_newline = function(re = "\\r?\\n", t) {
t$lexer$lineno <- t$lexer$lineno + 1
NULL
},
t_error = function(t) {
cat(sprintf("Illegal character '%s'\n", t$value[1]))
t$lexer$skip(1)
t
}
)
)
dct <- paste0(readLines("http://www.nber.org/natality/1968/natl1968.dct"), collapse="\n")
lexer <- rly::lex(Lexer)
lexer$input(dct)
while (!is.null(current_token <- lexer$token())) {
print(current_token)
}
var_update <- function(x, name=NULL, type=NULL, label=NULL, format=NULL) {
if (is.null(x)) {
if (is.null(name)) {
# Nothing to build a variable from
return(NULL)
}
list(name=name, type=type, label=label, format=format)
} else {
# Start from the existing variable and overwrite only the supplied fields
ret <- x
if (!is.null(name)) {
ret$name <- name
}
if (!is.null(type)) {
ret$type <- type
}
if (!is.null(label)) {
ret$label <- label
}
if (!is.null(format)) {
ret$format <- format
}
ret
}
}
Parser <- R6Class(
"Parser",
public = list(
tokens = TOKENS,
literals = LITERALS,
precedence = list(),
p_outer = function(doc = 'outer : VARNAME VARNAME "{" nodes "}"', p) {
cat(sprintf("outer at %d\n", p$lexpos(2)))
p$set(1, p$get(5))
},
p_nodes = function(doc = "nodes : header
| comment
| location
| var", p) {
p$set(1, p$get(2))
},
p_varlabel = function(doc = "var : var LABEL", p) {
cat(sprintf("varlabel at %d\n", p$lexpos(3)))
p$set(1, list("var", var_update(p$get(2), label=p$get(3))))
},
p_vartype = function(doc = "var : VARTYPE var", p) {
cat(sprintf("vartype at %d\n", p$lexpos(2)))
p$set(1, list("var", var_update(p$get(3), type=p$get(2))))
},
p_varfmt = function(doc = "var : var FORMAT", p) {
cat(sprintf("varfmt at %d\n", p$lexpos(3)))
p$set(1, list("var", var_update(p$get(2), format=p$get(3))))
},
p_varbase = function(doc = "var : VARNAME", p) {
cat(sprintf("varbase at %d\n", p$lexpos(2)))
p$set(1, list("var", var_update(NULL, name=p$get(2))))
},
p_location_num = function(doc = "location : location number_group", p) {
p$set(1, list("location", p$get(2), p$get(3)))
},
p_location = function(doc = "location : LOCATION_JUMP", p) {
cat(sprintf("location at %d\n", p$lexpos(2)))
p$set(1, list("location", p$get(2)))
},
p_header = function(doc = 'header : SPECIFICATION_HEADER number_group', p) {
cat(sprintf("header at %d\n", p$lexpos(2)))
p$set(1, list("header", p$get(2), p$get(3)))
},
p_number_group = function(doc='number_group : "(" NUMBER ")"', p) {
cat(sprintf("number_group at %d\n", p$lexpos(3)))
p$set(1, list("number_group", p$get(3)))
},
p_comment = function(doc="comment : COMMENT", p) {
cat(sprintf("comment at %d\n", p$lexpos(2)))
p$set(1, list("comment", p$get(2)))
},
p_error = function(p) {
if(is.null(p)) cat("Syntax error at EOF")
else cat(sprintf("Syntax error for value '%s' at line %d (lexer position %d)\n", p$value, p$lineno, p$lexpos))
}
)
)
lexer <- rly::lex(Lexer)
parser <- rly::yacc(Parser, debug=TRUE)
foo <- parser$parse(dct, lexer=lexer)
Hi, thank you for providing a great library.
I'm now trying to implement my own language with rly.
Is there any way to make the lexer prefer one tokenization rule over the others?
Specifically, I have the following two rules in R6::R6Class("Lexer"):
t_THEME = '(\+|\|)\stheme',
t_LAYER = function(re='(\+|\|)\s[a-z_]+', t) { ... }
THEME is the word theme preceded by whitespace and a plus sign (e.g. " + theme"),
and LAYER is a similar token whose suffix is an arbitrary alphanumeric name (e.g. " + myvar1" or " + ididid").
Currently my lexer wrongly recognizes rly::lex(module=myLexer)$input(' + theme') as LAYER.
I know I can get what I want by adding a negation of 'theme' to LAYER, but I would guess there is an interface for defining the precedence of tokenization rules.
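Lex-style tools commonly combine all token patterns into one alternation, and the alternative listed first wins when several could match at the same position. Whether rly orders string rules and function rules the same way is an assumption here, so the sketch below only demonstrates the underlying regex behaviour in base R. The two patterns are simplified stand-ins for the THEME and LAYER rules, and which_token is a hypothetical helper.

```r
# Two alternatives in one pattern, tried left to right at each position.
theme_first <- "(?<THEME>\\+\\s*theme\\b)|(?<LAYER>\\+\\s*[a-z_]+)"
layer_first <- "(?<LAYER>\\+\\s*[a-z_]+)|(?<THEME>\\+\\s*theme\\b)"

# Return the name of the capture group that took part in the match.
which_token <- function(pattern, x) {
  m <- regexpr(pattern, x, perl = TRUE)
  starts <- attr(m, "capture.start")[1, ]
  names(starts) <- attr(m, "capture.names")
  # Groups that did not participate have a non-positive start.
  names(starts)[starts > 0][1]
}

which_token(theme_first, " + theme")   # "THEME"
which_token(layer_first, " + theme")   # "LAYER"
which_token(theme_first, " + myvar")   # "LAYER"
```

If the lexer does try rules in a fixed order, placing the more specific rule ahead of the general one would give the same effect as the negation workaround without changing the LAYER pattern.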