collector's Introduction

collector

draft for discussion

Installation

Install with

pak::pak("cynkra/collector")

Examples

collector() creates a new function, with an altered body but the same environment:

library(collector)
add <- function(x, y) {
  x + y
}
environment(add) <- asNamespace("stats")
add2 <- collector(add)
add2
#> function(x, y) {
#>   globals[["add"]]$args <- constructive::construct_reprex()
#>   globals[["add"]]$call <- constructive::deparse_call(sys.call())
#>   on.exit(globals[["add"]]$return_value <- returnValue())
#>   {
#>     x + y
#>   }
#> }
#> <environment: namespace:stats>
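
For intuition, the body rewriting shown above can be approximated in base R. This is only a rough sketch under stated assumptions, not collector's actual implementation: `record_calls()` and `store` are hypothetical names, and the sketch records only the call and the return value (not the arguments).

```r
# Hypothetical store for recorded data; not part of collector.
store <- new.env()

# Hypothetical helper: returns a copy of `fun` whose body records the call
# and the return value, while keeping the original environment.
record_calls <- function(fun, name) {
  slot <- new.env()
  assign(name, slot, envir = store)
  new_fun <- fun
  # bquote() inlines the slot environment and the original body into the new body.
  body(new_fun) <- bquote({
    assign("call", sys.call(), envir = .(slot))
    on.exit(assign("return_value", returnValue(), envir = .(slot)))
    .(body(fun))
  })
  new_fun
}

add <- function(x, y) {
  x + y
}
add2 <- record_calls(add, "add")
result <- add2(1, 2)

# Inspect what was recorded for "add".
recorded <- as.list(store[["add"]])
```

Because `body<-` leaves the function's environment untouched, `environment(add2)` is the same as `environment(add)`, which mirrors the behaviour described above.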

It behaves the same as the original, but it additionally collects:

  • The arguments
  • The call
  • The return value

a <- 1
b <- 2
add2(a, b)
#> [1] 3

collected("add")
#> $args
#> delayedAssign("x", value = a, eval.env = .GlobalEnv)
#> delayedAssign("y", value = b, eval.env = .GlobalEnv)
#> 
#> $call
#> add2(a, b)
#> 
#> $return_value
#> [1] 3

Note the use of delayedAssign(): arguments are not yet evaluated at the start of the body, so recording them as lazy bindings is robust to functions that use NSE.
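
As a reminder of what delayedAssign() does, here is a minimal base R sketch. The variable names are illustrative only; the point is that the bound expression is not evaluated until the binding is first accessed.

```r
a <- 1
evaluated <- FALSE

# Bind `x` to a promise; the expression is not evaluated at this point.
delayedAssign("x", {
  evaluated <- TRUE
  a
})

evaluated_before <- evaluated  # the promise has not been forced yet
value <- x                     # first access forces the promise
evaluated_after <- evaluated   # the side effect has now happened
```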

If we know we don’t use NSE, we can use force = TRUE:

# evaluate all arguments up front
add3 <- collector(add, force = TRUE, name = "custom_name")
add3(a, b)
#> [1] 3
collected("custom_name")
#> $args
#> x <- 1
#> 
#> y <- 2
#> 
#> 
#> $call
#> add3(a, b)
#> 
#> $return_value
#> [1] 3

Or we can choose which arguments to force:

# force only x; y stays lazy
add4 <- collector(add, force = "x", name = "custom_name2")
add4(a, b)
#> [1] 3
collected("custom_name2")
#> $args
#> x <- 1
#> 
#> delayedAssign("y", value = b, eval.env = .GlobalEnv)
#> 
#> $call
#> add4(a, b)
#> 
#> $return_value
#> [1] 3
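
To see why forcing is only safe when NSE is not involved, consider this small base R sketch. Both functions are hypothetical illustrations and do not use collector; `label_of_forced()` stands in for what eager evaluation of an argument would do.

```r
# A function that uses NSE: it captures its argument's expression unevaluated.
label_of <- function(expr) {
  deparse(substitute(expr))
}

# Hypothetical variant that forces the argument first.
label_of_forced <- function(expr) {
  force(expr)
  deparse(substitute(expr))
}

# Works: the argument is never evaluated.
lazy_result <- label_of(not_defined + 1)

# Fails: forcing tries to evaluate a symbol that does not exist.
forced_result <- tryCatch(
  label_of_forced(not_defined + 1),
  error = function(e) conditionMessage(e)
)
```

Here `lazy_result` holds the deparsed expression, while `forced_result` holds an error message, which is the kind of failure that keeping arguments lazy avoids.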

collector's People

Contributors

krlmlr, moodymudskipper

collector's Issues

Support injection

This is super relevant for duckplyr. My attempts haven't worked so far; there seems to be something special about how rlang searches for the values for injection.

Without collector

options(conflicts.policy = list(warn = FALSE))
library(dplyr)

var <- sym("a")

data.frame(a = 1) |>
  select(!!var)
#>   a
#> 1 1

Created on 2024-04-24 with reprex v2.1.0

With collector and patched dplyr

Couldn't even replicate with bare-bones dplyr, but haven't tried too hard.

options(conflicts.policy = list(warn = FALSE))
Sys.setenv(COLLECTOR_PATH = ".")
library(dplyr)

var <- sym("a")

data.frame(a = 1) |>
  select(!!var)
#> Error: object 'a' not found

fs::dir_info(glob = "*.qs")[1:3]
#> # A tibble: 1 × 3
#>   path            type         size
#>   <fs::path>      <fct> <fs::bytes>
#> 1 00001-select.qs file          179

Created on 2024-04-24 with reprex v2.1.0

Detect which df cols or list elts are needed

A crazy idea probably but...

The trick that we use for environments, using lazy bindings, could be used on data frames and lists if they were built on top of environments.

Environments are basically generalised lists: they just lack bracket methods and length, and their names are not an attribute (a "names" attribute cannot actually be set).

If we hack the base namespace (and possibly rlang), we can patch those primitives to handle classed environments with class data.frame or list (we could define superclasses e_frame and e_list, but that would be less robust). We could then replace data.frames and lists with classed environments and use the lazy binding trick to find which columns or list elements are not required. Changes by reference are irrelevant here because we don't modify those objects.

Not trivial and can go wrong in many ways but this might also work in most cases, because most of these functions just call bracket methods and length down the line.

Subsetting data.frames with i evaluates everything, however, unless we use further magic to make it lazy and apply it only after j subsetting occurs, but it gets really complicated at that point.
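
To make the lazy binding idea concrete, here is a minimal base R sketch that stores list elements as promises in an environment and records which ones get forced. It does not patch any primitives and is not how collector works; `track_access()` is a hypothetical helper used only for illustration.

```r
# Hypothetical helper: copy a named list into an environment of lazy bindings
# and remember which elements have been accessed.
track_access <- function(lst) {
  accessed <- new.env()
  env <- new.env()
  for (nm in names(lst)) {
    local({
      name <- nm
      value <- lst[[name]]
      # The promise marks the element as accessed the first time it is forced.
      delayedAssign(name, {
        assign(name, TRUE, envir = accessed)
        value
      }, eval.env = environment(), assign.env = env)
    })
  }
  list(env = env, accessed = function() ls(accessed))
}

tracked <- track_access(list(a = 1, b = 2, c = 3))

# Only `a` and `c` are touched; `b` stays an unevaluated promise.
total <- tracked$env$a + tracked$env[["c"]]
used <- tracked$accessed()
```

After this runs, `used` lists only the elements that were actually read, which is the information needed to decide which columns or list elements are required.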

Error with auk package

With a variant of cynkra/dplyr#4 (applied on top of the most recent dplyr, our fork is hopelessly outdated):

f <- system.file("extdata/ebd-rollup-ex.txt", package = "auk")
ebd <- auk::read_ebd(f, rollup = FALSE)
auk::auk_rollup(ebd)
#> Error:
#> ℹ In argument: `count = dplyr::coalesce(.data$count, "X")`.
#> Caused by error:
#> ! unused argument (base::quote(c("Setophaga coronata", "Columba livia", "Fulica americana", "Columba livia", "Junco hyemalis", "Columba livia", "Setophaga coronata", "Colaptes auratus", "Loxia curvirostra", "Colaptes auratus")))

Created on 2024-04-18 with reprex v2.1.0

Way too much data saved for pipe calls

This prompted the inhibition of recursion in #13, but we actually need this data.

Sys.setenv(COLLECTOR_PATH = ".")
options(conflicts.policy = list(warn = FALSE))
library(dplyr)

data.frame(a = 1) %>%
  filter(a == 1) %>%
  mutate(b = 2) %>%
  select(a)
#>   a
#> 1 1

data.frame(a = 1) |>
  filter(a == 1) |>
  mutate(b = 2) |>
  select(a)
#>   a
#> 1 1

d1 <- data.frame(a = 1)
d2 <- filter(d1, a == 1)
d3 <- mutate(d2, b = 2)
d4 <- select(d3, a)

fs::dir_info(glob = "*.qs")[1:3]
#> # A tibble: 9 × 3
#>   path            type         size
#>   <fs::path>      <fct> <fs::bytes>
#> 1 00001-filter.qs file        1.14M
#> 2 00002-mutate.qs file        1.37M
#> 3 00003-select.qs file          244
#> 4 00004-filter.qs file        1.51M
#> 5 00005-mutate.qs file        1.51M
#> 6 00006-select.qs file          242
#> 7 00007-filter.qs file          231
#> 8 00008-mutate.qs file          232
#> 9 00009-select.qs file          228

Created on 2024-04-24 with reprex v2.1.0

Using `set_collector(funs = )` in `.onLoad()`

Seems to work externally, with CRAN dplyr:

collector::set_collector(pkg = "dplyr", path = ".")
dplyr::mutate(data.frame(a = 1), b = 2)
#>   a b
#> 1 1 2
qs::qread("1-mutate.qs")
#> $call
#> dplyr::mutate(data.frame(a = 1), b = 2)
#> 
#> $env
#> <environment: 0x1171715f8>
#> 
#> $value
#>   a b
#> 1 1 2

Created on 2024-04-18 with reprex v2.1.0

collector::set_collector(funs = "mutate", pkg = "dplyr", path = ".")
dplyr::mutate(data.frame(a = 1), b = 2)
#>   a b
#> 1 1 2
qs::qread("1-mutate.qs")
#> $call
#> dplyr::mutate(data.frame(a = 1), b = 2)
#> 
#> $env
#> <environment: 0x12dea8fb0>
#> 
#> $value
#>   a b
#> 1 1 2

Created on 2024-04-18 with reprex v2.1.0

But dplyr calls don't seem to be collected after cynkra/dplyr@ab7a247, tested with auk.

Error with cogmapr package

project_name <- "a_new_project"
main_path <- paste0(system.file("testdata", package = "cogmapr"), "/")
my.project <- cogmapr::ProjectCMap(main_path, project_name)
cogmapr::RelationshipTest(my.project, units = c("Belgium", "Québec"))
#> Error in eval(call_to_original, new_caller_env): '...' used in an incorrect context
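
For reference, this message can be produced in base R by evaluating a call that contains `...` in an environment where no `...` is bound. The sketch below only illustrates that general mechanism; it is not a diagnosis of the cogmapr failure, and `call_with_dots` and `env_without_dots` are illustrative names.

```r
# A call that forwards dots, as many wrappers do.
call_with_dots <- quote(list(...))

# An environment in which `...` has no binding.
env_without_dots <- new.env()

msg <- tryCatch(
  eval(call_with_dots, env_without_dots),
  error = function(e) conditionMessage(e)
)

# The same call evaluates fine inside a function frame that has dots.
with_dots <- function(...) eval(call_with_dots, environment())
ok <- with_dots(1, 2)
```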

Created on 2024-04-18 with reprex v2.1.0

Other packages work without error. I'll keep trying.
