henrikbengtsson / profmem Goto Github PK

🔧 R package: profmem - Simple Memory Profiling for R

Home Page: https://cran.r-project.org/package=profmem

Makefile 43.61% R 56.39%

r package cran memory-profiler ram performance

profmem's Issues

WOOPS: Vignette's microbenchmarking reflects coercion to double (less so gc)

I was a bit too quick to add that microbenchmark paragraph to the vignette just before submitting to CRAN. It claims to measure the overhead added by the garbage collector. However, the difference is more likely due to the coercion of x to double.

Should be fixed.
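
A minimal sketch for checking this, assuming x is an integer vector as in the vignette (how x is built here is an assumption). It compares x against a double and an integer constant. The profmem part only runs if the package is installed and R was built with memory profiling, so the allocation numbers are left for the reader to inspect.

```r
# Assumption: x is an integer vector, similar to the vignette example
x <- sample(10000L)

# 5000 is a double constant, 5000L an integer one
stopifnot(is.integer(x), is.double(5000), is.integer(5000L))

# Both comparisons give the same logical result
small_dbl <- (x < 5000)
small_int <- (x < 5000L)
same_result <- identical(small_dbl, small_int)

# Compare allocations only where memory profiling is available
if (requireNamespace("profmem", quietly = TRUE) &&
    isTRUE(unname(capabilities("profmem")))) {
  p_dbl <- profmem::profmem(x < 5000)
  p_int <- profmem::profmem(x < 5000L)
  print(c(double = profmem::total(p_dbl), integer = profmem::total(p_int)))
}
```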

profmem_status(): "inactive", "active", "suspended"

profmem can't always parse output returned by (some) parallelized calls

This is stochastic: the failing examples below sometimes work for smaller jobs with fewer calls.

- It seems to affect mainly forked computations, but not consistently (cf. future.lapply, which has never failed for me so far).
- I suspect that if multiple jobs write to the same log file simultaneously, that file gets corrupted.
- The errors are triggered from:

> bench::mark(parallel::mclapply(seq_len(1e3), keep_busy))
Error in parse(text = trace) : <text>:1:88: unexpected symbol
1: c("qnorm", "FUN", "lapply", "doTryCatch", "tryCatchOne", "tryCatchList", "tryCat2048 :"seq.default
                                                                                         ^

Enter a frame number, or 0 to exit   

1: bench::mark(parallel::mclapply(seq_len(1000), keep_busy))
2: eval_one(exprs[[i]])
3: parse_allocations(f)
4: profmem::readRprofmem(filename)
5: lapply(bfr, FUN = function(x) {
    bytes <- gsub(pattern, "\\1", x)
    wh
6: FUN(X[[i]], ...)
7: eval(parse(text = trace))
8: parse(text = trace)

reprex:

keep_busy <- function(n = 1e3) {
  r <- rnorm(n)
  p <- pnorm(r)
  q <- qnorm(p)
  o <- order(q)
}
bench::mark(parallel::mclapply(seq_len(1e3), keep_busy))
#> Error in parse(text = trace): <text>:1:158: unexpected symbol
#> 1: c("rnorm", "FUN", "lapply", "doTryCatch", "tryCatchOne", "tryCatchList", "tryCatch", "try", "sendMaster", "FUN", "lapply", "<Anonymous>", "eval", "eval", " "tryCatchOne
#>                                                                                                                                                                  ^

bench::mark(parallel::parLapply(seq_len(1e3), keep_busy,
                                cl = parallel::makeCluster(3)))
#> # A tibble: 1 x 6
#> # … with 6 more variables: expression <bch:expr>, min <bch:tm>,
#> #   median <bch:tm>, `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>

library(foreach)
library(doParallel); registerDoParallel(cores = 3)
#> Loading required package: iterators
#> Loading required package: parallel
bench::mark({foreach(x = seq_len(1e3)) %dopar% keep_busy})
#> Error in parse(text = trace): <text>:1:255: unexpected symbol
#> 1: lize", "sendMaster", "FUN", "lapply", "mclapply", "<Anonymous>", "%dopar%", "eval", "eval", "eval_one", "<Anonymous>", "eval", "eval", "withVisible", "withCallingHandlers", "doTryCatch", "tryC
#>                                                                                                                                                                                                                                                                   ^

future::plan("multicore")
bench::mark(future.apply::future_lapply(seq_len(1e3), keep_busy))
#> # A tibble: 1 x 6
#>   expression                                              min median
#>   <bch:expr>                                            <bch> <bch:>
#> 1 future.apply::future_lapply(seq_len(1000), keep_busy) 107ms  107ms
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>,
#> #   `gc/sec` <dbl>

future::plan("multisession")
bench::mark(future.apply::future_lapply(seq_len(1e3), keep_busy))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 x 6
#>   expression                                              min median
#>   <bch:expr>                                            <bch> <bch:>
#> 1 future.apply::future_lapply(seq_len(1000), keep_busy) 552ms  552ms
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>,
#> #   `gc/sec` <dbl>

future::plan("sequential")
bench::mark(future.apply::future_lapply(seq_len(1e3), keep_busy))
#> # A tibble: 1 x 6
#>   expression                                              min median
#>   <bch:expr>                                            <bch> <bch:>
#> 1 future.apply::future_lapply(seq_len(1000), keep_busy) 113ms  113ms
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>,
#> #   `gc/sec` <dbl>

Created on 2019-10-06 by the reprex package (v0.3.0)
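
To illustrate the suspected log corruption, here is a rough sketch of a check that flags malformed log lines before they are parsed. It assumes each well-formed line has the form `<bytes> :"call1" "call2" ` (a byte count followed by quoted call names); `is_wellformed()` is a hypothetical helper, not part of profmem, and the corrupted sample line is adapted from the error message above.

```r
# Hypothetical helper: TRUE for lines that look like '<bytes> :"f1" "f2" '
is_wellformed <- function(lines) {
  grepl('^[0-9]+ :("[^"]*" )*$', lines)
}

log_lines <- c(
  '40048 :"integer" ',
  '4000048 :"integer" "foo" ',
  # Two interleaved writes, as in the parse error reported above
  '2048 :"qnorm" "FUN" "tryCat2048 :"seq.default" '
)

ok <- is_wellformed(log_lines)
clean <- log_lines[ok]  # only these would be passed on to a parser
```

Skipping lines this way loses the corrupted entries, so the reported totals would be a lower bound rather than an exact figure.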

Session info

devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Linux Mint 19.2             
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_US                       
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2019-10-06                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package      * version date       lib source        
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
#>  backports      1.1.4   2019-04-10 [1] CRAN (R 3.6.1)
#>  bench          1.0.4   2019-09-06 [1] CRAN (R 3.6.1)
#>  callr          3.3.1   2019-07-18 [1] CRAN (R 3.6.1)
#>  cli            1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
#>  codetools      0.2-16  2018-12-24 [4] CRAN (R 3.5.2)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
#>  desc           1.2.0   2018-05-01 [1] CRAN (R 3.6.1)
#>  devtools       2.1.0   2019-07-06 [1] CRAN (R 3.6.1)
#>  digest         0.6.20  2019-07-04 [1] CRAN (R 3.6.1)
#>  doParallel   * 1.0.15  2019-08-02 [1] CRAN (R 3.6.1)
#>  doSNOW       * 1.0.18  2019-07-27 [1] CRAN (R 3.6.1)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 3.6.1)
#>  fansi          0.4.0   2018-10-05 [1] CRAN (R 3.6.1)
#>  foreach      * 1.4.7   2019-07-27 [1] CRAN (R 3.6.1)
#>  fs             1.3.1   2019-05-06 [1] CRAN (R 3.6.1)
#>  future         1.14.0  2019-07-02 [1] CRAN (R 3.6.1)
#>  future.apply   1.3.0   2019-06-18 [1] CRAN (R 3.6.1)
#>  globals        0.12.4  2018-10-11 [1] CRAN (R 3.6.1)
#>  glue           1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
#>  highr          0.8     2019-03-20 [1] CRAN (R 3.6.1)
#>  htmltools      0.3.6   2017-04-28 [1] CRAN (R 3.6.1)
#>  iterators    * 1.0.12  2019-07-26 [1] CRAN (R 3.6.1)
#>  knitr          1.24    2019-08-08 [1] CRAN (R 3.6.1)
#>  listenv        0.7.0   2018-01-21 [1] CRAN (R 3.6.1)
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 3.6.1)
#>  memoise        1.1.0   2017-04-21 [1] CRAN (R 3.6.1)
#>  pillar         1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
#>  pkgbuild       1.0.4   2019-08-05 [1] CRAN (R 3.6.1)
#>  pkgconfig      2.0.2   2018-08-16 [1] CRAN (R 3.6.1)
#>  pkgload        1.0.2   2018-10-29 [1] CRAN (R 3.6.1)
#>  prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.6.1)
#>  processx       3.4.1   2019-07-18 [1] CRAN (R 3.6.1)
#>  profmem        0.5.0   2018-01-30 [1] CRAN (R 3.6.1)
#>  ps             1.3.0   2018-12-21 [1] CRAN (R 3.6.1)
#>  R6             2.4.0   2019-02-14 [1] CRAN (R 3.6.1)
#>  Rcpp           1.0.2   2019-07-25 [1] CRAN (R 3.6.1)
#>  remotes        2.1.0   2019-06-24 [1] CRAN (R 3.6.1)
#>  rlang          0.4.0   2019-06-25 [1] CRAN (R 3.6.1)
#>  rmarkdown      1.15    2019-08-21 [1] CRAN (R 3.6.1)
#>  rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
#>  snow         * 0.4-3   2018-09-14 [1] CRAN (R 3.6.1)
#>  stringi        1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
#>  testthat       2.2.1   2019-07-25 [1] CRAN (R 3.6.1)
#>  tibble         2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
#>  usethis        1.5.1   2019-07-04 [1] CRAN (R 3.6.1)
#>  utf8           1.1.4   2018-05-24 [1] CRAN (R 3.6.1)
#>  vctrs          0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
#>  withr          2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
#>  xfun           0.9     2019-08-21 [1] CRAN (R 3.6.1)
#>  yaml           2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
#>  zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.6.1)
#> 
#> [1] /home/fabian-s/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

DOCS: Document the Rprofmem data.frame

Add more details on what an Rprofmem data.frame contains:

profmem::readRprofmem()
profmem::profmem()

Reminder:

> p <- profmem::profmem({ x <- integer(1e4); y <- foo(1e6) })
> p
Rprofmem memory profiling of:
{
    x <- integer(10000)
    y <- foo(1e+06)
}

Memory allocations:
       what   bytes              calls
1     alloc   40048          integer()
2     alloc 4000048 foo() -> integer()
total       4040096                   
> str(p)
Classes 'Rprofmem' and 'data.frame':	2 obs. of  3 variables:
 $ what : chr  "alloc" "alloc"
 $ bytes: num  4e+04 4e+06
 $ trace:List of 2
  ..$ : chr "integer"
  ..$ : chr  "integer" "foo"
 - attr(*, "threshold")= int 0
 - attr(*, "expression")= language {  x <- integer(10000); y <- foo(1e+06) }
  ..- attr(*, "srcref")=List of 3
  .. ..$ : 'srcref' int  1 23 1 23 23 23 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x556b805fa4e0> 
  .. ..$ : 'srcref' int  1 25 1 41 25 41 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x556b805fa4e0> 
  .. ..$ : 'srcref' int  1 44 1 56 44 56 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x556b805fa4e0> 
  ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x556b805fa4e0> 
  ..- attr(*, "wholeSrcref")= 'srcref' int  1 0 1 58 0 58 1 1
  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x556b805fa4e0> 
 - attr(*, "value")= int  0 0 0 0 0 0 0 0 0 0 ...
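
As a starting point for that documentation, a small sketch of the structure shown above, rebuilt as a plain data.frame (without the Rprofmem class) using the values from the str() output. The mapping from trace to the calls string is inferred from the printed output, so treat it as an assumption.

```r
# Plain data.frame mimicking the str(p) output above (no 'Rprofmem' class)
p_like <- data.frame(
  what  = c("alloc", "alloc"),
  bytes = c(40048, 4000048),
  stringsAsFactors = FALSE
)

# 'trace' is a list column: one character vector per allocation,
# with the innermost call first
p_like$trace <- list("integer", c("integer", "foo"))

# Collapse each trace into the 'calls' string displayed by print()
calls <- vapply(p_like$trace, function(tr) {
  paste(sprintf("%s()", rev(tr)), collapse = " -> ")
}, character(1))

total_bytes <- sum(p_like$bytes)
```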

VIGNETTE: Was R updated such that example is no longer valid?

Vignette example may no longer be true in recent R versions, e.g. in R 3.4.3 there seems to be no coercion to double for x in:

> p <- profmem({
+     small <- (x < 5000)
+ })
> p
Rprofmem memory profiling of:
{
    small <- (x < 5000)
}
Memory allocations:
       bytes      calls
1      80040 <internal>
2      40040 <internal>
total 120080

Export profmem_suspend() and profmem_resume()

Memory profiling as of R 3.2.0 only?

R 3.1.3 gives me:

configure: WARNING: unrecognized options: --with-memory-profiling

Could be worthwhile to mention that in the README.

(My local build system could be wrong, too.)

profmem_begin() and profmem_end()

In addition to the current:

p <- profmem({
  ## expressions to profile
})

add support for something like:

profmem_begin()
{
  ## expressions to profile
}
p <- profmem_end()
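
A rough sketch of how such a pair could sit on top of base R's Rprofmem(). The helper names `sketch_begin()` and `sketch_end()` are hypothetical and return raw log lines rather than an Rprofmem object; this is not how profmem implements it.

```r
# Hypothetical helpers; names and behavior are assumptions, not profmem's API
sketch_begin <- function(threshold = 0L) {
  logfile <- tempfile(fileext = ".Rprofmem.out")
  Rprofmem(logfile, threshold = threshold)  # start logging allocations
  logfile
}

sketch_end <- function(logfile) {
  Rprofmem("")                              # stop logging
  entries <- readLines(logfile, warn = FALSE)
  unlink(logfile)
  entries
}

# Rprofmem() only works if R was built with memory profiling
if (isTRUE(unname(capabilities("profmem")))) {
  f <- sketch_begin()
  x <- integer(1e4)
  entries <- sketch_end(f)
  print(head(entries))
}
```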

Option profmem.print.expr=TRUE/FALSE

Make it possible to disable printing of the expression in print() via an option.
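
A minimal sketch of how a print method could consult such an option. Only the option name comes from the title above; `print_profile()` and its arguments are hypothetical, and defaulting to TRUE is an assumption.

```r
# Hypothetical print helper; only the option name comes from the issue title
print_profile <- function(expr_text, total_bytes) {
  if (isTRUE(getOption("profmem.print.expr", TRUE))) {
    cat("Rprofmem memory profiling of:\n", expr_text, "\n\n", sep = "")
  }
  cat("total ", total_bytes, "\n", sep = "")
  invisible(NULL)
}

# Default: the expression is printed
out_default <- capture.output(print_profile("x <- integer(10000)", 40048))

# With the option disabled, only the summary is printed
old <- options(profmem.print.expr = FALSE)
out_quiet <- capture.output(print_profile("x <- integer(10000)", 40048))
options(old)
```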

Shiny-based example

Hi there, I'm interested in applying this to Shiny apps - is that possible? Or would you recommend just using profvis?

Different result first time I run profmem...

Hi! I am trying to use profmem to compute memory usage of a set of functions and I get very different results when I run the same command again (it goes from 273376 to 36648 bytes).

Here is a reproducible example:

rm(list = ls())
r <- profmem({ 
  for (i in 1:5){
    for (j in 1:50){
      cat(i,j)
    }}
})
total(r) #19200
s <- profmem({ 
  for (i in 1:5){
    for (j in 1:50){
      cat(i,j)
    }}
})
total(s) #19760
t <- profmem({ 
  for (i in 1:5){
    for (j in 1:50){
      cat(i,j)
    }}
})
total(t) #19760

In this example memory usage increases while in my case it decreases a lot. Which one would be the "real" memory use, the first one or the following ones? Thank you!!!

Best,
Inés

ROBUSTNESS: Add explicit 'stringsAsFactors' arguments [cbind, rbind]

$ for pkg in $pkgs; do echo "$pkg:"; (cd "$pkg"; grep -E "^[ \t]*[^#].*[cr]bind" -- */*.R | grep -vF stringsAsFactors;); echo; read -r -p "Press ENTER to continue ..."; done

profmem:
R/Rprofmem-class.R:  data <- rbind(data, list(what = "", bytes = total, calls = ""))
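
For reference, a sketch of what the flagged rbind() call could look like with an explicit stringsAsFactors argument. The data values are taken from the printed example elsewhere in this tracker; the surrounding code is a stand-in, not the package source.

```r
# Stand-in for the data.frame built inside the print method
data <- data.frame(
  what  = c("alloc", "alloc"),
  bytes = c(40048, 4000048),
  calls = c("integer()", "foo() -> integer()"),
  stringsAsFactors = FALSE
)
total <- sum(data$bytes)

# Explicit stringsAsFactors keeps the character columns as character,
# regardless of the R version's default
data <- rbind(data, list(what = "", bytes = total, calls = ""),
              stringsAsFactors = FALSE)
```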

print() for Rprofmem should report if there was an error while profiling

Related to #21, print() on an Rprofmem object does not report errors that occurred during profiling, although the error attribute of the returned result captures them:

> p <- profmem::profmem(log("a"))
> print(p)
Rprofmem memory profiling of:
log("a")
<environment: R_EmptyEnv>

Memory allocations:
      what bytes calls
total          0  

> attr(p, "error")
<simpleError in log("a"): non-numeric argument to mathematical function>
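
One way print() could surface this, sketched on a stand-in object. `report_profiling_error()` is a hypothetical helper and the message wording is an assumption; only the "error" attribute comes from the output above.

```r
# Hypothetical footer for print(); assumes the "error" attribute shown above
report_profiling_error <- function(p) {
  err <- attr(p, "error")
  if (inherits(err, "error")) {
    cat("Error while profiling: ", conditionMessage(err), "\n", sep = "")
  }
  invisible(p)
}

# Stand-in object carrying a captured error, built without profmem
err <- tryCatch(log("a"), error = function(e) e)
p_like <- structure(list(), error = err)
out <- capture.output(report_profiling_error(p_like))

# An object without the attribute prints nothing extra
out_none <- capture.output(report_profiling_error(list()))
```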

HELP NEEDED: R binaries known to have memory profiling enabled

There are a few options for ready-to-install R binaries for Linux, macOS and Windows users. Which of these are built with memory profiling enabled?

The R Project / CRAN provides:
- Linux
  - Debian (also Ubuntu)
  - RedHat
  - Suse
- macOS
- Windows

AT&T Research provides:
- macOS - nightly builds

If you've installed any of the above binaries, or binaries from some other source, please check whether

> capabilities("profmem")

returns TRUE or FALSE and report back here and I'll add the info to the vignette.
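
A small snippet that could be pasted into a report, combining the check above with the R version string. The output format is just a suggestion.

```r
# capabilities() returns a named logical; unname() keeps isTRUE()
# working on older R versions as well
has_profmem <- isTRUE(unname(capabilities("profmem")))

cat(R.version.string, ": memory profiling ",
    if (has_profmem) "enabled" else "NOT enabled", "\n", sep = "")
```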

Support for nested `profmem()` calls

Wish

To be able to do:

p1 <- profmem({
  x <- 1:1000
  p2 <- profmem({
     y <- as.double(x)
  })
  z <- x * y
})

Where p1 should hold all memory allocations including those done during p2.

Issue

Rprofmem() itself can only handle a single file; from ?Rprofmem:

Enabling profiling automatically disables any existing profiling to another or the same file.

which is also confirmed when inspecting the internal code.

In develop, profmem_begin/end() now guards against the stack being more than one level deep, in order to avoid overwriting existing profiling logs by mistake.

Solution

It should not be impossible to have the profmem package orchestrate a stack of files. One idea is to have profmem_begin() parse the existing profile buffer and stack it before starting over with a fresh one. Calls to profmem_end() will consume the current buffer and append it to any previous one existing at the same level. If not at the top level, it will then resume profiling to a fresh profile file.
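
The stacking idea can be sketched without touching Rprofmem() at all, using character vectors as stand-ins for profile buffers. All names here (`stack_begin()`, `stack_record()`, `stack_end()`) are hypothetical, and `stack_record()` merely simulates an allocation being logged.

```r
# Stack of buffers; each level holds the entries recorded at that depth
.stack <- new.env()
.stack$frames <- list()

stack_begin <- function() {
  .stack$frames <- c(.stack$frames, list(character(0)))
  invisible(length(.stack$frames))  # current depth
}

# Simulates one logged allocation at the current depth
stack_record <- function(entry) {
  d <- length(.stack$frames)
  stopifnot(d > 0L)
  .stack$frames[[d]] <- c(.stack$frames[[d]], entry)
  invisible(NULL)
}

# Pops the current buffer and appends it to the parent, if any
stack_end <- function() {
  d <- length(.stack$frames)
  stopifnot(d > 0L)
  buf <- .stack$frames[[d]]
  .stack$frames[[d]] <- NULL
  if (d > 1L) {
    .stack$frames[[d - 1L]] <- c(.stack$frames[[d - 1L]], buf)
  }
  buf
}

# Mirrors the nested example from the wish above
stack_begin()                       # p1
stack_record("x <- 1:1000")
stack_begin()                       # p2
stack_record("y <- as.double(x)")
p2 <- stack_end()
stack_record("z <- x * y")
p1 <- stack_end()
```

Here p1 ends up holding all three entries, including the one recorded during p2, which is the behavior asked for.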

The threshold argument

The threshold argument should also be recorded in the stack.
If a nested profmem is created, its threshold must not be higher than its parent's. Otherwise, generate a warning and fall back to the active threshold.
If the thresholds differ, then the recorded entries need to be filtered accordingly when returned.
~~There should be a method for retrieving threshold used.~~
The print() method should also report on the threshold used.
Drop the threshold argument from profmem_resume() and use what's on the stack instead.
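
The threshold rules could look roughly like this. Both helpers are hypothetical, and the filter assumes that entries at or above the threshold are the ones kept.

```r
# Hypothetical: pick the threshold a nested call may actually use
resolve_threshold <- function(requested, active) {
  if (requested > active) {
    warning(sprintf(
      "Nested threshold (%s) is higher than the active one (%s); using %s",
      requested, active, active
    ), call. = FALSE)
    return(active)
  }
  requested
}

# Hypothetical: filter recorded byte counts for a level with a higher
# threshold than the one used while logging
filter_entries <- function(bytes, threshold) {
  bytes[bytes >= threshold]
}

# Parent uses 1000, nested call asks for 0: logging runs at 0 and the
# parent's view is filtered afterwards
nested <- resolve_threshold(0, active = 1000)
parent_view <- filter_entries(c(40, 2000, 999, 1000), threshold = 1000)
```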

ROBUSTNESS: Add explicit 'stringsAsFactors' arguments [data.frame]

$ for pkg in $pkgs; do echo "$pkg:"; (cd "$pkg"; grep -E "^[ \t]*[^#].*data[.]frame" -- */*.R | grep -vF stringsAsFactors;); echo; read -r -p "Press ENTER to continue ..."; done

profmem:
R/profmem.R:  empty <- structure(empty, class = c("Rprofmem", "data.frame"), threshold = 0L)
R/Rprofmem-class.R:as.data.frame.Rprofmem <- function(x, ...) {
R/Rprofmem-class.R:  res <- data.frame(what = what, bytes = bytes, calls = traces,
R/Rprofmem-class.R:} ## as.data.frame()
R/Rprofmem-class.R:  data <- as.data.frame(x, ...)
tests/profmem.R:  data <- as.data.frame(p)
tests/profmem.R:  d <- as.data.frame(p)
tests/profmem.R:  d1 <- as.data.frame(p1)
tests/profmem.R:  d2 <- as.data.frame(p2)

DON'T FORGET: What causes vignette to use threshold = 1000?

The vignette reports:

Memory allocations (>= 1000 bytes):
[...]

Where is that 1000 threshold coming from?

Errors are silenced

Example:

> p <- profmem::profmem({ y <- log("a") })
> p
Rprofmem memory profiling of:
{
    y <- log("a")
}

Memory allocations:
      what bytes calls
total          0

> p <- profmem::profmem({ y <- non.existing::foo() })
> p
Rprofmem memory profiling of:
{
    y <- non.existing::foo()
}

Memory allocations:
      what bytes calls
total          0

and

> p <- profmem::profmem({ stop("boom") })
> p
Rprofmem memory profiling of:
{
    stop("boom")
}

Memory allocations:
      what bytes calls
total          0

Action

Capture error and re-signal as a warning;

p <- profvis::profvis(log("a"), errors = "ignore")  ## the default

p <- profvis::profvis(log("a"), errors = "warn")
Warning: profvis() detected a run-time error: 
Error in log("a") : non-numeric argument to mathematical function

p <- profvis::profvis(log("a"), errors = "error")
Error in log("a") : non-numeric argument to mathematical function
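
A base R sketch of the same three policies, to show what capturing and re-signaling could look like. `run_with_error_policy()` is a hypothetical function, and the warning text is modeled on the output above rather than taken from any package.

```r
# Hypothetical: evaluate 'expr', capture any error, then apply a policy
run_with_error_policy <- function(expr,
                                  errors = c("ignore", "warn", "error")) {
  errors <- match.arg(errors)
  error <- NULL
  # 'expr' is a promise, so it is evaluated inside tryCatch()
  value <- tryCatch(expr, error = function(e) {
    error <<- e
    NULL
  })
  if (!is.null(error)) {
    if (errors == "warn") {
      warning("detected a run-time error: ", conditionMessage(error),
              call. = FALSE)
    } else if (errors == "error") {
      stop(error)  # re-signal the original condition
    }
  }
  # The captured error (if any) travels with the result
  structure(list(value = value), error = error)
}

res <- run_with_error_policy(log("a"))                  # silent
captured <- attr(res, "error")
```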

profmem_depth(): current depth >= 0

Don't display NA by default

print() should not report on NAs by default because there can be quite a few at times. It could have a line saying:

Number of NA entries not displayed: 34

Add option to print() and as.data.frame() to include them.
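
A sketch of the proposed behavior on a plain data.frame. `print_entries()` and its `include_na` argument are hypothetical, and it assumes the NA entries are the rows where bytes is NA.

```r
# Hypothetical printer; assumes NA entries show up as NA in 'bytes'
print_entries <- function(df, include_na = FALSE) {
  is_na <- is.na(df$bytes)
  if (!include_na && any(is_na)) {
    print(df[!is_na, , drop = FALSE])
    cat("Number of NA entries not displayed: ", sum(is_na), "\n", sep = "")
  } else {
    print(df)
  }
  invisible(df)
}

df <- data.frame(what = "alloc", bytes = c(40048, NA, NA),
                 stringsAsFactors = FALSE)
out <- capture.output(print_entries(df))
out_all <- capture.output(print_entries(df, include_na = TRUE))
```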

VIGNETTE: Integers `5000L` in code snippets are displayed as `5000`

Integers 5000L in code snippets are displayed as 5000. This is due to a limitation in R.utils::withCapture();

> R.utils::withCapture(list(a=5000, b=5000L))
> list(a = 5000, b = 5000L)
$a
[1] 5000
$b
[1] 5000

which in turn is due to a limitation in how print() outputs integers;

> list(a = 5000, b = 5000L)
$a
[1] 5000
$b
[1] 5000
> 5000
[1] 5000
> 5000L
[1] 5000
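
For comparison, deparse() does keep the L suffix, which may be useful if the snippets are ever rendered from the code rather than from printed values. A short check:

```r
values <- list(a = 5000, b = 5000L)

# deparse() distinguishes the integer constant from the double
deparsed <- vapply(values, deparse, character(1))

# format(), like print(), shows both the same way
formatted <- vapply(values, format, character(1))
```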
