moodymudskipper / flow

View and Browse Code Using Flow Diagrams

Home Page: https://moodymudskipper.github.io/flow/

License: Other

R 99.45% C++ 0.55%

flow's Introduction

flow

{flow} provides tools to visualize as flow diagrams the logic of functions, expressions or scripts and ease debugging.

Use cases include:

Deciphering other people’s code
Getting more comfortable with our own code by easing a visual understanding of its structure
Documentation
Debugging
Inspecting unit test results
Providing a higher level view of an algorithm to collaborators
Education

Installation

Install from CRAN with:

install.packages("flow")

Or install development version from github:

remotes::install_github("moodymudskipper/flow")

Example

library(flow)

Using default nomnoml engine

flow_view(rle)

Using plantuml engine (make sure the {plantuml} package is installed).

flow_view(rle, engine = "plantuml")

Additional functions

flow_run() to display not only the diagram, but the logical path taken by a specific call
flow_compare_runs() to display the logical path of 2 calls to see where they diverge
flow_debug()/flow_undebug() to basically use flow_run() on a function wherever it’s called
flow_view_vars() to display the dependencies between variables in a function
flow_view_deps() to display recursively all the functions that your function calls
flow_view_uses() to display recursively all the functions that call your function
flow_view_shiny() to display the modular structure of your shiny app
flow_view_source_calls() to display dependency tree of scripts sourcing each other
flow_doc() to build a package’s documentation using flow diagrams
flow_test() to show what happens in your unit tests
flow_embed() to embed diagrams in your documentation.

See more in vignettes.

Notes

Make sure to check the vignettes for a detailed breakdown of all features.

{flow} is built on top of Javier Luraschi’s {nomnoml} package and Rainer M Krug’s {plantuml} package, the latter only available from GitHub at the moment.

flow's Issues

Is there a way to increase the resolution?

@moodymudskipper I'd like to start off by saying this is a really cool package!

Now to the request: I ran flow::flow_view() over a highly involved function and even after sending to a png and zooming in, there isn't enough resolution to see what's happening. Is this something that can be looked into? To me, having the ability to visualize a process becomes more important as the complexity of the function or script increases.

What I ran is below. It's a function with a ton of parameters and calls to other functions that are themselves highly involved.

flow::flow_view(RemixAutoML::AutoCatBoostChainLadder)

handle cases where the end of the function is never reached.

This would concern functions that use return statements everywhere, such as data.table:::foverlaps, or functions that are meant to stop in any case.

library(funflow)
test <- function(x) {if(foo) stop() else return(NULL)}
view_flow(test)

Created on 2019-10-21 by the reprex package (v0.3.0)

It is a consequence of 542fb3e

but previously it was really just "working" by luck.

links to calls in chart

motivating example:

view_flow(median)

Would be great if we could click the UseMethod('median') box and traverse the call more interactively. A bit tough to do statically because obv we don't have any input to median so we don't know the method, but there could be a landing page to methods('median') first.

`next` and `break`

These are considered regular instructions.

I think supporting them completely might make the charts too complicated (we might have many next statements!), but ignoring them completely is not completely satisfying either.

Something like this might work.

# http://www.nomnoml.com/
[a]->[<transceiver>42:
next]
[42:
next] - [b]

[c] -> [<receiver>break]

They can be brighter blue for next and brighter yellow for break, as we already use bright red and green, and light versions of all these colors.

Random ideas about redraw

In flow_run(), we can't draw every step as it would often be really slow.

When browsing, it would be nice to have a way to refresh automatically, and we could do it, but it seems we can't turn it off when we exit the browsing.

so we use redraw() , and by default nothing is displayed.
redraw(always = TRUE), once implemented, could redraw every step, but then we need to turn it off before getting out of the debugger.

Maybe redraw should have other arguments from flow_view, to choose what/how to draw, with the default being set by flow_run.

We could have a lazy binding triggering redraw() too, but as we don't have a distinct browsing environment it might be overwritten, which maybe is OK but I'm not super comfortable, and I don't know what character we'd use.

We can have an addin too.

I'm not really sure about the name redraw either, because we also use it to draw for the first time.

Maybe just draw ? It's shorter to type too.

This function should fail explicitly if called inappropriately.

don't draw next edge if last call of if block is return or stop

also make it flexible so we can add our own functions (such as abort() or custom), as we can't guess.
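
A minimal sketch of that check, assuming we only need to look at the last call of a block. The helper name ends_flow() and its exit_funs argument are hypothetical, not part of the package:

```r
# Hypothetical helper: does a block end with a call that leaves the function?
# `exit_funs` is user-extensible because we can't guess custom exit functions.
ends_flow <- function(expr, exit_funs = c("return", "stop")) {
  # Unwrap nested `{` calls and keep only the last expression of each
  while (is.call(expr) && identical(expr[[1]], quote(`{`))) {
    if (length(expr) == 1) return(FALSE)  # empty block
    expr <- expr[[length(expr)]]
  }
  if (!is.call(expr)) return(FALSE)
  # deparse() keeps namespaced names such as rlang::abort intact
  fun_name <- paste(deparse(expr[[1]]), collapse = "")
  fun_name %in% exit_funs
}

ends_flow(quote({ message("x"); stop("boom") }))
ends_flow(quote(rlang::abort("x")),
          exit_funs = c("return", "stop", "rlang::abort"))
```

When this returns TRUE for the last call of an if branch, the edge to the next block would simply not be drawn.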

Integrate info from covr

See if we can get the coverage information from covr.

We could color elements that have not been covered differently, and we could add to blocks, or maybe better to edges, the number of times the tests went through them.

refactor code to draw edge AFTER node, and not BEFORE

will make next steps easier.

plantUML

there's an R package for it, and it looks quite neat : https://plantuml.com/fr/activity-diagram-beta

That'd be a nice alternative to nomnoml

support regular calls

I think it can be the same functions. We first detect if it's a call or a function, then proceed.
We might have an argument quote = FALSE so we can use bare expressions when it is TRUE, though it's not so hard to wrap into quote.

split if blocks

Seems like unreadable if blocks are 95% of the time because of many && or || making the statement too long. We can just split by those and go to new line
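
A possible way to do the split, sketched in base R. split_condition() is a hypothetical helper, and it assumes the condition is available as a quoted call:

```r
# Hypothetical helper: split a condition on its top-level && and || operators,
# returning one string per operand, each prefixed by the operator before it.
split_condition <- function(cond) {
  is_logical_op <- is.call(cond) &&
    is.name(cond[[1]]) &&
    as.character(cond[[1]]) %in% c("&&", "||")
  if (!is_logical_op) {
    return(paste(deparse(cond), collapse = " "))
  }
  left  <- split_condition(cond[[2]])
  right <- split_condition(cond[[3]])
  c(left, paste(as.character(cond[[1]]), right[1]), right[-1])
}

cond <- quote(is.numeric(x) && length(x) > 1 || is.null(x))
split_condition(cond)
#> [1] "is.numeric(x)"    "&& length(x) > 1" "|| is.null(x)"
```

Parenthesized sub-conditions are kept on one line, since `(` is a call of its own and is not split further.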

Clickable charts?

When we output html with svg = TRUE we can select text. If we can do that there might be a way to change this text to a hyperlink . I have no idea how that would work (post process the html file?) but for package doc this would have great value for instance.

better block numbers

We should not number :

the function
the return and stop blocks
the end block at the end of if calls
the last end block

This way we can easily keep consistent numbers when :

splitting the chart
using compact argument
adding adhoc "stop" functions, such as rlang::abort()

Some simple functions fail

Functions such as identity() and max() fail. Not that there's really anything useful to be gained from viewing the diagrams of these functions, but a more informative error would be great if diagrams for them don't pan out. Thank you!

library(flow)
x <- 7
fx <- function(x) {x+1}
flow_run(fx(x))
flow_run(identity(x))
#> Error in bdexpr[[1L]]: object of type 'symbol' is not subsettable
#> [1] 8
flow_run(max(x))
#> Error in while (as.character(bdexpr[[1L]]) == "{") bdexpr <- bdexpr[[2L]]: argument is of length zero
flow_run(date())
#> [1] "Wed Aug 19 16:41:21 2020"

compact argument ?

Original view was ambiguous as the following would have been represented the same way :

library(funflow)
test <- function(x) {if(foo) stop() else bar}
test2 <- function(x) {if(foo) stop(); bar}
view_flow(test)

view_flow(test2)

Created on 2019-10-21 by the reprex package (v0.3.0)

Nevertheless they ARE equivalent, and we might want to see the simplified diagram.

The logic would have to be reworked a bit unfortunately because the original was failing in several corner cases, and the new one is easier to conceptualize.

Maybe the easiest way would be to build the chart with these end blocks and remove them in the end.

compiling nested ifs into a "case when" box when possible ?

See the following :

[<choice>if(is.character(x))] -> [x]
[<choice>if(is.character(x))] -> [<choice>if(is.numeric(x))]
[<choice>if(is.numeric(x))] -> [as.character(x)]
[<choice>if(is.numeric(x))] -> [<choice>if(is.logical(x))]
[<choice>if(is.logical(x))] -> [as.character(as.numeric(x))]

This could be simplified in a "casewhen box" such as this :

[.
[<choice>if(is.character(x))] -> [x]
[<choice>if(is.numeric(x))] -> [as.character(x)]
[<choice>if(is.logical(x))] -> [as.character(as.numeric(x))]
]

It's a bit awkward because it must be named and the name can't be invisible, but we can name them with two dots separated by any amount of spaces to differentiate them and the visual pollution will be minimal.

The box ends up at the end of the last if.

We would gain a bit of space, similar calls would sit next to each other, reducing cognitive load, and the main thread would stay centered rather than being offset to the right, which is often the case with longer functions.

implement arguments code = TRUE/FALSE/NA

it will interact with the prefix argument :

prefix not NULL && code TRUE:
comment as header, and code below
prefix not NULL && code FALSE:
only show headers, no code (empty blocks with only block id if no header), except in control flow blocks
prefix not NULL && code NA:
show headers instead of code when we have headers
prefix NULL && code TRUE:
default behavior, show only code
prefix NULL && code FALSE:
show all empty blocks, except in control flow blocks
prefix NULL && code NA:
same as default

Vignettes

We can try a big vignette for advanced use. Using very simple example functions. If it ends up too long we can split it.

flow from scripts

given a length one character input we can transcribe a script to a flow
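
A minimal sketch of the first half of this idea, assuming the script is available as text: parse it and wrap the top-level expressions in a single `{` call, so it has the same shape as a function body. The helper name script_to_block() is hypothetical; for a file path, parse(file = path) could be used the same way.

```r
# Hypothetical helper: turn script text into one `{` call,
# which can then be handled like the body of a function.
script_to_block <- function(text) {
  exprs <- parse(text = text, keep.source = FALSE)
  as.call(c(as.name("{"), as.list(exprs)))
}

block <- script_to_block("x <- 1\nif (x > 0) print(x)")
block
```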

Non-numeric argument to binary operator

funflow::view_flow(data.table::fread)

Error in cumsum(cfc_lgl) * cfc_lgl :
non-numeric argument to binary operator

prefixed comments before control flow should affect the latter

at the moment this fails :

fun <- function(x){
  
  ## comment 1
  x <- x * 2
  
  ## comment 2
  if(x > 3)
    print("big x!")
  x
}

flow_view(fun, prefix = "##")

Ugly big blocks

I don't remember why it was hard to preserve formatting, or why I don't print neatly left-aligned, indented code.

Maybe it was because long lines have to be split not to mess everything up. e.g long strings, or calls with many args that are most likely on several lines in original code.

Whatever the reason it should be fixable, maybe some tools like styler or lintr can help.

I don't remember how I deal with long strings but I guess we could compact them for readability (end with "<... >" for instance) as an option if it's a common enough issue.

Autosplit feature

Cut diagrams into n pieces.
We could give either a vector to cutpoints arg OR a scalar, and in that case we'll attempt to cut into almost equal parts in terms of height ~n.
Algo TBD but we'd overlap on 1st / last step, cut only on nodes that are alone in their row, and recommend a number that looks nice.

Easy Algo :

max height is n, cut at last possibility with single block row (meaning no other block nor parallel path) , and start over from there
check if we have same amount of diagrams for n-1, if yes it's our new split, check n-2 and so on. This is to avoid unbalanced splits with only last step on last diagram etc.

This won't work for all functions, need to design behavior for long parallel paths.
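
The "easy algo" above can be sketched as follows, assuming the diagram is reduced to a logical vector with one entry per row, TRUE when the row holds a single block and can be cut. The names greedy_cuts() and balanced_cuts() are hypothetical, and a NULL result stands for the undesigned case of a parallel path longer than n:

```r
# Greedy pass: each piece is at most n rows high and ends at the last
# cuttable row in range; the next piece starts on that same row (overlap).
greedy_cuts <- function(cuttable, n) {
  n <- as.integer(n)
  stopifnot(n >= 2L)
  total <- length(cuttable)
  cuts <- integer(0)
  start <- 1L
  while (total - start + 1L > n) {
    window <- seq(start + 1L, start + n - 1L)
    candidates <- window[cuttable[window]]
    if (length(candidates) == 0L) return(NULL)  # parallel path longer than n
    start <- max(candidates)
    cuts <- c(cuts, start)
  }
  cuts
}

# Balancing pass: shrink n as long as the number of pieces stays the same.
balanced_cuts <- function(cuttable, n) {
  n <- as.integer(n)
  best <- greedy_cuts(cuttable, n)
  if (is.null(best)) return(NULL)
  while (n > 2L) {
    candidate <- greedy_cuts(cuttable, n - 1L)
    if (is.null(candidate) || length(candidate) != length(best)) break
    best <- candidate
    n <- n - 1L
  }
  best
}

cuttable <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
greedy_cuts(cuttable, 8)    # cut at row 7: pieces of 7 and 5 rows
balanced_cuts(cuttable, 8)  # cut at row 6: two pieces of 6 rows
```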

flow_data not found

Great R package! I ran into a few snafus when I started playing with it.

In a fresh R session,

flow::flow_run(mean(1:5))
#> Error in flow_data(base::mean.default, range = NULL, prefix = NULL, sub_fun_id = NULL, : could not find function "flow_data"

Kinda weird, because it looks like your files will be collated in the right order, so not sure what is going on.

This works once library(flow) is run.

Installation problems on Ubuntu (RInternals.h missing)

Hi there!

Just wanted to try out your package, but renv::install("moodymudskipper/flow") came back with the following error on Pop!_OS (based on Ubuntu 20.04 LTS):

renv::install("moodymudskipper/flow")
#> Retrieving 'https://api.github.com/repos/moodymudskipper/flow/tarball/3c3972a1f7a3722e3b7b480d393e81b913806604' ...
#>  OK [file is up to date]
#> Installing flow [0.0.1] ...
#>  FAILED
#> Error installing package 'flow':
#> ================================
#> 
#> Warning in untar2(tarfile, files, list, exdir, restore_times) :
#>   skipping pax global extended headers
#> * installing *source* package ‘flow’ ...
#> ** using staged installation
#> ** libs
#> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG  -I'/home/data/Code/Rappster/Stadtsalat/dds.delivduration/service/renv/library/R-4.0/x86_64-pc-linux-gnu/Rcpp/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5iUtQS/r-base-4.0.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
#> g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG  -I'/home/data/Code/Rappster/Stadtsalat/dds.delivduration/service/renv/library/R-4.0/x86_64-pc-linux-gnu/Rcpp/include'    -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-5iUtQS/r-base-4.0.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c is_browsing.cpp -o is_browsing.o
#> is_browsing.cpp:3:10: fatal error: RInternals.h: No such file or directory
#>     3 | #include <RInternals.h>
#>       |          ^~~~~~~~~~~~~~
#> compilation terminated.
#> make: *** [/usr/lib/R/etc/Makeconf:176: is_browsing.o] Error 1
#> ERROR: compilation failed for package ‘flow’
#> * removing ‘/home/data/Code/Rappster/Stadtsalat/dds.delivduration/service/renv/staging/2/flow’
#> Error: install of package 'flow' failed

Sys.which("make") gives me:

Sys.which("make")
#>            make 
#> "/usr/bin/make"

Build tools check gives me:

rstudioapi::buildToolsCheck()
# [1] TRUE

Session info

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Pop!_OS 20.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.2  magrittr_1.5    htmltools_0.5.0 tools_4.0.2    
#>  [5] yaml_2.2.1      stringi_1.4.6   rmarkdown_2.3   highr_0.8      
#>  [9] knitr_1.29      stringr_1.4.0   xfun_0.16       digest_0.6.25  
#> [13] rlang_0.4.7     renv_0.11.0     evaluate_0.14

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Pop!_OS 20.04 LTS           
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_US:en                    
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2020-08-31                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date       lib source        
#>  P assertthat    0.2.1   2019-03-21 [?] CRAN (R 4.0.2)
#>  P cli           2.0.2   2020-02-28 [?] CRAN (R 4.0.2)
#>  P crayon        1.3.4   2017-09-16 [?] CRAN (R 4.0.2)
#>  P digest        0.6.25  2020-02-23 [?] CRAN (R 4.0.2)
#>  P evaluate      0.14    2019-05-28 [?] CRAN (R 4.0.2)
#>  P fansi         0.4.1   2020-01-08 [?] CRAN (R 4.0.2)
#>  P glue          1.4.2   2020-08-27 [?] CRAN (R 4.0.2)
#>  P highr         0.8     2019-03-20 [?] CRAN (R 4.0.2)
#>  P htmltools     0.5.0   2020-06-16 [?] CRAN (R 4.0.2)
#>  P knitr         1.29    2020-06-23 [?] CRAN (R 4.0.2)
#>  P magrittr      1.5     2014-11-22 [?] CRAN (R 4.0.2)
#>  P renv          0.11.0  2020-06-26 [?] CRAN (R 4.0.2)
#>  P rlang         0.4.7   2020-07-09 [?] CRAN (R 4.0.2)
#>  P rmarkdown     2.3     2020-06-18 [?] CRAN (R 4.0.2)
#>  P sessioninfo   1.1.1   2018-11-05 [?] CRAN (R 4.0.2)
#>  P stringi       1.4.6   2020-02-17 [?] CRAN (R 4.0.2)
#>  P stringr       1.4.0   2019-02-10 [?] CRAN (R 4.0.2)
#>  P withr         2.2.0   2020-04-20 [?] CRAN (R 4.0.2)
#>  P xfun          0.16    2020-07-24 [?] CRAN (R 4.0.2)
#>  P yaml          2.2.1   2020-02-01 [?] CRAN (R 4.0.2)
#> 
#> [1] /home/data/Code/Rappster/.../<pkg>/renv/library/R-4.0/x86_64-pc-linux-gnu
#> [2] /tmp/RtmpAgO6WX/renv-system-library
#> [3] /tmp/RtmpQUSTBW/renv-system-library
#> 
#>  P ── Loaded and on-disk path mismatch.

Just noticed this P ── Loaded and on-disk path mismatch. 😏 So I thought potentially rather a renv issue?

But then I closed out of my renv-based project and ran remotes::install_github("moodymudskipper/flow"), however I still see the same error.

Session info for that is

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       Pop!_OS 20.04 LTS           
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_US:en                    
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2020-08-31                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.2)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.2)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.2)
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.2)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.2)
#>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.2)
#>  xfun          0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/janko/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

sub-flows ?

Not sure what would be the syntax, I think we'd be able to isolate some parts of the flow chart.

One option could be to have special prefixes for comments to cut out the desired parts.

### This would be cut in a separate chart
code
## a code chunk in this separate chart
code
### another separate chart
code
# we finish a cut out chunk with either the start of a new one, or an empty comment if not
###

This won't allow nesting of such things (cutting out a chart that itself cuts out pieces), though it would be possible by having a standard prefix syntax or giving a vector of prefixes to the function, but we most probably don't need this level of sophistication.

One other option is to have an additional option for the code, it can be shown, hidden, or split, and we'd end up with a bunch of diagrams, overall containing all the code, but not sure if it's practical

The reasoning behind this is that some pieces of code relative to specific features are hard to modularize, because they might use NSE, or use a lot of local variables, or need to be able to return directly. Our charts don't have this restriction though, so a feature could be isolated and charted as long as it's tagged properly.

This would be useful to discuss the logic of some operations in github issues for instance.

Note : to do this we need to support regular calls and not only functions!

zooming/vector graphics?

Have some issues seeing details on our more complicated functions (e.g. data.table::foverlaps & data.table:::dcast.data.table)

Would be great to zoom!

Should commented code blocks include control flow blocks ?

After playing a bit, it seems all code is full of ifs and code blocks are never that long, so while labeling them still makes sense, collapsing them doesn't seem that useful.

Unless we can also collapse control flow blocks, but how ? maybe :

## will collapse the control flow block if happens just after
if(foo) bar

## will collapse the code below AND the following control flow block thanks to ## comment
baz
##
if(foo) bar

The problem then is that comments are not just comments anymore, but code. At least ## is code. But maybe it's an acceptable compromise.

Improved recursive function display

Not to imply the current display is bad! First definition fails, second doesn't, although both manage to render the diagram.
I think it would be very cool to be able to step through each recursion and see the primary node get its arguments updated, instead of terminating at the end of the first call. I have complete confidence you can figure that out 😁.

recurse_until_zero <- function(n = 10) {
  if (n == 0) {
    return(n)
  }
  Recall(n = n - 1)
}

flow_run(recurse_until_zero())

recurse_until_zero <- function(n = 10) {
  if (n == 0) {
    return(n)
  }
  recurse_until_zero(n = n - 1)
}
flow_run(recurse_until_zero(), browser = TRUE)

flow_run with match.arg ?

It seems there's an issue, at least when using browse = TRUE

RStudio addin

We can select code and pressing hotkeys will display the chart, we can have heuristics decide what to display, but probably a menu with multiple choices is acceptable in some cases:

if selection, no menu
- if a symbol is selected
  - if it is { or a control flow construct, show the relevant chart
  - if it is a function, show the chart of the function
- if it is code , show the chart relevant to this code
if no selection, we might have several choices :
- show the chart of the function whose body contains the cursor
- show the chart of the {} directly containing the cursor
- show the chart of the function whose symbol contains the cursor

naming things misc

The package name might be shortened to "flow", which is not taken on CRAN, and we might extend it to show flows of R scripts or Rmd files.

Then current view_flow becomes flow_view, and additionally we have :

flow_code : outputs nomnoml code
flow_png : output picture at given url, if no url given, output to temp file and open using browseURL()
flow_run : will run the function through the flow; on failure it will show the state of the graph at failure and invisibly return a list of environments copying the frame content at failure (use sys.calls() and sys.frames())
flow_debug : we move block by block in the chart (not clear yet what happens in the console) and all code is evaluated in the right environment, on failure is just like flow_run.

Ultimately the technology doesn't have to be nomnoml, it can be an option among others, and technology specific parameters will be passed through dots

could there be a special representation for "if x then stop" ?

or "if x then single call instruction"

these offset the full diagram to the right, and I think they'd be more readable on one line, but not sure how it would work.

nomnoml won't allow horizontal arrows but we can be "close enough"

#.one: visual=hidden stroke=#fdf6e3

[previous code] -> [cond]
[<one> cond|
  [<choice>if (condition)]
  [<label> stop message]
  [<end> end1]
]
[cond] -> [next code]

Not super pretty, maybe not that useful in most cases, but some scripts start with a lot of checks and maybe it'd be handy, possibly as an option.

Would need to define how it would look in the debug function, as here there is no border or arrow to dash.

profiling

That will need another layer of metaprogramming, but what about profiling ?

As a comment to a node, we give the execution time.

For statements in loop, we give the average and sd, and we can propose to the user to customize these metrics so they can include min max etc.

The function would return an flow_profile object which could be analysed or plotted.
For instance we could call a function on this object and a node number and get a line chart of execution time vs iteration, possibly with x axis labels for every nested loop.
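
A rough sketch of the summary step, assuming the profiler has already produced one row per node and iteration. The data frame layout and the helper name summarise_timings() are assumptions for illustration only:

```r
# Hypothetical helper: summarise execution times per node.
# `metrics` is a named list of functions, so users can add min, max, etc.
summarise_timings <- function(timings,
                              metrics = list(mean = mean, sd = stats::sd)) {
  by_node <- split(timings$elapsed, timings$node)
  stats_by_metric <- lapply(metrics, function(f) {
    vapply(by_node, f, numeric(1), USE.NAMES = FALSE)
  })
  data.frame(node = names(by_node), stats_by_metric, stringsAsFactors = FALSE)
}

timings <- data.frame(
  node      = c(1, 1, 1, 2, 2),
  iteration = c(1, 2, 3, 1, 2),
  elapsed   = c(0.1, 0.2, 0.3, 1, 3)
)
summarise_timings(timings)
summarise_timings(timings, metrics = list(min = min, max = max))
```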

flow_run breaks with functions like match.arg(), formals(), sys.call(), match.call(), on.exit(), Recall()...

I thought I didn't need it because I could populate an environment with promises and even calls like browser() would work. But calls to match.arg() and formals() (with no argument) will not behave well with the current system unless we hack their definitions...

Good news is that we know it's possible, bad news is we have to dive into this mess again.

I realize some things won't work though, or will need extra care and won't be trivial at all : on.exit() as is will run at the end of the chunk, not what we want at all.
Also, sys.call(), match.call() will be wrong, I don't know how I've missed that.

match.arg() is the number one issue because it's very common, but will be solved cleanly
formals() will be solved by the same solution
sys.call() and match.call() will be almost solved, in their basic form they will work if the temp functions are renamed as the original, but it's not robust
on.exit could be caught and executed after we exit the function

It's all clunky heuristics and I don't like it too much, if you're debugging you don't want the debugger to bring its own bugs. One solution would be to refuse every problematic function, but they could be called through another call with eval.parent(quote(...)) so even this is not robust...

We can also kill the functionality but that'd be a shame because it's useful and works well in most cases, so probably better to have an explicit disclaimer, and specific warnings when problematic calls are encountered.

Let's see what covr does before anything, maybe there's a more proper way. It should be explained here: https://www.youtube.com/watch?v=wP82XSFEiYs

have option to modify if expressions when the assignment is first

so :

x <- if(cond) yes else no

will be changed into

if(cond) x <- yes else x <- no

Which will allow the diagrams to be drawn.

see for instance subset.data.frame
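
The rewrite can be sketched with base R language manipulation. push_assignment_into_if() is a hypothetical helper. It only handles `<-` and deliberately leaves an if without else untouched, because in that case the original assigns NULL when the condition is false and the rewritten form would not:

```r
# Hypothetical helper: turn `x <- if (cond) yes else no`
# into `if (cond) x <- yes else x <- no`.
push_assignment_into_if <- function(expr) {
  is_assign <- is.call(expr) && identical(expr[[1]], quote(`<-`))
  if (!is_assign) return(expr)
  target <- expr[[2]]
  value  <- expr[[3]]
  is_if_else <- is.call(value) &&
    identical(value[[1]], quote(`if`)) &&
    length(value) == 4
  if (!is_if_else) return(expr)
  value[[3]] <- call("<-", target, value[[3]])  # "yes" branch
  value[[4]] <- call("<-", target, value[[4]])  # "no" branch
  value
}

push_assignment_into_if(quote(x <- if (cond) yes else no))
#> if (cond) x <- yes else x <- no
```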

ideas

Code is one dimensional, with flow charts we see what processes are parallel.

Code can also be summarized by using special comments.

1st step

Iterate through the call, spot control-flow constructs and draw the diagram using (probably) DiagrammeR::grViz(), we can check alternatives in DiagrammeR

Validate by testing on :

simple examples
data.table:::`[.data.table`

2nd step

take the srcref attribute of the function if it exists, spot the instance of funflow comments, and replace by a special call the comment AND everything that follows until the next funflow comment OR the next unmatched closing bracket.

In practice :

if(this > that){
  #-> replace NA with 0
  x[is.na(x)] <- 0
  #-> print x times 3 
  print(3*x)
}

will be transformed into :

if(this > that){
  `*box*`("replace NA with 0")
  `*box*`("print x times 3 ")
}

And then the *box* symbol will be recognized when iterating, so the description of the code
is displayed instead of the code.

We can get fancy and add formatting but it's also good to have these comments not look too different from regular comments, so we can code using them without confusing the code.

Using these funflow comments can be switched on/off through a boolean argument to the funflow() function.
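
The replacement described in this step could be sketched on the source lines directly. This is only an illustration: box_comments() is a hypothetical helper, brackets are counted naively (braces inside strings would confuse it), trailing whitespace in labels is trimmed, and labels containing double quotes are not escaped:

```r
# Hypothetical helper: replace each "#->" comment, and the code that follows
# it, by a `*box*`() call. A chunk ends at the next "#->" comment or at the
# first unmatched closing bracket.
box_comments <- function(lines, marker = "#->") {
  count_char <- function(line, char) {
    lengths(regmatches(line, gregexpr(char, line, fixed = TRUE)))
  }
  out <- character(0)
  skipping <- FALSE
  depth <- 0L
  for (line in lines) {
    trimmed <- trimws(line)
    if (startsWith(trimmed, marker)) {
      label  <- trimws(substring(trimmed, nchar(marker) + 1L))
      indent <- sub("^([[:space:]]*).*$", "\\1", line)
      out <- c(out, sprintf("%s`*box*`(\"%s\")", indent, label))
      skipping <- TRUE
      depth <- 0L
      next
    }
    if (skipping) {
      depth <- depth + count_char(line, "{") - count_char(line, "}")
      if (depth >= 0L) next  # still inside the commented chunk
      skipping <- FALSE      # unmatched closing bracket ends the chunk
    }
    out <- c(out, line)
  }
  out
}

src <- c(
  "if(this > that){",
  "  #-> replace NA with 0",
  "  x[is.na(x)] <- 0",
  "  #-> print x times 3 ",
  "  print(3*x)",
  "}"
)
writeLines(box_comments(src))
```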

3rd step

We can get a fancy chart header, displaying arguments, their default values, and maybe their description from the help file if available (would be displayed below arg in smaller letters).
The title of the chart is a parameter of funflow() but can also be taken automatically from the help file.

fix prefix argument!

looks like it's broken.

Should work with prefix = "#", prefix = "##", prefix = "#]"

enhance data

With better edge and node data it'll be easier to extend flow to interactive diagrams etc.

We'll be able to do better than what we did with trim or range too.

I believe that we need to add a branch_id to nodes and edges.

A branch_id depends only on the if calls that were encountered, it's actually binary code, the first branch is 1, it can split into 11 and (optionally) 10 at the first if call, then goes back to 1, nested if calls will create branches 111 etc.

It means loops don't create branches, and the upward edge is not a separate branch either.

A branch is broken by if calls into segments, these segments have incremental ids.

The blocks on these segments have incremental ids as well.

It makes it very easy to filter out branches and view something that makes sense.

From one block we can easily extract its branch and segment, so we can have a lot more control over what we see.

Potential extensions will be able to collapse sections of the diagram easily.
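
To make the branch_id scheme concrete, here is a small sketch that walks a quoted expression and labels each statement. The function name branch_ids() and the two-column output are assumptions, and segment and block ids are left out:

```r
# Hypothetical helper: assign a binary branch id to every statement.
# "1" is the main branch, an if call appends "1" (yes) or "0" (no),
# and loops keep the branch of the code around them.
branch_ids <- function(expr, branch = "1") {
  node <- function(code) {
    data.frame(branch = branch, code = code, stringsAsFactors = FALSE)
  }
  if (!is.call(expr)) return(node(deparse(expr)))
  fun <- expr[[1]]
  if (identical(fun, quote(`{`))) {
    rows <- lapply(as.list(expr)[-1], branch_ids, branch = branch)
    return(do.call(rbind, rows))
  }
  if (identical(fun, quote(`if`))) {
    cond <- paste(deparse(expr[[2]]), collapse = " ")
    yes <- branch_ids(expr[[3]], paste0(branch, "1"))
    no  <- if (length(expr) == 4) branch_ids(expr[[4]], paste0(branch, "0"))
    return(rbind(node(paste0("if (", cond, ")")), yes, no))
  }
  if (is.name(fun) && as.character(fun) %in% c("for", "while", "repeat")) {
    # the loop body is always the last element of the call
    return(rbind(node(as.character(fun)),
                 branch_ids(expr[[length(expr)]], branch)))
  }
  node(paste(deparse(expr), collapse = " "))
}

example <- quote({
  a
  if (p) {
    b
    if (q) c
  } else d
  e
})
branch_ids(example)$branch
#> [1] "1"   "1"   "11"  "11"  "111" "10"  "1"
```

Filtering on a prefix of the branch column, for instance with startsWith(), then selects a branch together with everything nested in it.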

flow_run should open the relevant method

should be possible using getS3method and some tweaking to find the actual name.
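
A simplified sketch of the lookup, assuming plain S3 dispatch on the class attribute. find_s3_method() is a hypothetical helper; it ignores implicit classes, internal dispatch and methods registered under unusual names, which is where the tweaking would come in:

```r
# Hypothetical helper: name of the first S3 method found for `x`,
# trying each class in turn and falling back to the default method.
find_s3_method <- function(generic, x) {
  for (cls in c(class(x), "default")) {
    method <- utils::getS3method(generic, cls, optional = TRUE)
    if (!is.null(method)) return(paste(generic, cls, sep = "."))
  }
  NULL
}

find_s3_method("mean", Sys.Date())
#> [1] "mean.Date"
find_s3_method("mean", 1:5)
#> [1] "mean.default"
```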

split chart and head

have an argument to show "head" of the chart.

Find out what the maximal size is by default, as nomnoml fails silently, then fail explicitly and print a message suggesting to use head = 200 and then break, and saying that the max value for head is 450 (real numbers to be defined).

These numbers define edges, or ideally, an approximate number of edges, so that we cut the chart at an appropriate place.

break is another argument, a vector of break points. Because an automatic split will often split in awkward places, we might as well cut the complexity and cut the chart ourselves where it feels better.

head and break can be used together, in that case head will be used on all sub charts.

Subcharts keep their original block numbers (they don't start at 1 again). So it's easy to know where we're at.

The function header (fun in an ellipse block) is replaced by (fun[i]); where the branches are cut we place a green transceiver block : [<transceiver>fun\[j\]].

flow_debug, one more time

I think we can start from flow_run's code, and at the end of each step :

display diagram up to curr pos
create a function passed the same args and with a body containing a line assigning all vars and a browser() line
run this function
close dashed edges when we reach an end node

Maybe we can debug the actual code by block, will need to play around, I could do it now but I don't want to expose the user to weird internal code.

Full package doc?

This is ambitious but I think there's a path.

list all functions from pkg
for each fun create a diagram (or several if too long, say chunks 15 or 20 blocks high)
use all.names to grab all used function names in given function, which are part of pkg, they'll be used to have hyper links to other function defs
reproduce in rmd doc the original doc structure, borrowing title, definition (not details), and arg definitions, but adding flow diagrams, and hyperlinks to sections of other used functions.
arguments can turn on and off sections (maybe we just want diagrams, and don't care about arg def for instance)
if we have flow_doc_rmd we can easily have flow_doc_pdf and flow_doc_html too (or even a full website! flow_website /flow down? 🤔).

flow_doc_rmd(pkg, pick = NULL, skip = NULL, sections = c("title", "description", "arguments")).

Unexported/undocumented functions are documented in an appendix at the end, in alphabetical order. If sections is NULL, we don't follow the original doc structure; we just stack functions in alphabetical order along with hyperlinks.

This might be a very long doc, I hope it can stay practical.
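The cross-linking step (using all.names to find which of the package's own functions a given function uses) could be sketched like this. `pkg_calls()` is a hypothetical helper; it may return false positives when a local variable shares its name with a package function.

```r
# Hypothetical helper: names used in the body of `fun` that are also
# functions defined in the namespace of `pkg`.
pkg_calls <- function(fun, pkg) {
  ns <- asNamespace(pkg)
  used <- unique(all.names(body(fun)))
  candidates <- intersect(used, ls(ns, all.names = TRUE))
  # Keep only the names bound to functions, dropping datasets and constants.
  is_fun <- vapply(
    candidates,
    function(nm) is.function(get(nm, envir = ns)),
    logical(1)
  )
  candidates[is_fun]
}

# Functions from {stats} used by stats::sd; these would become hyperlinks.
links <- pkg_calls(stats::sd, "stats")
```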

unit tests

we'll have to dput the correct output (that we can copy and paste on nomnoml.com) and do minimal examples for :

empty function
function with one symbol
function with one call
function with 2 calls
function 2nd call commented (special comment)
function with both calls commented
simple if call without else and empty body
simple if call without else and a symbol in body
simple if call without else and a call in body
simple if call without else and 2 calls in body
simple if else call
simple if else call with a symbol in body
simple if else call with a call in body
simple if else call with 2 calls in body
if else call returning on the left
if else call stopping on the right
if else call stopping on the left AND returning on the right
simple if call with a nested if else call
simple for loop
if else call with for loops on each side
simple while loop
if else call with while loops on each side
simple repeat loop
if else call with repeat loops on each side

If we implement compact mode all these tests must be "duplicated" with compact = TRUE
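A possible layout for these tests, as a sketch only. The fixture names are hypothetical and only a subset of the cases is shown. `snapshot()` works on any R object, since the internal structure holding the diagram data is not assumed here.

```r
# Hypothetical fixtures, one per minimal case (subset shown).
fixtures <- list(
  empty       = function() {},
  one_symbol  = function(x) x,
  one_call    = function(x) print(x),
  two_calls   = function(x) { print(x); x + 1 },
  if_no_else  = function(x) if (x) print(x),
  if_else     = function(x) if (x) "yes" else "no",
  for_loop    = function(x) for (i in x) print(i),
  while_loop  = function(x) while (x > 0) x <- x - 1,
  repeat_loop = function(x) repeat break
)

# Serialise an object the way dput() prints it, so the text can be pasted
# into a test as the expected value.
snapshot <- function(x) paste(capture.output(dput(x)), collapse = "\n")

# One row per test to write: every fixture, with and without compact mode.
test_grid <- expand.grid(
  fixture = names(fixtures),
  compact = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)
```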

Then test with random functions; we already had good examples:

Reduce
sweep
data.table:::foverlaps
data.table::fread

And add more tests if necessary

debugging with funflow

This was the big idea, but I got lazy and moved on to other projects.

It's developed at the end of #1, but let's start over:

with_flow({expr}) creates the chart's data, then uses it to run the code and tick the edges and nodes that are passed through; on exit we have a highlighted chart that shows which path was taken.
with_flow({expr}, browse = TRUE) displays the chart step by step, and prints instruction as browser() would, if it can behave like browser in the console, that would be amazing, but not sure if possible
using_flow$fun(...) does the same 2 above for function calls.

We could just check the top level, or we could go through loops too, turn edges green when they're browsed, and add a number to edges coming out of control flow (for if, y becomes y: 3).
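The top-level-only variant could be sketched as below. `run_and_tick()` is a hypothetical stand-in for the bookkeeping part of with_flow(): it runs the top-level steps of a braced expression one by one and records which ones completed, without drawing anything.

```r
# Hypothetical sketch: run each top-level step of a `{` expression in turn
# and record which ones completed before an error (if any) stopped the run.
run_and_tick <- function(expr, env = new.env()) {
  steps <- as.list(expr)[-1]  # drop the leading `{`
  completed <- logical(length(steps))
  for (i in seq_along(steps)) {
    ok <- tryCatch({
      eval(steps[[i]], env)
      TRUE
    }, error = function(e) FALSE)
    if (!ok) break
    completed[i] <- TRUE
  }
  data.frame(
    step = vapply(steps, function(s) paste(deparse(s), collapse = " "), character(1)),
    completed = completed
  )
}

ticks <- run_and_tick(quote({
  x <- 1
  stop("boom")
  y <- x + 1
}))
```

Here only the first step is ticked; the `completed` column is what would decide which nodes and edges get highlighted.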

Default path for flow_png()

What about getwd() as default path instead of temp?

flow_debug, flow_debugonce

Use case :

covr shows me that a portion of my code is not covered in some unexported function fun_unexp
I have a test, calling an exported function fun_exp, itself calling fun_unexp, that I thought should cover it

How do I browse through fun_unexp ?

I can flow_run or debug fun_exp, then find the parameter values and run flow_run on fun_unexp with those
I can run flow_run with browse = TRUE, then run it again once I arrive at the fun_unexp call

These are not that nice.

A solution would be a pair of functions similar to debug and debugonce (or just debugonce), OR a parameter to flow_run, naming the function we want to debug.

The latter would probably need the former under the hood.

flow_debugonce would trace the function with:

a call to untrace so it won't be traced after first call
return(flow::flow_run(...))
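To illustrate the "intercept the first call only" behaviour without depending on trace(), here is a sketch that swaps the binding by hand. This is a stand-in technique, not the trace()/untrace() approach described above, and it only works for functions in an unlocked environment, so not for unexported functions inside a package namespace. `debug_once()` and `hook` are hypothetical names; in the real thing the hook would call flow::flow_run().

```r
# Hypothetical stand-in for trace()/untrace(): replace `name` with a wrapper
# that restores the original binding before running `hook`, so only the
# first call is intercepted.
debug_once <- function(name, hook, env = parent.frame()) {
  original <- get(name, envir = env)
  wrapper <- function(...) {
    assign(name, original, envir = env)  # plays the role of untrace()
    hook(original, ...)
  }
  assign(name, wrapper, envir = env)
  invisible(original)
}

fun_unexp <- function(x) x * 2
fun_exp <- function(x) fun_unexp(x) + 1

# Record the arguments fun_unexp receives, then run it normally.
seen <- list()
debug_once("fun_unexp", function(fun, ...) {
  seen[[length(seen) + 1]] <<- list(...)
  fun(...)
})

first <- fun_exp(1)   # intercepted
second <- fun_exp(2)  # runs the original fun_unexp
```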

Fails unless the whole package is attached

I wanted to do a quick test using the example in the README but

flow::flow_view(median.default)

fails with

Error in flow_data(median.default, NULL, NULL, NULL, TRUE, FALSE) :
could not find function "flow_data"

I think it's the first time I see this kind of error so I don't know how to fix it but you might.

nested function definitions

This is frustrating to investigate, as a good part of the logic is in functions defined at the start.

funflow::view_flow(tools::Rd2ex)

They could be boxed: the function call and args, without the body, would be on top in an oval shape with an arrow pointing upward (maybe see if we can find another arrow to mean "assign", else we'll use an "assign" label next to the arrow). Then the body would be a flow diagram.

This box is probably more useful than the casewhen box

funflow doesn't like functions that use `function()`

minimal example:


test <- function(x){
  ## a
  function(x) {} 
}
funflow::view_flow(test, prefix = "##")