moodymudskipper / debugverse Goto Github PK

Brainstorming ideas for debugging workflow and tools, not a package (yet ?)

debugverse's Issues

snapshot test on next call, with conditions

Say I have :

foo <- function(x, y = NULL)  {
  ...
}

foo() is called by other functions, and x and y might not be straightforward to build from scratch in a realistic way.
I'd like to call trigger_snapshot(foo, !is.null(y)). The next time foo is called and the condition holds, it will create code that looks like :

test_that("foo()", {
  x <- ...
  y <- ...
  expect_snapshot({
    foo(x, y)
  })
})

Or inline in foo directly if short enough (heuristics TBD)

We don't redefine missing args with defaults.

I think we can trace the function; it needs to be untraced as soon as the condition is met, though.

We can have a _once and a permanent version too, though the _once will be the most useful by far.

Also : have ... forwarded to constructive, so we can tweak output and use data. data might be set by default to current package.
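
A minimal sketch of the "_once" behaviour, under loud assumptions: trigger_snapshot_once() is a hypothetical name, it wraps the function instead of using trace(), and it writes values with deparse() rather than forwarding to constructive.

```r
# Hypothetical helper, not an existing API: wraps `name` in `env` so that the
# first call satisfying `cond` prints a test skeleton and restores the original.
trigger_snapshot_once <- function(name, cond, env = parent.frame()) {
  original <- get(name, envir = env, mode = "function")
  cond <- substitute(cond)
  wrapper <- function(...) {
    values <- list(...)
    # Name the supplied values after the formals of the original function.
    matched <- as.list(
      match.call(original, as.call(c(list(as.name(name)), values)))
    )[-1]
    # Defaults are only used to evaluate `cond`; they are never written out.
    defaults <- as.list(formals(original))
    scope <- c(matched, defaults[setdiff(names(defaults), names(matched))])
    if (isTRUE(eval(cond, scope, env))) {
      assign(name, original, envir = env)  # the "_once" part: unwrap
      values_code <- vapply(
        matched, function(v) paste(deparse(v), collapse = "\n"), character(1)
      )
      writeLines(c(
        sprintf('test_that("%s()", {', name),
        sprintf("  %s <- %s", names(matched), values_code),
        "  expect_snapshot({",
        sprintf("    %s(%s)", name, paste(names(matched), collapse = ", ")),
        "  })",
        "})"
      ))
    }
    original(...)
  }
  assign(name, wrapper, envir = env)
  invisible(original)
}

foo <- function(x, y = NULL) length(x) + length(y)
trigger_snapshot_once("foo", !is.null(y))
foo(1:3)           # condition not met: nothing is generated
foo(1:3, y = "a")  # prints the test skeleton, then foo is restored
```

Wrapping forces every argument, so this sketch would not suit functions that rely on NSE.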

Check every function that was used

We need to monitor loadNamespace and trace every function, incrementing counters when a function is used.
If we record the caller too we can draw a flow diagram (flow_run_deps() ?).

Low level functions have to be overridden and wrapped, we could use that approach for all.
If we don't set exceptions for base functions we'll probably crash, easier to just ignore base at first.
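
The wrapping approach could be sketched as below. This is only an illustration on a toy environment: count_calls() is a hypothetical name, nothing here hooks into loadNamespace(), and base functions are left alone as suggested above.

```r
# Hypothetical helper: replace every function in `env` with a wrapper that
# increments a counter before delegating to the original.
count_calls <- function(env) {
  counts <- new.env()
  for (name in ls(env)) {
    f <- get(name, envir = env)
    if (!is.function(f)) next
    local({
      fun_name <- name  # freeze the loop variables for this wrapper
      original <- f
      assign(fun_name, 0L, envir = counts)
      assign(fun_name, function(...) {
        assign(fun_name, get(fun_name, envir = counts) + 1L, envir = counts)
        original(...)
      }, envir = env)
    })
  }
  counts
}

# A toy "package" environment standing in for a namespace.
pkg <- new.env()
evalq({
  helper <- function(x) x + 1
  main <- function(x) helper(helper(x))
}, pkg)

counts <- count_calls(pkg)
pkg$main(1)
mget(c("helper", "main"), envir = counts)
```

Because main() looks helper() up in the same environment, internal calls are counted too. Recording sys.call(-1) inside the wrapper would be one way to get the caller for a flow diagram.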

https://twitter.com/antoine_fabri/status/1530090109416128512

NSE arg check

We can check on.exit if an arg has been evaluated. NSE args should not have been evaluated, maybe only useful for development

better list manipulation

modifyList() and purrr::modify_in() are not enough.

We need better utilities to rename, reorder, remove, apply etc

We need also to do a bit better than pluck accessors, by being able to access several items.

We can use tidy selection, but it returns a number, so we need to map these numbers to nodes or leafs.

If we want a pluck like notation we might combine with |, and use list() rather than c(), e.g. list(1, "a" | has_depth(3), "foo")

note that pluck might support tidy selection soonish.

We could have a ls_prune() function that would look and work like pluck except that it would return a subset of the list. We would use v like "vertical" to go deeper into the indices, and h like "horizontal" to select different elements at a given depth.

any_of() and all_of() would be put to good use

error reprex with constructive

Using options(error=) we can automate the inspection that we might do with options(error = recover)

for each frame we check the inputs of the call and construct() them. Then reproduce the error, the output is a knitted md report automatically open.

default path to temp file
default format to md but Rmd and html possible
open defaults to TRUE
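
A rough sketch of the per-frame inspection, with several assumptions: collect_error_frames() is a hypothetical name, it uses a calling handler instead of options(error=) so it can run non-interactively, and deparse() stands in for construct(). No report is knitted here.

```r
# Hypothetical helper: run `expr` and, if it fails, describe every frame that
# was on the stack when the error was signalled.
collect_error_frames <- function(expr) {
  report <- NULL
  describe <- function(value) paste(deparse(value), collapse = " ")
  tryCatch(
    withCallingHandlers(expr, error = function(e) {
      calls <- sys.calls()
      frames <- sys.frames()
      report <<- lapply(seq_along(calls), function(i) {
        env <- frames[[i]]
        locals <- lapply(ls(env), function(nm) {
          # Some bindings cannot be read (missing arguments, promises still
          # being evaluated), so each lookup is guarded.
          tryCatch(describe(get(nm, envir = env)),
                   error = function(err) "<unavailable>")
        })
        names(locals) <- ls(env)
        list(call = describe(calls[[i]]), locals = locals)
      })
    }),
    error = function(e) NULL
  )
  report
}

inner <- function(a) stop("boom: ", a)
outer <- function(b) inner(b * 2)
report <- collect_error_frames(outer(21))
vapply(report, function(frame) frame$call, character(1))
```

The stack also contains the handler machinery itself, so a real tool would filter frames before reporting. Reading the locals forces promises, which is one of the reasons NSE is tricky here.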

NSE is tricky but we can probably deal with most use cases with some effort, we can use delayedAssign() and try to eval what we can.
Just like construct() does we can check that we reproduce the same error, first with SE and implementing heuristics if it doesn't do it.

Maybe better in another package since it is a bit different and needs new deps. Also since we can reproduce the inputs we can also use flow_run() and {boomer} at each step and have a very detailed report of the error through different angles.

Edit a function while browsing ?

Useful to do surgery and test a fix for a bug found in another package.

The effect might not take place on same call but on next one.

We would simply browse, then call remove_next(), replace_next(), or insert_here()

This is singular enough to be its own package.

I doubt we can edit in real time but that might even be possible given at least in Rstudio how the debugging viewer seems to reload and jump around when executing the same expression at different places.

The first and hardest step is to know in which function we are and where we are while browsing -> ask stack overflow ?

An ugly workaround would be to have a wrapper that inserts some calls

The second step is to unlock namespace, edit body, relock namespace, it's the easy part.

Super fancy would be to be able to edit as we go, and not need to rerun. This would probably require some heavy C-jitsu.

detecting code patterns

Can we have a generalised regex for code ?

We could have regular variable names be fixed, and then have special placeholders like *ANY_CALL or *ANY_SYM, *ANY_STR, which might have a regular expression as an arg (applied to caller for *ANY_CALL).
*N_ARGS might be used to simulate several args (or lines in {). It might have an n arg to limit those (where n might be 0)
*ANY can really be anything but has a function arg to limit the scope, all other functions are wrapped around it.

Maybe we don't need N_ARGS, n is just a parameter of ANY.

It's easy enough to define new helpers, the detecting function considers as matching functions those that obey a certain fixed pattern. e.g. :

ANY_FUNCTION_DEFINITION <- function(x) is.call(x) && length(x) == 3 && (identical(x[[1]], quote(`<-`)) || identical(x[[1]], quote(`=`))) && is.call(x[[3]]) && identical(x[[3]][[1]], quote(`function`))

We might provide a min and max depth to look for a match; the most useful setting is probably max = 0, for top level only.

We'd use source markers to spot those but we might also output as data.
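
A sketch of the detecting function with a depth limit, returning matches as data. The names find_matches() and find_in_text() are hypothetical, the predicate is a working variant of the helper above, and the placeholder syntax (*ANY_CALL etc.) is not implemented.

```r
# Predicate helper: is `x` an assignment of a function definition ?
is_function_definition <- function(x) {
  is.call(x) && length(x) == 3 &&
    (identical(x[[1]], quote(`<-`)) || identical(x[[1]], quote(`=`))) &&
    is.call(x[[3]]) && identical(x[[3]][[1]], quote(`function`))
}

# Walk `code` recursively and collect every sub-expression satisfying
# `predicate`, descending at most `max_depth` levels.
find_matches <- function(code, predicate, max_depth = Inf, depth = 0) {
  matches <- list()
  if (predicate(code)) matches <- list(code)
  if (depth < max_depth && is.call(code)) {
    children <- as.list(code)
    for (i in seq_along(children)) {
      # Skip empty arguments such as the second one in x[1, ].
      if (identical(children[[i]], quote(expr = ))) next
      matches <- c(
        matches,
        find_matches(children[[i]], predicate, max_depth, depth + 1)
      )
    }
  }
  matches
}

# Apply the search to every top level expression of some source text.
find_in_text <- function(text, predicate, max_depth = Inf) {
  exprs <- parse(text = text, keep.source = FALSE)
  found <- lapply(exprs, find_matches,
                  predicate = predicate, max_depth = max_depth)
  unlist(found, recursive = FALSE)
}

code <- "
f <- function(x) x + 1
g <- function() {
  h <- function(y) y
  h(2)
}
z <- 3
"
length(find_in_text(code, is_function_definition))                 # f, g and h
length(find_in_text(code, is_function_definition, max_depth = 0))  # f and g
```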

Categorisation of objects

Including detecting functions that are not used

Objects might be exported, unexported, imported from other packages, reexports, or defined onLoad.
They have a type and class (usually they're functions)
Functions are used in n functions from the package (if only one and unexported it's interesting, that makes it a local helper function)
Functions might not be used, directly or indirectly, by exported functions; in this case they're either dead code, WIP, or development helpers, and none of those is probably the clean way to go.
Their environment might be a namespace, another named environment or a direct or indirect child of a namespace
They might be documented, have examples etc. The package checks will only warn us if exported functions are not properly documented.
An object might be defined several times, in this case it is most probably a mistake.

We'd return a data frame convenient to View(), sort and filter.

To avoid false positives we might follow the strategy used in flow_view_vars()
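
As an illustration of the "used in n functions" column, here is a naive sketch on a toy environment. usage_table() is a hypothetical name, and it relies on all.names(), so a local variable sharing a function's name counts as a use (exactly the kind of false positive mentioned above).

```r
# Hypothetical helper: for each function in `env`, list the other functions of
# `env` whose body mentions its name.
usage_table <- function(env) {
  fun_names <- Filter(function(nm) is.function(get(nm, envir = env)), ls(env))
  used_by <- lapply(fun_names, function(target) {
    Filter(function(caller) {
      caller != target &&
        target %in% all.names(body(get(caller, envir = env)))
    }, fun_names)
  })
  data.frame(
    name = fun_names,
    n_callers = lengths(used_by),
    callers = vapply(used_by, paste, character(1), collapse = ", "),
    stringsAsFactors = FALSE
  )
}

pkg <- new.env()
evalq({
  helper <- function(x) x + 1
  main <- function(x) helper(x) * 2
  orphan <- function() NULL
}, pkg)

usage_table(pkg)
```

Here helper() has a single caller, which would make it a local helper candidate if it is unexported, while orphan() has none.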

Auto naming of args provided by position

With possible exception of 1st arg
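
match.call() already does most of the work. A small sketch, where name_args() and its keep_first_unnamed argument are hypothetical names:

```r
# Hypothetical helper: name the arguments of a quoted call after the formals
# of the function being called, optionally leaving the first one unnamed.
name_args <- function(call, keep_first_unnamed = TRUE, env = parent.frame()) {
  fun <- eval(call[[1]], env)
  named <- match.call(fun, call)  # does not work for primitive functions
  if (keep_first_unnamed && length(named) > 1) names(named)[2] <- ""
  named
}

name_args(quote(rnorm(10, 0, 2)))
name_args(quote(rnorm(10, 0, 2)), keep_first_unnamed = FALSE)
```

A real tool would apply this to the source text, probably through the RStudio API, rather than to quoted calls.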

Forwarding args

inside_fun <- function(y, foo = "default") {...}
outside_fun <-  function(x, foo = "default") {
  ...
  inside_fun(y) # forgot to forward `foo` !
  ...
}

Static analysis can help us: if the inside function has a formal with the same name as one of the outside function's formals, and the call doesn't pass it, it can bring our attention to it.
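
A sketch of such a check, restricted to callees defined in the same environment as the outer function. find_unforwarded() is a hypothetical name and the toy functions have real bodies in place of the elided code.

```r
# Hypothetical helper: report calls, in the body of `outer`, to functions of
# `env` that share a formal name with `outer` but do not receive it.
find_unforwarded <- function(outer, env = environment(outer)) {
  outer_args <- setdiff(names(formals(outer)), "...")
  found <- character()
  walk <- function(x) {
    if (!is.call(x)) return(invisible(NULL))
    fun_sym <- x[[1]]
    if (is.symbol(fun_sym) &&
        exists(as.character(fun_sym), envir = env, inherits = FALSE)) {
      callee <- get(as.character(fun_sym), envir = env)
      matched <- if (is.function(callee)) {
        tryCatch(match.call(callee, x), error = function(e) NULL)
      }
      if (!is.null(matched)) {
        shared <- intersect(outer_args, names(formals(callee)))
        missed <- setdiff(shared, names(matched))
        if (length(missed) > 0) {
          found <<- c(found, sprintf("%s(): `%s` is not forwarded",
                                     as.character(fun_sym), missed))
        }
      }
    }
    children <- as.list(x)
    for (i in seq_along(children)) {
      # Skip empty arguments such as the second one in x[1, ].
      if (!identical(children[[i]], quote(expr = ))) walk(children[[i]])
    }
    invisible(NULL)
  }
  walk(body(outer))
  found
}

pkg <- new.env()
evalq({
  inside_fun <- function(y, foo = "default") paste(y, foo)
  outside_fun <- function(x, foo = "default") {
    y <- toupper(x)
    inside_fun(y)  # forgot to forward `foo` !
  }
}, pkg)

find_unforwarded(pkg$outside_fun)
```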

ls() for local env + every parent

Returns every available object for each env

Optionally in tables with class, type, and value if scalar, or one line summary if we cannot have that
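
A base R sketch of both ideas, with hypothetical names ls_all() and describe_env(). The value / one line summary column is left out.

```r
# Hypothetical helper: list the objects of `env` and of every parent
# environment, up to (but excluding) the empty environment.
ls_all <- function(env = parent.frame()) {
  out <- list()
  depth <- 0L
  while (!identical(env, emptyenv())) {
    depth <- depth + 1L
    label <- environmentName(env)
    if (!nzchar(label)) label <- sprintf("<local %d>", depth)
    out[[label]] <- ls(env)
    env <- parent.env(env)
  }
  out
}

# Hypothetical helper: one row per object of a single environment.
describe_env <- function(env) {
  nms <- ls(env)
  data.frame(
    name = nms,
    class = vapply(nms, function(nm) class(get(nm, envir = env))[1], character(1)),
    type = vapply(nms, function(nm) typeof(get(nm, envir = env)), character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

f <- function() {
  local_var <- 1
  list(all = ls_all(), table = describe_env(environment()))
}
res <- f()
names(res$all)
res$table
```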

git pull requests and tickets stored in repo

for clients who cannot offer access to a ticket system.

A ticket has an id, a date, an author, a status, a body and posts.
A post has a date, an author and a body.

A PR has an id, a date, an author, a status, a body and posts.

This is build ignored of course.

Can we get something useful with minimal features ?

the easiest to work incrementally would be to have a collection of md files, one per ticket or PR, even if it's not totally structured re search by author name etc.

then next to this we can have a table with meta information, with possible redundancies but not including any unstructured text.

An advantage is that when creating a branch we namespace the tickets (we still see all past tickets but new ones are only visible from the branch). We can merge tickets to main when it's desirable.

We push PRs to the main branch, have a system to review diffs between commits and merge changes when desired.

check that the build ignore file doesn't contain a dangerous regex

For instance we might have a line foo to ignore the full foo folder, but it will also ignore somefoothing.R, which might be dangerous.

We might warn if such instances are found, and if they're really what the user wants they might use ^.*foo.*$

The rule might be that we want either ^ at the start or $ at the end, or both.
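
The rule is easy to sketch. check_buildignore() is a hypothetical name, and it takes the lines of the file rather than a path to keep the example self-contained:

```r
# Hypothetical helper: return the patterns that are anchored neither at the
# start (^) nor at the end ($).
check_buildignore <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  unanchored <- !startsWith(lines, "^") & !endsWith(lines, "$")
  lines[unanchored]
}

patterns <- c("^\\.github$", "foo", "^docs", "\\.Rproj$")
check_buildignore(patterns)

# Why it matters: the bare pattern also matches unrelated files.
grepl("foo", "R/somefoothing.R", perl = TRUE)
```

In practice we'd call it on readLines(".Rbuildignore") and warn if the result is not empty.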

insert comments "calls" and "called by"

We might have a linter that inserts at the top of the body a comment that says which functions from the package call this function.

It needs to be a linter because we want to avoid possibilities of being out of sync
It's convenient to have it at the top of the body because this way we have it in the srcref when debugging, and it doesn't clash with roxygen comments

{historian} ?

log every call from the console along with:

timestamp
current project
current script open in editor
current selection (if running selection, which we can assume if selection, or cursor position matches code that was run)
current checked out branch
last commit on said branch

I believe every call in the console either calls print() or invisible() down the line so maybe we can stub those to lookup the stack and add an on.exit call to the top operation, which would look at .Rhistory to update a local db.

Analysis of the console's content might give information, though not accurately timestamped, on calls that would be missed, including code typed under browser() / debug() / debugonce()

{revise} package

A tool to reduce friction in a reporting workflow when we want to include revisions from users who wouldn't need to install anything nor know anything about R or git.

A qmd report lives in its own repository, with a shiny app created from the revise package and described below. The package might contain more, or just depend on another package containing logic used by the report.

The shiny app allows us to navigate the history, showing the diffs between any versions (we could also show diffs between dates), default : HEAD and HEAD^1
We can toggle between different ways to show the diff, side by side, inline, and ideally a MS word/google doc looking option.

We can render the doc in a different tab, and download it.

We can insert our own changes, creating a dirty copy, and we can "save it" with a comment. This commits and pushes the new version.

We might also have tools for comments. These might really be quarto comments, represented by the shiny app the MS word way.

First introduced here https://twitter.com/antoine_fabri/status/1725444540620788051 after a comment on our bootcamps by a user unsure how to make the switch in her workflow.

categorize functions with heuristics

Some unexported functions are only used once in a package, to modularise code and have it self documenting
Some functions are pure wrappers, they call a single other function, reprocessing some arguments and hardcoding others
Some functions are erroring, the only exit points are failures
Some functions are called for side effect, their only exit points are returning invisible(NULL) or their input invisibly (or anything invisibly ?)
More specifically some functions are writing functions, reading functions... + warning functions, messaging functions.
Some functions are short utils, called a lot by other functions
Some functions are constructors, they build a data structure by mainly setting class/attributes and doing checks with minimal reshaping
Some functions are function factories, they return other functions
Some functions aggregate, they return a scalar (or "something smaller") from something bigger

"log" anything, incl objects, to a safe place

To help with the issue of debugging when the console is not available or when in a special environment.

We set a place in an environment variable, using .Renviron,
anything we log goes there in a log file that looks like .Rhistory, but we can also save RDS files there that we flush manually with helper function
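
A sketch of what these helpers might look like. All names are assumptions, including the R_SAFE_LOG_DIR environment variable; the fallback to tempdir() is only there so the example runs without a configured .Renviron.

```r
# Hypothetical helper: the log folder, taken from an environment variable.
log_dir <- function() {
  dir <- Sys.getenv("R_SAFE_LOG_DIR",
                    unset = file.path(tempdir(), "safe-log"))
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dir
}

# Append a timestamped line to a text log.
log_text <- function(...) {
  line <- paste0(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " ", ...)
  cat(line, "\n", sep = "", file = file.path(log_dir(), "log.txt"),
      append = TRUE)
  invisible(line)
}

# Save an object as RDS next to the text log, and mention it in the log.
log_object <- function(x, name = deparse(substitute(x))) {
  path <- file.path(log_dir(), paste0(name, ".rds"))
  saveRDS(x, path)
  log_text("saved ", name, " to ", path)
  invisible(path)
}

# Remove the saved objects, keeping the text log.
flush_objects <- function() {
  unlink(list.files(log_dir(), pattern = "\\.rds$", full.names = TRUE))
}

Sys.setenv(R_SAFE_LOG_DIR = file.path(tempdir(), "safe-log-demo"))
log_text("starting")
saved <- log_object(head(mtcars, 2), "cars")
readRDS(saved)
```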

nested to piped, piped to nested

so we can easily "regularise" exprs such as :

foo(data, this) %>%
  bar()
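
A sketch of both directions on quoted code, assuming only the magrittr pipe with the left-hand side going to the first argument (no dot placeholder, no native pipe). unpipe() and to_pipe() are hypothetical names; nothing is evaluated so magrittr is not needed.

```r
# Piped to nested: a %>% f(b) becomes f(a, b).
unpipe <- function(x) {
  if (!is.call(x) || !identical(x[[1]], quote(`%>%`))) return(x)
  lhs <- unpipe(x[[2]])
  rhs <- x[[3]]
  if (is.symbol(rhs)) rhs <- as.call(list(rhs))  # a %>% f is a %>% f()
  as.call(c(list(rhs[[1]], lhs), as.list(rhs)[-1]))
}

# Nested to piped: f(g(a), b) becomes g(a) %>% f(b), whenever the first
# argument is itself a call.
to_pipe <- function(x) {
  if (!is.call(x) || length(x) < 2 || !is.call(x[[2]])) return(x)
  first <- to_pipe(x[[2]])
  rest <- as.call(c(list(x[[1]]), as.list(x)[-(1:2)]))
  call("%>%", first, rest)
}

unpipe(quote(foo(data, this) %>% bar()))
to_pipe(quote(bar(foo(data, this))))
```

The nested to piped direction needs more heuristics in practice, for instance to leave operators like + alone.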

debug tryCatch and try

We want to enter the debugger in case of failure, so we can overwrite the base functions for one or more executions and replace them with a call to rlang::try_fetch() with a browser() inserted at the start of the fallback. This should also cover rlang::try_fetch() itself.

Check if all non-optional arguments are provided everywhere

Probably unneeded when we have perfect coverage but in the Zurich project would have avoided headaches

Where is the function used in tests ?

For direct use this can be done with Ctrl+Shift+F, but it returns false positives from commented code and we need to set the search scope to tests

I'd like to have a nicer summary displaying name of the test, and to have direct uses first and then indirect uses (calls to functions which call...)

qna package to store knowledge with minimal overhead ?

We ask ourselves the same questions all the time, maybe we have notes somewhere, or we thought we'd remember so didn't bother go through the trouble.

Do we care about overriding ? we prob won't call library(), let's assume we don't.

Questions and answers are stored in a .r-qna.yml file, serving as a database, we can have local and global Q&As just like with .Rprofile. They're stored at the same places.

Init with this, it also adds to .Rbuildignore if local is TRUE

qna::init(local = FALSE)

This stores a new Q&A, locally if local yml file is found :

qna::new("How do I do ... ?", "Call this function or use this shortcut")
# or qna::til() ?

If args are missing we open popups so we can write in free text

This prints the list of questions with select.list, and the answer once one is selected

qna::help(pattern = NULL) # do we care about overriding ? we prob won't call `library()`

We might also store questions to be answered later by providing NA as an answer, then the following gives you a list similar to the one above, but after selection we answer rather than being shown the answer.

qna::catchup(pattern = NULL)

if answer is an url, open it with browseURL()

We can remove questions, same thing as above we use select.list but can select multiple

qna::remove(pattern = NULL)

Useful to have a shortcut to edit yaml directly too, by default local if exists else global :

qna::edit(local = NULL)

debug_in

to debug sub functions, EVEN if they are in a namespace not accessible from calling env

use case: shiny::dateInput

log_next()

This pattern is annoying :

message("computing")
x <- foo(y)
message("done")

Even more annoying if we're timing it, we could wrap the call but it's cumbersome :

my_log_function({
  x <- foo(y)
})

I want :

log_next(message = "computing", fun = "foo")
x <- foo(y)

Would display the message first : "In progress: ..."
Then once computed : "... V" (with a tick mark emoji) along with time stamp and time

Just like with {progress} we should have a format, by default the message is the first line of code.

By default the fun is either of <-, =, control flow constructs, or { (so basically 95 % of calls we just miss side effects), we could also really do any next call by using the trick I used in {goto}, maybe better.

This might be done in {once}.

Logging might be enabled by options, we can see how other logging packages do things :

log_next(..., skip = isFALSE(getOption("once.log")))

Maybe be smart detecting for loops and apply funs and have an optional progress bar for those (default to TRUE).
For while/repeat we can still show the start and elapsed time.

operations that rely on the stack might break.

check that variables are not named like function names

That means checking argument names and assignment subjects.

We should have some flexibility choosing the packages to test,
Maybe we don't want to use base R names, maybe we don't want to use current package function names, maybe we don't want to use rlang or purrr names, maybe we don't want to use imported names.
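
A sketch of the check for a single function, where the `reserved` argument gives the flexibility described above. shadowing_names() is a hypothetical name, and only plain `<-` and `=` assignments to a symbol are considered.

```r
# Hypothetical helper: argument names and assignment targets of `fun` that
# collide with a set of reserved function names (base R by default).
shadowing_names <- function(fun, reserved = ls(baseenv())) {
  assigned <- character()
  walk <- function(x) {
    if (!is.call(x)) return(invisible(NULL))
    is_assignment <- identical(x[[1]], quote(`<-`)) ||
      identical(x[[1]], quote(`=`))
    if (is_assignment && is.symbol(x[[2]])) {
      assigned <<- c(assigned, as.character(x[[2]]))
    }
    children <- as.list(x)
    for (i in seq_along(children)) {
      # Skip empty arguments such as the second one in x[1, ].
      if (!identical(children[[i]], quote(expr = ))) walk(children[[i]])
    }
    invisible(NULL)
  }
  walk(body(fun))
  intersect(unique(c(names(formals(fun)), assigned)), reserved)
}

f <- function(data, c) {
  mean <- 1
  total <- mean + c
  total
}
shadowing_names(f)                                 # base R names
shadowing_names(f, reserved = c("total", "data"))  # a custom list
```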

Ctrl+Enter but takes default args if they're not assigned yet

basically we temporarily create promises if the variables don't exist, so this works with NSE too.
If we can map to ctrl + shift + enter that would be great

I'm not sure if it should create or not the variable from the arg, or remove it right after, we don't need it after all if we use ctrl shift enter.

We could adjust this behavior with an option

Log or/and shout when an error is not handled by given package

If my {pkg} package triggers an error that is not raised by {pkg} itself, i.e. abort() or stop() is called somewhere below a call to one of {pkg}'s functions but NOT directly by one of them, it's a sign that {pkg} could have better input checking.

We might have a mechanism to identify those cases, and log/popup/message something.

e.g. if this feature lives in a package {snitch} and I have options(snitch.pkgs = c("flow", "dm")) in my .Rprofile, I can use the packages normally and these annoying popups will force me to improve my assertions.

tryCatch used without a class ?

Should we say that if an error is worth catching then it is worth having a specific class ?
Debugging wrongly caught errors is not fun.
Maybe an exception is if we want to catch every possible error and rethrow them.
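
Detecting these is a small static analysis job. A sketch, assuming that "without a class" means a handler named error or condition in a tryCatch() call; find_broad_handlers() is a hypothetical name and try() is not covered.

```r
# Hypothetical helper: return the tryCatch() calls in `code` that catch every
# error or every condition.
find_broad_handlers <- function(code) {
  found <- character()
  walk <- function(x) {
    if (!is.call(x)) return(invisible(NULL))
    if (identical(x[[1]], quote(tryCatch)) &&
        any(names(x) %in% c("error", "condition"))) {
      found <<- c(found, paste(deparse(x), collapse = " "))
    }
    children <- as.list(x)
    for (i in seq_along(children)) {
      # Skip empty arguments such as the second one in x[1, ].
      if (!identical(children[[i]], quote(expr = ))) walk(children[[i]])
    }
    invisible(NULL)
  }
  for (e in parse(text = code, keep.source = FALSE)) walk(e)
  found
}

code <- '
a <- tryCatch(risky(), error = function(e) NULL)
b <- tryCatch(risky(), my_pkg_error = function(e) NA)
'
find_broad_handlers(code)
```

Only the first call is reported; the second one catches a specific class. Allowing the rethrow exception would require looking inside the handler body.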

Make defaults explicit / Make defaults implicit

A good candidate for {tricks}, switch back and forth from long to short version

override `$` to forbid partial matching

We generally don't want to rely on this.

We might have a function to toggle the override (create a function), this might also create a test that triggers a note that the override should be toggled off before release. We might also just gitignore the script containing the function definition.

The error should make it clear that it's artificial
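
A sketch of the override and its toggle, with hypothetical names strict_dollar() and toggle_strict_dollar(). It only affects code that looks `$` up in the environment where the override is defined, and it only checks list-like objects. Base R's own options(warnPartialMatchDollar = TRUE) is a lighter alternative that warns instead.

```r
# Replacement for `$` that errors on partial matching in lists.
strict_dollar <- function(x, name) {
  name <- as.character(substitute(name))
  if (is.list(x) && !is.null(names(x)) && !name %in% names(x) &&
      any(startsWith(names(x), name))) {
    stop("[artificial error from the strict `$` override] ",
         "partial matching of `$", name, "` is forbidden", call. = FALSE)
  }
  do.call(base::`$`, list(x, name))
}

# Define the override in `env` if absent, remove it if present.
# Returns TRUE when the override is on after the call.
toggle_strict_dollar <- function(env = parent.frame()) {
  if (exists("$", envir = env, inherits = FALSE)) {
    rm(list = "$", envir = env)
    FALSE
  } else {
    assign("$", strict_dollar, envir = env)
    TRUE
  }
}

lst <- list(value = 1)
toggle_strict_dollar()  # switch the override on
try(lst$val)            # an error rather than a silent partial match
toggle_strict_dollar()  # switch it off again
```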

Ensure unexported funs pass all random values through default args

This means no call to Sys.Date() or use of .Random.seed in the body of the function.

This should work recursively, if I call such a function I should pass the argument.

Exported functions have a pass because we prioritise user experience, but sometimes a random_seed arg doesn't hurt.

issue manager

review_issues() has args to filter unreviewed or by tag or keyword milestone etc

This uses {ghstudio}

When reviewing we have the opportunity to assign scores to issues about impact, difficulty, maybe more, maybe customisable.

This creates a yaml file that is build ignored, not git ignored.

We can draw a scatterplot on these two dimensions, or two of all we have, ideally we'd be able to move the dots interactively and save.

That'd be a cool shiny app

gitplus

I want a git utility that :

doesn't let me commit non syntactic code
doesn't let me create a branch from a branch other than main/master/development without confirmation
gives me the status of unpushed commits and tells me if the remote is unsynced, automatically when checking out a branch
gives me an overview of what branches I have that are not synced (esp if I have unpushed commits)
provides easy wrappers for common operations with display_git_code = TRUE and run_command = TRUE by default
provides a history of the latest branches (like a summarised git log, one row per branch; keep a row even if there was no action, so build this history from both git log and gitplus)
and a gitplus history in R and git languages
A timeline view

Run testthat on committed files only ?

{wat} package for R puzzles

Debugging exercises

Would work well on top of {pkg}, especially for extensibility.

I think for each exercise we create a new project, like what {saperlipopette} does for git.
Whenever possible we hide data in .Rdata files in the R folder, so users can't cheat (esp as it is not clear in debugging what is cheating or not).
For things that can't be done through .RData, like active bindings, messing with namespaces, attaching... we call wat::some_function() in .onLoad() or in a local .Rprofile. {wat} functions have printing methods that say "no cheating!". Then if users want to cheat they can, using unclass(), body() or whatever, but at least they know they're cheating. There might sometimes be some things in .onLoad() or the .Rprofile that are not cheating.

autocomplete anything

$ has a nice autocomplete, which we can hack.

ac will be an active binding that checks all available objects that are named or character objects

ac$iris$ will propose the col names for instance. In the case of a char vector we propose the content + the names if available

The printing method uses rstudio api to replace the call with the chosen completion

even if no partial match we consider the close result using string distance

Status of all branches in project

Which have unpushed changes, unpulled changes, which have a PR, closed PR, reviewed PR...

convert FIXMEs and TODOs to GitHub issues

Using gh package

Existing packages

Mine :
flow
boomer
refactor
bagtools
once
tricks

Others:
Some good links in there: https://twitter.com/antoine_fabri/status/1510988603219927040

duplicated function definitions

In big projects it might happen that we already have an update_results() and we implement a new one. It's not obvious to debug, but detectable through static analysis.
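
A sketch of that static analysis for top level definitions, with a hypothetical name find_duplicate_definitions(). Definitions nested in other calls, or made with assign(), are not detected.

```r
# Hypothetical helper: parse `files` and return the function names that are
# defined at top level more than once, with the file of each definition.
find_duplicate_definitions <- function(files) {
  defs <- do.call(rbind, lapply(files, function(file) {
    exprs <- parse(file, keep.source = FALSE)
    fun_names <- unlist(lapply(exprs, function(e) {
      is_def <- is.call(e) && length(e) == 3 &&
        (identical(e[[1]], quote(`<-`)) || identical(e[[1]], quote(`=`))) &&
        is.symbol(e[[2]]) &&
        is.call(e[[3]]) && identical(e[[3]][[1]], quote(`function`))
      if (is_def) as.character(e[[2]])
    }))
    data.frame(
      name = as.character(fun_names),
      file = rep(basename(file), length(fun_names)),
      stringsAsFactors = FALSE
    )
  }))
  defs[defs$name %in% defs$name[duplicated(defs$name)], , drop = FALSE]
}

# Two throwaway scripts that both define update_results().
dir <- tempfile("dups")
dir.create(dir)
writeLines(c("update_results <- function(x) x", "other <- function() 1"),
           file.path(dir, "a.R"))
writeLines("update_results <- function(x, y) x + y", file.path(dir, "b.R"))

dups <- find_duplicate_definitions(list.files(dir, full.names = TRUE))
dups
```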

act on last error

Get info on last error using :

geterrmessage()
.traceback()
rlang::last_error() # when relevant

Set an active binding to e that will create a list of actions relevant to error.

These might be suggestions to fix, calls to rstudioapi to go at the right place in the right script, suggestions for good practice, filter the call stack to show only the relevant error, add test for this behaviour (creating a reprex at the chosen level from the inputs), suggestions on typos or common mistakes (e.g. if (length(x ==1))...), with action to fix automatically.

unlike fcuk we don't use options(error=) so we don't conflict with any package.

rules can be defined a bit like in tricks, in fact this could be part of {tricks}

A getNamespaceExports that behaves well with devtools

See r-lib/devtools#2373

Related/part of #26

Debugging, philosophy

Debugging in a wide sense, including optimisation and design.

What is it, when do you need it, what are the different starting points, the different use cases ?

code fails
code gives unexpected results
code warns unexpectedly
code is slow
code is memory hungry
code crashes
dead code
complex code structure
redundancies

We have too many tools already, how do you know which to apply, in which order ?

{fcuk} sets a hook on errors to analyse spelling, can we go further than this ? and have a Swiss army knife to layout all relevant debugging options on each error ?

Do we need a flow chart of what debugging situations are ? How would we build it ? Map own experience to a diagram ? build a logger of errors and warnings so we can look back and see an aggregation of what generally goes wrong

The bag course offers a starting point

Turn a specific warning to an error or debug

A combination of those partly works.

trace() is messing things up, maybe better just edit the warning() function, though not CRAN applicable.

test <- function() {
  foo <- 1
  print(foo)
  warning("some warning")
  foo <- foo + 1
  print(foo)
  rlang::warn("other warning", class = "myclass")
  foo
}

.warning_regexes <- "^some"
.warning_classes <- "myclass"

keep_browsing <- function() rlang::eval_bare(quote(on.exit(eval.parent(quote(browser())), add = TRUE)), parent.frame())

trace(warning, print = FALSE, quote({
  if (...length() == 1L && inherits(..1, "condition")) {
    if(any(sapply(.warning_classes, inherits, x = ..1))) {
      rlang::eval_bare(quote(on.exit(browser(), add = TRUE)), parent.frame(4))
    }
  } else {
    if (any(sapply(.warning_regexes, grepl, paste(c(...), collapse = "")))) {
      rlang::eval_bare(quote(on.exit(browser(), add = TRUE)), parent.frame(4))
    }
  }
}))
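
For comparison, here is a sketch that avoids trace() by using a calling handler, a different technique from the one above. with_strict_warnings() and its arguments are hypothetical names, and the classed warning is built with base R instead of rlang::warn(). It only covers code run inside the wrapper, not every warning in the session.

```r
# Hypothetical helper: run `expr`, and call `action` on warnings whose message
# matches one of `regexes` or whose class is one of `classes`.
with_strict_warnings <- function(
    expr,
    regexes = character(),
    classes = character(),
    action = function(w) stop(simpleError(conditionMessage(w), conditionCall(w)))) {
  withCallingHandlers(expr, warning = function(w) {
    by_class <- inherits(w, classes)
    by_regex <- any(vapply(regexes, grepl, logical(1), x = conditionMessage(w)))
    if (by_class || by_regex) action(w)
  })
}

test <- function() {
  warning("some warning")
  "finished"
}
classed <- function() {
  warning(structure(
    class = c("myclass", "warning", "condition"),
    list(message = "other warning", call = NULL)
  ))
}

try(with_strict_warnings(test(), regexes = "^some"))
try(with_strict_warnings(classed(), classes = "myclass"))
```

Passing action = function(w) browser() opens the debugger while the frame that warned is still on the stack, instead of failing.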
