larmarange / broom.helpers

A set of functions to facilitate manipulation of tibbles produced by broom

Home Page: https://larmarange.github.io/broom.helpers/

License: GNU General Public License v3.0

R 100.00%

broom.helpers's Issues

Missing label for nnet::multinom() categorical variables

I noticed a labelling error in nnet::multinom() models. In the example below, the label column of the header rows for the stage variable is NA instead of the variable label.

library(gtsummary)
#> #BlackLivesMatter

nnet::multinom(grade ~ age + stage, data = trial, trace = FALSE) %>%
  broom.helpers::tidy_plus_plus(add_header_rows = TRUE) %>%
  dplyr::select(y.level, variable, term, var_label, label, estimate)
#> # A tibble: 12 x 6
#>    y.level variable term    var_label label estimate
#>    <chr>   <chr>    <chr>   <chr>     <chr>    <dbl>
#>  1 II      age      age     Age       Age    0.00813
#>  2 II      stage    <NA>    T Stage   <NA>  NA      
#>  3 II      stage    stageT1 T Stage   T1     0      
#>  4 II      stage    stageT2 T Stage   T2    -0.497  
#>  5 II      stage    stageT3 T Stage   T3    -1.04   
#>  6 II      stage    stageT4 T Stage   T4    -0.634  
#>  7 III     age      age     Age       Age    0.0110 
#>  8 III     stage    <NA>    T Stage   <NA>  NA      
#>  9 III     stage    stageT1 T Stage   T1     0      
#> 10 III     stage    stageT2 T Stage   T2     0.128  
#> 11 III     stage    stageT3 T Stage   T3    -0.214  
#> 12 III     stage    stageT4 T Stage   T4     0.291

Created on 2020-10-04 by the reprex package (v0.3.0)

`label` column filled for header rows of `Hmisc::rcspline.eval()` and `poly()` terms, but not for categorical variables

Everything so far is looking amazing!

I noticed that the label is applied inconsistently: for categorical variables, the header row has no value in the label column, whereas for results from Hmisc::rcspline.eval() and poly(), the label column does have a value.

Obviously not a big deal, but it may be worth addressing for consistency.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade + Hmisc::rcspline.eval(marker), data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(1:2, label) 
#> # A tibble: 9 x 3
#>   term                       variable                  label                    
#>   <chr>                      <chr>                     <chr>                    
#> 1 (Intercept)                <NA>                      (Intercept)              
#> 2 <NA>                       grade                     <NA>                     
#> 3 grade_ref                  grade                     I                        
#> 4 gradeII                    grade                     II                       
#> 5 gradeIII                   grade                     III                      
#> 6 <NA>                       Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 7 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 8 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 9 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~

Created on 2020-08-14 by the reprex package (v0.3.0)

Ref row label not added

When tidy_add_reference_rows() is run after tidy_add_term_labels(), the reference row has no label (it is NA). It makes sense why this occurs, but I think at minimum a message alerting users to run the functions in a different order to get the desired output would be helpful.

library(broom.helpers)

# build regression model
lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  # perform initial tidying of model
  tidy_and_attach() %>%
  # add the cyl levels
  tidy_add_term_labels() %>%
  # add reference row cyl
  tidy_add_reference_rows() %>%
  knitr::kable()

| term | variable | var_class | var_type | estimate | std.error | statistic | p.value | var_label | contrasts | label | reference_row |
|:---|:---|:---|:---|---:|---:|---:|---:|:---|:---|:---|:---|
| (Intercept) | NA | NA | intercept | 28.6501182 | 1.5877870 | 18.044056 | 0.0000000 | (Intercept) | NA | (Intercept) | NA |
| factor(cyl)4 | factor(cyl) | factor | categorical | NA | NA | NA | NA | factor(cyl) | contr.treatment | NA | TRUE |
| factor(cyl)6 | factor(cyl) | factor | categorical | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | factor(cyl) | contr.treatment | 6 | FALSE |
| factor(cyl)8 | factor(cyl) | factor | categorical | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | factor(cyl) | contr.treatment | 8 | FALSE |
| hp | hp | numeric | continuous | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | hp | NA | hp | NA |

Created on 2020-08-27 by the reprex package (v0.3.0)

`tidy_remove_intercept()` removes terms from model (when they are named horribly)

This is VERY much an edge case, but I wanted to let you know. It seems that if a variable name contains a +, tidy_remove_intercept() removes both the intercept and the variable from the output. If this is a complicated fix, perhaps just show a message to the user: "More than one row was removed from the table. An error likely occurred due to unusual naming conventions used for terms."

library(gtsummary)

trial2 <- 
  trial %>% 
  dplyr::mutate(`treatment +name` = trt)


glm(response ~ `treatment +name`, 
    trial2, 
    family = binomial(link = "logit")) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_remove_intercept()
#> # A tibble: 0 x 8
#> # ... with 8 variables: term <chr>, variable <chr>, var_class <chr>,
#> #   var_type <chr>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-10-02 by the reprex package (v0.3.0)
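One way to make intercept removal robust to unusual names is to compare terms by exact string equality rather than as regular expressions. Whether pattern matching is the actual cause of this bug is not established here; the sketch below is base R, and both the helper `remove_intercept_exact()` and the term `InterceptShift` are hypothetical.

```r
# Hypothetical helper, not part of broom.helpers: drop intercept rows by exact
# string comparison; rows with NA terms (e.g. header rows) are kept.
remove_intercept_exact <- function(x) {
  keep <- is.na(x$term) | x$term != "(Intercept)"
  x[keep, , drop = FALSE]
}

term_names <- c("(Intercept)", "`treatment +name`Drug B", "InterceptShift")

# As a regular expression, "(Intercept)" is a group matching the substring
# "Intercept", so it also matches unrelated terms; fixed = TRUE does not.
grepl("(Intercept)", term_names)                # TRUE FALSE TRUE
grepl("(Intercept)", term_names, fixed = TRUE)  # TRUE FALSE FALSE

tidy <- data.frame(term = term_names, estimate = c(1, 2, 3),
                   stringsAsFactors = FALSE)
remove_intercept_exact(tidy)$term
```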

Wrap `model.frame()` in `tryCatch()`?

It may be a good idea to wrap the call to stats::model.frame(model) in model_get_model_frame.R in tryCatch(), in case the regression model does not have a method for it (like mice models).
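A minimal base R sketch of the idea, assuming a `NULL` return (plus a message) is an acceptable fallback; the helper name `model_frame_or_null()` and the dummy class `no_mf_method` are hypothetical, standing in for models such as those from mice.

```r
# Hypothetical helper: return NULL, with a message, when stats::model.frame()
# fails for a model object, instead of propagating the error.
model_frame_or_null <- function(model) {
  tryCatch(
    stats::model.frame(model),
    error = function(e) {
      message("Unable to compute model.frame(): ", conditionMessage(e))
      NULL
    }
  )
}

# A regular lm() model works as before
mf <- model_frame_or_null(lm(mpg ~ hp, data = mtcars))

# An object that model.frame() cannot handle now yields NULL
no_mf <- model_frame_or_null(structure(list(), class = "no_mf_method"))
```

Callers then only need an `is.null()` check before using the model frame.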

Add tidy_select_variables()

This function will allow users to keep only certain variables in the output.

Two arguments: keep and drop.

It should also be added to tidy_plus_plus().
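A minimal sketch of the keep/drop semantics, assuming plain character vectors of variable names (a real implementation would more likely use tidyselect); `select_variables_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of tidy_select_variables(keep =, drop =) in base R.
select_variables_sketch <- function(x, keep = NULL, drop = NULL) {
  rows <- rep(TRUE, nrow(x))
  if (!is.null(keep)) rows <- rows & x$variable %in% keep
  if (!is.null(drop)) rows <- rows & !(x$variable %in% drop)
  x[rows, , drop = FALSE]
}

tidy <- data.frame(
  term = c("(Intercept)", "age", "gradeII", "gradeIII", "marker"),
  variable = c(NA, "age", "grade", "grade", "marker"),
  stringsAsFactors = FALSE
)
select_variables_sketch(tidy, keep = c("age", "grade"))$term
select_variables_sketch(tidy, drop = "grade")$term
```

Note the design choice this exposes: because the intercept has `variable = NA`, `keep` drops it while `drop` retains it, which ties into the question below about populating `variable` for intercepts.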

Should `variable` be populated for intercept terms, or stay NA as it currently does?

I started rewriting the broom.helpers section of tbl_regression() to use tidy_plus_plus() instead of the individual functions. One of the reasons to use plus-plus over a series of other tidy_*() functions is that it will be easier for me to give users access to the other arguments in tidy_plus_plus() so they can change the resulting table if they like (e.g. adding the informative contrast labels @gorkang is working on).

One sticking point is that I treat the intercept like a variable. For example, users can change the intercept label using tbl_regression(label = list("(Intercept)" ~ "b0", age ~ "Patient Age")). Is there a way to use tidy_plus_plus() without changing the gtsummary API?

My first thought was to simply add an option to tidy_identify_variables() that populates the variable column of intercept rows with the term name. But I am not sure whether this will cause problems with subsequent functions. What do you think?
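A minimal sketch of such an option in base R; the helper `fill_intercept_variable()` and its argument `intercept_as_variable` are hypothetical names, not existing broom.helpers API.

```r
# Hypothetical option: copy the term name into `variable` for intercept rows,
# identified through var_type rather than through a missing variable.
fill_intercept_variable <- function(x, intercept_as_variable = FALSE) {
  if (intercept_as_variable) {
    is_int <- !is.na(x$var_type) & x$var_type == "intercept"
    x$variable[is_int] <- x$term[is_int]
  }
  x
}

tidy <- data.frame(
  term = c("(Intercept)", "age"),
  variable = c(NA, "age"),
  var_type = c("intercept", "continuous"),
  stringsAsFactors = FALSE
)
fill_intercept_variable(tidy, intercept_as_variable = TRUE)$variable
```

The risk mentioned above would come from downstream code that uses `is.na(variable)` to mean "not a variable"; such code would need to test `var_type == "intercept"` instead.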

Variable names for models with no model.frame method

When there is no model.frame() method, the user sees a very informative message. (super helpful!)

The resulting table has a variable column, but its values are NA. Can we use the term as the variable for these models? In gtsummary, we use the variable name for further manipulation (as does broom.helpers), but without a name these variables cannot be selected.

I know the term is not the proper variable name, but I think the printed message is enough of a cue to users that the original variable names are not available.

library(broom.helpers)
library(gtsummary)

# make up some interval censored data 
trial2 <-
  trial %>% 
  dplyr::mutate(
    lint = dplyr::case_when(
      death == 1 ~ runif(200) + 2,
      death == 0 ~ ttdeath
    ),
    rint = dplyr::case_when(
      death == 1 ~ ttdeath,
      death == 0 ~ Inf
    )
  )

# Write a custom tidier
tidy_ic_sp <- function(x, exponentiate = FALSE, conf.level = 0.95, ...) {
  # compute the summary table and confidence intervals once
  params <- summary(x)$summaryParameters
  ci <- confint(x, level = conf.level)
  tidy <-
    tibble::tibble(
      term = names(x[["coefficients"]]),
      estimate = x[["coefficients"]],
      std.error = sqrt(diag(x[["var"]])),
      statistic = params[, "z-value"],
      p.value = params[, "p"],
      conf.low = ci[, 1],
      conf.high = ci[, 2]
    )
  
  # dplyr::vars() is namespaced so this branch works without attaching dplyr
  if (isTRUE(exponentiate))
    tidy <- dplyr::mutate_at(tidy, dplyr::vars(estimate, conf.low, conf.high), exp)
  
  tidy
}

# fit the interval-censored survival model with icenReg::ic_sp()
icenReg::ic_sp(
  survival::Surv(lint, rint, type = "interval2") ~ trt,
  model = "ph",
  bs_samples = 3,
  data = trial2
) %>%
  # tidy up with broom.helpers
  tidy_and_attach(tidy_fun = tidy_ic_sp) %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label, estimate)
#> x Unable to identify the list of variables.
#>   
#>   This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#>   It could be the case if that type of model does not implement these methods.
#>   Rarely, this error may occur if the model object was created within
#>   a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
#> # A tibble: 1 x 5
#>   term      variable var_label label     estimate
#>   <chr>     <chr>    <chr>     <chr>        <dbl>
#> 1 trtDrug B <NA>     trtDrug B trtDrug B    0.160

Created on 2020-10-19 by the reprex package (v0.3.0)

rename keep parameter to include?

@ddsjoberg

In tidy_select_variables(), should we rename keep to include for consistency?

What do you think?

`tidy_add_reference_rows()` erroneously adds reference row to interaction-only model

library(broom.helpers)
library(gtsummary)
#> #BlackLivesMatter

lm(age ~ factor(response):marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  knitr::kable()

| term | variable | var_class | var_type | contrasts | reference_row | estimate | std.error | statistic | p.value |
|:---|:---|:---|:---|:---|:---|---:|---:|---:|---:|
| (Intercept) | NA | NA | intercept | NA | NA | 46.6357738 | 1.632164 | 28.5729753 | 0.0000000 |
| factor(response)0:marker | factor(response):marker | NA | interaction | NA | NA | 0.3957856 | 1.507993 | 0.2624585 | 0.7932857 |
| factor(response)1:marker | factor(response):marker | NA | interaction | NA | NA | 0.1015807 | 1.653558 | 0.0614316 | 0.9510877 |
| factor(response)0 | factor(response) | NA | NA | NA | TRUE | NA | NA | NA | NA |

Created on 2020-09-03 by the reprex package (v0.3.0)

Consistency of args passed

This is so minor, but wanted to point it out just in case!

The tidy_plus_plus() function accepts the conf.int= argument as well as ..., which are passed to tidy_fun=. Is there a reason to include conf.int= here but not, for example, in tidy_and_attach()?

There are other common tidy arguments that are not included, e.g. exponentiate=. For consistency, should the conf.int= argument be removed?

Add broom.helpers class to tibbles?

Should we add a broom.helpers class to the tibbles? I think this could help down the line to ensure we're working with the correct object types.

class(x) <- c("broom.helpers", class(x))
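A small sketch of how the class could be added idempotently (so several tidy_*() steps do not stack duplicate classes) and checked later; `add_broom_helpers_class()` is a hypothetical helper name.

```r
# Hypothetical helper: prepend "broom.helpers" once, keeping existing classes.
add_broom_helpers_class <- function(x) {
  if (!inherits(x, "broom.helpers")) class(x) <- c("broom.helpers", class(x))
  x
}

tidy <- add_broom_helpers_class(data.frame(term = "age", estimate = 0.01))
inherits(tidy, "broom.helpers")  # TRUE
inherits(tidy, "data.frame")     # TRUE, so data frame methods still apply
```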

Update the new term for reference rows

When a reference row is added, rather than creating a new term "{varname}_ref", I suggest that you either keep the term name consistent with the other terms (in the example below, the new term would be "gradeI"), or leave it blank.

I know it's unlikely, but if someone had a variable with a level named _ref, the generated term could collide with a real one, and I think things would fall apart somewhere.

library(broom.helpers)
library(gtsummary)

mod <- glm(age ~ grade, data = trial, family = gaussian)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>% 
  select(variable, term, reference_row, label, header_row)
#> # A tibble: 5 x 5
#>   variable term        reference_row label       header_row
#>   <chr>    <chr>       <lgl>         <chr>       <lgl>     
#> 1 <NA>     (Intercept) NA            (Intercept) FALSE     
#> 2 grade    <NA>        NA            <NA>        TRUE      
#> 3 grade    grade_ref   TRUE          I           FALSE     
#> 4 grade    gradeII     FALSE         II          FALSE     
#> 5 grade    gradeIII    FALSE         III         FALSE

Created on 2020-08-14 by the reprex package (v0.3.0)
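A base R sketch of the first suggestion, naming the reference term the way R names the other treatment-contrast terms (variable name pasted to the level), together with a small demonstration of the `_ref` collision; the factors `grade` and `odd` are made up for illustration.

```r
# Under the default contr.treatment, the reference is the first factor level,
# and R builds term names by pasting the variable name and the level.
grade <- factor(c("I", "II", "III", "II"))
ref_term <- paste0("grade", levels(grade)[1])  # "gradeI"

# A level literally named "_ref" produces a real term "odd_ref", which would
# collide with a generated "{varname}_ref" reference term.
odd <- factor(c("_ref", "a"), levels = c("a", "_ref"))
colnames(model.matrix(~ odd))  # "(Intercept)" "odd_ref"
```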

Improve error messaging

What is your opinion on improving the error messaging in situations like the one below, where the model is created within an apply() or map() call and the stats::*() functions called on the model object fail?

library(tidyverse)
library(gtsummary)
#> #Uighur
library(survival)

# Set up map statement to create different models
tibble(grade = c("I", "II", "III")) %>%
  mutate(df_model = map(grade, ~ trial %>% filter(grade == ..1))) %>%
  mutate(
    mv_formula_char = "Surv(ttdeath, death) ~ trt + age + marker",
    mv_formula = map(mv_formula_char, ~ as.formula(.x)),
    mv_model_form =
      map2(
        mv_formula, df_model,
        ~ coxph(..1, data = ..2)
      ),
    mv_tbl_form =
      map(
        mv_model_form,
        ~ broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE)
      )
  )
#> Error: Problem with `mutate()` input `mv_tbl_form`.
#> x the ... list contains fewer than 2 elements
#> i Input `mv_tbl_form` is `map(mv_model_form, ~broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE))`.

Created on 2020-08-31 by the reprex package (v0.3.0)

In gtsummary, we added an error message like this: ddsjoberg/gtsummary#231
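A minimal base R sketch of such a wrapper: it evaluates any stats::*() call lazily inside tryCatch() and re-throws failures with a hint about functional programming contexts. The name `with_model_hint()` and the message wording are assumptions, not the gtsummary implementation.

```r
# Hypothetical wrapper: evaluate a call on a model object and, on failure,
# re-throw the error with a hint about lapply()/purrr::map() contexts.
with_model_hint <- function(expr) {
  tryCatch(
    expr,  # the promise is forced here, so its errors are caught
    error = function(e) {
      stop(
        "There was an error calling a stats function on the model object.\n",
        "This can happen when the model was created within lapply(), ",
        "purrr::map(), or a similar functional programming framework.\n",
        "Original error: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Successful calls pass through unchanged
cf <- with_model_hint(stats::coef(lm(mpg ~ hp, data = mtcars)))

# The error from the reprex above, re-thrown with context
msg <- tryCatch(
  with_model_hint(stop("the ... list contains fewer than 2 elements")),
  error = conditionMessage
)
cat(msg)
```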

Release broom.helpers 1.1.0

Prepare for release:

devtools::check(remote = TRUE, manual = TRUE)
revdepcheck::revdep_reset()
revdepcheck::revdep_check(num_workers = 4)
Polish NEWS
Polish pkgdown reference index

Submit to CRAN:

usethis::use_version()
Update cran-comments.md
devtools::submit_cran() (CRAN team on vacation until August 24)
Approve email

Wait for CRAN...

Accepted 🎉
usethis::use_github_release()
Remove file CRAN-RELEASE
usethis::use_dev_version()

Customize categorical term labels with a glue pattern

One feature that could be added to broom.helpers (and therefore also be available in gtsummary) is a function tidy_rename_categorical_terms() that would allow the type of renaming you want, but after model computation, at the moment the table is built. For example:

mod %>% tidy_and_attach() %>% tidy_rename_categorical_terms(pattern = "{variable} [{term}-{reference}]")
You would be able to choose whatever pattern you want.

Cf. ddsjoberg/gtsummary#677

Note: a second argument should allow users to select which variables to rename.
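A dependency-free sketch of glue-style pattern filling with a second argument restricting which variables are relabelled. In practice glue::glue_data() would be the natural tool; the function `apply_label_pattern()` and the `level`/`reference` columns are hypothetical (the pattern above uses `{term}`).

```r
# Hypothetical sketch: replace each "{column}" in `pattern` with that row's
# value, only for rows whose variable is listed in `variables`.
apply_label_pattern <- function(x, pattern, variables = unique(x$variable)) {
  for (i in seq_len(nrow(x))) {
    if (!(x$variable[i] %in% variables)) next
    out <- pattern
    for (col in names(x)) {
      out <- gsub(paste0("{", col, "}"), as.character(x[[col]][i]), out,
                  fixed = TRUE)
    }
    x$label[i] <- out
  }
  x
}

tidy <- data.frame(
  variable = c("grade", "grade", "stage"),
  level = c("II", "III", "T2"),
  reference = c("I", "I", "T1"),
  stringsAsFactors = FALSE
)
tidy$label <- tidy$level
res <- apply_label_pattern(tidy, "{variable} [{level}-{reference}]",
                           variables = "grade")$label
res  # "grade [II-I]" "grade [III-I]" "T2"
```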

model attribute is lost in some cases

By the way, if the model attribute is lost in some cases, should we add a call to tidy_attach_model() at the end of each tidy_* function as a safeguard?

Originally posted by @larmarange in #13 (comment)
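An alternative to editing every function is a wrapper that restores the attribute after each step. The sketch below swaps in that decorator technique in plain base R; `keep_model_attr()` and `drop_attr_step()` are hypothetical and do not call broom.helpers.

```r
# Hypothetical decorator: remember the "model" attribute and re-attach it if
# the wrapped tidy step returns an object without it.
keep_model_attr <- function(f) {
  function(x, ...) {
    model <- attr(x, "model")
    res <- f(x, ...)
    if (is.null(attr(res, "model"))) attr(res, "model") <- model
    res
  }
}

# A step that deliberately loses the attribute
drop_attr_step <- function(x) {
  attr(x, "model") <- NULL
  x
}

m <- lm(mpg ~ hp, data = mtcars)
tidy <- data.frame(term = names(coef(m)), estimate = unname(coef(m)))
attr(tidy, "model") <- m

safe_step <- keep_model_attr(drop_attr_step)
identical(attr(safe_step(tidy), "model"), m)  # TRUE
```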

Improve examples in the documentation

Error identifying variables in multinom with sum contrast

Once integrated in gtsummary and GGally, update README

tidy_add_header_rows() error with continuous * categorical interaction

When tidy_add_header_rows() is run on the model below, the interaction term should be on two rows. It should have a header row with label column equal to factor(response) * Marker Level (ng/mL), and a second row with label column 1 * Marker Level (ng/mL) with the estimate.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response) * marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  broom.helpers::tidy_add_reference_rows() %>%
  broom.helpers::tidy_add_header_rows() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 6 x 5
#>   variable          var_type  var_label                 label           estimate
#>   <chr>             <chr>     <chr>                     <chr>              <dbl>
#> 1 <NA>              intercept (Intercept)               (Intercept)        44.0 
#> 2 factor(response)  categori~ factor(response)          factor(respons~    NA   
#> 3 factor(response)  categori~ factor(response)          0                  NA   
#> 4 factor(response)  categori~ factor(response)          1                   9.12
#> 5 marker            continuo~ Marker Level (ng/mL)      Marker Level (~     2.01
#> 6 factor(response)~ interact~ factor(response) * Marke~ 1 * Marker Lev~    -5.34

Created on 2020-09-01 by the reprex package (v0.3.0)

Add a `model_get_model_frame()` method for mice objects

The mice package does not include a model.frame() method for the resulting regression models from multiply imputed data sets.

Would you be OK with adding one here? I need to look up the exact code, but it'll be something like this (I can add it if you're OK with it):

#' @export
#' @rdname model_get_model_frame
model_get_model_frame.mipo <- function(model) {
  # add check that the mice package is installed
  
  # grab input mice data

  # extract a single dataset for our use of finding labels
   mice::complete(...)
}

Add pkgdown website

Add message when user requests single row for variable that cannot be put on a single row

The model below includes factor(cyl), which has 3 levels. When we request that it be displayed on a single row, nothing happens (because it can't be shown on a single row), and there is no message about the request being ignored.

A message to the user in this case would be helpful.

library(broom.helpers)
lm(mpg ~ hp + factor(cyl) + factor(am), mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows(show_single_row = c("factor(am)", "factor(cyl)")) 
#> # A tibble: 6 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> 2 hp    hp       hp        numeric   continu~ NA         <NA>      hp   
#> 3 <NA>  factor(~ factor(c~ factor    categor~ TRUE       contr.tr~ fact~
#> 4 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 6    
#> 5 fact~ factor(~ factor(c~ factor    categor~ FALSE      contr.tr~ 8    
#> 6 fact~ factor(~ factor(a~ factor    categor~ NA         contr.tr~ 1    
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
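A base R sketch of the check, assuming it runs before reference rows are added, so a variable with more than one term has more than two levels. `check_single_row()` is a hypothetical helper and the tidy table is typed in by hand to mirror the reprex.

```r
# Hypothetical helper: message about requested variables that have more than
# one (non-reference) term and therefore cannot be shown on a single row.
check_single_row <- function(x, show_single_row) {
  n_terms <- table(x$variable[x$variable %in% show_single_row])
  bad <- names(n_terms)[n_terms > 1]
  if (length(bad) > 0) {
    message(
      "show_single_row ignored for variable(s) with more than two levels: ",
      paste(bad, collapse = ", ")
    )
  }
  invisible(bad)
}

tidy <- data.frame(
  term = c("(Intercept)", "hp", "factor(cyl)6", "factor(cyl)8", "factor(am)1"),
  variable = c(NA, "hp", "factor(cyl)", "factor(cyl)", "factor(am)"),
  stringsAsFactors = FALSE
)
check_single_row(tidy, c("factor(am)", "factor(cyl)"))
```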

Management of `poly()`

Add a helper to convert poly(var, 4) into var in the variable column and to produce more explicit terms (e.g. var^1, var^2, var^3, var^4)
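A base R sketch of the conversion using regular expressions on the coefficient names R generates for poly() terms (e.g. `poly(hp, 4)1`). It assumes the first argument of poly() contains no comma, so something like `poly(log(x, 2), 3)` would need a proper parser.

```r
m <- lm(mpg ~ poly(hp, 4), data = mtcars)
poly_terms <- grep("^poly\\(", names(coef(m)), value = TRUE)  # "poly(hp, 4)1" ...

# variable: the first argument of poly()
poly_variable <- sub("^poly\\(([^,]+),.*$", "\\1", poly_terms)
# explicit term: variable^degree, the degree being the trailing number
poly_label <- sub("^poly\\(([^,]+),.*\\)([0-9]+)$", "\\1^\\2", poly_terms)

poly_variable  # "hp" "hp" "hp" "hp"
poly_label     # "hp^1" "hp^2" "hp^3" "hp^4"
```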

Clean backticks in variable names for interaction-only terms

Test

lm(hp ~ factor(`number + cylinders`) : `miles per galon` + factor(`type of transmission`), mtcars %>% rename(`miles per galon` = mpg, `type of transmission` = am, `number + cylinders` = cyl))

`miles per galon` should have its backticks removed
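The cleanup itself can be a fixed-string substitution. In this sketch the variable names are typed by hand as they might appear for the test model above, rather than extracted from it.

```r
var_names <- c(
  "factor(`number + cylinders`):`miles per galon`",
  "factor(`type of transmission`)"
)
# fixed = TRUE treats the backtick literally
gsub("`", "", var_names, fixed = TRUE)
```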

Once released, add a DOI (through Zenodo)

Unify the broom.helpers and gtsummary select helpers

At the moment, the broom.helpers and gtsummary select helpers are created independently. When both packages are loaded, one package will mask the other's all_*() selecting functions, which is not good! I've been thinking about a way to unify the syntax, and I think I've come up with something.

Proposed changes:

Create a universal select function, and export it. This function will help construct each of the other helpers. For example, if the function were called select_constructor(), we could define all_continuous() with the code below, which would select variables of type continuous.
```
all_continuous <- function() select_constructor("variable", "var_type", "continuous")
```

The reason for the constructor is that I can later use it in gtsummary to easily construct selecting functions that do not apply in the broom.helpers setting, BUT without needing to recreate the environments in which we're selecting or to define new scoping functions.

That brings us to the second point: we'd also need to export the scoping function so I can reuse it in gtsummary.
I recall a notification where you indicated we could add an all_interactions() selector (and, I think, another one), but I can't find that message. I'll add that here too. With the general format, it's actually very easy to add new select functions.
You had also mentioned at some point adding all_factor(), all_character(), etc. functions. I do not suggest you do this. Since I initially released those select functions, {tidyselect} has been updated to allow selection using predicate functions, e.g. trial %>% select(where(is.character)). I plan to deprecate those functions so I do not need to support any superfluous functions.

The only front-facing changes here will be exporting two new functions that help us write and use the selecting functions in other packages. I'll start putting together a PR.
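A simplified, closure-based sketch of what select_constructor() produces. Unlike the proposal, where all_continuous() relies on an exported scoping environment, the helpers here take the tidy table explicitly and return matching variable names; all names are illustrative.

```r
# Simplified sketch: build a selector returning the variables whose
# `select_column` value is in `select_value`.
select_constructor <- function(variable_column, select_column, select_value) {
  function(data) {
    hit <- data[[select_column]] %in% select_value
    unique(data[[variable_column]][hit & !is.na(data[[variable_column]])])
  }
}

all_continuous <- select_constructor("variable", "var_type", "continuous")
all_categorical <- select_constructor("variable", "var_type", "categorical")

tidy <- data.frame(
  variable = c(NA, "age", "grade", "grade"),
  var_type = c("intercept", "continuous", "categorical", "categorical"),
  stringsAsFactors = FALSE
)
all_continuous(tidy)   # "age"
all_categorical(tidy)  # "grade"
```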

Warning is printed for intercept only models

When I run an intercept-only model, a tibble is returned, but a warning is also printed.

library(broom.helpers)
lm(mpg ~ 1, mtcars) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_header_rows() 
#> Warning in min(.data$rank): no non-missing arguments to min; returning Inf
#> # A tibble: 1 x 12
#>   term  variable var_label var_class var_type header_row contrasts label
#>   <chr> <chr>    <chr>     <chr>     <chr>    <lgl>      <chr>     <chr>
#> 1 (Int~ <NA>     (Interce~ <NA>      interce~ NA         <NA>      (Int~
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> #   p.value <dbl>

Created on 2020-09-01 by the reprex package (v0.3.0)
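In base R, min() over an empty vector emits exactly this warning and returns Inf. That fits `.data$rank` being empty when there are no variable rows (presumably the intercept-only case). A guarded version (hypothetical `safe_min()`) avoids it:

```r
# min() on an empty vector warns and returns Inf
tryCatch(min(integer(0)), warning = conditionMessage)
# "no non-missing arguments to min; returning Inf"

# Hypothetical guard: NA instead of Inf when nothing is left
safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  min(x)
}

safe_min(integer(0))     # NA, no warning
safe_min(c(3L, 1L, NA))  # 1
```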

Column ordering suggestion

It would be helpful to have a standardized order in which the columns appear as additional information is added to the tidy tibble. For example, all the original columns could remain on the right side of the tibble, and all new columns would be added to the left side.

The ordering of the columns (regardless of the order in which the functions are called) would also be standardized. The order would be chosen to make the information in the table easier to digest. For example, when the variable column is added, rather than ending up in the middle of the tibble, it would always be near the beginning. Below is a suggested ordering:

library(broom.helpers)

lm(mpg ~ factor(cyl) + hp, mtcars) %>%
  tidy_plus_plus() %>% 
  dplyr::select(any_of(c("variable", "var_label", "var_class", "var_type", 
                         "contrasts", "reference_row", "label")), 
                everything()) %>%
  knitr::kable()

| variable | var_label | var_class | var_type | contrasts | reference_row | label | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|:---|:---|:---|:---|:---|:---|:---|:---|---:|---:|---:|---:|---:|---:|
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | TRUE | 4 | factor(cyl)4 | NA | NA | NA | NA | NA | NA |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 6 | factor(cyl)6 | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | -9.3255631 | -2.6097471 |
| factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 8 | factor(cyl)8 | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | -13.2855993 | -3.7561022 |
| hp | hp | numeric | continuous | NA | NA | hp | hp | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | -0.0556005 | 0.0075228 |

Created on 2020-08-27 by the reprex package (v0.3.0)

A simple re-ordering function could be added to the end of each tidy_*() function.

order_tidy_columns <- function(x) {
  # listed columns first (when present), then all remaining columns
  dplyr::select(x, 
                dplyr::any_of(c("variable", "var_label", "var_class", "var_type", 
                                "contrasts", "reference_row", "label")), 
                dplyr::everything())
}
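The same ordering can be written without dplyr, which may matter if the reordering runs at the end of every tidy_*() function. This base R equivalent (hypothetical name `order_tidy_columns_base()`) keeps the listed columns in the given order when present, then the rest in their existing order:

```r
order_tidy_columns_base <- function(x) {
  first <- c("variable", "var_label", "var_class", "var_type",
             "contrasts", "reference_row", "label")
  first <- intersect(first, names(x))   # only columns that exist, in this order
  x[c(first, setdiff(names(x), first))]
}

tidy <- data.frame(term = "hp", estimate = -0.024, label = "hp", variable = "hp")
names(order_tidy_columns_base(tidy))
# "variable" "label" "term" "estimate"
```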

`var_class` incorrect for integers

In the example below, am is an integer class variable. But in the broom.helpers tibble, its class is indicated as numeric.

library(broom.helpers)

tibble::as_tibble(mtcars) %>%
  dplyr::mutate(
    am = as.integer(am),
    vs = as.logical(vs)
  ) %>%
  {lm(mpg ~ am + vs + hp + factor(cyl), .)} %>%
  tidy_and_attach() %>%
  tidy_identify_variables() 
#> # A tibble: 6 x 8
#>   term      variable   var_class var_type  estimate std.error statistic  p.value
#>   <chr>     <chr>      <chr>     <chr>        <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Interce~ <NA>       <NA>      intercept  24.4       2.57      9.51   6.01e-10
#> 2 am        am         numeric   continuo~   5.16      1.45      3.55   1.49e- 3
#> 3 vsTRUE    vs         logical   categori~   2.57      1.94      1.32   1.97e- 1
#> 4 hp        hp         numeric   continuo~  -0.0469    0.0145   -3.23   3.35e- 3
#> 5 factor(c~ factor(cy~ factor    categori~  -2.65      1.80     -1.48   1.52e- 1
#> 6 factor(c~ factor(cy~ factor    categori~  -0.277     3.49     -0.0795 9.37e- 1

Created on 2020-10-08 by the reprex package (v0.3.0)
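One plausible source of the discrepancy, not confirmed against the broom.helpers code, is that var_class comes from the "dataClasses" attribute that stats::model.frame() stores on the terms. That attribute reports integer columns as "numeric", whereas class() keeps "integer":

```r
d <- transform(mtcars, am = as.integer(am))
m <- lm(mpg ~ am + hp, data = d)
mf <- model.frame(m)

attr(terms(m), "dataClasses")[["am"]]  # "numeric"
class(mf$am)[1]                         # "integer"
```

If that is the cause, taking `class(x)[1]` from the model frame columns would report "integer".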

`show_single_row=` not working for categorical-continuous interaction

In the example below, I am requesting the interaction term "factor(response):marker" be printed on a single row, but it is being ignored.

library(broom.helpers)
library(gtsummary)

lm(age ~ factor(response) * marker, trial) %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows(show_single_row = "factor(response):marker") %>%
  knitr::kable()

| term | variable | var_label | var_class | var_type | header_row | contrasts | reference_row | label | estimate | std.error | statistic | p.value |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|---:|---:|---:|---:|
| (Intercept) | NA | (Intercept) | NA | intercept | NA | NA | NA | (Intercept) | 43.985685 | 1.906507 | 23.071342 | 0.0000000 |
| NA | factor(response) | factor(response) | factor | categorical | TRUE | contr.treatment | NA | factor(response) | NA | NA | NA | NA |
| factor(response)0 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | TRUE | 0 | NA | NA | NA | NA |
| factor(response)1 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | FALSE | 1 | 9.117623 | 3.536300 | 2.578294 | 0.0107814 |
| marker | marker | Marker Level (ng/mL) | numeric | continuous | NA | NA | NA | Marker Level (ng/mL) | 2.007188 | 1.609824 | 1.246836 | 0.2141828 |
| NA | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | TRUE | NA | NA | factor(response) * Marker Level (ng/mL) | NA | NA | NA | NA |
| factor(response)1:marker | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | FALSE | NA | NA | 1 * Marker Level (ng/mL) | -5.337195 | 2.647510 | -2.015930 | 0.0453914 |

Created on 2020-09-03 by the reprex package (v0.3.0)

Add strict option for functions

As a developer, it would be helpful to have the option for some broom.helpers functions to fail when they cannot execute the requested action.

I am integrating broom.helpers into gtsummary now, and these two scenarios have come up so far:

When I run broom.helpers::tidy_identify_variables() if the variables cannot be identified, I would like to be able to have the function error. As it is currently written, I would need to inspect the returned object to check if the variables were indeed identified.
When I run broom.helpers::tidy_add_header_rows(show_single_row=) for a variable that cannot be put on a single row, I would also like the function to error.

Perhaps the argument could be something like tidy_plus_plus(strict=)? It would be similar to how purrr has pluck() and chuck().
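A minimal sketch of a `strict` switch in the spirit of purrr's pluck()/chuck(): the same problem is reported as a message by default and as an error when `strict = TRUE`. Both functions are hypothetical, and `all(is.na(x$variable))` merely stands in for however the real function detects that variables were not identified.

```r
# Hypothetical internal helper: message by default, error when strict
report_problem <- function(msg, strict = FALSE) {
  if (strict) stop(msg, call. = FALSE)
  message(msg)
  invisible(NULL)
}

identify_variables_sketch <- function(x, strict = FALSE) {
  if (all(is.na(x$variable))) {
    report_problem("Unable to identify the list of variables.", strict = strict)
  }
  x
}

tidy <- data.frame(term = "trtDrug B", variable = NA_character_)
res <- identify_variables_sketch(tidy)  # message, table returned
err <- tryCatch(identify_variables_sketch(tidy, strict = TRUE),
                error = conditionMessage)
```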

`tidy_add_variable_labels()` error with interaction only model

The model below only has an interaction term (no main effects), and the variable label is not correct.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, estimate)
#> # A tibble: 3 x 4
#>   variable                var_type    var_label                 estimate
#>   <chr>                   <chr>       <chr>                        <dbl>
#> 1 <NA>                    intercept   (Intercept)                 46.6  
#> 2 factor(response):marker interaction NA * Marker Level (ng/mL)    0.396
#> 3 factor(response):marker interaction NA * Marker Level (ng/mL)    0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

Also, if we add tidy_add_term_labels(), the label is also wrong, but in a different way.

library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
  broom.helpers::tidy_and_attach() %>%
  broom.helpers::tidy_identify_variables() %>%
  broom.helpers::tidy_add_term_labels() %>%
  broom.helpers::tidy_add_variable_labels() %>%
  select(variable, var_type, var_label, label, estimate)
#> # A tibble: 3 x 5
#>   variable               var_type    var_label               label      estimate
#>   <chr>                  <chr>       <chr>                   <chr>         <dbl>
#> 1 <NA>                   intercept   (Intercept)             (Intercep~   46.6  
#> 2 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.396
#> 3 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA       0.102

Created on 2020-09-01 by the reprex package (v0.3.0)

Add a function `tidy_add_estimate_for_reference_rows(exponentiate = FALSE)`

For treatment and SAS contrasts, it will set the estimate of reference rows to 0.
For sum contrasts, it will use dummy.coef() to populate the estimate of the reference row.
For other contrasts, it will do nothing.
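A worked check in base R of the sum-contrast case: the omitted (last) level's effect is minus the sum of the estimated effects, which is what stats::dummy.coef() reports. `ref_treatment()` is a hypothetical one-liner for the treatment case, and reading `exponentiate = TRUE` as exp(0) = 1 is an assumption.

```r
d <- transform(mtcars, cyl = factor(cyl))
m_sum <- lm(mpg ~ cyl, data = d, contrasts = list(cyl = "contr.sum"))

b <- coef(m_sum)                           # "(Intercept)", "cyl1", "cyl2"
ref_manual <- -sum(b[c("cyl1", "cyl2")])   # effect of the omitted level "8"
ref_dummy <- dummy.coef(m_sum)$cyl[["8"]]
all.equal(ref_manual, ref_dummy)           # TRUE

# Treatment/SAS contrasts: the reference is absorbed into the intercept, so
# its estimate is 0 on the model scale (1 if exponentiated)
ref_treatment <- function(exponentiate = FALSE) if (exponentiate) exp(0) else 0
```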

Error when identify variables run after remove intercept

There is a merging error when the remove intercept function is run before the identify variables function: there are two columns for var_nlevels (var_nlevels.x and var_nlevels.y).

library(broom.helpers)

lm(age ~ marker, gtsummary::trial) %>%
  tidy_and_attach() %>%
  tidy_remove_intercept() %>%
  tidy_identify_variables() # looks like a merging error (two cols for var_nlevels)
#> # A tibble: 1 x 10
#>   term  variable var_class var_type var_nlevels.x estimate std.error statistic
#>   <chr> <chr>    <chr>     <chr>            <int>    <dbl>     <dbl>     <dbl>
#> 1 mark~ marker   numeric   continu~            NA  -0.0545      1.26   -0.0434
#> # ... with 2 more variables: p.value <dbl>, var_nlevels.y <int>

Created on 2020-10-15 by the reprex package (v0.3.0)
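The `.x`/`.y` suffixes are the signature of a join where both tables already contain the column, presumably var_nlevels. A base R illustration with merge(), plus one possible remedy: drop columns that are about to be recomputed before joining. The toy tables are made up.

```r
tidy <- data.frame(variable = "marker", estimate = -0.0545,
                   var_nlevels = NA_integer_)
info <- data.frame(variable = "marker", var_nlevels = NA_integer_)

names(merge(tidy, info, by = "variable"))
# "variable" "estimate" "var_nlevels.x" "var_nlevels.y"

# Drop the columns that `info` recomputes, then join
recomputed <- setdiff(names(info), "variable")
tidy_clean <- tidy[setdiff(names(tidy), recomputed)]
names(merge(tidy_clean, info, by = "variable"))
# "variable" "estimate" "var_nlevels"
```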

header row missing after running `tidy_plus_plus()`

The header row for cyl is missing when using tidy_plus_plus(), but the documentation indicates it should have been added.

library(broom.helpers)
# no header row for cyl 
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_plus_plus()
#> # A tibble: 3 x 14
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 fact~ factor(~ factor    categor~    NA        NA        NA    NA       
#> 2 fact~ factor(~ factor    categor~    -6.92      1.56     -4.44  1.19e- 4
#> 3 fact~ factor(~ factor    categor~   -11.6       1.30     -8.90  8.57e-10
#> # ... with 6 more variables: conf.low <dbl>, conf.high <dbl>, contrasts <chr>,
#> #   reference_row <lgl>, var_label <chr>, label <chr>

# has header row
lm(mpg ~ factor(cyl), mtcars) %>%
  tidy_and_attach() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() 
#> # A tibble: 5 x 13
#>   term  variable var_class var_type estimate std.error statistic   p.value
#>   <chr> <chr>    <chr>     <chr>       <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Int~ <NA>     <NA>      interce~    26.7      0.972     27.4   2.69e-22
#> 2 <NA>  factor(~ factor    categor~    NA       NA         NA    NA       
#> 3 fact~ factor(~ factor    categor~    NA       NA         NA    NA       
#> 4 fact~ factor(~ factor    categor~    -6.92     1.56      -4.44  1.19e- 4
#> 5 fact~ factor(~ factor    categor~   -11.6      1.30      -8.90  8.57e-10
#> # ... with 5 more variables: contrasts <chr>, reference_row <lgl>,
#> #   var_label <chr>, label <chr>, header_row <lgl>

Created on 2020-08-17 by the reprex package (v0.3.0)

Improve pkgdown website

Add a function model_get_coefficients_type()

Inspired by gtsummary:::estimate_header(), add a function to identify model type and coefficient type.

An additional function tidy_identify_model_type() could add model_type and coefficient_type as attributes to the results.

It would be useful for the redesign of GGally::ggcoef().

@ddsjoberg, let me know whether you think it could be relevant for gtsummary as well. I know that gtsummary also handles the corresponding footnotes and translations, but I do not think that part is in the scope of broom.helpers.
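The proposal could be prototyped along these lines. This is a minimal sketch, not the function that broom.helpers eventually provides: guess_coefficients_type() and its return values ("logistic", "poisson", "prop_hazard", "generic") are hypothetical, and it only inspects the model class and, for glm(), the family and link. The last lines show how a tidy_identify_model_type()-style step could store the result as an attribute of the tidy results.

```r
# Hypothetical sketch: guess how exponentiated coefficients should be read,
# based only on the model class and, for glm(), its family and link.
# The function name and the returned labels are illustrative assumptions.
guess_coefficients_type <- function(model) {
  if (inherits(model, "coxph")) {
    return("prop_hazard")  # exp(coef) would be a hazard ratio
  }
  if (inherits(model, "glm")) {
    fam <- stats::family(model)
    if (fam$family %in% c("binomial", "quasibinomial") && fam$link == "logit") {
      return("logistic")   # exp(coef) would be an odds ratio
    }
    if (fam$family %in% c("poisson", "quasipoisson") && fam$link == "log") {
      return("poisson")    # exp(coef) would be an incidence rate ratio
    }
  }
  "generic"                # coefficients read on their own scale
}

fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)
fit_pois <- glm(carb ~ wt, data = mtcars, family = poisson)
fit_lm <- lm(mpg ~ wt, data = mtcars)
types <- sapply(list(fit_logit, fit_pois, fit_lm), guess_coefficients_type)

# A hypothetical tidy_identify_model_type() could attach the result:
res <- data.frame(term = names(coef(fit_logit)), estimate = unname(coef(fit_logit)),
                  stringsAsFactors = FALSE)
attr(res, "coefficients_type") <- guess_coefficients_type(fit_logit)
```

Keeping the type as an attribute rather than a column leaves the row structure of the tidy tibble unchanged.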

Release version 1.0.0

Prepare for release:

devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
Polish NEWS
Polish pkgdown reference index

Submit to CRAN:

usethis::use_version()
Update cran-comments.md
devtools::submit_cran()
Approve email

Wait for CRAN...

Accepted 🎉
Create GitHub release
Remove file CRAN-RELEASE
usethis::use_dev_version()

Easier identification of dichotomous variables and all_categorical(), all_continuous(), all_dichotomous() helpers

Dear @ddsjoberg

I would like your opinion on the two following points.

First, it could be relevant to better identify dichotomous variables. One option would be to change the var_type column created by tidy_identify_variables() so that, for dichotomous variables, the value "categorical" is replaced by "dichotomous" (all dichotomous variables are also categorical). However, this could have side effects in gtsummary.

An alternative would be to add a dichotomous column equal to TRUE, FALSE or NA (for continuous variables).

Identifying dichotomous variables directly in tidy_identify_variables() would also simplify the code of tidy_add_header_rows() when applying show_single_row.
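To illustrate the two options above, here is a base R sketch of how dichotomous variables could be detected. classify_var_type() is a hypothetical helper, not part of broom.helpers or gtsummary. It treats a factor with exactly two levels, or a character/logical vector with exactly two observed non-missing values, as dichotomous; this is one possible rule among others.

```r
# Hypothetical helper: classify a variable as "continuous", "dichotomous"
# or "categorical" (dichotomous variables are a subset of categorical ones).
classify_var_type <- function(x) {
  if (is.factor(x)) {
    n <- nlevels(x)                      # factors: count declared levels
  } else if (is.character(x) || is.logical(x)) {
    n <- length(unique(x[!is.na(x)]))    # others: count observed values
  } else {
    return("continuous")
  }
  if (n == 2) "dichotomous" else "categorical"
}

df <- data.frame(
  age = c(34, 51, 47, 29),
  sex = factor(c("F", "M", "M", "F")),
  stage = factor(c("T1", "T2", "T3", "T1"))
)

# Option 1: a var_type value "dichotomous"
var_type <- vapply(df, classify_var_type, character(1))

# Option 2: a separate logical column, NA for continuous variables
dichotomous <- ifelse(var_type == "continuous", NA, var_type == "dichotomous")
```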

Second, tidyselect helpers such as all_categorical(), all_continuous() and all_dichotomous() could be useful in broom.helpers as well. However, I do not know whether code could be shared between gtsummary and broom.helpers, or how we could avoid conflicts between them.

As you developed these functions and implemented the tidy selectors in broom.helpers, what do you think?

Best

Add quiet option

Any function that prints messages should have a `quiet =` option. This would help developers who do not want broom.helpers messages to be printed.
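A minimal sketch of the proposed convention in base R. drop_intercept() is a hypothetical helper used only to illustrate the pattern; it is not a broom.helpers function. Because the message goes through message(), callers could also wrap the call in suppressMessages(), but an explicit quiet argument makes the choice part of the call itself.

```r
# Hypothetical helper following the proposed convention:
# the informative message is skipped when quiet = TRUE.
drop_intercept <- function(x, quiet = FALSE) {
  is_intercept <- !is.na(x$term) & x$term == "(Intercept)"
  if (any(is_intercept) && !quiet) {
    message("Removing the intercept row.")
  }
  x[!is_intercept, , drop = FALSE]
}

fit <- lm(mpg ~ wt, data = mtcars)
res <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)),
                  stringsAsFactors = FALSE)
drop_intercept(res)                # emits a message
drop_intercept(res, quiet = TRUE)  # same result, no message
```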

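`survival::coxph()` strips labels from categorical variables... but you can access them

In the example below, the variable grade has a label, "Grade", but it is not picked up: var_label is filled with the variable name instead. The label can still be retrieved from the model, though!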

Can we please update the internals to grab the label using the method below if not found in the typical manner?

library(broom.helpers)
library(gtsummary)
library(survival)
#> Warning: package 'survival' was built under R version 4.0.2

mod <- coxph(Surv(ttdeath, death) ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 4 x 4
#>   term      variable var_label label
#>   <chr>     <chr>    <chr>     <chr>
#> 1 <NA>      grade    grade     grade
#> 2 grade_ref grade    grade     I    
#> 3 gradeII   grade    grade     II   
#> 4 gradeIII  grade    grade     III

# get the grade label from a coxph object
model.frame.default(mod)$grade %>% attr("label")
#> [1] "Grade"

^{Created on 2020-08-14 by the reprex package (v0.3.0)}
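A possible fallback could look like the following minimal sketch. get_var_label() is a hypothetical name, not broom.helpers' internals, and the sketch assumes, as the reprex above suggests, that the label attribute survives in the frame returned by model.frame.default(). It is demonstrated with lm() on toy data so it runs without gtsummary; with a coxph() model, the second lookup is the one expected to find the label.

```r
# Hypothetical helper: look for a "label" attribute in the model frame,
# retry with model.frame.default() (which re-evaluates the model call),
# and fall back to the variable name if no label is found.
get_var_label <- function(model, variable) {
  frame_getters <- list(
    function() stats::model.frame(model),
    function() stats::model.frame.default(model)
  )
  for (get_frame in frame_getters) {
    mf <- tryCatch(get_frame(), error = function(e) NULL)
    if (!is.null(mf) && variable %in% names(mf)) {
      label <- attr(mf[[variable]], "label", exact = TRUE)
      if (!is.null(label)) return(label)
    }
  }
  variable
}

toy <- data.frame(
  y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
  grade = factor(c("I", "II", "III", "I", "II", "III"))
)
attr(toy$grade, "label") <- "Grade"
fit_grade <- lm(y ~ grade, data = toy)
get_var_label(fit_grade, "grade")
```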

Include `var_label` in subsequent calls

If tidy_add_variable_labels() is run after tidy_add_reference_rows(), labels are filled correctly.

library(broom.helpers)
library(gtsummary)
library(survival)

mod <- lm(ttdeath ~ grade, trial)

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_reference_rows() %>%
  tidy_add_variable_labels() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    Grade       Grade      
#> 3 grade_ref   grade    Grade       I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

However, if they are called in the opposite order, var_label is not filled for all rows associated with the variable.

mod %>%
  tidy_and_attach() %>%
  tidy_identify_variables() %>%
  tidy_add_variable_labels() %>%
  tidy_add_reference_rows() %>%
  tidy_add_header_rows() %>%
  select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#>   term        variable var_label   label      
#>   <chr>       <chr>    <chr>       <chr>      
#> 1 (Intercept) <NA>     (Intercept) (Intercept)
#> 2 <NA>        grade    <NA>        <NA>       
#> 3 grade_ref   grade    <NA>        I          
#> 4 gradeII     grade    Grade       II         
#> 5 gradeIII    grade    Grade       III

I think it is fine for these functions to depend on the order in which they are called, but a note to users would be helpful, or even an error, as when tidy_add_variable_labels() is called after tidy_add_header_rows().
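Following this suggestion, the order check could be sketched as a guard run at the start of tidy_add_reference_rows(). check_labels_not_added_yet() and its strict argument are hypothetical; the sketch only assumes that a var_label column exists once tidy_add_variable_labels() has run, as in the reprex output. With strict = FALSE it emits a note, with strict = TRUE it raises an error.

```r
# Hypothetical guard: if var_label is already present, reference rows
# added afterwards would be left without a label.
check_labels_not_added_yet <- function(x, strict = FALSE) {
  if (!"var_label" %in% names(x)) return(invisible(x))
  msg <- paste(
    "`tidy_add_variable_labels()` was called before",
    "`tidy_add_reference_rows()`; reference rows will have no var_label.",
    "Call `tidy_add_variable_labels()` afterwards instead."
  )
  if (strict) stop(msg, call. = FALSE) else message(msg)
  invisible(x)
}

res_ok <- data.frame(term = "gradeII", variable = "grade",
                     stringsAsFactors = FALSE)
res_bad <- data.frame(term = "gradeII", variable = "grade", var_label = "Grade",
                      stringsAsFactors = FALSE)
check_labels_not_added_yet(res_ok)   # silent
check_labels_not_added_yet(res_bad)  # note to the user
```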
