larmarange / broom.helpers
A set of functions to facilitate manipulation of tibbles produced by broom
Home Page: https://larmarange.github.io/broom.helpers/
License: GNU General Public License v3.0
I noticed a labelling error with nnet::multinom() models. In the example below, the label column is missing the variable label on the header rows of the stage variable.
library(gtsummary)
#> #BlackLivesMatter
nnet::multinom(grade ~ age + stage, data = trial, trace = FALSE) %>%
broom.helpers::tidy_plus_plus(add_header_rows = TRUE) %>%
dplyr::select(y.level, variable, term, var_label, label, estimate)
#> # A tibble: 12 x 6
#> y.level variable term var_label label estimate
#> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 II age age Age Age 0.00813
#> 2 II stage <NA> T Stage <NA> NA
#> 3 II stage stageT1 T Stage T1 0
#> 4 II stage stageT2 T Stage T2 -0.497
#> 5 II stage stageT3 T Stage T3 -1.04
#> 6 II stage stageT4 T Stage T4 -0.634
#> 7 III age age Age Age 0.0110
#> 8 III stage <NA> T Stage <NA> NA
#> 9 III stage stageT1 T Stage T1 0
#> 10 III stage stageT2 T Stage T2 0.128
#> 11 III stage stageT3 T Stage T3 -0.214
#> 12 III stage stageT4 T Stage T4 0.291
Created on 2020-10-04 by the reprex package (v0.3.0)
Everything so far is looking amazing!
I noticed that labels are applied inconsistently: for categorical variables, the label column of the header row is empty, whereas for results from Hmisc::rcspline.eval() and poly() the label column does have a value.
Obviously not a big deal, but it may be worth addressing for consistency.
library(broom.helpers)
library(gtsummary)
mod <- glm(age ~ grade + Hmisc::rcspline.eval(marker), data = trial, family = gaussian)
mod %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_variable_labels() %>%
tidy_add_reference_rows() %>%
tidy_add_header_rows() %>%
select(1:2, label)
#> # A tibble: 9 x 3
#> term variable label
#> <chr> <chr> <chr>
#> 1 (Intercept) <NA> (Intercept)
#> 2 <NA> grade <NA>
#> 3 grade_ref grade I
#> 4 gradeII grade II
#> 5 gradeIII grade III
#> 6 <NA> Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 7 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 8 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
#> 9 Hmisc::rcspline.eval(mark~ Hmisc::rcspline.eval(mar~ Hmisc::rcspline.eval(mar~
Created on 2020-08-14 by the reprex package (v0.3.0)
When one runs tidy_add_reference_rows() after tidy_add_term_labels(), the reference row label is not shown. It makes sense why this occurs, but I think, at a minimum, a message alerting users to run the functions in a different order to get the desired output would be helpful.
library(broom.helpers)
# build regression model
lm(mpg ~ factor(cyl) + hp, mtcars) %>%
# perform initial tidying of model
tidy_and_attach() %>%
# add the cyl levels
tidy_add_term_labels() %>%
# add reference row cyl
tidy_add_reference_rows() %>%
knitr::kable()
term | variable | var_class | var_type | estimate | std.error | statistic | p.value | var_label | contrasts | label | reference_row |
---|---|---|---|---|---|---|---|---|---|---|---|
(Intercept) | NA | NA | intercept | 28.6501182 | 1.5877870 | 18.044056 | 0.0000000 | (Intercept) | NA | (Intercept) | NA |
factor(cyl)4 | factor(cyl) | factor | categorical | NA | NA | NA | NA | factor(cyl) | contr.treatment | NA | TRUE |
factor(cyl)6 | factor(cyl) | factor | categorical | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | factor(cyl) | contr.treatment | 6 | FALSE |
factor(cyl)8 | factor(cyl) | factor | categorical | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | factor(cyl) | contr.treatment | 8 | FALSE |
hp | hp | numeric | continuous | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | hp | NA | hp | NA |
Created on 2020-08-27 by the reprex package (v0.3.0)
This is VERY much an edge case, but I wanted to let you know. It seems that if a variable name has a + in it, tidy_remove_intercept() will remove both the intercept and the variable from the output. If this is a complicated fix, perhaps just show a message to the user, such as: "More than one row was removed from the table. An error possibly occurred, likely due to unusual naming conventions used for terms."
library(gtsummary)
trial2 <-
trial %>%
dplyr::mutate(`treatment +name` = trt)
glm(response ~ `treatment +name`,
trial2,
family = binomial(link = "logit")) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_remove_intercept()
#> # A tibble: 0 x 8
#> # ... with 8 variables: term <chr>, variable <chr>, var_class <chr>,
#> # var_type <chr>, estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> # p.value <dbl>
Created on 2020-10-02 by the reprex package (v0.3.0)
It might be a good idea to wrap the call to stats::model.frame(model) in model_get_model_frame.R in a tryCatch() in case the regression model does not have a method for it (like mice models).
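Here is a minimal sketch of such a wrapper. It assumes that returning NULL is an acceptable way to signal "no usable model.frame() method". The name safe_model_frame() is hypothetical and does not come from the current code of model_get_model_frame.R:

```r
# Hypothetical helper: return NULL instead of an error when
# stats::model.frame() fails for a given model object
safe_model_frame <- function(model) {
  tryCatch(
    stats::model.frame(model),
    error = function(e) NULL
  )
}

# A standard lm() fit: the model frame is returned as usual
fit <- stats::lm(mpg ~ hp, data = mtcars)
mf <- safe_model_frame(fit)

# An object without a usable model.frame() method: NULL instead of an error
fake <- structure(list(), class = "fake_model")
mf_fake <- safe_model_frame(fake)
```

Callers can then test is.null() and fall back (for instance, to the term names) instead of failing.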
This function would allow keeping only certain variables in the output.
It would take two arguments: keep and drop.
It should also be added to tidy_plus_plus().
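Here is a base-R sketch of the keep/drop logic. It rests on two assumptions that the proposal leaves open: the filtering works on the variable column, and rows whose variable is NA (such as the intercept) are always kept. The name filter_tidy_variables() is hypothetical:

```r
# Hypothetical sketch: keep or drop rows of a tidy table based on the
# `variable` column. Rows with variable = NA (e.g. the intercept) are kept;
# this is a design assumption, not documented package behaviour.
filter_tidy_variables <- function(x, keep = NULL, drop = NULL) {
  rows <- rep(TRUE, nrow(x))
  if (!is.null(keep)) {
    rows <- rows & (is.na(x$variable) | x$variable %in% keep)
  }
  if (!is.null(drop)) {
    rows <- rows & !(x$variable %in% drop)
  }
  x[rows, , drop = FALSE]
}

tidy <- data.frame(
  term = c("(Intercept)", "hp", "factor(cyl)6", "factor(cyl)8"),
  variable = c(NA, "hp", "factor(cyl)", "factor(cyl)"),
  stringsAsFactors = FALSE
)
filter_tidy_variables(tidy, keep = "hp")$term
filter_tidy_variables(tidy, drop = "hp")$term
```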
I started rewriting the broom.helpers section of tbl_regression() to use tidy_plus_plus() instead of the individual functions. One of the reasons to use tidy_plus_plus() over a series of other tidy_*() functions is that it will be easier for me to give users access to the other arguments in tidy_plus_plus(), so they can change the resulting table if they like (e.g. adding the informative contrast labels that @gorkang is working on).
One sticking point is that I treat the intercept like a variable. For example, users can change the intercept label using tbl_regression(label = list("(Intercept)" ~ "b0", age ~ "Patient Age")). Is there a way to use tidy_plus_plus() without changing the gtsummary API?
My first thought was to simply have an option in tidy_identify_variables() that populates the variable column of the intercept with the term name. But I am not sure whether this would cause problems with subsequent functions. What do you think?
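Here is a sketch of what that option could do, written as a standalone post-processing step. The name fill_intercept_variable() is hypothetical, and the sketch assumes the var_type column is already present:

```r
# Hypothetical post-processing step (not an existing broom.helpers option):
# copy the term name into `variable` for intercept rows, so that
# "(Intercept)" can be selected like any other variable.
fill_intercept_variable <- function(x) {
  is_intercept <- x$var_type %in% "intercept" & is.na(x$variable)
  x$variable[is_intercept] <- x$term[is_intercept]
  x
}

tidy <- data.frame(
  term = c("(Intercept)", "age"),
  variable = c(NA, "age"),
  var_type = c("intercept", "continuous"),
  stringsAsFactors = FALSE
)
fill_intercept_variable(tidy)$variable
```

The concern above is legitimate. Any later step that identifies intercept rows through is.na(variable) would stop recognising them after this change, so such checks would need to rely on var_type instead.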
When there is no model.frame() method, the user sees a very informative message (super helpful!).
The resulting table has a variable column, but its values are NA. Can we add the term as the variable for these models? In gtsummary (and in broom.helpers too), we use the variable name for further manipulation, but without a name these variables cannot be selected.
I know the term is not the proper variable name, but I think the printed message is enough of a cue to users that the original variable names are not available.
library(broom.helpers)
library(gtsummary)
# make up some interval censored data
trial2 <-
trial %>%
dplyr::mutate(
lint = dplyr::case_when(
death == 1 ~ runif(200) + 2,
death == 0 ~ ttdeath
),
rint = dplyr::case_when(
death == 1 ~ ttdeath,
death == 0 ~ Inf
)
)
# Write a custom tidier
tidy_ic_sp <- function(x, exponentiate = FALSE, conf.level = 0.95, ...) {
tidy <-
tibble::tibble(
term = names(x[["coefficients"]]),
estimate = x[["coefficients"]],
std.error = sqrt(diag(x[["var"]])),
statistic = summary(x)$summaryParameters[, "z-value"],
p.value = summary(x)$summaryParameters[, "p"],
conf.low = confint(x, level = conf.level)[, 1],
conf.high = confint(x, level = conf.level)[, 2]
)
if (exponentiate == TRUE)
tidy <- dplyr::mutate_at(tidy, vars(estimate, conf.low, conf.high), exp)
tidy
}
# fit the interval-censored survival model with icenReg::ic_sp()
icenReg::ic_sp(
survival::Surv(lint, rint, type = "interval2") ~ trt,
model = "ph",
bs_samples = 3,
data = trial2
) %>%
# tidy up with broom.helpers
tidy_and_attach(tidy_fun = tidy_ic_sp) %>%
tidy_identify_variables() %>%
tidy_add_variable_labels() %>%
tidy_add_header_rows() %>%
select(term, variable, var_label, label, estimate)
#> x Unable to identify the list of variables.
#>
#> This is usually due to an error calling `stats::model.frame(x)`or `stats::model.matrix(x)`.
#> It could be the case if that type of model does not implement these methods.
#> Rarely, this error may occur if the model object was created within
#> a functional programming framework (e.g. using `lappy()`, `purrr::map()`, etc.).
#> # A tibble: 1 x 5
#> term variable var_label label estimate
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 trtDrug B <NA> trtDrug B trtDrug B 0.160
Created on 2020-10-19 by the reprex package (v0.3.0)
In tidy_select_variables(), should we rename keep as include for consistency?
What do you think?
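If the rename goes ahead, the old argument name can be kept working for a while. Here is a minimal sketch with a hypothetical select_variables_sketch(). It assumes that keeping rows with a missing variable (e.g. the intercept) is the desired behaviour:

```r
# Hypothetical sketch of renaming an argument while keeping the old name
# working: `keep` is still accepted but triggers a deprecation warning.
select_variables_sketch <- function(x, include = NULL, keep = NULL) {
  if (!is.null(keep)) {
    warning("`keep` is deprecated; please use `include` instead.", call. = FALSE)
    if (is.null(include)) include <- keep
  }
  if (is.null(include)) return(x)
  x[is.na(x$variable) | x$variable %in% include, , drop = FALSE]
}

tidy <- data.frame(
  term = c("(Intercept)", "hp", "wt"),
  variable = c(NA, "hp", "wt"),
  stringsAsFactors = FALSE
)
select_variables_sketch(tidy, include = "hp")
```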
library(broom.helpers)
library(gtsummary)
#> #BlackLivesMatter
lm(age ~ factor(response):marker, trial) %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_reference_rows() %>%
knitr::kable()
term | variable | var_class | var_type | contrasts | reference_row | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|---|---|---|---|
(Intercept) | NA | NA | intercept | NA | NA | 46.6357738 | 1.632164 | 28.5729753 | 0.0000000 |
factor(response)0:marker | factor(response):marker | NA | interaction | NA | NA | 0.3957856 | 1.507993 | 0.2624585 | 0.7932857 |
factor(response)1:marker | factor(response):marker | NA | interaction | NA | NA | 0.1015807 | 1.653558 | 0.0614316 | 0.9510877 |
factor(response)0 | factor(response) | NA | NA | NA | TRUE | NA | NA | NA | NA |
Created on 2020-09-03 by the reprex package (v0.3.0)
This is so minor, but I wanted to point it out just in case!
The tidy_plus_plus() function accepts the argument conf.int= as well as ..., which are passed to tidy_fun=. Is there a reason to include conf.int= here but not, for example, in tidy_and_attach()?
Other common tidy arguments, e.g. exponentiate=, are not included. To be consistent, should the conf.int= argument be removed?
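For comparison, here is a sketch of the alternative: a wrapper with no dedicated conf.int= argument that forwards everything through `...`. Both tidy_wrapper() and toy_tidier() are hypothetical stand-ins, not broom or broom.helpers functions:

```r
# Hypothetical wrapper: no dedicated conf.int= argument, every extra
# argument is forwarded to the tidier through `...`
tidy_wrapper <- function(model, tidy_fun, ...) {
  tidy_fun(model, ...)
}

# A toy tidier standing in for broom::tidy(), for illustration only
toy_tidier <- function(x, conf.int = FALSE, conf.level = 0.95) {
  est <- stats::coef(x)
  out <- data.frame(term = names(est), estimate = unname(est),
                    stringsAsFactors = FALSE)
  if (conf.int) {
    ci <- stats::confint(x, level = conf.level)
    out$conf.low <- unname(ci[, 1])
    out$conf.high <- unname(ci[, 2])
  }
  out
}

fit <- stats::lm(mpg ~ hp, data = mtcars)
res <- tidy_wrapper(fit, toy_tidier, conf.int = TRUE, conf.level = 0.9)
```

A dedicated argument is mainly useful when the wrapper itself needs to read the value (for example, to decide which columns to expect). Otherwise, forwarding through `...` keeps the signatures of the different functions consistent.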
Should we add a broom.helpers class to the tibbles? I think this could help down the line to ensure we're working with the correct object types.
class(x) <- c("broom.helpers", class(x))
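Here is a slightly fuller sketch of that idea, assuming the class should only be added once. The name add_broom_helpers_class() is a hypothetical helper:

```r
# Hypothetical sketch: prepend a "broom.helpers" class only if it is
# not already there, so that repeated calls are harmless
add_broom_helpers_class <- function(x) {
  if (!inherits(x, "broom.helpers")) {
    class(x) <- c("broom.helpers", class(x))
  }
  x
}

x <- data.frame(term = "hp", estimate = -0.068)
x <- add_broom_helpers_class(add_broom_helpers_class(x))
class(x)
```

Whether the extra class survives later dplyr verbs would need to be checked, since downstream functions would rely on it being there.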
When a reference row is added, rather than creating a new term "{varname}_ref", I suggest that you either keep the term name consistent with the other terms (in the example below, the new term would be "gradeI") or leave it blank.
I know it's unlikely, but if someone had a variable with the level _ref, I think things would fall apart somewhere.
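For reference, R's own naming of factor dummy columns follows paste0(<term label>, <level>), so a reference term built the same way stays consistent with the other terms. The example below uses mtcars instead of trial so that it runs without gtsummary:

```r
# R names dummy columns for a factor as paste0(<term label>, <level>)
mm <- stats::model.matrix(~ factor(cyl), data = mtcars)
colnames(mm)

# A reference term built with the same convention (the reference is the
# first level under the default treatment contrasts)
ref_level <- levels(factor(mtcars$cyl))[1]
ref_term <- paste0("factor(cyl)", ref_level)
ref_term
```

This convention only holds for contrasts whose columns are named after the levels. With contr.sum, for instance, the columns are numbered (factor(cyl)1, factor(cyl)2), so a general rule would need to check the contrasts first.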
library(broom.helpers)
library(gtsummary)
mod <- glm(age ~ grade, data = trial, family = gaussian)
mod %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_variable_labels() %>%
tidy_add_reference_rows() %>%
tidy_add_header_rows() %>%
select(variable, term, reference_row, label, header_row)
#> # A tibble: 5 x 5
#> variable term reference_row label header_row
#> <chr> <chr> <lgl> <chr> <lgl>
#> 1 <NA> (Intercept) NA (Intercept) FALSE
#> 2 grade <NA> NA <NA> TRUE
#> 3 grade grade_ref TRUE I FALSE
#> 4 grade gradeII FALSE II FALSE
#> 5 grade gradeIII FALSE III FALSE
Created on 2020-08-14 by the reprex package (v0.3.0)
What is your opinion on improving the error messaging in situations like the one below, where the model is created within an apply() or map() setting and the stats::*() functions called on the model objects fail?
library(tidyverse)
library(gtsummary)
#> #Uighur
library(survival)
# Set up map statement to create different models
tibble(grade = c("I", "II", "III")) %>%
mutate(df_model = map(grade, ~ trial %>% filter(grade == ..1))) %>%
mutate(
mv_formula_char = "Surv(ttdeath, death) ~ trt + age + marker",
mv_formula = map(mv_formula_char, ~ as.formula(.x)),
mv_model_form =
map2(
mv_formula, df_model,
~ coxph(..1, data = ..2)
),
mv_tbl_form =
map(
mv_model_form,
~ broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE)
)
)
#> Error: Problem with `mutate()` input `mv_tbl_form`.
#> x the ... list contains fewer than 2 elements
#> i Input `mv_tbl_form` is `map(mv_model_form, ~broom.helpers::tidy_plus_plus(..1, exponentiate = TRUE))`.
Created on 2020-08-31 by the reprex package (v0.3.0)
In gtsummary, we added an error message like this: ddsjoberg/gtsummary#231
Prepare for release:
devtools::check(remote = TRUE, manual = TRUE)
revdepcheck::revdep_reset()
revdepcheck::revdep_check(num_workers = 4)
Submit to CRAN:
usethis::use_version()
cran-comments.md
devtools::submit_cran()
(CRAN team on vacation until August 24)
Wait for CRAN...
usethis::use_github_release()
CRAN-RELEASE
usethis::use_dev_version()
Maybe a feature that could be added in broom.helpers (and therefore also implemented in gtsummary) is a function tidy_rename_categorical_terms() that would allow the type of renaming you want, but after model computation, at the moment the table is built. For example:
mod %>% tidy_and_attach() %>% tidy_rename_categorical_terms(pattern = "{variable} [{term}-{reference}]")
You would be able to choose whatever pattern you want.
Note: a second argument should allow selecting which variables to rename.
By the way, if the model attribute is lost in some cases, should we add a call to tidy_attach_model() at the end of each tidy_* function, as a safeguard?
Originally posted by @larmarange in #13 (comment)
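Here is a minimal sketch of how such a pattern could be applied, assuming each {name} placeholder refers to a column of the tidy table. The name apply_label_pattern() is hypothetical. For readability, the example's term column holds the level label (in broom.helpers, term would be e.g. gradeII). glue::glue_data() offers the same kind of templating and could be used instead:

```r
# Hypothetical sketch of pattern-based relabelling: every "{column}"
# placeholder in `pattern` is replaced by that row's value of the column
apply_label_pattern <- function(x, pattern) {
  vapply(seq_len(nrow(x)), function(i) {
    out <- pattern
    for (col in names(x)) {
      out <- gsub(paste0("{", col, "}"), x[[col]][i], out, fixed = TRUE)
    }
    out
  }, character(1))
}

tidy <- data.frame(
  variable = c("grade", "grade"),
  term = c("II", "III"),
  reference = c("I", "I"),
  stringsAsFactors = FALSE
)
apply_label_pattern(tidy, "{variable} [{term}-{reference}]")
# "grade [II-I]"  "grade [III-I]"
```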
When tidy_add_header_rows() is run on the model below, the interaction term should be on two rows: a header row with the label column equal to factor(response) * Marker Level (ng/mL), and a second row with the label column equal to 1 * Marker Level (ng/mL) and the estimate.
library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response) * marker, trial) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_identify_variables() %>%
broom.helpers::tidy_add_variable_labels() %>%
broom.helpers::tidy_add_reference_rows() %>%
broom.helpers::tidy_add_header_rows() %>%
select(variable, var_type, var_label, label, estimate)
#> # A tibble: 6 x 5
#> variable var_type var_label label estimate
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 <NA> intercept (Intercept) (Intercept) 44.0
#> 2 factor(response) categori~ factor(response) factor(respons~ NA
#> 3 factor(response) categori~ factor(response) 0 NA
#> 4 factor(response) categori~ factor(response) 1 9.12
#> 5 marker continuo~ Marker Level (ng/mL) Marker Level (~ 2.01
#> 6 factor(response)~ interact~ factor(response) * Marke~ 1 * Marker Lev~ -5.34
Created on 2020-09-01 by the reprex package (v0.3.0)
The mice package does not include a model.frame() method for the regression models resulting from multiply imputed data sets.
Would you be OK with adding one here? I need to look up the exact code, but it'll be something like this (I can add it if you're OK with it):
#' @export
#' @rdname model_get_model_frame
model_get_model_frame.mipo <- function(model) {
# add check that the mice package is installed
# grab input mice data
# extract a single dataset for our use of finding labels
mice::complete(...)
}
The model below includes factor(cyl), which has 3 levels. When we request that it be displayed on a single row, nothing happens (because it can't be shown on a single row), and there is no message saying that the request was ignored.
A message to the user in this case would be helpful.
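Here is a sketch of such a message. It assumes reference rows are not (yet) in the table, so a variable with more than one row cannot be collapsed. The name check_single_row() is a hypothetical helper:

```r
# Hypothetical check: report variables requested in show_single_row that
# span more than one row and therefore cannot be collapsed
check_single_row <- function(x, show_single_row) {
  counts <- table(x$variable)
  too_many <- show_single_row[show_single_row %in% names(counts)[counts > 1]]
  if (length(too_many) > 0) {
    message("`show_single_row` ignored for: ", paste(too_many, collapse = ", "),
            " (more than one row)")
  }
  invisible(too_many)
}

tidy <- data.frame(
  term = c("(Intercept)", "hp", "factor(cyl)6", "factor(cyl)8", "factor(am)1"),
  variable = c(NA, "hp", "factor(cyl)", "factor(cyl)", "factor(am)"),
  stringsAsFactors = FALSE
)
check_single_row(tidy, c("factor(am)", "factor(cyl)"))
```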
library(broom.helpers)
lm(mpg ~ hp + factor(cyl) + factor(am), mtcars) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_identify_variables() %>%
broom.helpers::tidy_add_header_rows(show_single_row = c("factor(am)", "factor(cyl)"))
#> # A tibble: 6 x 12
#> term variable var_label var_class var_type header_row contrasts label
#> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 (Int~ <NA> (Interce~ <NA> interce~ NA <NA> (Int~
#> 2 hp hp hp numeric continu~ NA <NA> hp
#> 3 <NA> factor(~ factor(c~ factor categor~ TRUE contr.tr~ fact~
#> 4 fact~ factor(~ factor(c~ factor categor~ FALSE contr.tr~ 6
#> 5 fact~ factor(~ factor(c~ factor categor~ FALSE contr.tr~ 8
#> 6 fact~ factor(~ factor(a~ factor categor~ NA contr.tr~ 1
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> # p.value <dbl>
Created on 2020-09-01 by the reprex package (v0.3.0)
Add a helper to convert poly(var, 4) into var in the variable column and to produce more explicit terms (e.g. var^1, var^2, var^3, var^4).
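Here is a regex-based sketch of the conversion, applied to the coefficient names R produces for poly() terms. The names poly_variable() and poly_term_label() are hypothetical helpers. Note that poly() uses orthogonal polynomials by default, so hp^2 labels the second-degree component rather than hp squared itself (unless raw = TRUE):

```r
# Hypothetical helpers: turn "poly(hp, 4)2" into the variable "hp"
# and the term label "hp^2"; other terms are left unchanged
poly_pattern <- "^poly\\(([^,]+),.*\\)([0-9]+)$"

poly_variable <- function(term) {
  ifelse(grepl(poly_pattern, term), sub(poly_pattern, "\\1", term), term)
}

poly_term_label <- function(term) {
  ifelse(grepl(poly_pattern, term), sub(poly_pattern, "\\1^\\2", term), term)
}

fit <- stats::lm(mpg ~ poly(hp, 4), data = mtcars)
terms_raw <- names(stats::coef(fit))
poly_variable(terms_raw)
poly_term_label(terms_raw)
```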
Test
lm(hp ~ factor(`number + cylinders`) : `miles per galon` + factor(`type of transmission`), mtcars %>% rename(`miles per galon` = mpg, `type of transmission` = am, `number + cylinders` = cyl))
The backticks should be removed from miles per galon.
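Here is a minimal sketch of the clean-up with a hypothetical strip_backticks() helper. The example reproduces the non-syntactic name from the test above:

```r
# Non-syntactic variable names keep their backticks in term labels
d <- mtcars
names(d)[names(d) == "mpg"] <- "miles per galon"
fit <- stats::lm(hp ~ `miles per galon`, data = d)
attr(stats::terms(fit), "term.labels")

# Hypothetical helper: strip the backticks for display
strip_backticks <- function(x) gsub("`", "", x, fixed = TRUE)
strip_backticks(attr(stats::terms(fit), "term.labels"))
```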
At the moment, the broom.helpers and gtsummary select helpers are created independently. When both packages are loaded, one package will mask the other's all_*() selecting functions, which is not good! I've been thinking about a way to unify the syntax, and I think I've come up with something.
Proposed changes:
select_constructor(): using this function, we could define all_continuous() with the code below, which would select variables of type continuous.
all_continuous <- function() select_constructor("variable", "var_type", "continuous")
The reason for the constructor is that I can later use it in gtsummary to easily construct selecting functions that do not apply in the broom.helpers setting, without needing to recreate the environments within which we're selecting or to define new scoping functions.
all_interactions() selector and I think another one... but I can't find that message. I'll add that here too. With the general format, it's actually very easy to add new select functions.
all_factor(), all_character(), etc. functions. I do not suggest you do this. Since I initially released those select functions, {tidyselect} has been updated to allow selection using predicate functions, e.g. trial %>% select(where(is.character)). I plan to deprecate those functions so that I do not need to support any superfluous functions.
The only front-facing changes here will be exporting two new functions that help us write and use the selecting functions in other packages. I'll start putting together a PR.
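Here is a much simplified, base-R illustration of the constructor idea. The proposal works inside tidyselect environments. By contrast, this hypothetical select_constructor_sketch() returns a function that receives the tidy table explicitly:

```r
# Simplified, hypothetical illustration of the constructor idea: it returns
# a selector that picks the `variable` values whose `var_type` matches
select_constructor_sketch <- function(variable_col, filter_col, values) {
  force(variable_col)
  force(filter_col)
  force(values)
  function(x) {
    vars <- x[[variable_col]][x[[filter_col]] %in% values]
    unique(vars[!is.na(vars)])
  }
}

all_continuous_sketch <- select_constructor_sketch("variable", "var_type", "continuous")
all_categorical_sketch <- select_constructor_sketch("variable", "var_type", "categorical")

tidy <- data.frame(
  variable = c(NA, "hp", "factor(cyl)", "factor(cyl)"),
  var_type = c("intercept", "continuous", "categorical", "categorical"),
  stringsAsFactors = FALSE
)
all_continuous_sketch(tidy)
all_categorical_sketch(tidy)
```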
When I run an intercept-only model, the tibble is returned, but with a warning.
library(broom.helpers)
lm(mpg ~ 1, mtcars) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_identify_variables() %>%
broom.helpers::tidy_add_header_rows()
#> Warning in min(.data$rank): no non-missing arguments to min; returning Inf
#> # A tibble: 1 x 12
#> term variable var_label var_class var_type header_row contrasts label
#> <chr> <chr> <chr> <chr> <chr> <lgl> <chr> <chr>
#> 1 (Int~ <NA> (Interce~ <NA> interce~ NA <NA> (Int~
#> # ... with 4 more variables: estimate <dbl>, std.error <dbl>, statistic <dbl>,
#> # p.value <dbl>
Created on 2020-09-01 by the reprex package (v0.3.0)
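This warning is what base R's min() emits when it has nothing to compare. Presumably this happens because an intercept-only model has no variable rows to rank. Here is a minimal guard, with safe_min() as a hypothetical helper:

```r
# min() on an empty vector (or an all-NA one with na.rm = TRUE) returns
# Inf with the warning "no non-missing arguments to min; returning Inf"
res <- suppressWarnings(min(numeric(0)))
res

# Hypothetical guard: return NA instead when there is nothing to compare
safe_min <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else min(x)
}
safe_min(numeric(0))
safe_min(c(3, 1, 2))
```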
It would be helpful to have a standardized order for the columns as additional information is added to the tidy tibble. For example, all the original columns could remain on the right side of the tibble, and all new columns would be added to the left side.
The ordering of the columns would also be standardized, no matter the order in which the functions are called. The order would be chosen to make the information in the table easier to digest. For example, when the variable column is added, rather than perhaps ending up in the middle of the tibble, it would always be near the beginning. Below is a suggested ordering:
library(broom.helpers)
lm(mpg ~ factor(cyl) + hp, mtcars) %>%
tidy_plus_plus() %>%
dplyr::select(any_of(c("variable", "var_label", "var_class", "var_type",
"contrasts", "reference_row", "label")),
everything()) %>%
knitr::kable()
variable | var_label | var_class | var_type | contrasts | reference_row | label | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | TRUE | 4 | factor(cyl)4 | NA | NA | NA | NA | NA | NA |
factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 6 | factor(cyl)6 | -5.9676551 | 1.6392776 | -3.640418 | 0.0010921 | -9.3255631 | -2.6097471 |
factor(cyl) | factor(cyl) | factor | categorical | contr.treatment | FALSE | 8 | factor(cyl)8 | -8.5208508 | 2.3260749 | -3.663188 | 0.0010286 | -13.2855993 | -3.7561022 |
hp | hp | numeric | continuous | NA | NA | hp | hp | -0.0240388 | 0.0154079 | -1.560163 | 0.1299540 | -0.0556005 | 0.0075228 |
Created on 2020-08-27 by the reprex package (v0.3.0)
A simple reordering function could be added to the end of each tidy_*() function.
order_tidy_columns <- function(x) {
dplyr::select(x,
any_of(c("variable", "var_label", "var_class", "var_type",
"contrasts", "reference_row", "label")),
everything())
}
In the example below, am is an integer variable, but in the broom.helpers tibble its class is reported as numeric.
library(broom.helpers)
tibble::as_tibble(mtcars) %>%
dplyr::mutate(
am = as.integer(am),
vs = as.logical(vs)
) %>%
{lm(mpg ~ am + vs + hp + factor(cyl), .)} %>%
tidy_and_attach() %>%
tidy_identify_variables()
#> # A tibble: 6 x 8
#> term variable var_class var_type estimate std.error statistic p.value
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Interce~ <NA> <NA> intercept 24.4 2.57 9.51 6.01e-10
#> 2 am am numeric continuo~ 5.16 1.45 3.55 1.49e- 3
#> 3 vsTRUE vs logical categori~ 2.57 1.94 1.32 1.97e- 1
#> 4 hp hp numeric continuo~ -0.0469 0.0145 -3.23 3.35e- 3
#> 5 factor(c~ factor(cy~ factor categori~ -2.65 1.80 -1.48 1.52e- 1
#> 6 factor(c~ factor(cy~ factor categori~ -0.277 3.49 -0.0795 9.37e- 1
Created on 2020-10-08 by the reprex package (v0.3.0)
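One plausible source of the mismatch is an assumption that has not been checked against the package code. stats::model.frame() records column classes in the "dataClasses" attribute of the terms object, and it reports integer columns as "numeric", whereas class() of the column itself is "integer":

```r
d <- mtcars
d$am <- as.integer(d$am)
fit <- stats::lm(mpg ~ am + hp, data = d)

# Class of the column as stored in the model frame
class(stats::model.frame(fit)$am)
# Class recorded by model.frame() in the terms' "dataClasses" attribute
attr(stats::terms(fit), "dataClasses")[["am"]]
```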
In the example below, I am requesting that the interaction term "factor(response):marker" be printed on a single row, but the request is being ignored.
library(broom.helpers)
library(gtsummary)
lm(age ~ factor(response) * marker, trial) %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_reference_rows() %>%
tidy_add_variable_labels() %>%
tidy_add_header_rows(show_single_row = "factor(response):marker") %>%
knitr::kable()
term | variable | var_label | var_class | var_type | header_row | contrasts | reference_row | label | estimate | std.error | statistic | p.value |
---|---|---|---|---|---|---|---|---|---|---|---|---|
(Intercept) | NA | (Intercept) | NA | intercept | NA | NA | NA | (Intercept) | 43.985685 | 1.906507 | 23.071342 | 0.0000000 |
NA | factor(response) | factor(response) | factor | categorical | TRUE | contr.treatment | NA | factor(response) | NA | NA | NA | NA |
factor(response)0 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | TRUE | 0 | NA | NA | NA | NA |
factor(response)1 | factor(response) | factor(response) | factor | categorical | FALSE | contr.treatment | FALSE | 1 | 9.117623 | 3.536300 | 2.578294 | 0.0107814 |
marker | marker | Marker Level (ng/mL) | numeric | continuous | NA | NA | NA | Marker Level (ng/mL) | 2.007188 | 1.609824 | 1.246836 | 0.2141828 |
NA | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | TRUE | NA | NA | factor(response) * Marker Level (ng/mL) | NA | NA | NA | NA |
factor(response)1:marker | factor(response):marker | factor(response) * Marker Level (ng/mL) | NA | interaction | FALSE | NA | NA | 1 * Marker Level (ng/mL) | -5.337195 | 2.647510 | -2.015930 | 0.0453914 |
Created on 2020-09-03 by the reprex package (v0.3.0)
As a developer, it would be helpful to have the option for some broom.helpers functions to fail when they cannot execute the requested action.
I am integrating broom.helpers into gtsummary now, and these two scenarios have come up so far:
When I run broom.helpers::tidy_identify_variables() and the variables cannot be identified, I would like to be able to have the function error. As it is currently written, I would need to inspect the returned object to check whether the variables were indeed identified.
When I run broom.helpers::tidy_add_header_rows(show_single_row=) for a variable that cannot be put on a single row.
Perhaps the argument could be something like tidy_plus_plus(strict=)? It would be similar to how purrr has pluck() and chuck().
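Here is a sketch of how a strict= argument could switch between the two behaviours, in the spirit of purrr::pluck() (which returns NULL) versus purrr::chuck() (which throws an error). The names report_problem() and check_variables_identified() are hypothetical helpers:

```r
# Hypothetical helper: report a problem with message() by default,
# or raise it as an error when strict = TRUE
report_problem <- function(msg, strict = FALSE) {
  if (strict) stop(msg, call. = FALSE)
  message(msg)
  invisible(FALSE)
}

# Hypothetical check on a tidy table: were any variables identified?
check_variables_identified <- function(x, strict = FALSE) {
  vars <- x$variable[!x$var_type %in% "intercept"]
  if (length(vars) > 0 && all(is.na(vars))) {
    return(report_problem("Unable to identify the list of variables.", strict))
  }
  invisible(TRUE)
}

tidy_ok <- data.frame(term = "hp", variable = "hp", var_type = "continuous",
                      stringsAsFactors = FALSE)
tidy_bad <- data.frame(
  term = c("(Intercept)", "trtDrug B"),
  variable = c(NA_character_, NA_character_),
  var_type = c("intercept", NA_character_),
  stringsAsFactors = FALSE
)
check_variables_identified(tidy_bad)  # message only
```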
The model below only has an interaction term (no main effects), and the variable label is not correct.
library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_identify_variables() %>%
broom.helpers::tidy_add_variable_labels() %>%
select(variable, var_type, var_label, estimate)
#> # A tibble: 3 x 4
#> variable var_type var_label estimate
#> <chr> <chr> <chr> <dbl>
#> 1 <NA> intercept (Intercept) 46.6
#> 2 factor(response):marker interaction NA * Marker Level (ng/mL) 0.396
#> 3 factor(response):marker interaction NA * Marker Level (ng/mL) 0.102
Created on 2020-09-01 by the reprex package (v0.3.0)
Also, if we add tidy_add_term_labels(), the label is also wrong, but in a different way.
library(gtsummary)
library(broom.helpers)
lm(age ~ factor(response):marker, trial) %>%
broom.helpers::tidy_and_attach() %>%
broom.helpers::tidy_identify_variables() %>%
broom.helpers::tidy_add_term_labels() %>%
broom.helpers::tidy_add_variable_labels() %>%
select(variable, var_type, var_label, label, estimate)
#> # A tibble: 3 x 5
#> variable var_type var_label label estimate
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 <NA> intercept (Intercept) (Intercep~ 46.6
#> 2 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA 0.396
#> 3 factor(response):mark~ interaction NA * Marker Level (ng/~ NA * NA 0.102
Created on 2020-09-01 by the reprex package (v0.3.0)
Use dummy.coef() to populate the estimate of the reference row.
There is a merging error when the remove intercept function is run before the identify variables function (there are two columns for var_nlevels).
library(broom.helpers)
lm(age ~ marker, gtsummary::trial) %>%
tidy_and_attach() %>%
tidy_remove_intercept() %>%
tidy_identify_variables() # looks like a merging error (two cols for var_nlevels)
#> # A tibble: 1 x 10
#> term variable var_class var_type var_nlevels.x estimate std.error statistic
#> <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
#> 1 mark~ marker numeric continu~ NA -0.0545 1.26 -0.0434
#> # ... with 2 more variables: p.value <dbl>, var_nlevels.y <int>
Created on 2020-10-15 by the reprex package (v0.3.0)
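The .x/.y suffixes are the usual symptom of joining two tables that both contain a non-key column with the same name. dplyr::left_join() uses the same default suffixes as merge(). Here is a base-R illustration, assuming this is what happens when tidy_identify_variables() runs on a table that already has var_nlevels:

```r
# Both tables carry a non-key column named var_nlevels
tidy <- data.frame(term = "marker", variable = "marker",
                   var_nlevels = NA_integer_, stringsAsFactors = FALSE)
var_info <- data.frame(variable = "marker", var_nlevels = NA_integer_,
                       stringsAsFactors = FALSE)

# Joining them produces suffixed duplicates
merged <- merge(tidy, var_info, by = "variable", all.x = TRUE)
names(merged)

# Dropping the stale column before re-joining avoids the .x/.y pair
tidy_clean <- tidy[setdiff(names(tidy), "var_nlevels")]
fixed <- merge(tidy_clean, var_info, by = "variable", all.x = TRUE)
names(fixed)
```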
The header row for cyl is missing when using tidy_plus_plus(), but the documentation indicates that it should have been added.
library(broom.helpers)
# no header row for cyl
lm(mpg ~ factor(cyl), mtcars) %>%
tidy_plus_plus()
#> # A tibble: 3 x 14
#> term variable var_class var_type estimate std.error statistic p.value
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 fact~ factor(~ factor categor~ NA NA NA NA
#> 2 fact~ factor(~ factor categor~ -6.92 1.56 -4.44 1.19e- 4
#> 3 fact~ factor(~ factor categor~ -11.6 1.30 -8.90 8.57e-10
#> # ... with 6 more variables: conf.low <dbl>, conf.high <dbl>, contrasts <chr>,
#> # reference_row <lgl>, var_label <chr>, label <chr>
# has header row
lm(mpg ~ factor(cyl), mtcars) %>%
tidy_and_attach() %>%
tidy_add_reference_rows() %>%
tidy_add_header_rows()
#> # A tibble: 5 x 13
#> term variable var_class var_type estimate std.error statistic p.value
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Int~ <NA> <NA> interce~ 26.7 0.972 27.4 2.69e-22
#> 2 <NA> factor(~ factor categor~ NA NA NA NA
#> 3 fact~ factor(~ factor categor~ NA NA NA NA
#> 4 fact~ factor(~ factor categor~ -6.92 1.56 -4.44 1.19e- 4
#> 5 fact~ factor(~ factor categor~ -11.6 1.30 -8.90 8.57e-10
#> # ... with 5 more variables: contrasts <chr>, reference_row <lgl>,
#> # var_label <chr>, label <chr>, header_row <lgl>
Created on 2020-08-17 by the reprex package (v0.3.0)
Inspired by gtsummary:::estimate_header(), add a function to identify the model type and the coefficient type.
An additional function tidy_identify_model_type() could add model_type and coefficient_type as attributes to the results.
It will be useful for the redesign of GGally::ggcoef.
@ddsjoberg, let me know if you think it could be relevant for gtsummary as well. I know that in gtsummary you also manage the corresponding footnotes and translations, but I do not think that this last part is in the scope of broom.helpers.
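Here is a sketch of what the identification could look like for a couple of glm() families, storing the result as a coefficient_type attribute as suggested. The function name and the label strings are hypothetical choices:

```r
# Hypothetical sketch: infer a coefficient type from the model family.
# The label strings are illustrative, not an agreed vocabulary.
identify_coefficient_type_sketch <- function(model, exponentiate = FALSE) {
  if (inherits(model, "glm")) {
    fam <- stats::family(model)
    if (fam$family == "binomial" && fam$link == "logit") {
      return(if (exponentiate) "odds ratio" else "log(OR)")
    }
    if (fam$family == "poisson" && fam$link == "log") {
      return(if (exponentiate) "incidence rate ratio" else "log(IRR)")
    }
  }
  "generic"
}

logit_fit <- stats::glm(am ~ hp, family = stats::binomial, data = mtcars)
tidy <- data.frame(term = names(stats::coef(logit_fit)), stringsAsFactors = FALSE)
attr(tidy, "coefficient_type") <-
  identify_coefficient_type_sketch(logit_fit, exponentiate = TRUE)
attr(tidy, "coefficient_type")
```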
Prepare for release:
devtools::check(remote = TRUE, manual = TRUE)
devtools::check_win_devel()
Submit to CRAN:
usethis::use_version()
cran-comments.md
devtools::submit_cran()
Wait for CRAN...
CRAN-RELEASE
usethis::use_dev_version()
Dear @ddsjoberg
I would like your opinion on the two following points.
First, it could be relevant to better identify dichotomous variables. One option could be to evolve the var_type column created by tidy_identify_variables() so that, for dichotomous variables, the value "categorical" is replaced by "dichotomous", given that all dichotomous variables are also categorical. But this might have side effects in gtsummary.
An alternative could be to generate an additional column, dichotomous, equal to TRUE, FALSE or NA (for continuous variables).
Identifying dichotomous variables directly in tidy_identify_variables() would later simplify the code of tidy_add_header_rows() when applying show_single_row.
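Here is a sketch of the first option, classifying each model-frame column. The name identify_var_type_sketch() is hypothetical. It treats numeric 0/1 columns as continuous, which is how they enter a regression model:

```r
# Hypothetical sketch: classify a column as continuous, dichotomous
# (categorical with exactly two levels/values) or categorical
identify_var_type_sketch <- function(x) {
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    n <- if (is.factor(x)) nlevels(x) else length(unique(x[!is.na(x)]))
    if (n == 2) "dichotomous" else "categorical"
  } else {
    "continuous"
  }
}

d <- data.frame(
  am = factor(mtcars$am),
  cyl = factor(mtcars$cyl),
  hp = mtcars$hp
)
vapply(d, identify_var_type_sketch, character(1))
```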
Second, tidy helpers such as all_categorical(), all_continuous() and all_dichotomous() could be useful in broom.helpers as well. However, I do not know whether code could be shared between gtsummary and broom.helpers, or whether we could avoid any conflict.
As you developed these two functions and you are the one who implemented tidy selectors in broom.helpers, what do you think?
Best
Any function that prints messages should have a quiet= option. This could be helpful to developers who do not want the broom.helpers messages to print.
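Here is a sketch of one way to implement it, routing every message through a single hypothetical inform() helper. The count_messages() function is only there to show the effect. Users can already wrap calls in suppressMessages() today:

```r
# Hypothetical pattern: every message goes through one helper that honours
# a `quiet` argument
inform <- function(..., quiet = FALSE) {
  if (!isTRUE(quiet)) message(...)
  invisible(NULL)
}

add_labels_sketch <- function(x, quiet = FALSE) {
  if (!"label" %in% names(x)) {
    inform("No `label` column found; using `term` as label.", quiet = quiet)
    x$label <- x$term
  }
  x
}

# Count (and muffle) the messages emitted while evaluating an expression
count_messages <- function(expr) {
  n <- 0
  withCallingHandlers(expr, message = function(m) {
    n <<- n + 1
    invokeRestart("muffleMessage")
  })
  n
}

tidy <- data.frame(term = c("(Intercept)", "hp"), stringsAsFactors = FALSE)
count_messages(add_labels_sketch(tidy))
count_messages(add_labels_sketch(tidy, quiet = TRUE))
```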
In the example below, the variable grade does indeed have a label, "Grade", and it can be retrieved (see the end of the example), but tidy_add_variable_labels() does not pick it up.
Can we please update the internals to grab the label using the method below when it is not found in the typical manner?
library(broom.helpers)
library(gtsummary)
library(survival)
#> Warning: package 'survival' was built under R version 4.0.2
mod <- coxph(Surv(ttdeath, death) ~ grade, trial)
mod %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_reference_rows() %>%
tidy_add_variable_labels() %>%
tidy_add_header_rows() %>%
select(term, variable, var_label, label)
#> # A tibble: 4 x 4
#> term variable var_label label
#> <chr> <chr> <chr> <chr>
#> 1 <NA> grade grade grade
#> 2 grade_ref grade grade I
#> 3 gradeII grade grade II
#> 4 gradeIII grade grade III
# get the grade label from a coxph object
model.frame.default(mod)$grade %>% attr("label")
#> [1] "Grade"
Created on 2020-08-14 by the reprex package (v0.3.0)
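Here is a sketch of the suggested fallback order, with a hypothetical get_variable_label() helper. The example uses an lm() fit on a labelled copy of mtcars so that it runs without survival or gtsummary:

```r
# Hypothetical helper: look for a "label" attribute first in
# stats::model.frame(model), then in stats::model.frame.default(model),
# and fall back to the variable name
get_variable_label <- function(model, variable) {
  frames <- list(
    tryCatch(stats::model.frame(model), error = function(e) NULL),
    tryCatch(stats::model.frame.default(model), error = function(e) NULL)
  )
  for (mf in frames) {
    if (!is.null(mf) && variable %in% names(mf)) {
      lab <- attr(mf[[variable]], "label")
      if (!is.null(lab)) return(lab)
    }
  }
  variable
}

d <- mtcars
attr(d$hp, "label") <- "Gross horsepower"
fit <- stats::lm(mpg ~ hp, data = d)
get_variable_label(fit, "hp")
get_variable_label(fit, "wt")
```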
If tidy_add_variable_labels() is run after tidy_add_reference_rows(), labels are filled correctly.
library(broom.helpers)
library(gtsummary)
library(survival)
mod <- lm(ttdeath ~ grade, trial)
mod %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_reference_rows() %>%
tidy_add_variable_labels() %>%
tidy_add_header_rows() %>%
select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#> term variable var_label label
#> <chr> <chr> <chr> <chr>
#> 1 (Intercept) <NA> (Intercept) (Intercept)
#> 2 <NA> grade Grade Grade
#> 3 grade_ref grade Grade I
#> 4 gradeII grade Grade II
#> 5 gradeIII grade Grade III
But if the functions are called in the opposite order, var_label is not filled for all rows associated with the variable.
mod %>%
tidy_and_attach() %>%
tidy_identify_variables() %>%
tidy_add_variable_labels() %>%
tidy_add_reference_rows() %>%
tidy_add_header_rows() %>%
select(term, variable, var_label, label)
#> # A tibble: 5 x 4
#> term variable var_label label
#> <chr> <chr> <chr> <chr>
#> 1 (Intercept) <NA> (Intercept) (Intercept)
#> 2 <NA> grade <NA> <NA>
#> 3 grade_ref grade <NA> I
#> 4 gradeII grade Grade II
#> 5 gradeIII grade Grade III
I think it is fine for these functions to have an order dependency, but a note to users would be helpful, or even an error, as when tidy_add_variable_labels() is called after tidy_add_header_rows().
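Here is a sketch of how such a note could be produced. It assumes each step records its name in an attribute of the tidy table, which means the attribute would have to survive every step. The names record_step() and warn_if_run_after() are hypothetical:

```r
# Hypothetical sketch: each step records its name in an attribute, and a
# step that depends on ordering checks which steps have already been run
record_step <- function(x, step) {
  attr(x, "steps") <- c(attr(x, "steps"), step)
  x
}

# `should_run_after`: steps that are expected to come after `step`
warn_if_run_after <- function(x, step, should_run_after) {
  already_run <- intersect(should_run_after, attr(x, "steps"))
  if (length(already_run) > 0) {
    message("`", step, "()` was called after `",
            paste(already_run, collapse = "()`, `"),
            "()`; some rows may be left without labels.")
  }
  invisible(length(already_run) == 0)
}

x <- record_step(data.frame(term = "gradeII", stringsAsFactors = FALSE),
                 "tidy_add_variable_labels")
warn_if_run_after(x, "tidy_add_reference_rows",
                  should_run_after = "tidy_add_variable_labels")
```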
Should we add a table to the vignette listing compatible models, with a note column for model-specific information about compatibility?
Add an option to avoid adding a reference row for certain variables (could be useful in some cases, e.g. a forest plot), in particular when no header rows are added.
Is there a way to modify the header row label? I didn't immediately see one. If there is not, one way to solve this would be to add a label= argument to tidy_add_header_rows().
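Here is a sketch of what a label= argument could do, using a named list as in tbl_regression(). The name override_header_labels() is hypothetical, and the sketch assumes header rows are flagged in a header_row column:

```r
# Hypothetical sketch of a `label=` argument: a named list mapping
# variables to the text shown on their header row
override_header_labels <- function(x, label = list()) {
  for (v in names(label)) {
    rows <- x$header_row %in% TRUE & x$variable %in% v
    x$label[rows] <- label[[v]]
  }
  x
}

tidy <- data.frame(
  variable = c("grade", "grade", "grade"),
  header_row = c(TRUE, FALSE, FALSE),
  label = c("Grade", "I", "II"),
  stringsAsFactors = FALSE
)
override_header_labels(tidy, label = list(grade = "Tumor grade"))$label
```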