Group Member | GitHub |
---|---|
Gabriel Bogo | @GabrielBogo |
Yuwei Liu | @liuyuwei169 |
Weifeng (Davy) Guo | @DavyGuo |
Mohamad Makkaoui | @makka3 |
A reimplementation of the `infer` R package, which offers a tidy approach to statistical inference built on top of the Tidyverse.
The `infer` package streamlines reshuffling and bootstrapping samples, calculating summary statistics and confidence intervals, and performing hypothesis tests. It does so through a set of functions designed for clear, expressive code whose statistical grammar makes explicit how values are calculated and how tests are evaluated.
With this package as the inspiration, `rfer` will have four main functions (`specify`, `generate`, `calculate`, `get_ci`) in its first iteration. Given a data frame and a specified response variable, these functions calculate summary statistics and confidence intervals for that variable. Further details follow in the function descriptions below.
Where does `rfer` fit into the R ecosystem?
Currently, infer
does a great job at implementing what we've specified for the functions in the R ecosystem. Nevertheless, we will begin by developing similar functions for the initial iterations with the expectation that we will add on extra functions that will enhance infer
package at a later time.
Function Description (`specify`): Choose specific columns to feed the subsequent pipeline.
Inputs:
- data: a Dataframe
- response: String. The name of the column in `data` to use as the response variable.
Output:
- Dataframe containing one column for response variable and zero or more columns for the explanatory variables. The first column is always the response.
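The column selection described above can be sketched in base R as follows. This is a minimal illustration, not the package's actual code: the name `specify_sketch` is hypothetical, and because no input for explanatory variables is documented, the sketch returns only the response column.

```r
# Hypothetical sketch of the documented behaviour; not rfer's implementation.
specify_sketch <- function(data, response) {
  stopifnot(is.data.frame(data), is.character(response), length(response) == 1)
  if (!response %in% names(data)) {
    stop("Column '", response, "' not found in data.")
  }
  # drop = FALSE keeps the result a one-column data frame instead of a vector.
  data[, response, drop = FALSE]
}

# Usage with a built-in dataset: keep only the mpg column.
spec <- specify_sketch(mtcars, response = "mpg")
head(spec, 3)
```

Returning a data frame rather than a bare vector keeps the output compatible with the next step of the pipeline, which expects a data frame.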
Function Description (`generate`): Generate bootstrap resamples or permutations.
Inputs:
- data: a Dataframe generated by the `specify` function.
- n_samples: Integer. Number of resamples.
- type: "Bootstrap" (default), or "Permutation".
Output:
- Dataframe containing all resamples stacked vertically. Will keep all columns from the input data and an additional sample_id column to identify individual resamples.
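One way the resampling step might look in base R is sketched below. The name `generate_sketch` is hypothetical, and two details are assumptions rather than documented behaviour: each bootstrap resample has the same number of rows as the input, and a permutation shuffles only the first (response) column.

```r
# Hypothetical sketch of the documented behaviour; not rfer's implementation.
generate_sketch <- function(data, n_samples, type = "Bootstrap") {
  stopifnot(is.data.frame(data), n_samples >= 1)
  type <- tolower(type)
  n <- nrow(data)
  resamples <- lapply(seq_len(n_samples), function(i) {
    if (type == "bootstrap") {
      # Assumption: draw n rows with replacement.
      out <- data[sample.int(n, n, replace = TRUE), , drop = FALSE]
    } else if (type == "permutation") {
      # Assumption: shuffle the response (first) column only.
      out <- data
      out[[1]] <- out[[1]][sample.int(n)]
    } else {
      stop("type must be 'Bootstrap' or 'Permutation'.")
    }
    out$sample_id <- i  # identifies the resample each row belongs to
    out
  })
  # Stack all resamples vertically and reset the row names.
  stacked <- do.call(rbind, resamples)
  rownames(stacked) <- NULL
  stacked
}

set.seed(123)
spec <- mtcars[, "mpg", drop = FALSE]
boot <- generate_sketch(spec, n_samples = 5)
perm <- generate_sketch(spec, n_samples = 2, type = "Permutation")
```

Stacking the resamples into one long data frame, rather than returning a list, lets the next step group by `sample_id`.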
Function Description (`calculate`): Calculate a summarizing statistic for each bootstrap sample.
Inputs:
- data: Dataframe generated by the `generate` function.
- stat: Summarizing statistic. "mean" (default) or "median".
Output:
- Dataframe of summarized data. Each row contains the summary statistic for a given resample.
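The per-resample summary could be sketched in base R as shown below. The name `calculate_sketch` and the output column name `stat` are hypothetical, and the sketch assumes the response is the first column of the input and that resamples are identified by a `sample_id` column.

```r
# Hypothetical sketch of the documented behaviour; not rfer's implementation.
calculate_sketch <- function(data, stat = "mean") {
  stopifnot(is.data.frame(data), "sample_id" %in% names(data))
  stat_fun <- switch(stat,
    mean = mean,
    median = median,
    stop("stat must be 'mean' or 'median'.")
  )
  # Assumption: the response variable is the first column.
  response <- names(data)[1]
  groups <- split(data[[response]], data$sample_id)
  data.frame(
    sample_id = as.integer(names(groups)),
    stat = unname(vapply(groups, stat_fun, numeric(1)))
  )
}

# Two small hand-made "resamples" of three values each.
stacked <- data.frame(
  mpg = c(1, 2, 3, 10, 20, 60),
  sample_id = rep(1:2, each = 3)
)
res_mean <- calculate_sketch(stacked)                    # stat: 2, 30
res_median <- calculate_sketch(stacked, stat = "median") # stat: 2, 20
```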
Function Description (`get_ci`): Return the bootstrap confidence interval for a point estimate.
Inputs:
- data: Dataframe generated by the `calculate` function.
- interval: Significance level. Percentage Float (0-100).
Output:
- Dataframe containing 1 row and columns for Statistic (Point Estimate), significance level, Lower Bound and Upper Bound.
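A possible shape for this step is sketched below in base R, under several assumptions the description does not settle. The sketch uses the percentile method. It treats `interval` as the central coverage in percent, so `interval = 90` takes the 5th and 95th percentiles. It uses the mean of the resample statistics as the point estimate, and it expects the statistics in a column named `stat`. The function name and all column names are hypothetical.

```r
# Hypothetical sketch of the documented behaviour; not rfer's implementation.
get_ci_sketch <- function(data, interval) {
  stopifnot(is.data.frame(data), "stat" %in% names(data),
            interval > 0, interval < 100)
  # Assumption: interval is the central coverage in percent (e.g. 95).
  alpha <- 1 - interval / 100
  # Assumption: percentile method on the resample statistics.
  bounds <- quantile(data$stat, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)
  data.frame(
    point_estimate = mean(data$stat),  # assumption: mean of resample statistics
    interval = interval,
    lower_bound = bounds[1],
    upper_bound = bounds[2]
  )
}

# Toy input: 101 resample statistics taking the values 0, 1, ..., 100.
stats <- data.frame(sample_id = 1:101, stat = 0:100)
ci <- get_ci_sketch(stats, interval = 90)
ci
```

Other bootstrap intervals, such as the standard-error or bias-corrected variants, would need additional inputs like the observed statistic, which is why the percentile method is used here.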