romainfrancois / dance

tibble() dancing 💃

License: Other

R 82.13% Rebol 0.30% C++ 17.56%

dance's Introduction

dance

Dancing 💃 with the stats, aka tibble() dancing 🕺. dance is a sort of reinvention of the classic dplyr verbs on a more modern stack: it leans heavily on vctrs and rlang.

Installation

You can install the development version from GitHub.

# install.packages("pak")
pak::pkg_install("romainfrancois/dance")

Usage

We’ll illustrate tibble dancing with iris grouped by Species.

library(dance)
g <- iris %>% group_by(Species)

waltz(), polka(), tango(), charleston()

These are in the neighborhood of dplyr::summarise().

waltz() takes a grouped tibble and a list of formulas and returns a tibble with as many columns as supplied formulas and one row per group. It does not prepend the grouping variables (see tango() for that).

g %>% 
  waltz(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1         5.01        3.43
#> 2         5.94        2.77
#> 3         6.59        2.97

polka() deals with peeling off one layer of grouping:

g %>% 
  polka()
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

tango() binds the results of polka() and waltz(), so it is the closest to dplyr::summarise():

g %>% 
  tango(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97
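
For orientation, the same per-group means can be cross-checked without dance. The following is a minimal sketch in base R, assuming only the built-in iris data; it says nothing about how tango() is implemented and is meant purely as a reference point for the numbers above.

```r
# Base R cross-check of the grouped means shown above (no dance or dplyr needed).
# aggregate() keeps the grouping column next to the summaries, much like the
# tango() result does.
means <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = iris,
  FUN  = mean
)
print(means)
```

The result has one row per species and one column per summarised variable, which is the same shape as the tango() output.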

charleston() is like tango() but it packs the new columns in a tibble:

g %>% 
  charleston(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Species    data$Sepal.Length $Sepal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  5.01         3.43
#> 2 versicolor              5.94         2.77
#> 3 virginica               6.59         2.97

swing, twist

There is no waltz_at(), tango_at(), etc. Instead, we can apply the same function to a set of columns, or a set of functions to the same column.

For this, we need to learn new dance moves:

swing() and twist() are for applying the same function to a set of columns:

library(tidyselect)

g %>% 
  tango(swing(mean, starts_with("Petal")))
#> # A tibble: 3 x 3
#>   Species    Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             1.46       0.246
#> 2 versicolor         4.26       1.33 
#> 3 virginica          5.55       2.03

g %>% 
  tango(data = twist(mean, starts_with("Petal")))
#> # A tibble: 3 x 2
#>   Species    data$Petal.Length $Petal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  1.46        0.246
#> 2 versicolor              4.26        1.33 
#> 3 virginica               5.55        2.03

They differ in the type of column created and in how the columns are named:

swing() makes as many new columns as are selected by the tidy selection, and names them using a .name glue pattern. This way we can swing() several times.

g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    swing(median, starts_with("Petal"), .name = "median_{var}"), 
  )
#> # A tibble: 3 x 5
#>   Species mean_Petal.Leng… mean_Petal.Width median_Petal.Le…
#>   <fct>              <dbl>            <dbl>            <dbl>
#> 1 setosa              1.46            0.246             1.5 
#> 2 versic…             4.26            1.33              4.35
#> 3 virgin…             5.55            2.03              5.55
#> # … with 1 more variable: median_Petal.Width <dbl>

twist() instead creates a single data frame column.

g %>% 
  tango(
    mean   = twist(mean, starts_with("Petal")), 
    median = twist(median, starts_with("Petal")), 
  )
#> # A tibble: 3 x 3
#>   Species    mean$Petal.Length $Petal.Width median$Petal.Leng… $Petal.Width
#>   <fct>                  <dbl>        <dbl>              <dbl>        <dbl>
#> 1 setosa                  1.46        0.246               1.5           0.2
#> 2 versicolor              4.26        1.33                4.35          1.3
#> 3 virginica               5.55        2.03                5.55          2

The first argument of swing() and twist() is either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.

You can combine swing() and twist() in the same tango() or waltz():

g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    median = twist(median, contains("."))
  )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

rumba, zumba

Similarly, rumba() and zumba() can be used to apply several functions to a single column: rumba() creates single columns, and zumba() packs them into a data frame column.

g %>% 
  tango(
    rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
    Petal = zumba(Petal.Width, mean = mean, median = median)
  )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2
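
The idea of "several functions, one column" can also be sketched in base R. This is only an illustration under the assumption that each function is applied to the column within each group; the names funs and stats are made up for the example and are not part of dance.

```r
# Apply a named list of functions to one column, group by group.
# tapply() computes one value per species; sapply() lines the results up
# as one matrix column per function.
funs  <- list(mean = mean, median = median)
stats <- sapply(funs, function(f) tapply(iris$Sepal.Width, iris$Species, f))
print(stats)
```

The matrix columns correspond to what rumba() names with the "{fun}" placeholder, and to the inner columns that zumba() packs.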

salsa, chacha, samba, madison

Now we enter the realm of dplyr::mutate() with:

salsa() : to create new columns
chacha(): to reorganize a grouped tibble so that data for each group is contiguous
samba() : chacha() + salsa()

g %>% 
  salsa(
    Sepal = ~Sepal.Length * Sepal.Width, 
    Petal = ~Petal.Length * Petal.Width
  )
#> # A tibble: 150 x 2
#>    Sepal Petal
#>    <dbl> <dbl>
#>  1  17.8 0.280
#>  2  14.7 0.280
#>  3  15.0 0.26 
#>  4  14.3 0.3  
#>  5  18   0.280
#>  6  21.1 0.68 
#>  7  15.6 0.42 
#>  8  17   0.3  
#>  9  12.8 0.280
#> 10  15.2 0.15 
#> # … with 140 more rows

You can swing(), twist(), rumba() and zumba() here too, and if you want the original data, you can use samba() instead of salsa():

g %>% 
  samba(centered = twist(~ . - mean(.), everything(), -Species))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 4 more variables: centered$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>, $Petal.Length <dbl>, $Petal.Width <dbl>
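
As a point of comparison, group-wise centering of a single column can be written in base R with ave(). This sketch assumes that the formula ~ . - mean(.) is evaluated within each group, as the grouped input suggests; center and centered_sepal_length are illustrative names only.

```r
# ave() applies a function within each level of a grouping vector and
# returns a vector aligned with the original rows.
center <- function(x) x - mean(x)
centered_sepal_length <- ave(iris$Sepal.Length, iris$Species, FUN = center)
head(centered_sepal_length, 3)
```

Each value is the distance from its own species mean, so the centered values average to zero within every species.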

madison() packs the columns salsa() would have created into a data frame column:

g %>% 
  madison(swing(~ . - mean(.), starts_with("Sepal")))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 2 more variables: data$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>

bolero and mambo

bolero() is similar to dplyr::filter(). The formulas may be made by mambo() if you want to apply the same predicate to a tidyselection of columns:

g %>% 
  bolero(~Sepal.Width > 4)
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows
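
The row counts above (3 rows by default, 150 rows with .op = or) follow from how the per-column predicates are combined. A minimal base R sketch of that combination, assuming the default corresponds to a logical "and" across the selected columns; sepal_cols, tests, keep_all and keep_any are illustrative names, not dance API:

```r
# Evaluate the predicate `. > 4` on every column whose name starts with "Sepal".
sepal_cols <- grep("^Sepal", names(iris), value = TRUE)
tests <- lapply(iris[sepal_cols], function(x) x > 4)

# Combine the per-column results row by row.
keep_all <- Reduce(`&`, tests)  # every selected column passes
keep_any <- Reduce(`|`, tests)  # at least one selected column passes

nrow(iris[keep_all, ])
nrow(iris[keep_any, ])
```

Every Sepal.Length in iris is above 4, which is why the "or" version keeps all rows while the "and" version is driven by Sepal.Width alone.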

dance's Issues

doc for choreography()

choreography() is not featured in the README but it's a central function to what the dance package does.

Add travis job to test against development versions of dependent packages

dplyr has a job like that in its .travis.yml https://github.com/tidyverse/dplyr/blob/master/.travis.yml#L30

  - r: release
    env:
      - DEVEL_PACKAGES=true
    r_github_packages:
      - RcppCore/Rcpp
      - r-lib/rlang
      - tidyverse/tibble
      - tidyverse/tidyselect

The idea is that we can identify early when packages we depend on make changes that break dance.

add tests for twist()

Add unit tests for twist(). You can create the test file with usethis::use_test("twist")

Create separate issues for testing each dance

Each dance function needs to be tested. Create a separate issue for each one and perhaps comment here once you've done it.

This way, these issues are small goals that can be tackled separately.

add tests for samba()

Add unit tests for samba(). You can create the test file with usethis::use_test("samba")

setup a pkgdown site

what is the point of this package?

Hi @romainfrancois, I am a huge fan of your work, but here I have a hard time understanding the point of this package. Is it a complement to dplyr? A possible replacement?

Thanks!

add tests for madison()

Add unit tests for madison(). You can create the test file with usethis::use_test("madison")

add tests for tango()

Add unit tests for tango(). The test file for tango() has already been created and can be found in tests/testthat/test-tango.R

add tests for swing()

Add unit tests for swing(). You can create the test file with usethis::use_test("swing")

documentation for polka()

polka() in Readme.Rmd:

polka() deals with peeling off one layer of grouping:

and a reprex of polka() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    polka()
#> # A tibble: 3 x 1
#>   Species   
#>   <fct>     
#> 1 setosa    
#> 2 versicolor
#> 3 virginica

Created on 2019-03-08 by the reprex package (v0.2.1)

create issues

This needs issues for:

roxygen documenting exported functions, e.g. starting by those used in the README
setup testthat
unit tests for all functions

setup testthat

Using usethis::use_testthat()

add tests for rumba()

Add unit tests for rumba(). You can create the test file with usethis::use_test("rumba")

documentation for zumba()

Here is what the README.Rmd says about zumba() :

`rumba()` and `zumba()` can be used to apply several functions to a single column:

and a reprex of zumba() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 2
#>   Species    Petal$mean $median
#>   <fct>           <dbl>   <dbl>
#> 1 setosa          0.246     0.2
#> 2 versicolor      1.33      1.3
#> 3 virginica       2.03      2

The difference between rumba() and zumba() is that rumba() creates single columns while zumba() packs them into a data frame column. For zumba(), .name sets the name of the packed column directly, without using glue.

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for chacha()

chacha() reorganizes a grouped tibble so that data for each group is contiguous

and a reprex of chacha() in action:

library(dance)

g <- data.frame(a = rep(1:3, 2), b = 1:6) %>% 
  group_by(a)

g
#> # A tibble: 6 x 2
#> # Groups:   a [3]
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     1     4
#> 5     2     5
#> 6     3     6

chacha(g)
#> # A tibble: 6 x 2
#> # Groups:   a [3]
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     1     4
#> 3     2     2
#> 4     2     5
#> 5     3     3
#> 6     3     6

Created on 2019-03-08 by the reprex package (v0.2.1)
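
The reordering shown by chacha() can be reproduced in base R as a sanity check. This is a sketch of the visible effect only, assuming a stable sort on the grouping column; it does not describe how chacha() is implemented, and contiguous is an illustrative name.

```r
# Same toy data as in the reprex, without the grouping.
df <- data.frame(a = rep(1:3, 2), b = 1:6)

# order() leaves ties in their original order, so rows of the same group
# become contiguous while keeping their relative order.
contiguous <- df[order(df$a), ]
contiguous$b
```

The b column comes out as 1, 4, 2, 5, 3, 6, matching the chacha(g) output above.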

where are `balboa()` and `lindy_hop()`?

I see swing. And there is talk of charleston, but where is balboa(), the best dance style?

add tests for zumba()

Add unit tests for zumba(). You can create the test file with usethis::use_test("zumba")

documentation for bolero()

Here is what the README.Rmd says about bolero() :

`bolero()` is similar to `dplyr::filter()`. The formulas may be made by `mambo()` if you want to apply the same predicate to a tidyselection of columns:

bolero() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    bolero(~Sepal.Width > 4)
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-03-09 by the reprex package (v0.2.1)

documentation for samba()

Similar to dplyr::mutate(), samba() is used to create new columns. Useful to note that samba() will keep both the original columns and the newly created ones.

A reprex of samba() in action:

library(dance)

g <- iris %>% group_by(Species)

g %>% 
  samba(
    Sepal = ~Sepal.Length * Sepal.Width, 
    Petal = ~Petal.Length * Petal.Width
  )
#> # A tibble: 150 x 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal Petal
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl> <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   17.8 0.280
#>  2          4.9         3            1.4         0.2 setosa   14.7 0.280
#>  3          4.7         3.2          1.3         0.2 setosa   15.0 0.26 
#>  4          4.6         3.1          1.5         0.2 setosa   14.3 0.3  
#>  5          5           3.6          1.4         0.2 setosa   18   0.280
#>  6          5.4         3.9          1.7         0.4 setosa   21.1 0.68 
#>  7          4.6         3.4          1.4         0.3 setosa   15.6 0.42 
#>  8          5           3.4          1.5         0.2 setosa   17   0.3  
#>  9          4.4         2.9          1.4         0.2 setosa   12.8 0.280
#> 10          4.9         3.1          1.5         0.1 setosa   15.2 0.15 
#> # … with 140 more rows

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for tango()

Here is what the README.Rmd says about tango():

`tango()` binds the results of `polka()` and `waltz()` so is the closest to `dplyr::summarise()`

and a reprex of tango() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    tango(
        Sepal.Length = ~mean(Sepal.Length), 
        Sepal.Width  = ~mean(Sepal.Width)
    )
#> # A tibble: 3 x 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97

Created on 2019-03-08 by the reprex package (v0.2.1)

add tests for salsa()

Add unit tests for salsa(). You can create the test file with usethis::use_test("salsa")

documentation for charleston()

Here is what the README.Rmd says about charleston() :

`charleston()` is like `tango()` but it packs the new columns in a tibble:

and a reprex of charleston() in action:

library(dance)
g <- iris %>% group_by(Species)

g %>% 
    charleston(
        Sepal.Length = ~mean(Sepal.Length), 
        Sepal.Width  = ~mean(Sepal.Width)
    )
#> # A tibble: 3 x 2
#>   Species    data$Sepal.Length $Sepal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  5.01         3.43
#> 2 versicolor              5.94         2.77
#> 3 virginica               6.59         2.97

Created on 2019-03-08 by the reprex package (v0.2.1)

add markdown support for roxygen2 documentation

@romainfrancois I spotted the documentation for choreography() uses markdown styling. Should we add this to the 💃 ?

documentation for swing()

Here is what the README.Rmd says about swing() :

both `swing()` and `twist()` are for applying the same function to a set of columns.

and a reprex of swing() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(swing(mean, starts_with("Petal")))
#> # A tibble: 3 x 3
#>   Species    Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             1.46       0.246
#> 2 versicolor         4.26       1.33 
#> 3 virginica          5.55       2.03

swing() differs from twist() in the type of column created and how naming happens:

swing() makes as many new columns as are selected by the tidy selection, and names them using a .name glue pattern. This way we can swing() several times.

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        swing(median, starts_with("Petal"), .name = "median_{var}"), 
    )
#> # A tibble: 3 x 5
#>   Species mean_Petal.Leng… mean_Petal.Width median_Petal.Le…
#>   <fct>              <dbl>            <dbl>            <dbl>
#> 1 setosa              1.46            0.246             1.5 
#> 2 versic…             4.26            1.33              4.35
#> 3 virgin…             5.55            2.03              5.55
#> # … with 1 more variable: median_Petal.Width <dbl>

The first arguments of swing() and twist() are either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.

You can combine swing() and twist() in the same tango() or waltz():

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        twist(median, contains("."), .name = "median")
    )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for salsa()

Similar to dplyr::transmute(), salsa() is used to create new columns. Useful to note that salsa() will drop existing variables and only keep the newly created ones.

A reprex of salsa() in action:

library(dance)

g <- iris %>% group_by(Species)

g %>% 
    salsa(
        Sepal = ~Sepal.Length * Sepal.Width, 
        Petal = ~Petal.Length * Petal.Width
    )
#> # A tibble: 150 x 2
#>    Sepal Petal
#>    <dbl> <dbl>
#>  1  17.8 0.280
#>  2  14.7 0.280
#>  3  15.0 0.26 
#>  4  14.3 0.3  
#>  5  18   0.280
#>  6  21.1 0.68 
#>  7  15.6 0.42 
#>  8  17   0.3  
#>  9  12.8 0.280
#> 10  15.2 0.15 
#> # … with 140 more rows

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for waltz()

Here is what the README.Rmd says about waltz() :

`waltz()` takes a grouped tibble and a list of formulas and returns a tibble with: 
as many columns as supplied formulas, one row per group. It does not prepend the grouping 
variables (see `tango` for that)

and a reprex of waltz() in action:

library(dance)

iris %>% 
  group_by(Species) %>% 
  waltz(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
#> # A tibble: 3 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1         5.01        3.43
#> 2         5.94        2.77
#> 3         6.59        2.97

add tests for charleston()

Add unit tests for charleston(). You can create the test file with usethis::use_test("charleston")

dance related example data set

It'd be nicer to manipulate a 💃 👯‍♂️ 👯 🕺 data set rather than iris.

find some data
usethis::use_data_raw() and have some code in data-raw/dance.R to import it

add tests for waltz()

Add unit tests for waltz(). You can create the test file with usethis::use_test("waltz")

add tests for mambo()

Add unit tests for mambo(). You can create the test file with usethis::use_test("mambo")

add tests for chacha()

Add unit tests for chacha(). You can create the test file with usethis::use_test("chacha")

documentation for mambo()

Here is what the README.Rmd says about mambo():

`bolero()` uses a formula. The formulas may be made by `mambo()` if you want to apply 
the same predicate to a tidyselection of columns:

mambo() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal")))
#> # A tibble: 3 x 5
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa

g %>% 
    bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
#> # A tibble: 150 x 5
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-03-09 by the reprex package (v0.2.1)

documentation for madison()

Here is what the README.Rmd says about madison() :

`madison()` packs the columns `salsa()` would have created into a data frame column.

and a reprex of madison() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    madison(swing(~ . - mean(.), starts_with("Sepal")))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows, and 2 more variables: data$Sepal.Length <dbl>,
#> #   $Sepal.Width <dbl>

Created on 2019-03-09 by the reprex package (v0.2.1)

benchmark plan

This is not a top priority, since dance is more of a syntax experiment, but it'd be nice to bench::mark() dance against dplyr.

dance does not do hybrid evaluation yet, but has other tricks, so I expect things that use hybrid evaluation in dplyr, e.g. mean(), to be faster there. But I'm genuinely curious about how dance performs against dplyr generally.

add tests for bolero()

Add unit tests for bolero(). You can create the test file with usethis::use_test("bolero")

add tests for polka()

Add unit tests for polka(). You can create the test file with usethis::use_test("polka")

documentation for rumba()

Here is what the README.Rmd says about rumba() :

`rumba()` and `zumba()` can be used to apply several functions to a single column:

and a reprex of rumba() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}")
    )
#> # A tibble: 3 x 3
#>   Species    Sepal_mean Sepal_median
#>   <fct>           <dbl>        <dbl>
#> 1 setosa           3.43          3.4
#> 2 versicolor       2.77          2.8
#> 3 virginica        2.97          3

rumba() creates single columns and zumba() packs them into a data frame column. rumba() uses the .name glue pattern for naming columns

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
        zumba(Petal.Width, mean = mean, median = median, .name = "Petal")
    )
#> # A tibble: 3 x 4
#>   Species    Sepal_mean Sepal_median Petal$mean $median
#>   <fct>           <dbl>        <dbl>      <dbl>   <dbl>
#> 1 setosa           3.43          3.4      0.246     0.2
#> 2 versicolor       2.77          2.8      1.33      1.3
#> 3 virginica        2.97          3        2.03      2

Created on 2019-03-08 by the reprex package (v0.2.1)

documentation for twist()

Here is what the README.Rmd says about twist() :

both `swing()` and `twist()` are for applying the same function to a set of columns.

and a reprex of twist() in action:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(twist(mean, starts_with("Petal")))
#> # A tibble: 3 x 2
#>   Species    data$Petal.Length $Petal.Width
#>   <fct>                  <dbl>        <dbl>
#> 1 setosa                  1.46        0.246
#> 2 versicolor              4.26        1.33 
#> 3 virginica               5.55        2.03

Created on 2019-03-08 by the reprex package (v0.2.1)

twist() differs from swing() in the type of column created and how naming happens:

twist() creates a single data frame column, and .name controls its name:

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        twist(mean, starts_with("Petal"), .name = "mean"), 
        twist(median, starts_with("Petal"), .name = "median"), 
    )
#> # A tibble: 3 x 3
#>   Species    mean$Petal.Length $Petal.Width median$Petal.Leng… $Petal.Width
#>   <fct>                  <dbl>        <dbl>              <dbl>        <dbl>
#> 1 setosa                  1.46        0.246               1.5           0.2
#> 2 versicolor              4.26        1.33                4.35          1.3
#> 3 virginica               5.55        2.03                5.55          2

The first arguments of swing() and twist() are either a function or a formula that uses . as a placeholder. Subsequent arguments are tidyselect selections.

You can combine swing() and twist() in the same tango() or waltz():

library(dance)
library(tidyselect)

g <- iris %>% group_by(Species)

g %>% 
    tango(
        swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
        twist(median, contains("."), .name = "median")
    )
#> # A tibble: 3 x 4
#>   Species mean_Petal.Leng… mean_Petal.Width median$Sepal.Le… $Sepal.Width
#>   <fct>              <dbl>            <dbl>            <dbl>        <dbl>
#> 1 setosa              1.46            0.246              5            3.4
#> 2 versic…             4.26            1.33               5.9          2.8
#> 3 virgin…             5.55            2.03               6.5          3  
#> # … with 2 more variables: $Petal.Length <dbl>, $Petal.Width <dbl>

Created on 2019-03-08 by the reprex package (v0.2.1)

Create separate issues for documenting each function

Each function used in the README needs its own roxygen documentation. This issue is about creating a separate issue for each of those functions: tango(), waltz() ...

For example, you can add some context to the issue, such as a reprex created by extracting some code from the README.

See example issue #8 about waltz() and its pull request #7 by @cecilesauder. Feel free to add other issues and maybe reference them here.
